Posts Tagged ‘Population genetics’

Most significant article published in the journal of the International Society for Evolution, Medicine and Public Health won the George C. Williams Prize: polygenic scores, polygenic adaptation, and human phenotypic differences

Reporter: Aviva Lev-Ari, PhD, RN 


UPDATED on 8/30/2020

Analysis of polygenic risk score usage and performance in diverse human populations


A historical tendency to use European ancestry samples hinders medical genetics research, including the use of polygenic scores, which are individual-level metrics of genetic risk. We analyze the first decade of polygenic scoring studies (2008–2017, inclusive), and find that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were among cohorts of African, Hispanic, or Indigenous peoples. We find that predictive performance of European ancestry-derived polygenic scores is lower in non-European ancestry samples (e.g. African ancestry samples: t = −5.97, df = 24, p = 3.7 × 10⁻⁶), and we demonstrate the effects of methodological choices in polygenic score distributions for worldwide populations. These findings highlight the need for improved treatment of linkage disequilibrium and variant frequencies when applying polygenic scoring to cohorts of non-European ancestry, and bolster the rationale for large-scale GWAS in diverse human populations.
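To make the role of variant frequencies concrete, the Python sketch below computes the expected mean and variance of a polygenic score when the same set of effect-size weights is applied to two populations with different allele frequencies. It is an illustration, not the method used in the study: it assumes independent variants (no linkage disequilibrium) and Hardy-Weinberg proportions, and the weights and frequencies are made-up numbers.

```python
# Sketch: expected polygenic score (PGS) mean and variance when one set of
# GWAS weights is applied to populations with different allele frequencies.
# Assumptions (illustrative only): variants are independent (no linkage
# disequilibrium) and genotypes follow Hardy-Weinberg proportions.


def pgs_mean_and_variance(betas, freqs):
    """Return (mean, variance) of S = sum_i beta_i * g_i.

    g_i is the count (0, 1 or 2) of the effect allele at variant i, whose
    frequency is p_i. Under Hardy-Weinberg proportions E[g_i] = 2 p_i and
    Var[g_i] = 2 p_i (1 - p_i); independence lets the variances add.
    """
    if len(betas) != len(freqs):
        raise ValueError("betas and freqs must have the same length")
    mean = sum(2 * b * p for b, p in zip(betas, freqs))
    variance = sum(b * b * 2 * p * (1 - p) for b, p in zip(betas, freqs))
    return mean, variance


# Hypothetical weights estimated in one GWAS, applied to two populations
betas = [0.10, -0.05, 0.20]
freqs_pop_a = [0.30, 0.50, 0.10]  # hypothetical allele frequencies, population A
freqs_pop_b = [0.60, 0.40, 0.05]  # hypothetical allele frequencies, population B

for label, freqs in [("A", freqs_pop_a), ("B", freqs_pop_b)]:
    mean, variance = pgs_mean_and_variance(betas, freqs)
    print(f"population {label}: mean = {mean:.3f}, variance = {variance:.5f}")
```

With identical weights, the two hypothetical populations get different score means and variances purely because their allele frequencies differ. Allowing linkage disequilibrium would add covariance terms to the variance, which this sketch deliberately omits.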



The Voice of Prof. Marcus W. Feldman

You might be interested in the paper “Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences” by N. Rosenberg, M. Edge, J. Pritchard, and M. Feldman, published in Evolution, Medicine and Public Health (2019). Rosenberg and Pritchard are my former PhD students, both full professors at Stanford, and M. Edge is a student of Rosenberg.


On Aug 28, 2020, at 4:36 PM, Horowitz, Barbara Natterson <natterson-horowitz@fas.harvard.edu> wrote:

Dear Dr. Rosenberg,

It is my pleasure in my role as President of the International Society for Evolution, Medicine and Public Health to inform you that your 2019 EMPH article, “Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences,” has won the George C. Williams Prize, which is awarded each year to the first author of the most significant article published in the Society’s flagship journal, Evolution, Medicine and Public Health.

The Prize recognizes the contributions of George C. Williams to evolutionary medicine and aims to encourage and highlight important research in this growing field. It includes $5,000 and an invitation to present at the online lecture series, Club EvMed. The Prize is made possible by donations from Doris Williams, Randolph Nesse, and other supporters of EMPH.

The winning article:


Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences

Evolution, Medicine, and Public Health, Volume 2019, Issue 1, 2019, Pages 26–34, https://doi.org/10.1093/emph/eoy036
27 December 2018




Recent analyses of polygenic scores have opened new discussions concerning the genetic basis and evolutionary significance of differences among populations in distributions of phenotypes. Here, we highlight limitations in research on polygenic scores, polygenic adaptation and population differences. We show how genetic contributions to traits, as estimated by polygenic scores, combine with environmental contributions so that differences among populations in trait distributions need not reflect corresponding differences in genetic propensity. Under a null model in which phenotypes are selectively neutral, genetic propensity differences contributing to phenotypic differences among populations are predicted to be small. We illustrate this null hypothesis in relation to health disparities between African Americans and European Americans, discussing alternative hypotheses with selective and environmental effects. Close attention to the limitations of research on polygenic phenomena is important for the interpretation of their relationship to human population differences.


We are currently witnessing a surge in public interest in the intersection of evolutionary genetics with such topics as cognitive phenotypes, disease, race and heritability of human traits [1–7]. This attention emerges partly from recent advances in genomics, including the introduction of polygenic scores—the aggregation of estimated effects of genome-wide variants to predict the contribution of a person’s genome to a phenotypic trait [8–10]—and a new focus on polygenic adaptations, namely adaptations that have occurred by natural selection on traits influenced by many genes [11–13].
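A polygenic score, as described above, is a weighted sum over variants: each variant's estimated effect size multiplied by the number of effect-allele copies a person carries. The Python sketch below assumes the effect sizes come from some GWAS and the dosages from a person's genotype data; the variant IDs and numbers are hypothetical.

```python
from typing import Dict


def polygenic_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum of beta_j * x_j over variants present in both dictionaries.

    weights: estimated per-allele effect sizes (beta_j) from a GWAS.
    dosages: effect-allele counts x_j in [0, 2]; imputed dosages may be fractional.
    """
    score = 0.0
    for variant, beta in weights.items():
        if variant not in dosages:
            continue  # convention used here: skip variants missing from the genotype data
        dosage = dosages[variant]
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant} out of range: {dosage}")
        score += beta * dosage
    return score


# Hypothetical GWAS weights and one person's dosages
gwas_weights = {"rs_hypothetical_1": 0.12, "rs_hypothetical_2": -0.08, "rs_hypothetical_3": 0.05}
person = {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1, "rs_hypothetical_3": 0.4}
print(round(polygenic_score(gwas_weights, person), 3))
```

Skipping variants that are absent from the genotype data is only one convention (some scoring tools substitute the population-mean dosage instead); it is used here just to keep the sketch short.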

Theories involving natural selection have long been applied in the scientific literature to explain mean phenotypic differences among human populations [14–16]. Although new tools for statistical analysis of polygenic variation and polygenic adaptation provide opportunities for studying human evolution and the genetic basis of traits, they also generate potential for misinterpretation. In the past, public attention to research on human variation and its possible evolutionary basis has often been accompanied by claims that are not justified by the research findings [17]. Recognizing pitfalls in the interpretation of new research on human variation is therefore important for advancing discussions on associated sensitive and controversial topics.

Figure 1. The contribution of polygenic score distributions to phenotype distributions. Two populations are considered, populations 1 (red) and 2 (blue). Each population has a distribution of genetic propensities, which are treated as accurately estimated in the form of polygenic scores (left). The genetic propensity distribution and an environment distribution sum to produce a phenotype distribution (right). All plots have the same numerical scale. (A) Environmental differences amplify an underlying difference in genetic propensities. (B) Populations differ in their phenotypes despite having no differences in genetic propensity distributions. (C) Environmental differences obscure a difference in genetic propensities opposite in direction to the difference in phenotype means. (D) Similarity in phenotype distributions is achieved despite a difference in genetic propensity distributions by an intervention that reduces the environmental contribution for individuals with polygenic scores above a threshold. (E) Within populations, heritability is high, so that genetic variation explains the majority of phenotypic variation; however, the difference between populations is explained by an environmental difference. Panels (A–C and E) present independent normal distributions for genotype and environment that sum to produce normal distributions for phenotype. In (D), (genotype, environment) pairs are simulated from independent normal distributions and a negative constant, reflecting the effect of a medication or other intervention, is added to environmental contributions associated with simulated genotypic values that exceed a threshold.
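The simulation described for panel (D) can be sketched in a few lines of Python. The parameter values (genetic means, threshold, size of the negative shift) are hypothetical choices for illustration and are not taken from the paper.

```python
import random
import statistics


def simulate_phenotypes(n, genetic_mean, threshold=None, shift=0.0, seed=0):
    """Simulate phenotype = genotype + environment for n individuals.

    Genotype and environment contributions are independent N(genetic_mean, 1)
    and N(0, 1) draws. If a threshold is given, `shift` (negative for a
    medication-like intervention) is added to the environmental contribution
    of everyone whose genotypic value exceeds the threshold.
    """
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n):
        genotype = rng.gauss(genetic_mean, 1.0)
        environment = rng.gauss(0.0, 1.0)
        if threshold is not None and genotype > threshold:
            environment += shift
        phenotypes.append(genotype + environment)
    return phenotypes


N = 50_000
pop1 = simulate_phenotypes(N, genetic_mean=0.0, seed=1)
pop2_no_intervention = simulate_phenotypes(N, genetic_mean=1.0, seed=2)
pop2_intervention = simulate_phenotypes(N, genetic_mean=1.0, threshold=1.0, shift=-2.0, seed=2)

for label, values in [
    ("population 1", pop1),
    ("population 2, no intervention", pop2_no_intervention),
    ("population 2, intervention", pop2_intervention),
]:
    print(f"{label}: mean phenotype = {statistics.fmean(values):.2f}")
```

With these choices, half of population 2 lies above the threshold, so the intervention lowers its expected phenotype mean by 2 × 0.5 = 1 and brings it back in line with population 1, even though the genetic means still differ by 1.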


These limitations illustrate that much of the complexity embedded in use of polygenic scores—the effects of the environment on phenotype and its relationship to genotype, the proportion of variance explained, and the peculiarities of the underlying GWAS data that have been used to estimate effect sizes—is obscured by the apparent simplicity of the single values computed for each individual for each phenotype. Consequently, in using polygenic scores to describe genomic contributions to traits, particularly traits for which the total contribution of genetic variation to trait variation, as measured by heritability, is low—but even if it is high (Fig. 1E)—a difference in polygenic scores between populations provides little information about potential genetic bases for trait differences between those populations.
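The panel (E) scenario, with high heritability within each population but an environmental difference between them, can be checked with a small simulation. In this Python sketch the genetic values are known exactly (as in Figure 1, where polygenic scores are treated as accurate), both populations share the same genetic distribution, and only the environmental mean differs; the standard deviations and the size of the environmental difference are hypothetical.

```python
import random
import statistics


def simulate_population(n, environment_mean, sd_genetic=1.0, sd_environment=0.5, seed=0):
    """Return (genetic values, phenotypes) with phenotype = genetic + environmental.

    Genetic values are N(0, sd_genetic^2) in every population; only the
    mean of the environmental contribution differs between populations.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, sd_genetic) for _ in range(n)]
    phenotype = [g + rng.gauss(environment_mean, sd_environment) for g in genetic]
    return genetic, phenotype


def variance(values):
    """Population variance, computed with fmean for speed."""
    mean = statistics.fmean(values)
    return statistics.fmean([(v - mean) ** 2 for v in values])


def heritability(genetic, phenotype):
    """Share of phenotypic variance explained by genetic variance: Var(G) / Var(P)."""
    return variance(genetic) / variance(phenotype)


N = 50_000
g1, p1 = simulate_population(N, environment_mean=0.0, seed=1)
g2, p2 = simulate_population(N, environment_mean=1.5, seed=2)

print(f"heritability in population 1: {heritability(g1, p1):.2f}")
print(f"heritability in population 2: {heritability(g2, p2):.2f}")
print(f"difference in mean genetic value: {statistics.fmean(g2) - statistics.fmean(g1):.2f}")
print(f"difference in mean phenotype: {statistics.fmean(p2) - statistics.fmean(p1):.2f}")
```

Within each population the expected heritability is 1 / (1 + 0.25) = 0.8, yet the whole phenotypic gap between populations comes from the environmental term, which is the point the text makes about high heritability not implying a genetic explanation for group differences.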

Unlike heritability, which ranges from 0 to 1 and therefore makes it obvious that the remaining contribution to phenotypic variation is summarized by its difference from 1, the limited explanatory role of genetics is not embedded in the nature of the polygenic scores themselves. Although polygenic scores encode knowledge about specific genetic correlates of trait variation, they do not change the conceptual framework for genetic and environmental contribution to population differences. Attributions of phenotypic differences among populations to genetic differences should therefore be treated with as much caution as similar genetic attributions from heritability in the pre-genomic era.



Diversity and Health Disparity Issues Need to be Addressed for GWAS and Precision Medicine Studies

Curator: Stephen J. Williams, PhD




Ethics of inclusion: Cultivate trust in precision medicine


Science, 07 Jun 2019:
Vol. 364, Issue 6444, pp. 941-942
DOI: 10.1126/science.aaw8299

Precision medicine is at a crossroads. Progress toward its central goal, to address persistent health inequities, will depend on enrolling historically underrepresented populations in research, thus eliminating longstanding exclusions from such research (1). Yet the history of ethical violations related to protocols for inclusion in biomedical research, as well as the continued misuse of research results (such as white nationalists looking to genetic ancestry to support claims of racial superiority), continue to engender mistrust among these populations (2). For precision medicine research (PMR) to achieve its goal, all people must believe that there is value in providing information about themselves and their families, and that their participation will translate into equitable distribution of benefits. This requires an ethics of inclusion that considers what constitutes inclusive practices in PMR, what goals and values are being furthered through efforts to enhance diversity, and who participates in adjudicating these questions. The early stages of PMR offer a critical window in which to intervene before research practices and their consequences become locked in (3).

Initiatives such as the All of Us program have set out to collect and analyze health information and biological samples from millions of people (1). At the same time, questions of trust in biomedical research persist. For example, although the recent assertions of white nationalists were eventually denounced by the American Society of Human Genetics (4), the misuse of ancestry testing may have already undermined public trust in genetic research.

There are also infamous failures in research that included historically underrepresented groups, including practices of deceit, as in the Tuskegee Syphilis Study, or the misuse of samples, as with the Havasupai tribe (5). Many people who are being asked to give their data and samples for PMR must not only reconcile such past research abuses, but also weigh future risks of potential misuse of their data.

To help assuage these concerns, ongoing PMR studies should open themselves up to research, conducted by social scientists and ethicists, that examines how their approaches enhance diversity and inclusion. Empirical studies are needed to account for how diversity is conceptualized and how goals of inclusion are operationalized throughout the life course of PMR studies. This is not limited to selection and recruitment of populations but extends to efforts to engage participants and communities, through data collection and measurement, and interpretations and applications of study findings. A commitment to transparency is an important step toward cultivating public trust in PMR’s mission and practices.

From Inclusion to Inclusive

The lack of diverse representation in precision medicine and other biomedical research is a well-known problem. For example, rare genetic variants may be overlooked—or their association with common, complex diseases can be misinterpreted—as a result of sampling bias in genetics research (6). Concentrating research efforts on samples with largely European ancestry has limited the ability of scientists to make generalizable inferences about the relationships among genes, lifestyle, environmental exposures, and disease risks, and thereby threatens the equitable translation of PMR for broad public health benefit (7).

However, recruiting for diverse research participation alone is not enough. As with any push for “diversity,” related questions arise about how to describe, define, measure, compare, and explain inferred similarities and differences among individuals and groups (8). In the face of ambivalence about how to represent population variation, there is ample evidence that researchers resort to using definitions of diversity that are heterogeneous, inconsistent, and sometimes competing (9). Varying approaches are not inherently problematic; depending on the scientific question, some measures may be more theoretically justified than others and, in many cases, a combination of measures can be leveraged to offer greater insight (10). For example, studies have shown that American adults who do not self-identify as white report better mental and physical health if they think others perceive them as white (11, 12).

The benefit of using multiple measures of race and ancestry also extends to genetic studies. In a study of hypertension in Puerto Rico, not only did classifications based on skin color and socioeconomic status predict blood pressure better than genetic ancestry did, but the inclusion of these sociocultural measures also revealed an association between a genetic polymorphism and hypertension that was otherwise hidden (13). Thus, practices that allow for a diversity of measurement approaches, when accompanied by a commitment to transparency about the rationales for chosen approaches, are likely to benefit PMR more than striving for a single gold standard that would apply across all studies. These definitional and measurement issues are not merely semantic. They also are socially consequential to broader perceptions of PMR and the potential to achieve its goals of inclusion.

Study Practices, Improve Outcomes

Given the uncertainty and complexities of the current, early phase of PMR, the time is ripe for empirical studies that enable assessment and modulation of research practices and scientific priorities in light of their social and ethical implications. Studying ongoing scientific practices in real time can help to anticipate unintended consequences that would limit researchers’ ability to meet diversity recruitment goals, address both social and biological causes of health disparities, and distribute the benefits of PMR equitably. We suggest at least two areas for empirical attention and potential intervention.

First, we need to understand how “upstream” decisions about how to characterize study populations and exposures influence “downstream” research findings of what are deemed causal factors. For example, when precision medicine researchers rely on self-identification with U.S. Census categories to characterize race and ethnicity, this tends to circumscribe their investigation of potential gene-environment interactions that may affect health. The convenience and routine nature of Census categories seemed to lead scientists to infer that the reasons for differences among groups were self-evident and required no additional exploration (9). The ripple effects of initial study design decisions go beyond issues of recruitment to shape other facets of research across the life course of a project, from community engagement and the return of results to the interpretation of study findings for human health.

Second, PMR studies are situated within an ecosystem of funding agencies, regulatory bodies, disciplines, and other scholars. This partly explains the use of varied terminology, different conceptual understandings and interpretations of research questions, and heterogeneous goals for inclusion. It also makes it important to explore how expectations related to funding and regulation influence research definitions of diversity and benchmarks for inclusion.

For example, who defines a diverse study population, and how might those definitions vary across different institutional actors? Who determines the metrics that constitute successful inclusion, and why? Within a research consortium, how are expectations for data sharing and harmonization reconciled with individual studies’ goals for recruitment and analysis? In complex research fields that include multiple investigators, organizations, and agendas, how are heterogeneous, perhaps even competing, priorities negotiated? To date, no studies have addressed these questions or investigated how decisions facilitate, or compromise, goals of diversity and inclusion.

The life course of individual studies and the ecosystems in which they reside cannot be easily separated and therefore must be studied in parallel to understand how meanings of diversity are shaped and how goals of inclusion are pursued. Empirically “studying the studies” will also be instrumental in creating mechanisms for transparency about how PMR is conducted and how trade-offs among competing goals are resolved. Establishing open lines of inquiry that study upstream practices may allow researchers to anticipate and address downstream decisions about how results can be interpreted and should be communicated, with a particular eye toward the consequences for communities recruited to augment diversity. Understanding how scientists negotiate the challenges and barriers to achieving diversity that go beyond fulfilling recruitment numbers is a critical step toward promoting meaningful inclusion in PMR.

Transparent Reflection, Cultivation of Trust

Emerging research on public perceptions of PMR suggests that although there is general support, questions of trust loom large. What we learn from studies that examine on-the-ground approaches aimed at enhancing diversity and inclusion, and how the research community reflects and responds with improvements in practices as needed, will play a key role in building a culture of openness that is critical for cultivating public trust.

Cultivating long-term, trusting relationships with participants underrepresented in biomedical research has been linked to a broad range of research practices. Some of these include the willingness of researchers to (i) address the effect of history and experience on marginalized groups’ trust in researchers and clinicians; (ii) engage concerns about potential group harms and risks of stigmatization and discrimination; (iii) develop relationships with participants and communities that are characterized by transparency, clear communication, and mutual commitment; and (iv) integrate participants’ values and expectations of responsible oversight beyond initial informed consent (14). These findings underscore the importance of multidisciplinary teams that include social scientists, ethicists, and policy-makers, who can identify and help to implement practices that respect the histories and concerns of diverse publics.

A commitment to an ethics of inclusion begins with a recognition that risks from the misuse of genetic and biomedical research are unevenly distributed. History makes plain that a multitude of research practices ranging from unnecessarily limited study populations and taken-for-granted data collection procedures to analytic and interpretive missteps can unintentionally bolster claims of racial superiority or inferiority and provoke group harm (15). Sustained commitment to transparency about the goals, limits, and potential uses of research is key to further cultivating trust and building long-term research relationships with populations underrepresented in biomedical studies.

As calls for increasing diversity and inclusion in PMR grow, funding and organizational pathways must be developed that integrate empirical studies of scientific practices and their rationales to determine how goals of inclusion and equity are being addressed and to identify where reform is required. In-depth, multidisciplinary empirical investigations of how diversity is defined, operationalized, and implemented can provide important insights and lessons learned for guiding emerging science, and in so doing, meet our ethical obligations to ensure transparency and meaningful inclusion.

References and Notes

  1. C. P. Jones et al., Ethn. Dis. 18, 496 (2008).
  2. C. C. Gravlee, A. L. Non, C. J. Mulligan
  3. S. A. Kraft et al., Am. J. Bioeth. 18, 3 (2018).
  4. A. E. Shields et al., Am. Psychol. 60, 77 (2005).


Icelandic Population Genomic Study Results by deCODE Genetics come to Fruition: Curation of Current genomic studies

Reporter/Curator: Stephen J. Williams, Ph.D.


UPDATED on 9/6/2017

On 9/6/2017, Aviva Lev-Ari, PhD, RN attended a talk by Paul Nioi, PhD, Amgen, at HMS, Harvard BioTechnology Club (GSAS).

Nioi discussed his 2016 NEJM paper (N Engl J Med 2016; 374:2131–2141):

Variant ASGR1 Associated with a Reduced Risk of Coronary Artery Disease

Paul Nioi, Ph.D., Asgeir Sigurdsson, B.Sc., Gudmar Thorleifsson, Ph.D., Hannes Helgason, Ph.D., Arna B. Agustsdottir, B.Sc., Gudmundur L. Norddahl, Ph.D., Anna Helgadottir, M.D., Audur Magnusdottir, Ph.D., Aslaug Jonasdottir, M.Sc., Solveig Gretarsdottir, Ph.D., Ingileif Jonsdottir, Ph.D., Valgerdur Steinthorsdottir, Ph.D., Thorunn Rafnar, Ph.D., Dorine W. Swinkels, M.D., Ph.D., Tessel E. Galesloot, Ph.D., Niels Grarup, Ph.D., Torben Jørgensen, D.M.Sc., Henrik Vestergaard, D.M.Sc., Torben Hansen, Ph.D., Torsten Lauritzen, D.M.Sc., Allan Linneberg, Ph.D., Nele Friedrich, Ph.D., Nikolaj T. Krarup, Ph.D., Mogens Fenger, Ph.D., Ulrik Abildgaard, D.M.Sc., Peter R. Hansen, D.M.Sc., Anders M. Galløe, Ph.D., Peter S. Braund, Ph.D., Christopher P. Nelson, Ph.D., Alistair S. Hall, F.R.C.P., Michael J.A. Williams, M.D., Andre M. van Rij, M.D., Gregory T. Jones, Ph.D., Riyaz S. Patel, M.D., Allan I. Levey, M.D., Ph.D., Salim Hayek, M.D., Svati H. Shah, M.D., Muredach Reilly, M.B., B.Ch., Gudmundur I. Eyjolfsson, M.D., Olof Sigurdardottir, M.D., Ph.D., Isleifur Olafsson, M.D., Ph.D., Lambertus A. Kiemeney, Ph.D., Arshed A. Quyyumi, F.R.C.P., Daniel J. Rader, M.D., William E. Kraus, M.D., Nilesh J. Samani, F.R.C.P., Oluf Pedersen, D.M.Sc., Gudmundur Thorgeirsson, M.D., Ph.D., Gisli Masson, Ph.D., Hilma Holm, M.D., Daniel Gudbjartsson, Ph.D., Patrick Sulem, M.D., Unnur Thorsteinsdottir, Ph.D., and Kari Stefansson, M.D., Ph.D.

N Engl J Med 2016; 374:2131–2141, June 2, 2016. DOI: 10.1056/NEJMoa1508419



Several sequence variants are known to have effects on serum levels of non–high-density lipoprotein (HDL) cholesterol that alter the risk of coronary artery disease.


We sequenced the genomes of 2636 Icelanders and found variants that we then imputed into the genomes of approximately 398,000 Icelanders. We tested for association between these imputed variants and non-HDL cholesterol levels in 119,146 samples. We then performed replication testing in two populations of European descent. We assessed the effects of an implicated loss-of-function variant on the risk of coronary artery disease in 42,524 case patients and 249,414 controls from five European ancestry populations. An augmented set of genomes was screened for additional loss-of-function variants in a target gene. We evaluated the effect of an implicated variant on protein stability.
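The association step above (testing imputed variants against non-HDL cholesterol) rests on regressing a quantitative trait on allele dosage. The Python sketch below shows a plain additive-model linear regression on simulated data. It is not the deCODE pipeline, which would also account for covariates and relatedness; the allele frequency, effect size, noise level and sample size are hypothetical.

```python
import math
import random


def additive_association_test(dosages, phenotypes):
    """Simple linear regression of phenotype on allele dosage (additive model).

    Returns (beta, standard_error, two_sided_p). The p-value uses a normal
    approximation to the t distribution, which is reasonable for large n.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p


# Hypothetical simulated cohort
rng = random.Random(42)
ALLELE_FREQ = 0.10
TRUE_EFFECT = -10.0  # mg/dL per allele copy (hypothetical)
N_SAMPLES = 20_000
dosages = [(rng.random() < ALLELE_FREQ) + (rng.random() < ALLELE_FREQ) for _ in range(N_SAMPLES)]
phenotypes = [150.0 + TRUE_EFFECT * x + rng.gauss(0.0, 30.0) for x in dosages]

beta, se, p = additive_association_test(dosages, phenotypes)
print(f"beta = {beta:.2f} mg/dL per allele, SE = {se:.2f}, p = {p:.1e}")
```

Each genotype here is the sum of two Bernoulli draws, so the simulated cohort follows Hardy-Weinberg proportions by construction; real imputed data would instead supply fractional dosages, which the same regression accepts.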


We found a rare noncoding 12-base-pair (bp) deletion (del12) in intron 4 of ASGR1, which encodes a subunit of the asialoglycoprotein receptor, a lectin that plays a role in the homeostasis of circulating glycoproteins. The del12 mutation activates a cryptic splice site, leading to a frameshift mutation and a premature stop codon that renders a truncated protein prone to degradation. Heterozygous carriers of the mutation (1 in 120 persons in our study population) had a lower level of non-HDL cholesterol than noncarriers, a difference of 15.3 mg per deciliter (0.40 mmol per liter) (P=1.0×10⁻¹⁶), and a lower risk of coronary artery disease (by 34%; 95% confidence interval, 21 to 45; P=4.0×10⁻⁶). In a larger set of sequenced samples from Icelanders, we found another loss-of-function ASGR1 variant (p.W158X, carried by 1 in 1850 persons) that was also associated with lower levels of non-HDL cholesterol (P=1.8×10⁻³).
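The numbers in this paragraph can be cross-checked with simple arithmetic. The Python sketch below assumes the standard cholesterol conversion factor (1 mmol/L ≈ 38.67 mg/dL), Hardy-Weinberg proportions with carriers counted as heterozygotes, and reads “34% lower risk” as a ratio of 1 − 0.34; the abstract does not state which ratio measure it reports, so that last step is an assumption.

```python
import math

# Standard conversion for cholesterol: 1 mmol/L corresponds to about 38.67 mg/dL
MG_DL_PER_MMOL_L_CHOLESTEROL = 38.67


def cholesterol_mg_dl_to_mmol_l(mg_dl):
    """Convert a cholesterol concentration from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L_CHOLESTEROL


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for q (the smaller root).

    Assumes Hardy-Weinberg proportions and that carriers are heterozygotes.
    """
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def ratio_from_percent_reduction(percent):
    """A reported 'X% lower risk' read as a ratio of 1 - X/100."""
    return 1.0 - percent / 100.0


print(f"15.3 mg/dL = {cholesterol_mg_dl_to_mmol_l(15.3):.2f} mmol/L")
print(f"del12 allele frequency ~ {allele_freq_from_carrier_freq(1 / 120):.4f}")
print(f"p.W158X allele frequency ~ {allele_freq_from_carrier_freq(1 / 1850):.5f}")
print(
    f"34% reduction -> ratio {ratio_from_percent_reduction(34):.2f} "
    f"(interval {ratio_from_percent_reduction(45):.2f} to {ratio_from_percent_reduction(21):.2f})"
)
```

The conversion reproduces the reported 0.40 mmol per liter, and a carrier frequency of 1 in 120 corresponds to an allele frequency of roughly 1 in 240, as expected for a rare variant.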


ASGR1 haploinsufficiency was associated with reduced levels of non-HDL cholesterol and a reduced risk of coronary artery disease. (Funded by the National Institutes of Health and others.)


Amgen’s deCODE Genetics Publishes Largest Human Genome Population Study to Date

Mark Terry, of the BioSpace.com Breaking News Staff, reported on the results of one of the largest genome sequencing efforts to date: the sequencing of the genomes of 2,636 people from Iceland by deCODE genetics, Inc., a division of Thousand Oaks, Calif.-based Amgen (AMGN).

Amgen had bought deCODE genetics Inc. in 2012, saving the company from bankruptcy.

Four studies were published online in Nature Genetics on March 25, 2015: “Large-scale whole-genome sequencing of the Icelandic population[1],” “Identification of a large set of rare complete human knockouts[2],” “The Y-chromosome point mutation rate in humans[3]” and “Loss-of-function variants in ABCA7 confer risk of Alzheimer’s disease[4].”

The project identified some new genetic variants that increase the risk of Alzheimer’s disease and confirmed some variants known to increase the risk of diabetes and atrial fibrillation. A more in-depth post will curate these findings, but one notable observation was the discrete geographic distribution of certain rare variants around Iceland. The dataset offers a wealth of genetic information, not only about the Icelandic population but also in the form of numerous new targets for breast and ovarian cancer as well as Alzheimer’s disease.

View Mark Terry’s article here on Biospace.com.

“This work is a demonstration of the unique power sequencing gives us for learning more about the history of our species,” said Kari Stefansson, founder and chief executive officer of deCODE and one of the lead authors, in a statement, “and for contributing to new means of diagnosing, treating and preventing disease.”

The scale and ambition of the study are impressive, but perhaps more important, the research identified a new genetic variant that increases the risk of Alzheimer’s disease; deCODE had already identified an APP variant associated with a decreased risk of Alzheimer’s disease. The research also confirmed variants that increase the risk of diabetes and a variant that increases the risk of atrial fibrillation.

The database of human genetic variation (dbSNP) contained over 50 million unique sequence variants, yet this represents only a small proportion of the single-nucleotide variants (SNVs) thought to exist. The many uncatalogued “private” or rare variants undoubtedly contribute to important phenotypes, such as disease susceptibility. Non-SNV variants, like indels and structural variants, are also under-represented in public databases. The only way to fully elucidate the genetic basis of a trait is to consider all of these types of variants, and the only way to find them is by large-scale sequencing.
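Why sample size matters for rare variants can be quantified: a variant with allele frequency f appears at least once among n diploid genomes with probability 1 − (1 − f)^(2n). The Python sketch below assumes random sampling, Hardy-Weinberg proportions and perfect variant calling; the frequencies and cohort sizes are illustrative, with 2,636 taken from the Icelandic study above.

```python
import math


def detection_probability(allele_freq, n_genomes):
    """Chance that a variant with the given allele frequency appears at least once
    among n_genomes diploid individuals (2 * n_genomes chromosomes sampled)."""
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must be in (0, 1)")
    # 1 - (1 - f)^(2n), computed with log1p/expm1 for accuracy at small f
    return -math.expm1(2 * n_genomes * math.log1p(-allele_freq))


def genomes_needed(allele_freq, target_probability):
    """Smallest number of genomes giving at least target_probability of detection."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be in (0, 1)")
    return math.ceil(math.log1p(-target_probability) / (2 * math.log1p(-allele_freq)))


for n in (100, 2_636, 100_000):
    print(f"{n:>7} genomes: P(see a variant at f = 0.0005) = {detection_probability(0.0005, n):.3f}")
print("genomes needed for a 95% chance at f = 0.0001:", genomes_needed(0.0001, 0.95))
```

Seeing a variant once is only the first step; estimating its effect on a trait, as in the ASGR1 study, requires many carriers, which pushes the required sample size higher still.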

Curation of Population Genomic Sequencing Programs/Corporate Partnerships


Curation of genomic studies

| Study | Partners | Population | Enrolled | Disease areas | Analysis |
|---|---|---|---|---|---|
| Icelandic Genome | deCODE/Amgen | Icelandic | 2,636 | Variants related to Alzheimer’s, cardiovascular disease, diabetes | WGS + EMR; blood samples |
| Genome Sequencing Study | Geisinger Health System/Regeneron | Northeast PA, USA | 100,000 | Variants related to hypercholesterolemia, autism, obesity, other diseases | WES + EMR + MyCode; blood samples |
| The 100,000 Genomes Project | National Health Service/NHS Genome Centers/10 companies forming a Gene Consortium, including AbbVie, Alexion, AstraZeneca, Biogen, Dimension, GSK, Helomics, Roche, Takeda, UCB | Rare disorders population, UK | Starting to recruit 100,000 | Initially rare diseases, cancer, infectious diseases | WES of blood, saliva and tissue samples |
| Saudi Human Genome Program | 7 centers across Saudi Arabia in conjunction with King Abdulaziz City for Science and Technology, King Faisal Hospital & Research Centre/Life Technologies | General population, Saudi Arabia | 20,000 genomes over three years | First focus on rare, severe, early-onset diseases: diabetes, deafness, cardiovascular, skeletal deformation | Whole-genome sequencing of blood samples + EMR |
| Genome of the Netherlands (GoNL) Consortium | Consortium of the UMCG, LUMC, Erasmus MC, VU University and UMCU; samples were contributed by LifeLines, the Leiden Longevity Study, the Netherlands Twin Registry (NTR), the Rotterdam Studies, and the Genetic Research in Isolated Populations program; all sequencing was done by BGI Hong Kong | Families in the Netherlands | 769 | Variants (SNVs, indels, deletions) from apparently healthy individuals, family trios | Whole-genome NGS of whole blood; no EMR; described in a Nat. Genetics paper and a project-description paper |
| Faroese FarGen project | Privately funded | Faroese population, Faroe Islands | 50,000 | Small population allows for family analysis | NGS combined with EMR and genealogy reports |
| Personal Genome Project Canada | $4,000 fee from participants; collaboration with University of Toronto and SickKids; technical assistance from Harvard | Canadian Health System | Goal: 100,000 | ? Just started; no defined analysis goals yet | Whole exome and medical records |
| Singapore Sequencing Malay Project (SSMP) | Singapore Genome Variation Project; Singapore Pharmacogenomics Project | Malay | 100 healthy Malays from the Singapore Pop. Health Study | Variant analysis | Deep whole-genome sequencing |
| GenomeDenmark | Four Danish universities (KU, AU, DTU and AAU), two hospitals (Herlev and Vendsyssel) and two private firms (Bavarian Nordic and BGI-Europe) | | 150 complete genomes; first 30 published in Nature Comm. | ? | |
| Neuromics Consortium | University of Tübingen and 18 academic and industrial partners | European and Australian | 1,100 patients | Neurodegenerative and neuromuscular disease | Moved from SNP to whole-exome analysis; whole exome, RNA-Seq |


  1. Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, Besenbacher S, Magnusson G, Halldorsson BV, Hjartarson E et al: Large-scale whole-genome sequencing of the Icelandic population. Nature Genetics 2015, advance online publication.
  2. Sulem P, Helgason H, Oddson A, Stefansson H, Gudjonsson SA, Zink F, Hjartarson E, Sigurdsson GT, Jonasdottir A, Jonasdottir A et al: Identification of a large set of rare complete human knockouts. Nature Genetics 2015, advance online publication.
  3. Helgason A, Einarsson AW, Gudmundsdottir VB, Sigurdsson A, Gunnarsdottir ED, Jagadeesan A, Ebenesersdottir SS, Kong A, Stefansson K: The Y-chromosome point mutation rate in humans. Nature Genetics 2015, advance online publication.
  4. Steinberg S, Stefansson H, Jonsson T, Johannsdottir H, Ingason A, Helgason H, Sulem P, Magnusson OT, Gudjonsson SA, Unnsteinsdottir U et al: Loss-of-function variants in ABCA7 confer risk of Alzheimer’s disease. Nature Genetics 2015, advance online publication.

Other posts related to deCODE, population genomics, and NGS on this site include:

Illumina Says 228,000 Human Genomes Will Be Sequenced in 2014

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics and Computational Genomics – Part IIB

Human genome: UK to become world number 1 in DNA testing

Synthetic Biology: On Advanced Genome Interpretation for Gene Variants and Pathways: What is the Genetic Base of Atherosclerosis and Loss of Arterial Elasticity with Aging

Genomic Promise for Neurodegenerative Diseases, Dementias, Autism Spectrum, Schizophrenia, and Serious Depression

Sequencing the exomes of 1,100 patients with neurodegenerative and neuromuscular diseases: A consortium of 18 European and Australian institutions

University of California Santa Cruz’s Genomics Institute will create a Map of Human Genetic Variations

Three Ancestral Populations Contributed to Modern-day Europeans: Ancient Genome Analysis

Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENCODE regulatory elements is manifested between species and within human populations


1:00PM 11/13/2014 – 10th Annual Personalized Medicine Conference at the Harvard Medical School, Boston

REAL TIME Coverage of this Conference by Dr. Aviva Lev-Ari, PhD, RN – Director and Founder of LEADERS in PHARMACEUTICAL BUSINESS INTELLIGENCE, Boston http://pharmaceuticalintelligence.com

1:00 p.m. Panel Discussion: Genomics in Prenatal and Childhood Disorders


David Sweetser, M.D., Ph.D.
Unit Chief, Division of Medical Genetics; Attending Physician in Pediatric Hematology/Oncology,
Massachusetts General Hospital for Children

Genomics has revolutionized medicine, allowing genetic variation to be studied on a much larger scale

Case one: autism caused by mutations in a gene involved in synapse formation; clinical trials

Treatment: IGF1

Genetics: embryo (implant only the healthy embryo); newborn (comprehensive genetic testing integrated into the medical record); a standard language for GENE-DRUG interactions, not only drug-drug interactions

Potential harms: a variant may or may not lead to disease; stigma issues

Explaining these conditions to parents is very difficult for physicians


3. Diana Bianchi, M.D.
Executive Director, Mother Infant Research Institute;
Vice Chair for Research and Academic Affairs,
Department of Pediatrics; Attending Geneticist and Neonatologist;
Natalie V. Zucker Professor, Tufts University School of Medicine

Medical Geneticist – Pediatrics

  • Prenatal screening and diagnosis of chromosomal abnormalities such as Down syndrome: testing is more precise, and prenatal screening has led to 70% fewer procedures to correct defects.
  • Prenatal diagnostics: the patient is not in front of us; ultrasound examination, options to terminate pregnancies, and genetic counseling have all changed due to genomics.
  • Prenatal treatment of Down syndrome: a transcriptomic approach to treating the fetus before birth.
  • Standard of care: all pregnant women must be offered screening for Down syndrome by their physician; the test result is positive or negative.
  • Now: DNA in the maternal circulation allows testing for fetal sex and chromosomes; it reflects both fetal and maternal genetics, and the mother may herself carry a chromosomal variation.
  • High false-positive rates: DNA testing for Down syndrome is 97% effective, but for microduplications only 5%.
  • Genetic information protection (the Genetic Information Nondiscrimination Act): one can sue a prospective employer who uses genome data; life insurance remains an issue.
  • Most available data concern Down syndrome; of all parents informed of a fetus with Down syndrome, 40% continue the pregnancy.
  • Accuracy in testing, offering choice, and treatment are the LEADING principles, NOT elimination of a disease (e.g., Down syndrome).

For reference, see Prenatal Treatment of Down’s Syndrome: a Reality?

and the reference list by Dr. Bianchi.

2. Holmes Morton, M.D. @ClinicSpecChild
Medical Director, Clinic for Special Children

Small population in Lancaster, PA, at risk for untreatable disease: 52,000 screens, compared with 4.2 million screened in the US. Targeted mutation analysis makes diagnosis very effective. Harrisburg, PA: small-scale natural history studies.

Carrier testing was offered in the 1970s; it discouraged marriage, and the cultural reaction is different. Working in the community, the clinical practice uses exon sequencing and combines population genetics with molecular biology to translate genomics to the clinic, with a small number of risk factors.

Knowing the genetic history of the population is important to establish treatment.

Upon birth, affected newborns receive a matching bone marrow transplant, thus bypassing stem cells; gene therapy is another matter.

1. Benjamin Solomon, Ph.D., M.D.
Chief, Division of Medical Genomics,
Inova Translational Medicine Institute

Longer term: statistical models in asthma research; a rigorous patient-consent process covering life insurance and mutations that parents also carry. Consequences: actionable findings are communicated.
135 genes: sequencing for some conditions

Questions from the Podium

See more at: http://personalizedmedicine.partners.org/Education/Personalized-Medicine-Conference/Program.aspx

Curator: Aviva Lev-Ari, PhD, RN

Population Genetics

HAPAA: a tool for ancestral haploblock reconstruction. Specifically, given the genotype (for instance, as derived by an Illumina genotyping array) of an individual of admixed ancestry, it finds the source population for each segment of the individual’s genome.
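HAPAA's own model is not described in this post. As a hedged illustration of the general problem only, the sketch below treats local ancestry as a hidden Markov model: the hidden state at each SNP is the source population, emissions come from per-population allele frequencies, and a small switch probability lets ancestry change along the haplotype. Viterbi decoding then returns the most likely population for every SNP. The population labels POP1/POP2, the frequencies, `switch_prob`, and the function name are made-up assumptions; the sketch ignores recombination distances and unphased genotypes, which a real tool must handle.

```python
import math


def viterbi_local_ancestry(haplotype, freqs, switch_prob=0.01):
    """Most likely source population at each SNP of one phased haplotype.

    haplotype: list of 0/1 alleles at consecutive SNPs.
    freqs: dict {population: [frequency of allele 1 at each SNP]};
           needs at least two populations, frequencies strictly in (0, 1).
    switch_prob: assumed per-SNP probability of an ancestry switch.
    """
    pops = list(freqs)
    k = len(pops)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1))

    def emit(pop, i):
        # Log-probability of the observed allele under this population
        f = freqs[pop][i]
        return math.log(f if haplotype[i] == 1 else 1.0 - f)

    def trans(prev, cur):
        return stay if prev == cur else move

    # score[p]: best log-probability of any ancestry path ending in p
    score = {p: math.log(1.0 / k) + emit(p, 0) for p in pops}
    backpointers = []
    for i in range(1, len(haplotype)):
        new_score, pointer = {}, {}
        for cur in pops:
            prev = max(pops, key=lambda q: score[q] + trans(q, cur))
            new_score[cur] = score[prev] + trans(prev, cur) + emit(cur, i)
            pointer[cur] = prev
        score = new_score
        backpointers.append(pointer)

    # Trace the best path backwards from the final SNP
    state = max(pops, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


freqs = {"POP1": [0.9] * 5 + [0.1] * 5, "POP2": [0.1] * 5 + [0.9] * 5}
print(viterbi_local_ancestry([1] * 10, freqs))
# -> five 'POP1' labels followed by five 'POP2' labels
```

Working in log space avoids numerical underflow on long haplotypes; a single switch costs log(0.01), which is cheaper than five SNPs of mismatched emissions, so the decoder places one breakpoint in the middle.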

Protein Interaction Networks

Graemlin: a tool for aligning multiple global protein interaction networks; Graemlin also supports search for homology between a query module of proteins and a database of interaction networks.

Machine Learning

CONTRA: Conditionally trained models for sequence analysis. See CONTRAlign, a protein sequence aligner with very high accuracy, especially in twilight-zone alignments. See CONTRAfold, an RNA secondary structure prediction tool. Stay tuned for more…

RNA Structure Prediction

CONTRAfold: Prediction of RNA secondary structure with a Conditional Log-Linear model that relies on automatically trained parameters, rather than on a physics-based energy model of RNA folding.
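CONTRAfold's learned scoring model is not given in this post. For intuition only, the sketch below shows a much simpler, classic relative: Nussinov-style dynamic programming that maximizes the number of nested (pseudoknot-free) base pairs. It is not CONTRAfold's algorithm; it only illustrates the kind of recursion over nested structures that such predictors are built on. The allowed pairs (Watson-Crick plus G-U wobble), `min_loop = 3`, and the function name are assumptions chosen for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in an RNA sequence.

    Classic Nussinov-style dynamic programming; the pairing rules and
    min_loop (minimum unpaired bases inside a hairpin) are assumptions.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    # best[i][j] = max number of pairs within seq[i..j]; spans too short stay 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            options = [best[i + 1][j], best[i][j - 1]]  # leave i or j unpaired
            if (seq[i], seq[j]) in allowed:
                options.append(best[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):
                options.append(best[i][k] + best[k + 1][j])  # split into two parts
            best[i][j] = max(options)
    return best[0][n - 1] if n else 0


print(nussinov_max_pairs("GGGAAAUCC"))  # -> 3
```

A predictor such as CONTRAfold replaces "count the pairs" with a richer score for loops and stacks, but the table of sub-sequence scores filled from short spans to long ones has the same shape.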

Protein Alignment

CONTRAlign: A protein sequence aligner that users can optionally train on feature sets such as secondary structure and solvent accessibility; see the CONTRA project above.
A protein multiple sequence aligner that exhibits high accuracy on popular benchmarks.
A protein multiple aligner that automatically finds domain structures of sequences with shuffled and repeated domain architectures.
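CONTRAlign and the other aligners above differ mainly in how alignments are scored; underneath, pairwise alignment is typically solved by dynamic programming. As a hedged, simplified illustration (not the method of any tool listed here), the sketch below implements classic Needleman-Wunsch global alignment with a linear gap penalty and a traceback. The match, mismatch, and gap values and the function name are arbitrary assumptions; real protein aligners use substitution matrices and affine gap costs.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scores are illustrative,
    not a substitution matrix such as BLOSUM62.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("ACGT", "AGT"))  # -> (1, 'ACGT', 'A-GT')
```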

Motif Finding

MotifCut: a non-parametric graph-based motif finding algorithm.
MotifScan: a non-parametric method for representing motifs and scanning DNA sequences for known motifs.
CompareProspector: motif finding with Gibbs sampling & alignment.
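MotifScan is described as non-parametric, and its internals are not given here. For contrast, the sketch below shows the conventional parametric approach to scanning DNA for a known motif: a position weight matrix (PWM) built from per-position base counts, converted to log-odds against a uniform background, and slid along the sequence. The function name, the pseudocount of 1, the 0.25 background, and the score threshold are illustrative assumptions.

```python
import math


def pwm_scan(sequence, counts, background=0.25, pseudocount=1.0, threshold=0.0):
    """Scan a DNA sequence with a position weight matrix built from counts.

    counts: list of dicts, one per motif position, mapping base -> count.
    Returns (start, log2-odds score) for every window scoring >= threshold.
    """
    # Convert counts to log2-odds weights against a uniform background
    weights = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        weights.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })

    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(weights[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]  # toy "TATA" motif
print([start for start, _ in pwm_scan("GGTATAGG", counts)])  # -> [2]
```

The pseudocount keeps unseen bases from producing log(0); a non-parametric method like MotifScan avoids committing to this independent-positions model in the first place.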

Genomic Alignment

Stanford ENCODE: Multiple alignments of 1% of the human genome.
Typhon: BLAST-like sequence search to a multiple alignments database.
LAGAN: tools for genomic alignment. These include the MLAGAN multiple alignment tool, and Shuffle-LAGAN for alignment with rearrangements.

Microarray Analysis

Application of Independent Component Analysis (ICA) to microarrays.

Researchers Hope New Database Becomes Universal Cancer Genomics Tool

Swiss scientists hope that a new online database called “arrayMap” will bring cancer genomics to the desktop, laptop, and tablet computers of pathologists and researchers everywhere.

The database combines genomic information from three sources: large repositories such as the NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA); journal literature; and submissions from individual investigators. It incorporates more than 42,000 genomic copy number arrays (normal and abnormal DNA comparisons) from 195 cancer types.

“arrayMap includes a wider range of human cancer copy number samples than any single repository,” said principal investigator Michael Baudis, M.D. Ease of access, visualization, and data manipulation, he added, are top priorities in its ongoing development.

A product of the University of Zurich Institute for Molecular Life Sciences, where Baudis researches bioinformatics and oncogenomics, arrayMap illustrates the importance of copy number abnormalities (CNA)—dysfunctional DNA gains or losses that visibly lengthen or shorten certain chromosomes—in the diagnosis, staging, and treatment of various malignancies.
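To make the idea of copy number gains and losses concrete, the sketch below labels probe-level log2(tumor/reference) ratios as gain, loss, or neutral and merges consecutive probes with the same label into segments. The ±0.3 thresholds and the function name are assumptions, not arrayMap settings; published pipelines use dedicated segmentation methods (for example, circular binary segmentation) that are far more robust to noisy probes.

```python
def call_copy_number(log2_ratios, gain=0.3, loss=-0.3):
    """Naive threshold-and-merge copy number calling.

    log2_ratios: probe-level log2(tumor/reference) values in genomic order.
    gain/loss thresholds are illustrative assumptions.
    Returns a list of (start_index, end_index_inclusive, label).
    """
    def classify(value):
        if value >= gain:
            return "gain"
        if value <= loss:
            return "loss"
        return "neutral"

    segments = []
    for index, value in enumerate(log2_ratios):
        state = classify(value)
        if segments and segments[-1][2] == state:
            # Extend the current segment to cover this probe
            segments[-1] = (segments[-1][0], index, state)
        else:
            segments.append((index, index, state))
    return segments


ratios = [0.0, 0.1, 0.6, 0.7, 0.5, -0.8, -0.9, 0.0]
print(call_copy_number(ratios))
# -> [(0, 1, 'neutral'), (2, 4, 'gain'), (5, 6, 'loss'), (7, 7, 'neutral')]
```

For a diploid genome, a one-copy gain corresponds to log2(3/2) ≈ 0.58 and a one-copy loss to log2(1/2) = -1, which is why thresholds well inside those values are a common starting point for illustration.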

“I have this particular tumor type—are there any CNAs in it that can tell me anything about prognosis or treatment?” said Michael Rossi, Ph.D., director of the Winship Cancer Institute cancer genomics program at the Emory University School of Medicine in Atlanta. “Data mining tools like arrayMap are incredibly useful to help answer such questions.”

arrayMap – genomic arrays for copy number profiling in human cancer

arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. The current data reflects:

  • 42875 genomic copy number arrays
  • 634 experimental series
  • 256 array platforms
  • 197 ICD-O cancer entities
  • 480 publications (Pubmed entries)

For the majority of the samples, probe level visualization as well as customized data representation facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools, as we provide through our Progenetix project.

arrayMap is developed by the group “Theoretical Cytogenetics and Oncogenomics” at the Institute of Molecular Life Sciences of the University of Zurich.

These tools were developed for our research projects. You are welcome to try them out, but there is only sparse documentation. If more support and/or custom analysis is needed, please contact Michael Baudis regarding a collaborative project.

MIT: A New Approach Uses Compression to Speed Up Genome Analysis

Public-Domain Computing Resources

Structural Bioinformatics

The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
Wrap-and-pack detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information.
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. An updated version, Multicoil2, will soon be available.
The LearnCoil Histidine Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidine kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition-based algorithm to solve the side-chain packing problem; this algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. It predicts a coarse-grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. It predicts the effect of mutations on structures and, reciprocally, the influence of structures on mutations; a tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.


GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut – Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar – Predict microRNA Targets in Coding Sequence.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations (e.g., GO, co-expression, and co-localization). The structure-based protein interaction prediction uses the protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former’s neighbors are good matches for the latter’s neighbors.
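The intuition above is usually written as a fixed-point equation, R_ij = Σ_{u∈N(i)} Σ_{v∈N(j)} R_uv / (|N(u)|·|N(v)|), blended with a sequence-similarity prior E through a weight alpha. The sketch below iterates that update for two small networks stored as Python dicts of neighbor sets, then extracts a one-to-one matching greedily from the highest scores. It is a hedged illustration only: alpha = 0.8, the fixed iteration count, the uniform prior, and the helper names `isorank` and `greedy_match` are assumptions, and the published tool's handling of multiple networks, convergence, and match extraction may differ.

```python
def isorank(graph1, graph2, prior=None, alpha=0.8, iterations=50):
    """Pairwise node-similarity scores in the spirit of IsoRank.

    graph1, graph2: dicts mapping node -> set of neighbor nodes (undirected).
    prior: optional dict {(i, j): similarity}; uniform if omitted.
    Returns dict {(i, j): score}, normalized to sum to 1.
    """
    pairs = [(i, j) for i in graph1 for j in graph2]
    if prior is None:
        prior = {p: 1.0 for p in pairs}
    total = sum(prior.get(p, 0.0) for p in pairs)
    e = {p: prior.get(p, 0.0) / total for p in pairs}
    r = dict(e)
    for _ in range(iterations):
        new_r = {}
        for i, j in pairs:
            # i and j match well if their neighbors match well
            support = sum(
                r[(u, v)] / (len(graph1[u]) * len(graph2[v]))
                for u in graph1[i]
                for v in graph2[j]
            )
            new_r[(i, j)] = alpha * support + (1 - alpha) * e[(i, j)]
        norm = sum(new_r.values())
        r = {p: value / norm for p, value in new_r.items()}
    return r


def greedy_match(scores):
    """Take the highest-scoring pairs first, using each node at most once."""
    used1, used2, matching = set(), set(), []
    for (i, j), _ in sorted(scores.items(), key=lambda item: -item[1]):
        if i not in used1 and j not in used2:
            matching.append((i, j))
            used1.add(i)
            used2.add(j)
    return matching


path1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
scores = isorank(path1, path2)
print(greedy_match(scores))  # the two central nodes, b and y, are matched first
```

Because of the degree normalization, an end node can score higher against a hub than against another end node; the greedy step settles this by committing to the strongest pair (b, y) first.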


t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.


Compressive genomics


Nature Biotechnology 30, 627–630 (2012) doi:10.1038/nbt.2241

Published online 10 July 2012


BMIR is committed to the development of research tools as part of its goal to provide reusable, computational building blocks to facilitate the development of a vast array of systems. Some of these resources are described below.


The National Center for Biomedical Ontology (NCBO)


The National Center for Biomedical Ontology is a consortium of leading biologists, clinicians, informaticians, and ontologists who develop innovative technology and methods that allow scientists to create, disseminate, and manage biomedical information and knowledge in machine-processable form.




Protégé is a free, open-source platform that provides its community of more than 80,000 users with a suite of tools to construct domain models and knowledge-based applications with ontologies.




PharmGKB curates information that establishes knowledge about the relationships among drugs, diseases and genes, including their variations and gene products. Our mission is to catalyze pharmacogenomics research.




About Simbios

Simbios, the National NIH Center for Physics-based Simulation of Biological Structures is devoted to helping biomedical researchers understand biological form and function. It provides infrastructure, software, and training to assist users as they create novel drugs, synthetic tissues, medical devices, and surgical interventions.

Simbios scientists carry out structure-function studies across a wide range of biological scales, from molecules to organisms, and are currently focusing on challenging biological problems in RNA folding, myosin dynamics, neuromuscular biomechanics and cardiovascular dynamics.


Stanford BioMedical Informatics Research (BMIR) – Publications by Project

There are 8 publications for the project “Genomic Nosology for Medicine (GNOMED)”.

Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and ‘‘antibodyome’’ measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
Using SNOMED-CT For Translational Genomics Data Integration
J. Dudley, D. P. Chen, A. J. Butte
Ronald Cornet, Kent Spackman (eds.): Representing and sharing knowledge using SNOMED. Proceedings of the 3rd International Conference on Knowledge Rep, Phoenix (AZ), USA, CEUR Workshop Proceedings, ISSN 1613-0073, online CEUR-WS.org/Vol-410/, 91-96. Published in 2008
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
Novel Integration of Hospital Electronic Medical Records and Gene Expression Measurements to Identify Genetic Markers of Maturation
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 243-254. Published in 2008
Enabling Integrative Genomic Analysis of High-Impact Human Diseases through Text Mining
J. Dudley, A. J. Butte
Pacific Symposium on Biocomputing, Big Island, Hawaii, 13, 580-591. Published in 2008
Methodologies for Extracting Functional Pharmacogenomic Experiments from International Repository
Y. Lin, A. P. Chiang, P. Yao, R. Chen, A. J. Butte, R. S. Lin
AMIA Annual Symposium, Chicago, IL, 463-467. Published in 2007
Clinical Arrays of Laboratory Measures, or “Clinarrays”, Built from an Electronic Health Record Enable Disease Subtyping by Severity
D. P. Chen, S. C. Weber, P. S. Constantinou, T. A. Ferris, H. J. Lowe, A. J. Butte
AMIA Annual Symposium, Chicago, IL, 115-119. Published in 2007
Finding Disease-Related Genomic Experiments Within an International Repository: First Steps in Translational Bioinformatics
A. J. Butte, R. Chen
Annual Symposium of the American Medical Informatics Association, Washington, D.C., 106-10. Published in 2006

Featured Publications

The National Center for Biomedical Ontology
M. A. Musen, N. F. Noy, C. G. Chute, M. A. Storey, B. Smith, N. H. Shah
Published in 2011
Prototyping a Biomedical Ontology Recommender Service
C. Jonquet, N. H. Shah, M. A. Musen
Bio-Ontologies: Knowledge in Biology, SIG, ISMB ECCB 2009, Stockholm, Sweden. Published in 2009
Translational bioinformatics applications in genome medicine
A. J. Butte
Genome Medicine, 1, 6, 64. Published in 2009
Identifying compartment-specific non-HLA targets after renal transplantation by integrating transcriptome and ‘‘antibodyome’’ measures
L. Li, P. Wadia, M. Sarwal, N. Kambham, T. Sigdel, D. B. Miklos, R. Chen, M. Naesens, A. J. Butte
PNAS, 106, 11, 4148-4153. Published in 2009
Technology for Building Intelligent Systems: From Psychology to Engineering
M. A. Musen
Modeling Complex Systems, Bill Shuart, Will Spaulding and Jeffrey Poland, U Nebraska P, Lincoln, Nebraska, Vol 52 of the Nebraska Symposium on Motivation, 145-184. Published in 2009
Software-Engineering Challenges of Building and Deploying Reusable Problem Solvers
M. J. O’Connor, C. I. Nyulas, A. Okhmatovskaia, D. Buckeridge, S. W. Tu, M. A. Musen
Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 24, 3. Published in 2009
Data-Driven Methods to Discover Molecular Determinants of Serious Adverse Drug Events
A. P. Chiang, A. J. Butte
Clinical Pharmacology and Therapeutics, 28 January 2009, Advance online publication, doi:10.1038/clpt.2008.274. Published in 2009
Knowledge-Data Integration for Temporal Reasoning in a Clinical Trial System
M. J. O’Connor, R. D. Shankar, D. B. Parrish, A. K. Das
International Journal of Medical Informatics, 78, Suppl. 1, S77-S85. Published in 2009
GeneChaser: Identifying all biological and clinical conditions in which genes of interest are differentially expressed
R. Chen, R. Mallelwar, A. Thosar, S. Venkatasubrahmanyam, A. J. Butte
BMC Bioinformatics, 9, 1, 548. (doi:10.1186/1471-2105-9-548). Published in 2008
FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease
R. Chen, A. A. Morgan, J. Dudley, A. M. Deshpande, L. Li, K. Kodama, A. P. Chiang, A. J. Butte
Genome Biology, 9, 12, R170 (doi:10.1186/gb-2008-9-12-r170). Published in 2008
Translational Bioinformatics: Coming of Age
A. J. Butte
Journal of the American Medical Informatics Association, JAMIA, 15, 6, 709-14. Published in 2008
An Ontology-Driven Framework for Deploying JADE Agent Systems
C. I. Nyulas, M. J. O’Connor, S. W. Tu, A. Okhmatovskaia, D. Buckeridge, M. A. Musen
IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Sydney, Australia, 2, 573-577. Published in 2008
Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-Detection Algorithms
D. Buckeridge, A. Okhmatovskaia, S. W. Tu, C. I. Nyulas, M. J. O’Connor, M. A. Musen
Journal of the American Medical Informatics Association, 15, 6, 760-769. Published in 2008
Network Analysis of Intrinsic Functional Brain Connectivity in Alzheimer’s Disease
K. S. Supekar, V. Menon, M. A. Musen, D. L. Rubin, M. Greicius
PLoS Computational Biology, June 2008. Published in 2008
Medical Imaging on the Semantic Web: Annotation and Image Markup
D. L. Rubin, P. Mongkolwat, V. Kleper, K. S. Supekar, D. S. Channin
AAAI Spring Symposium Series, Semantic Scientific Knowledge Integration, Stanford. Published in 2008
The Ultimate Model Organism
A. J. Butte
Science, 320, 5874, 325-327. Published in 2008
BioPortal: A Web Portal to Biomedical Ontologies
D. L. Rubin, D. de Abreu Moreira, P. P. Kanjamala, M. A. Musen
AAAI Spring Symposium Series, Symbiotic Relationships between Semantic Web and Knowledge Engineering, Stanford University, (in press). Published in 2008
AILUN: reannotating gene expression data automatically
R. Chen, L. Li, A. J. Butte
Nature Methods, 4, 11, 879. Published in 2007
Evaluation and Integration of 49 Genome-wide Experiments and the Prediction of Previously Unknown Obesity-related Genes
S. B. English, A. J. Butte
Bioinformatics, Epub. Published in 2007
Protege: A Tool for Managing and Using Terminology in Radiology Applications
D. L. Rubin, N. F. Noy, M. A. Musen
Journal of Digital Imaging. Published in 2007
Efficiently Querying Relational Databases using OWL and SWRL
M. J. O’Connor, R. D. Shankar, S. W. Tu, C. I. Nyulas, A. K. Das, M. A. Musen
The First International Conference on Web Reasoning and Rule Systems, Innsbruck, Austria, Springer, LNCS 4524, 361-363. Published in 2007
Creation and implications of a phenome-genome network
A. J. Butte, I. S. Kohane
Nature Biotechnology, 24, 1, 55 – 62. Published in 2006


National Center for Simulation of Biological Structures (SimBioS) at Stanford University

National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) at Columbia University

National Alliance for Medical Image Computing (NA-MIC) at Brigham and Women’s Hospital, Boston, MA

Integrating Biology and the Bedside (I2B2) at Brigham and Women’s Hospital, Boston, MA

National Center for Biomedical Ontology (NCBO) at Stanford University

Integrating Data for Analysis, Anonymization, and Sharing (IDASH) at the University of California, San Diego


