
Genomics and Evolution

Author: Marcus W. Feldman, PhD


Insofar as the genetic evolution of modern humans is concerned, large-scale SNP studies of worldwide populations have provided a consistent picture of a migration out of Africa that gave rise to the human populations of the other continents. This migration probably began 60–80 kya, was probably not continuous, and could have resulted in a division during the passage through the Levant en route from east Africa. One division may have moved in a more southerly direction towards south and east Asia, possibly to Australia, and eventually, 15–30 kya, into the Americas. The other division may have “turned left” and moved towards Europe.

In this process, which we call the “serial founder” model of human expansion (refs. 1, 2), migration and demography probably had effects that constrained the subsequent action of natural selection on human genes.

  • Variation in skin pigmentation genes today provides some of the strongest signals of natural selection during this human expansion.
  • However, it is also likely that immune response genes, e.g., MHC genes, achieved their high levels of polymorphism in response to new pathogens encountered in the great expansion.
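The demographic side of the serial founder model can be illustrated with a short sketch. It assumes, purely for illustration, that each founder event samples a fixed number of diploid individuals and that no mutation or population recovery occurs between events, so expected heterozygosity shrinks by a factor of 1 - 1/(2N) at each step. The function name and parameters are hypothetical and are not taken from refs. 1 or 2.

```python
def serial_founder_heterozygosity(h0, founders, steps):
    """Expected heterozygosity after each of `steps` successive founder events.

    Assumption (illustrative only): every founder event samples `founders`
    diploid individuals, and nothing else changes heterozygosity in between.
    One such sampling event multiplies expected heterozygosity by
    (1 - 1 / (2 * founders)).
    """
    if founders < 1:
        raise ValueError("founders must be at least 1")
    trajectory = [h0]
    for _ in range(steps):
        # Each colonization step loses a fraction 1/(2N) of heterozygosity.
        trajectory.append(trajectory[-1] * (1.0 - 1.0 / (2.0 * founders)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical numbers: starting heterozygosity 0.3, 50 founders per step.
    for step, h in enumerate(serial_founder_heterozygosity(0.3, 50, 5)):
        print(step, round(h, 4))
```

Under these assumptions heterozygosity declines steadily with the number of founder events, which is the qualitative pattern the model predicts with increasing distance from Africa.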

Many of the strongest signals of natural selection indicate the importance of the innovations of farming and pastoralism. The gene sequences involved in lactose tolerance and starch metabolism, for example, differ strikingly between groups that adopted dairying or farming, respectively, and hunter-gatherers, who did not.

From the analysis of SNPs, I take home two messages.

  • The first is that although some parts of the genome show clear signals of selection, most of our DNA perceived via SNPs does not.
  • The second is that population growth and migration have been major forces in determining the patterns of variation.
  • Indeed, recent analyses of exome sequences confirm that the spectrum of rare allele frequencies is compatible only with recent and rapid population growth (ref. 3).
  • Indeed, recent analyses of the 1000 Genomes data, that is, data from whole genome sequencing of one thousand human genomes representing Africa (Yoruba), Europe (from Utah), and East Asia (China and Japan), identified only 35 non-synonymous SNPs from 33 genes as having been subject to recent adaptive selection (ref. 4).
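The argument about rare alleles can be made concrete with a small sketch. Under the standard neutral model with constant population size, the expected number of sites at which the derived allele appears i times in a sample is proportional to 1/i. A share of singletons well above that baseline is the kind of excess of rare variants that points to recent growth. The function names and the observed counts below are hypothetical and are not the analysis of ref. 3.

```python
def expected_neutral_sfs(n, theta=1.0):
    """Expected site frequency spectrum for a sample of n chromosomes.

    Entry i-1 is the expected number of sites whose derived allele is seen
    i times (i = 1 .. n-1), assuming the standard neutral model with
    constant population size, where the expectation is theta / i.
    """
    return [theta / i for i in range(1, n)]


def singleton_fraction(sfs):
    """Fraction of segregating sites that are singletons (first SFS entry)."""
    total = sum(sfs)
    if total == 0:
        raise ValueError("spectrum contains no segregating sites")
    return sfs[0] / total


if __name__ == "__main__":
    baseline = singleton_fraction(expected_neutral_sfs(4))
    observed = singleton_fraction([10, 2, 1])  # made-up counts for illustration
    print("constant-size expectation:", baseline)
    print("hypothetical observed:", observed)
```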

The next phase of genomic analysis of humans, complete exome sequencing of large cohorts, or whole genome sequencing of samples from many representative populations, will focus more on two themes.

  • The first will be the role of rare alleles in human phenotypes, especially diseases. The previous phase, GWAS (genome-wide association studies), has been disappointing in revealing genetic “causes” of complex traits.
  • However, my view is that the second theme, the molecular genetics of gene regulation, and interaction of this regulation with the environment, is likely to have bigger payoffs, not only for determination of phenotypes, but also in showing where in the genome the strongest signals of selection lie. As more methylation profiles, small RNA patterns of interference, and other gene-regulatory analyses of whole genomes are completed, both the medical and evolutionary significance of DNA variation will become clearer.

Pemberton, T. J., D. Absher, M. W. Feldman, R. M. Myers, N. A. Rosenberg, and J. Z. Li. 2012. Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91: 275–292.

Genome-wide patterns of homozygosity runs and their variation across individuals provide a valuable and often untapped resource for studying human genetic diversity and evolutionary history. Using genotype data at 577,489 autosomal SNPs, we employed a likelihood-based approach to identify runs of homozygosity (ROH) in 1,839 individuals representing 64 worldwide populations, classifying them by length into three classes—short, intermediate, and long—with a model-based clustering algorithm. For each class, the number and total length of ROH per individual show considerable variation across individuals and populations. The total lengths of short and intermediate ROH per individual increase with the distance of a population from East Africa, in agreement with similar patterns previously observed for locus-wise homozygosity and linkage disequilibrium. By contrast, total lengths of long ROH show large inter-individual variations that probably reflect recent inbreeding patterns, with higher values occurring more often in populations with known high frequencies of consanguineous unions. Across the genome, distributions of ROH are not uniform, and they have distinctive continental patterns. ROH frequencies across the genome are correlated with local genomic variables such as recombination rate, as well as with signals of recent positive selection. In addition, long ROH are more frequent in genomic regions harboring genes associated with autosomal-dominant diseases than in regions not implicated in Mendelian diseases. These results provide insight into the way in which homozygosity patterns are produced, and they generate baseline homozygosity patterns that can be used to aid homozygosity mapping of genes associated with recessive diseases.
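To give a feel for what a run of homozygosity is, here is a deliberately naive sketch. It is not the likelihood-based method of the paper: it simply scans one individual's genotypes, coded as the number of copies of one allele (0, 1, or 2), and reports stretches of consecutive homozygous calls. The coding scheme, the function name, and the `min_snps` threshold are assumptions made for illustration.

```python
def find_homozygous_runs(genotypes, min_snps):
    """Return (start, end) index pairs of runs of homozygous genotypes.

    `genotypes` holds one value per SNP in chromosomal order: 0 or 2 for
    the two homozygous states, 1 for heterozygous. Any value other than
    0 or 2 (including a missing-data code) ends the current run.
    Only runs spanning at least `min_snps` SNPs are reported; the end
    index is inclusive.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i  # a new candidate run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    example = [0, 0, 2, 1, 0, 2, 2, 2, 0]  # made-up genotypes
    print(find_homozygous_runs(example, 3))
```

A realistic analysis would also account for genotyping error, allele frequencies, and physical or genetic distance between SNPs, which is what motivates a likelihood-based approach.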

Pepperell, C. S., J. M. Granka, D. C. Alexander, M. A. Behr, L. Chui, J. Gordon, J. L. Guthrie, F. B. Jamieson, D. Langlois-Klassen, R. Long, D. Nguyen, W. Wobeser, and M. W. Feldman. 2011. Dispersal of Mycobacterium tuberculosis via the Canadian fur trade. Proc. Natl. Acad. Sci. USA 108: 6526–6531.

Patterns of gene flow can have marked effects on the evolution of populations. To better understand the migration dynamics of Mycobacterium tuberculosis, we studied genetic data from European M. tuberculosis lineages currently circulating in Aboriginal and French Canadian communities. A single M. tuberculosis lineage, characterized by the DS6Quebec genomic deletion, is at highest frequency among Aboriginal populations in Ontario, Saskatchewan, and Alberta; this bacterial lineage is also dominant among tuberculosis (TB) cases in French Canadians resident in Quebec. Substantial contact between these human populations is limited to a specific historical era (1710–1870), during which individuals from these populations met to barter furs. Statistical analyses of extant M. tuberculosis minisatellite data are consistent with Quebec as a source population for M. tuberculosis gene flow into Aboriginal populations during the fur trade era. Historical and genetic analyses suggest that tiny M. tuberculosis populations persisted for ∼100 y among indigenous populations and subsequently expanded in the late 19th century after environmental changes favoring the pathogen. Our study suggests that spread of TB can occur by two asynchronous processes: (i) dispersal of M. tuberculosis by minimal numbers of human migrants, during which small pathogen populations are sustained by ongoing migration and slow disease dynamics, and (ii) expansion of the M. tuberculosis population facilitated by shifts in host ecology. If generalizable, these migration dynamics can help explain the low DNA sequence diversity observed among isolates of M. tuberculosis and the difficulties in global elimination of tuberculosis, as small, widely dispersed pathogen populations are difficult both to detect and to eradicate.

Henn, B. M., C. R. Gignoux, M. Jobin, J. M. Granka, J. M. Macpherson, J. M. Kidd, L. Rodríguez-Botigué, S. Ramachandran, L. Hon, A. Brisbin, A. A. Lin, P. A. Underhill, D. Comas, K. K. Kidd, P. J. Norman, P. Parham, C. D. Bustamante, J. L. Mountain, and M. W. Feldman. 2011. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. USA 108: 5154–5162.

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the !Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.
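Population differentiation as measured by FST can be sketched for a single biallelic SNP. The version below uses the textbook definition FST = (HT - HS) / HT with two equally weighted populations and known allele frequencies. The abstract does not say which estimator the study used, so this is an assumed, simplified form, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST at one biallelic SNP from allele frequencies in two populations.

    Assumes equal population weights and treats p1 and p2 as known
    frequencies (no sample-size correction).
    HT is the expected heterozygosity of the pooled population,
    HS the mean expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no variation to partition
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_two_populations(0.5, 0.5))  # identical frequencies
    print(fst_two_populations(0.2, 0.8))  # strongly diverged frequencies
```

Values near 0 indicate similar allele frequencies in the two populations, and values near 1 indicate nearly fixed differences.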

Casto, A. M., and M. W. Feldman. 2011. Genome-wide association study SNPs in the human genome diversity project populations: does selection affect unlinked SNPs with shared trait associations? PLoS Genet. 7(1): e1001266.

Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS.

Casto, A. M., J. Z. Li, D. Absher, R. Myers, S. Ramachandran, and M. W. Feldman. 2010. Characterization of X-linked SNP genotypic variation in globally distributed human populations. Genome Biol. 11:R10.

Background: The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples.
Results: We found that X chromosomes tend to be more differentiated between human populations than autosomes, with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X-linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps.
Conclusions: The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection.
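The statement that X-chromosome transmission reduces its population size and makes it sensitive to female demography can be illustrated with the standard formulas for effective population size with unequal numbers of breeding females and males. The sketch below assumes an idealized population with discrete generations. It is not taken from the paper, and the function name is hypothetical.

```python
def x_to_autosome_ne_ratio(n_females, n_males):
    """Ratio of X-linked to autosomal effective population size.

    Uses the standard results for an idealized population:
      autosomes: Ne = 4 * Nf * Nm / (Nf + Nm)
      X-linked:  Ne = 9 * Nf * Nm / (4 * Nm + 2 * Nf)
    """
    if n_females <= 0 or n_males <= 0:
        raise ValueError("both sexes must have a positive number of breeders")
    ne_autosome = 4.0 * n_females * n_males / (n_females + n_males)
    ne_x = 9.0 * n_females * n_males / (4.0 * n_males + 2.0 * n_females)
    return ne_x / ne_autosome


if __name__ == "__main__":
    print(x_to_autosome_ne_ratio(1000, 1000))  # equal numbers of each sex
    print(x_to_autosome_ne_ratio(3000, 1000))  # excess of breeding females
```

With equal numbers of breeding females and males the ratio is three quarters, and it rises as females outnumber males, which is why X-to-autosome comparisons carry information about sex-specific demography.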


1.  Cavalli-Sforza, L.L., and M.W. Feldman. 2003. The application of molecular genetic approaches to the study of human evolution. Nat. Genet. Supp. 33: 266–275.

2.  Henn, B. M., L. L. Cavalli-Sforza, and M. W. Feldman. 2012. The great human expansion. Proc. Natl. Acad. Sci. USA 109: 17758–17764.

3.  Keinan, A., and A. G. Clark. 2012. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336: 740–743.

4.  Grossman, S. R., K. G. Andersen, I. Shlyakhter, S. Tabrizi, S. Winnicki, A. Yen, D. J. Park, D. Griesemer, E. K. Karlsson, S. H. Wong, M. Cabili, R. A. Adegbola, R. N. K. Bamezai, A. V. S. Hill, F. O. Vannberg, J. L. Rinn, 1000 Genomes Project, E. S. Lander, S. F. Schaffner, and P. C. Sabeti. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152: 703–713.
