
Tandem Repeats, with Application to Human Population-Divergence Time

Larry H. Bernstein, MD, FCAP, Curator




The Effective Mutation Rate at Y Chromosome Short Tandem Repeats, with Application to Human Population-Divergence Time
Lev A. Zhivotovsky, Peter A. Underhill, Cengiz Cinnioğlu, Manfred Kayser, Bharti Morar, Toomas Kivisild, Rosaria Scozzari, et al.
We estimate an effective mutation rate at an average Y chromosome short-tandem repeat locus as 6.9 × 10⁻⁴ per 25 years, with a standard deviation across loci of 5.7 × 10⁻⁴, using data on microsatellite variation within Y chromosome haplogroups defined by unique-event polymorphisms in populations with documented short-term histories, as well as comparative data on worldwide populations at both the Y chromosome and various autosomal loci. This value is used to estimate the times of the African Bantu expansion, the divergence of Polynesian populations (the Maoris, Cook Islanders, and Samoans), and the origin of Gypsy populations from Bulgaria.
Microsatellites, or STR polymorphisms, are abundant in the human genome and can be easily genotyped and scored; they have thus become a useful tool for the elucidation of human population history and for forensic purposes. Knowledge of the mutation rate at STR loci is important, both for calibration of the molecular clock in evolutionary studies and for forensic probabilistic calculations. Increasing attention has recently been paid to microsatellite variation within Y chromosome haplogroups defined by binary polymorphisms, such as SNPs or, as a general term, “unique-event polymorphisms” (UEPs) (Underhill et al. 1996; Zerjal et al. 1997; de Knijff 2000; Kayser et al. 2000a), many of which are specific to populations related through their recent or distant history (Underhill et al. 2000; Hammer et al. 2001; Y Chromosome Consortium 2002). Although the Y chromosome locus is the ultimate SNP-STR system, similar linked SNP and STR haplotypes are also available in autosomes (Mountain et al. 2002).
A per-generation mutation rate has been estimated for Y chromosome microsatellites by direct count in deep-rooted pedigrees (Heyer et al. 1997). A similar average mutation rate per locus per generation was estimated by studying Y chromosome STRs (Y-STRs) in father/son pairs of confirmed paternity, although locus-specific values varied from 0 to 8 (Kayser et al. 2000b). Y-STR analysis in sperm yielded an average rate of repeat gains for two Y-STRs (Holtkemper et al. 2001); however, mutations that included repeat losses could not be considered, owing to limitations of the methodology used. No germ-line mutation was observed in Y-STRs from cell line DNA (Bianchi et al. 1998), but this result does not differ in a statistically significant way from the mutation rate estimates mentioned above.
By counting the number of mutations in the branches of a haplotype network from samples of Native American populations, Forster et al. (2000) found a striking difference between their “evolutionary” estimate (expressed per 20 years) and the “pedigree” estimate described above. It is unclear which rate should be used; for evolutionary studies, we need to know those mutations that are involved in differences between lineages or populations. An inappropriate choice of the mutation rate value may produce a 10-fold deviation from the true age of past population events. The discordance between the two kinds of estimate needs to be addressed.
The estimate by Forster et al. (2000) refers to a median network constructed from the Y chromosome haplotypes found in the combined data from Native American populations. However, haplotype history might not represent the population history. In addition, such a network assumes single-repeat-unit mutational changes. Multistep Y-STR mutations, which have been observed recently (Forster et al. 1998; Kayser et al. 2000b; Nebel et al. 2001), can contribute significantly to the effective mutation rate w (with w the product of the mutation rate and the variance of mutational changes in repeat scores), which determines the rate of microsatellite evolution (Slatkin 1995; Zhivotovsky and Feldman 1995). Furthermore, Forster et al. (2000) used for calibration an estimate of the time of population expansion in North America of 20,000 years before the present (BP), an estimate upon which there is no general agreement.
In the present study, we estimate the effective mutation rate, using data on microsatellite variation within Y chromosome haplogroups defined by SNPs in populations with documented short-term histories, as well as comparative data on worldwide populations at both autosomal and Y chromosome loci. Then we apply our finding to estimate the time of expansion of Bantu-speaking populations in sub-Saharan Africa, the time of differentiation of some Polynesian populations, and the time of origin of Bulgarian Gypsy populations.
Estimates of Effective Mutation Rates on the Basis of Genetic Distances
Comparison of the Maoris and Cook Islanders gave an average value (over the seven loci; see table 1; fig. 1) for (δμ)² of 0.00998, which suggests an average effective mutation rate of 0.000312 per 25 years (0.00998 × 25/800). Pairwise comparisons of the Bulgarian Gypsy populations (without the Darakchii sample, in which only one M82 individual was found) gave a (δμ)² of 0.01272 (averaged across population pairs and loci), or 0.000454 for the average effective mutation rate. However, these are most probably underestimates, because the (δμ)² distance assumes constant size for each SNP lineage over time, and it also assumes the same within-lineage variation in an ancestral population prior to its split as at the present generation. It is more likely that each of those populations was founded by a small number of haplotypes and, thus, had lower STR variation prior to divergence; this can lead to an underestimate of the rate of divergence (Zhivotovsky 2001). Therefore, we apply the second estimator, the average squared difference (ASD), to the Maori and the Gypsy populations.
The haplotype network shown in table 1 and figure 1 suggests two founder haplotypes for the seven loci in the Maori population, PA and PD (both present in the Cook Islanders), because these haplotypes differ at two loci with no connection by single mutations. By using the ASD estimator for each of the haplotype networks in table 1 and figure 1 and then averaging them with weights proportional to sample sizes, we obtain a mean effective mutation rate of 0.000705, with SD = 0.00078 across loci.
Each of the Gypsy populations contains haplotype A at high frequency (table 3), which suggests that it is the ancestral type. The Lom population is the only one that contains a different haplotype, B, at the highest frequency; therefore, it is likely that both A and B were founder haplotypes in this population. The Musicians are extremely heterogeneous compared with the other populations: of 19 Y chromosomes, 6 carry haplotypes that differ from haplotype A by two alleles. No other Gypsy population displays chromosomes that diverge to this extent from the ancestral haplotype (except for the Lom, in which only 1 of 26 chromosomes differed by two alleles from haplotype A). Moreover, the distribution of chromosomes in the Musicians (with 0, 1, and 2 differences relative to haplotype A) has a mean of 0.237 and a variance of 0.417, thus deviating significantly from a Poisson distribution (the test statistic, which depends on sample size, is ∼2.28 and follows a t distribution, with a one-tailed P value of .018). This is not the case for the other populations, which suggests that the genetic structure of the Musicians differs from that of other Gypsy groups in Bulgaria. The population of the Musicians could have been founded with multiple haplotypes and/or could have been subject to admixture; therefore, we do not include it in the analysis. After removing the Musicians, we compute w for each population and then weight its values with the sample sizes; this gives an estimate of 0.000725, with 0.000187 SD across loci. For the two sets of comparisons, we use the estimates 0.000705 and 0.000725 in the subsequent analysis.
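The two distance-based figures quoted at the start of this section can be reproduced with a short Python sketch. It assumes, as the quoted numbers imply, that the rate per 25 years is the genetic distance divided by the number of 25-year generations since the split; the function name and the use of 800 and 700 years BP as split times are illustrative choices, not the authors' code.

```python
# Minimal sketch: effective mutation rate per generation from a genetic distance.
# Assumption: rate = distance / (years since split / generation length),
# which matches the figures quoted in the text (0.000312 and 0.000454).

GENERATION_YEARS = 25  # generation length used throughout the text


def rate_per_generation(distance, years_since_split, generation_years=GENERATION_YEARS):
    """Return the distance accumulated per generation (hypothetical helper)."""
    generations = years_since_split / generation_years
    return distance / generations


# Maoris vs. Cook Islanders: distance 0.00998, split assumed at 800 years BP
maori_rate = rate_per_generation(0.00998, 800)

# Bulgarian Gypsy populations: distance 0.01272, arrival assumed at 700 years BP
gypsy_rate = rate_per_generation(0.01272, 700)

print(f"Maori/Cook Islanders: {maori_rate:.6f} per 25 years")
print(f"Bulgarian Gypsies:    {gypsy_rate:.6f} per 25 years")
```

Both values are the ones the text calls probable underestimates, which is why the ASD estimator is applied afterwards.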
Estimates of Effective Mutation Rates Based on Comparative Variation
The variances in the number of repeats were computed for each Y chromosome locus in each of 52 worldwide populations. The variances were then converted into estimates of effective mutation rates, as described in the “Material and Methods” section. Averaging over populations gives an estimate of 0.000638 ± 0.000109 across loci (SD = 0.00029).
Overall Estimate
On the basis of the arithmetic mean of the above three figures (i.e., 0.000705, 0.000725, and 0.000638), we estimate the effective mutation rate at the average Y chromosome locus as 0.00069 per 25 years.
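As a quick check of the averaging step, the following Python sketch takes the arithmetic mean of the three estimates listed above. The variable names are illustrative only.

```python
# Minimal sketch: arithmetic mean of the three effective mutation rate estimates.
from statistics import mean

estimates = {
    "Maori (ASD)": 0.000705,
    "Bulgarian Gypsies (ASD)": 0.000725,
    "worldwide comparative variation": 0.000638,
}

overall = mean(estimates.values())

# Rounded to two significant figures, as the rate is quoted elsewhere in the text
print(f"overall effective mutation rate: {overall:.5f} per 25 years")
```

The unrounded mean is about 0.000689, which rounds to the 0.00069 used in the between-locus variation calculation later in the text.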
Comparison with Autosomal Loci 

Our estimate of the average effective mutation rate at Y chromosome STR loci (0.00069 per 25 years) is close to those at autosomal microsatellites with tri- and tetranucleotide repeats (Zhivotovsky et al. 2000, 2003), which probably reflects the same slippage machinery that underlies STR mutations. It should be kept in mind that our estimate of effective mutation rate was based on STRs with three- and four-nucleotide motifs; inclusion of loci with dinucleotide repeats may increase this value, because they generally have a higher (effective) mutation rate (Chakraborty et al. 1997; Zhivotovsky et al. 2000).


Dependence of the Estimate on Nongenetic Information
Estimating mutation rates for the SNP/STR data from populations with available archaeological/historical records relies heavily on those records. For example, in the present study, we used 800 years BP as the time of arrival of the Maoris in New Zealand. This may be a lower bound for the time of colonization, and 800–1,000 years BP seems to be an appropriate range for that event (Irwin 1992; Sutton 1994; Diamond and Bellwood 2003); the latter date would give a mutation rate of 0.00056. Therefore, the above mutation rate, 0.000705, which was inferred from the Maori data, might be an overestimate. Other proposed dates include 650–700 years BP (McFadgen et al. 1994) and 1,200 years BP (Bellwood 1989), leading to mutation rate estimates of ∼0.00087–0.00081 and ∼0.00047, respectively.
The same argument can be applied to the Gypsy data. Historical records suggest that the Gypsies arrived in Bulgaria 700 years BP. This may be an underestimate, since small groups are not historically “visible” until they become numerous or involved in an important event. If an actual divergence occurred 800 years BP, this would give an effective mutation rate of 0.000634 instead of 0.000725.
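The sensitivity to the assumed date can be illustrated with a small Python sketch. It assumes that the estimated rate scales inversely with the assumed divergence time, which is consistent with every alternative figure quoted above; the helper name is hypothetical.

```python
# Minimal sketch: how the estimated rate changes with the assumed divergence date.
# Assumption: the rate is inversely proportional to the assumed time,
# so new_rate = rate * assumed_years / alternative_years.


def rescale_rate(rate, assumed_years, alternative_years):
    """Rescale a rate estimated under one date to another date (hypothetical helper)."""
    return rate * assumed_years / alternative_years


# Maori data: 0.000705 was obtained with 800 years BP as the arrival time
maori_alternatives = {
    years: rescale_rate(0.000705, 800, years) for years in (650, 700, 1000, 1200)
}
for years, rate in maori_alternatives.items():
    print(f"Maori arrival {years} years BP: {rate:.5f}")

# Gypsy data: 0.000725 was obtained with 700 years BP; try 800 years BP instead
gypsy_800 = rescale_rate(0.000725, 700, 800)
print(f"Gypsy divergence 800 years BP: {gypsy_800:.6f}")
```

Under this assumption, an earlier colonization date lowers the inferred rate and a later one raises it, which is the direction of every adjustment discussed in this section.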
All this demonstrates that, despite variation in estimates of the average Y-STR effective mutation rate (variation due to uncertainties in archaeological/historical data and in male/female population dynamics), these estimates are close to the overall point estimate (0.00069 per 25 years) and lie within the interval defined by the SE, which is attributable to differences between loci. Doubling the SE gives heuristic confidence limits for w. Potential errors in estimates of w attributable to uncertainties in archaeological/historical records (see the first two paragraphs of the present section) lie within these limits. Therefore, variation among loci in effective mutation rate may be a major source of deviation of an average estimate of w from the true value for Y chromosome STRs.
Between-Locus Variation in the Effective Mutation Rate
Mutation rates are reported to vary substantially among autosomal microsatellites (Di Rienzo et al. 1998; Zhivotovsky et al. 2001); the same is expected for Y chromosome STRs (Forster et al. 1998; Kayser et al. 2000b; Nebel et al. 2001). On the basis of our data, we calculate that the coefficient of between-locus variation in effective mutation rate is 0.00057/0.00069 ≈ 0.83; a similar level of between-locus variation in effective mutation rate has been observed for autosomal loci (Zhivotovsky et al. 2001). Although sampling errors contribute to this variation, the differences in w between loci are nevertheless important. Indeed, mutation rates can vary from locus to locus, depending on their structure. For example, DYS389 is a complex locus consisting of four tetranucleotide-repeat subloci (Cooper et al. 1996; Rolf et al. 1998) that yield two distinctive fragments when genotyped using conventional protocols, since the forward primer anneals twice. One fragment contains all four repeat motifs (A, B, C, and D), and the other fragment, which contains just two (C and D), is often denoted by “I”. The shorter CD fragment is subtracted from the larger to yield the AB (“II”) allele. It is important to note that the C motif is almost always only three repeats and thus is monomorphic, whereas the longer combined AB motifs are both polymorphic, thus making the AB region more mutable than the CD region. The sublocus DYS389AB can be treated as a separate microsatellite locus that has an inherently higher mutation rate than the CD sublocus. This genomic complexity and the consequent differential mutation properties of the subloci are expected to increase the overall mutation rate for DYS389. Removing DYS389 from the analysis gives w = 0.00061; counting that locus twice produces the same value. However, it is difficult to conclude that DYS389AB or other such loci will always behave (in UEP lineages or entire populations) as loci with high mutation rates, and more data will be needed to distinguish loci with different effective mutation rates. Another source of apparent between-locus variation may be different mutation rates for alleles with different numbers of repeats (Brinkmann et al. 1998).
This variation actually occurs within a locus and can greatly confound between- and within-locus variation. Probably, our estimate of SD, 0.00057, includes both kinds of variability and therefore encompasses an entire range of “between-allele” variation.
Variation in mutation rates should be kept in mind, because it might be a major source of uncertainty when a small number of loci are used. The large SE of the average mutation rate obtained here and the large SE of divergence time estimates (see below) reflect such variation. (Note that highly variable Y chromosome haplotypes cause very wide CIs for coalescent times based on microsatellites [Pritchard et al. 1999].) Therefore, dating historical events on the basis of a small number of Y-STR loci might disagree with historical/archaeological records, although the latter might also have large “SEs.” Theoretically, hundreds of loci may be needed for precise dating of ancient demographic events (Zhivotovsky and Feldman 1995; Goldstein et al. 1996; Jorde et al. 1997), and different subsets of loci may give different estimates because of different mutation rates (Zhivotovsky et al. 2003). Analysis of population divergence within UEP lineages should require fewer microsatellite loci for precise dating, because STR variation within a UEP lineage must be smaller than that in the entire UEP-heterogeneous population. The sample of Y chromosome STR loci (no more than 10 were used here) still seems too small, and a larger number of loci need to be analyzed (e.g., Seielstad et al. 2003); 150 new Y-STRs will be available in the near future (M. Kayser, M. A. Jobling, A. Sajantila, C. Tyler-Smith, unpublished data).
Furthermore, we cannot exclude the possibility that mutation rates at the same STR locus vary among haplogroups because of differences in allele repeat scores, repetitive structures, or other factors (Nebel et al. 2001); mutation rates might also be population specific, because of variation in genes that encode proteins involved in DNA replication and repair mechanisms or proteins that cause associated selection, if these exist (see Jobling and Tyler-Smith 2000). A large sample of loci might decrease these possible effects, but, in the absence of hard information, it seems reasonable to use the same overall average mutation rate for all instances. The estimates of average effective mutation rate and the SD can be used to obtain a two-parameter prior distribution for Y chromosome effective mutation rates for use in coalescent models.
The origin and time of arrival of the Samoans are in question. The tree in figure 3 shows that the Samoans split much earlier than the ancestral population that gave rise to the Maoris and Cook Islanders. Recall that the Maoris arrived in New Zealand 800–1,000 years BP; together with the time estimate between the two separation events in figure 3, this implies ∼3,500 years BP for divergence of the Samoans from a common ancestral population. This can be compared with the time of East Polynesian settlement, estimated to have occurred by 500–1000 BC (Irwin 1992, p. 81), and to the estimate of the early peopling of Polynesia, 3,000–4,000 years BP (Underhill et al. 2001).
Divergence between the Gypsy Populations

Computation of TD values, averaged over all possible pairs of the 10 Gypsy populations (omitting both the Darakchii, with one M82 individual only, and the Musicians), gives us an estimate of the time of founding of an ancestral population of related males sharing the same Y haplotype that gave rise to the contemporary Bulgarian Gypsy populations. This estimate, 1,500 years BP, is an upper bound for the divergence time and is compatible with the formation of the proto-Gypsies in India, predating their entry into the Byzantine Empire 900–1,000 years BP (Fraser 1992). If diversity was already substantial in the founder male population, the estimate of divergence time would be smaller.
The genetic composition of the Musicians differs greatly from that of the other studied Bulgarian Gypsy populations, a fact that points to possible differences in their evolutionary history. There are two possible explanations for this: the Musicians share the same origin but were greatly admixed with populations from South Asia that carried the M82 mutation, or they descended from an ancestral population different from that of the other Bulgarian Gypsy populations studied. The origins of the proto-Gypsies, as well as the time and number of migrations out of India, are still disputed among cultural anthropologists and linguists (Fraser 1992; Marushiakova and Popov 1997; Hancock 2000). Our previous study (Gresham et al. 2001) suggested a common origin from a small group of ancestors. One should note, however, that the Musicians were not included in that study and that they are the sole representatives of a particular Balkan dialect of the Romanes language. In addition to the unusual distribution of M82 haplotypes, they display a generally higher diversity of Y chromosome lineages, including other uncommon types, that is unlikely to result from European admixture.
If we follow the “different origins” scenario, the  TD estimator gives an upper bound of 2,600 years BP for the separation of the Musicians from a population ancestral to the other studied populations of Bulgarian Gypsies. The difference of 1,100 years between the two splits (fig. 4) allows not only for heterogeneous origins but also for the possibility of different proto-Gypsy migrations from the Indian subcontinent.
