te d *, P.O Keywords: Phrynosomatinae; Uma; Callisaurus; Cophosaurus; Holbrookia; Mixed models; Mixture models; Process heterogeneity; Phylogeny; Mitocho- era: Uma (6 species), Callisaurus (1 species), Cophosaurus (1 ships, Wilgenbusch and de Queiroz (2000) evaluated four previously proposed phylogenetic hypotheses (Fig. 1) using fragments of two mitochondrial genes, cytochrome b (650 bp) and 12S rRNA (350 bp). Although these data consistently favored one of the four topologies (Topology I in Fig. 1), support values for the two critical nodes distin- * Corresponding author. Present address: Department of Biology, 177 Clarkson Science Center, MRC 5805, Clarkson University, Potsdam, NY 13699-5805, USA. Fax: +1 315 268 7118. E-mail address: jschulte@clarkson.edu (J.A. Schulte II). Available online at www.sciencedirect.com Molecular Phylogenetics and Evolutndrial DNA 1. Introduction The phrynosomatine sand lizards are a well-studied clade of squamate reptiles for which several alternative phylogenetic hypotheses have been proposed. The group is composed of 12 currently recognized species that form four mutually exclusive clades traditionally ranked as gen- species), and Holbrookia (4 species) (Crother, 2000, 2003; Wilgenbusch and de Queiroz, 2000). Although these four clades are well-supported and largely uncontested, the rela- tionships among them have been the subject of consider- able debate (reviewed by Wilgenbusch and de Queiroz, 2000). In an earlier study of sand lizard phylogenetic relation-Received 8 August 2007; revised 3 December 2007; accepted 11 January 2008 Available online 24 January 2008 Abstract Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard rela- tionships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1?COI and cyt b-tRNAThr of the mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods, likelihood methods with assumed partitions, Bayesian methods with assumed partitions, and Bayesian mixture models. The topology (Uma, (Callisaurus, (Cophosaurus, Holbrookia))) and thus monophyly of the ??earless? taxa, Cophosaurus and Holbrookia, is supported by all analyses. Previously proposed topologies in which Uma and Callisaurus are sister taxa and those in which Holbrookia is the sister group to all other sand lizard taxa are rejected using both parsimony and likelihood-based signi?cance tests with the combined, unpari- tioned data set. Bayesian hypothesis tests also reject those topologies using six assumed partitioning strategies, and the two partitioning strategies presumably associated with the most powerful tests also reject a third previously proposed topology, in which Callisaurus and Cophosaurus are sister taxa. For both maximum likelihood and Bayesian methods with assumed partitions, those partitions de?ned by codon position and tRNA stem and nonstems explained the data better than other strategies examined. Bayes factor estimates comparing results of assumed partitions versus mixture models suggest that mixture models perform better than assumed partitions when the latter were not based on functional characteristics of the data, such as codon position and tRNA stem and nonstems. However, assumed par- titions performed better than mixture models when functional di?erences were incorporated. We reiterate the importance of accounting for heterogeneous evolutionary processes in the analysis of complex data sets and emphasize the importance of implementing mixed model likelihood methods.  2008 Elsevier Inc. All rights reserved.Phylogenetic relationships and he among phrynosomatine sand lizar James A. Schulte II Division of Amphibians and Reptiles, Smithsonian Institution,1055-7903/$ - see front matter  2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2008.01.010rogeneous evolutionary processes s (Squamata, Iguanidae) revisited Kevin de Queiroz . Box 37012, MRC 162, Washington, DC 20013-7012, USA www.elsevier.com/locate/ympev ion 47 (2008) 700?716 hyloguishing the favored topology from alternative hypotheses were unimpressive, and the three alternative topologies could not be rejected under the conventional P 6 0.05 cri- terion of signi?cance. Assuming patterns of nucleotide sequence variation similar to those observed in the cyto- chrome b (cyt b) gene, Wilgenbusch and de Queiroz (2000) estimated that from 1037 to 2586 base pairs of sequence data would be necessary to reject the various alternative topologies. Therefore, we undertook a project to sequence two fragments of mitochondrial DNA encod- ing the nearly complete cyt b gene (1130 bp; Kumazawa et al., 1998) and a genomic segment extending from the gene encoding subunit one of nicotinamide adenine dinu- cleotide dehydrogenase (ND1) through the gene encoding subunit 1 of cytochrome c oxidase (COI) (1700 bp; Macey et al., 1997) in an attempt to distinguish between the alternative hypotheses with a higher degree of con?- dence. In the process, we also investigated the e?ects of dif- ferent methods for accommodating evolutionary Fig. 1. Alternative hypotheses of higher-level relationships among the phrynosomatine sand lizards (after Wilgenbusch and de Queiroz, 2000). Topology I (Savage, 1958; Cox and Tanner, 1977; Etheridge and de Queiroz, 1988; de Queiroz, 1989, 1992; Reeder and Wiens, 1996; ChangChien, 1996); topology II (Mittleman, 1942; Smith, 1946); topology III (Norris, 1958; Earle, 1961, 1962); topology IV (Axtell, 1958; Clarke, 1965; Adest, 1978). J.A. Schulte II, K. de Queiroz /Molecular Pheterogeneity among subsets of the data. Several recent studies (Brandley et al., 2005; Castoe and Parkinson, 2006; Nylander et al., 2004; Strugnell et al., 2005) have shown that partitioning data and using parti- tion speci?c parameter estimates are associated with higher probabilities of the data, and this result has been shown to hold for previously collected nucleotide sequence data on sand lizards (Wilgenbusch and de Queiroz, 2000). Alterna- tively, heterogeneity can be accommodated without parti- tioning the data by applying several models, each weighted by its probability, to every site (Pagel and Meade, 2004; Pagel and Meade, 2005). Here, we use the term mixed model (=assumed partitions = a priori partitions) for analy- ses that assume prede?ned data partitions (models) prior to conducting the analysis, with a di?erent model (including parameter estimates) applied to each partition. In contrast, mixture models accommodate heterogeneity across sites by applying several models to each site, where the likelihood of a given site is the sum of the likelihoods under the di?er- ent models, each weighted by its probability as estimatedfrom the data (Pagel and Meade, 2004). In this paper, we analyzed our data using both di?erent general approaches (mixed models versus mixture models) and di?erent strate- gies within the general approaches (di?erent assumed par- titions; di?erent numbers of Q matrices), which permitted us to compare the ability of these di?erent approaches and strategies to accommodate heterogeneity in our data. 2. Materials and methods 2.1. Taxon and character sampling Frozen liver or muscle tissue of 26 phrynosomatine sand lizards, representing all 12 currently recognized species, was used in DNA extractions (Appendix A). These same samples were used by Wilgenbusch and de Queiroz (2000) and cover a wide range of variation within and among sand lizard species with regard to geographic distribution, mor- phology, and taxonomy. We also used the same outgroup specimens sequenced by Wilgenbusch and de Queiroz (2000): Phrynosoma platyrhinos, P. hernandesi, Sceloporus jarrovii, Urosaurus ornatus, and Uta stansburiana. Wil- genbusch and de Queiroz (2000) reported sequencing CAS 174377 representing the taxon Holbrookia maculata thermophila; however, our sequences for the sample from CAS 174377 revealed that it is H. m. ?avilenta. The correct voucher specimen for the data reported for H. m. thermo- phila appears to be CAS 174400, which also was sequenced by Wilgenbusch and de Queiroz (2000) and has an identical cyt b sequence to that reported by those authors and was therefore used as the representative of H. m. thermophila in the present study. In addition, the museum number for the specimen of H. m. ?avilenta sequenced by Wilgenbusch and de Queiroz (2000) was published as MVZ 21814; the correct number is MVZ 214814. See Appendix A for museum numbers, localities of voucher specimens from which DNA was extracted, and GenBank accession num- bers for DNA sequences. Genomic DNA was extracted using Qiagen QIAamp tis- sue kits. Ampli?cation of genomic DNA was conducted in a DNA Engine (PTC-200TM) Peltier Thermal Cycler (MJ Research) using denaturation at 94 C for 35 s, annealing at 45?55 C for 35 s, and extension at 72 C for 150 s for 30?35 cycles. Negative controls were run on all ampli?ca- tions to check for contamination. Ampli?ed products were visualized on 1.5% SeaKem agarose gels. Reampli?ed double-stranded products were puri?ed with AMPure magnetic beads (Agencourt). Cycle-sequencing reactions were run using the ABI Prism Big Dye Terminator DNA Sequencing Kit (Perkin-Elmer) with denaturation at 96 C for 10 s, annealing at 50 C for 10 s, and extension at 60 C for 4 min for 35?40 cycles. Sequencing reactions were run on an ABI 3100 Genetic Analyzer (Applied Biosystems). We obtained nucleotide sequences from four protein- genetics and Evolution 47 (2008) 700?716 701coding genes (ND1, ND2, COI, cyt b), nine transfer RNA genes (tRNAIle, tRNAGln, tRNAMet, tRNATrp, tRNAAla, tRNAAsn, tRNACys, tRNATyr, tRNAThr), and the origin of light-strand replication (OL) in the mitochondrial DNA. PCR and sequencing primers used in this study are listed in Table 1. DNA sequences were aligned manually, using tRNA secondary structure models and information on codon position. Positions encoding part of ND1, complete ND2, part of COI, and part of cyt b were translated to amino acids using MacClade 4.08 (Maddison and Maddi- son, 2003) for con?rmation of alignment. Alignment of protein-coding regions was straightforward, as these regions contained no length variation. Noncoding inter- genic regions contained length variation in several instances (see Section 3.1) and gaps were inserted to ensure alignment of ?anking coding regions. Alignment of sequences encoding tRNAs was based on secondary struc- ture models (Kumazawa and Nishida, 1993; Macey and Verma, 1997). Secondary structures of tRNAs were inferred from primary structures of the corresponding tRNA genes using these models. Alignment of tRNA stem regions was straightforward; however, several tRNA loop regions were more di?cult to align due to signi?cant length variation. Gaps were inserted in these regions to maintain alignment of ?anking stem regions by minimizing implied substitutions and indel events, with indels weighted equally to transitions and transversions weighted approximately 4 times indels. A region was considered ambiguously aligned if more than one alternative placement of a gap had the same cost, with equal costs for gap insertions and exten- sions, while still maintaining alignment of conserved regions of adjacent tRNA stems or coding regions. Gaps were treated as missing data, and regions considered ambiguously aligned were excluded from phylogenetic analyses (see Section 3.1). 2.2. Phylogenetic analyses Phylogenetic trees were estimated using both parsimony and probabilistic methods. Parsimony analyses were con- ducted using PAUP* 4.0b10 (Swo?ord, 2002) using equal weighting of characters and equal costs for state transfor- mations. Optimal trees were estimated using heuristic searches with 1000 replicates of random stepwise addition and tree bisection and reconnection (TBR) branch swap- ping. Bootstrap resampling (Felsenstein, 1985a) was applied to assess support for individual nodes using 1000 bootstrap replicates and full heuristic searches with 100 replicates of random stepwise addition and TBR branch swapping. Maximum-likelihood (ML) analyses also were per- formed using PAUP*. Searches were conducted using a successive approximations approach (Swo?ord et al., 1996; Sullivan et al., 2005) for (1) the unpartitioned, com- bined data set, (2) the fragment encoding the genes span- ning ND1?COI, (3) and the fragment spanning cyt b to NA 0?3 GA TA TT AC TA CG GC CC GT AA AT TT GT AA RG CA TT CCA CT AC CAA GT TG AC CT AA ), ta 702 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716Table 1 Primers used to amplify and sequence the mtDNA ND1?COI and cyt b-tR Primer name Human position Gene Sequence (5 ND1ba,s L4160 ND1 CGATTCC ND1f.3a,s L4178a ND1 CAACTAA ND1f.7a,s L3914 ND1 GCCCCAT ILEf.1s L4221 tRNAIle AAGGATT ILEf.6s L4221 tRNAIle AAGGGN METf.6a,s L4437 tRNAMet AAGCTTT METr.5a,s H4419 tRNAMet GGTATGA ND2f.4a,s L4645 ND2 ACAGAAG ND2r.6a H4980 ND2 ATTTTTC ND2f.14a,s L4882 ND2 TGACAAA ND2f.54s L5263 ND2 TAACYGG ND2f.57s L4833 ND2 CACAYTT ND2r.61a,s H4568 ND2 ATGGCTA TRPf.11s L5550 tRNATrp AACCAAG TRPf.12a,s L5549 tRNATrp AACCAAG ALAf.4s L5638b tRNAAla CTGAATG ASNr.2a,s H5692 tRNAAsn TTGGGTG CO1r.1a,s H5934a COI AGRGTG CO1r.8a,s H6159 COI GCTATGT CYTbf.1a,s L14842 Cyt b CCATCCA CYTbf.2a,s L15172 Cyt b TGAGGA CYTbr.3a,s H15916 Cyt b GTCTTCA CYTbr.5a,s H15402 Cyt b CTTTGTA CYTbf.6s L15429 Cyt b CCATTYC CYTbr.7s H15216 Cyt b ATTCATT CYTbf.8s L15691 Cyt b ACATCAA Their positions in the human mitochondrial genome (Anderson et al., 1981 whose extension produces the heavy and light strands, respectively. a Primers used in ampli?cation. s Primers used in sequencing.Thr regions in this study 0) Reference TATGACCARCT Kumazawa and Nishida (1993) CACCTACTATGAAA Macey et al. (1997) GACCTCACAGAAGG Macey et al. (1998) TTTGATAGAGT Macey et al. (1997) CTTTGATAGAGT Schulte et al. (2003) GGCCCATACC Macey et al. (1997) CCGATAGCTT Macey et al. (1997) GCAACAAAATA Macey et al. (1997) AGTTGGGTTTGRTT Macey et al. (1997) CTAGCCCC Macey et al. (1999) TTTTACCAAAATGAC This study GACTNCCAGAAGT Torres-Carvajal et al. (2006) GTGTTTATTTCTA This study CCTTCAAAGT This study CCTTCAAAG Schulte et al. (2003) ACTCAGACACTTT Macey et al. (1997) TAGCTGTTAA Macey et al. (1997) ATGTCTTTGTGRTT Macey et al. (1997) GGGGCTCCAATTAT Weisrock et al. (2001) ATCTCAGCATGATGAAA Kocher et al. (1989) ATATCCTTCTGAGG Fu (1999) TTTTGGTTTACAAGAC Kocher et al. (1989) ARAAGTATGGGTGRAATGG This study CCATACTTYTCATACAAAG This study ACTAGGGTTGTTCC This study CAACGAAGCAC This study rget genes, sequences, and references are given. H and L designate primers tRNAThr. Both hierarchical likelihood ratio tests and the Akaike Information Criterion as implemented in Modeltest v3.7 (Posada and Crandall, 1998) were used to ?nd the best ?tting model of sequence evolution for a tree estimated using neighbor joining (NJ). Parameter values estimated on the NJ tree were ?xed in initial heuristic searches under the following conditions: (1) Starting trees were obtained via NJ; (2) TBR branch-swapping; (3) reconnection limit set to eight. Optimal tree(s) obtained from this search were strategy on that tree. The tree with the highest partitioned likelihood score is the best (ML) tree for each partitioning strategy. The current implementation of mmml does not conduct bootstrap analyses, so uncertainty of (support for) the various clades implied by this ??point? estimate of the phylogeny could not be evaluated. In addition, the most complicated evolutionary model currently imple- mented in this program is the general time reversible model with rate variation among sites estimated using a gamma J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716 703used to estimate new parameter values under an identical model. These new parameter values were then ?xed in a second search with the same conditions as the initial run. This process was repeated until the same tree and parame- ter values were obtained in two successive searches. Boot- strap resampling was applied using ML with 100 pseudoreplicates and heuristic searches as above except that successive approximations were not conducted for each pseudoreplicate. Parameter estimates for bootstrap analyses were ?xed at the parameter values obtained from the ?nal iteration of the corresponding successive approxi- mations analysis. In our evaluation of branch support strength, we considered a bootstrap value of P95% as strongly supported (Felsenstein and Kishino, 1993), <95 to P70% as moderately supported, and <70% as weakly supported. The impact of using partition-speci?c (=??mixed?) evo- lutionary models on phylogeny estimation has been inves- tigated recently using both likelihood (Wilgenbusch and de Queiroz, 2000) and Bayesian methods (Brandley et al., 2005; Castoe et al., 2004, 2005, 2006; Nylander et al., 2004). We used both approaches under the partitioning strategies (assumed or prior data partitions) described in Table 2. For ML analyses, the program mixed model max- imum likelihood analysis (mmml), written by James C. Wilgenbusch, uses PAUP* to ?nd an initial tree using neighbor joining and to perform a heuristic likelihood search saving N (in this case 100) best trees for all data (unpartitioned) and for each partition (Table 2). Trees from all data sets (each partition and the unpartitioned data set) were combined into a single tree?le removing duplicate topologies in the process. Likelihood scores for these trees were calculated separately for each partition and then summed to yield a likelihood score for each tree under a given partitioning strategy. For each tree, the par- titioned likelihood score is the sum of the likelihood scores for each component partition of a particular partitioning Table 2 Partitioning strategies (assumed partitions) used in this study Partition strategy All data combined ND1?COI coding region; cyt b-tRNAThr coding region All protein-coding genes; tRNAs Codon positions of all protein coding genes; tRNAs ND1; ND2; COI; cyt b; tRNAs Codon positions of all protein coding genes; tRNA stem regions; tRNA nonsdistribution (GTR + C). Bayesian analyses were performed in MrBayes 3.1?3.1.2 (Ronquist and Huelsenbeck, 2003) using a general time- reversible model of sequence evolution with rate variation among sites estimated using a gamma distribution and an estimated proportion of invariant sites (GTR + I + C; Tavare?, 1986) for the assumed (prior) partitioning strate- gies outlined in Table 2. This model was chosen for com- parison to the results of the ML analyses, in which GTR + I + C was identi?ed as the most appropriate model for several partitions (see Section 3.1?Phylogeny estima- tion). Bayesian analyses under the various assumed parti- tions (Table 2) also were performed using GTR + C (i.e., without I) to compare likelihood scores under this model to those resulting from both ML analyses under the same assumed partitions performed using mmml and mixture model Bayesian analyses performed using BayesPhyloge- nies (see below). For each assumed partition analysis with MrBayes a run of 4  106 generations and four Markov chains with default heating values was used. Parameter values for the model were estimated from the data and most were initi- ated with default uniform priors except that branch lengths were unconstrained (no molecular clock) with default exponential priors. Trees and parameter values were sam- pled every 1000 generations resulting in 4000 saved trees per analysis, of which 2000 were discarded as ??burn-in?. Stationarity was assessed by plotting the lnL per genera- tion in the program Tracer 1.3 (Rambaut and Drummond, 2005) and using the average standard deviation of split fre- quencies available in MrBayes. If the two simultaneous runs are converging on the same tree, the average standard deviation of split frequencies are expected to approach zero. After con?rming that the analysis appeared to reach stationarity, the resultant 4000 trees (2000 from each simul- taneous run) were used to calculate Bayesian credibility values (BC) for each branch in a 50% majority-rule consen- Abbreviation Number of characters in partition(s) P1 2763 P2 1697; 1066 P3 2192; 571 P4 731; 730; 731; 571 P5 84; 1033; 30; 1045; 571 tem regions P6 731; 730; 731; 378; 193 hylosus tree. Clades with BCP 95% were considered strongly supported with the caveat that BC may overestimate sup- port for reasons discussed by Alfaro et al. (2003), Douady et al. (2003) and Lewis et al. (2005). In addition, we analyzed our complete dataset under the phylogenetic mixture model approach using BayesPhyloge- nies (Pagel and Meade, 2004). By default, BayesPhyloge- nies assigns uniform priors on a 0?100 interval for the gamma and substitution rate parameters, an exponential prior for branch lengths with a mean of 10, state frequen- cies drawn from the Dirichlet distribution, and all trees are considered equally likely a priori. These default settings were used in all mixture model analyses. We performed a total of 12 analyses with nQ and nQ + C mixture models, where n varied between 1 and 6 independent rate matrices (Qs) and C represents among-site rate variation under a gamma distribution with four rate categories. A parameter for the proportion of invariant sites is not implemented in BayesPhylogenies. Each MCMC run consisted of 4  106 generations and four Markov chains. We allowed the Mar- kov-chains to reach stationarity before sampling trees at intervals of 1000 generations. Stationarity was assessed by plotting the lnL per generation in Microsoft Excel v. X and checking for no average improvement in the like- lihood scores. We discarded the ?rst 2000 trees (2  106 generations) as ??burn-in;? thus, estimates of trees and parameter values are based on 2000 trees sampled from each run. Conventional likelihood ratio tests used to distinguish maximum likelihood estimates for di?erent partitioning strategies are not applicable in a Bayesian framework because they assume that parameter estimates are at their maximum likelihood values, and MCMC methods sample the posterior density of parameters rather than ?nding maximum likelihood point estimates (Pagel and Meade, 2005). In a Bayesian context, an alternative approach to model selection uses Bayes factors (Je?reys, 1935; Kass and Raftery, 1995; Nylander et al., 2004). The Bayes factor (BF) penalizes more complex hypotheses by multiplying the probability of the data given the model parameters of the hypothesis by the set of prior probability terms for each parameter (numbers 6 1), which reduces the marginal like- lihood. If we consider two alternative partitioning strate- gies, a and b, as alternative hypotheses, then the Bayes factor (Bab) is the ratio of the marginal likelihood (i.e., probability of the data given all the model parameters for a particular hypothesis scaled by that hypothesis? prior probability and integrated over all parameter values) of hypothesis a to that of hypothesis b and represents a sum- mary of the evidence provided by the data in favor of hypothesis a relative to hypothesis b. However, computing the marginal likelihoods of the hypotheses (alternative partitioning strategies) is very di?- cult in practice. We estimated the marginal likelihood of a hypothesis by computing the harmonic mean of the ln-like- 704 J.A. Schulte II, K. de Queiroz /Molecular Plihood (lnL) values sampled from the posterior distribu- tions of the two corresponding converged chains beingcompared (Kass and Raftery, 1995; Raftery, 1996). Although the harmonic mean estimator can be unstable, it is said to be su?ciently accurate for interpretation on a natural logarithmic scale. Harmonic mean calculations were made using the ??sump? command in MrBayes of all saved trees after ??burn-in? for both MrBayes and Bayes- Phylogenies runs. Following Pagel and Meade (2005), Bayes factor estimates were calculated by subtracting the harmonic mean lnL of hypothesis b from the harmonic mean lnL of hypothesis a, multiplying this value by 2, then subtracting the di?erence in number of free parame- ters between models multiplied by ln(0.01), re?ecting the uniform 1?100 prior distribution (see Table 6 for an exam- ple). BF values from 2 to 5 were considered positive evi- dence, >5 were strong evidence, and values >10 as very strong evidence favoring one hypothesis over the other (Kass and Raftery, 1995; Pagel and Meade, 2004; Raftery, 1996). 2.3. Hypothesis testing Signi?cance tests were used to examine support for the optimal tree(s) relative to alternative hypotheses. Phyloge- netic hypotheses were tested by comparing the optimal (parsimony or likelihood) topology in the absence of a con- straint with the optimal tree under a topological constraint. To ?nd the optimal tree(s) compatible with a particular phylogenetic hypothesis, topologies containing the minimal number of groups necessary to describe the hypothesis of interest (Appendix B) were constructed using MacClade 4.08 (Maddison and Maddison, 2003). These partially resolved topologies were loaded as constraints in PAUP*, which was then used to analyze the sequence data under the parsimony criterion using heuristic searches with the same settings used to estimate the most parsimonious trees in the absence of topological constraints. Phylogenetic hypotheses were tested similarly under likelihood using constraint trees and a successive approximations approach using the same search strategies as described for searches in the absence of topological constraints. Wilcoxon signed-ranks (WSR) tests (Felsenstein, 1985b; Templeton, 1983) were used with the parsimony trees. They were conducted as two-tailed tests using PAUP*, which incorporates a correction for tied ranks. Goldman et al. (2000) criticized the application of the WSR test as applied in many studies because this test assumes that phylogenetic hypotheses are selected independently of the data analysis (a priori). However, as the optimal trees used in all of our tests corresponded with one of the four alternative hypotheses identi?ed in previous studies (Fig. 1), they may be considered to have been chosen a priori. In any case, Shimodaira?Hasegawa (SH) (Shimodaira and Hase- gawa, 1999) and Approximately Unbiased (AU) tests (Shi- modaira, 2002) also were performed to test the optimal ML tree in the absence of topological constraints (which corre- genetics and Evolution 47 (2008) 700?716sponded to one of the a priori hypotheses) against the opti- mal ML trees under each of the alternative hypotheses (see hyloAppendix B for constraint trees). SH tests were conducted in PAUP* using bootstrap resampling of 10,000 replicate estimated ln-likelihoods of sites (RELL) and AU tests were conducted in the program CONSEL (Shimodaira and Hasegawa, 2001) with 10 sets of 10,000 bootstrap repli- cates. Note that the constraint topology consistent with Topology I was identical to the optimal ML tree in the absence of constraints from the analyses of the combined data, the ND1?COI partition, and the cyt b-tRNAThr parti- tions. Therefore, SH and AU tests included only the ML topology and the best trees compatible with Topologies II?IV. Alternative hypotheses also were tested in a Bayesian framework (Huelsenbeck et al., 2002) using the post- burn-in samples of 4000 trees from the analyses with assumed partitions. Topologies corresponding to the alter- native hypotheses (Appendix B) were loaded as constraints in PAUP*, and post-burn-in trees for each of the six parti- tioning strategies (Table 2) were imported. Each of the four alternative topological constraints was then used to ?lter the trees, saving only those trees that were compatible with a particular constraint. The Bayesian credibility value for the associated hypothesis was calculated by dividing the number of saved trees by the total number in the post- burn-in sample (4000). If the fraction of ?ltered trees was less than 5% of the total, then the constrained topology in question was outside of the 95% credible interval (i.e., was rejected). 3. Results 3.1. Sequence variation Prior to alignment, the fragment spanning ND1?COI was between 1732 and 1750 base pairs in length and the cyt b to tRNAThr fragment was between 1077 and 1093 base pairs across all samples. For the taxa sampled in this study, alignment of sequences for protein-coding genes was unambiguous. Among tRNA genes, several loop regions could not be aligned unambiguously as was the case for some noncoding regions between genes. Regions that could not be aligned unambiguously were parts or all of the dihydrouridine (D) loops for tRNAIle (positions 104?109), tRNAMet (positions 250?252), tRNATrp (positions 1356? 1361), tRNACys (positions 1644?1647), and tRNAThr (posi- tions 2852?2863). Nucleotide positions in the TWC (T) loops for the genes encoding tRNATrp (positions 1393? 1399) and tRNACys (positions 1606?1612) were aligned ambiguously as were noncoding sequences between the genes encoding ND1 and tRNAIle (positions 85?88), tRNAGln and tRNAMet (positions 231?235), ND2 and tRNATrp (positions 1337?1339), tRNATrp and tRNAAla (positions 1413?1417), tRNAAla and tRNAAsn (positions 1487?1490), tRNACys and tRNATyr (positions 1661? 1664), tRNA Tyr and COI (positions 1737?1745), and cyt J.A. Schulte II, K. de Queiroz /Molecular Pb and tRNAThr (positions 2821?2838). These ambiguously aligned regions were removed prior to conducting analyses.All taxa used for phylogenetic analysis had a recognizable origin for light-strand replication (OL) between the tRNAAsn and tRNACys genes according to the criteria out- lined in Macey et al. (1997). However, the OL loop could not be aligned unambiguously; therefore, this region (posi- tions 1574?1584) was excluded. Excluded regions comprise less than 3.8% (108 of 2871) of the aligned sequence posi- tions. Aligned DNA sequences are available in TreeBASE (Study accession number = S1936; Matrix accession number = M3568). Aligned DNA sequences for all regions comprised 2871 nucleotide positions. Of 2763 inferred unambiguous sites in 31 aligned sequences, 1200 (1003 ingroup only) are vari- able, 987 (858 ingroup only) are informative under the par- simony criterion, and there are 1075 distinct data (site) patterns under the GTR + C + I model in the combined data set. The partition spanning ND1?COI has 735 (599 ingroup only) variable characters of which 587 (506 ingroup only) are parsimony informative and 671 distinct data patterns. The region spanning cyt b to tRNAThr has 465 (404 ingroup only) variable characters of which 400 (352 ingroup only) are parsimony informative and 452 data patterns. For all protein-coding genes (ND1, ND2, COI, cyt b), codon positions vary in their contribution of phylo- genetic information with 272 variable ?rst positions (224 parsimony informative; 261 data patterns), 109 variable second positions (73 parsimony informative; 101 data pat- terns), and 650 variable third positions (571 parsimony informative; 622 data patterns). Among tRNA genes stem regions contribute 119 variable positions (81 parsimony informative; 116 data patterns) and non-stem regions have 50 variable positions (38 parsimony informative; 55 data patterns). 3.2. Phylogeny estimation 3.2.1. Single model and set of parameter values (Unpartitioned Parsimony and ML) Parsimony analysis of the equally weighted data set including all mtDNA regions identi?ed three most-parsi- monious (MP) trees (length = 4200 steps). Two of these trees conform to topology I of Fig. 1, and di?er from one another only in the position of the clade containing Callisaurus spp. and C. d. bogerti, while the third is com- patible with topology III. Analysis of 1697 base pairs span- ning the region from ND1?COI yielded four MP trees (length = 2324 steps), which are all compatible with topol- ogy I (Fig. 1). The trees di?er in the positions of Holbrookia propinqua and the clade containing Callisaurus spp. and C. d. bogerti. Two MP trees were obtained from analysis of 1066 nucleotide positions spanning genes encoding cyt b and tRNAThr (length = 1866 steps). These trees also con- form to topology I (Fig. 1) and di?er from each other in the position of the clade containing Callisaurus spp. and C. d. bogerti. High bootstrap support for most branches genetics and Evolution 47 (2008) 700?716 705was found for trees resulting from each of these partitions and the combined data set (Table 3 and Fig. 2). The com- in D1? 5 0 0 0 0 0 0 0 0 7 6 0 0 4 0 0 7 0 hyloTable 3 Bootstrap proportions and Bayesian credibility values for nodes labeled partitioning strategies Branch number Parsimony Likelihood Combined ND1?COI Cytb-tRNAThr Combined N 1 77 98 ? ? 9 2 100 100 54 100 10 3 100 100 76 100 10 4 98 89 75 100 10 5 100 99 73 99 10 6 100 100 99 100 10 7 100 100 100 100 10 8 100 100 100 100 10 9 100 100 100 100 10 10 70 60 75 72 5 11 99 99 60 99 9 12 100 100 100 100 10 13 100 100 99 100 10 14 54 ? ? 62 6 15 100 100 100 100 10 16 100 100 100 100 10 17 100 100 100 100 9 18 56 56 ? 70 ? 19 100 100 100 100 10 706 J.A. Schulte II, K. de Queiroz /Molecular Pbined data set had the highest average bootstrap values while the cyt b-tRNAThr region had the lowest with the ND1?COI fragment intermediate (Table 3; Fig. 2). Both hierarchical likelihood ratio tests (LRT) and the Akaike information criterion (AIC) selected the GTR + I + C model as the most appropriate model for the combined, unpartitioned data. Using these data and that model in a successive approximations ML analysis, a single topology was found (lnL = 21799.01) conforming to Topology I (Fig. 3). Analysis of the ND1?COI data set with Modeltest selected the TrN + I + C model using LRTs and the GTR + I + C model using the AIC. In this case, the GTR + I + C model was never compared to the TrN + I + C model in the path taken by the program for the LRTs. These models di?er by 16.77 likelihood units and are signi?cantly di?erent (P < 0.01) when a LRT is applied; thus, we used the GTR + I + C model for subse- quent ML analyses of the ND1?COI data set. Application of the successive approximation ML analysis yielded a sin- gle ML tree (lnL = 12685.57) that conformed to Topol- ogy I (Fig. 1). The model selected for the mtDNA fragment spanning cyt b-tRNAThr was GTR + I + C using LRTs but HKY + I + C using the AIC. These models were not compared by Modeltest using the LRT and di?er by 20 90 82 66 99 88 21 100 100 100 100 100 22 100 100 99 100 100 23 96 91 84 100 100 24 92 53 93 92 ? 25 100 100 100 100 100 26 89 83 66 99 98 27 95 92 75 100 89 28 100 99 98 100 98 P1?P6 refer to the partitioning strategies listed in Table 2. Empty cells (????) re majority rule consensus tree.Fig. 2 based on di?erent analytical methods and character partitions or Bayesian COI Cytb-tRNAThr Assumed partitions (P1?P6) Mixture models 1?6Q 1?6Q + C ? 73?94 <50?94 <50?92 69 100 100 100 100 100 100 100 78 100 100 100 96 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 88 98?100 86?100 85?100 72 100 100 100 100 100 100 100 96 100 100 100 ? 79?88 82?100 72?90 99 100 100 100 100 100 100 100 94 100 100 100 58 85?97 63?98 69?91 97 100 100 100 genetics and Evolution 47 (2008) 700?716only 0.75 likelihood units, which is not signi?cant accord- ing to the LRT (P > 0.05), so we used the HKY + I + C model. The successive approximations ML analysis of this fragment found one tree (lnL = 9043.26) that conformed to Topology I (Fig. 1). Trees from the combined data and the ND1?COI region were almost identical, the only excep- tion being the position of Uma inornata, while the topology found using the cyt b fragment di?ered from the other two in the position of the clade containing C. d. bogerti and C. d. spp. and the relationships of the outgroup species to each other. As with MP analyses, most branches of the ML trees were well supported (Table 3), though about 15.5% of the branches were poorly to moderately supported, and ca. 13% were poorly supported. 3.2.2. Mixed model ML (assumed partitions) Each of the six partitioning strategies (Table 2) was ana- lyzed using the GTR + C model as implemented in mmml. Current implementation of the program allows speci?ca- tion of only one model, though estimated parameter values can di?er between partitions. Although the GTR model is not necessarily optimal for all of the various partitions, it was selected by both LRTs and the AIC for both the unpartitioned data set and the ND1?COI partition, and 77 100 100 100 100 100 100 100 97 100 100 100 92 100 100 100 95 100 100 99?100 99 100 100 100 57 100 100 100 98 100 100 100 96 100 100 100 fer to bootstrap values less than 50% and thus not present in the bootstrap hyloJ.A. Schulte II, K. de Queiroz /Molecular Pby LRTs for the cyt b partition. Moreover, using a single model facilitates the comparison of alternative partitioning strategies using LRTs (Table 4). The optimal trees for all partition strategies (Table 2) conformed to Topology I (Fig. 1), with the strategy allowing the three codon posi- tions of all protein coding genes, tRNA stem regions, and tRNA nonstem regions each to have a unique set of parameter values (P6) having the highest likelihood (lnL = 20545.99; Table 4). According to LRTs (Table 4), P6 was signi?cantly better than all nested partitioning strategies (P1, P3, and P4). For all other comparisons involving nested strategies, strategies with greater numbers of partitions were signi?cantly better than those with fewer partitions (P2 versus P1, P3 versus P1, P4 versus P1 and P3, P5 versus P1 and P3). Finally, for comparisons involving similar numbers of partitions, partitioning strategies based on structural and/or functional properties of the molecules (codon position and RNA secondary structural features) Fig. 2. Phylogenetic relationships of phrynosomatine sand lizards based on parsimony, likelihood, and Bayesian analysis of combined mtDNA sequences. The numbers above each node correspond to the clade numbers in Table 3, which gives bootstrap proportions (MP and ML) and Bayesian credibility (BC) values for the various nodes. The topology shown corresponds to 1 of 3 MP trees for the combined data and 1 of 4 MP trees for the ND1?COI fragment, ML trees for the unpartitioned, combined data and the ND1?COI fragment, and the Bayesian consensus trees for assumed Bayesian partition strategies P1?P6, and mixture models 2Q?5Q, 1Q + C, and 3Q?6Q + C (note that the analyses favoring this particular topology are a subset of those favoring Topology I). Dashed branches indicate that alternative topological arrangements were found in the optimal tree resulting from one or more analyses or in one or more of the three most-parsimonious trees.genetics and Evolution 47 (2008) 700?716 707explained the data better than those based on contiguous sequences and/or genes (e.g., P3 versus P2, P6 versus P5) though these di?erences were not tested for signi?cance because the partitions are not nested. Indeed, P4, which has a separate partition for each codon position, explained the data better than P5, which has a separate partition for each protein-coding gene, despite having fewer partitions (4 versus 5). 3.2.3. Mixed model bayesian (assumed partitions) The consensus topologies of the 4000 trees in the poster- ior distributions were identical for all partitioning strate- gies using both GTR + I + C and GTR + C and congruent with Topology I, as well as being identical to one of the MP trees and the ML tree for the combined (unpartitioned) data (Fig. 2). The only major e?ect of the di?erent partitions was that BC values for node 18 (Coph- osaurus plus Holbrookia) increased from 85% to 90% using P1?P3 and 91% with P5 to 96% and 97% using strategies P4 and P6, respectively (Table 3). Most nodal BC values were high (BC > 95%) under all partitioning strategies, and only two nodes (1 and 14) had BC values below 95% under most partitioning strategies. Fig. 3. ML phylogram of sand lizards relationships based on analysis of unpartitioned data containing 2763 included characters spanning the regions from ND1 to COI and cyt b to tRNAThr under the GTR + I + C model. Branches are proportional to their lengths. The ML score of the single topology obtained using the GTR + I + C model is lnL = 21799.01. Table 4 Comparison of alternative partitioning strategies (assumed partitions) based on mixed model maximum likelihood analyses using the ML tree for each partition, all of which correspond to Topology I Partition(s) Number of partitions ML score (lnL) lnL di?erences; P values (partition compared) All data combined (P1) 1 21886.84a ND1?COI coding region; cyt b-tRNAThr coding region (P2) 2 21821.79 65.05; P < 0.001 (P1) All protein-coding genes; tRNAs (P3) 2 21699.59 187.25; P < 0.001 (P1) Codon positions of all protein coding genes; tRNAs (P4) 4 20610.58 1276.26; P < 0.001 (P1) 1089.01; P < 0.001 (P3) ND1; ND2; COI; cyt b; tRNAs (P5) 5 21550.62 336.22; P < 0.001 (P1) 148.97; P < 0.001 (P3) Codon positions of all protein coding genes; tRNA stems; tRNA nonstems (P6) 5 20545.99 1340.85; P < 0.001 (P1) 1153.60; P < 0.001 (P3) 64.59; P < 0.001 (P4) P values for comparisons of nested partitioning strategies are based on likelihood ratio tests. a Current models implemented in the ??mmml? program do not include a parameter for the proportion of invariant sites (I), hence the ML score for all data combined (P1) in this analysis is not as good as that obtained for the same data set in analyses that included this parameter (e.g., Table 7). 708 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716The harmonic mean lnL (Table 5) and approximate Bayes Factors (Table 6) were used to evaluate the ability of the six assumed (prior) partitioning strategies to explain the data set. Among all strategies P6, which separately ana- lyzed each codon position, tRNA stems, and tRNA non- stems, explained the data better than the next best partition (P4) by almost 23 lnL units and an approximate BF of 3.85. Partition identity was found to have the greatest e?ect on improving model performance. Partitions that sepa- rated each protein-coding codon position (P4 and P6) were favored over partitions that were de?ned by fragment or gene boundaries (P2 and P5) or gene type (protein-coding genes and tRNAs?P3) as indicated by both lnL (Table 5) and BF estimates (Table 6). Even though strategy P4, which has a partition for each codon position, has fewer partitions (4 versus 5) than P5, which does not, the har- monic mean lnL score of P4 is almost 940 units better than that of P5, and the BF estimate of 1918.07 over- whelmingly favors P4. Partitioning tRNAs by stem and non-stem regions (P6 versus P4) also improved the model, though the improvement was far less in terms of both the Table 5 Arithmetic mean (Xa) and harmonic mean (Xh?  ln Ls for sets of trees obtained MrBayes) and mixture models (implemented in BayesPhylogenies) Assumed partitions Mixture models Partitioning strategy Xa  lnL Xh  ln L Free parameters Model Xa  ln L P1 21911.82 21940.15 9 1Q 24141.80 P2 21870.94 21904.18 18 2Q 21943.82 P3 21752.79 21791.47 18 3Q 21755.12 P4 20766.30 20803.95 36 4Q 21636.08 P5 21701.16 21742.26 45 5Q 21594.21 P6 20736.52 20781.30 45 6Q 21575.72 Assumed partitions results are from analyses using the GTR + C model (no mixture model analyses results (see text for explanation). Free parameters for a set to 1), 3 for base frequencies, and 1 for gamma shape parameter; 9 paramet implemented in MrBayes. Free parameters for the 1Q + C mixture model are th 7 parameters are added, 6 parameters for the rate matrix (G to T substitution ra value of that matrix.harmonic mean lnL score (ca. 23 lnL units) and the BF (3.85). 3.2.4. Mixture model bayesian The consensus topology of the 2000 trees in the pos- terior distribution from all mixture model analyses (1? 6Q and 1?6Q + C) was congruent with Topology I (Fig. 1). All models yielded identical topologies for sand lizard relationships with the only di?erence between topologies being the relative position of the outgroup taxa Sceloporus, Uta, and Urosaurus. BC values for the relationship of Cophosaurus and Holbrookia (node 18) were above 95% only for the 1Q analysis (98%) and below this cuto? for all other mixture models. The sis- ter-taxon relationship between Uma notata and U. inor- nata (node 10) was below 95% using a single rate matrix with or without gamma but the addition of more than one rate matrix yielded BC values above 95% for all other models. Interestingly, node 14 had a BC value above 95% using models 1?4Q but below 95% when models 5?6Q and 1?6Q + C were used. in Bayesian analyses based on assumed (prior) partitions (implemented in Xh  ln L Free parameters Model Xa  ln L Xh  lnL Free parameters 24155.32 8 1Q + C 21924.03 21939.13 9 21970.29 15 2Q + C 21747.91 21761.55 16 21770.30 22 3Q + C 21656.37 21674.23 23 21651.64 29 4Q + C 21633.54 21651.82 30 21621.55 36 5Q + C 21633.52 21652.69 37 21600.21 43 6Q + C 21634.24 21651.57 44 t incorporating an estimate of invariant sites) to allow comparison with ssumed partitions are calculated as follows: 5 for rate matrix (G to T rate ers are added for each additional partition as these values are unlinked as e same as for P1 under GTR + C; however, for each rate matrix above 1Q tes for additional matrices are free to vary) and 1 parameter for the weight gies 1Q 4 4 4 6 4 6 hyloTable 6 Approximate BF estimates used to compare alternative partitioning strate Partitioning strategies P1 P2 P3 P4 P5 P6 P1 ? 30.49 255.91 2148.06 229.99 2151.91 P2 ? 225.42 2117.57 199.50 2121.42 P3 ? 1892.15 25.92 1896.00 P4 ? 1918.07 3.85 P5 ? 1921.92 P6 ? 1Q 2Q 3Q 4Q 5Q J.A. Schulte II, K. de Queiroz /Molecular PThe harmonic mean lnL score of the model with one rate matrix and gamma rate-heterogeneity (1Q + C) is about 2216 units better than the score of the model with one rate matrix alone (1Q) (Table 5), indicating that there is a large component of rate-heterogeneity in the data. When a second Q matrix without C is used, the harmonic mean lnL score is only about 30 units below that of the 1Q + C model suggesting that an additional rate matrix (pattern heterogeneity) is an alternative way of accounting for rate heterogeneity (Pagel and Meade, 2004). However, the additional rate matrix comes at the cost of seven addi- tional parameters (6 substitution rate categories plus 1 weight term) versus a single parameter, C, and conse- quently, the simpler model is strongly favored by the approximate BF (Table 6). As the number of rate matrices 6Q 1Q + C 2Q + C 3Q + C 4Q + C 5Q + C 6Q + C 1Q + C 2Q + C 3Q + C P1 2.04 324.96 467.3 P2 28.45 294.47 436.8 P3 253.87 69.05 211.4 P4 2146.02 1823.10 1680.6 P5 227.95 94.97 237.3 P6 2149.87 1826.95 1684.5 1Q 4427.77 4750.70 4893.1 2Q 89.95 412.87 555.2 3Q 277.79 45.13 187.5 4Q 482.88 159.95 17.5 5Q 510.82 187.90 45.4 6Q 521.26 198.34 55.9 1Q + C ? 322.92 465.3 2Q + C ? 142.4 3Q + C ? 4Q + C 5Q + C 6Q + C Bayes factors were calculated using the following formula: BF = 2 (lnLa  Likelihoods (from Table 5) and the number of free parameters of model (partit yields: 2*(21940.15  (21904.18))  ((9  18)*ln(0.01)) = 30.49. BF values and values >10 as very strong evidence favoring one model over the other (Ka values favor the model in the column heading and negative BF values favor tand mixture models 2Q 3Q 4Q 5Q 6Q 425.73 87.91 279.83 484.92 512.86 523.30 456.23 118.40 249.34 454.42 482.37 492.81 681.65 343.82 23.92 229.00 256.95 267.39 573.80 2235.97 1868.23 1663.14 1635.20 1624.76 655.73 -317.90 49.84 254.92 282.87 293.31 577.65 2239.82 1872.08 1667.00 1639.05 1628.61 ? 4337.82 4705.57 4910.65 4938.60 4949.04 ? 367.74 572.83 600.77 611.22 ? 205.08 233.03 243.47 ? 27.94 38.39 ? 10.44 genetics and Evolution 47 (2008) 700?716 709increases (from 1 to 6), both with and without gamma, the rate of score improvement reaches a plateau (Table 5). For the 5Q and 6Q mixture models, the scores are higher with- out C than with it (Table 5), which may result from sto- chastic variation or insu?cient mixing of chains due to relatively high numbers of parameters (A. Meade, pers. comm.). These factors may also explain why 1Q + C and the identical P1 (under GTR + C) di?er slightly with regard to both mean lnL scores (Table 5) and approxi- mate Bayes Factor (Table 6). Determining how many rate matrices to estimate is com- monly done using a combination of Bayes factors (Raftery, 1996) and examining the variability in estimated parame- ters (Pagel and Meade, 2004; Pagel and Meade, 2005). Based on BFs (Table 6) it appears that the model with ? 4Q + C 5Q + C 6Q + C 7 479.95 445.98 415.98 7 449.46 415.48 385.49 5 224.04 190.06 160.07 9 1668.11 1702.09 1732.08 7 249.96 215.98 185.99 5 1671.96 1705.94 1735.93 0 4905.69 4871.71 4841.71 8 567.86 533.89 503.89 3 200.12 166.14 136.15 5 4.97 38.94 68.94 9 32.91 66.89 96.88 4 43.35 77.33 107.33 3 477.91 443.94 413.94 0 154.99 121.01 91.02 12.58 21.39 51.39 ? 33.98 63.97 ? 30.00 ? lnLb)  (Pa  Pb), where lnLx and Px are the harmonic mean of the ln ion strategy) x. For example, comparing models P1 (row) and P2 (column) from 2 to 5 were considered positive evidence, >5 were strong evidence, ss and Raftery, 1995; Pagel and Meade, 2004; Raftery, 1996). Positive BF he model in the row heading. six rate matrices and no gamma is justi?ed as the best solu- tion to these data. However, the model with ?ve rate matri- ces may be more appropriate because, as Pagel and Meade (2005) point out, the Bayes factor test assumes that all sites are independent whereas the true number of independent sites in our data set is likely to be fewer than the 2763 aligned sites, given phenomena such as the pairing of bases in the stem regions of the tRNAs. Non-independence has the e?ect of in?ating the di?erence in likelihood between models (Pagel and Meade, 2005). We also can use the matrix weights (provided by Bayes- Phylogenies) as a guide for evaluating how many rate matrices to estimate (Pagel and Meade, 2004; Pagel and Meade, 2005). If additional rate matrices have much lower weights and high variability (measured by standard devia- tion) than other matrices in the analysis then it is likely that the additional rate matrices are being poorly estimated. For the 5Q model, rate matrix weights were 0.05,0.15,0.15, 0.56, and 0.09 with all standard deviations below 0.025 suggesting an adequate estimation of these parameters by the data. Other models, 4Q, 6Q, and 4? 6Q + C, had either lower rate matrix weights for some matrices or higher variability in those matrix weight esti- els are strongly favored even when penalized for additional parameters in the comparisons involving estimated Bayes factors (Table 6). 3.3. Tests of alternative hypotheses Optimal topologies (Topology I, Fig. 1) for the com- bined, unpartitioned data set, the ND1?COI fragment, and the cyt b-tRNAThr fragment were tested against each of the three alternative hypotheses (Topologies II?IV, Fig. 1) for each of the same partitions under both parsi- mony (WSR test) and likelihood (SH and AU tests) (Table 7). The combined data set as well as the ND1?COI frag- ment found Topology I to explain the data signi?cantly better than Topology II with the three tests; however, the cyt b-tRNAThr fragment failed to reject Topology II using all tests (Table 7). All three data sets failed to reject Topol- ogy III in favor of Topology I using WSR, SH, and AU tests. Using the combined data set, all three tests rejected Topology IV, and whereas the ND1?COI fragment could not reject Topology IV using any of the tests, the AU test (but not the others) rejected this topology using the cyt b- Thr and e sp 710 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716mates; therefore we propose that the 5Q model may be the most appropriate for this data set based on the two cri- teria outlined above. In any case, none of these mixture models (4?6Q, 4?6Q + C) had lnL scores as good as the best scores for assumed partitions (strategies P4 and P6) under the GTR + C model (Table 5). Although some of this advantage may result from larger numbers of free parameters in P4 and P6 relative to mixture models with 4?6Q with and without gamma (Table 5), the former mod- Table 7 Results of Wilcoxon signed-ranks (WSR), Shimodaira?Hasegawa (SH), hypotheses (topologies) Tree WSR tests Length N P Topology I Combined 4200 (Best) N/A ND1?COI 2324 (Best) N/A Cyt b-tRNAThr 1866 (Best) N/A Topology II Combined 4221 101 0.037* ND1?COI 2341 55 0.022* Cyt b-tRNAThr 1874 22 0.088 Topology III Combined 4200 (Best) N/A ND1?COI 2325 47 0.884 Cyt b-tRNAThr 1867 17 0.808 Topology IV Combined 4214 36 0.019* ND1?COI 2335 69 0.185 Cyt b-tRNAThr 1874 30 0.144 For each hypothesis and data set, the score of the best tree conforming to th trees, and the P-value are given. The four competing hypotheses are illustr hypothesis, are indicated with an asterisk.tRNA fragment. The results of Bayesian hypothesis tests of the four alter- native hypotheses of sand lizard relationships (Fig. 1) for the six di?erent partitioning strategies based on assumed partitions (Table 2) are presented in Table 8. Topology I was compatible with a substantial majority (P87%) of the trees in the post-burn-in sample for all six strategies, although greater than 95% of the trees were compatible with this constraint only for P4 and P6 (Table 8). Using a 95% credible interval cuto?, Topologies II and IV were approximately-unbiased (AU) tests comparing alternative phylogenetic SH tests AU tests lnL d lnL P P 21799.01 (Best) N/A N/A 12685.57 (Best) N/A N/A 9043.26 (Best) N/A N/A 21822.80 23.79 0.014* 0.002* 12704.15 18.58 0.033* 0.011* 9048.80 5.53 0.188 0.090 21801.16 2.15 0.631 0.323 12686.21 0.64 0.736 0.525 9045.21 1.95 0.518 0.233 21822.39 23.38 0.024* 0.012* 12700.71 15.14 0.091 0.071 9051.85 8.59 0.079 0.042* eci?ed hypothesis, the number of characters that discriminate between two ated in Fig. 1. Signi?cant values (P 6 0.05), indicating rejection of the formed to this topology, as did all optimal trees resulting ode ning 1* 1* 1* 1* 1* 1* a re hylofrom parsimony analysis of the ND1?COI fragment, parsi- mony analysis of the cyt b-tRNAThr fragment, successive approximations likelihood analysis under a single set of parameter values for the combined data set, the ND1? COI fragment, and the cyt b-tRNAThr fragment, mixed model maximum likelihood analysis with partition-speci?crejected using all six assumed partitioning strategies. Topology III could not be rejected using P1, P2, P3, and P5 but was rejected by strategies P4, and P6, the partitions that explain the data best (Table 6) and thus presumably represent the most powerful tests. 4. Discussion 4.1. Sand lizard relationships Previous studies have identi?ed four competing hypoth- eses of higher-level relationships among phrynosomatine sand lizards (Fig. 1). Although the results of most recent studies have favored one of these alternative hypotheses (Topology I; ChangChien, 1996; de Queiroz, 1989, 1992; Reeder and Wiens, 1996; Wilgenbusch and de Queiroz, 2000; but see Leache? and McGuire, 2006), previously col- lected data have not been su?cient to reject the alternative hypotheses according to conventional statistical criteria. The results of the present study provide substantially stronger support for Topology I over alternative phyloge- netic hypotheses. Two out of three optimal trees resulting from parsimony analysis of the combined data set con- Table 8 The number and frequency of trees in post burn-in samples from mixed m hypotheses of sand lizard relationships (Fig. 1) under six assumed partitio Topology 1 Topology 2 # Compatible trees Freq # Compatible trees Freq P1 3601 =0.900 0 <0.00 P2 3637 =0.909 0 <0.00 P3 3511 =0.878 0 <0.00 P4 3900 =0.975 0 <0.00 P5 3544 =0.886 0 <0.00 P6 3874 =0.969 0 <0.00 Frequencies are based on samples of 4000 post-burn-in trees; those below J.A. Schulte II, K. de Queiroz /Molecular Psets of parameter values under six di?erent partitioning strategies (P1?P6) applied to the combined data set, Bayes- ian analysis with partition-speci?c sets of parameter values under six di?erent partitioning strategies (P1?P6) applied to the combined data set, and Bayesian mixture models with one to six rate matrices both with and without a parameter for rate heterogeneity applied to the combined data set. In addition, explicit tests of alternative phyloge- netic hypotheses based on parsimony and likelihood (Table 7) as well as Bayesian (Table 8) approaches indicate that the combined data set signi?cantly favors Topology I over both Topology II and Topology IV (failure to reject these topologies using the separate gene fragments under parsi- mony and likelihood presumably re?ects decreased testpower associated with smaller sample sizes when the frag- ments are analyzed separately). Of the three alternatives to Topology I, only Topology III is not consistently rejected by our data. One of the three optimal trees resulting from the parsimony analysis of the combined data conformed to this topology, which was ranked a close second in the likelihood analysis of the com- bined data set and in both parsimony and likelihood anal- yses of the separate ND1?COI and the cyt b-tRNAThr fragments (Table 7). Moreover, Topology III was not even close to being rejected by the various explicit parsimony- and likelihood-based tests of alternative phylogenetic hypotheses for the separate gene fragments and the com- bined (but unpartitioned) data (P values ranging from 0.518 to 1.00). Topology III and Topology I are similar in that both have Uma as the sister group of the remaining sand lizards, and they di?er only in placing Cophosaurus as the sister group of Callisaurus (Topology III) versus Holb- rookia (Topology I). Consistent with the weak support for Topology I over Topology III, the node uniting Cophosau- rus and Holbrookia (node 18) is relatively weakly sup- ported (Table 3). On the other hand, Topology I is favored over Topology III not only by all likelihood and Bayesian analyses of our combined mtDNA data set, but also by previous analyses of allozymes, morphology, immunology, and mtDNA (ChangChien, 1996; de Que- iroz, 1989, 1992; Reeder and Wiens, 1996; Wilgenbusch and de Queiroz, 2000). Moreover, under our Bayesian hypothesis tests (Table 8), which employed partition spe- ci?c parameter estimates, Topology III is much closer to being rejected even in the worst case (P = 0.113 for strat- l Bayesian analyses that are compatible with each of the four alternative strategies (Table 2) Topology 3 Topology 4 # Compatible trees Freq # Compatible trees Post P 376 =0.094 0 <0.001* 329 =0.082 0 <0.001* 453 =0.113 0 <0.001* 91 =0.023* 0 <0.001* 429 =0.107 0 <0.001* 120 =0.030* 0 <0.001* jection cuto? of 0.05 are indicated with an asterisk. genetics and Evolution 47 (2008) 700?716 711egy P3) than in the case of the parsimony and likelihood tests based on combined but unpartitioned data (Table 7), and it is rejected by the partitioning strategies that explain the data best (P4 and P6, Tables 5 and 6) and thus are expected to be associated with the most powerful tests. Thus, in contrast to the situation before the present study, Topology I can now be considered highly corroborated (strongly supported) relative to the three alternative phylo- genetic hypotheses (Fig. 1). 4.2. Phylogenetic analysis under heterogeneous processes Accounting for heterogeneity in nucleotide sequence evolution is an active area of investigation in phylogenetics hylo(Brandley et al., 2005; Castoe et al., 2004; Huelsenbeck and Nielsen, 1999; Nylander et al., 2004; Pagel and Meade, 2004; Wilgenbusch and de Queiroz, 2000). Studies utilizing sequences from di?erent parts of the genome have to account for variation in evolutionary processes and the resulting patterns that exist both among and within those sequences. For example, protein-coding genes are expected to exhibit di?erent patterns of sequence evolution than genes coding for ribosomal or transfer RNAs; similarly, di?erent codon positions within protein coding genes and positions that code for stem versus non-stem regions in RNAs are likewise expected to exhibit di?erent patterns. Application of a single set of model parameter estimates to heterogeneous data partitions will often result in a poor ?t to the data. A better strategy is to assess the ability of di?erent models and/or parameter estimates to explain the data (Swo?ord et al., 1996). Two primary approaches have been developed for assessing and incorporating heterogeneity into phyloge- netic analyses. The ?rst is to partition data into biologically meaningful classes based on prior knowledge of the charac- ters, and then to assume a di?erent evolutionary model for each class. In the case of nucleotide sequences, partitioning is commonly based on transcriptional or translational products, such as protein-coding genes, tRNA genes, or non-transcribed DNA. This is by far the most common strategy and is easily implemented in some existing com- puter programs, such as MrBayes (Ronquist and Huelsen- beck, 2003). A second approach is to invoke an increasing number of evolutionary models without a priori partition- ing of the data, allowing the data to estimate the appropri- ate number of models while attempting to avoid over- parameterization (=mixture-model approach; Pagel and Meade, 2004). An advantage of the latter method is that variation within a partition that may have been missed by the former strategy can be taken into account (Pagel and Meade, 2004). It is also advantageous when biologi- cally meaningful partitions are not evident. Consistent with previous studies (Brandley et al., 2005; Nylander et al., 2004), partition-speci?c parameter esti- mates generally improved the ?t of our data by allowing di?erent components of the data set to have distinct values. For example, substitution rates among 1st, 2nd, and 3rd codon positions for protein coding genes each had di?erent parameter values that exhibited limited overlap in their 95% credible intervals. This result was found using both mixed model ML and Bayesian analyses with assumed par- titions (results not shown). Brandley et al. (2005) noted that data partitioning greatly reduces systematic error and improves mean lnLs and posterior probabilities. Our results support this conclusion with regard to tree scores but not necessarily with regard to clade posterior probabilities. Tree scores for ML mixed model and Bayes- ian analyses using assumed partitions generally improved with increasing numbers of partitions (except P4 was better 712 J.A. Schulte II, K. de Queiroz /Molecular Pthan P5; Table 4), as they did for mixture models with increasing numbers of Q matrices up to 6Q without Cand 4Q with C (Table 5). For Bayesian credibility values, however, alternative partitioning strategies showed no obvious trend toward improvement. Partitioning strategies P4 and P6, among the most complex partitions assumed, showed an increase in BC values for node 18 on our phylo- genetic tree, but for node 1 there was a decrease in BC val- ues and node 14 showed no improvement. In mixture model analyses, BC values for nodes 1, 14, and 18 generally decreased with increasing numbers of rate matrices, whereas node 10 showed higher BC values with more rate matrices. Most studies utilizing prior (assumed) partitions in phy- logenetic analyses of nucleotide sequence data have imple- mented four basic strategies. One is to partition by contiguous DNA fragments, such as the region spanning ND1?COI versus that spanning cyt b through tRNAThr (as in our P2). Another strategy is to partition by classes of genes based on their transcriptional and/or translational products, such as tRNA versus rRNA versus protein-cod- ing genes (P3, P4 in part, P5 in part). Yet another strategy is to partition by individual genes, such as ND2 versus COI versus tRNAThr (P5 in part). The fourth strategy is to par- tition based on structurally and/or functionally de?ned classes of sites, such as codon positions and RNA stems versus nonstems, regardless of the gene boundaries (P4 in part, P6). These four common alternatives were examined in the present study. In addition, other more complicated partitioning strategies may be applied, including classes of nucleotides that code for functional properties of the encoded protein, such as hydrophobicity, charge, or chem- ical properties of R-groups (Naylor et al., 1995; Savolainen et al., 2002). Using both likelihood and Bayesian methods (Tables 4? 6), all of the assumed partitions implemented here (P2?P6) are better than no partitioning at all (P1). Partitioning by classes of genes (P3, P4, P5 in part), by individual genes (P5 in part), and by classes of sites (P4 in part, P6), is each better than partitioning by contiguous DNA fragments (P2). Partitioning by individual genes (P5 in part) or by classes of sites within genes (P4 in part, P6) is better than partitioning by classes of genes (P3), though Bayes factors (Table 6) favored partitioning by classes of genes (P3) over partitioning by individual genes (P5 in part). Finally, par- titioning by classes of sites within genes (P4 in part, P6) is better than partitioning by individual genes (P5). These results are congruent with those of several previous studies (Brandley et al., 2005; Castoe et al., 2005; Castoe and Par- kinson, 2006; Nylander et al., 2004; Strugnell et al., 2005). Nylander et al. (2004) suggested that although account- ing for data heterogeneity across partitions greatly improved the ?t of the data, allowing heterogeneity within partitions might yield even greater improvements. Further partitioning might lead to improvements in lnL scores, but at some point the advantages of further partitioning (improvement in lnL) are outweighed by the disadvan- genetics and Evolution 47 (2008) 700?716tages (inaccurate parameter estimation). All data sets examined to date have found that partitioning by codon hyloposition and/or tRNA secondary structure (stem versus nonstem) improved mean lnL scores; however, Strugnell et al. (2005) found that in some cases partitioning by codon for all genes together produced a better ?t to the data using AIC than partitioning by both codon and gene. These results indicate that the addition of substitution parameters with increasing partitions in some cases is not justi?ed. As the number of parameters increases, the amount of sequence data to be evaluated per parameter decreases, a?ecting the ability of probabilistic methods to estimate model parameter values, as well as inferences derived from them (e.g., topologies, branch lengths, nodal support), accurately. An alternative approach to prior (assumed) partitioning is to estimate heterogeneity using mixture models (Pagel and Meade, 2004). Phylogenetic mixture models allow dif- ferent models to be applied to the same site with varying probabilities, which are estimated from the data them- selves. For our data (Tables 5, 6), mixture models both with and without rate heterogeneity (C) performed better than unpartitioned data (P1) and assumed partitions based on contiguous gene fragments (P2) and classes of genes (P3). On the other hand, assumed partitions based on codon position and tRNA stems and non-stems (P4 in part, P6) were between 800 and 900 lnL units better than the preferred mixture model (5Q) or the best mixture model score (6Q). This improvement comes only at the cost of a single free parameter di?erence between P6 and 6Q, and thus P6 (and P4) was very strongly favored over 6Q (with or without C) according to Bayes factors (Table 6). Development and implementation of mixed model max- imum likelihood, perhaps using genetic algorithms (Zwickl, 2006) or reversible jump (RJ) Markov chain Monte Carlo methods (Huelsenbeck et al., 2004; Pagel and Meade, 2006) that can simultaneously estimate substitution rate parameters and a phylogenetic hypothesis, are the next stage in the evolution of these analytical methods. We anticipate that these developments will serve two very important functions for the phylogenetic community. First, they will simplify the steps researchers must take to per- form the multitude of analyses currently available. Second, these methods will greatly facilitate the ability to account for heterogeneous evolutionary processes in complex phy- logenetic data sets. Acknowledgments We thank the Smithsonian Institution (postdoctoral fel- lowship to J.A. Schulte), the Laboratories of Analytical Biology, and the National Science Foundation (postdoc- toral fellowship to J.A. Schulte II) for ?nancial support; L. Weigt and J. Hunt for laboratory support; O. Torres- Carvajal, A. Meade, and M. Pagel for analytical advice; J. Wilgenbusch for information concerning tissue samples, and providing invaluable support concerning phylogenetic J.A. Schulte II, K. de Queiroz /Molecular Panalyses, particularly those involving mmml. Thanks also are owed to the many people who assisted with obtainingspecimens, whose names are listed in the Acknowledgments of Wilgenbusch and de Queiroz (2000). Appendix A Collection numbers and localities for voucher specimens from which DNA was obtained and GenBank accession numbers are presented. Abbreviations for collections are as follows: California Academy of Sciences (CAS), San Francisco, California; Louisiana State University Museum of Zoology (LSUMZ), Baton Rouge, Louisiana; Museum of Vertebrate Zoology (MVZ), University of California, Berkeley, California; Royal Ontario Museum (ROM), Tor- onto, Canada; National Museum of Natural History (USNM), Smithsonian Institution, Washington, District of Columbia; ?eld specimen numbers of Kevin de Queiroz (KdQ); ?eld specimen numbers of Richard R. Montanucci (RRM); ?eld specimen numbers of Christopher A. Phillips (CAP). Subspecies taxonomy for Holbrookia maculata fol- lows Axtell (1958). Outgroups: Phrynosoma hernandesi, USA; Colorado; Weld Co.; vic. Stoneham (RRM 2470, EU543778, EU543747); Phrynosoma platyrhinos, USA; California; no further data (KdQ 057, EU543777, EU543746); Sceloporus jarrovii, USA; Arizona; Cochise Co.; Huachuca Mts., vic. Carr Canyon (KdQ 036, EU543774, EU543743); Urosaurus ornatus, USA; Arizona; Yavapai Co.; 3 mi N Hillside (MVZ 214658, EU543776, EU543745); Uta stansburiana, USA; Arizona; Pima Co.; Santa Catalina Mts,, mouth of Pima Canyon (CAS 178153, EU543775, EU543744); Sand Lizards: Callisaurus draconoides bogerti, Me?xico; Sinaloa; 14 km S Dimas (ROM 14970, EU543788, EU543757); Callisaurus dracono- ides carmenensis, Me?xico; Baja California; San Ignacio (ROM 14182, EU543786, EU543755); Callisaurus dracono- ides crinitis, Me?xico; Baja California; Dunes N of Guerrero Negro (MVZ 214832, EU543787, EU543756); Callisaurus draconoides myurus, USA; Nevada; Mineral Co.; W shore of Walker Lake (MVZ 214751, EU543791, EU543760); Callisaurus draconoides rhodostictus, USA; California; Inyo Co.; Argus Mts., Darwin Wash (MVZ 214739, EU543792, EU543761); Callisaurus draconoides ssp., Me?xico; Sonora; 4.0 mi. NW of Bah??a Kino (ROM 15034, EU543789, EU543758); Callisaurus draconoides ventralis, USA; New Mexico; Hidalgo Co.; 12.4 mi N and 0.9 mi W of jct. US Hwy 80 and Portal Road (NM 533) (LSUMZ 48811, EU543790, EU543759); Cophosaurus texanus scitulus, USA; Arizona; Pima Co.; Santa Catalina Mts., mouth of Pima Canyon (MVZ 214711, EU543793, EU543762); Cophosaurus texanus texanus, USA; Texas; Blanco Co.; Flat Creek, 3.8 mi N of U. S. Rte. 290 (at Henly) on Farm Road 3232, then 0.5 mi E on dirt road (USNM 315499, EU543794, EU543763); Holbrookia elegans elegans, Me?xi- co; Sinaloa; Mazatla?n (ROM 14964, EU543797, EU543766); Holbrookia elegans thermophila, USA; Ari- zona; Pima Co.; 0.6 mi E of Hwy 83 on Old Sonoita genetics and Evolution 47 (2008) 700?716 713Hwy (CAS 174400, EU543798, EU543767);Holbrookia lac- erata lacerata, USA; Texas; Concho Co.; 5.8 (by air) miles hyloE of Eden on dirt road paralleling U. S. Rte. 87 (USNM 315500, EU543795, EU543764); Holbrookia lacerata sub- caudalis, USA; Texas; McMullen Co.; Charles Caron Ranch, 9.5 mi E of Texas Rte. 16 on Farm Road 1962 (USNM 315503, EU543796, EU543765); Holbrookia pro- pinqua propinqua, USA; Texas; Nueces Co.; dunes just N of E end of Access Rd. 3, just S of Mustang Island State Park (MVZ 214863, EU543799, EU543768); Holbrookia maculata bunkeri, USA; New Mexico; Luna Co.; 2.0 mi W of Cambray on NM Hwy 549 (USNM 337740, EU543803, EU543772); Holbrookia maculata campi, USA; New Mexico; McKinley Co.; Zuni Indian Reserva- tion; jct. of Sand Spring Canyon Creek and Zuni River (MVZ 214801, EU543801, EU543770); Holbrookia macula- ta ?avilenta, USA; Arizona; Cochise Co.; 0.1 mi S Hwy I- 10 at junction 666N (MVZ 214814, EU543804, EU543773); Holbrookia maculata maculata, USA; New Mexico; Chaves Co.; Dune area 0.9 mi S Hwy 380 and 37.0 mi E jct. Hwy 285 at Roswell (USNM 214806, EU543800, EU543769); Holbrookia maculata ruthveni, USA; New Mexico; Otero Co.; White Sands National Monument, 5.0 mi W of U.S. Hwy 70/82 on The Dunes Drive (USNM 337752, EU543802, EU543771); Uma exsul, Me?xico; Coahuila, Bil- bao Dunes (ROM 15315, EU543779, EU543748);Uma inornata, USA; California; Riverside Co.; Windy Point (CAP 1752, EU543785, EU543754); Uma notata, USA; California; Imperial Co.; E edge Algodones Dunes, 0.9 mi W jct. of Hwy I-10 and S-34 (MVZ 214799, EU543784, EU543753);Uma rufopunctata, Me?xico; Sonora; 17 km NE Puerto Pen?asco (ROM 13919, EU543783, EU543752); Uma paraphygas, Me?xico; Durango; Bolso?n de Mapim?? , vic. Laboratorio del Desierto (ROM 15089, EU543780, EU543749); Uma scoparia (1), USA; California; San Bernardino Co.; Pisgah Crater (ROM 14637, EU543781, EU543750); Uma scoparia (2), USA; California; San Bernardino Co.; S side Kelso Dunes (MVZ 214784, EU543782, EU543751). Appendix B Trees used as constraints for tests of alternative hypoth- eses of sand lizard relationships (note that these topologies do not constrain the traditional sand lizard genera to be monophyletic, which makes them more di?cult to reject than topologies that constrain monophyly of the genera): Topology 1: (Sceloporus jarrovii, Uta stansburiana, Uro- saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her- nandesi, (Uma exsul, Uma paraphygas, Uma scoparia 1, Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor- nata, (Callisaurus draconoides carmenensis, Callisaurus dra- conoides crinitus, Callisaurus draconoides bogerti, Callisaurus draconoides ssp., Callisaurus draconoides ventra- lis, Callisaurus draconoides myurus, Callisaurus draconoides rhodostictus, (Cophosaurus texanus scitulus, Cophosaurus texanus texanus, Holbrookia lacerata lacerata, Holbrookia 714 J.A. Schulte II, K. de Queiroz /Molecular Placerata subcaudalis, Holbrookia elegans elegans,Holbrookia elegans thermophila, Holbrookia propinqua propinqua, Holb-rookia maculata maculata, Holbrookia maculata campi, Holbrookia maculata ruthveni, Holbrookia maculata bunkeri, Holbrookia maculata ?avilenta)))). Topology 2: (Sceloporus jarrovii, Uta stansburiana, Uro- saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her- nandesi, ((Uma exsul, Uma paraphygas, Uma scoparia 1, Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor- nata, Callisaurus draconoides carmenensis, Callisaurus draconoides crinitus, Callisaurus draconoides bogerti, Calli- saurus draconoides ssp, Callisaurus draconoides ventralis, Callisaurus draconoides myurus, Callisaurus draconoides rho- dostictus), (Cophosaurus texanus scitulus, Cophosaurus tex- anus texanus, Holbrookia lacerata lacerata, Holbrookia lacerata subcaudalis, Holbrookia elegans elegans,Holbrookia elegans thermophila, Holbrookia propinqua propinqua, Holb- rookia maculata maculata, Holbrookia maculata campi, Holbrookia maculata ruthveni, Holbrookia maculata bunkeri, Holbrookia maculata ?avilenta))). Topology 3: (Sceloporus jarrovii, Uta stansburiana, Uro- saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her- nandesi, (Uma exsul, Uma paraphygas, Uma scoparia 1, Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor- nata, ((Callisaurus draconoides carmenensis, Callisaurus dra- conoides crinitus,Callisaurus draconoides bogerti, Callisaurus draconoides ssp, Callisaurus draconoides ventralis, Callisau- rus draconoides myurus, Callisaurus draconoides rhodostic- tus, Cophosaurus texanus scitulus, Cophosaurus texanus texanus), Holbrookia lacerata lacerata, Holbrookia lacerata subcaudalis, Holbrookia elegans elegans, Holbrookia elegans thermophila, Holbrookia propinqua propinqua, Holbrookia maculata maculata, Holbrookia maculata campi, Holbrookia maculata ruthveni, Holbrookia maculata bunkeri, Holbrookia maculata ?avilenta))). Topology 4: (Sceloporus jarrovii, Uta stansburiana, Uro- saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her- nandesi, ((Uma exsul, Uma paraphygas, Uma scoparia 1, Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor- nata, (Callisaurus draconoides carmenensis, Callisaurus draconoides crinitus, Callisaurus draconoides bogerti, Calli- saurus draconoides ssp, Callisaurus draconoides ventralis, Callisaurus draconoides myurus, Callisaurus draconoides rhodostictus, Cophosaurus texanus scitulus, Cophosaurus texanus texanus)), Holbrookia lacerata lacerata, Holbrookia lacerata subcaudalis, Holbrookia elegans elegans,Holbrookia elegans thermophila, Holbrookia propinqua propinqua, Holb- rookia maculata maculata, Holbrookia maculata campi, Holbrookia maculata ruthveni, Holbrookia maculata bunkeri, Holbrookia maculata ?avilenta)). References Adest, G.A., 1978. The relationships of the sand lizards Uma, Callisaurus, and Holbrookia (Sauria: Iguanidae): an electrophoretic study. Unpub- lished Ph.D. dissertation, University of California, Los Angeles. Alfaro, M.E., Zoller, S., Lutzoni, F., 2003. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov genetics and Evolution 47 (2008) 700?716chain Monte Carlo sampling and bootstrapping in assessing phyloge- netic con?dence. Mol. Biol. Evol. 20, 255?266. hyloAnderson, S., Bankier, A.T., Barrell, B.G., De Bruijn, M.H.L., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.H., Staden, R., Young, I.G., 1981. Sequence and organization of the human mitochondrial genome. Nature 290, 457?465. Axtell, R.W., 1958. A monographic revision of the iguanid genus Holbrookia. Ph.D. dissertation, University of Texas, Austin. Brandley, M.C., Schmitz, A., Reeder, T.W., 2005. Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst. Biol. 54, 373?390. Castoe, T.A., Doan, T.M., Parkinson, C.L., 2004. Data partitions and complex models in Bayesian analysis: the phylogeny of gymnophthal- mid lizards. Syst. Biol. 53, 448?469. Castoe, T.A., Sasa, M.M., Parkinson, C.L., 2005. Modeling nucleotide evolution at the mesoscale: the phylogeny of the Neotropical pitvipers of the Porthidium group (Viperidae: Crotalinae). Mol. Phylogenet. Evol. 37, 881?898. Castoe, T.A., Parkinson, C.L., 2006. Bayesian mixed models and the phylogeny of pitvipers (Viperidae: Serpentes). Mol. Phylogenet. Evol. 39, 91?110. ChangChien, L.-L., 1996. A phylogenetic study of sceloporine lizards and their relationships with other iguanid lizards based on DNA/DNA hybridization. Ph.D. dissertation, University of Wisconsin, Madison. Clarke, R.F., 1965. An ethological study of the iguanid lizard genera Callisaurus, Cophosaurus, and Holbrookia. Emporia St. Res. Stud. 13, 1?66. Cox, D.C., Tanner, W.W., 1977. Osteology and myology of the head and neck regions of Callisaurus, Cophosaurus, Holbrookia, and Uma (Reptilia: Iguanidae). Great Basin Nat. 37, 35?56. Crother, B.I., 2000. Scienti?c and standard English names of amphibians and reptiles of North America north of Mexico, with comments regarding con?dence in our understanding. Herpetolog. Circ. 29, Society for the Study of Amphibians and Reptiles. Crother, B.I., 2003. Scienti?c and standard English names of amphibians and reptiles of North America north of Mexico: update. Herpetol. Rev. 34, 196?203. de Queiroz, K., 1989. Morphological and biochemical evolution in the sand lizards. Ph.D. dissertation, University of California, Berkeley. de Queiroz, K., 1992. Phylogenetic relationship and rates of allozyme evolution among the lineages of sceloporine sand lizards. Biol. J. Linn. Soc. 45, 333?362. Douady, C.J., Delsuc, F., Boucher, Y., Doolittle, W.F., Douzery, E.J.P., 2003. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20, 248?254. Earle, A.M., 1961. The middle ear of Holbrookia and Callisaurus. Copeia 1961, 405?410. Earle, A.M., 1962. The middle ear of the genus Uma compared to those of the other sand lizards. Copeia 1962, 185?188. Etheridge, R., de Queiroz, K., 1988. A phylogeny of Iguanidae. In: Estes, R., Pregill, G. (Eds.), Phylogenetic Relationships of the Lizard Families. Stanford Univ. Press, Stanford, California, pp. 283?367. Felsenstein, J., 1985a. Con?dence limits on phylogenies an approach using the bootstrap. Evolution 39, 783?791. Felsenstein, J., 1985b. Con?dence limits on phylogenies with a molecular clock. Syst. Zool. 34, 152?161. Felsenstein, J., Kishino, H., 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42, 193?200. Fu, J., 1999. Phylogeny of Lacertid Lizards (Squamata: Lacertidae) and the Evolution of Unisexuality, Ph.D. dissertation. University of Toronto, Toronto. Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49, 652?670. Huelsenbeck, J.P., Nielsen, R., 1999. The e?ect of non-independent substitution on phylogenetic accuracy. Syst. Biol. 48, 317?328. Huelsenbeck, J.P., Larget, B., Miller, R.E., Ronquist, F., 2002. Potential J.A. Schulte II, K. de Queiroz /Molecular Papplications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51, 673?688.Huelsenbeck, J.P., Larget, B., Alfaro, M.E., 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol. Biol. Evol. 21, 1123?1133. Je?reys, H., 1935. Some tests of signi?cance, treated by the theory of probability. Proc. Camb. Phil. Soc. 31, 203?222. Kass, R.E., Raftery, A.E., 1995. Bayes factors. J. Am. Stat. Assoc. 90, 773?795. Kocher, T.D., Thomas, W.K., Meyer, A., Edwards, S.V., Paabo, S., Villablanca, F.X., Wilson, A.C., 1989. Dynamics of mitochondrial DNA evolution in animals: Ampli?cation and sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86, 6196?6200. Kumazawa, Y., Nishida, M., 1993. Sequence evolution of mitochondrial tRNA genes and deep-branch animal phylogenetics. J. Mol. Evol. 37, 380?398. Kumazawa, Y., Ota, H., Nishida, M., Ozawa, T., 1998. The complete nucleotide sequence of a snake (Dinodon semicarinatus) mitochondrial genome with two identical control regions. Genetics 150, 313?329. Leache?, A.D., McGuire, J.A., 2006. Phylogenetic relationships of horned lizards (Phrynosoma) based on nuclear and mitochondrial data: evidence for a misleading mitochondrial gene tree. Mol. Phylogenet. Evol. 39, 628?644. Lewis, P.O., Holder, M.T., Holsinger, K.E., 2005. Polytomies and Bayesian phylogenetic inference. Syst. Biol. 54, 241?253. Macey, J.R., Verma, A., 1997. Homology in phylogenetic analysis: alignment of transfer RNA genes and the phylogenetic position of snakes. Mol. Phylogenet. Evol. 7, 272?279. Macey, J.R., Larson, A., Ananjeva, N.B., Fang, Z., Papenfuss, T.J., 1997. Two novel gene orders and the role of light-strand replication in rearrangement of the vertebrate mitochondrial genome. Mol. Biol. Evol. 14, 91?104. Macey, J.R., Schulte II, J.A., Larson, A., Fang, Z., Wang, Y., Tuniyev, B.S., Papenfuss, T.J., 1998. Phylogenetic relationships of toads in the Bufo bufo species group from the eastern escarpment of the Tibetan Plateau: a case of vicariance and dispersal. Mol. Phylogenet. Evol. 9, 80?87. Macey, J.R., Schulte II, J.A., Larson, A., Tuniyev, B.S., Orlov, N., Papenfuss, T.J., 1999. Molecular phylogenetics, tRNA evolution, and historical biogeography in anguid lizards and related taxonomic families. Mol. Phylogenet. Evol. 12, 250?272. Maddison, D.R., Maddison, W.P., 2003. MacClade: Analysis of Phylog- eny and Character Evolution. Version 4.08. Sinauer Associates, Sunderland, MA. Mittleman, M.B., 1942. A summary of the iguanid genus Urosaurus. Bull. Mus. Comp. Zool. Harvard Univ. 91, 106?176. Naylor, G.J.P., Collins, T.M., Wesley, M.B., 1995. Hydrophobicity and phylogeny. Nature 373, 565?566. Norris, K.S., 1958. The evolution and systematics of the iguanid genus Uma and its relation to the evolution of other North American desert reptiles. Bull. Am. Mus. Nat. Hist. 114, 253?326. Nylander, J.A.A., Ronquist, F., Huelsenbeck, J.P., Nieves-Aldrey, J.L., 2004. Bayesian phylogenetic analysis of combined data. Syst. Biol. 53, 47?67. Pagel, M., Meade, A., 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53, 571?581. Pagel, M., Meade, A., 2005. Mixture models in phylogenetic inference. In: Gascuel, O. (Ed.), Mathematics of Evolution and Phylogeny. Claren- don Press, Oxford, pp. 121?142. Pagel, M., Meade, A., 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167, 808?825. Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817?818. Raftery, A.E., 1996. Hypothesis testing and model selection. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman & Hall, London, pp. 163?187. genetics and Evolution 47 (2008) 700?716 715Rambaut, A., Drummond, A.J., 2005. Tracer. Version 1.3, Available from: http://tree.bio.ed.ac.uk/software/tracer/. Reeder, T.W., Wiens, J.J., 1996. Evolution of the lizard family Phryno- somatidae as inferred from diverse types of data. Herpetol. Monogr. 10, 43?84. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572?1574. Savolainen, V., Chase, M.W., Salamin, N., Soltis, D.E., Soltis, P.S., Lo?pez, A.J., Fe?drigo, O., Naylor, G.J.P., 2002. Phylogeny reconstruc- tion and functional constraints in organellar genomes: plastid atpB and rbcL sequences versus animal mitochondrion. Syst. Biol. 51, 638?647. Savage, J.M., 1958. The iguanid lizard genera Urosaurus and Uta, with remarks on related groups. Zoologica 43, 41?54. Schulte II, J.A., Valladares, J.P., Larson, A., 2003. Phylogenetic relation- ships within Iguanidae inferred using molecular and morphological data and a phylogenetic taxonomy of iguanian lizards. Herpetologica 59, 399?419. Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492?508. Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log- likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16, 1114?1116. Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the con?dence of phylogenetic tree selection. Bioinformatics 17, 1246? 1247. Smith, H.M., 1946. Handbook of lizards. Comstock, Ithaca, NY. Strugnell, J., Norman, M., Jackson, J., Drummond, A.J., Cooper, A., 2005. Molecular phylogeny of coleoid cephalopods (Mollusca: Ceph- alopoda) using a multigene approach; the e?ect of data partitioning on resolving phylogenies in a Bayesian framework. Mol. Phylogenet. Evol. 37, 426?441. Sullivan, J., Abdo, Z., Joyce, P., Swo?ord, D.L., 2005. Evaluating the performance of a successive-approximations approach to parameter optimization in maximum-likelihood phylogeny estimation. Mol. Biol. Evol. 22, 1386?1392. Swo?ord, D.L., 2002. PAUP*. Phylogenetic Analysis Using Parsimony* (and other methods). Version 4.0. Sinauer Associates, Sunderland, MA. Swo?ord, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. Phyloge- netic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. (Eds.), Molecular Systematics, 2nd ed. Sinauer Associates, Sunderland, MA, pp. 407?543. Tavare?, S., 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Lec. Math. Life Sci. 17, 57?86. Templeton, A.R., 1983. Phylogenetic inference from restriction endonu- clease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37, 221?244. Torres-Carvajal, O., Schulte II, J.A., Cadle, J.E., 2006. Phylogenetic relationships of South American lizards of the genus Stenocercus (Squamata: Iguania): a new approach using a general mixture model for gene sequence data. Mol. Phylogenet. Evol. 39, 171?185. Weisrock, D.W., Macey, J.R., Ugurtas, I.H., Larson, A., Papenfuss, T.J., 2001. Molecular phylogenetics and historical biogeography among salamandrids of the ??true? salamander clade: rapid branching of numerous highly divergent lineages in Mertensiella luschani associated with the rise of Anatolia. Mol. Phylogenet. Evol. 18, 434?448. Wilgenbusch, J., de Queiroz, K., 2000. Phylogenetic relationships among the phrynosomatid sand lizards inferred from mitochondrial DNA sequences generated by heterogeneous evolutionary processes. Syst. Biol. 49, 592?612. Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin. 716 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716