te
d
*,
P.O
Keywords: Phrynosomatinae; Uma; Callisaurus; Cophosaurus; Holbrookia; Mixed models; Mixture models; Process heterogeneity; Phylogeny; Mitocho-
era: Uma (6 species), Callisaurus (1 species), Cophosaurus (1 ships, Wilgenbusch and de Queiroz (2000) evaluated four
previously proposed phylogenetic hypotheses (Fig. 1) using
fragments of two mitochondrial genes, cytochrome b
(650 bp) and 12S rRNA (350 bp). Although these data
consistently favored one of the four topologies (Topology
I in Fig. 1), support values for the two critical nodes distin-
* Corresponding author. Present address: Department of Biology, 177
Clarkson Science Center, MRC 5805, Clarkson University, Potsdam, NY
13699-5805, USA. Fax: +1 315 268 7118.
E-mail address: jschulte@clarkson.edu (J.A. Schulte II).
Available online at www.sciencedirect.com
Molecular Phylogenetics and Evolutndrial DNA
1. Introduction
The phrynosomatine sand lizards are a well-studied
clade of squamate reptiles for which several alternative
phylogenetic hypotheses have been proposed. The group
is composed of 12 currently recognized species that form
four mutually exclusive clades traditionally ranked as gen-
species), and Holbrookia (4 species) (Crother, 2000, 2003;
Wilgenbusch and de Queiroz, 2000). Although these four
clades are well-supported and largely uncontested, the rela-
tionships among them have been the subject of consider-
able debate (reviewed by Wilgenbusch and de Queiroz,
2000).
In an earlier study of sand lizard phylogenetic relation-Received 8 August 2007; revised 3 December 2007; accepted 11 January 2008
Available online 24 January 2008
Abstract
Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard rela-
tionships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1?COI and cyt b-tRNAThr of the
mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods,
likelihood methods with assumed partitions, Bayesian methods with assumed partitions, and Bayesian mixture models. The topology
(Uma, (Callisaurus, (Cophosaurus, Holbrookia))) and thus monophyly of the ??earless? taxa, Cophosaurus and Holbrookia, is supported
by all analyses. Previously proposed topologies in which Uma and Callisaurus are sister taxa and those in which Holbrookia is the sister
group to all other sand lizard taxa are rejected using both parsimony and likelihood-based signi?cance tests with the combined, unpari-
tioned data set. Bayesian hypothesis tests also reject those topologies using six assumed partitioning strategies, and the two partitioning
strategies presumably associated with the most powerful tests also reject a third previously proposed topology, in which Callisaurus and
Cophosaurus are sister taxa. For both maximum likelihood and Bayesian methods with assumed partitions, those partitions de?ned by
codon position and tRNA stem and nonstems explained the data better than other strategies examined. Bayes factor estimates comparing
results of assumed partitions versus mixture models suggest that mixture models perform better than assumed partitions when the latter
were not based on functional characteristics of the data, such as codon position and tRNA stem and nonstems. However, assumed par-
titions performed better than mixture models when functional di?erences were incorporated. We reiterate the importance of accounting
for heterogeneous evolutionary processes in the analysis of complex data sets and emphasize the importance of implementing mixed
model likelihood methods.
 2008 Elsevier Inc. All rights reserved.Phylogenetic relationships and he
among phrynosomatine sand lizar
James A. Schulte II
Division of Amphibians and Reptiles, Smithsonian Institution,1055-7903/$ - see front matter  2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.ympev.2008.01.010rogeneous evolutionary processes
s (Squamata, Iguanidae) revisited
Kevin de Queiroz
. Box 37012, MRC 162, Washington, DC 20013-7012, USA
www.elsevier.com/locate/ympev
ion 47 (2008) 700?716
hyloguishing the favored topology from alternative hypotheses
were unimpressive, and the three alternative topologies
could not be rejected under the conventional P 6 0.05 cri-
terion of signi?cance. Assuming patterns of nucleotide
sequence variation similar to those observed in the cyto-
chrome b (cyt b) gene, Wilgenbusch and de Queiroz
(2000) estimated that from 1037 to 2586 base pairs of
sequence data would be necessary to reject the various
alternative topologies. Therefore, we undertook a project
to sequence two fragments of mitochondrial DNA encod-
ing the nearly complete cyt b gene (1130 bp; Kumazawa
et al., 1998) and a genomic segment extending from the
gene encoding subunit one of nicotinamide adenine dinu-
cleotide dehydrogenase (ND1) through the gene encoding
subunit 1 of cytochrome c oxidase (COI) (1700 bp;
Macey et al., 1997) in an attempt to distinguish between
the alternative hypotheses with a higher degree of con?-
dence. In the process, we also investigated the e?ects of dif-
ferent methods for accommodating evolutionary
Fig. 1. Alternative hypotheses of higher-level relationships among the
phrynosomatine sand lizards (after Wilgenbusch and de Queiroz, 2000).
Topology I (Savage, 1958; Cox and Tanner, 1977; Etheridge and de
Queiroz, 1988; de Queiroz, 1989, 1992; Reeder and Wiens, 1996;
ChangChien, 1996); topology II (Mittleman, 1942; Smith, 1946); topology
III (Norris, 1958; Earle, 1961, 1962); topology IV (Axtell, 1958; Clarke,
1965; Adest, 1978).
J.A. Schulte II, K. de Queiroz /Molecular Pheterogeneity among subsets of the data.
Several recent studies (Brandley et al., 2005; Castoe and
Parkinson, 2006; Nylander et al., 2004; Strugnell et al.,
2005) have shown that partitioning data and using parti-
tion speci?c parameter estimates are associated with higher
probabilities of the data, and this result has been shown to
hold for previously collected nucleotide sequence data on
sand lizards (Wilgenbusch and de Queiroz, 2000). Alterna-
tively, heterogeneity can be accommodated without parti-
tioning the data by applying several models, each
weighted by its probability, to every site (Pagel and Meade,
2004; Pagel and Meade, 2005). Here, we use the term mixed
model (=assumed partitions = a priori partitions) for analy-
ses that assume prede?ned data partitions (models) prior to
conducting the analysis, with a di?erent model (including
parameter estimates) applied to each partition. In contrast,
mixture models accommodate heterogeneity across sites by
applying several models to each site, where the likelihood
of a given site is the sum of the likelihoods under the di?er-
ent models, each weighted by its probability as estimatedfrom the data (Pagel and Meade, 2004). In this paper, we
analyzed our data using both di?erent general approaches
(mixed models versus mixture models) and di?erent strate-
gies within the general approaches (di?erent assumed par-
titions; di?erent numbers of Q matrices), which permitted
us to compare the ability of these di?erent approaches
and strategies to accommodate heterogeneity in our data.
2. Materials and methods
2.1. Taxon and character sampling
Frozen liver or muscle tissue of 26 phrynosomatine sand
lizards, representing all 12 currently recognized species,
was used in DNA extractions (Appendix A). These same
samples were used by Wilgenbusch and de Queiroz (2000)
and cover a wide range of variation within and among sand
lizard species with regard to geographic distribution, mor-
phology, and taxonomy. We also used the same outgroup
specimens sequenced by Wilgenbusch and de Queiroz
(2000): Phrynosoma platyrhinos, P. hernandesi, Sceloporus
jarrovii, Urosaurus ornatus, and Uta stansburiana. Wil-
genbusch and de Queiroz (2000) reported sequencing
CAS 174377 representing the taxon Holbrookia maculata
thermophila; however, our sequences for the sample from
CAS 174377 revealed that it is H. m. ?avilenta. The correct
voucher specimen for the data reported for H. m. thermo-
phila appears to be CAS 174400, which also was sequenced
by Wilgenbusch and de Queiroz (2000) and has an identical
cyt b sequence to that reported by those authors and was
therefore used as the representative of H. m. thermophila
in the present study. In addition, the museum number for
the specimen of H. m. ?avilenta sequenced by Wilgenbusch
and de Queiroz (2000) was published as MVZ 21814; the
correct number is MVZ 214814. See Appendix A for
museum numbers, localities of voucher specimens from
which DNA was extracted, and GenBank accession num-
bers for DNA sequences.
Genomic DNA was extracted using Qiagen QIAamp tis-
sue kits. Ampli?cation of genomic DNA was conducted in
a DNA Engine (PTC-200TM) Peltier Thermal Cycler (MJ
Research) using denaturation at 94 C for 35 s, annealing
at 45?55 C for 35 s, and extension at 72 C for 150 s for
30?35 cycles. Negative controls were run on all ampli?ca-
tions to check for contamination. Ampli?ed products were
visualized on 1.5% SeaKem agarose gels. Reampli?ed
double-stranded products were puri?ed with AMPure
magnetic beads (Agencourt). Cycle-sequencing reactions
were run using the ABI Prism Big Dye Terminator DNA
Sequencing Kit (Perkin-Elmer) with denaturation at
96 C for 10 s, annealing at 50 C for 10 s, and extension
at 60 C for 4 min for 35?40 cycles. Sequencing reactions
were run on an ABI 3100 Genetic Analyzer (Applied
Biosystems).
We obtained nucleotide sequences from four protein-
genetics and Evolution 47 (2008) 700?716 701coding genes (ND1, ND2, COI, cyt b), nine transfer RNA
genes (tRNAIle, tRNAGln, tRNAMet, tRNATrp, tRNAAla,
tRNAAsn, tRNACys, tRNATyr, tRNAThr), and the origin of
light-strand replication (OL) in the mitochondrial DNA.
PCR and sequencing primers used in this study are listed
in Table 1. DNA sequences were aligned manually, using
tRNA secondary structure models and information on
codon position. Positions encoding part of ND1, complete
ND2, part of COI, and part of cyt b were translated to
amino acids using MacClade 4.08 (Maddison and Maddi-
son, 2003) for con?rmation of alignment. Alignment of
protein-coding regions was straightforward, as these
regions contained no length variation. Noncoding inter-
genic regions contained length variation in several
instances (see Section 3.1) and gaps were inserted to ensure
alignment of ?anking coding regions. Alignment of
sequences encoding tRNAs was based on secondary struc-
ture models (Kumazawa and Nishida, 1993; Macey and
Verma, 1997). Secondary structures of tRNAs were
inferred from primary structures of the corresponding
tRNA genes using these models. Alignment of tRNA stem
regions was straightforward; however, several tRNA loop
regions were more di?cult to align due to signi?cant length
variation. Gaps were inserted in these regions to maintain
alignment of ?anking stem regions by minimizing implied
substitutions and indel events, with indels weighted equally
to transitions and transversions weighted approximately 4
times indels. A region was considered ambiguously aligned
if more than one alternative placement of a gap had the
same cost, with equal costs for gap insertions and exten-
sions, while still maintaining alignment of conserved
regions of adjacent tRNA stems or coding regions. Gaps
were treated as missing data, and regions considered
ambiguously aligned were excluded from phylogenetic
analyses (see Section 3.1).
2.2. Phylogenetic analyses
Phylogenetic trees were estimated using both parsimony
and probabilistic methods. Parsimony analyses were con-
ducted using PAUP* 4.0b10 (Swo?ord, 2002) using equal
weighting of characters and equal costs for state transfor-
mations. Optimal trees were estimated using heuristic
searches with 1000 replicates of random stepwise addition
and tree bisection and reconnection (TBR) branch swap-
ping. Bootstrap resampling (Felsenstein, 1985a) was
applied to assess support for individual nodes using 1000
bootstrap replicates and full heuristic searches with 100
replicates of random stepwise addition and TBR branch
swapping.
Maximum-likelihood (ML) analyses also were per-
formed using PAUP*. Searches were conducted using a
successive approximations approach (Swo?ord et al.,
1996; Sullivan et al., 2005) for (1) the unpartitioned, com-
bined data set, (2) the fragment encoding the genes span-
ning ND1?COI, (3) and the fragment spanning cyt b to
NA
0?3
GA
TA
TT
AC
TA
CG
GC
CC
GT
AA
AT
TT
GT
AA
RG
CA
TT
CCA
CT
AC
CAA
GT
TG
AC
CT
AA
), ta
702 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716Table 1
Primers used to amplify and sequence the mtDNA ND1?COI and cyt b-tR
Primer name Human position Gene Sequence (5
ND1ba,s L4160 ND1 CGATTCC
ND1f.3a,s L4178a ND1 CAACTAA
ND1f.7a,s L3914 ND1 GCCCCAT
ILEf.1s L4221 tRNAIle AAGGATT
ILEf.6s L4221 tRNAIle AAGGGN
METf.6a,s L4437 tRNAMet AAGCTTT
METr.5a,s H4419 tRNAMet GGTATGA
ND2f.4a,s L4645 ND2 ACAGAAG
ND2r.6a H4980 ND2 ATTTTTC
ND2f.14a,s L4882 ND2 TGACAAA
ND2f.54s L5263 ND2 TAACYGG
ND2f.57s L4833 ND2 CACAYTT
ND2r.61a,s H4568 ND2 ATGGCTA
TRPf.11s L5550 tRNATrp AACCAAG
TRPf.12a,s L5549 tRNATrp AACCAAG
ALAf.4s L5638b tRNAAla CTGAATG
ASNr.2a,s H5692 tRNAAsn TTGGGTG
CO1r.1a,s H5934a COI AGRGTG
CO1r.8a,s H6159 COI GCTATGT
CYTbf.1a,s L14842 Cyt b CCATCCA
CYTbf.2a,s L15172 Cyt b TGAGGA
CYTbr.3a,s H15916 Cyt b GTCTTCA
CYTbr.5a,s H15402 Cyt b CTTTGTA
CYTbf.6s L15429 Cyt b CCATTYC
CYTbr.7s H15216 Cyt b ATTCATT
CYTbf.8s L15691 Cyt b ACATCAA
Their positions in the human mitochondrial genome (Anderson et al., 1981
whose extension produces the heavy and light strands, respectively.
a Primers used in ampli?cation.
s Primers used in sequencing.Thr regions in this study
0) Reference
TATGACCARCT Kumazawa and Nishida (1993)
CACCTACTATGAAA Macey et al. (1997)
GACCTCACAGAAGG Macey et al. (1998)
TTTGATAGAGT Macey et al. (1997)
CTTTGATAGAGT Schulte et al. (2003)
GGCCCATACC Macey et al. (1997)
CCGATAGCTT Macey et al. (1997)
GCAACAAAATA Macey et al. (1997)
AGTTGGGTTTGRTT Macey et al. (1997)
CTAGCCCC Macey et al. (1999)
TTTTACCAAAATGAC This study
GACTNCCAGAAGT Torres-Carvajal et al. (2006)
GTGTTTATTTCTA This study
CCTTCAAAGT This study
CCTTCAAAG Schulte et al. (2003)
ACTCAGACACTTT Macey et al. (1997)
TAGCTGTTAA Macey et al. (1997)
ATGTCTTTGTGRTT Macey et al. (1997)
GGGGCTCCAATTAT Weisrock et al. (2001)
ATCTCAGCATGATGAAA Kocher et al. (1989)
ATATCCTTCTGAGG Fu (1999)
TTTTGGTTTACAAGAC Kocher et al. (1989)
ARAAGTATGGGTGRAATGG This study
CCATACTTYTCATACAAAG This study
ACTAGGGTTGTTCC This study
CAACGAAGCAC This study
rget genes, sequences, and references are given. H and L designate primers
tRNAThr. Both hierarchical likelihood ratio tests and the
Akaike Information Criterion as implemented in Modeltest
v3.7 (Posada and Crandall, 1998) were used to ?nd the best
?tting model of sequence evolution for a tree estimated
using neighbor joining (NJ). Parameter values estimated
on the NJ tree were ?xed in initial heuristic searches under
the following conditions: (1) Starting trees were obtained
via NJ; (2) TBR branch-swapping; (3) reconnection limit
set to eight. Optimal tree(s) obtained from this search were
strategy on that tree. The tree with the highest partitioned
likelihood score is the best (ML) tree for each partitioning
strategy. The current implementation of mmml does not
conduct bootstrap analyses, so uncertainty of (support
for) the various clades implied by this ??point? estimate of
the phylogeny could not be evaluated. In addition, the
most complicated evolutionary model currently imple-
mented in this program is the general time reversible model
with rate variation among sites estimated using a gamma
J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716 703used to estimate new parameter values under an identical
model. These new parameter values were then ?xed in a
second search with the same conditions as the initial run.
This process was repeated until the same tree and parame-
ter values were obtained in two successive searches. Boot-
strap resampling was applied using ML with 100
pseudoreplicates and heuristic searches as above except
that successive approximations were not conducted for
each pseudoreplicate. Parameter estimates for bootstrap
analyses were ?xed at the parameter values obtained from
the ?nal iteration of the corresponding successive approxi-
mations analysis. In our evaluation of branch support
strength, we considered a bootstrap value of P95% as
strongly supported (Felsenstein and Kishino, 1993), <95
to P70% as moderately supported, and <70% as weakly
supported.
The impact of using partition-speci?c (=??mixed?) evo-
lutionary models on phylogeny estimation has been inves-
tigated recently using both likelihood (Wilgenbusch and
de Queiroz, 2000) and Bayesian methods (Brandley et al.,
2005; Castoe et al., 2004, 2005, 2006; Nylander et al.,
2004). We used both approaches under the partitioning
strategies (assumed or prior data partitions) described in
Table 2. For ML analyses, the program mixed model max-
imum likelihood analysis (mmml), written by James C.
Wilgenbusch, uses PAUP* to ?nd an initial tree using
neighbor joining and to perform a heuristic likelihood
search saving N (in this case 100) best trees for all data
(unpartitioned) and for each partition (Table 2). Trees
from all data sets (each partition and the unpartitioned
data set) were combined into a single tree?le removing
duplicate topologies in the process. Likelihood scores for
these trees were calculated separately for each partition
and then summed to yield a likelihood score for each tree
under a given partitioning strategy. For each tree, the par-
titioned likelihood score is the sum of the likelihood scores
for each component partition of a particular partitioning
Table 2
Partitioning strategies (assumed partitions) used in this study
Partition strategy
All data combined
ND1?COI coding region; cyt b-tRNAThr coding region
All protein-coding genes; tRNAs
Codon positions of all protein coding genes; tRNAs
ND1; ND2; COI; cyt b; tRNAs
Codon positions of all protein coding genes; tRNA stem regions; tRNA nonsdistribution (GTR + C).
Bayesian analyses were performed in MrBayes 3.1?3.1.2
(Ronquist and Huelsenbeck, 2003) using a general time-
reversible model of sequence evolution with rate variation
among sites estimated using a gamma distribution and an
estimated proportion of invariant sites (GTR + I + C;
Tavare?, 1986) for the assumed (prior) partitioning strate-
gies outlined in Table 2. This model was chosen for com-
parison to the results of the ML analyses, in which
GTR + I + C was identi?ed as the most appropriate model
for several partitions (see Section 3.1?Phylogeny estima-
tion). Bayesian analyses under the various assumed parti-
tions (Table 2) also were performed using GTR + C (i.e.,
without I) to compare likelihood scores under this model
to those resulting from both ML analyses under the same
assumed partitions performed using mmml and mixture
model Bayesian analyses performed using BayesPhyloge-
nies (see below).
For each assumed partition analysis with MrBayes a run
of 4  106 generations and four Markov chains with
default heating values was used. Parameter values for the
model were estimated from the data and most were initi-
ated with default uniform priors except that branch lengths
were unconstrained (no molecular clock) with default
exponential priors. Trees and parameter values were sam-
pled every 1000 generations resulting in 4000 saved trees
per analysis, of which 2000 were discarded as ??burn-in?.
Stationarity was assessed by plotting the lnL per genera-
tion in the program Tracer 1.3 (Rambaut and Drummond,
2005) and using the average standard deviation of split fre-
quencies available in MrBayes. If the two simultaneous
runs are converging on the same tree, the average standard
deviation of split frequencies are expected to approach
zero. After con?rming that the analysis appeared to reach
stationarity, the resultant 4000 trees (2000 from each simul-
taneous run) were used to calculate Bayesian credibility
values (BC) for each branch in a 50% majority-rule consen-
Abbreviation Number of characters in partition(s)
P1 2763
P2 1697; 1066
P3 2192; 571
P4 731; 730; 731; 571
P5 84; 1033; 30; 1045; 571
tem regions P6 731; 730; 731; 378; 193
hylosus tree. Clades with BCP 95% were considered strongly
supported with the caveat that BC may overestimate sup-
port for reasons discussed by Alfaro et al. (2003), Douady
et al. (2003) and Lewis et al. (2005).
In addition, we analyzed our complete dataset under the
phylogenetic mixture model approach using BayesPhyloge-
nies (Pagel and Meade, 2004). By default, BayesPhyloge-
nies assigns uniform priors on a 0?100 interval for the
gamma and substitution rate parameters, an exponential
prior for branch lengths with a mean of 10, state frequen-
cies drawn from the Dirichlet distribution, and all trees are
considered equally likely a priori. These default settings
were used in all mixture model analyses. We performed a
total of 12 analyses with nQ and nQ + C mixture models,
where n varied between 1 and 6 independent rate matrices
(Qs) and C represents among-site rate variation under a
gamma distribution with four rate categories. A parameter
for the proportion of invariant sites is not implemented in
BayesPhylogenies. Each MCMC run consisted of 4  106
generations and four Markov chains. We allowed the Mar-
kov-chains to reach stationarity before sampling trees at
intervals of 1000 generations. Stationarity was assessed
by plotting the lnL per generation in Microsoft Excel
v. X and checking for no average improvement in the like-
lihood scores. We discarded the ?rst 2000 trees (2  106
generations) as ??burn-in;? thus, estimates of trees and
parameter values are based on 2000 trees sampled from
each run.
Conventional likelihood ratio tests used to distinguish
maximum likelihood estimates for di?erent partitioning
strategies are not applicable in a Bayesian framework
because they assume that parameter estimates are at their
maximum likelihood values, and MCMC methods sample
the posterior density of parameters rather than ?nding
maximum likelihood point estimates (Pagel and Meade,
2005). In a Bayesian context, an alternative approach to
model selection uses Bayes factors (Je?reys, 1935; Kass
and Raftery, 1995; Nylander et al., 2004). The Bayes factor
(BF) penalizes more complex hypotheses by multiplying
the probability of the data given the model parameters of
the hypothesis by the set of prior probability terms for each
parameter (numbers 6 1), which reduces the marginal like-
lihood. If we consider two alternative partitioning strate-
gies, a and b, as alternative hypotheses, then the Bayes
factor (Bab) is the ratio of the marginal likelihood (i.e.,
probability of the data given all the model parameters for
a particular hypothesis scaled by that hypothesis? prior
probability and integrated over all parameter values) of
hypothesis a to that of hypothesis b and represents a sum-
mary of the evidence provided by the data in favor of
hypothesis a relative to hypothesis b.
However, computing the marginal likelihoods of the
hypotheses (alternative partitioning strategies) is very di?-
cult in practice. We estimated the marginal likelihood of a
hypothesis by computing the harmonic mean of the ln-like-
704 J.A. Schulte II, K. de Queiroz /Molecular Plihood (lnL) values sampled from the posterior distribu-
tions of the two corresponding converged chains beingcompared (Kass and Raftery, 1995; Raftery, 1996).
Although the harmonic mean estimator can be unstable,
it is said to be su?ciently accurate for interpretation on a
natural logarithmic scale. Harmonic mean calculations
were made using the ??sump? command in MrBayes of all
saved trees after ??burn-in? for both MrBayes and Bayes-
Phylogenies runs. Following Pagel and Meade (2005),
Bayes factor estimates were calculated by subtracting the
harmonic mean lnL of hypothesis b from the harmonic
mean lnL of hypothesis a, multiplying this value by 2,
then subtracting the di?erence in number of free parame-
ters between models multiplied by ln(0.01), re?ecting the
uniform 1?100 prior distribution (see Table 6 for an exam-
ple). BF values from 2 to 5 were considered positive evi-
dence, >5 were strong evidence, and values >10 as very
strong evidence favoring one hypothesis over the other
(Kass and Raftery, 1995; Pagel and Meade, 2004; Raftery,
1996).
2.3. Hypothesis testing
Signi?cance tests were used to examine support for the
optimal tree(s) relative to alternative hypotheses. Phyloge-
netic hypotheses were tested by comparing the optimal
(parsimony or likelihood) topology in the absence of a con-
straint with the optimal tree under a topological constraint.
To ?nd the optimal tree(s) compatible with a particular
phylogenetic hypothesis, topologies containing the minimal
number of groups necessary to describe the hypothesis of
interest (Appendix B) were constructed using MacClade
4.08 (Maddison and Maddison, 2003). These partially
resolved topologies were loaded as constraints in PAUP*,
which was then used to analyze the sequence data under
the parsimony criterion using heuristic searches with the
same settings used to estimate the most parsimonious trees
in the absence of topological constraints. Phylogenetic
hypotheses were tested similarly under likelihood using
constraint trees and a successive approximations approach
using the same search strategies as described for searches in
the absence of topological constraints.
Wilcoxon signed-ranks (WSR) tests (Felsenstein, 1985b;
Templeton, 1983) were used with the parsimony trees. They
were conducted as two-tailed tests using PAUP*, which
incorporates a correction for tied ranks. Goldman et al.
(2000) criticized the application of the WSR test as applied
in many studies because this test assumes that phylogenetic
hypotheses are selected independently of the data analysis
(a priori). However, as the optimal trees used in all of
our tests corresponded with one of the four alternative
hypotheses identi?ed in previous studies (Fig. 1), they
may be considered to have been chosen a priori. In any
case, Shimodaira?Hasegawa (SH) (Shimodaira and Hase-
gawa, 1999) and Approximately Unbiased (AU) tests (Shi-
modaira, 2002) also were performed to test the optimal ML
tree in the absence of topological constraints (which corre-
genetics and Evolution 47 (2008) 700?716sponded to one of the a priori hypotheses) against the opti-
mal ML trees under each of the alternative hypotheses (see
hyloAppendix B for constraint trees). SH tests were conducted
in PAUP* using bootstrap resampling of 10,000 replicate
estimated ln-likelihoods of sites (RELL) and AU tests were
conducted in the program CONSEL (Shimodaira and
Hasegawa, 2001) with 10 sets of 10,000 bootstrap repli-
cates. Note that the constraint topology consistent with
Topology I was identical to the optimal ML tree in the
absence of constraints from the analyses of the combined
data, the ND1?COI partition, and the cyt b-tRNAThr parti-
tions. Therefore, SH and AU tests included only the ML
topology and the best trees compatible with Topologies
II?IV.
Alternative hypotheses also were tested in a Bayesian
framework (Huelsenbeck et al., 2002) using the post-
burn-in samples of 4000 trees from the analyses with
assumed partitions. Topologies corresponding to the alter-
native hypotheses (Appendix B) were loaded as constraints
in PAUP*, and post-burn-in trees for each of the six parti-
tioning strategies (Table 2) were imported. Each of the four
alternative topological constraints was then used to ?lter
the trees, saving only those trees that were compatible with
a particular constraint. The Bayesian credibility value for
the associated hypothesis was calculated by dividing the
number of saved trees by the total number in the post-
burn-in sample (4000). If the fraction of ?ltered trees was
less than 5% of the total, then the constrained topology
in question was outside of the 95% credible interval (i.e.,
was rejected).
3. Results
3.1. Sequence variation
Prior to alignment, the fragment spanning ND1?COI
was between 1732 and 1750 base pairs in length and the
cyt b to tRNAThr fragment was between 1077 and 1093 base
pairs across all samples. For the taxa sampled in this study,
alignment of sequences for protein-coding genes was
unambiguous. Among tRNA genes, several loop regions
could not be aligned unambiguously as was the case for
some noncoding regions between genes. Regions that could
not be aligned unambiguously were parts or all of the
dihydrouridine (D) loops for tRNAIle (positions 104?109),
tRNAMet (positions 250?252), tRNATrp (positions 1356?
1361), tRNACys (positions 1644?1647), and tRNAThr (posi-
tions 2852?2863). Nucleotide positions in the TWC (T)
loops for the genes encoding tRNATrp (positions 1393?
1399) and tRNACys (positions 1606?1612) were aligned
ambiguously as were noncoding sequences between the
genes encoding ND1 and tRNAIle (positions 85?88),
tRNAGln and tRNAMet (positions 231?235), ND2 and
tRNATrp (positions 1337?1339), tRNATrp and tRNAAla
(positions 1413?1417), tRNAAla and tRNAAsn (positions
1487?1490), tRNACys and tRNATyr (positions 1661?
1664), tRNA Tyr and COI (positions 1737?1745), and cyt
J.A. Schulte II, K. de Queiroz /Molecular Pb and tRNAThr (positions 2821?2838). These ambiguously
aligned regions were removed prior to conducting analyses.All taxa used for phylogenetic analysis had a recognizable
origin for light-strand replication (OL) between the
tRNAAsn and tRNACys genes according to the criteria out-
lined in Macey et al. (1997). However, the OL loop could
not be aligned unambiguously; therefore, this region (posi-
tions 1574?1584) was excluded. Excluded regions comprise
less than 3.8% (108 of 2871) of the aligned sequence posi-
tions. Aligned DNA sequences are available in TreeBASE
(Study accession number = S1936; Matrix accession
number = M3568).
Aligned DNA sequences for all regions comprised 2871
nucleotide positions. Of 2763 inferred unambiguous sites in
31 aligned sequences, 1200 (1003 ingroup only) are vari-
able, 987 (858 ingroup only) are informative under the par-
simony criterion, and there are 1075 distinct data (site)
patterns under the GTR + C + I model in the combined
data set. The partition spanning ND1?COI has 735 (599
ingroup only) variable characters of which 587 (506
ingroup only) are parsimony informative and 671 distinct
data patterns. The region spanning cyt b to tRNAThr has
465 (404 ingroup only) variable characters of which 400
(352 ingroup only) are parsimony informative and 452 data
patterns. For all protein-coding genes (ND1, ND2, COI,
cyt b), codon positions vary in their contribution of phylo-
genetic information with 272 variable ?rst positions (224
parsimony informative; 261 data patterns), 109 variable
second positions (73 parsimony informative; 101 data pat-
terns), and 650 variable third positions (571 parsimony
informative; 622 data patterns). Among tRNA genes stem
regions contribute 119 variable positions (81 parsimony
informative; 116 data patterns) and non-stem regions have
50 variable positions (38 parsimony informative; 55 data
patterns).
3.2. Phylogeny estimation
3.2.1. Single model and set of parameter values
(Unpartitioned Parsimony and ML)
Parsimony analysis of the equally weighted data set
including all mtDNA regions identi?ed three most-parsi-
monious (MP) trees (length = 4200 steps). Two of these
trees conform to topology I of Fig. 1, and di?er from
one another only in the position of the clade containing
Callisaurus spp. and C. d. bogerti, while the third is com-
patible with topology III. Analysis of 1697 base pairs span-
ning the region from ND1?COI yielded four MP trees
(length = 2324 steps), which are all compatible with topol-
ogy I (Fig. 1). The trees di?er in the positions of Holbrookia
propinqua and the clade containing Callisaurus spp. and C.
d. bogerti. Two MP trees were obtained from analysis of
1066 nucleotide positions spanning genes encoding cyt b
and tRNAThr (length = 1866 steps). These trees also con-
form to topology I (Fig. 1) and di?er from each other in
the position of the clade containing Callisaurus spp. and
C. d. bogerti. High bootstrap support for most branches
genetics and Evolution 47 (2008) 700?716 705was found for trees resulting from each of these partitions
and the combined data set (Table 3 and Fig. 2). The com-
in
D1?
5
0
0
0
0
0
0
0
0
7
6
0
0
4
0
0
7
0
hyloTable 3
Bootstrap proportions and Bayesian credibility values for nodes labeled
partitioning strategies
Branch number Parsimony Likelihood
Combined ND1?COI Cytb-tRNAThr Combined N
1 77 98 ? ? 9
2 100 100 54 100 10
3 100 100 76 100 10
4 98 89 75 100 10
5 100 99 73 99 10
6 100 100 99 100 10
7 100 100 100 100 10
8 100 100 100 100 10
9 100 100 100 100 10
10 70 60 75 72 5
11 99 99 60 99 9
12 100 100 100 100 10
13 100 100 99 100 10
14 54 ? ? 62 6
15 100 100 100 100 10
16 100 100 100 100 10
17 100 100 100 100 9
18 56 56 ? 70 ?
19 100 100 100 100 10
706 J.A. Schulte II, K. de Queiroz /Molecular Pbined data set had the highest average bootstrap values
while the cyt b-tRNAThr region had the lowest with the
ND1?COI fragment intermediate (Table 3; Fig. 2).
Both hierarchical likelihood ratio tests (LRT) and the
Akaike information criterion (AIC) selected the
GTR + I + C model as the most appropriate model for
the combined, unpartitioned data. Using these data and
that model in a successive approximations ML analysis, a
single topology was found (lnL = 21799.01) conforming
to Topology I (Fig. 3). Analysis of the ND1?COI data
set with Modeltest selected the TrN + I + C model using
LRTs and the GTR + I + C model using the AIC. In this
case, the GTR + I + C model was never compared to the
TrN + I + C model in the path taken by the program for
the LRTs. These models di?er by 16.77 likelihood units
and are signi?cantly di?erent (P < 0.01) when a LRT is
applied; thus, we used the GTR + I + C model for subse-
quent ML analyses of the ND1?COI data set. Application
of the successive approximation ML analysis yielded a sin-
gle ML tree (lnL = 12685.57) that conformed to Topol-
ogy I (Fig. 1). The model selected for the mtDNA
fragment spanning cyt b-tRNAThr was GTR + I + C using
LRTs but HKY + I + C using the AIC. These models were
not compared by Modeltest using the LRT and di?er by
20 90 82 66 99 88
21 100 100 100 100 100
22 100 100 99 100 100
23 96 91 84 100 100
24 92 53 93 92 ?
25 100 100 100 100 100
26 89 83 66 99 98
27 95 92 75 100 89
28 100 99 98 100 98
P1?P6 refer to the partitioning strategies listed in Table 2. Empty cells (????) re
majority rule consensus tree.Fig. 2 based on di?erent analytical methods and character partitions or
Bayesian
COI Cytb-tRNAThr Assumed partitions (P1?P6) Mixture models
1?6Q 1?6Q + C
? 73?94 <50?94 <50?92
69 100 100 100
100 100 100 100
78 100 100 100
96 100 100 100
100 100 100 100
100 100 100 100
100 100 100 100
100 100 100 100
88 98?100 86?100 85?100
72 100 100 100
100 100 100 100
96 100 100 100
? 79?88 82?100 72?90
99 100 100 100
100 100 100 100
94 100 100 100
58 85?97 63?98 69?91
97 100 100 100
genetics and Evolution 47 (2008) 700?716only 0.75 likelihood units, which is not signi?cant accord-
ing to the LRT (P > 0.05), so we used the HKY + I + C
model. The successive approximations ML analysis of this
fragment found one tree (lnL = 9043.26) that conformed
to Topology I (Fig. 1). Trees from the combined data and
the ND1?COI region were almost identical, the only excep-
tion being the position of Uma inornata, while the topology
found using the cyt b fragment di?ered from the other two
in the position of the clade containing C. d. bogerti and C.
d. spp. and the relationships of the outgroup species to each
other. As with MP analyses, most branches of the ML trees
were well supported (Table 3), though about 15.5% of the
branches were poorly to moderately supported, and ca.
13% were poorly supported.
3.2.2. Mixed model ML (assumed partitions)
Each of the six partitioning strategies (Table 2) was ana-
lyzed using the GTR + C model as implemented in mmml.
Current implementation of the program allows speci?ca-
tion of only one model, though estimated parameter values
can di?er between partitions. Although the GTR model is
not necessarily optimal for all of the various partitions, it
was selected by both LRTs and the AIC for both the
unpartitioned data set and the ND1?COI partition, and
77 100 100 100
100 100 100 100
97 100 100 100
92 100 100 100
95 100 100 99?100
99 100 100 100
57 100 100 100
98 100 100 100
96 100 100 100
fer to bootstrap values less than 50% and thus not present in the bootstrap
hyloJ.A. Schulte II, K. de Queiroz /Molecular Pby LRTs for the cyt b partition. Moreover, using a single
model facilitates the comparison of alternative partitioning
strategies using LRTs (Table 4). The optimal trees for all
partition strategies (Table 2) conformed to Topology I
(Fig. 1), with the strategy allowing the three codon posi-
tions of all protein coding genes, tRNA stem regions,
and tRNA nonstem regions each to have a unique set of
parameter values (P6) having the highest likelihood
(lnL = 20545.99; Table 4). According to LRTs (Table
4), P6 was signi?cantly better than all nested partitioning
strategies (P1, P3, and P4). For all other comparisons
involving nested strategies, strategies with greater numbers
of partitions were signi?cantly better than those with fewer
partitions (P2 versus P1, P3 versus P1, P4 versus P1 and P3,
P5 versus P1 and P3). Finally, for comparisons involving
similar numbers of partitions, partitioning strategies based
on structural and/or functional properties of the molecules
(codon position and RNA secondary structural features)
Fig. 2. Phylogenetic relationships of phrynosomatine sand lizards based
on parsimony, likelihood, and Bayesian analysis of combined mtDNA
sequences. The numbers above each node correspond to the clade numbers
in Table 3, which gives bootstrap proportions (MP and ML) and Bayesian
credibility (BC) values for the various nodes. The topology shown
corresponds to 1 of 3 MP trees for the combined data and 1 of 4 MP trees
for the ND1?COI fragment, ML trees for the unpartitioned, combined
data and the ND1?COI fragment, and the Bayesian consensus trees for
assumed Bayesian partition strategies P1?P6, and mixture models 2Q?5Q,
1Q + C, and 3Q?6Q + C (note that the analyses favoring this particular
topology are a subset of those favoring Topology I). Dashed branches
indicate that alternative topological arrangements were found in the
optimal tree resulting from one or more analyses or in one or more of the
three most-parsimonious trees.genetics and Evolution 47 (2008) 700?716 707explained the data better than those based on contiguous
sequences and/or genes (e.g., P3 versus P2, P6 versus P5)
though these di?erences were not tested for signi?cance
because the partitions are not nested. Indeed, P4, which
has a separate partition for each codon position, explained
the data better than P5, which has a separate partition for
each protein-coding gene, despite having fewer partitions
(4 versus 5).
3.2.3. Mixed model bayesian (assumed partitions)
The consensus topologies of the 4000 trees in the poster-
ior distributions were identical for all partitioning strate-
gies using both GTR + I + C and GTR + C and
congruent with Topology I, as well as being identical to
one of the MP trees and the ML tree for the combined
(unpartitioned) data (Fig. 2). The only major e?ect of the
di?erent partitions was that BC values for node 18 (Coph-
osaurus plus Holbrookia) increased from 85% to 90% using
P1?P3 and 91% with P5 to 96% and 97% using strategies P4
and P6, respectively (Table 3). Most nodal BC values were
high (BC > 95%) under all partitioning strategies, and only
two nodes (1 and 14) had BC values below 95% under most
partitioning strategies.
Fig. 3. ML phylogram of sand lizards relationships based on analysis of
unpartitioned data containing 2763 included characters spanning the
regions from ND1 to COI and cyt b to tRNAThr under the GTR + I + C
model. Branches are proportional to their lengths. The ML score of the
single topology obtained using the GTR + I + C model is
lnL = 21799.01.
Table 4
Comparison of alternative partitioning strategies (assumed partitions) based on mixed model maximum likelihood analyses using the ML tree for each
partition, all of which correspond to Topology I
Partition(s) Number of partitions ML score (lnL) lnL di?erences; P values
(partition compared)
All data combined (P1) 1 21886.84a
ND1?COI coding region; cyt b-tRNAThr coding region (P2) 2 21821.79 65.05; P < 0.001 (P1)
All protein-coding genes; tRNAs (P3) 2 21699.59 187.25; P < 0.001 (P1)
Codon positions of all protein coding genes; tRNAs (P4) 4 20610.58 1276.26; P < 0.001 (P1)
1089.01; P < 0.001 (P3)
ND1; ND2; COI; cyt b; tRNAs (P5) 5 21550.62 336.22; P < 0.001 (P1)
148.97; P < 0.001 (P3)
Codon positions of all protein coding genes; tRNA stems;
tRNA nonstems (P6)
5 20545.99 1340.85; P < 0.001 (P1)
1153.60; P < 0.001 (P3)
64.59; P < 0.001 (P4)
P values for comparisons of nested partitioning strategies are based on likelihood ratio tests.
a Current models implemented in the ??mmml? program do not include a parameter for the proportion of invariant sites (I), hence the ML score for all
data combined (P1) in this analysis is not as good as that obtained for the same data set in analyses that included this parameter (e.g., Table 7).
708 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716The harmonic mean lnL (Table 5) and approximate
Bayes Factors (Table 6) were used to evaluate the ability
of the six assumed (prior) partitioning strategies to explain
the data set. Among all strategies P6, which separately ana-
lyzed each codon position, tRNA stems, and tRNA non-
stems, explained the data better than the next best partition
(P4) by almost 23 lnL units and an approximate BF of
3.85. Partition identity was found to have the greatest e?ect
on improving model performance. Partitions that sepa-
rated each protein-coding codon position (P4 and P6) were
favored over partitions that were de?ned by fragment or
gene boundaries (P2 and P5) or gene type (protein-coding
genes and tRNAs?P3) as indicated by both lnL (Table
5) and BF estimates (Table 6). Even though strategy P4,
which has a partition for each codon position, has fewer
partitions (4 versus 5) than P5, which does not, the har-
monic mean lnL score of P4 is almost 940 units better
than that of P5, and the BF estimate of 1918.07 over-
whelmingly favors P4. Partitioning tRNAs by stem and
non-stem regions (P6 versus P4) also improved the model,
though the improvement was far less in terms of both the
Table 5
Arithmetic mean (Xa) and harmonic mean (Xh?  ln Ls for sets of trees obtained
MrBayes) and mixture models (implemented in BayesPhylogenies)
Assumed partitions Mixture models
Partitioning
strategy
Xa  lnL Xh  ln L Free
parameters
Model Xa  ln L
P1 21911.82 21940.15 9 1Q 24141.80
P2 21870.94 21904.18 18 2Q 21943.82
P3 21752.79 21791.47 18 3Q 21755.12
P4 20766.30 20803.95 36 4Q 21636.08
P5 21701.16 21742.26 45 5Q 21594.21
P6 20736.52 20781.30 45 6Q 21575.72
Assumed partitions results are from analyses using the GTR + C model (no
mixture model analyses results (see text for explanation). Free parameters for a
set to 1), 3 for base frequencies, and 1 for gamma shape parameter; 9 paramet
implemented in MrBayes. Free parameters for the 1Q + C mixture model are th
7 parameters are added, 6 parameters for the rate matrix (G to T substitution ra
value of that matrix.harmonic mean lnL score (ca. 23 lnL units) and the
BF (3.85).
3.2.4. Mixture model bayesian
The consensus topology of the 2000 trees in the pos-
terior distribution from all mixture model analyses (1?
6Q and 1?6Q + C) was congruent with Topology I
(Fig. 1). All models yielded identical topologies for sand
lizard relationships with the only di?erence between
topologies being the relative position of the outgroup
taxa Sceloporus, Uta, and Urosaurus. BC values for the
relationship of Cophosaurus and Holbrookia (node 18)
were above 95% only for the 1Q analysis (98%) and
below this cuto? for all other mixture models. The sis-
ter-taxon relationship between Uma notata and U. inor-
nata (node 10) was below 95% using a single rate
matrix with or without gamma but the addition of more
than one rate matrix yielded BC values above 95% for
all other models. Interestingly, node 14 had a BC value
above 95% using models 1?4Q but below 95% when
models 5?6Q and 1?6Q + C were used.
in Bayesian analyses based on assumed (prior) partitions (implemented in
Xh  ln L Free
parameters
Model Xa  ln L Xh  lnL Free
parameters
24155.32 8 1Q + C 21924.03 21939.13 9
21970.29 15 2Q + C 21747.91 21761.55 16
21770.30 22 3Q + C 21656.37 21674.23 23
21651.64 29 4Q + C 21633.54 21651.82 30
21621.55 36 5Q + C 21633.52 21652.69 37
21600.21 43 6Q + C 21634.24 21651.57 44
t incorporating an estimate of invariant sites) to allow comparison with
ssumed partitions are calculated as follows: 5 for rate matrix (G to T rate
ers are added for each additional partition as these values are unlinked as
e same as for P1 under GTR + C; however, for each rate matrix above 1Q
tes for additional matrices are free to vary) and 1 parameter for the weight
gies
1Q
4
4
4
6
4
6
hyloTable 6
Approximate BF estimates used to compare alternative partitioning strate
Partitioning strategies
P1 P2 P3 P4 P5 P6
P1 ? 30.49 255.91 2148.06 229.99 2151.91
P2 ? 225.42 2117.57 199.50 2121.42
P3 ? 1892.15 25.92 1896.00
P4 ? 1918.07 3.85
P5 ? 1921.92
P6 ?
1Q
2Q
3Q
4Q
5Q
J.A. Schulte II, K. de Queiroz /Molecular PThe harmonic mean lnL score of the model with one
rate matrix and gamma rate-heterogeneity (1Q + C) is
about 2216 units better than the score of the model with
one rate matrix alone (1Q) (Table 5), indicating that there
is a large component of rate-heterogeneity in the data.
When a second Q matrix without C is used, the harmonic
mean lnL score is only about 30 units below that of the
1Q + C model suggesting that an additional rate matrix
(pattern heterogeneity) is an alternative way of accounting
for rate heterogeneity (Pagel and Meade, 2004). However,
the additional rate matrix comes at the cost of seven addi-
tional parameters (6 substitution rate categories plus 1
weight term) versus a single parameter, C, and conse-
quently, the simpler model is strongly favored by the
approximate BF (Table 6). As the number of rate matrices
6Q
1Q + C
2Q + C
3Q + C
4Q + C
5Q + C
6Q + C
1Q + C 2Q + C 3Q + C
P1 2.04 324.96 467.3
P2 28.45 294.47 436.8
P3 253.87 69.05 211.4
P4 2146.02 1823.10 1680.6
P5 227.95 94.97 237.3
P6 2149.87 1826.95 1684.5
1Q 4427.77 4750.70 4893.1
2Q 89.95 412.87 555.2
3Q 277.79 45.13 187.5
4Q 482.88 159.95 17.5
5Q 510.82 187.90 45.4
6Q 521.26 198.34 55.9
1Q + C ? 322.92 465.3
2Q + C ? 142.4
3Q + C ?
4Q + C
5Q + C
6Q + C
Bayes factors were calculated using the following formula: BF = 2 (lnLa 
Likelihoods (from Table 5) and the number of free parameters of model (partit
yields: 2*(21940.15  (21904.18))  ((9  18)*ln(0.01)) = 30.49. BF values
and values >10 as very strong evidence favoring one model over the other (Ka
values favor the model in the column heading and negative BF values favor tand mixture models
2Q 3Q 4Q 5Q 6Q
425.73 87.91 279.83 484.92 512.86 523.30
456.23 118.40 249.34 454.42 482.37 492.81
681.65 343.82 23.92 229.00 256.95 267.39
573.80 2235.97 1868.23 1663.14 1635.20 1624.76
655.73 -317.90 49.84 254.92 282.87 293.31
577.65 2239.82 1872.08 1667.00 1639.05 1628.61
? 4337.82 4705.57 4910.65 4938.60 4949.04
? 367.74 572.83 600.77 611.22
? 205.08 233.03 243.47
? 27.94 38.39
? 10.44
genetics and Evolution 47 (2008) 700?716 709increases (from 1 to 6), both with and without gamma, the
rate of score improvement reaches a plateau (Table 5). For
the 5Q and 6Q mixture models, the scores are higher with-
out C than with it (Table 5), which may result from sto-
chastic variation or insu?cient mixing of chains due to
relatively high numbers of parameters (A. Meade, pers.
comm.). These factors may also explain why 1Q + C and
the identical P1 (under GTR + C) di?er slightly with
regard to both mean lnL scores (Table 5) and approxi-
mate Bayes Factor (Table 6).
Determining how many rate matrices to estimate is com-
monly done using a combination of Bayes factors (Raftery,
1996) and examining the variability in estimated parame-
ters (Pagel and Meade, 2004; Pagel and Meade, 2005).
Based on BFs (Table 6) it appears that the model with
?
4Q + C 5Q + C 6Q + C
7 479.95 445.98 415.98
7 449.46 415.48 385.49
5 224.04 190.06 160.07
9 1668.11 1702.09 1732.08
7 249.96 215.98 185.99
5 1671.96 1705.94 1735.93
0 4905.69 4871.71 4841.71
8 567.86 533.89 503.89
3 200.12 166.14 136.15
5 4.97 38.94 68.94
9 32.91 66.89 96.88
4 43.35 77.33 107.33
3 477.91 443.94 413.94
0 154.99 121.01 91.02
12.58 21.39 51.39
? 33.98 63.97
? 30.00
?
lnLb)  (Pa  Pb), where lnLx and Px are the harmonic mean of the ln
ion strategy) x. For example, comparing models P1 (row) and P2 (column)
from 2 to 5 were considered positive evidence, >5 were strong evidence,
ss and Raftery, 1995; Pagel and Meade, 2004; Raftery, 1996). Positive BF
he model in the row heading.
six rate matrices and no gamma is justi?ed as the best solu-
tion to these data. However, the model with ?ve rate matri-
ces may be more appropriate because, as Pagel and Meade
(2005) point out, the Bayes factor test assumes that all sites
are independent whereas the true number of independent
sites in our data set is likely to be fewer than the 2763
aligned sites, given phenomena such as the pairing of bases
in the stem regions of the tRNAs. Non-independence has
the e?ect of in?ating the di?erence in likelihood between
models (Pagel and Meade, 2005).
We also can use the matrix weights (provided by Bayes-
Phylogenies) as a guide for evaluating how many rate
matrices to estimate (Pagel and Meade, 2004; Pagel and
Meade, 2005). If additional rate matrices have much lower
weights and high variability (measured by standard devia-
tion) than other matrices in the analysis then it is likely that
the additional rate matrices are being poorly estimated.
For the 5Q model, rate matrix weights were
0.05,0.15,0.15, 0.56, and 0.09 with all standard deviations
below 0.025 suggesting an adequate estimation of these
parameters by the data. Other models, 4Q, 6Q, and 4?
6Q + C, had either lower rate matrix weights for some
matrices or higher variability in those matrix weight esti-
els are strongly favored even when penalized for additional
parameters in the comparisons involving estimated Bayes
factors (Table 6).
3.3. Tests of alternative hypotheses
Optimal topologies (Topology I, Fig. 1) for the com-
bined, unpartitioned data set, the ND1?COI fragment,
and the cyt b-tRNAThr fragment were tested against each
of the three alternative hypotheses (Topologies II?IV,
Fig. 1) for each of the same partitions under both parsi-
mony (WSR test) and likelihood (SH and AU tests) (Table
7). The combined data set as well as the ND1?COI frag-
ment found Topology I to explain the data signi?cantly
better than Topology II with the three tests; however, the
cyt b-tRNAThr fragment failed to reject Topology II using
all tests (Table 7). All three data sets failed to reject Topol-
ogy III in favor of Topology I using WSR, SH, and AU
tests. Using the combined data set, all three tests rejected
Topology IV, and whereas the ND1?COI fragment could
not reject Topology IV using any of the tests, the AU test
(but not the others) rejected this topology using the cyt b-
Thr
and
e sp
710 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716mates; therefore we propose that the 5Q model may be
the most appropriate for this data set based on the two cri-
teria outlined above. In any case, none of these mixture
models (4?6Q, 4?6Q + C) had lnL scores as good as
the best scores for assumed partitions (strategies P4 and
P6) under the GTR + C model (Table 5). Although some
of this advantage may result from larger numbers of free
parameters in P4 and P6 relative to mixture models with
4?6Q with and without gamma (Table 5), the former mod-
Table 7
Results of Wilcoxon signed-ranks (WSR), Shimodaira?Hasegawa (SH),
hypotheses (topologies)
Tree WSR tests
Length N P
Topology I
Combined 4200 (Best) N/A
ND1?COI 2324 (Best) N/A
Cyt b-tRNAThr 1866 (Best) N/A
Topology II
Combined 4221 101 0.037*
ND1?COI 2341 55 0.022*
Cyt b-tRNAThr 1874 22 0.088
Topology III
Combined 4200 (Best) N/A
ND1?COI 2325 47 0.884
Cyt b-tRNAThr 1867 17 0.808
Topology IV
Combined 4214 36 0.019*
ND1?COI 2335 69 0.185
Cyt b-tRNAThr 1874 30 0.144
For each hypothesis and data set, the score of the best tree conforming to th
trees, and the P-value are given. The four competing hypotheses are illustr
hypothesis, are indicated with an asterisk.tRNA fragment.
The results of Bayesian hypothesis tests of the four alter-
native hypotheses of sand lizard relationships (Fig. 1) for
the six di?erent partitioning strategies based on assumed
partitions (Table 2) are presented in Table 8. Topology I
was compatible with a substantial majority (P87%) of
the trees in the post-burn-in sample for all six strategies,
although greater than 95% of the trees were compatible
with this constraint only for P4 and P6 (Table 8). Using
a 95% credible interval cuto?, Topologies II and IV were
approximately-unbiased (AU) tests comparing alternative phylogenetic
SH tests AU tests
lnL d lnL P P
21799.01 (Best) N/A N/A
12685.57 (Best) N/A N/A
9043.26 (Best) N/A N/A
21822.80 23.79 0.014* 0.002*
12704.15 18.58 0.033* 0.011*
9048.80 5.53 0.188 0.090
21801.16 2.15 0.631 0.323
12686.21 0.64 0.736 0.525
9045.21 1.95 0.518 0.233
21822.39 23.38 0.024* 0.012*
12700.71 15.14 0.091 0.071
9051.85 8.59 0.079 0.042*
eci?ed hypothesis, the number of characters that discriminate between two
ated in Fig. 1. Signi?cant values (P 6 0.05), indicating rejection of the
formed to this topology, as did all optimal trees resulting
ode
ning
1*
1*
1*
1*
1*
1*
a re
hylofrom parsimony analysis of the ND1?COI fragment, parsi-
mony analysis of the cyt b-tRNAThr fragment, successive
approximations likelihood analysis under a single set of
parameter values for the combined data set, the ND1?
COI fragment, and the cyt b-tRNAThr fragment, mixed
model maximum likelihood analysis with partition-speci?crejected using all six assumed partitioning strategies.
Topology III could not be rejected using P1, P2, P3, and
P5 but was rejected by strategies P4, and P6, the partitions
that explain the data best (Table 6) and thus presumably
represent the most powerful tests.
4. Discussion
4.1. Sand lizard relationships
Previous studies have identi?ed four competing hypoth-
eses of higher-level relationships among phrynosomatine
sand lizards (Fig. 1). Although the results of most recent
studies have favored one of these alternative hypotheses
(Topology I; ChangChien, 1996; de Queiroz, 1989, 1992;
Reeder and Wiens, 1996; Wilgenbusch and de Queiroz,
2000; but see Leache? and McGuire, 2006), previously col-
lected data have not been su?cient to reject the alternative
hypotheses according to conventional statistical criteria.
The results of the present study provide substantially
stronger support for Topology I over alternative phyloge-
netic hypotheses. Two out of three optimal trees resulting
from parsimony analysis of the combined data set con-
Table 8
The number and frequency of trees in post burn-in samples from mixed m
hypotheses of sand lizard relationships (Fig. 1) under six assumed partitio
Topology 1 Topology 2
# Compatible trees Freq # Compatible trees Freq
P1 3601 =0.900 0 <0.00
P2 3637 =0.909 0 <0.00
P3 3511 =0.878 0 <0.00
P4 3900 =0.975 0 <0.00
P5 3544 =0.886 0 <0.00
P6 3874 =0.969 0 <0.00
Frequencies are based on samples of 4000 post-burn-in trees; those below
J.A. Schulte II, K. de Queiroz /Molecular Psets of parameter values under six di?erent partitioning
strategies (P1?P6) applied to the combined data set, Bayes-
ian analysis with partition-speci?c sets of parameter values
under six di?erent partitioning strategies (P1?P6) applied
to the combined data set, and Bayesian mixture models
with one to six rate matrices both with and without a
parameter for rate heterogeneity applied to the combined
data set. In addition, explicit tests of alternative phyloge-
netic hypotheses based on parsimony and likelihood (Table
7) as well as Bayesian (Table 8) approaches indicate that
the combined data set signi?cantly favors Topology I over
both Topology II and Topology IV (failure to reject these
topologies using the separate gene fragments under parsi-
mony and likelihood presumably re?ects decreased testpower associated with smaller sample sizes when the frag-
ments are analyzed separately).
Of the three alternatives to Topology I, only Topology
III is not consistently rejected by our data. One of the three
optimal trees resulting from the parsimony analysis of the
combined data conformed to this topology, which was
ranked a close second in the likelihood analysis of the com-
bined data set and in both parsimony and likelihood anal-
yses of the separate ND1?COI and the cyt b-tRNAThr
fragments (Table 7). Moreover, Topology III was not even
close to being rejected by the various explicit parsimony-
and likelihood-based tests of alternative phylogenetic
hypotheses for the separate gene fragments and the com-
bined (but unpartitioned) data (P values ranging from
0.518 to 1.00). Topology III and Topology I are similar
in that both have Uma as the sister group of the remaining
sand lizards, and they di?er only in placing Cophosaurus as
the sister group of Callisaurus (Topology III) versus Holb-
rookia (Topology I). Consistent with the weak support for
Topology I over Topology III, the node uniting Cophosau-
rus and Holbrookia (node 18) is relatively weakly sup-
ported (Table 3). On the other hand, Topology I is
favored over Topology III not only by all likelihood and
Bayesian analyses of our combined mtDNA data set, but
also by previous analyses of allozymes, morphology,
immunology, and mtDNA (ChangChien, 1996; de Que-
iroz, 1989, 1992; Reeder and Wiens, 1996; Wilgenbusch
and de Queiroz, 2000). Moreover, under our Bayesian
hypothesis tests (Table 8), which employed partition spe-
ci?c parameter estimates, Topology III is much closer to
being rejected even in the worst case (P = 0.113 for strat-
l Bayesian analyses that are compatible with each of the four alternative
strategies (Table 2)
Topology 3 Topology 4
# Compatible trees Freq # Compatible trees Post P
376 =0.094 0 <0.001*
329 =0.082 0 <0.001*
453 =0.113 0 <0.001*
91 =0.023* 0 <0.001*
429 =0.107 0 <0.001*
120 =0.030* 0 <0.001*
jection cuto? of 0.05 are indicated with an asterisk.
genetics and Evolution 47 (2008) 700?716 711egy P3) than in the case of the parsimony and likelihood
tests based on combined but unpartitioned data (Table
7), and it is rejected by the partitioning strategies that
explain the data best (P4 and P6, Tables 5 and 6) and thus
are expected to be associated with the most powerful tests.
Thus, in contrast to the situation before the present study,
Topology I can now be considered highly corroborated
(strongly supported) relative to the three alternative phylo-
genetic hypotheses (Fig. 1).
4.2. Phylogenetic analysis under heterogeneous processes
Accounting for heterogeneity in nucleotide sequence
evolution is an active area of investigation in phylogenetics
hylo(Brandley et al., 2005; Castoe et al., 2004; Huelsenbeck and
Nielsen, 1999; Nylander et al., 2004; Pagel and Meade,
2004; Wilgenbusch and de Queiroz, 2000). Studies utilizing
sequences from di?erent parts of the genome have to
account for variation in evolutionary processes and the
resulting patterns that exist both among and within those
sequences. For example, protein-coding genes are expected
to exhibit di?erent patterns of sequence evolution than
genes coding for ribosomal or transfer RNAs; similarly,
di?erent codon positions within protein coding genes and
positions that code for stem versus non-stem regions in
RNAs are likewise expected to exhibit di?erent patterns.
Application of a single set of model parameter estimates
to heterogeneous data partitions will often result in a poor
?t to the data. A better strategy is to assess the ability of
di?erent models and/or parameter estimates to explain
the data (Swo?ord et al., 1996).
Two primary approaches have been developed for
assessing and incorporating heterogeneity into phyloge-
netic analyses. The ?rst is to partition data into biologically
meaningful classes based on prior knowledge of the charac-
ters, and then to assume a di?erent evolutionary model for
each class. In the case of nucleotide sequences, partitioning
is commonly based on transcriptional or translational
products, such as protein-coding genes, tRNA genes, or
non-transcribed DNA. This is by far the most common
strategy and is easily implemented in some existing com-
puter programs, such as MrBayes (Ronquist and Huelsen-
beck, 2003). A second approach is to invoke an increasing
number of evolutionary models without a priori partition-
ing of the data, allowing the data to estimate the appropri-
ate number of models while attempting to avoid over-
parameterization (=mixture-model approach; Pagel and
Meade, 2004). An advantage of the latter method is that
variation within a partition that may have been missed
by the former strategy can be taken into account (Pagel
and Meade, 2004). It is also advantageous when biologi-
cally meaningful partitions are not evident.
Consistent with previous studies (Brandley et al., 2005;
Nylander et al., 2004), partition-speci?c parameter esti-
mates generally improved the ?t of our data by allowing
di?erent components of the data set to have distinct values.
For example, substitution rates among 1st, 2nd, and 3rd
codon positions for protein coding genes each had di?erent
parameter values that exhibited limited overlap in their
95% credible intervals. This result was found using both
mixed model ML and Bayesian analyses with assumed par-
titions (results not shown). Brandley et al. (2005) noted
that data partitioning greatly reduces systematic error
and improves mean lnLs and posterior probabilities.
Our results support this conclusion with regard to tree
scores but not necessarily with regard to clade posterior
probabilities. Tree scores for ML mixed model and Bayes-
ian analyses using assumed partitions generally improved
with increasing numbers of partitions (except P4 was better
712 J.A. Schulte II, K. de Queiroz /Molecular Pthan P5; Table 4), as they did for mixture models with
increasing numbers of Q matrices up to 6Q without Cand 4Q with C (Table 5). For Bayesian credibility values,
however, alternative partitioning strategies showed no
obvious trend toward improvement. Partitioning strategies
P4 and P6, among the most complex partitions assumed,
showed an increase in BC values for node 18 on our phylo-
genetic tree, but for node 1 there was a decrease in BC val-
ues and node 14 showed no improvement. In mixture
model analyses, BC values for nodes 1, 14, and 18 generally
decreased with increasing numbers of rate matrices,
whereas node 10 showed higher BC values with more rate
matrices.
Most studies utilizing prior (assumed) partitions in phy-
logenetic analyses of nucleotide sequence data have imple-
mented four basic strategies. One is to partition by
contiguous DNA fragments, such as the region spanning
ND1?COI versus that spanning cyt b through tRNAThr
(as in our P2). Another strategy is to partition by classes
of genes based on their transcriptional and/or translational
products, such as tRNA versus rRNA versus protein-cod-
ing genes (P3, P4 in part, P5 in part). Yet another strategy
is to partition by individual genes, such as ND2 versus COI
versus tRNAThr (P5 in part). The fourth strategy is to par-
tition based on structurally and/or functionally de?ned
classes of sites, such as codon positions and RNA stems
versus nonstems, regardless of the gene boundaries (P4 in
part, P6). These four common alternatives were examined
in the present study. In addition, other more complicated
partitioning strategies may be applied, including classes
of nucleotides that code for functional properties of the
encoded protein, such as hydrophobicity, charge, or chem-
ical properties of R-groups (Naylor et al., 1995; Savolainen
et al., 2002).
Using both likelihood and Bayesian methods (Tables 4?
6), all of the assumed partitions implemented here (P2?P6)
are better than no partitioning at all (P1). Partitioning by
classes of genes (P3, P4, P5 in part), by individual genes
(P5 in part), and by classes of sites (P4 in part, P6), is each
better than partitioning by contiguous DNA fragments
(P2). Partitioning by individual genes (P5 in part) or by
classes of sites within genes (P4 in part, P6) is better than
partitioning by classes of genes (P3), though Bayes factors
(Table 6) favored partitioning by classes of genes (P3) over
partitioning by individual genes (P5 in part). Finally, par-
titioning by classes of sites within genes (P4 in part, P6)
is better than partitioning by individual genes (P5). These
results are congruent with those of several previous studies
(Brandley et al., 2005; Castoe et al., 2005; Castoe and Par-
kinson, 2006; Nylander et al., 2004; Strugnell et al., 2005).
Nylander et al. (2004) suggested that although account-
ing for data heterogeneity across partitions greatly
improved the ?t of the data, allowing heterogeneity within
partitions might yield even greater improvements. Further
partitioning might lead to improvements in lnL scores,
but at some point the advantages of further partitioning
(improvement in lnL) are outweighed by the disadvan-
genetics and Evolution 47 (2008) 700?716tages (inaccurate parameter estimation). All data sets
examined to date have found that partitioning by codon
hyloposition and/or tRNA secondary structure (stem versus
nonstem) improved mean lnL scores; however, Strugnell
et al. (2005) found that in some cases partitioning by codon
for all genes together produced a better ?t to the data using
AIC than partitioning by both codon and gene. These
results indicate that the addition of substitution parameters
with increasing partitions in some cases is not justi?ed. As
the number of parameters increases, the amount of
sequence data to be evaluated per parameter decreases,
a?ecting the ability of probabilistic methods to estimate
model parameter values, as well as inferences derived from
them (e.g., topologies, branch lengths, nodal support),
accurately.
An alternative approach to prior (assumed) partitioning
is to estimate heterogeneity using mixture models (Pagel
and Meade, 2004). Phylogenetic mixture models allow dif-
ferent models to be applied to the same site with varying
probabilities, which are estimated from the data them-
selves. For our data (Tables 5, 6), mixture models both
with and without rate heterogeneity (C) performed better
than unpartitioned data (P1) and assumed partitions based
on contiguous gene fragments (P2) and classes of genes
(P3). On the other hand, assumed partitions based on
codon position and tRNA stems and non-stems (P4 in part,
P6) were between 800 and 900 lnL units better than the
preferred mixture model (5Q) or the best mixture model
score (6Q). This improvement comes only at the cost of a
single free parameter di?erence between P6 and 6Q, and
thus P6 (and P4) was very strongly favored over 6Q (with
or without C) according to Bayes factors (Table 6).
Development and implementation of mixed model max-
imum likelihood, perhaps using genetic algorithms (Zwickl,
2006) or reversible jump (RJ) Markov chain Monte Carlo
methods (Huelsenbeck et al., 2004; Pagel and Meade,
2006) that can simultaneously estimate substitution rate
parameters and a phylogenetic hypothesis, are the next
stage in the evolution of these analytical methods. We
anticipate that these developments will serve two very
important functions for the phylogenetic community. First,
they will simplify the steps researchers must take to per-
form the multitude of analyses currently available. Second,
these methods will greatly facilitate the ability to account
for heterogeneous evolutionary processes in complex phy-
logenetic data sets.
Acknowledgments
We thank the Smithsonian Institution (postdoctoral fel-
lowship to J.A. Schulte), the Laboratories of Analytical
Biology, and the National Science Foundation (postdoc-
toral fellowship to J.A. Schulte II) for ?nancial support;
L. Weigt and J. Hunt for laboratory support; O. Torres-
Carvajal, A. Meade, and M. Pagel for analytical advice;
J. Wilgenbusch for information concerning tissue samples,
and providing invaluable support concerning phylogenetic
J.A. Schulte II, K. de Queiroz /Molecular Panalyses, particularly those involving mmml. Thanks also
are owed to the many people who assisted with obtainingspecimens, whose names are listed in the Acknowledgments
of Wilgenbusch and de Queiroz (2000).
Appendix A
Collection numbers and localities for voucher specimens
from which DNA was obtained and GenBank accession
numbers are presented. Abbreviations for collections are
as follows: California Academy of Sciences (CAS), San
Francisco, California; Louisiana State University Museum
of Zoology (LSUMZ), Baton Rouge, Louisiana; Museum
of Vertebrate Zoology (MVZ), University of California,
Berkeley, California; Royal Ontario Museum (ROM), Tor-
onto, Canada; National Museum of Natural History
(USNM), Smithsonian Institution, Washington, District
of Columbia; ?eld specimen numbers of Kevin de Queiroz
(KdQ); ?eld specimen numbers of Richard R. Montanucci
(RRM); ?eld specimen numbers of Christopher A. Phillips
(CAP). Subspecies taxonomy for Holbrookia maculata fol-
lows Axtell (1958). Outgroups: Phrynosoma hernandesi,
USA; Colorado; Weld Co.; vic. Stoneham (RRM 2470,
EU543778, EU543747); Phrynosoma platyrhinos, USA;
California; no further data (KdQ 057, EU543777,
EU543746); Sceloporus jarrovii, USA; Arizona; Cochise
Co.; Huachuca Mts., vic. Carr Canyon (KdQ 036,
EU543774, EU543743); Urosaurus ornatus, USA; Arizona;
Yavapai Co.; 3 mi N Hillside (MVZ 214658, EU543776,
EU543745); Uta stansburiana, USA; Arizona; Pima Co.;
Santa Catalina Mts,, mouth of Pima Canyon (CAS
178153, EU543775, EU543744); Sand Lizards: Callisaurus
draconoides bogerti, Me?xico; Sinaloa; 14 km S Dimas
(ROM 14970, EU543788, EU543757); Callisaurus dracono-
ides carmenensis, Me?xico; Baja California; San Ignacio
(ROM 14182, EU543786, EU543755); Callisaurus dracono-
ides crinitis, Me?xico; Baja California; Dunes N of Guerrero
Negro (MVZ 214832, EU543787, EU543756); Callisaurus
draconoides myurus, USA; Nevada; Mineral Co.; W shore
of Walker Lake (MVZ 214751, EU543791, EU543760);
Callisaurus draconoides rhodostictus, USA; California; Inyo
Co.; Argus Mts., Darwin Wash (MVZ 214739, EU543792,
EU543761); Callisaurus draconoides ssp., Me?xico; Sonora;
4.0 mi. NW of Bah??a Kino (ROM 15034, EU543789,
EU543758); Callisaurus draconoides ventralis, USA; New
Mexico; Hidalgo Co.; 12.4 mi N and 0.9 mi W of jct. US
Hwy 80 and Portal Road (NM 533) (LSUMZ 48811,
EU543790, EU543759); Cophosaurus texanus scitulus,
USA; Arizona; Pima Co.; Santa Catalina Mts., mouth of
Pima Canyon (MVZ 214711, EU543793, EU543762);
Cophosaurus texanus texanus, USA; Texas; Blanco Co.;
Flat Creek, 3.8 mi N of U. S. Rte. 290 (at Henly) on Farm
Road 3232, then 0.5 mi E on dirt road (USNM 315499,
EU543794, EU543763); Holbrookia elegans elegans, Me?xi-
co; Sinaloa; Mazatla?n (ROM 14964, EU543797,
EU543766); Holbrookia elegans thermophila, USA; Ari-
zona; Pima Co.; 0.6 mi E of Hwy 83 on Old Sonoita
genetics and Evolution 47 (2008) 700?716 713Hwy (CAS 174400, EU543798, EU543767);Holbrookia lac-
erata lacerata, USA; Texas; Concho Co.; 5.8 (by air) miles
hyloE of Eden on dirt road paralleling U. S. Rte. 87 (USNM
315500, EU543795, EU543764); Holbrookia lacerata sub-
caudalis, USA; Texas; McMullen Co.; Charles Caron
Ranch, 9.5 mi E of Texas Rte. 16 on Farm Road 1962
(USNM 315503, EU543796, EU543765); Holbrookia pro-
pinqua propinqua, USA; Texas; Nueces Co.; dunes just N
of E end of Access Rd. 3, just S of Mustang Island State
Park (MVZ 214863, EU543799, EU543768); Holbrookia
maculata bunkeri, USA; New Mexico; Luna Co.; 2.0 mi
W of Cambray on NM Hwy 549 (USNM 337740,
EU543803, EU543772); Holbrookia maculata campi,
USA; New Mexico; McKinley Co.; Zuni Indian Reserva-
tion; jct. of Sand Spring Canyon Creek and Zuni River
(MVZ 214801, EU543801, EU543770); Holbrookia macula-
ta ?avilenta, USA; Arizona; Cochise Co.; 0.1 mi S Hwy I-
10 at junction 666N (MVZ 214814, EU543804, EU543773);
Holbrookia maculata maculata, USA; New Mexico; Chaves
Co.; Dune area 0.9 mi S Hwy 380 and 37.0 mi E jct. Hwy
285 at Roswell (USNM 214806, EU543800, EU543769);
Holbrookia maculata ruthveni, USA; New Mexico; Otero
Co.; White Sands National Monument, 5.0 mi W of U.S.
Hwy 70/82 on The Dunes Drive (USNM 337752,
EU543802, EU543771); Uma exsul, Me?xico; Coahuila, Bil-
bao Dunes (ROM 15315, EU543779, EU543748);Uma
inornata, USA; California; Riverside Co.; Windy Point
(CAP 1752, EU543785, EU543754); Uma notata, USA;
California; Imperial Co.; E edge Algodones Dunes, 0.9
mi W jct. of Hwy I-10 and S-34 (MVZ 214799,
EU543784, EU543753);Uma rufopunctata, Me?xico; Sonora;
17 km NE Puerto Pen?asco (ROM 13919, EU543783,
EU543752); Uma paraphygas, Me?xico; Durango; Bolso?n
de Mapim?? , vic. Laboratorio del Desierto (ROM 15089,
EU543780, EU543749); Uma scoparia (1), USA; California;
San Bernardino Co.; Pisgah Crater (ROM 14637,
EU543781, EU543750); Uma scoparia (2), USA; California;
San Bernardino Co.; S side Kelso Dunes (MVZ 214784,
EU543782, EU543751).
Appendix B
Trees used as constraints for tests of alternative hypoth-
eses of sand lizard relationships (note that these topologies
do not constrain the traditional sand lizard genera to be
monophyletic, which makes them more di?cult to reject
than topologies that constrain monophyly of the genera):
Topology 1: (Sceloporus jarrovii, Uta stansburiana, Uro-
saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her-
nandesi, (Uma exsul, Uma paraphygas, Uma scoparia 1,
Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor-
nata, (Callisaurus draconoides carmenensis, Callisaurus dra-
conoides crinitus, Callisaurus draconoides bogerti,
Callisaurus draconoides ssp., Callisaurus draconoides ventra-
lis, Callisaurus draconoides myurus, Callisaurus draconoides
rhodostictus, (Cophosaurus texanus scitulus, Cophosaurus
texanus texanus, Holbrookia lacerata lacerata, Holbrookia
714 J.A. Schulte II, K. de Queiroz /Molecular Placerata subcaudalis, Holbrookia elegans elegans,Holbrookia
elegans thermophila, Holbrookia propinqua propinqua, Holb-rookia maculata maculata, Holbrookia maculata campi,
Holbrookia maculata ruthveni, Holbrookia maculata bunkeri,
Holbrookia maculata ?avilenta)))).
Topology 2: (Sceloporus jarrovii, Uta stansburiana, Uro-
saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her-
nandesi, ((Uma exsul, Uma paraphygas, Uma scoparia 1,
Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor-
nata, Callisaurus draconoides carmenensis, Callisaurus
draconoides crinitus, Callisaurus draconoides bogerti, Calli-
saurus draconoides ssp, Callisaurus draconoides ventralis,
Callisaurus draconoides myurus, Callisaurus draconoides rho-
dostictus), (Cophosaurus texanus scitulus, Cophosaurus tex-
anus texanus, Holbrookia lacerata lacerata, Holbrookia
lacerata subcaudalis, Holbrookia elegans elegans,Holbrookia
elegans thermophila, Holbrookia propinqua propinqua, Holb-
rookia maculata maculata, Holbrookia maculata campi,
Holbrookia maculata ruthveni, Holbrookia maculata bunkeri,
Holbrookia maculata ?avilenta))).
Topology 3: (Sceloporus jarrovii, Uta stansburiana, Uro-
saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her-
nandesi, (Uma exsul, Uma paraphygas, Uma scoparia 1,
Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor-
nata, ((Callisaurus draconoides carmenensis, Callisaurus dra-
conoides crinitus,Callisaurus draconoides bogerti, Callisaurus
draconoides ssp, Callisaurus draconoides ventralis, Callisau-
rus draconoides myurus, Callisaurus draconoides rhodostic-
tus, Cophosaurus texanus scitulus, Cophosaurus texanus
texanus), Holbrookia lacerata lacerata, Holbrookia lacerata
subcaudalis, Holbrookia elegans elegans, Holbrookia elegans
thermophila, Holbrookia propinqua propinqua, Holbrookia
maculata maculata, Holbrookia maculata campi, Holbrookia
maculata ruthveni, Holbrookia maculata bunkeri, Holbrookia
maculata ?avilenta))).
Topology 4: (Sceloporus jarrovii, Uta stansburiana, Uro-
saurus ornatus, Phrynosoma platyrhinos, Phrynosoma her-
nandesi, ((Uma exsul, Uma paraphygas, Uma scoparia 1,
Uma scoparia 2, Uma rufopunctata, Uma notata, Uma inor-
nata, (Callisaurus draconoides carmenensis, Callisaurus
draconoides crinitus, Callisaurus draconoides bogerti, Calli-
saurus draconoides ssp, Callisaurus draconoides ventralis,
Callisaurus draconoides myurus, Callisaurus draconoides
rhodostictus, Cophosaurus texanus scitulus, Cophosaurus
texanus texanus)), Holbrookia lacerata lacerata, Holbrookia
lacerata subcaudalis, Holbrookia elegans elegans,Holbrookia
elegans thermophila, Holbrookia propinqua propinqua, Holb-
rookia maculata maculata, Holbrookia maculata campi,
Holbrookia maculata ruthveni, Holbrookia maculata bunkeri,
Holbrookia maculata ?avilenta)).
References
Adest, G.A., 1978. The relationships of the sand lizards Uma, Callisaurus,
and Holbrookia (Sauria: Iguanidae): an electrophoretic study. Unpub-
lished Ph.D. dissertation, University of California, Los Angeles.
Alfaro, M.E., Zoller, S., Lutzoni, F., 2003. Bayes or bootstrap? A
simulation study comparing the performance of Bayesian Markov
genetics and Evolution 47 (2008) 700?716chain Monte Carlo sampling and bootstrapping in assessing phyloge-
netic con?dence. Mol. Biol. Evol. 20, 255?266.
hyloAnderson, S., Bankier, A.T., Barrell, B.G., De Bruijn, M.H.L., Coulson,
A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F.,
Schreier, P.H., Smith, A.J.H., Staden, R., Young, I.G., 1981. Sequence
and organization of the human mitochondrial genome. Nature 290,
457?465.
Axtell, R.W., 1958. A monographic revision of the iguanid genus
Holbrookia. Ph.D. dissertation, University of Texas, Austin.
Brandley, M.C., Schmitz, A., Reeder, T.W., 2005. Partitioned Bayesian
analyses, partition choice, and the phylogenetic relationships of scincid
lizards. Syst. Biol. 54, 373?390.
Castoe, T.A., Doan, T.M., Parkinson, C.L., 2004. Data partitions and
complex models in Bayesian analysis: the phylogeny of gymnophthal-
mid lizards. Syst. Biol. 53, 448?469.
Castoe, T.A., Sasa, M.M., Parkinson, C.L., 2005. Modeling nucleotide
evolution at the mesoscale: the phylogeny of the Neotropical pitvipers
of the Porthidium group (Viperidae: Crotalinae). Mol. Phylogenet.
Evol. 37, 881?898.
Castoe, T.A., Parkinson, C.L., 2006. Bayesian mixed models and the
phylogeny of pitvipers (Viperidae: Serpentes). Mol. Phylogenet. Evol.
39, 91?110.
ChangChien, L.-L., 1996. A phylogenetic study of sceloporine lizards and
their relationships with other iguanid lizards based on DNA/DNA
hybridization. Ph.D. dissertation, University of Wisconsin, Madison.
Clarke, R.F., 1965. An ethological study of the iguanid lizard genera
Callisaurus, Cophosaurus, and Holbrookia. Emporia St. Res. Stud. 13,
1?66.
Cox, D.C., Tanner, W.W., 1977. Osteology and myology of the head and
neck regions of Callisaurus, Cophosaurus, Holbrookia, and Uma
(Reptilia: Iguanidae). Great Basin Nat. 37, 35?56.
Crother, B.I., 2000. Scienti?c and standard English names of amphibians
and reptiles of North America north of Mexico, with comments
regarding con?dence in our understanding. Herpetolog. Circ. 29,
Society for the Study of Amphibians and Reptiles.
Crother, B.I., 2003. Scienti?c and standard English names of amphibians
and reptiles of North America north of Mexico: update. Herpetol.
Rev. 34, 196?203.
de Queiroz, K., 1989. Morphological and biochemical evolution in the
sand lizards. Ph.D. dissertation, University of California, Berkeley.
de Queiroz, K., 1992. Phylogenetic relationship and rates of allozyme
evolution among the lineages of sceloporine sand lizards. Biol. J. Linn.
Soc. 45, 333?362.
Douady, C.J., Delsuc, F., Boucher, Y., Doolittle, W.F., Douzery, E.J.P.,
2003. Comparison of Bayesian and maximum likelihood bootstrap
measures of phylogenetic reliability. Mol. Biol. Evol. 20, 248?254.
Earle, A.M., 1961. The middle ear of Holbrookia and Callisaurus. Copeia
1961, 405?410.
Earle, A.M., 1962. The middle ear of the genus Uma compared to those of
the other sand lizards. Copeia 1962, 185?188.
Etheridge, R., de Queiroz, K., 1988. A phylogeny of Iguanidae. In: Estes,
R., Pregill, G. (Eds.), Phylogenetic Relationships of the Lizard
Families. Stanford Univ. Press, Stanford, California, pp. 283?367.
Felsenstein, J., 1985a. Con?dence limits on phylogenies an approach using
the bootstrap. Evolution 39, 783?791.
Felsenstein, J., 1985b. Con?dence limits on phylogenies with a molecular
clock. Syst. Zool. 34, 152?161.
Felsenstein, J., Kishino, H., 1993. Is there something wrong with the
bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42,
193?200.
Fu, J., 1999. Phylogeny of Lacertid Lizards (Squamata: Lacertidae) and
the Evolution of Unisexuality, Ph.D. dissertation. University of
Toronto, Toronto.
Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based
tests of topologies in phylogenetics. Syst. Biol. 49, 652?670.
Huelsenbeck, J.P., Nielsen, R., 1999. The e?ect of non-independent
substitution on phylogenetic accuracy. Syst. Biol. 48, 317?328.
Huelsenbeck, J.P., Larget, B., Miller, R.E., Ronquist, F., 2002. Potential
J.A. Schulte II, K. de Queiroz /Molecular Papplications and pitfalls of Bayesian inference of phylogeny. Syst. Biol.
51, 673?688.Huelsenbeck, J.P., Larget, B., Alfaro, M.E., 2004. Bayesian phylogenetic
model selection using reversible jump Markov chain Monte Carlo.
Mol. Biol. Evol. 21, 1123?1133.
Je?reys, H., 1935. Some tests of signi?cance, treated by the theory of
probability. Proc. Camb. Phil. Soc. 31, 203?222.
Kass, R.E., Raftery, A.E., 1995. Bayes factors. J. Am. Stat. Assoc. 90,
773?795.
Kocher, T.D., Thomas, W.K., Meyer, A., Edwards, S.V., Paabo, S.,
Villablanca, F.X., Wilson, A.C., 1989. Dynamics of mitochondrial
DNA evolution in animals: Ampli?cation and sequencing with
conserved primers. Proc. Natl. Acad. Sci. USA 86, 6196?6200.
Kumazawa, Y., Nishida, M., 1993. Sequence evolution of mitochondrial
tRNA genes and deep-branch animal phylogenetics. J. Mol. Evol. 37,
380?398.
Kumazawa, Y., Ota, H., Nishida, M., Ozawa, T., 1998. The complete
nucleotide sequence of a snake (Dinodon semicarinatus) mitochondrial
genome with two identical control regions. Genetics 150, 313?329.
Leache?, A.D., McGuire, J.A., 2006. Phylogenetic relationships of horned
lizards (Phrynosoma) based on nuclear and mitochondrial data:
evidence for a misleading mitochondrial gene tree. Mol. Phylogenet.
Evol. 39, 628?644.
Lewis, P.O., Holder, M.T., Holsinger, K.E., 2005. Polytomies and
Bayesian phylogenetic inference. Syst. Biol. 54, 241?253.
Macey, J.R., Verma, A., 1997. Homology in phylogenetic analysis:
alignment of transfer RNA genes and the phylogenetic position of
snakes. Mol. Phylogenet. Evol. 7, 272?279.
Macey, J.R., Larson, A., Ananjeva, N.B., Fang, Z., Papenfuss, T.J., 1997.
Two novel gene orders and the role of light-strand replication in
rearrangement of the vertebrate mitochondrial genome. Mol. Biol.
Evol. 14, 91?104.
Macey, J.R., Schulte II, J.A., Larson, A., Fang, Z., Wang, Y., Tuniyev,
B.S., Papenfuss, T.J., 1998. Phylogenetic relationships of toads in the
Bufo bufo species group from the eastern escarpment of the Tibetan
Plateau: a case of vicariance and dispersal. Mol. Phylogenet. Evol. 9,
80?87.
Macey, J.R., Schulte II, J.A., Larson, A., Tuniyev, B.S., Orlov, N.,
Papenfuss, T.J., 1999. Molecular phylogenetics, tRNA evolution, and
historical biogeography in anguid lizards and related taxonomic
families. Mol. Phylogenet. Evol. 12, 250?272.
Maddison, D.R., Maddison, W.P., 2003. MacClade: Analysis of Phylog-
eny and Character Evolution. Version 4.08. Sinauer Associates,
Sunderland, MA.
Mittleman, M.B., 1942. A summary of the iguanid genus Urosaurus. Bull.
Mus. Comp. Zool. Harvard Univ. 91, 106?176.
Naylor, G.J.P., Collins, T.M., Wesley, M.B., 1995. Hydrophobicity and
phylogeny. Nature 373, 565?566.
Norris, K.S., 1958. The evolution and systematics of the iguanid genus
Uma and its relation to the evolution of other North American desert
reptiles. Bull. Am. Mus. Nat. Hist. 114, 253?326.
Nylander, J.A.A., Ronquist, F., Huelsenbeck, J.P., Nieves-Aldrey, J.L.,
2004. Bayesian phylogenetic analysis of combined data. Syst. Biol. 53,
47?67.
Pagel, M., Meade, A., 2004. A phylogenetic mixture model for detecting
pattern-heterogeneity in gene sequence or character-state data. Syst.
Biol. 53, 571?581.
Pagel, M., Meade, A., 2005. Mixture models in phylogenetic inference. In:
Gascuel, O. (Ed.), Mathematics of Evolution and Phylogeny. Claren-
don Press, Oxford, pp. 121?142.
Pagel, M., Meade, A., 2006. Bayesian analysis of correlated evolution of
discrete characters by reversible-jump Markov chain Monte Carlo.
Am. Nat. 167, 808?825.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA
substitution. Bioinformatics 14, 817?818.
Raftery, A.E., 1996. Hypothesis testing and model selection. In: Gilks,
W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), Markov Chain
Monte Carlo in Practice. Chapman & Hall, London, pp. 163?187.
genetics and Evolution 47 (2008) 700?716 715Rambaut, A., Drummond, A.J., 2005. Tracer. Version 1.3, Available
from: http://tree.bio.ed.ac.uk/software/tracer/.
Reeder, T.W., Wiens, J.J., 1996. Evolution of the lizard family Phryno-
somatidae as inferred from diverse types of data. Herpetol. Monogr.
10, 43?84.
Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics 19, 1572?1574.
Savolainen, V., Chase, M.W., Salamin, N., Soltis, D.E., Soltis, P.S.,
Lo?pez, A.J., Fe?drigo, O., Naylor, G.J.P., 2002. Phylogeny reconstruc-
tion and functional constraints in organellar genomes: plastid atpB and
rbcL sequences versus animal mitochondrion. Syst. Biol. 51, 638?647.
Savage, J.M., 1958. The iguanid lizard genera Urosaurus and Uta, with
remarks on related groups. Zoologica 43, 41?54.
Schulte II, J.A., Valladares, J.P., Larson, A., 2003. Phylogenetic relation-
ships within Iguanidae inferred using molecular and morphological
data and a phylogenetic taxonomy of iguanian lizards. Herpetologica
59, 399?419.
Shimodaira, H., 2002. An approximately unbiased test of phylogenetic
tree selection. Syst. Biol. 51, 492?508.
Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log-
likelihoods with applications to phylogenetic inference. Mol. Biol.
Evol. 16, 1114?1116.
Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the
con?dence of phylogenetic tree selection. Bioinformatics 17, 1246?
1247.
Smith, H.M., 1946. Handbook of lizards. Comstock, Ithaca, NY.
Strugnell, J., Norman, M., Jackson, J., Drummond, A.J., Cooper, A.,
2005. Molecular phylogeny of coleoid cephalopods (Mollusca: Ceph-
alopoda) using a multigene approach; the e?ect of data partitioning on
resolving phylogenies in a Bayesian framework. Mol. Phylogenet.
Evol. 37, 426?441.
Sullivan, J., Abdo, Z., Joyce, P., Swo?ord, D.L., 2005. Evaluating the
performance of a successive-approximations approach to parameter
optimization in maximum-likelihood phylogeny estimation. Mol. Biol.
Evol. 22, 1386?1392.
Swo?ord, D.L., 2002. PAUP*. Phylogenetic Analysis Using Parsimony*
(and other methods). Version 4.0. Sinauer Associates, Sunderland,
MA.
Swo?ord, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. Phyloge-
netic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. (Eds.),
Molecular Systematics, 2nd ed. Sinauer Associates, Sunderland, MA,
pp. 407?543.
Tavare?, S., 1986. Some probabilistic and statistical problems in the
analysis of DNA sequences. Lec. Math. Life Sci. 17, 57?86.
Templeton, A.R., 1983. Phylogenetic inference from restriction endonu-
clease cleavage site maps with particular reference to the evolution of
humans and the apes. Evolution 37, 221?244.
Torres-Carvajal, O., Schulte II, J.A., Cadle, J.E., 2006. Phylogenetic
relationships of South American lizards of the genus Stenocercus
(Squamata: Iguania): a new approach using a general mixture model
for gene sequence data. Mol. Phylogenet. Evol. 39, 171?185.
Weisrock, D.W., Macey, J.R., Ugurtas, I.H., Larson, A., Papenfuss, T.J.,
2001. Molecular phylogenetics and historical biogeography among
salamandrids of the ??true? salamander clade: rapid branching of
numerous highly divergent lineages in Mertensiella luschani associated
with the rise of Anatolia. Mol. Phylogenet. Evol. 18, 434?448.
Wilgenbusch, J., de Queiroz, K., 2000. Phylogenetic relationships among
the phrynosomatid sand lizards inferred from mitochondrial DNA
sequences generated by heterogeneous evolutionary processes. Syst.
Biol. 49, 592?612.
Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic
analysis of large biological sequence datasets under the maximum
likelihood criterion. Ph.D. dissertation, The University of Texas at
Austin.
716 J.A. Schulte II, K. de Queiroz /Molecular Phylogenetics and Evolution 47 (2008) 700?716