THE UNIVERSITY OF KANSAS
MUSEUM OF NATURAL HISTORY
SPECIAL PUBLICATION No. 19
OCTOBER 1991

THE COMPLEAT CLADIST
A Primer of Phylogenetic Procedures

E. O. Wiley
Museum of Natural History, The University of Kansas, Lawrence, Kansas 66045

D. Siegel-Causey
Museum of Natural History, The University of Kansas, Lawrence, Kansas 66045

D. R. Brooks
Department of Zoology, The University of Toronto, Toronto, Ontario M5S 1A1, CANADA

V. A. Funk
Department of Botany, National Museum of Natural History, The Smithsonian Institution, Washington, D.C. 20560

MUSEUM OF NATURAL HISTORY
DYCHE HALL
THE UNIVERSITY OF KANSAS
LAWRENCE, KANSAS
1991

THE UNIVERSITY OF KANSAS MUSEUM OF NATURAL HISTORY
Joseph T. Collins, Editor
Kimberlee Wollter, Copy editing and Design
Kate Shaw, Design and Typesetting

© 1991 Museum of Natural History, The University of Kansas, Lawrence, Kansas 66045-2454, USA

Special Publication No. 19
pp. x + 1-158; 122 figures; 60 tables
Published October 1991

Text for this publication was produced on a Macintosh II computer in Microsoft Word, and figures were drafted in Claris MacDraw II. The publication was then designed and typeset in Aldus PageMaker and forwarded to the printer.

PRINTED BY THE UNIVERSITY OF KANSAS PRINTING SERVICE, LAWRENCE, KANSAS

ISBN 0-89338-035-0

"...
but, he that hopes to be a good Angler must not onely bring an inquiring, searching, observing wit, but he must bring a large measure of hope and patience, and a love and propensity to the Art it self; but having once got and practised it, then doubt not but Angling will prove to be so pleasant, that it will prove like Vertue, a reward to it self."

Piscator speaking to Venator and Auceps
The Compleat Angler by Izaak Walton
The Modern Library Printing of the Fourth (1668) Edition
Random House, New York

PREFACE

In writing this workbook, we have strived to follow in the tradition of Brooks et al. (1984) of providing a guide to basic phylogenetic techniques as we now understand them. The field of phylogenetics has undergone many changes, some philosophical and some empirical, in the last 10 years. We hope to reflect some of these changes in this workbook.

The workbook is arranged in a manner roughly functional and pedagogical. The sophisticated reader, for example, might question why we spend so much time with exercises (in Chapter 2) that do not really reflect the way "real" phylogenetic analysis is performed or why so much space is given to Hennig Argumentation and Wagner algorithms (Chapter 4) when they are either not a part of modern computer algorithms or, at best, are only a starting point for finding the best hypothesis of common ancestry for the particular data analyzed. Our answer, perhaps constrained by our own histories, is that this approach seems to help at least some students learn phylogenetics.

Because we can only touch on the most basic topics and provide exercises for only a few of these, we invite the student to explore the original literature; we have cited sources in the body of the text and called attention to some papers in the "Chapter Notes and References" sections at the end of each chapter. The absence of a paper in the text or in the "Notes" section is no reflection on the worth of the paper.
A compilation of all of the useful papers relating to phylogenetics is quite beyond the scope of this workbook.

In addition to the exercises, we have provided immediate feedback sections termed "Quick Quizzes." We are interested in reader opinion regarding both the exercises and the quick quizzes. We will incorporate suggestions, wherever possible, in subsequent editions.

The major title, The Compleat Cladist, is inspired by the title The Compleat Angler by Izaak Walton, a marvelous book published in many editions since 1653. Of course, this book is not "complete" or even "compleat" in the archaic sense of representing a book that teaches complete mastery of a subject. Phylogenetics is much too dynamic for a small workbook to fulfill that criterion. Rather, we take our inspiration from Walton; the compleat cladist is one who approaches the subject with energy, wonder, and joy. Unfortunately, none of us are clever enough to come up with an analogy to the "Anglers Song."

We thank the following people for their valuable comments on part or all of the earlier drafts of this workbook: the students of Biology 864 and Mike Bamshad (University of Kansas), John Hayden (University of Richmond), Debbie McLennan (University of Toronto), David Swofford (Illinois Natural History Survey), Charlotte Taylor, Richard Thomas, and Rafael Joglar (University of Puerto Rico), Wayne Maddison (Harvard University), and Arnold Kluge (University of Michigan). Special thanks are due to David Kizirian (University of Kansas) for working through the answers to the exercises and to Kate Shaw and Kim Wollter (University of Kansas) for their editorial skills. Partial support in the form of computer hardware and software by the National Science Foundation (BSR 8722562) and the University of Kansas Museum of Natural History is gratefully acknowledged.
Mistakes in interpretation and exercise answers are our responsibility, and we will be grateful for any suggestions and corrections for incorporation into future editions.

E. O. Wiley, D. Siegel-Causey, D. R. Brooks, and V. A. Funk
Lawrence, Kansas; Toronto, Ontario; and Washington, D.C.
Summer, 1991

CONTENTS

PREFACE

CHAPTER 1: INTRODUCTION, TERMS, AND CONCEPTS
    Terms for Groups of Organisms
    Quick Quiz: Groups
    Terms for the Relationships of Taxa
    Quick Quiz: Relationships
    Terms for Classifications
    Quick Quiz: Classification
    Process Terms
    Terms for the Attributes of Specimens
    Quick Quiz: Characters
    Chapter Notes and References
    Quick Quiz Answers

CHAPTER 2: BASIC PHYLOGENETIC TECHNIQUES
    Quick Quiz: Basic Rules of Analysis
    Sample Analyses
    Exercises
    Chapter Notes and References
    Quick Quiz Answers

CHAPTER 3: CHARACTER ARGUMENTATION AND CODING
    Outgroup Comparison
    Polarity Decisions
    Rules of Thumb
    Other Situations
    Quick Quiz: Outgroups and Polarities
    Polarity Exercises
    Character Coding
    Quick Quiz: Character Coding
    Coding Exercises
    Chapter Notes and References
    Quick Quiz Answers

CHAPTER 4: TREE BUILDING AND OPTIMIZATION
    Hennig Argumentation
    Hennig Exercises
    The Wagner Algorithm
    Wagner Definitions
    The Algorithm
    Wagner Tree Exercises
    Optimal Trees and Parsimony Criteria
    Optimizing Trees
    ACCTRAN
    ACCTRAN Exercises
    Discussion
    Finding MPR Sets
    DELTRAN
    DELTRAN Exercises
    Current Technology
    Chapter Notes and References

CHAPTER 5: TREE COMPARISONS
    Summary Tree Measures
    Tree Length
    Consistency Indices
    Ensemble Consistency Indices
    The F-Ratio
    Tree Summaries Exercises
    Consensus Techniques
    Strict Consensus Trees
    Adams Trees
    Majority Consensus Trees
    Chapter Notes and References

CHAPTER 6: CLASSIFICATION
    Evaluation of Existing Classifications
    Logical Consistency
    Determining the Number of Derivative Classifications
    Classification Evaluation Exercises
    Constructing Phylogenetic Classifications
    Rules of Phylogenetic Classifications
    Conventions
    Quick Quiz: Taxonomy vs. Systematics
    Convention Exercises
    Chapter Notes and References
    Quick Quiz Answers

CHAPTER 7: COEVOLUTIONARY STUDIES
    Coding Phylogenetic Trees
    Quick Quiz: Biogeography
    Single Tree Exercises
    More Than One Group
    Missing Taxa
    Widespread Species
    Sympatry within a Clade
    The Analogy between Phylogenetics and Historical Biogeography
    Chapter Notes and References
    Quick Quiz Answers

LITERATURE CITED

ANSWERS TO EXERCISES
    Chapter 2
    Chapter 3
    Chapter 4
    Chapter 5
    Chapter 6
    Chapter 7

CHAPTER 1

INTRODUCTION, TERMS, AND CONCEPTS

The core concept of phylogenetic systematics is the use of derived or apomorphic characters to reconstruct common ancestry relationships and the grouping of taxa based on common ancestry. This concept, first formalized by Hennig (1950, 1966), has been slowly, and not so quietly, changing the nature of systematics. Why should we be interested in this approach? What about phylogenetic systematics is different from traditional systematics? The answer is simple: classifications that are not known to be phylogenetic are possibly artificial and are, therefore, useful only for identification and not for asking questions about evolution.

There are two other means of making statements of relationship: traditional systematics and phenetics. Traditional systematic methods employ intuition. In practical terms, intuition is character weighting. The scientist studies a group of organisms, selects the character(s) believed to be important (i.e., conservative), and delimits species and groups of species based on these characters.
Disagreements usually arise when different scientists think different characters are important. It is difficult to evaluate the evolutionary significance of groups classified by intuition because we do not know why they were created or whether they represent anything real in nature. Because these groups may not be defined at all or may be defined by characters that have no evolutionary significance, such groups may be artificial.

Phenetics is an attempt to devise an empirical method for determining taxonomic relationships. In practice, phenetics is no better than traditional systematics in determining relationships because the various algorithms concentrate on reflecting the total similarity of the organisms in question. Organisms that appear to be more similar are grouped together, ignoring the results of parallel or convergent evolution and again creating possibly artificial groups.

Phylogeneticists differ from traditional systematists in that we employ empirical methods to reconstruct phylogenies and strictly evolutionary principles to form classifications rather than relying on intuition or authority. We differ from pheneticists in that our methods seek to find the genealogic relationships among the taxa we study rather than the phenetic or overall similarity relationships.

What all this means is that the groups we discover are thought to be natural, or monophyletic. Given any array of taxa, which two are more closely related to each other than either is to any other taxon? We attempt to discover the common ancestry relationships indirectly through finding evidence for common ancestry. This evidence comes in the form of shared derived characters (synapomorphies).
For example, among Aves (birds), Crocodylia (alligators and crocodiles), and Squamata (lizards, snakes, and amphisbaenians), Aves and Crocodylia are thought to be more closely related because they share a number of synapomorphies thought to have originated in their common ancestor, which appeared after (later than) the common ancestor of all three taxa. This relationship is shown in the form of a phylogenetic tree, a reconstruction of the genealogic relationships.

In addition, phylogeneticists view the reconstructed tree (frequently termed a cladogram) as the classification, and when expressing it in a hierarchical scheme, we insist on maintaining monophyletic groups and sister-group relationships. The discovery of monophyletic groups is the basic quest of phylogenetics. Going to all the trouble of finding the groups and then throwing them away does not make sense to us.

Ever since the general theory of evolution gained acceptance, systematists have sought the one evolutionary history for organisms and have tried to fit that history into a hierarchical structure. We seek to reflect in our classifications the groups that we find in nature. Because phylogenetic reasoning delimits groups based on common ancestry, we can attempt to reconstruct evolutionary histories and from them develop a hierarchical ranking scheme. Phylogenetic groups are then a reflection of the order in nature. Therefore, our classifications can be used for the study of other characters and for further investigations in biogeography, coevolution, molecular evolution, rates of evolution, ecology, etc. If you wish to use classifications to study evolution, they must reflect the genealogy of the taxa in question. Groups that are potentially artificial cannot be used in such investigations.
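
As a minimal sketch of what it means to maintain monophyletic groups on a reconstructed tree, the Aves, Crocodylia, and Squamata example can be written as a nested structure and queried. The nested-tuple representation and the function names (`leaf_sets`, `is_monophyletic`) are our own illustrative assumptions, not part of any phylogenetic program. The check treats a group as monophyletic only if it matches exactly the set of terminal taxa that descend from some branch point; terminal taxa are simply assumed to be natural.

```python
def leaf_sets(tree):
    """Return (terminal taxa under this node, all groups defined by nodes below it)."""
    if isinstance(tree, str):              # a terminal taxon
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves = frozenset()
    groups = set()
    for child in tree:                     # an internal node: visit each descendant branch
        child_leaves, child_groups = leaf_sets(child)
        leaves |= child_leaves
        groups |= child_groups
    groups.add(leaves)                     # the group formed by this node itself
    return leaves, groups


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of terminals descending from one node."""
    _, groups = leaf_sets(tree)
    return frozenset(group) in groups


# Hypothesis from the text: Aves and Crocodylia share a more recent common ancestor.
TREE = (("Aves", "Crocodylia"), "Squamata")

print(is_monophyletic(TREE, ["Aves", "Crocodylia"]))      # True
print(is_monophyletic(TREE, ["Crocodylia", "Squamata"]))  # False
```

On this hypothesis, a group uniting Crocodylia and Squamata while leaving out Aves fails the check, which is the sense in which such a group would be considered potentially artificial.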
One of the greatest strengths of the phylogenetic system is that the method and results are transparent, meaning that decisions, whether right or wrong, are based on data that can be examined by any and all persons willing to understand the nature of the data. The phylogenetic system does not depend on some special and mysterious knowledge about organisms that only the "expert" can understand. A critic cannot claim that your idea of the phylogenetic history of a group is wrong just because he has studied the groups longer than you have. Of course, there are valid disagreements, and there is room for change and improvement. But these disagreements are data based, not opinion based. Phylogenetics, to put it crudely, is a put-up-or-shut-up scientific discipline.

This workbook presents the basics of phylogenetic systematics as we use it today. We also cite references for those interested in following some of the debates currently underway among the proponents of phylogenetic systematics. We hope that this information will stimulate you and illustrate the importance of systematics as the basis of comparative biology. When you have finished this workbook, you should be able to reread this introduction and understand what we are trying to accomplish. As an acid test, go read Hennig (1966); it's the way we got started, and it remains the classic in the field.

All new scientific ideas and analytical methods are accompanied by new sets of terms and concepts, which can be unsettling to the tyro and even more unsettling to the experienced systematist who is called upon to abandon the "traditional" meanings of terms and embrace new meanings. The basic rationale for adopting the definitions and concepts presented in this workbook is twofold. First, it is vitally important for systematics and taxonomy to be integrated into the field of evolutionary theory.
Willi Hennig's major motivation for reforming systematics and taxonomy was to bring them in line with the Darwinian Revolution, making the results obtained through phylogenetic systematics directly relevant to studies in other fields of evolutionary research. Second, it is vitally important that the terms used in an empirical field be as unambiguous as possible so that hypotheses are as clear as possible.

With these rationales in mind, we offer the following definitions for the basic terms in our field. They are largely taken from Hennig (1966) or Wiley (1980, 1981a). Other more specialized terms will be introduced in other chapters.

TERMS FOR GROUPS OF ORGANISMS

1. Taxon. A taxon is a group of organisms that is given a name. The name is a proper name. The form of many of these proper names must follow the rules set forth in one of the codes that govern the use of names. The relative hierarchical position of a taxon in a classification can be indicated in many ways. In the Linnaean system, relative rank is denoted by the use of categories. You should not confuse the rank of a taxon with its reality as a group. Aves is a taxon that includes exactly the same organisms whether it is ranked as a class, an order, or a family.

2. Natural taxon. A natural taxon is a group of organisms that exists in nature as a result of evolution. Although there are many possible groupings of organisms, only a few groupings comprise natural taxa. In the phylogenetic system, there are two basic kinds of natural taxa: species and monophyletic groups. A species is a lineage. It is a taxon that represents the largest unit of taxic evolution and is associated with an array of processes termed speciation. A monophyletic group is a group of species that includes an ancestral species and all of its descendants (Fig. 1.1a). Members of monophyletic groups share a set of common ancestry relationships not shared with any other species placed outside the group.
In other terms, a monophyletic group is a unit of evolutionary history. Examples include Mammalia and Angiospermae.

3. Clade. A clade is a monophyletic group, i.e., a natural taxon.

4. Ancestral taxon. An ancestral taxon is a species that gave rise to at least one new daughter species during speciation, either through cladogenesis or reticulate speciation. By cladogenesis we mean speciation that results in two or more branches on the phylogenetic tree where there was only one branch before. By reticulate speciation we mean the establishment of a new species through a hybridization event involving two different species. A species that emerged from cladogenesis has one ancestral species but a species emerging from reticulate speciation has two ancestral species. In the phylogenetic system, only species can be ancestral taxa. Groups of species are specifically excluded from being ancestral to other groups of species or to single species. The biological rationale for this distinction is clear; there is an array of processes termed speciation that allow for one species to give rise to another (or two species to give rise to a species of hybrid origin), but there are no known processes that allow for a genus or a family to give rise to other taxa that contain two or more species ("genusation" and "familization" are biologically unknown). Thus, each monophyletic group begins as a single species. This species is the ancestor of all subsequent members of the monophyletic group.

Fig. 1.1. Examples of monophyletic (a), paraphyletic (b), and polyphyletic (c) groups.

5. Artificial taxon. An artificial taxon is one that does not correspond to a unit involved in the evolutionary process or to a unit of evolutionary history. You will encounter two kinds of artificial groups.
Paraphyletic groups are artificial because one or more descendants of an ancestor are excluded from the group (Fig. 1.1b). Examples include Dicotyledonae, Vermes, and Reptilia. Polyphyletic groups are artificial because the common ancestor is placed in another taxon (Fig. 1.1c). An example would be the Homeothermia, a group composed of birds and mammals. Note that the term "ancestor" is used in its logical sense, i.e., the ancestor is unknown but its inclusion or exclusion can be deduced as a logical consequence of the grouping. The important contrast is between monophyletic groups and nonmonophyletic groups. Paraphyletic groups are as artificial as polyphyletic groups. Further, it is not always possible to distinguish clearly the status of a group as either paraphyletic or polyphyletic.

6. Grade. A grade is an artificial taxon. Grade taxa are frequently paraphyletic and sometimes polyphyletic but are supposed to represent some level of evolutionary progress, level of organization, or level of adaptation (e.g., Reptilia or Vermes).

7. Ingroup. The ingroup is the group actually studied by the investigator (Fig. 1.2a). That is, it is the group of interest.

8. Sister group. A sister group is the taxon that is genealogically most closely related to the ingroup (Fig. 1.2a). The ancestor of the ingroup cannot be its sister because the ancestor is a member of the group.

Fig. 1.2. A rooted (a) and unrooted (b) tree for the group ABC and two of its outgroups, N (the sister group) and M. Parts labeled in the figure: 2nd outgroup, sister group, ingroup, branch, node, internode.

9. Outgroup. An outgroup is any group used in an analysis that is not included in the taxon under study. It is used for comparative purposes, usually in arguments concerning the relative polarity of a pair (or series) of homologous characters. The most important outgroup is the sister group, and considerable phylogenetic research may be needed to find the sister group.
Usually more than one outgroup is needed in an analysis. This will become apparent in Chapter 3.

Quick Quiz: Groups

Examine Fig. 1.1 and answer the following:
1. Why do we say that the group A+B+C and the group M+N are monophyletic?
2. Which taxa would have to be either included or excluded to change the paraphyletic groups into monophyletic groups?
3. Can polyphyletic groups ever contain monophyletic groups within them?
4. Where are the ancestors in these diagrams?

TERMS FOR THE RELATIONSHIPS OF TAXA

1. Relationship. In the phylogenetic system, the term relationship refers to the genealogic or "blood" relationship that exists between parent and child or between sister and brother. In other systems, relationship can also refer to similarity, with the evolutionary implication that taxa that are more similar to each other are more closely related. This meaning is specifically excluded from the phylogenetic system.

2. Genealogy and genealogic descent. A genealogy is a graphic representation of the descent of offspring from parents. Genealogic descent on the taxon level (i.e., between groups recognized as taxa) is based on the proposition that species give rise to daughter species through an array of mechanisms termed speciation.

3. Tree. A tree is a branching structure and, in our sense, might contain reticulations as well as branches. A tree may be rooted (Fig. 1.2a) or unrooted (Fig. 1.2b) and is composed of several parts. A branch is a line connecting a branch point to a terminal taxon. A branch point, or node, represents a speciation event. This is true even if the taxa joined by the branch point are higher taxa such as families or phyla, because higher taxa originated as species. Branch points are sometimes represented by circles. An internode is a line connecting two speciation events and represents at least one ancestral species.
(We say at least one because the statement is made relative to the species and groups we actually know about. It is always possible to find a new species or group of species that belongs to this part of the phylogeny. To make this addition, we would bisect the internode and create the possibility for an additional ancestral species.) The internode at the bottom of the tree is given the special term root. The term interval is a synonym of internode and is used in the Wagner algorithm (see Chapter 4). A neighborhood is an area of a tree relative to a particular taxon or taxa. In Fig. 1.2b, taxon B is the nearest neighbor of taxa A and C. Note that A may or may not be the sister of a monophyletic group B+C. This relationship cannot be established until the root is specified.

4. Phylogenetic tree. A phylogenetic tree is a graphic representation of the genealogic relationships between taxa as these relationships are understood by a particular investigator. In other words, a phylogenetic tree is a hypothesis of genealogic relationships on the taxon level. Although it is possible for an investigator to actually name ancestors and associate them with specific internodes, most phylogenetic trees are common ancestry trees. Further, phylogenetic trees are hypotheses, not facts. Our ideas about the relationships among organisms change with increasing understanding.

5. Cladogram. Cladograms are phylogenetic trees. They have specific connotations of implied ancestry and a relative time axis. Thus, a cladogram is one kind of phylogenetic tree, a common ancestry tree. In some modifications of the phylogenetic system, specifically what some have termed Transformed Cladistics, the cladogram is the basic unit of analysis and is held to be fundamentally different from a phylogenetic tree. Specifically, it is purely a depiction of the derived characters shared by taxa with no necessary connotation of common ancestry or relative time axis.

6.
Venn diagram. A Venn diagram is a graphic representation of the relationships among taxa using internested circles and ellipses. The ellipses take the place of internode connections. A typical Venn diagram is contrasted with a phylogenetic tree in Fig. 1.3.

Fig. 1.3. A phylogenetic tree (a) and a Venn diagram (b) of three groups of tetrapod vertebrates (Aves, Crocodylia, Lepidosauria).

Quick Quiz: Relationships

Examine Fig. 1.2a and answer the following:
1. What is the sister group of the clade N?
2. What is the sister group of the clade M?
3. What is the sister group of a group composed of M+N?
4. Where is the hypothetical ancestor of the ingroup on the tree?
5. How many ancestors can a group have?
6. Draw a Venn diagram of Fig. 1.2a.

TERMS FOR CLASSIFICATIONS

1. Natural classification. A classification containing only monophyletic groups and/or species is a natural classification. A natural classification is logically consistent with the phylogenetic relationships of the organisms classified as they are understood by the investigator constructing the classification. That is, the knowledge claims inherent in a natural classification do not conflict with any of the knowledge claims inherent in the phylogenetic tree. The protocols for determining if a classification is logically consistent with a phylogenetic tree are given in Chapter 6.

2. Artificial classification. An artificial classification is a classification containing one or more artificial groups (i.e., one or more paraphyletic or polyphyletic groups). An artificial classification is logically inconsistent with the phylogenetic relationships of the organisms as they are understood by the investigator making the classification. That is, some of the knowledge claims inherent in the classification conflict with knowledge claims in the phylogenetic tree.

3.
Arrangement. An arrangement is a classification of a group whose phylogenetic relationships are not known because no investigator has ever attempted to reconstruct the evolutionary history of the group. The vast majority of current classifications are arrangements. A particular arrangement may turn out to be either a natural or an artificial classification. Arrangements serve as interim and completely necessary vehicles for classifying organisms until the phylogenetic relationships of these organisms can be worked out.

4. Category. The category of a taxon indicates its relative place in the hierarchy of the classification. The Linnaean hierarchy is the most common taxonomic hierarchy and its categories include class, order, family, genus, and species. The formation of the names of taxa that occupy certain places in the hierarchy is governed by rules contained in various codes of nomenclature. For example, animal taxa ranked at the level of the category family have names that end in -idae, whereas plant taxa ranked at this level have names that end in -aceae. It is important to remember that the rank of a taxon does not affect its status in the phylogenetic system. To the phylogeneticist, all monophyletic taxa are equally important and all paraphyletic and polyphyletic taxa are equally misleading.

Classifications and arrangements are usually presented as hierarchies of names, with relative position in the hierarchy (rank) noted by categories. However, these classifications can be portrayed as tree diagrams and as Venn diagrams. The use of these methods of presenting classifications is discussed in Chapter 6.

Quick Quiz: Classification

1. In the phylogenetic system, must the taxa be clades?
2. In the phylogenetic system, must categories be clades?
3. Which is more important, a phylum or a genus?

PROCESS TERMS

Three process terms are of particular importance in the phylogenetic system. Speciation results in an increase in the number of species in a group.
Speciation is not a single process but an array of processes. Cladogenesis is branching or divergent evolution and is caused by speciation. Anagenesis is change within a species that does not involve branching. The extent to which anagenesis and cladogenesis are coupled is an interesting evolutionary question but not a question that must be settled to understand the phylogenetic system.

TERMS FOR THE ATTRIBUTES OF SPECIMENS

1. Character. A character is a feature, that is, an observable part of, or attribute of, an organism.

2. Evolutionary novelty. An evolutionary novelty is an inherited change from a previously existing character. The novelty is the homologue of the previously existing character in an ancestor/descendant relationship. As we shall see below, novelties are apomorphies at the time they originate.

3. Homologue. Two characters in two taxa are homologues if one of the following two conditions is met: 1) they are the same as the character that is found in the ancestor of the two taxa or 2) they are different characters that have an ancestor/descendant relationship described as preexisting/novel. The ancestral character is termed the plesiomorphic character, and the descendant character is termed the apomorphic character. The process of determining which of two homologues is plesiomorphic or apomorphic lies at the heart of the phylogenetic method and is termed character polarization or character argumentation. Three (or more) characters are homologues if they meet condition 2.

4. Homoplasy. A homoplasy is a similar character that is shared by two taxa but does not meet the criteria of homology. Every statement of homology is a hypothesis subject to testing. What you thought were homologues at the beginning of an analysis may turn out to be homoplasies.

5. Transformation series. A transformation series (abbreviated TS in some tables and exercises) is a group of homologous characters.
If the transformation series is ordered, a particular path of possible evolution is specified but not necessarily the direction that path might take. All transformation series containing only two homologous characters (the binary condition) are automatically ordered but not necessarily polarized (contrast Fig. 1.4a and Fig. 1.4b). Transformation series having more than two characters are termed multicharacter (or multistate) transformation series. If a multistate transformation series is unordered (Fig. 1.4c), several paths might be possible. Ordered transformation series are not the same as polarized transformation series (compare Figs. 1.4d and 1.4e). An unpolarized transformation series is one in which the direction of character evolution has not been specified (Figs. 1.4a, c, d). A polarized transformation series is one in which the relative apomorphy and plesiomorphy of characters has been determined by an appropriate criterion (Figs. 1.4b, e). It is possible for a transformation series to be both unordered and polarized. For example, we might know from outgroup comparison that 0 is the plesiomorphic state, but we might not know whether 1 gave rise to 2, or vice versa, or whether 1 and 2 arose independently from 0. Ordering and polarization of multicharacter transformation series can become very complicated, as we shall see in Chapter 3.

Our use of the convention "transformation series/character" differs from that of many authors who use "character" as a synonym for "transformation series" and "character state" as a synonym for "character." We use "transformation series/character" instead of "character/character state" in our research and in this workbook for philosophical reasons. The "character/character state" convention reduces "character" to a term that does not refer to the attributes of organisms but instead to a class construct that contains the attributes of organisms, homologues or not. For example, dandelions do not have "color of flower" as an attribute; they have "yellow flowers." We adopt "transformation series/character" because it explicitly avoids the construction of character classes and implicitly encourages the investigator to use characters hypothesized to be homologues of each other.

Fig. 1.4. Characters. a. Unpolarized binary characters. b. Polarized binary characters. c. An unordered transformation series of three characters. d. The same transformation series ordered but not polarized. e. The same transformation series ordered and polarized.

6. Character argumentation. Character argumentation is the logical process of determining which characters in a transformation series are plesiomorphic and which are apomorphic. Character argumentation is based on a priori arguments of an "if, then" deductive nature and is based on outgroup comparison. This process is frequently termed "polarizing the characters." Polarity refers to which of the characters is plesiomorphic or apomorphic. Character argumentation will be covered in detail in Chapter 3.

7. Character optimization. Character optimization consists of a posteriori arguments as to how particular characters should be polarized given a particular tree topology. Character optimization might seem a priori when used in a computer program, but it is not.

8. Character code and data matrix. Phylogenetic systematists are quickly converting to computer-assisted analysis of their data. When using a computer, the investigator produces a data matrix. Usually, the columns of the matrix are transformation series and the rows are taxa. A code is the numerical name of a particular character.
By convention, the code "1" is usually assigned to the apomorphic character and "0" to the plesiomorphic character of a transformation series if the polarity of that series is determined (=hypothesized) by an appropriate method of polarization. If the transformation series consists of more than two characters, additional numerical codes are assigned. Alternatively, the matrix might be coded using binary coding as discussed in Chapter 3. There are many ways of reflecting the code of a character when that character is placed on a tree. We will use the following convention: characters are denoted by their transformation series and their code. The designation 1-1 means "transformation series 1, character coded 1." Some basic ways of coding characters are discussed in Chapter 3.

9. Tree length. The length of a tree is usually considered the number of evolutionary transformations needed to explain the data given a particular tree topology.

You will probably need some time to assimilate all of the definitions presented. A good strategy is to review this chapter and then go to Chapter 2, working your way through the examples. We have found that deeper understanding comes from actual work. Although we cannot pick a real group for you to work on, we have attempted the next best thing, a series of exercises designed to teach basic phylogenetic techniques that we hope will mimic real situations as closely as possible.

Quick Quiz: Characters

1. How would the transformation series in Fig. 1.4c look if it were polarized and unordered?
2. Is character "1" in Fig. 1.4e apomorphic or plesiomorphic?

CHAPTER NOTES AND REFERENCES

1. There is no substitute for reading Hennig (1966). We suggest, however, that you become familiar with most of the basics before attempting to read the 1966 text. Hennig (1965) is the most accessible original Hennig. Other classics include Brundin (1966) and Crowson (1970).
An interesting analysis of Hennig's impact on systematics can be found in Dupuis (1984). A considerable portion of the history of phylogenetic thought (and indeed post-1950 systematics) can be followed in a single journal, Systematic Zoology. We highly recommend that students examine this journal.

2. Post-Hennig texts that are suitable for beginners are Eldredge and Cracraft (1980), Wiley (1981a), Ridley (1985), Schoch (1986), Ax (1987), and Sober (1988a). A more difficult text written from the point of view of the transformed cladists is Nelson and Platnick (1981).

3. A very readable review of the entire field of systematics is Ridley (1985), whose defense of phylogenetics and criticisms of traditional (evolutionary) taxonomy, phenetics, and transformed cladistics are generally on the mark.

QUICK QUIZ ANSWERS

Groups

1. They are monophyletic because no descendant of their respective common ancestor is left out of the group.
2. To make the group O+A+B monophyletic, you would have to include C. To make the group N+O+A+B+C monophyletic, you could either include M or exclude N.
3. Yes; e.g., N+A+B+C contains the monophyletic group ABC.
4. You were pretty clever if you answered this one because we haven't covered it yet. The ancestors are represented by internodes between branches. Obviously they are hypothetical because none of them are named.

Relationships

1. The ingroup (A+B+C) is the sister group of N.
2. N plus the ingroup is the sister group of M.
3. A group composed of M and N is paraphyletic. Paraphyletic groups are artificial and thus cannot have sister groups.
4. The internode labeled "Internode."
5. A bunch, stretching back to the origin of life. But we usually refer only to the immediate common ancestor.
6.

Classification

1. Only clades (monophyletic groups and species) are permitted in the phylogenetic system. Grades are specifically rejected.
As you will see in Chapter 6, this is because classifications that contain even a single grade are logically inconsistent with the phylogeny of the group containing the grade.
2. Categories are not taxa. They are designations of relative rank in a classification. As such, categories are neither clades nor grades.
3. All monophyletic taxa are equally important and interesting to the phylogeneticist.

Characters

1. 1 -<- ->- 2
2. Character 1 is both apomorphic and plesiomorphic. It is apomorphic relative to 0 and plesiomorphic relative to 2.

CHAPTER 2

BASIC PHYLOGENETIC TECHNIQUES

Phylogenetic systematists work under the principle that there is a single and historically unique genealogic history relating all organisms. Further, because characters are features of organisms, they should have a place on the tree representing this history. The proper place for a character on the tree is where it arose during evolutionary history. A "proper" tree should be one on which the taxa are placed in correct genealogic order and the characters are placed where they arose. For example, in Fig. 2.1 we show a tree of some major land plant groups with some of their associated characters. This tree can be used to explain the association of characters and taxa. The characters xylem and phloem are placed where they are because the investigator has hypothesized that both arose in the common ancestor of mosses and tracheophytes. In other words, they arose between the time of origin of the hornworts and its sister group. Xylem and phloem are thought to be homologous in all plants that have these tissues.
Thus, each appears only once, at the level of the tree where each is thought to have arisen as an evolutionary novelty.

Fig. 2.1. The phylogenetic relationships among several groups of plants (after Bremer, 1985). Synapomorphies and autapomorphies for each group are listed. Some characters are not shown. [Taxa: Liverworts, Hornworts, Mosses, Tracheophytes. Characters listed: oil bodies; elaters in sporangium; lunularic acid; pseudo-elaters in sporangium; spore production nonsynchronous; intercalary meristem in sporophyte; leaves on gametophyte; multicellular rhizoids; true lignin; ornamented tracheid walls; independent sporophyte; branched sporophyte; xylem; phloem; polyphenolics in xylem wall; perine layer on spores; aerial sporophyte axis; ability to distinguish D-methionine; stomates.]

Now, if we did not have this tree but only the characters, we might suspect that all of the taxa that have xylem and phloem shared a common ancestor that was not shared by taxa that lack xylem and phloem. Our suspicion is based on a complex of hypotheses, including the fact that xylem and phloem from different plants are very similar, being composed of a few basic cell types or obvious derivatives of these cell types. In the phylogenetic system, such detailed similarity is always considered evidence for homology, which necessarily implies common origin. This concept is so important that it has been given the name Hennig's Auxiliary Principle (Hennig, 1953, 1966).

Hennig's Auxiliary Principle. Never assume convergence or parallel evolution; always assume homology in the absence of contrary evidence.

This principle is a powerful one. Without it, we must pack up and go home because we could exclude any similarity that we don't wish to deal with by asserting that "it probably arose by convergent evolution." Of course, just because we use Hennig's Auxiliary Principle doesn't mean that we believe that convergences are rare or nonexistent.
Convergences are facts of nature and are rather common in some groups. But to pinpoint convergence, you first have to have a tree, and without Hennig's Auxiliary Principle you will never get to a tree because you will be too worried that the characters you study are convergences. Hennig is simply suggesting that you sit back and let his method worry about convergences rather than doing something rash and ad hoc.

Back to the xylem and phloem. With Hennig's Auxiliary Principle, you can deduce that plants that have xylem and phloem shared a common ancestor not shared with other plants. Of course, you don't make such a deduction in a vacuum. You "know" that more primitive plants lack xylem and phloem, and thus it's a good guess that having xylem and phloem is more derived than lacking xylem and phloem. This deduction is a primitive sort of outgroup comparison, which we will discuss in some detail in Chapter 3. For now, we want you to consider another principle.

Grouping Rule. Synapomorphies are evidence for common ancestry relationships, whereas symplesiomorphies, convergences, and parallelisms are useless in providing evidence of common ancestry (Hennig, 1966).

Intuitively, you know convergences and parallelisms (both termed homoplasies) are useless for showing common ancestry relationships because they evolved independently. However, plesiomorphies are also homologies. So, why can't all homologies show common ancestry relationships? The answer is that they do. It's just that symplesiomorphies can't show common ancestry relationships at the level in the hierarchy you are working at because they evolved earlier than any of the taxa you are trying to sort out. In addition, they have already been used at the level where they first appeared. If they hadn't been used, you would not be where you are.
For example, you would never hypothesize that pineapples are more closely related to mosses than to some species of mistletoe based on the plesiomorphy "presence of chlorophyll." If you accepted that as valid evidence, then you would have to conclude that pineapples are also more closely related to green algae than to mistletoes.

A common complaint by traditional taxonomists is that "cladists" use only part of the data available to them (Cronquist, 1987). This is not true, as the above example demonstrates. What we do is to attempt to find the correlation between the relative age and origin of characters.

Finally, we have to consider how to combine the information from different transformation series into hypotheses of genealogic relationships. There are several ways of accomplishing this, depending on the algorithm you use. We will find out more about this as we proceed through the workbook. For now, we will use an old-fashioned (and perfectly valid) grouping rule that goes back to the roots of the phylogenetic method, the inclusion/exclusion rule. This rule is implicit in the early work of Hennig (1966), as well as being used as an explicit rule in the much later group compatibility algorithm developed by M. Zandee (Zandee and Geesink, 1987).

Inclusion/Exclusion Rule. The information from two transformation series can be combined into a single hypothesis of relationship if that information allows for the complete inclusion or the complete exclusion of groups that were formed by the separate transformation series. Overlap of groupings leads to the generation of two or more hypotheses of relationship because the information cannot be directly combined into a single hypothesis.

The inclusion/exclusion rule is directly related to the concept of logical consistency. Trees that conform to the rule are logically consistent with each other. Those trees that show overlap are logically inconsistent with each other.
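The rule can be stated compactly with sets of taxa. The following is a minimal sketch in Python, offered only as an illustration; the function names relation and combinable are our own hypothetical names and are not taken from Zandee and Geesink's algorithm. Each grouping is treated as a set of taxon labels, and two groupings are compared for inclusion, exclusion, or overlap.

```python
from itertools import combinations


def relation(group_a, group_b):
    """Classify two groupings of taxa under the inclusion/exclusion rule."""
    a, b = frozenset(group_a), frozenset(group_b)
    if a <= b or b <= a:
        return "inclusion"   # one grouping lies completely inside the other
    if a.isdisjoint(b):
        return "exclusion"   # the groupings share no taxa at all
    return "overlap"         # partial sharing: logically inconsistent


def combinable(groupings):
    """True if no pair of groupings overlaps (hypothetical helper)."""
    return all(relation(a, b) != "overlap"
               for a, b in combinations(groupings, 2))


# Placeholder taxon labels, chosen only for illustration.
print(relation({"N", "O", "P"}, {"O", "P"}))   # inclusion
print(relation({"S", "T"}, {"U", "V"}))        # exclusion
print(relation({"B", "C"}, {"C", "D"}))        # overlap
```

Groupings that never overlap can be combined into one hypothesis of relationship; a single overlapping pair is enough to signal two or more competing hypotheses.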
Logical consistency and inconsistency can be shown graphically using Venn diagrams. You can get an idea of how this rule works by studying the examples in Fig. 2.2. In Fig. 2.2a, we have four characters and four trees. The first tree contains no character information. It is logically consistent with any tree that has character information. The second tree states that N, O, and P form a monophyletic group based on characters from two transformation series (1-1 and 2-1). The third tree states that O and P form a monophyletic group based on two additional characters (3-1 and 4-1). Note that O+P is one of the possible groupings that could be found in the group N+O+P, and N+O+P completely includes O+P. The fourth tree combines these logically consistent hypotheses of relationship. Thus, these data lead to two groupings that are logically consistent with each other. The second example, Fig. 2.2b, shows the result of the inclusion of two smaller monophyletic groups (S+T) and (U+V) within a larger group (S-V). In Fig. 2.2c, we have an example of the violation of the inclusion/exclusion rule. All six transformation series imply groupings that can be included within the larger group A-D. Both C+B and C+D can be included within the group B+C+D. However, their knowledge claims conflict, and the groups overlap (Fig. 2.2d). Transformation series 1-1 and 2-1 imply a group C+B while excluding D, and transformation series 3-1 and 4-1 imply a group C+D while excluding B. C is included in two different groups, as shown by the Venn diagram in Fig. 2.2d. As a result, there are two equally parsimonious trees that are logically inconsistent with each other. To resolve which of these trees (or another tree) is preferable, we would have to analyze more data.

Fig. 2.2. Three examples (a-c) of the use of the inclusion/exclusion rule for combining the information of different transformation series into trees. d. A Venn diagram showing the logical inconsistency in c.

Quick Quiz: Basic Rules of Analysis

1. Does it follow from Hennig's Auxiliary Principle that birds and insects share a common ancestor not shared with, say, crocodiles because both have wings and are capable of flight?
2. Lizards and crocodiles have amniotic eggs. Does it follow from the Grouping Rule that lizards and crocodiles share a common ancestor?
3. How can you tell that "presence of chlorophyll a" is a plesiomorphy rather than a synapomorphy?

SAMPLE ANALYSES

We will cover the complexities of character argumentation in the next few chapters. The exercises below are based on the proposition that the outgroup has plesiomorphic characters. You can determine which character of the transformation series is plesiomorphic by simple inspection of the outgroup. (By the time you get through with Chapter 3, you will see that such a simple rule doesn't always hold, but it's good enough to get through these exercises.) Grouping is accomplished by application of the Grouping Rule. We will first take you through two exercises. Then we present a series of data matrices for you to work with. (Solutions to all exercises are in the back.)

Example 2.1. The relationships of ABCidae.

1. Examine transformation series (TS) 1 in Table 2.1. It is composed of characters in the first column of the data matrix. We can draw a tree with the groupings implied by the synapomorphy found in the transformation series. We can do the same for TS 2. Our results look like the two trees to the left in Fig. 2.3. Because both imply the same groupings, we can say that they are topologically identical. That is, they are isomorphic. The combination of the two trees, by applying the Grouping Rule, is the tree on the right.
We can calculate a tree length for this tree by simply adding the number of synapomorphies that occur on it. In this case, the tree length is two steps.

Table 2.1. Data matrix for ABCidae (Example 2.1).

                Transformation series
Taxon           1  2  3  4  5  6  7
X (outgroup)    0  0  0  0  0  0  0
A               1  1  0  0  0  0  0
B               1  1  1  1  0  0  0
C               1  1  1  1  1  1  1

Fig. 2.3. Trees for transformation series 1 and 2 (Example 2.1).

2. If we inspect TS 3 and TS 4, we see that the synapomorphies have identical distributions, both implying that B and C form a monophyletic group (Fig. 2.4). If we put both of these synapomorphies on a tree, the results should look like the tree on the right in Fig. 2.4. What is the length of this tree?

Fig. 2.4. Trees for transformation series 3 and 4 (Example 2.1).

3. Note that only C has the apomorphies in TS 5, 6, and 7. Unique (single occurrence) apomorphies are termed autapomorphies. They are not useful for grouping, as we can see in Fig. 2.5, but they are useful for diagnosing C. Autapomorphies also count when figuring tree length but not when comparing trees. The length of this tree is three steps.

Fig. 2.5. Tree for transformation series 5, 6, and 7 (Example 2.1).

4. We now have three different tree topologies. If we look at them closely, we can see that although the three trees are topologically different, they do not contain any conflicting information. For example, the tree implied by TS 5-7 does not conflict with the trees implied by the other transformation series because all that TS 5-7 imply is that C is different from the other three taxa. Further, TS 1 and 2 do not conflict with TS 3 and 4 because TS 1 and 2 imply that A, B, and C form a monophyletic group, whereas TS 3 and 4 imply that B and C form a monophyletic group but say nothing about the relationships of A or the outgroup, X.
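The groupings read off Table 2.1 in steps 1 to 3 can also be tabulated mechanically. What follows is a minimal sketch in Python, assuming (as these exercises do) that whatever code the outgroup shows is the plesiomorphic one; the function name groups_by_series is hypothetical and the dictionary layout is our own choice.

```python
# Table 2.1, one row of codes per taxon (transformation series 1 to 7).
TABLE_2_1 = {
    "X": [0, 0, 0, 0, 0, 0, 0],   # outgroup
    "A": [1, 1, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 1, 1],
}


def groups_by_series(matrix, outgroup="X"):
    """Map each implied group of taxa to the transformation series behind it.

    Assumption for this sketch: the outgroup code is plesiomorphic, so any
    ingroup taxon with a different code carries the apomorphy.
    """
    groups = {}
    for index, outgroup_code in enumerate(matrix[outgroup]):
        members = frozenset(
            taxon for taxon, codes in matrix.items()
            if taxon != outgroup and codes[index] != outgroup_code
        )
        groups.setdefault(members, []).append(index + 1)  # TS numbers start at 1
    return groups


for members, series in groups_by_series(TABLE_2_1).items():
    print(sorted(members), "supported by TS", series)
```

For this matrix the sketch reports A+B+C for TS 1 and 2, B+C for TS 3 and 4, and C alone for TS 5 to 7, the same three groupings obtained by inspection.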
Trees that contain different but mutually agreeable groupings are logically compatible or fully congruent. They can be combined without changing any hypothesis of homology, and the length of the resulting tree is the sum of the lengths of each subtree. For example, we have combined all of the information in the data matrix to produce the tree in Fig. 2.6. Its length is seven steps, the total of the number of steps of the subtrees.

Fig. 2.6. The best estimate of the common ancestry relationships of A, B, and C, given the data in Example 2.1.

Example 2.2. Analysis of MNOidae.

The first thing you should notice about this matrix (Table 2.2) is that it has more characters scored as "1." Let's work through it.

Table 2.2. Data matrix for MNOidae (Example 2.2).

                Transformation series
Taxon           1  2  3  4  5  6  7
X (outgroup)    0  0  0  0  0  0  0
M               1  1  0  0  1  1  1
N               1  1  1  1  1  1  1
O               1  1  1  1  0  0  0

1. TS 1 and TS 2 imply that M, N, and O form a monophyletic group as shown in Fig. 2.7.

Fig. 2.7. Tree for transformation series 1 and 2 (Example 2.2).

2. TS 3 and TS 4 imply that N and O form a monophyletic group (Fig. 2.8).

Fig. 2.8. Tree for transformation series 3 and 4 (Example 2.2).

3. Finally, TS 5, 6, and 7 imply that M and N form a monophyletic group (Fig. 2.9).

Fig. 2.9. Tree for transformation series 5, 6, and 7 (Example 2.2).

4. At this point you should suspect that something has gone wrong. TS 3 and 4 imply a monophyletic group that includes N and O but excludes M, whereas TS 5-7 imply a monophyletic group that includes M and N but excludes O. There must be a mistake, because we have violated the inclusion/exclusion rule. In such a situation, we invoke another important principle of phylogenetic analysis: there is only one true phylogeny. Thus, one of our groupings must be wrong.
(In fact, they might both be wrong, but the Auxiliary Principle keeps us going until such time that we demonstrate that both are wrong.) In this situation, we are faced with two logically incompatible trees (Fig. 2.10). Note that there is some congruence because both trees have the apomorphies of the first two transformation series.

Fig. 2.10. Trees for the two different sets of consistent transformation series (Example 2.2).

5. You should have guessed by now that neither of the trees shown above is really a complete tree. The tree on the right lacks TS 5, 6, and 7, and the one on the left lacks TS 3 and 4. Leaving out characters is not acceptable. About the only way that you can get into more trouble in phylogenetic analysis is to group by symplesiomorphies. Before we start, consider how characters might be homoplasious. A character might be a convergence/parallelism, or it might be a reversal to the "plesiomorphic" character. We must consider both kinds of homoplasies. In Fig. 2.11a, TS 3 and 4 are put on the tree under the assumption that 3-1 and 4-1 arose independently (i.e., via convergence/parallelism). In Fig. 2.11b, we have placed 5-1, 6-1, and 7-1 on the alternate tree as convergences. In Fig. 2.11c, we assume that 3-1 and 4-1 arose in the common ancestor of the group and that M has reverted to the plesiomorphic character. Thus, 3-0 and 4-0 appear on the tree as autapomorphies of M. We have done the same thing for O in Fig. 2.11d for TS 5-7, given the alternative hypothesis.

Fig. 2.11. Alternative hypotheses of the relationships of M, N, and O based on characters of Example 2.2. Characters showing convergence/parallelism or reversal (homoplasies) are marked with a symbol.

6. The question is: which of these trees should we accept? That turns out to be a rather complicated question.
If we adhere to the Auxiliary Principle, we should strive for two qualities, the greatest number of homologies and the least number of homoplasies. These qualities are usually consistent with each other; that is, the tree with the greatest number of synapomorphies is also the tree with the least number of homoplasies. But you can find exceptions to this. Fortunately, both of these qualities are related to tree length. When you count the number of steps in the four trees in Fig. 2.11, you will find that trees a and c have nine steps, and trees b and d have 10 steps. We accept trees a and c as the best estimates of the phylogeny because they have a shorter length and thus the greatest number of statements of homology and the fewest statements of homoplasy for the data at hand. Note that such statements are relative only to trees derived from the same data set. The Auxiliary Principle coupled with the principle that there is only one phylogeny of life carried us to this point. Methodologically, we have employed the principle of parsimony. In the phylogenetic system, the principle of parsimony is nearly synonymous with the Auxiliary Principle. We can see three additional characteristics. First, trees a and c are topologically identical. Therefore, the common ancestry relationships hypothesized are identical. Second, these two trees make different claims concerning character evolution. Third, they are equally parsimonious; therefore, we cannot make a choice about character evolution unless we employ some parsimony criterion other than tree length.

7. Finally, we can evaluate the performance of each character originally coded as a synapomorphy by calculating a consistency index for it. The consistency index (CI) of a character is simply the reciprocal of the number of times that a character appears on the tree.
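The step counts quoted in item 6 and the index just defined can be checked by hand or with a few lines of code. The sketch below is an illustration only, written in Python. It counts steps with the set-intersection procedure usually credited to Fitch, which is not the hand method used in this chapter, and it writes each tree as nested pairs of taxon names. The names TREE_MN, TREE_NO, fitch, and the other functions are hypothetical, and the 1/steps shortcut for the CI assumes binary transformation series such as those in Table 2.2.

```python
# Table 2.2, one row of codes per taxon (transformation series 1 to 7).
TABLE_2_2 = {
    "X": (0, 0, 0, 0, 0, 0, 0),   # outgroup
    "M": (1, 1, 0, 0, 1, 1, 1),
    "N": (1, 1, 1, 1, 1, 1, 1),
    "O": (1, 1, 1, 1, 0, 0, 0),
}

# The two competing topologies, written as nested pairs (our own notation).
TREE_MN = ("X", ("O", ("M", "N")))   # M+N grouped, as in Fig. 2.11a and c
TREE_NO = ("X", ("M", ("N", "O")))   # N+O grouped, as in Fig. 2.11b and d


def fitch(tree, column):
    """Return (possible codes at this node, steps below it) for one series."""
    if isinstance(tree, str):                      # a terminal taxon
        return {TABLE_2_2[tree][column]}, 0
    left_codes, left_steps = fitch(tree[0], column)
    right_codes, right_steps = fitch(tree[1], column)
    shared = left_codes & right_codes
    if shared:                                     # no change needed here
        return shared, left_steps + right_steps
    return left_codes | right_codes, left_steps + right_steps + 1


def steps_per_series(tree):
    return [fitch(tree, column)[1] for column in range(len(TABLE_2_2["X"]))]


def tree_length(tree):
    return sum(steps_per_series(tree))


def consistency_indices(tree):
    # For binary series that vary among taxa, CI = 1 / observed steps.
    return [1 / steps for steps in steps_per_series(tree)]


print(tree_length(TREE_MN), tree_length(TREE_NO))   # 9 10
print(consistency_indices(TREE_MN))
```

Because this procedure counts steps on a topology without choosing between convergence and reversal, it returns a single length for trees a and c, in keeping with the observation that the two are equally parsimonious.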
The CI is a favorite summary "statistic" in computer programs such as PAUP (Swofford, 1990) and MacClade (Maddison and Maddison, in press); therefore, it is good to practice some hand calculations so that you will know how the CI works. We will discuss this index and other measures of tree comparisons in Chapter 5. For example, in one most parsimonious tree (Fig. 2.11a), the apomorphy coded 1 in TS 3 appears twice, so its CI is 1/2 = 0.5. For a given tree, we can see that a character is not really a synapomorphy by simple inspection of its CI. True homologues (real synapomorphies) have CIs of 1.0. Of course, our best estimates of true homologues come a posteriori, that is, in reference to the best estimate of common ancestry relationships. We do not know in advance that a particular derived similarity will turn up with a CI less than 1.0.

EXERCISES

For each of the exercises below do the following:

1. Derive trees for each transformation series or each set of transformation series with the same distribution of synapomorphies (like TS 1 and TS 2 in Example 2.1).

2. Combine the logically consistent subtrees into the shortest tree or trees accounting for all of the transformation series. Don't forget to account for the homoplasies as well as the synapomorphies and autapomorphies. In some data matrices, there will only be one such tree, in others there will be two. Tip: Search the trees obtained above for groups that recur. Use these first (for example, sus+tus and vus+uus in Exercise 2.1).

3. Calculate the length of each tree and the CI for each character originally coded as a synapomorphy.

EXERCISE 2.1. Analysis of Sus (Table 2.3).

Table 2.3. Data matrix for analysis of Sus (Exercise 2.1).

Transformation series: 1 2 3 4 5 6 7 8 9 10 11 12
Outgroup: 0 0 0 0 0 0 0 0 0 0 0 0
S. sus: 0 1 1 1 1 1 0 1 0 0 0
S. tus: 1 1 1 1 1 0 0 0 1 0 0
S. uus: 0 1 1 0 0 0 0 0 0 0 0
S. vus: 1 0 0 0 0 1 1 0 0 1 0
S. wus: 1 0 0 0 0 1 1 0 0 0 1

EXERCISE 2.2. Analysis of Midae (Table 2.4).

Table 2.4. Data matrix for analysis of Midae (Exercise 2.2).

Transformation series: 1 2 3 4 5 6 7 8 9 10 11
Outgroup: 0 0 0 0 0 0 0 0 0 0 0
Mus: 1 0 1 1 1 0 0 1 1 1
Nus: 1 0 0 1 1 1 0 0 1 1
Ous: 1 0 0 1 1 1 0 0 1 1
Pus: 0 1 1 0 0 0 1 0 0 0
Qus: 1 1 1 0 1 1 0 0 0 0
Rus: 1 1 1 0 0 1 0 0 0 0

EXERCISE 2.3. Analysis of Aus (Table 2.5).

Table 2.5. Data matrix for analysis of Aus (Exercise 2.3).

Transformation series: 1 2 3 4 5 6 7 8 9 10
Outgroup: 0 0 0 0 0 0 0 0 0 0
A. aus: 0 1 0 1 1 0 0 0 0
A. bus: 0 1 0 1 0 1 0 0 0
A. cus: 1 1 0 0 0 0 0 0 1
A. dus: 1 0 1 0 0 0 1 0 0
A. eus: 1 0 1 0 0 1 0 1 0

CHAPTER NOTES AND REFERENCES

1. All of the texts cited in Chapter 1 cover the fundamentals of reconstructing phylogenetic relationships, but they each do so from a slightly to very different point of view. The inclusion/exclusion criterion is not usually seen in the form we present it. Indeed, it is not the way one usually goes about doing phylogenetic reconstructions. We adopted our approach because it seemed to be the simplest one to use to teach basic principles. The inclusion/exclusion approach is explicit in the "group compatibility" approach of Zandee and Geesink (1987).

2. Considerable controversy surrounds the philosophical nature of phylogenetic hypothesis testing and its relationship to parsimony. Farris (1979, 1983), Sober (1983), Kluge (1984, 1989), and Brooks and Wiley (1988) present parsimony as the relevant criterion for judging competing hypotheses. This is in direct contrast to Wiley (1975, 1981a), who attempted to reconcile parsimony and the hypothetico-deductive approach of Popper (1965), or to Felsenstein (1978, 1983), who argued that parsimony might not be the preferred criterion. Most of this controversy has been summarized by Sober (1988a).

QUICK QUIZ ANSWERS

1. Not when you examine the wings in detail.
Insect wings have a completely different structure when compared with bird wings. They are so different that the best hypothesis is that the wings are not homologous. This leads to the hypothesis that flight has evolved independently in each group. Further, there is much evidence in the form of other characters that leads to the hypothesis that flying insects are more closely related to other insects that do not have wings and that birds are more closely related to other vertebrates.

2. Yes, it does follow. However, many other vertebrate groups, such as birds and mammals, also have an amniotic egg. Thus, while lizards and crocodiles certainly share a common ancestor, we cannot hypothesize that they share a common ancestor not shared also with birds and mammals. Rather, the amniotic egg is a character that provides evidence for a common ancestry relationship among all amniotes (as a synapomorphy), and its presence in lizards and crocodiles is a plesiomorphic homologous similarity.

3. You can deduce that the presence of chlorophyll a is a plesiomorphy in the same manner that you can deduce whether a character is an apomorphy, by outgroup comparison. This is covered in the next chapter.

CHAPTER 3

CHARACTER ARGUMENTATION AND CODING

This chapter is designed to teach the following skills: 1) interpretation of a phylogenetic tree in terms of nodes and internodes, 2) polarization of characters at nodes and internodes on the phylogenetic tree according to the criterion of phylogenetic parsimony as evidenced by outgroup comparison, and 3) character coding.

OUTGROUP COMPARISON

Hennig (1966) and Brundin (1966) characterized the essence of phylogenetic analysis as the "search for the sister group." They recognized that if you can find the closest relative or relatives of the group you are working on, then you have the basic tools for deciding which characters are apomorphic and which are plesiomorphic in a transformation series. The argument goes something like this.
As an investigator, you see that members of your group have two different but homologous characters, "round pupils" and "square pupils." As a phylogeneticist, you know that one of these characters, the apomorphic one, might diagnose a monophyletic group, but both cannot (the Grouping Rule). If you think about it, Hennig's reasoning becomes clear. If you find square pupils in the sister group of the taxon you are studying, then it is fairly clear that "square pupils" is older than "round pupils," and if this is true, then "square pupils" must be the plesiomorphic character in the transformation series. Therefore, reasoned Hennig, the characteristics of the sister group are vital in making an intelligent decision regarding polarity within the taxon studied. The simplest rule for determining polarity can be stated in the following way.

Rule for Determining Relative Apomorphy. Of two or more homologous characters found within a monophyletic group, that character also found in the sister group is the plesiomorphic character, and the one(s) found only in the ingroup is (are) the apomorphic one(s).

As it turns out, actual polarity decisions can be a little more complicated than our simple example. What if, for example, we don't know the exact sister group but only an array of possible sister groups? What if the sister group is a monophyletic group, and it also has both characters? What if our group is not monophyletic? What if "square pupils" evolved in the sister group independently?

POLARITY DECISIONS

The answers to these questions depend on our ability to argue character polarities using some formal rules. The most satisfactory discussion of these rules was published by Maddison et al. (1984). We will present the case developed by them for situations where the phylogenetic relationships among the outgroups and the relationships of these taxa to the ingroup are known.
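Before turning to those formal rules, the simple Rule for Determining Relative Apomorphy stated above can be written out as a short sketch. The Python below is an illustration under stated assumptions: the function name polarize is hypothetical, characters are represented as plain strings, and the sketch declines to answer in exactly the awkward cases raised above (for example, a sister group that shows both characters).

```python
def polarize(ingroup_characters, sister_group_characters):
    """Apply the simple sister-group rule to one transformation series.

    Returns (plesiomorphic character, list of apomorphic characters), or
    None when the simple rule cannot decide, that is, when the sister
    group shares none or more than one of the ingroup characters.
    """
    ingroup = set(ingroup_characters)
    shared = ingroup & set(sister_group_characters)
    if len(shared) != 1:
        return None
    plesiomorphic = next(iter(shared))
    apomorphic = sorted(ingroup - shared)
    return plesiomorphic, apomorphic


pupils = {"round pupils", "square pupils"}
print(polarize(pupils, {"square pupils"}))
print(polarize(pupils, pupils))   # sister group has both characters: undecided
```

The undecided cases are the reason more formal rules are needed; the sketch makes no attempt to resolve them.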
Some handy shortcuts, such as the "doublet rule," will also be covered. However, before we can examine these rules, we need to learn some terms.

The ingroup node (IG node) represents the ancestor of the taxon we will eventually be analyzing, once we determine the polarities of our transformation series. The same characters are assigned to the IG node as would be assigned to the ancestral internode, and the terms are interchangeable. If we determine that "square pupils" is primitive, then "square pupils" is assigned to the IG node. To know what is plesiomorphic within the group, we only need to know what character was found at the IG node. In this case, because "square pupils" is assigned to the IG node, the character "round pupils" can be used to diagnose a possible monophyletic group within the IG. Unfortunately, the ancestor isn't around, and thus we must infer what character it had. In Fig. 3.1, the character "square pupils" is coded "a," and "round pupils" is coded "b."

The outgroup node (OG node) is the node immediately below the IG node. Don't be confused (like some of our students) and think that the OG node refers to characters that are associated with the IG internode; instead, the OG node is associated with characters of the internode immediately below it (in a manner similar to the IG node). So, the general rule is that the internode is associated with the node directly above it. This is important because characters are usually put on internodes rather than nodes; therefore, it is important to remember that these characters belong to the node above them rather than the node below them.

Characters are designated by small letters and are placed where taxa are usually labeled. Letters are used purely as a heuristic device and to avoid connotations of primitive and derived. The ingroup in Fig. 3.1a is indicated by a polytomy because we presume that the relationships among members are unknown. In all other diagrams, the ingroup is indicated by a shaded triangle, which means exactly the same thing as the polytomy but is easier to draw. Note that the ingroup always has both characters. The relationships among outgroups can either be resolved (Fig. 3.1b) or unresolved (Fig. 3.1c). A decision regarding the character found at the outgroup node may be either decisive (Fig. 3.1d) or equivocal (Fig. 3.1e). If decisive, then we know that the best estimate of the condition found in the ancestor of our ingroup is the character in question (in this case, "a" is plesiomorphic and "b" is apomorphic). If equivocal, then we are not sure; either "a" or "b" could be plesiomorphic.

[Tree diagrams. Labels in panel a: OG, IG, characters of a TS, IG node, IG internode, OG node. Labels in panels b and c: OGs, IG.]
Fig. 3.1. a. Tree illustrating some general terms used in this chapter. b. Known outgroup relationships. c. Unknown outgroup relationships. d. A decisive character polarity decision with "a" at the OG node. e. An equivocal character polarity decision with "a,b" at the OG node.

Maddison et al. (1984) treat the problem of polarity as one in which the investigator attempts to determine the character to be assigned to the OG node. In effect, it is the character of the ancestor of the ingroup and its sister group (first outgroup) that will give us information about the characters of the common ancestor of the ingroup. Simple parsimony arguments are used in conjunction with an optimization routine developed by Maddison et al. (1984) that was built on the earlier routines of Farris (1970) and Fitch (1971). There are two cases. The first case is relatively complete and is built on known relationships among the outgroups relative to the ingroup. The second case is where the relationships among the outgroups are either unknown or only partly resolved. Because the first case is the simplest, we will use it to describe the general algorithm.

To illustrate the algorithm, we will use the following character matrix (Table 3.1) for the hypothetical group M-S.
Sidae is the ingroup, and M, N, O, P, Q, and R are outgroups. The sister group is PQR.

Table 3.1. Data matrix for the analysis of Sidae (Example 3.1).

                     Taxon
TS    M    N     O    P    Q    R    Sidae
1     b    a     a    b    b    a    a,b
2     b    b     a    b    b    a    a,b
3     a    b     b    b    b    a    a,b
4     a    a,b   a    b    b    a    a,b

Example 3.1. Character polarity in the group Sidae.

1. Draw the phylogenetic tree of the ingroup and outgroups. You cannot reconstruct the entire tree on the basis of the characters in the matrix shown above. These characters relate to the resolution of relationships in the ingroup, not to the relationships of the ingroup to the outgroup taxa. Presumably, you have either done an analysis or you have used the analysis of another investigator. Figure 3.2 shows the result of this previous analysis with the nodes numbered.

[Tree diagram. Terminal taxa: M N O P Q R Sidae.]
Fig. 3.2. The relationships of the Sidae and its closest relatives. Outgroups are letters, nodes are numbers (Example 3.1).

2. For each transformation series, label each of the branches with the character for that taxon. Use the character matrix for this task. This has been done for TS 1 in Fig. 3.3.

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b a a b b a a,b. Labeled nodes: OG node; node 1, the root node.]
Fig. 3.3. The relationships of the Sidae and its relatives, with characters from TS 1 (Table 3.1) and relevant nodes labeled (Example 3.1).

3. Proceeding from the most distant branches (in this case, M and N), do the following. Label the node "a" if the lower node and adjacent branch are both "a" or "a" and "a,b"; label it "b" if the lower node and adjacent branch are "b" or "b" and "a,b." If these branches/nodes have different labels, one "a" and the other "b," then label the node "a,b." Note that node 1 is not labeled; it is termed the root node. For us to label the root, we would need another outgroup. Because we are not interested in the root or in the outgroups, we forget about this node.
After all, we are supposed to be solving the relationships of the ingroup. Node 2, the node immediately above the root, is labeled "a,b" in Fig. 3.4 because the first branch (M) is "b" and the second branch (N) is "a."

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b a a b b a a,b.]
Fig. 3.4. First polarity decision for TS 1, analysis of the Sidae (Example 3.1).

4. Inspect the tree. Do any of the outgroups have a branching structure? In this example, the sister group has a branching structure. For each group of this kind, you need to assign values to their lowest node. So, you work down to the lowest node. Assign to the highest node in such a group a value derived from its two branches. For example, the value "b" is assigned to node 4 in Fig. 3.5.

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b a a b b a a,b.]
Fig. 3.5. Second polarity decision for TS 1, analysis of the Sidae (Example 3.1).

5. Continue in the direction of the ingroup to the next nodes. For this we use a combination of previous decisions (labeled nodes) and new information from terminal taxa whose ancestral nodes have not been labeled. For example, node 3 in Fig. 3.6 is assigned a decisive "a" based on the "a" of O and the "a,b" of node 2. Node 5 is assigned "a,b" based on the "a" of R and the "b" of node 4.

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b a a b b a a,b.]
Fig. 3.6. Third and fourth polarity decisions for TS 1, analysis of the Sidae (Example 3.1).

6. The analysis is over when we reach an assignment concerning the OG node. In this example, the assignment to node 6 is a decisive "a" (Fig. 3.7).

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b a a b b a a,b.]
Fig. 3.7. Assignment of polarity to the OG node for TS 1, analysis of the Sidae (Example 3.1).

Figure 3.8 shows characters of TS 2 of the matrix worked out for each node. Note that in this case the decision is equivocal for the OG node.

[Tree diagram. Taxa: M N O P Q R Sidae. Characters: b b a b b a a,b.]
Fig. 3.8. Polarity decisions for TS 2, analysis of the Sidae (Example 3.1).

One last thing. Each of these decisions is made using a single transformation series at a time. This does not mean that equivocal decisions based on single characters taken will remain equivocal at the end of the analysis. The final disposition of character states is subject to overall parsimony rules.

RULES OF THUMB

Maddison et al. (1984) present two rules of analysis that can be used when sister group relationships are known. These rules will help you bypass some of the argumentation for each node of the tree.

Rule 1: The Doublet Rule. If the sister group and the first two consecutive outgroups have the same character, then that character is decisive for the OG node. Any two consecutive outgroups with the same character are called a doublet.

Rule 2: The Alternating Sister Group Rule. If characters are alternating down the tree, and if the last outgroup has the same character as the sister group, then the character will be decisive for the OG node. If the last outgroup has a different character, then the character decision will be equivocal.

OTHER SITUATIONS

Maddison et al. (1984) also discuss situations in which the relationships among the outgroups are either not resolved or only partly resolved. After you have finished this workbook, you should review their discussion on these topics. We will only mention two important observations. (1) Whatever the resolution of the outgroup relationships, the sister group is always dominant in her influence on the decision. If the sister group is decisive for a particular state, e.g., "a," no topology of outgroups farther down the tree can result in a decisive "b." (2) If you are faced with no sister group but only an unresolved polytomy below the group you are working on, the frequency of a particular character among the outgroups in the polytomy has no effect on the decision for the OG node.
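Returning to Example 3.1, the node-by-node labeling of steps 3 through 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine of Maddison et al. (1984): the function names are hypothetical, the outgroup part of Fig. 3.2 is written as nested pairs as seen from the OG node (node 2 joins M and N, node 3 joins node 2 and O, node 4 joins P and Q, node 5 joins node 4 and R, and node 6 joins nodes 3 and 5), and two "a,b" labels are assumed to combine to "a,b."

```python
# Labels are sets of characters: {"a"}, {"b"}, or {"a", "b"} (equivocal).

def combine(x, y):
    """Step 3 rule: keep what the two labels share; if nothing is shared,
    the node is labeled with both characters."""
    shared = x & y
    return shared if shared else x | y

def label(tree, chars):
    """Label a nested-pair tree; a leaf is a taxon name looked up in chars."""
    if isinstance(tree, str):
        return frozenset(chars[tree].split(","))
    left, right = tree
    return combine(label(left, chars), label(right, chars))

# Outgroups of Fig. 3.2, encoded as seen from the OG node (an assumption
# of this sketch): ((node 2, O), (node 4, R)).
OUTGROUPS = ((("M", "N"), "O"), (("P", "Q"), "R"))

# Transformation series 1 and 2 from Table 3.1 (outgroup taxa only).
TABLE_3_1 = {
    1: dict(M="b", N="a", O="a", P="b", Q="b", R="a"),
    2: dict(M="b", N="b", O="a", P="b", Q="b", R="a"),
}

def og_node(ts):
    """Return the OG node label for a transformation series and whether
    the decision is decisive or equivocal."""
    state = label(OUTGROUPS, TABLE_3_1[ts])
    verdict = "decisive" if len(state) == 1 else "equivocal"
    return ",".join(sorted(state)), verdict

for ts in (1, 2):
    print(ts, *og_node(ts))
```

For TS 1 and TS 2 the sketch gives a decisive "a" and an equivocal "a,b," matching Figs. 3.7 and 3.8. It treats one transformation series at a time, so it says nothing about the overall parsimony rules mentioned above.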
To illustrate the point about unresolved polytomies, you could have 10 possible sister groups with character "a" and one with character "b," and the decision would still be equivocal at the OG node. Thus, common is not the same as plesiomorphic, even among outgroups.

Quick Quiz: Outgroups and Polarities

1. Halfway through your phylogenetic study of the saber-toothed cnidaria, your inquiry suffers a fate worse than death. The supposed world's expert, Professor Fenitico, publishes an arrangement lumping your group with its sister group, placing them both in the same genus. How does this affect your analysis?

2. What happens if all the members of the ingroup have a character not found in the sister group or any other outgroup?

POLARITY EXERCISES

For each of the trees and tables below, determine the state to be assigned to the OG node for each of the transformation series. Show the work for each (labeled) node in the outgroup. Prepare a matrix of your decisions using the labeled nodes as taxa. State your decision as equivocal or decisive. The monophyly of the ingroup and the relationships among the ingroup and the outgroups are assumed in each exercise. (No information in the data matrices is relevant to solving the tree shown in each exercise.)

EXERCISE 3.1. Determine the character assignment for TS 3 and 4 from Table 3.1 for Example 3.1. In TS 4, treat the polymorphic character in taxon N exactly like you would treat an equivocal decision at a node.

EXERCISE 3.2. Use Table 3.2 and the tree in Fig. 3.9.

Table 3.2. Data matrix for Exercise 3.2.

              Transformation series
Taxon    1     2     3     4     5     6
A        a     a     a     a     a     a
B        a     a     a     b     a     b
C        b     a     a     a     b     a
D        b     b     a     b     b     b
E        b     b     b     a     a     b
IG       a,b   a,b   a,b   a,b   a,b   a,b

[Tree diagram. Terminal taxa: A B C D E IG.]
Fig. 3.9. Tree for Exercise 3.2.

EXERCISE 3.3. Use Table 3.3 and the tree in Fig. 3.10.

Table 3.3. Data matrix for Exercise 3.3.

              Transformation series
Taxon    1     2     3     4     5     6
A        a     b     a     a     a     a
B        a     a     b     a     a     a
C        b     b     a     b     a     b
D        b     b     b     a     b     a
E        b     b     a     a     b     b
IG       a,b   a,b   a,b   a,b   a,b   a,b

[Tree diagram. Terminal taxa: A B C D E IG.]
Fig. 3.10. Tree for Exercise 3.3.

EXERCISE 3.4. Use Table 3.4 and the tree in Fig. 3.11.

Table 3.4. Data matrix for Exercise 3.4.

              Transformation series
Taxon    1     2     3     4     5     6
A        a     a     a     a     a     a
B        a     b     b     b     b     b
C        a     b     a     b     b     b
D        b     b     a     b     b     b
E        b     b     b     a     a     b
F        a     b     a     b     a     a
G        a     a     a     a     a     a
IG       a,b   a,b   a,b   a,b   a,b   a,b

[Tree diagram. Terminal taxa: A through G and IG.]
Fig. 3.11. Tree for Exercise 3.4.

EXERCISE 3.5. Use Table 3.5 and the tree in Fig. 3.12.

Table 3.5. Data matrix for Exercise 3.5.

              Transformation series
Taxon    1     2     3     4     5     6
A        a     a     a     a     a     a
B        a     b     a     b     a     a
C        b     a     a     a     b     b
D        b     a     a     b     b     b
E        a     a     a     b     b     b
F        a     a     b     b     b     b
G        b     b     b     b     b     b
H        b     b     b     a     b     b
I        b     b     a     b     b     b
J        a     b     a     a     a     b
K        a     b     a     b     a     a
IG       a,b   a,b   a,b   a,b   a,b   a,b

[Tree diagram. Terminal taxa: A through K and IG.]
Fig. 3.12. Tree for Exercise 3.5.

CHARACTER CODING

As you learned in Chapter 1, a character is a feature of an organism. A character code is a numerical or alphabetical symbol that represents a particular character. We have already used codes in our previous exercises. By using these characters and their codes, you have learned something about the basics of tree reconstruction using classical Hennig argumentation and some of the approaches to determining the polarity of characters through character argumentation. You already know something about different kinds of characters, homologies, analogies, and homoplasies. In this section, you will be introduced to some of the different kinds of derived characters encountered in phylogenetic research and some of the problems associated with assigning codes to these characters. Before you begin, it might be useful to reread the sections in Chapter 1 about the attributes of specimens. (You should be aware that some investigators refer to transformation series as "characters" and characters as "character states."
It is usually quite clear what is being discussed, but this is a potential source of confusion.)

All of the derived characters we have dealt with up to this point are 1) qualitative characters and 2) part of binary transformation series. A binary transformation series consists of a plesiomorphy and its single derived homologue. By convention, the plesiomorphy is coded "0" and the derived homologue is coded "1." As we mentioned in Chapter 1, such binary transformation series are already ordered by virtue of the fact that they are binary.

When an investigator works on a large group, or even a small group that has undergone considerable evolution, she may find that there are several different homologous characters in a transformation series. For example, if she were researching the phylogenetic relationships of fossil and Recent horses, the transformation series containing the characters for the number of toes of the hind foot would contain four different but related characters: four toes, three toes, and one toe in the ingroup and five toes in the outgroups. This kind of transformation series is termed a multistate transformation series. A multistate transformation series contains a plesiomorphic character and two or more apomorphic characters.

Simple binary transformation series present no problem in coding. The investigator codes by outgroup argumentation, according to the information available, producing a matrix full of 0 and 1 values. You have practiced this kind of coding in Chapter 2. Complications arise if there are one or more polymorphic taxa, i.e., taxa with both the plesiomorphic and apomorphic characters. The problem is only critical when both types of characters are found in a single species. In such cases, the taxon can be treated as having both characters, a designation easily handled by available computer programs.
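As a minimal sketch of binary coding with a polymorphic taxon, the Python fragment below codes each taxon as a set of values, so that a taxon with both characters simply carries both codes. The taxa X, Y, and Z and the function name are hypothetical, the pupil characters are borrowed from the earlier example with "square" assumed to be plesiomorphic, and the set representation is an assumption of this sketch rather than the input format of any particular program.

```python
def code_binary(observed, plesiomorphic):
    """Code the characters observed in one taxon: 0 for the plesiomorphy,
    1 for the derived homologue. A polymorphic taxon receives both codes."""
    return {0 if char == plesiomorphic else 1 for char in observed}

# Hypothetical observations; "square" is taken as plesiomorphic
# on the basis of outgroup comparison.
observations = {
    "X": {"square"},
    "Y": {"round"},
    "Z": {"square", "round"},  # polymorphic taxon
}

codes = {taxon: code_binary(chars, "square")
         for taxon, chars in observations.items()}

for taxon, code in codes.items():
    print(taxon, sorted(code))
```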
Multistate transformation series can be grouped in two ways: 1) according to what we know of their evolution and 2) according to the way they are related. Ordered transformation series are those in which the relationships of characters within the transformation series are specified (and presumably "known"). Binary transformation series are the simplest case of ordered transformation series. A polarized ordered transformation series is one in which not only the relationships are specified, but also the direction of evolution. Unordered transformation series are those where the relationships of characters to one another are not specified. This does not mean that we know nothing about the transformation series. Frequently, we know which of the characters is plesiomorphic; we just don't know the order of the derived transformation series. Such a transformation series is partly polarized. If you are using a computer program such as PAUP or MacClade, the program can tell which of the unordered characters is the plesiomorphic one when you root the tree at a specific point or with a specific taxon. The program relies on you, however, to specify this information correctly.

The second way of grouping multistate transformation series is by their relationships. Polarized transformation series may come in several varieties. In the simplest case, the characters might be related in a linear fashion. A linear transformation series consists of characters related to one another in a straight-line fashion such that there are no branches on the character tree (Fig. 3.13). The relationships of these characters can be termed a character tree. It is important to understand that a character phylogeny is not the same as a phylogeny of taxa. A character tree contains only information about the relationships among characters; the distribution of these characters among taxa is shown in descriptions, diagnoses, and character matrices.

[Character tree, a straight line of four characters. Taxa listed under the characters, from the plesiomorphic character to the most derived: OG, A; B, C, D; E, F; G, H.]
Fig. 3.13. A simple linear character tree of four characters. Letters represent taxa in which each character is found.

Linear transformation series present no problems in coding; one simply assigns a value to each character in ascending order. Each value is placed in the data matrix in a single column, and each apomorphy contributes to the length of the tree in an additive fashion. We use the term additive because each instance of evolution is one step along the tree, and counting all of the steps in a straight line shows exactly how much the transformation series has added to the overall tree length. (Such transformation series are often termed additive multistate characters.)

A branching transformation series contains characters that are not related to each other in a straight-line fashion (Fig. 3.14). Such transformation series may present problems because the relationships among the characters are represented by a branching pattern rather than a straight-line pattern. Because of this, the characters cannot be coded in an additive fashion. Such transformation series are also called nonadditive or complex transformation series. Because the characters are not related in a linear fashion, simple additive coding will result in errors in translating the transformation series into a phylogenetic tree.

[Branching character tree of six characters. Taxa by character, from the base upward: OG, A; then B, D; above that, C on one branch and E, G on the other; F and H on separate branches above E, G.]
Fig. 3.14. A complex branching character tree of six characters. Letters represent taxa in which each character is found.

We present two examples of character coding using three techniques. You will probably ask where we came up with the character trees of the two transformation series. This is a good question and one we will return to in a later section. For now, we are only concerned with the formalities of coding and not how one actually determines the character trees.

Example 3.2. A simple linear transformation series.

In Fig. 3.13, we show a simple linear transformation series of four characters.
Below the character tree is an account of the distribution of these characters among nine taxa. The data matrix for this transformation series can be constructed in one of two basic ways. First, we can simply code the transformation series in a linear fashion, assigning a value to each character based on its place in the character phylogeny. We have chosen to code with values ranging from 0 to 3 (Table 3.6).

Table 3.6. Data matrix for a simple linear transformation series coded by the linear and the additive binary methods (Example 3.2).

                               Additive binary coding*
Taxon    Linear coding    C + C/C + C/D    C/C + C/D    C/D
OG       0                0                0            0
A        0                0                0            0
B        1                1                0            0
C        1                1                0            0
D        1                1                0            0
E        2                1                1            0
F        2                1                1            0
G        3                1                1            1
H        3                1                1            1

* Column heads are apomorphic (1) characters. C = circle; C/C = circle/circle; C/D = circle/dot.

We could also use additive binary coding, which is a method that breaks the character down into a number of subcharacters, each represented by its own column of information. For example, because the characters in the transformation series really consist of subsets of related characters, we can consider both "circle/circle" and "circle/dot" as subsets of "circle" because each is derived from "circle" later in the character tree. The first additive binary column in Table 3.6 reflects this fact, coding "oval" as the plesiomorphic character (0) and "circle" plus all of its descendants as apomorphic (1). "Circle/dot" is a subset of "circle/circle." Both "oval" and "circle" are plesiomorphic relative to "circle/circle" so they get a coding of "0," whereas "circle/circle" and its descendant "circle/dot" get a coding of "1." Finally, "circle/dot" is apomorphic relative to "circle/circle" (and "oval" and "circle") so it is coded "1" in the third column. So in total, we have produced three columns to represent the transformation series.
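The additive binary coding just described can be sketched in Python. The sketch assumes a character tree stored as a child-to-parent dictionary (a representation chosen here for illustration, with hypothetical function names) and uses the character distribution of Fig. 3.13. Each derived character gets one column, and a taxon is coded 1 in that column if it has that character or any character descended from it.

```python
# Character tree as child -> parent; the plesiomorphic character has no parent.
LINEAR = {
    "oval": None,
    "circle": "oval",
    "circle/circle": "circle",
    "circle/dot": "circle/circle",
}

# Distribution of the characters among the nine taxa (Fig. 3.13).
TAXA = {
    "OG": "oval", "A": "oval",
    "B": "circle", "C": "circle", "D": "circle",
    "E": "circle/circle", "F": "circle/circle",
    "G": "circle/dot", "H": "circle/dot",
}

def ancestry(char, tree):
    """Return the character together with all of its ancestors."""
    chain = []
    while char is not None:
        chain.append(char)
        char = tree[char]
    return chain

def additive_binary(taxa, tree):
    """One column per derived character: 1 if the taxon has that character
    or one of its descendants, otherwise 0."""
    columns = [c for c, parent in tree.items() if parent is not None]
    return {taxon: [1 if col in ancestry(char, tree) else 0 for col in columns]
            for taxon, char in taxa.items()}

matrix = additive_binary(TAXA, LINEAR)
for taxon, row in matrix.items():
    print(f"{taxon:>2}", *row)
```

Nothing in the function assumes a straight line, so the same representation could in principle hold a branching character tree such as the one in Fig. 3.14.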
Now, go along the rows and add up all of the 1's in the additive binary matrix and put them into a single column. You should find that you have replicated the original linear transformation series of 0-1-2-3.

Either method of coding produces exactly the same phylogeny. But, there are some differences. If you use binary coding, you must keep in mind that formal computer algorithms and the programs that use them cannot tell the difference between three noncorrelated and independent transformation series and a single binary coded transformation series. This doesn't cause problems with phylogenetic analysis, but it can produce results that seem strange in biogeographic analysis and the analysis of coevolutionary patterns, which we will discuss in Chapter 7.

Example 3.3. A branching transformation series.

Let us look at the branching transformation series in Fig. 3.14. The taxa sharing a particular character are shown beside or above the character on the character tree. This transformation series is considerably more complex than the first one. It should be obvious that a single labeling of characters in a linear fashion would result in some misinformation. How do we show these complex relationships? There are two basic methods, nonadditive binary coding and mixed coding. Because we have already seen an example of binary coding, let us turn to this method first.

Review the character tree and then examine the nonadditive binary codings in Table 3.7. Note that "square" is apomorphic relative to "triangle," by outgroup comparison, and that "square" is ancestral to all other characters in the character tree. Our first binary column reflects this fact: "square" and all of its descendants are coded "1," whereas "triangle" is coded "0." "Square/square" is derived from "square." "Rectangle" is also directly derived from "square." Look at "square/square." It is only found in taxon C. We produce a new column reflecting this fact.
In this column "square/square" acts like an autapomorphy (which it is). "Rectangle" does not act as an autapomorphy; it is plesiomorphic to two other characters. This fact is used to code "rectangle" in a similar manner to the way we coded "square," as shown in the third column. Since both "rectangle/triangle" and "rectangle/dot" are unique to their respective taxa, F and H, we code them in a manner similar to "square/square." Now, you are able to reconstruct the phylogeny of the group A-H using the two character phylogenies.

Table 3.7. Data matrix for a branching transformation series coded by the nonadditive binary and the mixed methods (Example 3.3).

              Nonadditive binary coding*                     Mixed coding*
Taxon    All except T    S/S    R+    R/T    R/D        T + S + R + R/D    S/S    R/T
OG       0               0      0     0      0          0                  0      0
A        0               0      0     0      0          0                  0      0
B        1               0      0     0      0          1                  0      0
C        1               1      0     0      0          1                  1      0
D        1               0      0     0      0          1                  0      0
E        1               0      1     0      0          2                  0      0
F        1               0      1     1      0          2                  0      1
G        1               0      1     0      0          2                  0      0
H        1               0      1     0      1          3                  0      0

* Column heads are apomorphic (≥1) characters. T = triangle; S = square; S/S = square/square; R+ = rectangle and all descendants; R/D = rectangle/dot; R/T = rectangle/triangle.

Mixed coding is a hybrid between additive binary coding and linear coding. Mixed coding has also been termed nonredundant linear coding. By convention, the longest straight-line branch of the character tree is coded in a linear fashion. Branches off this linear tree are coded in an additive binary fashion. This strategy might save character columns, depending on the asymmetry of the character tree. We can code the section of the character tree that goes "triangle," "square," "rectangle," "rectangle/dot" in a single column (0-1-2-3) in the first column of Mixed coding in Table 3.7. (How do you know to use "rectangle/dot" as the fourth character in the transformation series? Actually, the choice is completely arbitrary; remember, nodes can be freely rotated. We could have just as well used "rectangle/triangle" and coded "rectangle/dot" as the autapomorphy.)
A separate column is then used for "square/square" (column 2), and a final column for "rectangle/triangle."

Basal bifurcations occur when both the outgroup and one of the ingroup taxa have the same character. In this case, assign "1" rather than "0" to the most plesiomorphic character and proceed. This coding strategy serves to "link" the columns and will not add steps to the tree.

Note on character coding. Newer computer algorithms such as PAUP 3.0 can use a character tree directly if the investigator inputs the relationships of the characters. It then uses this information to construct an additive binary matrix for analysis, which the investigator never sees (Swofford, 1990).

Quick Quiz: Character Coding

1. Your research into the systematics of the spade-lipped mugmorts is halted until you resolve the coding of a troublesome transformation series. You have identified the plesiomorphic character, but the remaining six cannot be polarized. What type of transformation series should be considered?

2. The copulatory organ of spade-lipped mugmorts has various colors, including no color at all. The evolutionary sequence of color change is not known except that the sister group and all other outgroups have colorless organs. Although you could opt for a polarized but unordered transformation series, you opt instead for a binary transformation series for each color (e.g., no color [0] to blue [1], no color [0] to green [1], etc.). What effect will your decision have on reconstructing the tree?

CODING EXERCISES

For each of the trees shown below do the following:

1. Determine the possible types of coding strategies that might be used and list them.
2. Explain why certain coding strategies cannot be employed for the particular character tree.
3. Prepare a data matrix for each type of coding strategy you think could be employed.
4. Solve the phylogenetic problem with the data in the matrix.

EXERCISE 3.6. Use Fig. 3.15.
[Character tree: l -> m -> n -> o, with taxa OG, A, B, and C shown under l, m, n, and o, respectively.]
Fig. 3.15. A character tree for four characters. Capital letters represent taxa in which each character (lowercase letters) is found. We use letters rather than numbers to emphasize the difference between a character and a character coding.

EXERCISE 3.7. Use the character tree in Fig. 3.16.

[Character tree diagram. Basal character l, found in OG.]
Fig. 3.16. A character tree for eight characters. Capital letters represent taxa in which each character (lowercase letters) is found.

EXERCISE 3.8. Use the character tree in Fig. 3.17.

[Character tree diagram. Basal character m, found in OG.]
Fig. 3.17. A character tree for nine characters. Capital letters represent taxa in which each character (lowercase letters) is found.

CHAPTER NOTES AND REFERENCES

1. Discussions of homology, different kinds of characters, and basic character argumentation can be found in Wiley (1981a). However, this reference is outdated when it comes to outgroup comparison and contains no information of use on character coding and other "modern" issues.

2. There is a lot of information on outgroups and outgroup comparisons. Maddison et al. (1984) was preceded by Watrous and Wheeler (1981) and a criticism of them by Farris (1982). Wiley (1987b) contains a summary of the three papers. The discussion by Crisci and Stuessy (1980) is, in our opinion, positively misleading and should be avoided. Donoghue and Cantino (1980) discuss one method of outgroup comparison, the outgroup substitution method, that can be useful when relationships among outgroups are problematic.

3. Those who have read some phylogenetic literature will note that we have avoided, until now, any mention of other criteria. Wiley (1981a) discusses several other criteria. Other useful discussions can be found in de Jong (1980) and Stevens (1980). The major bone of contention is what is known as the ontogenetic criterion.
Some, such as Nelson (1978, 1985), Patterson (1982), Rieppel (1985), and Weston (1988), advocate the ontogenetic criterion as a (or the) major criterion for determining polarity. We do not think this is a general criterion (see Brooks and Wiley, 1985; Kluge, 1985, 1988a; O'Grady, 1985; Kluge and Strauss, 1986), but we recognize that it can be used to both check hypotheses of homology (cf. Hennig, 1966; Wiley, 1981a; Patterson, 1982; Kluge, 1988a) and infer polarity under certain assumptions. Before you employ this criterion, you should read Mabee (1989).

4. Some papers of interest on character coding include Farris et al. (1970), Mickevich (1982), O'Grady and Deets (1987), Pimentel and Riggins (1987), and O'Grady et al. (1989). Complications arise if one or more of the taxa have both the plesiomorphic and apomorphic character. Polymorphic taxa have both the plesiomorphic and apomorphic characters. Actually, the problem is only critical when both are found in a single species. Considerable controversy surrounds the coding of such characters, especially when biochemical characters are used. There are two ways of handling such characters: 1) qualitative coding, i.e., coding the taxon as having the apomorphy only and discounting the plesiomorphy, or coding both characters as present and using a computer program such as PAUP that can handle polymorphic data cells, or 2) coding according to the frequency of each character. Swofford and Berlocher (1987) present a strong case for analysis of frequency data and suggest computational methods for accomplishing this within a phylogenetic analysis. D. L. Swofford (pers. comm.) has authored a computer program (FREQPARS) to accomplish this task. Buth (1984) is an excellent introduction to the use of electrophoretic characters.

QUICK QUIZ ANSWERS

Outgroups and Polarities

1. This question has no simple answer. We suggest the following. Examine the paper.
Has Professor Fenitico provided synapomorphies to support his argument? If not, then he has only produced another arrangement and not a scientific hypothesis you can evaluate, so you should proceed with your problem as if nothing had been published. If he does provide synapomorphies, what is the nature of these characters? Do they demonstrate that the sister group is monophyletic? Is the sister group still the sister group, even if it is now in the same genus? If so, then the nature of our character argumentation has not changed, only the taxonomy, which might be very important to Professor Fenitico but should not be important to you. However, if Professor Fenitico has demonstrated that the supposed sister group is really embedded within your group, then take this into consideration, redesign your arguments, and write Professor Fenitico to tell him that names really don't mean anything, especially his.

2. If this character is really unique to the ingroup, then it is a synapomorphy of the members of the group (or, if you wish, an autapomorphy of the group).

Character Coding

1. You can opt for an unordered transformation series or you can try coding six binary ordered series. If you pick the binary series, check answer 2.

2. Because "no color" is symplesiomorphic, repeated use of this character in different transformation series will result in an answer that has no bearing on the relationships among the taxa. The effect is to render all of the color characters autapomorphic, which implies that all are independently derived from "no color." Better see answer 1 and opt for an unordered transformation series.

CHAPTER 4

TREE BUILDING AND OPTIMIZATION

Phylogeneticists frequently describe their work in terms of "building trees" or "reconstructing phylogenies."
These activities are directed towards attempts to discover something we believe exists in nature, the common ancestry relationships among the organisms we study. Interestingly, modern computer programs do not spend much computing time building trees. Rather, most of the time is spent evaluating different tree topologies (branching patterns) in an effort to find the tree that meets a criterion of optimality given the data. How the tree is actually generated may be irrelevant. For example, you can evaluate all of the possible trees for a three-taxon problem by simply mapping the character distributions on the four possible trees in the most efficient manner (i.e., maximizing the number of synapomorphies and minimizing the number of homoplasies needed given the tree). You don't have to build a tree; all the possible trees are given. Under the criterion that the shortest tree is the optimal tree, all you have to do is count the changes and pick the shortest tree among the four possibilities. Of course, as the number of taxa increases, the number of possible trees increases very quickly (see Chapter 6).

Because phylogenetic methods were originally built around constructing trees, many of the classic works emphasize reconstruction, and they do so using methods such as Hennig argumentation (Hennig, 1966) and the Wagner algorithm (Kluge and Farris, 1969). Although many "modern" phylogeneticists will never use classic Hennig argumentation, and algorithms such as the Wagner algorithm provide only the starting point for some (not even all) modern computer programs, it is important for you to get the feel of these approaches because they give insight into the nature of phylogenetic data and help you understand how previous investigators arrived at their conclusions. Thus, we have organized this chapter in a quasi-historical fashion. We begin with Hennig argumentation, Hennig's own method for reconstructing phylogenies.
We then use the Wagner algorithm to teach the rudiments of a formal algorithm, show how such an algorithm might be implemented, and introduce some basic terms encountered in more modern methods. We then discuss the concepts of the optimal tree, optimal character distribution, various parsimony criteria, and ACCTRAN and DELTRAN optimization. Finally, we provide a very brief discussion of how current algorithms operate to produce optimal or near optimal trees.

HENNIG ARGUMENTATION

You already have had practice at performing analyses using Hennig argumentation in Chapter 2. However, you did it in a rather laborious way, using the inclusion/exclusion principle. Hennig argumentation was the original phylogenetic algorithm, and its application is still common.

For simple problems, Hennig argumentation presents no technical problems. However, even with relatively few taxa and characters, you will find it much too tedious to make all of those single-character trees and all of the logically incompatible alternative trees. In computer implementations of phylogenetics, the computer performs this boring task. An investigator working without a computer has two problems. First, she would never want to make all the inclusion/exclusion trees. Second, with even a small amount of homoplasy the investigator runs the chance of missing equally parsimonious solutions to the tree that she finds, not because the algorithm is defective but because the human mind rejects the ennui of considering all possible alternative trees.

A really experienced phylogeneticist instead will "inspect" the data matrix and produce a first tree based on this general inspection, filtering through the data in her mind. We would like you to try this on the exercises below. One strategy, for example, would be to circle potential groupings within the data matrix.
Another strategy is to start with "obvious groups" (= lots of synapomorphies) and then attempt to link them together. Although this might seem rather imprecise to you, remember that it is similar to the method Hennig himself probably used. Remember, no tree you draw has to be the final tree. All trees are hypotheses.

Hennig Exercises

EXERCISE 4.1. Bremer's (1978) Leysera data.

Leysera is a small genus of composite shrublets found in southern Africa (three species) and the Mediterranean region (one species). The closest relatives of Leysera appear to be the genera Athrixia, Rosenia, and Relhandia. The phylogenetic relationships of these genera are as follows. Leysera, Rosenia, and Relhandia form a trichotomy. Athrixia is the sister of these three genera. Leysera is monophyletic based on two characters: 1) the chromosome number is 2N = 8 and 2) all have solitary capitula on a long peduncle. The distribution of characters among the four species of Leysera is given in Table 4.1.

Table 4.1. Leysera ingroup characters.

Taxon            Receptacle  Floret tubules  Pappus type  Achene surface  Pappus scales  Life cycle
L. longipes      smooth      with glands     barbellate   smooth          subulate       perennial
L. leyseroides   rough       with hairs      plumose      rough           wide, flat     annual
L. tennella      rough       with hairs      plumose      rough           wide, flat     annual
L. gnaphalodes   rough       with hairs      plumose      rough           subulate       perennial

Based on outgroup information, the following characters are plesiomorphic: 1) receptacles smooth, 2) hairs absent on the floret tubules, 3) barbellate pappus, 4) achene surface smooth, 5) pappus scales subulate, and 6) perennial life cycle.

1. Prepare a data matrix.

2. Analyze the phylogenetic relationships of Leysera based on the information given and draw the tree of relationships.

EXERCISE 4.2. Siegel-Causey's (1988) cliff shags.

Shags, cormorants, and anhingas comprise a clade of marine and littoral fish-eating birds.
Among shags, the genus Stictocarbo (cliff shags) comprises eight species. In this exercise, you will use both outgroup information and ingroup information to reconstruct the relationships among six species of Stictocarbo. The seventh species (S. magellanicus) is used as the sister group, and species in other genera provide additional outgroups.

1. Using the tree in Fig. 4.1 and the characters in Table 4.2, determine the character at the outgroup node for each transformation series and arrange this as a character vector labeled "OG."

Fig. 6.4. Three possible phylogenetic trees of goodeids derived from the relationships implied by the classification in Fig. 6.3.

Working out derivative classifications in tree form is easy when the number of possible derivatives is small. The method becomes cumbersome, however, when the number of possible derivative trees is large. Fortunately, there is an alternative method for checking the logical consistency of a classification with a particular phylogeny, the use of Venn diagrams. In Fig. 6.5, we have converted the phylogeny (Fig. 6.5a), the first classification (Fig. 6.5b), and the second classification (Fig. 6.5c) into Venn diagrams. We then "layer" all three diagrams and check for overlap between ellipses, which is equivalent to checking for overlapping sets in set theory. If there are no overlaps, the classification(s) is logically consistent; if there is overlap, the classification(s) is logically inconsistent. In this case there is no overlap, so both classifications are logically consistent with the phylogeny (Fig. 6.5d).

Fig. 6.5. Venn diagrams of goodeids. a. The goodeid phylogeny. b, c.
Two different classifications. d. The result of combining a, b, and c into a single diagram.

You can easily see that a classification can be logically consistent with a phylogeny without being fully informative about the common ancestry relationships implied in the phylogeny. In fact, the only way to have a classification that is logically inconsistent with the phylogeny is to have a classification for which we can derive no tree that is the original phylogeny. More formally, a classification is logically inconsistent with a phylogeny if no derivative of that classification is the original phylogeny. In terms of our Venn diagrams, a classification is logically inconsistent with the phylogeny if its Venn diagrams have overlapping ellipses. In other words, they are logically inconsistent if they violate the inclusion/exclusion rule.

We may be making it sound like most classifications are logically consistent with the phylogeny you are likely to generate with your analysis. Nothing could be further from the truth. You will find that most classifications are logically inconsistent with your hypotheses of common ancestry. Why? Because most existing classifications contain paraphyletic and even polyphyletic groups. Let's look at the effects of the inclusion of a paraphyletic group in the correspondence of classifications to phylogenies.

Example 6.2. The very distinctive Cus.

Investigator Smith has performed a phylogenetic analysis on the genera comprising the family Cidae. She has arrived at the phylogenetic hypothesis shown in Fig. 6.6. This family was well known to previous investigators. What struck these investigators was how different members of the genus Cus were from other members of the family. This distinctiveness was embodied in the traditional classification.

Family Cidae
    Subfamily Ainae
        Genus Aus
        Genus Bus
    Subfamily Cinae
        Genus Cus

Smith wants to know if the current classification is logically consistent with her phylogenetic hypothesis.
To do so, she must perform the following steps.

1. She prepares a Venn diagram for the classification and another Venn diagram for the phylogeny.

2. She layers the Venn diagram of the classification over the Venn diagram of the phylogeny.

3. If ellipses do not overlap, then she knows that the classification is logically consistent with the phylogeny. If one or more ellipses overlap, then the classification is logically inconsistent with the phylogeny.

Fig. 6.6. The phylogeny of the family Cidae (Example 6.2). The phylogeny and the relative amounts of change along each branch are taken as "true."

Therefore, Smith does the following for the Cidae.

1. The classification is converted to a classification tree (Fig. 6.7a) and then into a Venn diagram (Fig. 6.7c). (As you gain experience, you can draw the Venn diagram directly from the classification.) The phylogeny (Fig. 6.7b) is then converted into a Venn diagram (Fig. 6.7d).

2. The Venn diagrams are layered (Fig. 6.7e).

3. The Venn diagram of the phylogeny (Fig. 6.7d) and the Venn diagram of the classification (Fig. 6.7c) overlap (Fig. 6.7e); Bus is a member of the group BC in the Venn diagram of the phylogeny, and it is a member of the group AB in the Venn diagram of the classification. Therefore, the classification is logically inconsistent with the phylogeny.

Fig. 6.7. The Cidae classification. a. Tree form. b. The phylogeny. c. Venn diagram of a. d. Venn diagram of b. e. Venn diagram combining c and d.

Smith, realizing that she cannot tolerate a classification that is logically inconsistent with the evolution of the Cidae, creates the following classification.
Family Cidae
    Subfamily Ainae
        Genus Aus
    Subfamily Cinae
        Genus Bus
        Genus Cus

Determining the Number of Derivative Classifications

Although Venn diagrams are the most direct route to determining whether a classification is logically consistent with a phylogeny, you might also wish to calculate the number of derivative classification trees for a particular classification. The number of alternative classifications that can be derived from a particular classification is directly related to the number of polytomies in the classification's branching structure. Felsenstein (1977) presents tables to determine the number of tree topologies that can be derived from a basic tree with multifurcations (polytomies). We have reproduced parts of one of these tables as Table 6.1. Note that the numbers refer only to the number of terminal taxa. Other tables must be consulted if ancestors are included.

Table 6.1. The total number of possible derivative trees for polytomies of n branches. Internodes cannot be occupied by "ancestors" (from Felsenstein, 1977).

 n      All trees    Dichotomous trees
 3              4                    3
 4             26                   15
 5            236                  105
 6          2,752                  945
 7         39,208               10,395
 8        660,032              135,135
 9     12,818,912            2,027,025
10    282,137,824           34,459,425

Example 6.3. Classification of the hypothetical Xaceae.

The Xaceae is classified into three major subgroups, as shown in the classification tree in Fig. 6.8. We calculate the number of trees that can be derived from this classification in the following manner.

Fig. 6.8. The classification tree of the Xaceae (Example 6.3).

1. Determine if the phylogeny to be compared is dichotomous or contains polytomies. In this example, we assume that it is a dichotomous phylogeny.

2. Select the appropriate column in Table 6.1. In this case, we use the column on the right (dichotomous trees).

3. Determine the number of branches for each polytomy.

4. Using Table 6.1, find the number of trees possible for each polytomy.
5. Multiply all of the values obtained in step 4 together to obtain the total number of derivative classification trees: (3)(3)(15) = 135.

Thus, there are 135 possible dichotomous trees that can be derived from our classification of Xaceae.

Classification Evaluation Exercises

Each exercise consists of one or more classifications and a phylogenetic tree. You are asked to 1) convert the classification into tree form, 2) state the number of possible derivative trees that can be obtained, and 3) evaluate the classification in terms of its consistency with the phylogeny.

EXERCISE 6.1. Phylogeny of the Recent tetrapod vertebrates.

For classifications, see Fig. 6.9. For the phylogenetic tree, see Fig. 6.10.

a
Lissamphibia
Reptilia
    Chelonia
    Lepidosauria
    Crocodylia
Aves
Mammalia

b
Lissamphibia
Mammalia
Chelonia
Lepidosauria
Crocodylia
Aves

c
Tetrapoda
    Lissamphibia
    Amniota
        Mammalia
        Reptilia
            Chelonia
            Sauria
                Lepidosauria
                Archosauria
                    Crocodylia
                    Aves

Fig. 6.9. Three classifications of Recent tetrapod vertebrates (Exercise 6.1).

[Tree figure; characters labeled on its branches: feathers; long hard palate; pneumatic spaces in the skull; hooked 5th metatarsal; diapsid skull; "right-handed" circulatory system; pedicellate teeth; amniote egg; tetrapod limbs.]

Fig. 6.10. A phylogenetic hypothesis of Recent tetrapod relationships (Exercise 6.1).

EXERCISE 6.2. Phylogeny of the land plants.

Use the following classification.

Division Bryophyta
    Class Anthoceropsida
    Class Marchantiopsida
    Class Bryopsida
Division Tracheophyta
    Subdivision Psilotophytina
    Subdivision Lycopodophytina
    Subdivision Sphenophytina
    Subdivision Pteridophytina
    Subdivision Spermatophytina
        Class Cycadopsida
        Class Pinopsida
        Class Ginkgopsida
        Class Gnetopsida
        Class Angiospermopsida

For the phylogenetic tree, see Fig. 6.11.

CONSTRUCTING PHYLOGENETIC CLASSIFICATIONS

There are two basic ways to construct phylogenetic classifications.
First, one can consistently place sister groups in the classification at the same rank or rank equivalent. In such a classification, every hypothesized monophyletic group is named. If this manner of classifying is adopted, rank within a restricted part of the classification denotes relative time of origin. Second, one can adopt a set of conventions designed to reflect the branching sequence exactly but not require that every monophyletic group be named.

We consider it beyond the scope of this workbook to detail the controversies surrounding such topics as the suitability of Linnaean ranks versus indentation or numerical prefixes for constructing classifications (reviewed in Wiley, 1981a). Instead, we will briefly review some of the basics of phylogenetic classification and provide a summary of some of the conventions you might use in constructing your classifications. This will be followed by some exercises designed to demonstrate when certain conventions might be used.

Rules of Phylogenetic Classifications

Rule 1. Only monophyletic groups will be formally classified.

Rule 2. All classifications will be logically consistent with the phylogenetic hypothesis accepted by the investigator.

Rule 3. Regardless of the conventions used, each classification must be capable of expressing the sister group relationships among the taxa classified.
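
Rule 2 can be checked mechanically with the Venn diagram test described earlier in this chapter, because layering ellipses is equivalent to comparing sets of terminal taxa. The following is a minimal Python sketch, not a procedure taken from any phylogenetic program. It assumes that each named group in the classification and each clade in the phylogeny is written out as a set of terminal taxa. The function names overlaps and is_logically_consistent are hypothetical, chosen only for this illustration. The data are the Cidae of Example 6.2.

```python
from itertools import product


def overlaps(group_a, group_b):
    """Return True if two groups partially overlap.

    Two ellipses "overlap" in the Venn diagram sense when they share
    at least one taxon but neither is wholly contained in the other.
    """
    a, b = frozenset(group_a), frozenset(group_b)
    return bool(a & b) and not (a <= b or b <= a)


def is_logically_consistent(classification, phylogeny):
    """Return True if no classification group overlaps any clade."""
    return not any(overlaps(c, p) for c, p in product(classification, phylogeny))


# Clades of the Cidae phylogeny (Example 6.2): Bus and Cus are sister groups.
phylogeny = [{"Aus", "Bus", "Cus"}, {"Bus", "Cus"}]

# Traditional classification: Cidae, Ainae (Aus + Bus), Cinae (Cus).
traditional = [{"Aus", "Bus", "Cus"}, {"Aus", "Bus"}, {"Cus"}]

# Smith's revised classification: Cidae, Ainae (Aus), Cinae (Bus + Cus).
revised = [{"Aus", "Bus", "Cus"}, {"Aus"}, {"Bus", "Cus"}]

print(is_logically_consistent(traditional, phylogeny))  # False
print(is_logically_consistent(revised, phylogeny))      # True
```

The traditional classification fails because Ainae (Aus + Bus) shares Bus with the clade Bus + Cus while neither group contains the other, which is the overlap Smith found when she layered her diagrams. Note that a passing result shows only logical consistency; as the goodeid example shows, a consistent classification need not be fully informative about the phylogeny.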