Research

A nuclear phylogenomic tree of grasses (Poaceae) recovers current classification despite gene tree incongruence

Grass Phylogeny Working Group III*

Authors for correspondence:
Matheus E. Bianconi, Email: matheus-enrique.bianconi@univ-tlse3.fr
Jan Hackel, Email: jan.hackel@uni-marburg.de
Maria S. Vorontsova, Email: m.vorontsova@kew.org

Received: 3 June 2024
Accepted: 10 October 2024

New Phytologist (2024)
doi: 10.1111/nph.20263

Summary

• Grasses (Poaceae) comprise c. 11 800 species and are central to human livelihoods and terrestrial ecosystems. Knowing their relationships and evolutionary history is key to comparative research and crop breeding. Advances in genome-scale sequencing allow for increased breadth and depth of phylogenomic analyses, making it possible to infer a new reference species tree of the family.
• We inferred a comprehensive species tree of grasses by combining new and published sequences for 331 nuclear genes from genome, transcriptome, target enrichment and shotgun data. Our 1153-tip tree covers 79% of grass genera (including 21 genera sequenced for the first time) and all but two small tribes. We compared it to a newly inferred 910-tip plastome tree.
• We recovered most of the tribes and subfamilies previously established, despite pervasive incongruence among nuclear gene trees. The early diversification of the PACMAD clade could represent a hard polytomy. Gene tree–species tree reconciliation suggests that reticulation events occurred repeatedly. Nuclear–plastome incongruence is rare, with very few cases of supported conflict.
• We provide a robust framework for the grass tree of life to support research on grass evolution, including modes of reticulation, and genetic diversity for sustainable agriculture.

Key words: Angiosperms353, genome, incongruence, phylogenomics, plastome, Poaceae, target capture, transcriptomics.
*Watchara Arthan1 (0000-0002-6941-2199), William J. Baker2,3 (0000-0001-6727-1831), Matthew D. Barrett4 (0000-0002-2926-4291), Russell L. Barrett5,6 (0000-0003-0360-8321), Jeffrey L. Bennetzen7 (0000-0003-1762-8307), Guillaume Besnard8 (0000-0003-2275-6012), Matheus E. Bianconi9,10 (0000-0002-1585-5947), Joanne L. Birch11 (0000-0002-8226-6085), Pilar Catalan12 (0000-0001-7793-5259), Wenli Chen13 (0000-0002-5519-811X), Maarten Christenhusz2, Pascal-Antoine Christin9 (0000-0001-6292-8734), Lynn G. Clark14 (0000-0001-5564-4688), J. Travis Columbus15,16 (0000-0001-6949-0245), Charlotte A. Couch2 (0000-0002-5707-9253), Darren M. Crayn4 (0000-0001-6614-4216), Gerrit Davidse17, Soejatmi Dransfield2, Luke T. Dunning9 (0000-0002-4776-9568), Melvin R. Duvall18 (0000-0001-8143-9442), Sarah Z. Ficinski2, Amanda E. Fisher19 (0000-0002-9928-9558), Siri Fjellheim20 (0000-0003-1282-2733), Felix Forest2 (0000-0002-2004-433X), Lynn J. Gillespie21 (0000-0003-3129-434X), Jan Hackel2,22 (0000-0002-9657-5372), Thomas Haevermans23 (0000-0001-8934-4544), Trevor R. Hodkinson24 (0000-0003-1384-7270), Chien-Hsun Huang25,26, Weichen Huang27, Aelys M. Humphreys28,29 (0000-0002-2515-6509), Richard W. Jobson5, Canisius J. Kayombo30, Elizabeth A. Kellogg31,32 (0000-0003-1671-7447), John M. Kimeu33 (0000-0002-8641-7039), Isabel Larridon2 (0000-0003-0285-722X), Rokiman Letsara34, De-Zhu Li35 (0000-0002-4990-724X), Jing-Xia Liu35, Ximena Londoño36, Quentin W. R. Luke33, Hong Ma27 (0000-0001-8717-4422), Terry D. Macfarlane37 (0000-0002-7023-9231), Olivier Maurin2 (0000-0002-4151-6164), Michael R. McKain38 (0000-0002-9091-306X), Todd G. B. McLay39,40,41, Maria Fernanda Moreno-Aguilar12 (0000-0003-0058-1792), Daniel J. Murphy40 (0000-0002-8358-363X), Olinirina P. Nanjarisoa2, Guy E. Onjalalaina42 (0000-0001-6614-2309), Paul M. Peterson43 (0000-0001-9405-5528), Rivontsoa A. Rakotonasolo34, Jacqueline Razanatsoa34, Jeffery M. Saarela21 (0000-0003-1790-4332), Lalita Simpson4, Neil W. Snow44 (0000-0001-8824-7259), Robert J. Soreng43 (0000-0002-8358-4915), Marc S. M. Sosef45 (0000-0002-6997-5813), E. John Thompson46 (0000-0001-9298-4534), Paweena Traiperm1, G. Anthony Verboom47,48 (0000-0002-1363-9781), Maria S. Vorontsova2 (0000-0003-0899-1120), Neville G. Walsh40 (0000-0003-4671-1425), Jacob D. Washburn49 (0000-0003-0185-7105), Teera Watcharamongkol50 (0000-0002-3065-8597), Michelle Waycott51 (0000-0002-0822-0564), Cassiano A. D. Welker52 (0000-0001-6347-341X), Martin D. Xanthos2 (0000-0002-5378-8757), Nianhe Xia53 (0000-0001-9852-7393), Lin Zhang54 (0000-0001-6476-4526), Alexander Zizka22 (0000-0002-1680-9192), Fernando O. Zuloaga55 and Alexandre R. Zuntini2 (0000-0003-0705-8902)

1 Department of Pharmaceutical Botany, Faculty of Pharmacy, Mahidol University, Bangkok, 10400, Thailand
2 Royal Botanic Gardens, Kew, Richmond, TW9 3AE, UK
3 Department of Biology, Aarhus University, Aarhus, DK-8000, Denmark
4 Australian Tropical Herbarium, James Cook University Nguma Bada Campus, McGregor Road, Smithfield, Qld, 4878, Australia
5 National Herbarium of New South Wales, Botanic Gardens of Sydney, Australian Botanic Garden, Locked Bag 6002, Mount Annan, NSW, 2567, Australia
6 Evolution and Ecology Research Centre, School of Biological, Earth, and Environmental Sciences, University of New South Wales, Sydney, Kensington, NSW, 2052, Australia
7 Department of Genetics, University of Georgia, Athens, GA 30602, USA
8 CNRS, Université Toulouse III – Paul Sabatier, INP, IRD, UMR 5300, CRBE (Centre de Recherche sur la Biodiversité et l'Environnement), 118 Route de Narbonne, 31062, Toulouse, France
9 Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
10 Laboratoire de Recherche en Sciences Végétales (LRSV), Université Toulouse III – Paul Sabatier, CNRS, Toulouse INP, 31326, Castanet-Tolosan, France
11 School of BioSciences, University of Melbourne, Parkville, Vic., 3010, Australia
12 Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Cta. Cuarte km 1, 22071, Huesca, Spain
13 State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing, 100093, China
14 Department of Ecology, Evolution and Organismal Biology, Iowa State University, 345 Bessey, 2200 Osborn Dr., Ames, IA 50011-4009, USA
15 California Botanic Garden, 1500 N College Ave, Claremont, CA 91711, USA
16 Claremont Graduate University, 150 E. 10th St, Claremont, CA 91711, USA
17 Missouri Botanical Garden, 4344 Shaw Blvd, St Louis, MO 63110, USA
18 Northern Illinois University, 1425 W. Lincoln Hwy, DeKalb, IL 60115-2861, USA
19 Department of Biological Sciences, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, CA 90840, USA
20 Department of Plant Science, Faculty of Biosciences, Norwegian University of Life Sciences, 1430, Ås, Norway
21 Research and Collections, Canadian Museum of Nature, Ottawa, ON, K1P 6P4, Canada
22 Department of Biology, Philipps-Universität Marburg, Karl-von-Frisch-Straße 8, 35053, Marburg, Germany
23 Institut de Systématique Évolution Biodiversité (ISYEB), Muséum National d'histoire Naturelle, Centre National de la Recherche Scientifique, École Pratique des Hautes Études, Université des Antilles, Sorbonne Université, 45 rue Buffon, CP 50, 75005, Paris, France
24 Botany, School of Natural Sciences, Trinity College Dublin, The University of Dublin, Dublin 2, Ireland
25 State Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory of Biodiversity Sciences and Ecological Engineering, Institute of Biodiversity Sciences and Institute of Plant Biology, School of Life Sciences, Fudan University, 2005 Songhu Road, Shanghai, 200438, China
26 State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, Key Laboratory of Herbage & Endemic Crop Biology of Ministry of Education, Inner Mongolia University, Hohhot, 010000, China
27 Department of Biology, 510 Mueller Laboratory, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
28 Department of Ecology, Environment and Plant Sciences, Stockholm University, 104 05, Stockholm, Sweden
29 Bolin Centre for Climate Research, Stockholm University, Stockholm, 106 91, Sweden
30 Tengeru Institute of Community Development, PO Box 1006, Arusha, Tanzania
31 Donald Danforth Plant Science Center, St Louis, MO 63132, USA
32 Arnold Arboretum of Harvard University, Boston, MA 02130, USA
33 East Africa Herbarium, National Museums of Kenya, Nairobi, P.O. Box 45166-00100, Kenya
34 Parc Botanique et Zoologique de Tsimbazaza, Antananarivo, Madagascar
35 Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
36 Sociedad Colombiana del Bambú, Quindío, Colombia
37 Department of Biodiversity, Conservation and Attractions, Western Australian Herbarium, Kensington, WA 6152, Australia
38 Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
39 National Biodiversity DNA Library, CSIRO, Parkville, Vic., 3010, Australia
40 Royal Botanic Gardens Victoria, Melbourne, Vic., 3004, Australia
41 School of BioSciences, The University of Melbourne, Parkville, 3010, Vic., Australia
42 Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
43 Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington DC 20013-7012, USA
44 T.M. Sperry Herbarium, Pittsburg State University, Pittsburg, KS 66762, USA
45 Meise Botanic Garden, Nieuwelaan 38, 1860, Meise, Belgium
46 Queensland Herbarium, Brisbane Botanic Gardens, Mt Coot-tha Rd, Toowong, Qld, 4066, Australia
47 Department of Biological & Environmental Sciences, University of Gothenburg, Box 463, 40530, Göteborg, Sweden
48 Gothenburg Botanical Garden, 41319, Göteborg, Sweden
49 USDA-ARS, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
50 Faculty of Science and Technology, Kanchanaburi Rajabhat University, Kanchanaburi, Thailand
51 School of Biological Sciences, University of Adelaide and Botanic Gardens and State Herbarium, Adelaide, SA, 5000, Australia
52 Universidade Federal de Uberlândia, Instituto de Biologia, Uberlândia, Minas Gerais, Brazil
53 Key Laboratory of Plant Resources Conservation and Sustainable Utilisation, South China Botanical Garden Chinese Academy of Sciences, Guangzhou, 510650, China
54 Chongqing Key Laboratory of Plant Resource Conservation and Germplasm Innovation, School of Life Sciences, Southwest University, Chongqing, 400715, China
55 Instituto de Botánica Darwinion (CONICET-ANCEFN), Labardén 200, Casilla de Correo 22, B1642HYD San Isidro, Buenos Aires, Argentina

© 2024 The Author(s). New Phytologist © 2024 New Phytologist Foundation. www.newphytologist.com
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Introduction

With almost 11 800 species in 791 genera (Soreng et al., 2022), grasses (Poaceae) are among the largest plant families and one of the most important for humans. Grasses include the primary food crops rice, maize and wheat, sources of fibre and building materials such as reed and bamboo, and biofuel crops such as sugarcane and switchgrass. Much of the global land surface is covered by grass-dominated ecosystems, where grasses impact productivity, nutrient cycling and vegetation structure by mediating fire and herbivory (Edwards et al., 2010; Bond, 2016). Grasses are also overrepresented among the world's most damaging agricultural weeds (Holm et al., 1977) and invasive plants (Linder et al., 2018). Understanding functional diversification, adaptation and novel crop breeding in this important plant group requires a solid understanding of its evolutionary relationships.

Efforts to uncover the phylogenetic history of grasses have tracked the development of new technology and analytical tools, beginning with cladistic analysis of morphology (e.g. Campbell & Kellogg, 1987). Almost as soon as nucleotide sequencing became possible, it was used to investigate grasses (rRNA sequencing, Hamby & Zimmer, 1988, and chloroplast DNA, Clark et al., 1995), and the results interpreted in the light of known morphology and classification. Hundreds of papers have been published since using nucleic acids, most recently DNA, to assess grass phylogeny at all taxonomic levels and assembling information from all three genomes in the cell (plastid, mitochondrial, and nuclear). These efforts have been punctuated by two major phylogenetic analyses, Grass Phylogeny Working Group I (GPWG, 2001) and GPWG II (2012), and family-wide classifications (Kellogg, 2015; Soreng et al., 2022) were enabled by these and many other detailed phylogenetic analyses.

The major outlines of grass phylogeny have now been known for several decades and corroborated by accumulating data, with major lineages recognised as subfamilies (Kellogg, 2015; Soreng et al., 2022). The earliest divergences in the grass family gave rise to three successive lineages, Anomochlooideae, Pharoideae, and Puelioideae, each comprising just a few species. After the divergence of those three, however, the remaining grasses gave rise to two sister lineages, known as BOP and PACMAD, each of which became a species-rich clade with several robust subclades. This sturdy phylogenetic framework is reflected in a strong subfamilial classification, with subfamilies divided into equally robust tribes. Attention in recent years has largely shifted to relationships of tribes, subtribes, and genera.

Reticulate evolution is common in the grasses. Allopolyploidy is widespread in the family, particularly among closely related species and genera, with as many as 80% of species estimated to be of recent polyploid origin (Stebbins, 1985). The textbook example is bread wheat (Triticum aestivum) and its ruderal annual ancestors, the history of which was determined in the first part of the 20th century using cytogenetic tools (Kihara, 1982; Tsunewaki, 2018). Nucleotide sequence data have verified the hybrid origin of wheat and gone on to show that reticulate evolution is the norm in the entire tribe Triticeae (Feldman & Levy, 2023; Mason-Gamer & White, 2024). We have also learned that three of the four major clades of Bambusoideae are of allopolyploid origin (Triplett et al., 2014; Guo et al., 2019; Chalopin et al., 2021; Ma et al., 2024), as are at least one third of the species in Andropogoneae (Estep et al., 2014). Large-scale lateral gene transfer has also been demonstrated in Alloteropsis semialata (Dunning et al., 2019) and for a number of genomes across the family (Hibdige et al., 2021), although it remains unclear how common such genetic exchanges are. Network-like reticulations are therefore expected throughout Poaceae.

Data relevant to grass phylogeny continue to accumulate in the genomic era, but in an uneven pattern. Major recent studies have inferred family trees based on the plastid genome (Saarela et al., 2018; Gallaher et al., 2022; Hu et al., 2023) or large parts of the nuclear genome (Huang et al., 2022). In addition, a wealth of full-genome assemblies is now available for grasses, mainly for groups that have been studied intensively, such as major crops and their congeners including rice (Wang & Han, 2022), maize (Hufford et al., 2021), wheat (Walkowiak et al., 2020) and sugarcane (Healey et al., 2024), among many others. At the same time, some genera and many species remain virtually unknown beyond a scientific name and general morphology. While the poorly known taxa may be represented in major herbaria, fresh material can be hard to obtain, weakening attempts to fully sample the grass tree of life with phylogenomic technologies.

Fortunately, we are now experiencing the confluence of: (1) global sources of diversity data including plant specimens held in herbaria world-wide, (2) widespread use of short-read sequencing that can accommodate even fragmented DNA, (3) analytical tools for assembling and interpreting massive amounts of sequence data, and (4) technical tools for efficient sequencing, such as target capture. For example, the development of a universal probe set for flowering plants, Angiosperms353 (Johnson et al., 2019; Baker et al., 2021), has enabled initiatives to sequence all angiosperm plant genera (Baker et al., 2022; Zuntini et al., 2024) or entire continental floras such as that of Australia (https://www.genomicsforaustralianplants.com/). It became apparent that an updated synthesis of existing and new data for grasses, similar to the previous Grass Phylogeny Working Group efforts (GPWG, 2001; GPWG II, 2012), would be timely and make possible a phylogeny that incorporates representatives of most of the 791 genera of the family using genome-scale data. In the process, we will gain a broader assessment of congruence among nuclear gene histories, including insights on the frequency and impact of incomplete lineage sorting (ILS) and reticulation.

Accordingly, here we present the most comprehensive nuclear phylogenomic tree of the grass family to date. Via a large community effort, we maximised taxon sampling by combining whole-genome, transcriptome, target capture and shotgun datasets. Based on the Angiosperms353 gene set, we inferred a nuclear multigene species tree using a coalescent-based method that accounts for incongruence due to ILS and uses information from multicopy gene trees. We also inferred a plastome tree and tested for incongruence between plastome and nuclear trees. Finally, we used gene tree–species tree reconciliation analyses to explore the signal for reticulation in the nuclear data.

Materials and Methods

Datasets and species sampling

Drawing from a combined effort of the Poaceae research community, we leveraged five diverse sets of genomic data (see full accession table in the data repository, doi: 10.5281/zenodo.10996136). We deployed a set of automated filters and repeated expert input from the group to remove duplicates, samples with insufficient data, and potentially misidentified accessions. The final set of accessions included:
(1) 450 Illumina target capture read accessions enriched with the Angiosperms353 probe set (Johnson et al., 2019), generated as part of the 'Genomics for Australian Plants' (GAP) and 'Plant and Fungal Trees of Life' (PAFTOL; Baker et al., 2022) initiatives as well as a project focused on Loliinae grasses (P. Catalan et al., unpublished data). Sampling focused on genera without existing nuclear or plastome genomic data.
(2) 295 Illumina shotgun, whole-genome sequencing accessions, of which 204 are 'genome skims' with a sequencing depth < 5× estimated for our target gene set (to be described later). Of these shotgun accessions, many had been used in previous studies for the assembly of plastid genomes (see accession table).
(3) 17 Illumina target capture read accessions enriched in 122 nuclear loci (different from Angiosperms353) that were previously used in a phylogenetic study of the subfamily Chloridoideae (Fisher et al., 2016). These are treated here like the shotgun datasets.
(4) 343 assembled transcriptomes from two recent Poaceae studies (331 samples; Huang et al., 2022; Zhang et al., 2022) and the 1KP initiative (12 samples, One Thousand Plant Transcriptomes Initiative, 2019).
(5) 48 assembled and annotated genome sequences from Phytozome v.13, Ensembl Plants, or other sources.

Angiosperms353 target capture data were generated by the PAFTOL project following the protocols of Baker et al. (2022). Methods varied for the other contributed datasets (details in accession table and Supporting Information Methods S1). Leaves were sampled mostly from herbarium specimens, although silica dried material was used in some cases. Sampling was iteratively refined using expert input from the working group to remove accessions with unclear identity and duplicates per species (retaining the highest-coverage accession, that is genome > transcriptome > target capture > shotgun). Species names were harmonised using the World Checklist of Vascular Plants (Govaerts et al., 2021) as well as expertise from our working group.

Angiosperms353 sequence assembly

The orthogroup dataset was used as a reference for sequence assembly using HybPiper v.1.3.1 (Johnson et al., 2016). Illumina reads were initially trimmed using Trimmomatic v.0.38 (Bolger et al., 2014) to remove adapters, low-quality bases and short reads (SLIDINGWINDOW:4:20, MINLEN:40). Sequences were assembled using the Burrows-Wheeler Alignment tool (BWA, Li & Durbin, 2009) with default parameters, except the coverage cut-off level, which was reduced to 4× for the target capture datasets, and to 1× for shotgun accessions due to the low sequencing depth of a subset of samples.

Grass-specific Angiosperms353 reference dataset

Before sequence assembly from target capture and shotgun
datasets, we produced a Poaceae-specific set of reference Angiosperms353 sequences to improve recovery and account for grass-wide gene duplications. This grass-specific reference dataset consists of coding sequences (CDS) extracted from published genomes and transcriptomes of 60 species, representing seven of the 12 grass subfamilies and including an available genome sequence from the sister group Ecdeiocoleaceae–Joinvilleaceae (Joinvillea ascendens Gaudich. ex Brongn. & Gris). First, CDS of the Angiosperms353 homologs were extracted from the reference genomes and transcriptomes using the tblastn tool of BLAST+ v.2.2.29 (Camacho et al., 2009), with the original Angiosperms353 probe set used as protein queries (e-value ≤ 10⁻³). To reduce false positives, only hits with alignments > 65% of the query length and sequence identity > 60% were retained. This filtered homolog set was then sorted into orthogroups using OrthoFinder v.2.5.2 (Emms & Kelly, 2019), with the MSA mode using MAFFT v.7.481 (Katoh & Standley, 2013) as the sequence aligner, and FastTree v.2.1.11 (Price et al., 2010) to generate gene trees, using default parameters in each case.

Using the phylogenetic hierarchical method of OrthoFinder, we extracted orthogroups at the level of the most recent common ancestor of the BOP–PACMAD clade, the crown group which covers > 99% of grass species and most available reference genomes. Two of the original Angiosperms353 markers (g5422 and g6924) were not detected in any of the reference genomes or transcriptomes and were therefore not used. Five other markers were duplicated before the BOP–PACMAD split (g4527, g5434, g5945, g5950 and g7024); these duplicates were therefore treated as separate markers in our analyses. For these five duplicated genes, homologs of the three reference samples representing subfamily Anomochlooideae, sister to all other Poaceae, and the outgroup Joinvilleaceae were subsequently added to each of the two corresponding orthogroups. This initial reference dataset was then curated to remove nonhomologous sequences and potential pseudogenes (see Methods S1). The final reference dataset consisted of 356 orthogroups, and encompassed all homologous sequences of the 60 reference species, including paralogs from lineage-specific duplications within each orthogroup. Note that three of the markers (g5328, g5922 and g6128) were removed before phylogenetic analysis on the basis that they contained regions of low complexity in their sequences, which resulted in low-quality assemblies (to be described later) as revealed by preliminary analyses.

Given the low number of markers recovered for most shotgun accessions, we used a custom assembly strategy optimised for the assembly of sequences from low-coverage datasets (explained below). When a sequence was assembled by both HybPiper and the custom method, only the longest assembly was retained.

The custom assembly strategy consisted of a mapping-consensus pipeline modified from Olofsson et al. (2019) and Bianconi et al. (2020) to support the assembly of paralogs (Fig. S1). First, filtered reads were mapped to the orthogroup reference dataset using Bowtie2 v.2.5.3 (Langmead & Salzberg, 2012) with the sensitive-local mode and reporting all alignments. Then, for each orthogroup, the reference sequence with the most bases covered was identified and included along with its paralogs (i.e. homeologs or paralogs from lineage-specific duplications) in a second, accession-specific reference dataset. This reduced the reference dataset to a single species per orthogroup, which allowed subsequent read mapping refinement, and simplified downstream processing. Read mapping was then repeated on this accession-specific reference using the parameters described above, and the resulting read alignments were converted into majority consensus sequences using SAMtools v.1.19.2 (Li et al., 2009; consensus function, --min-depth 1 --het-fract 1 --call-fract 0.5). Only consensus sequences longer than 200 bp were retained for downstream analysis.

Cases of multiple assemblies within a given orthogroup were treated as potential paralogs and subsequently inspected to remove spurious assemblies. First, identical assemblies (full length or partial) were removed using SeqKit v.2.7.0 (Shen et al., 2016) and CD-HIT v.4.8.1 (Fu et al., 2012). If multiple assemblies remained for a given orthogroup, these were aligned together with the reference sequences used for their assembly using MAFFT. A phylogenetic tree was then estimated using IQ-TREE v.2.1.3 (Minh et al., 2020; substitution model HKY) and rooted on the longest branch. Only assemblies that formed a monophyletic group with their corresponding references were validated as paralogs and retained for downstream analyses. In all other cases, only the longest assembly was retained. Steps that involved tree manipulation were implemented using Newick Utilities v.1.6 (Junier & Zdobnov, 2010).

Note that this approach only recovers paralogs from duplication events that are shared with one of the reference species, so that paralog recovery is expected to be limited in groups that are not represented in the reference dataset. In such cases, paralogs from lineage-specific duplications are expected to be collapsed into single sequences, with differences coded as ambiguities. Such chimeric sequences might add noise to gene tree estimation, particularly in the relationships among accessions that share the duplicates.
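Several of the rules stated in the two sections above are simple threshold or "pick the best" decisions, and a short Python sketch may make them easier to follow. This is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, its field names and all function names are hypothetical, sequence length is taken as the raw character count (ambiguity codes included), and ties are broken by input order.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Hit:
    """One tblastn hit (hypothetical record, not a BLAST+ output format)."""
    query_len: int    # length of the probe (query) sequence
    aln_len: int      # number of query positions covered by the alignment
    identity: float   # percent identity, 0-100
    evalue: float


def passes_homolog_filter(hit: Hit, min_cov: float = 0.65,
                          min_identity: float = 60.0,
                          max_evalue: float = 1e-3) -> bool:
    """Thresholds from the text: e-value <= 10^-3, alignment > 65% of
    the query length, sequence identity > 60%."""
    coverage = hit.aln_len / hit.query_len
    return (hit.evalue <= max_evalue
            and coverage > min_cov
            and hit.identity > min_identity)


def pick_reference(covered_bases: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """For each orthogroup, return the reference ID with the most bases
    covered by mapped reads (first one wins on ties; an assumption)."""
    return {og: max(per_ref, key=per_ref.get)
            for og, per_ref in covered_bases.items() if per_ref}


def keep_long_consensus(consensus: Dict[str, str],
                        min_len: int = 200) -> Dict[str, str]:
    """Retain consensus sequences strictly longer than min_len bp."""
    return {name: seq for name, seq in consensus.items() if len(seq) > min_len}


def longest_assembly(hybpiper_seq: Optional[str],
                     custom_seq: Optional[str]) -> Optional[str]:
    """When both methods assembled a sequence, keep only the longest."""
    candidates = [s for s in (hybpiper_seq, custom_seq) if s]
    return max(candidates, key=len) if candidates else None
```

For example, `pick_reference({"g4527": {"ref_A": 900, "ref_B": 1200}})` would select `ref_B` for that orthogroup; the reference IDs here are made up for the example.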
For accessions that share such duplicates, this is an intrinsic limitation of short-read data, which cannot be fully overcome by our custom assembly strategy or HybPiper, although we expect the impact to be reduced due to the filters that are in place.

The performance of the custom assembly strategy was evaluated by reconstructing the Angiosperms353 sequences of two species from the reference dataset for which high-quality genomes are available (Brachypodium distachyon (L.) P.Beauv. and Oryza sativa L.). For this, shotgun read datasets for these species were downloaded from the NCBI SRA database (accessions SRR891794 and SRR24031307) and subsampled to create four sets with varying sequencing depths (1, 5, 10 and 20×). Sequences were then assembled using our pipeline and compared to the sequences extracted from the reference genomes to assess the effect of sequencing depth on sequence completeness and identity, and on the recall of paralogs (Figs S2, S3).

Extracting Angiosperms353 homologs from transcriptomes

To identify Angiosperms353 homologs in the transcriptome accessions, we performed a BLASTN search with the orthogroup reference dataset as query (e-value ≤ 10⁻³), and retained only hits with alignments covering > 50% of the query length and nucleotide identity > 70% for phylogenetic analysis. For the orthogroups corresponding to Angiosperms353 markers that were duplicated before the BOP–PACMAD split (as mentioned in the previous section), a BLASTN search was conducted and filtered as above, except that the query included the reference sequences of the two paralogous orthogroups. The putative homologous hits were then sorted into their corresponding orthogroups by aligning each hit with the query sequences using MAFFT, and estimating a tree using IQ-TREE. The hit was then assigned to one of the orthogroups based on the clade in which it was nested in the tree.

Nuclear tree inference

We used all the recovered sequences, including paralogs from lineage-specific duplications within loci, for inferring a species tree using a coalescent-based approach that accounts for paralogy, which has been shown to improve species tree estimation and vastly increase the data available for analysis (Smith & Hahn, 2021; Yan et al., 2021; Smith et al., 2022). Gene alignments were generated in a two-step approach. First, the reference sequences were aligned using MAFFT (--maxiterate = 100) to generate a backbone alignment per gene. Then, gene assemblies of shotgun, target capture and transcriptome accessions were aligned one by one using the options --addfragments and --keeplength to improve the quality of the alignment of partially assembled sequences. Alignments were trimmed using trimAl v.1.4 (Capella-Gutierrez et al., 2009) to remove columns with 90% or more missing data (-gt 0.1), and individual sequences shorter than 200 bp were removed from the trimmed alignments. To reduce uncertainty in tree estimation due to insufficient data, we discarded gene alignments with a total length of < 500 bp after trimming. Finally, to further reduce the impact of missing data, only accessions with at least 50% of the total gene set were kept for analysis. The resulting dataset consisted of 1153 accessions and 331 gene alignments.

Gene trees were then inferred using RAxML v.8.2.12 (Stamatakis, 2014) with 100 rapid bootstrap pseudoreplicates. To strike a balance between computation time and modelling rate heterogeneity adequately, we used a GTR substitution model with a CAT rate heterogeneity approximation (25 rate categories; Stamatakis, 2006) across each alignment. Abnormally long branches that significantly inflated tree diameter were detected using TreeShrink v.1.3.9 (Mai & Mirarab, 2018), with the false positive rate set to 0.1 (option -q). These were then removed from the alignments, and the phylogenetic analysis was repeated. Branch support in gene trees was measured using transfer bootstrap expectation (TBE), which provides a gradual, rather than a presence–absence, measure of support and is more robust to rogue tips in large trees compared to classical Felsenstein bootstrap proportion (Lemoine et al., 2018).

A multigene coalescent species tree was inferred using the resulting 331 gene trees with ASTRAL-Pro3 v.1.17.3.5 (Zhang et al., 2020). As measures of branch support and conflict in the species tree, we used the Quartet Concordance (QC) and Quartet Differential (QD) metrics described by Pease et al. (2018). They were calculated from the paralogue-weighted proportions of gene trees supporting each of the three possible quartets around a branch, as reported by ASTRAL-Pro (R script 'quartet_metrics.R' in the data repository). Following Pease et al., we interpret QC values > 0.2 as strong support for one preferred quartet and values between 0 and 0.2 as indicating conflict between gene trees (the species tree already shows the majority quartets, so values cannot be < 0). QD will be 1 when the second and third alternative quartets are recovered with equal frequency, as expected under ILS, especially when QC indicates conflict with the first quartet. When conflict is skewed to only two preferred alternatives in total at a branch, for example under introgression or hybridisation, QD will approach zero.

We evaluated tree stability across two additional data filtering strategies. In the first, the effect of missing data was assessed by increasing the alignment trimming threshold and removing columns with > 50% missing data (all 1153 samples retained). In the second filtered set, the same filtering strategy of the main dataset was used, but to be sure that our novel assembly methods were not biasing the results, we tested the impact of omitting the shotgun sequences altogether (841 tips retained; i.e. only accessions from target capture, transcriptome and complete genome sources). We compared support and conflict in the multigene coalescent tree across the three filtered sets using as metrics QC, QD and the proportions of gene trees informative per branch. We counted the number of matching branches (based on tip sets) of the additionally filtered sets compared to the main tree and summarised support and conflict at these branches. We also calculated a measure of gene tree–species tree distance (Clustering Information Distance; Smith, 2020) using the TreeDist R package v.2.7 (Smith, 2019); this required keeping only one paralog, chosen randomly, for multicopy accessions in the gene trees.

Gene tree–species tree reconciliation

We investigated the evidence for reticulations, whether from hybridisation, introgression or lateral transfers. We used gene tree–species tree reconciliation under the maximum likelihood implementation of a duplication–transfer–loss model (UndatedDTL) in GeneRax v.2.0.1 (Morel et al., 2020). This is not equivalent to a full, computationally expensive phylogenetic network analysis (such as PhyloNet, Wen et al., 2018) but instead assumes a true bifurcating species tree, which hugely constrains the search space and makes the analysis amenable to our data. Note that apparent transfers may also reflect ILS, which is not modelled by GeneRax, but we expected this to be limited to lineages branching in short succession. Running the analysis for the whole dataset was not feasible, so we performed a tribe-level reconciliation, where the species tree was collapsed to tribes, with gene trees matched to these tribes. We ran additional reconciliation analyses for three clades of economic importance and with well documented reticulation histories: subfamily Bambusoideae (bamboos), tribe Andropogoneae (maize, sorghum and relatives), and Triticeae (wheat and relatives).

From the gene trees, GeneRax infers, in addition to duplications and losses, gene transfers between two branches of the species tree. We summarised these transfers on the species tree using custom R scripts (see data repository). Because an apparent transfer may also be an artefact of a poorly supported gene tree, we considered that a transfer between two branches had to be supported by at least five gene trees to indicate possible reticulation.

[...] queries. Assemblies per sample were selected to cover at least 25% of the reference length for at least five genes or intergenic regions. Sequences were aligned per gene using MAFFT, alignment columns containing large proportions of gaps were trimmed using the automated algorithm of trimAl v.1.4.15 and all gene alignments concatenated using AMAS (Borowiec, 2016). After this step, accessions with 95% or more missing sites were removed, leaving the final 910 accessions. A maximum likelihood tree was then inferred using RAxML v.8.2.12 with a GTR-CAT model and 100 rapid bootstrap pseudoreplicates.

To measure to what degree nuclear relationships were supported by the plastome analysis, we mapped quartet support from 100 plastome bootstrap trees on the nuclear tree using ASTRAL-Pro, after reducing both sets of trees to 751 tips we could match by accession, or, if the same accession was not available, by species. From the bootstrap frequencies per quartet, we calculated QC, which here will be 1 if the plastome tree supports the same quartet, and −1 if the plastome tree strongly supports an alternative quartet. We also tested if nuclear–plastome conflict tends to affect branches where there is also conflicting signal within the nuclear genome by correlating QC calculated from nuclear gene trees with QC calculated from plastome bootstrap trees.

Results

Nuclear reference dataset and genomic data
Note that in the case of the tribe-level tree, this number of trans- We compiled a grass-specific reference dataset for the assembly fers combines gene tree tips from all species within a tribe. Trans- of 356 nuclear genes, available in the data repository (file fers to/from the root were excluded (as they might involve any ‘target_Ang353_sequences_grasses.zip’, doi: 10.5281/zenodo. branch outside the ingroup that was not sampled). We high- 10996136). These genes were then extracted from genome and lighted the most frequent transfers as those with the top 10% transcriptome sequences, and assembled from target capture quantile counts per species tree. We also evaluated, for each reti- and shotgun data. culate connection, if transfer counts were skewed in one direction The final dataset used for phylogenetic analysis consisted of by highlighting those with > 50% proportional difference 1153 accessions and 331 genes. Taxon occupancy was above between counts in either direction. 70% in 95% of all genes, and the number of genes recovered per accession ranged from 166 to 331 (median = 308). Median gene recovery was highest in shotgun accessions (98%), followed by Plastome sequence assembly and tree inference transcriptomes (93%), target capture (92%) and genomes (91%) To compare the nuclear topology with the plastome topology, we (Fig. S2a; Table S1). The lower gene recovery in genomes was a inferred a 910-tip tree using the sequences of 70 coding plastome result of the stringent filters applied to prevent the incorporation regions and the trnL–trnF intergenic region. 
We retrieved the of deep paralogs and nonhomologous sequences into the grass- 520 assembled plastome sequences that were already publicly specific reference dataset (see the Materials and Methods section available, representing in most cases shotgun accessions in the and Methods S1), which occurred at the expense of discarding nuclear analysis, and sequences from the same species if the same some true orthologues. Among shotgun accessions, gene recovery accession was not available (see metadata table in data reposi- was correlated with sequencing depth, although sequencing depth tory). New plastome CDS were assembled from shotgun and as low as 19 was in most cases sufficient to recover sequences Angiosperms353 Illumina data using GETORGANELLE v.1.7.5 (Jin (> 200 bp) for more than 90% of all genes (Fig. S2b). Nonethe- et al., 2020) with default kmer settings for SPAdes (21, 45, 65, less, as expected, mean sequence completeness was higher among 85, 105) and 15 maximum extension rounds. We used a genome and transcriptome accessions (median = 85% and 83%) well-annotated plastome sequence (Digitaria exilis (Kippist) than in shotgun and target capture accessions (median = 63 and Stapf, INSDC accession KJ513091.1) as seed for assembly. Plas- 60%; Fig. S2c). We were able to recover at least 80% of the tome assemblies were annotated using GeSeq (Tillich Angiosperm353 genes (with sequences on average 49% com- et al., 2017). The target sequences were then recovered from the plete) for the 17 target capture samples that had been originally full or partial assemblies via BLAST, with the D. exilis sequences as enriched for 177 different nuclear loci (Fisher et al., 2016). New Phytologist (2024)  2024 The Author(s). www.newphytologist.com New Phytologist 2024 New Phytologist Foundation. New Phytologist Research 7 Paralogs from lineage-specific duplications were present in all of matching branches (Fig. S9d–f). 
More stringent filtering had 331 genes, and the median number of species with paralogs only slight effects on gene tree support (slight increase, Fig. S9g) across genes was 31 (min = 4, max = 138; Table S2). and gene tree distance from the species tree (slight decrease, Paralogs were more frequent in accessions represented by com- Fig. S9h). plete genomes, with on average 30% of the accessions having We compared our tree to the most recent Poaceae classification paralogs in each gene, followed by shotgun (4%), target capture (Soreng et al., 2022). The 1153 accessions correspond to 1133 (1.5%) and transcriptomes (0.5%; Fig. S3). In shotgun datasets, accepted species, covering all but two (Anomochloeae and Strep- the number of genes with paralogs varied among accessions, and togyneae) of the accepted tribes and 621 (79%) of the 791 gen- in some cases it was correlated with sequencing depth (Figs S4, era. Twenty-one genera were sequenced for the first time: S5), although this pattern was not consistent in the simulated Asthenochloa Buse, Bhidea Stapf ex Bor, 9 Cynochloris Clifford & datasets (Figs S6, S7). Such an overall low paralog recovery is in Everist, Dilophotriche (C.E.Hubb.) Jacq.-Fel., Fimbribambusa part explained by the filtering strategy of the custom assembly Widjaja, Ekmanochloa Hitchc., Kaokochloa De Winter, Hydro- method, which retained only 17% of the putative paralogous thauma C.E.Hubb, Mniochloa Chase, Parabambusa Widjaja, sequences assembled (Table S2). Pinga Widjaja, Pogonachne Bor, Pommereulla L.f., Ratzeburgia Increasing filtering stringency overall reduced missing data, at Kunth, Ruhooglandia S.Dransf. & K.M.Wong, Spathia Ewart, the expense of reducing the number of tips and/or alignment Suddia Renvoize, Taeniorhachis Cope, Thedachloa S.W.L.Jacobs, length (Table S1). For example, mean alignment completeness Thyridachne C.E.Hubb., and Trilobachne Schenck ex Henrard. 
was increased from 73% to 79% by increasing All subfamilies were recovered as monophyletic, except for the alignment-trimming stringency, while reducing mean alignment early-diverging Puelioideae, which is paraphyletic, with its two length from 1160 to 864. Likewise, by removing the 312 shotgun genera Guaduella and Puelia forming separate lineages, as also accessions, mean alignment completeness was only slightly noted by Huang et al. (2022). In Panicoideae, a clade comprising increased to 76% (Table S1). all accepted tribes is supported, but the branch subtending this clade plus Alloeochaete and Dichaetaria, only recently transferred from Arundinoideae (Teisher et al., 2017; Soreng et al., 2022) Nuclear genome phylogeny shows gene tree conflict. We found six further taxonomic discre- Our 1153-tip species tree recovered almost all subfamilies and pancies at tribe or subfamily level where tip positions did not tribes of Poaceae but points to frequent gene tree incongruence match the taxonomy (Fig. 1; Table 1), and further cases of non- (Fig. 1; see also detailed plot of the tree broken down into sub- monophyly at the subtribe level (see detailed tree in Fig. S8). clades in Fig. S8). Of the internal branches, only just above one Two accessions in surprising, isolated positions within Pani- quarter (314 of 1151) had one strongly preferred quartet config- coideae (Styppeiochloa hitchcockii and Ratzeburgia pulcherrima, uration (QC > 0.2). Clades with conflicting signals above tribe Table 1) passed all quality filtering steps. Individual gene tree level (QC ≤ 0.2) include BOP + PACMAD + Puelia + Gua- plots suggested unstable positions, but no clear indication of a duella, subfamily Panicoideae, and several divergences between laboratory mix-up or contamination. Because no prior DNA data subfamilies in the PACMAD clade and in subfamily Pooideae. 
are available for these species, we retained them in the analysis The distribution of gene tree conflict, with QC values skewed but emphasise the need for further validation with independent towards zero, remained almost unchanged when the dataset was samples. filtered more stringently (Fig. S9a), despite the high resolution in gene trees (Fig. S9g). The distribution of QD was strongly Gene tree–species tree reconciliation skewed towards 1, that is the second and third alternatives for each quartet had mostly similar frequencies, matching expecta- Reconciliation of gene trees with the species tree under a tions under frequent ILS. QD may be distorted when one quartet duplication–transfer–loss model suggests frequent reticulations in is strongly preferred and frequencies of the second and third the grass family (Fig. 2, see also detailed plots in Fig. S10). The quartet are low (high QC), but the QD skew towards 1 becomes tribe-level reconciliation for the whole tree suggests reticulation even clearer when looking only at highly conflicted branches early in the history of the grasses, involving the branch leading to (QC ≤ 0.2, Fig. S9b). It also holds under more stringent gene the large crown group, the BOP–PACMAD clade (Fig. 2a). At tree filtering, suggesting indeed ILS rather than the effect of this level of analysis, the most frequent reticulations primarily poorly supported, randomly resolving gene trees. Support for occurred in one direction (see arrows in Fig. 2a). Within Bambu- two alternative resolutions, expected under hybridisation or soideae, the inferred transfers for both woody bamboo tribes, introgression, was rare, with only 11 instances where branches Arundinarieae and Bambuseae, reflect the allopolyploid origins showed strong conflict (QC ≤ 0.2) and a > 50% skew in the fre- of their subgenomes (Triplett et al., 2014; Guo et al., 2019; Cha- quencies of the second and third quartet (QD < 0.5). The med- lopin et al., 2021; Ma et al., 2024). 
Note that in this tribe-level ian number of gene trees informative about a given analysis, the number of transfers combine gene trees from all spe- branch/quartet (from 331) was 202 (61%), with a range from 72 cies within a tribe, that is high numbers could be driven either by to 290 (22–88%) (Fig. S9c). Filtering the combined dataset more a few genes or a few (or a single) species. We interpret transfers stringently had negligible effects on species tree support or con- inferred from ancestors to descendants as transfers to a lineage flict (Fig. S9), both overall (Fig. S9a–c) and in direct comparison that is now extinct (or not sampled in our tree) but descended  2024 The Author(s). New Phytologist (2024) New Phytologist 2024 New Phytologist Foundation. www.newphytologist.com New 8 Research Phytologist Fig. 1 Phylogeny of 1153 Poaceae accessions inferred from 331 nuclear genes, including paralogs, using a multispecies coalescent approach. Closed dots indicate support or conflict on branches above tribe level based on the Quartet Concordance (QC) and Quartet Differential (QD) metrics, with blue dots indicating support for the quartet shown (QC > 0.2) and red dots indicating conflicting alternatives (QC ≤ 0.2). Open circles indicate supported conflict among nuclear gene trees at 11 branches, where two alternative quartet configurations are supported (QC ≤ 0.2 and QD < 0.5). Subfamilies and larger tribes (abbreviated) are labelled according to the most recent Poaceae classification (Soreng et al., 2022). The coloured lines link taxonomic outliers at tribe to subfamily level to their nominal taxa. Silhouettes show representatives for large subfamilies (from top): Maize or corn, Zea mays (Panicoideae); Dactyloctenium radulans (Chloridoideae); oat, Avena sativa (Pooideae); Bambusa textilis (Bambudoideae); rice,Oryza sativa (Oryzoideae). See Supporting Information Fig. S8 for a detailed version of the tree. from the same common ancestor. 
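As an illustration only, the rules for summarising inferred transfers given in the Materials and Methods (at least five supporting gene trees, exclusion of transfers to/from the root, top 10% quantile of counts, and > 50% proportional difference between directions) can be sketched in Python as follows. This is a minimal sketch and not the custom R scripts from the data repository. The input format, the function name summarise_transfers, the pooling of both directions before applying the five-transfer threshold, the nearest-rank quantile, and the definition of proportional difference as |a − b| / (a + b) are all assumptions made for the example.

```python
import math
from collections import Counter


def summarise_transfers(events, min_support=5, top_fraction=0.10,
                        skew_threshold=0.5, root="root"):
    """Summarise inferred transfers between species-tree branches.

    events: iterable of (donor, recipient) branch labels, one pair per
    inferred transfer (hypothetical input format).
    """
    # Count directed transfers; drop any that involve the root branch.
    directed = Counter(
        (donor, recipient) for donor, recipient in events
        if root not in (donor, recipient) and donor != recipient
    )

    # Pool both directions per unordered branch pair (assumption).
    pooled = Counter()
    for (donor, recipient), n in directed.items():
        pooled[tuple(sorted((donor, recipient)))] += n

    # Keep only connections inferred at least `min_support` times.
    supported = {pair: n for pair, n in pooled.items() if n >= min_support}
    if not supported:
        return []

    # Nearest-rank cut-off marking the top 10% of counts (assumption).
    counts = sorted(supported.values())
    rank = max(1, math.ceil((1 - top_fraction) * len(counts)))
    cutoff = counts[rank - 1]

    summary = []
    for (a, b), total in sorted(supported.items()):
        a_to_b = directed[(a, b)]
        b_to_a = directed[(b, a)]
        # Proportional difference between the two directions (assumption).
        skew = abs(a_to_b - b_to_a) / total
        direction = None
        if skew > skew_threshold:
            direction = (a, b) if a_to_b > b_to_a else (b, a)
        summary.append({
            "pair": (a, b),
            "count": total,
            "top": total >= cutoff,
            "direction": direction,
        })
    return summary


# Toy input with placeholder branch labels (not results from this study).
toy_events = [("A", "B")] * 8 + [("B", "A")] + [("C", "D")] * 3
for row in summarise_transfers(toy_events):
    print(row)
```

With other definitions of the quantile or of the proportional difference, the set of highlighted connections may differ, so the thresholds are exposed as parameters rather than fixed.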
Note that apparent reticulations, especially between lineages branching in short succession, could instead be due to incomplete lineage sorting, which is not modelled by GeneRax. However, the frequency of inferred transfers between more distant lineages does support reticulation, in addition to incomplete lineage sorting.

Reconciliations at the species level (Fig. 2b–d) also support frequent reticulation. In Andropogoneae and Bambusoideae, the most frequent reticulations are not between deeper branches but within particular clades, such as within Andropogoninae, the temperate woody bamboos (Arundinarieae), and, within the paleotropical woody bamboos (Bambuseae), the Malagasy Hickeliinae bamboos and the Bambusa–Dendrocalamus–Gigantochloa complex. In Triticeae, reticulation is frequent across the tribe. The assembled genome of the known allohexaploid Thinopyrum intermedium accounts for a large proportion of the highly supported transfers in Triticeae (species in bold in Fig. 2d). The origin of Pascopyrum smithii from past hybridisation between Elymus and Leymus (Dewey, 1975), and the origins of bread wheat, Triticum aestivum, from Aegilops ancestors are also evident.

Table 1 Taxonomic discrepancies in the nuclear tree at subfamily to tribe level.

Genus/species | Nominal taxon | Nuclear tree position | Plastome tree position
Amphipogon strictus R.Br. | Arundinoideae: Arundineae | Sister to Crinipedeae + Molinieae | In Arundinoideae: Arundineae
Baptorachis foliacea (Clayton) Clayton* | Paspaleae | Paniceae: Anthephorinae | Paniceae: Anthephorinae
Chaetium festucoides Nees* | Panicoideae: Paniceae | In Paspalinae, sister to Streptostachys | Not included
Guaduella Franch. | Puelioideae | Sister to (Puelia + BOP + PACMAD) | Sister to Puelia
Neomolinia Honda & Sakisaka* | Pooideae: Diarrheneae | Sister to Brachypodieae + Triticodae + Poodae | Not analysed (sister to Diarrhena in Gallaher et al., 2022)
Ratzeburgia pulcherrima Kunth* | Panicoideae: Andropogoneae: Ratzeburginae | Sister to Paniceae | Not included
Sporobolus subtilis Kunth. | Chloridoideae: Zoysieae | In Eragrostideae, in Eragrostis | In Eragrostideae, in Eragrostis
Styppeiochloa hitchcockii (A.Camus) Cope | Arundinoideae: Crinipedeae | Sister to Panicoideae | In Arundinoideae: Crinipedeae

Taxa listed here will need follow-up studies to validate their placement. An asterisk (*) denotes genera whose type species was sampled.

Nuclear–plastome tree comparison

Nuclear–plastome conflict is rare across the grass phylogeny. We inferred a plastome tree for 910 accessions, representing 893 species, 508 genera and all tribes except Ampelodesmeae and Steyermarkochloeae (Fig. 3; see also detailed plot broken down into subclades in Fig. S11). Of these, 751 species, 53 tribes and 478 genera were also present in the nuclear tree and their relationships are compared between both trees (Fig. 3). Most branches in the nuclear tree were also highly supported by plastome data (74% with plastome QC > 0.2). Only 10 branches showed strong signals of conflict, that is they were highly supported in the nuclear tree (nuclear QC > 0.2) and had strong support for an alternative configuration in the plastome tree (plastome QC < −0.2), all of them at shallow levels (open circles in Fig. 3). Nuclear and plastome QC values were positively correlated (t = 9.47, P < 0.001; Pearson’s correlation test, two-sided), i.e. branches that show a different configuration in the plastome tree tend to be those with high intra-nuclear conflict.

When directly comparing the positions of clades at subfamily to tribe level, differences are evident in some cases but mostly not strongly supported. The Puelioideae genera Guaduella and Puelia are sister taxa in the plastome tree (Puelioideae) but not in the nuclear tree (paraphyletic Puelioideae). Note that there is high concordance among the nuclear gene trees grouping Puelia with BOP + PACMAD (QC = 0.6), but conflict in the gene trees grouping Guaduella sister to this group (QC < 0.05), so that there is no strongly supported nuclear–plastome conflict in this case. Arundinoideae and Micrairoideae are sisters in the plastome tree but paraphyletic in the nuclear tree, with high gene tree incongruence. A striking difference was found in the position of Styppeiochloa hitchcockii, placed in Arundinoideae in classifications and in the plastome tree, but as sister to Panicoideae in the nuclear tree, based on the same target capture sample. In Pooideae, tribe Diarrheneae, although monophyletic in plastome trees (Gallaher et al., 2022), is polyphyletic in the nuclear tree, as its two genera Diarrhena and Neomolinia align in different clades; our plastome tree does not include Neomolinia. Triticeae plastomes appear to be paraphyletic with regard to Bromeae, as described previously (Bernhardt et al., 2017), while Triticeae is monophyletic in the nuclear tree. Finally, the nuclear tree grouped the two woody bamboo tribes, Arundinarieae and Bambuseae, which have distinct allopolyploid origins (Triplett et al., 2014; Guo et al., 2019; Chalopin et al., 2021; Ma et al., 2024), while in the plastid tree they are paraphyletic with regard to the herbaceous bamboos, Olyreae (Sungkaew et al., 2008). Below tribe level (see detailed tree in Fig. S11), the nuclear tree confirms previous studies in finding the C4-photosynthetic subtribe Anthephorinae (Paniceae) sister to the C4 MCP clade of Melinidinae, Cenchrinae and Panicinae (Washburn et al., 2015, 2017; Huang et al., 2022), but with strong gene tree incongruence between the subtribes of Paniceae. The chloroplast lineage of Anthephorinae is sister to the rest of Paniceae as in previous studies (GPWG II, 2012; Washburn et al., 2017; Saarela et al., 2018; Gallaher et al., 2022). Further differences in the branching order of subtribes are found in the tribes Arundinarieae (temperate woody bamboos), Bambuseae (tropical woody bamboos), Paspaleae, and Poeae.

Discussion

Nuclear phylogenomic data support relationships of current subfamilies and tribes despite gene tree incongruence

We show that the nuclear genome topology of the grass family overall supports the monophyly of accepted subfamilies and tribes (Kellogg, 2015; Soreng et al., 2022), despite the prevalence of gene tree conflict across the grass phylogeny. The subfamily- to tribe-level classification of the grasses has proven remarkably stable over previous community-wide phylogenetic efforts (GPWG, 2001; GPWG II, 2012). Our nuclear phylogenomic analysis further substantiates this framework, building on previous work to provide the largest nuclear phylogenomic sampling to date, with 79% of grass genera and all but two small tribes.

Fig. 2 Nuclear gene reticulation in the grass family. For selected subgroups, the 331 gene trees were reconciled with the species tree under a duplication–transfer–loss model. Blue curves represent transfer events between two branches inferred at least five times (for different genes or within a gene family). Only the most frequent reticulations (top 10% quantile counts) are shown.
Arrows indicate where transfers are highly skewed in one direction (> 50% of proportional difference). Branch lengths are not proportional to time, and transfer lines start at the midpoint of a branch but the actual timing was not inferred. (a) Whole grass family, where tips were relabelled with tribes. Note that here, numbers of transfers combine gene trees from all species within a tribe. (b) Maize tribe, Andropogoneae. (c) Bamboos, Bambusoideae. (d) Wheat tribe, Triticeae. See also detailed plots in Supporting Information Fig. S10.

Such sampling has previously been a considerable challenge in such a species-rich family. This phylogeny will help clarify generic limits and guide the search for useful genes and traits in wild relatives of cereal, forage, biofuel and turf crops.

Fig. 3 Comparison of nuclear and plastome topologies for the Poaceae. The 1153-tip nuclear tree is shown on the left, the 910-tip plastome tree on the right. Plastome support from bootstrap trees (Quartet Concordance, QC) was summarised for branches present in both trees (751 shared species). Grey branches in the nuclear tree had no equivalent for comparison in the plastome tree. Open circles indicate strong signals of conflict, that is high support in the nuclear tree (nuclear QC > 0.2) and high support for an alternative configuration in the plastome tree (plastome QC < −0.2). Tribes are matched between the two trees, and larger tribes are labelled for orientation. See also detailed version of the plastome tree in Supporting Information Fig. S11.

Some taxonomic realignments will be necessary, despite the overall consistency with previous work. In addition, the placement of a few taxa will need to be validated by additional sequences or samples (Table 1). These may represent cases of biological interest (e.g. true reticulations) but our current data cannot entirely rule out possible technical artefacts. More taxonomic mismatches will require attention at the subtribe level. Using the sequence data of Huang et al. (2022) we were able to reproduce their results suggesting paraphyly of subfamily Puelioideae (Guaduella and Puelia) in the nuclear tree; there is, however, high nuclear incongruence and this result was not tested with independent plant samples. If the cyto-nuclear conflict continues to be supported, the well-supported monophyly of the group in the plastid phylogeny would suggest an introgression event in the early history of the grasses. Future morphological studies are needed to determine whether there are characters that support either monophyly or paraphyly of Puelioideae, one of the least well known of the grass subfamilies. More generally, as nuclear genome-scale data continue to accumulate, the grass taxonomic community will have to decide whether the nuclear genome, ultimately underlying most phenotypic characters, should dictate taxonomy in case of conflicting signals. In the bamboos, the nuclear topology better reflected morphological differences than the plastome phylogeny in previous work (Wang et al., 2017). In Paniceae, the position of Anthephorinae sister to Melinidinae–Cenchrinae–Panicinae would be in line with a common origin of C4 photosynthesis in the combined clade (Washburn et al., 2015), with two separate plastome sources. We refer taxonomic and nomenclatural changes to further studies by specialists of the relevant grass subgroups.

Nuclear–plastome discordance is rare between higher taxonomic levels in the grasses. In the large PACMAD clade, we confirm previous nuclear (Bianconi et al., 2020; Huang et al., 2022) and plastome studies (GPWG II, 2012) in finding Aristidoideae sister to the other five subfamilies. However, there is high nuclear gene tree incongruence, while the plastome PACMAD relationships are highly resolved. More recent plastome studies (Saarela et al., 2018; Duvall et al., 2020; Gallaher et al., 2022) suggested this position might be artifactual and favoured a ‘panicoid sister’ hypothesis. Arundinoideae and Micrairoideae subfamilies do not form a clade as in the plastome tree, a result also found in the nuclear analysis of Huang et al. (2022). There clearly is a concentration of gene tree conflict at the base of PACMAD, a period of rapid grass diversification (Christin et al., 2014). This suggests the split of the PACMAD subfamilies could represent a hard polytomy, analogous to deep radiations in groups such as Amaranthaceae (Morales-Briones et al., 2021b), Fabaceae (Koenen et al., 2021), or neoavian birds (Suh, 2016). Plastome lineages may have sorted more rapidly due to geographically more limited seed compared to pollen dispersal. Across angiosperms, episodes of rapid diversification are correlated with higher conflict among gene trees (Guo et al., 2023; Zuntini et al., 2024). Further investigation of this relationship for the grasses could build on the dataset we compiled here but will require tackling the complex issue of time calibration in grasses (to be described later).

Incomplete lineage sorting and reticulation have been frequent in the grass family

Incomplete lineage sorting (ILS) may explain much of the gene tree incongruence in our data. Grasses often have very large ranges and population sizes (Linder et al., 2018), which will favour ILS at speciation. Frequent ILS would imply that species delimitation based on only a few markers may be unreliable in the grasses, and paraphyletic species common. However, it is unclear why we find little support for introgression or hybridisation, although we know it must be frequent: c. 45–80% of grasses are polyploid (Stebbins, 1985; DeWet, 1986), with an appreciable proportion of those being allopolyploid, that is hybrids. Unequal paralogue recovery and a species sampling not dense enough could mean that signals of hybridisation get blurred in the type of data we used and produce gene tree distributions similar to those expected under ILS.

Nevertheless, gene tree–species tree reconciliation does illustrate the potential scale of nontree-like phylogenetic structure. It needs to be followed by more in-depth analyses with phased genomic data that can clearly distinguish reticulation from ILS on closely related branches, and different modes of reticulation from each other. The methods used here are not designed to detect allopolyploidy but are still able to identify frequent reticulation events. The actual modes of reticulation in grasses certainly need more study, as we cannot distinguish here between introgression and hybrid speciation. Recent work also demonstrated the frequency of lateral gene transfers of large blocks in the genomes of Alloteropsis semialata (Dunning et al., 2019; Raimondeau et al., 2023) and other grass species (Hibdige et al., 2021). Contamination and gene tree errors can potentially obscure patterns in short-read data such as those included in our analysis, but encouragingly, we retrieved known patterns such as the mosaic origins of the Thinopyrum intermedium genome (Mahelka et al., 2011). This suggests that reduced-representation nuclear datasets do retain signals of reticulation.

Our analysis offers a glimpse of how the accumulation of assembled genome data for grasses beyond model and crop species could foster research on reticulation, particularly in three areas. First, clarifying where apparent reticulate relationships may actually stem from differential retention of homologs after whole-genome duplication. For example, the reticulations we inferred at the base of the BOP–PACMAD clade, the large crown radiation of grasses, could potentially be remnants from the rho whole-genome duplication event at the stem of Poaceae (McKain et al., 2016; Zhang et al., 2024). Second, correlating reticulation frequency with ecological and morphological predictors to identify the physical mechanisms of lateral transfers, which remain speculative (Pereira et al., 2023). Third, using synteny information to identify the precise locations and origins of functional variation, potentially using new deep learning approaches for identifying introgression (Zhang et al., 2023). New crops, such as Thinopyrum intermedium (intermediate wheatgrass or kernza) with its mosaic genome and its potential as perennial cereal or genetic resource (Baker et al., 2020), are certainly prime candidates for such research. However, given the frequency of allopolyploidisation, and if lateral transfers are as frequent as recent work suggests (Hibdige et al., 2021), the grass family as a whole may well constitute a ‘single genetic system’ (Freeling, 2001; Mascher et al., 2024) or higher-level ‘pangenome’ (Dunning et al., 2019). The lateral recruitment, across 20 million years of divergence, of key genes for C4 photosynthesis in panicoid grasses (Christin et al., 2012), illustrates this point. Species-level sampling including the more distant relatives of crops is therefore needed to access the entire genetic diversity potentially available for future sustainable agriculture.

Towards a complete grass tree of life

Our community effort resulted in the most comprehensive nuclear phylogenomic tree for the grass family to date, including 1133 species, with 21 genera sequenced for the first time. This tree, and the dataset associated with it, paves the way towards placing all c. 11 800 species in the grass tree of life. Already, the International Nucleotide Sequence Database Collaboration hosts sequence data for more than 6200 grass species, as of April 2024. The comprehensive phylogenomic backbone we provide here could provide a basis for assembling these shorter sequences into a grass supertree for analyses of trait evolution and biogeography, as attempted previously with smaller Poaceae backbones (Spriggs et al., 2014; Elliott et al., 2024).

We show that the Angiosperms353 gene set can be successfully used to anchor different types of genomic datasets, including unenriched Illumina sequence data. Sequencing depth and paralog recovery obviously vary across such different datasets, which needs to be taken into account, for example in future studies of whole-genome duplications and events of auto- and allopolyploidy (Thomas et al., 2017; Morales-Briones et al., 2021a; Rothfels, 2023) or large-scale gene duplications that preceded major innovations like cold tolerance (Schubert et al., 2020; Zhang et al., 2022). The extent to which paralog-aware methods of species tree inference are robust to unaccounted paralogs in large datasets has yet to be evaluated, but this problem should decrease in importance as more high-coverage datasets become available. The improved Angiosperms353 reference set constructed here for the grasses will facilitate the inclusion of previously unsequenced grass species. This target capture approach allows in particular sequencing degraded DNA from herbarium specimens in a cost-efficient manner and thus filling the remaining gaps of the grass tree of life, even where there are logistical barriers to obtaining high-molecular-weight DNA for full-genome sequencing.

The timeline of grass evolution continues to be a matter of debate, with recent studies suggesting a mid-Cretaceous origin for the grasses (Gallaher et al., 2022; Huang et al., 2022) and thus supporting earlier suggestions based on phytolith fossils (Prasad et al., 2005, 2011). However, such age estimates hinge on several factors (Christin et al., 2014), such as the placement of phytolith fossils, appropriate modelling of rate correlation, and the upper bound set by the age of flowering plants, which itself remains unclear and fraught with methodological challenges (Brown & Smith, 2018; Silvestro et al., 2021; Sauquet et al., 2022; Carruthers & Scotland, 2023).

ARZ; an Australian Biological Resources Study to RWJ; the National Science Foundation (grants DEB-1929514 and IOS-1822330) to EAK; the National Natural Science Foundation of China (grant 32120103003) to D-ZL and J-XL; a PhD scholarship (Chinese Academy of Sciences, Kunming Institute of Botany) to RAR; the U.S. Department of Agriculture – Agricultural Research Service, the U.S. National Science Foundation (Award No. 1501406), and the University of Missouri to JDW; the Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil (CNPq – grants 426334/2018-3, 441760/2020-1, 315433/2023-0) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG – grants APQ-01222-21, APQ-03365-21, BPD-00736-22) to CADW; the National Natural Science Foundation of China (grants 32270227 and 31670196) to NX.

We thank Research Computing at the James Hutton Institute for providing computational resources and technical support for the UK’s Crop Diversity Bioinformatics HPC (BBSRC grants
The nuclear dataset BB/S019669/1 and BB/X019683/1), use of which has contribu- we provide here, along with recent advances in grass phytolith ted to the results reported within this paper. classification (Gallaher et al., 2020) as well as a better under- We thank the herbarium curators at BR, CAN, EA, FTOH, standing of rate variation across branches (Carruthers KIB, MEL, NSW, P, PE, SI, TAN and US for supporting this et al., 2020; Carruthers & Scotland, 2020) and gene tree conflict project and the following people for providing technical sup- (Carruthers et al., 2022) on divergence time estimation suggest a port or advice at various stages: Paul Bailey (RBG Kew), Kevin new comprehensive analysis of grass divergence times as a pro- Lempoel (RBG Kew), Sophie Manzi (CNRS Toulouse) and mising avenue forward. Martin R. Smith (University of Durham). We thank M.C. Romay and E.S. Buckler for prepublication access to sequences from the Panandropogoneae project, and the US Department Acknowledgements of Energy Joint Genome Institute and the Open Green Gen- This work was supported by grants from the Calleva Foundation ome Initiative for early access to the Joinvillea ascendens to the Plant and Fungal Trees of Life (PAFTOL) project at the genome sequence. Royal Botanic Gardens, Kew. We would like to acknowledge We dedicate this paper to the memory of W. Derek Clayton the contribution of the Genomics for Australian Plants Framework who passed away in September 2023. His contribution to grass Initiative consortium (https://www.genomicsforaustralianplants. systematics cannot be overestimated and laid the foundations for com/consortium/) in the generation of data supporting this publi- many of the advances described in this paper. cation. 
The Initiative is supported by funding from Bioplatforms Australia (enabled by NCRIS), the Ian Potter Foundation, Royal Competing interests Botanic Gardens Foundation (Victoria), Royal Botanic Gardens, Victoria, the Royal Botanic Gardens and Domain Trust, the None declared. Council of Heads of Australasian Herbaria, CSIRO, Centre for Australian National Biodiversity Research and the Department of Author contributions Biodiversity, Conservation and Attractions, Western Australia. We also acknowledge funding from: a Giles Fellowship of the WJB, MEB, P-AC, LTD, JH, EAK, RJS and MSV conceptua- Georgia Research Alliance to JL Bennetzen; the Labex TULIP lised the study. MDB, RLB, GB, MEB, P-AC, PC, DMC, GD, and CEBA funded by Agence Nationale de la Recherche (ANR- LTD, MRD, SD, SZF, SF, JH, TRH, WH, RWJ, EAK, JMK, 10-LABX-0041; ANR-10-LABX-25-01) to GB; the Horizon XL, OM, TGBM, MFM-A, DJM, JR, LS, RJS, MSV, MW, Europe programme (MSCA-PF grant 101105838) to MEB; the CADW, MDX, LZ and FOZ curated the data. MEB and JH Spanish Ministry of Science and Innovation grants PID2022- performed formal analysis. WJB, GB, P-AC, JTC, DMC, FF, 140074NB-I00 and TED2021-131073B-I00, and Spanish Ara- EAK, LZ and AZ acquired funding. MEB, JTC, JH and IL per- gon Government grant LMP82_21 to PC; the European formed investigation. WJB, MEB, P-AC, LTD, JH, AMH, Research Council (grant ERC-2014-STG-638333) and the EAK, RJS, MSV and ARZ developed methodology. WJB, MEB, Royal Society (grant RGF\EA\181050) to P-AC; National P-AC, JH and MSV had administrative responsibility in the pro- Science Foundation grant DEB 0920147 to JTC; a NERC Inde- ject. 
WA, MDB, RLB, JL Bennetzen, JL Birch, GB, PC, WC, pendent Research Fellowship (NE/T011025/1) to LTD; the MC, LGC, JCAC, DMC, GD, MRD, SD, AEF, SF, FF, LJG, Canadian Museum of Nature to LJG and JMS; Future Leader in TH, TRH, C-HH, RWJ, EAK, CJK, JMK, IL, RL, D-ZL, J-XL, Plant and Fungal Science Fellowships (RBG Kew) to JH and XL, QWRL, HM, TDM, OM, MRM, TGBM, DJM, OPN,  2024 The Author(s). New Phytologist (2024) New Phytologist 2024 New Phytologist Foundation. www.newphytologist.com New 14 Research Phytologist GEO, PMP, RAR, JR, JMS, LS, NWS, RJS, MSMS, EJT, PT, Carruthers T, Scotland RW. 2020. Insights from empirical analyses and GAV, MSV, NGW, JDW, TW, MW, CADW, MDX, NX, LZ simulations on using multiple fossil calibrations with relaxed clocks to estimate divergence times.Molecular Biology and Evolution 37: 1508–1529. and FOZ provided resources. MEB and JH developed software Carruthers T, Scotland RW. 2023. Deconstructing age estimates for code. WJB, P-AC, EAK, RJS and MSV supervised the project. angiosperms.Molecular Phylogenetics and Evolution 186: 107861. GB, MEB, LTD, JH, EAK, JMK, D-ZL, J-XL, HM, RJS, Carruthers T, Sun M, Baker WJ, Smith SA, de Vos JM, Eiserhardt WL. 2022. MSMS, MSV and ARZ validated the results. RLB, MEB and JH The implications of incongruence between gene tree and species tree topologies visualised the data. MEB and JH wrote the original manuscript. for divergence time estimation. Systematic Biology 71: 1124–1146. Chalopin D, Clark LG, Wysocki WP, Park M, Duvall MR, Bennetzen JL. WA, WJB, MDB, RLB, JB, GB, MEB, P-AC, WC, PC, LGC, 2021. Integrated genomic analyses from low-depth sequencing help resolve DMC, LTD, AEF, FF, JH, TRH, WH, AMH, RWJ, EAK, phylogenetic incongruence in the bamboos (Poaceae: Bambusoideae). Frontiers JMK, IL, D-ZL, HM, MRM, TGBM, DJM, JMS, NWS, RJS, in Plant Science 12: 725728. 
MSMS, MSV, NGW, JDW, MW, CADW, LZ and AZ Christin P-A, Edwards EJ, Besnard G, Boxall SF, Gregory R, Kellogg EA, reviewed and edited the manuscript. Hartwell J, Osborne CP. 2012. Adaptive evolution of C4 photosynthesis through recurrent lateral gene transfer. Current Biology 22: 445–449. Christin P-A, Spriggs E, Osborne CP, Str€omberg CAE, Salamin N, Edwards EJ. Data availability 2014.Molecular dating, evolutionary rates, and the age of the grasses. Systematic Biology 63: 153–165. Data used and produced in this study, including metadata for all Clark LG, Zhang W, Wendel JF. 1995. A phylogeny of the grass family accessions, gene alignments and phylogenetic trees are available (Poaceae) based on ndhF sequence data. Systematic Botany 20: 436–460. DeWet JMJ. 1986.Hybridization and polyploidy in the Poaceae. In: Soderstrom in an open Zenodo repository (doi: 10.5281/zenodo.10996136). T, Hilu KW, Campbell CS, Barkworth ME, eds. Grass systematics and New short-read data and plastome assemblies are available via the evolution. Washington, DC: Smithsonian Institution Press, 188–194. International Sequence Database Collaboration (INSDC), Dewey DR. 1975. The origin of Agropyron smithii. American Journal of Botany BioProject no. PRJEB79360. 62: 524–530. Dunning LT, Olofsson JK, Parisod C, Choudhury RR, Moreno-Villena JJ, Yang Y, Dionora J, Quick WP, Park M, Bennetzen JL et al. 2019. References Lateral transfers of large DNA fragments spread functional genes among grasses. Proceedings of the National Academy of Sciences, USA 116: 4416– Baker L, Grewal S, Yang C, Hubbart-Edwards S, Scholefield D, Ashling S, 4425. Burridge AJ, Przewieslik-Allen AM, Wilkinson PA, King IP et al. 2020. Duvall MR, Burke SV, Clark DC. 2020. Plastome phylogenomics of Poaceae: Exploiting the genome of Thinopyrum elongatum to expand the gene pool of alternate topologies depend on alignment gaps. Botanical Journal of the Linnean hexaploid wheat. Theoretical and Applied Genetics 133: 2213–2226. 
Society 192: 9–20. Baker WJ, Bailey P, Barber V, Barker A, Bellot S, Bishop D, Botigue LR, Edwards EJ, Osborne CP, Str€omberg CAE, Smith SA, C4 Grasses Consortium. Brewer G, Carruthers T, Clarkson JJ et al. 2022. A comprehensive 2010. The origins of C4 grasslands: integrating evolutionary and ecosystem phylogenomic platform for exploring the angiosperm tree of life. Systematic science. Science 328: 587–591. Biology 71: 301–319. Elliott TL, Spalink D, Larridon I, Zuntini AR, Escudero M, Hackel J, Barrett Baker WJ, Dodsworth S, Forest F, Graham SW, Johnson MG, McDonnell A, RL, Martın-Bravo S, Marquez-Corro JI, Granados Mendoza C et al. 2024. Pokorny L, Tate JA, Wicke S, Wickett NJ. 2021. Exploring Angiosperms353: Global analysis of Poales diversification – parallel evolution in space and time An open, community toolkit for collaborative phylogenomic research on into open and closed habitats. New Phytologist 242: 727–743. flowering plants. American Journal of Botany 108: 1059–1065. Emms DM, Kelly S. 2019.ORTHOFINDER: phylogenetic orthology inference for Bernhardt N, Brassac J, Kilian B, Blattner FR. 2017. Dated tribe-wide whole comparative genomics. Genome Biology 20: 238. chloroplast genome phylogeny indicates recurrent hybridizations within Estep MC, McKain MR, Diaz DV, Zhong J, Hodge JG, Hodkinson TR, Layton Triticeae. BMC Evolutionary Biology 17: 141. DJ, Malcomber ST, Pasquet R, Kellogg EA. 2014. Allopolyploidy, Bianconi ME, Hackel J, Vorontsova MS, Alberti A, Arthan W, Burke SV, diversification, and the Miocene grassland expansion. Proceedings of the Duvall MR, Kellogg EA, Lavergne S, McKain MR et al. 2020. Continued National Academy of Sciences, USA 111: 15149–15154. adaptation of C4 photosynthesis after an initial burst of changes in the Feldman M, Levy AA. 2023.Wheat evolution and domestication. Cham, Andropogoneae grasses. Systematic Biology 69: 445–461. Switzerland: Springer International. Bolger AM, Lohse M, Usadel B. 2014. 
TRIMMOMATIC: a flexible trimmer for Fisher AE, Hasenstab KM, Bell HL, Blaine E, Ingram AL, Columbus JT. 2016. Illumina sequence data. Bioinformatics 30: 2114–2120. Evolutionary history of chloridoid grasses estimated from 122 nuclear loci. Bond WJ. 2016. Ancient grasslands at risk. Science 351: 120–122. Molecular Phylogenetics and Evolution 105: 1–14. Borowiec ML. 2016. AMAS: a fast tool for alignment manipulation and Freeling M. 2001. Grasses as a single genetic system. Reassessment 2001. Plant computing of summary statistics. PeerJ 4: e1660. Physiology 125: 1191–1197. Brown JW, Smith SA. 2018. The past sure is tense: on interpreting phylogenetic Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the divergence time estimates. Systematic Biology 67: 340–353. next-generation sequencing data. Bioinformatics 28: 3150–3152. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Gallaher TJ, Akbar SZ, Klahs PC, Marvet CR, Senske AM, Clark LG, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics Str€omberg CAE. 2020. 3D shape analysis of grass silica short cell phytoliths 10: 421. (GSSCP): a new method for fossil classification and analysis of shape evolution. Campbell CS, Kellogg EA. 1987. Phylogenetic analyses of the Gramineae. In: New Phytologist 228: 376–392. Soderstrom TR, Hilu KW, Campbell CS, Barkworth ME, eds. Grass systematics Gallaher TJ, Peterson PM, Soreng RJ, Zuloaga FO, Li D, Clark LG, Tyrrell and evolution. Washington, DC, USA: Smithsonian Institution Press, 310–322. CD, Welker CAD, Kellogg EA, Teisher JK. 2022. Grasses through space and Capella-Gutierrez S, Silla-Martınez JM, Gabaldon T. 2009. TRIMAL: a tool for time: An overview of the biogeographical and macroevolutionary history of automated alignment trimming in large-scale phylogenetic analyses. Poaceae. Journal of Systematics and Evolution 60: 522–569. Bioinformatics 25: 1972–1973. Govaerts R, Nic Lughadha E, Black N, Turner R, Paton A. 2021. 
The World Carruthers T, Sanderson MJ, Scotland RW. 2020. The implications of Checklist of Vascular Plants, a continuously updated resource for exploring lineage-specific rates for divergence time estimation. Systematic Biology 69: global plant diversity. Scientific Data 8: 215. 660–670. New Phytologist (2024)  2024 The Author(s). www.newphytologist.com New Phytologist 2024 New Phytologist Foundation. New Phytologist Research 15 GPWG. 2001. Phylogeny and subfamilial classification of the grasses (Poaceae). 2009. The Sequence Alignment/Map format and SAMTOOLS. Bioinformatics Annals of the Missouri Botanical Garden 88: 373–457. 25: 2078–2079. GPWG II. 2012. New grass phylogeny resolves deep evolutionary relationships Linder HP, Lehmann CER, Archibald S, Osborne CP, Richardson DM. and discovers C4 origins. New Phytologist 193: 304–312. 2018. Global grass (Poaceae) success underpinned by traits facilitating Guo C, Luo Y, Gao L-M, Yi T-S, Li H-T, Yang J-B, Li D-Z. 2023. colonization, persistence and habitat transformation. Biological Reviews 93: Phylogenomics and the flowering plant tree of life. Journal of Integrative Plant 1125–1144. Biology 65: 299–323. Ma P-F, Liu Y-L, Guo C, Jin G, Guo Z-H, Mao L, Yang Y-Z, Niu L-Z, Wang Guo Z-H, Ma P-F, Yang G-Q, Hu J-Y, Liu Y-L, Xia E-H, Zhong M-C, Zhao L, Y-J, Clark LG et al. 2024. Genome assemblies of 11 bamboo species highlight Sun G-L, Xu Y-X et al. 2019. Genome sequences provide insights into the diversification induced by dynamic subgenome dominance. Nature Genetics 56: reticulate origin and unique traits of woody bamboos.Molecular Plant 12: 710–720. 1353–1365. Mahelka V, Kopecky D, Pastova L. 2011.On the genome constitution and Hamby RK, Zimmer EA. 1988. Ribosomal RNA sequences for inferring evolution of intermediate wheatgrass (Thinopyrum intermedium: Poaceae, phylogeny within the grass family (Poaceae). Plant Systematics and Evolution Triticeae). BMC Evolutionary Biology 11: 127. 160: 29–37. Mai U, Mirarab S. 2018. 
TREESHRINK: fast and accurate detection of outlier long Healey AL, Garsmeur O, Lovell JT, Shengquiang S, Sreedasyam A, Jenkins J, branches in collections of phylogenetic trees. BMC Genomics 19: 272. Plott CB, Piperidis N, Pompidor N, Llaca V et al. 2024. The complex Mascher M, Marone MP, Schreiber M, Stein N. 2024. Are cereal grasses a single polyploid genome architecture of sugarcane. Nature 628: 804–810. genetic system? Nature Plants 10: 719–731. Hibdige SGS, Raimondeau P, Christin P-A, Dunning LT. 2021.Widespread Mason-Gamer RJ, White DM. 2024. The phylogeny of the Triticeae: resolution lateral gene transfer among grasses. New Phytologist 230: 2474–2486. and phylogenetic conflict based on genomewide nuclear loci. American Journal Holm LRG, Plucknett DL, Pancho JV, Herberger JP. 1977. The World’s worst of Botany 111: e16404. weeds. Distribution and biology. Honolulu, HI: East-West Center, by the McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, dePamphilis University Press of Hawai’i. CW, Givnish TJ, Pires JC, Stevenson DW, Leebens-Mack JH. 2016. A Hu Y, Sun Y, Zhu Q-H, Fan L, Li J. 2023. Poaceae chloroplast genome phylogenomic assessment of ancient polyploidy and genome evolution across sequencing: great leap forward in recent ten years. Current Genomics 23: 369– the Poales. Genome Biology and Evolution 8: 1150–1164. 384. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Huang W, Zhang L, Columbus JT, Hu Y, Zhao Y, Tang L, Guo Z, Chen W, Haeseler A, Lanfear R. 2020. IQ-TREE 2: new models and efficient methods for McKain M, Bartlett M et al. 2022. A well-supported nuclear phylogeny of phylogenetic inference in the genomic era.Molecular Biology and Evolution 37: Poaceae and implications for the evolution of C4 photosynthesis.Molecular 1530–1534. Plant 15: 755–777. Morales-Briones DF, Gehrke B, Huang C-H, Liston A, Ma H, Marx HE, Tank Hufford MB, Seetharam AS, Woodhouse MR, Chougule KM, Ou S, Liu J, DC, Yang Y. 2021a. 
Analysis of paralogs in target enrichment data pinpoints Ricci WA, Guo T, Olson A, Qiu Y et al. 2021. De novo assembly, annotation, multiple ancient polyploidy events in Alchemilla s.l. (Rosaceae). Systematic and comparative analysis of 26 diverse maize genomes. Science 373: 655–662. Biology 71: 190–207. Jin J-J, Yu W-B, Yang J-B, Song Y, dePamphilis CW, Yi T-S, Li D-Z. 2020. Morales-Briones DF, Kadereit G, Tefarikis DT, Moore MJ, Smith SA, GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of Brockington SF, Timoneda A, Yim WC, Cushman JC, Yang Y. 2021b. organelle genomes. Genome Biology 21: 241. Disentangling sources of gene tree discordance in phylogenomic data sets: testing Johnson MG, Gardner EM, Liu Y, Medina R, Goffinet B, Shaw AJ, Zerega ancient hybridizations in Amaranthaceae s.l. Systematic Biology 70: 219–235. NJC, Wickett NJ. 2016.HYBPIPER: extracting coding sequence and introns for Morel B, Kozlov AM, Stamatakis A, Szo€ll}osi GJ. 2020. GENERAX: a tool for phylogenetics from high-throughput sequencing reads using target enrichment. species-tree-aware maximum likelihood-based gene family tree inference under Applications in Plant Sciences 4: 1600016. gene duplication, transfer, and loss.Molecular Biology and Evolution 37: 2763– Johnson MG, Pokorny L, Dodsworth S, Botigue LR, Cowan RS, Devault A, 2774. Eiserhardt WL, Epitawalage N, Forest F, Kim JT et al. 2019. A universal Olofsson JK, Cantera I, Van de Paer C, Hong-Wa C, Zedane L, Dunning LT, probe set for targeted sequencing of 353 nuclear genes from any flowering plant Alberti A, Christin P-A, Besnard G. 2019. Phylogenomics using low-depth designed using k-medoids clustering. Systematic Biology 68: 594–606. whole genome sequencing: a case study with the olive tribe.Molecular Ecology Junier T, Zdobnov EM. 2010. The Newick utilities: high-throughput Resources 19: 877–892. phylogenetic tree processing in the Unix shell. Bioinformatics 26: 1669–1670. 
One Thousand Plant Transcriptomes Initiative. 2019.One thousand plant Katoh K, Standley DM. 2013.MAFFT multiple sequence alignment software v.7: transcriptomes and the phylogenomics of green plants. Nature 574: 679–685. improvements in performance and usability.Molecular Biology and Evolution Pease JB, Brown JW, Walker JF, Hinchliff CE, Smith SA. 2018.Quartet 30: 772–780. Sampling distinguishes lack of support from conflicting support in the green Kellogg EA. 2015. Flowering plants. Monocots: Poaceae. Heidelberg, Germany: plant tree of life. American Journal of Botany 105: 385–403. Springer. Pereira L, Christin P-A, Dunning LT. 2023. The mechanisms underpinning Kihara H. 1982.Wheat studies: retrospects and prospects. Tokyo, Japan; lateral gene transfer between grasses. Plants, People, Planet 5: 672–682. Amsterdam, the Netherlands; New York, NY, USA: Kodansha; Elsevier Prasad V, Str€omberg CAE, Alimohammadian H, Sahni A. 2005. Dinosaur Scientific Pub. coprolites and the early evolution of grasses and grazers. Science 310: 1177– Koenen EJM, Ojeda DI, Bakker FT, Wieringa JJ, Kidner C, Hardy OJ, 1180. Pennington RT, Herendeen PS, Bruneau A, Hughes CE. 2021. The origin of Prasad V, Str€omberg CAE, Leache AD, Samant B, Patnaik R, Tang L, Mohabey the legumes is a complex paleopolyploid phylogenomic tangle closely associated DM, Ge S, Sahni A. 2011. Late Cretaceous origin of the rice tribe provides with the Cretaceous-Paleogene (K-Pg) mass extinction event. Systematic Biology evidence for early diversification in Poaceae. Nature Communications 2: 480. 70: 508–526. Price MN, Dehal PS, Arkin AP. 2010. FASTTREE 2 – approximately maximum- Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with BOWTIE 2. likelihood trees for large alignments. PLoS ONE 5: e9490. Nature Methods 9: 357–359. Raimondeau P, Bianconi ME, Pereira L, Parisod C, Christin P-A, Dunning LT. Lemoine F, Entfellner J-BD, Wilkinson E, Correia D, Felipe MD, Oliveira T, 2023. 
Lateral gene transfer generates accessory genes that accumulate at Gascuel O. 2018. Renewing Felsenstein’s phylogenetic bootstrap in the era of different rates within a grass lineage. New Phytologist 240: 2072–2084. big data. Nature 556: 452–456. Rothfels CJ. 2023. Polyploid phylogenetics. New Phytologist 230: 66–72. Li H, Durbin R. 2009. Fast and accurate short read alignment with Saarela JM, Burke SV, Wysocki WP, Barrett MD, Clark LG, Craine JM, Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. Peterson PM, Soreng RJ, Vorontsova MS, Duvall MR. 2018. A 250 plastome Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, phylogeny of the grass family (Poaceae): topological support under different Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. data partitions. PeerJ 6: e4299.  2024 The Author(s). New Phytologist (2024) New Phytologist 2024 New Phytologist Foundation. www.newphytologist.com New 16 Research Phytologist Sauquet H, Ramırez-Barahona S, Magallon S. 2022.What is the age of Washburn JD, Schnable JC, Davidse G, Pires JC. 2015. Phylogeny and flowering plants? Journal of Experimental Botany 73: 3840–3853. photosynthesis of the grass tribe Paniceae. American Journal of Botany 102: Schubert M, Humphreys AM, Lindberg CL, Preston JC, Fjellheim S. 2020. To 1493–1505. coldly go where no grass has gone before: a multidisciplinary review of cold Wen D, Yu Y, Zhu J, Nakhleh L. 2018. Inferring phylogenetic networks using adaptation in Poaceae. Annual Plant Reviews Online 3: 523–562. PHYLONET. Systematic Biology 67: 735–740. Shen W, Le S, Li Y, Hu F. 2016. SeqKit: a cross-platform and ultrafast toolkit Yan Z, Smith ML, Du P, Hahn MW, Nakhleh L. 2021. Species tree inference for FASTA/Q file manipulation. PLoS ONE 11: e0163962. methods intended to deal with incomplete lineage sorting are robust to the Silvestro D, Bacon CD, Ding W, Zhang Q, Donoghue PCJ, Antonelli A, Xing presence of paralogs. Systematic Biology 71: 367–381. Y. 2021. 
Fossil data support a pre-Cretaceous origin of flowering plants. Zhang C, Scornavacca C, Molloy EK, Mirarab S. 2020. ASTRAL-Pro: quartet- Nature Ecology & Evolution 5: 449–457. based species-tree inference despite paralogy.Molecular Biology and Evolution Smith ML, Hahn MW. 2021. New approaches for inferring phylogenies in the 37: 3292–3307. presence of paralogs. Trends in Genetics 37: 174–187. Zhang L, Zhu X, Zhao Y, Guo J, Zhang T, Huang W, Huang J, Hu Y, Huang Smith ML, Vanderpool D, Hahn MW. 2022. Using all gene families vastly C-H, Ma H. 2022. Phylotranscriptomics resolves the phylogeny of Pooideae expands data available for phylogenomic inference.Molecular Biology and and uncovers factors for their adaptive evolution.Molecular Biology and Evolution 39: msac112. Evolution 39: msac026. Smith MR. 2019. TreeDist: distances between phylogenetic trees. R package v.2.7. Zhang T, Huang W, Zhang L, Li D-Z, Qi J, Ma H. 2024. Phylogenomic Comprehensive R Archive Network. doi: 10.5281/zenodo.3528124. profiles of whole-genome duplications in Poaceae and landscape of differential Smith MR. 2020. Information theoretic generalized Robinson–Foulds metrics for duplicate retention and losses among major Poaceae lineages. Nature comparing phylogenetic trees. Bioinformatics 36: 5007–5013. Communications 15: 3305. Soreng RJ, Peterson PM, Zuloaga FO, Romaschenko K, Clark LG, Teisher JK, Zhang Y, Zhu Q, Shao Y, Jiang Y, Ouyang Y, Zhang L, Zhang W. 2023. Gillespie LJ, Barbera P, Welker CAD, Kellogg EA et al. 2022. A worldwide Inferring historical introgression with deep learning. Systematic Biology 72: phylogenetic classification of the Poaceae (Gramineae) III: An update. Journal 1013–1038. of Systematics and Evolution 60: 476–521. Zuntini AR, Carruthers T, Maurin O, Bailey PC, Leempoel K, Brewer GE, Spriggs EL, Christin P-A, Edwards EJ. 2014. C photosynthesis promoted Epitawalage N, Francoso E, Gallego-Paramo B, McGinnie C et al. 
2024.4 species diversification during the Miocene grassland expansion. PLoS ONE 9: Phylogenomics and the rise of the angiosperms. Nature 629: 843–850. e97722. Stamatakis A. 2006. Phylogenetic models of rate heterogeneity: a high performance computing perspective. In: Proceedings 20th IEEE international Supporting Information parallel & distributed processing symposium. Rhodes, Greece: The Institute of Additional Supporting Information may be found online in the Electrical and Electronics Engineers, 8. Stamatakis A. 2014. RAXML v.8: a tool for phylogenetic analysis and post- Supporting Information section at the end of the article. analysis of large phylogenies. Bioinformatics 30: 1312–1313. Stebbins GL. 1985. Polyploidy, hybridization, and the invasion of new habitats. Fig. S1 Schematic overview of the custom workflow used for Annals of the Missouri Botanical Garden 72: 824–832. sequence assembly from Illumina shotgun accessions. Suh A. 2016. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zoologica Scripta 45: 50–62. Sungkaew S, Stapleton CMA, Salamin N, Hodkinson TR. 2008. Non-monophyly Fig. S2 Nuclear gene recovery and sequence completeness. of the woody bamboos (Bambuseae; Poaceae): a multi-gene region phylogenetic analysis of Bambusoideae s.s. Journal of Plant Research 122: 95–108. Fig. S3 Paralog recovery across data types. Teisher JK, McKain MR, Schaal BA, Kellogg EA. 2017. Polyphyly of Arundinoideae (Poaceae) and evolution of the twisted geniculate lemma awn. Fig. S4 Effect of sequencing depth on the recovery of paralogs in Annals of Botany 120: 725–738. Thomas GWC, Ather SH, Hahn MW. 2017. Gene-tree reconciliation with shotgun accessions. MUL-Trees to resolve polyploidy events. Systematic Biology 66: 1007–1018. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Fig. S5 Overall paralog recovery across accessions. Greiner S. 2017. GeSeq – versatile and accurate annotation of organelle genomes. 
Nucleic Acids Research 45: W6–W11. Fig. S6 Test of the custom assembly workflow on full-genome Triplett JK, Clark LG, Fisher AE, Wen J. 2014. Independent allopolyploidization events preceded speciation in the temperate and tropical sequences. woody bamboos. New Phytologist 204: 66–73. Tsunewaki K. 2018. Dawn of modern wheat genetics: the story of the wheat Fig. S7 Test of the custom assembly workflow on full-genome stocks that contributed to the early stage of wheat cytogenetics. Cytologia 83: sequences – copy number recall. 351–364. Walkowiak S, Gao L, Monat C, Haberer G, Kassa MT, Brinton J, Ramirez- Gonzalez RH, Kolodziej MC, Delorean E, Thambugala D et al. 2020. Fig. S8 Detailed version of the multispecies coalescent nuclear Multiple wheat genomes reveal global variation in modern breeding. Nature species tree. 588: 277–283. Wang C, Han B. 2022. Twenty years of rice genomics research: from sequencing Fig. S9 Nuclear species tree stability under different data filtering and functional genomics to quantitative genomics.Molecular Plant 15: 593–619. strategies. Wang X, Ye X, Zhao L, Li D, Guo Z, Zhuang H. 2017. Genome-wide RAD sequencing data provide unprecedented resolution of the phylogeny of temperate bamboos (Poaceae: Bambusoideae). Scientific Reports 7: 11546. Fig. S10 Detailed plots of the reticulations inferred with gene Washburn JD, Schnable JC, Conant GC, Brutnell TP, Shao Y, Zhang Y, tree–species tree reconciliation. Ludwig M, Davidse G, Pires JC. 2017. Genome-guided phylo-transcriptomic methods and the nuclear phylogenetic tree of the Paniceae grasses. Scientific Fig. S11 Detailed version of the plastome tree. Reports 7: 13528. New Phytologist (2024)  2024 The Author(s). www.newphytologist.com New Phytologist 2024 New Phytologist Foundation. 
New Phytologist Research 17 Methods S1 DNA isolation, library preparation, sequencing and Table S4 Custom pipeline assembly statistics for shotgun acces- curation of the grass-specific Angiosperms353 reference dataset. sion. Table S1 Nuclear alignment summary statistics. Please note: Wiley is not responsible for the content or function- ality of any Supporting Information supplied by the authors. Any Table S2 Number of gene copies (paralogues) recovered per queries (other than missing material) should be directed to the accession and gene. New Phytologist Central Office. Table S3 HybPiper nuclear gene assembly statistics.  2024 The Author(s). New Phytologist (2024) New Phytologist 2024 New Phytologist Foundation. www.newphytologist.com