INVITED SPECIAL ARTICLE
For the Special Issue: Methods for Exploring the Plant Tree of Life

A protocol for high-throughput, untargeted forest community metabolomics using mass spectrometry molecular networks

Brian E. Sedio¹,²,⁴, Cristopher A. Boya P.²,³, and Juan Camilo Rojas Echeverri²

PROTOCOL NOTE

Manuscript received 23 August 2017; revision accepted 4 November 2017.

¹ Smithsonian Tropical Research Institute, Apartado 0843-03092, Balboa, Ancón, Republic of Panama
² Center for Biodiversity and Drug Discovery, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología, Apartado 0843-01103, Ciudad del Saber, Republic of Panama
³ Department of Biotechnology, Acharya Nagarjuna University, Nagarjuna Nagar, 522 510 Guntur, India
⁴ Author for correspondence: SedioB@si.edu

Citation: Sedio, B. E., C. A. Boya P., and J. C. Rojas Echeverri. 2018. A protocol for high-throughput, untargeted forest community metabolomics using mass spectrometry molecular networks. Applications in Plant Sciences 6(3): e1033. doi:10.1002/aps3.1033

Applications in Plant Sciences 2018 6(3): e1033; http://www.wileyonlinelibrary.com/journal/AppsPlantSci © 2018 Sedio et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

PREMISE OF THE STUDY: We describe a field collection, sample processing, and ultra-high-performance liquid chromatography–tandem mass spectrometry (UHPLC-MS/MS) instrumental and bioinformatics method developed for untargeted metabolomics of plant tissue and suitable for molecular networking applications.

METHODS AND RESULTS: A total of 613 leaf samples from 204 tree species was collected in the field and analyzed using UHPLC-MS/MS. Matching of molecular fragmentation spectra generated over 125,000 consensus spectra representing unique molecular structures, 26,410 of which were linked to at least one structurally similar compound.

CONCLUSIONS: Our workflow is able to generate molecular networks of hundreds of thousands of compounds representing broad classes of plant secondary chemistry and a wide range of molecular masses, from 100 to 2500 daltons, making possible large-scale comparative metabolomics, as well as studies of chemical community ecology and macroevolution in plants.

KEY WORDS chemical ecology; liquid chromatography; molecular networking; tandem mass spectrometry; tropical forest ecology; untargeted metabolomics.

Innovations in phylogenetics and phylogenomics are rapidly advancing our understanding of the tree of life, enabling the study of macroevolution at unprecedented scales. Despite these developments, the overwhelming diversity of plant secondary metabolites of unknown structure and the taxonomic rarity of any given compound have until recently remained obstacles to comparative metabolomics, the comparison of small-molecule metabolite profiles, at the large taxonomic scales necessary for the study of macroevolution and community ecology. However, recent advances in tandem mass spectrometry (MS/MS) bioinformatics enable the high-throughput comparison of the structures of unknown compounds (Wang et al., 2016), making possible comparative metabolomics at scales necessary for the study of chemical community ecology and macroevolution (Sedio, 2017).

The structural comparison of unknown molecules using MS/MS is possible because molecules with similar structures fragment into many of the same substructures. MS/MS spectra can be collected from complex mixtures directly, or with the added separation provided by ultra-high-performance liquid chromatography (UHPLC), making MS-based metabolomics scalable to data sets containing hundreds of samples and tens of thousands of unique molecules. Comparative metabolomics of plant tissues, individuals, or species is aided by the organization of pairwise MS/MS similarities into molecular networks in which nodes represent compounds and links indicate structural similarity (Watrous et al., 2012; Wang et al., 2016). A comparison of MS/MS spectra of unknown compounds to public spectral libraries, such as with the Global Natural Products Social (GNPS) Molecular Networking platform (https://gnps.ucsd.edu/; Wang et al., 2016), can identify known structures; comparing unknown spectra to each other can facilitate the study of chemical community ecology and evolution, even in diverse and understudied systems like tropical forests (Sedio, 2017; Sedio et al., 2017). The strength of MS/MS molecular networking metabolomics lies in its generality and scalability. Hence, there is a need for a simple, general, and repeatable method for the collection of MS/MS spectra that is broadly inclusive of chemical classes and molecular weights and that can unleash the potential of molecular networking bioinformatics for the study of chemical community ecology and evolution.

From 2014 to 2017, we investigated intra- and interspecific variation in foliar metabolomes in tree communities in temperate deciduous forest at the Smithsonian Environmental Research Center (SERC) near Edgewater, Maryland, and in tropical moist forest at Barro Colorado Island (BCI), Panama (Sedio et al., 2017; Appendix 1). To facilitate a community-level comparison of the metabolomic diversity of permanent forest recensus plots at BCI and SERC (B. E. Sedio, J. D. Parker, S. M. McMahon, and S. J.
Wright, unpublished data), we generated MS/MS metabolomic data for 613 leaf samples from 204 plant species, resulting in 248,570 individual MS/MS spectra and a molecular network comprising 138,470 consensus spectra, or putative unique molecular structures (Fig. 1; Watrous et al., 2012). An excerpt of this network, presented in Fig. 2, illustrates the utility of the molecular network approach for (a) visualization of structural relationships among unknown metabolites, (b) comparative metabolomics among plant species, and (c) identification of known compounds by searching public MS libraries (Wang et al., 2016).

Here, we describe a protocol for sample collection, chemical extraction, UHPLC-MS/MS instrumental methods, and bioinformatics workflow for the generation of molecular networks for plant metabolomics. This protocol is simple to execute, broadly inclusive of plant secondary chemical variation, effective over a relatively wide range of variation in polarity and molecular mass, and scalable to sample sizes large enough to facilitate chemical community ecology in species-rich plant communities such as tropical forests.

METHODS AND RESULTS

Field collection

For community metabolomics of the forest plots at BCI and SERC, we collected young, unlignified leaves from saplings encountered in the shaded understory during the rainy season between June and August 2014. Leaves were placed on ice immediately in the forest and transferred to a −80°C freezer within 3 h of collection. See field collection protocol in Appendix 2.

Extraction and sample preparation

We homogenized 100 mg of frozen leaf tissue on liquid nitrogen in a ball mill (TissueLyser; QIAGEN, Hilden, Germany) and extracted the homogenate with 700 μL of 90% methanol : 10% water (pH 5) for 10 min. Methanol is an effective solvent for small molecules representing a wide range in polarity; mild acidity improves the extraction of most alkaloids.
The solution was vortexed and centrifuged, and the supernatant was isolated. The extraction was repeated on the remaining sample, and the fractions were combined. Samples were diluted in the same extraction solvent and filtered using 4-mm syringe filters with a hydrophilic polytetrafluoroethylene (PTFE) membrane with a 0.20-μm pore size (Merck Millipore, Billerica, Massachusetts, USA) prior to analysis using UHPLC-MS/MS. See the chemical extraction protocol in Appendix 3.

Liquid chromatography instrument methods

Samples were analyzed using an Infinity 1290 UHPLC from Agilent Technologies (Santa Clara, California, USA) with a Kinetex C18 column that was 100 mm in length, 2.1 mm in internal diameter, with a 1.7-μm particle size (Phenomenex, Torrance, California, USA), and a flow rate of 0.5 mL/min at 25°C (no flow splitting was used prior to infusion into the mass spectrometer). To separate a complex mixture of molecules spanning a wide range of polarity, we employed a 37-min solvent gradient with 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B): 0–2 min at 5% B, 2–27 min gradient from 5% B to 100% B, followed by 8 min at 100% B, and 35–37 min from 100% B to 5% B.

FIGURE 1. The generation of molecular networks based on mass spectrometry. (A) Tandem mass spectrometry provides fragment ion (MS2) spectra representing seven compounds, with each peak representing the mass-to-charge ratio (m/z, horizontal axis) and ion intensity (vertical axis) of a constituent molecular fragment. (B) Spectra are aligned (colored vertical lines identify shared molecular fragments), and similarity scores (numbers with arrows) are calculated between every pair. (C) The similarity scores are used to define molecular networks in which nodes represent compounds and the width of the links represents structural similarity. (D) Compounds are mapped onto two plant species. The figure is adapted from Watrous et al. (2012) with permission.
[Figure 1: panels A–D; axes: Mass/Charge (horizontal), Ion Intensity (vertical); legend: Species 1, Species 2, Both.]

Mass spectrometry instrument methods

Separation by LC was followed by electrospray ionization (ESI) in positive mode and MS/MS detection on a micrOTOF-QIII quadrupole-time-of-flight mass spectrometer (Bruker Daltonics, Billerica, Massachusetts, USA). We optimized MS parameters to detect and fragment molecules representing as wide a range in the mass-to-charge ratio (m/z) of the parent compound as possible. We began the process of optimization by analyzing ESI-L low concentration tuning mix (G1969-85000; Agilent Technologies), as well as foliar extracts of species of Psychotria L. (Rubiaceae), a genus that exhibits diverse alkaloids, flavonoids, and terpenes (Riba et al., 2003; Kowalczuk et al., 2015; Klein-Júnior et al., 2016). Of particular importance was P. acuminata Benth., one of the most chemically diverse species known on BCI (Sedio et al., 2017). We analyzed tuning mix and Psychotria samples with the Bruker default “tune wide” parameter setting and following Garg et al. (2015). We then sequentially tuned ion guide funnels and multipoles by modifying radio frequency (RF) stepping and transfer time until we were able to detect molecules ranging from 100 to 2500 m/z. Data-dependent collision energies were optimized to improve fragmentation quality and sensitivity.

Mass spectra were acquired on the micrOTOF-QIII mass spectrometer by ESI in positive mode. The ESI source parameters were: end plate offset, 500 V; capillary voltage, 4500 V; nebulizer, 2.0 bar (nitrogen gas); dry gas, 9.0 L/min; and dry temperature, 200°C.
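The 37-min solvent gradient given under Liquid chromatography instrument methods can be written down as a small lookup function, which is convenient when relating a retention time to the solvent composition delivered at that moment. The following is a minimal sketch rather than instrument code: the breakpoints come from the text, while the function name `percent_b`, the constant `GRADIENT`, and the assumption that composition changes linearly between breakpoints are ours.

```python
# Gradient breakpoints (time in min, % solvent B) as stated in the text.
# Assumption: composition changes linearly between breakpoints.
GRADIENT = [(0.0, 5.0), (2.0, 5.0), (27.0, 100.0), (35.0, 100.0), (37.0, 5.0)]


def percent_b(t_min, program=GRADIENT):
    """Return % solvent B (0.1% formic acid in acetonitrile) at t_min minutes."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    # Walk consecutive breakpoint pairs and interpolate within the matching one.
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 2, 14.5, 27, 35, 36, 37):
        print(f"{t:5.1f} min -> {percent_b(t):5.1f}% B")
```

Under the linearity assumption, the 2–27 min ramp corresponds to an increase of 3.8% B per minute (95% over 25 min).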
The ion optics settings included: funnel 1 RF amplitude, 150 Vpp; funnel 2 RF amplitude, 300 Vpp; hexapole RF amplitude, 150 Vpp; in-source collision-induced dissociation (isCID) energy, 0.0 eV; quadrupole ion (transfer) energy, 10.0 eV; quadrupole low mass cut-off, 50.0 m/z; and pre-pulse storage, 10.0 μs. Data were acquired both for molecular ions (MS1) and fragment ions (MS2) in data-dependent fragmentation (auto MS/MS). For MS1 acquisition, three spectra were collected per second (3 Hz). For MS2 acquisition, the rate of acquisition was slowed down for low-intensity molecular ions (20,000 counts) to 2 Hz in an attempt to increase the sensitivity for these ions and kept at 3 Hz for high-intensity molecular ions (1,000,000 counts); we employed a linear gradient in the rate of acquisition for species of intermediate ion intensity. In an attempt to increase sensitivity, we utilized the advanced stepping mode to preferentially transfer (through the collision cell) low-intensity precursor ions and different fragment ions, resulting in acquisition of an averaged mass spectrum with four different parameter combinations (1: collision RF amplitude, 200 Vpp; transfer time, 96 μs; 2: collision RF amplitude, 300 Vpp; transfer time, 96 μs; 3: collision RF amplitude, 580 Vpp; transfer time, 120 μs; 4: collision RF amplitude, 680 Vpp; transfer time, 120 μs), each with an equal percentage of the time allotted for each MS2 acquisition cycle. Data-dependent fragmentation (auto MS/MS) was set to select a maximum of five precursor ions with intensities ≥6500 counts per fragmentation cycle of 3.0 s. A maximum of three spectra were collected for each precursor ion before placing it in an exclusion list for 1 min to allow collection of as many different ions per chromatographic peak as possible. The fragmentation energies used for two possible charged states (singly and doubly charged) are presented in Table 1.

FIGURE 2. Subset of a molecular network of foliar metabolomes of 204 plant species from Maryland and Panama. Presented is a cluster of 56 nodes that is part of a larger network data set comprising 138,470 consensus spectra, or putative unique molecular structures, derived from 204 plant species (B. E. Sedio, J. D. Parker, S. M. McMahon, and S. J. Wright, unpublished data). (A) The molecular mass of parent ions prior to fragmentation is indicated by a color scale from yellow (400 Da) to red (800 Da). (B) Compounds (nodes) found exclusively in plant species collected at Barro Colorado Island (BCI), Panama, are indicated in blue, and those exclusively found in the tropical tree genera Piper and Protium are indicated in light blue and pink, respectively. Compounds found exclusively in plant species collected at the Smithsonian Environmental Research Center (SERC) in Maryland, USA, are indicated in yellow, and compounds found in species from both forest sites are indicated in gray. Species codes are CEOC (Celtis occidentalis, Cannabaceae, SERC), CEPE (Ceiba pentandra, Malvaceae, BCI), CLOC (Clidemia octona, Melastomataceae, BCI), CLSE (Clidemia septuplinervia, Melastomataceae, BCI), DEPA (Desmopsis panamensis, Annonaceae, BCI), HECO (Heisteria concinna, Olacaceae, BCI), HIAL (Hieronyma alchorneoides, Euphorbiaceae, BCI), LUSE (Luehea seemannii, Malvaceae, BCI), MIVI (Microstegium vimineum, Poaceae, SERC), OULU (Ouratea lucens, Ochnaceae, BCI), PIA1 (Piper arboreum, Piperaceae, BCI), PICA (Piper schiedeanum, Piperaceae, BCI), PICU (Piper colonense, Piperaceae, BCI), PIIM (Piper imperialis, Piperaceae, BCI), PIPE (Piper perlasense, Piperaceae, BCI), PIRE (Piper reticulatum, Piperaceae, BCI), PRAV (Prunus avium, Rosaceae, SERC), and PRTE (Protium tenuifolium, Burseraceae, BCI; see Appendix 1). (C) Spectra that matched an annotated spectrum in a public library are indicated. Compounds matched in Global Natural Products Social (GNPS) public libraries are: (I) orientin, (II) vitexin, (III) ReSpect:PM007805 isoorientin, (IV) ReSpect:PS086308 orientin, (V) ReSpect:PS043007 puerarin, (VI) ReSpect:PM007810 3′-O-Methylluteolin 6-C-glucoside, (VII) pentoside of (iso)vitexin, (VIII) hexanoside of (iso)vitexin, and (IX) Massbank:PB006223 vitexin-2″-O-rhamnoside.
[Figure 2: panels A–C; color scale: Molecular Mass (Da), 400 to 800; legend: compounds found in BCI species, Piper only, Protium only, SERC species, both forests; nodes are labeled with species codes or with the number of species (spp) in which the compound was found.]

The optimized MS method provided a detection range of 100 to 2500 m/z. It should be noted that these parameter values are unique to the micrOTOF-QIII instrument. However, we suggest a similar approach to optimization for a wide mass range by using a calibration solution (e.g., ESI-Tunemix, G1969-85000; Agilent Technologies) to sequentially modify the MS settings until the desired m/z range is achieved. A chemically diverse biological extract consisting of a single sample (e.g., P. acuminata) or a pool of samples that are representative of molecular families of interest can be used to further tune the collision energies and confirm their suitability for the biological system to be analyzed.
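The intensity-dependent MS2 acquisition rate described under Mass spectrometry instrument methods (2 Hz at 20,000 counts, 3 Hz at 1,000,000 counts, with a linear gradient in between) can be illustrated with a short calculation. This is a sketch under stated assumptions, not the instrument's control logic: we assume the rate is linear in raw ion counts and is held constant outside the two stated intensities, and the function and constant names are hypothetical.

```python
# Anchor points taken from the text: (precursor intensity in counts, rate in Hz).
LOW_COUNTS, LOW_RATE_HZ = 20_000, 2.0
HIGH_COUNTS, HIGH_RATE_HZ = 1_000_000, 3.0


def ms2_rate_hz(intensity):
    """MS2 acquisition rate for a precursor of the given intensity.

    Assumption: linear in raw counts between the anchors, clamped outside.
    """
    if intensity <= LOW_COUNTS:
        return LOW_RATE_HZ
    if intensity >= HIGH_COUNTS:
        return HIGH_RATE_HZ
    frac = (intensity - LOW_COUNTS) / (HIGH_COUNTS - LOW_COUNTS)
    return LOW_RATE_HZ + frac * (HIGH_RATE_HZ - LOW_RATE_HZ)


if __name__ == "__main__":
    for counts in (6_500, 20_000, 510_000, 1_000_000, 5_000_000):
        rate = ms2_rate_hz(counts)
        # A slower rate means a longer accumulation time per spectrum.
        print(f"{counts:>9} counts -> {rate:.2f} Hz ({1000 / rate:.0f} ms per spectrum)")
```

A lower rate gives each spectrum a longer accumulation time, which is why slowing acquisition for weak precursors is expected to help sensitivity.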
Although we used static collision energies for discrete m/z ranges (Table 1), ramping or stepping the collision energy applied during collision-induced dissociation within each m/z range may further improve the quality of molecular fragmentation achieved over a range of masses, molecular ion stabilities, and chemical classes. To eliminate non-informative fragmentation spectra, we filtered spectral matches by requiring a minimum number of matched fragment ions in the downstream bioinformatics analyses (see Bioinformatics, below).

For calibration, an external calibration with ESI-Tunemix (G1969-85000; Agilent Technologies) or a 10 mM sodium formate solution (in 50 : 50 propan-2-ol : water with 0.2% formic acid, v/v) was performed every 12 h using the “Quadratic + HPC” calibration mode. At the same time, depending on availability, reserpine (43530-4.5ML-F; Sigma Aldrich, St. Louis, Missouri, USA) or hexakis(1H,1H,2H-difluoroethoxy)phosphazene (8H79-3-02; Synquest Laboratories, Alachua, Florida, USA) was used for post-acquisition internal calibration. Acquired spectra were internally calibrated and exported in batch mode with Compass DataAnalysis 4.1 SR1 from Bruker Daltonics; refer to Appendix 4 for further details.

Bioinformatics

We generated a molecular network using the online workflow at GNPS (https://gnps.ucsd.edu/; Wang et al., 2016). First, we filtered the data by removing all MS/MS peaks within ±17 Da of the precursor m/z. We then window-filtered the MS/MS spectra by choosing only the top six peaks in each ±50 Da window throughout the spectrum. The data were then clustered with MS-Cluster (Frank et al., 2008) with a parent mass tolerance of 2.0 Da and an MS/MS fragment ion tolerance of 0.5 Da to generate consensus spectra representing putative unique molecular structures. Consensus spectra containing <2 spectra were discarded and the remaining spectra were networked.
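The two peak-filtering steps described above (removal of peaks near the precursor, then window filtering) can be sketched in a few lines. This is an illustrative reimplementation, not the GNPS code: the function names are hypothetical, a spectrum is assumed to be a list of `(m/z, intensity)` pairs, the ±17 Da and ±50 Da bounds are assumed to be inclusive, and the window is assumed to be centered on each peak in turn.

```python
def remove_precursor_region(peaks, precursor_mz, exclusion_da=17.0):
    """Drop fragment peaks within +/- exclusion_da of the precursor m/z."""
    return [(mz, inten) for mz, inten in peaks
            if abs(mz - precursor_mz) > exclusion_da]


def window_filter(peaks, top_n=6, half_width_da=50.0):
    """Keep a peak only if it is among the top_n most intense peaks
    found within +/- half_width_da of its own m/z."""
    kept = []
    for mz, inten in peaks:
        neighbours = [i for m, i in peaks if abs(m - mz) <= half_width_da]
        # Number of neighbouring peaks that are strictly more intense.
        rank = sum(1 for i in neighbours if i > inten)
        if rank < top_n:
            kept.append((mz, inten))
    return kept


def preprocess(peaks, precursor_mz):
    """Apply both filters in the order given in the text."""
    return window_filter(remove_precursor_region(peaks, precursor_mz))


if __name__ == "__main__":
    # Made-up spectrum for illustration only; not data from this study.
    spectrum = [(85.0, 10.0), (287.05, 100.0), (435.0, 30.0),
                (449.1, 80.0), (470.0, 5.0)]
    print(preprocess(spectrum, precursor_mz=449.1))
```

Removing the precursor region first means that residual precursor signal cannot crowd informative fragments out of the top six in its window.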
Edges were formed for spectral matches with cosine score ≥0.6 and ≥6 matched peaks. Edges were retained in the network only if both nodes linked by the edge were in each other’s top 10 most similar nodes.

During network generation, spectra were compared to annotated spectra in public libraries through GNPS (Wang et al., 2016). We applied identical filter criteria to library spectra as to our input data and employed the GNPS analog library search method with a maximum mass shift of 100 Da. We retained matches to library spectra characterized by a cosine score ≥0.6 and with ≥6 matched peaks.

The “group mapping” feature of GNPS allows one to track the origin of spectra, and hence, the plant species, tissue, or treatment in which a compound was detected. Network visualization software such as Cytoscape (www.cytoscape.org) can be used to generate publication-quality figures of molecular networks that illustrate attributes of the data such as molecular mass (Fig. 2A); incidence in plant species, tissues, or treatments (Fig. 2B); and matches with annotated spectra from public MS libraries (Fig. 2C; Wang et al., 2016). A GNPS bioinformatics workflow can be found in Appendix 5.

Recent developments in the bioinformatics pipeline for the assembly of molecular networks have improved upon the methods we describe above in several key respects (Olivon et al., 2017). Namely, the MS-Cluster algorithm (Frank et al., 2008) for grouping spectra into consensus spectra was originally designed for proteomics rather than small molecule metabolomics and therefore was not designed to consider differences in LC retention time that typically distinguish structural isomers with identical molecular masses. Olivon et al. (2017) describe a bioinformatics workflow that integrates the MS analytical software MZmine 2 (Pluskal et al., 2010) into the workflow for the assembly of raw MS/MS spectra into molecular networks with GNPS.
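The edge criteria stated in the Bioinformatics section (cosine score ≥0.6, ≥6 matched peaks, and mutual membership in each node's top 10) can be made concrete with a small sketch. It is a simplified illustration and not the GNPS implementation, which differs in detail: here peaks are paired greedily within the 0.5 Da fragment tolerance, each peak is used at most once, spectra are assumed to be lists of `(m/z, intensity)` pairs, and all function names are hypothetical.

```python
import math


def normalize(peaks):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    return [(mz, i / norm) for mz, i in peaks] if norm > 0 else []


def cosine_score(a, b, tol=0.5):
    """Greedy cosine similarity; returns (score, number of matched peaks)."""
    a, b = normalize(a), normalize(b)
    # All candidate pairings within tolerance, strongest product first.
    pairs = sorted(((ia * ib, x, y)
                    for x, (ma, ia) in enumerate(a)
                    for y, (mb, ib) in enumerate(b)
                    if abs(ma - mb) <= tol), reverse=True)
    used_a, used_b, score, matched = set(), set(), 0.0, 0
    for product, x, y in pairs:
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            score += product
            matched += 1
    return score, matched


def build_network(spectra, min_cos=0.6, min_peaks=6, top_k=10):
    """spectra: dict of node name -> peak list. Returns a set of edges."""
    names = sorted(spectra)
    candidates = {n: [] for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            score, matched = cosine_score(spectra[u], spectra[v])
            if score >= min_cos and matched >= min_peaks:
                candidates[u].append((score, v))
                candidates[v].append((score, u))
    # Each node's top_k most similar neighbours.
    top = {n: {v for _, v in sorted(c, reverse=True)[:top_k]}
           for n, c in candidates.items()}
    # Keep an edge only if each node is in the other's top_k.
    return {(u, v) for u in names for v in top[u] if u < v and u in top[v]}


if __name__ == "__main__":
    # Made-up spectra for illustration only.
    A = [(100.0 + 20 * k, 10.0 + k) for k in range(7)]
    B = [(mz + 0.1, i) for mz, i in A]    # near-identical to A
    C = [(mz + 10.0, i) for mz, i in A]   # shares no fragments with A
    print(build_network({"A": A, "B": B, "C": C}))
```

The mutual top-10 rule keeps highly connected compounds from accumulating edges to every loosely similar node, which keeps clusters such as the one in Fig. 2 readable.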
In the short term, we recommend preprocessing MS/MS data using MZmine 2 prior to GNPS network assembly (without using MS-Cluster) as described by Olivon et al. (2017) to resolve isomeric compounds, annotate molecular networks with putative chemical formulas, and improve the quantification of variation in ion abundances among samples. Future versions of GNPS will incorporate MZmine 2 into the online bioinformatics workflow (M. Wang, University of California, San Diego, personal communication).

TABLE 1. Isolation widths and collision energies used in data-dependent fragmentation (auto tandem mass spectrometry) experiments.

Mass-to-charge ratio (m/z)   Isolation width (Da)   Collision energy (eV)   Charge state
100.00                       4.00                   15.0                    1
100.00                       8.00                   11.3                    2
300.00                       5.00                   20.0                    1
300.00                       10.00                  15.0                    2
500.00                       6.00                   25.0                    1
500.00                       12.00                  18.8                    2
1000.00                      7.00                   35.0                    1
1000.00                      14.00                  26.3                    2
1500.00                      8.00                   47.5                    1
1500.00                      16.00                  35.6                    2

CONCLUSIONS

We have developed an effective untargeted plant metabolomics workflow for community metabolomics, including a protocol for tissue collection in the field, a chemically general extraction protocol that retains compounds from a broad spectrum of plant secondary chemistry and is appropriate for diverse taxa, a UHPLC-MS/MS instrumental method suitable for a wide range of polarities and molecular size classes, and a protocol for sharing and networking MS/MS data with the GNPS molecular networking platform (Wang et al., 2016). Because of its simplicity and generality, this workflow can be scaled for the collection of large and taxonomically and chemically diverse data sets, such as ecological communities or evolutionary lineages, thus facilitating the study of chemical community ecology and macroevolution (Sedio, 2017). Future efforts should test the robustness of this workflow for field collections in remote locations where the freezing of tissue may be unfeasible and in situ drying of tissue may be the preferred means of sample collection. In addition, alternative extraction solvents, LC column stationary phases, and ionization methods should be explored as these may facilitate the analysis of chemical classes for which our protocol is suboptimal.

ACKNOWLEDGMENTS

The authors thank P. Dorrestein, M. Gutierrez, M. Meehan, T. Luzzatto, M. Wang, A. Durant, R. Gittens, and S. J. Wright for helpful insight during the development of the protocol. This work was supported by the Smithsonian Institution Grand Challenges Program and Scholarly Studies Program, and the Smithsonian Tropical Research Institute Earl S. Tupper Fellowship. C.A.B. acknowledges Secretaría Nacional de Ciencia, Tecnología e Innovación (SENACYT)–Instituto para la Formación y Aprovechamiento de Recursos Humanos (IFARHU) of Panama for financial support.

SUPPORTING INFORMATION

Additional Supporting Information (Appendices S1 and S2) may be found online in the supporting information tab for this article.

LITERATURE CITED

Frank, A. M., N. Bandeira, Z. Shen, S. Tanner, S. P. Briggs, R. D. Smith, and P. A. Pevzner. 2008. Clustering millions of tandem mass spectra. Journal of Proteome Research 7: 113–122.
Garg, N., C. Kapono, Y. W. Lim, N. Koyama, M. J. A. Vermeij, D. Conrad, F. Rohwer, and P. C. Dorrestein. 2015. Mass spectral similarity for untargeted metabolomics data analysis of complex mixtures. International Journal of Mass Spectrometry 377: 719–727.
Gostel, M. R., C. Kelloff, K. Wallick, and V. A. Funk. 2016. A workflow to preserve genome-quality tissue samples from plants in botanical gardens and arboreta. Applications in Plant Sciences 4: 1600039.
Haddock, S., and C. Dunn. 2011. Practical computing for biologists. Sinauer, Sunderland, Massachusetts, USA.
Klein-Júnior, L. C., J. Viaene, J. Salton, M. Koetz, A. L. Gasper, A. T. Henriques, and Y. Vander Heyden. 2016. The use of chemometrics to study multifunctional indole alkaloids from Psychotria nemorosa (Palicourea comb. nov.). Part I: Extraction and fractionation optimization based on metabolic profiling. Journal of Chromatography A 1463: 60–70.
Kowalczuk, A. P., A. Łozak, R. Bachliński, A. Duszyński, J. Sakowska, and J. K. Zjawiony. 2015. Identification challenges in examination of commercial plant material of Psychotria viridis. Acta Poloniae Pharmaceutica 72: 747–755.
Olivon, F., G. Grelier, F. Roussi, M. Litaudon, and D. Touboul. 2017. MZmine 2 data-preprocessing to enhance molecular networking reliability. Analytical Chemistry 89: 7836–7840.
Pluskal, T., S. Castillo, A. Villar-Briones, and M. Orešič. 2010. MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 11: 395.
Riba, J., M. Valle, G. Urbano, M. Yritia, A. Morte, and M. J. Barbanoj. 2003. Human pharmacology of ayahuasca: Subjective and cardiovascular effects, monoamine metabolite excretion, and pharmacokinetics. Journal of Pharmacology and Experimental Therapeutics 306: 73–83.
Sedio, B. E. 2017. Recent breakthroughs in metabolomics promise to reveal the cryptic chemical traits that mediate plant community composition, character evolution, and lineage diversification. New Phytologist 214: 952–958.
Sedio, B. E., J. C. Rojas Echeverri, C. A. Boya P., and S. J. Wright. 2017. Sources of variation in foliar secondary chemistry in a tropical forest tree community. Ecology 98: 616–623.
Wang, M. X., J. J. Carver, V. V. Phelan, L. M. Sanchez, N. Garg, Y. Peng, D. D. Nguyen, et al. 2016. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology 34: 828–837.
Watrous, J., P. Roach, T. Alexandrov, J. Y. Yang, R. D. Kersten, M. van der Voort, K. Pogliano, et al. 2012. Mass spectral molecular networking of living microbial colonies. Proceedings of the National Academy of Sciences USA 109: E1743–E1752.
Yang, Y., M. J. Moore, S. F. Brockington, A. Timoneda, T. Feng, H. W. Marx, J. F. Walker, and S. A. Smith. 2017. An efficient field and laboratory workflow for plant phylotranscriptomic projects. Applications in Plant Sciences 5: 1600128.

APPENDIX 1. Voucher information for species presented in this study.

Species                                   Family            Voucher specimen accession no.ᵃ   Collection localityᵇ   Herbariumᶜ
Ceiba pentandra (L.) Gaertn.              Malvaceae         15420                             BCI                    SCZ
Celtis occidentalis L.                    Cannabaceae       362                               SERC                   SERC
Clidemia octona (Bonpl.) L. O. Williams   Melastomataceae   15177                             BCI                    SCZ
Desmopsis panamensis (B. L. Rob.) Saff.   Annonaceae        15113                             BCI                    SCZ
Heisteria concinna Standl.                Olacaceae         16017                             BCI                    SCZ
Hieronyma alchorneoides Allemão           Euphorbiaceae     15144                             BCI                    SCZ
Luehea seemannii Triana & Planch.         Malvaceae         15160                             BCI                    SCZ
Microstegium vimineum (Trin.) A. Camus    Poaceae           431                               SERC                   SERC
Ouratea lucens (Kunth) Engl.              Ochnaceae         15136                             BCI                    SCZ
Piper arboreum Aubl.                      Piperaceae        15197                             BCI                    SCZ
Piper schiedeanum Steud.                  Piperaceae        15167                             BCI                    SCZ
Piper colonense C. DC.                    Piperaceae        1172                              BCI                    SCZ
Piper reticulatum L.                      Piperaceae        15201                             BCI                    SCZ
Prunus avium (L.) L.                      Rosaceae          352                               SERC                   SERC
Protium tenuifolium Engl.                 Burseraceae       15262                             BCI                    SCZ
Psychotria acuminata Benth.               Rubiaceae         15334                             BCI                    SCZ

ᵃ Vouchers of these species (not necessarily the individuals sampled in this study) are collected and maintained by the Smithsonian Institution Forest Global Earth Observatory (ForestGEO)–Center for Tropical Forest Science (CTFS). Given are barcode accession numbers.
bCollections were made within ForestGEO- CTFS forest dynamics plots at Barro Colorado Island (BCI), Panama (9°9′N, 79°51′W), and at the Smithsonian Environmental Research Center (SERC), Maryland, USA (38°53′N, 76°33′W). cHerbarium codes refer to the Summit Herbarium at the Smithsonian Tropical Research Institute (SCZ) and the herbarium at SERC. Applications in Plant Sciences 2018 6(3): e1033 Sedio et al.—Mass spectrometry metabolomics • 6 of 13 http://www.wileyonlinelibrary.com/journal/AppsPlantSci © 2018 Sedio et al. APPENDIX 2. Two alternative setups for field collection on ice or liquid nitrogen. I. Setup 1: Collection on ice. This setup is convenient if tissue for metabolomic analysis is intended for interspecific comparative metabolomics, if tis- sue for metabolomic analysis is to be collected alongside tissue intended for DNA extraction, and if collections are to be at a site within a few hours hike from a laboratory equipped with a −80°C freezer. A. Field supplies: 1. Plant press, straps, cardboard, blotting paper, and newspaper 2. Coin envelopes for seeds or small fruits of voucher specimens 3. GPS unit and maps 4. Fine-tip Sharpie marker (Sanford L.P., Downers Grove, Illinois, USA), pencils 5. Field notebook 6. Field guide and keys 7. Hand lens 8. Hedge clippers 9. 2-mL Safe-Lock tubes (Eppendorf, Hamburg, Germany), two per sample 10. Rite-in-the-Rain paper (JL Darling, Tacoma, Washington, USA), cut with scissors into rectangles small enough to fit into the 2-mL tubes 11. Fabric lunch cooler 12. Rechargeable ice packs (Thermos, Chicago, Illinois, USA) 13. Cryogenic gloves B. Field procedure: 1. Remove plant material sufficient for chemical and DNA extrac- tions and voucher material. Choose material with mature flowers and fruits for voucher specimens. 
For samples intended for chem- ical analysis, consider variables such as herbivore damage, leaf ontogenetic stage, individual ontogenetic stage, and light environ- ment when considering which individuals and tissues to sample. 2. Label two 2-mL Safe-Lock tubes for each sample. Place 50 to 100 mg of leaf tissue from a single individual plant into each tube. 3. Label one small piece of Rite-in-the-Rain paper per tube and place the paper labels in each tube. Add the paper labels to the tubes after the sample plant tissue so that the label can be checked without removing the plant tissue. 4. Screw on the screw caps and place the tubes into the cooler. 5. Press three to five voucher specimens for each collection. Record collection date, collector, GPS coordinates, descriptive location, habitat (including light environment and topography), plant habit, reproductive status, color, and other specimen in- formation. See Gostel et al. (2016) for additional information on vouchers. 6. Upon arrival in the laboratory, label cardboard freezer storage boxes prior to use. Place samples into labeled freezer boxes for storage. Place the freezer boxes into the −80°C freezer using cryogenic gloves. 7. Check newspaper pressed with voucher specimens daily and exchange for fresh newspaper if saturated with moisture. II. Setup 2: Collection with liquid nitrogen. This setup is convenient if tissue for metabolomic analysis is in- tended for intraspecific or intra- individual comparative metabolo- mics, if tissue for metabolomic analysis is to be collected alongside tissue intended for RNA extraction, or if collections are to be made over multiple days at a site more than a few hours from a laboratory equipped with a −80°C freezer. A. Field supplies: In addition to the supplies listed for Setup 1, also bring: 1. Large liquid nitrogen container, 25 to 50 L (see Yang et al., 2017) 2. 
10-L cryogenic liquid nitrogen container with straps and carry bag and a holding time of 88 days (SKU YDS-10; Hardware Factory Store, Los Angeles, California, USA)
3. Long metal tongs (e.g., VWR 82027-366; VWR, Radnor, Pennsylvania, USA)

B. Field procedure:
1. Remove plant material sufficient for chemical and DNA extractions and voucher material. Choose material with mature flowers and fruits for voucher specimens. For samples intended for chemical analysis, consider variables such as herbivore damage, leaf ontogenetic stage, individual ontogenetic stage, and light environment when choosing which individuals and tissues to sample.
2. Label two 2-mL Safe-Lock tubes for each sample. Place 50 to 100 mg of leaf tissue from a single individual plant into each tube.
3. Label one small piece of Rite-in-the-Rain paper per tube and place the paper labels in each tube. Add the paper labels to the tubes after the sample plant tissue so that the label can be checked without removing the plant tissue.
4. Screw on the screw caps and place the tubes into the backpack-portable liquid nitrogen dry shipper.
5. Press three to five voucher specimens for each collection. Record collection date, collector, GPS coordinates, descriptive location, habitat (including light environment and topography), plant habit, reproductive status, color, and other specimen information. See Gostel et al. (2016) for additional information on vouchers.
6. Upon arrival at the camp or vehicle, use cryogenic gloves and long metal tongs to remove sample tubes from the backpack-portable liquid nitrogen dry shipper and place them into the large liquid nitrogen tank for short-term storage and transport to the laboratory.
7. Upon arrival at the laboratory, label cardboard freezer storage boxes prior to use. Place samples into labeled freezer boxes for storage. Place the freezer boxes into the −80°C freezer using cryogenic gloves.
8.
Check newspaper pressed with voucher specimens daily and exchange for fresh newspaper if saturated with moisture.

APPENDIX 3. Chemical extraction and sample preparation.

A. Tools and equipment
1. Access to a fume hood during the entire duration of extraction
2. pH meter. We currently use Mettler Toledo (Columbus, Ohio, USA).
3. Tissue homogenizer. We currently use QIAGEN TissueLyser (QIAGEN, Hilden, Germany).
4. Two Styrofoam shipping boxes with lids
5. Waste beaker
6. Liquid nitrogen
7. Benchtop liquid nitrogen container
8. Centrifuge
9. Pipettes: 1000 and 200 μL
10. Pipettes: 10 mL
11. Graduated cylinders: 500 and 50 mL
12. Glass bottle: 500 mL
13. Vortexer
14. Magnetic stir bar
15. Stir plate

B. Reagents
1. Ultra-high-performance liquid chromatography–mass spectrometry (UHPLC-MS)-grade methanol (14262, Honeywell Burdick and Jackson, Muskegon, Michigan, USA)
2. UHPLC-MS-grade water (14263, Honeywell Burdick and Jackson)
3. Hydrochloric acid solution, 0.05 N (35320, Honeywell Burdick and Jackson)

C. Consumables
1. Kimwipe (Kimberly-Clark Professional, Roswell, Georgia, USA)
2. Paper towels
3. Weigh paper
4. Pipette tips: 1000 and 200 μL
5. Stainless steel balls
6. Microcentrifuge tubes
7. Disposable nitrile and latex gloves
8. 4-mm syringe filters with a hydrophilic polytetrafluoroethylene (PTFE) membrane with 0.20-μm pore size (Merck Millipore, Billerica, Massachusetts, USA)

D. General considerations for working with the TissueLyser and centrifugation (modified from QIAGEN TissueLyser and DNeasy manuals).
1. Do not allow the metal TissueLyser adaptor plates to come into contact with liquid nitrogen. Expose each plastic tube rack to liquid nitrogen prior to fitting the tube rack into the adaptor plate.
2. Stainless steel beads are reusable.
Wash the beads with warm, soapy water, rinse thoroughly with distilled water to remove soap residue, and allow to air dry. If beads are to be used in nucleic acid extractions, incubate beads in 0.4 M HCl for 1 min at room temperature, rinse thoroughly with distilled water, and allow to air dry.
3. All centrifugation steps should be performed at room temperature.

E. Safety
1. Methanol is highly flammable and toxic. Work in the fume hood and temporarily dispose of tips and tubes in the hood in a resealable bag. Consult the Material Safety Data Sheet (MSDS) for additional information on methanol safety and disposal.
2. HCl is a strong acid and very hazardous in case of contact with the skin, eyes, or mucous membranes. For these reasons, diluted HCl aqueous solution is preferable to solid HCl or concentrated liquid for reagent preparation in this protocol. Consult the MSDS for additional information on HCl safety and disposal.

F. Reagent preparation: Extraction solvent: 500 mL of 90 : 10 methanol : water, pH 5
1. Pipette 6.7 mL 0.05 N HCl solution into 43.3 mL of UHPLC-MS-grade water to create a 0.0067 N HCl, pH 5, stock solution.
2. In a graduated cylinder, measure 360 mL of UHPLC-MS-grade methanol. Add it to a clean 500-mL bottle.
3. In a graduated cylinder, measure 40 mL of the 0.0067 N, pH 5, HCl solution. Add it to the 500-mL bottle containing the methanol.
4. Stir the solution using a magnetic stir bar on a stir plate prior to use.

G. Organic molecule extraction
1. Work with 12 to 24 samples at a time, including one blank. The blank is a 2-mL Safe-Lock tube containing no leaf tissue. Apply all steps to the blank as if it were a leaf sample. Compounds found in blanks will be removed from downstream analyses.
2. Weigh 100 mg of frozen leaf tissue. Record the weight.
3. If the sample was collected directly into a 2-mL Safe-Lock tube, return the sample to the tube.
If sample material was not collected directly into a 2-mL Safe-Lock tube, label a tube and place the sample into it.
4. Place a stainless steel bead into each tube. Screw on the screw cap and place the tube in the TissueLyser tube rack.
5. Repeat Steps 2 through 4 for a set of 12 or 24 samples.
6. Cool the TissueLyser tube rack in liquid nitrogen. If using dried or lyophilized tissue, the tubes do not need to be frozen in liquid nitrogen.
7. Fit each tube rack between the TissueLyser adapter plates and place them into the TissueLyser clamps as described in the TissueLyser User Manual. Tighten the clamps firmly by hand. Work quickly so that the plant material does not thaw.
8. Grind the samples for 2 min at 20 Hz.
9. Remove and disassemble the plates and racks, noting the orientation of the tube racks during the first round of homogenization. Ensure that each tube’s screw cap is tightly closed.
10. Cool the tube racks again in liquid nitrogen. Knock the racks upside down against the bench five times to ensure that all stainless steel beads can move freely within the tubes. Ensure that no liquid nitrogen remains, but do not allow the leaf material to thaw.
11. Grind the samples for another 2 min at 20 Hz.
12. Remove the plates from the TissueLyser and remove the adapter plates from each tube rack. Knock the racks against the bench five times to ensure that no tissue powder remains in the caps. Keep the samples frozen until extraction solvent is added (Step 13).
13. Add 700 μL of extraction solvent to each tube.
14. Vortex each tube for 30 s.
15. Centrifuge for 3 min at 12,000 rpm.
16. From each tube, remove 500 μL of supernatant to a fresh, labeled 2-mL microtube. Be careful not to disturb the layer of solid material at the bottom of the tube.
17.
Add 500 μL of extraction solvent to each tube.
18. Vortex each tube for 30 s.
19. Centrifuge for 3 min at 12,000 rpm.
20. From each tube, remove 500 μL of supernatant to the same labeled 2-mL microtube as the first fraction. Be careful not to disturb the layer of solid material at the bottom of the tube.
21. To prepare a 10× dilution, vortex one tube for 30 s and remove 50 μL to a labeled microtube containing 450 μL of extraction solvent.
22. To prepare a 100× dilution, vortex the 10× dilution for 30 s and remove 50 μL to a labeled microtube containing 450 μL of extraction solvent.
23. To prepare the diluted samples for LC-MS, for each sample, draw the full sample volume into a clean syringe. Remove the syringe needle and replace it with a 4-mm, 0.20-μm pore size syringe filter. Express the sample through the filter into a fresh, labeled HPLC vial. Cap the vial.
24. Discard the syringe filter and replace it with the syringe needle. To clean the syringe, draw >700 μL methanol into the syringe and express it into a waste receptacle.
25. Repeat Steps 23 and 24 for all samples to be analyzed by LC-MS.

APPENDIX 4. Calibration and data conversion from .d format to .mzXML format.

DataAnalysis 4.1 allows the user to process large batches of data so they can be exported externally. Data processing and export require as input parameters the internal calibrant signal to be used for lock mass calibration and the export file type. This protocol requires that DataAnalysis and CompassXport are installed on the computer used for processing.

A. Export data with 32-bit precision and using recalibrated spectra
1. Execute the RegEdit command in the Windows search window.
2. Locate the HKEY_CURRENT_USER\Software\Bruker Daltonik\CompassXport folder.
3. Double-click ExportPrecision64Bit.
4. Set Value data as 0.
5. Click OK. This will ensure that .mzXML files are exported with 32-bit precision.
6. In the same registry folder, double-click UseRecalibratedSpectra.
7.
Set Value data as 1 (or make sure this was the value set by default).
8. Click OK. This will ensure that the exported spectra are the recalibrated spectra obtained after lock-mass calibration.

B. Create an automatic processing method and custom script:
1. In DataAnalysis, open any .d file (any liquid chromatography–tandem mass spectrometry [LC-MS/MS] data acquired).
2. Under Calibrate → Parameters → Mass List, choose Sum Peak.
3. Under Calibrate → Parameters → Calibration → Lock Mass Calibration → Calibration group, select ESI.
4. Click Edit Lists.
5. The Compass Reference Mass List Editor window will open. To create a new reference list, include the name of the reference compound, the ion formula, charge state (z), and the exact mass of the ion (Fig. A4-1).
6. Under File → Save As, make sure the file extension is .ref (e.g., Reserpine.ref) and click Save.
7. Under Calibrate → Parameters → Calibration → Lock Mass Calibration → Calibration group, choose the new reference list and set the Intensity threshold to 500.
8. Click OK.
9. Under Method → Script…, write the following simple Visual Basic script:

   option explicit
   ' Apply lock-mass calibration using the reference list selected in the method
   Analysis.ApplyLockMassCalibration true
   ' Export line spectra in .mzXML format to the specified folder
   Analysis.Export "C:\Users\username\userfolder", daMzXML, daLine
   ' Close the processed file so that another file can be processed
   Form.close

   This script will (1) apply lock-mass calibration using the reference signal selected in the method parameters (e.g., reserpine), (2) export line spectra in .mzXML format to the folder specified (e.g., “C:\Users\username\userfolder\”), and (3) close the processed file in DataAnalysis so another file can be processed. Note: No changes will be saved to the raw data file.

FIGURE A4-1. An example lock mass calibrant reference list using reserpine as an internal calibrant.

10. Under Method → Save as…, save the modified method in a recognizable folder.
11.
Under Tools → ProcessWithMethod, two windows will open: Compass Automation Engine and Compass DataAnalysis ProcessQueuer.
12. In Compass Automation Engine, click on Method to choose the method that was previously saved in the known folder. Then click on Select to choose the raw .d files that are going to be processed.
13. Once you have chosen the desired files for processing, click on Process to begin. Chosen files should move to Compass DataAnalysis ProcessQueuer. During the processing period, DataAnalysis will be busy and a great deal of the computer processing power will be occupied. We recommend running the conversion process overnight, as we usually did, and avoiding other computationally intensive processes while data conversion is taking place.
14. When the process is complete, the exported .mzXML files will be found in the assigned folder.

APPENDIX 5. Global Natural Products Social (GNPS) Molecular Networking bioinformatics workflows using MS-Cluster.

The following protocol uses mass spectra in the construction of a molecular network using the GNPS Molecular Networking online platform (http://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp; Wang et al., 2016). For a general GNPS manual, see https://bix-lab.ucsd.edu/display/Public/Molecular+Networking+Documentation (Wang et al., 2016). The protocol provided here can be used to repeat networking parameters that we have used to generate networks of foliar metabolites for forest tree communities and other plant samples. After calibration and data conversion to the .mzXML file format (see Appendix 4), mass spectra files are organized using “group mapping” and “attribute mapping” files in .txt format. Mass spectra files, a group mapping file, and an attribute mapping file are uploaded onto the platform using an ftp client.
Once uploaded, user-generated mass spectra files are selected for networking, and public spectral libraries are chosen against which to compare user-generated spectra. Finally, networking and library-matching parameters are chosen for the construction of a molecular network.

A. Software required
1. An ftp client such as WinSCP 5.95.5 or FileZilla 3.27.1
2. An account on the GNPS Molecular Networking online platform (http://gnps.ucsd.edu)

B. Upload mass spectra data to GNPS using an ftp client
• Setting a new connection to GNPS using WinSCP
1. In WinSCP, under login, choose New Site.
2. Under Session → File protocol, choose FTP. Under Encryption, choose No encryption.
3. Under Host Name, write ccms-ftp01.ucsd.edu. Under Port Number, choose 21.
4. Navigate to the GNPS platform (http://gnps.ucsd.edu) and register as a new user.
5. Return to WinSCP. Under login → Session → User Name, write your GNPS username. Under Password, write your password.
6. Click Save, write a site name, and click OK.
• Data upload using WinSCP
1. In WinSCP under login, choose the site that you created.
2. Click Login.
3. WinSCP will show two panels: the local directory (C:\Users\XXXX) at left and the GNPS server (/) at right.
4. Choose the folder of your files in the local site menu. Highlight the files or folders to upload and select Upload by right-clicking, or drag and drop the files to the GNPS server window on the right. You will then see the files queued and transferred to GNPS.

C. Molecular network assembly
1. Sign in on GNPS (http://gnps.ucsd.edu). Click the highlighted text ‘Data Analysis’ to navigate to a new window, the Network Workflow, shown in Fig. A5-1.
2. Under Workflow Selection → Title, write the name of the network. We recommend including one’s username, date, and some details regarding the parameters used to create the network.

FIGURE A5-1. The GNPS Network Workflow window.
3. Do not use Networking Parameter Presets.
4. To input your files, navigate to Basic Options → Select Input Files. A pop-up window with three tabs will appear: Select Input Files, Upload Files, Share Files, as shown in Fig. A5-2.
5. Select the Select Input Files tab. Proceed by selecting the files or folder that comprise the input mass spectra (typically .mzXML files) and selecting one of Spectrum Files G1 through Spectrum Files G6.
• If the spectra can be meaningfully organized into six groups, for example, if the spectra were derived from six plant species, then the spectra representing each group can be uploaded into the six Spectrum Files folders. Proceed to Step 6.
• If the network will contain more than six groups, add all user-supplied input mass spectra files to Spectrum Files G1.
• Select the files you want to use for Group mapping, followed by selecting the Group mapping buttons, and then select the files to use for Attribute mapping, followed by selecting the Attribute mapping buttons.
• A .txt Group mapping file must be provided by the user. This file must be custom-edited using a text editor (e.g., Notepad++ for Windows or TextWrangler/BBEdit for Mac).
• To create a Group mapping file, use the following format:
   GROUP_GroupName1=file1.mzXML;file2.mzXML
   GROUP_GroupName2=file3.mzXML;file4.mzXML
  where “GroupName1” can be any user-defined group name (e.g., PsychotriaAcuminata, or psycac, or BCI), and “file1.mzXML,” etc. are user-generated tandem mass spectrometry (MS/MS) spectra files. Each line in the Group mapping file must begin with the prefix “GROUP_” in all capital letters.
• Note: The downstream GNPS network analyses provided in Appendix 6, below, do not depend on groups defined in the Group mapping file, but rather assume that filenames include a six-character species code with which to identify spectra.
6. Select Finish Selection to return to the Network Workflow page.
7. To adjust parameters that govern the sensitivity of MS-Cluster (under-the-hood software that generates consensus MS/MS spectra), molecular networking, and spectral library searches, navigate to Basic Options.
• Under Basic Options → Precursor ion mass tolerance, set the parameter to a value from 0.0075 to 2.0 Da; lower values correspond to a stricter (lower ppm) error tolerance. The particular setting of this parameter depends on the mass accuracy of the mass spectrometer as well as the specific instrument method used to collect the MS/MS data. We recommend using a precursor ion mass tolerance value below 1 Da for data collected on a quadrupole-time-of-flight (q-TOF) instrument.
• Under Basic Options → Fragment Ion Mass Tolerance, set the parameter to a value from 0.0075 to 2.0 Da. This value specifies the range within which fragment ion m/z values will be considered equivalent. We recommend using values below 0.5 Da for data collected on a q-TOF instrument.
8. To adjust parameters governing the molecular network similarity matrix alignment and the formation of links between nodes, navigate to Advanced Network Options (see Fig. A5-3).
• To adjust the threshold of similarity that must occur between a pair of consensus MS/MS spectra, set the value for Minimum cosine score to a value between 0.5 and 0.99. The default value is 0.7. Lower values will increase the size of the clusters due to the clustering of less similar MS/MS spectra, and higher values will generate smaller clusters and leave more nodes unlinked. We recommend a value ≥0.6 for Minimum cosine score.
• To adjust the maximum number of links to other nodes permitted for any single node, set the value for Network TopK.
The default value is 10. The edge between two nodes is kept only if both nodes are within each other’s TopK most similar nodes. We use the default value.
• To adjust the minimum number of MS/MS spectra permitted to form a consensus spectrum, set the value for Minimum Cluster Size to a value ≥1. Make sure that Run MsCluster is activated (set to yes). MS-Cluster merges nearly identical MS/MS spectra into consensus spectra that represent structurally unique molecules (Frank et al., 2008). We use values from 1 to 2 depending on the number of replicates collected per sample.
• To modify the number of common fragment ions compared between two spectra, set the value for Minimum Matched Fragment Ions. The default value is 6. We use values from 3 to 6 depending on the molecular weight of molecules.
• To adjust the maximum number of nodes allowed in a single connected network, set the value for Maximum Connected Component Size (Beta); the default value is 100. We use the default parameter for small networks of one to 10 samples. For large networks (more than 10 samples), we allow an unlimited number of nodes in a single network by setting Maximum Connected Component Size to 0.

FIGURE A5-2. The GNPS Select Input Files window.

9. To adjust the parameters that govern dereplication, or the matching of the user’s spectra with those of known compounds in public spectral libraries, navigate to Advanced Library Search Options.
• To control the number of shared fragment ions required for a library match, set the value for Library Search Min Matched Peaks to a value ≥6. To control the minimum cosine similarity score required for a library match, set the Score Threshold to a value ≥0.6.
If the annotation of molecular families is desired, set the Enable analog search setting to Do Search and choose a value for the upper threshold for the mass difference tolerated between user and library spectra under Maximum Analog Search Mass Difference. We use a threshold of 100 Da for plant extracts.

FIGURE A5-3. The Advanced Network Options and Advanced Library Search Options parameter windows on the GNPS Network Workflow page.

FIGURE A5-4. The Job Status window during the execution of a network run.

10. We do not use other advanced parameters for plant extracts. To set other advanced parameters, refer to Molecular Networking Documentation (https://bix-lab.ucsd.edu/display/Public/Molecular+Networking+Documentation).
11. Click Submit. The workflow will begin to process the network in a new window showing a diagram that highlights each task as it is processed. A notification will be emailed to the user-defined email when the network is complete. Fig. A5-4 illustrates the Job Status window during the execution of a GNPS networking run.
12. Note that during a run or after the completion of a run, the Clone button can be used to return to the Network Workflow page (Fig. A5-1) with all parameters set to the values used for the cloned run.
13. Upon completion of the network, the Job Status window will appear as in Fig. A5-5.
14. To download the network data, navigate to the Status section and the Auxiliary Views header and select View Network, Node Centric. A new page will open.
15. Under the Download tab, choose Download. Then select Tab-Delimited Results Only and All fields.
16. Click the Download button. This will download your data as a compressed folder titled “ProteoSAFe-METABOLOMICS-SNETS-[code].zip.”

APPENDIX 6.
Calculation of chemical structural-compositional similarity (CSCS) for all pairwise combinations of samples using Global Natural Products Social (GNPS) Molecular Networking output.

FIGURE A5-5. The GNPS Job Status window upon completion of a network.

A. Data organization during liquid chromatography–mass spectrometry (LC-MS) and GNPS analyses
1. Downstream analyses using the GNPS network will be facilitated by the consistent application of filenames to MS files during the collection of LC-MS data. We use a five-digit numeric code to uniquely identify each sample, followed by a six-character code to refer to all plant species in the data set, followed by a three-character code for treatment combinations in analyses concerning intraspecific variation. Other information retained within filenames can be appended after these codes, but should be as consistent as possible. For example, a file containing the results of an LC-MS run on a sample of Psychotria acuminata that was a young, expanding leaf collected during the wet season in the shade would be named “00001_psycac_ywh_20170430.mzXML.”
2. Blanks should contain the word “blank” in the filename in place of the species code. For example, “00002_blank_20170430.mzXML.”
3. Load all of the MS sample files to be networked into a single folder on GNPS. This can be a subdirectory of the user directory. For example, “username/forestchem/00001_psycac_ywh_20170430.mzXML.”

B. Preparation of an MS-sample index file using regular expressions
1. GNPS output will be stored in a zip file titled “ProteoSAFe-METABOLOMICS-SNETS-[code]-view_network.zip,” where [code] is an eight-character code generated by GNPS to give the network a unique ID. Move the file to the intended work directory and unzip it.
2.
Place within the working directory a .csv file named “FreshMass.csv” containing three columns: (1) “Sample,” five-digit unique sample codes, such as 00001; (2) “Species,” six-character species code; and (3) “FreshMass,” sample masses recorded during Step 2 of the chemical extraction described in Appendix 3.
3. Use a text editor with the ability to execute search-and-replace using regular expressions, such as TextWrangler or BBEdit (https://www.barebones.com/products/textwrangler/). We recommend Haddock and Dunn (2011; http://practicalcomputing.org/) for a tutorial on the use of regular expressions to modify text files. Here, we refer to TextWrangler.
4. Open the file ProteoSAFe-METABOLOMICS-SNETS-[code]-view_network/params.xml in TextWrangler.
5. Navigate to Search > Find.
6. Select “Case sensitive” and “Grep.” Ensure that “Selected text only” is not selected.
7. In the “Find” field, enter:
   ()(spec-\d\d\d\d\d.mzXML)(.username/forestchem/)(0\d\d\d\d_blank.+mzXML)(\r)
8. Leave the “Replace” field blank.
9. Select “Replace All.”
10. In the “Find” field, enter:
   ()(spec-\d\d\d\d\d.mzXML)(.username/forestchem/)(0\d\d\d\d)(_)(\w\w\w\w\w\w)(.+mzXML)()
11. In the “Replace” field, enter:
   \2\t\4\5\6\7\t\6\t\4
12. Select “Replace All.”
13. Delete all rows that did not convert (Lines 1–27, ending with the line “2.0”, and the final four to 20 lines, including those that identify public MS libraries, for example “lib-00000.mgf|speclibs/MASSBANK/MASSBANK.mgf”).
14. In Line 1, add the column headers, separated by tabs: SpecCode, OrigFilename, Species, Sample
15. Save the file as “SampleSpecMap.txt” in the directory ProteoSAFe-METABOLOMICS-SNETS-[code]-view_network

C. Assembly of the sample chemical composition matrix
1. Run the function “molecNetsTraits” (Appendix S1), using as input:
   a. code, the 8-character code for the GNPS network
   b. date, the date in the format “yyyymmdd”
   c. outfile, an output filename, default “MolecNetsChemTraits[date].RData”
2.
The script will write the file “AttributesBlanksRemoved[date].txt,” which includes all attribute data, excluding all network nodes (consensus spectra) observed in blanks. This is useful for generating network figures, for example, using Cytoscape.
3. The script will write the files “NetworkBlanksRemoved[date].txt” and “NetworkBlanksRemovedNoSingletons[date].txt.” These files contain the edges of the network and can be used to generate network figures, for example, using Cytoscape. These files will also be used in Step D.
4. The function “molecNetsTraits” will generate the following objects and save them to the file “MolecNetsChemTraits[date].RData”:
   a. sampsByCompounds: rows are leaf samples, columns are network nodes/consensus spectra representing unique compounds, entries are ion intensity
   b. sppByCompounds: rows are species, columns are network nodes/consensus spectra representing unique compounds, entries are mean ion intensities of each compound in each species
   c. network: the network output given by GNPS; each row is a link between two compounds, columns include CLUSTERID1, CLUSTERID2, and Cosine.

D. Calculation of CSCS similarity metric
1. Run the function “calcCSCS” (Appendix S2), using as input:
   a. the date in the format “yyyymmdd”
   b. species (TRUE/FALSE); if FALSE, the function calculates CSCS for all pairs of samples; if TRUE, the function calculates CSCS for all pairs of species
   c. outfile
   d. either sampsByCompounds or sppByCompounds to calculate CSCS for pairs of samples or species, respectively
   e. network
2. The function will generate the following objects and save them to the file “CSCS[date].RData”:
   a. sampsCompsStand, standardized relative ion intensity of compounds in each sample or species
   b. diag, a diagonal matrix containing each sample’s or species’ CSCS similarity to itself
   c. cscs, a CSCS chemical similarity matrix for all pairs of samples or species
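
The relationship among the objects listed in Step D can be illustrated with a minimal Python sketch. It is not the authors' “calcCSCS” function (Appendix S2), which remains the reference implementation. The sketch assumes that CSCS for a pair of samples is the abundance-weighted sum of cosine scores over all pairs of compounds, that a compound has a cosine score of 1 with itself and 0 with any compound to which it is not linked, and that the result is divided by the larger of the two self-similarities (our reading of the “diag” object). The function names and the toy data are hypothetical.

```python
from itertools import product


def standardize(intensities):
    """Scale one sample's ion intensities so that they sum to 1."""
    total = sum(intensities.values())
    return {cmpd: val / total for cmpd, val in intensities.items() if val > 0}


def build_cosine_lookup(edges):
    """Build a symmetric lookup from (CLUSTERID1, CLUSTERID2, Cosine) rows."""
    cos = {}
    for c1, c2, score in edges:
        cos[(c1, c2)] = score
        cos[(c2, c1)] = score
    return cos


def raw_similarity(a, b, cos):
    """Sum a_i * b_j * cosine(i, j) over all compound pairs (assumed form)."""
    total = 0.0
    for (i, a_i), (j, b_j) in product(a.items(), b.items()):
        # Assumption: identical compounds score 1, unlinked compounds score 0.
        score = 1.0 if i == j else cos.get((i, j), 0.0)
        total += a_i * b_j * score
    return total


def cscs(a, b, cos):
    """Normalize by the larger self-similarity (assumed role of 'diag')."""
    denom = max(raw_similarity(a, a, cos), raw_similarity(b, b, cos))
    return raw_similarity(a, b, cos) / denom


# Hypothetical toy data: two samples, three compounds, two network links.
edges = [("c1", "c2", 0.8), ("c2", "c3", 0.6)]
cos = build_cosine_lookup(edges)
sample_a = standardize({"c1": 30.0, "c2": 10.0})
sample_b = standardize({"c2": 20.0, "c3": 20.0})
similarity_ab = cscs(sample_a, sample_b, cos)
print(round(similarity_ab, 4))
```

Under these assumptions, a sample compared with itself always scores 1, and two samples that share no compounds can still score above 0 when their compounds are linked in the network, which is the property that distinguishes a structural-compositional metric from a purely compositional one.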