OPINION Why we need a centralized repository for isotopic data Jonathan N. Paulia,1, Seth D. Newsomeb, Joseph A. Cookc, Chris Harrodd, Shawn A. Steffane,f, Christopher J. O. Bakerg, Merav Ben-Davidh, David Bloomi, Gabriel J. Bowenj, Thure E. Cerlingj, Carla Cicerok, Craig Cookh, Michelle Dohml, Prarthana S. Dharampalf, Gary Gravesm,n, Robert Groppo, Keith A. Hobsonp, Chris Jordanq, Bruce MacFaddenr, Suzanne Pilaar Birchs,t, Jorrit Poelenu, Sujeevan Ratnasinghamv, Laura Russelli, Craig A. Strickerw, Mark D. Uhenx, Christopher T. Yarnesy, and Brian Haydenz Stable isotopes encode and integrate the origin of matter; thus, their analysis offers tremendous potential to address questions across diverse scientific disciplines (1, 2). Indeed, the broad applicability of stable isotopes, coupled with advancements in high-throughput analy- sis, have created a scientific field that is growing expo- nentially, and generating data at a rate paralleling the explosive rise of DNA sequencing and genomics (3). Centralized data repositories, such as GenBank, have become increasingly important as a means for archiving information, and “Big Data” analytics of these resources are revolutionizing science and everyday life. However, to date a centralized database for the management of isotopic data does not exist.Webelieve that the absence of such a resource has impeded research progress through the unnecessary duplication of effort, restricted the near-boundless application of stable isotopes, and curtailed the exchange of informa- tion among researchers. The creation of such a central- ized database would be more than a silo for data; it would be a dynamic resource to unite disciplinary fields and answer pressing questions in agriculture, animal sciences, archaeology, anthropology, ecology, medi- cine, nutrition, physiology, paleontology, forensics, and earth and planetary sciences. We believe that a central- ized database for isotopes would accelerate and en- hance such global and multidisciplinary endeavors, thus broaden the reach of isotope science. Here, we—a group of stable isotope scientists, data managers, mu- seum curators, journal editors, and educators—offer a vision for the public repository’s identity, structure, and long-term sustainability. The Need for IsoBank Stable isotopes play a ubiquitous role in modern sci- ence; hence, the benefits of IsoBank are potentially immense. Isotopes have been used to construct iso- scapes, continental or oceanic scale maps of isotope ratios in ground water and organic materials, trans- forming the fields of ecology and food and forensic science (4). Stable isotopes have a long history of use by archaeologists to reconstruct our past movements and diet and the rise and fall of civilizations (5), and by nu- tritionists to assess our current health (6). They are used by earth scientists to document the environmental and evolutionary history of the Earth, and by ecologists and physiologists to track the flux of nutrients between and within ecosystems (7) and individuals (8). More re- cently, researchers have begun to harness large isotopic datasets to address questions of global relevance—global nitrogen cycling (9) or continental climate variation (10). Yet, the syntheses of isotope data across broad spatial and temporal scales and across disciplinary fields has generally been hampered by the difficulty of effi- ciently procuring large datasets from the published lit- erature. This is compounded with the reality that most of the isotope data that currently exists are not, and may never be, published in peer-reviewed journals. Other relevant data are published in articles going back decades, but are effectively inaccessible to re- searchers. IsoBank would provide a route to enhance aDepartment of Forest and Wildlife Ecology, University of Wisconsin–Madison, Madison, WI 53706; bCenter for Stable Isotopes, Department of Biology, University of NewMexico, Albuquerque, NM 87131; cMuseum of Southwestern Biology, Department of Biology, University of NewMexico, Albuquerque, NM 87131; dInstituto de Ciencias Naturales Alexander von Humboldt, Universidad de Antofagasta, Antofagasta 1270300, Chile; eUS Department of Agriculture, Agricultural Research Service, Madison, WI 53706; fDepartment of Entomology, University of Wisconsin–Madison, Madison, WI 53706; gDepartment of Computer Science, University of New Brunswick, Saint John, NB, Canada E2L 4L5; hDepartment of Zoology and Physiology, University of Wyoming,WY 82071; iVertNet/iDigBio, FloridaMuseum of Natural History, University of Florida, Gainesville, FL 32611; jDepartment of Geology and Geophysics, University of Utah, Salt Lake City, UT 84112; kMuseum of Vertebrate Zoology, University of California, Berkeley, CA 94720; lPublic Library of Science, San Francisco, CA 94111; mDepartment of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012; nCenter for Macroecology, Evolution, and Climate, Natural History Museum of Denmark, University of Copenhagen, DK-2100 Copenhagen, Denmark; oAmerican Institute of Biological Sciences, Washington, DC 20005; pEnvironment Canada, Saskatoon, SK Canada S7N 3H5; qTexas Advanced Computing Center, The University of Texas at Austin, Austin, TX 78758; rFlorida Museum of Natural History, University of Florida, Gainesville, FL 32611; sDepartment of Anthropology, University of Georgia, GA 30602; tDepartment of Geography, University of Georgia, GA 30602; uPrivate address, Oakland, CA 94610; vCentre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada N1G 2W1; wUS Geological Survey, Fort Collins Science Center, Denver, CO 80225; xGeorge Mason University, Fairfax, VA 22030; yStable Isotope Facility, University of California, Davis, CA 95616; and zBiology Department, University of New Brunswick, Fredericton, NB, Canada E3B 5A3 The authors declare no conflict of interest. Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and have not been endorsed by the National Academy of Sciences. 1To whom correspondence should be addressed. Email: jnpauli@wisc.edu. www.pnas.org/cgi/doi/10.1073/pnas.1701742114 PNAS | March 21, 2017 | vol. 114 | no. 12 | 2997–3001 O P IN IO N interdisciplinary research and a portal to published and unpublished datasets. Such a resource, then, could en- hance our understanding of human history, our predic- tions of global change, the diagnoses and treatment of human disease, and the study of our planet and solar system. We envisage IsoBank as both an aggregator and a repository of isotopic data. It should be an online, openly accessible database, with isotope measurements indexed via discipline-specific metadata.When possible, data deposited in IsoBank should be linked to archived samples and specimens. IsoBank will function as a uni- versal resource, and allow scientists to verify, replicate, compare, extend, and integrate data across studies. In the same way that GenBank filled an immediate need within the field of genetics, IsoBank will consolidate and organize the broad and growing number of disciplines that have the potential to use stable isotope measure- ments. IsoBank should be networked internationally with core isotope laboratories, government-funded science agencies, and peer-reviewed journals to foster collabo- rations and ensure sustainability. Organizational Structure The structure of IsoBank requires the recognition of the breadth of research conductedwith stable isotopes and the inclusion of a broad group of researchers, educa- tors, museum curators, and data repository experts to develop and oversee its operation (Fig. 1). The efforts of this group would be targeted by a team of project coordinators, each heading one subcommittee (below) and overseen by an independent advisory board con- sisting of experienced isotope scientists and database mangers. We envision at least four subcommittees (Fig. 1): (i) Information Technology: programmers, database ar- chitects, and web-designers who would build and maintain a high-capacity and user-friendly platform for IsoBank; (ii) Education and Training: specialists who would lead workshops, train potential users, provide online support, and promote professional develop- ment and outreach; (iii) Analytical Expertise: a con- sortium of core laboratories that analyze large volumes and diverse types of samples, to ensure rig- orous data standards and enhance cohesion and communication among independent analytical facili- ties and thereby facilitate the development of disci- plinary standards for data quality and laboratory operations, addressing the deficiency in isotopic in- vestigations; and (iv) Integrative Disciplinary: leaders in relevant fields, presenting the diverse use of iso- topes across disciplines, who would help craft rigorous metadata standards, identify disciplinary terminology, and reinforce its use. Data Storage and Metadata Structure For any repository to be useful, the data must be re- liable, accessible—ideally in a machine-readable for- mat—and have agreed-upon semantics for the data and metadata fields. A hierarchical design with relevant metadata fields would enable the alignment of isotope data from diverse research areas, and allow data to be traced back to analytical laboratories to facilitate in- dependent quality assurance/quality control reviews. Documenting the ontology of metadata will be one of the great challenges for IsoBank (e.g., ref. 11). Such a task is particularly challenging, given the broad range of disciplines involved and the importance of such metadata in statistical analyses (Fig. 2). Where possible, IsoBank should use existing ontologies, fa- cilitating current and future integration with existing databases. For example, IsoBank could assign in- tegrative taxonomic information system (https://www. ITIS.gov) serial numbers to organismal submissions that are then linked to a geographic distribution, evolutionary, or ecological relationships (12–14). We envision a database revolving around three pri- mary informational subunits—user, sample, and ana- lytical—each of which will be associated with core metadata terms, which could be further classified where required (Fig. 2). Isobank would need to seamlessly incorporate user information. Similar to other data repositories, IsoBank users should be able to link existing online profiles, ideally ORCID (https://orcid.org), to their IsoBank pro- file and data submissions. Just as active data re- positories, such as FigShare (https://figshare.com) and Dryad (www.datadryad.org), allocate a DOI for data loaded to their site, IsoBank should also allow users to receive recognition through DOI citations when data Fig. 1. Organizational structure for the proposed IsoBank. A central executive group would oversee four subcommittees (SC): Information technology, integrative disciplinary, education and training, and analytical expertise. GNIP, Global Network of Isotopes in Precipitation; IAEA, International Atomic Energy Association; QA/QC, quality assurance/quality control. 2998 | www.pnas.org/cgi/doi/10.1073/pnas.1701742114 Pauli et al. are downloaded or used in subsequent publications. We also see value in assigning unique IDs to analytical laboratories for data uploads to provide an opportunity to compare and evaluate different methods, analytical standards, and precision among laboratories. Ulti- mately, profiles of individuals and laboratories with a range of optional metadata will better connect data generators to contributors to users, ultimately enhanc- ing the use of stable isotope data. To accommodate a wide range of researchers, each isotopic data record in IsoBank should be stored under a tiered framework. Initially, data will be stored in a sub- repository (e.g., biogenic, inorganic, water), which will contain sufficient discipline-specific metadata to allow users to integrate data from IsoBank into discipline- specific or interdisciplinary analyses and to avoid han- dling irrelevant metadata terms (e.g., species taxonomy for water samples). Sample metadata will fall under two categories: essential metadata, describing every data record in IsoBank, and discipline-specific metadata. To maxi- mize the accessibility of IsoBank to data holders, the essential metadata should be kept to a minimum, and include latitude and longitude of sampling site, sam- ple material, isotopes measured, and their values. Discipline-specific metadata will be developed by working groups during the initial phase of IsoBank. Following the model established by the genomics community, the gold standard for accessions are data records that are tied directly to vouchered samples housed in permanent and accessible archives with data cross-linked to IsoBank, museum databases (e.g., Arc- tos), and data aggregators [e.g., iDigBio (15)]. If speci- mens are not curated in museums, users would be encouraged to provide sample storage location so that interested parties may contact them directly if they wish to conduct additional analyses. Stable isotope data are produced in a wide range of research and commercial laboratories. Although the methods by which the majority of data, mostly bulk car- bon (δ13C) and nitrogen (δ15N) stable isotope values, are generated is generally standardized, laboratories often use slightly different protocols and different laboratory reference materials to normalize data to internationally accepted scales (16). Other isotopes (e.g., δ2H and δ18O) have more fundamental issues associated with compa- rability measurements (17). To ensure data quality and user confidence in IsoBank, pertinent analytical in- formation must be submitted for each data record. Therefore, mirroring the subdivisions of sample met- adata, IsoBank should partition analytical fields into es- sential, recommended, and requested metadata. Such an approach will allow users with detailed analytical in- formation to post it, but will not inhibit others who lack those details from depositing their data. Essential metadata includes information, such as the specific isotope measured or the experimental error. In contrast, recommended and requested met- adata may include sample pretreatment methods (e.g., lipid extraction, demineralization), analytical methods, instrumentation, or laboratory reference materials used to normalize data (18). The reliability and accuracy of data could subsequently be ranked from “moderately reliable” to “very reliable” by data managers at IsoBank, based on the level of analytical metadata provided. Promoting Use Given the successful model of GenBank, the direct application of isotopic data to pressing questions across diverse fields, and recent initiatives for data transparency and sharing, we believe high-quality data in IsoBank will be heavily used. Thus, our at- tention is focused primarily on procedures that will ensure deposition of high-quality and relevant data in IsoBank. To accomplish this, IsoBank should in- clude features attractive to users as well as incentives to promote data-sharing. Fig. 2. A schematic of the proposed database structure, outlining how contributors and users would interface with samples, analyses, measurements, and datasets. GNIP, Global Network of Isotopes in Precipitation; IAEA, International Atomic Energy Association; VCDT, Vienna Canyon Diablo Troilite; VPDB, Vienna PDB; VSMOW, Vienna Standard Mean Oceanic Water. Pauli et al. PNAS | March 21, 2017 | vol. 114 | no. 12 | 2999 First, we envision that IsoBank’s graphical interface will enable users to easily navigate and query the database and rapidly upload and download data and associated metadata. We view IsoBank as a data re- pository and management system that features com- putational tools. However, the development of an application program interface would allow automated queries of the data and future integration with other datasets, a fundamental facet of Big Data analytics. Also, the structure of IsoBank’s interface should be designed in such a way that it can also serve as a personal data management system to further in- centivize use. This would encourage standardization between researchers and laboratories and would al- low users to archive all their data under the IsoBank ontology, while also maintaining shared and private data archives. The features of IsoBank that enable straightforward data uploads and analytical options would be paired with workshops and online assistance. To that end, IsoBank could follow the lead of other data re- positories (e.g., ref. 15) and sponsor a series of work- shops in the initial years at conferences, core isotope facilities, universities, and federal agencies to train potential users. Staff at IsoBank would also be avail- able to respond to queries or problems that users encounter while using IsoBank. We would also seek collaborative opportunities with data-mining groups to harvest previously published stable isotope data from the peer-reviewed literature. In archiving these additional data, IsoBank could serve as a central online bibliography for publications that contain sta- ble isotope data. The development of IsoBank would create norms around data-sharing expectations among stable isotope scientists. To facilitate use of IsoBank, partic- ipants could place embargo periods on their datasets before public release. IsoBank staff would work closely with funding agencies to help incentivize its use for supported research (e.g., requiring the use of IsoBank in proposal data-management plans). This group would also work with the editorial boards of journals to ensure that deposition of data in IsoBank meets journal requirements for data accessibility before publication. The value of inquiry-based approaches to education is now widely recognized (e.g., ref. 19) and motivates efforts to incorporate publicly available data into edu- cational initiatives. Web-accessible data provide edu- cators with excellent opportunities to build lessons that can engage students in original, data-driven exercises (20) and that promote the application of data to real- world problems, like climate change or disruption of biogeochemical cycles. Such experiential and authentic lessons encom- pass the biological knowledge, analytical abilities, and computational skills needed by our next generation of scientists and policy makers to shape responses to these 21st century challenges. IsoBank would allow a diverse audience of students to directly access iso- topic data for independent projects. We envision competitive IsoBank minigrants targeted to un- dergraduate students (e.g., National Science Foun- dation-Research Experiences for Undergraduates) who will conduct meta-analyses or quantitative re- views of isotopic data in their research projects. Securing Funding Given that stable isotopes are used by researchers globally, international opportunities should be pursued to fund IsoBank. To this end, we foresee IsoBank op- erating with independently funded mirror repositories as per GenBank (North America), EMBL (Europe), and INSDC (Japan). The large amount of start-up funding needed will likely require a collaboration between Eu- ropean and United States investigators. Applications to several European Union funding agencies, as well as similar agencies in the United States and Canada (e.g., National Science Foundation and National Sciences and Engineering Research Council), would facilitate a simultaneous start of both mirrors. To ensure IsoBank’s sustainability, we envision a long-term funding strategy that is part of governmental research infrastructure portfolios (e.g., National Insti- tutes of Health support for GenBank), as well as funding from the community of stable isotope users. For ex- ample, revenue could be generated for IsoBank through a small fee-per-upload, whereby users pay a nominal amount to deposit their data. This model is already in use by some existing online repositories (e.g., Dryad) and represents regular income that should grow with the size and use level of the repository. Imposing fees may potentially limit the use of Iso- Bank by researchers already facing constrained bud- gets. Thus, in the initial years of IsoBank, managers would need to ensure that data-deposit fees are man- ageable. In addition, IsoBank can engage directly with participating core laboratories to institute nominal sur- charges per sample submitted (e.g., US$ 0.10–0.25 per sample). Given the hundreds of thousands of samples analyzed annually at core isotope facilities, this ap- proach has the potential to generate sustained revenue to help offset the operation costs of IsoBank. By keeping fees low, the financial impact on researchers or laboratories would be limited. Finally, journal editors would need to ensure data deposition and availability in IsoBank by requiring authors to report data accession numbers in their manuscripts before publication, similar to current requirements for DNA data in GenBank. As evidence of the immediate demand for an IsoBank, several websites are emerging (e.g., Neotoma, IsoMemo) to consolidate isotopic data within search- able databases. These have been launched within a variety of disciplines among international collaborators. We believe that our shared vision for an IsoBank—a We believe that our shared vision for an IsoBank ... offers a viable and powerful framework to organize, consolidate, and broadly share stable isotope data across disciplines. 3000 | www.pnas.org/cgi/doi/10.1073/pnas.1701742114 Pauli et al. single, comprehensive, and centralized repository managed by a team of experts and following a uni- versally agreed ontology of metadata—offers a viable and powerful framework to organize, consolidate, and broadly share stable isotope data across disciplines. Such a repository would help to address the national initiative on data transparency, reinforce ongoing long- term and global data collection programs, and facilitate data integration as a tool to answer science’s most challenging problems. We welcome a continued dis- cussion to optimize the plan for IsoBank, but also see the need as extraordinary and encourage movement toward its rapid development and implementation. Acknowledgments We thank Brian Fry, Tamsin O’Connell, and Jim Ehleringer for constructive comments on an earlier daft of this manuscript; and the staff at the UNM Sevilleta Research Station for hosting the IsoBank Workshop. The IsoBank Workshop was funded with a grant through the National Science Foundation, Emerging Fron- tiers (NSF 1613214) and support from the Biodiversity Collections Network Research Coordinating Network (NSF 1441785). This article is dedicated to the memory of Scott Federhen. 1 Fry B (2006) Stable Isotope Ecology (Springer, New York). 2 West JB, Bowen GJ, Dawson TE, Tu KP (2010) Isoscapes: Understanding Movement, Pattern, and Process on Earth Through Isotope Mapping (Springer, The Netherlands). 3 Pauli JN, Steffan SA, Newsome SD (2015) It is time for IsoBank. Bioscience 65:229–230. 4 West JB, Bowen GJ, Cerling TE, Ehleringer JR (2006) Stable isotopes as one of nature’s ecological recorders. Trends Ecol Evol 21(7): 408–414. 5 Medina-Elizalde M, Rohling EJ (2012) Collapse of classic Maya civilization related to modest reduction in precipitation. Science 335(6071):956–959. 6 O’Brien DM (2015) Stable isotope ratios as biomarkers of diet for health research. Annu Rev Nutr 35:565–594. 7 Hall RO, Tank JL (2003) Ecosystem metabolism controls nitrogen uptake in streams in Grand Teton National Park, Wyoming. Limnol Oceanogr 48:1120–1128. 8 Boutton TW, Tyrrell HF, Patterson BW, Varga GA, Klein PD (1988) Carbon kinetics of milk formation in Holstein cows in late lactation. J Anim Sci 66(10):2636–2645. 9 Craine JM, et al. (2015) Convergence of soil nitrogen isotopes across global climate gradients. Sci Rep 5:8280. 10 Liu Z, et al. (2014) Paired oxygen isotope records reveal modern North American atmospheric dynamics during the Holocene. Nat Commun 5:3701. 11 Dumontier M, et al. (2014) The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semantics 5(1):14. 12 Hinchliff CE, et al. (2015) Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc Natl Acad Sci USA 112(41): 12764–12769. 13 Parr CS, et al. (2016) TraitBank: Practical semantics for organism attribute data. Semant Web 7:577–588. 14 Poelen JH, Simons JD, Mungall CJ (2014) Global biotic interactions: An open infrastructure to share and analyze species-interaction datasets. Ecol Inform 24:148–159. 15 Page LM, MacFadden BJ, Fortes JA, Soltis PS, Riccardi G (2015) Digitization of biodiversity collections reveals biggest data on biodiversity. Bioscience 65:841–842. 16 Ben-David M, Flaherty EA (2012) Stable isotopes in mammalian research: A beginner’s guide. J Mammal 93:312–328. 17 Meier-Augenstein W, Hobson KA, Wassenaar LI (2013) Critique: Measuring hydrogen stable isotope abundance of proteins to infer origins of wildlife, food and people. Bioanalysis 5(7):751–767. 18 Jardine TD, Cunjak RA (2005) Analytical error in stable isotope ecology. Oecologia 144(4):528–533. 19 Feldman A, Chapman A, Vernaza-Hernandez V, Ozalp D, Alshehri F (2012) Inquiry-based science education as multiple outcome interdisciplinary research and learning (MOIRL). Science Education International 23:328–337. 20 Cook JA, et al. (2014) Natural history collections as emerging resources for innovative education in biology. Bioscience 64:725–734. Pauli et al. PNAS | March 21, 2017 | vol. 114 | no. 12 | 3001