1 Salmonella enterica genomes from victims of a major 16th century epidemic 1 in Mexico 2 3 Authors: Åshild J. Vågene1,2#, Alexander Herbig1,2#*, Michael G. Campana2, 3†, Nelly M. Robles García4, 4 Christina Warinner1, Susanna Sabin1, Maria A. Spyrou1,2, Aida Andrades Valtueña1, Daniel Huson5, Noreen 5 Tuross2*, Kirsten I. Bos1,2* and Johannes Krause1,2* 6 Affiliations: 7 1Max Planck Institute for the Science of Human History, Jena, Germany. 8 2Institute for Archaeological Sciences, University of Tübingen, Tübingen, Germany 9 3Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA. 10 4Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland. 11 5INAH, National Institute of Anthropology and History, Mexico, Teposcolula-Yucundaa Archaeological 12 Project. 13 6Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany. 14 15 #These authors contributed equally to this work 16 *Correspondence to: A.H.: herbig@shh.mpg.de; N.T.: tuross@fas.harvard.edu; K.I.B.: bos@shh.mpg.de; J.K.: 17 krause@shh.mpg.de 18 †Current address: M.G.C.: Smithsonian Conservation Biology Institute, Center for Conservation Genomics, 19 3001 Connecticut Avenue NW, Washington, DC 20008, USA. 20 21 Summary paragraph: Indigenous populations of the Americas experienced high mortality 22 rates during the early contact period as a result of infectious diseases, many of which were 23 introduced by Europeans. Most of the pathogenic agents that caused these outbreaks remain 24 unknown. Through the introduction of a new metagenomic analysis tool called MALT 25 applied here to search for traces of ancient pathogen DNA, we were able to identify 26 Salmonella enterica in individuals buried in an early contact era epidemic cemetery at 27 Teposcolula-Yucundaa, Oaxaca in southern Mexico. This cemetery is linked, based on 28 historical and archaeological evidence, to the 1545-1550 CE epidemic that affected large 29 parts of Mexico. Locally this epidemic was known as “cocoliztli”, the pathogenic cause of 30 which has been debated for over a century. Here we present genome-wide data from ten 31 individuals for Salmonella enterica subsp. enterica serovar Paratyphi C, a bacterial cause of 32 enteric fever. We propose that S. Paratyphi C be considered a strong candidate for the 33 epidemic population decline during the 1545 cocoliztli outbreak at Teposcolula-Yucundaa. 34 35 2 Infectious diseases introduced to the New World following European contact led to 36 successive outbreaks in many regions of the Americas that continued well into the 19th 37 century. These often caused high mortality and thus contributed a central, and often 38 underappreciated, influence on the demographic collapse of many indigenous populations1-4. 39 Population declines linked to regionally specific epidemics are estimated to have reached as 40 high as 95%3, and their genetic impact based on recent population-based studies of ancient 41 and modern human exome and mitochondrial data attest to their scale5,6. One hypothesis 42 posits that the increased susceptibility of New World populations to Old World diseases 43 facilitated European conquest, whereby rapidly disseminating diseases severely weakened 44 indigenous populations2, in some cases even in advance of European presence in the 45 region2,7. Well-characterized infections such as smallpox, measles, mumps, and influenza are 46 known causes of later contact era outbreaks; however, the diseases responsible for many 47 early contact period New World epidemics remain unknown and have been the subject of 48 scientific debate for over a century2-4,7,8. 49 50 Morphological changes in skeletal remains9 and ethnohistorical accounts10 are often explored 51 as sources for understanding population health in the past, although both provide only limited 52 resolution and have generated speculative and at times conflicting hypotheses about the 53 diseases introduced to New World populations2,3,7,11,12. Most infectious diseases do not leave 54 characteristic markers on the skeleton due to either their short periods of infectivity, the death 55 of the victim in the acute phase before skeletal changes formed, or a lack of osteological 56 involvement9. While historical descriptions of infectious disease symptoms can be detailed, 57 they are subject to cultural biases, suffer from translational inaccuracies, lack a foundation in 58 germ theory, and describe historical forms of a condition that may differ from modern 59 manifestations8,11. Additionally, differential diagnosis based on symptoms alone can be 60 3 unreliable even in modern contexts since many infectious diseases have similar clinical 61 presentations. 62 63 Genome-wide studies of ancient pathogens have proven instrumental in both identifying and 64 characterizing past human infectious diseases. These studies have largely been restricted to 65 skeletal collections where individuals display physical changes consistent with particular 66 infections13-15, an historical context that links a specific pathogen to a known epidemiological 67 event16, or an organism that was identified via targeted molecular screening without prior 68 indication of its presence17. Recent attempts to circumvent these limitations have 69 concentrated on broad-spectrum molecular approaches focused on pathogen detection via 70 fluorescence-hybridization-based microarray technology18, identification via DNA-71 enrichment of certain microbial regions19 or computational screening of non-enriched 72 sequence data against human microbiome data sets20. These approaches offer improvements, 73 but remain biased in the bacterial taxa used for species-level assignments. As archaeological 74 material is expected to harbour an abundance of bacteria that stem from the depositional 75 context, omission of environmental taxa in species assignments can lead to false-positive 76 identifications. Additional techniques for authenticating ancient DNA have been 77 developed21,22, including the identification of characteristic damage patterns caused by the 78 deamination of cytosines23, methods that evaluate evenness of coverage of aligned reads 79 across a reference genome, or length distributions that consider the degree of fragmentation, 80 where ancient molecules are expected to be shorter than those from modern contaminants24. 81 82 A typical NGS dataset from an ancient sample comprises millions of DNA sequencing reads, 83 making taxonomic assignment and screening based on sequence alignments computationally 84 challenging. The gold standard tool for alignment-based analyses is BLAST25, due to its 85 4 sensitivity and statistical model. However, the computational time and power BLAST 86 requires to analyse a typical metagenomic dataset is often prohibitive. 87 88 Here we introduce MALT (MEGAN ALignment Tool), a program for the fast alignment and 89 analysis of metagenomic DNA sequencing data. MALT contains the same taxonomic binning 90 algorithm, i.e. the naïve LCA (Lowest Common Ancestor) algorithm (for reviews see26,27), 91 implemented in the interactive metagenomics analysis software MEGAN28. Like BLAST, 92 MALT computes “local” alignments between highly conserved segments of reads and 93 references. MALT can also calculate “semi-global” alignments where reads are aligned end-94 to-end. In comparison to protein alignments or local DNA alignments, semi-global DNA 95 alignments are more suitable for assessing various quality and authenticity criteria that are 96 commonly applied in the field of paleogenetics. 97 98 We applied our MALT screening pipeline (Supplementary Figures 1, 2) using a database of 99 all complete bacterial genomes available in NCBI RefSeq to non-enriched DNA sequence 100 data from the pulp chamber of teeth collected from indigenous individuals excavated at the 101 site of Teposcolula-Yucundaa, located in the highland Mixteca Alta region of Oaxaca, 102 Mexico29,30. The site contains both pre-contact and contact era burials, including the earliest 103 identified contact era epidemic burial ground in Mexico30,31 (Fig. 1; Supplementary methods 104 1). This is the only known cemetery historically linked to the cocoliztli epidemic of 1545-105 1550 CE30, described as one of the principal epidemiological events responsible for the 106 cataclysmic population decline of 16th century Mesoamerica7,32. This outbreak affected large 107 areas of central Mexico and Guatemala, spreading perhaps as far south as Peru7,30. Via the 108 MALT screening approach, we were able to identify ancient Salmonella enterica DNA in the 109 sequence data generated from this archaeological material, to the exclusion of DNA 110 5 stemming from the complex environmental background. While the pathogenic cause of the 111 cocoliztli epidemic is ambiguous based on ethnohistorical evidence7,8,30, we report the first 112 molecular evidence of microbial infection with genome-wide data from S. enterica subsp. 113 enterica serovar Paratyphi C (enteric fever) isolated from ten epidemic-associated contact era 114 burials. 115 116 Results 117 The individuals included in this investigation were excavated from the contact era epidemic 118 cemetery located in the Grand Plaza (administrative square) (n=24) and the pre-contact 119 churchyard cemetery (n=5) at Teposcolula-Yucundaa between 2004 and 201030 (Fig. 1; 120 Supplementary Table 1; Supplementary methods 1). Previous work demonstrated ancient 121 DNA preservation at the site through the identification of New World mitochondrial 122 haplogroups in 48 individuals, 28 of which overlap with this study30. Additionally, oxygen 123 isotope analysis identified them as local inhabitants30. Thirteen individuals included in this 124 study were previously radiocarbon dated31 yielding dates that support archaeological 125 evidence that the Grand Plaza (n=10) and churchyard (n=3) contain contact and pre-contact 126 era burials, respectively (Supplementary Table 1). The Grand Plaza is estimated to contain 127 more than 800 individuals, mostly interred in graves containing multiple persons. Those 128 excavated contribute to a demographic profile consistent with an epidemic event29,30. 129 130 Tooth samples were processed according to protocols designed for ancient DNA work 131 (Supplementary methods 2). An aggregate soil sample from the two burial grounds was 132 analysed in parallel to gain an overview of environmental bacteria that may have infiltrated 133 our samples. Pre-processed sequencing data of approximately one million paired-end reads 134 per tooth were analysed with MALT using a curated reference database of 6247 complete 135 6 bacterial genomes, comprising all those available in NCBI RefSeq (December 2016). Our 136 approach limits ascertainment biases and false positive assignments that could result from 137 databases deficient in environmental taxa (Supplementary methods 3). A runtime analysis 138 revealed a 200-fold improvement in computation time for MALT in comparison to BLASTn 139 (see Methods). Results were visualized in MEGAN28 and taxonomic assignments were 140 evaluated with attention to known pathogenic species. Reads taxonomically assigned by 141 MALT ranged from 4,842 to 44,315 for the samples. Assigned reads belonging to bacterial 142 constituents of human oral and soil microbiota are present in varying proportions amongst the 143 samples (Fig. 2; Supplementary Table 2). Of note, three teeth (Tepos_10, Tepos_14 and 144 Tepos_35) had between 365 to 659 reads assigned to Salmonella enterica, a known cause of 145 enteric fever in humans today. Of the S. enterica strains present in the database, S. Paratyphi 146 C had the highest number of assigned reads (Supplementary methods 3; Supplementary Table 147 2). Mapping these three metagenomic datasets to the S. Paratyphi C RKS4594 genome 148 (NC_012125.1) revealed the characteristic pattern of damage expected of ancient DNA 149 (Supplementary Fig. 3; Supplementary methods 3; Supplementary Table 4). Subsequently, 150 the sequencing data for all samples was mapped to the human genome (hg19), revealing a 151 similar level of damage in the human reads for Tepos_10, Tepos_14 and Tepos_35, thus 152 providing further support for the ancient origin of the S. enterica reads (Supplementary 153 methods 4; Supplementary Table 5). An additional seven individuals from the Grand Plaza 154 cemetery and one negative control harboured low numbers of assigned S. enterica reads 155 ranging from 4 to 51 (Supplementary Table 2). These were considered as potential weak-156 positive samples. One negative control was found to contain 15 reads assigned to S. enterica, 157 and a further four contained one or two reads, as did nine sample libraries, seven of which 158 were not included in downstream analyses. The soil library and remaining sample libraries 159 were void of S. enterica reads (Fig. 2; Supplementary methods 3; Supplementary Table 2). 160 7 An additional MALT screen for traces of viral DNA revealed one notable taxonomic hit to 161 Salmonella phage Vi II-E1, a phage associated with Salmonella serovars that produce the Vi 162 capsule antigen33, which includes S. Paratyphi C (Supplementary methods 3; Supplementary 163 Table 3). 164 165 To further authenticate and elucidate our findings we performed whole-genome targeted 166 array and in-solution hybridization capture34,35, using probes designed to encompass modern 167 S. enterica genome diversity (Supplementary methods 5, 6, 7; Supplementary Table 6). All 168 five pre-contact samples, the soil sample, one post-contact sample putatively negative for S. 169 enterica based on our MALT screening, all negative controls, and both UDG-treated (DNA 170 damage removed) and non-UDG treated libraries from the ten putatively positive samples 171 (Tepos_10, Tepos_11, Tepos_20, Tepos_14, Tepos_34, Tepos_35, Tepos_36, Tepos_37, 172 Tepos_38, Tepos_41) were included in the capture (Supplementary methods 6, 7). 173 174 Mapping and genotyping of the captured Illumina sequenced reads was performed using the 175 S. Paratyphi C reference genome (NC_012125.1) (Supplementary methods 6, 7, 8; 176 Supplementary Table 7). Capture of S. enterica DNA was successful for the ten positive 177 samples yielding a minimum of 33,327 unique reads per UDG treated library. The remaining 178 bone samples, soil sample, and negative controls were determined to be negative for ancient 179 S. enterica DNA with the exception of one negative control (EB2-091013) that had likely 180 become cross-contaminated during processing (see Supplementary methods 8; 181 Supplementary Table 7). Five complete genomes were constructed for Tepos_10, Tepos_14, 182 Tepos_20, Tepos_35 and Tepos_37, covering 95%, 97%, 67%, 98% and 74% of the 183 reference at a minimum of 3-fold coverage and yielding an average genomic coverage of 33-, 184 36-, 4.6-, 96- and 5.5-fold, respectively (Table 1). Artificial reads generated in silico for 23 185 8 complete genomes included in the probe design were also mapped to the S. Paratyphi C 186 RKS4594 reference (Supplementary methods 8; Supplementary Table 6) and phylogenetic 187 comparison revealed that the five ancient genomes clustered with S. Paratyphi C (Fig. 3; 188 Supplementary Figures 4, 6; Supplementary methods 8). The phylogenetic positioning was 189 retained when the whole dataset was mapped to and genotyped against the S. Typhi CT18 190 reference genome (NC_003198.1) (Supplementary Figure 5; Supplementary Table 8), the 191 most common bacterial cause of enteric fever in humans today. This result excludes the 192 possibility of a reference bias. Despite all five ancient genomes being contemporaneous, the 193 Tepos_10 genome was observed to contain many more derived positions. An investigation of 194 heterozygous variant calls showed that Tepos_10 has a much higher number of heterozygous 195 sites. We believe this is best explained by the presence of genetically similar non-target DNA 196 that co-enriched in the capture for this sample alone. Based on the pattern of allele 197 frequencies, this genome was excluded from downstream analyses (Supplementary methods 198 9; Supplementary Figure 7). Tepos_20 and Tepos_37 were also excluded due to their 199 genomic coverage of less than 6-fold, which allowed more reliable SNP calling at a minimum 200 of 5-fold coverage for Tepos_14 and Tepos_35. Subsequent phylogenetic tree construction 201 with 1000 bootstrap replicates revealed 100% support and branch shortening for the 202 Tepos_14 and Tepos_35 genomes in all phylogenies, supporting their ancient origin (Fig. 3; 203 Supplementary Figure 8). 204 205 SNP analysis for the ancient genomes together with the reference dataset yielded a total of 206 203,256 variant positions amongst all 25 genomes. Our analyses identified 681 positions 207 present in one or both of the ancient genomes, where 133 are unique to the ancient lineage 208 (Supplementary methods 10; Supplementary Table 9). Of these, 130 unique SNPs are shared 209 between Tepos_14 and Tepos_35, supporting their close relationship and shared ancestry. 210 9 The ydiD gene involved in the breakdown of fatty acids36 and the tsr gene related to the 211 chemotaxic response system37 were found to contain multiple non-synonymous SNPs 212 (nsSNPs) unique to the ancient genomes (Supplementary methods 10). Seven homoplastic 213 and four tri-allelic variant positions were detected in the ancient genomes (Supplementary 214 methods 10; Supplementary Tables 10a, 10b). 215 216 A region of the pil operon consisting of five genes, pilS, pilU, pilT, pilV and rci, was found in 217 our ancient genomes and was absent in the S. Paratyphi C RKS4594 genome38 218 (Supplementary methods 12; Supplementary Table 12). This region is located in Salmonella 219 Pathogenicity Island 7 (SPI-7), and encodes a type IVB pili39,40. The version of pilV in our 220 ancient genomes is thought to facilitate bacterial self-aggregation, a phenomenon that 221 potentially aids in invasion of host tissues39,40 (for details see Supplementary methods 12). A 222 presence/absence analysis of virulence factors was also conducted (Supplementary methods 223 13; Supplementary Table 13; Supplementary Figure 9). 224 225 The S. Paratyphi C RKS4594 strain harbours a virulence plasmid, pSPCV, which was 226 included in our capture design. It is present at 10- to 224-fold average coverage for the five 227 genomes (Supplementary methods 14; Supplementary Tables 14, 15). 228 229 Non-UDG capture reads mapped to the S. Paratyphi C genome (NC_012125.1) for Tepos_11, 230 Tepos_34, Tepos_36, Tepos_38 and Tepos_41, i.e. those that did not yield full genomes, had 231 damage patterns characteristic of ancient DNA (Supplementary Figure 3). To further verify 232 these reads as true ancient S. Paratyphi C reads we investigated 45 SNPs unique to Tepos_14 233 and Tepos_35 that are included in our phylogenetic analysis (Supplementary methods 11; 234 Supplementary Table 11). Of the 45 positions, between 6 and 29 had a minimum of 1-fold 235 10 coverage in these low coverage libraries. All of these were in agreement with the unique 236 SNPs present in the high-coverage ancient genomes, thus confirming their shared ancestry. 237 238 Discussion 239 Interpretations of ethnohistorical documents have suggested some form of typhus or enteric 240 (typhoid/paratyphoid) fever (from the Spanish “tabardillo”, “tabardete”, and “tifus mortal”), 241 viral haemorrhagic fever, measles, or pneumonic plague as potential causes of the cocoliztli 242 epidemic of 1545 CE (for ref. see Supplementary discussion 1). These diseases present 243 symptoms similar to those that were recorded in the cocoliztli outbreak such as red spots on 244 the skin, bleeding from various body orifices, and vomiting (Supplementary discussion 1; 245 Supplementary Figure 10). Given the non-specific nature of these symptoms, additional 246 sources of data are needed to clarify which disease(s) was/were circulating. Previous 247 investigation of sequencing data generated from the Teposcolula-Yucundaa material did not 248 identify DNA traces of ancient pathogens; however, S. enterica was not considered as a 249 candidate41. Here we have isolated genome-wide data of ancient S. Paratyphi C from ten 250 Mixtec individuals buried in the Grand Plaza epidemic cemetery at Teposcolula-Yucundaa, 251 indicating that enteric fever was circulating in the indigenous population during the cocoliztli 252 epidemic of 1545-50 CE. As demonstrated here, MALT offers a sensitive approach for 253 screening non-enriched sequence data in search for unknown candidate bacterial pathogens 254 involved in past disease outbreaks, even to the exclusion of a dominant environmental 255 microbial background. Most importantly, it offers the advantage of extensive genome-level 256 screening without the need to specify a target organism, thus avoiding ascertainment biases 257 common to other screening approaches. Fast metagenomic profiling tools that are based on k-258 mer matching such as KRAKEN42 or specific diagnostic marker regions such as 259 MetaPhlan243 have limitations in ancient DNA applications. Complete alignments are needed 260 11 to authenticate candidate taxonomic assignments, and a small number of marker regions 261 might not provide sufficient resolution for identification, as target DNA is often present in 262 low amounts. Our focus on only bacterial and DNA viral taxa limits our resolution in 263 identifying other infectious agents that may have been present in the population during the 264 Teposcolula-Yucundaa epidemic. 265 266 Although our discussions here have focused on a single pathogenic organism, the potential of 267 its having acted synergistically with other pathogen(s) circulating during the epidemic must 268 be considered. The concept of syndemics and the complex biosocial factors that influence 269 infectious disease transmission and severity are well-documented in both modern and 270 historical contexts44,45. We are currently limited to the detection of bacterial pathogens and 271 DNA viruses included in the NCBI genomic database, though the resolution offered by 272 MALT analyses will increase as this database grows. We have not investigated the presence 273 of RNA viruses, since methods for RNA retrieval from archaeological tissues are 274 underdeveloped and not supported by our current protocols46. 275 276 We confidently exclude an environmental organism as the source for our ancient genomes on 277 the basis that 1) S. Paratyphi C is restricted to humans, 2) it is not known to freely inhabit soil 278 (our soil sample was negative for Salmonella during screening and after capture), 3) the 279 deamination patterns observed for the ancient human and S. Paratyphi C reads are 280 characteristic of authentic ancient DNA, and 4) the ancient S. Paratyphi C genomes display 281 expected branch-shortening in all constructed phylogenies. Moreover, we recovered all 282 ancient genomic data from the pulp-chambers of teeth collected in situ, increasing the 283 likelihood of our having identified a bacterium that was present in the victim’s blood at the 284 time of death. S. enterica introduction via post-burial disturbance is unlikely because the 285 12 graves in the Grand Plaza were dug directly into the thickly paved floor at the site and 286 historical records indicate that Teposcolula-Yucundaa was abandoned shortly after the 287 epidemic ended in 1552 CE29,30. 288 289 S. Paratyphi C is one of over 2600 identified S. enterica serovars distinguished by their 290 antigenic formula47. Only four serovars (S. Typhi and S. Paratyphi A, B, C), all of which 291 cause enteric fever, are restricted to the human host47. Today S. Typhi and S. Paratyphi A 292 cause the majority of reported cases48. S. Paratyphi C is rarely reported38,48. Infected 293 individuals shed bacteria long after the termination of symptoms47, and in the case of S. 294 Typhi infection, 1-6% of individuals become asymptomatic carriers49. Following the 295 hypothesis that this disease was introduced via European contact, it is conceivable that 296 asymptomatic European carriers who withstood the cross-Atlantic voyage could have 297 introduced S. Paratyphi C to Mesoamerican populations in the 16th century. First hand 298 descriptions of the 1545 cocoliztli epidemic suggest that both European and Mixtec 299 individuals were susceptible to the disease7,50, with one estimate of a 60-90% population 300 decline in New Spain during this period7. 301 302 The additional SPI-7 genes detected through indel analysis are reported to vary in 303 presence/absence amongst modern S. Paratyphi C strains38,40, and are suspected to cause 304 increased virulence when the inverted repeats in pilV allow the Rci recombinase to shuffle 305 between its two protein states (Supplementary methods 12). This may support an increased 306 capacity for our ancient strains to cause an epidemic outbreak. However, the overall 307 mechanisms through which S. Paratyphi causes enteric fever remain unclear. The nsSNPs in 308 the ydiD and tsr genes may signify adaptive processes, and comparison with a greater number 309 of S. Paratyphi C genomes may clarify this51. 310 13 311 Today, S. Paratyphi C is rare in Europe and the Americas, with more cases identified across 312 Africa and Asia52,53. Based on MLST data from modern S. Paratyphi C strains, no clear 313 phylogeographic pattern has been observed52. However, the presence of a 1200 CE S. 314 Paratyphi C genome in Norway indicates its presence in Europe in the pre-contact era51, 315 which would be necessary for it to be considered an Old World disease. However, based on 316 the small number of pre-contact individuals that we have screened, we cannot exclude the 317 presence of S. Paratyphi C at Teposcolula-Yucundaa prior to European arrival. A local origin 318 for the cocoliztli disease has been proposed elsewhere54. Historical accounts offer little 319 perspective on its origin since neither the indigenous population nor the European colonizers 320 had a pre-existing name for the disease7,8,30. Spanish colonial documents refer to it as 321 pujamiento de sangre (‘full bloodiness’), while the indigenous Aztec population of Central 322 Mexico called it cocoliztli, a generic term meaning ‘pestilence’ in Nahuatl7,8 (see 323 Supplementary discussion 1). 324 325 Little is known about the past severity and worldwide incidence of enteric fever, first 326 determined to be distinct from typhus in the mid-nineteenth century55. Enteric fevers are 327 regarded as major health threats across the world48, causing an estimated ~27 million 328 illnesses in 2000, the majority of which were attributed to S. Typhi56. Due to the rarity of S. 329 Paratyphi C diagnoses, mortality rates are not established for this particular serovar. Today, 330 outbreaks predominantly occur in developing countries. S. Typhi/Paratyphi are commonly 331 transmitted through the faecal-oral route via ingestion of contaminated food or water57. 332 Changes imposed under Spanish rule such as forced relocations under the policy of 333 congregación, altered living arrangements, and new subsistence farming practices29,30 334 14 compounded by drought conditions32 could have disrupted existing hygiene measures, 335 facilitating S. Paratyphi C transmission. 336 337 Our study represents a first step towards a molecular understanding of disease exchange in 338 contact era Mexico. The 1545 cocoliztli epidemic is regarded as one of the most devastating 339 epidemics in New World history7,32. Our findings contribute to the debate concerning the 340 causative agent of this epidemic at Teposcolula-Yucundaa, where we propose that S. 341 Paratyphi C be considered. We introduced MALT, a novel fast alignment and taxonomic 342 assignment method. Its application to the identification of ancient Salmonella enterica DNA 343 within a complex background of environmental microbial contaminants speaks to the 344 suitability of this approach, and its resolution will improve as the number of available 345 reference genomes increases. This method may be eminently useful for studies wishing to 346 identify pathogenic agents involved in ancient and modern disease, particularly in cases 347 where candidate organisms are not known a priori. 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 15 Methods 365 366 The MALT algorithm 367 MALT is based on the seed-and-extend paradigm and consists of two programs, malt-build 368 and malt-run. 369 First, malt-build is used to construct an index for the given database of reference sequences. 370 To do so, malt-build determines all occurrences of spaced seeds58-60 in the reference 371 sequences and places them into a hash table61. 372 Following this, malt-run is used to align a set of query sequences against the reference 373 database. To this end, the program generates a list of spaced seeds for each query and then 374 looks them up in the reference hash table, which is kept in main memory. Using the x-drop 375 extension heuristic25, a high-scoring ungapped alignment anchored at the seed is computed 376 and is used to decide whether or not a full alignment should be constructed. Local or semi-377 global alignments are computed using a banded implementation62 of the Smith-Waterman63 378 or Needleman-Wunsch64 algorithms, respectively. The program then computes the bit-score 379 and expected value (E-value) of the alignment and decides whether to keep or discard the 380 alignment depending on user-specified thresholds for bit-score, E-value or percent identity. 381 The application of malt-run is illustrated in Supplementary Figure 1. 382 383 The MALT screening pipeline 384 In order to use MALT in ancient DNA contexts to screen for bacterial DNA and to assess the 385 taxonomic composition of ancient bacterial communities we applied the following workflow 386 (Supplementary Figure 2). First we used malt-build to construct a MALT index on all 387 complete bacterial genomes in GenBank65. This was done only once, and is rebuilt only when 388 the target database requires updating. We align reads to the reference database using malt-run 389 16 in semi-global mode. MALT generates output in RMA format and in SAM format. The 390 former can be used for interactive analysis of taxonomic composition in MEGAN28 and the 391 latter for alignment-based assessment of damage patterns and other authenticity criteria. 392 393 Sample provenience 394 The site of Teposcolula-Yucundaa is situated on a mountain ridge in the Mixteca Alta region 395 of Oaxaca, Mexico. Prior archaeological excavation at this site revealed a large epidemic 396 cemetery located in the Grand Plaza – the town’s administrative centre, and an additional 397 cemetery in the churchyard (Fig. 1; Supplementary methods 1). 24 teeth were collected from 398 individuals buried in the Grand Plaza cemetery and 5 were collected from the individuals 399 buried in the churchyard cemetery (Supplementary methods 2; Supplementary Table 1). Soil 400 samples were also collected from both cemetery sites (Supplementary methods 2). 401 402 DNA extraction and library preparation 403 DNA extracts and double-stranded indexed libraries compatible with Illumina sequencing 404 were generated using methods tailor-made for ancient DNA66-68. This work was carried out in 405 dedicated ancient DNA cleanroom facilities at the University of Tübingen and Harvard 406 University (Supplementary methods 2). 407 408 Screening with MALT 409 Amplified libraries were shotgun sequenced. The reads were adapter clipped and merged 410 before being analysed with MALT and the results visualized in MEGAN628 (Supplementary 411 methods 2, 3). Two MALT runs were executed. The first using all complete bacterial 412 genomes available through NCBI RefSeq (December, 2016), and the second using the full 413 NCBI Nucleotide (nt) database (ftp://ftp-trace.ncbi.nih.gov/blast/db/FASTA/) as reference to 414 17 screen for viral DNA (Fig. 2; Supplementary methods 3; Supplementary Tables 2, 3). Both 415 runs used ‘semi-global’ alignment and a minimum percent identity of 95% (Supplementary 416 methods 3). The shotgun data was also mapped to the S. Paratyphi C RKS4594 reference 417 (NC_012125.1) and the human genome (hg19) and damage plots were generated 418 (Supplementary methods 3, 4; Supplementary Table 4, 5; Supplementary Figure 3). 419 420 Runtime comparison 421 The programs MALT (version 0.3.8) and BLAST25 (version 2.6.0+) were applied to the 422 shotgun screening data of Tepos_35 consisting of altogether 952,511 reads. For both 423 programs the DNA alignment mode (blastn) was chosen. The maximal E-value was set to 424 1.0. The maximal number of alignments for each query was set to 100. The minimal per cent 425 identity was set to 95. The number of threads was set to 16. The alignment type of MALT 426 was set to ‘Local’ in order to be comparable to BLAST. The total amount of RAM required 427 by MALT during this run was 252.7 GB. 428 For MALT the runtime was measured excluding the initial loading of our reference database, 429 which happens only once when screening multiple samples. The loading of the database takes 430 27.27 minutes. Including taxonomic binning the application of MALT to our complete 431 shotgun screening data took 123.36 minutes. As a comparison, processing only the screening 432 data of a single sample (Tepos_35) with BLASTn took 1420.58 minutes without any 433 taxonomic analysis. Processing of this sample alone with MALT, including taxonomic 434 binning, took 6.48 minutes, which constitutes a 200-fold improvement in terms of 435 computation time. 436 The computations were performed on a Dell PowerEdge R820 with four Intel Xeon E5-4620 437 2.2 GHz CPUs und 768 GB RAM. 438 439 18 Probe design and whole-genome capture 440 Array probes were designed based on 67 publicly available S. enterica 441 chromosomes/assemblies and 45 associated plasmid sequences (Supplementary methods 5; 442 Supplementary Table 6). Extracts from samples deemed to be positive for S. enterica 443 Paratyphi C were converted into additional rich UDG-treated libraries69 for whole-genome 444 capture (Supplementary methods 6, 7). Pre-contact and post-contact samples were serially 445 captured using our custom probe design, according to two established methods35,70. The 446 eluate from both array and in-solution capture was sequenced to a sufficient depth to allow 447 high coverage genome reconstruction (Supplementary methods 6, 7). 448 449 Sequence data processing, initial phylogenetic assessment and authenticity 450 The sequence data was adapter clipped and quality filtered before being mapped to the S. 451 Paratyphi C reference (NC_012125.1) (Supplementary methods 8; Supplementary Table 7). 452 Deamination patterns for the DNA were generated to assess the authenticity of the ancient S. 453 Paratyphi C DNA using mapDamage23 (Supplementary methods 8; Supplementary Table 7; 454 Supplementary Figure 3). Artificial read data was generated for a dataset of 23 genomes 455 selected for comparative phylogenetic analysis; this data was also mapped to the S. Paratyphi 456 C reference (Supplementary methods 8). SNP calling was carried out with GATK using a 457 quality score of ≥30 for the five S. Paratyphi C genomes and the artificial read dataset. A 458 neighbor-joining tree was constructed using MEGA671, based on a homozygous SNPs called 459 at a minimum of 3-fold coverage where at least 90% of reads are in agreement 460 (Supplementary methods 8; Supplementary Figure 4). In order to exclude a reference bias in 461 the ascertainment of the phylogenetic positioning of the five ancient genomes (Tepos_10, 462 Tepos_14 Tepos_20, Tepos_35 and Tepos_37), mapping, SNP calling and tree construction 463 19 was repeated for the S. Typhi CT18 reference (NC_003198.1) (Supplementary methods 8; 464 Supplementary Table 8; Supplementary Figure 5). 465 466 SNP typing and phylogenetic analysis 467 Homozygous SNPs were called from the complete dataset (5 ancient and 23 modern) based 468 on our criteria using a tool called MultiVCFAnalyzer (Supplementary methods 9). Repetitive 469 and highly conserved regions of the S. Paratyphi C genome (NC_012125.1) were excluded 470 from SNP calling to avoid spurious mapping reads. Maximum Parsimony71 and a Maximum 471 Likelihood72 trees were made including the five genomes (Fig. 3; Supplementary Figure 6). 472 Heterozygous positions were also called and their allele frequency distributions plotted using 473 R73 (Supplementary methods 9; Supplementary Figure 7). SNP calling and phylogenetic tree 474 construction was repeated excluding the Tepos_10, Tepos_20 and Tepos_37 genomes 475 (Supplementary methods 10; Fig. 3; Supplementary Figure 8). 476 477 The five weak-positive samples: Tepos_11, Tepos_34, Tepos_36, Tepos_38 and Tepos_41, 478 that did not yield enough data for genome reconstruction, were investigated for 46 SNPs 479 unique to the ancient genomes to verify that the captured reads for these samples are true 480 ancient S. Paratyphi C reads (Supplementary methods 11; Supplementary Table 11). 481 482 SNP effect and Indel analyses 483 SNP effect analysis was carried out for the two ancient genomes (Tepos_14 and Tepos_35) 484 alongside the modern dataset (see Supplementary methods 10). SNPs unique to the ancient 485 genomes, pseudogenes and homoplastic positions were investigated (Supplementary methods 486 10; Supplementary Tables 9, 10). Insertions and deletions (Indels), 700bp or larger, in the 487 two ancient genomes were identified through two approaches. Deletions were visually 488 20 detected by mapping the ancient data to the S. Paratyphi C reference using a mapping quality 489 threshold (-q) of 0 and manually viewing the genome alignment in the IGV browser 490 (Supplementary methods 12). In order to detect insertions, or regions present in the ancient 491 genomes that are missing the modern reference, the ancient data was mapped to concatenated 492 reference pairs. Where one reference was in all cases the S. Paratyphi C RKS4594 reference 493 (NC_012125.1) and the other was one of four S. enterica genomes of interest. A mapping 494 quality threshold (-q) of 37 was used, thus allowing only regions unique to one or the other 495 genome in the pair to map (Supplementary methods 12; Supplementary Table 12). 496 497 Virulence factor analysis 498 43 effector genes identified within Salmonella enterica subsp. enterica74 were investigated 499 using the BEDTools suite75. The percentage of each gene that was covered at least 1-fold in 500 the ancient and modern genomes in our dataset was plotted using the ggplot2package76 in R73 501 (Supplementary methods 13; Supplementary Figure 9). 502 503 Plasmid analysis 504 The ancient data was mapped to the S. Paratyphi C virulence plasmid, pSPCV. SNP effect 505 analysis was carried out in comparison to three other similar plasmid references 506 (Supplementary methods 14; Supplementary Tables 14, 15). 507 508 Data Availability 509 Sequence data that support the findings of this study have been submitted to the European 510 Nucleotide Archive under accession number PRJEB23438. MALT is open source and freely 511 available from: http://ab.inf.uni-tuebingen.de/software/malt. The program MultiVCFAnalyzer 512 is available on GitHub: https://github.com/alexherbig/MultiVCFAnalyzer. Source data for 513 21 figures are available upon request. 514 22 References: 515 516 1 Ubelaker, D. H. Prehistoric New World population size: Historical review and current appraisal of 517 North American estimates. Am J Phys Anthropol 45, 661-665, doi:10.1002/ajpa.1330450332 (1976). 518 2 Crosby, A. W. Virgin soil epidemics as a factor in the aboriginal depopulation in America. William 519 Mary Q 33, 289-299 (1976). 520 3 Dobyns, H. F. Disease Transfer at Contact. Annu Rev Anthropol 22, 273-291 (1993). 521 4 Acuna-Soto, R., Stahle, D. W., Therrell, M. D., Griffin, R. D. & Cleaveland, M. K. When half of the 522 population died: the epidemic of hemorrhagic fevers of 1576 in Mexico. FEMS Microbiol Lett 240, 1-523 5, doi:10.1016/j.femsle.2004.09.011 (2004). 524 5 Llamas, B. et al. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the 525 Americas. Sci Adv 2, e1501385, doi:10.1126/sciadv.1501385 (2016). 526 6 Lindo, J. et al. A time transect of exomes from a Native American population before and after 527 European contact. Nat Commun 7, 13175, doi:10.1038/ncomms13175 (2016). 528 7 Cook, N. D. & Lovell, W. G. Secret Judgments of God: Old World Disease in Colonial Spanish 529 America. (University of Oklahoma Press, 2001). 530 8 Fields, S. L. Pestilence and Headcolds: Encountering Illness in Colonial Mexico. (Columbia 531 University Press, 2008). 532 9 Ortner, D. J. Identification of pathological conditions in human skeletal remains. 2 edn, (Academic 533 Press, 2003). 534 10 Walker, R. S., Sattenspiel, L. & Hill, K. R. Mortality from contact-related epidemics among indigenous 535 populations in Greater Amazonia. Sci Rep 5, 14032, doi:10.1038/srep14032 (2015). 536 11 Joralemon, D. New World Depopulation and the Case of Disease. J Anthropol Res 38, 108-127 (1982). 537 12 Larsen, C. S. In the wake of Columbus: Native population biology in the postcontact Americas. Am J 538 Phys Anthropol 37, 109-154, doi:10.1002/ajpa.1330370606 (1994). 539 13 Bos, K. I. et al. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human 540 tuberculosis. Nature 514, 494-497, doi:10.1038/nature13591 (2014). 541 14 Schuenemann, V. J. et al. Genome-wide comparison of medieval and modern Mycobacterium leprae. 542 Science 341, 179-183, doi:10.1126/science.1238286 (2013). 543 15 Warinner, C. et al. Pathogens and host immunity in the ancient human oral cavity. Nat Genet 46, 336-544 344, doi:10.1038/ng.2906 (2014). 545 16 Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506-546 510, doi:10.1038/nature10549 (2011). 547 17 Maixner, F. et al. The 5300-year-old Helicobacter pylori genome of the Iceman. Science 351, 162-165, 548 doi:10.1126/science.aad2545 (2016). 549 18 Devault, A. M. et al. Ancient pathogen DNA in archaeological samples detected with a Microbial 550 Detection Array. Sci Rep 4, 4245, doi:10.1038/srep04245 (2014). 551 19 Bos, K. I. et al. Parallel detection of ancient pathogens via array-based DNA capture. Philosophical 552 transactions of the Royal Society of London. Series B, Biological sciences 370, 20130375, 553 doi:10.1098/rstb.2013.0375 (2015). 554 20 Devault, A. M. et al. A molecular portrait of maternal sepsis from Byzantine Troy. Elife 6, 555 doi:10.7554/eLife.20983 (2017). 556 21 Warinner, C. et al. A Robust Framework for Microbial Archaeology. Annu Rev Genomics Hum Genet, 557 doi:10.1146/annurev-genom-091416-035526 (2017). 558 22 Key, F. M., Posth, C., Krause, J., Herbig, A. & Bos, K. I. Mining Metagenomic Data Sets for Ancient 559 DNA: Recommended Protocols for Authentication. Trends Genet 33, 508-520, 560 doi:10.1016/j.tig.2017.05.005 (2017). 561 23 Jonsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. & Orlando, L. mapDamage2.0: fast 562 approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682-1684, 563 doi:10.1093/bioinformatics/btt193 (2013). 564 24 Prufer, K. et al. Computational challenges in the analysis of ancient DNA. Genome Biol 11, R47, 565 doi:10.1186/gb-2010-11-5-r47 (2010). 566 25 Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. 567 J Mol Biol 215, 403-410, doi:10.1016/S0022-2836(05)80360-2 (1990). 568 26 Peabody, M. A., Van Rossum, T., Lo, R. & Brinkman, F. S. Evaluation of shotgun metagenomics 569 sequence classification methods using in silico and in vitro simulated communities. BMC 570 Bioinformatics 16, 363, doi:10.1186/s12859-015-0788-5 (2015). 571 27 Lindgreen, S., Adair, K. L. & Gardner, P. P. An evaluation of the accuracy and speed of metagenome 572 analysis tools. Sci Rep 6, 19233, doi:10.1038/srep19233 (2016). 573 23 28 Huson, D. H. et al. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale 574 Microbiome Sequencing Data. PLoS Comput Biol 12, e1004957, doi:10.1371/journal.pcbi.1004957 575 (2016). 576 29 Spores, R. & Robles García, N. A prehispanic (postclassic) capital center in colonial transition: 577 excavations at Yucundaa Pueblo Viejo de Teposcolula, Oaxaca, Mexico. Lat Am Antiq 18, 33-353 578 (2007). 579 30 Warinner, C., Robles García, N., Spores, R. & Tuross, N. Disease, Demography, and Diet in Early 580 Colonial New Spain: Investigation of a Sixteenth-Century Mixtec Cemetery at Teposcolula Yucundaa. 581 Lat Am Antiq 23, 467-489 (2012). 582 31 Tuross, N., Warinner, C. & Robles García, N. in Yucundaa: La cuidad mixteca Yucundaa-Pueblo Viejo 583 de Teposcolula y su transformación prehispánica-colonial Vol. vol. 2 (eds Ronald Spores & Robles 584 García Nelly) 541-546 (Instituto Nacional de Antropología e Historia, 2014). 585 32 Acuna-Soto, R., Stahle, D. W., Cleaveland, M. K. & Therrell, M. D. Megadrought and megadeath in 586 16th century Mexico. Emerg Infect Dis 8, 360-362 (2002). 587 33 Pickard, D. et al. Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing 588 bacteriophage E1. Journal of bacteriology 190, 2580-2587, doi:10.1128/JB.01654-07 (2008). 589 34 Burbano, H. A. et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence 590 Capture. Science 328, 723-725, doi:10.1126/science.1188046 (2010). 591 35 Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proceedings of the 592 National Academy of Sciences of the United States of America 110, 2223-2227, 593 doi:10.1073/pnas.1221359110 (2013). 594 36 Campbell, J. W., Morgan-Kiss, R. M. & Cronan, J. E., Jr. A new Escherichia coli metabolic 595 competency: growth on fatty acids by a novel anaerobic beta-oxidation pathway. Molecular 596 microbiology 47, 793-805 (2003). 597 37 Rivera-Chavez, F. et al. Salmonella uses energy taxis to benefit from intestinal inflammation. PLoS 598 Pathog 9, e1003267, doi:10.1371/journal.ppat.1003267 (2013). 599 38 Liu, W. Q. et al. Salmonella paratyphi C: genetic divergence from Salmonella choleraesuis and 600 pathogenic convergence with Salmonella typhi. PLoS One 4, e4510, doi:10.1371/journal.pone.0004510 601 (2009). 602 39 Tam, C. K., Morris, C. & Hackett, J. The Salmonella enterica serovar Typhi type IVB self-association 603 pili are detached from the bacterial cell by the PilV minor pilus proteins. Infect Immun 74, 5414-5418, 604 doi:10.1128/IAI.00172-06 (2006). 605 40 Tam, C. K., Hackett, J. & Morris, C. Salmonella enterica serovar Paratyphi C carries an inactive 606 shufflon. Infect Immun 72, 22-28 (2004). 607 41 Campana, M. G., Robles Garcia, N., Ruhli, F. J. & Tuross, N. False positives complicate ancient 608 pathogen identifications using high-throughput shotgun sequencing. BMC Res Notes 7, 111, 609 doi:10.1186/1756-0500-7-111 (2014). 610 42 Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact 611 alignments. Genome Biol 15, R46, doi:10.1186/gb-2014-15-3-r46 (2014). 612 43 Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker 613 genes. Nat Methods 9, 811-814, doi:10.1038/nmeth.2066 (2012). 614 44 Singer, M. & Clair, S. Syndemics and public health: reconceptualizing disease in bio-social context. 615 Med Anthropol Q 17, 423-441 (2003). 616 45 Herring, D. A. & Sattenspiel, L. Social Contexts, Syndemics, and Infectious Disease in Northern 617 Aboriginal Populations. American Journal of Human Biology 19, 190-202, doi:10.1002/ajhb (2007). 618 46 Guy, P. L. Prospects for analyzing ancient RNA in preserved materials. Wiley Interdiscip Rev RNA 5, 619 87-94, doi:10.1002/wrna.1199 (2014). 620 47 Gal-Mor, O., Boyle, E. C. & Grassl, G. A. Same species, different diseases: how and why typhoidal 621 and non-typhoidal Salmonella enterica serovars differ. Front Microbiol 5, 391, 622 doi:10.3389/fmicb.2014.00391 (2014). 623 48 Wain, J., Hendriksen, R. S., Mikoleit, M. L., Keddy, K. H. & Ochiai, R. L. Typhoid fever. Lancet 385, 624 1136-1145, doi:10.1016/S0140-6736(13)62708-7 (2015). 625 49 Monack, D. M., Mueller, A. & Falkow, S. Persistent bacterial infections: the interface of the pathogen 626 and the host immune system. Nat Rev Micro 2, 747-765 (2004). 627 50 Sahagún, B. d. Florentine Codex: general history of the things of New Spain. (1950-1982). 628 51 Zhou, Z. et al. Millennia of genomic stability within the invasive Para C Lineage of Salmonella 629 enterica. bioRxiv, doi:10.1101/105759 (2017). 630 52 Achtman, M. et al. Multilocus sequence typing as a replacement for serotyping in Salmonella enterica. 631 PLoS Pathog 8, e1002776, doi:10.1371/journal.ppat.1002776 (2012). 632 24 53 Centers for Disease Control and Prevention (CDC). National Typhoid and Paratyphoid Fever 633 Surveillance Annual Summary, 2014. (Atlanta, Georgia, 2016). 634 54 Acuna-Soto, R., Romero, L. C. & Maguire, J. H. Large epidemics of hemorrhagic fevers in Mexico 635 1545-1815. Am J Trop Med Hyg 62, 733-739 (2000). 636 55 Smith, D. C. Gerhard's distinction between typhoid and typhus and its reception in America, 1833-637 1860. Bull Hist Med 54, 368-385 (1980). 638 56 Crump, J. A., Luby, S. P. & Mintz, E. D. The global burden of typhoid fever. Bull World Health Organ 639 82, 346-353 (2004). 640 57 World Health Organization. Typhoid fever – Uganda, (2015). 642 58 Burkhardt, S. & Kärkkäinen, J. in Combinatorial Pattern Matching: 12th Annual Symposium, CPM 643 2001 Jerusalem, Israel, July 1–4, 2001 Proceedings (ed Amihood Amir) 73-85 (Springer Berlin 644 Heidelberg, 2001). 645 59 Ma, B., Tromp, J. & Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics 646 18, 440-445 (2002). 647 60 Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat 648 Methods 12, 59-60, doi:10.1038/nmeth.3176 (2015). 649 61 Ning, Z., Cox, A. J. & Mullikin, J. C. SSAHA: a fast search method for large DNA databases. Genome 650 Res 11, 1725-1729, doi:10.1101/gr.194201 (2001). 651 62 Chao, K. M., Pearson, W. R. & Miller, W. Aligning two sequences within a specified diagonal band. 652 Comput Appl Biosci 8, 481-487 (1992). 653 63 Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J Mol Biol 147, 654 195-197 (1981). 655 64 Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the 656 amino acid sequence of two proteins. J Mol Biol 48, 443-453 (1970). 657 65 Benson, D. A. et al. GenBank. Nucleic Acids Res 41, D36-42, doi:10.1093/nar/gks1195 (2013). 658 66 Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear 659 reconstructed from ultrashort DNA fragments. Proceedings of the National Academy of Sciences of the 660 United States of America 110, 15758-15763, doi:10.1073/pnas.1314445110 (2013). 661 67 Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture 662 and sequencing. Cold Spring Harb Protoc 2010, pdb prot5448, doi:10.1101/pdb.prot5448 (2010). 663 68 Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex 664 sequencing on the Illumina platform. Nucleic Acids Res 40, e3, doi:10.1093/nar/gkr771 (2012). 665 69 Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient 666 DNA. Nucleic Acids Res 38, e87, doi:10.1093/nar/gkp1163 (2010). 667 70 Hodges, E. et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for 668 massively parallel sequencing. Nat Protoc 4, 960-974, doi:10.1038/nprot.2009.68 (2009). 669 71 Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary 670 Genetics Analysis version 6.0. Molecular biology and evolution 30, 2725-2729, 671 doi:10.1093/molbev/mst197 (2013). 672 72 Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large 673 phylogenies. Bioinformatics 30, 1312-1313, doi:10.1093/bioinformatics/btu033 (2014). 674 73 R Development Core Team. (The R Foundation for Statistical Computing, Vienna, Austria, 2011). 675 74 Connor, T. R. et al. What's in a Name? Species-Wide Whole-Genome Sequencing Resolves Invasive 676 and Noninvasive Lineages of Salmonella enterica Serotype Paratyphi B. MBio 7, 677 doi:10.1128/mBio.00527-16 (2016). 678 75 Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. 679 Bioinformatics 26, 841-842, doi:10.1093/bioinformatics/btq033 (2010). 680 76 Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2009). 681 682 683 684 685 686 687 688 689 690 691 25 Acknowledgements: This work was supported by the Max Planck Society (J.K.), the 692 European Research Council (ERC) starting grant APGREID (to J.K.), Social Sciences and 693 Humanities Research Council of Canada postdoctoral fellowship grant 756-2011-501 (to 694 K.I.B.), and the Mäxi Foundation (M.G.C.). We thank the Archaeology Council at Mexico’s 695 National Institute of Anthropology and History (INAH) and the Teposcolula-Yucundaa 696 Archaeological Project for sampling permissions. We are grateful to Antje Wissgott, Guido 697 Brandt and Verena Schuenemann for assistance with laboratory work, Annette Günzel for 698 providing graphic support, and Rodrigo Barquera, Jim Hackett and Menaka Pi for thoughts 699 and discussion on the manuscript. Part of the data storage and analysis was performed on the 700 computational resource bwGRiD Cluster Tübingen funded by the Ministry of Science, 701 Research and the Arts Baden-Württemberg, and the Universities of the State of Baden-702 Württemberg, Germany, within the framework program bwHPC. We thank the MALT user 703 community for helpful comments and bug reports. 704 705 Author Contributions: K.I.B., M.G.C., A.H., N.T. and J.K. conceived the investigation. 706 K.I.B., A.H. Å.J.V., M.G.C. and J.K. designed experiments. N.M.R.G. provided 707 archaeological information and drawings, submitted INAH permits and assisted in the 708 sampling processes. Å.J.V., M.G.C., S.S., M.A.S. and K.I.B. performed laboratory work. 709 Å.J.V., A.H., K.I.B., C.W. and A.A.V. performed analyses. D.H. implemented the MALT 710 algorithm. A.H., J.K. and D.H. designed and set up the MALT ancient DNA analysis 711 pipeline. C.W. performed ethnohistorical analyses. Å.J.V. and K.I.B. wrote the manuscript 712 with contributions from all authors. 713 714 Competing financial interests: The authors declare no competing financial interests. 715 26 Figure 1 | Overview of Teposcolula-Yucundaa. A) Illustrates the location of the 716 Teposcolula-Yucundaa site in the Mixteca Alta region of Oaxaca, Mexico; B) central 717 administrative area of Teposcolula-Yucundaa showing the relative positioning of the Grand 718 Plaza and churchyard cemetery sites. Burials within each cemetery are indicated with dark 719 grey outlines and the excavation area is shaded in grey; C) drawing of individual 35 from 720 which the Tepos_35 S. Paratyphi C genome was isolated. Panels B and C were adapted from 721 drawings provided by the Teposcolula-Yucundaa archaeological project archives-INAH and 722 ref. 30. The figure was kindly provided by Annette Günzel. 723 724 Figure 2 | MALT analysis and pathogen screening of shotgun data. Shotgun data was 725 analysed with MALT using a database constructed from all bacterial genomes available 726 through NCBI RefSeq (December 2016). MALT results were visualized using MEGAN628. 727 The bar chart was constructed from the MEGAN6 output and is based on the per cent reads 728 assigned to bacterial species when using a 95% identity filter. Reads assigned to Salmonella 729 enterica are coloured red regardless. Other taxa to which 3% or more reads, per sample, were 730 assigned are colour-coded depending on whether they are ‘environmental’ or ‘human oral 731 microbiome’ bacteria. Remaining taxa are sorted into two categories: ‘other environmental’ 732 or ‘other microbiome’ (Supplementary methods 3). Samples from the post-contact Grand 733 Plaza epidemic cemetery containing S. enterica reads, contact era samples from the 734 churchyard cemetery and the soil sample are illustrated. Additionally, a sample negative for 735 S. enterica from the Grand Plaza cemetery (Tepos_27) is shown. Samples whose names are 736 coloured in red are from the Grand Plaza and those in blue from the churchyard. The 737 percentage of reads in the shotgun data assigned by MALT per sample is indicated at the top 738 of each column. Only taxa with 4 or more reads assigned are visualized. 739 740 741 27 Figure 3 | Maximum Likelihood S. enterica phylogeny. Two maximum likelihood trees 742 were produced. Positions with missing data were excluded in both cases. In panel a) the tree 743 includes all five ancient genomes and is based on 3-fold SNP calls. It is based on 51,456 744 variant positions. b) shows a zoomed in view of the S. Paratyphi C genomes. In panel c) the 745 tree includes two high-coverage ancient genomes and is based on 5-fold SNP calls. It is based 746 on 81,474 variant positions. d) shows a zoomed in view of the S. Paratyphi C genomes, 747 illustrating the branch shortening of the two ancient genomes (Tepos_14 and Tepos_35). 748 Both trees were built with RAxML72. Branches coloured red indicate S. Paratyphi C genomes 749 and branches in blue indicate other genomes that are human-specific and cause enteric 750 (typhoid/paratyphoid) fever. 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 28 Table 1 | Overview of mapping statistics of captured sample libraries from the Grand 785 Plaza (contact era) and churchyard (pre-contact) 786 787 Sample ID Cemetery Site Library treatment # processed reads before mapping # Unique mapped reads Endogenous DNA (%) - quality filtered reads Mean Fold Coverage % of Genome Covered at least 3-fold Tepos_10 Grand Plaza non-UDG 16,945,834 399,561 20.56 4.35 52.17 UDG 68,628,270 2,903,258 16.30 32.84 95.49 Tepos_14 Grand Plaza non-UDG 20,559,478 1,222,402 23.51 14.41 95.77 UDG 73,204,225 3,410,610 18.62 36.44 97.67 Tepos_35 Grand Plaza non-UDG 27,248,720 1,803,043 31.37 25.50 97.67 UDG 90,815,050 7,025,774 30.00 96.43 98.06 Tepos_11 Grand Plaza non-UDG 21,941,119 19,576 0.87 0.21 0.93 UDG 48,959,732 103,492 0.75 1.21 14.56 Tepos_20 Grand Plaza non-UDG 771,431 15,236 6.94 0.15 0.26 UDG 20,123,713 427,781 4.75 4.59 67.53 Tepos_34 Grand Plaza non-UDG 18,934,710 123,307 2.55 1.35 14.65 UDG 26,284,766 157,930 2.05 1.74 21.67 Tepos_36 Grand Plaza non-UDG 23,147,904 36,224 0.75 0.40 1.76 UDG 21,910,196 33,327 0.42 0.36 1.4 Tepos_37 Grand Plaza non-UDG 5,223,138 218,874 9.28 2.55 42.12 UDG 9,603,890 416,449 7.71 5.49 74.48 Tepos_38 Grand Plaza non-UDG 8,280,412 18,308 0.91 0.19 0.97 UDG 47,835,731 65,812 0.54 0.67 4.22 Tepos_41 Grand Plaza non-UDG 17,608,445 33,664 0.72 0.37 1.47 UDG 19,966,958 36,208 0.48 0.40 1.34 Tepos_27 Grand Plaza non-UDG 17,931,300 4,778 0.07 0.04 0.27 Tepos_32 Churchyard non-UDG 25,721,427 6,665 0.08 0.06 0.47 Tepos_43 Churchyard non-UDG 31,129,662 3,426 0.05 0.03 0.25 Tepos_45 Churchyard non-UDG 18,027,289 6,879 0.12 0.06 0.34 Tepos_48 Churchyard non-UDG 17,915,341 4,312 0.06 0.04 0.25 Tepos_57 Churchyard non-UDG 24,478,844 5,527 0.07 0.05 0.28 Soil Grand Plaza & churchyard non-UDG 10,875,300 796 0.02 0.01 0.07 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 M IX T E C A A LTA M E X I C O G U E R R E R O C H IA PA S V E R A C R U Z O A X A C A O a x a c a C it y Te p o s c o lu la - Y u c u n d aa 5 0 m i0 5 0 k m0 P a c i f i c P U E B L A TA B A S C O 0 0 ,5 m0 2 0 m Te p o sc o lu la-Y u c u n d aa In d iv id u al 3 5 b u rial 1 8 CHURCHYARD GRAND PLAZA B CA 010 20 30 40 50 60 70 80 90 100 Re ad s (% ) Grand Plaza Post-Contact S. enterica positive S. enterica negative Churchyard Pre-Contact Soil 14Tepos_# 20 34 3835 413736 2.0Assigned reads (%) 10 11 0.70.60.90.90.70.80.90.71.3 45 48 57 soil27 43 0.20.71.90.82.11.2 32 3.2 Steroidobacter denitrificans Clostridium sporogenes Clostridium botulinum Clostridium tetani Streptosporangium roseum Other environmental Environmental bacteria: Microbiome bacteria: Salmonella enterica Other microbiome Pseudopropionibacterium propionicum Staphylococcus aureus Tannerella forsythia Actinomyces sp. oral taxon 414 Alcaligenes faecalis Desulfomicrobium orale Nitrobacter hamburgensis Pseudomonas stutzeri 0.2 Tepos_37 Bareilly CFSAN000189 Typhi Ty21a Bovismorbificans 3114 Tepos_35 Typhi CT18 arizonae serovar 62:z4,z23:- RSK2980 Typhi Ty2 Enteritidis P125109 Tepos_10 Heidelberg SL476 Tepos_20 Weltevreden 2007-60-3289-1 Cubana CFSAN002050 Paratyphi B SPB7 Paratyphi C RKS4594 Tepos_14 Typhi Pstx12 Gallinarum 287/91 Paratyphi A ATCC9150 Agona SL483 Pullorum S06004 Dublin CT_02021853 Typhimurium LT2 Typhimurium 081736 Choleraesuis SC-B67 Paratyphi A AKU12601 Schwarzengrund CVM19633 100 100 99 99 72 100 93 69 100 96 87 80 100 100 100 100 100 100 100 100 88 100 100 99 100 0.0002 Tepos_37 Tepos_35 Tepos_10 Tepos_20 Paratyphi C RKS4594 Tepos_14 72 87 80 100100 a b c d 0.2 Schwarzengrund CVM19633 Typhimurium LT2 Heidelberg SL476 Choleraesuis SC-B67 Typhi Ty21a Tepos_14 Enteritidis P125109 Bovismorbificans 3114 Agona SL483 Paratyphi A ATCC9150 Typhimurium 081736 Bareilly CFSAN000189 Cubana CFSAN002050 Paratyphi C RKS4594 Weltevreden 2007-60-3289-1 Typhi Ty2 Pullorum S06004 Typhi CT18 Paratyphi A AKU12601 Tepos_35 Dublin CT_02021853 Typhi Pstx12 Gallinarum 287/91 Paratyphi B SPB7 arizonae serovar 62:z4,z23:- RSK2980 100 100 100 99 100 100 100 100 100 100 92 100 100 87 91 100 100 87 100 93 100 100 0.0002 Tepos_14 Paratyphi C RKS4594 Tepos_35100100