1
Salmonella enterica genomes from victims of a major 16th century epidemic 1 
in Mexico 2 
 3 
Authors:  Åshild J. Vågene1,2#, Alexander Herbig1,2#*, Michael G. Campana2, 3†, Nelly M. Robles García4, 4 
Christina Warinner1, Susanna Sabin1, Maria A. Spyrou1,2, Aida Andrades Valtueña1, Daniel Huson5, Noreen 5 
Tuross2*, Kirsten I. Bos1,2* and Johannes Krause1,2* 6 
Affiliations: 7 
1Max Planck Institute for the Science of Human History, Jena, Germany.  8 
2Institute for Archaeological Sciences, University of Tübingen, Tübingen, Germany 9 
3Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA. 10 
4Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland. 11 
5INAH, National Institute of Anthropology and History, Mexico, Teposcolula-Yucundaa Archaeological 12 
Project. 13 
6Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany. 14 
 15 
#These authors contributed equally to this work 16 
*Correspondence to: A.H.: herbig@shh.mpg.de; N.T.: tuross@fas.harvard.edu; K.I.B.: bos@shh.mpg.de; J.K.: 17 
krause@shh.mpg.de 18 
†Current address: M.G.C.: Smithsonian Conservation Biology Institute, Center for Conservation Genomics, 19 
3001 Connecticut Avenue NW, Washington, DC 20008, USA. 20 
 21 
Summary paragraph: Indigenous populations of the Americas experienced high mortality 22 
rates during the early contact period as a result of infectious diseases, many of which were 23 
introduced by Europeans. Most of the pathogenic agents that caused these outbreaks remain 24 
unknown. Through the introduction of a new metagenomic analysis tool called MALT 25 
applied here to search for traces of ancient pathogen DNA, we were able to identify 26 
Salmonella enterica in individuals buried in an early contact era epidemic cemetery at 27 
Teposcolula-Yucundaa, Oaxaca in southern Mexico. This cemetery is linked, based on 28 
historical and archaeological evidence, to the 1545-1550 CE epidemic that affected large 29 
parts of Mexico. Locally this epidemic was known as “cocoliztli”, the pathogenic cause of 30 
which has been debated for over a century. Here we present genome-wide data from ten 31 
individuals for Salmonella enterica subsp. enterica serovar Paratyphi C, a bacterial cause of 32 
enteric fever. We propose that S. Paratyphi C be considered a strong candidate for the 33 
epidemic population decline during the 1545 cocoliztli outbreak at Teposcolula-Yucundaa.  34 
  35 
 2
Infectious diseases introduced to the New World following European contact led to 36 
successive outbreaks in many regions of the Americas that continued well into the 19th 37 
century. These often caused high mortality and thus contributed a central, and often 38 
underappreciated, influence on the demographic collapse of many indigenous populations1-4. 39 
Population declines linked to regionally specific epidemics are estimated to have reached as 40 
high as 95%3, and their genetic impact based on recent population-based studies of ancient 41 
and modern human exome and mitochondrial data attest to their scale5,6. One hypothesis 42 
posits that the increased susceptibility of New World populations to Old World diseases 43 
facilitated European conquest, whereby rapidly disseminating diseases severely weakened 44 
indigenous populations2, in some cases even in advance of European presence in the 45 
region2,7. Well-characterized infections such as smallpox, measles, mumps, and influenza are 46 
known causes of later contact era outbreaks; however, the diseases responsible for many 47 
early contact period New World epidemics remain unknown and have been the subject of 48 
scientific debate for over a century2-4,7,8.  49 
 50 
Morphological changes in skeletal remains9 and ethnohistorical accounts10 are often explored 51 
as sources for understanding population health in the past, although both provide only limited 52 
resolution and have generated speculative and at times conflicting hypotheses about the 53 
diseases introduced to New World populations2,3,7,11,12. Most infectious diseases do not leave 54 
characteristic markers on the skeleton due to either their short periods of infectivity, the death 55 
of the victim in the acute phase before skeletal changes formed, or a lack of osteological 56 
involvement9. While historical descriptions of infectious disease symptoms can be detailed, 57 
they are subject to cultural biases, suffer from translational inaccuracies, lack a foundation in 58 
germ theory, and describe historical forms of a condition that may differ from modern 59 
manifestations8,11. Additionally, differential diagnosis based on symptoms alone can be 60 
 3
unreliable even in modern contexts since many infectious diseases have similar clinical 61 
presentations. 62 
 63 
Genome-wide studies of ancient pathogens have proven instrumental in both identifying and 64 
characterizing past human infectious diseases. These studies have largely been restricted to 65 
skeletal collections where individuals display physical changes consistent with particular 66 
infections13-15, an historical context that links a specific pathogen to a known epidemiological 67 
event16, or an organism that was identified via targeted molecular screening without prior 68 
indication of its presence17. Recent attempts to circumvent these limitations have 69 
concentrated on broad-spectrum molecular approaches focused on pathogen detection via 70 
fluorescence-hybridization-based microarray technology18, identification via DNA-71 
enrichment of certain microbial regions19 or computational screening of non-enriched 72 
sequence data against human microbiome data sets20. These approaches offer improvements, 73 
but remain biased in the bacterial taxa used for species-level assignments. As archaeological 74 
material is expected to harbour an abundance of bacteria that stem from the depositional 75 
context, omission of environmental taxa in species assignments can lead to false-positive 76 
identifications. Additional techniques for authenticating ancient DNA have been 77 
developed21,22, including the identification of characteristic damage patterns caused by the 78 
deamination of cytosines23, methods that evaluate evenness of coverage of aligned reads 79 
across a reference genome, or length distributions that consider the degree of fragmentation, 80 
where ancient molecules are expected to be shorter than those from modern contaminants24. 81 
 82 
A typical NGS dataset from an ancient sample comprises millions of DNA sequencing reads, 83 
making taxonomic assignment and screening based on sequence alignments computationally 84 
challenging. The gold standard tool for alignment-based analyses is BLAST25, due to its 85 
 4
sensitivity and statistical model. However, the computational time and power BLAST 86 
requires to analyse a typical metagenomic dataset is often prohibitive.  87 
 88 
Here we introduce MALT (MEGAN ALignment Tool), a program for the fast alignment and 89 
analysis of metagenomic DNA sequencing data. MALT contains the same taxonomic binning 90 
algorithm, i.e. the naïve LCA (Lowest Common Ancestor) algorithm (for reviews see26,27), 91 
implemented in the interactive metagenomics analysis software MEGAN28. Like BLAST, 92 
MALT computes “local” alignments between highly conserved segments of reads and 93 
references. MALT can also calculate “semi-global” alignments where reads are aligned end-94 
to-end. In comparison to protein alignments or local DNA alignments, semi-global DNA 95 
alignments are more suitable for assessing various quality and authenticity criteria that are 96 
commonly applied in the field of paleogenetics.  97 
 98 
We applied our MALT screening pipeline (Supplementary Figures 1, 2) using a database of 99 
all complete bacterial genomes available in NCBI RefSeq to non-enriched DNA sequence 100 
data from the pulp chamber of teeth collected from indigenous individuals excavated at the 101 
site of Teposcolula-Yucundaa, located in the highland Mixteca Alta region of Oaxaca, 102 
Mexico29,30. The site contains both pre-contact and contact era burials, including the earliest 103 
identified contact era epidemic burial ground in Mexico30,31 (Fig. 1; Supplementary methods 104 
1). This is the only known cemetery historically linked to the cocoliztli epidemic of 1545-105 
1550 CE30, described as one of the principal epidemiological events responsible for the 106 
cataclysmic population decline of 16th century Mesoamerica7,32. This outbreak affected large 107 
areas of central Mexico and Guatemala, spreading perhaps as far south as Peru7,30. Via the 108 
MALT screening approach, we were able to identify ancient Salmonella enterica DNA in the 109 
sequence data generated from this archaeological material, to the exclusion of DNA 110 
 5
stemming from the complex environmental background. While the pathogenic cause of the 111 
cocoliztli epidemic is ambiguous based on ethnohistorical evidence7,8,30, we report the first 112 
molecular evidence of microbial infection with genome-wide data from S. enterica subsp. 113 
enterica serovar Paratyphi C (enteric fever) isolated from ten epidemic-associated contact era 114 
burials. 115 
 116 
Results 117 
The individuals included in this investigation were excavated from the contact era epidemic 118 
cemetery located in the Grand Plaza (administrative square) (n=24) and the pre-contact 119 
churchyard cemetery (n=5) at Teposcolula-Yucundaa between 2004 and 201030 (Fig. 1; 120 
Supplementary Table 1; Supplementary methods 1). Previous work demonstrated ancient 121 
DNA preservation at the site through the identification of New World mitochondrial 122 
haplogroups in 48 individuals, 28 of which overlap with this study30. Additionally, oxygen 123 
isotope analysis identified them as local inhabitants30. Thirteen individuals included in this 124 
study were previously radiocarbon dated31 yielding dates that support archaeological 125 
evidence that the Grand Plaza (n=10) and churchyard (n=3) contain contact and pre-contact 126 
era burials, respectively (Supplementary Table 1). The Grand Plaza is estimated to contain 127 
more than 800 individuals, mostly interred in graves containing multiple persons. Those 128 
excavated contribute to a demographic profile consistent with an epidemic event29,30.  129 
 130 
Tooth samples were processed according to protocols designed for ancient DNA work 131 
(Supplementary methods 2). An aggregate soil sample from the two burial grounds was 132 
analysed in parallel to gain an overview of environmental bacteria that may have infiltrated 133 
our samples. Pre-processed sequencing data of approximately one million paired-end reads 134 
per tooth were analysed with MALT using a curated reference database of 6247 complete 135 
 6
bacterial genomes, comprising all those available in NCBI RefSeq (December 2016). Our 136 
approach limits ascertainment biases and false positive assignments that could result from 137 
databases deficient in environmental taxa (Supplementary methods 3). A runtime analysis 138 
revealed a 200-fold improvement in computation time for MALT in comparison to BLASTn 139 
(see Methods). Results were visualized in MEGAN28 and taxonomic assignments were 140 
evaluated with attention to known pathogenic species. Reads taxonomically assigned by 141 
MALT ranged from 4,842 to 44,315 for the samples. Assigned reads belonging to bacterial 142 
constituents of human oral and soil microbiota are present in varying proportions amongst the 143 
samples (Fig. 2; Supplementary Table 2). Of note, three teeth (Tepos_10, Tepos_14 and 144 
Tepos_35) had between 365 to 659 reads assigned to Salmonella enterica, a known cause of 145 
enteric fever in humans today. Of the S. enterica strains present in the database, S. Paratyphi 146 
C had the highest number of assigned reads (Supplementary methods 3; Supplementary Table 147 
2). Mapping these three metagenomic datasets to the S. Paratyphi C RKS4594 genome 148 
(NC_012125.1) revealed the characteristic pattern of damage expected of ancient DNA 149 
(Supplementary Fig. 3; Supplementary methods 3; Supplementary Table 4). Subsequently, 150 
the sequencing data for all samples was mapped to the human genome (hg19), revealing a 151 
similar level of damage in the human reads for Tepos_10, Tepos_14 and Tepos_35, thus 152 
providing further support for the ancient origin of the S. enterica reads (Supplementary 153 
methods 4; Supplementary Table 5). An additional seven individuals from the Grand Plaza 154 
cemetery and one negative control harboured low numbers of assigned S. enterica reads 155 
ranging from 4 to 51 (Supplementary Table 2). These were considered as potential weak-156 
positive samples. One negative control was found to contain 15 reads assigned to S. enterica, 157 
and a further four contained one or two reads, as did nine sample libraries, seven of which 158 
were not included in downstream analyses. The soil library and remaining sample libraries 159 
were void of S. enterica reads (Fig. 2; Supplementary methods 3; Supplementary Table 2). 160 
 7
An additional MALT screen for traces of viral DNA revealed one notable taxonomic hit to 161 
Salmonella phage Vi II-E1, a phage associated with Salmonella serovars that produce the Vi 162 
capsule antigen33, which includes S. Paratyphi C (Supplementary methods 3; Supplementary 163 
Table 3). 164 
 165 
To further authenticate and elucidate our findings we performed whole-genome targeted 166 
array and in-solution hybridization capture34,35, using probes designed to encompass modern 167 
S. enterica genome diversity (Supplementary methods 5, 6, 7; Supplementary Table 6). All 168 
five pre-contact samples, the soil sample, one post-contact sample putatively negative for S. 169 
enterica based on our MALT screening, all negative controls, and both UDG-treated (DNA 170 
damage removed) and non-UDG treated libraries from the ten putatively positive samples 171 
(Tepos_10, Tepos_11, Tepos_20, Tepos_14, Tepos_34, Tepos_35, Tepos_36, Tepos_37, 172 
Tepos_38, Tepos_41) were included in the capture (Supplementary methods 6, 7).  173 
 174 
Mapping and genotyping of the captured Illumina sequenced reads was performed using the 175 
S. Paratyphi C reference genome (NC_012125.1) (Supplementary methods 6, 7, 8; 176 
Supplementary Table 7). Capture of S. enterica DNA was successful for the ten positive 177 
samples yielding a minimum of 33,327 unique reads per UDG treated library. The remaining 178 
bone samples, soil sample, and negative controls were determined to be negative for ancient 179 
S. enterica DNA with the exception of one negative control (EB2-091013) that had likely 180 
become cross-contaminated during processing (see Supplementary methods 8; 181 
Supplementary Table 7). Five complete genomes were constructed for Tepos_10, Tepos_14, 182 
Tepos_20, Tepos_35 and Tepos_37, covering 95%, 97%, 67%, 98% and 74% of the 183 
reference at a minimum of 3-fold coverage and yielding an average genomic coverage of 33-, 184 
36-, 4.6-, 96- and 5.5-fold, respectively (Table 1). Artificial reads generated in silico for 23 185 
 8
complete genomes included in the probe design were also mapped to the S. Paratyphi C 186 
RKS4594 reference (Supplementary methods 8; Supplementary Table 6) and phylogenetic 187 
comparison revealed that the five ancient genomes clustered with S. Paratyphi C (Fig. 3; 188 
Supplementary Figures 4, 6; Supplementary methods 8). The phylogenetic positioning was 189 
retained when the whole dataset was mapped to and genotyped against the S. Typhi CT18 190 
reference genome (NC_003198.1) (Supplementary Figure 5; Supplementary Table 8), the 191 
most common bacterial cause of enteric fever in humans today. This result excludes the 192 
possibility of a reference bias. Despite all five ancient genomes being contemporaneous, the 193 
Tepos_10 genome was observed to contain many more derived positions. An investigation of 194 
heterozygous variant calls showed that Tepos_10 has a much higher number of heterozygous 195 
sites. We believe this is best explained by the presence of genetically similar non-target DNA 196 
that co-enriched in the capture for this sample alone. Based on the pattern of allele 197 
frequencies, this genome was excluded from downstream analyses (Supplementary methods 198 
9; Supplementary Figure 7). Tepos_20 and Tepos_37 were also excluded due to their 199 
genomic coverage of less than 6-fold, which allowed more reliable SNP calling at a minimum 200 
of 5-fold coverage for Tepos_14 and Tepos_35. Subsequent phylogenetic tree construction 201 
with 1000 bootstrap replicates revealed 100% support and branch shortening for the 202 
Tepos_14 and Tepos_35 genomes in all phylogenies, supporting their ancient origin (Fig. 3; 203 
Supplementary Figure 8).  204 
 205 
SNP analysis for the ancient genomes together with the reference dataset yielded a total of 206 
203,256 variant positions amongst all 25 genomes. Our analyses identified 681 positions 207 
present in one or both of the ancient genomes, where 133 are unique to the ancient lineage 208 
(Supplementary methods 10; Supplementary Table 9). Of these, 130 unique SNPs are shared 209 
between Tepos_14 and Tepos_35, supporting their close relationship and shared ancestry. 210 
 9
The ydiD gene involved in the breakdown of fatty acids36 and the tsr gene related to the 211 
chemotaxic response system37 were found to contain multiple non-synonymous SNPs 212 
(nsSNPs) unique to the ancient genomes (Supplementary methods 10). Seven homoplastic 213 
and four tri-allelic variant positions were detected in the ancient genomes (Supplementary 214 
methods 10; Supplementary Tables 10a, 10b). 215 
 216 
A region of the pil operon consisting of five genes, pilS, pilU, pilT, pilV and rci, was found in 217 
our ancient genomes and was absent in the S. Paratyphi C RKS4594 genome38 218 
(Supplementary methods 12; Supplementary Table 12). This region is located in Salmonella 219 
Pathogenicity Island 7 (SPI-7), and encodes a type IVB pili39,40. The version of pilV in our 220 
ancient genomes is thought to facilitate bacterial self-aggregation, a phenomenon that 221 
potentially aids in invasion of host tissues39,40 (for details see Supplementary methods 12). A 222 
presence/absence analysis of virulence factors was also conducted (Supplementary methods 223 
13; Supplementary Table 13; Supplementary Figure 9). 224 
 225 
The S. Paratyphi C RKS4594 strain harbours a virulence plasmid, pSPCV, which was 226 
included in our capture design. It is present at 10- to 224-fold average coverage for the five 227 
genomes (Supplementary methods 14; Supplementary Tables 14, 15). 228 
 229 
Non-UDG capture reads mapped to the S. Paratyphi C genome (NC_012125.1) for Tepos_11, 230 
Tepos_34, Tepos_36, Tepos_38 and Tepos_41, i.e. those that did not yield full genomes, had 231 
damage patterns characteristic of ancient DNA (Supplementary Figure 3). To further verify 232 
these reads as true ancient S. Paratyphi C reads we investigated 45 SNPs unique to Tepos_14 233 
and Tepos_35 that are included in our phylogenetic analysis (Supplementary methods 11; 234 
Supplementary Table 11). Of the 45 positions, between 6 and 29 had a minimum of 1-fold 235 
 10
coverage in these low coverage libraries. All of these were in agreement with the unique 236 
SNPs present in the high-coverage ancient genomes, thus confirming their shared ancestry. 237 
 238 
Discussion 239 
Interpretations of ethnohistorical documents have suggested some form of typhus or enteric 240 
(typhoid/paratyphoid) fever (from the Spanish “tabardillo”, “tabardete”, and “tifus mortal”), 241 
viral haemorrhagic fever, measles, or pneumonic plague as potential causes of the cocoliztli 242 
epidemic of 1545 CE (for ref. see Supplementary discussion 1). These diseases present 243 
symptoms similar to those that were recorded in the cocoliztli outbreak such as red spots on 244 
the skin, bleeding from various body orifices, and vomiting (Supplementary discussion 1; 245 
Supplementary Figure 10). Given the non-specific nature of these symptoms, additional 246 
sources of data are needed to clarify which disease(s) was/were circulating. Previous 247 
investigation of sequencing data generated from the Teposcolula-Yucundaa material did not 248 
identify DNA traces of ancient pathogens; however, S. enterica was not considered as a 249 
candidate41. Here we have isolated genome-wide data of ancient S. Paratyphi C from ten 250 
Mixtec individuals buried in the Grand Plaza epidemic cemetery at Teposcolula-Yucundaa, 251 
indicating that enteric fever was circulating in the indigenous population during the cocoliztli 252 
epidemic of 1545-50 CE. As demonstrated here, MALT offers a sensitive approach for 253 
screening non-enriched sequence data in search for unknown candidate bacterial pathogens 254 
involved in past disease outbreaks, even to the exclusion of a dominant environmental 255 
microbial background. Most importantly, it offers the advantage of extensive genome-level 256 
screening without the need to specify a target organism, thus avoiding ascertainment biases 257 
common to other screening approaches. Fast metagenomic profiling tools that are based on k-258 
mer matching such as KRAKEN42 or specific diagnostic marker regions such as 259 
MetaPhlan243 have limitations in ancient DNA applications. Complete alignments are needed 260 
 11
to authenticate candidate taxonomic assignments, and a small number of marker regions 261 
might not provide sufficient resolution for identification, as target DNA is often present in 262 
low amounts. Our focus on only bacterial and DNA viral taxa limits our resolution in 263 
identifying other infectious agents that may have been present in the population during the 264 
Teposcolula-Yucundaa epidemic.  265 
 266 
Although our discussions here have focused on a single pathogenic organism, the potential of 267 
its having acted synergistically with other pathogen(s) circulating during the epidemic must 268 
be considered. The concept of syndemics and the complex biosocial factors that influence 269 
infectious disease transmission and severity are well-documented in both modern and 270 
historical contexts44,45. We are currently limited to the detection of bacterial pathogens and 271 
DNA viruses included in the NCBI genomic database, though the resolution offered by 272 
MALT analyses will increase as this database grows. We have not investigated the presence 273 
of RNA viruses, since methods for RNA retrieval from archaeological tissues are 274 
underdeveloped and not supported by our current protocols46. 275 
 276 
We confidently exclude an environmental organism as the source for our ancient genomes on 277 
the basis that 1) S. Paratyphi C is restricted to humans, 2) it is not known to freely inhabit soil 278 
(our soil sample was negative for Salmonella during screening and after capture), 3) the 279 
deamination patterns observed for the ancient human and S. Paratyphi C reads are 280 
characteristic of authentic ancient DNA, and 4) the ancient S. Paratyphi C genomes display 281 
expected branch-shortening in all constructed phylogenies. Moreover, we recovered all 282 
ancient genomic data from the pulp-chambers of teeth collected in situ, increasing the 283 
likelihood of our having identified a bacterium that was present in the victim’s blood at the 284 
time of death. S. enterica introduction via post-burial disturbance is unlikely because the 285 
 12
graves in the Grand Plaza were dug directly into the thickly paved floor at the site and 286 
historical records indicate that Teposcolula-Yucundaa was abandoned shortly after the 287 
epidemic ended in 1552 CE29,30.  288 
 289 
S. Paratyphi C is one of over 2600 identified S. enterica serovars distinguished by their 290 
antigenic formula47. Only four serovars (S. Typhi and S. Paratyphi A, B, C), all of which 291 
cause enteric fever, are restricted to the human host47. Today S. Typhi and S. Paratyphi A 292 
cause the majority of reported cases48. S. Paratyphi C is rarely reported38,48. Infected 293 
individuals shed bacteria long after the termination of symptoms47, and in the case of S. 294 
Typhi infection, 1-6% of individuals become asymptomatic carriers49. Following the 295 
hypothesis that this disease was introduced via European contact, it is conceivable that 296 
asymptomatic European carriers who withstood the cross-Atlantic voyage could have 297 
introduced S. Paratyphi C to Mesoamerican populations in the 16th century. First hand 298 
descriptions of the 1545 cocoliztli epidemic suggest that both European and Mixtec 299 
individuals were susceptible to the disease7,50, with one estimate of a 60-90% population 300 
decline in New Spain during this period7.  301 
 302 
The additional SPI-7 genes detected through indel analysis are reported to vary in 303 
presence/absence amongst modern S. Paratyphi C strains38,40, and are suspected to cause 304 
increased virulence when the inverted repeats in pilV allow the Rci recombinase to shuffle 305 
between its two protein states (Supplementary methods 12). This may support an increased 306 
capacity for our ancient strains to cause an epidemic outbreak. However, the overall 307 
mechanisms through which S. Paratyphi causes enteric fever remain unclear. The nsSNPs in 308 
the ydiD and tsr genes may signify adaptive processes, and comparison with a greater number 309 
of S. Paratyphi C genomes may clarify this51. 310 
 13
 311 
Today, S. Paratyphi C is rare in Europe and the Americas, with more cases identified across 312 
Africa and Asia52,53. Based on MLST data from modern S. Paratyphi C strains, no clear 313 
phylogeographic pattern has been observed52. However, the presence of a 1200 CE S. 314 
Paratyphi C genome in Norway indicates its presence in Europe in the pre-contact era51, 315 
which would be necessary for it to be considered an Old World disease. However, based on 316 
the small number of pre-contact individuals that we have screened, we cannot exclude the 317 
presence of S. Paratyphi C at Teposcolula-Yucundaa prior to European arrival. A local origin 318 
for the cocoliztli disease has been proposed elsewhere54.  Historical accounts offer little 319 
perspective on its origin since neither the indigenous population nor the European colonizers 320 
had a pre-existing name for the disease7,8,30. Spanish colonial documents refer to it as 321 
pujamiento de sangre (‘full bloodiness’), while the indigenous Aztec population of Central 322 
Mexico called it cocoliztli, a generic term meaning ‘pestilence’ in Nahuatl7,8 (see 323 
Supplementary discussion 1).  324 
 325 
Little is known about the past severity and worldwide incidence of enteric fever, first 326 
determined to be distinct from typhus in the mid-nineteenth century55. Enteric fevers are 327 
regarded as major health threats across the world48, causing an estimated ~27 million 328 
illnesses in 2000, the majority of which were attributed to S. Typhi56. Due to the rarity of S. 329 
Paratyphi C diagnoses, mortality rates are not established for this particular serovar. Today, 330 
outbreaks predominantly occur in developing countries. S. Typhi/Paratyphi are commonly 331 
transmitted through the faecal-oral route via ingestion of contaminated food or water57. 332 
Changes imposed under Spanish rule such as forced relocations under the policy of 333 
congregación, altered living arrangements, and new subsistence farming practices29,30 334 
 14
compounded by drought conditions32 could have disrupted existing hygiene measures, 335 
facilitating S. Paratyphi C transmission.  336 
 337 
Our study represents a first step towards a molecular understanding of disease exchange in 338 
contact era Mexico. The 1545 cocoliztli epidemic is regarded as one of the most devastating 339 
epidemics in New World history7,32. Our findings contribute to the debate concerning the 340 
causative agent of this epidemic at Teposcolula-Yucundaa, where we propose that S. 341 
Paratyphi C be considered. We introduced MALT, a novel fast alignment and taxonomic 342 
assignment method. Its application to the identification of ancient Salmonella enterica DNA 343 
within a complex background of environmental microbial contaminants speaks to the 344 
suitability of this approach, and its resolution will improve as the number of available 345 
reference genomes increases. This method may be eminently useful for studies wishing to 346 
identify pathogenic agents involved in ancient and modern disease, particularly in cases 347 
where candidate organisms are not known a priori. 348 
 349 
 350 
 351 
 352 
 353 
 354 
 355 
 356 
 357 
 358 
 359 
 360 
 361 
 362 
 363 
 364 
 15
Methods 365 
 366 
The MALT algorithm 367 
MALT is based on the seed-and-extend paradigm and consists of two programs, malt-build 368 
and malt-run.  369 
First, malt-build is used to construct an index for the given database of reference sequences. 370 
To do so, malt-build determines all occurrences of spaced seeds58-60 in the reference 371 
sequences and places them into a hash table61. 372 
Following this, malt-run is used to align a set of query sequences against the reference 373 
database. To this end, the program generates a list of spaced seeds for each query and then 374 
looks them up in the reference hash table, which is kept in main memory. Using the x-drop 375 
extension heuristic25, a high-scoring ungapped alignment anchored at the seed is computed 376 
and is used to decide whether or not a full alignment should be constructed. Local or semi-377 
global alignments are computed using a banded implementation62 of the Smith-Waterman63 378 
or Needleman-Wunsch64 algorithms, respectively. The program then computes the bit-score 379 
and expected value (E-value) of the alignment and decides whether to keep or discard the 380 
alignment depending on user-specified thresholds for bit-score, E-value or percent identity. 381 
The application of malt-run is illustrated in Supplementary Figure 1. 382 
 383 
The MALT screening pipeline 384 
In order to use MALT in ancient DNA contexts to screen for bacterial DNA and to assess the 385 
taxonomic composition of ancient bacterial communities we applied the following workflow 386 
(Supplementary Figure 2). First we used malt-build to construct a MALT index on all 387 
complete bacterial genomes in GenBank65. This was done only once, and is rebuilt only when 388 
the target database requires updating. We align reads to the reference database using malt-run 389 
 16
in semi-global mode. MALT generates output in RMA format and in SAM format. The 390 
former can be used for interactive analysis of taxonomic composition in MEGAN28 and the 391 
latter for alignment-based assessment of damage patterns and other authenticity criteria. 392 
 393 
Sample provenience 394 
The site of Teposcolula-Yucundaa is situated on a mountain ridge in the Mixteca Alta region 395 
of Oaxaca, Mexico. Prior archaeological excavation at this site revealed a large epidemic 396 
cemetery located in the Grand Plaza – the town’s administrative centre, and an additional 397 
cemetery in the churchyard (Fig. 1; Supplementary methods 1). 24 teeth were collected from 398 
individuals buried in the Grand Plaza cemetery and 5 were collected from the individuals 399 
buried in the churchyard cemetery (Supplementary methods 2; Supplementary Table 1). Soil 400 
samples were also collected from both cemetery sites (Supplementary methods 2). 401 
 402 
DNA extraction and library preparation 403 
DNA extracts and double-stranded indexed libraries compatible with Illumina sequencing 404 
were generated using methods tailor-made for ancient DNA66-68. This work was carried out in 405 
dedicated ancient DNA cleanroom facilities at the University of Tübingen and Harvard 406 
University (Supplementary methods 2).  407 
 408 
Screening with MALT 409 
Amplified libraries were shotgun sequenced. The reads were adapter clipped and merged 410 
before being analysed with MALT and the results visualized in MEGAN628 (Supplementary 411 
methods 2, 3). Two MALT runs were executed. The first using all complete bacterial 412 
genomes available through NCBI RefSeq (December, 2016), and the second using the full 413 
NCBI Nucleotide (nt) database (ftp://ftp-trace.ncbi.nih.gov/blast/db/FASTA/) as reference to 414 
 17
screen for viral DNA (Fig. 2; Supplementary methods 3; Supplementary Tables 2, 3). Both 415 
runs used ‘semi-global’ alignment and a minimum percent identity of 95% (Supplementary 416 
methods 3). The shotgun data was also mapped to the S. Paratyphi C RKS4594 reference 417 
(NC_012125.1) and the human genome (hg19) and damage plots were generated 418 
(Supplementary methods 3, 4; Supplementary Table 4, 5; Supplementary Figure 3). 419 
 420 
Runtime comparison 421 
The programs MALT (version 0.3.8) and BLAST25 (version 2.6.0+) were applied to the 422 
shotgun screening data of Tepos_35 consisting of altogether 952,511 reads. For both 423 
programs the DNA alignment mode (blastn) was chosen. The maximal E-value was set to 424 
1.0. The maximal number of alignments for each query was set to 100. The minimal per cent 425 
identity was set to 95. The number of threads was set to 16. The alignment type of MALT 426 
was set to ‘Local’ in order to be comparable to BLAST. The total amount of RAM required 427 
by MALT during this run was 252.7 GB. 428 
For MALT the runtime was measured excluding the initial loading of our reference database, 429 
which happens only once when screening multiple samples. The loading of the database takes 430 
27.27 minutes. Including taxonomic binning the application of MALT to our complete 431 
shotgun screening data took 123.36 minutes. As a comparison, processing only the screening 432 
data of a single sample (Tepos_35) with BLASTn took 1420.58 minutes without any 433 
taxonomic analysis. Processing of this sample alone with MALT, including taxonomic 434 
binning, took 6.48 minutes, which constitutes a 200-fold improvement in terms of 435 
computation time. 436 
The computations were performed on a Dell PowerEdge R820 with four Intel Xeon E5-4620 437 
2.2 GHz CPUs und 768 GB RAM. 438 
 439 
 18
Probe design and whole-genome capture 440 
Array probes were designed based on 67 publicly available S. enterica 441 
chromosomes/assemblies and 45 associated plasmid sequences (Supplementary methods 5; 442 
Supplementary Table 6). Extracts from samples deemed to be positive for S. enterica 443 
Paratyphi C were converted into additional rich UDG-treated libraries69 for whole-genome 444 
capture (Supplementary methods 6, 7). Pre-contact and post-contact samples were serially 445 
captured using our custom probe design, according to two established methods35,70. The 446 
eluate from both array and in-solution capture was sequenced to a sufficient depth to allow 447 
high coverage genome reconstruction (Supplementary methods 6, 7).  448 
 449 
Sequence data processing, initial phylogenetic assessment and authenticity  450 
The sequence data was adapter clipped and quality filtered before being mapped to the S. 451 
Paratyphi C reference (NC_012125.1) (Supplementary methods 8; Supplementary Table 7). 452 
Deamination patterns for the DNA were generated to assess the authenticity of the ancient S. 453 
Paratyphi C DNA using mapDamage23 (Supplementary methods 8; Supplementary Table 7; 454 
Supplementary Figure 3). Artificial read data was generated for a dataset of 23 genomes 455 
selected for comparative phylogenetic analysis; this data was also mapped to the S. Paratyphi 456 
C reference (Supplementary methods 8). SNP calling was carried out with GATK using a 457 
quality score of ≥30 for the five S. Paratyphi C genomes and the artificial read dataset. A 458 
neighbor-joining tree was constructed using MEGA671, based on a homozygous SNPs called 459 
at a minimum of 3-fold coverage where at least 90% of reads are in agreement 460 
(Supplementary methods 8; Supplementary Figure 4). In order to exclude a reference bias in 461 
the ascertainment of the phylogenetic positioning of the five ancient genomes (Tepos_10, 462 
Tepos_14 Tepos_20, Tepos_35 and Tepos_37), mapping, SNP calling and tree construction 463 
 19
was repeated for the S. Typhi CT18 reference (NC_003198.1) (Supplementary methods 8; 464 
Supplementary Table 8; Supplementary Figure 5).  465 
 466 
SNP typing and phylogenetic analysis 467 
Homozygous SNPs were called from the complete dataset (5 ancient and 23 modern) based 468 
on our criteria using a tool called MultiVCFAnalyzer (Supplementary methods 9). Repetitive 469 
and highly conserved regions of the S. Paratyphi C genome (NC_012125.1) were excluded 470 
from SNP calling to avoid spurious mapping reads. Maximum Parsimony71 and a Maximum 471 
Likelihood72 trees were made including the five genomes (Fig. 3; Supplementary Figure 6). 472 
Heterozygous positions were also called and their allele frequency distributions plotted using 473 
R73 (Supplementary methods 9; Supplementary Figure 7). SNP calling and phylogenetic tree 474 
construction was repeated excluding the Tepos_10, Tepos_20 and Tepos_37 genomes 475 
(Supplementary methods 10; Fig. 3; Supplementary Figure 8).  476 
 477 
The five weak-positive samples: Tepos_11, Tepos_34, Tepos_36, Tepos_38 and Tepos_41, 478 
that did not yield enough data for genome reconstruction, were investigated for 46 SNPs 479 
unique to the ancient genomes to verify that the captured reads for these samples are true 480 
ancient S. Paratyphi C reads (Supplementary methods 11; Supplementary Table 11).  481 
 482 
SNP effect and Indel analyses 483 
SNP effect analysis was carried out for the two ancient genomes (Tepos_14 and Tepos_35) 484 
alongside the modern dataset (see Supplementary methods 10). SNPs unique to the ancient 485 
genomes, pseudogenes and homoplastic positions were investigated (Supplementary methods 486 
10; Supplementary Tables 9, 10). Insertions and deletions (Indels), 700bp or larger, in the 487 
two ancient genomes were identified through two approaches. Deletions were visually 488 
 20
detected by mapping the ancient data to the S. Paratyphi C reference using a mapping quality 489 
threshold (-q) of 0 and manually viewing the genome alignment in the IGV browser 490 
(Supplementary methods 12). In order to detect insertions, or regions present in the ancient 491 
genomes that are missing the modern reference, the ancient data was mapped to concatenated 492 
reference pairs. Where one reference was in all cases the S. Paratyphi C RKS4594 reference 493 
(NC_012125.1) and the other was one of four S. enterica genomes of interest. A mapping 494 
quality threshold (-q) of 37 was used, thus allowing only regions unique to one or the other 495 
genome in the pair to map (Supplementary methods 12; Supplementary Table 12). 496 
 497 
Virulence factor analysis 498 
43 effector genes identified within Salmonella enterica subsp. enterica74 were investigated 499 
using the BEDTools suite75. The percentage of each gene that was covered at least 1-fold in 500 
the ancient and modern genomes in our dataset was plotted using the ggplot2package76 in R73 501 
(Supplementary methods 13; Supplementary Figure 9).  502 
 503 
Plasmid analysis 504 
The ancient data was mapped to the S. Paratyphi C virulence plasmid, pSPCV. SNP effect 505 
analysis was carried out in comparison to three other similar plasmid references 506 
(Supplementary methods 14; Supplementary Tables 14, 15). 507 
 508 
Data Availability 509 
Sequence data that support the findings of this study have been submitted to the European 510 
Nucleotide Archive under accession number PRJEB23438. MALT is open source and freely 511 
available from: http://ab.inf.uni-tuebingen.de/software/malt. The program MultiVCFAnalyzer 512 
is available on GitHub: https://github.com/alexherbig/MultiVCFAnalyzer. Source data for 513 
 21
figures are available upon request. 514 
 22
References: 515 
 516 
1 Ubelaker, D. H. Prehistoric New World population size: Historical review and current appraisal of 517 
North American estimates. Am J Phys Anthropol 45, 661-665, doi:10.1002/ajpa.1330450332 (1976). 518 
2 Crosby, A. W. Virgin soil epidemics as a factor in the aboriginal depopulation in America. William 519 
Mary Q 33, 289-299 (1976). 520 
3 Dobyns, H. F. Disease Transfer at Contact. Annu Rev Anthropol 22, 273-291 (1993). 521 
4 Acuna-Soto, R., Stahle, D. W., Therrell, M. D., Griffin, R. D. & Cleaveland, M. K. When half of the 522 
population died: the epidemic of hemorrhagic fevers of 1576 in Mexico. FEMS Microbiol Lett 240, 1-523 
5, doi:10.1016/j.femsle.2004.09.011 (2004). 524 
5 Llamas, B. et al. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the 525 
Americas. Sci Adv 2, e1501385, doi:10.1126/sciadv.1501385 (2016). 526 
6 Lindo, J. et al. A time transect of exomes from a Native American population before and after 527 
European contact. Nat Commun 7, 13175, doi:10.1038/ncomms13175 (2016). 528 
7 Cook, N. D. & Lovell, W. G. Secret Judgments of God: Old World Disease in Colonial Spanish 529 
America.  (University of Oklahoma Press, 2001). 530 
8 Fields, S. L. Pestilence and Headcolds: Encountering Illness in Colonial Mexico.  (Columbia 531 
University Press, 2008). 532 
9 Ortner, D. J. Identification of pathological conditions in human skeletal remains. 2 edn,  (Academic 533 
Press, 2003). 534 
10 Walker, R. S., Sattenspiel, L. & Hill, K. R. Mortality from contact-related epidemics among indigenous 535 
populations in Greater Amazonia. Sci Rep 5, 14032, doi:10.1038/srep14032 (2015). 536 
11 Joralemon, D. New World Depopulation and the Case of Disease. J Anthropol Res 38, 108-127 (1982). 537 
12 Larsen, C. S. In the wake of Columbus: Native population biology in the postcontact Americas. Am J 538 
Phys Anthropol 37, 109-154, doi:10.1002/ajpa.1330370606 (1994). 539 
13 Bos, K. I. et al. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human 540 
tuberculosis. Nature 514, 494-497, doi:10.1038/nature13591 (2014). 541 
14 Schuenemann, V. J. et al. Genome-wide comparison of medieval and modern Mycobacterium leprae. 542 
Science 341, 179-183, doi:10.1126/science.1238286 (2013). 543 
15 Warinner, C. et al. Pathogens and host immunity in the ancient human oral cavity. Nat Genet 46, 336-544 
344, doi:10.1038/ng.2906 (2014). 545 
16 Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506-546 
510, doi:10.1038/nature10549 (2011). 547 
17 Maixner, F. et al. The 5300-year-old Helicobacter pylori genome of the Iceman. Science 351, 162-165, 548 
doi:10.1126/science.aad2545 (2016). 549 
18 Devault, A. M. et al. Ancient pathogen DNA in archaeological samples detected with a Microbial 550 
Detection Array. Sci Rep 4, 4245, doi:10.1038/srep04245 (2014). 551 
19 Bos, K. I. et al. Parallel detection of ancient pathogens via array-based DNA capture. Philosophical 552 
transactions of the Royal Society of London. Series B, Biological sciences 370, 20130375, 553 
doi:10.1098/rstb.2013.0375 (2015). 554 
20 Devault, A. M. et al. A molecular portrait of maternal sepsis from Byzantine Troy. Elife 6, 555 
doi:10.7554/eLife.20983 (2017). 556 
21 Warinner, C. et al. A Robust Framework for Microbial Archaeology. Annu Rev Genomics Hum Genet, 557 
doi:10.1146/annurev-genom-091416-035526 (2017). 558 
22 Key, F. M., Posth, C., Krause, J., Herbig, A. & Bos, K. I. Mining Metagenomic Data Sets for Ancient 559 
DNA: Recommended Protocols for Authentication. Trends Genet 33, 508-520, 560 
doi:10.1016/j.tig.2017.05.005 (2017). 561 
23 Jonsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. & Orlando, L. mapDamage2.0: fast 562 
approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682-1684, 563 
doi:10.1093/bioinformatics/btt193 (2013). 564 
24 Prufer, K. et al. Computational challenges in the analysis of ancient DNA. Genome Biol 11, R47, 565 
doi:10.1186/gb-2010-11-5-r47 (2010). 566 
25 Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. 567 
J Mol Biol 215, 403-410, doi:10.1016/S0022-2836(05)80360-2 (1990). 568 
26 Peabody, M. A., Van Rossum, T., Lo, R. & Brinkman, F. S. Evaluation of shotgun metagenomics 569 
sequence classification methods using in silico and in vitro simulated communities. BMC 570 
Bioinformatics 16, 363, doi:10.1186/s12859-015-0788-5 (2015). 571 
27 Lindgreen, S., Adair, K. L. & Gardner, P. P. An evaluation of the accuracy and speed of metagenome 572 
analysis tools. Sci Rep 6, 19233, doi:10.1038/srep19233 (2016). 573 
 23
28 Huson, D. H. et al. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale 574 
Microbiome Sequencing Data. PLoS Comput Biol 12, e1004957, doi:10.1371/journal.pcbi.1004957 575 
(2016). 576 
29 Spores, R. & Robles García, N. A prehispanic (postclassic) capital center in colonial transition: 577 
excavations at Yucundaa Pueblo Viejo de Teposcolula, Oaxaca, Mexico. Lat Am Antiq 18, 33-353 578 
(2007). 579 
30 Warinner, C., Robles García, N., Spores, R. & Tuross, N. Disease, Demography, and Diet in Early 580 
Colonial New Spain: Investigation of a Sixteenth-Century Mixtec Cemetery at Teposcolula Yucundaa. 581 
Lat Am Antiq 23, 467-489 (2012). 582 
31 Tuross, N., Warinner, C. & Robles García, N. in Yucundaa: La cuidad mixteca Yucundaa-Pueblo Viejo 583 
de Teposcolula y su transformación prehispánica-colonial Vol. vol. 2  (eds Ronald Spores & Robles 584 
García Nelly)  541-546 (Instituto Nacional de Antropología e Historia, 2014). 585 
32 Acuna-Soto, R., Stahle, D. W., Cleaveland, M. K. & Therrell, M. D. Megadrought and megadeath in 586 
16th century Mexico. Emerg Infect Dis 8, 360-362 (2002). 587 
33 Pickard, D. et al. Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing 588 
bacteriophage E1. Journal of bacteriology 190, 2580-2587, doi:10.1128/JB.01654-07 (2008). 589 
34 Burbano, H. A. et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence 590 
Capture. Science 328, 723-725, doi:10.1126/science.1188046 (2010). 591 
35 Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proceedings of the 592 
National Academy of Sciences of the United States of America 110, 2223-2227, 593 
doi:10.1073/pnas.1221359110 (2013). 594 
36 Campbell, J. W., Morgan-Kiss, R. M. & Cronan, J. E., Jr. A new Escherichia coli metabolic 595 
competency: growth on fatty acids by a novel anaerobic beta-oxidation pathway. Molecular 596 
microbiology 47, 793-805 (2003). 597 
37 Rivera-Chavez, F. et al. Salmonella uses energy taxis to benefit from intestinal inflammation. PLoS 598 
Pathog 9, e1003267, doi:10.1371/journal.ppat.1003267 (2013). 599 
38 Liu, W. Q. et al. Salmonella paratyphi C: genetic divergence from Salmonella choleraesuis and 600 
pathogenic convergence with Salmonella typhi. PLoS One 4, e4510, doi:10.1371/journal.pone.0004510 601 
(2009). 602 
39 Tam, C. K., Morris, C. & Hackett, J. The Salmonella enterica serovar Typhi type IVB self-association 603 
pili are detached from the bacterial cell by the PilV minor pilus proteins. Infect Immun 74, 5414-5418, 604 
doi:10.1128/IAI.00172-06 (2006). 605 
40 Tam, C. K., Hackett, J. & Morris, C. Salmonella enterica serovar Paratyphi C carries an inactive 606 
shufflon. Infect Immun 72, 22-28 (2004). 607 
41 Campana, M. G., Robles Garcia, N., Ruhli, F. J. & Tuross, N. False positives complicate ancient 608 
pathogen identifications using high-throughput shotgun sequencing. BMC Res Notes 7, 111, 609 
doi:10.1186/1756-0500-7-111 (2014). 610 
42 Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact 611 
alignments. Genome Biol 15, R46, doi:10.1186/gb-2014-15-3-r46 (2014). 612 
43 Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker 613 
genes. Nat Methods 9, 811-814, doi:10.1038/nmeth.2066 (2012). 614 
44 Singer, M. & Clair, S. Syndemics and public health: reconceptualizing disease in bio-social context. 615 
Med Anthropol Q 17, 423-441 (2003). 616 
45 Herring, D. A. & Sattenspiel, L. Social Contexts, Syndemics, and Infectious Disease in Northern 617 
Aboriginal Populations. American Journal of Human Biology 19, 190-202, doi:10.1002/ajhb (2007). 618 
46 Guy, P. L. Prospects for analyzing ancient RNA in preserved materials. Wiley Interdiscip Rev RNA 5, 619 
87-94, doi:10.1002/wrna.1199 (2014). 620 
47 Gal-Mor, O., Boyle, E. C. & Grassl, G. A. Same species, different diseases: how and why typhoidal 621 
and non-typhoidal Salmonella enterica serovars differ. Front Microbiol 5, 391, 622 
doi:10.3389/fmicb.2014.00391 (2014). 623 
48 Wain, J., Hendriksen, R. S., Mikoleit, M. L., Keddy, K. H. & Ochiai, R. L. Typhoid fever. Lancet 385, 624 
1136-1145, doi:10.1016/S0140-6736(13)62708-7 (2015). 625 
49 Monack, D. M., Mueller, A. & Falkow, S. Persistent bacterial infections: the interface of the pathogen 626 
and the host immune system. Nat Rev Micro 2, 747-765 (2004). 627 
50 Sahagún, B. d. Florentine Codex: general history of the things of New Spain.  (1950-1982). 628 
51 Zhou, Z. et al. Millennia of genomic stability within the invasive Para C Lineage of Salmonella 629 
enterica. bioRxiv, doi:10.1101/105759 (2017). 630 
52 Achtman, M. et al. Multilocus sequence typing as a replacement for serotyping in Salmonella enterica. 631 
PLoS Pathog 8, e1002776, doi:10.1371/journal.ppat.1002776 (2012). 632 
 24
53 Centers for Disease Control and Prevention (CDC). National Typhoid and Paratyphoid Fever 633 
Surveillance Annual Summary, 2014. (Atlanta, Georgia, 2016). 634 
54 Acuna-Soto, R., Romero, L. C. & Maguire, J. H. Large epidemics of hemorrhagic fevers in Mexico 635 
1545-1815. Am J Trop Med Hyg 62, 733-739 (2000). 636 
55 Smith, D. C. Gerhard's distinction between typhoid and typhus and its reception in America, 1833-637 
1860. Bull Hist Med 54, 368-385 (1980). 638 
56 Crump, J. A., Luby, S. P. & Mintz, E. D. The global burden of typhoid fever. Bull World Health Organ 639 
82, 346-353 (2004). 640 
57 World Health Organization. Typhoid fever – Uganda, <http://www.who.int/csr/don/17-march-2015-641 
uganda/en/> (2015). 642 
58 Burkhardt, S. & Kärkkäinen, J. in Combinatorial Pattern Matching: 12th Annual Symposium, CPM 643 
2001 Jerusalem, Israel, July 1–4, 2001 Proceedings   (ed Amihood Amir)  73-85 (Springer Berlin 644 
Heidelberg, 2001). 645 
59 Ma, B., Tromp, J. & Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics 646 
18, 440-445 (2002). 647 
60 Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat 648 
Methods 12, 59-60, doi:10.1038/nmeth.3176 (2015). 649 
61 Ning, Z., Cox, A. J. & Mullikin, J. C. SSAHA: a fast search method for large DNA databases. Genome 650 
Res 11, 1725-1729, doi:10.1101/gr.194201 (2001). 651 
62 Chao, K. M., Pearson, W. R. & Miller, W. Aligning two sequences within a specified diagonal band. 652 
Comput Appl Biosci 8, 481-487 (1992). 653 
63 Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J Mol Biol 147, 654 
195-197 (1981). 655 
64 Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the 656 
amino acid sequence of two proteins. J Mol Biol 48, 443-453 (1970). 657 
65 Benson, D. A. et al. GenBank. Nucleic Acids Res 41, D36-42, doi:10.1093/nar/gks1195 (2013). 658 
66 Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear 659 
reconstructed from ultrashort DNA fragments. Proceedings of the National Academy of Sciences of the 660 
United States of America 110, 15758-15763, doi:10.1073/pnas.1314445110 (2013). 661 
67 Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture 662 
and sequencing. Cold Spring Harb Protoc 2010, pdb prot5448, doi:10.1101/pdb.prot5448 (2010). 663 
68 Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex 664 
sequencing on the Illumina platform. Nucleic Acids Res 40, e3, doi:10.1093/nar/gkr771 (2012). 665 
69 Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient 666 
DNA. Nucleic Acids Res 38, e87, doi:10.1093/nar/gkp1163 (2010). 667 
70 Hodges, E. et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for 668 
massively parallel sequencing. Nat Protoc 4, 960-974, doi:10.1038/nprot.2009.68 (2009). 669 
71 Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary 670 
Genetics Analysis version 6.0. Molecular biology and evolution 30, 2725-2729, 671 
doi:10.1093/molbev/mst197 (2013). 672 
72 Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large 673 
phylogenies. Bioinformatics 30, 1312-1313, doi:10.1093/bioinformatics/btu033 (2014). 674 
73 R Development Core Team.     (The R Foundation for Statistical Computing, Vienna, Austria, 2011). 675 
74 Connor, T. R. et al. What's in a Name? Species-Wide Whole-Genome Sequencing Resolves Invasive 676 
and Noninvasive Lineages of Salmonella enterica Serotype Paratyphi B. MBio 7, 677 
doi:10.1128/mBio.00527-16 (2016). 678 
75 Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. 679 
Bioinformatics 26, 841-842, doi:10.1093/bioinformatics/btq033 (2010). 680 
76 Wickham, H. ggplot2: Elegant Graphics for Data Analysis.  (Springer-Verlag New York, 2009). 681 
 682 
 683 
 684 
 685 
 686 
 687 
 688 
 689 
 690 
 691 
 25
Acknowledgements: This work was supported by the Max Planck Society (J.K.), the 692 
European Research Council (ERC) starting grant APGREID (to J.K.), Social Sciences and 693 
Humanities Research Council of Canada postdoctoral fellowship grant 756-2011-501 (to 694 
K.I.B.), and the Mäxi Foundation (M.G.C.). We thank the Archaeology Council at Mexico’s 695 
National Institute of Anthropology and History (INAH) and the Teposcolula-Yucundaa 696 
Archaeological Project for sampling permissions. We are grateful to Antje Wissgott, Guido 697 
Brandt and Verena Schuenemann for assistance with laboratory work, Annette Günzel for 698 
providing graphic support, and Rodrigo Barquera, Jim Hackett and Menaka Pi for thoughts 699 
and discussion on the manuscript. Part of the data storage and analysis was performed on the 700 
computational resource bwGRiD Cluster Tübingen funded by the Ministry of Science, 701 
Research and the Arts Baden-Württemberg, and the Universities of the State of Baden-702 
Württemberg, Germany, within the framework program bwHPC. We thank the MALT user 703 
community for helpful comments and bug reports. 704 
 705 
Author Contributions: K.I.B., M.G.C., A.H., N.T. and J.K. conceived the investigation. 706 
K.I.B., A.H. Å.J.V., M.G.C. and J.K. designed experiments. N.M.R.G. provided 707 
archaeological information and drawings, submitted INAH permits and assisted in the 708 
sampling processes. Å.J.V., M.G.C., S.S., M.A.S. and K.I.B. performed laboratory work. 709 
Å.J.V., A.H., K.I.B., C.W. and A.A.V. performed analyses. D.H. implemented the MALT 710 
algorithm. A.H., J.K. and D.H. designed and set up the MALT ancient DNA analysis 711 
pipeline. C.W. performed ethnohistorical analyses. Å.J.V. and K.I.B. wrote the manuscript 712 
with contributions from all authors.  713 
 714 
Competing financial interests: The authors declare no competing financial interests. 715 
 26
Figure 1 | Overview of Teposcolula-Yucundaa. A) Illustrates the location of the 716 
Teposcolula-Yucundaa site in the Mixteca Alta region of Oaxaca, Mexico; B) central 717 
administrative area of Teposcolula-Yucundaa showing the relative positioning of the Grand 718 
Plaza and churchyard cemetery sites. Burials within each cemetery are indicated with dark 719 
grey outlines and the excavation area is shaded in grey; C) drawing of individual 35 from 720 
which the Tepos_35 S. Paratyphi C genome was isolated. Panels B and C were adapted from 721 
drawings provided by the Teposcolula-Yucundaa archaeological project archives-INAH and 722 
ref. 30. The figure was kindly provided by Annette Günzel.  723 
 724 
Figure 2 | MALT analysis and pathogen screening of shotgun data. Shotgun data was 725 
analysed with MALT using a database constructed from all bacterial genomes available 726 
through NCBI RefSeq (December 2016). MALT results were visualized using MEGAN628. 727 
The bar chart was constructed from the MEGAN6 output and is based on the per cent reads 728 
assigned to bacterial species when using a 95% identity filter. Reads assigned to Salmonella 729 
enterica are coloured red regardless. Other taxa to which 3% or more reads, per sample, were 730 
assigned are colour-coded depending on whether they are ‘environmental’ or ‘human oral 731 
microbiome’ bacteria. Remaining taxa are sorted into two categories: ‘other environmental’ 732 
or ‘other microbiome’ (Supplementary methods 3). Samples from the post-contact Grand 733 
Plaza epidemic cemetery containing S. enterica reads, contact era samples from the 734 
churchyard cemetery and the soil sample are illustrated. Additionally, a sample negative for 735 
S. enterica from the Grand Plaza cemetery (Tepos_27) is shown. Samples whose names are 736 
coloured in red are from the Grand Plaza and those in blue from the churchyard. The 737 
percentage of reads in the shotgun data assigned by MALT per sample is indicated at the top 738 
of each column. Only taxa with 4 or more reads assigned are visualized. 739 
 740 
 741 
 27
Figure 3 | Maximum Likelihood S. enterica phylogeny. Two maximum likelihood trees 742 
were produced. Positions with missing data were excluded in both cases. In panel a) the tree 743 
includes all five ancient genomes and is based on 3-fold SNP calls. It is based on 51,456 744 
variant positions. b) shows a zoomed in view of the S. Paratyphi C genomes. In panel c) the 745 
tree includes two high-coverage ancient genomes and is based on 5-fold SNP calls. It is based 746 
on 81,474 variant positions. d) shows a zoomed in view of the S. Paratyphi C genomes, 747 
illustrating the branch shortening of the two ancient genomes (Tepos_14 and Tepos_35). 748 
Both trees were built with RAxML72. Branches coloured red indicate S. Paratyphi C genomes 749 
and branches in blue indicate other genomes that are human-specific and cause enteric 750 
(typhoid/paratyphoid) fever.  751 
 752 
 753 
 754 
 755 
 756 
 757 
 758 
 759 
 760 
 761 
 762 
 763 
 764 
 765 
 766 
 767 
 768 
 769 
 770 
 771 
 772 
 773 
 774 
 775 
 776 
 777 
 778 
 779 
 780 
 781 
 782 
 783 
 784 
 28
Table 1 | Overview of mapping statistics of captured sample libraries from the Grand 785 
Plaza (contact era) and churchyard (pre-contact) 786 
 787 
Sample ID Cemetery Site Library treatment 
# processed 
reads before 
mapping 
# Unique 
mapped 
reads 
Endogenous 
DNA (%) - 
quality filtered 
reads 
Mean Fold 
Coverage 
% of Genome 
Covered at 
least 3-fold 
Tepos_10 Grand Plaza non-UDG 
16,945,834 399,561 20.56 4.35 52.17 
UDG 68,628,270 2,903,258 16.30 32.84 95.49 
Tepos_14 Grand Plaza non-UDG 
20,559,478 1,222,402 23.51 14.41 95.77 
UDG 73,204,225 3,410,610 18.62 36.44 97.67 
Tepos_35 Grand Plaza non-UDG 
27,248,720 1,803,043 31.37 25.50 97.67 
UDG 90,815,050 7,025,774 30.00 96.43 98.06 
Tepos_11 Grand Plaza non-UDG 
21,941,119 19,576 0.87 0.21 0.93 
UDG 48,959,732 103,492 0.75 1.21 14.56 
Tepos_20 Grand Plaza non-UDG 
771,431 15,236 6.94 0.15 0.26 
UDG 20,123,713 427,781 4.75 4.59 67.53 
Tepos_34 Grand Plaza non-UDG 
18,934,710 123,307 2.55 1.35 14.65 
UDG 26,284,766 157,930 2.05 1.74 21.67 
Tepos_36 Grand Plaza non-UDG 
23,147,904 36,224 0.75 0.40 1.76 
UDG 21,910,196 33,327 0.42 0.36 1.4 
Tepos_37 Grand Plaza non-UDG 
5,223,138 218,874 9.28 2.55 42.12 
UDG 9,603,890 416,449 7.71 5.49 74.48 
Tepos_38 Grand Plaza non-UDG 
8,280,412 18,308 0.91 0.19 0.97 
UDG 47,835,731 65,812 0.54 0.67 4.22 
Tepos_41 Grand Plaza non-UDG 
17,608,445 33,664 0.72 0.37 1.47 
UDG 19,966,958 36,208 0.48 0.40 1.34 
Tepos_27 Grand Plaza non-UDG 17,931,300 4,778 0.07 0.04 0.27 
Tepos_32 Churchyard non-UDG 25,721,427 6,665 0.08 0.06 0.47 
Tepos_43 Churchyard non-UDG 31,129,662 3,426 0.05 0.03 0.25 
Tepos_45 Churchyard non-UDG 18,027,289 6,879 0.12 0.06 0.34 
Tepos_48 Churchyard non-UDG 17,915,341 4,312 0.06 0.04 0.25 
Tepos_57 Churchyard non-UDG 24,478,844 5,527 0.07 0.05 0.28 
Soil Grand Plaza & churchyard non-UDG 10,875,300 796 0.02 0.01 0.07 
 788 
 789 
 790 
 791 
 792 
 793 
 794 
 795 
 796 
 797 
 798 
 799 
 800 
 801 
 802 
 803 
 804 
 805 
 806 
 807 
 808 
 809 
 810 
M IX T E C A
A LTA
M E X I C O
G U E R R E R O
C H IA PA S
V E R A C R U Z
O A X A C A
O a x a c a  C it y
Te p o s c o lu la -
Y u c u n d aa
5 0  m i0
5 0  k m0 P a c i f i c
P U E B L A
TA B A S C O
0 0 ,5 m0 2 0  m
Te p o sc o lu la-Y u c u n d aa In d iv id u al 3 5
b u rial 1 8
CHURCHYARD
GRAND 
PLAZA
B CA
010
20
30
40
50
60
70
80
90
100
Re
ad
s 
(%
)
Grand Plaza
Post-Contact
S. enterica positive S. enterica negative
Churchyard
Pre-Contact Soil
14Tepos_# 20 34 3835 413736
2.0Assigned reads (%)
10 11
0.70.60.90.90.70.80.90.71.3
45 48 57 soil27 43
0.20.71.90.82.11.2
32
3.2
Steroidobacter denitrificans
Clostridium sporogenes
Clostridium botulinum
Clostridium tetani
Streptosporangium roseum
Other environmental
Environmental bacteria:
Microbiome bacteria:
Salmonella enterica
Other microbiome
Pseudopropionibacterium propionicum
Staphylococcus aureus
Tannerella forsythia
Actinomyces sp. oral taxon 414
Alcaligenes faecalis
Desulfomicrobium orale
Nitrobacter hamburgensis
Pseudomonas stutzeri
0.2
Tepos_37
Bareilly CFSAN000189
Typhi Ty21a
Bovismorbificans 3114
Tepos_35
Typhi CT18
arizonae serovar 62:z4,z23:- RSK2980
Typhi Ty2
Enteritidis P125109
Tepos_10
Heidelberg SL476
Tepos_20
Weltevreden 2007-60-3289-1
Cubana CFSAN002050
Paratyphi B SPB7
Paratyphi C RKS4594
Tepos_14
Typhi Pstx12
Gallinarum 287/91
Paratyphi A ATCC9150
Agona SL483
Pullorum S06004
Dublin CT_02021853
Typhimurium LT2
Typhimurium 081736
Choleraesuis SC-B67
Paratyphi A AKU12601
Schwarzengrund CVM19633
100
100
99
99
72
100
93
69
100
96
87
80
100
100
100
100
100
100
100
100
88
100
100
99
100
0.0002
Tepos_37
Tepos_35
Tepos_10
Tepos_20
Paratyphi C RKS4594
Tepos_14
72
87
80
100100
a
b
c
d
0.2
Schwarzengrund CVM19633
Typhimurium LT2
Heidelberg SL476
Choleraesuis SC-B67
Typhi Ty21a
Tepos_14
Enteritidis P125109
Bovismorbificans 3114
Agona SL483
Paratyphi A ATCC9150
Typhimurium 081736
Bareilly CFSAN000189
Cubana CFSAN002050
Paratyphi C RKS4594
Weltevreden 2007-60-3289-1
Typhi Ty2
Pullorum S06004
Typhi CT18
Paratyphi A AKU12601
Tepos_35
Dublin CT_02021853
Typhi Pstx12
Gallinarum 287/91
Paratyphi B SPB7
arizonae serovar 62:z4,z23:- RSK2980
100
100
100
99
100
100
100
100
100
100
92
100
100
87
91
100
100
87
100
93
100
100
0.0002
Tepos_14
Paratyphi C RKS4594
Tepos_35100100