METHODS Estimating global arthropod species richness: refining probabilistic models using probability bounds analysis Andrew J. Hamilton ? Vojtech Novotny? ? Edward K. Waters ? Yves Basset ? Kurt K. Benke ? Peter S. Grimbacher ? Scott E. Miller ? G. Allan Samuelson ? George D. Weiblen ? Jian D. L. Yen ? Nigel E. Stork Received: 26 October 2011 / Accepted: 7 August 2012 / Published online: 12 September 2012  Springer-Verlag 2012 Abstract A key challenge in the estimation of tropical arthropod species richness is the appropriate management of the large uncertainties associated with any model. Such uncertainties had largely been ignored until recently, when we attempted to account for uncertainty associated with model variables, using Monte Carlo analysis. This model is restricted by various assumptions. Here, we use a technique known as probability bounds analysis to assess the influence of assumptions about (1) distributional form and (2) dependencies between variables, and to construct probability bounds around the original model prediction distribution. The original Monte Carlo model yielded a median estimate of 6.1 million species, with a 90 % con- fidence interval of [3.6, 11.4]. Here we found that the probability bounds (p-bounds) surrounding this cumulative distribution were very broad, owing to uncertainties in distributional form and dependencies between variables. Replacing the implicit assumption of pure statistical Communicated by Andreas Prinzing. A. J. Hamilton (&)  P. S. Grimbacher Department of Agriculture and Food Systems, Melbourne School of Land and Environment, The University of Melbourne, Dookie Campus, 940 Dookie?Nalinga Road, Dookie College, VIC 3647, Australia e-mail: andrewjh@unimelb.edu.au P. S. Grimbacher e-mail: petersg@unimelb.edu.au V. Novotny? Biology Centre, Czech Academy of Sciences, Ceske Budejovice, Czech Republic e-mail: Novotny@entu.cas.cz V. Novotny? Faculty of Science, University of South Bohemia, Branis?ovska? 31, 370 05 Ceske Budejovice, Czech Republic E. K. Waters The University of Notre Dame Australia, PO Box 944, Broadway, NSW 2007, Australia e-mail: edward.waters@nd.edu.au Y. Basset Smithsonian Tropical Research Institute, Apartado 0843-03092, Balboa, Anco?n, Panama e-mail: bassety@si.edu K. K. Benke Department of Primary Industries, 32 Lincoln Square North, Carlton, Parkville Centre, Parkville, VIC 3052, Australia e-mail: kurt.benke@dpi.vic.gov.au S. E. Miller National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA e-mail: millers@si.edu G. A. Samuelson Bishop Museum, Honolulu, HI, USA e-mail: alsam@bishopmuseum.org G. D. Weiblen Department of Plant Biology, University of Minnesota, 220 Biological Sciences Centre, 1445 Gortner Avenue, St Paul, MN 55108-1095, USA e-mail: gweiblen@umn.edu J. D. L. Yen School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia e-mail: jdyen@iinet.net.au N. E. Stork Griffith School of Environment, Griffith University, 170 Kessels Road, Nathan, QLD 4111, Australia e-mail: nigel.stork@griffith.edu.au 123 Oecologia (2013) 171:357?365 DOI 10.1007/s00442-012-2434-5 independence between variables in the model with no dependency assumptions resulted in lower and upper p-bounds at 0.5 cumulative probability (i.e., at the median estimate) of 2.9?12.7 million. From here, replacing prob- ability distributions with probability boxes, which repre- sent classes of distributions, led to even wider bounds (2.4?20.0 million at 0.5 cumulative probability). Even the 100th percentile of the uppermost bound produced (i.e., the absolutely most conservative scenario) did not encompass the well-known hyper-estimate of 30 million species of tropical arthropods. This supports the lower estimates made by several authors over the last two decades. Keywords Host specificity  Model  Monte Carlo  Uncertainty Introduction Extrapolating global estimates of tropical arthropod species richness from samples, as first proposed by Erwin (1982) and revisited by many since (e.g., Thomas 1990; Stork 1988, 1993; ?degaard 2000; Novotny? et al. 2002), is an intriguing exercise because it potentially offers a signifi- cant short-cut that would save having to count species one by one, but at the same time it is vulnerable to producing massively misleading estimates, owing ultimately to the need to base extrapolations on host specificity measure- ments made for a minute proportion of all tropical tree species. The models are typically based upon a sample of beetle species collected from one or several tree species. This is because beetles are the most common taxon, accounting for about 25 and 40 % of all described insects and species, respectively (Hammond 1992; Yeates et al. 2003). Then, by making assumptions about host specificity to trees, the number of tropical tree species in the world, the proportions of species in the canopy and ground, and the proportion of all arthropods that are beetles, one can estimate how many tropical arthropod species might exist. The model described above is a model of mean behav- iour, that is, the average state of one parameter (species richness) of a far more complicated system over time. An individual-based model, where individual species, and even individual insects, are represented as discrete units would plainly be an insurmountable undertaking. For example, in this mean-behaviour model, the host specificity is a single parameter, but in an individual-based model it could require consideration of such things as the number of individual trees per tree species and hence the size of the populations per beetle species, the evolutionary life time of individual tree species, the number of months during which each tree carries leaves to be eaten by phytophages, the number of closely related tree species that might serve as a pool of phytophagous species to colonise a focal tree species, and the niche breadth and intraspecific differenti- ation of the tree species. Until recently, all such extrapolation mean-behaviour models were purely deterministic; that is, despite the considerable uncertainties associated with the various parameters, no attempts were made to account for these. To this end, we recently published a probabilistic model (Hamilton et al. 2010, 2011), which has been seen as a significant step forward because it was the first attempt to explicitly deal with uncertainties in the extrapolation pro- cess (May 2010). In line with previous models, the model took the following form: NAi ? xc=pcgpba   nt; ?1? where NAi is the estimator of the number of tropical arthropod species under the assumption of independence between variables, x is the average effective specialisation (May 1990) of herbivorous beetle species across all tree species, c is a correction factor for non-herbivorous beetle species, pba is the proportion of canopy arthropod species that are beetles, pcg is the proportion of all arthropod spe- cies found in the canopy, and nt is the number of tropical tree species. Note the change in notation for pba and pcg from the original model; this was done because in retro- spect the original notation was potentially ambiguous and confusing (see Hamilton et al. 2010, 2011). Probability distributions were assigned to all parameters. Implementation of our original model was achieved using Latin Hypercube Sampling (LHS), a specialised form of Monte Carlo simulation wherein probability distribu- tions are sampled in a stratified random manner (McKay et al. 1979). As with any modelling technique, Monte Carlo simulation necessitates assumptions. Thus, while this was the first attempt to account for uncertainty, the model (1) made certain assumptions about distributional form used to represent uncertainty and (2) did not consider potential dependencies between variables. Before considering the relevance of assumptions about distributional form, we need to appreciate the fundamental nature of uncertainty. While various taxonomies of uncertainty have been proposed (Kahneman and Tversky 1982; Morgan and Henrion 1990; Regan et al. 2002), there are in essence only two basic forms?variability and ignorance (Casti 1990; Benke et al. 2007). Variability represents natural randomness or stochasticity and cannot be reduced, and is often called aleatory uncertainty. Igno- rance, on the other hand, is reducible and arises from numerous factors, including, inter alia, measurement error, lack of data and small sample sizes, and personal biases, and is also known as epistemic uncertainty. Theoretically, different methods are required to propagate ignorance and variability (Ferson and Ginzburg 1996). This can be 358 Oecologia (2013) 171:357?365 123 attempted in the Monte Carlo framework using a technique known as Second-Order Monte Carlo, wherein variability is represented by a probability distribution and ignorance is characterised in an outer-loop in one of a number of ways, such as alternative model scenarios or distributional shapes (Vose 2000). However, as pointed out by Regan et al. (2004), Second-Order Monte Carlo still requires a sub- jective assessment of the realistic range of input distribu- tions. In fact, the process is innately contradictory, because the greater the ignorance, the more data are required to specify bounds on the distribution. Of course, a greater amount of data should lead to narrower bounds for the variable?s distribution. This conundrum often leaves the Monte Carlo analyst with little choice but to construct a one-dimensional model, wherein variability and ignorance are confounded in simple distributions, as we noted in our original paper. For example, very little information existed for variable c in our model, a correction factor for non-herbivorous beetle species. ?degaard (2000) identified seven different studies relevant to the determination of c. There will of course be true natural variability associated with c, as it would be unreasonable to expect that the ratio of herbivorous to non- herbivorous beetle species would be constant across all tree species throughout the tropics and at all tropical locations. Likewise, ignorance emerges from the facts that the handful of studies used to estimate c use different methods, are all subject to various limitations associated with sam- pling arthropod faunas, and all have associated biases inherent with site selection (i.e., c would ideally be deter- mined from studies of randomly selected tree species at randomly selected sites across the tropics if their sole purpose was to contribute to the estimation of this variable for this model). With such limited information available, it was clearly not possible to propagate variability and ignorance sepa- rately for c or the other variables in the model (Hamilton et al. 2010). Rather, the approach taken was to consider them together using Uniform distributions. The rationale for using the Uniform distribution to represent highly uncertain environmental variables is that it is the most conservative approach (e.g., Brook et al. 2003; Mara et al. 2007). This makes intuitive sense because we have no reason to favour the selection of any value in the range over another. Upon closer inspection though, the Uniform actually makes some potentially significant assumptions about a variable. Consider c again, which covers the interval [1.79, 2.70]. The Cumulative Distribution Function (CDF) of a Uniform distribution for this interval is a per- fectly linear monotonically increasing function where the mean = mode = median = 2.25. In theory, an infinite number of distributions could describe this interval, and these would be bounded within a box defined by two vertical lines extending from 0 to 100 % cumulative probability at the minimum and maximum values. But there can be only one true distribution representing vari- ability for this interval, yet its form is unknown to us, and this ignorance needs to be expressed through allowing variation in shape. Using a single distribution (be it Uni- form, Triangular or something else) ignores shape uncer- tainty (a sub-set of ignorance), and therefore leads to an overstatement of confidence. The second problem associated with the application of Monte Carlo techniques to ecological models is that uncertainties about dependencies between variables cannot be expressed (Ferson 1996, 2002). Knowledge about dependencies is typically very poor in ecological models. Clearly, natural systems are complex and dependencies between variables are likely to exist, and these species richness estimation models are no exception. For example, the Janzen?Connell hypothesis (Janzen 1970; Connell 1971) proposes that predation on plants (in part by arthropods) is one of the mechanisms leading to the high plant richness found in the tropics. But there is also a reciprocal relationship because a greater diversity of plant resources provides opportunities for arthropods to spe- cialise over time and thus diversify (Janz et al. 2006). Therefore, over evolutionary time, plant and arthropod communities interact through positive feedback and this increases the richness of both groups. Of course, the exact details and form of such dependencies are unclear, but the fact that they could exist means that they should not be ignored. It is possible that the dependencies themselves are not stable, and are likely to change over evolutionary time and even as a function of anthropogenic changes. Highly specialised species, for example, are not necessarily des- tined for an evolutionary dead-end, and can even give rise to generalists (Colles et al. 2009). Despite the complexity of such dependencies, the time is ripe to at least introduce the concept to species richness models so that advances in evolutionary biology can be used to modify these simple sample extrapolation models. With respect to modelling dependencies, the default approach of independence, which is rarely stated explicitly, is intuitively appealing because there is usually not a clear dependency relationship between the various pairs of variables. However, as noted by Tucker and Ferson (2003), independence implies zero correlation but zero correlation does not demand independence. Furthermore, the possi- bility of higher-order dependencies should not be excluded. That is, not all dependencies will be pair-wise, between two variables: some could be multivariate. Consequently, Ferson (2002) suggests that models should start by making the assumption of dependence between all variables and at all levels, and independence should be assumed only when sound empirical information exists to support it. Vose Oecologia (2013) 171:357?365 359 123 (2000), whilst acknowledging it is a contentious issue, takes the opposing view, and suggests that one should avoid attempting to model correlation ?where there is neither a logical reason nor evidence for its existence.? In line with many Monte Carlo ecological models of systems wherein very little is known about the nature of inter-var- iable dependencies (Jonzen et al. 2002; Brook et al. 2003), our original species richness model invoked Vose?s phi- losophy. While dependencies can be specified in the Monte Carlo construct, uncertainty about their nature?magnitude and form?cannot be accommodated. In other words, ignorance about the dependencies cannot be included. Also, as noted by Ferson et al. (2004), the use of correlation coefficients to define dependencies?the typical approach used in Monte Carlo (Vose 2000)?is weak, as a depen- dency needs to be described by a complete dependency function (a copula), and several copulae can in fact have the same correlation. Finally, Monte Carlo methods do not readily allow for modelling of higher-order dependencies. Probability bounds (p-bounds) analysis necessitates neither subjective assumptions about distributional form nor the nature of dependencies, and has proved useful in ecological models, where large uncertainties are often associated with these properties (Ferson 2002; Regan et al. 2002). Briefly, p-bounds analysis deals with classes of distributions rather than individual distributions (Frank et al. 1987; Williamson and Downs 1990). It not only offers a method for computing the bounds for a given variable but also enables the convolution (e.g., multipli- cation, division, addition, subtraction or exponentiation) of these distributional classes, and thus propagation of igno- rance and variability together, through the model. While confidence intervals or credible intervals set bounds around a statistic for a variable, effectively as a function of its distribution, p-bounds are bounds surrounding the proba- bility distribution itself. P-bounds must be expressed in terms of the CDF, not probability density or mass func- tions. In essence, p-bounds analysis can be seen as a highly conservative technique for determining the limits of an infinite array of possible CDFs, and it has been described by Burgman (2005) simply as a more honest approach because the analyst is not forced to make unjustified assumptions to satisfy a mathematical framework (cf. Monte Carlo). Philosophically, p-bounds analysis involves specifying total possible uncertainty and then explicitly removing it, whereas Monte Carlo approaches require uncertainty to be explicitly included. Ferson (2002) describes p-bounds analysis as a useful method for pro- viding ?quality assurance for Monte Carlo results?. Regan et al. (2002), for example, found for a food-web model that the p-bounds analysis was useful for checking the plausi- bility of a Monte Carlo model. It is also worth noting that they found the p-bounds envelope on the CDF to be markedly broader than one generated by a second-order Monte Carlo analysis. Here we use probability bounds analysis to explore the implications of assumptions on the independence of vari- ables and distributional forms used to account for uncer- tainties made by a previous model on the global species richness estimate for tropical arthropods. Materials and methods Hamilton et al. (2010, 2011) presented two models, A and B, which were respectively based on the estimated number of tree species in the tropics and the number of tropical tree genera in New Guinea alone. Here, p-bounds modelling is applied to Model A only (Eq. 1), as this is overwhelmingly the most common approach to the problem (Erwin 1982; ?degaard 2000; Stork 1988; Thomas 1990). Furthermore, the two models are otherwise analogous. Model A is described in detail in Hamilton et al. (2010), with termi- nology specifically appropriate to the LHS methodology used. For the LHS implementation of Model A the following Uniform distributions were used for four variables: c = 1.79?2.70, pcg = 0.25?0.66, pba = 0.18?0.33, nt = 43,000? 50,000. The variable x is the product of nk, the number of herbivorous canopy beetle species on tree species k (k ? 1; 2; . . .; l), and fk, the proportion of the beetle species effectively specialised on that species (see Hamilton et al. 2010 for calculation of fk). A distribution for x was then obtained by producing 500,000 non-parametric bootstrap estimates of nkfk. The reader is also directed to the published corrigendum (Hamilton et al. 2011). It is also important to note that x represents an estimate of the average effective spe- cialisation rather than the effective specialisation of a given tree species, nkfk. A distribution of x is what is required for this model. Drawing realisations from the distribution of nkfk would result in a distribution of imprecise estimates of tropical arthropod species richness (i.e., where each estimate is based on a single tree species), rather than a distribution of precise estimates with each being based upon the suite of species. This potential pitfall is common in uncertainty models, as descri- bed by Karavarsamis and Hamilton (2010) in a health risk context. Calculations in p-bounds analysis are made on p-boxes. A p-box is defined as the class of CDFs (F(y)) bounded by a pair of CDFs, F?y? and F?y?, such that F?y? F?y? F?y?. In our models, two types of p-boxes were constructed for each variable. First, the entire cumulative probability space within the possible range for the variables was rep- resented using ?minimum?maximum? (min?max) boxes. This is superficially, and perhaps intuitively, analogous to the use of a Uniform distribution in a Monte Carlo analysis, 360 Oecologia (2013) 171:357?365 123 although it is in fact quite different because it permits all possible cumulative distributions between specified 0th and 100th cumulative percentiles. Second, ?empirical histo- gram? p-boxes were used to include all the available esti- mates of variables, not just the minima and maxima. In line with the original paper, we used the review of ?degaard (2000) to obtain estimates of c, pcg and pba. It is worth noting that the various studies listed by ?degaard are not all directly comparable, owing to different sampling techniques, and they do not always explicitly represent the variable of interest, but they characterise the best available information and hence have been used by many authors in extrapolating tropical arthropod species richness estimates (May 1990; Thomas 1990; Stork 1988, 1993; Novotny? et al. 2002). Therefore, the following p-boxes were con- structed, where MM and EH, respectively, denote variables for which min?max and empirical histogram boxes have been defined, and ?minmax? and ?histogram? is the respective coding terminology used by Ferson (2002): cMM = minmax (1.79?3.37), cEH = histogram (1.79, 2.70, 1.79, 2.13, 2.27, 2.44, 2.50, 2.70), pcgMM = minmax (0.25, 0.66), pcgEH = histogram (0.25, 0.66, 0.25, 0.33, 0.5, 0.66), pbaMM = minmax (0.18, 0.33), pbaEH = histogram (0.18, 0.33, 0.18, 0.22, 0.23, 0.23, 0.33), and ntMM = minmax (43,000, 50,000). Note that there were only two estimates available for nt (see Hamilton et al. 2010), hence only a min?max box is required. All the empirical histogram p-boxes are shown in Fig. 1. Min?max p-boxes are not shown because they are simply vertical lines extending from zero to 1 cumulative probability at the minima and maxima. x is the only variable for which p-bounds were not constructed (Fig. 1). Bootstrapping is a sampling proce- dure, and therefore the resultant distribution will converge to the Normal with increasing sample size, owing to the Central Limit Theorem. P-bounds are appropriate when uncertainty about distributional form exists, but that is not the case here. As noted in Hamilton et al. (2010), the dataset we used for determining the number of herbivorous beetle species effectively specialised on a given tree spe- cies is substantially larger than any other available dataset for this parameter. In any case, it is necessary to use a method that is congruent with the original model, so that the implications of the assumptions stated above can be assessed. Given that 500,000 bootstrap replicates were taken, the CDF of x is Normal. This CDF jointly expresses variability and ignorance. Empirical p-bounds constructed from multiple bootstrap replicates simply converge to the CDF as the number of replicates increases, and are meaningless in the context of representing natural variation and ignorance, reflecting nothing other than the effect of computational sample size. The original LHS estimate was recalculated (NAi, Eq. 1). Additionally the following estimators of tropical arthropod species richness were solved for: Fig. 1 a Cumulative distribution function of the average effective specialisation across all tree species (x). b?d Empirical probability bounds (solid lines) for the cumulative distribution functions of the correction factor for non-herbivorous arthropods (cEH), the proportion of all arthropods found in the canopy (pcgEH), and the proportion of beetles that are arthropods (pbaEH), respectively. The dashed lines in (b?d) represent the cumulative distribution function for uniform distributions defined as follows: c = Uniform (1.79, 2.70), pcg = Uniform (0.25, 0.66), pba = Uniform (0.18, 0.33) Oecologia (2013) 171:357?365 361 123 NAd ? xc  pcgpba   nt; ?2? where the subscript d denotes that no assumptions are made about the nature of dependencies between variables and all variables are represented by the distributions described for Eq. 1, NAiMM ? x j jcMM =j jpcgMM j jpbaMM   j jntMM; ?3? where all variables are assumed to be statistically independent (i), as marked by the pipes (paired vertical lines) either side of each operator, and most variables are represented by min?max p-boxes, NAdMM ? xcMM  pcgMMpbaMM   ntMM; ?4? NAiEH ? x j jcEH =j jpcgEH j jpbaEH   j jntEH; and ?5? NAdEH ? xcEH  pcgEHpbaEH   ntEH: ?6? P-boxes were specified and convolved using RAMAS Risc Calc 4.0 (Ferson 2002). Distributions with infinite tails, such as the Normal, which was used for x, cannot be convolved in p-bounds analysis, and therefore truncation was enforced at 0.005 and 0.995 cumulative probability. Truncation was not necessary for the distributions with finite bounds (i.e., minimum?maximum and empirical histogram). Empirical histogram bounds were constructed using Kolmogorov?Smirnov confidence limits of 95 %. The mathematics behind convolving p-boxes is described elsewhere (Frank et al. 1987; Williamson and Downs 1990). Copulae were used to convolve distributions in solving Eqs. (2)?(6). A copula is a function that describes the dependence relationship between multiple variables by transforming the marginal distributions of each variable to uniform distributions. This works because any variable in the model, y, can be represented by a generalised inverse, v = F-1(u), where F-1 is an inverse CDF and u is a uni- formly distributed random variable. Thus a copula func- tion, C, is defined as a function, f, of the generalised inverses U ? u1; . . .; ud of d variables, Y ? y1; . . .; yd, in the model so that C?U? ? f F11 ?u1?; . . .; F1d ?ud?   : ?7? Independence is a special copula function. Where indepen- dence is assumed C?U? ? Qd i?1 u1; which is identical to the form of Eq. (1). Where no assumptions were made about dependencies, Ferson?s et al. (2004) approach of convolving p-boxes within Fre?chet bounds was used. Let Cd?U? ? Xd i?1 C?U? ! ; ?8? where C(U) is a copula fitting within the lower Fre?chet copula bounds CF?U? ? max Xd i?1 ui !  1; 0 ! : ?9? Thus both the lower and upper bounds of the dependency function governing Y enclose a copula describing the depen- dencies in terms of the general inverse U. Both the upper and the lower bounds enclosing this copula can be expressed in terms of the Fre?chet lower bounds, which can then be used to elucidate the nature of the dependency function of Y (Ferson et al. 2004). All possible copulae describing Y are enclosed within the Fre?chet bounds such that no assumptions about dependency need be made. Kendall?s grade correlation was then used to describe the nature of the dependencies identified through this process (Ferson et al. 2004). Results The Monte Carlo-LHS model yielded a median estimate of 6.1 million species, with a 90 % confidence interval of [3.6, 11.4] million (Fig. 2a). Simply replacing the assumption of pure statistical independence between vari- ables in the model with no dependency assumptions resulted in reasonably broad p-bounds (Fig. 2b), with lower and upper bounds at 0.5 cumulative probability (i.e., at the median estimate) of 2.9?12.7 million. Bounding the input variables had an even larger effect on the bounds for the prediction. In the case of min?max bounds, the probability envelope was so wide and steep that there was negligible difference with respect to the dependency assumptions, with pure independence and no dependency assumptions yielding bounds at 0.5 cumulative probability of 2.35?19.7 million and 2.4?20.0 million, respectively (Fig. 2c, d). Furthermore, the shapes of the bounds were almost iden- tical in both these cases. The use of empirical bounds on the input variables had negligible impact relative to the min?max bounds, and, in fact, for the case of no depen- dency assumptions, the lower and upper bounds at 0.5 cumulative probability were identical (2.4?20.0 million) to those for the parallel min?max case, but the bounds did vary slightly in shape from those produced from the min? max model (Fig. 2f). Likewise, under the assumptions of pure statistical independence the empirical bounding approach produced bounds of slightly different shape to the min?max model, but the values at 0.5 cumulative proba- bility were very similar (2.7?18.4 million) (Fig. 2e). Discussion Probability bounds analysis was used to assess the plausi- bility of a Monte Carlo model of tropical arthropod species 362 Oecologia (2013) 171:357?365 123 richness. While broad, the bounds rule out the possibility of estimates of 30 million species or greater, with the 100th percentile of the right-hand bound?i.e., the absolutely most conservative scenario?being \30 million in each case. P-bounds define the cumulative probability space in which the true distribution will lie, but it is important to note that each of the infinite number of CDFs within this space is not equally likely. In fact, it could be argued that this approach is markedly too conservative, as it even allows for highly unlikely distribution forms, such as multimodal, that are probably inappropriate for the parameters in this model (or indeed the prediction). Inter- estingly, the CDF of the Monte Carlo model was always situated toward the left-hand side of cumulative probability space defined by the p-bounds, regardless of the depen- dency or distributional form assumptions made in assigning these bounds. The reason for this is unknown. Removing the assumption of independence between variables in the original model and replacing it with no dependency assumptions resulted in reasonably broad p-bounds (Fig. 1b). While the state of knowledge about the nature of dependencies between the model variables is very poor, it is reasonable to expect that some dependencies will indeed exist. For example, the richness and specialisation of insects is not independent of tropical tree species rich- ness, as illustrated through the Janzen?Connell hypothesis (Janzen 1970; Connell 1971), but it is difficult to know the strength of the relationship because the relative contribu- tions of phylogenetic conservativeness and geographic contingency and local mass effects in the assemblage of communities remain unclear (Go?ner et al. 2009). It would also be reasonable to hypothesise that the proportion of arthropods?including beetles?that are herbivores is likely to be dependent upon plant species richness. On the whole, plant-feeding arthropods are more specialised and constrained in the diversity of their resource use than non- plant-feeding species, such as carnivores and fungivores (but not parasitoids) (Ross et al. 1982). It could be that this is because dealing with plant physical and chemical her- bivore deterrents is more digestively demanding and thus requires a more specialised digestive system. Therefore, it could be hypothesised that over time increasing tree diversity would likely lead to more herbivores and alter the ratio of herbivores to non-herbivores. The relationship between tree species richness and the ratio of herbivores to non-herbivores logically leads to the possibility of Fig. 2 a Cumulative distribution function for the original Monte Carlo estimator NAi (dotted curved line) and associated 5th and 95th confidence limits (dotted vertical lines). The filled circle marks the median. b?f Probability bounds (solid lines) for the estimation of tropical arthropod species richness using the following estimators: b NAd the original estimator but with no dependency assumptions; c NAiMM all variables assumed to be statistically independent from each other and represented with min?max p-boxes; d NAdMM no dependency assumptions, min?max p-boxes; e NAiEH statistical independence; empirical histogram p-boxes; f NAdEH no dependency assumptions, empirical histogram p-boxes. The dotted line in each plot represents the cumulative distribution function for NAi Oecologia (2013) 171:357?365 363 123 secondary dependencies with the canopy to ground ratio, and the proportion of non-beetle arthropods. The canopy to ground ratio may be dependent upon plant species diversity if there are more plant-feeding arthropods in the canopy (e.g., Grimbacher and Stork 2007). Arthropods associated with the ground are likely to show lower levels of specialisation (e.g., Crutsinger et al. 2008; Donoso et al. 2010). Thus, if there are more plant- feeding arthropods in the canopy, then over evolutionary time an increase in tree species richness might be expected to lead to increasing arthropod richness in the canopy at a greater rate, relative to the ground. For similar reasons, it may be that the proportion of non- beetle arthropods found in the canopy relative to the ground is related to tree species richness. Unlike the Coleoptera, which are highly diverse in their feeding ecology, species from most insect orders are likely to have one main mode of feeding (Ross et al. 1982). The Lepidoptera, for example, are pre- dominantly herbivorous, while the non-ant Hymenoptera are largely predatory or parasitoid. Because not all arthropod orders contribute equally to global species richness (Nielsen and Mound 2000), and herbivores are likely to have a much tighter association to tree species richness than non-herbi- vores, the ratio of canopy to ground diversity is likely to alter the relative contribution of non-beetle arthropods. Of course, these are just some examples of potential dependencies between variables typically used in an extrapo- lation model of species richness. The exact form of these dependencies is unknown; there are likely to be other depen- dencies, including those of higher-order. The theoretical arguments given above cover only some processes that could influence dependencies?there may indeed be other processes negating, antagonising, or complementing these. It is for these reasons that Fre?chet?s (1935) copula was used to convolve the distributions, as it makes no assumptions about the nature of the dependencies. Other copulae can be used to specify other dependencies, such as perfect, opposite, positive, negative, straight-positive, and straight-negative, and these can be implemented in RiscCalc (Ferson et al. 2004). With further ecological and evolutionary insight into the nature of the dependencies, these less conservative copulae could be used in such models. But before any gains are to be made in narrowing the bounds through this means, the more problematic issue of uncertainty associated with the model variables needs to be addressed, as discussed in the original manuscript (Hamilton et al. 2010). This had a larger effect on the breadth of the bounds than the independence assumption (Fig. 2b, c, e). Since Erwin presented the extrapolation approach to esti- mating tropical arthropod species richness, overwhelmingly the debate has centred around what values best represent the variables (Erwin 1988; Thomas 1990; Stork 1988, 1993; ?degaard 2000), but this thinking needs to be broadened to consider the relevance of potential dependencies between variables and the relative merits of different technical approa- ches to uncertainty modelling in this context. Furthermore, other methods of estimating tropical arthropod species richness would benefit from more thorough use of uncertainty model- ling, including, inter alia, extrapolations from known faunas and regions, methods using ecological models, eliciting tax- onomists? views, and species description rates (Stork 1993). Mora et al. (2011) recently made a step in this direction through accommodating uncertainty in their taxonomic-level based global species richness model, which produced a median esti- mate of 8.7 million eukaryotic organisms on Earth (?1.3 mil- lion SE), with 6.5 million of these being terrestrial, which accords well with our original median estimate of 6.1 million tropical arthropods (Hamilton et al. 2010, 2011). Another recently published model (Costello et al. 2012), based on species description rates, also accounted for uncertainty, and produced a median estimate of 490,960 (95 % CI = 449,010, 477,990) terrestrial species remaining to be described, which equates to only 1.6?1.7 million terrestrial species existing globally, a much lower prediction than that of Mora et al. (2011) and Hamilton et al. (2010, 2011). The variation in such models highlights the need to consider uncertainty surrounding this important question even more broadly, that is, not just within models but between models. While tropical arthropods species are of primary interest on a global scale, given their high richness and the potentially huge numbers of undescribed species, improved uncertainty modelling can contribute also to other large-scale species richness estimates, be it European marine species (Wilson and Costello 2005) or flowering plants globally (Joppa et al. 2011). Every statistical approach has something to offer but equally has its limitations; the next step in tackling this important question will be to combine models and their associated uncertainties, perhaps using techniques such as Bayesian modelling averaging. Acknowledgments Cindy Hauser provided useful technical com- ments on a draft of this manuscript. The host specificity studies in New Guinea, upon which this model draws substantially, were supported by the National Science Foundation (USA) (DEB-0841885), Christensen Fund (USA), Grant Agency of the Czech Republic (206/09/0115), Czech Academy of Sciences, the Swedish Natural Science Research Council, Czech Ministry of Education (CZ.1.07/2.3.00/20.0064, LH11008), Otto Kinne Foundation, Darwin Initiative (UK) (19-008), International Centre of Insect Physiology and Ecology (ICIPE) and Bishop Museum. Parataxonomists in New Guinea are thanked for their assistance and are listed in Novotny? et al. (2002). This paper is dedi- cated to the late Ken Hamilton, the consummate logician and giver. References Benke KK, Hamilton AJ, Lowell K (2007) Uncertainty analysis and risk assessment in the management of environmental resources. Aust J Environ Manag 14:243?249 Brook BW, Sodhi NS, Ng PKL (2003) Catastrophic extinctions follow deforestation in Singapore. Nature 424:420?423. doi: 10.1038/nature01795 364 Oecologia (2013) 171:357?365 123 Burgman M (2005) Risks and decisions for conservation and environmental management, 1st edn. Cambridge University Press, Cambridge Casti JL (1990) Searching for Certainty, 1st edn. William Morrow and Company, London Colles A, Liow LH, Prinzing A (2009) Are specialists at risk under environmental change? Neoecological, paleoecological and phylogenetic approaches. Ecol Lett 12:849?863. doi:10.1111/ j.1461-0248.2009.01336.x Connell JH (1971) On the role of natural enemies in preventing competitive exclusion in some marine animals and in rain forest trees. In: Den Boer PJ, Gradwell G (eds) Dynamics of populations. Centre for Agricultural Publishing and Documen- tation, Wageningen, pp 298?312 Costello MJ, Wilson S, Houlding B (2012) Predicting total global species richness using rates of species description and estimates of taxonomic effort. Syst Biol (in press) Crutsinger GM, Reynolds WN, Classen AT, Sanders NJ (2008) Disparate effects of plant genotypic diversity on foliage and litter arthropod communities. Oecologia 158:65?75. doi:10.1007/ s00442-008-1130-y Donoso DA, Johnston MK, Kaspari M (2010) Trees as templates for tropical litter arthropod diversity. Oecologia 164:201?211. doi: 10.1007/s00442-010-1607-3 Erwin TL (1982) Tropical forests: their richness in Coleoptera and other arthropod species. Coleopt Bull 36:74?75 Erwin TL (1988) The tropical forest canopy: the heart of biotic diversity. In: Wilson EO (ed) Biodiversity. National Academy Press, Washington, pp 123?129 Ferson S (1996) What Monte Carlo methods cannot do. Hum Ecol Risk Assess 2:990?1007. doi:10.1080/10807039609383659 Ferson S (2002) RAMAS Risk Calc 4.0 software: risk assessment with uncertain numbers. Lewis, Boca Raton Ferson S, Ginzburg LR (1996) Different methods are needed to propagate ignorance and variability. Reliab Eng Syst Safe 54:133?144. doi:10.1016/S0951-8320(96)00071-3 Ferson S, Nelsen RB, Hajagos J, Berleant DJ, Zhang J, Tucker WT, Ginzburg LR, Oberkampf WL (2004) Dependence in probabi- listic modelling, Dempster?Shafer theory, and probability bounds analysis. Sandia National Laboratories, New Mexico. http://www.ramas.com/depend.pdf Frank MJ, Nelsen RB, Schweizer B (1987) Best-possible bounds for the distribution of a sum?a problem of Kolmogorov. Probab Theory Rel 74:199?211. doi:10.1007/BF00569989 Fre?chet M (1935) Ge?ne?ralisations du theorem des probabilities totales. Fund Math 25:379?387 Go?ner MM, Chao A, Bailey RI, Prinzing A (2009) Native fauna on exotic trees: phylogenetic conservatism and geographic contin- gency in two lineages of phytophages on two lineages of trees. Am Nat 173:599?614 Grimbacher PS, Stork NE (2007) Vertical stratification of feeding guilds and body size in beetle assemblages from an Australian tropical rainforest. Austral Ecol 32:77?85. doi:10.1111/j.1442-9993.2007.01735.x Hamilton AJ, Basset Y, Benke KK, Grimbacher PS, Miller SE, Novotny0 V, Samuelson GA, Stork NE, Weiblen GD, Yen JDL (2010) Quantifying uncertainty in tropical arthropod species richness estimation. Am Nat 176:90?95. doi:10.1086/652998 Hamilton AJ, Basset Y, Benke KK, Grimbacher PS, Miller SE, Novotny0 V, Samuelson GA, Stork NE, Weiblen GD, Yen JDL (2011) Correction: quantifying uncertainty in tropical arthropod species richness estimation. Am Nat 177:544?545 Hammond PM (1992) Species inventory. In: Groombridge B (ed) Global biodiversity: status of the earth?s living resources. Chapman and Hall, London, pp 17?39 Janz N, Nylin S, Wahlberg N (2006) Diversity begets diversity: host expansions and the diversification of plant-feeding insects. BMC Evol Biol 6:1?10. doi:10.1186/1471-2148-6-4 Janzen DH (1970) Herbivores and number of tree species in tropical forests. Am Nat 104:501?528 Jonzen N, Cardinale M, Ga?rdmark A, Arrhenius F, Lundberg P (2002) Risk of collapse in the eastern Baltic cod fishery. Mar Ecol Prog Ser 240:225?233 Joppa LN, Roberts DL, Pimm SL (2011) How many species of flowering plants are there? Proc R Soc Lond B 278:554?559 Kahneman D, Tversky A (1982) Variants of uncertainty. In: Kahneman D, Slovic P, Tversky A (eds) Judgements under uncertainty: heuristics and biases. Cambridge University Press, Cambridge, pp 509?520 Karavarsamis N, Hamilton AJ (2010) Estimators of annual infection risk. J Water Health 80:365?373. doi:10.2166/wh.2010.045 Mara DD, Sleigh PA, Blumenthal UJ, Carr RM (2007) Health risks in wastewater irrigation: comparing estimates from quantitative microbial risk analyses and epidemiological studies. J Water Health 5:39?50. doi:10.2166/wh.2006.055 May RM (1990) How many species? Philos Trans R Soc Lond B 330:293?304. doi:10.1098/rstb.1990.0200 May RM (2010) Tropical arthropod species, more or less. Science 329:41?42. doi:10.1126/science.1191058 McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239?245 Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B (2011) How many species are there on earth and in the ocean. PLoS Biol 9:e1001127 Morgan MG, Henrion M (1990) Uncertainty: a guide to dealing with uncertainty in quantitative risk and policy analysis, 1st edn. Cambridge University Press, Cambridge Nielsen ES, Mound LA (2000) Global diversity of insects: the problems of estimating numbers. In: Raven PH, Williams T (eds) Nature and human society: the quest for a sustainable world. National Academy Press, Washington, pp 212?222 Novotny? V, Basset Y, Miller SE, Weiblen GD, Bremer B, Cizek L, Drozd P (2002) Low host specificity of herbivorous insects in a tropical forest. Nature 416:841?844. doi:10.1038/416841a ?degaard F (2000) How many species of arthropods? Erwin?s estimate revised. Biol J Linn Soc 71:583?597. doi:10.1006/bijl.2000.0468 Regan HM, Hope BK, Ferson S (2002) Analysis and portrayal of uncertainty in a food web exposure model. Human Ecol Risk Assess 8:1757?1777 Regan HM, Ferson S, Berleant D (2004) Equivalence of five methods for bounding uncertainty. Int J Approx Reason 36:1?30 Ross HH, Ross CA, Ross JPR (1982) A textbook of entomology, 4th edn. Wiley, New York Stork NE (1988) Insect diversity: facts, fiction and speculation. Biol J Linn Soc 35:321?337. doi:10.1111/j.1095-8312.1988.tb00474.x Stork NE (1993) How many species are there? Biodivers Conserv 2:215?232. doi:10.1007/BF00056669 Thomas CD (1990) Fewer species. Nature 347:237. doi:10.1038/ 347237a0 Tucker WT, Ferson S (2003) Probability bounds analysis in environmental risk assessment, 1st edn. Applied Biomathemat- ics, Setauket Vose D (2000) Risk analysis: a quantitative guide, 2nd edn. Wiley, Chichester Williamson RC, Downs T (1990) Probabilistic arithmetic I: numerical methods for calculating convolutions and dependency bounds. Int J Approx Reason 4:89?158 Wilson SP, Costello MJ (2005) Predicting future discoveries of European marine species by using a nonhomogeneous renewal process. App Stat 54:897?918 Yeates DK, Meier R, Wiegmann BM (2003) Phylogeny of true flies (Diptera): a 250 million year old success story in terrestrial diversification. Entomol Abh 61:170?172 Oecologia (2013) 171:357?365 365 123