Environ Ecol Stat (2011) 18:709?733 DOI 10.1007/s10651-010-0158-4 Geoadditive regression modeling of stream biological condition Matthias Schmid ? Torsten Hothorn ? Kelly O. Maloney ? Donald E. Weller ? Sergej Potapov Received: 19 December 2009 / Revised: 12 July 2010 / Published online: 16 November 2010 ? Springer Science+Business Media, LLC 2010 Abstract Indices of biotic integrity have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new tech- nique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addi- tion, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained. Electronic supplementary material The online version of this article (doi:10.1007/s10651-010-0158-4) contains supplementary material, which is available to authorized users. M. Schmid ( B ) ? S. Potapov Institut f?r Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universit?t Erlangen-N?rnberg, Waldstra?e 6, 91054 Erlangen, Germany e-mail: matthias.schmid@imbe.med.uni-erlangen.de T. Hothorn Institut f?r Statistik, Ludwig-Maximilians-Universit?t M?nchen, Ludwigstra?e 33, 80539 Munich, Germany K. O. Maloney ? D. E. Weller Smithsonian Environmental Research Center, 647 Contees Wharf Rd., P.O. Box 28, Edgewater, MD 21037-0028, USA 123 710 Environ Ecol Stat (2011) 18:709?733 Keywords Proportional odds model ? Gradient boosting ? Geoadditive regression ? Stream biological condition ? Maryland Biological Streams Survey 1 Introduction In view of the growing impact of humans on their natural environment, conserving and managing small streams and their watersheds have become important. Policy makers and land managers must assess the ecological effects of their decisions on streams, but also have to investigate the impacts of stream degradation on human health and the quality of life. For these reasons, a detailed understanding of the relationship between anthropogenic stressors and stream ecosystems is essential (Cushing and Allan 2001; USEPA 2006; Maloney et al. 2009). To develop that understanding, ecologists and statisticians need to quantify how watershed characteristics affect stream biological condition. Because small streams are numerous, assessing the biological condition of all streams in a landscape would be logistically impractical and cost prohibitive. It is therefore necessary to develop predictive models for site-specific stream condition using data from a limited number of sample sites. Ideally, those models would both quantify the effects of anthropogenic stressors on stream condition and accurately predict biological condition at unsurveyed locations (USEPA 2006). In recent years, a wide range of statistical tools to characterize and to model the condition of small streams have been developed (see, e.g., Barker et al. 2006; Collier 2009; Maloney et al. 2009, or Cooper 2009 for recent studies in this field). The responses of streams to anthropogenic stress are often examined using biologi- cal metrics that describe biological conditions from structural and functional measures of the biological community (Karr 1991; Barbour et al. 1999). However, singlemetrics only measure a single feature of the community (e.g., number of taxa or diversity) and may not capture the effects of multiple stressors. Therefore, stream assessments usu- ally compile several singlemetrics that are selected a priori to relate stream impairment to anthropogenic stress and then combine those metrics into a single multimetric index of biotic integrity (?IBI?, Karr et al. 1986; Schleiger 2000; Southerland et al. 2005). These IBI indicators can then be statistically related to watershed-specific predictor variables using modeling approaches such as linear or ordinal regression, principal component analysis, or tree-based methods such as CART and random forests. In this paper, we address the problem of developing predictive models for indices of biotic integrity for fish (FIBI) and benthic macroinvertebrates (BIBI). Both indices have become widely established tools for characterizing stream biological condition (Karr et al. 1986; Barbour et al. 1999; Southerland et al. 2005). When modeling FIBI and BIBI indicators, the following key issues need to be addressed: 1. IBI indicators are typically evaluated on an ordinal scale (e.g., using categories ranging from ?poor condition? to ?very good condition?). Although it is possi- ble to use linear regression methods to model ordinal indicators, ordinal regres- sion models such as the proportional odds model are a more appropriate choice (McCullagh 1980; Agresti 2002; Bigler et al. 2005). However, if maximum likeli- hood estimation is used to fit the model, and a large number of predictor variables are considered, proportional odds models tend to overfit the data. Usually, this 123 Environ Ecol Stat (2011) 18:709?733 711 leads to a decrease in prediction accuracy. On the other hand, heuristic strategies to control the number of predictors are often biased and imprecise (Rawlings et al. 1998). 2. Stream condition is affected by many factors that are often highly correlated. Moreover, spatial correlation is usually evident in ecological data (Peterson and Urquhart 2006; Gelfand 2007). A statistical model must be able to identify the most important factors and to account for spatial correlation in the data. In addi- tion, prediction models should be able to represent nonlinear relationships that often exist between predictors and indicators of stream biological condition. 3. It is well-known that maximizing prediction accuracy does not necessarily go hand in hand with finding a statistical model that is easy to interpret. Common examples are statistical learning techniques such as bagging or random forests (Breiman 2001; Cutler et al. 2007), which yield ?black-box? predictions that are typically accurate but lack interpretability. This is not desirable in situations where effects of predictor variables need to be quantified. The aim of this paper is to develop a statistical method for modeling IBI indicators that simultaneously addresses all issues outlined above. Following Agresti (2002), we use the proportional odds model framework to accomodate the ordinal structure of IBI indicators (issue 1). To avoid overfitting the data, however, we do not use classi- cal maximum likelihood estimation to obtain model estimates but a component-wise gradient boosting approach (for an overview of boosting methods, see B?hlmann and Hothorn 2007). Because component-wise gradient boosting has a built-in mechanism for variable selection and shrinkage of estimates, the method can be used to obtain regularized fits of many types of statistical models. Consequently, heuristic techniques for variable selection and model choice are not needed. In recent years, various authors have shown that gradient boosting can be modified such that prediction accuracy is optimized while, in addition, a meaningful interpreta- tion of the model estimates is possible (Friedman et al. 2000; B?hlmann and Yu 2003; B?hlmann and Hothorn 2007; Kneib et al. 2009). Regarding issue 3, this is exactly what one wants to achieve: The structure and the interpretability of the proportional odds model is preserved while, in contrast to maximum likelihood estimation, predic- tion accuracy is maximized by fitting the model in a regularized way. Most notably, by using penalized regression splines to model effects of covariates, nonlinear rela- tionships and spatial information can be easily incorporated into the prediction model. This is important with respect to issue 2, cf. Kneib et al. (2008, 2009). While component-wise gradient boosting has become an established tool to fit con- tinuous and binary data, it has not been possible so far to use boosting methods for fitting proportional odds models. The reason for this is that the boosting algorithms considered by B?hlmann and Hothorn (2007) do not allow for the estimation of scale parameters. The proportional odds model, however, involves the constrained estima- tion of an ordered set of threshold parameters that have to be estimated simultaneously with the othermodel parameters. To take this problem into account, we construct a new boosting algorithm that combines the methods considered by B?hlmann and Hothorn (2007) with an estimation approach suggested by Schmid et al. (2010). With the latter approach, it is possible to adapt boosting algorithms to model families depending on 123 712 Environ Ecol Stat (2011) 18:709?733 a set of scale parameters. As will be shown, the method by Schmid et al. (2010) can be re-formulated such that fitting a proportional odds model via boosting techniques is feasible. We will analyze IBI data from the Maryland Biological Streams Survey (MBSS) to demonstrate that the new algorithm is an efficient modeling tool for the biological assessment of small streams and their watersheds. Boosting predictions of FIBI and BIBI indicators are similar to predictions obtained from other established statistical techniques (see, e.g., Maloney et al. 2009), but spatial covariate patterns are detected and model estimates can be interpreted in a more meaningful way. This is possible because the structure of the proportional odds model allows for inspection and visu- alization of marginal predictor effects. As a consequence, the model can be used both for extrapolating estimates of stream biological condition to unsurveyed sites and exploring the determinants of biotic integrity. The rest of the paper is organized as follows: In Sect. 2, the new algorithm is presented, along with a number of technical details involved in choosing appropriate tuning parameters. The characteristics of the algorithm are demonstrated in Sect. 3. Here, the new method is benchmarked against other regression techniques, and an analysis of the MBSS data is carried out. A summary and discussion of the main findings of the paper is given in Sect. 4. Additional figures and technical details are presented in the Appendix and the Web Appendix of the paper. 2 Methods Proportional odds model Let Y be an IBI outcome with K ordered categories and denote the vector of predictor variables by X = (X1, . . . , Xp). Let (X1, Y1), . . . , (Xn, Yn) be a set of independent realizations of (X, Y ). Define X := (X1, . . . , Xn) and Y := (Y1, . . . , Yn). The proportional odds model is given by P(Y ? k) = 1 1 + exp( f (X) ? ?k) , k = 1, . . . , K , (1) where f = f (X) is a prediction function depending on the predictor variables and ? ? < ?1 < ? ? ? < ?K?1 < ?K = ? (2) is a set of threshold values that has to be estimated simultaneously with f . In many applications, f is restricted to being a linear function of the covariates (see Agresti 2002). To take nonlinear predictor effects into account, we will use a more flexible approach: f will be modeled as the sum of (possibly nonlinear) marginal prediction functions f1(X1), . . . , f p(Xp), i.e., f (X) ? f (X1, . . . , Xp) = p ? j=1 f j (X j ) (3) 123 Environ Ecol Stat (2011) 18:709?733 713 (cf. Kneib et al. 2009).With this approach, themodel has essentially the same structure as a generalized additive regression model (Hastie and Tibshirani 1990). From (1) we obtain P(Y = 1|X) = 1 1 + exp( f ? ?1) , P(Y = 2|X) = 1 1 + exp( f ? ?2) ? 1 1 + exp( f ? ?1) , . . . P(Y = K ? 1|X) = 1 1 + exp( f ? ?K?1) ? 1 1 + exp( f ? ?K?2) , P(Y = K |X) = 1 ? 1 1 + exp( f ? ?K?1) , (4) which allows for specifying the log-likelihood of the proportional odds model (see Appendix A). By definition, the probability of observing a large outcome category increases with the magnitude of the estimates of f j , j = 1, . . . , p. For two sites with covariate vectors X1 and X2, (4) implies that the log ratio of cumulative odds does not depend on the category k under consideration: log ( P(Y ? k|X1)/P(Y > k|X1) P(Y ? k|X2)/P(Y > k|X2) ) = f (X2) ? f (X1). (5) Equation (5) is the well-known ?proportional odds assumption? which leads to esti- mates that are interpretable in terms of cumulative odds ratios. Component-wise gradient boosting As stated in the introduction, overfitting of the data can be avoided if component-wise boosting is used to fit the proportional odds model. In the following, we will adapt the component-wise gradient boosting algorithm considered by B?hlmann and Hothorn (2007) to the proportional odds model specified above. In the boosting framework, the aim is to estimate the ?optimal? prediction func- tion f ? and the ?optimal? set of threshold values ?? := (??1 , . . . , ??K?1) defined by ( f ?, ??) := argmin f,? E Y ,X [?(Y , f (X), ?)] , (6) where the loss function ? is assumed to be differentiable with respect to f . In case of the proportional odds model, it is a natural choice to set ? equal to the negative log-likelihood derived from (4). The full log-likelihood function and its derivative are given in Appendix A. Instead of minimizing the theoretical mean given in (6), we consider the empir- ical risk R := ?n i=1 ?(Yi , f (Xi ), ?) and use the following boosting algorithm to minimize the R over f and ? : 123 714 Environ Ecol Stat (2011) 18:709?733 1. Initialize the n-dimensional vector ?f [0] and the threshold parameter estimates ? ? [0] 1 , . . . , ? ? [0] K?1 with offset values. 2. For each of the predictor variables specify a base-learner, i.e., a regression esti- mator with one input variable and one output variable. Set m = 0. 3. Increase m by 1. 4. (a) Compute the negative gradient ? ?? ? f and evaluate at ?f [m?1](Xi ), ?? [m?1] = ( ? ? [m?1] 1 , . . . , ? ? [m?1] K?1 ) , i = 1, . . . , n. This yields the negative gradient vector U [m] = ( U [m]i ) i=1,...,n : = ( ? ? ? f ? ( Yi , ?f [m?1](Xi ), ?? [m?1] ) ) i=1,...,n . (b) Fit the negative gradient vector U [m] to each of the p predictor variables sep- arately by using the base-learners specified in step 2. This yields p vectors of predicted values, where each vector is an estimate of the negative gradient vector U [m]. (c) Select the base-learner that fits U [m] best according to the R2 goodness-of-fit criterion. Set ?U [m] equal to the fitted values of the best model. (d) Update ?f [m] ? ?f [m?1] + ? ?U [m], where 0 < ? ? 1 is a real-valued step length factor. 5. Plug ?f [m] into the empirical risk function ?ni=1 ?(Yi , f, ?) and minimize the empirical risk over ? . Set ?? [m] equal to the newly obtained estimate of ??. 6. Iterate Steps 3?5 until the stopping iteration mstop is reached (the choice of mstop will be discussed below). In the following, we will refer to the boosting algorithm introduced above as ?pro- portional odds boosting? (P/O boosting). Steps 1?4 of the P/O boosting algorithm correspond to the classical gradient boosting approach discussed in B?hlmann and Hothorn (2007). From step 4 it is seen that the algorithm descends the gradient of the empirical risk R, which is the main feature of all gradient boosting algorithms. In each iteration, an estimate of the true negative gradient of R is added to the current estimate of f ?. Consequently, R is minimized in a stagewise fashion, and a structural (regres- sion) relationship between Y and X is established. Obviously, using P/O boosting corresponds to replacing classical Fisher scoring algorithms for maximum likelihood estimation of f ? (McCullagh 1980) by a gradient descent algorithm in function space. As seen from steps 4(c) and 4(d), the P/O boosting algorithm additionally carries out variable selection, as only one base-learner (i.e., one component of X) is selected for updating ?f [m] in each iteration. Due to the additive update, the final boosting estimate at iteration mstop can be interpreted as an additive prediction function, as defined in (3). In step 5, the estimation approach of Schmid et al. (2010) is used to obtain updates of ? . Here, the empirical risk is minimized over ? , using the current estimate of f ? as offset value. Note that step 5 of the P/O algorithm involves the constrained estima- tion of an ordered set of parameters, which has not been considered by Schmid et al. (2010). As shown in Web Appendix A, however, the constrained estimation problem 123 Environ Ecol Stat (2011) 18:709?733 715 can be re-formulated as an unconstrained problem, so that the method by Schmid et al. (2010) can be applied. Specification of base-learners It is clear from step 4 of the P/O boosting algorithm that the specification of the base- learners is crucial for interpreting the model fit. Here it is important to keep in mind that, due to the additive update in step 4(d), the estimate of a marginal function f j at iteration mstop has the same structure as the base-learner used in each iteration. For example, f j is linear in X j if the base-learner used to model f j in step 4(b) is a simple linear model (cf. B?hlmann and Hothorn 2007, p. 484). Similarly, f j is a smooth function of X j if the corresponding base-learner is smooth as well. Concerning the choice of appropriate base-learners, we follow the approach used by Kneib et al. (2009): Themarginal functions f j corresponding to continuous predictors are either modeled as linear functions or as penalized regression splines (?P-splines?, cf. Wood 2006; Schmid and Hothorn 2008; Kneib et al. 2009), where selection of the best modeling alternative (smooth nonlinear vs. linear) is carried out automatically by the P/O boosting algorithm. To do this, we modify step 2 of the P/O algorithm as follows: For each covariate, we specify two competing base-learners, namely a linear base-learner and a smooth P-spline deviation from the linear base-leaner (cf. Kneib et al. 2009, p. 628). Consequently, due to the base-learner selection carried out in step 4(c), the marginal functions f j depending on continuous predictors become either linear or smooth. To account for spatial dependency between neighboring sample sites, we addition- ally include a smooth function quantifying marginal spatial effects into the model. This function depends on the coordinates of the site locations and is added to the other functions specified in (3), see Kneib et al. (2008, 2009). As a base-learner for the mar- ginal spatial effect we use a P-spline tensor product surface depending on the UTM easting and northing coordinates of the site locations. Thus, denoting the coordinates by XE and XN, the spatial effect becomes a smooth marginal surface fsp(XE, XN) depending on the bivariate ?predictor? variable (XE, XN). As noted by Kneib et al. (2008), fsp(XE, XN) can be interpreted as the realization of a spatially correlated sto- chastic process, emphasizing the fact that one needs to account for spatial correlation in the data. Finally, we model categorical predictors using dummy coded binary variables as base-learners. As a consequence, the functions f j correspond to linear category effects in these cases. Detailed descriptions of P-splines and P-spline tensor product surfaces have been given by Fahrmeir et al. (2004), Wood (2006) and Kneib et al. (2009). Tuning parameters In the literature, it has been argued that boosting algorithms should generally not be run until convergence. Otherwise, overfits resulting in a suboptimal out-of-sample prediction accuracy are likely (see B?hlmann and Hothorn 2007). As a consequence 123 716 Environ Ecol Stat (2011) 18:709?733 of this ?early stopping? strategy, the stopping iteration mstop becomes the main tuning parameter of the P/O algorithm. In the following, we will use five-fold cross-valida- tion to determine the value of mstop, i.e., mstop is the iteration with lowest predictive empirical risk. Alternatively, information criteria such as AIC or BIC could be used to determine the stopping iteration mstop. For example, in case of Gaussian regression, a correctedAIC criterion could be calculated in each boosting iteration, and the stopping iteration would be given by the iteration with smallest AIC (B?hlmann and Hothorn 2007, p. 495). In this paper, however, we will consider cross-validation instead of information criteria because the latter have been criticized as being systematically biased towards stopping boosting algorithms too late (see Hastie 2007). In contrast to the choice of the optimal stopping iteration, the choice of the step length factor ? has been shown to be of minor importance for the predictive performance of a boosting algorithm. The only requirement is that the value of ? is ?small?, such that a stagewise adaption of the prediction function is possible (see Schmid and Hothorn 2008). We will set ? = 0.1. Regularization A major consequence of the early stopping strategy is that the estimates of f ? are shrunken towards zero. In fact, using a small step length ? ensures that marginal function estimates increase ?slowly? in the course of the P/O boosting algorithm but stop increasing as soon as the optimal stopping iteration mstop is reached. As stated above, the optimal stopping iteration is chosen such that out-of-sample prediction accuracy is optimized within the proportional odds framework. In other words, stop- ping the P/O boosting algorithm at the optimal iteration implies that the amount of shrinkage is chosen such that the predictive power of the proportional odds model is maximal. Shrinkage is a well-established strategy to regularize model estimates: Esti- mates tend to have a slightly increased bias but a decreased variance, thereby improv- ing prediction accuracy. On the other hand, the choice of the base-learners specified above ensures that black-box predictions are avoided and marginal effect estimates are obtained. Although, in contrast to maximum likelihood estimation, estimates are shrunken towards zero, the main characteristics (and thus the interpretability) of the proportional odds model are preserved. Prediction For given estimates of f ? and ??, the predicted outcome category (denoted by k?) is the category with highest posterior probability, i.e., k? = max k ?P(Y = k|X), (7) which can be computed from (4). Thus, misclassification rates and weighted kappa indices for ordinal data (Fleiss and Cohen 1973) can be used to evaluate the predictive power of the P/O boosting fit. 123 Environ Ecol Stat (2011) 18:709?733 717 Confidence intervals Since boosting estimates are shrunken towards zero, computation of confidence inter- vals for marginal functions is infeasible. This problem can also be found with other shrinkage methods such as ridge regression or the Lasso (Tibshirani 1996). With the help of bootstrapmethods, however, it is possible to approximate the distribution of the boosting estimates in a non-parametric fashion (see Sect. 3). Consequently, the boot- strapped estimates can be used to assess whether a function estimate is systematically different from zero. As an alternative to approximating the distribution of effect estimates via bootstrap sampling, Bayesian methods could be used to fit the proportional odds model and to calculate posterior distributions of marginal predictor effects. This approach would require Bayesian methods for shrinkage (such as the Bayesian Lasso, Park and Casella 2008) and variable selection (O?Hara and Sillanp?? 2009) to be adapted to geoadditive proportional odds models. While being potentially useful, the Bayesian approach is conceptually different from the proposed P/O boosting algorithm and will therefore not be considered in this paper. 3 Analysis of the Maryland biological streams survey Data sources and pre-processing In this section, we apply the P/O boosting algorithm to develop a predictive model for fish (FIBI) and benthic macroinvertebrates (BIBI) indicators of biological condi- tion. Our study is focused on the 23,408 km2 part of Maryland, USA, lying in the Chesapeake Bay watershed (Fig. 1). This area includes six Level III ecoregions: Cen- tral Appalachians, Ridge and Valley, Blue Ridge, Northern Piedmont, Southeastern Plains, and Middle Atlantic Coastal Plains (see Omernik 1987). Climate types range from cold with hot summers in the mountainous western area to temperate with hot summers toward the southeast; vegetation patterns range from northern hardwood for- ests in the highlands to oak, hickory, pine, and southern mixed forests of the Coastal Plains. The Appalachian, Ridge and Valley, and Blue Ridge ecoregions are under- lain largely by folded and faulted sedimentary rocks. The Piedmont ecoregion is under- lain by crystalline igneous and metamorphic rocks, and the Plains ecoregions are underlain by unconsolidated sediments. Our analysis is based on theMarylandBiological Streams Survey (MBSS), which is an on-goingmonitoring program designed to describewater quality in 1st- to 4th-order non-tidal streams within the state of Maryland, USA (USEPA 1999). MBSS scientists used a probabilistic sampling design stratified by major watershed and stream order to sample approximately 2,500 sites from 1994 to 2004 (cf. Southerland et al. 2005). An MBSS site is a ?75 m stream segment where data were collected for stream physical and hydrological attributes (e.g., flow, width, depth, and embeddedness), streamwater chemistry, location (latitude and longitude), riparian conditions, and biological com- munities (i.e., benthic macroinvertebrates and fish). For a detailed description of the MBSS, see http://www.dnr.state.md.us/streams. We considered only the first record 123 718 Environ Ecol Stat (2011) 18:709?733 Fig. 1 The Maryland portion of the Chesapeake Bay watershed, its major ecoregions, and stream assess- ment sites where data were collected. Inset shows the study area (dark gray) in relation to the Chesapeake Bay watershed (light gray) for sites that were sampled more than once. This resulted in a database containing measurements at n =1,573 stream sites (see Fig. 1). There were 96 sites that had no fish collected, and these sites were not used to examine FIBI, leaving n =1,477 sites for the FIBI models. Land cover data was obtained from the 2001 US National Land Cover Database (Homer et al. 2004). Watershed predictors were calculated in ARCGIS using watershed boundaries and relevant environmental coverages. Individual IBIs were developed by MBSS scientists separately for each subregion of the study area and included individual metrics specific to each subregion (Web Appendix B, see Southerland et al. 2005 for IBI developments and for a complete list of metrics in each IBI). Following Maloney et al. (2009), we used an ordinal scale to quantify BIBI and FIBI indicators (1 = ?very poor site?, 2 = ?poor site?, 3 = ?fair site?, 4 = ?good site?). FIBI and BIBI indicators were regressed on site- specific predictor variables using the P/O boosting algorithm introduced in Section 2. Predictors included UTM easting and northing coordinates, watershed land use, dominant ecoregion (Omernik 1987), and the ?distance to mainstem? measured from the MBSS sampling site to a mainstem tributary with >500 km2 in upslope drainage area. A value of 0 was assigned to sites that drained into the Chesapeake Bay before reaching a mainstem river. For a detailed description of the predictor variables, see Appendix B. Predictors with a highly right-skewed distribution were log transformed before statistical analysis. Benchmark analysis We first investigated the prediction accuracy of the P/O boosting algorithm, i.e., the ability of P/O boosting to predict the FIBI and BIBI values at unsurveyed sites. To 123 Environ Ecol Stat (2011) 18:709?733 719 do this, we carried out a benchmark experiment using 100 bootstrap samples drawn from the full data set. The bootstrap samples were used as training data sets, and the P/O boosting algorithm was applied to these samples. Five-fold cross-validation was carried out on the training data sets to determine the values of mstop. In a next step, we applied the 100 prediction rules obtained from P/O boosting to the respective sets of out-of-bootstrap observations (?bootstrap cross-validation?, see, e.g., Hothorn et al. 2005). The predictions and the true outcome values of the 100 out-of-bootstrap data sets were used to compute classification rates and weighted kappa values. To benchmark the P/O boosting algorithm against other established techniques, we also considered the random forest method (Breiman 2001), a non-parametric classification technique based on ensembles of decision trees. Random forests have been shown to be one of the most accurate methods for predicting IBI indicators (Maloney et al. 2009). Because the random forest method neither requires the propor- tional odds assumption nor the additivity of the prediction function specified in (3), it is less restrictive than P/O boosting. As above, we estimated prediction accuracy using the same bootstrap samples to compute classification rates and weighted kappa values. Table 1 shows that boosting and the random forest method had similar classification rates for both indices of biotic integrity. For the FIBI indicator,mean classification rates were nearly equal (P/O boosting: 51.4%, random forest method: 51.2%) while in case of the BIBI indicator, classification rates were slightly lower for P/O boosting (45.4%) than for random forests (46.6%). Weighted kappa values obtained from P/O boosting were larger on average than the corresponding weighted kappa values obtained from random forests (Table 2). This result can be explained by the fact that P/O boosting explicitly accounts for the ordinal structure of the BIBI and FIBI indicators: Due to the structure of the proportional odds model, misclassification of observations into neigh- boring categories tends to be more likely than misclassification into categories that are ?far away? from the true category. Note that all weighted kappa values obtained from P/O boosting were larger than 0.5, i.e., P/O boosting predicted significantly better than chance alone. Table 1 also suggests that classification rates of outcome categories are considerably higher if site-specific covariate information is used to obtain predictions than if the unconditional distribution of the FIBI and BIBI indicators is used. As seen from Table 1, classification rates are largest for sites with good biological condition. This result has previously been reported by Maloney et al. (2009). Analysis of the full data set: FIBI indicator After demonstrating that the prediction accuracy of P/O boosting is comparable to that of the random forest method, we analyzed the full data set to examine functional relationships between predictors and IBI indicators.We applied the P/O boosting algo- rithm to the full data set and visualized marginal function estimates using partial plots (Figs. 2, 3, 4, 5). Note that partial plots cannot be obtained from the random forest method because, in contrast to P/O boosting, random forests yield black-box estimates that are not easily interpreted. Although the random forest method can provide esti- mates of variable importance (Cutler et al. 2007), functional relationships between 123 720 Environ Ecol Stat (2011) 18:709?733 Table 1 P/O boosting and random forest classification rates, as obtained from bootstrap cross-validation Mean Min. 1st Qu. Median 3rd Qu. Max. uncond. FIBI classification rates, P/O boosting All sites 0.514 0.461 0.500 0.514 0.528 0.563 Very poor sites 0.372 0.223 0.317 0.357 0.403 0.610 0.167 Poor sites 0.221 0.056 0.183 0.219 0.259 0.361 0.172 Fair sites 0.453 0.320 0.416 0.455 0.485 0.557 0.277 Good sites 0.728 0.634 0.699 0.731 0.759 0.809 0.383 FIBI classification rates, random forests All sites 0.512 0.466 0.500 0.513 0.526 0.561 Very poor sites 0.395 0.279 0.341 0.391 0.441 0.596 0.167 Poor sites 0.278 0.152 0.241 0.278 0.309 0.405 0.172 Fair sites 0.425 0.304 0.394 0.428 0.457 0.529 0.277 Good sites 0.711 0.609 0.686 0.708 0.739 0.781 0.383 BIBI classification rates, P/O boosting All sites 0.454 0.414 0.442 0.454 0.468 0.500 Very poor sites 0.435 0.285 0.399 0.435 0.473 0.568 0.181 Poor sites 0.335 0.208 0.297 0.333 0.364 0.503 0.249 Fair sites 0.483 0.329 0.453 0.482 0.516 0.590 0.295 Good sites 0.546 0.440 0.508 0.552 0.578 0.653 0.275 BIBI classification rates, random forests All sites 0.466 0.424 0.455 0.467 0.478 0.504 Very poor sites 0.474 0.350 0.443 0.480 0.513 0.579 0.181 Poor sites 0.318 0.212 0.283 0.313 0.349 0.441 0.249 Fair sites 0.436 0.279 0.408 0.431 0.466 0.519 0.295 Good sites 0.629 0.532 0.601 0.632 0.655 0.736 0.275 uncond. = unconditional distribution of categories in the full data set Table 2 Weighted kappa values, as obtained from bootstrap cross-validation Mean Min. 1st Qu. Median 3rd Qu. Max. FIBI weighted kappa values, P/O boosting 0.591 0.502 0.577 0.593 0.609 0.657 FIBI weighted kappa values, random forests 0.553 0.473 0.533 0.552 0.576 0.637 BIBI weighted kappa values, P/O boosting 0.586 0.527 0.568 0.583 0.603 0.643 BIBI weighted kappa values, random forests 0.580 0.501 0.565 0.580 0.596 0.634 Fleiss?Cohen weights were used to account for the ordinal structure of FIBI and BIBI indicators (cf. Fleiss and Cohen 1973) 123 Environ Ecol Stat (2011) 18:709?733 721 Fig. 2 FIBI model?marginal function estimates obtained from applying P/O boosting to the full data set predictors and outcome variables cannot be quantified directly. We therefore did not use this method to analyze the full data set. Let us first consider the FIBI indicator of biological condition. Figure 2 shows the most pronounced marginal effects obtained from P/O boosting. Light grey lines correspond to the function estimates obtained from the 100 bootstrap samples used in the benchmark experiment. Increases in watershed area and average watershed ele- vation have positive effects on the FIBI indicator, supporting previous findings of the importance of these factors on stream fishes (Angermeier and Schlosser 1989; Oberdorff and Hughes 1992; Matthews and Robison 1998; Joy and Death 2004). While the effect of watershed area is clearly nonlinear, a linear marginal predic- tion function was obtained for average watershed elevation. This demonstrates the ability of P/O boosting to incorporate both linear and smooth (nonlinear) predictor effects into the proportional odds model. Regarding the magnitude of its marginal function, watershed area is clearly the most important predictor for FIBI (Fig. 2). For example, consider two stream sites with watershed areas WA1 = 1 km2 and WA2 = 10 km2. We obtain log(WA1) = 0 and log(WA2) ? 2.3, which results in marginal function estimates fWA(WA1) ? ?2 and fWA(WA1) ? 1 (see Fig. 2). 123 722 Environ Ecol Stat (2011) 18:709?733 Fig. 3 FIBI model?marginal spatial effect estimate obtained from applying P/O boosting to the full data set Assuming constant values for the other predictor variables, it follows from Eq. (5) that the cumulative odds ratio of site 2 is approximately exp(1 ? (?2)) ? 20 times larger than the cumulative odds ratio of site 1. The strong positive effect of watershed area might be due to low natural richness of fishes in headwater streams (Angermeier and Schlosser 1989; Matthews and Robison 1998), which may affect individual metrics composing the IBI (Schleiger 2000). Increases in the percentage of upstream watershed under impervious surface cover have a negative effect on FIBI scores, which is an often-reported pattern (e.g., Wang and Lyons 2003; Helms et al. 2009) that results from the numerous negative effects that impervious surfaces have on stream hydrologic and geomorphic factors (Paul and Meyer 2001; Walsh et al. 2005). The marginal function estimate for the distance from sampling location to the nearest main stem stream indicates that there is an inverted U-shaped relationshipwith the FIBI indicator, i.e., FIBI increaseswith low but increas- ing distance values but decreases again for large distance values. Sites with large dis- tances from mainstems are likely headwaters having low FIBIs as discussed above. The lower FIBI scores for short distances to mainstem might reflect sites that directly drain into the Chesapeake Bay (which were given a distance value of 0). Marginal function estimates corresponding to other continuous predictors are relatively small in magnitude (relative to the estimates shown in Fig. 2), indicating theirminor importance for modeling FIBI. The function estimates corresponding to these predictor variables are shown in Web Appendix C. Marginal effect estimates of the categorical covariate ?percentage of bedrock that is calcareous in a watershed? were small and suggest that FIBI is lower when calcareous rock is present (percentage of calcareous bedrock >0%, see Table 3), highlighting the importance of geology in structuring fish assemblages (Montgomery 1999; Joy and Death 2004). However, we caution over-interpretation of these findings because the range of the bootstrapped marginal effect estimates contains zero and because this covariate was analyzed at a coarse scale (presence/absence). The categorical effects of dominant ecoregions are also relatively small (Table 3), indicating that the dominant ecoregion is of minor importance for modeling FIBI. 123 Environ Ecol Stat (2011) 18:709?733 723 Fig. 4 BIBI model?marginal function estimates obtained from applying P/O boosting to the full data set A marginal spatial effect was still evident for the FIBI after accounting for all other covariates (Fig. 3). Sites in the Blue Ridge region, the Ridge and Valley region, and the South-Eastern part of the Middle Atlantic Coastal plain tended to have lower FIBI 123 724 Environ Ecol Stat (2011) 18:709?733 Fig. 5 BIBI model?marginal spatial effect estimate obtained from applying P/O boosting to the full data set scores than other sites. In contrast, the middle region of the Northern Piedmont region shows a very positive marginal effect on FIBI. It is important to remember that these effects are marginal and therefore not caused by variations in other predictors (such as the dominant ecoregion). They may reflect missing predictors, or, alternatively, problems with the calculation of FIBI itself. For example, the FIBI for Blue Ridge and Ridge and Valley ecoregions was stratified only into warmwater or coldwater streams (see Web Appendix B, Southerland et al. 2005). A more refined FIBI, possibly strat- ified by ecoregions or sub-ecoregions (Schleiger 2000), might reduce the marginal spatial patterns in the FIBI. Analysis of the full data set: BIBI indicator We next consider the BIBI indicator of biological condition. Figure 4 shows the most pronounced marginal effects obtained from P/O boosting. Obviously, increases in watershed area, distance from sampling location to the nearest main stem stream, and percentage of upstream watershed under tree cover have large positive effects on the BIBI indicator. In contrast, increases in the percentage of upstream watershed under impervious surface cover have a very strong negative effect on BIBI. Apart from a few sites with a small upstream population density (showing large variations in their effects on BIBI), the effect of population density on BIBI is also negative. The positive effect of the percentage of upstreamwatershed under tree cover and the negative effect of the percentage of upstream watershed under impervious surface cover on benthic macroinvertebrates support previous reports (Roy et al. 2003; Walsh et al. 2005; King et al. 2005; Maloney et al. 2009) and document the sensitivity of benthic macroin- vertebrates to watershed conditions. The upstream population density was positively correlated with the percentage of upstream watershed under impervious surface cover (r = 0.78), so these covariates are likely to represent the same anthropogenic stressor (population pressure). The effect of distance to main stem demonstrates the impor- tance of position within a stream network to the benthic community (Vannote et al. 123 Environ Ecol Stat (2011) 18:709?733 725 Table 3 Effects of the categorical predictors ?percentage of bedrock that is calcareous in a watershed? and ?dominant ecoregion? on FIBI and BIBI indicators of stream condition Predictor Category Full data set Mean Min. Max. FIBI indicator, P/O boosting % Of calc. bedrock =0% 0 >0% ?0.082 ?0.144 ?0.474 0.102 Ecoregion Blue ridge 0 Centr. Appalachian ?0.023 ?0.188 ?0.748 ?0.001 Mid. Atl. coastal plains 0.005 0.032 ?0.004 0.141 Northern Piedmont ?0.017 ?0.082 ?0.259 0.031 Ridge & Valley 0.007 0.083 ?0.017 0.337 South Eastern plains 0.004 0.019 ?0.055 0.165 BIBI indicator, P/O boosting % Of calc. bedrock =0% 0 >0% ?0.294 ?0.308 ?0.760 ?0.013 Ecoregion Blue ridge 0 Centr. Appalachian ?0.605 ?0.545 ?1.017 ?0.078 Mid. Atl. coastal plains ?0.099 ?0.086 ?0.282 0.069 Northern piedmont ?0.238 ?0.220 ?0.459 0.031 Ridge & Valley 0.511 0.450 0.041 0.878 South Eastern plains 0.266 0.254 0.031 0.574 Values in columns 4?6 were obtained from applying P/O boosting to 100 bootstrap samples drawn from the full data 1980). All functions shown in Fig. 4 are nonlinear, demonstrating the ability of P/O boosting to account for nonlinear predictor effects. Marginal function estimates for other continuous predictors (Web Appendix D) are smaller in magnitude than those shown in Fig. 4, indicating their minor importance for modeling BIBI. Marginal effect estimates of the categorical covariate ?percentage of bedrock that is calcareous in a watershed? suggest that BIBI is lower when calcareous rock is present. This effect is much stronger than the effect obtained for the FIBI indicator (Table 3), reinforcing geology as an important structuring factor on local benthic macroinvertebrate assem- blages (Montgomery 1999; Pyne et al. 2007). Again we caution over-interpretation of these results because of the coarse scale of this predictor. The categorical effects of three dominant ecoregions on BIBI are significantly different from zero (Table 3). The Central Appalachian ecoregion has the lowest marginal biotic integrity while the Ridge and Valley ecoregion has the largest positive effect on the BIBI indicator. Maloney et al. (2009) reported similar effects of ecoregions on BIBI. Themarginal spatial effect estimates show that sites in theBlueRidge region and the eastern part of the Ridge andValley region tended to have lower BIBI scores than other sites (Fig. 5). These spatial effects may be due to missing predictors or coarse stratifi- cation during BIBI development. For example, the Ridge and Valley, Blue Ridge, and Central Appalachians ecoregions were lumped into a single ?Combined Highlands? stratum during the BIBI construction (See Web Appendix B, Southerland et al. 2005). 123 726 Environ Ecol Stat (2011) 18:709?733 4 Summary and conclusion In recent years, much research has been undertaken to assess the degree of impairment in ecosystem structure and function due to anthropogenic disturbance in watersheds. As part of this research, biological assessments of stream condition have become an important tool to identify impairments and to develop appropriate management and conservation strategies. In this paper, we have considered indicators of biologic con- dition (Karr et al. 1986; Karr 1991; Barbour et al. 1999), which are indispensable tools for measuring and managing the health of streams and their watersheds. We have developed a boosting algorithm for modeling indices of biotic integrity and applied our method to data collected from small non-tidal streams in the state of Maryland, USA. Because IBI indicators are often evaluated on an ordinal scale, the P/O boosting algorithm developed in this paper is based on the well-established pro- portional odds model introduced by McCullagh (1980). To obtain regularized model fits, we combined classical gradient boosting techniques with two recent advances:We used themodeling approach suggested byKneib et al. (2009) to obtain predictionmod- els accounting for nonlinear effects and spatial correlation, and we re-formulated the boostingmethod bySchmid et al. (2010) to obtain scale parameter estimates (which are necessary for adapting classical boosting techniques to the proportional odds model). In summary, the boosting algorithm presented in this paper combines the following advantages: 1. P/O boosting allows for fully automatic variable selection and model choice. In particular, it does not require scientists to select predictor variables using heuristic approaches such as stepwise variable selection. 2. Although boosting estimates are typically different from classical maximum like- lihood estimates, the P/O boosting algorithm preserves the structure of the propor- tional odds model. Therefore, boosting estimates are accessible for interpretation, which is a major advantage over estimation techniques that result in black-box estimates. On the other hand, of course, black-box predictions are expected to have a higher degree of flexibility than predictions that are linked to the pre-specified structure of the additive proportional odds model. 3. P/O boosting accounts for spatial effects. Although there are numerous methods to model spatial correlation in ecological data (cf. Bigler et al. 2005; Gelfand 2007), the interaction surfaces used in this paper have the advantage that marginal spatial effects can be visualized and information contained in unobserved (latent) predictor variables is quantified. 4. Spatial effects and nonlinear effects of predictor variables are estimated jointly based on penalized spline functions. Both the joint estimation and the additive structure of the prediction function facilitate a relatively simple interpretation of the results. For example, the use of P/O boosting led to clear interpretations of the relationships between watershed attributes and stream biological condition in the MBSS data, where both linear and nonlinear marginal predictor effects were present. 5. Due to the early stopping strategy, prediction accuracy of the P/O boosting fit is maximized in the proportional odds model framework. Of course, it is well known 123 Environ Ecol Stat (2011) 18:709?733 727 that there is no ?uniformly best? prediction method for ordinal data. Therefore, we do not claim that P/O boosting is generally superior to other methods for ordinal data. For the MBSS data, however, predictions obtained from P/O boosting turned out to be very similar to predictions obtained from the random forest method. This is remarkable because the random forest method is a completely non-parametric technique that is generally considered to be one of the most powerful statistical prediction methods (see Hastie et al. 2009, Chap. 15). Although, for the sake of interpretation, we restricted ourselves to considering main- effects models in this paper, the gradient boosting framework can be extended to include interaction terms between predictor variables in the model formula. As dem- onstrated byKneib et al. (2009), this can be accomplished by specifying additional sets of linear and smooth base-learners depending on the products of predictor variables. When interpreting marginal function estimates, one should be aware of the fact that P/O boosting is based on two important assumptions: First, we assumed the propor- tional odds property to hold. Second, we assumed predictor effects to be additive. If these assumptions are not met, estimates might show a bias caused by model mis- specification. Usually, this bias cannot be fully compensated by the high flexibility of the P/O boosting algorithm. Because the early stopping strategy results in regularized boosting estimates that are shrunken towards zero, it is generally difficult to derive tests on the appropriateness of model assumptions in the boosting framework. There- fore, assessing the robustness of boosting estimates against model misspecification constitutes an important issue of future research. Instead of regularizing effect estimates via gradient boosting (in combination with early stopping), it would alternatively be possible to optimize out-of-sample predic- tion accuracy using penalized regression techniques. For example, the Lasso method (Tibshirani 1996), which is based on L1-penalized likelihood estimation, would be a natural approach to incorporate shrinkage and variable selection into a proportional odds model. However, the original Lasso method has mainly been designed for regres- sion models with a linear prediction function. Combining penalized estimation with sparse nonlinear additive modeling has only recently been accomplished (Meier et al. 2009). To date, there is no extension of the method developed by Meier et al. (2009) to geoadditive proportional odds models. On the other hand, gradient boosting and the Lasso are closely related, as both algorithms can be embedded into the LARS framework (Efron et al. 2004). Also, in case of Gaussian regression, there is evidence that the properties of gradient boosting are similar to those of the Lasso (Hastie et al. 2009, Chap. 16). These results suggest that the role of the boosting stopping iteration is similar to the role of the (inverse of the) shrinkage parameter used for the L1 penalty of the Lasso method. Finally, the boosting algorithm presented in this paper is not restricted to ecological applications but can be used to analyze very general types of ordinal data. Although developing a predictionmethod for streambiological conditionwas the primary goal of this paper, both the proportional odds model and the boosting frameworks are essen- tially independent of the application context in which they are used. It is therefore possible to apply the P/O boosting algorithm in many fields, for example in clinical or biomedical research. 123 728 Environ Ecol Stat (2011) 18:709?733 Software All computations were carried out with the R System for Statistical Computing (ver- sion 2.10.1, R Development Core Team 2009). The gamboost() function of R package mboost (Hothorn et al. 2010) was used to calculate boosting estimates. Base- learners were made comparable by centering predictors at the beginning of the algo- rithm and by using the same degrees of freedom for each base-learner (see Kneib et al. 2009 for details). For example, in case of the FIBI model, the R code for specifying the model formula was given by > library(mboost) > FIBI.formula <- FIBI ? bols(EASTING, intercept = FALSE) + + bols(NORTHING, intercept = FALSE) + + bspatial(EASTING, NORTHING, knots = 20, df = 1, + differences = 1) + + bols(DrainageDensity, intercept = FALSE) + + bbs(DrainageDensity, center = TRUE, df = 1) + ... + bols(PerWet, intercept = FALSE) + + bbs(PerWet, center = TRUE, df = 1) + + bols(Ecoregion, intercept = FALSE, df = 1) + + bols(INT, intercept = FALSE, df = 1) where FIBI denotes the FIBI outcome, EASTING and NORTHING denote the UTM easting and northing coordinates of the site locations, respectively, and Drain- ageDensity, PerWet and Ecoregion are examples of predictor variables. The bols() and bbs() functions in R package mboost (using the intercept = FALSE and center = TRUE options) correspond to linear base-learners and smooth P-spline deviations from the linear base-learners, respectively (see Kneib et al. 2009 for details). Specifying the base-learners as shown above ensures that selection of the best modeling alternative (smooth nonlinear vs. linear) is carried out automatically by the P/O boosting algorithm. Similarly, the bspatial() function in R package mboost (using the center = TRUE option) corresponds to a smooth P-spline ten- sor product deviation from a spatial linear surface. Note that specifying a bbs() base-learner for Ecoregion was not necessary because this predictor variable is categorical. Using the model formula specified above, the proportional odds model was fitted with the help of the PropOdds() family in R package mboost. The corresponding R code was given by > ctrl <- boost_control(mstop = 20000, nu = 0.1) > FIBI.model <- gamboost(FIBI.formula, data = MBSS.training, family = + PropOdds(), control = ctrl) where MBSS.training is the name of the training data set containing the variables specified in FIBI.formula and where the step length ? and the initial number of boosting iterations were specified using the boost_control() function of R package mboost. Internal five-fold bootstrap cross-validation for determining the opti- mal stopping iteration was carried out using the cvrisk() function of R package mboost: > ntrain <- nrow(MBSS.training) > bs5 <- rmultinom(5, ntrain, rep(1, ntrain) / ntrain) 123 Environ Ecol Stat (2011) 18:709?733 729 > cvm <- cvrisk(FIBI.model, folds = bs5) > st <- mstop(cvm) After having determined the optimal stopping iteration (denoted by st), the ?optimal? boosting fit at iteration st was calculated as follows: > FIBI.optimal <- FIBI.model[st] Afterwards, the predict() function of R package mboost was used to calculate predictions: > pred <- predict(FIBI.optimal, newdata = MBSS.test, type = ?response?) where MBSS.test denotes the test set of out-of-bootstrap observations (cf. Sect. 3). The pred object is a matrix containing the posterior class probabilities corresponding to the out-of-bootstrap observations. Using this object, the predicted outcome catego- ries were calculated as described in Sect. 2. For a detailed description of the mboost package we refer to Hothorn et al. (2010). Random forest analysis was carried out using the R package randomForest (Liaw andWiener 2002, 2009). The random forest algorithm of R packagerandomForest has two main tuning parameters: (a) ntree, which is the number of trees used for the forest, and (b) mtry, which is the number of variables randomly selected at each node. To achieve sufficiently stable results, the number of trees was set to 2000 (see Cutler et al. 2007). The hyper-parameter mtry was tuned using additional internal ten-fold cross-validation. Topographic surface plots were created using the R package sp (Pebesma and Bivand 2009). The Kappa function of R package vcd (Meyer et al. 2009) was used to compute weighted kappa indices. Acknowledgments The work of Matthias Schmid and Torsten Hothorn was supported by Deutsche Fors- chungsgemeinschaft (DFG), grant HO 3242/1-3, and by the Interdisciplinary Center for Clinical Research (IZKF) at the University Hospital of the University of Erlangen-Nuremberg (Project J11). We thank the Maryland Department of Natural Resources for providing data from their Maryland Biological Stream Sur- vey (MBSS). Funding for this project was provided by a Smithsonian Institution Post-Doctoral Fellowship awarded to KOM and by the US Environmental Protection Agency National Center for Environmental Research (NCER) Science to Achieve Results (STAR), grant #R831369. We thank the editor and two anonymous reviewers for helpful comments and suggestions. Appendix A Log-likelihood of the proportional odds model The system of equations (4) implies that the log-likelihood of the proportional odds model is given by l( f, ?) = ? I(Y = 1) ? log(1 + exp( f ? ?1)) + K?1 ? k=2 I(Y = k) ? [ log ( (1 + exp( f ? ?k))?1 ? (1 + exp( f ? ?k?1))?1 )] + I(Y = K ) ? log ( 1 ? (1 + exp( f ? ?K?1))?1 ) . 123 730 Environ Ecol Stat (2011) 18:709?733 Thus, the loss function used for the P/O boosting algorithm becomes ? = ?l. The negative derivative of ? w.r.t. f is given by ? ?? ? f = ?l ? f = ? I(Y = 1) ? (1 + exp(?1 ? f )) ?1 + K?1 ? k=2 I(Y = k) ? 1 ? exp(2 f ? ?k?1 ? ?k) 1 + exp( f ? ?k?1) + exp( f ? ?k) + exp(2 f ? ?k?1 ? ?k) + I(Y = K ) ? (1 + exp( f ? ?K?1))?1. B Predictor variables used for the analysis of the MBSS data We included the following predictor variables in our analysis of the MBSS data: ? UTM easting and northing coordinates provided by MBSS (from Maryland State Plane Coordinate System). These predictors were used to take the spatial depen- dence structure of sample sites into account (predictor variables XE and XN, see Sect. 2). ? Watershed Area, i.e., area of drainage upstream of sampling point (in km2). ? Population density (#/km2) of upstream watershed. ? Average upstream watershed elevation (in m). ? Average annual precipitation for upstream watershed elevation (in cm y?1). ? Percentage of upstream watershed under tree cover. ? Percentage of upstream watershed under impervious surface cover. ? Percentage of upstream watershed under pasture cover. ? Percentage of upstream watershed under row crop cover. ? Percentage of upstream watershed under wetland cover. ? Percentage of upstream watershed under barren cover. ? Drainage density, defined as total stream length (in km) / watershed area (in km2). ? Distance from sampling location to the nearest main stem stream (in km). Val- ues of this predictor variable were set to zero for sites that drained directly into Chesapeake Bay. ? Average percentage of sand content in soil. ? Percentage of bedrock that is calcareous in a watershed. ? Dominant ecoregion (categorical predictor with six categories, see Sect. 3 and Omernik 1987). A preliminary analysis of the data showed that the distributions of watershed area, population density, drainage density, upstream watershed elevation, and the percent- ages of upstreamwatershed under impervious surface, wetland, and barren cover were highly right-skewed.We therefore applied a log transformation to these predictor vari- ables before fitting the proportional odds models. Since we observed a large number of zero percentages in calcareous bedrock, we transformed this predictor into a binary variable with categories ?percentage of calcareous bedrock = 0%? and ?percentage of calcareous bedrock > 0%?. 123 Environ Ecol Stat (2011) 18:709?733 731 References Agresti A (2002) Categorical data analysis. 2 edn. Wiley, New York Angermeier PL, Schlosser IJ (1989) Species?area relationship for stream fishes. Ecology 70:1450?1462 Barbour MT, Gerritsen J, Snyder BD, Stribling JB (1999) Rapid bioassessment protocols for use in streams and wadeable rivers: Periphyton, benthic macroinvertebrates and fish. 2 edn. Office of Water, US Environmental Protection Agency, Washington, DC Barker LS, Felton GK, Russek-Cohen E (2006) Use of Maryland biological stream survey data to deter- mine effects of agricultural riparian buffers on measures of biological stream health. Environ Monitor Assess 117:1?19 Bigler C, Kulakowski D, Veblen TT (2005) Multiple disturbance interactions and drought influence fire severity in Rocky Mountain subalpine forests. Ecology 86:3018?3029 Breiman L (2001) Random forests. Mach Learn 45:5?32 B?hlmann P, Hothorn T (2007) Boosting algorithms: regularization, prediction and model fitting (with discussion). Stat Sci 22:477?522 B?hlmann P, YuB (2003) Boostingwith the L2 loss: regression and classification. J AmStat Assoc 98:324? 338 Collier KJ (2009) Linking multimetric and multivariate approaches to assess the ecological condition of streams. Environ Monitor Assess 157:113?124 Cooper C (2009) Assessing environmental impact on riparian benthic community vigor with unconditional estimates of quantile differences. Environ Ecol Stat (to appear) Cushing CE, Allan JD (2001) Streams: their ecology and life. Academic Press, New York Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT (2007) Random forests for classification in ecology. Ecology 88:2783?2792 Efron B, Johnston I, Hastie T, Tibshirani R. (2004) Least angle regression. Ann Stat 32:407?499 Fahrmeir L, Kneib T, Lang S (2004) Penalized structured additive regression: a Bayesian perspective. Stat Sin 14:731?761 Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measure 33:613?619 Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337?407 Gelfand AE (2007) Guest editorial: spatial and spatio-temporal modeling in environmental and ecological statistics. Environ Ecol Stat 14:191?192 Hastie T (2007) Discussion of ?Boosting algorithms: Regularization, prediction and model fitting? by P. B?hlmann and T. Hothorn. Stat Sci 22:513?515 Hastie T., Tibshirani R (1990) Generalized additive models. Chapman & Hall, London Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2 edn. Springer, New York Helms BS, Schoonover JE, Feminella JW (2009) Assessing influences of hydrology, physicochemistry, and habitat on streamfish assemblages across a changing landscape. JAmWater ResourAssoc 45:157?169 Homer C, Huang CQ, Yang LM, Wylie B, Coan M (2004) Development of a 2001 national land-cover database for the United States. Photogrammetr Eng Remote Sens 70:829?840 Hothorn T, B?hlmann P, Kneib T, Schmid M, Hofner B (2010) mboost: Model-Based Boosting. R package version 2.0-6. http://cran.r-project.org/web/packages/mboost/index.html Hothorn T, Leisch F, Zeileis A, Hornik K (2005) The design and analysis of benchmark experiments. J Comput Graph Stat 14(3):675?699 JoyMK, Death RG (2004) Predictive modelling and spatial mapping of freshwater fish and decapod assem- blages using GIS and neural networks. Freshw Biol 49:1036?1052 Karr JR (1991) Biological integrity: a long-neglected aspect of water resource management. Ecol Appl 1:66?84 Karr JR, Fausch KD, Angermeier PL, Yant PR, Schlosser IJ (1986) Assessing biological integrity in run- ning waters: a method and its rationale, 2 edn. Illinois Natural History Survey Special Publication 5, Champaign, IL King RS, Baker ME, Whigham DF, Weller DE, Jordan TE, Kazyak PF, Hurd MK (2005) Spatial consider- ations for linking watershed land cover to ecological indicators in streams. Ecol Appl 15:137?153 Kneib T, Hothorn T, Tutz G (2009) Variable selection and model choice in geoadditive regression models. Biometrics 65:626?634 123 732 Environ Ecol Stat (2011) 18:709?733 Kneib T, M?ller J, Hothorn T (2008) Spatial smoothing techniques for the assessment of habitat suitability. Environ Ecol Stat 15:343?364 Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18?22 LiawA,WienerM (2009) randomForest: Breiman and Cutler?s random forests for classification and regres- sion. R package version 4.5-33. http://cran.r-project.org/web/packages/randomForest/index.html Maloney KO, Weller DE, Russell MJ, Hothorn T (2009) Classifying the biological condition of small streams: an example using benthic macroinvertebrates. J North Am Benthol Soc 28:869?884 Matthews WJ, Robison HW (1998) Influence of drainage connectivity, drainage area and regional species richness on fishes of the Interior Highlands in Arkansas. Am Midland Nat 139:1?19 McCullagh P (1980) Regression models for ordinal data (with discussion). J R Stat Soc Ser B 42:109?142 Meier L, van de Geer S, B?hlmann P (2009) High-dimensional additive modeling. Ann Stat 37:3779?3821 Meyer D, Zeileis A, Hornik K (2009) vcd: Visualizing Categorical Data. R package version 1.2-7. http:// cran.r-project.org/web/packages/vcd/index.html Montgomery DR (1999) Process domains and the river continuum. J Am Water Resour Assoc 35:397?410 Oberdorff T, Hughes RM (1992) Modification of an index of biotic integrity based on fish assemblages to characterize rivers of the Seine Basin, France. Hydrobiologia 228:117?130 O?Hara RB, Sillanp?? MJ (2009) A review of Bayesian variable selection methods: what, how and which. Bayesian Anal 4:85?118 Omernik JM (1987) Ecoregions of the conterminous United States. Ann Assoc Am Geograph 77:118?125 Park T, Casella G (2008) The Bayesian Lasso. J Am Stat Assoc 103:681?686 Paul MJ, Meyer JL (2001) Streams in the urban landscape. Annu Rev Ecol Syst 32:333?365 Pebesma EJ, Bivand R (2009) sp: Classes and methods for spatial data. R package version 0.9-47. http:// cran.r-project.org/web/packages/sp/index.html Peterson EE, Urquhart NS (2006) Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: a case study in Maryland. Environ Monitor Assess 121:615? 638 Pyne MI, Rader RB, Christensen WF (2007) Predicting local biological characteristics in streams: a com- parison of landscape classifications. Freshw Biol 52:1302?1321 R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. url:http://www.R-project.org Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis: a research tool. 2 edn. Springer, New York Roy AH, Rosemond AD, Paul MJ, Leigh DS, Wallace JB (2003) Stream macroinvertebrate response to catchment urbanisation (Georgia, U.S.A.). Freshw Biol 48:329?346 Schleiger SL (2000) Use of an index of biotic integrity to detect effects of land uses on stream fish com- munities in west-central Georgia. Trans Am Fish Soc 129:1118?1133 SchmidM,Hothorn T (2008) Boosting additivemodels using component-wise P-splines. Comput Stat Data Anal 53:298?311 Schmid M, Potapov S, Pfahlberg A, Hothorn T (2010) Estimation and regularization techniques for regres- sion models with multidimensional prediction functions. Stat Comput 20:139?150 Southerland MT, Rogers GM, Kline MJ, Morgan RP, Boward DM, Kazyak PF, Klauda RJ, Stranko SA (2005)MarylandBiological StreamSurvey 2000?2004, VolumeXVI: new biological indicators to bet- ter assess the condition of Maryland streams. DNR-12-0305-0100, Maryland Department of Natural Resources, Annapolis, MD Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58:267?288 USEPA (1999) From the mountains to the sea: the state of Maryland?s freshwater streams. EPA 903-R- 99-023. Office of Research and Development, US Environmental Protection Agency, Washington, DC USEPA (2006) Wadeable streams assessment: a collaborative survey of the Nation?s streams. EPA 841-B- 06-002. Office of Water, US Environmental Protection Agency, Washington, DC Vannote RL, Minshall GW, Cummins KW, Sedell JR, Cushing CE (1980) The river continuum concept. Can J Fish Aquatic Sci 37:130?137 Walsh CJ, Roy AH, Feminella JW, Cottingham PD, Groffman PM, Morgan RP (2005) The urban stream syndrome: current knowledge and the search for a cure. J North Am Benthol Soc 24:706?723 Wang L, Lyons J (2003) Fish and benthic macroinvertebrate assemblages as indicators of stream degrada- tion in urbanizing watersheds. In: Simon TP (ed) Biological response signatures: indicator patterns using aquatic communities. CRC Press, New York, pp 227?249 Wood S (2006) Generalized additive models: an introduction with R. Chapman & Hall/CRC, Boca Raton 123 Environ Ecol Stat (2011) 18:709?733 733 Author Biographies Matthias Schmid is currently a researcher at the Department of Biometry and Epidemiology at the Friedrich-Alexander University Erlangen-Nuremberg, Germany. He received a diploma (2004) and a Ph.D. (2007) in Statistics from the University of Munich, Germany. His research interests include statistical learn- ing, regression methods in biostatistics and statistical disclosure control. Torsten Hothorn was born in Dresden, Germany in 1975. He received a diploma in statistics from the Universit?t Dortmund in 2000 and was lecturer of statistics at the Friedrich-Alexander-Universit?t Erlangen-N?rnberg later on. Since 2007 he is professor of Biostatistics at the Institut f?r Statistik, Ludwig- Maximilians-Universit?t M?nchen. His research interests currently include statistical learning, nonpara- metric statistics and statistical computing. Kelly O. Maloney is currently a Research Ecologist with the US Geological Survey in Wellsboro, Pennsylvania, USA. He graduated with a B.S. from The Pennsylvania State University in 1994 and a Ph.D. in Biological Sciences from Auburn University, Alabama USA in 2004. His research focuses on diversity patterns of aquatic communities and how anthropogenic stressors affect this diversity. He is cur- rently evaluating the applicability of novel statistical techniques in addressing these issues. Donald E. Weller is a Senior Scientist and Quantitative Ecologist at the Smithsonian Environmental Research Center in Edgewater, Maryland, USA. He earned a B.A. in Biology from Wabash College, Craw- fordsville, Indiana, USA (1974) and a Ph.D. in Ecology from the University of Tennessee, Knoxville, USA (1985). Don has expertise in ecological modeling and landscape ecology. His recent research has focused on the linkages of watersheds to wetland condition, to stream chemistry and biology, and to estuarine health. Sergej Potapov is currently a Ph.D. researcher at the Department of Biometry and Epidemiology at the Friedrich-Alexander University Erlangen-Nuremberg, Germany. He received his M.Sc. in Business Math- ematics at the University of Augsburg, Germany (2006). His research interests include statistical learning, classification methods and statistical computing. 123