Overall, complex models were consistently more accurate than simple or moderately complex models. Random forests (RF) using covariates selected via recursive feature elimination were consistently the most accurate, or among the most accurate, classifiers across study areas and across covariate sets within each study area. We recommend that complex models and covariates selected by recursive feature elimination be used for soil taxonomic class prediction.
CONDITIONED LATIN HYPERCUBE SAMPLING PLUS
Sampling sites at each study area were selected using conditioned Latin hypercube sampling (cLHS). We compared models that had been used in other DSM studies, including clustering algorithms, discriminant analysis, multinomial logistic regression, neural networks, tree-based methods, and support vector machine classifiers. The tested machine learning models were divided into three groups based on model complexity: simple, moderate, and complex. We also compared environmental covariates derived from digital elevation models and Landsat imagery, divided into three sets: 1) covariates selected a priori by soil scientists familiar with each area and used as input into cLHS, 2) the covariates in set 1 plus 113 additional covariates, and 3) covariates selected using recursive feature elimination.
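To make the site-selection step concrete, the following is a minimal sketch of the idea behind conditioned Latin hypercube sampling, not the implementation used in the study. It assumes continuous covariates only, uses a simple simulated-annealing swap search, and leaves out the correlation term and categorical-covariate handling of the full cLHS method. All function names, parameters, and the synthetic covariate values are hypothetical.

```python
import math
import random


def stratum_edges(values, n):
    """Quantile boundaries that split `values` into n roughly equal-count strata."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(k * last / n)] for k in range(n + 1)]


def stratum_index(v, edges):
    """Index (0..n-1) of the stratum that value `v` falls into."""
    n = len(edges) - 1
    for k in range(1, n):
        if v < edges[k]:
            return k - 1
    return n - 1


def assign_strata(sites, n):
    """For each covariate, the stratum index of every candidate site."""
    strata = []
    for j in range(len(sites[0])):
        col = [row[j] for row in sites]
        edges = stratum_edges(col, n)
        strata.append([stratum_index(v, edges) for v in col])
    return strata


def objective(sample, strata, n):
    """Sum over covariates and strata of |count - 1|; 0 is a perfect Latin hypercube."""
    total = 0
    for cov_strata in strata:
        counts = [0] * n
        for i in sample:
            counts[cov_strata[i]] += 1
        total += sum(abs(c - 1) for c in counts)
    return total


def clhs(sites, n, iterations=5000, temp=1.0, cooling=0.999, seed=0):
    """Pick n site indices whose covariate values cover every stratum about once."""
    rng = random.Random(seed)
    strata = assign_strata(sites, n)
    indices = list(range(len(sites)))
    rng.shuffle(indices)
    chosen, spare = indices[:n], indices[n:]
    cost = objective(chosen, strata, n)
    best, best_cost = list(chosen), cost
    for _ in range(iterations):
        if best_cost == 0 or not spare:
            break
        a, b = rng.randrange(n), rng.randrange(len(spare))
        chosen[a], spare[b] = spare[b], chosen[a]  # propose a swap
        new_cost = objective(chosen, strata, n)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the swap
            if cost < best_cost:
                best, best_cost = list(chosen), cost
        else:
            chosen[a], spare[b] = spare[b], chosen[a]  # reject: undo the swap
        temp *= cooling
    return sorted(best), best_cost


# Synthetic example: 200 candidate sites with two made-up covariates.
_rng = random.Random(1)
sites = [(_rng.uniform(0, 30), _rng.uniform(1000, 2500)) for _ in range(200)]
sample, cost = clhs(sites, 10, seed=42)
print(sample, cost)
```

The returned cost counts how far the selected sites are from covering each covariate stratum exactly once, so a value of 0 means every quantile interval of every covariate is represented by one site.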
![Conditioned Latin hypercube sampling](https://images.slideplayer.com/26/8340474/slides/slide_38.jpg)
Digital soil mapping (DSM) can quantitatively predict the spatial distribution of soil taxonomic classes. Key components of DSM are the method and the set of environmental covariates used to predict soil classes. Machine learning is a general term for a broad set of statistical modeling techniques. Many different machine learning models have been applied in the literature, and there are different approaches for selecting covariates for DSM. However, there is little guidance as to which, if any, machine learning model and covariate set might be optimal for predicting soil classes across different landscapes. Our objective was to compare multiple machine learning models and covariate sets for predicting soil taxonomic classes at three geographically distinct areas in the semi-arid western United States of America (southern New Mexico, southwestern Utah, and northeastern Wyoming). All three areas were the focus of digital soil mapping studies.
![Conditioned Latin hypercube sampling](https://ars.els-cdn.com/content/image/1-s2.0-S009830040500292X-gr2.jpg)
![Conditioned Latin hypercube sampling](https://abaqus-docs.mit.edu/2017/English/IhrComponentImages/olh_figure.png)
Mapping the spatial distribution of soil taxonomic classes is important for informing soil use and management decisions.