Assessing Performance of Multivariate Linear Regression (MLR), artificial neural network (ANN) and Gene Expression Programming (GEP) in estimating soil properties

Document Type: Complete scientific research article

Authors

Ebrahim Mahmoudabadi et al.

Abstract

Background and Objectives: With the emergence of computers and geographic information systems (GIS), and with access to digital spatial data, various methods of data mining, modeling and estimation of soil properties have found their place in soil science. Data mining of soil properties with computer-based statistical methods uncovers hidden patterns in the database, which ultimately allows a model to be fitted for estimating soil properties. These methods can be used as the function f in the scorpan equation, whose two main components are the environmental variables and the learning algorithm. In the present study, three methods, multiple linear regression (MLR), artificial neural network (ANN) and Gene Expression Programming (GEP), were evaluated and compared as the f function of the scorpan model for estimating soil properties from auxiliary data such as vegetation, topography and remote sensing data.
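As a reference point, the scorpan model of digital soil mapping is usually written as below: a soil attribute S_a is expressed as a function f of the soil-forming factors plus a spatially autocorrelated residual. In this study f is one of MLR, ANN or GEP. Relating the study's covariates to individual factors is an interpretation not stated in the abstract: the vegetation and diversity data correspond broadly to o, the terrain attributes to r, and spectral indices can act as proxies for o or s.

```latex
\[
  S_a = f(s,\, c,\, o,\, r,\, p,\, a,\, n) + \varepsilon
\]
% s: soil (other soil information), c: climate, o: organisms (vegetation, fauna, people)
% r: relief (topography), p: parent material, a: age (time), n: spatial position
% \varepsilon: spatially autocorrelated residual
```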
Materials and methods: The study area, covering 1225 ha, is located in the Bajgiran rangelands, Khorasan Razavi province, Iran. To survey vegetation cover and soil, 137 units were studied; in each unit, 3-5 plots spaced 10 m apart were placed along a random transect, and the names and numbers of plant species, as well as vegetation cover percentage, were recorded. One soil sample was then taken from each transect (137 soil samples in total). Terrain attributes were derived from a digital elevation model (DEM); bands of the ETM image were used to compute spectral indices; and plant diversity was quantified with the Simpson and Shannon-Wiener indices. These parameters were used as covariates for estimating calcium carbonate equivalent, clay, density, nitrogen, carbon, sand, silt and saturated moisture capacity. Principal component analysis (PCA) was used to reduce the number of input variables for the ANN and GEP models, and finally the data were normalized and standardized.
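The covariate preparation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: Shannon-Wiener uses natural logarithms; Simpson is taken in its Gini-Simpson form 1 - sum(p_i^2), since the abstract does not say which variant was used; PCA keeps the fewest components reaching a hypothetical 90 % cumulative-variance threshold; and min-max scaling to [0, 1] stands in for the unspecified normalization. All function names are illustrative.

```python
import numpy as np


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i), using natural logs."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative abundance of each species present
    return float(-np.sum(p * np.log(p)))


def simpson(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2) (assumed variant; others exist)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def standardize(X):
    """Z-score each column (covariate) to zero mean and unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def min_max(X):
    """Rescale each column to [0, 1], a common input scaling for ANNs."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def pca_reduce(X, var_threshold=0.9):
    """Keep the fewest principal components of the standardized covariates
    whose cumulative explained variance reaches `var_threshold`."""
    Z = standardize(X)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # variance share of each component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    scores = Z @ vt[:k].T  # sample coordinates on the retained components
    return scores, explained[:k]
```

For example, `shannon_wiener([12, 5, 3])` and `simpson([12, 5, 3])` would give the two diversity indices for a hypothetical transect with three species, and `min_max(pca_reduce(X)[0])` would yield scaled component scores that could be fed to an ANN or GEP model, following the abstract's order of PCA first and normalization last.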
Results: The three numerical methods were evaluated using root mean square error (RMSE), mean bias error (MBE) and the coefficient of determination (R2). ANN estimated soil properties with the highest accuracy, showing higher coefficients of determination for calcium carbonate equivalent, clay, density, nitrogen, carbon, sand, silt and saturated moisture capacity (0.72, 0.46, 0.69, 0.67, 0.77, 0.62, 0.70 and 0.85, respectively) and lower RMSE values (7.46, 4.46, 0.08, 0.03, 0.27, 5.6, 3.5 and 3.4, respectively). ANN explained 60-85 percent of the variability of all soil properties except clay (46 percent), and the best estimates were obtained for saturated moisture capacity (R2 = 0.85) and soil organic carbon (R2 = 0.77).
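The evaluation statistics can be made concrete with a short Python sketch, shown here next to an ordinary least-squares fit as a stand-in for the MLR baseline. It is illustrative only and rests on assumptions: MBE is taken as mean(predicted - observed), so positive values indicate over-estimation; R2 is computed as 1 - SS_res/SS_tot, although some studies instead report the squared Pearson correlation; and the ANN and GEP models are not reproduced. Function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)  # shape (n_samples, n_covariates)
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted MLR coefficients to a 2-D covariate matrix."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def rmse(obs, pred):
    """Root mean square error, in the units of the soil property."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mbe(obs, pred):
    """Mean bias error; > 0 means over-estimation, < 0 under-estimation."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(pred - obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

In practice these statistics would typically be computed on validation samples held out from model fitting, once per soil property and per model, so that an MBE near zero and a low RMSE can be compared across MLR, ANN and GEP.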
Conclusion: Comparing the three numerical models identified ANN as the most accurate for estimating soil properties. ANN validation results showed mean bias errors (MBE) close to zero for the estimated soil properties, confirming that the model's estimates are unbiased. Furthermore, the model's low RMSE indicates accurate estimation of the soil variables. The results also indicate that GEP was more accurate than the linear regression method for most soil properties.
