Comparing decision tree and instance-based learning models to estimate soil saturated hydraulic conductivity

Document Type: Complete scientific research article

Authors

1 M.Sc. Student, Department of Water Science and Engineering, Birjand University

2 Faculty Member, Birjand University

Abstract

Background and objectives: Soil saturated hydraulic conductivity is one of the most important physical properties of soil, because it governs water movement through the soil. Knowledge of this parameter helps in understanding and solving environmental problems. However, measuring it directly by laboratory and field methods is difficult, time-consuming and expensive. Alternative methods are therefore needed that estimate it from readily available soil properties with less effort, time and cost. Nonparametric methods are recent indirect approaches for estimating soil hydraulic properties, including soil saturated hydraulic conductivity (ks). The aim of this study was to estimate ks from readily available soil properties using two methods: the M5P decision tree and IBk, an instance-based learning method that works as a k-nearest-neighbour classifier.
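IBk predicts a sample's value from the k most similar samples in the training set; for a numeric target such as ks, the neighbours' values are averaged rather than voted on. The Python sketch below is a minimal, self-written illustration of that idea, not WEKA's IBk code. The min-max scaling, the Euclidean distance divided by the square root of the number of inputs, and the three weighting options (none, 1/d and 1 - d, modelled on the distance-weighting choices WEKA's IBk offers) are assumptions; the abstract does not name the two weighting schemes it tested, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def min_max_scale(rows: List[List[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Scale every column to [0, 1]; also return column minima and ranges for scaling queries."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    scaled = [[(v - m) / s for v, m, s in zip(r, mins, spans)] for r in rows]
    return scaled, mins, spans


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance divided by sqrt(n_features): scaled inputs give d in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def knn_predict(train_x: List[List[float]], train_y: List[float], query: Sequence[float],
                k: int = 3, weighting: str = "none") -> float:
    """Predict a numeric target from the k nearest training samples.

    weighting = "none"       -> plain mean of the k neighbours' targets
              = "inverse"    -> weights 1 / d
              = "similarity" -> weights 1 - d
    """
    neighbours = sorted(
        ((distance(x, query), y) for x, y in zip(train_x, train_y)), key=lambda t: t[0]
    )[:k]
    if weighting == "none":
        weights = [1.0] * len(neighbours)
    elif weighting == "inverse":
        for d, y in neighbours:
            if d == 0.0:
                return y  # exact match: 1/d is undefined, so return that sample's target
        weights = [1.0 / d for d, _ in neighbours]
    elif weighting == "similarity":
        weights = [1.0 - d for d, _ in neighbours]
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    total = sum(weights)
    if total == 0.0:  # every neighbour at the maximum distance: fall back to the plain mean
        return sum(y for _, y in neighbours) / len(neighbours)
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total


# Hypothetical toy data (NOT from the study): [sand %, clay %, bulk density g/cm3] -> ks in cm/d
X = [[60, 10, 1.35], [40, 25, 1.50], [20, 40, 1.60], [55, 15, 1.40]]
y = [120.0, 60.0, 20.0, 100.0]
Xs, mins, spans = min_max_scale(X)
q = [(v - m) / s for v, m, s in zip([58, 12, 1.37], mins, spans)]
for scheme in ("none", "inverse", "similarity"):
    print(scheme, round(knn_predict(Xs, y, q, k=2, weighting=scheme), 2))
```

Scaling matters here because bulk density varies over a much narrower numeric range than the particle-size percentages; without it, the percentages would dominate the distance.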
Materials and methods: A dataset of 151 soil samples collected from a site in Bojnord province was used. The readily available soil properties were sand, silt and clay percentages, bulk density, particle density, electrical conductivity (EC), organic carbon (OC), total neutralizing value (TNV), saturated moisture content and pH. Saturated hydraulic conductivity was measured with a Guelph permeameter. The Gamma test was used to identify the input parameters most relevant for predicting ks and to guide the modelling procedure. Various combinations of input parameters were then compared by their Gamma values to determine the optimum combination for modelling ks. Using the optimum combination, i.e. the one with the lowest Gamma value, the M5P decision tree and the IBk instance-based learning method were applied. In an attempt to improve IBk, two different distance-weighting schemes were used. Finally, the evaluation statistics R2, RMSE, MAE and MAPE were calculated for each model.
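The Gamma test estimates, before any model is fitted, the part of the output variance that no smooth model of the chosen inputs can explain. A minimal Python sketch, assuming the usual formulation: for k = 1..p, delta(k) is the mean squared input distance to the k-th nearest neighbour and gamma(k) is half the mean squared difference between the corresponding outputs; the intercept of the regression line of gamma on delta is the Gamma statistic. The helper `best_subset` is a hypothetical name illustrating the comparison of input combinations by their Gamma value; p = 10 and the test data are assumptions, not the study's settings.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def gamma_test(x: List[Sequence[float]], y: Sequence[float], p: int = 10) -> Tuple[float, float]:
    """Return (Gamma, A): intercept and slope of the least-squares line gamma(k) = A * delta(k) + Gamma.

    delta(k): mean squared input-space distance from each point to its k-th nearest neighbour.
    gamma(k): half the mean squared difference between each output and the output of
              that k-th nearest neighbour. Gamma estimates the noise variance on y.
    """
    m = len(x)
    p = min(p, m - 1)
    deltas = [0.0] * p
    gammas = [0.0] * p
    for i in range(m):
        # squared distances from point i to all other points, nearest first
        d2 = sorted(
            (sum((a - b) ** 2 for a, b in zip(x[i], x[j])), j) for j in range(m) if j != i
        )
        for k in range(p):
            dist2, j = d2[k]
            deltas[k] += dist2 / m
            gammas[k] += (y[j] - y[i]) ** 2 / (2 * m)
    # ordinary least-squares fit of gamma against delta
    md, mg = sum(deltas) / p, sum(gammas) / p
    sxx = sum((d - md) ** 2 for d in deltas)
    sxy = sum((d - md) * (g - mg) for d, g in zip(deltas, gammas))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mg - slope * md, slope


def best_subset(x: List[Sequence[float]], y: Sequence[float], names: List[str], p: int = 10):
    """Hypothetical helper: compute Gamma for every non-empty subset of input columns, keep the lowest."""
    best = None
    for r in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), r):
            sub = [[row[c] for c in cols] for row in x]
            g, _ = gamma_test(sub, y, p)
            if best is None or g < best[0]:
                best = (g, [names[c] for c in cols])
    return best


# Hypothetical check (NOT study data): y = x + Gaussian noise with variance 0.01
rng = random.Random(0)
xs = [[rng.random()] for _ in range(400)]
ys = [row[0] + rng.gauss(0.0, 0.1) for row in xs]
print(gamma_test(xs, ys))
```

On this synthetic example the returned Gamma should lie close to the injected noise variance of 0.01, which is the property that makes it usable for ranking input combinations.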
Results: The optimum combination identified by the Gamma test, which was then used for modelling, comprised sand, silt and clay percentages, TNV percentage, EC and bulk density. The tree selected bulk density as the most important splitting parameter and built three linear equations for predicting ks according to the bulk density value. With RMSE = 23.89 cm/d and MAPE = 20.50%, this model did not predict ks accurately. The different weighting schemes did not improve IBk performance, and the IBk model, with RMSE = 31.23 cm/d and MAPE = 23.24%, also did not estimate ks accurately.
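The accuracy figures above come from standard error statistics. A minimal Python sketch of the four statistics listed in the methods, using hypothetical observed and predicted values rather than the study's data. R2 is computed here as the coefficient of determination, 1 - SS_res/SS_tot; the abstract does not say whether it instead used the squared correlation between observed and predicted ks.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return R2, RMSE, MAE (same units as ks) and MAPE (%) for paired observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any observed value is zero
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_o = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape}


# Hypothetical ks values in cm/d (NOT from the study)
obs = [100.0, 50.0, 20.0, 80.0]
pred = [110.0, 45.0, 25.0, 80.0]
print(evaluate(obs, pred))
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses errors relative to the measured value, so a model can rank differently on each.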
Conclusion: The decision tree model performed better than the instance-based learning model in estimating ks. The tree also provided some information about the structure of the studied soil.
