 
Article type: Full scientific research article
Authors
1 Department of Mechanical Engineering, Faculty of Engineering, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran
2 Corresponding author, Department of Civil Engineering, Faculty of Engineering, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran
3 Faculty of Civil, Construction and Environmental Engineering (Department 2470), North Dakota State University
Abstract
Keywords
Subjects
Article title [English]
Authors [English]
Background and objectives: Precise forecasting of water quality (WQ) parameters, specifically potential salinity (PS), is critical for sustainable water utilization. In water-stressed basins such as that of the Karun River in Iran, effective monitoring and prediction of PS is essential because of anthropogenic activities, climate change, and reduced freshwater inflows. Effective machine learning (ML) models and appropriate input data are therefore very important for monitoring and predicting WQ parameters. However, the influencing factors exhibit complex, non-linear relationships, and multicollinearity in the datasets makes the problem challenging for traditional ML models. These limitations can result in inaccurate predictions, which obstruct the establishment of sustainable water management strategies. Accurate forecasting of PS is also essential for water and soil conservation, because it helps mitigate salinity-related degradation of agricultural lands and ensure the sustainability of vital ecosystems. By providing reliable predictions, this study supports the development of effective conservation strategies to maintain soil productivity and WQ in vulnerable regions. To address these issues, the present study introduces a new hybrid model, IKRidge-GRM, which inherits the advantages of improved kernel ridge regression (IKRidge) and generalized ridge regression (GRM). The hybrid model combines IKRidge's improved capacity to identify non-linearity with GRM's resilience against multicollinearity to improve PS prediction. This framework offers improved stability and interpretability of results and increases forecast accuracy, making it a helpful tool for environmental monitoring and decision-making.
The proposed strategy could aid policymakers and water resource managers in designing reasonable strategies to alleviate salinity issues, protect aquatic ecosystems, and ensure the long-term survival of vital water sources like the Karun River.
Materials and methods: This study introduces a novel hybrid ML model, called IKRidge-GRM, based on two regression techniques: generalized ridge regression (GRM) and improved kernel ridge regression (IKRidge). The GRM addresses multicollinearity and overfitting using the iteratively reweighted least squares (IRLS) process. IKRidge, in turn, incorporates a wavelet kernel function, optimized through the INFO algorithm, and the regularized locally weighted (RLW) approach, enabling it to capture complex, non-linear patterns in the data with high precision. This combination of techniques allows the hybrid model to overcome the limitations of traditional ML methods, making it particularly suitable for handling the intricate relationships inherent in WQ datasets. To further enhance predictive accuracy, the IKRidge-GRM framework integrates a light gradient boosting machine (LGBM) for feature selection. The LGBM step reduces dimensionality by identifying the most relevant input variables and eliminating redundant or irrelevant features.
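To make the kernel ridge idea concrete, the following is a minimal sketch of plain kernel ridge regression with a wavelet kernel. It is not the authors' IKRidge: the Morlet-type kernel form, the dilation parameter `a`, the ridge penalty `lam`, and all function names are assumptions chosen for illustration, and the INFO optimization and RLW weighting described above are omitted. The data are synthetic.

```python
import numpy as np

def wavelet_kernel(A, B, a=1.0):
    """Morlet-type wavelet kernel (assumed form, not taken from the study).

    k(x, z) = prod_i cos(1.75 * u_i) * exp(-u_i**2 / 2), with u_i = (x_i - z_i) / a
    """
    # Pairwise scaled differences per feature, shape (n_A, n_B, n_features)
    U = (A[:, None, :] - B[None, :, :]) / a
    return np.prod(np.cos(1.75 * U) * np.exp(-0.5 * U**2), axis=2)

def fit_kernel_ridge(X, y, lam=1e-2, a=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = wavelet_kernel(X, X, a)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, a=1.0):
    """Predict as a kernel-weighted sum over the training points."""
    return wavelet_kernel(X_new, X_train, a) @ alpha

# Synthetic non-linear example (placeholder data, not the Karun River records)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

alpha = fit_kernel_ridge(X, y)
y_hat = predict_kernel_ridge(X, alpha, X)
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

In this sketch the penalty `lam` and the dilation `a` are fixed by hand; in the study these would presumably be among the quantities tuned by the optimizer.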
Additionally, the model employs multivariate variational mode decomposition (MVMD) to decompose the input data into high- and low-frequency components, allowing it to capture both short-term fluctuations and long-term trends in WQ parameters. The study used an extensive dataset comprising 48 years of monthly WQ data collected at the Farisat station on the Karun River. Nine key WQ parameters, including magnesium (Mg), sulfate (SO₄²⁻), calcium (Ca), discharge (Q), sodium (Na), bicarbonate (HCO₃), chloride (Cl), electrical conductivity (EC), total dissolved solids (TDS), and pH, were used as inputs to forecast PS three months ahead.
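The three-months-ahead setup can be sketched as pairing each month's inputs with the PS value observed three months later. The helper name `make_horizon_pairs`, the number of placeholder input columns, and the random data below are assumptions for illustration only; the study's actual preprocessing (including MVMD and feature selection) is not reproduced here.

```python
import numpy as np

def make_horizon_pairs(features, target, horizon=3):
    """Pair the inputs at month t with the target at month t + horizon."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if len(features) != len(target):
        raise ValueError("features and target must cover the same months")
    if not 1 <= horizon < len(target):
        raise ValueError("horizon must be between 1 and len(target) - 1")
    # Drop the last `horizon` input rows and the first `horizon` target values
    return features[:-horizon], target[horizon:]

# 48 years of monthly records, as stated in the text
n_months = 48 * 12

# Placeholder arrays standing in for the measured inputs and PS series
rng = np.random.default_rng(1)
features = rng.normal(size=(n_months, 5))  # 5 columns is an arbitrary choice
ps = rng.normal(size=n_months)

X, y = make_horizon_pairs(features, ps, horizon=3)
print(X.shape, y.shape)
```

Because the pairs are built from shifted copies of the same series, any train/test split for such data would normally be chronological rather than random, so that the test period lies strictly after the training period.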
Results: The proposed IKRidge-GRM model accurately predicted PS values at the Farisat station, significantly outperforming the baseline models (Ridge, DELM, and LSSVM) and their MVMD-enhanced versions. By leveraging its hybrid architecture and advanced feature extraction techniques, the MVMD-IKRidge-GRM model achieved, during the testing phase, the highest correlation coefficient (R = 0.977), the lowest RMSE (0.956), and the lowest MAPE (4.521). These metrics indicate the model's superior predictive accuracy and reliability in handling complex, non-linear relationships. The model also achieved high IA (0.988) and KGE (0.948) scores, underscoring its robustness and effectiveness in capturing the intricate dynamics of PS variations. These results highlight the model's ability to uncover hidden patterns in the data and provide highly accurate predictions, even in challenging scenarios involving multicollinearity and non-linear dependencies. The model's performance was further confirmed by visual evaluations such as scatter plots, relative error plots, and Taylor diagrams. Scatter plots showed that the MVMD-IKRidge-GRM model's predictions closely aligned with measured values, with narrow prediction intervals and error distributions, reflecting its precision and consistency. Relative error plots revealed that the model exhibited the most compact and symmetric error distribution, with minimal bias and variability, and indicated its ability to generalize well across different data points. Taylor diagrams provided evidence of the model's strong agreement with the reference data, showing that it balances accuracy, variability representation, and error minimization effectively. Residual analysis further confirmed the model's precision and reliability.
Among all the models tested, the MVMD-IKRidge-GRM model achieved the smallest mean residual (-0.0073) and the lowest residual standard deviation (0.0613), demonstrating its ability to minimize prediction errors consistently. This level of precision is critical for practical applications, as it allows the model to provide reliable forecasts for decision-making in water resource management. The model's integration of advanced regression techniques, feature selection, and frequency decomposition enhances its predictive capabilities and establishes it as a robust framework for addressing complex environmental challenges. These findings emphasize the potential of the MVMD-IKRidge-GRM model as a powerful tool for sustainable water resource management, particularly in regions like the Karun River basin, where accurate and reliable predictions are essential for mitigating environmental degradation and ensuring long-term ecological balance.
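For readers who want to reproduce this kind of evaluation, the sketch below computes the reported metric types (R, RMSE, MAPE, IA, KGE) and the residual mean and standard deviation. The abstract does not give the formulas, so the commonly used definitions are assumed here: Willmott's index of agreement for IA, the 2009 Kling-Gupta form for KGE, MAPE expressed in percent, and residuals taken as observed minus predicted. The function name `evaluate_forecast` and the toy numbers are illustrative only.

```python
import numpy as np

def evaluate_forecast(obs, pred):
    """Return common forecast-skill metrics (assumed standard definitions)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = obs - pred  # sign convention is an assumption

    r = np.corrcoef(obs, pred)[0, 1]                # Pearson correlation
    rmse = np.sqrt(np.mean(resid**2))               # root mean square error
    mape = 100.0 * np.mean(np.abs(resid / obs))     # percent; needs obs != 0

    # Willmott's index of agreement
    obs_mean = obs.mean()
    ia = 1.0 - np.sum(resid**2) / np.sum(
        (np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2
    )

    # Kling-Gupta efficiency: correlation, variability ratio, bias ratio
    alpha = pred.std() / obs.std()
    beta = pred.mean() / obs_mean
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {
        "R": r, "RMSE": rmse, "MAPE": mape, "IA": ia, "KGE": kge,
        "residual_mean": resid.mean(), "residual_std": resid.std(),
    }

# Toy values, not data from the study
observed = np.array([10.2, 11.5, 9.8, 12.1, 13.0, 11.2])
predicted = np.array([10.0, 11.9, 9.5, 12.4, 12.6, 11.0])
for name, value in evaluate_forecast(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

A perfect forecast gives R = IA = KGE = 1 and RMSE = MAPE = 0 under these definitions, which is a quick sanity check when adapting the function.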
Conclusion: The IKRidge-GRM model predicted PS values at the Farisat station on the Karun River with high accuracy and reliability across all evaluation metrics. The model can uncover hidden patterns in complex, non-linear datasets, and its capacity to deliver precise predictions highlights its potential as a valuable tool for environmental monitoring and management. By integrating advanced regression techniques, namely improved kernel ridge regression (IKRidge) and generalized ridge regression (GRM), with feature selection and decomposition methods, namely light gradient boosting machine (LGBM) and multivariate variational mode decomposition (MVMD), the model addresses challenges such as multicollinearity, overfitting, and non-linear relationships. This comprehensive framework allows the IKRidge-GRM model to achieve superior predictive performance while maintaining robustness and adaptability across diverse environmental conditions. This study emphasizes the importance of combining advanced ML techniques with effective preprocessing methods to develop reliable models for analyzing and forecasting complex environmental data. Integrating feature selection and frequency decomposition enhances the model's ability to extract meaningful information from high-dimensional datasets. It also enables the model to better capture both short-term fluctuations and long-term trends in WQ parameters. Such capabilities are essential for addressing the multifaceted challenges posed by environmental degradation, particularly in regions like the Karun River basin, where water resources are under significant stress due to anthropogenic activities and climate change.
Keywords [English]