Applying Reinforcement Learning-Cooperative Game Theory (CGT-RL) in the fair distribution of profits from cooperation among basin stakeholders

Document Type: Complete scientific research article

Authors

1 Department of Civil Engineering, University of Sistan and Baluchestan, Zahedan, Iran.

2 Department of Civil Engineering, University of Sistan & Baluchestan

3 School of Engineering, University of Warwick, Coventry, CV4 7AL

Abstract

Background and objectives: In basins with diverse stakeholders, the equitable distribution of water resources and of the benefits derived from them is a fundamental aspect of integrated water resources management. The issue becomes more complex when one party, despite being the primary source of the basin's water, is less developed than the others because of unequal wealth distribution. In such cases, an equitable distribution of benefits among stakeholders is crucial, especially when the parties have the potential for sustainable cooperation that can yield greater collective benefits. Cooperative game theory (CGT) provides a suitable framework for the equitable allocation of benefits in such competitive environments. However, determining the benefit values in the objective function for the different possible coalitions of stakeholders can be very challenging. Reinforcement learning (RL) provides a valuable tool for determining the benefits of the various levels of cooperation, including full cooperation, partial cooperation, and no cooperation among stakeholders. This study uses a Cooperative Game Theory-Reinforcement Learning (CGT-RL) approach to examine two adjacent watersheds, the North Karoon and Zayandeh Rood. The three main stakeholder provinces in these two basins, namely Chaharmahal Bakhtiari, Isfahan, and Khuzestan, disagree about water allocation and about the fair and efficient distribution of the benefits from its use, and these disputes have intensified in recent years. In this paper, the CGT-RL method is used for the first time to address this real-world challenge in a large water system. The study aims to use this framework to distribute the benefits (revenue) from water use fairly and efficiently under a grand coalition, that is, full cooperation among these three stakeholders.
The proposed framework combines RL and CGT to address two major weaknesses of the approaches prevalent in previous research on maximizing and distributing benefits in multi-stakeholder water resource systems. The first weakness is that conventional optimization methods maximize the total system benefit regardless of how it is distributed among the stakeholders. These methods assume perfect cooperation among stakeholders and ignore both the dynamics of individual decision-making and each stakeholder's own benefit. Cooperative game solutions can provide strong incentives for individual decision-makers and facilitate the cooperation needed to reach the optimal solution. However, obtaining the information they require is challenging and computationally complex. This leads to the second weakness: previous applications of game theory in water resources studies made simplifying assumptions about the benefits achievable by the parties under different levels of cooperation. Overall, the goal of the present study is to empower the stakeholders, through coalition formation and group cooperation (collective rationality), to achieve greater individual benefits (individual rationality).
Materials and methods: The CGT-RL method proceeds in two steps. In the first step, the benefits achievable under each possible coalition, including the grand coalition (full cooperation), partial coalitions, and singleton coalitions (non-cooperation), are obtained by applying the Q-learning algorithm. In the second step, cooperative game solution methods, namely Nash-Harsanyi bargaining, the Shapley value, and the nucleolus, are used to distribute the benefits of full cooperation fairly among the stakeholders under different concepts of fairness. The stakeholders (players) are the provinces of Isfahan (Esfahan, ESF), Chaharmahal Bakhtiari (CHB), and Khuzestan (KHZ). Their levels of cooperation are full cooperation, partial cooperation, and no cooperation. Under full cooperation, the system is managed by a single agent that optimizes the objective function for the whole system. Under partial cooperation, the system has two agents: one is responsible for the coalition of two provinces, and the other optimizes the objective function for the remaining province. Under non-cooperation, the system has three agents, each responsible for one province and each trying to maximize the objective function for that province. The objective function is linear in the amount of water each province withdraws from the rivers, weighted by the province's average monthly income per unit of water consumed. The input data for the optimization are the average monthly income of each province per cubic meter of water withdrawn, the volume of monthly water withdrawal by the provinces (MCM), the average volume of river discharge in each province (MCM), and the maximum and minimum values of annual water withdrawal and storage for each province (MCM).
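As an illustration of the first step only, the following is a minimal sketch of tabular Q-learning on a toy single-agent reservoir problem. It is not the study's model: the storage grid, the withdrawal actions, the constant inflow, the monthly income coefficients, and the function names (step, train, greedy_income) are all assumptions made up for this example. The reward mirrors the linear objective described above (income per unit of water times the amount withdrawn).

```python
import random

# Hypothetical toy problem, NOT the study's data or model.
N_STORAGE = 5            # assumed discrete storage levels 0..4
ACTIONS = (0, 1, 2)      # assumed withdrawal options per month (water units)
INFLOW = 1               # assumed constant monthly inflow (water units)
# Assumed income per unit of water for each of the 12 months.
PRICE = [1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 2.5, 2.0, 1.5, 1.0, 1.0]


def step(month, storage, action):
    """Apply a withdrawal; return (next_storage, reward)."""
    withdrawal = min(action, storage + INFLOW)       # cannot exceed available water
    next_storage = min(storage + INFLOW - withdrawal, N_STORAGE - 1)  # excess spills
    reward = PRICE[month] * withdrawal               # linear objective
    return next_storage, reward


def greedy(values):
    """Index of the action with the largest Q-value."""
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning over states (month, storage); one episode = one year."""
    rng = random.Random(seed)
    q = {(m, s): [0.0] * len(ACTIONS) for m in range(12) for s in range(N_STORAGE)}
    for _ in range(episodes):
        storage = rng.randrange(N_STORAGE)
        for month in range(12):
            values = q[(month, storage)]
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(values)
            next_storage, reward = step(month, storage, ACTIONS[a])
            # No future value after the last month of the episode.
            future = 0.0 if month == 11 else max(q[(month + 1, next_storage)])
            # Standard Q-learning update.
            values[a] += alpha * (reward + gamma * future - values[a])
            storage = next_storage
    return q


def greedy_income(q, storage=0):
    """Annual income obtained by following the learned greedy policy."""
    total = 0.0
    for month in range(12):
        a = greedy(q[(month, storage)])
        storage, reward = step(month, storage, ACTIONS[a])
        total += reward
    return total


if __name__ == "__main__":
    q_table = train()
    print("Annual income under the learned policy:", greedy_income(q_table))
```

In a coalition setting like the one described above, the annual income returned by such a learner would play the role of the benefit value for the coalition that the agent represents. How the study couples several agents is not specified in the abstract, so this sketch stops at the single-agent case.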
Results: The research findings indicate that, in the case of full cooperation among stakeholders (i.e., establishing a grand coalition), the benefits of all parties increase compared with the other levels of cooperation. In the case of no cooperation, that is, if the current situation continues, the share of each province in the benefits derived from withdrawing and using water from the North Karoon and Zayandeh Rood rivers (average total annual income) is estimated at 478765.72 billion rials for Isfahan, 421791.33 billion rials for Khuzestan, and 156881.39 billion rials for Chaharmahal Bakhtiari. In the case of full cooperation, with a coalition of all the beneficiary provinces, the annual revenues calculated by whole-system optimization (Q-learning algorithm) for Isfahan, Khuzestan, and Chaharmahal Bakhtiari increase to 1641776.17 billion rials, 503201.94 billion rials, and 179054.8 billion rials, respectively. These values are 54 percent higher than in the non-cooperation case and about 30 to 40 percent higher than in the partial cooperation case (creation of small coalitions). Redistributing the income from full cooperation according to the Nash-Harsanyi bargaining solution gives Isfahan, Khuzestan, and Chaharmahal Bakhtiari 900963.88, 843989.49, and 579079.55 billion rials, respectively. Under the Shapley value method, the corresponding values for Isfahan, Khuzestan, and Chaharmahal Bakhtiari are 1006962.48, 798112.57, and 518957.86 billion rials, and under the nucleolus method the income distributed to Isfahan, Khuzestan, and Chaharmahal Bakhtiari is 886626.26, 847094.96, and 590311.69 billion rials per year, respectively. Isfahan (ESF), which has a higher income owing to its industrialization and development, receives the largest share of the redistributed income (about 38-43%). Chaharmahal Bakhtiari (CHB) gains more stability under the nucleolus method, as this method slightly increases its allocation to reduce dissatisfaction. Khuzestan (KHZ) receives 34-36% of the redistributed whole-system income under all three methods.
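The Nash-Harsanyi figures reported above can be checked with a short calculation. The sketch below assumes equal bargaining weights and that no constraint other than the total is binding (neither assumption is stated in the abstract). In that case, maximizing the product of gains over the non-cooperation payoffs gives every province its non-cooperation income plus an equal share of the cooperation surplus. The function name nash_harsanyi_equal_weights is hypothetical; the numbers are the ones given in the Results.

```python
# Annual incomes reported in the Results, in billion rials.
non_cooperation = {
    "Isfahan": 478765.72,
    "Khuzestan": 421791.33,
    "Chaharmahal Bakhtiari": 156881.39,
}
full_cooperation = {
    "Isfahan": 1641776.17,
    "Khuzestan": 503201.94,
    "Chaharmahal Bakhtiari": 179054.80,
}


def nash_harsanyi_equal_weights(disagreement, total):
    """Maximize prod(x_i - d_i) subject to sum(x_i) = total.

    Assumes equal weights and no other binding constraints, so the
    optimum splits the surplus over the disagreement point equally.
    """
    surplus = total - sum(disagreement.values())
    share = surplus / len(disagreement)
    return {player: d + share for player, d in disagreement.items()}


grand_total = sum(full_cooperation.values())
allocation = nash_harsanyi_equal_weights(non_cooperation, grand_total)
shares = {p: 100.0 * x / grand_total for p, x in allocation.items()}

if __name__ == "__main__":
    for player in allocation:
        print(f"{player}: {allocation[player]:.2f} billion rials "
              f"({shares[player]:.1f}% of the grand-coalition income)")
```

Under these assumptions, the computed allocations agree with the reported Nash-Harsanyi values (900963.88, 843989.49, and 579079.55 billion rials) to within rounding, and the resulting shares fall in the ranges quoted for Isfahan and Khuzestan. The Shapley value and nucleolus cannot be reproduced the same way, because they also depend on the partial-coalition values, which the abstract does not list.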
Conclusion: This study applied a novel framework, combining RL and CGT, to develop cooperative solutions that improve the efficiency of multi-stage, multi-agent water management problems, and used it to address a real-world issue. The proposed CGT-RL approach was applied to distribute water benefits from the North Karoon and Zayandeh Rood basins among three stakeholder provinces. The results suggest that, given sufficient computational capacity to run the Q-learning algorithm, the CGT-RL method can solve much more complex problems in a reasonable time. The combination of RL and CGT makes it possible to explore coordinated policies that not only maximize the benefits of the entire system but also allocate those benefits fairly. According to the results, stakeholders can increase their benefits (revenue) from the Zayandeh Rood and North Karoon rivers by fully cooperating and coordinating their exploitation policies. Under full cooperation, the benefits of the parties increase relative to the other levels of cooperation, namely partial cooperation and non-cooperation.
