Approach at a Glance
DRIVE is a decentralized peer-incentivization mechanism for emergent cooperation under changing rewards. It shapes incentives through reciprocal exchange of reward differences, enabling agents to dynamically align on cooperative behavior even when environmental reward scales or offsets drift over time.
DRIVE augments standard independent multi-agent reinforcement learning with a lightweight, local incentive-exchange protocol. Importantly, DRIVE does not learn incentive values and does not modify the action space; instead, it reshapes rewards dynamically based on observed outcomes.
- Advantage gating: an agent checks whether its temporal-difference (TD) advantage is non-negative before issuing an incentive request, exposing potentially exploitative behavior.
- Reward-difference exchange: neighbors respond with differences between their epoch-average reward and the received request, ensuring incentives remain proportional and scale-free.
Payoff interpretation: In matrix games such as the Prisoner’s Dilemma, this exchange swaps the temptation and sucker payoffs under unilateral defection, turning cooperation into the individually rational choice without altering environment dynamics.
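The exchange can be sketched roughly as follows. This is a minimal illustration and not the repository's implementation (see `src/controllers/drive.py` for that); the attribute names `td_advantage` and `epoch_avg_reward`, the choice of sending the epoch-average reward as the request value, and the summation of responses into the shaped reward are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AgentState:
    reward: float            # reward received in the current step
    td_advantage: float      # current TD advantage estimate
    epoch_avg_reward: float  # average reward over the current epoch

def drive_shaped_reward(agent: AgentState, neighbors: List[AgentState]) -> float:
    """Sketch of the DRIVE request/response exchange (illustrative only)."""
    shaped_reward = agent.reward

    # Advantage gating: only issue an incentive request when the TD advantage is
    # non-negative, i.e., when the agent currently benefits from its own behavior.
    if agent.td_advantage >= 0:
        # Assumption: the request carries the sender's epoch-average reward.
        request = agent.epoch_avg_reward

        # Reward-difference exchange: each neighbor responds with the difference
        # between its own epoch-average reward and the received request, keeping
        # incentives proportional and independent of absolute reward scale.
        for neighbor in neighbors:
            response = neighbor.epoch_avg_reward - request
            shaped_reward += response  # assumption: responses are added to the reward

    return shaped_reward
```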
Domains and Experimental Settings
The implementation reproduces the experimental evaluation from the paper, covering both matrix games and sequential social dilemmas (SSDs):
| Domain | Label | Description |
|---|---|---|
| IPD | Matrix-IPD | Iterated Prisoner’s Dilemma |
| Coin-2 | CoinGame-2 | 2-player Coin Game |
| Coin-4 | CoinGame-4 | 4-player Coin Game |
| Harvest-12 | Harvest-12 | 12-agent Harvest environment |
Reward Change (Drift) Functions
To evaluate robustness under changing rewards, experiments apply shared per-epoch affine transformations:
| Drift Function | Label |
|---|---|
| No change | identity |
| Linear increase | linear |
| Exponential decay | exponential_decay |
| Stepwise increase | stepwise_increase |
| Damped cosine | cos_damped |
These transformations preserve the strategic structure of the social dilemma while altering reward magnitudes and offsets.
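For intuition, the sketch below shows one possible family of such shared per-epoch affine transformations, r' = a(epoch) * r + b(epoch), using the labels from the table above. The coefficients are illustrative placeholders, not the schedules used in the paper.

```python
import math

def apply_drift(label: str, reward: float, epoch: int) -> float:
    """Illustrative per-epoch affine reward drift r' = a(epoch) * r + b(epoch)."""
    if label == "identity":
        return reward
    if label == "linear":
        return reward + 0.1 * epoch                # linearly increasing offset
    if label == "exponential_decay":
        return reward * math.exp(-0.01 * epoch)    # exponentially decaying scale
    if label == "stepwise_increase":
        return reward + float(epoch // 50)         # offset jumps every 50 epochs
    if label == "cos_damped":
        # damped cosine modulation of the reward scale
        return reward * (1.0 + math.exp(-0.01 * epoch) * math.cos(0.2 * epoch))
    raise ValueError(f"unknown drift function: {label}")
```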
Key Results
Implemented MARL Algorithms
The repository includes the following algorithms for comparison:
| Algorithm | Label | Notes |
|---|---|---|
| Random policy | Random | Non-learning baseline |
| Naive independent learning | IAC | Policy gradient with normalized returns |
| LIO | LIO | Learned incentive function |
| MATE | MATE-TD | Fixed-token peer incentivization |
| DRIVE | DRIVE-TD | Dynamic reward-difference exchange |
Baselines without peer incentivization (e.g., naive independent learning) are invariant to reward drift due to return normalization, but they do not resolve the incentive misalignment in social dilemmas.
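This invariance can be checked directly: a shared affine drift r' = a * r + b with a > 0 cancels out under return standardization, so a policy-gradient learner that normalizes its returns receives the same learning signal before and after the drift. The snippet below is a minimal check; the exact normalization used by the IAC baseline (e.g., running statistics rather than per-batch standardization) may differ.

```python
import numpy as np

returns = np.array([1.0, 3.0, 2.0, 5.0])
drifted = 4.0 * returns + 7.0  # shared affine drift with positive scale

def standardize(x):
    return (x - x.mean()) / (x.std() + 1e-8)

# Standardized returns are identical, so the normalized learning signal is unchanged.
assert np.allclose(standardize(returns), standardize(drifted))
```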
Experiment Parameters
Global experiment parameters such as the learning rate (`params["learning_rate"]`) or the number of episodes per epoch (`params["episodes_per_epoch"]`) are defined in `settings.py`.
Algorithm-specific hyperparameters are implemented in `src/controllers`, with default values matching those reported in the technical appendix of the paper. All parameters can be overridden via the `params` dictionary in `settings.py`.
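For orientation, a hypothetical excerpt of the `params` dictionary in `settings.py` might look like the following; only the two keys named above appear in this README, and the values are placeholders rather than the paper's defaults.

```python
# Hypothetical excerpt of the params dictionary in settings.py.
# Only "learning_rate" and "episodes_per_epoch" are named in this README;
# the values below are placeholders, not the paper's defaults.
params = {
    "learning_rate": 0.001,
    "episodes_per_epoch": 10,
}
```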
Prerequisites and Installation
Prerequisites: pip or a compatible package manager

Installation:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Training
To train algorithm M (see Implemented MARL Algorithms) in domain D (see Domains and Experimental Settings) under reward drift function X, run:
python train.py D M X
This creates an output directory of the form `output/N-agents_domain-D_drift-X_M_datetime` containing trained models (if applicable) and training statistics as JSON files.
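For example, assuming the labels from the tables above map directly onto the positional arguments, `python train.py CoinGame-2 DRIVE-TD linear` would train DRIVE in the 2-player Coin Game under linearly increasing reward drift.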
The script `run.sh` reproduces the full experimental sweep reported in the paper.
Code Structure
The core DRIVE mechanism is implemented in `src/controllers/drive.py`. This module contains the TD-gated request/response protocol and the reward shaping rule based on reciprocal reward differences.
Citation
When using this repository, you can cite it as:
@inproceedings{altmann2026dynamic,
title = {Dynamic Incentivized Cooperation under Changing Rewards},
author = {Philipp Altmann and Thomy Phan and Maximilian Zorn and Claudia Linnhoff-Popien and Sven Koenig},
booktitle = {The 25th International Conference on Autonomous Agents and Multi-Agent Systems},
series = {AAMAS '26},
year = {2026},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
}