Dynamic Reward Incentives for Variable Exchange

Implementations accompanying research on dynamic reward incentives for variable exchange (DRIVE), a decentralized peer-incentivization mechanism for emergent cooperation under changing rewards.

Approach at a Glance

DRIVE is a decentralized peer-incentivization mechanism for emergent cooperation under changing rewards. It shapes incentives through reciprocal exchange of reward differences, enabling agents to dynamically align on cooperative behavior even when environmental reward scales or offsets drift over time.

DRIVE augments standard independent multi-agent reinforcement learning with a lightweight, local incentive-exchange protocol. Importantly, DRIVE does not learn incentive values and does not modify the action space; instead, it reshapes rewards dynamically based on observed outcomes.

  • Advantage gating: an agent checks whether its temporal-difference (TD) advantage is non-negative before issuing an incentive request, exposing potentially exploitative behavior.
  • Reward-difference exchange: neighbors respond with the difference between their epoch-average reward and the received request, ensuring that incentives remain proportional and scale-free.

(Figures: advantage check, reward difference)
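
A minimal sketch of this exchange, with hypothetical function names and message contents (the actual protocol lives in src/controllers/drive.py), could look as follows:

    # Hypothetical sketch of the request/response protocol described above;
    # function names and message contents are assumptions, not the repository's API.

    def issue_request(td_advantage, own_epoch_avg_reward):
        """Only an agent with a non-negative TD advantage sends a request,
        which carries its epoch-average reward."""
        if td_advantage >= 0:
            return own_epoch_avg_reward
        return None  # negative advantage: no request is sent

    def respond(own_epoch_avg_reward, received_request):
        """A neighbor answers with the difference between its own
        epoch-average reward and the received request."""
        if received_request is None:
            return 0.0
        return own_epoch_avg_reward - received_request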

Payoff interpretation: In matrix games such as the Prisoner’s Dilemma, this exchange swaps the temptation and sucker payoffs under unilateral defection, turning cooperation into the individually rational choice without altering environment dynamics.

(Figures: DRIVE overview, payoff tables)
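
As a numerical illustration of this payoff swap (standard Prisoner's Dilemma values, not the repository's implementation or exact message flow):

    # Illustrative Prisoner's Dilemma payoffs: temptation T, reward R, punishment P, sucker S.
    T, R, P, S = 5, 3, 1, 0

    # Unilateral defection without shaping: the defector earns T, the cooperator S.
    defector, cooperator = T, S

    # If the payoff gap is exchanged from the defector to the cooperator,
    # the effective payoffs under unilateral defection are swapped.
    incentive = defector - cooperator
    shaped_defector = defector - incentive      # 0 == S
    shaped_cooperator = cooperator + incentive  # 5 == T
    assert (shaped_defector, shaped_cooperator) == (S, T)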

Domains and Experimental Settings

The implementation reproduces the experimental evaluation from the paper, covering both matrix games and sequential social dilemmas (SSDs):

Domain      Label       Description
IPD         Matrix-IPD  Iterated Prisoner's Dilemma
Coin-2      CoinGame-2  2-player Coin Game
Coin-4      CoinGame-4  4-player Coin Game
Harvest-12  Harvest-12  12-agent Harvest environment

(Figures: Coin Game domains, Harvest domain)

Reward Change (Drift) Functions

To evaluate robustness under changing rewards, experiments apply shared per-epoch affine transformations:

Drift Function     Label
No change          identity
Linear increase    linear
Exponential decay  exponential_decay
Stepwise increase  stepwise_increase
Damped cosine      cos_damped

(Figure: reward drift functions)

These transformations preserve the strategic structure of the social dilemma while altering reward magnitudes and offsets.
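
For illustration, shared per-epoch affine drifts of the form r -> a(t) * r + b(t) could be sketched as follows (the labels match the table above, but the concrete coefficients are assumptions and may differ from the repository's definitions):

    import math

    def identity(r, t):
        return r                                    # no change

    def linear(r, t):
        return (1.0 + 0.1 * t) * r                  # linearly increasing scale

    def exponential_decay(r, t):
        return math.exp(-0.05 * t) * r              # exponentially decaying scale

    def stepwise_increase(r, t):
        return r + 5.0 * (t // 10)                  # offset jumps every 10 epochs

    def cos_damped(r, t):
        return (1.0 + math.exp(-0.05 * t) * math.cos(0.5 * t)) * r  # damped cosine scale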


Key Results

  • Robust cooperation under reward drift: DRIVE consistently maintains high cooperation levels across domains and reward-change schedules.
  • Scale and shift invariance: Unlike prior peer-incentivization methods with fixed incentives or learned incentive functions, DRIVE remains effective without retuning.
  • Competitive baseline performance: In static settings, DRIVE matches or exceeds state-of-the-art peer-incentivization methods, while significantly outperforming them when rewards change.

(Figures: aggregate results, results under drift)


Implemented MARL Algorithms

The repository includes the following algorithms for comparison:

Algorithm                   Label     Notes
Random policy               Random    Non-learning baseline
Naive independent learning  IAC       Policy gradient with normalized returns
LIO                         LIO       Learned incentive function
MATE                        MATE-TD   Fixed-token peer incentivization
DRIVE                       DRIVE-TD  Dynamic reward-difference exchange

Non-PI baselines such as naive independent learning (IAC) are invariant to reward drift due to return normalization, but they do not resolve the incentive misalignment in social dilemmas.
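
The drift invariance of return-normalizing learners can be seen in a small sketch (illustrative numbers; the normalization details in src/controllers may differ):

    import numpy as np

    # Standardizing returns removes any shared affine drift r -> a * r + b (with a > 0),
    # which is why return-normalizing baselines are unaffected by the drift itself.
    returns = np.array([1.0, 2.0, 3.0, 4.0])
    drifted = 3.0 * returns + 7.0  # shared affine transformation of the returns

    def normalize(x):
        return (x - x.mean()) / (x.std() + 1e-8)

    assert np.allclose(normalize(returns), normalize(drifted))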


Experiment Parameters

Global experiment parameters such as the learning rate (params["learning_rate"]) or the number of episodes per epoch (params["episodes_per_epoch"]) are defined in settings.py.

Algorithm-specific hyperparameters are implemented in src/controllers, with default values matching those reported in the technical appendix of the paper. All parameters can be overridden via the params dictionary in settings.py.
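
A minimal sketch of such an override in settings.py (the two keys are taken from the paragraph above; the values are placeholders, not the defaults reported in the paper):

    # settings.py (sketch): global experiment parameters, overridable per run.
    params = {
        "learning_rate": 0.001,     # placeholder value
        "episodes_per_epoch": 10,   # placeholder value
    }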


Prerequisites and Installation

Prerequisites:

  • Python 3.8+
  • pip or compatible package manager
  • Optional: CUDA-enabled GPU for PyTorch acceleration

Installation:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    

Training

To train algorithm M (see Implemented MARL Algorithms) in domain D (see Domains and Experimental Settings) under reward drift function X, run:

    python train.py D M X
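
For example, to train DRIVE-TD in the two-player Coin Game under linear reward drift (assuming the labels from the tables above are passed verbatim):

    python train.py CoinGame-2 DRIVE-TD linear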
    

This creates an output directory of the form output/N-agents_domain-D_drift-X_M_datetime containing trained models (if applicable) and training statistics as JSON files.

The script run.sh reproduces the full experimental sweep reported in the paper.


Code Structure

The core DRIVE mechanism is implemented in:

    src/controllers/drive.py
    

This module contains the TD-gated request/response protocol and the reward shaping rule based on reciprocal reward differences.

Citation

When using this repository, you can cite it as:

    @inproceedings{altmann2026dynamic,
      title = {Dynamic Incentivized Cooperation under Changing Rewards},
      author = {Philipp Altmann and Thomy Phan and Maximilian Zorn and Claudia Linnhoff-Popien and Sven Koenig},
      booktitle = {The 25th International Conference on Autonomous Agents and Multi-Agent Systems},
      series = {AAMAS '26},
      year = {2026},
      publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
    }
    