Discriminative Reward Co-Training

Implementations accompanying research on Discriminative Reward Co-Training (DIRECT), a self-imitation architecture for robust reinforcement learning from sparse rewards.


This repository contains the implementation of Discriminative Reward Co-Training (DIRECT), a novel reinforcement learning extension designed to enhance policy optimization in challenging environments with sparse rewards, hard exploration tasks, and dynamic conditions. DIRECT integrates a self-imitation buffer for storing high-return trajectories and a discriminator to evaluate policy-generated actions against these stored experiences. By using the discriminator as a surrogate reward signal, DIRECT enables efficient navigation of the reward landscape, outperforming existing state-of-the-art methods in various benchmark scenarios. This implementation supports reproducibility and further exploration of DIRECT's capabilities.
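
To make the mechanism concrete, here is a minimal, self-contained sketch of the self-imitation buffer idea (an illustration under simplified assumptions, not the implementation in this repository): the buffer retains only the highest-return episodes seen so far, and the discriminator is trained to separate their transitions from fresh policy rollouts, so that its output can serve as a dense surrogate reward.

    import heapq
    import random

    class SelfImitationBuffer:
        """Keep the k highest-return episodes seen so far (conceptual sketch)."""

        def __init__(self, capacity=10):
            self.capacity = capacity
            self._heap = []    # min-heap of (return, insertion_id, transitions)
            self._next_id = 0  # tie-breaker so heapq never compares transition lists

        def add(self, episode_return, transitions):
            item = (episode_return, self._next_id, transitions)
            self._next_id += 1
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, item)
            elif episode_return > self._heap[0][0]:
                heapq.heapreplace(self._heap, item)  # evict the lowest-return episode

        def sample(self, n):
            """Draw n stored transitions as 'real' examples for the discriminator."""
            pool = [t for _, _, episode in self._heap for t in episode]
            return random.choices(pool, k=n)

    # A discriminator trained to classify buffer.sample(n) against fresh policy
    # transitions yields a dense surrogate reward: the more "buffer-like" a
    # transition looks, the higher its reward in the policy update.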

[Figure: DIRECT architecture and evaluation results]

Setup


Requirements

  • Python 3.10
  • hyphi_gym
  • Stable-Baselines3

Installation

    pip install -r requirements.txt

Training

Example of training DIRECT:

    from baselines import DIRECT

    # Train on a single sparse maze environment for 24 epochs
    envs = ['Maze9Sparse']
    epochs = 24

    # Instantiate DIRECT with a fixed seed and train the model
    model = DIRECT(envs=envs, seed=42, path='results')
    model.learn(total_timesteps=epochs * 2048 * 4)
    model.save()
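
With 24 epochs of 2048 steps, this schedules 196,608 timesteps in total, assuming the factor of 4 corresponds to four parallel environments (2048 matches the default PPO rollout length in Stable-Baselines3).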
    

Running Experiments

Train DIRECT and baselines:

    python -m run DIRECT -e Maze9Sparse -t 24 --path 'results/1-eval'
    python -m run [DIRECT|GASIL|SIL|A2C|PPO|VIME|PrefPPO] -e FetchReach -t 96 --path 'results/2-bench'
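
To benchmark every algorithm sequentially, a small driver script can wrap the command above. This is a minimal sketch that uses only the flags documented here (see python -m run -h for further options):

    # Hypothetical sweep script: train DIRECT and every baseline on FetchReach.
    import subprocess

    for algo in ["DIRECT", "GASIL", "SIL", "A2C", "PPO", "VIME", "PrefPPO"]:
        subprocess.run(
            ["python", "-m", "run", algo,
             "-e", "FetchReach", "-t", "96", "--path", "results/2-bench"],
            check=True,  # abort the sweep if any run fails
        )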
    

Display help for command-line arguments:

    python -m run -h

Run Evaluation Scripts

    ./run/1-eval/kappa.sh
    ./run/1-eval/omega.sh
    ./run/1-eval/chi.sh

Run Benchmark Scripts

    ./run/2-bench/maze.sh
    ./run/2-bench/shift.sh
    ./run/2-bench/fetch.sh
    

Plotting

Evaluation

Kappa

    python -m plot results/1-eval/kappa -m Buffer --merge Training Momentum Scores

Omega

    python -m plot results/1-eval/omega -m Discriminator --merge Training

Chi

    python -m plot results/1-eval/chi -m DIRECT --merge Training
    

Benchmarks

Maze

    python -m plot results/2-bench -e Maze9Sparse -m Training

HoleyGrid

    python -m plot results/2-bench -e HoleyGrid -m Shift --merge Training

Fetch

    python -m plot results/2-bench -e FetchReach -m Training
    

Citation

When using this repository, you can cite it as:

    @article{altmann2024discriminative,
      title = {Discriminative Reward Co-Training},
      author = {Philipp Altmann and Fabian Ritz and Maximilian Zorn and Michael Kölle and Thomy Phan and Thomas Gabor and Claudia Linnhoff-Popien},
      journal = {Neural Computing and Applications},
      year = {2025}, 
      volume = {37},
      number = {23},
      pages = {18793--18809},
      publisher = {Springer Nature},
      doi = {10.1007/s00521-024-10512-8},
    }
    