Discriminative Reward Co-Training
This repository contains the implementation of Discriminative Reward Co-Training (DIRECT), a reinforcement learning extension designed to improve policy optimization in environments with sparse rewards, hard exploration tasks, and dynamic conditions. DIRECT combines a self-imitation buffer, which stores high-return trajectories, with a discriminator that evaluates policy-generated actions against these stored experiences. The discriminator's output serves as a surrogate reward signal that guides the policy through the reward landscape. With this signal, DIRECT outperforms existing state-of-the-art methods in several benchmark scenarios. The implementation is provided to support reproducibility and further exploration of DIRECT's capabilities.
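The two components described above can be illustrated with a minimal sketch. This is not the repository's implementation: the names `TopReturnBuffer`, `surrogate_reward`, and the `mix` weight are hypothetical, and the discriminator is reduced to a score in [0, 1] that is assumed to come from some trained model. It only shows the idea of retaining the highest-return trajectories and blending a discriminator score with the environment reward.

```python
import heapq
from itertools import count


class TopReturnBuffer:
    """Hypothetical buffer that keeps the `capacity` highest-return trajectories."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._heap = []  # min-heap of (return, tiebreak, trajectory)
        self._tiebreak = count()  # avoids comparing trajectories on equal returns

    def add(self, trajectory, episode_return):
        """Insert a trajectory; return True if it was kept."""
        entry = (episode_return, next(self._tiebreak), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        # Buffer is full: keep the new trajectory only if it beats the worst one.
        if episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def returns(self):
        """Stored returns, highest first."""
        return sorted((r for r, _, _ in self._heap), reverse=True)


def surrogate_reward(env_reward, discriminator_score, mix=1.0):
    """Blend a discriminator score in [0, 1] with the environment reward.

    `mix` is an assumed weighting: 1.0 uses only the discriminator score,
    0.0 uses only the environment reward.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * discriminator_score + (1.0 - mix) * env_reward


if __name__ == "__main__":
    buffer = TopReturnBuffer(capacity=2)
    for trajectory, ret in [(["a"], 0.1), (["b"], 0.9), (["c"], 0.5), (["d"], 0.2)]:
        buffer.add(trajectory, ret)
    print(buffer.returns())
    print(surrogate_reward(env_reward=0.0, discriminator_score=0.8, mix=0.5))
```

A min-heap keeps the lowest stored return at the root, so deciding whether a new trajectory displaces an old one takes a single comparison.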

Setup
Requirements
Installation
pip install -r requirements.txt
Training
Example for training DIRECT:
from baselines import DIRECT

envs = ['Maze9Sparse']
epochs = 24
model = DIRECT(envs=envs, seed=42, path='results')
model.learn(total_timesteps=epochs * 2048 * 4)
model.save()
Running Experiments
Train DIRECT and baselines
python -m run DIRECT -e Maze9Sparse -t 24 --path 'results/1-eval'
python -m run [DIRECT|GASIL|SIL|A2C|PPO|VIME|PrefPPO] -e FetchReach -t 96 --path 'results/2-bench'
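The bracketed list in the second command stands for one algorithm per invocation. As a convenience, the full set of invocations could be generated with a small helper like the sketch below. The helper `build_commands` and its parameter names are hypothetical; only the flags `-e`, `-t`, and `--path` and the algorithm names are taken from the commands above, and the meaning of `-t` is whatever `python -m run -h` reports.

```python
import shlex

# Algorithm names as listed in the benchmark command above.
ALGORITHMS = ["DIRECT", "GASIL", "SIL", "A2C", "PPO", "VIME", "PrefPPO"]


def build_commands(env="FetchReach", t=96, path="results/2-bench"):
    """Return one argument list per algorithm, mirroring the command above."""
    return [
        ["python", "-m", "run", algo, "-e", env, "-t", str(t), "--path", path]
        for algo in ALGORITHMS
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for command in build_commands():
        print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` to launch the runs sequentially from the repository root.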
Display help for command line arguments
python -m run -h
Run Evaluation Scripts
./run/1-eval/kappa.sh
./run/1-eval/omega.sh
./run/1-eval/chi.sh
Run Benchmark Scripts
./run/2-bench/maze.sh
./run/2-bench/shift.sh
./run/2-bench/fetch.sh
Plotting
Evaluation
Kappa
python -m plot results/1-eval/kappa -m Buffer --merge Training Momentum Scores
Omega
python -m plot results/1-eval/omega -m Discriminator --merge Training
Chi
python -m plot results/1-eval/chi -m DIRECT --merge Training
Benchmarks
Maze
python -m plot results/2-bench -e Maze9Sparse -m Training
HoleyGrid
python -m plot results/2-bench -e HoleyGrid -m Shift --merge Training
Fetch
python -m plot results/2-bench -e FetchReach -m Training
Citation
When using this repository, please cite it as:
@article{altmann2024discriminative,
title = {Discriminative Reward Co-Training},
author = {Philipp Altmann and Fabian Ritz and Maximilian Zorn and Michael Kölle and Thomy Phan and Thomas Gabor and Claudia Linnhoff-Popien},
journal = {Neural Computing and Applications},
year = {2025},
volume = {37},
number = {23},
pages = {18793--18809},
publisher = {Springer Nature},
doi = {10.1007/s00521-024-10512-8},
}