Discriminative Reward Co-Training

Implementations accompanying research on Discriminative Reward Co-Training (DIRECT), a self-imitation architecture for robust reinforcement learning from sparse rewards.


This repository contains the implementation of Discriminative Reward Co-Training (DIRECT), a novel reinforcement learning extension designed to enhance policy optimization in challenging environments with sparse rewards, hard exploration tasks, and dynamic conditions. DIRECT integrates a self-imitation buffer for storing high-return trajectories and a discriminator to evaluate policy-generated actions against these stored experiences. By using the discriminator as a surrogate reward signal, DIRECT enables efficient navigation of the reward landscape, outperforming existing state-of-the-art methods in various benchmark scenarios. This implementation supports reproducibility and further exploration of DIRECT's capabilities.
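
To make the mechanism concrete, here is a minimal, self-contained sketch of the self-imitation buffer idea (an illustration under simplified assumptions, not the implementation in this repository): the buffer retains only the highest-return episodes seen so far, and the discriminator is trained to separate their transitions from fresh policy rollouts, so that its output can serve as a dense surrogate reward.

    import heapq
    import random

    class SelfImitationBuffer:
        """Keep the k highest-return episodes seen so far (conceptual sketch)."""

        def __init__(self, capacity=10):
            self.capacity = capacity
            self._heap = []    # min-heap of (return, insertion_id, transitions)
            self._next_id = 0  # tie-breaker so heapq never compares transition lists

        def add(self, episode_return, transitions):
            item = (episode_return, self._next_id, transitions)
            self._next_id += 1
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, item)
            elif episode_return > self._heap[0][0]:
                heapq.heapreplace(self._heap, item)  # evict the lowest-return episode

        def sample(self, n):
            """Draw n stored transitions as 'real' examples for the discriminator."""
            pool = [t for _, _, episode in self._heap for t in episode]
            return random.choices(pool, k=n)

    # A discriminator trained to classify buffer.sample(n) against fresh policy
    # transitions yields a dense surrogate reward: the more "buffer-like" a
    # transition looks, the higher its reward in the policy update.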

[Figure: DIRECT architecture and evaluation results]

Setup


Requirements

  • Python 3.10
  • hyphi_gym
  • Stable-Baselines3

Installation

    pip install -r requirements.txt

Training

Example of training DIRECT:

    from baselines import DIRECT

    # Train on a single sparse maze environment for 24 epochs
    envs = ['Maze9Sparse']
    epochs = 24

    # Instantiate DIRECT with a fixed seed and train the model
    model = DIRECT(envs=envs, seed=42, path='results')
    model.learn(total_timesteps=epochs * 2048 * 4)
    model.save()
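
With 24 epochs of 2048 steps, this schedules 196,608 timesteps in total, assuming the factor of 4 corresponds to four parallel environments (2048 matches the default PPO rollout length in Stable-Baselines3).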
    

Running Experiments

Train DIRECT and baselines:

    python -m run DIRECT -e Maze9Sparse -t 24 --path 'results/1-eval'
    python -m run [DIRECT|GASIL|SIL|A2C|PPO|VIME|PrefPPO] -e FetchReach -t 96 --path 'results/2-bench'
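
To benchmark every algorithm sequentially, a small driver script can wrap the command above. This is a minimal sketch that uses only the flags documented here (see python -m run -h for further options):

    # Hypothetical sweep script: train DIRECT and every baseline on FetchReach.
    import subprocess

    for algo in ["DIRECT", "GASIL", "SIL", "A2C", "PPO", "VIME", "PrefPPO"]:
        subprocess.run(
            ["python", "-m", "run", algo,
             "-e", "FetchReach", "-t", "96", "--path", "results/2-bench"],
            check=True,  # abort the sweep if any run fails
        )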
    

Display help for command-line arguments:

    python -m run -h

Run Evaluation Scripts

    ./run/1-eval/kappa.sh
    ./run/1-eval/omega.sh
    ./run/1-eval/chi.sh

Run Benchmark Scripts

    ./run/2-bench/maze.sh
    ./run/2-bench/shift.sh
    ./run/2-bench/fetch.sh
    

Plotting

Evaluation

Kappa

    python -m plot results/1-eval/kappa -m Buffer --merge Training Momentum Scores

Omega

    python -m plot results/1-eval/omega -m Discriminator --merge Training

Chi

    python -m plot results/1-eval/chi -m DIRECT --merge Training
    

Benchmarks

Maze

    python -m plot results/2-bench -e Maze9Sparse -m Training

HoleyGrid

    python -m plot results/2-bench -e HoleyGrid -m Shift --merge Training

Fetch

    python -m plot results/2-bench -e FetchReach -m Training
    

Citation

When using this repository, you can cite it as:

    @article{altmann2024discriminative,
      title = {Discriminative Reward Co-Training},
      author = {Philipp Altmann and Fabian Ritz and Maximilian Zorn and Michael Kölle and Thomy Phan and Thomas Gabor and Claudia Linnhoff-Popien},
      journal = {Neural Computing and Applications},
      year = {2025}, 
      volume = {37},
      number = {23},
      pages = {18793--18809},
      publisher = {Springer Nature},
      doi = {10.1007/s00521-024-10512-8},
    }
    