Niryo Robotic Grasping Environment
This is a simple reinforcement learning MuJoCo gym environment for the Niryo-NED2 robot arm. The basis is heavily inspired by the Farama Foundation's gymnasium-robotics (the Fetch environments in particular). The original URDF files and meshes of the Niryo robot(s) included here are taken from the official Niryo repository.
The repository is intended to be as stand-alone as possible, depending only on gymnasium, mujoco, and numpy (developed with Python 3.12.8).
Environments
We include the following environments as gymnasium-registered strings:
| Base Environments | Task |
|---|---|
| NiryoReach-v1 | Move end-effector (gripper) to a random target site. |
| NiryoLift-v1 | Grasp a randomly placed object cube with the end-effector (gripper) and lift it to a target site (always +0.5 in the z-axis from wherever the object is placed). |
| NiryoPickAndPlace-v1 | Grasp a randomly placed object cube with the end-effector (gripper) and move it to a random target site. |
The control frequency of the robot is f = 25 Hz. This is achieved by applying the same action over 20 subsequent simulator steps (with a time step of dt = 0.002 s) before control is returned to the agent.
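The following is a minimal sketch of how such a fixed control frequency is typically realized via frame skipping in a MuJoCo environment; the names (`n_substeps`, `apply_action`, `model`, `data`) are illustrative assumptions, not this repository's actual API.

```python
import mujoco

# Illustrative frame-skip loop (names are assumptions, not this repo's API).
n_substeps = 20   # simulator steps per environment step
# With model.opt.timestep = 0.002 s, one env step spans 20 * 0.002 s = 0.04 s -> 25 Hz.

def apply_action(model, data, ctrl):
    data.ctrl[:] = ctrl              # hold the same control signal ...
    for _ in range(n_substeps):      # ... over all sub-steps
        mujoco.mj_step(model, data)
```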
Modifications
All base environments are registered with the following default parameters:
default_kwargs = {
    "reward_type": "sparse",      # one of ['sparse', 'dense']
    "observation_type": "goal",   # one of ['goal', 'camera']
    "control_type": "mocap",      # one of ['mocap', 'free']
    "width": 512, "height": 512,  # rendering resolution (px)
}
max_episode_length = 50
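For orientation, a registration with these defaults could look roughly like the sketch below, using gymnasium's standard `register` call; the entry-point path is a hypothetical placeholder, and mapping `max_episode_length = 50` onto `max_episode_steps` is an assumption.

```python
from gymnasium.envs.registration import register

default_kwargs = {
    "reward_type": "sparse",
    "observation_type": "goal",
    "control_type": "mocap",
    "width": 512, "height": 512,
}

# Hypothetical registration -- the actual entry point lives inside niryo_gym.
register(
    id="NiryoReach-v1",
    entry_point="niryo_gym.envs:NiryoReachEnv",  # assumed module path
    max_episode_steps=50,
    kwargs=default_kwargs,
)
```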
You can change these settings in the gym.make call, e.g.:
env = gym.make(
    'NiryoReach-v1',
    max_episode_steps=100,
    reward_type="dense",
    observation_type="camera",
    control_type="free",
    width=128, height=128,
)
Reward
By default, the agent receives a sparse reward signal (reward_type='sparse'), i.e., $1$ if the Euclidean distance between the achieved goal position and the desired goal (which depends on the task) is below the distance_threshold=0.05, and $0$ otherwise.
Use reward_type='dense' to receive a dense reward signal calculated according to the distance to the target, i.e., the returned reward is the negative Euclidean distance between the achieved goal position and the desired goal.
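As a sketch only (not the repository's exact implementation), a reward function consistent with the description above could look like this:

```python
import numpy as np

DISTANCE_THRESHOLD = 0.05  # distance_threshold from the description above

def compute_reward(achieved_goal, desired_goal, reward_type="sparse"):
    # Euclidean distance between achieved and desired goal positions
    d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    if reward_type == "sparse":
        # 1 when the goal is reached (within the threshold), 0 otherwise
        return (d < DISTANCE_THRESHOLD).astype(np.float64)
    # dense: negative distance to the goal
    return -d
```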
Action and Observation Spaces
Depending on the control_type and observation_type, the observation and action spaces change as follows:
| control_type | observation_type | observation | action |
|---|---|---|---|
| mocap | goal | dict(desired_goal=Box(3,), achieved_goal=Box(3,), observation=Box(10,)) | Box(-1.0, 1.0, (4,)) |
| mocap | camera | dict(camera_pov=Box(width,height,3), camera_front=Box(width,height,3), observation=Box(10,)) | Box(-1.0, 1.0, (4,)) |
| free | goal | dict(desired_goal=Box(3,), achieved_goal=Box(3,), observation=Box(10,)) | Box(-1.0, 1.0, (7,)) |
| free | camera | dict(camera_pov=Box(width,height,3), camera_front=Box(width,height,3), observation=Box(10,)) | Box(-1.0, 1.0, (7,)) |
where

- observation_type='goal' is the goal-aware Farama-robotics Fetch-style dict observation space (e.g. for Hindsight Experience Replay),
- observation_type='camera' is a dict observation space with the robot's vector observation as well as the frontal and gripper-POV camera render images,
- control_type='mocap' is the Farama-robotics Cartesian displacement dx, dy, and dz of the end effector, plus the control for closing and opening the gripper, and
- control_type='free' is the full DOF control of all 6 controllable joints, plus the control for closing and opening the gripper (in the order shoulder, arm, elbow, forearm, wrist, hand, gripper).

Depending on the height and width resolution of your renderer (here 512x512), the camera observations look like this:
| camera_front | camera_pov |
|---|---|
| ![]() | ![]() |
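To see how the chosen options shape the spaces in practice, you can inspect the environment directly; the snippet below is a small sketch, and the exact bounds/dtypes printed may differ slightly from the abbreviated forms in the table.

```python
import gymnasium as gym
import niryo_gym

# Camera observations with full joint control at 128x128 resolution
env = gym.make("NiryoReach-v1", observation_type="camera",
               control_type="free", width=128, height=128)
print(env.observation_space)   # Dict(camera_front=Box(...), camera_pov=Box(...), observation=Box(10,))
print(env.action_space)        # Box(-1.0, 1.0, (7,), ...)

obs, _ = env.reset(seed=0)
print(obs["camera_front"].shape)  # (128, 128, 3) frontal camera image
```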
Installation
Setup
git clone https://github.com/m-zorn/niryo-gym.git
cd niryo-gym
conda create -n niryo_gym python=3.12.8
conda activate niryo_gym
(niryo_gym) pip install -r requirements.txt
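As a quick sanity check of the installation (assuming the registration above), you can make and reset one of the environments from the command line:

(niryo_gym) python -c "import gymnasium as gym, niryo_gym; env = gym.make('NiryoReach-v1'); print(env.reset()[0].keys())"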
Testing
# if you want to run the benchmark tests, also install
(niryo_gym) pip install stable-baselines3
Then you can simply run the following (cf. test_ppo.py):
import gymnasium as gym
from stable_baselines3 import PPO
import niryo_gym

if __name__ == "__main__":
    # Initialize env and model
    env = gym.make("NiryoReach-v1")
    model = PPO("MultiInputPolicy", env, verbose=1)

    # Train the model
    model.learn(50000)

    # Visualize trained policy
    env = gym.make("NiryoReach-v1", render_mode="human")
    obs, _ = env.reset()
    while True:
        action, _states = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
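Since observation_type='goal' exposes a goal-conditioned dict observation (see above), sparse-reward tasks can also be trained with Hindsight Experience Replay. The following is a hedged sketch using stable-baselines3's SAC and HerReplayBuffer; the hyperparameters are illustrative, and it assumes the environments implement the GoalEnv-style compute_reward interface that HER relies on.

```python
import gymnasium as gym
from stable_baselines3 import SAC, HerReplayBuffer
import niryo_gym

if __name__ == "__main__":
    # Sparse-reward, goal-conditioned task (illustrative hyperparameters)
    env = gym.make("NiryoPickAndPlace-v1", reward_type="sparse", observation_type="goal")
    model = SAC(
        "MultiInputPolicy",
        env,
        replay_buffer_class=HerReplayBuffer,
        replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
        verbose=1,
    )
    model.learn(100_000)
```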

