Reward

Reward is the key element in Reinforcement Learning. At the start of each training run, an instance of a reward class is created and generates the reward signal during the training steps. The myGym Reward Module implements three basic types of reward signal: Distance, Complex Distance and Sparse. You can choose one of them by setting reward= to either distance, complex_distance or sparse in the config file, or by passing it as a command line argument.
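The selection can be sketched as a lookup from the configured string to a reward class. The mapping and helper below are illustrative, not myGym's actual wiring; only the three reward names come from the documentation:

```python
# Hypothetical mapping from the "reward" config value to a reward class name;
# the real classes live in myGym.envs.rewards.
REWARD_TYPES = {
    "distance": "DistanceReward",
    "complex_distance": "ComplexDistanceReward",
    "sparse": "SparseReward",
}

def pick_reward_class(config):
    # "distance" as the default is an assumption for this sketch
    reward_name = config.get("reward", "distance")
    try:
        return REWARD_TYPES[reward_name]
    except KeyError:
        raise ValueError(f"Unknown reward type: {reward_name!r}")
```
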

class myGym.envs.rewards.Reward(env, task=None)[source]

Reward base class for reward signal calculation and visualization

Parameters:
param env

(object) Environment, where the training takes place

param task

(object) Task that is being trained, instance of a class TaskModule

visualize_reward_over_steps()[source]

Plot and save a graph of reward values assigned to individual steps during an episode. Call this method after the end of the episode.

visualize_reward_over_episodes()[source]

Plot and save a graph of cumulative reward values assigned to individual episodes. Call this method to plot data from the current and all previous episodes.

class myGym.envs.rewards.DistanceReward(env, task)[source]

Reward class for reward signal calculation based on distance differences between 2 objects

Parameters:
param env

(object) Environment, where the training takes place

param task

(object) Task that is being trained, instance of a class TaskModule

compute(observation)[source]

Compute reward signal based on the distance between 2 objects. The positions of the objects must be present in the observation.

Params:
param observation

(list) Observation of the environment

Returns:
return reward

(float) Reward signal for the environment

reset()[source]

Reset stored value of distance between 2 objects. Call this after the end of an episode.

calc_dist_diff(obj1_position, obj2_position)[source]

Calculate the change in the distance between 2 objects between the previous and the current step. Normalize the change by the distance in the previous step.

Params:
param obj1_position

(list) Position of the first object

param obj2_position

(list) Position of the second object

Returns:
return norm_diff

(float) Normalized difference of the distances between 2 objects in the previous and the current step
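A minimal sketch of the normalized distance difference described above, assuming Euclidean distance and a stored previous-step distance. Class and attribute names are illustrative, not myGym's actual implementation, and the sketch assumes the previous distance is non-zero:

```python
import math

def euclidean(p, q):
    # Straight-line distance between two 3D positions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

class DistanceRewardSketch:
    def __init__(self):
        self.prev_dist = None  # distance remembered from the previous step

    def calc_dist_diff(self, obj1_position, obj2_position):
        dist = euclidean(obj1_position, obj2_position)
        if self.prev_dist is None:  # first step of the episode
            self.prev_dist = dist
        # Positive when the objects moved closer, normalized by the
        # distance in the previous step.
        norm_diff = (self.prev_dist - dist) / self.prev_dist
        self.prev_dist = dist
        return norm_diff

    def reset(self):
        # Forget the stored distance between episodes.
        self.prev_dist = None
```
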

class myGym.envs.rewards.ComplexDistanceReward(env, task)[source]

Reward class for reward signal calculation based on distance differences between 3 objects, e.g. 2 objects and gripper for complex tasks

Parameters:
param env

(object) Environment, where the training takes place

param task

(object) Task that is being trained, instance of a class TaskModule

compute(observation)[source]

Compute reward signal based on the distances between 3 objects. The positions of the objects must be present in the observation.

Params:
param observation

(list) Observation of the environment

Returns:
return reward

(float) Reward signal for the environment

reset()[source]

Reset stored values of the distances between the 3 objects. Call this after the end of an episode.

calc_dist_diff(obj1_position, obj2_position, obj3_position)[source]

Calculate the change in the distances between 3 objects between the previous and the current step. Normalize each change by the corresponding distance in the previous step.

Params:
param obj1_position

(list) Position of the first object

param obj2_position

(list) Position of the second object

param obj3_position

(list) Position of the third object

Returns:
return norm_diff

(float) Sum of normalized differences of the distances between 3 objects in the previous and the current step
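For three objects, the same idea extends to a sum of per-pair normalized changes. The pairings (1-2, 2-3, 1-3) and the stateless function shape below are illustrative assumptions, not necessarily how myGym pairs the objects or stores state:

```python
import math

def euclidean(p, q):
    # Straight-line distance between two 3D positions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def summed_norm_diff(prev_dists, positions):
    # prev_dists: the three pairwise distances from the previous step
    # positions:  [obj1, obj2, obj3] current positions
    pairs = [(0, 1), (1, 2), (0, 2)]  # illustrative pairing
    dists = [euclidean(positions[i], positions[j]) for i, j in pairs]
    total = sum((p - d) / p for p, d in zip(prev_dists, dists))
    return total, dists  # return new distances for the next step
```
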

class myGym.envs.rewards.PnPDistanceReward(env, task)[source]

Reward class for reward signal calculation based on distance differences between 4 objects, e.g. 1 object and gripper, finger1, finger2 for Panda

Parameters:
param env

(object) Environment, where the training takes place

param task

(object) Task that is being trained, instance of a class TaskModule

compute(observation)[source]

Compute reward signal based on the distances between 4 objects. The positions of the objects must be present in the observation.

Params:
param observation

(list) Observation of the environment

Returns:
return reward

(float) Reward signal for the environment

reset()[source]

Reset stored value of distance between objects. Call this after the end of an episode.

calc_dist_diff(obj1_position, obj2_position, obj3_position, obj4_position)[source]

Calculate the change in the distances between 4 objects between the previous and the current step. Normalize each change by the corresponding distance in the previous step.

Params:
param obj1_position

(list) Position of the first object

param obj2_position

(list) Position of the second object

param obj3_position

(list) Position of the third object

param obj4_position

(list) Position of the fourth object

Returns:
return norm_diff

(float) Sum of normalized differences of the distances between 4 objects in the previous and the current step

calc_orn_diff(gripper_orn)[source]

Calculate the change in the difference between the gripper's orientation and the desired orientation between the previous and the current step. Normalize the change by the value of the difference in the previous step.

Params:
param gripper_orn

(list) Orientation of gripper

Returns:
return norm_diff

(float) Normalized difference between the desired orientation and the orientation of the gripper

class myGym.envs.rewards.SparseReward(env, task)[source]

Reward class for sparse reward signal

Parameters:
param env

(object) Environment, where the training takes place

param task

(object) Task that is being trained, instance of a class TaskModule

compute(observation=None)[source]

Compute sparse reward signal. The reward is 0 when the goal is reached and -1 in every other step.

Params:
param observation

Ignored

Returns:
return reward

(float) Reward signal for the environment
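The sparse variant reduces to a constant penalty until the goal check succeeds. In myGym the goal test is delegated to the task object; the boolean argument here is a hypothetical stand-in for that check:

```python
def sparse_reward(goal_reached):
    # 0 on success, -1 on every other step; goal_reached is a
    # hypothetical stand-in for the task object's goal check.
    return 0.0 if goal_reached else -1.0
```
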