Reward
Reward is the key element in reinforcement learning. For each training run, an instance of a reward class is created and generates the reward signal during the training steps. The myGym Reward Module implements three basic types of reward signal: Distance, Complex Distance, and Sparse. To choose one for training, set reward= to either distance, complex_distance, or sparse in the config file, or pass it as a command line argument.
-
class
myGym.envs.rewards.
Reward
(env, task=None) Reward base class for reward signal calculation and visualization
- Parameters:
- param env
(object) Environment in which the training takes place
- param task
(object) Task that is being trained, an instance of the TaskModule class
-
class
myGym.envs.rewards.
DistanceReward
(env, task) Reward class for reward signal calculation based on distance differences between 2 objects
- Parameters:
- param env
(object) Environment in which the training takes place
- param task
(object) Task that is being trained, an instance of the TaskModule class
-
compute
(observation) Compute the reward signal based on the distance between 2 objects. The positions of both objects must be present in the observation.
- Params:
- param observation
(list) Observation of the environment
- Returns:
- return reward
(float) Reward signal for the environment
-
reset
() Reset the stored value of the distance between 2 objects. Call this after the end of an episode.
-
calc_dist_diff
(obj1_position, obj2_position) Calculate the change in the distance between 2 objects from the previous step to the current step. Normalize the change by the distance in the previous step.
- Params:
- param obj1_position
(list) Position of the first object
- param obj2_position
(list) Position of the second object
- Returns:
- return norm_diff
(float) Normalized difference of the distances between 2 objects in the previous and in the current step
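The normalization described for calc_dist_diff can be sketched as follows. This is a minimal illustration, not the myGym implementation: the function name normalized_distance_change, the explicit prev_dist argument, the sign convention (positive when the objects move closer), and the handling of the first step are all assumptions.

```python
import math


def normalized_distance_change(prev_dist, obj1_position, obj2_position):
    """Return (norm_diff, current_dist) for two object positions.

    prev_dist is the distance stored from the previous step, or None
    on the first step of an episode (hypothetical convention).
    """
    # Euclidean distance between the two positions in the current step
    current_dist = math.dist(obj1_position, obj2_position)

    # Without a usable previous distance there is nothing to compare against
    if prev_dist is None or prev_dist == 0:
        return 0.0, current_dist

    # Assumed sign convention: positive when the objects got closer
    norm_diff = (prev_dist - current_dist) / prev_dist
    return norm_diff, current_dist


# First step: only the distance is recorded
_, dist = normalized_distance_change(None, [0, 0, 0], [3, 4, 0])
# Second step: the objects moved from 5.0 apart to 4.0 apart
diff, dist = normalized_distance_change(dist, [0, 0, 0], [0, 4, 0])
print(diff, dist)
```

In a reward class, the returned current distance would be stored for the next step and cleared by reset() at the end of an episode.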
-
class
myGym.envs.rewards.
ComplexDistanceReward
(env, task) Reward class for reward signal calculation based on distance differences between 3 objects, e.g. 2 objects and gripper for complex tasks
- Parameters:
- param env
(object) Environment in which the training takes place
- param task
(object) Task that is being trained, an instance of the TaskModule class
-
compute
(observation) Compute the reward signal based on the distances between 3 objects. The positions of all objects must be present in the observation.
- Params:
- param observation
(list) Observation of the environment
- Returns:
- return reward
(float) Reward signal for the environment
-
reset
() Reset the stored values of the distances between objects. Call this after the end of an episode.
-
calc_dist_diff
(obj1_position, obj2_position, obj3_position) Calculate the change in the distances between 3 objects from the previous step to the current step. Normalize each change by the corresponding distance in the previous step.
- Params:
- param obj1_position
(list) Position of the first object
- param obj2_position
(list) Position of the second object
- param obj3_position
(list) Position of the third object
- Returns:
- return norm_diff
(float) Sum of normalized differences of the distances between 3 objects in the previous and in the current step
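Summing normalized distance changes over several object pairs, as described above, might look like the sketch below. It is an illustration under stated assumptions, not the myGym code: the class PairwiseDistanceTracker, its step method, the choice of which pairs are compared, and the sign convention (positive when a pair moves closer) are all hypothetical.

```python
import math


class PairwiseDistanceTracker:
    """Track distances for chosen object pairs across steps (hypothetical helper)."""

    def __init__(self, pairs):
        # pairs: list of (i, j) index tuples into the positions list;
        # which pairs a real reward compares is an assumption here
        self.pairs = pairs
        self.prev = {}

    def reset(self):
        # Forget stored distances, e.g. after the end of an episode
        self.prev = {}

    def step(self, positions):
        """Return the sum of normalized distance changes over all pairs."""
        total = 0.0
        for i, j in self.pairs:
            dist = math.dist(positions[i], positions[j])
            prev = self.prev.get((i, j))
            # Skip pairs with no previous distance (first step) or a zero one
            if prev:
                total += (prev - dist) / prev
            self.prev[(i, j)] = dist
        return total


# Three objects, comparing object 0 with 1 and object 1 with 2
tracker = PairwiseDistanceTracker([(0, 1), (1, 2)])
tracker.step([[0, 0, 0], [2, 0, 0], [2, 4, 0]])        # first step, only stores distances
print(tracker.step([[1, 0, 0], [2, 0, 0], [2, 2, 0]]))  # both pairs halved their distance
```

The same helper extends to the 4-object case by passing a longer list of pairs.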
-
class
myGym.envs.rewards.
PnPDistanceReward
(env, task) Reward class for reward signal calculation based on distance differences between 4 objects, e.g. 1 object and gripper, finger1, finger2 for Panda
- Parameters:
- param env
(object) Environment in which the training takes place
- param task
(object) Task that is being trained, an instance of the TaskModule class
-
compute
(observation) Compute the reward signal based on the distances between 4 objects. The positions of all objects must be present in the observation.
- Params:
- param observation
(list) Observation of the environment
- Returns:
- return reward
(float) Reward signal for the environment
-
reset
() Reset the stored values of the distances between objects. Call this after the end of an episode.
-
calc_dist_diff
(obj1_position, obj2_position, obj3_position, obj4_position) Calculate the change in the distances between 4 objects from the previous step to the current step. Normalize each change by the corresponding distance in the previous step.
- Params:
- param obj1_position
(list) Position of the first object
- param obj2_position
(list) Position of the second object
- param obj3_position
(list) Position of the third object
- param obj4_position
(list) Position of the fourth object
- Returns:
- return norm_diff
(float) Sum of normalized differences of the distances between 4 objects in the previous and in the current step
-
calc_orn_diff
(gripper_orn) Calculate the difference between the desired orientation and the current orientation of the gripper, and normalize it.
- Params:
- param gripper_orn
(list) Orientation of gripper
- Returns:
- return norm_diff
(float) Normalized difference between the desired orientation and the current orientation of the gripper