Generate dataset

myGym provides a tool to help you generate datasets for custom training of the YOLACT and VAE vision models. You configure the dataset with a json file: first specify the general dataset parameters, such as the dataset type and the number of images to generate. Next, specify the environment, choose the cameras used for rendering images and set their resolution. Then choose a robot and the objects to appear in the scene. You can further control the objects' quantity, appearance and location.
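
The generator is then run from the command line with the config file as its argument, for example (the exact config path is illustrative):

python generate_dataset.py configs/dataset_coco.json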

A very useful feature that can ultimately help you achieve better performance with a trained vision network is myGym's randomizer. The randomizer works as a wrapper around a standard myGym environment and enables more advanced scene settings. Thanks to the randomizer, you can change the textures and/or colors of static and dynamic objects. You can change light conditions such as light intensity, direction or color. The camera randomizer slightly changes camera properties, and the joint randomizer lets robots and dynamic objects change their configuration.
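
Conceptually, a randomizer wraps the environment and re-randomizes the scene on every reset. The sketch below is purely illustrative; the attribute and method names (scene_objects, apply_texture, set_light) are hypothetical, not myGym's actual API:

import random

class RandomizerSketch:
    """Illustrative wrapper: re-randomize the scene on each reset.
    All attribute and method names here are hypothetical."""

    def __init__(self, env, texture_paths, randomize_light=True):
        self.env = env
        self.texture_paths = texture_paths
        self.randomize_light = randomize_light

    def reset(self):
        obs = self.env.reset()
        for obj in self.env.scene_objects:  # hypothetical attribute
            obj.apply_texture(random.choice(self.texture_paths))
        if self.randomize_light:
            self.env.set_light(             # hypothetical method
                direction=[random.uniform(-1, 1) for _ in range(3)],
                intensity=random.uniform(0.5, 1.5))
        return obs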

Note

To be able to use the texture randomizer, download the texture dataset first:

cd myGym
sh download_textures.sh

Dataset json file

You can see an example json file with all parameters here:

{
# directory
    "output_folder" : "../myGym/yolact_vision/data/yolact/datasets/my_dataset",
# dataset parameters
    "dataset_type" : "coco", #"coco" (for yolact)/ "dope" / "vae"
    "make_dataset" : "display", #mode of writing files, "new" (override), "resume" (append),"display" (don't store results)
    "num_episodes" : 10000000, #total number of episodes
    "num_steps" : 1, #may need more steps per episode, because the arms are moving and the objects are first falling down from above the table
    "make_shot_every_frame" : 1, #used as if episode % make_shot_every_frame : 0, so for 60 % 30 it's 3 shots (0, 30, and 60)
    "num_episodes_hard_reset" : 40, #hard reset every x episode ensures objects rendering when GUI is on
    "autosafe_episode" : 100, #if episode % auto_safe_episode, write json files to directory (prevent data loss when process crashes)
    "random_arm_movement" : false, 
    "active_cameras" : [1,0,1,1,1], #set 1 at a position(=camera number) to save images from this camera
    "camera_resolution" : [640,480],
    "min_obj_area" : 49, #each object will have at least this pixels visible, to be reasonably recognizable. If not, skip. (49 ~ 7x7pix img)
    "train_test_split_pct" : 0.1, #data split, 0.0 = only train, 1.0 = only test 
    "visualize" : false, #binary masks for each labeled object
# env parameters 
    "env_name" : "Gym-v0", #name of environment
    "workspace" : "table", #name of workspace
    "visgym" : true, #whether visualize gym background
    "robot" : "jaco", #which robot to show in the scene
    "gui_on" : true, #whether the GUI of the simulation should be used or not
    "show_bounding_boxes_gui" : false,
    "changing_light_gui" : false, 
    "shadows_on" : true,
    "color_dict" : null, #use to make (fixed) object colors - textures and color_randomizer will be suppressed, pass null to ignore
    "object_sampling_area" : [0.1, 0.8, 0.4, 1.0, 1.25, 1.35], # xxyyzz
    "num_objects_range" : [3,5], #range for random count of sampled objects in each scene (>=0)
# randomization parameters
"seed": 42,
    "texture_randomizer": {
      "enabled": true,
      "exclude": [], #objects that will not be texturized, e.g. "table" or "floor" or "objects"
      "seamless": true,
      "textures_path": "./envs/dtdseamless/",
      "seamless_textures_path": "./envs/dtdseamless/"
    },
    "light_randomizer": {
        "enabled": true,
        "randomized_dimensions": {
            "light_color": true, "light_direction": true,
            "light_distance": true, "light_ambient": true,
            "light_diffuse": true, "light_specular": true
            }},
    "camera_randomizer": {
        "enabled": true,
        "randomized_dimensions": {"target_position": true},
        "shift": [0.1, 0.1, 0.1]},
    "color_randomizer": {
        "enabled": true,
        "exclude": [], #objects that will not be texturized, e.g. "table" or "floor" or "objects"
        "randomized_dimensions": {"rgb_color": true, "specular_color": true}},
# objects parameters
    #Select here which classes you want to use in the simulator and annotate. Format: [quantity, class_name, class_id] 
    #If you want to make some classes to be classified as the same, assign them the same value, i.e. screw_round:1, peg:1
    #If you want some class to appear statistically more often, increase the quantity
    "used_class_names_quantity" : [[1,"jaco",1], [1,"jaco_gripper",2], [1,"car_roof",3], [1,"cube_holes",4], [1,"ex_bucket",5], [1,"hammer",6], [1,"nut",7], [1,"peg_screw",8], [1,"pliers",9], [1,"screw_round",10], [1,"screwdriver",11], [1,"sphere_holes",12],[1,"wafer",13], [1,"wheel",14], [1,"wrench",15]],
    #fixed object colors, used when "color_dict" is set to "object_colors"
    "object_colors" : {"car_roof": ["yellow"], "cube_holes": ["light_green"], "ex_bucket": ["black"], "hammer": ["red"], 
        "nut": ["light_blue"], "peg_screw": ["white"], "pliers": ["sepia"], "screw_round": ["light_blue"], 
        "screwdriver": ["purple"], "sphere_holes": ["gold"], "wafer":["dark_purple"], "wheel":["redwine"], "wrench": ["moccasin"]}
}
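
Note that the # comments make this file invalid for Python's standard json module, so a comment-tolerant parser is needed when loading it yourself. A minimal sketch using the third-party commentjson package (the file name is illustrative):

import commentjson  # pip install commentjson; accepts '#' comments in JSON

with open("my_dataset_config.json") as f:  # illustrative file name
    config = commentjson.load(f)

print(config["dataset_type"])       # "coco"
print(config["num_objects_range"])  # [3, 5]
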
myGym.generate_dataset.color_names_to_rgb()[source]

Assign RGB colors to objects by name as specified in the training config file
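
A minimal sketch of such a name-to-RGB mapping, using matplotlib's named-color table (an assumption for illustration; the config above also uses non-standard names such as "redwine", so the actual implementation presumably carries its own color table):

from matplotlib import colors

def names_to_rgb_sketch(object_colors):
    """Map {"hammer": ["red"], ...} to RGB tuples (illustrative only)."""
    return {obj: [colors.to_rgb(name) for name in names]
            for obj, names in object_colors.items()}

print(names_to_rgb_sketch({"hammer": ["red"]}))
# {'hammer': [(1.0, 0.0, 0.0)]}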

myGym.generate_dataset.create_coco_json()[source]

Create COCO json data structure

Returns:
return data_train

(dict) Data structure for training data

return data_test

(dict) Data structure for testing data
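
For orientation, a COCO-style structure has roughly the following shape (a simplified sketch of the standard COCO annotation layout with made-up values, not myGym's exact output):

data_train = {
    "info": {"description": "my_dataset"},
    "images": [       # one entry per rendered camera image
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [  # one entry per visible object instance
        {"id": 1, "image_id": 1, "category_id": 6,  # 6 = "hammer" in the config
         "bbox": [210, 140, 55, 90],                # [x, y, width, height]
         "segmentation": [[210, 140, 265, 140, 265, 230, 210, 230]],
         "area": 4950, "iscrowd": 0}
    ],
    "categories": [   # derived from used_class_names_quantity
        {"id": 6, "name": "hammer"}
    ]
}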

class myGym.generate_dataset.GeneratorCoco[source]

Generator class for COCO image dataset for YOLACT vision model training

get_env()[source]

Create environment for COCO dataset generation according to dataset config file

Returns:
return env

(object) Environment for dataset generation

episode_zero()[source]

Initial episode set-up

init_data()[source]

Initialize data structures for COCO dataset annotations

Returns:
return data_train

(dict) Data structure for training data

return data_test

(dict) Data structure for testing data

resume()[source]

Resume COCO dataset generation

Returns:
return data_train

(dict) Training data from preceding dataset generation in COCO data structure

return data_test

(dict) Testing data from preceding dataset generation in COCO data structure

return image_id

(int) ID of last generated image in preceding dataset generation

data_struct_image()[source]

Assign a name to a COCO dataset image and determine its train or test status

Returns:
return data

(dict) Corresponding data dictionary

return name

(string) Name of image file for saving

store_image_info()[source]

Append COCO dataset image info to corresponding data dict

get_append_annotations()[source]

Make and append COCO annotations for each object in the scene

visualize()[source]

Visualize mask and bounding box coordinates for COCO annotated object

write_json_end()[source]

Write json file with COCO annotations to output directory
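
Taken together, these methods support a generation loop of roughly the following shape (an illustrative sketch: the real script keeps state on the generator instance and wires in the config, cameras and rendered images):

generator = GeneratorCoco()
env = generator.get_env()                       # scene built from the dataset config
data_train, data_test = generator.init_data()   # empty COCO structures
num_episodes = 100                              # "num_episodes" in the config

for episode in range(num_episodes):
    env.reset()
    data, name = generator.data_struct_image()  # pick train/test dict and file name
    generator.get_append_annotations()          # masks and bounding boxes per object
    generator.store_image_info()                # register the rendered image
generator.write_json_end()                      # dump the COCO json to output_folder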

class myGym.generate_dataset.GeneratorVae[source]

Generator class for image dataset for VAE vision model training

get_env()[source]

Create environment for VAE dataset generation according to dataset config file

collect_data(steps)[source]

Collect data for VAE dataset

Parameters:
param steps

(int) Number of episodes initiated during dataset generation
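
Usage mirrors the COCO generator (again only a sketch of the call order):

generator = GeneratorVae()
env = generator.get_env()     # scene built from the dataset config
generator.collect_data(1000)  # initiate 1000 episodes and store VAE images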