Reinforcement Learning (RL) with Simulate
Simulate is designed to provide easy and scalable integration with reinforcement learning algorithms.
The core abstraction is through the RLEnv class that wraps a Scene.
The RLEnv allows an Actuator to be manipulated by an external agent or policy.
It is core to the design of Simulate that we are not creating Agents, but rather providing an interface for applications of machine learning and embodied AI. The core API for RL applications is shown below: Simulate constrains the information that flows from the Scene to the external agent through an Actuator abstraction.
At release, we include a set of pre-designed Actors that can act in or navigate a scene. An Actor inherits from Object3D and has sensors, actuators, and action mappings.
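The resulting control flow mirrors the Gym API. The sketch below uses a hypothetical stub in place of a real RLEnv (no engine required); it is purely to illustrate the reset/step loop that an external policy drives, not the library's implementation:

```python
import numpy as np

class StubEnv:
    """Minimal stand-in for simulate.RLEnv, illustrating the Gym-style loop.

    A real RLEnv wraps a Scene and talks to the physics backend; here we
    only mimic the reset/sample_action/step interface.
    """

    def __init__(self, n_actions: int = 3):
        self.n_actions = n_actions

    def reset(self):
        # RLEnv.reset returns a dict of observations.
        return {"position": np.zeros(3, dtype=np.float32)}

    def sample_action(self):
        # RLEnv.sample_action draws a valid action from the actors' spaces.
        return int(np.random.randint(self.n_actions))

    def step(self, action):
        obs = {"position": np.random.randn(3).astype(np.float32)}
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info

env = StubEnv()
obs = env.reset()
for _ in range(10):
    action = env.sample_action()  # an external policy would choose here
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```

With a real Scene, the only change is constructing `simulate.RLEnv` from the scene instead of the stub; the loop itself is unchanged.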
Core Classes
Actuator
class simulate.Actuator
< source >( mapping: typing.List[simulate.assets.action_mapping.ActionMapping] actuator_tag: typing.Optional[str] = None n: typing.Optional[int] = None low: typing.Union[float, typing.List[float], numpy.ndarray, NoneType] = None high: typing.Union[float, typing.List[float], numpy.ndarray, NoneType] = None shape: typing.Optional[typing.List[int]] = None dtype: str = 'float32' seed: typing.Optional[int] = None )
Parameters
- mapping (List[ActionMapping]) — a list of ActionMapping describing how actions are applied in the physics engine
- actuator_tag (str, optional) — a tag used as the key for this actuator in the scene-level action space (we always have a scene-level gym dict space)
- n (int or List[int]) — for discrete actions, the number of possible actions; for multi-binary actions, the number of possible binary actions, or a list of the number of possible actions for each dimension
- low (float or List[float] or np.ndarray, optional) — low bound of continuous action space dimensions, either a float or a list of floats
- high (float or List[float] or np.ndarray, optional) — high bound of continuous action space dimensions, either a float or a list of floats
- shape (List[int], optional) — shape of the continuous action space; should match low/high
- dtype (str, defaults to "float32") — sampling type, for continuous action spaces only
An Actuator can be used to move an asset in the scene. The actuator is designed to be part of an Actor that manipulates a scene.
We define:
- the space where the actions operate (discrete or continuous), similar to gym spaces in RL: self.action_space is a gym.space (it defines the space actions live in and allows sampling),
- a mapping to physics engine behaviors: self.mapping is a list of ActionMapping.
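The pairing of a sampleable action space with a list of physics mappings can be sketched with a toy stand-in (plain Python, not the library's classes; names and defaults are illustrative):

```python
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class ToyActionMapping:
    """Illustrative stand-in for simulate.ActionMapping."""
    action: str                 # e.g. "add_force", "change_rotation"
    amplitude: float = 1.0
    offset: float = 0.0
    axis: List[float] = field(default_factory=lambda: [1.0, 0.0, 0.0])

@dataclass
class ToyActuator:
    """Illustrative stand-in for a discrete simulate.Actuator.

    One discrete action index selects one ActionMapping, mirroring how
    Simulate pairs a gym space with a list of physics-engine mappings.
    """
    mapping: List[ToyActionMapping]

    @property
    def n(self) -> int:
        return len(self.mapping)

    def sample(self) -> int:
        return random.randrange(self.n)

actuator = ToyActuator(mapping=[
    ToyActionMapping("change_position", amplitude=0.1, axis=[1, 0, 0]),
    ToyActionMapping("change_rotation", amplitude=10.0, axis=[0, 1, 0]),
    ToyActionMapping("do_nothing"),
])
idx = actuator.sample()
selected = actuator.mapping[idx]  # the physics behavior this action triggers
```

In the real library the same idea holds: a discrete action index picks one ActionMapping from the actuator's list, and that mapping is translated into a physics-engine call.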
RLEnv
class simulate.RLEnv
< source >( scene_or_map_fn: typing.Union[typing.Callable, simulate.scene.Scene] n_maps: typing.Optional[int] = 1 n_show: typing.Optional[int] = 1 time_step: typing.Optional[float] = 0.03333333333333333 frame_skip: typing.Optional[int] = 4 **engine_kwargs )
RL environment wrapper for a Simulate scene. Uses functionality from the VecEnv in Stable Baselines3. For more information on VecEnv, see https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
reset
< source >( ) → obs (Dict)
Returns
obs (Dict) — the observation of the environment after reset.
Resets the actors and the scene of the environment.
sample_action
Samples an action from the actors in the environment. This function loads the configuration of maps and actors to return the correct shape across multiple configurations.
step
< source >( action: typing.Union[typing.Dict, typing.List, numpy.ndarray] ) → observation (Dict)
The step function for the environment; it follows the API from OpenAI Gym.
ActionMapping
class simulate.ActionMapping
< source >( action: str amplitude: float = 1.0 offset: float = 0.0 axis: typing.Optional[typing.List[float]] = None position: typing.Optional[typing.List[float]] = None use_local_coordinates: bool = True is_impulse: bool = False max_velocity_threshold: typing.Optional[float] = None )
Parameters
- action (str) — the physical action to be mapped to. A string selected in:
  - “add_force”: apply a force to the object (at the center of mass). The force is given in newtons if is_impulse is False and in newton-seconds if is_impulse is True.
    If is_impulse is False:
    - the value can be considered as applied during the duration of the time step (controlled by the frame rate),
    - changing the frame rate will change the force applied at each step but will lead to the same result over a given total duration.
    If is_impulse is True:
    - the force can be considered as a velocity change applied instantaneously at the step,
    - changing the frame rate will not change the force applied at each step but will lead to a different result over a given total duration.
    (see https://docs.unity3d.com/ScriptReference/Rigidbody.AddForce.html and https://docs.unity3d.com/ScriptReference/Rigidbody.AddRelativeForce.html)
- “add_torque”: add a torque to the object (see https://docs.unity3d.com/ScriptReference/Rigidbody.AddTorque.html) (see https://docs.unity3d.com/ScriptReference/Rigidbody.AddRelativeTorque.html)
- “add_force_at_position”: add a force to the object at a position in the object’s local coordinate system (see https://docs.unity3d.com/ScriptReference/Rigidbody.AddForceAtPosition.html)
- “change_position”: teleport the object along an axis (see https://docs.unity3d.com/ScriptReference/Rigidbody.MovePosition.html)
- “change_rotation”: teleport the object around an axis (see https://docs.unity3d.com/ScriptReference/Rigidbody.MoveRotation.html)
- “do_nothing”: step the environment with no external input.
- “set_position”: teleport the object’s position to ‘position’ (see https://docs.unity3d.com/ScriptReference/Rigidbody.MovePosition.html)
- “set_rotation”: teleport the object’s rotation to ‘rotation’ (see https://docs.unity3d.com/ScriptReference/Rigidbody.MoveRotation.html)
- axis (List[float] Vector3) — the axis of the action to be applied along or around
- amplitude (float) — the amplitude of the action to be applied (see below for details)
- offset (float) — the offset of the action to be applied (see below for details)
- position (List[float] Vector3) — the position of the action; in the case of the “add_force_at_position” action, this is the position of the force; in the case of “set_position”, this is the position to set the object to
- use_local_coordinates (bool, default True) — whether to use the local/relative coordinates of the object
- is_impulse (bool, default False) — whether to apply the action as an impulse or a force
- max_velocity_threshold (float, optional) — when applying a force/torque, only apply it if the velocity is below this value.
Maps an RL agent action to an actor physical action.
The conversion is as follows (where X is the RL input action and Y the physics engine action, e.g. force, torque, position):
Y = Y + (X - offset) * amplitude
For discrete actions we assume X = 1.0, so that amplitude can be used to define the discrete value to apply.
max_velocity_threshold can be used to limit the maximum resulting velocity or angular velocity after the action is applied:
- max final velocity for “add_force” actions (in m/s): only apply the action if the current velocity is below this value
- max angular velocity for “add_torque” actions (in rad/s): only apply the action if the current angular velocity is below this value
There is a long discussion of this on the Unity forum: https://forum.unity.com/threads/terminal-velocity.34667/
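The conversion above can be sketched in a few lines of plain Python (a toy illustration, not the library's implementation):

```python
def apply_mapping(y, x, amplitude=1.0, offset=0.0,
                  velocity=0.0, max_velocity_threshold=None):
    """Toy version of the ActionMapping conversion Y = Y + (X - offset) * amplitude.

    y: current physics value (force, torque, position component, ...)
    x: RL input action (for discrete actions, X is assumed to be 1.0)
    velocity: current (angular) velocity, gated by max_velocity_threshold
    """
    if max_velocity_threshold is not None and velocity >= max_velocity_threshold:
        return y  # skip the action: velocity is already at or above the cap
    return y + (x - offset) * amplitude

# Discrete action: X = 1.0, so amplitude defines the value applied.
print(apply_mapping(0.0, 1.0, amplitude=5.0))               # 5.0
# Continuous action with an offset: 2.0 + (0.3 - 0.5) * 10.0
print(apply_mapping(2.0, 0.3, amplitude=10.0, offset=0.5))  # 0.0
# Velocity gate: action skipped when velocity exceeds the threshold.
print(apply_mapping(0.0, 1.0, amplitude=5.0, velocity=12.0,
                    max_velocity_threshold=10.0))           # 0.0
```

Note how offset and amplitude together rescale the policy's output into physically meaningful units before it reaches the engine.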
Included Actors
class simulate.SimpleActor
< source >( name: typing.Optional[str] = None position: typing.Optional[typing.List[float]] = None rotation: typing.Optional[typing.List[float]] = None scaling: typing.Union[float, typing.List[float], NoneType] = None transformation_matrix: typing.Optional[numpy.ndarray] = None material: typing.Optional[simulate.assets.material.Material] = None parent: typing.Optional[ForwardRef('Asset')] = None children: typing.Union[ForwardRef('Asset'), typing.List[ForwardRef('Asset')], NoneType] = None **kwargs )
Creates a bare-bones RL agent in the scene.
A SimpleActor is a sphere asset with:
- basic XYZ positional control (continuous),
- a mass of 1 (the default),
- no attached Camera.
class simulate.EgocentricCameraActor
< source >( mass: float = 1.0 name: typing.Optional[str] = None position: typing.Optional[typing.List[float]] = None rotation: typing.Optional[typing.List[float]] = None scaling: typing.Union[float, typing.List[float], NoneType] = None camera_height: int = 40 camera_width: int = 40 camera_name: typing.Optional[str] = None transformation_matrix: typing.Optional[numpy.ndarray] = None material: typing.Optional[simulate.assets.material.Material] = None parent: typing.Optional[ForwardRef('Asset')] = None children: typing.Union[ForwardRef('Asset'), typing.List[ForwardRef('Asset')], NoneType] = None **kwargs )
Parameters
- mass (float, optional, defaults to 1.0) — the mass of the actor
- name (str, optional) — the name of the actor
- position (List[float], optional) — length-3 list of the position of the agent, defaults to (0, 0, 0)
- rotation (List[float], optional) — length-3 list of the rotation of the agent, defaults to (0, 0, 0)
- scaling (float or List[float], optional) — the scaling of the actor
- camera_height (int, defaults to 40) — pixel height of first-person camera observations
- camera_width (int, defaults to 40) — pixel width of first-person camera observations
- camera_name (str, optional) — the name of the attached camera
- transformation_matrix (np.ndarray, optional) — the transformation matrix of the actor
- parent (Asset, optional) — the parent of the actor
- children (Asset or List[Asset], optional) — the children of the actor
Creates an egocentric RL Actor in the Scene, essentially a basic first-person agent.
An egocentric actor is a capsule asset with:
- a Camera as a child asset, used as the observation device,
- a RigidBodyComponent with a mass of 1.0,
- a discrete actuator.
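To make the observation and action shapes concrete, here is a toy illustration (plain NumPy, no engine) of what an egocentric actor exposes: an RGB camera observation of camera_height x camera_width pixels and a small discrete action set. The action names are illustrative, not the library's definition:

```python
import numpy as np

CAMERA_HEIGHT, CAMERA_WIDTH = 40, 40  # EgocentricCameraActor defaults

# A discrete action set typical of a first-person agent (hypothetical names).
ACTIONS = ["move_forward", "turn_left", "turn_right"]

def fake_observation():
    """Stand-in for a first-person camera frame (uint8 RGB)."""
    return np.zeros((CAMERA_HEIGHT, CAMERA_WIDTH, 3), dtype=np.uint8)

obs = fake_observation()
action = int(np.random.randint(len(ACTIONS)))  # index into the discrete actuator
print(obs.shape)  # (40, 40, 3)
```

A policy consuming this actor's observations would therefore take a (40, 40, 3) image as input and emit a single discrete action index per step.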