# Unity ML-Agents Python Low Level API

The `mlagents` Python package contains two components: a low level API which
allows you to interact directly with a Unity Environment (`mlagents_envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning. This document describes how to use the `mlagents_envs` API.
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md).
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md).

The Python Low Level API can be used to interact directly with your Unity
learning environment. As such, it can serve as the basis for developing and
evaluating new learning algorithms.

## mlagents_envs

The ML-Agents Toolkit Low Level API is a Python API for controlling the
simulation loop of an environment or game built with Unity. This API is used by
the training algorithms inside the ML-Agents Toolkit, but you can also write
your own Python programs using this API.

The key objects in the Python API include:

- **UnityEnvironment**: the main interface between the Unity application and
  your code. Use UnityEnvironment to start and control a simulation or training
  session.
- **BehaviorName**: a string that identifies a behavior in the simulation.
- **AgentId**: an `int` that serves as a unique identifier for Agents in the
  simulation.
- **DecisionSteps**: contains the data from Agents belonging to the same
  "Behavior" in the simulation, such as observations and rewards. Only Agents
  that requested a decision since the last call to `env.step()` are in the
  DecisionSteps object.
- **TerminalSteps**: contains the data from Agents belonging to the same
  "Behavior" in the simulation, such as observations and rewards. Only Agents
  whose episode ended since the last call to `env.step()` are in the
  TerminalSteps object.
- **BehaviorSpec**: describes the shape of the observation data inside
  DecisionSteps and TerminalSteps as well as the expected action shapes.

With the exception of `UnityEnvironment`, which is defined in `environment.py`,
these classes are defined in the
[base_env](../ml-agents-envs/mlagents_envs/base_env.py) script.

An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that
share the same observations and action types (described in their
`BehaviorSpec`). You can think of an Agent Behavior as a group of Agents that
share the same policy. All Agents with the same behavior have the same goal
and reward signals.
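
Every Agent with a given `BehaviorName` shares one `BehaviorSpec`, so Python code usually keeps one policy per behavior name. The sketch below shows that bookkeeping with plain dictionaries. It is a minimal illustration under assumptions: the `ToySpec` tuple, the `make_policy` factory and the behavior names `"BallBalancer"` and `"Walker"` are hypothetical stand-ins and are not objects from `mlagents_envs`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a BehaviorSpec: (observation shapes, number of
# continuous actions). The real BehaviorSpec is defined in base_env.py.
ToySpec = Tuple[List[Tuple[int, ...]], int]
Policy = Callable[[List[float]], List[float]]


def make_policy(spec: ToySpec) -> Policy:
    """Hypothetical factory: a policy that outputs zeros of the right size."""
    _obs_shapes, n_continuous = spec

    def policy(observation: List[float]) -> List[float]:
        return [0.0] * n_continuous

    return policy


def build_policies(behavior_specs: Dict[str, ToySpec]) -> Dict[str, Policy]:
    # One policy per behavior name: every Agent with that name shares it.
    return {name: make_policy(spec) for name, spec in behavior_specs.items()}


specs = {"BallBalancer": ([(8,)], 2), "Walker": ([(243,)], 39)}
policies = build_policies(specs)
print(sorted(policies))                     # ['BallBalancer', 'Walker']
print(policies["BallBalancer"]([0.0] * 8))  # [0.0, 0.0]
```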

To communicate with an Agent in a Unity environment from a Python program, the
Agent in the simulation must have its `Behavior Parameters` set to communicate.
You must set the `Behavior Type` to `Default` and give it a `Behavior Name`.

_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._

## Loading a Unity Environment

Python-side communication happens through `UnityEnvironment` which is located in
[`environment.py`](../ml-agents-envs/mlagents_envs/environment.py). To load a
Unity environment from a built binary file, put the file in the same directory
as `envs`. For example, if the filename of your Unity environment is `3DBall`,
run the following in Python:

```python
from mlagents_envs.environment import UnityEnvironment

# This call only loads the environment; it does not advance the simulation.
env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[])
# Start interacting with the environment.
env.reset()
behavior_names = list(env.behavior_specs.keys())
...
# Shut down the environment and close the connection when done.
env.close()
```

**NOTE:** Please read [Interacting with a Unity Environment](#interacting-with-a-unity-environment)
to learn more about how you can interact with the Unity environment from Python.

- `file_name` is the name of the environment binary (located in the root
  directory of the Python project).
- `worker_id` indicates which port to use for communication with the
  environment. Giving each environment its own `worker_id` lets you run several
  of them in parallel, for example in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
  training process. In environments which are stochastic, setting the seed
  enables reproducible experimentation by ensuring that the environment and
  trainers utilize the same random seed.
- `side_channels` provides a way to exchange data with the Unity simulation that
  is not related to the reinforcement learning loop, for example configurations
  or properties. More on them in the [Side Channels](Custom-SideChannels.md) doc.
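
Each `worker_id` maps to its own port, so several environments can run side by side. The sketch below shows that arithmetic under a stated assumption: the port is a base port plus `worker_id`, with a base of 5005 for a built binary and 5004 for the Editor. These constants and the rule that the Editor only accepts `worker_id=0` are assumptions based on recent `mlagents_envs` releases. Check `UnityEnvironment` in your installed version before relying on them. `compute_port` is a hypothetical helper and is not part of the package.

```python
from typing import Optional

# Assumed defaults; verify against UnityEnvironment in your mlagents_envs version.
ASSUMED_BINARY_BASE_PORT = 5005
ASSUMED_EDITOR_PORT = 5004


def compute_port(file_name: Optional[str], worker_id: int = 0,
                 base_port: Optional[int] = None) -> int:
    """Hypothetical helper: port = base_port + worker_id (an assumption)."""
    if base_port is None:
        base_port = ASSUMED_BINARY_BASE_PORT if file_name else ASSUMED_EDITOR_PORT
    if file_name is None and worker_id != 0:
        # Assumed rule: there is a single Editor to connect to.
        raise ValueError("worker_id must be 0 when connecting to the Editor")
    return base_port + worker_id


# Four parallel copies of the same binary each get a distinct port.
print([compute_port("3DBall", worker_id=i) for i in range(4)])  # [5005, 5006, 5007, 5008]
print(compute_port(None))  # 5004
```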

If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the **Play** button in the Editor when the message
_"Start training by pressing the Play button in the Unity Editor"_ is displayed
on the screen.

### Interacting with a Unity Environment

#### The BaseEnv interface

A `BaseEnv` has the following methods:

- **Reset: `env.reset()`** Sends a signal to reset the environment. Returns
  None.
- **Step: `env.step()`** Sends a signal to step the environment. Returns None.
  Note that a Python "step" corresponds to neither Unity's `Update` nor
  `FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will
  move forward until an Agent in the simulation needs an input from Python to
  act.
- **Close: `env.close()`** Sends a shutdown signal to the environment and
  terminates the communication.
- **Behavior Specs: `env.behavior_specs`** Returns a Mapping of
  `BehaviorName` to `BehaviorSpec` objects (read only).
  A `BehaviorSpec` contains the observation shapes and the
  `ActionSpec` (which defines the action shape). Note that
  the `BehaviorSpec` for a specific group is fixed throughout the simulation.
  The number of entries in the Mapping can change over time in the simulation
  if new Agent behaviors are created in the simulation.
- **Get Steps: `env.get_steps(behavior_name: str)`** Returns a tuple
  `DecisionSteps, TerminalSteps` corresponding to the `behavior_name` given as
  input. The `DecisionSteps` contains information about the state of the agents
  **that need an action this step** and have the behavior `behavior_name`. The
  `TerminalSteps` contains information about the state of the agents **whose
  episode ended** and have the behavior `behavior_name`. Both `DecisionSteps`
  and `TerminalSteps` contain information such as the observations, the rewards
  and the agent identifiers. `DecisionSteps` also contains action masks for the
  next action, while `TerminalSteps` contains the reason for termination
  (whether the Agent was interrupted, for example because it reached its
  maximum number of steps). The data is in `np.array`s whose first dimension is
  always the number of agents. Note that the number of agents is not guaranteed
  to remain constant during the simulation, and it is not unusual for either
  `DecisionSteps` or `TerminalSteps` to contain no Agents at all.
- **Set Actions: `env.set_actions(behavior_name: str, action: ActionTuple)`**
  Sets the actions for a whole agent group. `action` is an `ActionTuple`, which
  is made up of two 2D `np.array`s: one of `dtype=np.int32` for discrete
  actions and one of `dtype=np.float32` for continuous actions. The first
  dimension of each array in the tuple is the number of agents that requested a
  decision since the last call to `env.step()`. The second dimension is the
  number of discrete or continuous actions for the corresponding array.
- **Set Action for Agent:
  `env.set_action_for_agent(behavior_name: str, agent_id: int, action: ActionTuple)`**
  Sets the action for a specific Agent in an agent group. `behavior_name` is
  the name of the group the Agent belongs to and `agent_id` is the integer
  identifier of the Agent. `action` is an `ActionTuple` as described above.
  **Note:** If no action is provided for an agent group between two calls to
  `env.step()`, then the default action will be all zeros.
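
These methods combine into a simple control loop. You reset the environment, then repeatedly read the steps for a behavior, choose actions for the Agents in `DecisionSteps`, set those actions and step. To keep the sketch runnable without a Unity build, it drives a hypothetical `FakeEnv` that only mimics the call pattern of `BaseEnv` (`reset`, `step`, `get_steps`, `set_actions`, `close`) using plain lists. The `FakeEnv` class, its episode length and its reward rule are invented for illustration. With the real package you would construct a `UnityEnvironment` and pass an `ActionTuple` instead of a list.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for DecisionSteps / TerminalSteps: (agent ids, rewards).
Steps = Tuple[List[int], List[float]]


class FakeEnv:
    """Hypothetical toy env: two agents of behavior "Toy"; each episode lasts
    `episode_length` decisions and action 1 earns a reward of 1.0."""

    def __init__(self, episode_length: int = 3) -> None:
        self.episode_length = episode_length
        self.behavior_specs: Dict[str, int] = {"Toy": 2}  # name -> n options
        self._t = 0
        self._actions: List[int] = []
        self._decision: Steps = ([], [])
        self._terminal: Steps = ([], [])

    def reset(self) -> None:
        self._t = 0
        self._decision, self._terminal = ([0, 1], [0.0, 0.0]), ([], [])

    def get_steps(self, behavior_name: str) -> Tuple[Steps, Steps]:
        return self._decision, self._terminal

    def set_actions(self, behavior_name: str, actions: List[int]) -> None:
        self._actions = actions

    def step(self) -> None:
        agent_ids, _ = self._decision
        # Missing actions default to 0, mirroring the "all zeros" rule above.
        actions = self._actions or [0] * len(agent_ids)
        rewards = [1.0 if a == 1 else 0.0 for a in actions]
        self._actions = []
        self._t += 1
        if self._t >= self.episode_length:
            self._decision, self._terminal = ([], []), (agent_ids, rewards)
        else:
            self._decision, self._terminal = (agent_ids, rewards), ([], [])

    def close(self) -> None:
        pass


env = FakeEnv()
env.reset()
behavior_name = list(env.behavior_specs)[0]
total_reward = 0.0
while True:
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Rewards arrive in both DecisionSteps and TerminalSteps.
    total_reward += sum(decision_steps[1]) + sum(terminal_steps[1])
    if terminal_steps[0]:
        break  # in this toy setup, all agents finish together
    # Choose an action for every agent that requested a decision.
    env.set_actions(behavior_name, [1] * len(decision_steps[0]))
    env.step()
env.close()
print(total_reward)  # 6.0
```

The loop reads both `DecisionSteps` and `TerminalSteps` on every iteration because rewards can arrive in either one.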

#### DecisionSteps and DecisionStep

`DecisionSteps` (with `s`) contains information about a whole batch of Agents
while `DecisionStep` (no `s`) only contains information about a single Agent.

A `DecisionSteps` has the following fields:

- `obs` is a list of numpy arrays of observations collected by the group of
  agents. The first dimension of each array corresponds to the batch size of the
  group (number of agents requesting a decision since the last call to
  `env.step()`).
- `reward` is a float vector of length batch size, containing the rewards
  collected by each agent since the last simulation step.
- `agent_id` is an int vector of length batch size containing a unique
  identifier for each corresponding Agent. This is used to track Agents across
  simulation steps.
- `action_mask` is an optional list of two-dimensional boolean arrays, only
  available when using multi-discrete actions. Each array corresponds to an
  action branch. The first dimension of each array is the batch size and the
  second contains a mask for each action of the branch. If `True`, the action
  is not available for the agent during this simulation step.

It also has the following two methods:

- `len(DecisionSteps)` Returns the number of agents requesting a decision since
  the last call to `env.step()`.
- `DecisionSteps[agent_id]` Returns a `DecisionStep` for the Agent with the
  `agent_id` unique identifier.
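
One way to picture the batch versus single-agent split is a container that stores one row per Agent and looks rows up by `agent_id`. The sketch below is a simplified stand-in that uses only the standard library. `MiniDecisionSteps` and `MiniDecisionStep` are hypothetical names. The real classes in `base_env.py` hold NumPy arrays and also carry observations and action masks. The sketch shows why `len()` counts the agents in the batch and why indexing is keyed by `agent_id` rather than by position.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MiniDecisionStep:
    """Hypothetical single-agent view: one reward and one id."""
    reward: float
    agent_id: int


@dataclass
class MiniDecisionSteps:
    """Hypothetical batch view: parallel lists, one entry per agent."""
    reward: List[float]
    agent_id: List[int]

    def __post_init__(self) -> None:
        # Map each agent id to its row so lookups do not depend on ordering.
        self._index: Dict[int, int] = {a: i for i, a in enumerate(self.agent_id)}

    def __len__(self) -> int:
        return len(self.agent_id)

    def __getitem__(self, agent_id: int) -> MiniDecisionStep:
        if agent_id not in self._index:
            raise KeyError(f"agent_id {agent_id} is not in this batch")
        row = self._index[agent_id]
        return MiniDecisionStep(reward=self.reward[row], agent_id=agent_id)


batch = MiniDecisionSteps(reward=[0.5, -1.0, 2.0], agent_id=[7, 3, 12])
print(len(batch))        # 3
print(batch[12].reward)  # 2.0 (looked up by id, not by position)
```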

A `DecisionStep` has the following fields:

- `obs` is a list of numpy arrays of observations collected by the agent. (Each
  array has one less dimension than the arrays in `DecisionSteps`.)
- `reward` is a float, containing the rewards collected by the agent since the
  last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `action_mask` is an optional list of one-dimensional boolean arrays, only
  available when using multi-discrete actions. Each array corresponds to an
  action branch and contains a mask for each action of the branch. If `True`,
  the action is not available for the agent during this simulation step.
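
A `True` entry in `action_mask` marks an option that must not be chosen. The sketch below picks a random allowed option for every branch of one agent, using plain lists of booleans in place of NumPy arrays. `sample_unmasked_action` is a hypothetical helper, and it assumes at least one option in each branch is left unmasked.

```python
import random
from typing import List, Optional


def sample_unmasked_action(action_mask: Optional[List[List[bool]]],
                           branches: List[int],
                           rng: random.Random) -> List[int]:
    """Hypothetical helper: choose one allowed option index per branch.

    action_mask[b][i] is True when option i of branch b is NOT available.
    With no mask, every option is allowed.
    """
    action = []
    for b, n_options in enumerate(branches):
        mask = action_mask[b] if action_mask is not None else [False] * n_options
        allowed = [i for i in range(n_options) if not mask[i]]
        if not allowed:
            raise ValueError(f"every option of branch {b} is masked")
        action.append(rng.choice(allowed))
    return action


# Two branches: direction (3 options) and jump (2 options).
# Option 1 of each branch is masked out.
mask = [[False, True, False], [False, True]]
rng = random.Random(0)
for _ in range(5):
    direction, jump = sample_unmasked_action(mask, [3, 2], rng)
    assert direction != 1 and jump == 0
print("all sampled actions respect the mask")
```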

#### TerminalSteps and TerminalStep

As with `DecisionSteps` and `DecisionStep`, `TerminalSteps` (with `s`)
contains information about a whole batch of Agents while `TerminalStep` (no `s`)
only contains information about a single Agent.

A `TerminalSteps` has the following fields:

- `obs` is a list of numpy arrays of observations collected by the group of
  agents. The first dimension of each array corresponds to the batch size of the
  group (number of agents whose episode ended since the last call to
  `env.step()`).
- `reward` is a float vector of length batch size, containing the rewards
  collected by each agent since the last simulation step.
- `agent_id` is an int vector of length batch size containing a unique
  identifier for each corresponding Agent. This is used to track Agents across
  simulation steps.
- `interrupted` is an array of booleans of length batch size. It is `True` if
  the associated Agent was interrupted since the last decision step, for
  example, if the Agent reached the maximum number of steps for the episode.

It also has the following two methods:

- `len(TerminalSteps)` Returns the number of agents whose episode ended since
  the last call to `env.step()`.
- `TerminalSteps[agent_id]` Returns a `TerminalStep` for the Agent with the
  `agent_id` unique identifier.

A `TerminalStep` has the following fields:

- `obs` is a list of numpy arrays of observations collected by the agent. (Each
  array has one less dimension than the arrays in `TerminalSteps`.)
- `reward` is a float, containing the rewards collected by the agent since the
  last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `interrupted` is a bool. It is `True` if the Agent was interrupted since the
  last decision step, for example, if the Agent reached the maximum number of
  steps for the episode.
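
Because `agent_id` is stable across steps, per-Agent bookkeeping such as episode returns can be keyed on it. You add each step's reward from `DecisionSteps`. When the Agent shows up in `TerminalSteps`, you add its final reward and close the episode. The sketch below uses plain `(agent_ids, rewards[, interrupted])` tuples instead of the real classes, and `EpisodeTracker` is hypothetical. It processes `TerminalSteps` before `DecisionSteps`. This is based on the assumption that the same `agent_id` may appear in both after a single step, when one episode ends and the next one begins, so the new episode starts from a clean total.

```python
from typing import Dict, List, Tuple


class EpisodeTracker:
    """Hypothetical helper accumulating each Agent's episode return."""

    def __init__(self) -> None:
        self.running: Dict[int, float] = {}
        # (agent_id, episode return, interrupted) for each finished episode.
        self.finished: List[Tuple[int, float, bool]] = []

    def update(self, decision: Tuple[List[int], List[float]],
               terminal: Tuple[List[int], List[float], List[bool]]) -> None:
        # Close finished episodes first, so an agent that also appears in
        # `decision` starts its next episode from zero.
        for agent_id, reward, interrupted in zip(*terminal):
            total = self.running.pop(agent_id, 0.0) + reward
            self.finished.append((agent_id, total, interrupted))
        for agent_id, reward in zip(*decision):
            self.running[agent_id] = self.running.get(agent_id, 0.0) + reward


tracker = EpisodeTracker()
tracker.update(([1, 2], [0.0, 0.0]), ([], [], []))
tracker.update(([1, 2], [0.5, 1.0]), ([], [], []))
# Agent 1 hits its step limit (interrupted) and immediately starts a new episode.
tracker.update(([1, 2], [0.0, 1.0]), ([1], [2.0], [True]))
print(tracker.finished)                 # [(1, 2.5, True)]
print(sorted(tracker.running.items()))  # [(1, 0.0), (2, 2.0)]
```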

#### BehaviorSpec

A `BehaviorSpec` has the following fields:

- `observation_specs` is a List of `ObservationSpec` objects. Each
  `ObservationSpec` corresponds to an observation's properties: `shape` is a
  tuple of ints that corresponds to the shape of the observation (without the
  number of agents dimension). `dimension_property` is a tuple of flags
  containing extra information about how the data should be processed in the
  corresponding dimension. `observation_type` is an enum corresponding to what
  type of observation is generating the data (e.g., default, goal, etc.). Note
  that the `ObservationSpec`s have the same ordering as the observations in
  DecisionSteps, DecisionStep, TerminalSteps and TerminalStep.
- `action_spec` is an `ActionSpec` namedtuple that defines the number and types
  of actions for the Agent.
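
Each `ObservationSpec.shape` excludes the agent dimension. The arrays in `DecisionSteps.obs` (or `TerminalSteps.obs`) therefore have the spec's shape with the batch size prepended, while each `DecisionStep` or `TerminalStep` observation has exactly the spec's shape. The small hypothetical helper below makes that relationship explicit. The example shapes, a vector of 8 floats and an 84x84 RGB image, are made up for illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def batched_obs_shapes(spec_shapes: List[Shape], n_agents: int) -> List[Shape]:
    """Hypothetical helper: expected shape of each array in DecisionSteps.obs,
    given the ObservationSpec shapes (same order) and the batch size."""
    return [(n_agents,) + shape for shape in spec_shapes]


spec_shapes = [(8,), (84, 84, 3)]  # e.g. a vector and a camera observation
print(batched_obs_shapes(spec_shapes, n_agents=5))  # [(5, 8), (5, 84, 84, 3)]
# An empty batch is still well formed: the leading dimension is just 0.
print(batched_obs_shapes(spec_shapes, n_agents=0))  # [(0, 8), (0, 84, 84, 3)]
```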

An `ActionSpec` has the following fields and properties:

- `continuous_size` is the number of floats that constitute the continuous
  actions.
- `discrete_size` is the number of branches (the number of independent actions)
  that constitute the multi-discrete actions.
- `discrete_branches` is a Tuple of ints. Each int corresponds to the number of
  different options for each branch of the action. For example, in a game with
  a direction input (no movement, left, right) and a jump input (no jump, jump),
  there will be two branches (direction and jump), the first one with 3 options
  and the second with 2 options (`discrete_size = 2` and
  `discrete_branches = (3, 2)`).
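
The sketch below shows how these fields fit together, using a hypothetical `MiniActionSpec` NamedTuple in place of the real `ActionSpec`. `discrete_size` is derived from `discrete_branches`. An all-zero batch of actions, the default used when no action is set, has one row per agent with `continuous_size` floats and `discrete_size` ints. Nested lists stand in for the two `np.array`s of an `ActionTuple`.

```python
from typing import List, NamedTuple, Tuple


class MiniActionSpec(NamedTuple):
    """Hypothetical stand-in for ActionSpec, using plain Python types."""
    continuous_size: int
    discrete_branches: Tuple[int, ...]

    @property
    def discrete_size(self) -> int:
        # One discrete action per branch.
        return len(self.discrete_branches)

    def zero_actions(self, n_agents: int) -> Tuple[List[List[float]], List[List[int]]]:
        """All-zero (continuous, discrete) batch with one row per agent."""
        continuous = [[0.0] * self.continuous_size for _ in range(n_agents)]
        discrete = [[0] * self.discrete_size for _ in range(n_agents)]
        return continuous, discrete


# The direction + jump example above, with no continuous actions.
spec = MiniActionSpec(continuous_size=0, discrete_branches=(3, 2))
print(spec.discrete_size)    # 2
print(spec.zero_actions(2))  # ([[], []], [[0, 0], [0, 0]])
```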

### Communicating additional information with the Environment

In addition to the means of communicating between Unity and Python described
above, we also provide methods for sharing agent-agnostic information. These
additional methods are referred to as side channels. ML-Agents includes two
ready-made side channels, described below. It is also possible to create custom
side channels to communicate any additional data between a Unity environment and
Python. Instructions for creating custom side channels can be found
[here](Custom-SideChannels.md).

Side channels exist as separate classes which are instantiated and then passed
as a list to the `side_channels` argument of the `UnityEnvironment`
constructor.

```python
from mlagents_envs.environment import UnityEnvironment

# MyChannel is a placeholder for your own SideChannel subclass
# (see Custom-SideChannels.md); it is not part of mlagents_envs.
channel = MyChannel()

env = UnityEnvironment(side_channels=[channel])
```

**Note**: A side channel will only send/receive messages when `env.step()` or
`env.reset()` is called.
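
One way to picture the note above is that outgoing messages are queued when you call a channel method. They only leave Python when the environment exchanges data with Unity during `reset()` or `step()`. The sketch below models that timing with a hypothetical `QueuedChannel` and `FakeCommunicator`. Neither class is part of `mlagents_envs`, and real side channels serialize messages to bytes rather than keeping Python objects.

```python
from typing import Any, List


class QueuedChannel:
    """Hypothetical side channel that buffers outgoing messages."""

    def __init__(self) -> None:
        self._outgoing: List[Any] = []

    def queue_message(self, message: Any) -> None:
        # Nothing is sent here; the message just waits in the queue.
        self._outgoing.append(message)

    def drain(self) -> List[Any]:
        messages, self._outgoing = self._outgoing, []
        return messages


class FakeCommunicator:
    """Hypothetical environment that flushes channels only on reset()/step()."""

    def __init__(self, channels: List[QueuedChannel]) -> None:
        self.channels = channels
        self.delivered: List[Any] = []

    def _exchange(self) -> None:
        for channel in self.channels:
            self.delivered.extend(channel.drain())

    def reset(self) -> None:
        self._exchange()

    def step(self) -> None:
        self._exchange()


channel = QueuedChannel()
env = FakeCommunicator([channel])
channel.queue_message({"time_scale": 2.0})
print(env.delivered)  # [] (queued, not yet sent)
env.reset()
print(env.delivered)  # [{'time_scale': 2.0}]
```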

#### EngineConfigurationChannel

The `EngineConfiguration` side channel allows you to modify the time-scale,
resolution, and graphics quality of the environment. This can be useful for
adjusting the environment to perform better during training, or to be more
interpretable during inference.

`EngineConfigurationChannel` has two methods:

- `set_configuration_parameters` which takes the following arguments:
  - `width`: Defines the width of the display. (Must be set alongside height)
  - `height`: Defines the height of the display. (Must be set alongside width)
  - `quality_level`: Defines the quality level of the simulation.
  - `time_scale`: Defines the multiplier for the delta time in the simulation.
    If set to a higher value, time will pass faster in the simulation but the
    physics may perform unpredictably.
  - `target_frame_rate`: Instructs the simulation to try to render at a
    specified frame rate.
  - `capture_frame_rate`: Instructs the simulation to consider time between
    updates to always be constant, regardless of the actual frame rate.
- `set_configuration`, which takes a single argument, `config`, an
  `EngineConfig` NamedTuple object.
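
The rule that `width` and `height` must be set together can be expressed as a small validation step. The sketch below uses a hypothetical `MiniEngineConfig` NamedTuple whose field names mirror the arguments listed above. Two behaviors are choices made for this sketch only: `None` means "leave this setting unchanged", and only fields that are set are forwarded. This is not a description of how `EngineConfigurationChannel` encodes its messages.

```python
from typing import Dict, NamedTuple, Optional


class MiniEngineConfig(NamedTuple):
    """Hypothetical stand-in for EngineConfig; None means "not set"."""
    width: Optional[int] = None
    height: Optional[int] = None
    quality_level: Optional[int] = None
    time_scale: Optional[float] = None
    target_frame_rate: Optional[int] = None
    capture_frame_rate: Optional[int] = None


def parameters_to_send(config: MiniEngineConfig) -> Dict[str, float]:
    """Hypothetical helper: validate the config and keep only the set fields."""
    if (config.width is None) != (config.height is None):
        raise ValueError("width and height must be set together")
    return {name: value for name, value in config._asdict().items() if value is not None}


print(parameters_to_send(MiniEngineConfig(time_scale=2.0)))  # {'time_scale': 2.0}
print(parameters_to_send(MiniEngineConfig(width=640, height=480)))  # {'width': 640, 'height': 480}
try:
    parameters_to_send(MiniEngineConfig(width=640))
except ValueError as err:
    print(err)  # width and height must be set together
```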

For example, the following code would adjust the time-scale of the simulation to
be 2x realtime.

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

channel = EngineConfigurationChannel()

env = UnityEnvironment(side_channels=[channel])

# The new time scale is sent to Unity on the next reset() or step().
channel.set_configuration_parameters(time_scale=2.0)

env.reset()  # reset() returns None
...
```

#### EnvironmentParameters

The `EnvironmentParameters` side channel allows you to set pre-defined
numerical values in the environment from Python. This can be useful for
adjusting environment-specific settings. Call `set_float_parameter` on the side
channel to write a parameter, then read it from C# as described below.

`EnvironmentParametersChannel` provides the following method:

- `set_float_parameter` Sets a float parameter in the Unity Environment.
  - key: The string identifier of the property.
  - value: The float value of the property.

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import EnvironmentParametersChannel

channel = EnvironmentParametersChannel()

env = UnityEnvironment(side_channels=[channel])

channel.set_float_parameter("parameter_1", 2.0)

env.reset()
...
```

Once a property has been modified in Python, you can access it in C# after the
next call to `env.step()` or `env.reset()` as follows:

```csharp
var envParameters = Academy.Instance.EnvironmentParameters;
float property1 = envParameters.GetWithDefault("parameter_1", 0.0f);
```
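
A parameter only reaches Unity on the next `step` or `reset`. C# code that reads it before then therefore gets the default passed to `GetWithDefault`. The Python sketch below mimics that timing with a hypothetical `ParameterStore`. It only models the observable behavior described above and is not how the C# `EnvironmentParameters` class is implemented.

```python
from typing import Dict


class ParameterStore:
    """Hypothetical model of environment parameters crossing the side channel."""

    def __init__(self) -> None:
        self._pending: Dict[str, float] = {}     # set in Python, not yet sent
        self._unity_side: Dict[str, float] = {}  # what C# code can see

    def set_float_parameter(self, key: str, value: float) -> None:
        self._pending[key] = value

    def step(self) -> None:
        # Pending values are delivered during the next step (or reset).
        self._unity_side.update(self._pending)
        self._pending.clear()

    def get_with_default(self, key: str, default: float) -> float:
        return self._unity_side.get(key, default)


store = ParameterStore()
store.set_float_parameter("parameter_1", 2.0)
print(store.get_with_default("parameter_1", 0.0))  # 0.0 (not delivered yet)
store.step()
print(store.get_with_default("parameter_1", 0.0))  # 2.0
```

Supplying a sensible default on the C# side keeps the environment usable when it runs without Python attached.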

#### Custom side channels

For information on how to make custom side channels for sending additional data
types, see the documentation [here](Custom-SideChannels.md).