| # Designing a Learning Environment |
|
|
| This page contains general advice on how to design your learning environment, in |
| addition to overviewing aspects of the ML-Agents Unity SDK that pertain to |
| setting up your scene and simulation as opposed to designing your agents within |
| the scene. We have a dedicated page on |
| [Designing Agents](Learning-Environment-Design-Agents.md) which includes how to |
| instrument observations, actions and rewards, define teams for multi-agent |
| scenarios and record agent demonstrations for imitation learning. |
|
|
To help you get oriented to the entire set of functionality provided by the
ML-Agents Toolkit, we recommend exploring our [API documentation](API-Reference.md).
| Additionally, our [example environments](Learning-Environment-Examples.md) are a |
| great resource as they provide sample usage of almost all of our features. |
|
|
| ## The Simulation and Training Process |
|
|
| Training and simulation proceed in steps orchestrated by the ML-Agents Academy |
| class. The Academy works with Agent objects in the scene to step through the |
| simulation. |
|
|
| During training, the external Python training process communicates with the |
| Academy to run a series of episodes while it collects data and optimizes its |
| neural network model. When training is completed successfully, you can add the |
| trained model file to your Unity project for later use. |
|
|
| The ML-Agents Academy class orchestrates the agent simulation loop as follows: |
|
|
| 1. Calls your Academy's `OnEnvironmentReset` delegate. |
| 1. Calls the `OnEpisodeBegin()` function for each Agent in the scene. |
| 1. Gathers information about the scene. This is done by calling the |
| `CollectObservations(VectorSensor sensor)` function for each Agent in the |
| scene, as well as updating their sensor and collecting the resulting |
| observations. |
| 1. Uses each Agent's Policy to decide on the Agent's next action. |
| 1. Calls the `OnActionReceived()` function for each Agent in the scene, passing |
| in the action chosen by the Agent's Policy. |
1. Calls the Agent's `OnEpisodeBegin()` function if the Agent has reached its
   `Max Step` count or has otherwise ended its episode by calling `EndEpisode()`.
|
|
To create a training environment, extend the Agent class to implement the above
methods. Whether you need to implement each of them depends on your specific
scenario.
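
The loop above maps onto a small set of overridable methods on the Agent class.
As an illustrative sketch (assuming a recent release of the `Unity.MLAgents`
packages, where `OnActionReceived` takes an `ActionBuffers` argument; the class
name and reward values are hypothetical), a minimal Agent might look like:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class MinimalAgent : Agent
{
    public override void OnEpisodeBegin()
    {
        // Reset this Agent's state at the start of each episode.
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Add observations; the number of floats added must match the
        // observation size set in the Behavior Parameters.
        sensor.AddObservation(transform.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Apply the action chosen by the Policy and assign rewards.
        var moveX = actions.ContinuousActions[0];
        AddReward(-0.001f); // small per-step penalty (hypothetical reward scheme)
    }
}
```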
|
|
| ## Organizing the Unity Scene |
|
|
To train and use the ML-Agents Toolkit in a Unity scene, the scene must contain
as many Agent subclasses as you need. Agent instances should be attached to the
GameObject representing that Agent.
|
|
| ### Academy |
|
|
| The Academy is a singleton which orchestrates Agents and their decision making |
| processes. Only a single Academy exists at a time. |
|
|
| #### Academy resetting |
|
|
To alter the environment at the start of each episode, add your method to the
Academy's `OnEnvironmentReset` action.
|
|
| ```csharp |
| public class MySceneBehavior : MonoBehaviour |
| { |
| public void Awake() |
| { |
| Academy.Instance.OnEnvironmentReset += EnvironmentReset; |
| } |
| |
| void EnvironmentReset() |
| { |
| // Reset the scene here |
| } |
| } |
| ``` |
|
|
| For example, you might want to reset an Agent to its starting position or move a |
| goal to a random position. An environment resets when the `reset()` method is |
| called on the Python `UnityEnvironment`. |
|
|
| When you reset an environment, consider the factors that should change so that |
| training is generalizable to different conditions. For example, if you were |
| training a maze-solving agent, you would probably want to change the maze itself |
for each training episode. Otherwise, the agent would probably only learn to
solve that one particular maze, not mazes in general.
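
For instance, a reset callback might move the goal to a random position each
episode. A minimal sketch (the `goal` field and spawn range are hypothetical):

```csharp
using UnityEngine;
using Unity.MLAgents;

public class MazeSceneBehavior : MonoBehaviour
{
    public GameObject goal; // assigned in the Inspector (hypothetical setup)

    void Awake()
    {
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Move the goal to a random position so the agent cannot
        // memorize a single layout.
        goal.transform.localPosition = new Vector3(
            Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}
```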
|
|
| ### Multiple Areas |
|
|
In many of the example environments, multiple copies of the training area are
instantiated in the scene. This generally speeds up training, allowing the
| environment to gather many experiences in parallel. This can be achieved simply |
| by instantiating many Agents with the same Behavior Name. If possible, consider |
| designing your scene to support multiple areas. |
|
|
| Check out our example environments to see examples of multiple areas. |
| Additionally, the |
| [Making a New Learning Environment](Learning-Environment-Create-New.md#optional-multiple-training-areas-within-the-same-scene) |
| guide demonstrates this option. |
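
One way to set this up is to instantiate several copies of a training-area
prefab at startup, offset so they do not overlap. A sketch, assuming a
hypothetical `areaPrefab` that contains an Agent with the shared Behavior Name:

```csharp
using UnityEngine;

public class TrainingAreaSpawner : MonoBehaviour
{
    public GameObject areaPrefab; // prefab containing an Agent (hypothetical)
    public int areaCount = 8;
    public float spacing = 20f;

    void Awake()
    {
        // Lay the copies out in a row; each Agent gathers experience in parallel.
        for (var i = 0; i < areaCount; i++)
        {
            Instantiate(areaPrefab,
                new Vector3(i * spacing, 0f, 0f), Quaternion.identity);
        }
    }
}
```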
|
|
| ## Environments |
|
|
| When you create a training environment in Unity, you must set up the scene so |
| that it can be controlled by the external training process. Considerations |
| include: |
|
|
| - The training scene must start automatically when your Unity application is |
| launched by the training process. |
| - The Academy must reset the scene to a valid starting point for each episode of |
| training. |
| - A training episode must have a definite end — either using `Max Steps` or by |
| each Agent ending its episode manually with `EndEpisode()`. |
|
|
| ## Environment Parameters |
|
|
| Curriculum learning and environment parameter randomization are two training |
| methods that control specific parameters in your environment. As such, it is |
| important to ensure that your environment parameters are updated at each step to |
the correct values. To enable this, we expose an `EnvironmentParameters` C# class
| that you can use to retrieve the values of the parameters defined in the |
| training configurations for both of those features. Please see our |
| [documentation](Training-ML-Agents.md#environment-parameters) |
| for curriculum learning and environment parameter randomization for details. |
|
|
| We recommend modifying the environment from the Agent's `OnEpisodeBegin()` |
| function by leveraging `Academy.Instance.EnvironmentParameters`. See the |
| WallJump example environment for a sample usage (specifically, |
| [WallJumpAgent.cs](../Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs) |
| ). |
|
|
| ## Agent |
|
|
| The Agent class represents an actor in the scene that collects observations and |
| carries out actions. The Agent class is typically attached to the GameObject in |
| the scene that otherwise represents the actor — for example, to a player object |
| in a football game or a car object in a vehicle simulation. Every Agent must |
| have appropriate `Behavior Parameters`. |
|
|
| Generally, when creating an Agent, you should extend the Agent class and implement |
| the `CollectObservations(VectorSensor sensor)` and `OnActionReceived()` methods: |
|
|
| - `CollectObservations(VectorSensor sensor)` — Collects the Agent's observation |
| of its environment. |
| - `OnActionReceived()` — Carries out the action chosen by the Agent's Policy and |
| assigns a reward to the current state. |
|
|
| Your implementations of these functions determine how the Behavior Parameters |
| assigned to this Agent must be set. |
|
|
| You must also determine how an Agent finishes its task or times out. You can |
| manually terminate an Agent episode in your `OnActionReceived()` function when |
| the Agent has finished (or irrevocably failed) its task by calling the |
| `EndEpisode()` function. You can also set the Agent's `Max Steps` property to a |
| positive value and the Agent will consider the episode over after it has taken |
| that many steps. You can use the `Agent.OnEpisodeBegin()` function to prepare |
| the Agent to start again. |
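
These termination options can be combined: end the episode manually on success
or failure, and rely on `Max Steps` as a timeout. A sketch (the success and
failure checks are hypothetical placeholders):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class TerminatingAgent : Agent
{
    bool ReachedGoal() { return false; }     // hypothetical task-specific check
    bool FellOffPlatform() { return false; } // hypothetical failure check

    public override void OnActionReceived(ActionBuffers actions)
    {
        if (ReachedGoal())
        {
            SetReward(1.0f);
            EndEpisode(); // success: OnEpisodeBegin() runs next
        }
        else if (FellOffPlatform())
        {
            SetReward(-1.0f);
            EndEpisode(); // failure also ends the episode
        }
        // Otherwise the episode ends automatically once Max Steps is reached.
    }
}
```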
|
|
| See [Agents](Learning-Environment-Design-Agents.md) for detailed information |
| about programming your own Agents. |
|
|
| ## Recording Statistics |
|
|
We offer developers a mechanism to record statistics from within their Unity
environments. These statistics are aggregated and reported during the training
process. To record statistics, see the `StatsRecorder` C# class.
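
For example, a scene component could periodically record a custom metric so
that it appears alongside the built-in training statistics. A sketch (the stat
name, metric, and recording interval are hypothetical):

```csharp
using UnityEngine;
using Unity.MLAgents;

public class ScoreRecorder : MonoBehaviour
{
    float totalScore; // hypothetical metric tracked elsewhere in the scene

    void FixedUpdate()
    {
        // Record occasionally; values are aggregated (averaged by default)
        // over the summary period by the training process.
        if ((Time.frameCount % 100) == 0)
        {
            Academy.Instance.StatsRecorder.Add("Environment/TotalScore", totalScore);
        }
    }
}
```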
|
|
| See the FoodCollector example environment for a sample usage (specifically, |
| [FoodCollectorSettings.cs](../Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorSettings.cs) |
| ). |
|
|