# Profiling in Python

As part of the ML-Agents Toolkit, we provide a lightweight profiling system in
order to identify hotspots in the training process and help spot regressions
from changes.

Timers are hierarchical, meaning that the time tracked in a block of code can be
further split into other blocks if desired. This also means that a function that
is called from multiple places in the code will appear in multiple places in the
timing output.
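
For illustration, here is a minimal sketch of nested timers. The import path
(`mlagents_envs.timers`) matches recent releases and may differ in yours, and
the function and block names are made up:

```python
# Minimal sketch: the import path matches recent ML-Agents releases and may
# differ in older ones; the function and block names are hypothetical.
from mlagents_envs.timers import hierarchical_timer

def load_data():
    with hierarchical_timer("load_data"):
        sum(range(10000))  # stand-in for real work

def train_step():
    with hierarchical_timer("train_step"):
        load_data()  # recorded as a child of "train_step"

def evaluate():
    with hierarchical_timer("evaluate"):
        load_data()  # recorded again, separately, as a child of "evaluate"

train_step()
evaluate()
```

In the resulting output, `load_data` appears once under `train_step` and once
under `evaluate`, each with its own totals.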
All timers operate using a "global" instance by default, but this can be
overridden if necessary (mainly for testing).
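
For example, a test can route timing into its own `TimerStack` instead of the
global one. This is only a sketch: it assumes `TimerStack`, `hierarchical_timer`,
and `get_timer_tree` are available from `mlagents_envs.timers` and that the
latter two accept a `timer_stack` argument, as in recent releases:

```python
# Sketch only: assumes mlagents_envs.timers exposes TimerStack, and that
# hierarchical_timer / get_timer_tree accept a timer_stack argument.
from mlagents_envs.timers import TimerStack, hierarchical_timer, get_timer_tree

def test_timed_block():
    local_stack = TimerStack()  # isolated from the global timer instance
    with hierarchical_timer("my_block", timer_stack=local_stack):
        sum(range(10000))  # stand-in for the code under test
    tree = get_timer_tree(local_stack)
    assert tree["children"]["my_block"]["count"] == 1
```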
## Adding Profiling

There are two ways to indicate that code should be included in profiling. The
simplest way is to add the `@timed` decorator to a function or method of
interest.

```python
class TrainerController:
    # ....
    @timed
    def advance(self, env: EnvManager) -> int:
        # do stuff
```
You can also use the `hierarchical_timer` context manager.

```python
with hierarchical_timer("communicator.exchange"):
    outputs = self.communicator.exchange(step_input)
```
The context manager may be easier than the `@timed` decorator for profiling
different parts of a large function, or for profiling calls to abstract methods
that might not use the decorator.
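
For instance, a long function can be split into separately timed phases (a
sketch; the function, environment object, and block names are made up):

```python
from mlagents_envs.timers import hierarchical_timer

def run_episode(env):
    # Time the two phases of this (hypothetical) function independently.
    with hierarchical_timer("reset"):
        env.reset()
    with hierarchical_timer("step_loop"):
        for _ in range(100):
            env.step()
```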
## Output

By default, at the end of training, timers are collected and written in JSON
format to `{summaries_dir}/{run_id}_timers.json`. The output consists of node
objects with the following keys (a small script for inspecting the file is
sketched after the list):

- total (float): The total time in seconds spent in the block, including child
  calls.
- count (int): The number of times the block was called.
- self (float): The total time in seconds spent in the block, excluding child
  calls.
- children (dictionary): A dictionary of child nodes, keyed by the node name.
- is_parallel (bool): Indicates that the block of code was executed in multiple
  threads or processes (see below). This is optional and defaults to false.
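
As a quick way to read this file, a script along these lines walks the tree and
prints each node. The path is illustrative, and it assumes the top-level JSON
object is itself a node with the keys listed above:

```python
import json

def print_node(name, node, indent=0):
    # Print total/self seconds and the call count, then recurse into children.
    print(f"{'  ' * indent}{name}: total={node['total']:.3f}s "
          f"self={node['self']:.3f}s count={node['count']}")
    for child_name, child in node.get("children", {}).items():
        print_node(child_name, child, indent + 1)

with open("results/my_run_timers.json") as f:  # illustrative path
    print_node("root", json.load(f))
```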
### Parallel execution

#### Subprocesses

For code that executes in multiple processes (for example,
SubprocessEnvManager), we periodically send the timer information back to the
"main" process, aggregate the timers there, and flush them in the subprocess.
Note that (depending on the number of processes) this can result in timers
whose total time exceeds their parent's total time. This is analogous to the
difference between the "real" and "user" values reported by the Unix `time`
command. In the timer output, blocks that were run in parallel are indicated by
the `is_parallel` flag.
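
For example (with made-up numbers), if four subprocess environments each spend
5 seconds inside the same timed block while the main process only waits about
6 seconds of wall-clock time, the aggregated node for that block will report
roughly 20 seconds of total time even though its parent's total is closer to
6 seconds.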
#### Threads

Timers currently use `time.perf_counter()` to track time spent, which may not
give accurate results for multiple threads. If this is problematic, set
`threaded: false` in your trainer configuration.