Data collection based on FSDP backend on Ascend devices(en)
===========================================================

Last updated: 08/14/2025.

This is a tutorial for data collection using the GRPO or DAPO algorithm based on FSDP on Ascend devices.

Configuration
-------------

Two levels of configuration control data collection:

1. **Global profiler control**: Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.
2. **Role profiler control**: Use parameters in each role's ``profiler`` field to control the collection mode for that role.

Global collection control
~~~~~~~~~~~~~~~~~~~~~~~~~

Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.

- ``global_profiler``: Controls the profiling tool, the steps to collect, and the output path.

  - ``tool``: The profiling tool to use. Options are ``nsys``, ``npu``, ``torch``, and ``torch_memory``.
  - ``steps``: A list of steps to collect. For example, ``[2, 4]`` collects steps 2 and 4. If set to ``null``, no collection occurs.
  - ``save_path``: The path where collected data is saved. The default is ``"outputs/profile"``.

Role collection control
~~~~~~~~~~~~~~~~~~~~~~~

In each role's ``profiler`` field, you can control the collection mode for that role.

- ``enable``: Whether to enable profiling for this role.
- ``all_ranks``: Whether to collect data from all ranks.
- ``ranks``: A list of ranks to collect data from. If empty, no data is collected.
- ``tool_config``: Configuration for the profiling tool used by this role.

Use parameters in each role's ``profiler.tool_config.npu`` to control NPU profiler behavior:

- ``level``: Collection level. Options are ``level_none``, ``level0``, ``level1``, and ``level2``.

  - ``level_none``: Disables all level-based data collection (turns off ``profiler_level``).
  - ``level0``: Collects high-level application data, underlying NPU data, and operator execution details on the NPU.
  - ``level1``: Extends ``level0`` by adding CANN-layer AscendCL data and AI Core performance metrics on the NPU.
  - ``level2``: Extends ``level1`` by adding CANN-layer Runtime data and AI CPU metrics.

- ``contents``: A list of options that control what is collected, such as ``npu``, ``cpu``, ``memory``, ``shapes``, ``module``, and ``stack``.

  - ``npu``: Whether to collect device-side performance data.
  - ``cpu``: Whether to collect host-side performance data.
  - ``memory``: Whether to enable memory analysis.
  - ``shapes``: Whether to record tensor shapes.
  - ``module``: Whether to record framework-layer Python call stack information.
  - ``stack``: Whether to record operator call stack information.

- ``analysis``: Enables automatic data parsing.
- ``discrete``: Whether to enable discrete mode.

Examples
--------

Disabling collection
~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   global_profiler:
      steps: null # disable profile

End-to-End collection
~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   global_profiler:
      steps: [1, 2, 5]
   actor_rollout_ref:
      actor:
         profiler:
            enable: True
            all_ranks: True
            tool_config:
               npu:
                  discrete: False
      # rollout & ref follow actor settings

Discrete Mode Collection
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   global_profiler:
      steps: [1, 2, 5]
   actor_rollout_ref:
      actor:
         profiler:
            enable: True
            all_ranks: True
            tool_config:
               npu:
                  discrete: True
      # rollout & ref follow actor settings

Visualization
-------------

Collected data is stored in the user-defined ``save_path`` and can be visualized with the MindStudio Insight tool.

If the ``analysis`` parameter is set to ``False``, the data must be parsed offline after collection:

.. code:: python

   import torch_npu

   # Set profiler_path to the parent directory of the "localhost.localdomain___ascend_pt" folder.
   # The value below is only a placeholder (the default save_path); adjust it to your run.
   profiler_path = "outputs/profile"
   torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)
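
When several roles or steps write into the same ``save_path``, offline parsing may have to be run once per output directory. The following is a minimal sketch of one way to locate those directories before calling ``analyse``. The helper names ``find_profile_roots`` and ``analyse_all`` are hypothetical and not part of verl or torch_npu. The sketch also assumes that profiler output folders end in ``_ascend_pt`` (inferred from the folder name mentioned above) and may sit at any depth under ``save_path``.

.. code:: python

   from pathlib import Path


   def find_profile_roots(save_path):
       """Return the sorted parent directories of every ``*_ascend_pt`` folder.

       Assumption: profiler output folders end in ``_ascend_pt`` and may be
       nested at any depth below ``save_path``.
       """
       roots = {
           p.parent
           for p in Path(save_path).rglob("*_ascend_pt")
           if p.is_dir()
       }
       return sorted(roots)


   def analyse_all(save_path):
       """Run offline parsing once for each directory found under ``save_path``."""
       # Imported lazily so that the directory scan works without torch_npu installed.
       import torch_npu

       for root in find_profile_roots(save_path):
           torch_npu.profiler.profiler.analyse(profiler_path=str(root))

For example, ``analyse_all("outputs/profile")`` would parse everything collected under the default ``save_path``. Calling ``find_profile_roots`` on its own first is a cheap way to confirm which directories would be parsed.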