CANN/oam-tools HCCL Test Tool

HCCL Test

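oam-tools: This project provides developers with fault-locating tools, including fault information collection, software and hardware information display, and AI Core error analysis, to make fault diagnosis more efficient. The documentation can be found by searching for “故障处理简介” (Introduction to Troubleshooting) in the Ascend community (select the Community Edition). Project address: https://gitcode.com/cann/oam-tools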

HCCL Test provides HCCL communication performance and correctness testing.

Environment Preparation

  • Install CANN Toolkit Package

    Running the HCCL Test tool requires the CANN Toolkit development kit package and the CANN operator package. Download the CANN software packages that match your operating system and architecture, and install them by following the CANN Software Installation Guide in the Ascend Documentation Center.

  • Set CANN Software Environment Variables

    # Default path, root user installation
    source /usr/local/Ascend/cann/set_env.sh
    # Default path, non-root user installation
    source $HOME/Ascend/cann/set_env.sh
    
  • Install MPI and Configure Its Environment Variables

    For MPI installation, refer to the "MPI Installation and Configuration" chapter in the HCCL Performance Test Tool User Guide.

    Environment variable configuration example:

    export PATH=/path/to/mpi/bin:$PATH
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/mpi/lib
    
  • For multi-machine cluster training, set the HCCL_SOCKET_IFNAME environment variable to specify the host network card:

    # Name of the host network card used for HCCL initialization (root communication). HCCL obtains the host IP through this network card name to create the communication domain.
    # Supported formats (choose one of the four):
    # Exact match
    export HCCL_SOCKET_IFNAME==eth0,enp0     # Use only the eth0 or enp0 network card (the value is "=eth0,enp0")
    export HCCL_SOCKET_IFNAME=^=eth0,enp0    # Do not use the eth0 or enp0 network card
    # Prefix (fuzzy) match
    export HCCL_SOCKET_IFNAME=eth,enp        # Use any network card whose name starts with eth or enp
    export HCCL_SOCKET_IFNAME=^eth,enp       # Do not use any network card whose name starts with eth or enp
    

    Note: The network card names above are examples only; the variable is not limited to eth and enp network cards.

  • For multi-machine cluster training, collect the host network card information of all participating nodes

    Edit the hostfile:

    vim hostfile
    

    List every node participating in training in the hostfile, one node per line, in the following format:

    # <node IP>:<number of processes on that node>
    192.168.1.1:8
    192.168.1.2:8
    ...
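
To make the four HCCL_SOCKET_IFNAME formats described earlier in this section concrete, the following is a minimal bash sketch that applies the same rules to a single interface name. The helper name ifname_selected is hypothetical and is not part of HCCL; it only mirrors the exact, excluded, prefix, and excluded-prefix semantics stated in the comments above, and HCCL's real matching may differ in edge cases.

```shell
#!/usr/bin/env bash
# Sketch only: ifname_selected is a hypothetical helper, not part of HCCL.
# It mirrors the four HCCL_SOCKET_IFNAME formats described in this document.

# Usage: ifname_selected <spec> <interface-name>
# Returns 0 if the interface would be used under <spec>, 1 otherwise.
ifname_selected() {
    local spec=$1 name=$2
    local exclude=0 exact=0 matched=0 entry
    local -a entries

    # A leading "^" turns the list into an exclusion list.
    if [[ $spec == '^'* ]]; then
        exclude=1
        spec=${spec:1}
    fi
    # A leading "=" requests exact-name matching instead of prefix matching.
    if [[ $spec == '='* ]]; then
        exact=1
        spec=${spec:1}
    fi

    # Split the comma-separated list into an array.
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        if (( exact )); then
            if [[ $name == "$entry" ]]; then matched=1; fi
        else
            if [[ $name == "$entry"* ]]; then matched=1; fi
        fi
    done

    # Selected when the name matches an inclusion list,
    # or does not match an exclusion list.
    (( matched != exclude ))
}
```

For example, `ifname_selected "^eth,enp" lo && echo "lo would be used"` prints the message, because lo carries neither excluded prefix.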
    

Build

  • Build HCCL Test

    In the command below, MPI_HOME is the MPI installation path and ASCEND_DIR is the installation path of the CANN Toolkit development kit package.

    make MPI_HOME=/path/to/mpi ASCEND_DIR=${ASCEND_HOME_PATH}
    

Execution

  • Single node run:

    mpirun -n 8 ./bin/all_reduce_test -b 8K -e 64M -f 2 -d fp32 -o sum -p 8
    
  • Multi-node run: (two nodes as example)

    mpirun -f hostfile -n 16 ./bin/all_reduce_test -b 8K -e 64M -f 2 -d fp32 -o sum -p 8
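
In the two-node example above, -n 16 equals the sum of the per-node process counts in the hostfile (8 + 8). As a hedged sketch, the total can be derived from the hostfile instead of being typed by hand. The helper name total_ranks is hypothetical, and the sketch assumes the "ip:processes" hostfile format shown earlier with IPv4 addresses or host names (a colon inside the address would break the field split).

```shell
#!/usr/bin/env bash
# Sketch only: total_ranks is a hypothetical helper, not part of HCCL Test.
# It sums the per-node process counts of a hostfile in "ip:processes" format,
# skipping comments, blank lines, and lines without a ":" separator.
total_ranks() {
    local hostfile=$1
    awk -F: '
        /^[ \t]*(#|$)/ { next }        # skip comment lines and blank lines
        NF >= 2 { sum += $2 }          # second field is the process count
        END { print sum + 0 }          # print 0 for an empty hostfile
    ' "$hostfile"
}
```

A possible use is `mpirun -f hostfile -n "$(total_ranks hostfile)" ./bin/all_reduce_test -b 8K -e 64M -f 2 -p 8`, which keeps the process count in step with the hostfile when nodes are added or removed.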
    

Parameters

All tests support the same parameter set:

  • NPU Count

    • -p,--npus <npus used for one node> Number of NPUs participating in training on each compute node. Default value: total number of NPUs on the current node
  • Data Volume

    • -b,--minbytes <min size in bytes> Data volume starting value. Default value: 64M
    • -e,--maxbytes <max size in bytes> Data volume ending value. Default value: 64M
    • The data volume grows from the starting value to the ending value either by a fixed increment step or by a multiplication factor:
      • -i,--stepbytes <increment size> Increment step. Default value: (max-min)/10

        Note: When the increment step (-i) is 0, the test runs repeatedly on the starting data volume (-b).

      • -f,--stepfactor <increment factor> Multiplication factor. Default value: not enabled

  • HCCL Operation Parameters

    • -o,--op <sum/prod/max/min> Collective communication operation reduction type. Default value: sum

    • -r,--root <root> Root node. Effective for broadcast, reduce, and scatter operations. Default value: 0

    • -d,--datatype <int8/int16/int32/fp16/fp32/int64/uint64/uint8/uint16/uint32/fp64> Data type. Default value: fp32 (that is, float32)

    • -z,--zero_copy <0/1> Enable zero copy. Effective for allgather, reduce_scatter, broadcast, allreduce operations that meet constraint conditions. Default value: 0

    • -m,--symmetric_memory <0/1> Enable symmetric memory. Effective for allgather, reduce_scatter, allreduce operations that meet constraint conditions. Default value: 0

      Note: Zero copy and symmetric memory cannot be enabled simultaneously.

  • Performance

    • -n,--iters <iteration count> Iteration count. Default value: 20
    • -w,--warmup_iters <warmup iteration count> Warmup iteration count (not included in performance statistics, only affects HCCL Test execution time). Default value: 10
    • -t,--onlydevicetime <0/1> Count only device execution time, excluding the host-side software time and kernel loading time of the communication operator from the communication execution time (affects HCCL Test execution time). Default value: 0

      Note:
      1. When the -t parameter is enabled, the -w and -n settings must not exceed 100 rounds.
      2. When the -t parameter is enabled, aicpu_ts mode is not supported.
      3. When the -t parameter is enabled and HCCL_BUFFSIZE is set to 100 MB or less, the -t parameter does not take effect.
  • Result Verification

    • -c,--check <0/1> Verify collective communication operation result correctness (in large-scale cluster scenarios, enabling result verification increases HCCL Test execution time). Default value: 1 (enabled)
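
To illustrate how the data volume parameters (-b, -e, -i, -f) combine, the sketch below lists the sizes a sweep would visit. The helpers to_bytes and sweep_sizes are hypothetical and are not part of HCCL Test. Two assumptions are made that the text above does not confirm: the K, M, and G suffixes are powers of 1024, and the sweep starts at the minimum and stops once the next size would exceed the maximum.

```shell
#!/usr/bin/env bash
# Sketch only: to_bytes and sweep_sizes are hypothetical helpers.
# Assumption: K, M and G suffixes are powers of 1024.
to_bytes() {
    local v=$1
    case $v in
        *[Kk]) echo $(( ${v%?} * 1024 )) ;;
        *[Mm]) echo $(( ${v%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${v%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( v )) ;;
    esac
}

# Usage: sweep_sizes <min> <max> factor|step <value>
# Assumption: the sweep starts at <min> and ends when the size exceeds <max>.
sweep_sizes() {
    local size max mode=$3 value=$4
    size=$(to_bytes "$1")
    max=$(to_bytes "$2")
    while (( size <= max )); do
        echo "$size"
        if [[ $mode == factor ]]; then
            (( value > 1 )) || break    # a factor of 1 or less never advances
            size=$(( size * value ))
        else
            (( value > 0 )) || break    # step 0: this sketch prints the start size once
            size=$(( size + value ))
        fi
    done
}
```

Under these assumptions, `sweep_sizes 8K 64M factor 2` corresponds to `-b 8K -e 64M -f 2` and prints 14 sizes, from 8192 up to 67108864 bytes.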

Execution Examples

  • allreduce

    # Single node 8 NPUs
    mpirun -n 8 ./bin/all_reduce_test -b 8K -e 64M -f 2 -p 8
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/all_reduce_test -b 8K -e 64M -f 2 -p 8
    
  • broadcast

    # Single node 8 NPUs
    mpirun -n 8 ./bin/broadcast_test -b 8K -e 64M -f 2 -p 8 -r 1
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/broadcast_test -b 8K -e 64M -f 2 -p 8 -r 1
    
  • allgather

    # Single node 8 NPUs
    mpirun -n 8 ./bin/all_gather_test -b 8K -e 64M -f 2 -p 8
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/all_gather_test -b 8K -e 64M -f 2 -p 8
    
  • alltoallv

    # Single node 8 NPUs
    mpirun -n 8 ./bin/alltoallv_test -b 8K -e 64M -f 2 -p 8
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/alltoallv_test -b 8K -e 64M -f 2 -p 8
    
  • alltoall

    # Single node 8 NPUs
    mpirun -n 8 ./bin/alltoall_test -b 8K -e 64M -f 2 -p 8
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/alltoall_test -b 8K -e 64M -f 2 -p 8
    
  • reducescatter

    # Single node 8 NPUs
    mpirun -n 8 ./bin/reduce_scatter_test -b 8K -e 64M -f 2 -p 8
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/reduce_scatter_test -b 8K -e 64M -f 2 -p 8
    
  • reduce

    # Single node 8 NPUs
    mpirun -n 8 ./bin/reduce_test -b 8K -e 64M -f 2 -p 8 -r 1
    
    # Two nodes 16 NPUs
    mpirun -f hostfile -n 16 ./bin/reduce_test -b 8K -e 64M -f 2 -p 8 -r 1
    

Run Cases with Specified deviceId

Before executing the HCCL Test tool, set the following environment variable to specify which devices to start.

Create an executable file, for example run.sh, in the hccl_test directory on the current compute node.

  • Single server scenario (start cards 4, 5, 6, 7)

    • Create executable file:

      run.sh file content:

      # The numbers assigned to HCCL_TEST_USE_DEVS are the deviceIds to start
      export HCCL_TEST_USE_DEVS="4,5,6,7"
      $1
      
    • Case execution:

      mpirun -n 4 ./run.sh "./all_reduce_test -b 8K -e 64M -f 2 -p 4"
      
  • Multi-server scenario (compute node 1 starts cards 0, 1, 2, 3, compute node 2 starts cards 4, 5, 6, 7)

    • Compute node 1:

      • Create executable file:

        run.sh file content:

        export HCCL_TEST_USE_DEVS="0,1,2,3"
        $1
        
    • Compute node 2:

      • Create executable file:

        run.sh file content:

        export HCCL_TEST_USE_DEVS="4,5,6,7"
        $1
        
    • Case execution:

      mpirun -n 8 -f hostfile ./run.sh "./all_reduce_test -b 8K -e 64M -f 2 -p 4"
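
In the examples above, the number of IDs in HCCL_TEST_USE_DEVS matches the -p value (four devices, -p 4). The following sketch checks a device list and prints its length so the two can be kept in step. The helper name count_devs is hypothetical and is not part of HCCL Test, and treating a mismatch as an error is an assumption drawn from the examples rather than a documented rule.

```shell
#!/usr/bin/env bash
# Sketch only: count_devs is a hypothetical helper, not part of HCCL Test.
# Prints the number of device IDs in a comma-separated list. Fails if an
# entry is not a non-negative integer or if an ID appears twice.
count_devs() {
    local -a devs
    local id seen=","
    IFS=',' read -r -a devs <<< "$1"
    for id in "${devs[@]}"; do
        if [[ ! $id =~ ^[0-9]+$ ]]; then
            echo "invalid device id: '$id'" >&2
            return 1
        fi
        if [[ $seen == *",$id,"* ]]; then
            echo "duplicate device id: $id" >&2
            return 1
        fi
        seen+="$id,"
    done
    echo "${#devs[@]}"
}
```

Inside run.sh, a line such as `count_devs "$HCCL_TEST_USE_DEVS" >/dev/null || exit 1` placed after the export would stop a run with a malformed device list before the test binary starts.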
      

Enable Performance Data Collection

Before executing the HCCL Test tool, set the following environment variables to enable performance data collection.

# "1" enables profiling and "0" disables it. Default value: "0". When enabled, performance data is collected during HCCL Test execution
export HCCL_TEST_PROFILING=1
# Specify profiling data storage path. Default is "/var/log/npu/profiling"
export HCCL_TEST_PROFILING_PATH=/home/profiling

After the HCCL Test tool finishes, profiling data is generated in the directory specified by HCCL_TEST_PROFILING_PATH.
