RDMA Device / IB NIC Renaming Rules and Methods

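On GPU machines with IB NICs, the RDMA device names sometimes differ from one machine to another. This gets in the way of normal use, so the RDMA devices need to be renamed.
Note that two kinds of names are involved here: the RDMA device name and the IB network interface name. They are different concepts. Names that start with mlx5 are RDMA device names, while names of the form ib* are IB network interface names, which you can list with the ip command.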

1. View the RDMA device names in the current environment

ibv_devices
    device                 node GUID
    ------              ----------------
    mlx5_0              a088c2030054f946
    mlx5_1              a088c2030054f96e
    mlx5_2              a088c2030054f81e
    mlx5_3              a088c203005a9bec
    mlx5_4              a088c2030054f93e
    mlx5_5              a088c2030054f7de
    mlx5_6              a088c2030054f996
    mlx5_7              a088c2030054f8b6

2. View the bus IDs of the current IB NIC (RDMA) devices

lspci | grep -i Infiniband
0e:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
35:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
47:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
5b:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
86:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
af:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
c3:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
d6:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
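
The two listings above do not show which mlx5_* name belongs to which bus ID. A minimal sketch of one way to pair them, assuming the usual sysfs layout in which each entry under /sys/class/infiniband has a device symlink pointing at its PCI device. The function name list_rdma_pci and its optional directory argument are hypothetical, added only so the function can be pointed at a test tree:

```shell
#!/bin/sh
# Print "<rdma-device> <pci-address>" for every RDMA device.
# $1 (optional, hypothetical parameter): class directory to scan,
# defaulting to the real sysfs location.
list_rdma_pci() {
    class_dir="${1:-/sys/class/infiniband}"
    for dev in "$class_dir"/*; do
        # Skip entries without a "device" link (also covers an unmatched glob).
        [ -e "$dev/device" ] || continue
        # The link target's last path component is the PCI address.
        pci=$(basename "$(readlink -f "$dev/device")")
        printf '%s %s\n' "$(basename "$dev")" "$pci"
    done
}
```

Calling list_rdma_pci with no argument on a host with RDMA devices should print one line per device, for example a device name followed by an address such as 0000:0e:00.0.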

3. Rename RDMA devices / IB NICs with udev rules

RDMA devices and IB NICs are usually renamed with udev rules. The rules below rename the RDMA devices from mlx5_* to mpi_*.

cat /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules
ACTION=="add", KERNELS=="0000:0e:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_0"
ACTION=="add", KERNELS=="0000:35:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_1"
ACTION=="add", KERNELS=="0000:47:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_2"
ACTION=="add", KERNELS=="0000:5b:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_3"
ACTION=="add", KERNELS=="0000:86:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_4"
ACTION=="add", KERNELS=="0000:af:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_5"
ACTION=="add", KERNELS=="0000:c3:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_6"
ACTION=="add", KERNELS=="0000:d6:00.0", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FIXED mpi_7"
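
Writing one rule per bus ID by hand is error-prone, so the rules can be generated from the lspci output instead. The following is a sketch rather than the author's method: gen_rdma_rules and its prefix argument are hypothetical names, devices are numbered in the order lspci lists them, and PCI domain 0000 is assumed when lspci prints addresses without a domain.

```shell
#!/bin/sh
# Read lspci output on stdin and print one rdma_rename rule per
# InfiniBand controller. $1 (optional, hypothetical): name prefix.
gen_rdma_rules() {
    prefix="${1:-mpi_}"
    grep -i 'infiniband controller' | awk -v prefix="$prefix" '{
        addr = $1
        # "0e:00.0" has one colon, "0000:0e:00.0" has two.
        # Assumption: a missing domain means domain 0000.
        if (split(addr, parts, ":") == 2) addr = "0000:" addr
        printf "ACTION==\"add\", KERNELS==\"%s\", SUBSYSTEM==\"infiniband\", PROGRAM=\"rdma_rename %%k NAME_FIXED %s%d\"\n", addr, prefix, n++
    }'
}
```

A possible invocation is lspci | gen_rdma_rules, with the output reviewed before it is saved as a rules file.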

IB network interface names are changed by bus ID in the same way; see the configuration file below.

cat /etc/udev/rules.d/80-infiniband-names.rules
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:0e:00.0", DRIVERS=="mlx5_core", NAME="ibp0"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:35:00.0", DRIVERS=="mlx5_core", NAME="ibp1"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:47:00.0", DRIVERS=="mlx5_core", NAME="ibp2"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:5b:00.0", DRIVERS=="mlx5_core", NAME="ibp3"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:86:00.0", DRIVERS=="mlx5_core", NAME="ibp4"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:af:00.0", DRIVERS=="mlx5_core", NAME="ibp5"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:c3:00.0", DRIVERS=="mlx5_core", NAME="ibp6"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:d6:00.0", DRIVERS=="mlx5_core", NAME="ibp7"
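
Because every rule in these files is keyed on a bus ID, a copy-paste slip that repeats one ID leaves some device without a rule of its own. A small sanity check, sketched under the assumption that each rule carries exactly one KERNELS=="..." match (find_dup_kernels is a hypothetical helper name):

```shell
#!/bin/sh
# Read a udev rules file on stdin and print every KERNELS value
# that is used by more than one rule. No output means no duplicates.
find_dup_kernels() {
    sed -n 's/.*KERNELS=="\([^"]*\)".*/\1/p' | sort | uniq -d
}
```

For example, find_dup_kernels < /etc/udev/rules.d/80-infiniband-names.rules prints nothing when every bus ID appears once.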

Then reload the udev rules:

udevadm control --reload-rules

Restart the openibd service so that the devices are re-added and the ACTION=="add" rules are applied

/etc/init.d/openibd restart

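Alternatively, simply reboot the operating system.
Then run ibv_devinfo again to check that the names have changed as configured. You can also inspect a renamed device with udevadm: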

udevadm info -a /sys/class/infiniband/mpi_0 | head -n 20

4. Rename an RDMA device with the rdma command

/opt/mellanox/iproute2/sbin/rdma dev set mlx5_0 name mpi_0

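Note that a name changed this way reverts after a reboot. To keep it, wrap the command in a service managed by systemd so that the rename runs automatically at boot.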
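
A boot-time service needs a script that does not depend on the mlx5_* numbering, since that numbering is exactly what varies between machines. The sketch below keys the rename on the bus ID instead. It is an illustration under assumptions, not a tested production script: rename_rdma_by_pci, the DRY_RUN and RDMA_BIN variables, and the "bus-id new-name" input format are all hypothetical, and the sysfs device symlink layout is assumed.

```shell
#!/bin/sh
# Read "<pci-address> <new-name>" lines on stdin and rename the RDMA
# device that sits on each address with `rdma dev set`.
# $1 (optional, hypothetical): class directory, default /sys/class/infiniband.
# RDMA_BIN (hypothetical): path to the rdma tool.
# DRY_RUN (hypothetical): if set, print the commands instead of running them.
rename_rdma_by_pci() {
    class_dir="${1:-/sys/class/infiniband}"
    rdma_bin="${RDMA_BIN:-/opt/mellanox/iproute2/sbin/rdma}"
    while read -r pci new_name; do
        for dev in "$class_dir"/*; do
            [ -e "$dev/device" ] || continue
            [ "$(basename "$(readlink -f "$dev/device")")" = "$pci" ] || continue
            old_name=$(basename "$dev")
            # Nothing to do if the device already carries the wanted name.
            [ "$old_name" = "$new_name" ] && continue
            ${DRY_RUN:+echo} "$rdma_bin" dev set "$old_name" name "$new_name" </dev/null
        done
    done
}
```

A oneshot systemd unit could then run a script that calls this function with the mapping on stdin. Running it once with DRY_RUN=1 first shows the commands it would issue.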

Copyright notice: unless otherwise noted, all articles on this site are original.

Please credit the source when reposting: https://www.sulao.cn/post/1172
