Wednesday, July 8, 2015

Installing and configuring InfiniBand on a Red Hat system

This post will take you through the installation and configuration of an InfiniBand card on a server running Red Hat Enterprise Linux 5.4.  These steps are applicable to any version of Red Hat 5, and will probably work with version 6 as well.  It has been surprisingly hard to find all of these steps in one document.

Required packages

openib-1.4.1-6.el5.noarch
libibverbs-1.1.3-2.el5.x86_64
libnes-0.9.0-2.el5.x86_64
libibumad-1.3.3-1.el5.x86_64
opensm-libs-3.3.3-2.el5.x86_64
swig-1.3.29-2.el5.x86_64
ibutils-libs-1.2-11.1.el5.x86_64
ibutils-1.2-11.1.el5.x86_64 (provides ibdiagnet and others)
opensm-3.3.3-2.el5.x86_64
libibmad-1.3.3-1.el5.x86_64
infiniband-diags-1.5.3-1.el5.x86_64 (provides handy tools like ibstat and ibstatus)
libibverbs-utils-1.1.3-2.el5.x86_64 (provides handy tools ibv_devinfo and ibv_devices)
libibverbs-devel-1.1.3-2.el5.x86_64
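The whole list can be checked, and any gaps installed, in one pass. A minimal sketch (package names are the base names from the list above; rpm and yum are assumed to be present, as on any Red Hat system):

```shell
# Sketch: flag any of the required RPMs that are not installed and
# print a single yum command that would pull them in.
pkgs="openib libibverbs libnes libibumad opensm-libs swig ibutils-libs \
ibutils opensm libibmad infiniband-diags libibverbs-utils libibverbs-devel"
missing=""
for p in $pkgs; do
    # rpm -q exits non-zero when a package is not installed
    rpm -q "$p" >/dev/null 2>&1 || missing="$missing $p"
done
if [ -n "$missing" ]; then
    echo "yum install$missing"
else
    echo "all required packages installed"
fi
```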

Hardware

First, make sure your hardware is working correctly:
$ lspci | grep fini
Make sure the card shows up!  If not, there is a basic hardware problem.  Try re-seating the card or moving it to another PCI slot.

Kernel driver

If you have installed the openib package, the InfiniBand kernel modules should be present.  Reboot the system and look at the kernel boot messages for a good clue as to which driver you need:
$ dmesg | grep mth
ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
ib_mthca: Initializing 0000:51:00.0
I need the mthca driver, so I use modprobe to load the kernel module:
$ modprobe ib_mthca
Note that even though the correct driver was mentioned in the boot messages, the module is NOT automatically loaded.  To load it automatically at boot on Red Hat, create an executable script under /etc/sysconfig/modules/ that runs the modprobe (the /etc/modules file is a Debian convention and is not read by Red Hat's init scripts).  Now, check the module:
$ lsmod | grep ib_mthca
ib_mthca              158053  0
ib_mad                 70757  5 ib_mthca,ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core               104901  17 ib_mthca,ib_iser,ib_srp,rds,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,ib_ucm,ib_uverbs,ib_umad,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad,iw_cxgb3
Once the kernel driver is loaded, you should see a directory under /sys/class/infiniband:
$ ls /sys/class/infiniband
mthca0
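One way to arrange the boot-time load, sketched below. On a real Red Hat system the script would be written to /etc/sysconfig/modules/infiniband.modules (a directory rc.sysinit scans for executable *.modules files at startup); here it goes to a temporary directory purely so the sketch can run anywhere:

```shell
# Sketch of a boot-time module loader in the Red Hat style. On a real
# system, write this file to /etc/sysconfig/modules/infiniband.modules
# instead of the temporary directory used here for demonstration.
dir=$(mktemp -d)
cat > "$dir/infiniband.modules" <<'EOF'
#!/bin/sh
# Load the Mellanox HCA driver and the userspace verbs module
modprobe ib_mthca
modprobe ib_uverbs
EOF
# The script must be executable or rc.sysinit will skip it
chmod +x "$dir/infiniband.modules"
cat "$dir/infiniband.modules"
```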

User-space driver

Okay, that’s working, but ibv_devices and ibv_devinfo still report:
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
Now install the appropriate user-space driver:
$ yum install libmthca
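The user-space library has to match the kernel HCA driver found earlier. A rough lookup, sketched as shell (libmthca for the ib_mthca driver as in this post; libmlx4 is the provider for the newer ConnectX-family mlx4_ib driver; other HCAs ship their own providers):

```shell
# Sketch: map the kernel HCA module to its userspace verbs provider RPM.
# ib_mthca was identified from dmesg earlier; substitute your own module.
mod=ib_mthca
case "$mod" in
    ib_mthca) pkg=libmthca ;;   # older Mellanox (InfiniHost) HCAs
    mlx4_ib)  pkg=libmlx4  ;;   # ConnectX-family Mellanox HCAs
    *)        pkg="" ;;         # consult your vendor's documentation
esac
echo "yum install $pkg"
```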
Check again:
$ ibv_devices
    device                 node GUID
    ------              ----------------
    mthca0              0005ad00000c1588
$ ibv_devinfo
hca_id: mthca0
        transport:              InfiniBand (0)
        fw_ver:                 1.2.917
        node_guid:              0005:ad00:000c:1588
        sys_image_guid:         0005:ad00:0100:d050
        vendor_id:              0x05ad
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               HCA.Cheetah-DDR.20
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       275
                        port_lmc:       0x00
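A quick scripted health check is to pull the port state out of ibv_devinfo and confirm it reads PORT_ACTIVE. The sketch below parses a captured snippet of the output above so it is self-contained; on a real host, pipe ibv_devinfo in instead:

```shell
# Sketch: extract the port state field from ibv_devinfo output.
# The here-document stands in for a live:  ibv_devinfo | awk ...
state=$(awk '/state:/ {print $2; exit}' <<'EOF'
hca_id: mthca0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
EOF
)
echo "port state: $state"
# Anything other than PORT_ACTIVE (e.g. PORT_DOWN, PORT_INIT) means the
# physical link or the subnet manager is not up yet.
```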
If you still get the error
Failed to get IB devices list: Function not implemented
the likely cause is that the ib_uverbs kernel module is not loaded:
$ modprobe ib_uverbs
With the drivers loaded, ifconfig shows an ib0 interface alongside the usual Ethernet and loopback interfaces (here ib0 already has an IP address; that configuration is described in the next section):
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:14:5E:F4:3A:A8
          inet addr:172.20.102.2  Bcast:172.20.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:253460 errors:0 dropped:0 overruns:0 frame:0
          TX packets:140500 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:351821939 (335.5 MiB)  TX bytes:12147168 (11.5 MiB)
          Interrupt:185 Memory:e4000000-e4012800

ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.21.102.2  Bcast:172.21.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:1400 (1.3 KiB)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1516 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1516 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2394328 (2.2 MiB)  TX bytes:2394328 (2.2 MiB)
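Note the MTU of 2044 on ib0: that is the IPoIB datagram-mode default (the 2048-byte IB MTU minus the 4-byte IPoIB encapsulation header), and it is one reason connected mode, covered later in this post, helps throughput. A self-contained sketch of pulling the value out of captured ifconfig output:

```shell
# Sketch: extract ib0's MTU from ifconfig output. The here-document is
# a captured sample; on a real host use:
#   ifconfig ib0 | sed -n 's/.*MTU:\([0-9]*\).*/\1/p'
mtu=$(sed -n 's/.*MTU:\([0-9]*\).*/\1/p' <<'EOF'
ib0       Link encap:InfiniBand
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
EOF
)
echo "ib0 MTU: $mtu"
```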

TCP/IP over InfiniBand (IPoIB)

Following the Red Hat documentation, create a device configuration file called
/etc/sysconfig/network-scripts/ifcfg-ib0
Make sure to set the right IP address for YOUR configuration: (EDIT: Removed TYPE parameter)
DEVICE=ib0
BOOTPROTO=none
ONBOOT=yes
IPADDR=172.21.102.2
NETMASK=255.255.0.0
Bring the interface up with ifup ib0 (or restart the network service).  Now you should be able to ping the InfiniBand interface, assuming the cable is plugged into a working fabric:
$ ping ivc2
PING ivc2 (172.21.102.2) 56(84) bytes of data.
64 bytes from ivc2 (172.21.102.2): icmp_seq=1 ttl=64 time=2.38 ms

Enable connected mode

Edit: 21 Sept 2013
By default, Red Hat does not enable “connected mode” on InfiniBand. Enabling connected mode can substantially speed up IP-over-IB transport, in part because it permits a much larger MTU. Add the following line to the config file you created in the previous step:
CONNECTED_MODE=Yes
and restart the interface (ifdown ib0; ifup ib0).
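Connected mode can also be toggled at runtime through the IPoIB sysfs attribute, which is handy for testing before committing the change to the config file. The sketch below writes to a temporary file standing in for the sysfs path so it can run anywhere; on a real host the file is /sys/class/net/ib0/mode and root is required:

```shell
# Sketch: the sysfs toggle for IPoIB connected mode. A temp file stands
# in for the real path, /sys/class/net/ib0/mode, for demonstration.
mode_file=$(mktemp)
echo connected > "$mode_file"       # valid values: datagram | connected
cat "$mode_file"
# After switching, raise the MTU to match; connected mode allows up to
# 65520 bytes:  ifconfig ib0 mtu 65520
```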
