Tuesday, July 7, 2015

HPC TECH TIP: InfiniBand Connectivity Testing

When you receive a new cluster, you'll want to test the various components to make sure everything is working. Here we'll take a look at how to do some very basic InfiniBand connectivity tests to ensure your links are up and running at the correct speed. There are several different tools and methods you can use, but we'll just cover a few.

My environment
Hardware
· Frontend: Dell PER710
· Compute Nodes: PER410 x 64
· InfiniBand ConnectX-2 IB Cards
· QLogic 12800-040 IB Switch

Cluster Middleware
· PCM 1.2a
· Mellanox OFED 1.4

First, make sure IB hardware is discovered on a single compute node

Do the following:

1. ssh to a compute node as root and check the available IB tools in your path
i. Type ib<tab><tab> to see all the tools
ii. Experiment with a few
1. # ibstat
# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 1
Firmware version: 2.7.0
Hardware version: a0
Node GUID: 0x0002c903000442f4
System image GUID: 0x0002c903000442f7
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 37
LMC: 0
SM lid: 1
Capability mask: 0x02510868
Port GUID: 0x0002c903000442f5
For this one, you are looking at the “Rate”; in our case, we are expecting 40 for QDR.

2. # ibhosts
# ibhosts
Ca : 0x0002c90300077f86 ports 1 "compute-00-08 HCA-1"
Ca : 0x0002c90300077f92 ports 1 "compute-00-06 HCA-1"
Ca : 0x0002c90300077fb2 ports 1 "compute-00-09 HCA-1"
Ca : 0x0002c90300077eae ports 1 "compute-00-02 HCA-1"
Ca : 0x0002c90300077f8e ports 1 "compute-00-10 HCA-1"
Ca : 0x0002c903000442a4 ports 1 "compute-00-26 HCA-1"
Ca : 0x0002c90300077f06 ports 1 "compute-00-18 HCA-1"
<snip>
3. # ibv_devinfo
# ibv_devinfo
hca_id: mlx4_0
fw_ver: 2.7.000
node_guid: 0002:c903:0004:42f4
sys_image_guid: 0002:c903:0004:42f7
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xA0
board_id: MT_0C40110009
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 37
port_lmc: 0x00
For this one, you are paying attention to the “state”; your port can be in one of three states:

PORT_ACTIVE = good
PORT_INIT = link but no subnet manager
PORT_DOWN = bad, no link detected

4. # ibswitches

5. etc… There are many more tools.
iii. Exit and return to the installer node (a short pass/fail script combining the checks above is sketched below)
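If you want to roll the per-node ibstat and ibv_devinfo checks above into a single pass/fail test, something like the short script below works. This is only an illustrative sketch: the script name (check_ib_node.sh) is made up, and the expected rate of 40 (QDR) is an assumption you would adjust for your own cards.

#!/bin/bash
# check_ib_node.sh - quick local IB sanity check (illustrative sketch)
# Assumes the OFED tools ibstat and ibv_devinfo are in the PATH.
EXPECTED_RATE=40   # 40 = QDR; adjust for your hardware

# Port state as reported by ibv_devinfo (PORT_ACTIVE / PORT_INIT / PORT_DOWN)
state=$(ibv_devinfo | awk '/state:/ {print $2; exit}')

# Link rate as reported by ibstat
rate=$(ibstat | awk '/Rate:/ {print $2; exit}')

if [ "$state" != "PORT_ACTIVE" ]; then
    echo "FAIL: port state is $state (expected PORT_ACTIVE)"
    exit 1
fi

if [ "$rate" != "$EXPECTED_RATE" ]; then
    echo "FAIL: link rate is $rate (expected $EXPECTED_RATE)"
    exit 1
fi

echo "OK: port is active at rate $rate"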
Now check all the nodes

2. Check to make sure you have an active port on each node (a combined sweep script is sketched after this list)
a. # pdsh -a ibv_devinfo | grep -i port_active | dshbak -c
3. Check to make sure you can see all the hosts on the IB network
a. # pdsh -a ibhosts
4. Explore IB card model, from the installer node
a. # pdsh -a lspci | grep -i infini
<snip>
compute-00-63-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0)
compute-00-13-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
compute-00-32-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
compute-00-23-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0)
compute-00-04-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
compute-00-33-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
<snip>
Here you’ll notice the “rev”: some cards are “a0” and some are “b0”, which signifies whether the card is ConnectX-1 or ConnectX-2. In this case, “a0” is ConnectX-1 and “b0” is ConnectX-2.
b. # pdsh -a dmesg | grep -i infini
5. Look at the driver modules that are loaded
a. # lsmod | grep -i core
b. # modinfo ib_core

6. To determine what type of IB switch you have (from a node with an IB connection)
a. # ibswitches
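The individual pdsh checks in steps 2 through 6 can also be rolled into one sweep that flags any node whose port is not active or is not linked at the expected rate. This is a minimal sketch, assuming pdsh and dshbak are set up as above; the script name is made up and 40 (QDR) is the rate assumed here.

#!/bin/bash
# ib_sweep.sh - cluster-wide IB link sweep (illustrative sketch)
EXPECTED_RATE=40   # 40 = QDR; adjust for your hardware

# Consolidate the active-port count per node
echo "== Active ports per node =="
pdsh -a "ibv_devinfo | grep -c PORT_ACTIVE" | dshbak -c

# Print any node whose link rate is not the expected one
echo "== Nodes not linked at rate $EXPECTED_RATE =="
pdsh -a "ibstat | grep 'Rate:'" | grep -v "Rate: $EXPECTED_RATE"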
Check basic connectivity by running tests from one node to another

Do the following:

1. Open two terminals and SSH into a different compute node from each

2. Run a simple send/receive test from one node to the other; we’ll test with ib_send_lat and ib_send_bw (a scripted version of this pairwise test is sketched after the examples below)
a. To test latency, on the first compute node:
i. # ib_send_lat
1. Hit enter and you will see the following output while this node waits for traffic:
# ib_send_lat
------------------------------------------------------------------
Send Latency Test
Inline data is used up to 400 bytes message
Connection type : RC
local address: LID 0x2b QPN 0x100049 PSN 0xcb9adf
b. On the second compute node:
i. # ib_send_lat compute-0X-00
2. Hit enter and you will see something like the following output on a successful run:

# ib_send_lat compute-00-11
------------------------------------------------------------------
Send Latency Test
Inline data is used up to 400 bytes message
Connection type : RC
local address: LID 0x33 QPN 0x40049 PSN 0x550d5e
remote address: LID 0x2b QPN 0x100049 PSN 0xcb9adf
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec]
2 1000 1.52 11.40 1.54
------------------------------------------------------------------
c. To test bandwidth, on the first compute node:
i. # ib_send_bw
1. Hit enter and you will see the following output while this node waits for traffic:
# ib_send_bw
------------------------------------------------------------------
Send BW Test
Connection type : RC
Inline data is used up to 1 bytes message
local address: LID 0x2b, QPN 0xc0049, PSN 0x30a89f

d. On the second compute node:
i. # ib_send_bw compute-0X-00
2. Hit enter and you will see the following output from a successful run:
# ib_send_bw compute-00-11
------------------------------------------------------------------
Send BW Test
Connection type : RC
Inline data is used up to 1 bytes message
local address: LID 0x33, QPN 0x0049, PSN 0x70aa26
remote address: LID 0x2b, QPN 0xc0049, PSN 0x30a89f
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
65536 1000 3204.65 3202.92
------------------------------------------------------------------
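If you’d rather not juggle two terminals, the server and client halves of these tests can also be driven from a single shell: start the server side over ssh in the background, give it a moment to begin listening, then run the client against it. A rough sketch, assuming passwordless root ssh between the compute nodes; the node names below are just examples and the 2-second pause is an arbitrary settle time.

#!/bin/bash
# ib_pair_test.sh - run ib_send_lat and ib_send_bw between two nodes (illustrative sketch)
SERVER=compute-00-11   # node that waits for traffic (example name)
CLIENT=compute-00-12   # node that initiates the test (example name)

for test in ib_send_lat ib_send_bw; do
    echo "== $test: $CLIENT -> $SERVER =="
    # Start the server side in the background; it exits after one client run
    ssh "$SERVER" "$test" > /dev/null 2>&1 &
    sleep 2
    # Run the client side against the server and show its results
    ssh "$CLIENT" "$test $SERVER"
    wait   # reap the backgrounded server before the next test
done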
That concludes this tech tip. We’ve really just scratched the surface; there are many other topics to cover with regard to InfiniBand, such as what kind of bandwidth and latency to expect with different cards and how to run code over the InfiniBand network. I’ll cover those topics and more in later posts.

-- Scott Collier
