2019/10/09

RoCE Troubleshooting

s2d.dk is not responsible for any errors, or for the results obtained from the use of this information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information. Always test in a lab setup before using any of this information in a production environment. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

(Updated 2019.10.09)

(DRAFT, work in progress. Last update 2019.10.09. New screen images and videos will be available within the next few days.)

The RDMA Activity performance counters can give you some useful information.

RDMA Accepted Connections increases for each new connection. The picture shows a lab system with a RoCE error. Normally, RDMA Accepted Connections only changes if you move VMs (SOFS), disable/enable a pNIC/vNIC, or reboot a node.

The picture shows that I have thousands of Accepted Connections, and after just a few seconds I have 4 Accepted Connections, with no change in the cluster.
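If you note down the counter value at regular intervals (for example from Performance Monitor), a few lines of Python can point out the jumps described here. This is only a rough sketch: the function name `find_unexpected_changes`, the `max_delta` threshold and the sample readings are made up for illustration and are not taken from a real system.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (seconds since start, RDMA Accepted Connections value)


def find_unexpected_changes(
    samples: List[Sample], max_delta: int = 0
) -> List[Tuple[float, int, int]]:
    """Return (timestamp, previous, current) for every pair of consecutive
    samples whose values differ by more than max_delta."""
    flagged = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if abs(cur - prev) > max_delta:
            flagged.append((ts, prev, cur))
    return flagged


# Hypothetical readings taken every 5 seconds on an otherwise idle cluster.
samples = [(0, 4), (5, 4), (10, 3120), (15, 4)]

for ts, prev, cur in find_unexpected_changes(samples):
    print(f"t={ts}s: RDMA Accepted Connections changed {prev} -> {cur}")
```

With `max_delta=0` every change is reported, which fits a cluster where nothing is being moved, disabled or rebooted. Raise the threshold if some connection churn is expected during the sampling window.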

Active Connections

Shows the active connections on each vNIC/pNIC. To see the connections, use "netstat -xan".