2019/10/09

RoCE Troubleshooting

RDMA/RoCE Troubleshooting

*** Disclaimer ***
s2d.dk is not responsible for any errors, or for the results obtained from the use of the information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness, or of the results obtained from its use. Always test in a lab setup before using any of this information in a production environment.
For any reference links to other websites, we encourage you to read the privacy statements of those third-party websites.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
***

The RDMA Activity performance counters can give you some useful information.

RDMA Accepted Connections increases for each new connection. The picture shows a lab system with a RoCE error. RDMA Accepted Connections normally changes only if you move VMs (SOFS), disable/enable a pNIC/vNIC, or reboot a node.

The picture shows that I have thousands of Accepted Connections, and after just a few seconds I have 4 Accepted Connections, with no change in the cluster.
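
The check described above (the counter should stay flat while nothing changes in the cluster) can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `find_connection_churn`, the `tolerance` parameter, and the sample values are all hypothetical, and the sketch assumes you have already collected `(timestamp, RDMA Accepted Connections)` samples by some other means.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (seconds since start, RDMA Accepted Connections value)


def find_connection_churn(samples: List[Sample], tolerance: int = 0) -> List[Tuple[float, int]]:
    """Return (timestamp, delta) for every interval where the counter moved.

    Assumption: during the sampled window no VM was moved, no pNIC/vNIC was
    disabled/enabled and no node was rebooted, so any change larger than
    `tolerance` is worth investigating.
    """
    flagged = []
    # Compare each sample with the one taken just before it.
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        delta = current - previous
        if abs(delta) > tolerance:
            flagged.append((timestamp, delta))
    return flagged


if __name__ == "__main__":
    # Made-up values for illustration only, not taken from the lab system.
    lab_samples = [(0.0, 4200), (5.0, 4200), (10.0, 4)]
    for timestamp, delta in find_connection_churn(lab_samples):
        print(f"t={timestamp}s: Accepted Connections changed by {delta}")
```

A `tolerance` of 0 flags every change; raise it if a small number of reconnects is normal in your environment.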

Active Connections

Shows the active connections on each vNIC/pNIC. To see the connections, use "netstat -xan".