Azure Stack HCI: Troubleshooting the Cluster Object "UseRdmaForStorage"
*** Disclaimer ***
s2d.dk is not responsible for any errors, or for the results obtained from the use of the information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information. Always test in a lab setup before using any of the information in a production environment.
For any reference links to other websites, we encourage you to read the privacy statements of those third-party websites.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Last update: 2021.04.30
Azure Stack HCI (20H2)
Azure Stack HCI (20H2) has a new toggle switch, the cluster property "UseRdmaForStorage", which is flipped to 0 (Off) when network issues are detected. The detection looks for spontaneous SMB disconnects. If they occur often without an obvious explanation (e.g., the node restarting), the cluster stops relying on RDMA/RoCE as a precaution and sends storage IO over TCP instead. If you are confident that the network issue is fixed, you can flip the setting back to 1 (On).
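The exact detection logic and its thresholds are not documented in these notes. As a rough mental model only, the toggle can be pictured as a sliding-window count of unexplained SMB disconnects. The sketch below is hypothetical: the window, grace period and threshold (`WINDOW`, `RESTART_GRACE`, `MAX_UNEXPLAINED`) are invented values, not the ones the cluster service uses.

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration only; the cluster service's real
# window and threshold are not published in these notes.
WINDOW = timedelta(minutes=10)        # look-back window for disconnects
RESTART_GRACE = timedelta(minutes=5)  # disconnects this close to a restart count as "explained"
MAX_UNEXPLAINED = 3                   # unexplained disconnects tolerated before falling back to TCP


def is_explained(disconnect, restarts):
    """Treat a disconnect as explained if a known node restart happened close to it."""
    return any(abs(disconnect - r) <= RESTART_GRACE for r in restarts)


def use_rdma_for_storage(disconnects, restarts, now):
    """Return 1 (On) or 0 (Off) for a toy model of the UseRdmaForStorage toggle."""
    recent = [d for d in disconnects if now - WINDOW <= d <= now]
    unexplained = [d for d in recent if not is_explained(d, restarts)]
    return 0 if len(unexplained) >= MAX_UNEXPLAINED else 1


if __name__ == "__main__":
    t0 = datetime(2021, 4, 30, 12, 0)
    flaps = [t0 + timedelta(minutes=m) for m in (1, 3, 6)]
    # Three unexplained disconnects inside the window: RDMA is turned off.
    print(use_rdma_for_storage(flaps, restarts=[], now=t0 + timedelta(minutes=7)))    # 0
    # A node restart at t0 explains the first two disconnects: RDMA stays on.
    print(use_rdma_for_storage(flaps, restarts=[t0], now=t0 + timedelta(minutes=7)))  # 1
```

The point of the model is the "without an obvious explanation" part: the same number of disconnects can leave RDMA on or turn it off, depending on whether something like a restart accounts for them.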
When the cluster disables RDMA and falls back to TCP...
Microsoft-Windows-FailoverClustering/Operational Event 5163:
- Cluster service disabled RDMA on the SMB instance for SBL IO on this node. All IO for this instance will now go over TCP connections only.
- Cluster service disabled RDMA on the SMB instance for CSV IO on this node. All IO for this instance will now go over TCP connections only.
When you enable RDMA again...
Microsoft-Windows-FailoverClustering/Operational Event 5164:
- Cluster service enabled RDMA on the SMB instance for SBL IO on this node.
- Cluster service enabled RDMA on the SMB instance for CSV IO on this node.
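If you export the FailoverClustering/Operational events from a node (for example with Get-WinEvent) as (event ID, message) pairs, you can replay them to see whether RDMA is currently on or off for the SBL and CSV instances. This is a minimal sketch that assumes the message wording quoted above is stable. That may not hold across builds or display languages. The function name `rdma_state` is hypothetical.

```python
import re

# Matches the Event 5163/5164 message text quoted above (English wording assumed).
PATTERN = re.compile(
    r"Cluster service (enabled|disabled) RDMA on the SMB instance for (SBL|CSV) IO"
)


def rdma_state(events):
    """Replay (event_id, message) pairs in chronological order.

    Returns e.g. {"SBL": True, "CSV": False}, where True means RDMA is enabled.
    Instances with no matching event are omitted (state unknown).
    """
    state = {}
    for event_id, message in events:
        if event_id not in (5163, 5164):
            continue  # ignore unrelated cluster events
        match = PATTERN.search(message)
        if match:
            action, instance = match.groups()
            state[instance] = action == "enabled"
    return state
```

The last matching event per instance wins, so the input must be in chronological order. SBL and CSV are tracked separately because the cluster logs them as separate events.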
Events you will see in the minutes before RDMA is disabled...
Microsoft-Windows-SMBClient/Connectivity Event 30804:
Instance name: \Device\SmbVsa
Server name: x.x.x.x
Server address: x.x.x.x:445
Connection type: Rdma
This indicates that the client's connection to the server was disconnected.
Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects and poor performance.
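When many 30804 events pile up, it helps to see which server connections are flapping over RDMA. The sketch below assumes you have the event messages as plain text, laid out as in the example above ("Key: value" header lines followed by the description). The function names and the 10.0.0.2 address are hypothetical.

```python
from collections import Counter

# Header fields shown in the Event 30804 example above.
FIELDS = ("Instance name", "Server name", "Server address", "Connection type")


def parse_30804(message):
    """Parse the "Key: value" header lines of an Event 30804 message into a dict."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() in FIELDS:
            fields[key.strip()] = value.strip()
    return fields


def rdma_disconnects_per_server(messages):
    """Count RDMA disconnects per server address across many 30804 messages."""
    counts = Counter()
    for message in messages:
        fields = parse_30804(message)
        if fields.get("Connection type") == "Rdma":
            counts[fields.get("Server address", "unknown")] += 1
    return counts
```

Splitting on the first colon keeps the port in "Server address" (e.g. "10.0.0.2:445"). A server address that dominates the counts is a good place to start checking PFC on that path.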
- Get-Cluster | fl *
- Cluster object property: UseRdmaForStorage (1 = On, 0 = Off)
- "netstat -xan" shows the RDMA SMB connections
- "Validate-DCB"
- "Perfmon /sys": add the network and RDMA counters
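On a node you can read the property directly with "Get-Cluster | fl *" as listed above. If you collect that output as text from several clusters, a small parser can flag those where UseRdmaForStorage has been flipped to 0. This is a sketch under the assumption that the text uses Format-List's "Name : Value" layout. The helper names are hypothetical.

```python
def read_fl_property(fl_output, name):
    """Return the value of `name` from Format-List ("Name : Value") text, or None."""
    for line in fl_output.splitlines():
        key, sep, value = line.partition(":")  # property names contain no colons
        if sep and key.strip() == name:
            return value.strip()
    return None


def rdma_for_storage_enabled(fl_output):
    """Interpret UseRdmaForStorage: True = 1 (On), False = 0 (Off), None if absent."""
    value = read_fl_property(fl_output, "UseRdmaForStorage")
    if value is None:
        return None  # e.g. output from a build without this property
    return value == "1"
```

Returning None for a missing property keeps "RDMA turned off" separate from "property not present". The distinction matters if the collected output also comes from clusters that are not on 20H2.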