2021/04/30

Azure Stack HCI Troubleshooting "UseRdmaForStorage"

Azure Stack HCI Troubleshooting the Cluster Object "UseRdmaForStorage"


*** Disclaimer ***
s2d.dk is not responsible for any errors, or for the results obtained from the use of this information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information. Always test in a lab setup before using any of the information in a production environment.
For any reference links to other websites, we encourage you to read the privacy statements of the third-party websites.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
***

Last update: 2021.04.30

Get-HealthFault lists an event with the Reason:
The cluster detected network connectivity issues that prevent Storage Spaces Direct from working properly.
To ensure consistent performance and data safety, Storage Spaces Direct has stopped using remote direct memory access (RDMA) even if RDMA-capable hardware is present and enabled.
Storage Spaces Direct will continue to flow but diminished performance using TCP/IP.
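
To pick this fault out of the list from PowerShell, something along these lines may help. It is a minimal sketch, assuming it runs on a cluster node; the '*RDMA*' wildcard is only an illustrative filter on the Reason text, not an official fault identifier.

```powershell
# List current health faults whose Reason text mentions RDMA.
# Assumption: '*RDMA*' is an illustrative filter, adjust it to the wording you see.
Get-HealthFault |
    Where-Object { $_.Reason -like '*RDMA*' } |
    Format-List *
```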

Azure Stack HCI (20H2)

Azure Stack HCI (20H2) has a new cluster setting called "UseRdmaForStorage", which is flipped to 0 (Off) when network issues are detected. The detection looks for spontaneous SMB disconnects; if they occur often without an obvious explanation (e.g., a node restarting), the cluster stops relying on RDMA/RoCE as a precaution. If you are confident that the network issue is fixed, you can flip the setting back to 1 (On).

If Get-HealthFault reports that RDMA is off and Azure Stack HCI (20H2) is using TCP/IP:

Review the Cluster Objects:
  • Get-Cluster | fl *
  • Cluster Object: UseRdmaForStorage (1=On) or (0=Off)
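
Since "fl *" prints every cluster property, it can be easier to ask for the one value directly. A small sketch, assuming it is run on a cluster node (or with -Name pointing at the cluster):

```powershell
# Show only the RDMA toggle instead of every cluster property.
# 1 = RDMA is used for storage traffic, 0 = the cluster has fallen back to TCP/IP.
Get-Cluster | Format-List Name, UseRdmaForStorage
```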
If the value is 0, review and validate the RDMA/RoCE and DCB settings with tools like:
  • "netstat -xan" shows the RDMA SMB connections
  • "Validate-DCB" (More information)
  • "Perfmon /sys", adding counters for Network and RDMA activity
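
Alongside the tools above, a few built-in cmdlets can give a quick first look at the RDMA and DCB state on a node. These cmdlets are not part of the original notes; they are suggested here as a hedged starting point, and the right values depend on your NIC vendor and switch configuration.

```powershell
# Which network adapters have RDMA enabled?
Get-NetAdapterRdma | Format-Table Name, Enabled

# Do the SMB client interfaces report RDMA capability?
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

# Are the active SMB Multichannel connections RDMA-capable on both ends?
Get-SmbMultichannelConnection |
    Format-Table ServerName, ClientIpAddress, ServerIpAddress, ClientRdmaCapable, ServerRdmaCapable

# DCB: which priorities have Priority Flow Control enabled?
Get-NetQosFlowControl
```

Run the same checks on every node and compare the output; a single node that differs from the rest is often a good place to start looking.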
Change the Cluster Object:
  • (Get-Cluster).UseRdmaForStorage=1
Links: