
2019/12/05

Monitor RoCE - Mellanox Switch SX1012

Monitor RoCE and configuration examples for Mellanox Switch SX1012

*** Disclaimer ***
s2d.dk is not responsible for any errors, or for the results obtained from the use of the information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from its use. Always test in a lab setup before using any of this information in a production environment.
For any reference links to other websites, we encourage you to read the privacy statements of those third-party websites.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
***

Performance Monitor and configuration examples for Mellanox Switch SX1012

Monitor Priority and Pause Frames for RDMA (RoCE) on Mellanox Switch SX1012
For the screen captures in this post I used the Mellanox Switch SX1012:
  • Mellanox SX1012 (SwitchX - Onyx 3.6.8010)

******************************************************************************
Mellanox Show Commands for DCB, PFC and ETS
******************************************************************************

show dcb priority-flow-control
show dcb priority-flow-control detail
show dcb ets

Review Interface
show dcb priority-flow-control interface ethernet 1/3
show dcb ets interface ethernet 1/3
show interfaces ethernet 1/3 counters priority 3
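
The host has equivalents of these show commands. For comparing what the switch reports with what Windows has applied, here is a quick sketch using the in-box Windows Server DCB cmdlets (the adapter name is illustrative):

# Which priorities have PFC enabled on the host
Get-NetQosFlowControl

# ETS traffic classes and their bandwidth weights on the host
Get-NetQosTrafficClass

# The operational DCB/QoS state the adapter has actually applied
Get-NetAdapterQos -Name "NIC1"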




References

Microsoft:
Mellanox L2 PCP TC:
Terminology:
pNIC    Physical NIC, the physical hardware that exchanges packets with the ToR
vNIC    Host vNIC, a virtual NIC from the vSwitch exposed in the host partition
tNIC    Host tNIC, a team interface NIC from an LBFO team
vmNIC   Virtual Machine NIC, a virtual NIC from the vSwitch exposed in a guest partition
vSwitch Hyper-V virtual switch
SET     Switch Embedded Teaming, Hyper-V virtual switch teaming supported in Windows Server 2016 and 2019

ToR     Top of Rack switch
RDMA    Remote Direct Memory Access
RoCE    RDMA over Converged Ethernet
RoCEv2  2nd generation RoCE using UDP/IP for routability (a.k.a. Routable RoCE)

DCB     Data Center Bridging
LLDP    Link Layer Discovery Protocol
DCBX    Data Center Bridging Capability Exchange protocol, an extension of LLDP
PFC     Priority Flow Control
ETS     Enhanced Transmission Selection
TC      Traffic Class

ECN     Explicit Congestion Notification
RED     Random Early Detection
CNP     Congestion Notification Packet, a congestion ACK control frame
SP      Strict Priority
WRR     Weighted Round Robin


******************************************************************************
Mellanox Switch configuration example (L2):
******************************************************************************

Mellanox SX1012

enable
configure terminal 
interface ethernet 1/1-1/12 flowcontrol send off force
interface ethernet 1/1-1/12 flowcontrol receive off force

dcb priority-flow-control enable force
dcb priority-flow-control priority 3 enable
(Priority 3 is used for Storage (SMB) traffic)
interface ethernet 1/1-1/12 dcb priority-flow-control mode on force
interface ethernet 1/1 switchport mode hybrid
(Repeat for each port)
S2D# interface ethernet 1/x switchport hybrid allowed-vlan all
(Repeat for each port. Only allow the VLANs you need; "allowed-vlan all" is not recommended and is only used here because this is a lab/test setup.)
dcb ets tc bandwidth 49 50 0 1
(More information below)
exit
write memory
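
The switch commands above only cover the fabric side. Below is a minimal sketch of the matching host-side DCB configuration using the standard Windows Server NetQos cmdlets; priority 3 for SMB, priority 7 for cluster and the 50%/1% ETS weights mirror the switch example, while the policy and adapter names are illustrative:

# Tag SMB Direct (TCP port 445) with priority 3 and cluster heartbeat with priority 7
New-NetQosPolicy "SMB"     -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy "Cluster" -Cluster -PriorityValue8021Action 7

# PFC on for the lossless SMB priority only
Enable-NetQosFlowControl  -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# ETS weights that mirror "dcb ets tc bandwidth 49 50 0 1" (the rest stays in the default class)
New-NetQosTrafficClass "SMB"     -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
New-NetQosTrafficClass "Cluster" -Priority 7 -BandwidthPercentage 1  -Algorithm ETS

# Ignore DCBX from the switch and enable DCB on the RDMA pNICs
Set-NetQosDcbxSetting -Willing $false
Enable-NetAdapterQos -Name "NIC1","NIC2"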

******************************************************************************
Mellanox Switch Disable PFC
******************************************************************************

no dcb priority-flow-control priority 3 enable



******************************************************************************
Mellanox SX1012 (SwitchX)
******************************************************************************
show dcb priority-flow-control detail

******************************************************************************
show dcb priority-flow-control
show dcb priority-flow-control interface ethernet 1/1
******************************************************************************

******************************************************************************
show dcb ets
******************************************************************************

******************************************************************************
For the PFC-enabled priorities we need to use the lossless TCs.

So for the Microsoft use cases we change the default to:
Priority 0  Default traffic  49%
Priority 3  SMB traffic      50%
Priority 7  Cluster traffic   1%

dcb ets tc bandwidth 49 50 0 1
(The four values are the per-TC bandwidth weights and must sum to 100 (49+50+0+1). With the default priority-to-TC mapping, priority 0 lands in the 49% class, priority 3 (SMB) in the 50% class and priority 7 (cluster) in the 1% class.)

******************************************************************************
show dcb ets interface ethernet 1/1
******************************************************************************

******************************************************************************
show interfaces ethernet 1/3 counters priority 3
******************************************************************************


Monitor RoCE - Mellanox Switch SN2100

Monitor RoCE and configuration examples for Mellanox Switch SN2100


Performance Monitor and configuration examples for Mellanox Switch SN2100
Monitor Priority and Pause Frames for RDMA (RoCE) on Mellanox Switch SN2100
For the screen captures the Mellanox Switch SN2100 was used with the listed version and config:
  • Mellanox SN2100 (Spectrum - Onyx 3.6.8010)
  • SwitchX and Spectrum are configured in different ways when it comes to L2 ETS
  • Spectrum also supports L3 (DSCP). Note: Microsoft's configuration guide uses RoCE Layer 2 (L2) for S2D with SET (NDKm2)

******************************************************************************
Mellanox Show Commands for DCB, PFC and ETS
******************************************************************************

show dcb priority-flow-control
show dcb priority-flow-control detail
show dcb ets

Show Interface
show dcb priority-flow-control interface ethernet 1/11
show dcb ets interface ethernet 1/11

show qos interface ethernet 1/11 tc-mapping
show qos interface ethernet 1/11 rewrite-mapping

show interfaces ethernet 1/11 counters pfc prio all
show interfaces ethernet 1/11 counters pfc prio 3

show interfaces ethernet 1/11 pfc-wd
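
The same PFC activity can also be watched from the Windows host. Counter-set names vary by NIC vendor and driver version, so here is a sketch that first discovers what the installed driver exposes and then samples it ("RDMA Activity" is a standard Windows counter set; vendor QoS sets differ):

# Discover the RDMA/QoS counter sets the installed NIC driver exposes
Get-Counter -ListSet "*RDMA*","*QoS*" | Select-Object CounterSetName

# Sample the standard RDMA activity counters every 2 seconds, 5 times
Get-Counter -Counter "\RDMA Activity(*)\*" -SampleInterval 2 -MaxSamples 5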

******************************************************************************




References

Microsoft:
Mellanox:
Mellanox L2 PCP TC:
Mellanox L3 - DSCP:
Terminology:
pNIC    Physical NIC, the physical hardware that exchanges packets with the ToR
vNIC    Host vNIC, a virtual NIC from the vSwitch exposed in the host partition
tNIC    Host tNIC, a team interface NIC from an LBFO team
vmNIC   Virtual Machine NIC, a virtual NIC from the vSwitch exposed in a guest partition
vSwitch Hyper-V virtual switch
SET     Switch Embedded Teaming, Hyper-V virtual switch teaming supported in Windows Server 2016 and 2019
ToR     Top of Rack switch
DCB     Data Center Bridging
TC      Traffic Class
LLDP    Link Layer Discovery Protocol

RDMA    Remote Direct Memory Access
RoCE    RDMA over Converged Ethernet
RoCEv2  2nd generation RoCE using UDP/IP for routability (a.k.a. Routable RoCE)
DCBX    Data Center Bridging Capability Exchange protocol, an extension of LLDP
PFC     Priority Flow Control
ETS     Enhanced Transmission Selection
ECN     Explicit Congestion Notification
RED     Random Early Detection
CNP     Congestion Notification Packet, a congestion ACK control frame
SP      Strict Priority
WRR     Weighted Round Robin

******************************************************************************
Mellanox SN2100: change from WRR to Strict Priority
******************************************************************************

enable
configure terminal 
interface ethernet 1/1-1/12 traffic-class 3 dcb ets strict
interface ethernet 1/1-1/12 traffic-class 7 dcb ets strict
show dcb ets interface ethernet 1/1
exit
write memory




******************************************************************************
Mellanox Switch configuration example (L2):
******************************************************************************

Mellanox SN2100 (Spectrum)

enable
configure terminal 
interface ethernet 1/1-1/12 traffic-class 3 dcb ets strict
interface ethernet 1/1-1/12 traffic-class 7 dcb ets strict

interface ethernet 1/1-1/12 flowcontrol send off force
interface ethernet 1/1-1/12 flowcontrol receive off force

dcb priority-flow-control enable force
dcb priority-flow-control priority 3 enable
(Priority 3 is used for Storage (SMB) traffic)
interface ethernet 1/1-1/12 dcb priority-flow-control mode on force
interface ethernet 1/1 switchport mode hybrid
(Repeat for each port)
S2D# interface ethernet 1/x switchport hybrid allowed-vlan all
(Repeat for each port. Only allow the VLANs you need; "allowed-vlan all" is not recommended and is only used here because this is a lab/test setup.)
exit
write memory

******************************************************************************
Reset to default ETS for Port 1 to 12
******************************************************************************

enable
configure terminal
no interface ethernet 1/1-1/12 traffic-class 0 dcb ets
no interface ethernet 1/1-1/12 traffic-class 1 dcb ets
no interface ethernet 1/1-1/12 traffic-class 2 dcb ets
no interface ethernet 1/1-1/12 traffic-class 3 dcb ets
no interface ethernet 1/1-1/12 traffic-class 4 dcb ets
no interface ethernet 1/1-1/12 traffic-class 5 dcb ets
no interface ethernet 1/1-1/12 traffic-class 6 dcb ets
no interface ethernet 1/1-1/12 traffic-class 7 dcb ets
show dcb ets interface ethernet 1/1
exit
write memory




******************************************************************************
Mellanox Switch configuration example with DSCP:

Draft - Test only, last updated 2019.01.19
Note:
#1: The example uses both L2 and L3. SwitchX uses L2; Spectrum uses both L2 and L3.
#2: Microsoft's configuration guide uses RoCE Layer 2 (L2) for S2D.
#3: DSCP is supported by the Windows Server OS. However, using it for S2D is currently not recommended by Microsoft, and for testing you need to work closely with the hardware vendors.

Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode
Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode (Old version)
******************************************************************************

S2D (Config)# interface ethernet 1/1-1/x qos trust ?
       port       based on port default settings
       L2         based on PCP, DEI fields
       L3         based on EXP, DSCP fields
       both       based on PCP, DEI and EXP, DSCP fields

S2D (Config)# interface ethernet 1/1-1/x qos trust both
(Note: In my lab I used both SwitchX and Spectrum; in this example the Spectrum switch needs to accept both PCP and DSCP)

******************************************************************************

interface ethernet 1/1-1/x traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
traffic pool roce type lossless
traffic pool roce memory percent 50.00
traffic pool roce map switch-priority 3
interface ethernet 1/1-1/x traffic-class 3 dcb ets strict
interface ethernet 1/1-1/x traffic-class 7 dcb ets strict
interface ethernet 1/1-1/x qos trust both

******************************************************************************

interface ethernet 1/1-1/x traffic-class 0 bind switch-priority 0
interface ethernet 1/1-1/x traffic-class 3 bind switch-priority 3
interface ethernet 1/1-1/x traffic-class 7 bind switch-priority 7

******************************************************************************

interface ethernet 1/1-1/x flowcontrol send off force
interface ethernet 1/1-1/x flowcontrol receive off force

dcb priority-flow-control enable force
dcb priority-flow-control priority 3 enable
(Priority 3 is used for Storage (SMB) traffic)
interface ethernet 1/1-1/x dcb priority-flow-control mode on force
interface ethernet 1/x switchport mode hybrid
(Repeat for each port from 1 to x)
S2D (Config)# interface ethernet 1/x switchport hybrid allowed-vlan all
(Repeat for each port from 1 to x. Only allow the VLANs you need; "allowed-vlan all" is not recommended and is only used here because this is a lab/test setup.)
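
On the host side, DSCP can be set per QoS policy with the in-box NetQos cmdlets. Here is a sketch that tags SMB Direct with both 802.1p priority 3 and a DSCP value; DSCP 26 is an assumption here (a value commonly seen in lossless-RoCE guides for the RoCE class), so align it with whatever the switch is configured to trust:

# Tag SMB Direct (TCP port 445) with 802.1p priority 3 and DSCP 26
# (the DSCP value is an assumption; match it to the switch configuration)
New-NetQosPolicy "SMB-DSCP" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3 -DSCPAction 26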

******************************************************************************
Host pNIC configuration

Driver version 2.10 or newer

S2D PS> Mlx5Cmd.exe -QosConfig -SetupRoceQosConfig -Name NIC3 -Configure 2
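
After running Mlx5Cmd it is worth verifying from Windows that RDMA and QoS are actually active on the adapter; a quick check with the in-box cmdlets (the adapter name NIC3 follows the example above):

# Confirm RDMA is enabled on the adapter
Get-NetAdapterRdma -Name "NIC3"

# Show the operational DCB state (PFC priorities, ETS classes) on the adapter
Get-NetAdapterQos -Name "NIC3"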



Default DSCP to switch-priority mapping:
0-7 → 0
8-15 → 1
16-23 → 2
24-31 → 3
32-39 → 4
40-47 → 5
48-55 → 6
56-63 → 7
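
The default mapping is simply the DSCP value divided by 8, so each switch-priority covers a block of eight DSCP values. For illustration:

# Default mapping: switch-priority = floor(DSCP / 8), e.g. DSCP 26 -> priority 3
$dscp = 26
[math]::Floor($dscp / 8)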


****************************************************************************** 

2019/11/24

YouTube and link library for S2D.dk

Please subscribe to S2D.dk YouTube… to support the channel

S2D.dk YouTube Channel


YouTube
Update
News
Storage Spaces Direct (S2D)
Micron Tools
Microsoft Tools
Mellanox
Chelsio
Cisco
Dell
HPE

2019/11/23

DiskSpd remote disk Performance and Host impact

DiskSpd (diskspd.exe) - Disk Read/Write impact on the Host CPU/LPs (Remote Disk with RDMA/iWARP)
Examples with 1, 10 and 32 threads per target, and 1 or 8 outstanding I/O requests per target per thread.


diskspd.exe - Disk Read/Write impact on the Host CPU/LPs 
The performance counters show the remote client host's CPU/LP impact while diskspd.exe generates the workload.

Both the File Server and the Client Host use Chelsio 40G network adapters with RDMA/iWARP.
The File Server uses a "SET Switch" with two vNICs and the Client uses a pNIC (see the verification sketch after the list):
  • S046001E - File Server
    • 172.18.0.119
    • 172.18.0.120
  • S046002A - Client (Running DiskSpd)
    • 172.18.0.121
    • 172.18.0.122
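
To confirm that the client really uses RDMA for the SMB traffic in the screenshots, the in-box SMB cmdlets can be run on both sides; a minimal sketch against the lab setup listed above:

# On the client: list SMB connections and whether they run over RDMA
Get-SmbMultichannelConnection

# On the client: per-interface view with Speed, RDMA Capable and RSS Capable
Get-SmbClientNetworkInterface

# On the file server: confirm the (v)NICs report RDMA capability
Get-SmbServerNetworkInterface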
Part 1 - Performance test with DiskSpd (Remote Disk Performance - Part 1)



Part 2 - Performance test with DiskSpd (Remote Disk Performance - Part 2)




The first picture shows the SMB RDMA connections.
Note: the direction is the opposite of the job; a read test will show SMB Direct write traffic.

SMB Direct Connections from Client Server to File Server
***

Client Server Read Performance from the remote File Share (Remote Disk)
Note:
The Client Server uses a pNIC from Chelsio with RDMA/iWARP enabled
***

Client Server Read Performance from the remote File Share (Remote Disk)
***

Client Server Write Performance from the remote File Share (Remote Disk)
***


Client Server Write Performance from the remote File Share (Remote Disk)
***

File Server Host Read Performance from the remote Client (Local Disk impact on the File Server)
Note:
The File Server uses a Virtual Switch (SET Switch) with vNICs and RDMA enabled (NDKm2)
The pNICs are also Chelsio with RDMA/iWARP
***

File Server Host Write Performance from the remote Client (Local Disk impact on the File Server)
***






2019/11/22

DiskSpd local disk Performance and Host impact

DiskSpd (diskspd.exe) - Disk Read/Write impact on the Host CPU/LPs (Local Disk)
Examples with 1, 10 and 32 threads per target, and 1 or 8 outstanding I/O requests per target per thread.


diskspd.exe - Disk Read/Write impact on the Host CPU/LPs 
The performance counters show the host CPU/LP impact while diskspd.exe generates the workloads.




Links to download DiskSpd and command help on GitHub

Example:
Create the test file "io.dat" with a size of 100 GB by adding "-c100G" to the command the first time:
Diskspd.exe -b8K -d60 -Su -L -o1 -t1 -r -w0 -W20 -c100G D:\Temp\io.dat
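
For reference, what the flags used in these runs mean (per the DiskSpd command-line help):

-b8K    block size of 8 KiB
-d60    test duration of 60 seconds
-Su     disable software caching
-L      measure and report latency statistics
-o<n>   outstanding I/O requests per target per thread (1 or 8 here)
-t<n>   threads per target (1, 10 or 32 here)
-r      random I/O
-w<n>   write percentage (-w0 = 100% read, -w100 = 100% write)
-W<n>   warmup seconds before measurement starts
-c100G  create the target file with a size of 100 GB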


Read Test:
Diskspd.exe -b8K -d60 -Su -L -o1 -t1 -r -w0 -W30 D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t1 -r -w0 -W30 D:\Temp\io.dat

Diskspd.exe -b8K -d60 -Su -L -o1 -t10 -r -w0 -W20 D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t10 -r -w0 -W20 D:\Temp\io.dat

Diskspd.exe -b8K -d60 -Su -L -o1 -t32 -r -w0 -W20  D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t32 -r -w0 -W20  D:\Temp\io.dat


Write Tests:
Diskspd.exe -b8K -d60 -Su -L -o1 -t1 -r -w100 -W30 D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t1 -r -w100 -W30 D:\Temp\io.dat

Diskspd.exe -b8K -d60 -Su -L -o1 -t10 -r -w100 -W20 D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t10 -r -w100 -W20 D:\Temp\io.dat

Diskspd.exe -b8K -d60 -Su -L -o1 -t32 -r -w100 -W20  D:\Temp\io.dat
Diskspd.exe -b8K -d60 -Su -L -o8 -t32 -r -w100 -W20  D:\Temp\io.dat

Examples with 1, 10 and 32 threads per target, and 1 or 8 outstanding I/O requests per target per thread.

Read: 1 thread, 1 outstanding I/O request per thread
***

Read: 1 thread, 1 outstanding I/O request per thread
***

Write: 1 thread, 1 outstanding I/O request per thread
***

Write: 1 thread, 1 outstanding I/O request per thread
***


Read: 10 threads, 1 outstanding I/O request per thread
***

Read: 10 threads, 1 outstanding I/O request per thread
***

Read: 10 threads, 8 outstanding I/O requests per thread
***

Read: 10 threads, 8 outstanding I/O requests per thread
***

Write: 10 threads, 1 outstanding I/O request per thread
***

Write: 10 threads, 1 outstanding I/O request per thread
***

Write: 10 threads, 8 outstanding I/O requests per thread
***

Write: 10 threads, 8 outstanding I/O requests per thread
***


Read: 32 threads, 1 outstanding I/O request per thread
***

Read: 32 threads, 1 outstanding I/O request per thread
***

Read: 32 threads, 8 outstanding I/O requests per thread
***

Read: 32 threads, 8 outstanding I/O requests per thread
***

Write: 32 threads, 1 outstanding I/O request per thread
***

Write: 32 threads, 1 outstanding I/O request per thread
***

Write: 32 threads, 8 outstanding I/O requests per thread
***

Write: 32 threads, 8 outstanding I/O requests per thread
***