2019/01/05

Monitor RoCE - Mellanox

s2d.dk is not responsible for any errors, or for the results obtained from the use of the information on s2d.dk. All information on this site is provided as "draft notes" and "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information. Always test in a lab setup before using any of this information in a production environment. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.


Performance Monitor
Monitor Priority and Pause Frames for RDMA (RoCE) on Mellanox switches.

(DRAFT, work in progress, last update 2019.08.08)


For the screen captures I use two different Mellanox switches:
  • Mellanox SX1012 (SwitchX - Onyx 3.6.8010)
  • Mellanox SN2100 (Spectrum - Onyx 3.6.8010)
  • SwitchX and Spectrum are configured differently when it comes to L2 ETS.
  • Spectrum also supports L3 (DSCP). Note: the Microsoft configuration guide uses RoCE Layer 2 (L2) for S2D deployments that use SET (NDKm2).

******************************************************************************
Mellanox Show Commands
******************************************************************************

S2D# show dcb priority-flow-control detail
S2D# show dcb priority-flow-control
S2D# show dcb priority-flow-control interface ethernet 1/1
S2D# show dcb ets
S2D# show dcb ets interface ethernet 1/1


References

Microsoft:
Windows Server 2016 and 2019 RDMA Deployment Guide
Validate DCB

Mellanox L2 PCP TC:
How to Install Windows Server 2016 with RoCEv2 and Switch Embedded Teaming over HA Mellanox Network Solution
Understanding QoS Classification (Trust) on Spectrum Switches
Understanding Traffic Class (TC) Scheduling on Mellanox Spectrum Switches (WRR,SP)
Understanding RoCEv2 Congestion Management
DCBX Versions and Support on Mellanox Ethernet Switches
HowTo Configure PFC on ConnectX-4

Mellanox L3 - DSCP:
Recommended Network Configuration Examples for RoCE Deployment
Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode
Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode (advanced mode)
Lossless RoCE Configuration for WinOF2 driver in DSCP-based QoS mode
HowTo Configure Mellanox Spectrum Switch for Lossless RoCE
HowTo Configure ECN on Mellanox Ethernet Switches (Spectrum)

Terminology:
pNIC    Physical NIC, the physical hardware that exchanges packets with the ToR
vNIC    Host vNIC - Virtual NIC from vSwitch exposed in the host partition
tNIC    Host tNIC - Team Interface NIC from LBFO Team
vmNIC   Virtual Machine NIC - Virtual NIC from vSwitch exposed in a guest partition
vSwitch Hyper-V virtual switch
SET     Switch Embedded Teaming, Hyper-V virtual switch supported in Windows Server 2016 and 2019

ToR     Top of Rack switch
RDMA    Remote Direct Memory Access
RoCE    RDMA over Converged Ethernet
RoCEv2  2nd generation RoCE using UDP/IP for routability (a.k.a. Routable RoCE)

DCB     Data Center Bridging
LLDP    Link Layer Discovery Protocol
DCBX    Data Center Bridging Capability Exchange protocol, an extension of LLDP
PFC     Priority Flow Control
ETS     Enhanced Transmission Selection
TC      Traffic Class

ECN     Explicit Congestion Notification
RED     Random Early Detection
CNP     Congestion Notification Packet. CNP control frames (congestion ACK)
SP      Strict Priority
WRR     Weighted Round Robin


******************************************************************************
Mellanox SX1012 (SwitchX)
******************************************************************************
S2D# show dcb priority-flow-control detail

PFC enabled
Priority Enabled List    :3
Priority Disabled List   :0 1 2 4 5 6 7

PFC Port Eth1/x            Information
-----------------------------------------------
PFC Port Mode       :On
PFC Oper State      :On
No Remote Entry is Present
-----------------------------------------------

******************************************************************************
S2D# show dcb priority-flow-control

PFC enabled
Priority Enabled List    :3
Priority Disabled List   :0 1 2 4 5 6 7

TC     Lossless
---    ----------
0           N
1           Y
2           Y
3           N

Interface      PFC admin        PFC oper
------------   --------------   -------------
Eth1/1           On               Enabled
Eth1/2           On               Enabled
Eth1/3           On               Enabled
Eth1/4           On               Enabled
Eth1/5           On               Enabled
Eth1/6           On               Enabled
Eth1/7           On               Enabled
Eth1/8           On               Enabled
Eth1/9           On               Enabled
Eth1/10          On               Enabled
Eth1/11          On               Enabled
Eth1/12          On               Enabled
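When many ports need to be monitored, the interface table above can be checked by a script instead of by eye. The following is a minimal Python sketch, assuming the output has already been captured as text (for example from an SSH session log). The function names are hypothetical, and the "Disabled" row in the sample is invented purely to show a mismatch; it is not taken from the switch.

```python
import re

# Sample in the layout of `show dcb priority-flow-control` (shortened).
# The Eth1/3 row is a hypothetical mismatch added for illustration only.
SAMPLE = """\
Interface      PFC admin        PFC oper
------------   --------------   -------------
Eth1/1           On               Enabled
Eth1/2           On               Enabled
Eth1/3           On               Disabled
"""

# One table row: interface name, admin state, oper state.
ROW = re.compile(r"^(Eth\d+/\d+)\s+(\S+)\s+(\S+)$")


def parse_pfc_table(text):
    """Return {interface: (admin, oper)} for every Eth row in the table."""
    states = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            port, admin, oper = match.groups()
            states[port] = (admin, oper)
    return states


def ports_not_operational(states):
    """Ports where PFC is administratively On but not operationally Enabled."""
    return [port for port, (admin, oper) in states.items()
            if admin == "On" and oper != "Enabled"]


states = parse_pfc_table(SAMPLE)
print(ports_not_operational(states))
```

With the sample above, only the invented Eth1/3 row is reported; on a healthy switch the list should be empty.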


******************************************************************************
S2D# show dcb priority-flow-control interface ethernet 1/1

PFC enabled
Priority Enabled List    :3
Priority Disabled List   :0 1 2 4 5 6 7

TC     Lossless
---    ----------
0           N
1           Y
2           Y
3           N

Interface      PFC admin        PFC oper
------------   --------------   -------------
Eth1/1           On               Enabled

******************************************************************************
S2D# show dcb ets
(Note: Default value before change)

ETS enabled

TC        Bandwidth
--------------------------
0         25%
1         25%
2         25%
3         25%

Number of Traffic Class: 4

******************************************************************************
S2D# show dcb ets interface ethernet 1/1
(Note: Default value before change)

ETS Port Mode             :AUTO MODE
ETS Oper State            :INIT STATE
ETS State Machine Type    :Assymetric
-----------------------------------------------
ETS Local Port Info
-----------------------------------------------
TC bandwidth table
-----------------------------------------------
TC        Bandwidth        RecomBandwidth
-----------------------------------------------
0         25%              25%
1         25%              25%
2         25%              25%
3         25%              25%

priority assignment table
--------------------------------------
Priority     TC
--------------------------------------
0            0
1            0
2            1
3            1
4            2
5            2
6            3
7            3

Number of Traffic Class: 4

Willing Status:  Disable
-----------------------------------------------
ETS Admin Port Info
-----------------------------------------------
TC        Bandwidth        RecomBandwidth
-----------------------------------------------
0         25%              25%
1         25%              25%
2         25%              25%
3         25%              25%
-----------------------------------------------
ETS Remote Port Info
-----------------------------------------------
No Remote Entry is Present
-----------------------------------------------

******************************************************************************
Configuration of SX1012 (SwitchX)
******************************************************************************

Mapping of priority to traffic classes (TC)
Priority 0 and 1 mapped to TC 0
Priority 2 and 3 mapped to TC 1
Priority 4 and 5 mapped to TC 2
Priority 6 and 7 mapped to TC 3

TC 0 and TC 3 are lossy TCs.
TC 1 and TC 2 are lossless TCs.

The PFC-enabled priorities must be mapped to lossless TCs.

For the Microsoft example use case, change the default bandwidth allocation to:
Priority 0 (TC 0) Default traffic 39%
Priority 3 (TC 1) SMB traffic 60%
Priority 5 (TC 2) Cluster traffic 1%

S2D (config)# dcb ets tc bandwidth 39 60 1 0
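The reasoning above (PFC priority 3 must land on a lossless TC, and the four TC percentages are passed in TC order) can be sanity-checked before touching the switch. This is a small Python sketch using only the values shown on this page. The helper names are hypothetical, and the check that the percentages add up to 100 is my assumption, not something confirmed from the switch documentation.

```python
# Values copied from the SwitchX output and notes on this page.
PRIORITY_TO_TC = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
LOSSLESS_TCS = {1, 2}          # TC 0 and TC 3 are lossy
PFC_PRIORITIES = {3}           # "Priority Enabled List :3"
TC_BANDWIDTH = [39, 60, 1, 0]  # percent for TC 0, 1, 2, 3


def pfc_priorities_on_lossy_tc(pfc_priorities, priority_to_tc, lossless_tcs):
    """Return the PFC-enabled priorities that are mapped to a lossy TC."""
    return sorted(p for p in pfc_priorities
                  if priority_to_tc[p] not in lossless_tcs)


def ets_bandwidth_command(bandwidth):
    """Build the `dcb ets tc bandwidth` line for the four SwitchX TCs."""
    if len(bandwidth) != 4:
        raise ValueError("SwitchX shows 4 traffic classes")
    if sum(bandwidth) != 100:  # assumption: shares are expected to total 100%
        raise ValueError("bandwidth shares should add up to 100")
    return "dcb ets tc bandwidth " + " ".join(str(b) for b in bandwidth)


print(pfc_priorities_on_lossy_tc(PFC_PRIORITIES, PRIORITY_TO_TC, LOSSLESS_TCS))
print(ets_bandwidth_command(TC_BANDWIDTH))
```

An empty list from the first call means every PFC-enabled priority sits on a lossless TC; the second call prints the same command line used above.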

******************************************************************************

S2D# show dcb ets interface ethernet 1/1
(Note: Value after change)

ETS Port Mode             :AUTO MODE
ETS Oper State            :INIT STATE
ETS State Machine Type    :Assymetric
-----------------------------------------------
ETS Local Port Info
-----------------------------------------------
TC bandwidth table
-----------------------------------------------
TC        Bandwidth        RecomBandwidth
-----------------------------------------------
0         39%              39%
1         60%              60%
2          1%               1%
3          0%               0%

priority assignment table
--------------------------------------
Priority     TC
--------------------------------------
0            0
1            0
2            1
3            1
4            2
5            2
6            3
7            3

Number of Traffic Class: 4

Willing Status:  Disable
-----------------------------------------------
ETS Admin Port Info
-----------------------------------------------
TC        Bandwidth        RecomBandwidth
-----------------------------------------------
0         39%              39%
1         60%              60%
2          1%               1%
3          0%               0%
-----------------------------------------------
ETS Remote Port Info
-----------------------------------------------
No Remote Entry is Present
-----------------------------------------------

******************************************************************************
Mellanox SN2100 (Spectrum)
******************************************************************************

S2D# show dcb ets interface ethernet 1/1
(Note: Default value before change)

Eth1/1
 Interface Bandwidth Shape [Mbps]: N/A
 Multicast unaware mapping : disabled

 ETS per TC :
 TC Scheduling Mode Weight Weight (%)
 -- --------------- ------ ----------
 0  WRR             12     12
 1  WRR             13     13
 2  WRR             12     12
 3  WRR             13     13
 4  WRR             12     12
 5  WRR             13     13
 6  WRR             12     12
 7  WRR             13     13

 Bandwidth Shape per TC:
 TC Bandwidth Shape [Mbps]
 -- ----------------------
 0  N/A
 1  N/A
 2  N/A
 3  N/A
 4  N/A
 5  N/A
 6  N/A
 7  N/A

 Bandwidth Guarantee per TC:
 TC Bandwidth Guaranteed [Mbps]
 -- ---------------------------
 0  0
 1  0
 2  0
 3  0
 4  0
 5  0
 6  0
 7  0

 Switch Priority to TC mapping:
 Switch Priority TC
 --------------- --
 0               0
 1               1
 2               2
 3               3
 4               4
 5               5
 6               6
 7               7

******************************************************************************
Mellanox Switch configuration example (L2):
******************************************************************************

Mellanox SN2100 (Spectrum)

S2D> enable
S2D# configure terminal 
S2D (Config)# interface ethernet 1/1-1/12 traffic-class 3 dcb ets strict
S2D (Config)# interface ethernet 1/1-1/12 traffic-class 7 dcb ets strict

S2D (Config)# interface ethernet 1/1-1/x flowcontrol send off force
S2D (Config)# interface ethernet 1/1-1/x flowcontrol receive off force

S2D (Config)# dcb priority-flow-control enable force
S2D (Config)# dcb priority-flow-control priority 3 enable
(Priority 3 is used for Storage (SMB) traffic)
S2D (Config)# interface ethernet 1/1-1/x dcb priority-flow-control mode on force
S2D (Config)# interface ethernet 1/x switchport mode hybrid
(Needs to be repeated for each port from 1 to 12/16)
S2D (Config)# interface ethernet 1/x switchport hybrid allowed-vlan all
(Needs to be repeated for each port from 1 to 12/16. Only allow the VLANs that are needed; "allowed-vlan all" is not recommended and is used here only because this is a lab/test system.)
S2D (Config)# exit
S2D# write memory
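Because the two switchport lines have to be repeated for every port, it can be handy to generate them instead of typing them. The following Python sketch only prints the command lines as they appear above, to be pasted into the switch; the function name and its parameters are hypothetical. It keeps "allowed-vlan all" from the lab example, so for production the generated lines would need to be adjusted to the VLANs that are actually needed.

```python
def hybrid_port_commands(first_port, last_port, module=1):
    """Return the per-port switchport lines for ports first_port..last_port."""
    commands = []
    for port in range(first_port, last_port + 1):
        iface = f"interface ethernet {module}/{port}"
        commands.append(f"{iface} switchport mode hybrid")
        # Lab/test only: "allowed-vlan all" is not recommended for production.
        commands.append(f"{iface} switchport hybrid allowed-vlan all")
    return commands


# Ports 1 to 12 as on the SX1012; use 16 as the last port where that applies.
for line in hybrid_port_commands(1, 12):
    print(line)
```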

******************************************************************************
S2D# show dcb ets interface ethernet 1/1
Eth1/1
 Interface Bandwidth Shape [Mbps]: N/A
 Multicast unaware mapping : disabled

 ETS per TC :
 TC Scheduling Mode Weight Weight (%)
 -- --------------- ------ ----------
 0  WRR             12     12
 1  WRR             13     13
 2  WRR             12     12
 3  Strict          0      0
 4  WRR             12     12
 5  WRR             13     13
 6  WRR             12     12
 7  Strict          0      0

 Bandwidth Shape per TC:
 TC Bandwidth Shape [Mbps]
 -- ----------------------
 0  N/A
 1  N/A
 2  N/A
 3  N/A
 4  N/A
 5  N/A
 6  N/A
 7  N/A

 Bandwidth Guarantee per TC:
 TC Bandwidth Guaranteed [Mbps]
 -- ---------------------------
 0  0
 1  0
 2  0
 3  0
 4  0
 5  0
 6  0
 7  0

 Switch Priority to TC mapping:
 Switch Priority TC
 --------------- --
 0               0
 1               1
 2               2
 3               3
 4               4
 5               5
 6               6
 7               7
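To confirm after the change that TC 3 and TC 7 really ended up as Strict, the "ETS per TC" table can be parsed from the captured output. This is a minimal Python sketch, assuming the Spectrum output format shown above; the function names are hypothetical and the sample is a shortened copy of the output on this page.

```python
import re

# Shortened copy of the Spectrum `show dcb ets interface ethernet 1/1` output.
SAMPLE = """\
 ETS per TC :
 TC Scheduling Mode Weight Weight (%)
 -- --------------- ------ ----------
 0  WRR             12     12
 1  WRR             13     13
 2  WRR             12     12
 3  Strict          0      0
 4  WRR             12     12
 5  WRR             13     13
 6  WRR             12     12
 7  Strict          0      0

 Bandwidth Shape per TC:
 TC Bandwidth Shape [Mbps]
 -- ----------------------
 0  N/A
"""

# A scheduling row has four fields: TC, mode, weight, weight in percent.
# Rows of the other tables lack the mode word and therefore do not match.
ROW = re.compile(r"^\s*(\d+)\s+(WRR|Strict)\s+(\d+)\s+(\d+)\s*$")


def parse_ets_scheduling(text):
    """Return {tc: (mode, weight_percent)} from the 'ETS per TC' table."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            tc, mode, _weight, percent = match.groups()
            result[int(tc)] = (mode, int(percent))
    return result


def strict_tcs(scheduling):
    """Return the traffic classes scheduled with strict priority."""
    return sorted(tc for tc, (mode, _) in scheduling.items() if mode == "Strict")


scheduling = parse_ets_scheduling(SAMPLE)
print(strict_tcs(scheduling))
```

For the output captured above, the expected result is TC 3 and TC 7.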



******************************************************************************
Mellanox Switch configuration example with DSCP:

Draft - test only, last update 2019.01.19
Note:
#1: The example uses both L2 and L3. SwitchX uses L2 and Spectrum uses both L2 and L3.
#2: The Microsoft configuration guide uses RoCE Layer 2 (L2) for S2D.
#3: DSCP is supported by the Windows Server OS. However, Microsoft does not currently recommend using it for S2D, and for testing you need to work closely with the hardware vendors.

Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode
Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode (Old version)
******************************************************************************

S2D (Config)# interface ethernet 1/1-1/x qos trust ?
       port       based on port default settings
       L2         based on PCP, DEI fields
       L3         based on EXP, DSCP fields
       both       based on PCP, DEI and EXP, DSCP fields

S2D (Config)# interface ethernet 1/1-1/x qos trust both
(Note: In my lab I used both SwitchX and Spectrum. The Spectrum switch needs to accept both PCP and DSCP in my example.)

******************************************************************************

S2D (config) # interface ethernet 1/1-1/x traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
S2D (config) # traffic pool roce type lossless
S2D (config) # traffic pool roce memory percent 50.00
S2D (config) # traffic pool roce map switch-priority 3
S2D (config) # interface ethernet 1/1-1/x traffic-class 3 dcb ets strict
S2D (config) # interface ethernet 1/1-1/x traffic-class 7 dcb ets strict
S2D (config) # interface ethernet 1/1-1/x qos trust both

******************************************************************************

S2D (Config)# interface ethernet 1/1-1/x traffic-class 0 bind switch-priority 0
S2D (Config)# interface ethernet 1/1-1/x traffic-class 3 bind switch-priority 3
S2D (Config)# interface ethernet 1/1-1/x traffic-class 7 bind switch-priority 7

******************************************************************************

S2D (Config)# interface ethernet 1/1-1/x flowcontrol send off force
S2D (Config)# interface ethernet 1/1-1/x flowcontrol receive off force

S2D (Config)# dcb priority-flow-control enable force
S2D (Config)# dcb priority-flow-control priority 3 enable
(Priority 3 is used for Storage (SMB) traffic)
S2D (Config)# interface ethernet 1/1-1/x dcb priority-flow-control mode on force
S2D (Config)# interface ethernet 1/x switchport mode hybrid
(Needs to be repeated for each port from 1 to x)
S2D (Config)# interface ethernet 1/x switchport hybrid allowed-vlan all
(Needs to be repeated for each port from 1 to x. Only allow the VLANs that are needed; "allowed-vlan all" is not recommended and is used here only because this is a lab/test system.)

******************************************************************************
Host pNIC configuration

Driver version 2.10 or newer

S2D PS> Mlx5Cmd.exe -QosConfig -SetupRoceQosConfig -Name NIC3 -Configure 2

Default DSCP to switch-priority mapping:
0-7 → 0
8-15 → 1
16-23 → 2
24-31 → 3
32-39 → 4
40-47 → 5
48-55 → 6
56-63 → 7
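The default table above is simply the DSCP value divided by 8, rounded down (the three high-order bits of the 6-bit DSCP field). A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and it only reproduces the default mapping listed here, not any custom mapping configured on the switch.

```python
def dscp_to_switch_priority(dscp):
    """Default mapping from the table above: eight DSCP values per priority."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp // 8


# A few example values; 24-31 all land on switch-priority 3.
for dscp in (0, 26, 48, 63):
    print(dscp, "->", dscp_to_switch_priority(dscp))
```

Any DSCP value in the range 24-31 therefore ends up on switch-priority 3, the priority used for SMB traffic in this setup.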


******************************************************************************