aboutsummaryrefslogtreecommitdiffstats
path: root/doc/guides/nics/mlx5.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
-rw-r--r--doc/guides/nics/mlx5.rst221
1 files changed, 151 insertions, 70 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0e6e525c..79c982e2 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1,40 +1,14 @@
-.. BSD LICENSE
+.. SPDX-License-Identifier: BSD-3-Clause
Copyright 2015 6WIND S.A.
- Copyright 2015 Mellanox
-
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions
- are met:
-
- * Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
- * Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in
- the documentation and/or other materials provided with the
- distribution.
- * Neither the name of 6WIND S.A. nor the names of its
- contributors may be used to endorse or promote products derived
- from this software without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright 2015 Mellanox Technologies, Ltd
MLX5 poll mode driver
=====================
The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support
-for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** and **Mellanox
-ConnectX-5** families of 10/25/40/50/100 Gb/s adapters as well as their
-virtual functions (VF) in SR-IOV context.
+for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** , **Mellanox
+ConnectX-5** and **Mellanox Bluefield** families of 10/25/40/50/100 Gb/s
+adapters as well as their virtual functions (VF) in SR-IOV context.
Information and documentation about these adapters can be found on the
`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
@@ -95,17 +69,17 @@ Features
- Multiple process.
- KVM and VMware ESX SR-IOV modes are supported.
- RSS hash result is supported.
-- Hardware TSO.
-- Hardware checksum TX offload for VXLAN and GRE.
+- Hardware TSO for generic IP or UDP tunnel, including VXLAN and GRE.
+- Hardware checksum Tx offload for generic IP or UDP tunnel, including VXLAN and GRE.
- RX interrupts.
- Statistics query including Basic, Extended and per queue.
- Rx HW timestamp.
+- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLSoGRE, MPLSoUDP.
+- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification.
Limitations
-----------
-- Inner RSS for VXLAN frames is not supported yet.
-- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
- For secondary process:
- Forked secondary process not supported.
@@ -131,11 +105,33 @@ Limitations
- A multi segment packet must have less than 6 segments in case the Tx burst function
is set to multi-packet send or Enhanced multi-packet send. Otherwise it must have
less than 50 segments.
+
- Count action for RTE flow is **only supported in Mellanox OFED**.
+
- Flows with a VXLAN Network Identifier equal (or ends to be equal)
to 0 are not supported.
+
- VXLAN TSO and checksum offloads are not supported on VM.
+- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.
+
+- VF: flow rules created on VF devices can only match traffic targeted at the
+ configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).
+
+.. note::
+
+ MAC addresses not already present in the bridge table of the associated
+ kernel network device will be added and cleaned up by the PMD when closing
+ the device. In case of ungraceful program termination, some entries may
+ remain present and should be removed manually by other means.
+
+- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be
+ externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in
+ ol_flags. As the mempool for the external buffer is managed by PMD, all the
+ Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
+ the external buffers will be freed by PMD and the application which still
+ holds the external buffers may be corrupted.
+
Statistics
----------
@@ -171,9 +167,10 @@ These options can be modified in the ``.config`` file.
missing with ``ldd(1)``.
It works by moving these dependencies to a purpose-built rdma-core "glue"
- plug-in, which must either be installed in ``CONFIG_RTE_EAL_PMD_PATH`` if
- set, or in a standard location for the dynamic linker (e.g. ``/lib``) if
- left to the default empty string (``""``).
+ plug-in which must either be installed in a directory whose name is based
+ on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a
+ standard location for the dynamic linker (e.g. ``/lib``) if left to the
+ default empty string (``""``).
This option has no performance impact.
@@ -183,14 +180,6 @@ These options can be modified in the ``.config`` file.
adds additional run-time checks and debugging messages at the cost of
lower performance.
-- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)
-
- Maximum number of cached memory pools (MPs) per TX queue. Each MP from
- which buffers are to be transmitted must be associated to memory regions
- (MRs). This is a slow operation that must be cached.
-
- This value is always 1 for RX queues since they use a single MP.
-
Environment variables
~~~~~~~~~~~~~~~~~~~~~
@@ -250,8 +239,55 @@ Run-time configuration
Supported on:
- - x86_64 with ConnectX-4, ConnectX-4 LX and ConnectX-5.
- - POWER8 and ARMv8 with ConnectX-4 LX and ConnectX-5.
+ - x86_64 with ConnectX-4, ConnectX-4 LX, ConnectX-5 and Bluefield.
+ - POWER8 and ARMv8 with ConnectX-4 LX, ConnectX-5 and Bluefield.
+
+- ``mprq_en`` parameter [int]
+
+ A nonzero value enables configuring Multi-Packet Rx queues. Rx queue is
+ configured as Multi-Packet RQ if the total number of Rx queues is
+ ``rxqs_min_mprq`` or more and Rx scatter isn't configured. Disabled by
+ default.
+
+ Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth
+ by posting a single large buffer for multiple packets. Instead of posting a
+ buffers per a packet, one large buffer is posted in order to receive multiple
+ packets on the buffer. A MPRQ buffer consists of multiple fixed-size strides
+ and each stride receives one packet. MPRQ can improve throughput for
+ small-packet tarffic.
+
+ When MPRQ is enabled, max_rx_pkt_len can be larger than the size of
+ user-provided mbuf even if DEV_RX_OFFLOAD_SCATTER isn't enabled. PMD will
+ configure large stride size enough to accommodate max_rx_pkt_len as long as
+ device allows. Note that this can waste system memory compared to enabling Rx
+ scatter and multi-segment packet.
+
+- ``mprq_log_stride_num`` parameter [int]
+
+ Log 2 of the number of strides for Multi-Packet Rx queue. Configuring more
+ strides can reduce PCIe tarffic further. If configured value is not in the
+ range of device capability, the default value will be set with a warning
+ message. The default value is 4 which is 16 strides per a buffer, valid only
+ if ``mprq_en`` is set.
+
+ The size of Rx queue should be bigger than the number of strides.
+
+- ``mprq_max_memcpy_len`` parameter [int]
+
+ The maximum length of packet to memcpy in case of Multi-Packet Rx queue. Rx
+ packet is mem-copied to a user-provided mbuf if the size of Rx packet is less
+ than or equal to this parameter. Otherwise, PMD will attach the Rx packet to
+ the mbuf by external buffer attachment - ``rte_pktmbuf_attach_extbuf()``.
+ A mempool for external buffers will be allocated and managed by PMD. If Rx
+ packet is externally attached, ol_flags field of the mbuf will have
+ EXT_ATTACHED_MBUF and this flag must be preserved. ``RTE_MBUF_HAS_EXTBUF()``
+ checks the flag. The default value is 128, valid only if ``mprq_en`` is set.
+
+- ``rxqs_min_mprq`` parameter [int]
+
+ Configure Rx queues as Multi-Packet RQ if the total number of Rx queues is
+ greater or equal to this value. The default value is 12, valid only if
+ ``mprq_en`` is set.
- ``txq_inline`` parameter [int]
@@ -270,34 +306,35 @@ Run-time configuration
This option should be used in combination with ``txq_inline`` above.
- On ConnectX-4, ConnectX-4 LX and ConnectX-5 without Enhanced MPW:
+ On ConnectX-4, ConnectX-4 LX, ConnectX-5 and Bluefield without
+ Enhanced MPW:
- Disabled by default.
- In case ``txq_inline`` is set recommendation is 4.
- On ConnectX-5 with Enhanced MPW:
+ On ConnectX-5 and Bluefield with Enhanced MPW:
- Set to 8 by default.
- ``txq_mpw_en`` parameter [int]
A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and
- enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the
- TX burst function to pack up multiple packets in a single descriptor
- session in order to save PCI bandwidth and improve performance at the
- cost of a slightly higher CPU usage. When ``txq_inline`` is set along
- with ``txq_mpw_en``, TX burst function tries to copy entire packet data
- on to TX descriptor instead of including pointer of packet only if there
- is enough room remained in the descriptor. ``txq_inline`` sets
- per-descriptor space for either pointers or inlined packets. In addition,
- Enhanced MPS supports hybrid mode - mixing inlined packets and pointers
- in the same descriptor.
+ enhanced multi-packet send (Enhanced MPS) for ConnectX-5 and Bluefield.
+ MPS allows the TX burst function to pack up multiple packets in a
+ single descriptor session in order to save PCI bandwidth and improve
+ performance at the cost of a slightly higher CPU usage. When
+ ``txq_inline`` is set along with ``txq_mpw_en``, TX burst function tries
+ to copy entire packet data on to TX descriptor instead of including
+ pointer of packet only if there is enough room remained in the
+ descriptor. ``txq_inline`` sets per-descriptor space for either pointers
+ or inlined packets. In addition, Enhanced MPS supports hybrid mode -
+ mixing inlined packets and pointers in the same descriptor.
This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO,
DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``.
When those offloads are requested the MPS send function will not be used.
- It is currently only supported on the ConnectX-4 Lx and ConnectX-5
+ It is currently only supported on the ConnectX-4 Lx, ConnectX-5 and Bluefield
families of adapters. Enabled by default.
- ``txq_mpw_hdr_dseg_en`` parameter [int]
@@ -318,14 +355,14 @@ Run-time configuration
- ``tx_vec_en`` parameter [int]
- A nonzero value enables Tx vector on ConnectX-5 only NIC if the number of
+ A nonzero value enables Tx vector on ConnectX-5 and Bluefield NICs if the number of
global Tx queues on the port is lesser than MLX5_VPMD_MIN_TXQS.
This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO,
DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``.
When those offloads are requested the MPS send function will not be used.
- Enabled by default on ConnectX-5.
+ Enabled by default on ConnectX-5 and Bluefield.
- ``rx_vec_en`` parameter [int]
@@ -334,6 +371,41 @@ Run-time configuration
Enabled by default.
+- ``vf_nl_en`` parameter [int]
+
+ A nonzero value enables Netlink requests from the VF to add/remove MAC
+ addresses or/and enable/disable promiscuous/all multicast on the Netdevice.
+ Otherwise the relevant configuration must be run with Linux iproute2 tools.
+ This is a prerequisite to receive this kind of traffic.
+
+ Enabled by default, valid only on VF devices ignored otherwise.
+
+- ``l3_vxlan_en`` parameter [int]
+
+ A nonzero value allows L3 VXLAN and VXLAN-GPE flow creation. To enable
+ L3 VXLAN or VXLAN-GPE, users has to configure firmware and enable this
+ parameter. This is a prerequisite to receive this kind of traffic.
+
+ Disabled by default.
+
+Firmware configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- L3 VXLAN and VXLAN-GPE destination UDP port
+
+ .. code-block:: console
+
+ mlxconfig -d <mst device> set IP_OVER_VXLAN_EN=1
+ mlxconfig -d <mst device> set IP_OVER_VXLAN_PORT=<udp dport>
+
+ Verify configurations are set:
+
+ .. code-block:: console
+
+ mlxconfig -d <mst device> query | grep IP_OVER_VXLAN
+ IP_OVER_VXLAN_EN True(1)
+ IP_OVER_VXLAN_PORT <udp dport>
+
Prerequisites
-------------
@@ -353,8 +425,9 @@ DPDK and must be installed separately:
- **libmlx5**
- Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5
- devices, it is automatically loaded by libibverbs.
+ Low-level user space driver library for Mellanox
+ ConnectX-4/ConnectX-5/Bluefield devices, it is automatically loaded
+ by libibverbs.
This library basically implements send/receive calls to the hardware
queues.
@@ -368,15 +441,16 @@ DPDK and must be installed separately:
Unlike most other PMDs, these modules must remain loaded and bound to
their devices:
- - mlx5_core: hardware driver managing Mellanox ConnectX-4/ConnectX-5
- devices and related Ethernet kernel network devices.
+ - mlx5_core: hardware driver managing Mellanox
+ ConnectX-4/ConnectX-5/Bluefield devices and related Ethernet kernel
+ network devices.
- mlx5_ib: InifiniBand device driver.
- ib_uverbs: user space driver for Verbs (entry point for libibverbs).
- **Firmware update**
- Mellanox OFED releases include firmware updates for ConnectX-4/ConnectX-5
- adapters.
+ Mellanox OFED releases include firmware updates for
+ ConnectX-4/ConnectX-5/Bluefield adapters.
Because each release provides new features, these updates must be applied to
match the kernel modules and libraries they come with.
@@ -413,6 +487,7 @@ Mellanox OFED
- ConnectX-4 Lx: **14.21.1000** and above.
- ConnectX-5: **16.21.1000** and above.
- ConnectX-5 Ex: **16.21.1000** and above.
+ - Bluefield: **18.99.3950** and above.
While these libraries and kernel modules are available on OpenFabrics
Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
@@ -603,6 +678,12 @@ Performance tuning
The XXX can be different on different systems. Make sure to configure
according to the setpci output.
+7. To minimize overhead of searching Memory Regions:
+
+ - '--socket-mem' is recommended to pin memory by predictable amount.
+ - Configure per-lcore cache when creating Mempools for packet buffer.
+ - Refrain from dynamically allocating/freeing memory in run-time.
+
Notes for testpmd
-----------------
@@ -624,7 +705,7 @@ Usage example
-------------
This section demonstrates how to launch **testpmd** with Mellanox
-ConnectX-4/ConnectX-5 devices managed by librte_pmd_mlx5.
+ConnectX-4/ConnectX-5/Bluefield devices managed by librte_pmd_mlx5.
#. Load the kernel modules: