diff options
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
-rw-r--r-- | doc/guides/nics/mlx5.rst | 264 |
1 files changed, 192 insertions, 72 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 0e6e525c..52e1213c 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -1,40 +1,14 @@ -.. BSD LICENSE +.. SPDX-License-Identifier: BSD-3-Clause Copyright 2015 6WIND S.A. - Copyright 2015 Mellanox - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions - are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the - distribution. - * Neither the name of 6WIND S.A. nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + Copyright 2015 Mellanox Technologies, Ltd MLX5 poll mode driver ===================== The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support -for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** and **Mellanox -ConnectX-5** families of 10/25/40/50/100 Gb/s adapters as well as their -virtual functions (VF) in SR-IOV context. +for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** , **Mellanox +ConnectX-5** and **Mellanox Bluefield** families of 10/25/40/50/100 Gb/s +adapters as well as their virtual functions (VF) in SR-IOV context. Information and documentation about these adapters can be found on the `Mellanox website <http://www.mellanox.com>`__. Help is also provided by the @@ -75,7 +49,7 @@ libibverbs. Features -------- -- Multi arch support: x86_64, POWER8, ARMv8. +- Multi arch support: x86_64, POWER8, ARMv8, i686. - Multiple TX and RX queues. - Support for scattered TX and RX frames. - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. @@ -95,17 +69,17 @@ Features - Multiple process. - KVM and VMware ESX SR-IOV modes are supported. - RSS hash result is supported. -- Hardware TSO. -- Hardware checksum TX offload for VXLAN and GRE. +- Hardware TSO for generic IP or UDP tunnel, including VXLAN and GRE. +- Hardware checksum Tx offload for generic IP or UDP tunnel, including VXLAN and GRE. - RX interrupts. - Statistics query including Basic, Extended and per queue. - Rx HW timestamp. +- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLSoGRE, MPLSoUDP. +- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification. Limitations ----------- -- Inner RSS for VXLAN frames is not supported yet. -- Hardware checksum RX offloads for VXLAN inner header are not supported yet. - For secondary process: - Forked secondary process not supported. @@ -131,11 +105,37 @@ Limitations - A multi segment packet must have less than 6 segments in case the Tx burst function is set to multi-packet send or Enhanced multi-packet send. Otherwise it must have less than 50 segments. + - Count action for RTE flow is **only supported in Mellanox OFED**. + - Flows with a VXLAN Network Identifier equal (or ends to be equal) to 0 are not supported. + - VXLAN TSO and checksum offloads are not supported on VM. +- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP. + +- VF: flow rules created on VF devices can only match traffic targeted at the + configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``). + +.. note:: + + MAC addresses not already present in the bridge table of the associated + kernel network device will be added and cleaned up by the PMD when closing + the device. In case of ungraceful program termination, some entries may + remain present and should be removed manually by other means. + +- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be + externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in + ol_flags. As the mempool for the external buffer is managed by PMD, all the + Rx mbufs must be freed before the device is closed. Otherwise, the mempool of + the external buffers will be freed by PMD and the application which still + holds the external buffers may be corrupted. + +- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is + enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully + supported. Some Rx packets may not have PKT_RX_RSS_HASH. + Statistics ---------- @@ -171,9 +171,10 @@ These options can be modified in the ``.config`` file. missing with ``ldd(1)``. It works by moving these dependencies to a purpose-built rdma-core "glue" - plug-in, which must either be installed in ``CONFIG_RTE_EAL_PMD_PATH`` if - set, or in a standard location for the dynamic linker (e.g. ``/lib``) if - left to the default empty string (``""``). + plug-in which must either be installed in a directory whose name is based + on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a + standard location for the dynamic linker (e.g. ``/lib``) if left to the + default empty string (``""``). This option has no performance impact. @@ -183,14 +184,6 @@ These options can be modified in the ``.config`` file. adds additional run-time checks and debugging messages at the cost of lower performance. -- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) - - Maximum number of cached memory pools (MPs) per TX queue. Each MP from - which buffers are to be transmitted must be associated to memory regions - (MRs). This is a slow operation that must be cached. - - This value is always 1 for RX queues since they use a single MP. - Environment variables ~~~~~~~~~~~~~~~~~~~~~ @@ -250,8 +243,55 @@ Run-time configuration Supported on: - - x86_64 with ConnectX-4, ConnectX-4 LX and ConnectX-5. - - POWER8 and ARMv8 with ConnectX-4 LX and ConnectX-5. + - x86_64 with ConnectX-4, ConnectX-4 LX, ConnectX-5 and Bluefield. + - POWER8 and ARMv8 with ConnectX-4 LX, ConnectX-5 and Bluefield. + +- ``mprq_en`` parameter [int] + + A nonzero value enables configuring Multi-Packet Rx queues. Rx queue is + configured as Multi-Packet RQ if the total number of Rx queues is + ``rxqs_min_mprq`` or more and Rx scatter isn't configured. Disabled by + default. + + Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe bandwidth + by posting a single large buffer for multiple packets. Instead of posting a + buffers per a packet, one large buffer is posted in order to receive multiple + packets on the buffer. A MPRQ buffer consists of multiple fixed-size strides + and each stride receives one packet. MPRQ can improve throughput for + small-packet tarffic. + + When MPRQ is enabled, max_rx_pkt_len can be larger than the size of + user-provided mbuf even if DEV_RX_OFFLOAD_SCATTER isn't enabled. PMD will + configure large stride size enough to accommodate max_rx_pkt_len as long as + device allows. Note that this can waste system memory compared to enabling Rx + scatter and multi-segment packet. + +- ``mprq_log_stride_num`` parameter [int] + + Log 2 of the number of strides for Multi-Packet Rx queue. Configuring more + strides can reduce PCIe tarffic further. If configured value is not in the + range of device capability, the default value will be set with a warning + message. The default value is 4 which is 16 strides per a buffer, valid only + if ``mprq_en`` is set. + + The size of Rx queue should be bigger than the number of strides. + +- ``mprq_max_memcpy_len`` parameter [int] + + The maximum length of packet to memcpy in case of Multi-Packet Rx queue. Rx + packet is mem-copied to a user-provided mbuf if the size of Rx packet is less + than or equal to this parameter. Otherwise, PMD will attach the Rx packet to + the mbuf by external buffer attachment - ``rte_pktmbuf_attach_extbuf()``. + A mempool for external buffers will be allocated and managed by PMD. If Rx + packet is externally attached, ol_flags field of the mbuf will have + EXT_ATTACHED_MBUF and this flag must be preserved. ``RTE_MBUF_HAS_EXTBUF()`` + checks the flag. The default value is 128, valid only if ``mprq_en`` is set. + +- ``rxqs_min_mprq`` parameter [int] + + Configure Rx queues as Multi-Packet RQ if the total number of Rx queues is + greater or equal to this value. The default value is 12, valid only if + ``mprq_en`` is set. - ``txq_inline`` parameter [int] @@ -270,34 +310,35 @@ Run-time configuration This option should be used in combination with ``txq_inline`` above. - On ConnectX-4, ConnectX-4 LX and ConnectX-5 without Enhanced MPW: + On ConnectX-4, ConnectX-4 LX, ConnectX-5 and Bluefield without + Enhanced MPW: - Disabled by default. - In case ``txq_inline`` is set recommendation is 4. - On ConnectX-5 with Enhanced MPW: + On ConnectX-5 and Bluefield with Enhanced MPW: - Set to 8 by default. - ``txq_mpw_en`` parameter [int] A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and - enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the - TX burst function to pack up multiple packets in a single descriptor - session in order to save PCI bandwidth and improve performance at the - cost of a slightly higher CPU usage. When ``txq_inline`` is set along - with ``txq_mpw_en``, TX burst function tries to copy entire packet data - on to TX descriptor instead of including pointer of packet only if there - is enough room remained in the descriptor. ``txq_inline`` sets - per-descriptor space for either pointers or inlined packets. In addition, - Enhanced MPS supports hybrid mode - mixing inlined packets and pointers - in the same descriptor. + enhanced multi-packet send (Enhanced MPS) for ConnectX-5 and Bluefield. + MPS allows the TX burst function to pack up multiple packets in a + single descriptor session in order to save PCI bandwidth and improve + performance at the cost of a slightly higher CPU usage. When + ``txq_inline`` is set along with ``txq_mpw_en``, TX burst function tries + to copy entire packet data on to TX descriptor instead of including + pointer of packet only if there is enough room remained in the + descriptor. ``txq_inline`` sets per-descriptor space for either pointers + or inlined packets. In addition, Enhanced MPS supports hybrid mode - + mixing inlined packets and pointers in the same descriptor. This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO, DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``. When those offloads are requested the MPS send function will not be used. - It is currently only supported on the ConnectX-4 Lx and ConnectX-5 + It is currently only supported on the ConnectX-4 Lx, ConnectX-5 and Bluefield families of adapters. Enabled by default. - ``txq_mpw_hdr_dseg_en`` parameter [int] @@ -318,14 +359,14 @@ Run-time configuration - ``tx_vec_en`` parameter [int] - A nonzero value enables Tx vector on ConnectX-5 only NIC if the number of + A nonzero value enables Tx vector on ConnectX-5 and Bluefield NICs if the number of global Tx queues on the port is lesser than MLX5_VPMD_MIN_TXQS. This option cannot be used with certain offloads such as ``DEV_TX_OFFLOAD_TCP_TSO, DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOAD_VLAN_INSERT``. When those offloads are requested the MPS send function will not be used. - Enabled by default on ConnectX-5. + Enabled by default on ConnectX-5 and Bluefield. - ``rx_vec_en`` parameter [int] @@ -334,6 +375,53 @@ Run-time configuration Enabled by default. +- ``vf_nl_en`` parameter [int] + + A nonzero value enables Netlink requests from the VF to add/remove MAC + addresses or/and enable/disable promiscuous/all multicast on the Netdevice. + Otherwise the relevant configuration must be run with Linux iproute2 tools. + This is a prerequisite to receive this kind of traffic. + + Enabled by default, valid only on VF devices ignored otherwise. + +- ``l3_vxlan_en`` parameter [int] + + A nonzero value allows L3 VXLAN and VXLAN-GPE flow creation. To enable + L3 VXLAN or VXLAN-GPE, users has to configure firmware and enable this + parameter. This is a prerequisite to receive this kind of traffic. + + Disabled by default. + +- ``representor`` parameter [list] + + This parameter can be used to instantiate DPDK Ethernet devices from + existing port (or VF) representors configured on the device. + + It is a standard parameter whose format is described in + :ref:`ethernet_device_standard_device_arguments`. + + For instance, to probe port representors 0 through 2:: + + representor=[0-2] + +Firmware configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- L3 VXLAN and VXLAN-GPE destination UDP port + + .. code-block:: console + + mlxconfig -d <mst device> set IP_OVER_VXLAN_EN=1 + mlxconfig -d <mst device> set IP_OVER_VXLAN_PORT=<udp dport> + + Verify configurations are set: + + .. code-block:: console + + mlxconfig -d <mst device> query | grep IP_OVER_VXLAN + IP_OVER_VXLAN_EN True(1) + IP_OVER_VXLAN_PORT <udp dport> + Prerequisites ------------- @@ -353,12 +441,19 @@ DPDK and must be installed separately: - **libmlx5** - Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5 - devices, it is automatically loaded by libibverbs. + Low-level user space driver library for Mellanox + ConnectX-4/ConnectX-5/Bluefield devices, it is automatically loaded + by libibverbs. This library basically implements send/receive calls to the hardware queues. +- **libmnl** + + Minimalistic Netlink library mainly relied on to manage E-Switch flow + rules (i.e. those with the "transfer" attribute and typically involving + port representors). + - **Kernel modules** They provide the kernel-side Verbs API and low level device drivers that @@ -368,15 +463,16 @@ DPDK and must be installed separately: Unlike most other PMDs, these modules must remain loaded and bound to their devices: - - mlx5_core: hardware driver managing Mellanox ConnectX-4/ConnectX-5 - devices and related Ethernet kernel network devices. + - mlx5_core: hardware driver managing Mellanox + ConnectX-4/ConnectX-5/Bluefield devices and related Ethernet kernel + network devices. - mlx5_ib: InifiniBand device driver. - ib_uverbs: user space driver for Verbs (entry point for libibverbs). - **Firmware update** - Mellanox OFED releases include firmware updates for ConnectX-4/ConnectX-5 - adapters. + Mellanox OFED releases include firmware updates for + ConnectX-4/ConnectX-5/Bluefield adapters. Because each release provides new features, these updates must be applied to match the kernel modules and libraries they come with. @@ -399,6 +495,10 @@ RMDA Core with Linux Kernel - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_) - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm") (see `RDMA Core installation documentation`_) +- When building for i686 use: + + - rdma-core version 18.0 or above built with 32bit support. + - Kernel version 4.14.41 or above. .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md @@ -406,13 +506,14 @@ RMDA Core with Linux Kernel Mellanox OFED ^^^^^^^^^^^^^ -- Mellanox OFED version: **4.2, 4.3**. +- Mellanox OFED version: **4.3, 4.4**. - firmware version: - ConnectX-4: **12.21.1000** and above. - ConnectX-4 Lx: **14.21.1000** and above. - ConnectX-5: **16.21.1000** and above. - ConnectX-5 Ex: **16.21.1000** and above. + - Bluefield: **18.99.3950** and above. While these libraries and kernel modules are available on OpenFabrics Alliance's `website <https://www.openfabrics.org/>`__ and provided by package @@ -431,6 +532,19 @@ required from that distribution. this DPDK release was developed and tested against is strongly recommended. Please check the `prerequisites`_. +Libmnl +^^^^^^ + +Minimal version for libmnl is **1.0.3**. + +As a dependency of the **iproute2** suite, this library is often installed +by default. It is otherwise readily available through standard system +packages. + +Its development headers must be installed in order to compile this PMD. +These packages are usually named **libmnl-dev** or **libmnl-devel** +depending on the Linux distribution. + Supported NICs -------------- @@ -603,6 +717,12 @@ Performance tuning The XXX can be different on different systems. Make sure to configure according to the setpci output. +7. To minimize overhead of searching Memory Regions: + + - '--socket-mem' is recommended to pin memory by predictable amount. + - Configure per-lcore cache when creating Mempools for packet buffer. + - Refrain from dynamically allocating/freeing memory in run-time. + Notes for testpmd ----------------- @@ -624,7 +744,7 @@ Usage example ------------- This section demonstrates how to launch **testpmd** with Mellanox -ConnectX-4/ConnectX-5 devices managed by librte_pmd_mlx5. +ConnectX-4/ConnectX-5/Bluefield devices managed by librte_pmd_mlx5. #. Load the kernel modules: |