diff options
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
-rw-r--r-- | doc/guides/nics/mlx5.rst | 424 |
1 files changed, 424 insertions, 0 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst new file mode 100644 index 00000000..b6f91e6a --- /dev/null +++ b/doc/guides/nics/mlx5.rst @@ -0,0 +1,424 @@ +.. BSD LICENSE + Copyright 2015 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX5 poll mode driver +===================== + +The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for +**Mellanox ConnectX-4** and **Mellanox ConnectX-4 Lx** families of +10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in +SR-IOV context. + +Information and documentation about these adapters can be found on the +`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the +`Mellanox community <http://community.mellanox.com/welcome>`__. + +There is also a `section dedicated to this poll mode driver +<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Besides its dependency on libibverbs (that implies libmlx5 and associated +kernel support), librte_pmd_mlx5 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Enabling librte_pmd_mlx5 causes DPDK applications to be linked against +libibverbs. + +Features +-------- + +- Multiple TX and RX queues. +- Support for scattered TX and RX frames. +- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. +- Several RSS hash keys, one for each flow type. +- Configurable RETA table. +- Support for multiple MAC addresses. +- VLAN filtering. +- RX VLAN stripping. +- TX VLAN insertion. +- RX CRC stripping configuration. +- Promiscuous mode. +- Multicast promiscuous mode. +- Hardware checksum offloads. +- Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN). +- Secondary process TX is supported. + +Limitations +----------- + +- KVM and VMware ESX SR-IOV modes are not supported yet. +- Inner RSS for VXLAN frames is not supported yet. +- Port statistics through software counters only. +- Hardware checksum offloads for VXLAN inner header are not supported yet. +- Secondary process RX is not supported. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx5 itself. + +- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**) + + Number of scatter/gather elements (SGEs) per work request (WR). Lowering + this number improves performance but also limits the ability to receive + scattered packets (packets that do not fit a single mbuf). The default + value is a safe tradeoff. + +- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**) + + Amount of data to be inlined during TX operations. Improves latency. + Can improve PPS performance when PCI backpressure is detected and may be + useful for scenarios involving heavy traffic on many queues. + + Since the additional software logic necessary to handle this mode can + lower performance when there is no backpressure, it is not enabled by + default. + +- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX5_ENABLE_CQE_COMPRESSION`` + + A nonzero value lets ConnectX-4 return smaller completion entries to + improve performance when PCI backpressure is detected. It is most useful + for scenarios involving heavy traffic on many queues. + + Since the additional software logic necessary to handle this mode can + lower performance when there is no backpressure, it is not enabled by + default. + +- ``MLX5_PMD_ENABLE_PADDING`` + + Enables HW packet padding in PCI bus transactions. + + When packet size is cache aligned and CRC stripping is enabled, 4 fewer + bytes are written to the PCI bus. Enabling padding makes such packets + aligned again. + + In cases where PCI bandwidth is the bottleneck, padding can improve + performance by 10%. + + This is disabled by default since this can also decrease performance for + unaligned packet sizes. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- librte_pmd_mlx5 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packets + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resources +allocations and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space Verbs framework used by librte_pmd_mlx5. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx5. + + It allows slow and privileged operations (context initialization, hardware + resources allocations) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx5** + + Low-level user space driver library for Mellanox ConnectX-4 devices, + it is automatically loaded by libibverbs. + + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel) + + They provide the kernel-side Verbs API and low level device drivers that + manage actual hardware initialization and resources sharing with user + space processes. + + Unlike most other PMDs, these modules must remain loaded and bound to + their devices: + + - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and + related Ethernet kernel network devices. + - mlx5_ib: InifiniBand device driver. + - ib_uverbs: user space driver for Verbs (entry point for libibverbs). + +- **Firmware update** + + Mellanox OFED releases include firmware updates for ConnectX-4 adapters. + + Because each release provides new features, these updates must be applied to + match the kernel modules and libraries they come with. + +.. note:: + + Both libraries are BSD and GPL licensed. Linux kernel modules are GPL + licensed. + +Currently supported by DPDK: + +- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending + on usage. + + The following features are supported with version **3.1-1.5.7.1** and + above only: + + - IPv6, UPDv6, TCPv6 RSS. + - RX checksum offloads. + - IBM POWER8. + + The following features are supported with version **3.2-2.0.0.0** and + above only: + + - Flow director. + - RX VLAN stripping. + - TX VLAN insertion. + - RX CRC stripping configuration. + +- Minimum firmware version: + + With MLNX_OFED **3.1-1.0.3**: + + - ConnectX-4: **12.12.1240** + - ConnectX-4 Lx: **14.12.1100** + + With MLNX_OFED **3.1-1.5.7.1**: + + - ConnectX-4: **12.13.0144** + - ConnectX-4 Lx: **14.13.0144** + + With MLNX_OFED **3.2-2.0.0.0**: + + - ConnectX-4: **12.14.2036** + - ConnectX-4 Lx: **14.14.2036** + +Getting Mellanox OFED +~~~~~~~~~~~~~~~~~~~~~ + +While these libraries and kernel modules are available on OpenFabrics +Alliance's `website <https://www.openfabrics.org/>`__ and provided by package +managers on most distributions, this PMD requires Ethernet extensions that +may not be supported at the moment (this is a work in progress). + +`Mellanox OFED +<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__ +includes the necessary support and should be used in the meantime. For DPDK, +only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are +required from that distribution. + +.. note:: + + Several versions of Mellanox OFED are available. Installing the version + this DPDK release was developed and tested against is strongly + recommended. Please check the `prerequisites`_. + +Notes for testpmd +----------------- + +Compared to librte_pmd_mlx4 that implements a single RSS configuration per +port, librte_pmd_mlx5 supports per-protocol RSS configuration. + +Since ``testpmd`` defaults to IP RSS mode and there is currently no +command-line parameter to enable additional protocols (UDP and TCP as well +as IP), the following commands must be entered from its CLI to get the same +behavior as librte_pmd_mlx4: + +.. code-block:: console + + > port stop all + > port config all rss all + > port start all + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4 +devices managed by librte_pmd_mlx5. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe -a ib_uverbs mlx5_core mlx5_ib + + Alternatively if MLNX_OFED is fully installed, the following script can + be run: + + .. code-block:: console + + /etc/init.d/openibd restart + + .. note:: + + User space I/O kernel modules (uio and igb_uio) are not used and do + not have to be loaded. + +#. Make sure Ethernet interfaces are in working order and linked to kernel + verbs. Related sysfs entries should be present: + + .. code-block:: console + + ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 + + Example output: + + .. code-block:: console + + eth30 + eth31 + eth32 + eth33 + +#. Optionally, retrieve their PCI bus addresses for whitelisting: + + .. code-block:: console + + { + for intf in eth2 eth3 eth4 eth5; + do + (cd "/sys/class/net/${intf}/device/" && pwd -P); + done; + } | + sed -n 's,.*/\(.*\),-w \1,p' + + Example output: + + .. code-block:: console + + -w 0000:05:00.1 + -w 0000:06:00.0 + -w 0000:06:00.1 + -w 0000:05:00.0 + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages/nr_hugepages + +#. Start testpmd with basic parameters: + + .. code-block:: console + + testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:05:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe + EAL: PCI device 0000:05:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff + EAL: PCI device 0000:06:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa + EAL: PCI device 0000:06:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2 + Port 0: E4:1D:2D:E7:0C:FE + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2 + Port 1: E4:1D:2D:E7:0C:FF + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2 + Port 2: E4:1D:2D:E7:0C:FA + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2 + Port 3: E4:1D:2D:E7:0C:FB + Checking link statuses... + Port 0 Link Up - speed 40000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> |