aboutsummaryrefslogtreecommitdiffstats
path: root/doc/guides/nics/mlx5.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/guides/nics/mlx5.rst')
-rw-r--r--doc/guides/nics/mlx5.rst424
1 files changed, 424 insertions, 0 deletions
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
new file mode 100644
index 00000000..b6f91e6a
--- /dev/null
+++ b/doc/guides/nics/mlx5.rst
@@ -0,0 +1,424 @@
+.. BSD LICENSE
+ Copyright 2015 6WIND S.A.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of 6WIND S.A. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX5 poll mode driver
+=====================
+
+The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for
+**Mellanox ConnectX-4** and **Mellanox ConnectX-4 Lx** families of
+10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+Information and documentation about these adapters can be found on the
+`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
+`Mellanox community <http://community.mellanox.com/welcome>`__.
+
+There is also a `section dedicated to this poll mode driver
+<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__.
+
+.. note::
+
+ Due to external dependencies, this driver is disabled by default. It must
+ be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and
+ recompiling DPDK.
+
+Implementation details
+----------------------
+
+Besides its dependency on libibverbs (that implies libmlx5 and associated
+kernel support), librte_pmd_mlx5 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resources allocations are handled by the kernel
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Enabling librte_pmd_mlx5 causes DPDK applications to be linked against
+libibverbs.
+
+Features
+--------
+
+- Multiple TX and RX queues.
+- Support for scattered TX and RX frames.
+- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
+- Several RSS hash keys, one for each flow type.
+- Configurable RETA table.
+- Support for multiple MAC addresses.
+- VLAN filtering.
+- RX VLAN stripping.
+- TX VLAN insertion.
+- RX CRC stripping configuration.
+- Promiscuous mode.
+- Multicast promiscuous mode.
+- Hardware checksum offloads.
+- Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN).
+- Secondary process TX is supported.
+
+Limitations
+-----------
+
+- KVM and VMware ESX SR-IOV modes are not supported yet.
+- Inner RSS for VXLAN frames is not supported yet.
+- Port statistics through software counters only.
+- Hardware checksum offloads for VXLAN inner header are not supported yet.
+- Secondary process RX is not supported.
+
+Configuration
+-------------
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**)
+
+ Toggle compilation of librte_pmd_mlx5 itself.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**)
+
+ Toggle debugging code and stricter compilation flags. Enabling this option
+ adds additional run-time checks and debugging messages at the cost of
+ lower performance.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
+
+ Number of scatter/gather elements (SGEs) per work request (WR). Lowering
+ this number improves performance but also limits the ability to receive
+ scattered packets (packets that do not fit a single mbuf). The default
+ value is a safe tradeoff.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
+
+ Amount of data to be inlined during TX operations. Improves latency.
+ Can improve PPS performance when PCI backpressure is detected and may be
+ useful for scenarios involving heavy traffic on many queues.
+
+ Since the additional software logic necessary to handle this mode can
+ lower performance when there is no backpressure, it is not enabled by
+ default.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)
+
+ Maximum number of cached memory pools (MPs) per TX queue. Each MP from
+ which buffers are to be transmitted must be associated to memory regions
+ (MRs). This is a slow operation that must be cached.
+
+ This value is always 1 for RX queues since they use a single MP.
+
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``MLX5_ENABLE_CQE_COMPRESSION``
+
+ A nonzero value lets ConnectX-4 return smaller completion entries to
+ improve performance when PCI backpressure is detected. It is most useful
+ for scenarios involving heavy traffic on many queues.
+
+ Since the additional software logic necessary to handle this mode can
+ lower performance when there is no backpressure, it is not enabled by
+ default.
+
+- ``MLX5_PMD_ENABLE_PADDING``
+
+ Enables HW packet padding in PCI bus transactions.
+
+ When packet size is cache aligned and CRC stripping is enabled, 4 fewer
+ bytes are written to the PCI bus. Enabling padding makes such packets
+ aligned again.
+
+ In cases where PCI bandwidth is the bottleneck, padding can improve
+ performance by 10%.
+
+ This is disabled by default since this can also decrease performance for
+ unaligned packet sizes.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- librte_pmd_mlx5 brings kernel network interfaces up during initialization
+ because it is affected by their state. Forcing them down prevents packets
+ reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resources
+allocations and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+ User space Verbs framework used by librte_pmd_mlx5. This library provides
+ a generic interface between the kernel and low-level user space drivers
+ such as libmlx5.
+
+ It allows slow and privileged operations (context initialization, hardware
+ resources allocations) to be managed by the kernel and fast operations to
+ never leave user space.
+
+- **libmlx5**
+
+ Low-level user space driver library for Mellanox ConnectX-4 devices,
+ it is automatically loaded by libibverbs.
+
+ This library basically implements send/receive calls to the hardware
+ queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+ They provide the kernel-side Verbs API and low level device drivers that
+ manage actual hardware initialization and resources sharing with user
+ space processes.
+
+ Unlike most other PMDs, these modules must remain loaded and bound to
+ their devices:
+
+ - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and
+ related Ethernet kernel network devices.
+ - mlx5_ib: InifiniBand device driver.
+ - ib_uverbs: user space driver for Verbs (entry point for libibverbs).
+
+- **Firmware update**
+
+ Mellanox OFED releases include firmware updates for ConnectX-4 adapters.
+
+ Because each release provides new features, these updates must be applied to
+ match the kernel modules and libraries they come with.
+
+.. note::
+
+ Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+ licensed.
+
+Currently supported by DPDK:
+
+- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending
+ on usage.
+
+ The following features are supported with version **3.1-1.5.7.1** and
+ above only:
+
+ - IPv6, UPDv6, TCPv6 RSS.
+ - RX checksum offloads.
+ - IBM POWER8.
+
+ The following features are supported with version **3.2-2.0.0.0** and
+ above only:
+
+ - Flow director.
+ - RX VLAN stripping.
+ - TX VLAN insertion.
+ - RX CRC stripping configuration.
+
+- Minimum firmware version:
+
+ With MLNX_OFED **3.1-1.0.3**:
+
+ - ConnectX-4: **12.12.1240**
+ - ConnectX-4 Lx: **14.12.1100**
+
+ With MLNX_OFED **3.1-1.5.7.1**:
+
+ - ConnectX-4: **12.13.0144**
+ - ConnectX-4 Lx: **14.13.0144**
+
+ With MLNX_OFED **3.2-2.0.0.0**:
+
+ - ConnectX-4: **12.14.2036**
+ - ConnectX-4 Lx: **14.14.2036**
+
+Getting Mellanox OFED
+~~~~~~~~~~~~~~~~~~~~~
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
+required from that distribution.
+
+.. note::
+
+ Several versions of Mellanox OFED are available. Installing the version
+ this DPDK release was developed and tested against is strongly
+ recommended. Please check the `prerequisites`_.
+
+Notes for testpmd
+-----------------
+
+Compared to librte_pmd_mlx4 that implements a single RSS configuration per
+port, librte_pmd_mlx5 supports per-protocol RSS configuration.
+
+Since ``testpmd`` defaults to IP RSS mode and there is currently no
+command-line parameter to enable additional protocols (UDP and TCP as well
+as IP), the following commands must be entered from its CLI to get the same
+behavior as librte_pmd_mlx4:
+
+.. code-block:: console
+
+ > port stop all
+ > port config all rss all
+ > port start all
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4
+devices managed by librte_pmd_mlx5.
+
+#. Load the kernel modules:
+
+ .. code-block:: console
+
+ modprobe -a ib_uverbs mlx5_core mlx5_ib
+
+ Alternatively if MLNX_OFED is fully installed, the following script can
+ be run:
+
+ .. code-block:: console
+
+ /etc/init.d/openibd restart
+
+ .. note::
+
+ User space I/O kernel modules (uio and igb_uio) are not used and do
+ not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+ verbs. Related sysfs entries should be present:
+
+ .. code-block:: console
+
+ ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+ Example output:
+
+ .. code-block:: console
+
+ eth30
+ eth31
+ eth32
+ eth33
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+ .. code-block:: console
+
+ {
+ for intf in eth2 eth3 eth4 eth5;
+ do
+ (cd "/sys/class/net/${intf}/device/" && pwd -P);
+ done;
+ } |
+ sed -n 's,.*/\(.*\),-w \1,p'
+
+ Example output:
+
+ .. code-block:: console
+
+ -w 0000:05:00.1
+ -w 0000:06:00.0
+ -w 0000:06:00.1
+ -w 0000:05:00.0
+
+#. Request huge pages:
+
+ .. code-block:: console
+
+ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+ .. code-block:: console
+
+ testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+
+ Example output:
+
+ .. code-block:: console
+
+ [...]
+ EAL: PCI device 0000:05:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe
+ EAL: PCI device 0000:05:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff
+ EAL: PCI device 0000:06:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa
+ EAL: PCI device 0000:06:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb
+ Interactive-mode selected
+ Configuring Port 0 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2
+ Port 0: E4:1D:2D:E7:0C:FE
+ Configuring Port 1 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2
+ Port 1: E4:1D:2D:E7:0C:FF
+ Configuring Port 2 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2
+ Port 2: E4:1D:2D:E7:0C:FA
+ Configuring Port 3 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2
+ Port 3: E4:1D:2D:E7:0C:FB
+ Checking link statuses...
+ Port 0 Link Up - speed 40000 Mbps - full-duplex
+ Port 1 Link Up - speed 40000 Mbps - full-duplex
+ Port 2 Link Up - speed 10000 Mbps - full-duplex
+ Port 3 Link Up - speed 10000 Mbps - full-duplex
+ Done
+ testpmd>