author | C.J. Collier <cjcollier@linuxfoundation.org> | 2016-06-14 07:54:47 -0700 |
---|---|---|
committer | C.J. Collier <cjcollier@linuxfoundation.org> | 2016-06-14 07:55:43 -0700 |
commit | 5129044dce1f85ce4950f31bcf90f3886466f06a (patch) | |
tree | 1c6269614c0c15ffef8451c58ae8f8b30a1bc804 /doc/guides/nics | |
parent | e04be89c2409570e0055b2cda60bd11395bb93b0 (diff) |
Imported upstream release 16.04
* gbp import-orig ../dpdk-16.04.tar.xz
Change-Id: Iac2196db782ba322f6974d8a752acc34ce5024c3
Signed-off-by: C.J. Collier <cjcollier@linuxfoundation.org>
Diffstat (limited to 'doc/guides/nics')
30 files changed, 5216 insertions, 0 deletions
diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst new file mode 100644 index 00000000..df8fb477 --- /dev/null +++ b/doc/guides/nics/bnx2x.rst @@ -0,0 +1,331 @@ +.. BSD LICENSE + Copyright (c) 2015 QLogic Corporation + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of QLogic Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +BNX2X Poll Mode Driver +====================== + +The BNX2X poll mode driver library (**librte_pmd_bnx2x**) implements support +for **QLogic 578xx** 10/20 Gbps family of adapters as well as their virtual +functions (VF) in SR-IOV context. 
It is supported on several standard Linux +distros like Red Hat 7.x and SLES12 OS. It is compile-tested under FreeBSD OS. + +More information can be found at `QLogic Corporation's Official Website +<http://www.qlogic.com>`_. + +Supported Features +------------------ + +BNX2X PMD has support for: + +- Base L2 features +- Unicast/multicast filtering +- Promiscuous mode +- Port hardware statistics +- SR-IOV VF + +Non-supported Features +---------------------- + +The features not yet supported include: + +- TSS (Transmit Side Scaling) +- RSS (Receive Side Scaling) +- LRO/TSO offload +- Checksum offload +- SR-IOV PF +- Rx TX scatter gather + +Co-existence considerations +--------------------------- + +- BCM578xx being a CNA can have both NIC and Storage personalities. + However, coexistence with storage protocol drivers (cnic, bnx2fc and + bnx2fi) is not supported on the same adapter. So storage personality + has to be disabled on that adapter when used in DPDK applications. + +- For SR-IOV case, bnx2x PMD will be used to bind to SR-IOV VF device and + Linux native kernel driver (bnx2x) will be attached to SR-IOV PF. + + +Supported QLogic NICs +--------------------- + +- 578xx + +Prerequisites +------------- + +- Requires firmware version **7.2.51.0**. It is included in most of the + standard Linux distros. If it is not available visit + `QLogic Driver Download Center <http://driverdownloads.qlogic.com>`_ + to get the required firmware. + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``.config`` file. Please note that +enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_BNX2X_PMD`` (default **y**) + + Toggle compilation of bnx2x driver. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG`` (default **n**) + + Toggle display of generic debugging messages. 
+ +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_INIT`` (default **n**) + + Toggle display of initialization related messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_TX`` (default **n**) + + Toggle display of transmit fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_RX`` (default **n**) + + Toggle display of receive fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_PERIODIC`` (default **n**) + + Toggle display of register reads and writes. + + +.. _bnx2x_driver-compilation: + +Driver Compilation +~~~~~~~~~~~~~~~~~~ + +To compile BNX2X PMD for Linux x86_64 gcc target, run the following "make" +command:: + + cd <DPDK-source-directory> + make config T=x86_64-native-linuxapp-gcc install + +To compile BNX2X PMD for Linux x86_64 clang target, run the following "make" +command:: + + cd <DPDK-source-directory> + make config T=x86_64-native-linuxapp-clang install + +To compile BNX2X PMD for Linux i686 gcc target, run the following "make" +command:: + + cd <DPDK-source-directory> + make config T=i686-native-linuxapp-gcc install + +To compile BNX2X PMD for FreeBSD x86_64 clang target, run the following "gmake" +command:: + + cd <DPDK-source-directory> + gmake config T=x86_64-native-bsdapp-clang install + +To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the following "gmake" +command:: + + cd <DPDK-source-directory> + gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc48 CC=gcc48 + +Linux +----- + +.. 
_bnx2x_Linux-installation: + +Linux Installation +~~~~~~~~~~~~~~~~~~ + +Sample Application Notes +~~~~~~~~~~~~~~~~~~~~~~~~ + +This section demonstrates how to launch ``testpmd`` with QLogic 578xx +devices managed by ``librte_pmd_bnx2x`` on the Linux operating system. + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Load ``igb_uio`` or ``vfio-pci`` driver: + + .. code-block:: console + + insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko + + or + + .. code-block:: console + + modprobe vfio-pci + +#. Bind the QLogic adapters to ``igb_uio`` or ``vfio-pci`` loaded in the + previous step:: + + ./tools/dpdk_nic_bind.py --bind igb_uio 0000:84:00.0 0000:84:00.1 + + or + + Set up VFIO permissions for regular users and then bind to ``vfio-pci``: + + .. code-block:: console + + sudo chmod a+x /dev/vfio + + sudo chmod 0666 /dev/vfio/* + + ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:84:00.0 0000:84:00.1 + +#. Start ``testpmd`` with basic parameters: + + .. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -- -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:84:00.0 on NUMA socket 1 + EAL: probe driver: 14e4:168e rte_bnx2x_pmd + EAL: PCI memory mapped at 0x7f14f6fe5000 + EAL: PCI memory mapped at 0x7f14f67e5000 + EAL: PCI memory mapped at 0x7f15fbd9b000 + EAL: PCI device 0000:84:00.1 on NUMA socket 1 + EAL: probe driver: 14e4:168e rte_bnx2x_pmd + EAL: PCI memory mapped at 0x7f14f5fe5000 + EAL: PCI memory mapped at 0x7f14f57e5000 + EAL: PCI memory mapped at 0x7f15fbd4f000 + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: bnx2x_dev_tx_queue_setup(): fp[00] req_bd=512, thresh=512, + usable_bd=1020, total_bd=1024, + tx_pages=4 + PMD: bnx2x_dev_rx_queue_setup(): fp[00] req_bd=128, thresh=0, + usable_bd=510, total_bd=512, + rx_pages=1, cq_pages=8 + PMD: bnx2x_print_adapter_info(): + [...] + Checking link statuses... 
+ Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + +SR-IOV: Prerequisites and Sample Application Notes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This section provides instructions to configure SR-IOV with Linux OS. + +#. Verify SR-IOV and ARI capabilities are enabled on the adapter using ``lspci``: + + .. code-block:: console + + lspci -s <slot> -vvv + + Example output: + + .. code-block:: console + + [...] + Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI) + [...] + Capabilities: [1c0 v1] Single Root I/O Virtualization (SR-IOV) + [...] + Kernel driver in use: igb_uio + +#. Load the kernel module: + + .. code-block:: console + + modprobe bnx2x + + Example output: + + .. code-block:: console + + systemd-udevd[4848]: renamed network interface eth0 to ens5f0 + systemd-udevd[4848]: renamed network interface eth1 to ens5f1 + +#. Bring up the PF ports: + + .. code-block:: console + + ifconfig ens5f0 up + ifconfig ens5f1 up + +#. Create VF device(s): + + Echo the number of VFs to be created into the "sriov_numvfs" sysfs entry + of the parent PF. + + Example: + + .. code-block:: console + + echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs + + +#. Assign VF MAC address: + + Assign a MAC address to the VF using the iproute2 utility. The syntax is: + ip link set <PF iface> vf <VF id> mac <macaddr> + + Example: + + .. code-block:: console + + ip link set ens5f0 vf 0 mac 52:54:00:2f:9d:e8 + + +#. PCI Passthrough: + + The VF devices may be passed through to the guest VM using virt-manager or + virsh etc. bnx2x PMD should be used to bind the VF devices in the guest VM + using the instructions outlined in the Sample Application Notes above. diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst new file mode 100644 index 00000000..d718f196 --- /dev/null +++ b/doc/guides/nics/cxgbe.rst @@ -0,0 +1,589 @@ +.. 
BSD LICENSE + Copyright 2015 Chelsio Communications. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Chelsio Communications nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +CXGBE Poll Mode Driver +====================== + +The CXGBE PMD (**librte_pmd_cxgbe**) provides poll mode driver support +for **Chelsio T5** 10/40 Gbps family of adapters. CXGBE PMD has support +for the latest Linux and FreeBSD operating systems. + +More information can be found at `Chelsio Communications Official Website +<http://www.chelsio.com>`_. 
+ +Features +-------- + +CXGBE PMD has support for: + +- Multiple queues for TX and RX +- Receiver Side Steering (RSS) +- VLAN filtering +- Checksum offload +- Promiscuous mode +- All multicast mode +- Port hardware statistics +- Jumbo frames + +Limitations +----------- + +The Chelsio T5 devices provide two/four ports but expose a single PCI bus +address, thus, librte_pmd_cxgbe registers itself as a +PCI driver that allocates one Ethernet device per detected port. + +For this reason, one cannot whitelist/blacklist a single port without +whitelisting/blacklisting the other ports on the same device. + +Supported Chelsio T5 NICs +------------------------- + +- 1G NICs: T502-BT +- 10G NICs: T520-BT, T520-CR, T520-LL-CR, T520-SO-CR, T540-CR +- 40G NICs: T580-CR, T580-LP-CR, T580-SO-CR +- Other T5 NICs: T522-CR + +Prerequisites +------------- + +- Requires firmware version **1.13.32.0** and higher. Visit + `Chelsio Download Center <http://service.chelsio.com>`_ to get latest firmware + bundled with the latest Chelsio Unified Wire package. + + For Linux, installing and loading the latest cxgb4 kernel driver from the + Chelsio Unified Wire package should get you the latest firmware. More + information can be obtained from the User Guide that is bundled with the + Chelsio Unified Wire package. + + For FreeBSD, the latest firmware obtained from the Chelsio Unified Wire + package must be manually flashed via cxgbetool available in FreeBSD source + repository. + + Instructions on how to manually flash the firmware are given in section + :ref:`linux-installation` for Linux and section :ref:`freebsd-installation` + for FreeBSD. + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``.config`` file. Please note that +enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_CXGBE_PMD`` (default **y**) + + Toggle compilation of librte_pmd_cxgbe driver. 
+ +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG`` (default **n**) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG`` (default **n**) + + Toggle display of registers related run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX`` (default **n**) + + Toggle display of firmware mailbox related run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX`` (default **n**) + + Toggle display of transmission data path run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX`` (default **n**) + + Toggle display of receiving data path run-time check messages. + +.. _driver-compilation: + +Driver Compilation +~~~~~~~~~~~~~~~~~~ + +To compile CXGBE PMD for Linux x86_64 gcc target, run the following "make" +command: + +.. code-block:: console + + cd <DPDK-source-directory> + make config T=x86_64-native-linuxapp-gcc install + +To compile CXGBE PMD for FreeBSD x86_64 clang target, run the following "gmake" +command: + +.. code-block:: console + + cd <DPDK-source-directory> + gmake config T=x86_64-native-bsdapp-clang install + +Linux +----- + +.. _linux-installation: + +Linux Installation +~~~~~~~~~~~~~~~~~~ + +Steps to manually install the latest firmware from the downloaded Chelsio +Unified Wire package for Linux operating system are as follows: + +#. Load the kernel module: + + .. code-block:: console + + modprobe cxgb4 + +#. Use ifconfig to get the interface name assigned to Chelsio card: + + .. code-block:: console + + ifconfig -a | grep "00:07:43" + + Example output: + + .. code-block:: console + + p1p1 Link encap:Ethernet HWaddr 00:07:43:2D:EA:C0 + p1p2 Link encap:Ethernet HWaddr 00:07:43:2D:EA:C8 + +#. Install cxgbtool: + + .. code-block:: console + + cd <path_to_uwire>/tools/cxgbtool + make install + +#. Use cxgbtool to load the firmware config file onto the card: + + .. code-block:: console + + cxgbtool p1p1 loadcfg <path_to_uwire>/src/network/firmware/t5-config.txt + +#. 
Use cxgbtool to load the firmware image onto the card: + + .. code-block:: console + + cxgbtool p1p1 loadfw <path_to_uwire>/src/network/firmware/t5fw-*.bin + +#. Unload and reload the kernel module: + + .. code-block:: console + + modprobe -r cxgb4 + modprobe cxgb4 + +#. Verify with ethtool: + + .. code-block:: console + + ethtool -i p1p1 | grep "firmware" + + Example output: + + .. code-block:: console + + firmware-version: 1.13.32.0, TP 0.1.4.8 + +Running testpmd +~~~~~~~~~~~~~~~ + +This section demonstrates how to launch **testpmd** with Chelsio T5 +devices managed by librte_pmd_cxgbe on the Linux operating system. + +#. Change to DPDK source directory where the target has been compiled in + section :ref:`driver-compilation`: + + .. code-block:: console + + cd <DPDK-source-directory> + +#. Load the kernel module: + + .. code-block:: console + + modprobe cxgb4 + +#. Get the PCI bus addresses of the interfaces bound to cxgb4 driver: + + .. code-block:: console + + dmesg | tail -2 + + Example output: + + .. code-block:: console + + cxgb4 0000:02:00.4 p1p1: renamed from eth0 + cxgb4 0000:02:00.4 p1p2: renamed from eth1 + + .. note:: + + Both the interfaces of a Chelsio T5 2-port adapter are bound to the + same PCI bus address. + +#. Unload the kernel module: + + .. code-block:: console + + modprobe -ar cxgb4 csiostor + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Mount huge pages: + + .. code-block:: console + + mkdir /mnt/huge + mount -t hugetlbfs nodev /mnt/huge + +#. Load igb_uio or vfio-pci driver: + + .. code-block:: console + + insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko + + or + + .. code-block:: console + + modprobe vfio-pci + +#. Bind the Chelsio T5 adapters to igb_uio or vfio-pci loaded in the previous + step: + + .. 
code-block:: console + + ./tools/dpdk_nic_bind.py --bind igb_uio 0000:02:00.4 + + or + + Set up VFIO permissions for regular users and then bind to vfio-pci: + + .. code-block:: console + + sudo chmod a+x /dev/vfio + + sudo chmod 0666 /dev/vfio/* + + ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:02:00.4 + + .. note:: + + Currently, CXGBE PMD only supports the binding of PF4 for Chelsio T5 NICs. + +#. Start testpmd with basic parameters: + + .. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -w 0000:02:00.4 -- -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:02:00.4 on NUMA socket -1 + EAL: probe driver: 1425:5401 rte_cxgbe_pmd + EAL: PCI memory mapped at 0x7fd7c0200000 + EAL: PCI memory mapped at 0x7fd77cdfd000 + EAL: PCI memory mapped at 0x7fd7c10b7000 + PMD: rte_cxgbe_pmd: fw: 1.13.32.0, TP: 0.1.4.8 + PMD: rte_cxgbe_pmd: Coming up as MASTER: Initializing adapter + Interactive-mode selected + Configuring Port 0 (socket 0) + Port 0: 00:07:43:2D:EA:C0 + Configuring Port 1 (socket 0) + Port 1: 00:07:43:2D:EA:C8 + Checking link statuses... + PMD: rte_cxgbe_pmd: Port0: passive DA port module inserted + PMD: rte_cxgbe_pmd: Port1: passive DA port module inserted + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + +.. note:: + + Flow control pause TX/RX is disabled by default and can be enabled via + testpmd. Refer to section :ref:`flow-control` for more details. + +FreeBSD +------- + +.. _freebsd-installation: + +FreeBSD Installation +~~~~~~~~~~~~~~~~~~~~ + +Steps to manually install the latest firmware from the downloaded Chelsio +Unified Wire package for the FreeBSD operating system are as follows: + +#. Load the kernel module: + + .. code-block:: console + + kldload if_cxgbe + +#. Use dmesg to get the t5nex instance assigned to the Chelsio card: + + .. code-block:: console + + dmesg | grep "t5nex" + + Example output: + + .. 
code-block:: console + + t5nex0: <Chelsio T520-CR> irq 16 at device 0.4 on pci2 + cxl0: <port 0> on t5nex0 + cxl1: <port 1> on t5nex0 + t5nex0: PCIe x8, 2 ports, 14 MSI-X interrupts, 31 eq, 13 iq + + In the example above, a Chelsio T520-CR card is bound to a t5nex0 instance. + +#. Install cxgbetool from FreeBSD source repository: + + .. code-block:: console + + cd <path_to_FreeBSD_source>/tools/tools/cxgbetool/ + make && make install + +#. Use cxgbetool to load the firmware image onto the card: + + .. code-block:: console + + cxgbetool t5nex0 loadfw <path_to_uwire>/src/network/firmware/t5fw-*.bin + +#. Unload and reload the kernel module: + + .. code-block:: console + + kldunload if_cxgbe + kldload if_cxgbe + +#. Verify with sysctl: + + .. code-block:: console + + sysctl -a | grep "t5nex" | grep "firmware" + + Example output: + + .. code-block:: console + + dev.t5nex.0.firmware_version: 1.13.32.0 + +Running testpmd +~~~~~~~~~~~~~~~ + +This section demonstrates how to launch **testpmd** with Chelsio T5 +devices managed by librte_pmd_cxgbe in FreeBSD operating system. + +#. Change to DPDK source directory where the target has been compiled in + section :ref:`driver-compilation`: + + .. code-block:: console + + cd <DPDK-source-directory> + +#. Copy the contigmem kernel module to /boot/kernel directory: + + .. code-block:: console + + cp x86_64-native-bsdapp-clang/kmod/contigmem.ko /boot/kernel/ + +#. Add the following lines to /boot/loader.conf: + + .. code-block:: console + + # reserve 2 x 1G blocks of contiguous memory using contigmem driver + hw.contigmem.num_buffers=2 + hw.contigmem.buffer_size=1073741824 + # load contigmem module during boot process + contigmem_load="YES" + + The above lines load the contigmem kernel module during boot process and + allocate 2 x 1G blocks of contiguous memory to be used for DPDK later on. 
+ This is to avoid issues with potential memory fragmentation during later + system up time, which may result in failure of allocating the contiguous + memory required for the contigmem kernel module. + +#. Restart the system and ensure the contigmem module is loaded successfully: + + .. code-block:: console + + reboot + kldstat | grep "contigmem" + + Example output: + + .. code-block:: console + + 2 1 0xffffffff817f1000 3118 contigmem.ko + +#. Repeat step 1 to ensure that you are in the DPDK source directory. + +#. Load the cxgbe kernel module: + + .. code-block:: console + + kldload if_cxgbe + +#. Get the PCI bus addresses of the interfaces bound to t5nex driver: + + .. code-block:: console + + pciconf -l | grep "t5nex" + + Example output: + + .. code-block:: console + + t5nex0@pci0:2:0:4: class=0x020000 card=0x00001425 chip=0x54011425 rev=0x00 + + In the above example, the t5nex0 is bound to 2:0:4 bus address. + + .. note:: + + Both the interfaces of a Chelsio T5 2-port adapter are bound to the + same PCI bus address. + +#. Unload the kernel module: + + .. code-block:: console + + kldunload if_cxgbe + +#. Set the PCI bus addresses to hw.nic_uio.bdfs kernel environment parameter: + + .. code-block:: console + + kenv hw.nic_uio.bdfs="2:0:4" + + This automatically binds 2:0:4 to nic_uio kernel driver when it is loaded in + the next step. + + .. note:: + + Currently, CXGBE PMD only supports the binding of PF4 for Chelsio T5 NICs. + +#. Load nic_uio kernel driver: + + .. code-block:: console + + kldload ./x86_64-native-bsdapp-clang/kmod/nic_uio.ko + +#. Start testpmd with basic parameters: + + .. code-block:: console + + ./x86_64-native-bsdapp-clang/app/testpmd -c 0xf -n 4 -w 0000:02:00.4 -- -i + + Example output: + + .. code-block:: console + + [...] 
+ EAL: PCI device 0000:02:00.4 on NUMA socket 0 + EAL: probe driver: 1425:5401 rte_cxgbe_pmd + EAL: PCI memory mapped at 0x8007ec000 + EAL: PCI memory mapped at 0x842800000 + EAL: PCI memory mapped at 0x80086c000 + PMD: rte_cxgbe_pmd: fw: 1.13.32.0, TP: 0.1.4.8 + PMD: rte_cxgbe_pmd: Coming up as MASTER: Initializing adapter + Interactive-mode selected + Configuring Port 0 (socket 0) + Port 0: 00:07:43:2D:EA:C0 + Configuring Port 1 (socket 0) + Port 1: 00:07:43:2D:EA:C8 + Checking link statuses... + PMD: rte_cxgbe_pmd: Port0: passive DA port module inserted + PMD: rte_cxgbe_pmd: Port1: passive DA port module inserted + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + +.. note:: + + Flow control pause TX/RX is disabled by default and can be enabled via + testpmd. Refer to section :ref:`flow-control` for more details. + +Sample Application Notes +------------------------ + +.. _flow-control: + +Enable/Disable Flow Control +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Flow control pause TX/RX is disabled by default and can be enabled via +testpmd as follows: + +.. code-block:: console + + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0 + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1 + +To disable again, run: + +.. code-block:: console + + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0 + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1 + +Jumbo Mode +~~~~~~~~~~ + +There are two ways to enable sending and receiving of jumbo frames via testpmd. +One method involves using the **mtu** command, which changes the MTU of an +individual port without having to stop the selected port. Another method +involves stopping all the ports first and then running the **max-pkt-len** command +to configure the MTU of all the ports with a single command. 
+ +- To configure each port individually, run the mtu command as follows: + + .. code-block:: console + + testpmd> port config mtu 0 9000 + testpmd> port config mtu 1 9000 + +- To configure all the ports at once, stop all the ports first and run the + max-pkt-len command as follows: + + .. code-block:: console + + testpmd> port stop all + testpmd> port config all max-pkt-len 9000 diff --git a/doc/guides/nics/e1000em.rst b/doc/guides/nics/e1000em.rst new file mode 100644 index 00000000..265b147a --- /dev/null +++ b/doc/guides/nics/e1000em.rst @@ -0,0 +1,182 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Driver for VM Emulated Devices +============================== + +The DPDK EM poll mode driver supports the following emulated devices: + +* qemu-kvm emulated Intel® 82540EM Gigabit Ethernet Controller (qemu e1000 device) + +* VMware* emulated Intel® 82545EM Gigabit Ethernet Controller + +* VMware emulated Intel® 82574L Gigabit Ethernet Controller. + +Validated Hypervisors +--------------------- + +The validated hypervisors are: + +* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +* KVM (Kernel Virtual Machine) with Qemu, version 0.15.1 + +* VMware ESXi 5.0, Update 1 + +Recommended Guest Operating System in Virtual Machine +----------------------------------------------------- + +The recommended guest operating system in a virtualized environment is: + +* Fedora* 18 (64-bit) + +For supported kernel versions, refer to the *DPDK Release Notes*. + +Setting Up a KVM Virtual Machine +-------------------------------- + +The following describes a target environment: + +* Host Operating System: Fedora 14 + +* Hypervisor: KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +* Guest Operating System: Fedora 14 + +* Linux Kernel Version: Refer to the DPDK Getting Started Guide + +* Target Applications: testpmd + +The setup procedure is as follows: + +#. 
Download qemu-kvm-0.14.0 from + `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_ + and install it in the Host OS using the following steps: + + When using a recent kernel (2.6.25+) with kvm modules included: + + .. code-block:: console + + tar xzf qemu-kvm-release.tar.gz + cd qemu-kvm-release + ./configure --prefix=/usr/local/kvm + make + sudo make install + sudo /sbin/modprobe kvm-intel + + When using an older kernel or a kernel from a distribution without the kvm modules, + you must download (from the same link), compile and install the modules yourself: + + .. code-block:: console + + tar xjf kvm-kmod-release.tar.bz2 + cd kvm-kmod-release + ./configure + make + sudo make install + sudo /sbin/modprobe kvm-intel + + Note that qemu-kvm installs in the /usr/local/bin directory. + + For more details about KVM configuration and usage, please refer to: + `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_. + +#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. + This is referred to as the Guest Operating System (Guest OS). + +#. Start the Virtual Machine with at least one emulated e1000 device. + + .. note:: + + Qemu provides several choices for the emulated network device backend. + Most commonly used is a TAP networking backend that uses a TAP networking device in the host. + For more information about the networking backends supported by Qemu and the options for configuring networking in Qemu, + please refer to: + + * `http://www.linux-kvm.org/page/Networking <http://www.linux-kvm.org/page/Networking>`_ + + * `http://wiki.qemu.org/Documentation/Networking <http://wiki.qemu.org/Documentation/Networking>`_ + + * `http://qemu.weilnetz.de/qemu-doc.html <http://qemu.weilnetz.de/qemu-doc.html>`_ + + For example, to start a VM with two emulated e1000 devices, issue the following command: + + .. 
code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -cpu host -smp 4 -hda qemu1.raw -m 1024 + -net nic,model=e1000,vlan=1,macaddr=DE:AD:1E:00:00:01 + -net tap,vlan=1,ifname=tapvm01,script=no,downscript=no + -net nic,model=e1000,vlan=2,macaddr=DE:AD:1E:00:00:02 + -net tap,vlan=2,ifname=tapvm02,script=no,downscript=no + + where: + + — -m = memory to assign + + — -smp = number of smp cores + + — -hda = virtual disk image + + This command starts a new virtual machine with two emulated 82540EM devices, + backed up with two TAP networking host interfaces, tapvm01 and tapvm02. + + .. code-block:: console + + # ip tuntap show + tapvm01: tap + tapvm02: tap + +#. Configure your TAP networking interfaces using ip/ifconfig tools. + +#. Log in to the guest OS and check that the expected emulated devices exist: + + .. code-block:: console + + # lspci -d 8086:100e + 00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) + 00:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) + +#. Install the DPDK and run testpmd. + +Known Limitations of Emulated Devices +------------------------------------- + +The following are known limitations: + +#. The Qemu e1000 RX path does not support multiple descriptors/buffers per packet. + Therefore, rte_mbuf should be big enough to hold the whole packet. + For example, to allow testpmd to receive jumbo frames, use the following: + + testpmd [options] -- --mbuf-size=<your-max-packet-size> + +#. Qemu e1000 does not validate the checksum of incoming packets. + +#. Qemu e1000 only supports one interrupt source, so link and Rx interrupt should be exclusive. + +#. Qemu e1000 does not support interrupt auto-clear, application should disable interrupt immediately when woken up. diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst new file mode 100644 index 00000000..9f938486 --- /dev/null +++ b/doc/guides/nics/ena.rst @@ -0,0 +1,251 @@ +.. 
BSD LICENSE + + Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Amazon.com, Inc. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ENA Poll Mode Driver +==================== + +The ENA PMD is a DPDK poll-mode driver for the Amazon Elastic +Network Adapter (ENA) family. + +Overview +-------- + +The ENA driver exposes a lightweight management interface with a +minimal set of memory mapped registers and an extendable command set +through an Admin Queue. 
+ +The driver supports a wide range of ENA adapters, is link-speed +independent (i.e., the same driver is used for 10GbE, 25GbE, 40GbE, +etc.), and it negotiates and supports an extendable feature set. + +ENA adapters allow high speed and low overhead Ethernet traffic +processing by providing a dedicated Tx/Rx queue pair per CPU core. + +The ENA driver supports industry standard TCP/IP offload features such +as checksum offload and TCP transmit segmentation offload (TSO). + +Receive-side scaling (RSS) is supported for multi-core scaling. + +Some of the ENA devices support a working mode called Low-latency +Queue (LLQ), which saves several more microseconds. + +Management Interface +-------------------- + +The ENA management interface is exposed by means of: + +* Device Registers +* Admin Queue (AQ) and Admin Completion Queue (ACQ) + +The ENA device registers, which are memory-mapped in PCIe space (MMIO +registers), are accessed only during driver initialization and are not +involved in normal device operation afterwards. + +AQ is used for submitting management commands, and the +results/responses are reported asynchronously through ACQ. + +ENA introduces a very small set of management commands with room for +vendor-specific extensions. Most of the management operations are +framed in a generic Get/Set feature command. + +The following admin queue commands are supported: + +* Create I/O submission queue +* Create I/O completion queue +* Destroy I/O submission queue +* Destroy I/O completion queue +* Get feature +* Set feature +* Get statistics + +Refer to ``ena_admin_defs.h`` for the list of supported Get/Set Feature +properties. + +Data Path Interface +------------------- + +I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx +SQ respectively). Each SQ has a completion queue (CQ) associated +with it. + +The SQs and CQs are implemented as descriptor rings in contiguous +physical memory.
+ +Refer to ``ena_eth_io_defs.h`` for the detailed structure of the descriptors. + +The driver supports multi-queue for both Tx and Rx. + +Configuration information +------------------------- + +**DPDK Configuration Parameters** + + The following configuration options are available for the ENA PMD: + + * **CONFIG_RTE_LIBRTE_ENA_PMD** (default y): Enables or disables inclusion + of the ENA PMD driver in the DPDK compilation. + + + * **CONFIG_RTE_LIBRTE_ENA_DEBUG_INIT** (default y): Enables or disables debug + logging of device initialization within the ENA PMD driver. + + * **CONFIG_RTE_LIBRTE_ENA_DEBUG_RX** (default n): Enables or disables debug + logging of RX logic within the ENA PMD driver. + + * **CONFIG_RTE_LIBRTE_ENA_DEBUG_TX** (default n): Enables or disables debug + logging of TX logic within the ENA PMD driver. + + * **CONFIG_RTE_LIBRTE_ENA_COM_DEBUG** (default n): Enables or disables debug + logging of low level tx/rx logic in ena_com(base) within the ENA PMD driver. + +**ENA Configuration Parameters** + + * **Number of Queues** + + This is the number of queues requested at initialization. The actual + number of receive and transmit queues created is the minimum of the + maximum number supported by the device and the number of queues requested. + + * **Size of Queues** + + This is the requested size of the receive/transmit queues. The actual size + is the minimum of the requested size and the maximum receive/transmit + queue size supported by the device. + +Building DPDK +------------- + +See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for +instructions on how to build DPDK. + +By default the ENA PMD library will be built into the DPDK library. + +For configuring and using UIO and VFIO frameworks, please also refer to :ref:`the +documentation that comes with DPDK suite <linux_gsg>`.
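The rule stated under *Number of Queues* and *Size of Queues* above amounts to
taking the smaller of the requested value and the device maximum. The following
is a minimal sketch of that rule only; ``ena_effective_value()`` and its
parameter names are hypothetical, chosen for illustration, and are not part of
the ENA PMD API.

.. code-block:: c

   #include <stdint.h>

   /*
    * Hypothetical helper: the value actually used is the smaller of what
    * the application requested and what the device reports as its maximum.
    * The same rule applies to the queue count and to the queue size.
    */
   static inline uint16_t
   ena_effective_value(uint16_t requested, uint16_t device_max)
   {
           return requested < device_max ? requested : device_max;
   }

For instance, requesting 8 queues on a device that supports at most 4 would
yield 4 queues, while requesting 2 would yield 2.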
+ +Supported ENA adapters +---------------------- + +The current ENA PMD supports the following ENA adapters: + +* ``1d0f:ec20`` - ENA VF +* ``1d0f:ec21`` - ENA VF with LLQ support + +Supported Operating Systems +--------------------------- + +Any Linux distribution fulfilling the conditions described in the ``System Requirements`` +section of :ref:`the DPDK documentation <linux_gsg>`. See also the *DPDK Release Notes*. + +Supported features +------------------ + +* Jumbo frames up to 9K +* Port Hardware Statistics +* IPv4/TCP/UDP checksum offload +* TSO offload +* Multiple receive and transmit queues +* RSS +* Low Latency Queue for Tx + +Unsupported features +-------------------- + +The features supported by the device and not yet supported by this PMD include: + +* Asynchronous Event Notification Queue (AENQ) + +Prerequisites +------------- + +#. Prepare the system as recommended by DPDK suite. This includes environment + variables, hugepages configuration, tool-chains and configuration. + +#. Insert the igb_uio kernel module using the command 'modprobe igb_uio'. + +#. Bind the intended ENA device to the igb_uio module. + + +At this point the system should be ready to run DPDK applications. Once the +application runs to completion, the ENA can be detached from igb_uio if necessary. + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Amazon ENA +devices managed by librte_pmd_ena. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe uio + insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko + + .. note:: + + Currently the Amazon ENA PMD driver depends on the igb_uio user space I/O kernel module. + +#. Mount and request huge pages: + + .. code-block:: console + + mount -t hugetlbfs nodev /mnt/hugepages + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Bind the UIO driver to the ENA device (using the binding tool provided by DPDK): + + .. 
code-block:: console + + ./tools/dpdk_nic_bind.py --bind=igb_uio 0000:02:00.1 + +#. Start testpmd with basic parameters: + + .. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -- -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:02:00.1 on NUMA socket -1 + EAL: probe driver: 1d0f:ec20 rte_ena_pmd + EAL: PCI memory mapped at 0x7f9b6c400000 + PMD: eth_ena_dev_init(): Initializing 0:2:0.1 + Interactive-mode selected + Configuring Port 0 (socket 0) + Port 0: 00:00:00:11:00:01 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst new file mode 100644 index 00000000..e67c3dbf --- /dev/null +++ b/doc/guides/nics/enic.rst @@ -0,0 +1,221 @@ +.. BSD LICENSE + Copyright (c) 2015, Cisco Systems, Inc. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE + COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN + ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + +ENIC Poll Mode Driver +===================== + +ENIC PMD is the DPDK poll-mode driver for the Cisco Systems, Inc. VIC Ethernet +NICs. These adapters are also referred to as vNICs below. If you are running +or would like to run DPDK software applications on Cisco UCS servers using +Cisco VIC adapters, the following documentation is relevant. + +Version Information +------------------- + +The version of the ENIC PMD driver is 1.0.0.6 and is printed by the ENIC PMD +during initialization. + +How to obtain ENIC PMD integrated DPDK +-------------------------------------- + +ENIC PMD support is integrated into the DPDK suite. dpdk-<version>.tar.gz +should be downloaded from http://dpdk.org + + +Configuration information +------------------------- + +- **DPDK Configuration Parameters** + + The following configuration options are available for the ENIC PMD: + + - **CONFIG_RTE_LIBRTE_ENIC_PMD** (default y): Enables or disables inclusion + of the ENIC PMD driver in the DPDK compilation. + + - **CONFIG_RTE_LIBRTE_ENIC_DEBUG** (default n): Enables or disables debug + logging within the ENIC PMD driver. + +- **vNIC Configuration Parameters** + + - **Number of Queues** + + The maximum numbers of receive and transmit queues are configurable on a per + vNIC basis through the Cisco UCS Manager (CIMC or UCSM).
These values + should be configured to be greater than or equal to the nb_rx_q and nb_tx_q + parameters expected to be used in the call to the rte_eth_dev_configure() + function. + + - **Size of Queues** + + Likewise, the numbers of receive and transmit descriptors are configurable on + a per vNIC basis via the UCS Manager and should be greater than or equal to + the nb_rx_desc and nb_tx_desc parameters expected to be used in the calls + to rte_eth_rx_queue_setup() and rte_eth_tx_queue_setup() respectively. + + - **Interrupts** + + Only one interrupt per vNIC interface should be configured in the UCS + manager regardless of the number of receive/transmit queues. The ENIC PMD + uses this interrupt to get information about errors in the fast path. + +Limitations +----------- + +- **VLAN 0 Priority Tagging** + + If a vNIC is configured in TRUNK mode by the UCS manager, the adapter will + priority tag egress packets according to 802.1Q if they were not already + VLAN tagged by software. If the adapter is connected to a properly configured + switch, there will be no unexpected behavior. + + In test setups where an Ethernet port of a Cisco adapter in TRUNK mode is + connected point-to-point to another adapter port or connected through a router + instead of a switch, all ingress packets will be VLAN tagged. Programs such + as l3fwd which do not account for VLAN tags in packets will misbehave. The + solution is to enable VLAN stripping on ingress. The following code fragment is + an example of how to accomplish this: + +.. code-block:: c + + vlan_offload = rte_eth_dev_get_vlan_offload(port); + vlan_offload |= ETH_VLAN_STRIP_OFFLOAD; + rte_eth_dev_set_vlan_offload(port, vlan_offload); + +How to build the suite? +----------------------- +The build instructions for the DPDK suite should be followed. By default +the ENIC PMD library will be built into the DPDK library. + +For configuring and using UIO and VFIO frameworks, please refer to the +documentation that comes with DPDK suite.
+ +Supported Cisco VIC adapters +---------------------------- + +ENIC PMD supports all recent generations of Cisco VIC adapters including: + +- VIC 1280 +- VIC 1240 +- VIC 1225 +- VIC 1285 +- VIC 1225T +- VIC 1227 +- VIC 1227T +- VIC 1380 +- VIC 1340 +- VIC 1385 +- VIC 1387 + +- Flow director features are not supported on generation 1 Cisco VIC adapters + (M81KR and P81E) + +Supported Operating Systems +--------------------------- +Any Linux distribution fulfilling the conditions described in Dependencies +section of DPDK documentation. + +Supported features +------------------ +- Unicast, multicast and broadcast transmission and reception +- Receive queue polling +- Port Hardware Statistics +- Hardware VLAN acceleration +- IP checksum offload +- Receive side VLAN stripping +- Multiple receive and transmit queues +- Flow Director ADD, UPDATE, DELETE, STATS operation support for IPV4 5-TUPLE + flows +- Promiscuous mode +- Setting RX VLAN (supported via UCSM/CIMC only) +- VLAN filtering (supported via UCSM/CIMC only) +- Execution of application by unprivileged system users +- IPV4, IPV6 and TCP RSS hashing + +Known bugs and Unsupported features in this release +--------------------------------------------------- +- Signature or flex byte based flow direction +- Drop feature of flow direction +- VLAN based flow direction +- non-IPV4 flow direction +- Setting of extended VLAN +- UDP RSS hashing + +Prerequisites +------------- +- Prepare the system as recommended by DPDK suite. 
This includes environment + variables, hugepages configuration, tool-chains and configuration +- Insert vfio-pci kernel module using the command 'modprobe vfio-pci' if the + user wants to use VFIO framework +- Insert uio kernel module using the command 'modprobe uio' if the user wants + to use UIO framework +- DPDK suite should be configured based on the user's decision to use VFIO or + UIO framework +- If the vNIC device(s) to be used is bound to the kernel mode Ethernet driver + (enic), use 'ifconfig' to bring the interface down. The dpdk_nic_bind.py tool + can then be used to unbind the device's bus id from the enic kernel mode + driver. +- Bind the intended vNIC to vfio-pci in case the user wants ENIC PMD to use + VFIO framework using dpdk_nic_bind.py. +- Bind the intended vNIC to igb_uio in case the user wants ENIC PMD to use + UIO framework using dpdk_nic_bind.py. + +At this point the system should be ready to run DPDK applications. Once the +application runs to completion, the vNIC can be detached from vfio-pci or +igb_uio if necessary. + +Root privilege is required to bind and unbind vNICs to/from VFIO/UIO. +VFIO framework helps an unprivileged user to run the applications. +For an unprivileged user to run the applications on DPDK and ENIC PMD, +it may be necessary to increase the maximum locked memory of the user. +The following command could be used to do this. + +.. code-block:: console + + sudo sh -c "ulimit -l <value in Kilo Bytes>" + +The value depends on the memory configuration of the application, DPDK and +PMD. Typically, the limit has to be raised to higher than 2GB. +e.g., 2621440 + +The compilation of any unused drivers can be disabled using the +configuration file in config/ directory (e.g., config/common_linuxapp). +This would help in bringing down the time taken for building the +libraries and the initialization time of the application. 
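The locked-memory limit discussed above can also be checked from inside an
application. A limit set with ``ulimit`` applies only to the shell that runs
the command and to its children, so a check in the process itself can be a
useful sanity test. The following is a minimal sketch, assuming a POSIX system:
``kib_to_bytes()`` and ``memlock_limit_sufficient()`` are hypothetical helper
names, while ``getrlimit()`` and ``RLIMIT_MEMLOCK`` are standard POSIX
interfaces. ``ulimit -l`` takes a value in units of 1024 bytes whereas
``getrlimit()`` reports bytes, so the example value 2621440 corresponds to
2.5 GiB.

.. code-block:: c

   #include <stdint.h>
   #include <sys/resource.h>

   /* Convert a "ulimit -l" style value (units of 1024 bytes) to bytes. */
   static inline uint64_t
   kib_to_bytes(uint64_t kib)
   {
           return kib * 1024ULL;
   }

   /*
    * Hypothetical helper: return 1 if the soft RLIMIT_MEMLOCK limit of the
    * calling process is at least required_bytes, 0 if it is lower, and -1
    * if the limit cannot be read.
    */
   static inline int
   memlock_limit_sufficient(uint64_t required_bytes)
   {
           struct rlimit rl;

           if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
                   return -1;
           if (rl.rlim_cur == RLIM_INFINITY)
                   return 1;
           return (uint64_t)rl.rlim_cur >= required_bytes ? 1 : 0;
   }

An application could call ``memlock_limit_sufficient(kib_to_bytes(2621440))``
before initializing the EAL and print a hint when the result is 0.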
+ +Additional Reference +-------------------- +- http://www.cisco.com/c/en/us/products/servers-unified-computing + +Contact Information +------------------- +Any questions or bugs should be reported to DPDK community and to the ENIC PMD +maintainers: + +- John Daley <johndale@cisco.com> +- Nelson Escobar <neescoba@cisco.com> diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst new file mode 100644 index 00000000..c4915d82 --- /dev/null +++ b/doc/guides/nics/fm10k.rst @@ -0,0 +1,203 @@ +.. BSD LICENSE + Copyright(c) 2015-2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +FM10K Poll Mode Driver +====================== + +The FM10K poll mode driver library provides support for the Intel FM10000 +(FM10K) family of 40GbE/100GbE adapters. + +FTAG Based Forwarding of FM10K +------------------------------ + +FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs +supports the addition of a Fabric Tag (FTAG) to carry special information. +The FTAG is placed at the beginning of the frame and contains information +such as the source and destination of the packet and the VLAN tag. In FTAG based +forwarding mode, the switch logic forwards packets according to glort (global +resource tag) information, rather than the MAC and VLAN table. Currently this +feature works only on PF. + +To enable this feature, the user should pass a devargs parameter to the EAL, +such as "-w 84:00.0,enable_ftag=1", and the application should make sure an +appropriate FTAG is inserted for every frame on the TX side. + +Vector PMD for FM10K +-------------------- + +Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O. +It improves load/store bandwidth efficiency of L1 data cache by using a wider +SSE/AVX register (1). +The wider register gives space to hold multiple packet buffers so as to save +on the number of instructions when bulk processing packets. + +There is no change to the PMD API. The RX/TX handlers are the only two entries for +vPMD packet I/O.
They are transparently registered at runtime for RX/TX execution +if all required conditions are met. + +1. To date, only an SSE version of FM10K vPMD is available. + To ensure that vPMD is in the binary code, set + ``CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y`` in the configuration file. + +Some constraints apply as pre-conditions for specific optimizations on bulk +packet transfers. The following sections explain RX and TX constraints in the +vPMD. + + +RX Constraints +~~~~~~~~~~~~~~ + + +Prerequisites and Pre-conditions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For Vector RX it is assumed that the number of descriptors in a ring is a power +of 2. With this pre-condition, the ring pointer can easily scroll back to the +head after hitting the tail without a conditional check. In addition Vector RX +can use this assumption to do a bit mask using ``ring_size - 1``. + + +Features not Supported by Vector RX PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Some features are not supported when trying to increase the throughput in +vPMD. They are: + +* IEEE1588 + +* Flow director + +* Header split + +* RX checksum offload + +Other features are supported using optional MACRO configuration. They include: + +* HW VLAN strip + +* L3/L4 packet type + +To enable them via ``RX_OLFLAGS``, use ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``. + +To guarantee these constraints, the following configuration flags in ``dev_conf.rxmode`` +will be checked: + +* ``hw_vlan_extend`` + +* ``hw_ip_checksum`` + +* ``header_split`` + +* ``fdir_conf->mode`` + + +RX Burst Size +^^^^^^^^^^^^^ + +As vPMD is focused on high throughput, it processes 4 packets at a time. It therefore +assumes an RX burst of at least 4 packets, and the receive handler returns zero if it is +called with ``nb_pkt`` < 4. If ``nb_pkt`` is not a multiple of 4, it is rounded down +to the nearest multiple of 4.
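The two RX pre-conditions above, the ``ring_size - 1`` bit mask and the
rounding of the burst size to a multiple of 4, can be sketched in plain C as
follows. This is an illustration of the arithmetic only, not the fm10k driver
code; the function names ``ring_next()`` and ``burst_floor_align4()`` are
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /*
    * Advance a ring index with wrap-around and no conditional check.
    * Valid only when ring_size is a power of two, so that
    * (ring_size - 1) is an all-ones bit mask.
    */
   static inline uint16_t
   ring_next(uint16_t idx, uint16_t ring_size)
   {
           return (uint16_t)((idx + 1) & (ring_size - 1));
   }

   /*
    * Round a requested burst size down to a multiple of 4.
    * Requests smaller than 4 yield 0.
    */
   static inline uint16_t
   burst_floor_align4(uint16_t nb_pkt)
   {
           return (uint16_t)(nb_pkt & ~3U);
   }

With a ring of 512 descriptors, index 511 wraps to 0 through the mask alone,
and a requested burst of 31 packets is processed as 28.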
+ + +TX Constraint +~~~~~~~~~~~~~ + +Features not Supported by TX Vector PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TX vPMD only works when ``txq_flags`` is set to ``FM10K_SIMPLE_TX_FLAG``. +This means that it does not support TX multi-segment, VLAN offload or TX checksum +offload. The following MACROs are used for these three features: + +* ``ETH_TXQ_FLAGS_NOMULTSEGS`` + +* ``ETH_TXQ_FLAGS_NOVLANOFFL`` + +* ``ETH_TXQ_FLAGS_NOXSUMSCTP`` + +* ``ETH_TXQ_FLAGS_NOXSUMUDP`` + +* ``ETH_TXQ_FLAGS_NOXSUMTCP`` + +Limitations +----------- + + +Switch manager +~~~~~~~~~~~~~~ + +The Intel FM10000 family of NICs integrates a hardware switch and multiple host +interfaces. The FM10000 PMD driver only manages host interfaces. For the +switch component another switch driver has to be loaded prior to the +FM10000 PMD driver. The switch driver can be acquired from Intel support or +from the `Match Interface <https://github.com/match-interface>`_ project. +Only Testpoint is validated with DPDK; the latest version that has been +validated with DPDK 2.2 is 4.1.6. + +CRC stripping +~~~~~~~~~~~~~ + +The FM10000 family of NICs strips the CRC from every packet coming into the +host interface. So, CRC will be stripped even when the +``rxmode.hw_strip_crc`` member is set to 0 in ``struct rte_eth_conf``. + + +Maximum packet length +~~~~~~~~~~~~~~~~~~~~~ + +The FM10000 family of NICs supports jumbo frames of up to 15K. The value +is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len`` +member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames +up to 15364 bytes can still reach the host interface. + +Statistic Polling Frequency +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The FM10000 NICs expose a set of statistics via the PCI BARs. These statistics +are read from the hardware registers when ``rte_eth_stats_get()`` or +``rte_eth_xstats_get()`` is called. The packet counting registers are 32 bits +while the byte counting registers are 48 bits.
As a result, the statistics must +be polled regularly in order to ensure the consistency of the returned reads. + +A PCIe Gen3 x8 link can carry about 50 Gbps of traffic. With 64 byte packets +this gives almost 100 million packets/second, causing a 32-bit counter to overflow +after approximately 40 seconds. To ensure these overflows are detected and accounted +for in the statistics, it is necessary to read the statistics regularly. It is +suggested to read stats every 20 seconds, which will ensure the statistics +are accurate. + + +Interrupt mode +~~~~~~~~~~~~~~ + +The FM10000 family of NICs needs a separate interrupt for the mailbox, so only +drivers that support multiple interrupt vectors, such as vfio-pci, can work +with fm10k interrupt mode. diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst new file mode 100644 index 00000000..934eb027 --- /dev/null +++ b/doc/guides/nics/i40e.rst @@ -0,0 +1,368 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +I40E Poll Mode Driver +====================== + +The I40E PMD (librte_pmd_i40e) provides poll mode driver support +for the Intel X710/XL710/X722 10/40 Gbps family of adapters. + + +Features +-------- + +Features of the I40E PMD are: + +- Multiple queues for TX and RX +- Receive Side Scaling (RSS) +- MAC/VLAN filtering +- Packet type information +- Flow director +- Cloud filter +- Checksum offload +- VLAN/QinQ stripping and inserting +- TSO offload +- Promiscuous mode +- Multicast mode +- Port hardware statistics +- Jumbo frames +- Link state information +- Link flow control +- Mirror on port, VLAN and VSI +- Interrupt mode for RX +- Scatter and gather for TX and RX +- Vector Poll mode driver +- DCB +- VMDQ +- SR-IOV VF +- Hot plug +- IEEE1588/802.1AS timestamping + + +Prerequisites +------------- + +- Identify your adapter using `Intel Support + <http://www.intel.com/support>`_ and get the latest NVM/FW images. + +- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment. + +- To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms" + section of the :ref:`Getting Started Guide for Linux <linux_gsg>`. + + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``config`` file.
+Please note that enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_I40E_PMD`` (default ``y``) + + Toggle compilation of the ``librte_pmd_i40e`` driver. + +- ``CONFIG_RTE_LIBRTE_I40E_DEBUG_*`` (default ``n``) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC`` (default ``y``) + + Toggle bulk allocation for RX. + +- ``CONFIG_RTE_LIBRTE_I40E_INC_VECTOR`` (default ``n``) + + Toggle the use of Vector PMD instead of normal RX/TX path. + To enable vPMD for RX, bulk allocation for Rx must be allowed. + +- ``CONFIG_RTE_LIBRTE_I40E_RX_OLFLAGS_ENABLE`` (default ``y``) + + Toggle to enable RX ``olflags``. + This is only meaningful when Vector PMD is used. + +- ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``) + + Toggle to use a 16-byte RX descriptor, by default the RX descriptor is 32 byte. + +- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF`` (default ``64``) + + Number of queues reserved for PF. + +- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF`` (default ``4``) + + Number of queues reserved for each SR-IOV VF. + +- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM`` (default ``4``) + + Number of queues reserved for each VMDQ Pool. + +- ``CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL`` (default ``-1``) + + Interrupt Throttling interval. + + +Driver Compilation +~~~~~~~~~~~~~~~~~~ + +To compile the I40E PMD see :ref:`Getting Started Guide for Linux <linux_gsg>` or +:ref:`Getting Started Guide for FreeBSD <freebsd_gsg>` depending on your platform. + + +Linux +----- + + +Running testpmd +~~~~~~~~~~~~~~~ + +This section demonstrates how to launch ``testpmd`` with Intel XL710/X710 +devices managed by ``librte_pmd_i40e`` in the Linux operating system. + +#. Load ``igb_uio`` or ``vfio-pci`` driver: + + .. code-block:: console + + modprobe uio + insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko + + or + + .. code-block:: console + + modprobe vfio-pci + +#. 
Bind the XL710/X710 adapters to ``igb_uio`` or ``vfio-pci`` loaded in the previous step: + + .. code-block:: console + + ./tools/dpdk_nic_bind.py --bind igb_uio 0000:83:00.0 + + Or setup VFIO permissions for regular users and then bind to ``vfio-pci``: + + .. code-block:: console + + ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:83:00.0 + +#. Start ``testpmd`` with basic parameters: + + .. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -w 83:00.0 -- -i + + Example output: + + .. code-block:: console + + ... + EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 8086:1572 rte_i40e_pmd + EAL: PCI memory mapped at 0x7f7f80000000 + EAL: PCI memory mapped at 0x7f7f80800000 + PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a + Interactive-mode selected + Configuring Port 0 (socket 0) + ... + + PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are + satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0. + + ... + Port 0: 68:05:CA:26:85:84 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Done + + testpmd> + + +SR-IOV: Prerequisites and sample Application Notes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +#. Load the kernel module: + + .. code-block:: console + + modprobe i40e + + Check the output in dmesg: + + .. code-block:: console + + i40e 0000:83:00.1 ens802f0: renamed from eth0 + +#. Bring up the PF ports: + + .. code-block:: console + + ifconfig ens802f0 up + +#. Create VF device(s): + + Echo the number of VFs to be created into the ``sriov_numvfs`` sysfs entry + of the parent PF. + + Example: + + .. code-block:: console + + echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs + + +#. Assign VF MAC address: + + Assign MAC address to the VF using iproute2 utility. The syntax is: + + .. code-block:: console + + ip link set <PF netdev id> vf <VF id> mac <macaddr> + + Example: + + .. 
code-block:: console + + ip link set ens802f0 vf 0 mac a0:b0:c0:d0:e0:f0 + +#. Assign VF to VM, and bring up the VM. + Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*. + + +Sample Application Notes +------------------------ + +Vlan filter +~~~~~~~~~~~ + +Vlan filter only works when Promiscuous mode is off. + +To start ``testpmd``, and add vlan 10 to port 0: + +.. code-block:: console + + ./app/testpmd -c ffff -n 4 -- -i --forward-mode=mac + ... + + testpmd> set promisc 0 off + testpmd> rx_vlan add 10 0 + + +Flow Director +~~~~~~~~~~~~~ + +The Flow Director works in receive mode to identify specific flows or sets of flows and route them to specific queues. +The Flow Director filters can match the different fields for different type of packet: flow type, specific input set per flow type and the flexible payload. + +The default input set of each flow type is:: + + ipv4-other : src_ip_address, dst_ip_address + ipv4-frag : src_ip_address, dst_ip_address + ipv4-tcp : src_ip_address, dst_ip_address, src_port, dst_port + ipv4-udp : src_ip_address, dst_ip_address, src_port, dst_port + ipv4-sctp : src_ip_address, dst_ip_address, src_port, dst_port, + verification_tag + ipv6-other : src_ip_address, dst_ip_address + ipv6-frag : src_ip_address, dst_ip_address + ipv6-tcp : src_ip_address, dst_ip_address, src_port, dst_port + ipv6-udp : src_ip_address, dst_ip_address, src_port, dst_port + ipv6-sctp : src_ip_address, dst_ip_address, src_port, dst_port, + verification_tag + l2_payload : ether_type + +The flex payload is selected from offset 0 to 15 of packet's payload by default, while it is masked out from matching. + +Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``: + +.. 
code-block:: console + + ./app/testpmd -c ffff -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \ + --rxq=8 --txq=8 --nb-cores=8 --nb-ports=1 + +Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1: + +.. code-block:: console + + testpmd> flow_director_filter 0 mode IP add flow ipv4-udp \ + src 2.2.2.3 32 dst 2.2.2.5 32 vlan 0 flexbytes () \ + fwd pf queue 1 fd_id 1 + +Check the flow director status: + +.. code-block:: console + + testpmd> show port fdir 0 + + ######################## FDIR infos for port 0 #################### + MODE: PERFECT + SUPPORTED FLOW TYPE: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other + ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other + l2_payload + FLEX PAYLOAD INFO: + max_len: 16 payload_limit: 480 + payload_unit: 2 payload_seg: 3 + bitmask_unit: 2 bitmask_num: 2 + MASK: + vlan_tci: 0x0000, + src_ipv4: 0x00000000, + dst_ipv4: 0x00000000, + src_port: 0x0000, + dst_port: 0x0000 + src_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000, + dst_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000 + FLEX PAYLOAD SRC OFFSET: + L2_PAYLOAD: 0 1 2 3 4 5 6 ... + L3_PAYLOAD: 0 1 2 3 4 5 6 ... + L4_PAYLOAD: 0 1 2 3 4 5 6 ... 
+ FLEX MASK CFG: + ipv4-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + l2_payload: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + guarant_count: 1 best_count: 0 + guarant_space: 512 best_space: 7168 + collision: 0 free: 0 + maxhash: 0 maxlen: 0 + add: 0 remove: 0 + f_add: 0 f_remove: 0 + + +Delete all flow director rules on a port: + +.. code-block:: console + + testpmd> flush_flow_director 0 + diff --git a/doc/guides/nics/img/console.png b/doc/guides/nics/img/console.png Binary files differ new file mode 100644 index 00000000..99423340 --- /dev/null +++ b/doc/guides/nics/img/console.png diff --git a/doc/guides/nics/img/fast_pkt_proc.png b/doc/guides/nics/img/fast_pkt_proc.png Binary files differ new file mode 100644 index 00000000..48d57e5c --- /dev/null +++ b/doc/guides/nics/img/fast_pkt_proc.png diff --git a/doc/guides/nics/img/forward_stats.png b/doc/guides/nics/img/forward_stats.png Binary files differ new file mode 100644 index 00000000..23e35325 --- /dev/null +++ b/doc/guides/nics/img/forward_stats.png diff --git a/doc/guides/nics/img/host_vm_comms.png b/doc/guides/nics/img/host_vm_comms.png Binary files differ new file mode 100644 index 00000000..4e0b3c96 --- /dev/null +++ b/doc/guides/nics/img/host_vm_comms.png diff --git a/doc/guides/nics/img/host_vm_comms_qemu.png b/doc/guides/nics/img/host_vm_comms_qemu.png Binary files differ new file mode 100644 index 00000000..391a4eac --- /dev/null +++ 
b/doc/guides/nics/img/host_vm_comms_qemu.png diff --git a/doc/guides/nics/img/inter_vm_comms.png b/doc/guides/nics/img/inter_vm_comms.png Binary files differ new file mode 100644 index 00000000..6d85ece7 --- /dev/null +++ b/doc/guides/nics/img/inter_vm_comms.png diff --git a/doc/guides/nics/img/perf_benchmark.png b/doc/guides/nics/img/perf_benchmark.png Binary files differ new file mode 100644 index 00000000..aba818c3 --- /dev/null +++ b/doc/guides/nics/img/perf_benchmark.png diff --git a/doc/guides/nics/img/single_port_nic.png b/doc/guides/nics/img/single_port_nic.png Binary files differ new file mode 100644 index 00000000..8f39d73b --- /dev/null +++ b/doc/guides/nics/img/single_port_nic.png diff --git a/doc/guides/nics/img/vm_vm_comms.png b/doc/guides/nics/img/vm_vm_comms.png Binary files differ new file mode 100644 index 00000000..2bf1cd27 --- /dev/null +++ b/doc/guides/nics/img/vm_vm_comms.png diff --git a/doc/guides/nics/img/vmxnet3_int.png b/doc/guides/nics/img/vmxnet3_int.png Binary files differ new file mode 100644 index 00000000..6541feba --- /dev/null +++ b/doc/guides/nics/img/vmxnet3_int.png diff --git a/doc/guides/nics/img/vswitch_vm.png b/doc/guides/nics/img/vswitch_vm.png Binary files differ new file mode 100644 index 00000000..ac817aaa --- /dev/null +++ b/doc/guides/nics/img/vswitch_vm.png diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst new file mode 100644 index 00000000..769f6770 --- /dev/null +++ b/doc/guides/nics/index.rst @@ -0,0 +1,75 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Network Interface Controller Drivers +==================================== + +.. 
toctree:: + :maxdepth: 3 + :numbered: + + overview + bnx2x + cxgbe + e1000em + ena + enic + fm10k + i40e + ixgbe + intel_vf + mlx4 + mlx5 + nfp + szedata2 + virtio + vhost + vmxnet3 + pcap_ring + +**Figures** + +:numref:`figure_single_port_nic` :ref:`figure_single_port_nic` + +:numref:`figure_perf_benchmark` :ref:`figure_perf_benchmark` + +:numref:`figure_fast_pkt_proc` :ref:`figure_fast_pkt_proc` + +:numref:`figure_inter_vm_comms` :ref:`figure_inter_vm_comms` + +:numref:`figure_host_vm_comms` :ref:`figure_host_vm_comms` + +:numref:`figure_host_vm_comms_qemu` :ref:`figure_host_vm_comms_qemu` + +:numref:`figure_vmxnet3_int` :ref:`figure_vmxnet3_int` + +:numref:`figure_vswitch_vm` :ref:`figure_vswitch_vm` + +:numref:`figure_vm_vm_comms` :ref:`figure_vm_vm_comms` diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst new file mode 100644 index 00000000..a68198f8 --- /dev/null +++ b/doc/guides/nics/intel_vf.rst @@ -0,0 +1,608 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +I40E/IXGBE/IGB Virtual Function Driver +====================================== + +Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details) +support the following modes of operation in a virtualized environment: + +* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems + using the PCI-SIG Single Root I/O Virtualization (SR IOV) standard, + also known as "native mode" or "pass-through" mode. + In this chapter, this mode is referred to as IOV mode. + +* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or + a Virtual Machine Monitor (VMM), also known as software switch acceleration mode. + In this chapter, this mode is referred to as the Next Generation VMDq mode. + +SR-IOV Mode Utilization in a DPDK Environment +--------------------------------------------- + +The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode. +Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and +expose them to a virtual machine as a separate PCI function called a "Virtual Function". 
+Refer to :numref:`figure_single_port_nic`. + +Therefore, a NIC is logically distributed among multiple virtual machines (as shown in :numref:`figure_single_port_nic`), +while still having global data in common to share with the Physical Function and other Virtual Functions. +The DPDK fm10kvf, i40evf, igbvf or ixgbevf as a Poll Mode Driver (PMD) serves for the Intel® 82576 Gigabit Ethernet Controller, +Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller NIC, +Intel® Fortville 10/40 Gigabit Ethernet Controller NIC's virtual PCI function, or PCIe host-interface of the Intel Ethernet Switch +FM10000 Series. +Meanwhile the DPDK Poll Mode Driver (PMD) also supports "Physical Function" of such NIC's on the host. + +The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on Intel® 82576 Gigabit Ethernet Controller, +Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller, +and Intel® Fortville 10/40 Gigabit Ethernet Controller NICs so that guest can choose it for inter virtual machine traffic in SR-IOV mode. + +For more detail on SR-IOV, please refer to the following documents: + +* `SR-IOV provides hardware based I/O sharing <http://www.intel.com/network/connectivity/solutions/vmdc.htm>`_ + +* `PCI-SIG-Single Root I/O Virtualization Support on IA + <http://www.intel.com/content/www/us/en/pci-express/pci-sig-single-root-io-virtualization-support-in-virtualization-technology-for-connectivity-paper.html>`_ + +* `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_ + +.. _figure_single_port_nic: + +.. 
figure:: img/single_port_nic.* + + Virtualization for a Single Port NIC in SR-IOV Mode + + +Physical and Virtual Function Infrastructure +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following describes the Physical Function and Virtual Functions infrastructure for the supported Ethernet Controller NICs. + +Virtual Functions operate under the respective Physical Function on the same NIC Port and therefore have no access +to the global NIC resources that are shared between other functions for the same NIC port. + +A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it. +For global resource access, a Virtual Function has to send a request to the Physical Function for that port, +and the Physical Function operates on the global resources on behalf of the Virtual Function. +For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function, +which is called a "Mailbox". + +The PCIE host-interface of Intel Ethernet Switch FM10000 Series VF infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, the programmer can enable a maximum of *64 Virtual Functions (VF)* +globally per PCIE host-interface of the Intel Ethernet Switch FM10000 Series device. +Each VF can have a maximum of 16 queue pairs. +The Physical Function in the host can only be configured by the Linux* fm10k driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]); the DPDK PMD PF driver does not support it yet. + +For example, + +* Using Linux* fm10k driver: + + .. code-block:: console + + rmmod fm10k (To remove the fm10k module) + insmod fm10k.ko max_vfs=2,2 (To enable two Virtual Functions per port) + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. 
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. +However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® Fortville 10/40 Gigabit Ethernet Controller VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)* +globally per Intel® Fortville 10/40 Gigabit Ethernet Controller NIC device. +Each VF can have a maximum of 16 queue pairs. +The Physical Function in host could be either configured by the Linux* i40e driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. +When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. + +For example, + +* Using Linux* i40e driver: + + .. code-block:: console + + rmmod i40e (To remove the i40e module) + insmod i40e.ko max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using the DPDK PMD PF i40e driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) + + Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. + +* Using the DPDK PMD PF ixgbe driver to enable VF RSS: + + Same steps as above to install the modules of uio, igb_uio, specify max_vfs for PCI device, and + launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. 
+ + The number of queues available per VF (at most 4) depends on the total number of pools, which is + determined by the maximum number of VFs at the PF initialization stage and the number of queues specified + in the configuration: + + * If the maximum number of VFs is set in the range of 1 to 32: + + If the number of rxq is specified as 4 (e.g. '--rxq 4' in testpmd), then there are 32 + pools in total (ETH_32_POOLS), and each VF can have 4 or fewer (e.g. 2) queues; + + If the number of rxq is specified as 2 (e.g. '--rxq 2' in testpmd), then there are 32 + pools in total (ETH_32_POOLS), and each VF can have 2 queues; + + * If the maximum number of VFs is in the range of 33 to 64: + + If the number of rxq is 4 ('--rxq 4' in testpmd), then an error message is expected because this rxq value is not + valid in this case; + + If the number of rxq is 2 ('--rxq 2' in testpmd), then there are 64 pools in total (ETH_64_POOLS), + and each VF has 2 queues; + + On the host, to enable VF RSS functionality, the rx mq mode should be set to ETH_MQ_RX_VMDQ_RSS + or ETH_MQ_RX_RSS mode, and SR-IOV mode should be activated (max_vfs >= 1). + VF RSS information such as the hash function, RSS key and RSS key length also needs to be configured. + + .. code-block:: console + + testpmd -c 0xffff -n 4 -- --coremask=<core-mask> --rxq=4 --txq=4 -i + + The limitation for VF RSS on the Intel® 82599 10 Gigabit Ethernet Controller is: + the hash and key are shared among the PF and all VFs, and the RETA table with 128 entries is also shared + among the PF and all VFs. Therefore, there is no method to query the hash and RETA content per + VF on the guest; if needed, query them on the host (PF) for the shared RETA information. + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. 
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. 
+However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599 +10 Gigabit Ethernet Controller NIC port. +The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to +have at least one queue pair (RX/TX). +The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function. +The Physical Function in host could be either configured by the Linux* ixgbe driver +(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by DPDK PMD PF driver. +When using both DPDK PMD PF/VF drivers, the whole NIC will be taken over by DPDK based application. + +For example, + +* Using Linux* ixgbe driver: + + .. code-block:: console + + rmmod ixgbe (To remove the ixgbe module) + insmod ixgbe max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using the DPDK PMD PF ixgbe driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device) + + Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library. + +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC. +When you enable the four Virtual Functions with the above command, the four enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3. 
+However: + +* Virtual Functions 0 and 2 belong to Physical Function 0 + +* Virtual Functions 1 and 3 belong to Physical Function 1 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs). +The controller has 16 TX and 16 RX queues. +They are generally referred to (or thought of) as queue pairs (one TX and one RX queue). +This gives the controller 16 queue pairs. + +A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations. +The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF. + +In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port. +The eight queues can be accessed by eight different VMs if configured correctly (the i350 has 4x1GbE ports each with 8 TX and 8 RX queues), +that is, one transmit and one receive queue are assigned to each VF. + +For example, + +* Using Linux* igb driver: + + .. code-block:: console + + rmmod igb (To remove the igb module) + insmod igb max_vfs=2,2 (To enable two Virtual Functions per port) + +* Using DPDK PMD PF igb driver: + + Kernel Params: iommu=pt, intel_iommu=on + + .. code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio bb:ss.f + echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific pci device) + + Launch DPDK testpmd/example or your own host daemon application using the DPDK PMD library. 
+ +Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a four-port NIC. +When you enable two Virtual Functions per port, the eight enabled functions have a Function# +represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7. +However: + +* Virtual Functions 0 and 4 belong to Physical Function 0 + +* Virtual Functions 1 and 5 belong to Physical Function 1 + +* Virtual Functions 2 and 6 belong to Physical Function 2 + +* Virtual Functions 3 and 7 belong to Physical Function 3 + +.. note:: + + The above is an important consideration to take into account when targeting specific packets to a selected port. + +Validated Hypervisors +~~~~~~~~~~~~~~~~~~~~~ + +The validated hypervisor is: + +* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0 + +However, because the hypervisor is bypassed when configuring the Virtual Function devices through the Mailbox interface, +the solution is hypervisor-agnostic. +Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support. + +Expected Guest Operating System in Virtual Machine +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The expected guest operating systems in a virtualized environment are: + +* Fedora* 14 (64-bit) + +* Ubuntu* 10.04 (64-bit) + +For supported kernel versions, refer to the *DPDK Release Notes*. + +Setting Up a KVM Virtual Machine Monitor +---------------------------------------- + +The following describes a target environment: + +* Host Operating System: Fedora 14 + +* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0 + +* Guest Operating System: Fedora 14 + +* Linux Kernel Version: Refer to the *DPDK Getting Started Guide* + +* Target Applications: l2fwd, l3fwd-vf + +The setup procedure is as follows: + +#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**. + +#. 
While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB. + When using DPDK PF driver on host, pass the iommu=pt kernel command line argument in GRUB. + +#. Download qemu-kvm-0.14.0 from + `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_ + and install it in the Host OS using the following steps: + + When using a recent kernel (2.6.25+) with kvm modules included: + + .. code-block:: console + + tar xzf qemu-kvm-release.tar.gz + cd qemu-kvm-release + ./configure --prefix=/usr/local/kvm + make + sudo make install + sudo /sbin/modprobe kvm-intel + + When using an older kernel, or a kernel from a distribution without the kvm modules, + you must download (from the same link), compile and install the modules yourself: + + .. code-block:: console + + tar xjf kvm-kmod-release.tar.bz2 + cd kvm-kmod-release + ./configure + make + sudo make install + sudo /sbin/modprobe kvm-intel + + qemu-kvm installs in the /usr/local/bin directory. + + For more details about KVM configuration and usage, please refer to: + + `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_. + +#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine. + This is referred to as the Guest Operating System (Guest OS). + +#. Download and install the latest ixgbe driver from: + + `http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687 <http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687>`_ + +#. In the Host OS + + When using Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument: + + .. code-block:: console + + rmmod ixgbe + modprobe ixgbe max_vfs=2,2 + + When using DPDK PMD PF driver, insert DPDK kernel module igb_uio and set the number of VF by sysfs max_vfs: + + .. 
code-block:: console + + modprobe uio + insmod igb_uio + ./dpdk_nic_bind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1 + echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs + echo 2 > /sys/bus/pci/devices/0000\:0e\:00.1/max_vfs + + .. note:: + + You need to explicitly specify the number of VFs for each port; for example, + the command above creates two VFs for the first two ixgbe ports. + + Let's say we have a machine with four physical ixgbe ports: + + + 0000:02:00.0 + + 0000:02:00.1 + + 0000:0e:00.0 + + 0000:0e:00.1 + + The command above creates two VFs for device 0000:02:00.0: + + .. code-block:: console + + ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt* + lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2 + lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0 + + It also creates two VFs for device 0000:02:00.1: + + .. code-block:: console + + ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt* + lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3 + lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1 + +#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports) + and four Virtual Functions (two for each port). + This is the result of the previous step. + +#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command + (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information): + + .. code-block:: console + + sudo /sbin/modprobe pci-stub + + Unbind the default driver from the PCI devices representing the Virtual Functions. + A script to perform this action is as follows: + + .. 
code-block:: console + + echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id + echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind + echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind + + where 0000:08:10.0 belongs to the Virtual Function visible in the Host OS. + +#. Now, start the Virtual Machine by running the following command: + + .. code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 + + where: + + * -m = memory to assign + + * -smp = number of smp cores + + * -boot = boot option + + * -hda = virtual disk image + + * -device = device to attach + + .. note:: + + * The pci-assign,host=08:10.0 value indicates that you want to attach a PCI device + to a Virtual Machine and the respective (Bus:Device.Function) + numbers should be passed for the Virtual Function to be attached. + + * qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM, + but this is qemu-kvm version dependent since qemu-kvm-0.14.1 allows a maximum of five PCI devices. + + * qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model + to emulate in a Virtual Machine. Therefore, it can be used as: + + .. code-block:: console + + /usr/local/kvm/bin/qemu-system-x86_64 -cpu ? + + (to list all available cpu_models) + + /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0 + + (to use the same cpu_model equivalent to the host cpu) + + For more information, please refer to: `http://wiki.qemu.org/Features/CPUModels <http://wiki.qemu.org/Features/CPUModels>`_. + +#. Install and run the DPDK host app to take over the Physical Function. For example: + + .. code-block:: console + + make install T=x86_64-native-linuxapp-gcc + ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i + +#. 
Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS. + The virtual functions will be listed as available for use. + +#. Configure and install the DPDK with an x86_64-native-linuxapp-gcc configuration on the Guest OS as normal, + that is, there is no change to the normal installation procedure. + + .. code-block:: console + + make config T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc + cd x86_64-native-linuxapp-gcc + make + +.. note:: + + If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set", + power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9. + You must select the best x86_64 cpu_model to emulate or you can select the host option if available. + +.. note:: + + Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled. + For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and + you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface. + +.. note:: + + * The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management + that can also be used to create, start, stop and delete virtual machines. + If this option is used, steps 2 and 6 in the instructions provided will be different. + + * virsh, a command line utility for virtual machine management, + can also be used to bind and unbind devices to a virtual machine in Ubuntu. + If this option is used, step 6 in the instructions provided will be different. + + * The Virtual Machine Monitor (see :numref:`figure_perf_benchmark`) is equivalent to a Host OS with KVM installed as described in the instructions. + +.. _figure_perf_benchmark: + +.. 
figure:: img/perf_benchmark.* + + Performance Benchmark Setup + + +DPDK SR-IOV PMD PF/VF Driver Usage Model +---------------------------------------- + +Fast Host-based Packet Processing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Software Defined Network (SDN) trends are demanding fast host-based packet handling. +In a virtualization environment, +the DPDK VF PMD driver achieves the same throughput as in a non-VT native environment. + +With such fast packet processing on the host, many services such as filtering, QoS and +DPI can be offloaded on the host fast path. + +:numref:`figure_fast_pkt_proc` shows the scenario where some VMs directly communicate externally via VFs, +while others connect to a virtual switch and share the same uplink bandwidth. + +.. _figure_fast_pkt_proc: + +.. figure:: img/fast_pkt_proc.* + + Fast Host-based Packet Processing + + +SR-IOV (PF/VF) Approach for Inter-VM Communication +-------------------------------------------------- + +Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms. +SR-IOV device assignment helps a VM to attach the real device, taking advantage of the bridge in the NIC. +So VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration. +However, when VF traffic crosses physical ports (VM0<->VM2), there is no such hardware bridge. +In this case, the DPDK PMD PF driver provides host forwarding between such VMs. + +:numref:`figure_inter_vm_comms` shows an example. +In this case an update of the MAC address lookup tables in both the NIC and host DPDK application is required. + +In the NIC, the destination MAC address that belongs to a VM on another device is written to the PF-specific pool. +When a packet comes in, its destination MAC address matches and the packet is forwarded to the host DPDK PMD application. + +In the host DPDK application, the behavior is similar to L2 forwarding, +that is, the packet is forwarded to the correct PF pool. 
+The SR-IOV NIC switch forwards the packet to a specific VM according to the MAC destination address +which belongs to the destination VF on the VM. + +.. _figure_inter_vm_comms: + +.. figure:: img/inter_vm_comms.* + + Inter-VM Communication diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst new file mode 100644 index 00000000..3dc6b004 --- /dev/null +++ b/doc/guides/nics/ixgbe.rst @@ -0,0 +1,226 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +IXGBE Driver +============ + +Vector PMD for IXGBE +-------------------- + +Vector PMD uses Intel® SIMD instructions to optimize packet I/O. +It improves load/store bandwidth efficiency of L1 data cache by using a wider SSE/AVX register (1). +The wider register gives space to hold multiple packet buffers so as to reduce the number of instructions when processing packets in bulk. + +There is no change to PMD API. The RX/TX handlers are the only two entries for vPMD packet I/O. +They are transparently registered at runtime RX/TX execution if all condition checks pass. + +1. To date, only an SSE version of IXGBE vPMD is available. + To ensure that vPMD is in the binary code, ensure that the option CONFIG_RTE_IXGBE_INC_VECTOR=y is in the configuration file. + +Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers. +The following sections explain RX and TX constraints in the vPMD. + +RX Constraints +~~~~~~~~~~~~~~ + +Prerequisites and Pre-conditions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following prerequisites apply: + +* To enable vPMD to work for RX, bulk allocation for Rx must be allowed. + +Ensure that the following pre-conditions are satisfied: + +* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST + +* rxq->rx_free_thresh < rxq->nb_rx_desc + +* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0 + +* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST) + +These conditions are checked in the code. + +Scattered packets are not supported in this mode. +If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB), +vPMD for RX is disabled. + +By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32. + +Feature not Supported by RX Vector PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Some features are not supported when trying to increase the throughput in vPMD.
+They are: + +* IEEE1588 + +* FDIR + +* Header split + +* RX checksum offload + +Other features are supported using optional MACRO configuration. They include: + +* HW VLAN strip + +* HW extend dual VLAN + +* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_ENABLE=y) + + +To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked: + +* hw_vlan_strip + +* hw_vlan_extend + +* hw_ip_checksum + +* header_split + +* dev_conf + +fdir_conf->mode will also be checked. + +RX Burst Size +^^^^^^^^^^^^^ + +As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst. +It returns zero if using nb_pkt < 32 as the expected packet number in the receive handler. + +TX Constraint +~~~~~~~~~~~~~ + +Prerequisite +^^^^^^^^^^^^ + +The only prerequisite is related to tx_rs_thresh. +The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST, +but less than or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ. +Consequently, by default the tx_rs_thresh value is in the range 32 to 64. + +Feature not Supported by TX Vector PMD +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TX vPMD only works when txq_flags is set to IXGBE_SIMPLE_FLAGS. + +This means that it does not support TX multi-segment, VLAN offload and TX checksum offload. +The following MACROs are used for these three features: + +* ETH_TXQ_FLAGS_NOMULTSEGS + +* ETH_TXQ_FLAGS_NOVLANOFFL + +* ETH_TXQ_FLAGS_NOXSUMSCTP + +* ETH_TXQ_FLAGS_NOXSUMUDP + +* ETH_TXQ_FLAGS_NOXSUMTCP + + +Sample Application Notes +~~~~~~~~~~~~~~~~~~~~~~~~ + +testpmd +^^^^^^^ + +By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y: + +.. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 + +When CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=n, better performance can be achieved: + +.. 
code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -c 300 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan + +l3fwd +^^^^^ + +When running l3fwd with vPMD, there is one thing to note. +In the configuration, ensure that port_conf.rxmode.hw_ip_checksum=0. +Otherwise, by default, RX vPMD is disabled. + +load_balancer +^^^^^^^^^^^^^ + +As in the case of l3fwd, set port_conf.rxmode.hw_ip_checksum=0 in the configuration to enable vPMD. +In addition, for improved performance, use -bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144. + + +Malicious Driver Detection not Supported +---------------------------------------- + +The Intel x550 series NICs support a feature called MDD (Malicious +Driver Detection) which checks the behavior of the VF driver. +If this feature is enabled, the VF must use the advanced context descriptor +correctly and set the CC (Check Context) bit. +The DPDK PF does not support MDD, but the kernel PF does, so a problem can +occur when a kernel PF is combined with a DPDK VF. If the user enables MDD in +the kernel PF, the DPDK VF will not work because the kernel PF considers the VF +malicious, although the only cause is that the VF does not behave as MDD +requires. +Supporting MDD would have a significant performance impact. DPDK would have to +check whether the advanced context descriptor should be set and set it, and +would have to obtain the header length from the upper layer, because parsing +the packet itself is not acceptable. Supporting MDD is therefore too expensive. +When using a kernel PF with a DPDK VF on x550, make sure to use a kernel +driver that disables MDD or can disable MDD. (Some kernel drivers can use +the command 'insmod ixgbe.ko MDD=0,0' to disable MDD. Some kernel drivers +disable it by default.) + + +Statistics +---------- + +The statistics of ixgbe hardware must be polled regularly in order for them to +remain consistent.
Running a DPDK application without polling the statistics will +cause registers on hardware to count to the maximum value, and "stick" at +that value. + +In order to avoid statistics registers ever reaching the maximum value, +read the statistics from the hardware using ``rte_eth_stats_get()`` or +``rte_eth_xstats_get()``. + +The maximum time between statistics polls that ensures consistent results can +be calculated as follows: + +.. code-block:: c + + max_read_interval = UINT_MAX / max_packets_per_second + max_read_interval = 4294967295 / 14880952 + max_read_interval = 288.6218096127183 (seconds) + max_read_interval = ~4 mins 48 sec. + +In order to ensure valid results, it is recommended to poll at least every 4 minutes. diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst new file mode 100644 index 00000000..49f46263 --- /dev/null +++ b/doc/guides/nics/mlx4.rst @@ -0,0 +1,383 @@ +.. BSD LICENSE + Copyright 2012-2015 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX4 poll mode driver library +============================= + +The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support +for **Mellanox ConnectX-3** and **Mellanox ConnectX-3 Pro** 10/40 Gbps adapters +as well as their virtual functions (VF) in SR-IOV context. + +Information and documentation about this family of adapters can be found on +the `Mellanox website <http://www.mellanox.com>`_. Help is also provided by +the `Mellanox community <http://community.mellanox.com/welcome>`_. + +There is also a `section dedicated to this poll mode driver +<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`_. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI +bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a +PCI driver that allocates one Ethernet device per detected port. + +For this reason, one cannot white/blacklist a single port without also +white/blacklisting the others on the same device. + +Besides its dependency on libibverbs (that implies libmlx4 and associated +kernel support), librte_pmd_mlx4 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. 
+ +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs. + +Features +-------- + +- RSS, also known as RCA, is supported. In this mode the number of + configured RX queues must be a power of two. +- VLAN filtering is supported. +- Link state information is provided. +- Promiscuous mode is supported. +- All multicast mode is supported. +- Multiple MAC addresses (unicast, multicast) can be configured. +- Scattered packets are supported for TX and RX. +- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation. +- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames. +- Secondary process TX is supported. + +Limitations +----------- + +- RSS hash key cannot be modified. +- RSS RETA cannot be configured +- RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be + dissociated. +- Hardware counters are not implemented (they are software counters). +- Secondary process RX is not supported. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx4 itself. + +- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. 
+ +- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**) + + Number of scatter/gather elements (SGEs) per work request (WR). Lowering + this number improves performance but also limits the ability to receive + scattered packets (packets that do not fit a single mbuf). The default + value is a safe tradeoff. + +- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**) + + Amount of data to be inlined during TX operations. Improves latency but + lowers throughput. + +- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**) + + Toggle software counters. No counters are available if this option is + disabled since hardware counters are not supported. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX4_INLINE_RECV_SIZE`` + + A nonzero value enables inline receive for packets up to that size. May + significantly improve performance in some cases but lower it in + others. Requires careful testing. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- The only constraint when RSS mode is requested is to make sure the number + of RX queues is a power of two. This is a hardware requirement. + +- librte_pmd_mlx4 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packets + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +Kernel module parameters +~~~~~~~~~~~~~~~~~~~~~~~~ + +The **mlx4_core** kernel module has several parameters that affect the +behavior and/or the performance of librte_pmd_mlx4. Some of them are described +below. 
+ +- **num_vfs** (integer or triplet, optionally prefixed by device address + strings) + + Create the given number of VFs on the specified devices. + +- **log_num_mgm_entry_size** (integer) + + Device-managed flow steering (DMFS) is required by DPDK applications. It is + enabled by using a negative value, the last four bits of which have a + special meaning. + + - **-1**: force device-managed flow steering (DMFS). + - **-7**: configure optimized steering mode to improve performance with the + following limitation: VLAN filtering is not supported with this mode. + This is the recommended mode when VLAN filtering is not needed. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resource +allocation and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space verbs framework used by librte_pmd_mlx4. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx4. + + It allows slow and privileged operations (context initialization, hardware + resource allocation) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx4** + + Low-level user space driver library for Mellanox ConnectX-3 devices; + it is automatically loaded by libibverbs. + + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel) + + They provide the kernel-side verbs API and low level device drivers that + manage actual hardware initialization and resource sharing with user + space processes. + + Unlike most other PMDs, these modules must remain loaded and bound to + their devices: + + - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices. + - mlx4_en: Ethernet device driver that provides kernel network interfaces. + - mlx4_ib: InfiniBand device driver.
+ - ib_uverbs: user space driver for verbs (entry point for libibverbs). + +- **Firmware update** + + Mellanox OFED releases include firmware updates for ConnectX-3 adapters. + + Because each release provides new features, these updates must be applied to + match the kernel modules and libraries they come with. + +.. note:: + + Both libraries are BSD and GPL licensed. Linux kernel modules are GPL + licensed. + +Currently supported by DPDK: + +- Mellanox OFED **3.1**. +- Firmware version **2.35.5100** and higher. +- Supported architectures: **x86_64** and **POWER8**. + +Getting Mellanox OFED +~~~~~~~~~~~~~~~~~~~~~ + +While these libraries and kernel modules are available on OpenFabrics +Alliance's `website <https://www.openfabrics.org/>`_ and provided by package +managers on most distributions, this PMD requires Ethernet extensions that +may not be supported at the moment (this is a work in progress). + +`Mellanox OFED +<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers>`_ +includes the necessary support and should be used in the meantime. For DPDK, +only libibverbs, libmlx4, mlnx-ofed-kernel packages and firmware updates are +required from that distribution. + +.. note:: + + Several versions of Mellanox OFED are available. Installing the version + this DPDK release was developed and tested against is strongly + recommended. Please check the `prerequisites`_. + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3 +devices managed by librte_pmd_mlx4. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib + + Alternatively if MLNX_OFED is fully installed, the following script can + be run: + + .. code-block:: console + + /etc/init.d/openibd restart + + .. note:: + + User space I/O kernel modules (uio and igb_uio) are not used and do + not have to be loaded. + +#. 
Make sure Ethernet interfaces are in working order and linked to kernel + verbs. Related sysfs entries should be present: + + .. code-block:: console + + ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 + + Example output: + + .. code-block:: console + + eth2 + eth3 + eth4 + eth5 + +#. Optionally, retrieve their PCI bus addresses for whitelisting: + + .. code-block:: console + + { + for intf in eth2 eth3 eth4 eth5; + do + (cd "/sys/class/net/${intf}/device/" && pwd -P); + done; + } | + sed -n 's,.*/\(.*\),-w \1,p' + + Example output: + + .. code-block:: console + + -w 0000:83:00.0 + -w 0000:83:00.0 + -w 0000:84:00.0 + -w 0000:84:00.0 + + .. note:: + + There are only two distinct PCI bus addresses because the Mellanox + ConnectX-3 adapters installed on this system are dual port. + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Start testpmd with basic parameters: + + .. code-block:: console + + testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i + + Example output: + + .. code-block:: console + + [...] 
+ EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51 + EAL: PCI device 0000:84:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1 + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2 + Port 0: 00:02:C9:B5:B7:50 + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2 + Port 1: 00:02:C9:B5:B7:51 + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2 + Port 2: 00:02:C9:B5:BA:B0 + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2 + Port 3: 00:02:C9:B5:BA:B1 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 40000 Mbps - full-duplex + Done + testpmd> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst new file mode 100644 index 00000000..b6f91e6a --- /dev/null +++ b/doc/guides/nics/mlx5.rst @@ -0,0 +1,424 @@ +.. BSD LICENSE + Copyright 2015 6WIND S.A. 
+ + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX5 poll mode driver +===================== + +The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for +**Mellanox ConnectX-4** and **Mellanox ConnectX-4 Lx** families of +10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in +SR-IOV context. + +Information and documentation about these adapters can be found on the +`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the +`Mellanox community <http://community.mellanox.com/welcome>`__. 
+ +There is also a `section dedicated to this poll mode driver +<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Besides its dependency on libibverbs (that implies libmlx5 and associated +kernel support), librte_pmd_mlx5 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Enabling librte_pmd_mlx5 causes DPDK applications to be linked against +libibverbs. + +Features +-------- + +- Multiple TX and RX queues. +- Support for scattered TX and RX frames. +- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. +- Several RSS hash keys, one for each flow type. +- Configurable RETA table. +- Support for multiple MAC addresses. +- VLAN filtering. +- RX VLAN stripping. +- TX VLAN insertion. +- RX CRC stripping configuration. +- Promiscuous mode. +- Multicast promiscuous mode. +- Hardware checksum offloads. +- Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN). +- Secondary process TX is supported. + +Limitations +----------- + +- KVM and VMware ESX SR-IOV modes are not supported yet. +- Inner RSS for VXLAN frames is not supported yet. 
+- Port statistics through software counters only. +- Hardware checksum offloads for VXLAN inner header are not supported yet. +- Secondary process RX is not supported. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx5 itself. + +- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**) + + Number of scatter/gather elements (SGEs) per work request (WR). Lowering + this number improves performance but also limits the ability to receive + scattered packets (packets that do not fit a single mbuf). The default + value is a safe tradeoff. + +- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**) + + Amount of data to be inlined during TX operations. Improves latency. + Can improve PPS performance when PCI backpressure is detected and may be + useful for scenarios involving heavy traffic on many queues. + + Since the additional software logic necessary to handle this mode can + lower performance when there is no backpressure, it is not enabled by + default. + +- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX5_ENABLE_CQE_COMPRESSION`` + + A nonzero value lets ConnectX-4 return smaller completion entries to + improve performance when PCI backpressure is detected. It is most useful + for scenarios involving heavy traffic on many queues. 
+ + Since the additional software logic necessary to handle this mode can + lower performance when there is no backpressure, it is not enabled by + default. + +- ``MLX5_PMD_ENABLE_PADDING`` + + Enables HW packet padding in PCI bus transactions. + + When packet size is cache aligned and CRC stripping is enabled, 4 fewer + bytes are written to the PCI bus. Enabling padding makes such packets + aligned again. + + In cases where PCI bandwidth is the bottleneck, padding can improve + performance by 10%. + + This is disabled by default since this can also decrease performance for + unaligned packet sizes. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- librte_pmd_mlx5 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packets + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resources +allocations and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space Verbs framework used by librte_pmd_mlx5. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx5. + + It allows slow and privileged operations (context initialization, hardware + resources allocations) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx5** + + Low-level user space driver library for Mellanox ConnectX-4 devices, + it is automatically loaded by libibverbs. + + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel) + + They provide the kernel-side Verbs API and low level device drivers that + manage actual hardware initialization and resources sharing with user + space processes. 
+ + Unlike most other PMDs, these modules must remain loaded and bound to + their devices: + + - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and + related Ethernet kernel network devices. + - mlx5_ib: InfiniBand device driver. + - ib_uverbs: user space driver for Verbs (entry point for libibverbs). + +- **Firmware update** + + Mellanox OFED releases include firmware updates for ConnectX-4 adapters. + + Because each release provides new features, these updates must be applied to + match the kernel modules and libraries they come with. + +.. note:: + + Both libraries are BSD and GPL licensed. Linux kernel modules are GPL + licensed. + +Currently supported by DPDK: + +- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending + on usage. + + The following features are supported with version **3.1-1.5.7.1** and + above only: + + - IPv6, UDPv6, TCPv6 RSS. + - RX checksum offloads. + - IBM POWER8. + + The following features are supported with version **3.2-2.0.0.0** and + above only: + + - Flow director. + - RX VLAN stripping. + - TX VLAN insertion. + - RX CRC stripping configuration. + +- Minimum firmware version: + + With MLNX_OFED **3.1-1.0.3**: + + - ConnectX-4: **12.12.1240** + - ConnectX-4 Lx: **14.12.1100** + + With MLNX_OFED **3.1-1.5.7.1**: + + - ConnectX-4: **12.13.0144** + - ConnectX-4 Lx: **14.13.0144** + + With MLNX_OFED **3.2-2.0.0.0**: + + - ConnectX-4: **12.14.2036** + - ConnectX-4 Lx: **14.14.2036** + +Getting Mellanox OFED +~~~~~~~~~~~~~~~~~~~~~ + +While these libraries and kernel modules are available on OpenFabrics +Alliance's `website <https://www.openfabrics.org/>`__ and provided by package +managers on most distributions, this PMD requires Ethernet extensions that +may not be supported at the moment (this is a work in progress). + +`Mellanox OFED +<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__ +includes the necessary support and should be used in the meantime.
For DPDK, +only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are +required from that distribution. + +.. note:: + + Several versions of Mellanox OFED are available. Installing the version + this DPDK release was developed and tested against is strongly + recommended. Please check the `prerequisites`_. + +Notes for testpmd +----------------- + +Compared to librte_pmd_mlx4 that implements a single RSS configuration per +port, librte_pmd_mlx5 supports per-protocol RSS configuration. + +Since ``testpmd`` defaults to IP RSS mode and there is currently no +command-line parameter to enable additional protocols (UDP and TCP as well +as IP), the following commands must be entered from its CLI to get the same +behavior as librte_pmd_mlx4: + +.. code-block:: console + + > port stop all + > port config all rss all + > port start all + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4 +devices managed by librte_pmd_mlx5. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe -a ib_uverbs mlx5_core mlx5_ib + + Alternatively if MLNX_OFED is fully installed, the following script can + be run: + + .. code-block:: console + + /etc/init.d/openibd restart + + .. note:: + + User space I/O kernel modules (uio and igb_uio) are not used and do + not have to be loaded. + +#. Make sure Ethernet interfaces are in working order and linked to kernel + verbs. Related sysfs entries should be present: + + .. code-block:: console + + ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 + + Example output: + + .. code-block:: console + + eth30 + eth31 + eth32 + eth33 + +#. Optionally, retrieve their PCI bus addresses for whitelisting: + + .. code-block:: console + + { + for intf in eth2 eth3 eth4 eth5; + do + (cd "/sys/class/net/${intf}/device/" && pwd -P); + done; + } | + sed -n 's,.*/\(.*\),-w \1,p' + + Example output: + + .. 
code-block:: console + + -w 0000:05:00.1 + -w 0000:06:00.0 + -w 0000:06:00.1 + -w 0000:05:00.0 + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Start testpmd with basic parameters: + + .. code-block:: console + + testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:05:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe + EAL: PCI device 0000:05:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff + EAL: PCI device 0000:06:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa + EAL: PCI device 0000:06:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2 + Port 0: E4:1D:2D:E7:0C:FE + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2 + Port 1: 
E4:1D:2D:E7:0C:FF + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2 + Port 2: E4:1D:2D:E7:0C:FA + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2 + Port 3: E4:1D:2D:E7:0C:FB + Checking link statuses... + Port 0 Link Up - speed 40000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst new file mode 100644 index 00000000..dfc36836 --- /dev/null +++ b/doc/guides/nics/nfp.rst @@ -0,0 +1,265 @@ +.. BSD LICENSE + Copyright(c) 2015 Netronome Systems, Inc. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +NFP poll mode driver library +============================ + +Netronome's sixth generation of flow processors packs 216 programmable +cores and over 100 hardware accelerators that uniquely combine packet, +flow, security and content processing in a single device that scales +up to 400 Gbps. + +This document explains how to use DPDK with the Netronome Poll Mode +Driver (PMD) supporting Netronome's Network Flow Processor 6xxx +(NFP-6xxx). + +Currently the driver supports virtual functions (VFs) only. + +Dependencies +------------ + +Before using Netronome's DPDK PMD, some NFP-6xxx configuration, +which is not related to DPDK, is required. The system requires +installation of **Netronome's BSP (Board Support Package)** which includes +Linux drivers, programs and libraries. + +If you have an NFP-6xxx device you should already have the code and +documentation for doing this configuration. Contact +**support@netronome.com** to obtain the latest available firmware. + +The NFP Linux kernel drivers (including the required PF driver for the +NFP) are available on GitHub at +**https://github.com/Netronome/nfp-drv-kmods** along with build +instructions. + +DPDK runs in userspace and PMDs use the Linux kernel UIO interface to +allow access to physical devices from userspace. The NFP PMD requires +a separate UIO driver, **nfp_uio**, to perform correct +initialization. This driver is part of Netronome's BSP and it is +equivalent to Intel's igb_uio driver. 
+ +Building the software +--------------------- + +Netronome's PMD code is provided in the **drivers/net/nfp** directory. +Because of its dependency on Netronome's BSP, the driver is disabled by default +in a DPDK build that uses the **common_linuxapp** configuration file. To enable the +driver, or to add NFP support when using another configuration file, +set this variable: + +- **CONFIG_RTE_LIBRTE_NFP_PMD=y** + +Once DPDK is built all the DPDK apps and examples include support for +the NFP PMD. + + +System configuration +-------------------- + +Using the NFP PMD is no different from using other PMDs. Usual steps are: + +#. **Configure hugepages:** All major Linux distributions have the hugepages + functionality enabled by default. By default this allows the system to + work with transparent hugepages. In this case, however, some hugepages need to + be created/reserved for use with the DPDK through the hugetlbfs file system. + First the virtual file system needs to be mounted: + + .. code-block:: console + + mount -t hugetlbfs none /mnt/hugetlbfs + + The command uses the common mount point for this file system and it needs to + be created if necessary. + + Configuring hugepages is performed via sysfs: + + .. code-block:: console + + /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + This sysfs file is used to specify the number of hugepages to reserve. + For example: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + This will reserve 2GB of memory using 1024 2MB hugepages. The file may be + read to see if the operation was performed correctly: + + .. code-block:: console + + cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + The number of unused hugepages may also be inspected. + + Before executing the DPDK app it should match the value of nr_hugepages. + + .. 
code-block:: console + + cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages + + The hugepages reservation should be performed at system initialization and + it is usual to use a kernel parameter for configuration. If the reservation + is attempted on a busy system it will likely fail. Reserving memory for + hugepages may be done by adding the following to the grub kernel command line: + + .. code-block:: console + + default_hugepagesz=2M hugepagesz=2M hugepages=1024 + + This will reserve 2 GB of memory using 2 MB huge pages. + + Finally, for a NUMA system the allocation needs to be made on the correct + NUMA node. In a DPDK app there is a master core which will (usually) perform + memory allocation. It is important that some of the hugepages are reserved + on the NUMA memory node where the network device is attached. This is because + of a restriction in DPDK by which TX and RX descriptor rings must be created + on the master core. + + Per-node allocation of hugepages may be inspected and controlled using sysfs. + For example: + + .. code-block:: console + + cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages + + For a NUMA system there will be a specific hugepage directory per node + allowing control of hugepage reservation. A common problem may occur when + hugepages reservation is performed after the system has been working for + some time. Configuration using the global sysfs hugepage interface will + succeed but the per-node allocations may be unsatisfactory. + + The number of hugepages that need to be reserved depends on how the app uses + TX and RX descriptors, and packet mbufs. + +#. **Enable SR-IOV on the NFP-6xxx device:** The current NFP PMD works with + Virtual Functions (VFs) on an NFP device. Make sure that one of the Physical + Function (PF) drivers from the above GitHub repository is installed and + loaded. + + Virtual Functions need to be enabled before they can be used with the PMD. 
+ Before enabling the VFs it is useful to obtain information about the + current NFP PCI device detected by the system: + + .. code-block:: console + + lspci -d19ee: + + Now, for example, configure two virtual functions on an NFP-6xxx device + whose PCI system identity is "0000:03:00.0": + + .. code-block:: console + + echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs + + The result of this command may be shown using lspci again: + + .. code-block:: console + + lspci -d19ee: -k + + Two new PCI devices should appear in the output of the above command. The + -k option shows the device driver, if any, that devices are bound to. + Depending on the modules loaded at this point the new PCI devices may be + bound to the nfp_netvf driver. + +#. **To install the uio kernel module (manually):** All major Linux + distributions have support for this kernel module so it is straightforward + to install it: + + .. code-block:: console + + modprobe uio + + The module should now be listed by the lsmod command. + +#. **To install the nfp_uio kernel module (manually):** This module supports + NFP-6xxx devices through the UIO interface. + + This module is part of Netronome's BSP and it should be available when the + BSP is installed. + + .. code-block:: console + + modprobe nfp_uio + + The module should now be listed by the lsmod command. + + Depending on which NFP modules are loaded, nfp_uio may be automatically + bound to the NFP PCI devices by the system. Otherwise the binding needs + to be done explicitly. This is the case when nfp_netvf, the Linux kernel + driver for NFP VFs, was loaded when VFs were created. As described later + in this document this configuration may also be performed using scripts + provided by Netronome's BSP. + + First the device needs to be unbound, for example from the nfp_netvf + driver: + + .. 
code-block:: console + + echo 0000:03:08.0 > /sys/bus/pci/devices/0000:03:08.0/driver/unbind + + lspci -d19ee: -k + + The output of lspci should now show that 0000:03:08.0 is not bound to + any driver. + + The next step is to add the NFP PCI ID to the NFP UIO driver: + + .. code-block:: console + + echo 19ee 6003 > /sys/bus/pci/drivers/nfp_uio/new_id + + And then to bind the device to the nfp_uio driver: + + .. code-block:: console + + echo 0000:03:08.0 > /sys/bus/pci/drivers/nfp_uio/bind + + lspci -d19ee: -k + + lspci should show that the device is bound to the nfp_uio driver. + +#. **Using tools from Netronome's BSP to install and bind modules:** DPDK provides + scripts which are useful for installing the UIO modules and for binding the + right device to those modules, which avoids doing so manually. However, these scripts + do not support Netronome's UIO driver. Along with drivers, the BSP installs + slightly modified versions of those DPDK scripts that support Netronome's UIO driver. + + Those specific scripts can be found in Netronome's BSP installation directory. + Refer to BSP documentation for more information. + + * **setup.sh** + * **dpdk_nic_bind.py** + + Configuration may be performed by running setup.sh which invokes + dpdk_nic_bind.py as needed. Executing setup.sh will display a menu of + configuration options. diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst new file mode 100644 index 00000000..ed116e31 --- /dev/null +++ b/doc/guides/nics/overview.rst @@ -0,0 +1,149 @@ +.. BSD LICENSE + Copyright 2016 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Overview of Networking Drivers +============================== + +The networking drivers may be classified in two categories: + +- physical for real devices +- virtual for emulated devices + +Some physical devices may be shaped through a virtual layer as for +SR-IOV. +The interface seen in the virtual environment is a VF (Virtual Function). + +The ethdev layer exposes an API to use the networking functions +of these devices. +The bottom half part of ethdev is implemented by the drivers. +Thus some features may not be implemented. + +There are more differences between drivers regarding some internal properties, +portability or even documentation availability. +Most of these differences are summarized below. + +.. _table_net_pmd_features: + +.. 
raw:: html + + <style> + table#id1 th { + font-size: 80%; + white-space: pre-wrap; + text-align: center; + vertical-align: top; + padding: 3px; + } + table#id1 th:first-child { + vertical-align: bottom; + } + table#id1 td { + font-size: 70%; + padding: 1px; + } + table#id1 td:first-child { + padding-left: 1em; + } + </style> + +.. table:: Features availability in networking drivers + + ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = + Feature a b b b c e e e i i i i i i i i i i f f f f m m m n n p r s v v v v x + f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u c i z h i i m e + p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l a n e o r r x n + a 2 2 d b 0 c e e e e v b b b b 0 0 0 0 4 5 p l p g d s t t n v + c x x i e 0 . v v f e e e e k k k k e a t i i e i + k v n . f f . v v . v v t o o t r + e f g . . . f f . f f a . 3 t + t v v v v v v 2 v + e e e e e e e + c c c c c c c + ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = + speed capabilities + link status X X X X X X X X X X X X X X X X X X + link status event X X X X X X X X X X X + queue status event X + Rx interrupt X X X X X X X X X X X X X X X + queue start/stop X X X X X X X X X X X X X X X X X X + MTU update X X X X X X X X X X + jumbo frame X X X X X X X X X X X X X X X X X X X X + scattered Rx X X X X X X X X X X X X X X X X X X X X X + LRO X X X X + TSO X X X X X X X X X X X X X X X X + promiscuous mode X X X X X X X X X X X X X X X X X X X X + allmulticast mode X X X X X X X X X X X X X X X X X X X + unicast MAC filter X X X X X X X X X X X X X X X X X X X X + multicast MAC filter X X X X X X X X X X X X X + RSS hash X X X X X X X X X X X X X X X X X X + RSS key update X X X X X X X X X X X X X X X + RSS reta update X X X X X X X X X X X X X X X + VMDq X X X X X X X + SR-IOV X X X X X X X X X + DCB X X X X X + VLAN filter X X X X X X X X X X X X X X X X X X + ethertype filter X X X X X + n-tuple filter X X 
X + SYN filter X X X + tunnel filter X X X X + flexible filter X + hash filter X X X X + flow director X X X X X + flow control X X X X X X X + rate limitation X X + traffic mirroring X X X X + CRC offload X X X X X X X X X X X X X X X + VLAN offload X X X X X X X X X X X X X X X + QinQ offload X X X X X X X + L3 checksum offload X X X X X X X X X X X X X X X X + L4 checksum offload X X X X X X X X X X X X X X X X + inner L3 checksum X X X X X X + inner L4 checksum X X X X X X + packet type parsing X X X X X X X X X X X X X X + timesync X X X X X + basic stats X X X X X X X X X X X X X X X X X X X X X X X X X X X + extended stats X X X X X X X X X X X X X X X X X + stats per queue X X X X X X X X X X X X + EEPROM dump X X X + registers dump X X X X X X + multiprocess aware X X X X X X X X X X X X X X X + BSD nic_uio X X X X X X X X X X X X X X X X X X X + Linux UIO X X X X X X X X X X X X X X X X X X X X X X + Linux VFIO X X X X X X X X X X X X X X X X X X X + other kdrv X X X + ARMv7 X X X + ARMv8 X X X + Power8 X X X + TILE-Gx X + x86-32 X X X X X X X X X X X X X X X X X X X X X X X X + x86-64 X X X X X X X X X X X X X X X X X X X X X X X X X X X + usage doc X X X X X X X X X + design doc + perf doc + ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst new file mode 100644 index 00000000..aa48d339 --- /dev/null +++ b/doc/guides/nics/pcap_ring.rst @@ -0,0 +1,271 @@ +.. BSD LICENSE + Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Libpcap and Ring Based Poll Mode Drivers +======================================== + +In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, +the DPDK also includes two pure-software PMDs. These two drivers are: + +* A libpcap-based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, + both from files on disk and from physical NIC devices using standard Linux kernel drivers. + +* A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) + to be accessed using the PMD APIs, as though they were physical NICs. + +.. note:: + + The libpcap-based PMD is disabled by default in the build configuration files, + owing to an external dependency on the libpcap development files which must be installed on the board. 
+ Once the libpcap development files are installed, + the library can be enabled by setting CONFIG_RTE_LIBRTE_PMD_PCAP=y and recompiling the DPDK. + +Using the Drivers from the EAL Command Line +------------------------------------------- + +For ease of use, the DPDK EAL has also been extended to allow pseudo-Ethernet devices, +using one or more of these drivers, +to be created at application startup time during EAL initialization. + +To do so, the --vdev= parameter must be passed to the EAL. +This parameter takes options that allow ring and pcap-based Ethernet devices to be allocated and used transparently by the application. +This can be used, for example, for testing on a virtual machine where there are no Ethernet ports. + +Libpcap-based PMD +~~~~~~~~~~~~~~~~~ + +Pcap-based devices can be created using the virtual device --vdev option. +The device name must start with the eth_pcap prefix followed by numbers or letters. +The name must be unique for each device. Each device can have multiple stream options and multiple devices can be used. +Multiple device definitions can be arranged using multiple --vdev. +Device name and stream options must be separated by commas as shown below: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c f -n 4 --vdev 'eth_pcap0,stream_opt0=..,stream_opt1=..' --vdev='eth_pcap1,stream_opt0=..' + +Device Streams +^^^^^^^^^^^^^^ + +Stream definitions can be specified and combined in several ways, as long as the following two rules are respected: + +* A device is provided with two different streams - reception and transmission. + +* A device is provided with one network interface name used for reading and writing packets. + +The different stream types are: + +* rx_pcap: Defines a reception stream based on a pcap file. + The driver reads each packet within the given pcap file as if it was receiving it from the wire. + The value is a path to a valid pcap file. + + rx_pcap=/path/to/file.pcap + +* tx_pcap: Defines a transmission stream based on a pcap file. 
+ The driver writes each received packet to the given pcap file. + The value is a path to a pcap file. + The file is overwritten if it already exists and it is created if it does not. + + tx_pcap=/path/to/file.pcap + +* rx_iface: Defines a reception stream based on a network interface name. + The driver reads packets coming from the given interface using the Linux kernel driver for that interface. + The value is an interface name. + + rx_iface=eth0 + +* tx_iface: Defines a transmission stream based on a network interface name. + The driver sends packets to the given interface using the Linux kernel driver for that interface. + The value is an interface name. + + tx_iface=eth0 + +* iface: Defines a device mapping a network interface. + The driver both reads and writes packets from and to the given interface. + The value is an interface name. + + iface=eth0 + +Examples of Usage +^^^^^^^^^^^^^^^^^ + +Read packets from one pcap file and write them to another: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained + +Read packets from a network interface and write them to a pcap file: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_iface=eth0,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained + +Read packets from a pcap file and write them to a network interface: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_iface=eth1' -- --port-topology=chained + +Forward packets through two network interfaces: + +.. 
code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,iface=eth0' --vdev='eth_pcap1,iface=eth1' + +Using libpcap-based PMD with the testpmd Application +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +One of the first things that testpmd does before starting to forward packets is to flush the RX streams +by reading the first 512 packets on every RX stream and discarding them. +When using a libpcap-based PMD this behavior can be turned off using the following command line option: + +.. code-block:: console + + --no-flush-rx + +It is also available in the runtime command line: + +.. code-block:: console + + set flush_rx on/off + +It is useful for the case where the rx_pcap is being used and no packets are meant to be discarded. +Otherwise, the first 512 packets from the input pcap file will be discarded by the RX flushing operation. + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_pcap0,rx_pcap=/path/to/file_rx.pcap,tx_pcap=/path/to/file_tx.pcap' -- --port-topology=chained --no-flush-rx + + +Rings-based PMD +~~~~~~~~~~~~~~~ + +To run a DPDK application on a machine without any Ethernet devices, a pair of ring-based rte_ethdevs can be used as below. +The device names passed to the --vdev option must start with eth_ring and take no additional parameters. +Multiple devices may be specified, separated by commas. + +.. code-block:: console + + ./testpmd -c E -n 4 --vdev=eth_ring0 --vdev=eth_ring1 -- -i + EAL: Detected lcore 1 as core 1 on socket 0 + ... + + Interactive-mode selected + Configuring Port 0 (socket 0) + Configuring Port 1 (socket 0) + Checking link statuses... 
+ Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + + testpmd> start tx_first + io packet forwarding - CRC stripping disabled - packets/burst=16 + nb forwarding cores=1 - nb forwarding ports=2 + RX queues=1 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=8 hthresh=8 wthresh=4 + TX queues=1 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=36 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + + testpmd> stop + Telling cores to stop... + Waiting for lcores to finish... + +.. image:: img/forward_stats.* + +.. code-block:: console + + +++++++++++++++ Accumulated forward statistics for allports++++++++++ + RX-packets: 462384736 RX-dropped: 0 RX-total: 462384736 + TX-packets: 462384768 TX-dropped: 0 TX-total: 462384768 + +++++++++++++++++++++++++++++++++++++++++++++++++++++ + + Done. + + +Using the Poll Mode Driver from an Application +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Both drivers can provide similar APIs to allow the user to create a PMD, that is, +rte_ethdev structure, instances at run-time in the end-application, +for example, using rte_eth_from_rings() or rte_eth_from_pcaps() APIs. +For the rings-based PMD, this functionality could be used, for example, +to allow data exchange between cores using rings to be done in exactly the +same way as sending or receiving packets from an Ethernet device. +For the libpcap-based PMD, it allows an application to open one or more pcap files +and use these as a source of packet input to the application. + +Usage Examples +^^^^^^^^^^^^^^ + +To create two pseudo-Ethernet ports where all traffic sent to a port is looped back +for reception on the same port (error handling omitted for clarity): + +.. 
code-block:: c + + #define RING_SIZE 256 + #define NUM_RINGS 2 + #define SOCKET0 0 + + struct rte_ring *ring[NUM_RINGS]; + int port0, port1; + + ring[0] = rte_ring_create("R0", RING_SIZE, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ); + ring[1] = rte_ring_create("R1", RING_SIZE, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ); + + /* create two ethdev's */ + + port0 = rte_eth_from_rings("eth_ring0", ring, NUM_RINGS, ring, NUM_RINGS, SOCKET0); + port1 = rte_eth_from_rings("eth_ring1", ring, NUM_RINGS, ring, NUM_RINGS, SOCKET0); + + +To create two pseudo-Ethernet ports where the traffic is switched between them, +that is, traffic sent to port 0 is read back from port 1 and vice-versa, +the final two lines could be changed as below: + +.. code-block:: c + + port0 = rte_eth_from_rings("eth_ring0", &ring[0], 1, &ring[1], 1, SOCKET0); + port1 = rte_eth_from_rings("eth_ring1", &ring[1], 1, &ring[0], 1, SOCKET0); + +This type of configuration could be useful in a pipeline model, for example, +where one may want to have inter-core communication using pseudo Ethernet devices rather than raw rings, +for reasons of API consistency. + +Enqueuing and dequeuing items from an rte_ring using the rings-based PMD may be slower than using the native rings API. +This is because DPDK Ethernet drivers make use of function pointers to call the appropriate enqueue or dequeue functions, +while the rte_ring specific functions are direct function calls in the code and are often inlined by the compiler. + + Once an ethdev has been created, for either a ring or a pcap-based PMD, + it should be configured and started in the same way as a regular Ethernet device, that is, + by calling rte_eth_dev_configure() to set the number of receive and transmit queues, + then calling rte_eth_rx_queue_setup() / tx_queue_setup() for each of those queues and + finally calling rte_eth_dev_start() to allow transmission and reception of packets to begin. 
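The switched two-port topology described above (traffic sent on port 0 is read back from port 1) can be modeled without DPDK. The following is a toy sketch only: every `toy_*` name is hypothetical, the fixed-size FIFO merely stands in for an rte_ring, and integers stand in for packets. It is not the ring PMD implementation; it only illustrates how swapping the RX and TX rings between two ports cross-connects them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8  /* toy capacity, unrelated to RING_SIZE above */

/* Toy FIFO standing in for an rte_ring (hypothetical, not a DPDK type). */
struct toy_ring {
    int items[TOY_RING_SIZE];
    size_t head;   /* index of the next item to dequeue */
    size_t count;  /* number of items currently queued */
};

/* Toy port: receives from one ring and transmits to another. */
struct toy_port {
    struct toy_ring *rx;
    struct toy_ring *tx;
};

/* Append an item; returns 0 on success, -1 if the ring is full. */
static int toy_enqueue(struct toy_ring *r, int item)
{
    assert(r != NULL);
    if (r->count == TOY_RING_SIZE)
        return -1;
    r->items[(r->head + r->count) % TOY_RING_SIZE] = item;
    r->count++;
    return 0;
}

/* Remove the oldest item; returns 0 on success, -1 if the ring is empty. */
static int toy_dequeue(struct toy_ring *r, int *item)
{
    assert(r != NULL && item != NULL);
    if (r->count == 0)
        return -1;
    *item = r->items[r->head];
    r->head = (r->head + 1) % TOY_RING_SIZE;
    r->count--;
    return 0;
}

static int toy_tx(struct toy_port *p, int pkt)  { return toy_enqueue(p->tx, pkt); }
static int toy_rx(struct toy_port *p, int *pkt) { return toy_dequeue(p->rx, pkt); }

/*
 * Returns 1 if a packet sent on port0 shows up on port1 and not on port0.
 * The ring assignment mirrors the switched example: port0 receives from
 * ring 0 and transmits to ring 1, port1 does the opposite.
 */
int toy_cross_connect_works(void)
{
    struct toy_ring r0 = {0}, r1 = {0};
    struct toy_port port0 = { .rx = &r0, .tx = &r1 };
    struct toy_port port1 = { .rx = &r1, .tx = &r0 };
    int pkt = 0;

    if (toy_tx(&port0, 42) != 0)
        return 0;
    if (toy_rx(&port0, &pkt) == 0)   /* nothing should loop back to port0 */
        return 0;
    if (toy_rx(&port1, &pkt) != 0 || pkt != 42)
        return 0;
    return 1;
}
```

Giving both ports the same ring for RX and TX would instead reproduce the loopback behavior of the first example, where traffic sent to a port is received on that same port.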
diff --git a/doc/guides/nics/szedata2.rst b/doc/guides/nics/szedata2.rst new file mode 100644 index 00000000..741b4008 --- /dev/null +++ b/doc/guides/nics/szedata2.rst @@ -0,0 +1,150 @@ +.. BSD LICENSE + Copyright 2015 - 2016 CESNET + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of CESNET nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +SZEDATA2 poll mode driver library +================================= + +The SZEDATA2 poll mode driver library implements support for cards from COMBO +family (**COMBO-80G**, **COMBO-100G**). 
+The SZEDATA2 PMD uses the interface provided by the libsze2 library to communicate +with COMBO cards over the sze2 layer. + +More information about the family of +`COMBO cards <https://www.liberouter.org/technologies/cards/>`_ +and the technology used +(`NetCOPE platform <https://www.liberouter.org/technologies/netcope/>`_) can be +found on the `Liberouter website <https://www.liberouter.org/>`_. + +.. note:: + + This driver has external dependencies. + Therefore it is disabled in the default configuration files. + It can be enabled by setting ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2=y`` + and recompiling. + +.. note:: + + Currently the driver is supported only on x86_64 architectures. + Only x86_64 versions of the external libraries are provided. + +Prerequisites +------------- + +This PMD requires kernel modules which are responsible for initialization and +allocation of the resources needed for the sze2 layer to function. +Communication between the PMD and the kernel modules is mediated by the libsze2 library. +These kernel modules and the library are not part of DPDK and must be installed +separately: + +* **libsze2 library** + + The library provides an API for initialization of sze2 transfers, receiving and + transmitting data segments. + +* **Kernel modules** + + * combov3 + * szedata2_cv3 + + The kernel modules manage initialization of hardware, allocation and + sharing of resources for user space applications. + +Information about getting the dependencies can be found `here +<https://www.liberouter.org/technologies/netcope/access-to-libsze2-library/>`_. + +Configuration +------------- + +These configuration options can be modified before compilation in the +``.config`` file: + +* ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2`` default value: **n** + + Value **y** enables compilation of the szedata2 PMD. + +* ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS`` default value: **0** + + This option defines the type of firmware address space.
+ The currently supported value is: + + * **0** for firmwares: + + * NIC_100G1_LR4 + * HANIC_100G1_LR4 + * HANIC_100G1_SR10 + +Using the SZEDATA2 PMD +---------------------- + +From DPDK version 16.04, the type of the SZEDATA2 PMD is changed to PMD_PDEV. +The SZEDATA2 device is automatically recognized during EAL initialization. +No special command line options are needed. + +The kernel modules have to be loaded before running the DPDK application. + +Example of usage +---------------- + +Read packets from receive channels 0 and 1 and write them to transmit +channels 0 and 1: + +.. code-block:: console + + $RTE_TARGET/app/testpmd -c 0xf -n 2 \ + -- --port-topology=chained --rxq=2 --txq=2 --nb-cores=2 -i -a + +Example output: + +.. code-block:: console + + [...] + EAL: PCI device 0000:06:00.0 on NUMA socket -1 + EAL: probe driver: 1b26:c1c1 rte_szedata2_pmd + PMD: Initializing szedata2 device (0000:06:00.0) + PMD: SZEDATA2 path: /dev/szedataII0 + PMD: Available DMA channels RX: 8 TX: 8 + PMD: resource0 phys_addr = 0xe8000000 len = 134217728 virt addr = 7f48f8000000 + PMD: szedata2 device (0000:06:00.0) successfully initialized + Interactive-mode selected + Auto-start selected + Configuring Port 0 (socket 0) + Port 0: 00:11:17:00:00:00 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Done + Start automatic packet forwarding + io packet forwarding - CRC stripping disabled - packets/burst=32 + nb forwarding cores=2 - nb forwarding ports=1 + RX queues=2 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX queues=2 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + testpmd> diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst new file mode 100644 index 00000000..1e3f1ade --- /dev/null +++ b/doc/guides/nics/vhost.rst @@ -0,0 +1,110 @@ +.. BSD LICENSE + Copyright(c) 2016 IGEL Co., Ltd.. All rights reserved.
+ All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of IGEL Co., Ltd. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Poll Mode Driver that wraps vhost library +========================================= + +This PMD is a thin wrapper around the DPDK vhost library. +The user can handle virtqueues in the same way as a normal DPDK port. + +Vhost Implementation in DPDK +---------------------------- + +Please refer to Chapter "Vhost Library" of the *DPDK Programmer's Guide* for details of vhost.
+ +Features and Limitations of vhost PMD +------------------------------------- + +Currently, the vhost PMD provides the basic functionality of packet reception, transmission and event handling. + +* It supports multiple queues. + +* It supports ``RTE_ETH_EVENT_INTR_LSC`` and ``RTE_ETH_EVENT_QUEUE_STATE`` events. + +* It supports Port Hotplug functionality. + +* RX/TX does not need to be stopped when the user wants to stop a guest or a virtio-net driver on the guest. + +Vhost PMD arguments +------------------- + +The user can specify the arguments below in the ``--vdev`` option. + +#. ``iface``: + + It is used to specify a path to connect to a QEMU virtio-net device. + +#. ``queues``: + + It is used to specify the number of queues the virtio-net device has. + (Default: 1) + +Vhost PMD event handling +------------------------ + +This section describes how to handle vhost PMD events. + +The user can register an event callback handler with ``rte_eth_dev_callback_register()``. +The registered callback handler will be invoked with one of the event types below. + +#. ``RTE_ETH_EVENT_INTR_LSC``: + + The link status of the port has changed. + +#. ``RTE_ETH_EVENT_QUEUE_STATE``: + + The status of one or more queues has changed. Call ``rte_eth_vhost_get_queue_event()`` in the callback handler. + Because several status changes may raise only one event, call the function repeatedly until it returns a negative value. + +Vhost PMD with testpmd application +---------------------------------- + +This section demonstrates the vhost PMD with the testpmd DPDK sample application. + +#. Launch testpmd with the vhost PMD: + + .. code-block:: console + + ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i + + Other basic DPDK preparations, such as enabling hugepages, are not listed here. + Please refer to the *DPDK Getting Started Guide* for detailed instructions. + +#. Launch QEMU: + + ..
code-block:: console + + qemu-system-x86_64 <snip> \ + -chardev socket,id=chr0,path=/tmp/sock0 \ + -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \ + -device virtio-net-pci,netdev=net0 + + This command attaches one virtio-net device to the QEMU guest. + After the initialization between QEMU and the DPDK vhost library is done, the link status of the port will be up. diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst new file mode 100644 index 00000000..06ca433a --- /dev/null +++ b/doc/guides/nics/virtio.rst @@ -0,0 +1,213 @@ +.. BSD LICENSE + Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Poll Mode Driver for Emulated Virtio NIC +======================================== + +Virtio is a para-virtualization framework initiated by IBM, and supported by the KVM hypervisor. +In the Data Plane Development Kit (DPDK), +we provide a virtio Poll Mode Driver (PMD) as a software solution, in contrast to the SR-IOV hardware solution, +for fast guest VM to guest VM communication and guest VM to host communication. + +Vhost is a kernel acceleration module for the virtio qemu backend. +The DPDK extends kni to support the vhost raw socket interface, +which enables vhost to directly read/write packets from/to a physical port. +With this enhancement, virtio could achieve quite promising performance. + +In a future release, we will also enhance the vhost backend +to release the peak performance of the virtio PMD. + +For basic qemu-KVM installation and other Intel EM poll mode drivers in the guest VM, +please refer to Chapter "Driver for VM Emulated Devices". + +In this chapter, we will demonstrate usage of the virtio PMD with two back ends: +the standard qemu vhost back end and the vhost kni back end. + +Virtio Implementation in DPDK +----------------------------- + +For details about the virtio spec, refer to the Virtio PCI Card Specification written by Rusty Russell. + +As a PMD, virtio provides packet reception and transmission callbacks virtio_recv_pkts and virtio_xmit_pkts.
+ +In virtio_recv_pkts, the entries with an index in the range [vq->vq_used_cons_idx, vq->vq_ring.used->idx) of the vring are available for virtio to burst out. + +In virtio_xmit_pkts, the same index range of the vring is available for virtio to clean. +Virtio enqueues the packets to be transmitted into the vring, advances vq->vq_ring.avail->idx, +and then notifies the host back end if necessary. + +Features and Limitations of virtio PMD +-------------------------------------- + +In this release, the virtio PMD driver provides the basic functionality of packet reception and transmission. + +* It supports merge-able buffers per packet when receiving packets and scattered buffers per packet + when transmitting packets. The supported packet size is from 64 to 1518 bytes. + +* It supports multicast packets and promiscuous mode. + +* The descriptor number for the RX/TX queue is hard-coded to be 256 by qemu. + If given a different descriptor number by the upper application, + the virtio PMD generates a warning and falls back to the hard-coded value. + +* MAC/VLAN filter features are supported; negotiation with the vhost/backend is needed to support them. + When the backend can't support the VLAN filter, the virtio application on the guest should disable the VLAN filter to make sure + the virtio port is configured correctly. For example, specify '--disable-hw-vlan' on the testpmd command line. + +* RTE_PKTMBUF_HEADROOM should be defined larger than sizeof(struct virtio_net_hdr), which is 10 bytes. + +* Virtio does not support runtime configuration. + +* Virtio supports Link State interrupt. + +* Virtio supports software vlan stripping and inserting. + +* Virtio supports using port IO to get PCI resource when uio/igb_uio module is not available. + +Prerequisites +------------- + +The following prerequisites apply: + +* In the BIOS, turn VT-x and VT-d on + +* Linux kernel with KVM module; vhost module loaded and ioeventfd supported. + Qemu standard backend without vhost support isn't tested, and probably isn't supported.
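As a side note to the half-open used-ring range ``[vq->vq_used_cons_idx, vq->vq_ring.used->idx)`` described under "Virtio Implementation in DPDK", the arithmetic can be made concrete with a small sketch. It assumes, as is usual for virtio rings, that both indices are free-running 16-bit counters and that the ring size is a power of two. The function names are hypothetical and are not part of the virtio PMD.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries in the half-open range [cons_idx, used_idx).
 * The result is truncated to 16 bits, so it stays correct when
 * used_idx has wrapped past 65535 and cons_idx has not yet done so. */
uint16_t toy_vring_nused(uint16_t used_idx, uint16_t cons_idx)
{
    return (uint16_t)(used_idx - cons_idx);
}

/* Ring slot addressed by a free-running index.
 * ring_size is assumed to be a non-zero power of two. */
uint16_t toy_vring_slot(uint16_t idx, uint16_t ring_size)
{
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    return (uint16_t)(idx & (ring_size - 1));
}
```

For example, with a consumer index of 65534 and a used index of 2, four entries are pending, and they occupy slots 254, 255, 0 and 1 of a 256-entry ring.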
+ +Virtio with kni vhost Back End +------------------------------ + +This section demonstrates an example kni vhost back end setup for Phy-VM communication. + +.. _figure_host_vm_comms: + +.. figure:: img/host_vm_comms.* + + Host2VM Communication Example Using kni vhost Back End + + +Host2VM communication example + +#. Load the kni kernel module: + + .. code-block:: console + + insmod rte_kni.ko + + Other basic DPDK preparations, such as enabling hugepages and uio port binding, are not listed here. + Please refer to the *DPDK Getting Started Guide* for detailed instructions. + +#. Launch the kni user application: + + .. code-block:: console + + examples/kni/build/app/kni -c 0xf -n 4 -- -p 0x1 -P --config="(0,1,3)" + + This command generates one network device vEth0 for the physical port. + If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on. + + For each physical port, kni creates two user threads. + One thread loops to fetch packets from the physical NIC port into the kni receive queue. + The other user thread loops to send packets in the kni transmit queue. + + For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue, + places them onto kni's raw socket's queue and wakes up the vhost kernel thread to exchange packets with the virtio virt queue. + + For more details about kni, please refer to :ref:`kni`. + +#. Enable the kni raw socket functionality for the specified physical NIC port, + get the generated file descriptor and set it in the qemu command line parameter. + Always remember to set ioeventfd_on and vhost_on. + + Example: + + ..
code-block:: console + + echo 1 > /sys/class/net/vEth0/sock_en + fd=`cat /sys/class/net/vEth0/sock_fd` + exec qemu-system-x86_64 -enable-kvm -cpu host \ + -m 2048 -smp 4 -name dpdk-test1-vm1 \ + -drive file=/data/DPDKVMS/dpdk-vm.img \ + -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \ + -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \ + -vnc :1 -daemonize + + In the above example, virtio port 0 in the guest VM will be associated with vEth0, which in turn corresponds to a physical port. + Received packets therefore come from vEth0, and transmitted packets are sent to vEth0. + +#. In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application. + When the virtio port in the guest bursts rx, it gets packets from the raw socket's receive queue. + When the virtio port bursts tx, it sends packets to the tx_q. + + .. code-block:: console + + modprobe uio + echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages + modprobe uio_pci_generic + python tools/dpdk_nic_bind.py -b uio_pci_generic 00:03.0 + + We use testpmd as the forwarding application in this example. + + .. figure:: img/console.* + + Running testpmd + +#. Use an IXIA packet generator to inject a packet stream into the KNI physical port. + + The packet reception and transmission flow path is: + + IXIA packet generator->82599 PF->KNI rx queue->KNI raw socket queue->Guest VM virtio port 0 rx burst->Guest VM virtio port 0 tx burst-> KNI tx queue->82599 PF-> IXIA packet generator + +Virtio with qemu virtio Back End +-------------------------------- + +.. _figure_host_vm_comms_qemu: + +.. figure:: img/host_vm_comms_qemu.* + + Host2VM Communication Example Using qemu vhost Back End + + +..
code-block:: console + + qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 \ + -mem-path /dev/hugepages -mem-prealloc \ + -drive file=/data/DPDKVMS/dpdk-vm1 \ + -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \ + -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \ + -device pci-assign,host=04:10.1 + +In this example, the packet reception flow path is: + + IXIA packet generator->82599 PF->Linux Bridge->TAP0's socket queue-> Guest VM virtio port 0 rx burst-> Guest VM 82599 VF port1 tx burst-> IXIA packet generator + +The packet transmission flow is: + + IXIA packet generator-> Guest VM 82599 VF port1 rx burst-> Guest VM virtio port 0 tx burst-> tap -> Linux Bridge->82599 PF-> IXIA packet generator diff --git a/doc/guides/nics/vmxnet3.rst b/doc/guides/nics/vmxnet3.rst new file mode 100644 index 00000000..e919088d --- /dev/null +++ b/doc/guides/nics/vmxnet3.rst @@ -0,0 +1,197 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Poll Mode Driver for Paravirtual VMXNET3 NIC +============================================ + +The VMXNET3 adapter is the next generation of a paravirtualized NIC, introduced by VMware* ESXi. +It is designed for performance and is not related to VMXNET or VMXNET2. +It offers all the features available in VMXNET2, and adds several new features such as +multi-queue support (also known as Receive Side Scaling, RSS), +IPv6 offloads, and MSI/MSI-X interrupt delivery. +Because operating system vendors do not provide built-in drivers for this card, +VMware Tools must be installed to have a driver for the VMXNET3 network adapter available. +One can use the same device in a DPDK application with the VMXNET3 PMD introduced in the DPDK API. + +Currently, the driver provides basic support for using the device in a DPDK application running on a guest OS. +Optimization is needed on the backend, that is, the VMware* ESXi vmkernel switch, to achieve optimal performance end-to-end. + +In this chapter, two setups with the use of the VMXNET3 PMD are demonstrated: + +#. Vmxnet3 with a native NIC connected to a vSwitch + +#. Vmxnet3 chaining VMs connected to a vSwitch + +VMXNET3 Implementation in the DPDK +---------------------------------- + +For details on the VMXNET3 device, refer to the VMXNET3 driver's vmxnet3 directory and support manual from VMware*.
+ +For performance details, refer to the following link from VMware: + +`http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf <http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf>`_ + +As a PMD, the VMXNET3 driver provides the packet reception and transmission callbacks, vmxnet3_recv_pkts and vmxnet3_xmit_pkts. +It does not support scattered packet reception as part of vmxnet3_recv_pkts and vmxnet3_xmit_pkts. +Also, it does not support scattered packet reception as part of the supported device operations. + +The VMXNET3 PMD handles all the packet buffer memory allocation; the memory resides in guest address space +and the PMD is solely responsible for freeing that memory when it is no longer needed. +The packet buffers and features to be supported are made available to the hypervisor via VMXNET3 PCI configuration space BARs. +During RX/TX, the packet buffers are exchanged by their guest physical addresses (GPAs), +and the hypervisor loads the buffers with packets in the RX case and sends packets to the vSwitch in the TX case. + +The VMXNET3 PMD is compiled with vmxnet3 device headers. +The interface is similar to that of the other PMDs available in the DPDK API. +The driver pre-allocates the packet buffers and loads the command ring descriptors in advance. +The hypervisor fills those packet buffers on packet arrival and writes completion ring descriptors, +which are eventually pulled by the PMD. +After reception, the DPDK application frees the descriptors and loads new packet buffers for the coming packets. +The interrupts are disabled and there is no notification required. +This keeps performance up on the RX side, even though the device provides a notification feature. + +In the transmit routine, the DPDK application fills packet buffer pointers in the descriptors of the command ring +and notifies the hypervisor. +In response, the hypervisor takes packets and passes them to the vSwitch. It writes into the completion descriptors ring.
+The rings are read by the PMD in the next transmit routine call and the buffers and descriptors are freed from memory. + +Features and Limitations of VMXNET3 PMD +--------------------------------------- + +In release 1.6.0, the VMXNET3 PMD provides the basic functionality of packet reception and transmission. +There are several options available for filtering packets at VMXNET3 device level including: + +#. MAC Address based filtering: + + * Unicast, Broadcast, All Multicast modes - SUPPORTED BY DEFAULT + + * Multicast with Multicast Filter table - NOT SUPPORTED + + * Promiscuous mode - SUPPORTED + + * RSS based load balancing between queues - SUPPORTED + +#. VLAN filtering: + + * VLAN tag based filtering without load balancing - SUPPORTED + +.. note:: + + + * Release 1.6.0 does not support separate header and body receive cmd_rings; hence, + multiple-segment buffers are not supported. + Only cmd_ring_0 is used for packet buffers, one for each descriptor. + + * Receive and transmit of scattered packets is not supported. + + * Multicast with Multicast Filter table is not supported. + +Prerequisites +------------- + +The following prerequisites apply: + +* Before starting a VM, a VMXNET3 interface must be assigned to the VM through the VMware vSphere Client. + This is shown in the figure below. + +.. _figure_vmxnet3_int: + +.. figure:: img/vmxnet3_int.* + + Assigning a VMXNET3 interface to a VM using VMware vSphere Client + +.. note:: + + Depending on the Virtual Machine type, the VMware vSphere Client shows Ethernet adaptors while adding an Ethernet device. + Ensure that the VM type used offers a VMXNET3 device. Refer to the VMware documentation for a list of VMs. + +.. note:: + + Follow the *DPDK Getting Started Guide* to set up the basic DPDK environment. + +.. note:: + + Follow the *DPDK Sample Application's User Guide*, L2 Forwarding/L3 Forwarding and + TestPMD for instructions on how to run a DPDK application using an assigned VMXNET3 device.
+ +VMXNET3 with a Native NIC Connected to a vSwitch +------------------------------------------------ + +This section describes an example setup for Phy-vSwitch-VM-Phy communication. + +.. _figure_vswitch_vm: + +.. figure:: img/vswitch_vm.* + + VMXNET3 with a Native NIC Connected to a vSwitch + +.. note:: + + Other instructions on preparing to use DPDK, such as enabling hugepages and uio port binding, are not listed here. + Please refer to the *DPDK Getting Started Guide* and the *DPDK Sample Application's User Guide* for detailed instructions. + +The packet reception and transmission flow path is:: + + Packet generator -> 82576 + -> VMware ESXi vSwitch + -> VMXNET3 device + -> Guest VM VMXNET3 port 0 rx burst + -> Guest VM 82599 VF port 0 tx burst + -> 82599 VF + -> Packet generator + +VMXNET3 Chaining VMs Connected to a vSwitch +------------------------------------------- + +The following figure shows an example of VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel. + +.. _figure_vm_vm_comms: + +.. figure:: img/vm_vm_comms.* + + VMXNET3 Chaining VMs Connected to a vSwitch + +.. note:: + + When using the L2 Forwarding or L3 Forwarding applications, + a destination MAC address needs to be written in packets to hit the other VM's VMXNET3 interface. + +In this example, the packet flow path is:: + + Packet generator -> 82599 VF + -> Guest VM 82599 port 0 rx burst + -> Guest VM VMXNET3 port 1 tx burst + -> VMXNET3 device + -> VMware ESXi vSwitch + -> VMXNET3 device + -> Guest VM VMXNET3 port 0 rx burst + -> Guest VM 82599 VF port 1 tx burst + -> 82599 VF + -> Packet generator