..  BSD LICENSE
    Copyright (c) 2017, Cisco Systems, Inc.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
    COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
    ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.

ENIC Poll Mode Driver
=====================

ENIC PMD is the DPDK poll-mode driver for the Cisco Systems, Inc. VIC Ethernet
NICs. These adapters are also referred to as vNICs below. If you are running
or would like to run DPDK software applications on Cisco UCS servers using
Cisco VIC adapters, the following documentation is relevant.

How to obtain ENIC PMD integrated DPDK
--------------------------------------

ENIC PMD support is integrated into the DPDK suite. dpdk-<version>.tar.gz
should be downloaded from http://dpdk.org.


Configuration information
-------------------------

- **DPDK Configuration Parameters**

  The following configuration options are available for the ENIC PMD:

  - **CONFIG_RTE_LIBRTE_ENIC_PMD** (default y): Enables or disables inclusion
    of the ENIC PMD driver in the DPDK compilation.

  - **CONFIG_RTE_LIBRTE_ENIC_DEBUG** (default n): Enables or disables debug
    logging within the ENIC PMD driver.

- **vNIC Configuration Parameters**

  - **Number of Queues**

    The maximum numbers of receive queues (RQs), work queues (WQs) and
    completion queues (CQs) are configurable on a per-vNIC basis
    through the Cisco UCS Manager (CIMC or UCSM).

    These values should be configured as follows:

    - The number of WQs should be greater than or equal to the value of the
      expected nb_tx_q parameter in the call to rte_eth_dev_configure().

    - The number of RQs configured in the vNIC should be greater than or
      equal to *twice* the value of the expected nb_rx_q parameter in
      the call to rte_eth_dev_configure().  With the addition of Rx
      scatter, a pair of RQs on the vNIC is needed for each receive
      queue used by DPDK, even if Rx scatter is not being used.
      Having a vNIC with only 1 RQ is not a valid configuration, and
      will fail with an error message.

    - The number of CQs should be set so that there is one CQ for each
      WQ, and one CQ for each pair of RQs.

    For example: If the application requires 3 Rx queues, and 3 Tx
    queues, the vNIC should be configured to have at least 3 WQs, 6
    RQs (3 pairs), and 6 CQs (3 for use by WQs + 3 for use by the 3
    pairs of RQs).
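
    The sizing rules above can be written down as a small helper. This is
    only a sketch: ``struct vnic_queue_counts`` and ``enic_min_vnic_queues()``
    are hypothetical names used for illustration and are not part of DPDK or
    the ENIC PMD.

    .. code-block:: c

        #include <stdint.h>

        /* Hypothetical helper type, not part of DPDK or the ENIC PMD. */
        struct vnic_queue_counts {
            uint16_t wq; /* work (transmit) queues */
            uint16_t rq; /* receive queues on the vNIC */
            uint16_t cq; /* completion queues */
        };

        /*
         * Minimum vNIC queue counts for the nb_rx_q / nb_tx_q values that
         * will be passed to rte_eth_dev_configure().
         */
        struct vnic_queue_counts
        enic_min_vnic_queues(uint16_t nb_rx_q, uint16_t nb_tx_q)
        {
            struct vnic_queue_counts c;

            c.wq = nb_tx_q;                       /* one WQ per Tx queue */
            c.rq = (uint16_t)(2 * nb_rx_q);       /* one RQ pair per Rx queue */
            c.cq = (uint16_t)(nb_tx_q + nb_rx_q); /* one CQ per WQ and per RQ pair */
            return c;
        }

    With ``nb_rx_q = 3`` and ``nb_tx_q = 3`` this yields 3 WQs, 6 RQs and
    6 CQs, matching the example above.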

  - **Size of Queues**

    Likewise, the number of receive and transmit descriptors is configurable on
    a per-vNIC basis via the UCS Manager and should be greater than or equal to
    the nb_rx_desc and nb_tx_desc parameters expected to be used in the calls
    to rte_eth_rx_queue_setup() and rte_eth_tx_queue_setup() respectively.
    An application requesting more than the set size will be limited to that
    size.

    Unless there is a lack of resources due to creating many vNICs, it
    is recommended that the WQ and RQ sizes be set to the maximum.  This
    gives the application the greatest amount of flexibility in its
    queue configuration.

    - *Note*: Since the introduction of Rx scatter, for performance
      reasons, this PMD uses two RQs on the vNIC per receive queue in
      DPDK.  One RQ holds descriptors for the start of a packet, and the
      second RQ holds the descriptors for the rest of the fragments of
      a packet.  This means that the nb_rx_desc parameter to
      rte_eth_rx_queue_setup() can be greater than 4096.  The exact
      amount will depend on the size of the mbufs being used for
      receives, and the MTU size.

      For example: If the mbuf size is 2048, and the MTU is 9000, then
      receiving a full size packet will take 5 descriptors, 1 from the
      start of packet queue, and 4 from the second queue.  Assuming
      that the RQ size was set to the maximum of 4096, then the
      application can specify up to 1024 + 4096 as the nb_rx_desc
      parameter to rte_eth_rx_queue_setup().
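
      The arithmetic in this example can be sketched as follows. The
      function name ``enic_max_nb_rx_desc()`` is hypothetical, and the
      calculation is a simplified reading of the example (the mbuf size is
      treated as the usable buffer size and the MTU as the full packet
      length); it is not taken from the driver source.

      .. code-block:: c

          #include <stdint.h>

          /*
           * Largest nb_rx_desc implied by the example, for a vNIC RQ size
           * of rq_size entries.
           */
          uint32_t
          enic_max_nb_rx_desc(uint32_t rq_size, uint32_t mbuf_size, uint32_t mtu)
          {
              uint32_t per_pkt;

              if (mbuf_size == 0)
                  return 0; /* invalid input */

              /* Descriptors needed for one full-size packet, rounded up. */
              per_pkt = (mtu + mbuf_size - 1) / mbuf_size;

              if (per_pkt <= 1)
                  return rq_size; /* only the start-of-packet RQ is used */

              /* The second RQ supplies per_pkt - 1 descriptors per packet. */
              return rq_size / (per_pkt - 1) + rq_size;
          }

      For ``rq_size = 4096``, ``mbuf_size = 2048`` and ``mtu = 9000`` the
      function returns 1024 + 4096 = 5120.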

  - **Interrupts**

    Only one interrupt per vNIC interface should be configured in the UCS
    manager regardless of the number of receive/transmit queues. The ENIC PMD
    uses this interrupt to get information about link status and errors
    in the fast path.

.. _enic-flow-director:

Flow director support
---------------------

Advanced filtering support was added to 1300 series VIC firmware starting
with version 2.0.13 for C-series UCS servers and version 3.1.2 for UCSM
managed blade servers. In order to enable advanced filtering the 'Advanced
filter' radio button should be enabled via CIMC or UCSM followed by a reboot
of the server.

With advanced filters, perfect matching of all fields of IPv4, IPv6 headers
as well as TCP, UDP and SCTP L4 headers is available through flow director.
Masking of these fields for partial match is also supported.

Without advanced filter support, the flow director is limited to IPv4
perfect filtering of the 5-tuple with no masking of fields supported.

SR-IOV mode utilization
-----------------------

UCS blade servers configured with dynamic vNIC connection policies in UCS
manager are capable of supporting assigned devices on virtual machines (VMs)
through a KVM hypervisor. Assigned devices, also known as 'passthrough'
devices, are SR-IOV virtual functions (VFs) on the host which are exposed
to VM instances.

The Cisco Virtual Machine Fabric Extender (VM-FEX) gives the VM a dedicated
interface on the Fabric Interconnect (FI). Layer 2 switching is done at
the FI. This may eliminate the requirement for software switching on the
host to route intra-host VM traffic.

Please refer to `Creating a Dynamic vNIC Connection Policy
<http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/vm_fex/vmware/gui/config_guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide_chapter_010.html#task_433E01651F69464783A68E66DA8A47A5>`_
for information on configuring SR-IOV Adapter policies using UCS manager.

Once the policies are in place and the host OS is rebooted, VFs should be
visible on the host, for example:

.. code-block:: console

     # lspci | grep Cisco | grep Ethernet
     0d:00.0 Ethernet controller: Cisco Systems Inc VIC Ethernet NIC (rev a2)
     0d:00.1 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.2 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.3 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.4 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.5 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.6 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
     0d:00.7 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)

Enable Intel IOMMU on the host and install KVM and libvirt. A VM instance should
be created with an assigned device. When using libvirt, this configuration can
be done within the domain (i.e. VM) config file. For example, this entry maps
host VF 0d:00.1 into the VM.

.. code-block:: xml

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:ac:ff:b6'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
      </source>
    </interface>

Alternatively, the configuration can be done in a separate file using the
``network`` keyword. These methods are described in the libvirt documentation for
`Network XML format <https://libvirt.org/formatnetwork.html>`_.

When the VM instance is started, the ENIC KVM driver will bind the host VF to
vfio, complete provisioning on the FI and bring up the link.

.. note::

    It is not possible to use a VF directly from the host because it is not
    fully provisioned until the hypervisor brings up the VM that it is assigned
    to.

In the VM instance, the VF will now be visible. For example, here the VF
00:04.0 is seen on the VM instance and should be available for binding to DPDK.

.. code-block:: console

     # lspci | grep Ether
     00:04.0 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)

Follow the normal DPDK install procedure, binding the VF to either ``igb_uio``
or ``vfio`` in non-IOMMU mode.

Please see :ref:`Limitations <enic_limitations>` for limitations in
the use of SR-IOV.

.. _enic-genic-flow-api:

Generic Flow API support
------------------------

Generic Flow API is supported. The baseline support is:

- **1200 series VICs**

  5-tuple exact Flow support for 1200 series adapters. This allows:

  - Attributes: ingress
  - Items: ipv4, ipv6, udp, tcp (must exactly match src/dst IP
    addresses and ports and all must be specified).
  - Actions: queue and void
  - Selectors: 'is'

- **1300 series VICs with Advanced filters disabled**

  With advanced filters disabled, an IPv4 or IPv6 item must be specified
  in the pattern.

  - Attributes: ingress
  - Items: eth, ipv4, ipv6, udp, tcp, vxlan, inner eth, ipv4, ipv6, udp, tcp
  - Actions: queue and void
  - Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
  - In total, up to 64 bytes of mask is allowed across all headers

- **1300 series VICs with Advanced filters enabled**

  - Attributes: ingress
  - Items: eth, ipv4, ipv6, udp, tcp, vxlan, inner eth, ipv4, ipv6, udp, tcp
  - Actions: queue, mark, flag and void
  - Selectors: 'is', 'spec' and 'mask'. 'last' is not supported
  - In total, up to 64 bytes of mask is allowed across all headers

More features may be added in future firmware and new versions of the VIC.
Please refer to the release notes.

Ingress VLAN Rewrite
--------------------

VIC adapters can tag, untag, or modify the VLAN headers of ingress
packets. The ingress VLAN rewrite mode controls this behavior. By
default, it is set to pass-through, where the NIC does not modify the
VLAN header in any way so that the application can see the original
header. This mode is sufficient for many applications, but may not be
suitable for others. Such applications may change the mode by setting the
``devargs`` parameter ``ig-vlan-rewrite`` to one of the following:

- ``pass``: Pass-through mode. The NIC does not modify the VLAN
  header. This is the default mode.

- ``priority``: Priority-tag default VLAN mode. If the ingress packet
  is tagged with the default VLAN, the NIC replaces its VLAN header
  with the priority tag (VLAN ID 0).

- ``trunk``: Default trunk mode. The NIC tags untagged ingress packets
  with the default VLAN. Tagged ingress packets are not modified. To
  the application, every packet appears as tagged.

- ``untag``: Untag default VLAN mode. If the ingress packet is tagged
  with the default VLAN, the NIC removes or untags its VLAN header so
  that the application sees an untagged packet. As a result, the
  default VLAN becomes `untagged`. This mode can be useful for
  applications such as OVS-DPDK performance benchmarks that utilize
  only the default VLAN and want to see only untagged packets.
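
The four modes can be summarized with a small model of what the application
sees for a given ingress packet. This is an illustrative sketch only: the
enum, struct and function names are hypothetical, and the code models the
behavior described above rather than any driver or firmware implementation.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical names for the four ig-vlan-rewrite modes. */
    enum ig_vlan_rewrite { IG_PASS, IG_PRIORITY, IG_TRUNK, IG_UNTAG };

    /* Minimal view of a packet's VLAN header state. */
    struct vlan_state {
        bool tagged;
        uint16_t vlan_id; /* meaningful only when tagged */
    };

    /* VLAN state the application would see for an ingress packet. */
    struct vlan_state
    ig_vlan_rewrite_apply(enum ig_vlan_rewrite mode, struct vlan_state in,
                          uint16_t default_vlan)
    {
        struct vlan_state out = in;
        bool is_default = in.tagged && in.vlan_id == default_vlan;

        switch (mode) {
        case IG_PASS:
            break; /* header left untouched */
        case IG_PRIORITY:
            if (is_default)
                out.vlan_id = 0; /* replaced by the priority tag */
            break;
        case IG_TRUNK:
            if (!in.tagged) {
                out.tagged = true; /* untagged packets get the default VLAN */
                out.vlan_id = default_vlan;
            }
            break;
        case IG_UNTAG:
            if (is_default) {
                out.tagged = false; /* default VLAN header removed */
                out.vlan_id = 0;
            }
            break;
        }
        return out;
    }

In this model, a packet tagged with a VLAN other than the default one is
returned unchanged in every mode.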

.. _enic_limitations:

Limitations
-----------

- **VLAN 0 Priority Tagging**

  If a vNIC is configured in TRUNK mode by the UCS manager, the adapter will
  priority tag egress packets according to 802.1Q if they were not already
  VLAN tagged by software. If the adapter is connected to a properly configured
  switch, there will be no unexpected behavior.

  In test setups where an Ethernet port of a Cisco adapter in TRUNK mode is
  connected point-to-point to another adapter port or connected through a router
  instead of a switch, all ingress packets will be VLAN tagged. Programs such
  as l3fwd may not account for VLAN tags in packets and may misbehave. One
  solution is to enable VLAN stripping on ingress so the VLAN tag is removed
  from the packet and put into the mbuf->vlan_tci field. Here is an example
  of how to accomplish this:

.. code-block:: c

     /* Enable VLAN stripping on ingress for the given port. */
     int vlan_offload = rte_eth_dev_get_vlan_offload(port);
     vlan_offload |= ETH_VLAN_STRIP_OFFLOAD;
     rte_eth_dev_set_vlan_offload(port, vlan_offload);

Another alternative is to modify the adapter's ingress VLAN rewrite mode so that
packets with the default VLAN tag are stripped by the adapter and presented to
DPDK as untagged packets. In this case mbuf->vlan_tci and the PKT_RX_VLAN and
PKT_RX_VLAN_STRIPPED mbuf flags would not be set. This mode is enabled with the
``devargs`` parameter ``ig-vlan-rewrite=untag``. For example::

    -w 12:00.0,ig-vlan-rewrite=untag

- Limited flow director support on 1200 series and 1300 series Cisco VIC
  adapters with old firmware. Please see :ref:`enic-flow-director`.

- Flow director features are not supported on generation 1 Cisco VIC adapters
  (M81KR and P81E)

- **SR-IOV**

  - KVM hypervisor support only. VMware has not been tested.
  - Requires VM-FEX, and so is only available on UCS managed servers connected
    to Fabric Interconnects. It is not available on standalone C-Series servers.
  - VF devices are not usable directly from the host. They can only be used
    as assigned devices on VM instances.
  - Currently, unbind of the ENIC kernel mode driver 'enic.ko' on the VM
    instance may hang. As a workaround, enic.ko should be blacklisted or removed
    from the boot process.
  - pci_generic cannot be used as the uio module in the VM. igb_uio or
    vfio in non-IOMMU mode can be used.
  - The number of RQs in UCSM dynamic vNIC configurations must be at least 2.
  - The number of SR-IOV devices is limited to 256. Components on target system
    might limit this number to fewer than 256.

- **Flow API**

  - The number of filters that can be specified with the Generic Flow API is
    dependent on how many header fields are being masked. Use 'flow create' in
    a loop to determine how many filters your VIC will support (not more than
    1000 for 1300 series VICs). Filters are checked for matching in the order
    they were added. Since there currently is no grouping or priority support,
    'catch-all' filters should be added last.
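
  The effect of insertion order can be illustrated with a toy first-match
  table. This sketch does not use the Generic Flow API or model the
  hardware; ``struct toy_filter``, ``ANY_PORT`` and ``toy_filter_lookup()``
  are hypothetical names chosen for the example.

  .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      #define ANY_PORT 0 /* hypothetical wildcard value for this sketch */

      struct toy_filter {
          uint16_t dst_port; /* ANY_PORT matches every packet */
          uint16_t queue;    /* Rx queue selected on a match */
      };

      /* Return the queue of the first matching filter, or -1 if none match. */
      int
      toy_filter_lookup(const struct toy_filter *tbl, size_t n, uint16_t dst_port)
      {
          for (size_t i = 0; i < n; i++) {
              if (tbl[i].dst_port == ANY_PORT || tbl[i].dst_port == dst_port)
                  return tbl[i].queue;
          }
          return -1;
      }

  If the table is ``{ {4789, 1}, {ANY_PORT, 0} }``, port 4789 is steered to
  queue 1 and everything else to queue 0. With the two entries swapped, the
  catch-all entry matches first and the specific filter is never reached.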

How to build the suite
----------------------

The build instructions for the DPDK suite should be followed. By default
the ENIC PMD library will be built into the DPDK library.

Refer to the document :ref:`compiling and testing a PMD for a NIC
<pmd_build_and_test>` for details.

For configuring and using UIO and VFIO frameworks, please refer to the
documentation that comes with DPDK suite.

Supported Cisco VIC adapters
----------------------------

ENIC PMD supports all recent generations of Cisco VIC adapters including:

- VIC 1280
- VIC 1240
- VIC 1225
- VIC 1285
- VIC 1225T
- VIC 1227
- VIC 1227T
- VIC 1380
- VIC 1340
- VIC 1385
- VIC 1387

Supported Operating Systems
---------------------------

Any Linux distribution fulfilling the conditions described in the Dependencies
section of the DPDK documentation.

Supported features
------------------

- Unicast, multicast and broadcast transmission and reception
- Receive queue polling
- Port Hardware Statistics
- Hardware VLAN acceleration
- IP checksum offload
- Receive side VLAN stripping
- Multiple receive and transmit queues
- Flow Director ADD, UPDATE, DELETE, STATS operation support for IPv4 and IPv6
- Promiscuous mode
- Setting RX VLAN (supported via UCSM/CIMC only)
- VLAN filtering (supported via UCSM/CIMC only)
- Execution of application by unprivileged system users
- IPv4, IPv6 and TCP RSS hashing
- Scattered Rx
- MTU update
- SR-IOV on UCS managed servers connected to Fabric Interconnects.
- Flow API

Known bugs and unsupported features in this release
---------------------------------------------------

- Signature or flex byte based flow direction
- Drop feature of flow direction
- VLAN based flow direction
- non-IPV4 flow direction
- Setting of extended VLAN
- UDP RSS hashing
- MTU update only works if Scattered Rx mode is disabled

Prerequisites
-------------

- Prepare the system as recommended by the DPDK suite.  This includes
  environment variables, hugepages configuration, tool-chains and configuration.
- Insert vfio-pci kernel module using the command 'modprobe vfio-pci' if the
  user wants to use VFIO framework
- Insert uio kernel module using the command 'modprobe uio' if the user wants
  to use UIO framework
- DPDK suite should be configured based on the user's decision to use VFIO or
  UIO framework
- If the vNIC device(s) to be used is bound to the kernel mode Ethernet driver
  use 'ifconfig' to bring the interface down. The dpdk-devbind.py tool can
  then be used to unbind the device's bus id from the ENIC kernel mode driver.
- Bind the intended vNIC to vfio-pci in case the user wants ENIC PMD to use
  VFIO framework using dpdk-devbind.py.
- Bind the intended vNIC to igb_uio in case the user wants ENIC PMD to use
  UIO framework using dpdk-devbind.py.

At this point the system should be ready to run DPDK applications. Once the
application runs to completion, the vNIC can be detached from vfio-pci or
igb_uio if necessary.

Root privilege is required to bind and unbind vNICs to/from VFIO/UIO.
VFIO framework helps an unprivileged user to run the applications.
For an unprivileged user to run the applications on DPDK and ENIC PMD,
it may be necessary to increase the maximum locked memory of the user.
The following command could be used to do this.

.. code-block:: console

    sudo sh -c "ulimit -l <value in Kilo Bytes>"

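The value depends on the memory configuration of the application, DPDK and
PMD.  Typically, the limit has to be raised to higher than 2GB, e.g.,
2621440 (2.5 GB expressed in kilobytes). Note that ``ulimit`` changes the
limit only for the shell in which it runs and for processes started from
that shell.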
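
An application can also check the limit it is actually running under before
initializing DPDK. The sketch below uses the standard ``getrlimit()`` call
with Linux's ``RLIMIT_MEMLOCK`` resource; the function name
``memlock_limit_allows()`` is hypothetical. Note that ``RLIMIT_MEMLOCK`` is
expressed in bytes, whereas ``ulimit -l`` takes kilobytes.

.. code-block:: c

    #include <stdbool.h>
    #include <sys/resource.h>

    /*
     * Report whether the current soft RLIMIT_MEMLOCK allows locking at
     * least need_bytes of memory.
     */
    bool
    memlock_limit_allows(rlim_t need_bytes)
    {
        struct rlimit lim;

        if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
            return false; /* could not read the limit */
        return lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= need_bytes;
    }

A caller could, for instance, test for 2621440 KB by passing
``(rlim_t)2621440 * 1024`` and print a hint to raise the limit if the
function returns false.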

The compilation of any unused drivers can be disabled using the
configuration file in config/ directory (e.g., config/common_linuxapp).
This would help in bringing down the time taken for building the
libraries and the initialization time of the application.

Additional Reference
--------------------

- http://www.cisco.com/c/en/us/products/servers-unified-computing

Contact Information
-------------------

Any questions or bugs should be reported to the DPDK community and to the
ENIC PMD maintainers:

- John Daley <johndale@cisco.com>
- Nelson Escobar <neescoba@cisco.com>