..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2018 6WIND S.A.

.. _switch_representation:

Switch Representation within DPDK Applications
==============================================

.. contents:: :local:

Introduction
------------

Network adapters with multiple physical ports and/or SR-IOV capabilities
usually support the offload of traffic steering rules between their virtual
functions (VFs), physical functions (PFs) and ports.

Like for standard Ethernet switches, this involves a combination of
automatic MAC learning and manual configuration. For most purposes it is
managed by the host system and fully transparent to users and applications.

On the other hand, applications typically found on hypervisors that process
layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
according to their own criteria.

Without a standard software interface to manage traffic steering rules
between VFs, PFs and the various physical ports of a given device,
applications cannot take advantage of these offloads; software processing is
mandatory even for traffic which ends up re-injected into the device it
originates from.

This document describes how such steering rules can be configured through
the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
(PF/VF steering) using a single physical port for clarity, however the same
logic applies to any number of ports without necessarily involving SR-IOV.

Port Representors
-----------------

In many cases, traffic steering rules cannot be determined in advance;
applications usually have to process a bit of traffic in software before
thinking about offloading specific flows to hardware.

Applications therefore need the ability to receive and inject traffic to
various device endpoints (other VFs, PFs or physical ports) before
connecting them together. Device drivers must provide means to hook the
"other end" of these endpoints and to refer them when configuring flow
rules.

This role is left to so-called "port representors" (also known as "VF
representors" in the specific context of VFs), which are to DPDK what the
Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
which can be thought of as a software "patch panel" front-end for applications.

- DPDK port representors are implemented as additional virtual Ethernet
  device (**ethdev**) instances, spawned on an as-needed basis through
  configuration parameters passed to the driver of the underlying
  device using devargs.

::

   -w pci:dbdf,representor=0
   -w pci:dbdf,representor=[0-3]
   -w pci:dbdf,representor=[0,5-11]

- As virtual devices, they may be more limited than their physical
  counterparts, for instance by exposing only a subset of device
  configuration callbacks and/or by not necessarily having Rx/Tx capability.

- Among other things, they can be used to assign MAC addresses to the
  resource they represent.

- Applications can tell port representors apart from other physical or
  virtual ports by checking the ``dev_flags`` field within their device
  information structure for the ``RTE_ETH_DEV_REPRESENTOR`` bit.

.. code-block:: c

  struct rte_eth_dev_info {
      ...
      uint32_t dev_flags; /**< Device flags */
      ...
  };
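
For instance, an application could wrap this check in a small helper. The
following is a minimal self-contained sketch: the structure is stubbed and
the flag value is an assumption for illustration; the real definitions come
from DPDK's ``rte_ethdev.h``.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Stand-ins for the DPDK definitions (illustrative only). */
   #define RTE_ETH_DEV_REPRESENTOR (1u << 4) /* assumed value */

   struct rte_eth_dev_info {
       uint32_t dev_flags; /**< Device flags */
   };

   /* True when the device information describes a port representor. */
   static bool
   is_port_representor(const struct rte_eth_dev_info *info)
   {
       return (info->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0;
   }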

- The device or group relationship of ports can be discovered using the
  switch ``domain_id`` field within the device's switch information
  structure. By default the switch ``domain_id`` of a port is
  ``RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID`` to indicate that the port doesn't
  support the concept of a switch domain. Ports which do support the concept
  are allocated a unique switch ``domain_id``, and ports within the same
  switch domain share the same ``domain_id``. The switch ``port_id``
  identifies the port in terms of the switch, so in the case of SR-IOV
  devices the switch ``port_id`` would represent the virtual function
  identifier of the port.

.. code-block:: c

   /**
    * Ethernet device associated switch information
    */
   struct rte_eth_switch_info {
       const char *name; /**< switch name */
       uint16_t domain_id; /**< switch domain id */
       uint16_t port_id; /**< switch port id */
   };
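
The grouping rule above can be expressed as a small predicate. This is a
self-contained sketch: the structure is redeclared locally and the value of
``RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID`` is an assumption; the real
definitions come from DPDK's ``rte_ethdev.h``.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID UINT16_MAX /* assumed value */

   struct rte_eth_switch_info {
       const char *name;   /**< switch name */
       uint16_t domain_id; /**< switch domain id */
       uint16_t port_id;   /**< switch port id */
   };

   /* Two ports belong to the same switch domain only when both report a
    * valid and identical domain_id. */
   static bool
   same_switch_domain(const struct rte_eth_switch_info *a,
                      const struct rte_eth_switch_info *b)
   {
       if (a->domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID ||
           b->domain_id == RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID)
           return false;
       return a->domain_id == b->domain_id;
   }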


.. [1] `Ethernet switch device driver model (switchdev)
       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_

Basic SR-IOV
------------

"Basic" in the sense that it is not managed by applications, which
nonetheless expect traffic to flow between the various endpoints and the
outside as if everything was linked by an Ethernet hub.

The following diagram pictures a setup involving a device with one PF, two
VFs and one shared physical port

::

       .-------------.                 .-------------. .-------------.
       | hypervisor  |                 |    VM 1     | |    VM 2     |
       | application |                 | application | | application |
       `--+----------'                 `----------+--' `--+----------'
          |                                       |       |
    .-----+-----.                                 |       |
    | port_id 3 |                                 |       |
    `-----+-----'                                 |       |
          |                                       |       |
        .-+--.                                .---+--. .--+---.
        | PF |                                | VF 1 | | VF 2 |
        `-+--'                                `---+--' `--+---'
          |                                       |       |
          `---------.     .-----------------------'       |
                    |     |     .-------------------------'
                    |     |     |
                 .--+-----+-----+--.
                 | interconnection |
                 `--------+--------'
                          |
                     .----+-----.
                     | physical |
                     |  port 0  |
                     `----------'

- A DPDK application running on the hypervisor owns the PF device, which is
  arbitrarily assigned port index 3.

- Both VFs are assigned to VMs and used by unknown applications; they may be
  DPDK-based or anything else.

- Interconnection is not necessarily done through a true Ethernet switch and
  may not even exist as a separate entity. The role of this block is to show
  that something brings PF, VFs and physical ports together and enables
  communication between them, with a number of built-in restrictions.

Subsequent sections in this document describe means for DPDK applications
running on the hypervisor to freely assign specific flows between PF, VFs
and physical ports based on traffic properties, by managing this
interconnection.

Controlled SR-IOV
-----------------

Initialization
~~~~~~~~~~~~~~

When a DPDK application gets assigned a PF device and is deliberately not
started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
received by PF according to default rules, while VFs remain isolated.

::

       .-------------.                 .-------------. .-------------.
       | hypervisor  |                 |    VM 1     | |    VM 2     |
       | application |                 | application | | application |
       `--+----------'                 `----------+--' `--+----------'
          |                                       |       |
    .-----+-----.                                 |       |
    | port_id 3 |                                 |       |
    `-----+-----'                                 |       |
          |                                       |       |
        .-+--.                                .---+--. .--+---.
        | PF |                                | VF 1 | | VF 2 |
        `-+--'                                `------' `------'
          |
          `-----.
                |
             .--+----------------------.
             | managed interconnection |
             `------------+------------'
                          |
                     .----+-----.
                     | physical |
                     |  port 0  |
                     `----------'

In this mode, interconnection must be configured by the application to
enable VF communication, for instance by explicitly directing traffic with a
given destination MAC address to VF 1 and allowing traffic with the same
source MAC address to come out of it.

For this to work, hypervisor applications need a way to refer to either VF 1
or VF 2 in addition to the PF. This is addressed by `VF representors`_.

VF Representors
~~~~~~~~~~~~~~~

VF representors are virtual but standard DPDK network devices (albeit with
limited capabilities) created by PMDs when managing a PF device.

Since they represent VF instances used by other applications, configuring
them (e.g. assigning a MAC address or setting up promiscuous mode) affects
interconnection accordingly. If supported, they may also be used as two-way
communication ports with VFs (assuming **switchdev** topology)


::

       .-------------.                 .-------------. .-------------.
       | hypervisor  |                 |    VM 1     | |    VM 2     |
       | application |                 | application | | application |
       `--+---+---+--'                 `----------+--' `--+----------'
          |   |   |                               |       |
          |   |   `-------------------.           |       |
          |   `---------.             |           |       |
          |             |             |           |       |
    .-----+-----. .-----+-----. .-----+-----.     |       |
    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
    `-----+-----' `-----+-----' `-----+-----'     |       |
          |             |             |           |       |
        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
        `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
          |             |             |           |       |
          |             |   .---------'           |       |
          `-----.       |   |   .-----------------'       |
                |       |   |   |   .---------------------'
                |       |   |   |   |
             .--+-------+---+---+---+--.
             | managed interconnection |
             `------------+------------'
                          |
                     .----+-----.
                     | physical |
                     |  port 0  |
                     `----------'

- VF representors are assigned arbitrary port indices 4 and 5 in the
  hypervisor application and are respectively associated with VF 1 and VF 2.

- They can't be dissociated; even if VF 1 and VF 2 were not connected,
  representors could still be used for configuration.

- In this context, port index 3 can be thought of as a representor for physical
  port 0.

As previously described, the "interconnection" block represents a logical
concept. Interconnection occurs when hardware configuration enables traffic
flows from one place to another (e.g. physical port 0 to VF 1) according to
some criteria.

This is discussed in more detail in `traffic steering`_.

Traffic Steering
~~~~~~~~~~~~~~~~

In the following diagram, each meaningful traffic origin or endpoint as seen
by the hypervisor application is tagged with a unique letter from A to F.

::

       .-------------.                 .-------------. .-------------.
       | hypervisor  |                 |    VM 1     | |    VM 2     |
       | application |                 | application | | application |
       `--+---+---+--'                 `----------+--' `--+----------'
          |   |   |                               |       |
          |   |   `-------------------.           |       |
          |   `---------.             |           |       |
          |             |             |           |       |
    .----(A)----. .----(B)----. .----(C)----.     |       |
    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
    `-----+-----' `-----+-----' `-----+-----'     |       |
          |             |             |           |       |
        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
          |             |             |           |       |
          |             |   .---------'           |       |
          `-----.       |   |   .-----------------'       |
                |       |   |   |   .---------------------'
                |       |   |   |   |
             .--+-------+---+---+---+--.
             | managed interconnection |
             `------------+------------'
                          |
                     .---(F)----.
                     | physical |
                     |  port 0  |
                     `----------'

- **A**: PF device.
- **B**: port representor for VF 1.
- **C**: port representor for VF 2.
- **D**: VF 1 proper.
- **E**: VF 2 proper.
- **F**: physical port.

Although uncommon, some devices do not enforce a one-to-one mapping between
PF and physical ports. For instance, by default all ports of **mlx4**
adapters are available to all their PF/VF instances, in which case
additional ports appear next to **F** in the above diagram.

Assuming no interconnection is provided by default in this mode, setting up
a `basic SR-IOV`_ configuration involving physical port 0 could be broken
down as:

PF:

- **A to F**: let everything through.
- **F to A**: PF MAC as destination.

VF 1:

- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
- **D to A**: VF 1 MAC as source and PF MAC as destination.
- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
- **D to F**: VF 1 MAC as source.

VF 2:

- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
- **E to A**: VF 2 MAC as source and PF MAC as destination.
- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
- **E to F**: VF 2 MAC as source.
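
Taken together, these rules act like the following dispatch function. This
is a purely illustrative, self-contained model of the rule set (endpoints
and MAC addresses reduced to enumerations); it is not the **rte_flow** API.

.. code-block:: c

   #include <assert.h>

   enum endpoint { A, B, C, D, E, F, DROP = -1 };
   enum mac { PF_MAC, VF1_MAC, VF2_MAC, OTHER_MAC };

   /* Endpoint receiving a frame entering the interconnection from `origin`
    * with destination address `dst`; MAC-specific rules take precedence
    * over the catch-all rules. */
   static enum endpoint
   steer(enum endpoint origin, enum mac dst)
   {
       switch (origin) {
       case A: /* A to D, A to E, then "A to F: let everything through". */
           return dst == VF1_MAC ? D : dst == VF2_MAC ? E : F;
       case D: /* D to A, D to E, then "D to F: VF 1 MAC as source". */
           return dst == PF_MAC ? A : dst == VF2_MAC ? E : F;
       case E: /* E to A, E to D, then "E to F: VF 2 MAC as source". */
           return dst == PF_MAC ? A : dst == VF1_MAC ? D : F;
       case F: /* F to A, F to D and F to E all match on destination MAC. */
           return dst == PF_MAC ? A : dst == VF1_MAC ? D :
                  dst == VF2_MAC ? E : DROP;
       default:
           return DROP;
       }
   }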

Devices may additionally support advanced matching criteria such as
IPv4/IPv6 addresses or TCP/UDP ports.

The combination of matching criteria with target endpoints fits well with
**rte_flow** [6]_, which expresses flow rules as combinations of patterns
and actions.

Enhancing **rte_flow** with the ability to make flow rules match and target
these endpoints provides a standard interface to manage their
interconnection without introducing new concepts and whole new API to
implement them. This is described in `flow API (rte_flow)`_.

.. [6] `Generic flow API (rte_flow)
       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_

Flow API (rte_flow)
-------------------

Extensions
~~~~~~~~~~

Compared to creating a brand new dedicated interface, **rte_flow** was
deemed flexible enough to manage representor traffic only with minor
extensions:

- Using physical ports, PF, VF or port representors as targets.

- Affecting traffic that is not necessarily addressed to the DPDK port ID a
  flow rule is associated with (e.g. forcing VF traffic redirection to PF).

For advanced uses:

- Rule-based packet counters.

- The ability to combine several identical actions for traffic duplication
  (e.g. VF representor in addition to a physical port).

- Dedicated actions for traffic encapsulation / decapsulation before
  reaching an endpoint.

Traffic Direction
~~~~~~~~~~~~~~~~~

From an application standpoint, "ingress" and "egress" flow rule attributes
apply to the DPDK port ID they are associated with. They select a traffic
direction for matching patterns, but have no impact on actions.

When matching traffic coming from or going to a different place than the
immediate port ID a flow rule is associated with, these attributes keep
their meaning while applying to the chosen origin, as highlighted by the
following diagram

::

       .-------------.                 .-------------. .-------------.
       | hypervisor  |                 |    VM 1     | |    VM 2     |
       | application |                 | application | | application |
       `--+---+---+--'                 `----------+--' `--+----------'
          |   |   |                               |       |
          |   |   `-------------------.           |       |
          |   `---------.             |           |       |
          | ^           | ^           | ^         |       |
          | | ingress   | | ingress   | | ingress |       |
          | | egress    | | egress    | | egress  |       |
          | v           | v           | v         |       |
    .----(A)----. .----(B)----. .----(C)----.     |       |
    | port_id 3 | | port_id 4 | | port_id 5 |     |       |
    `-----+-----' `-----+-----' `-----+-----'     |       |
          |             |             |           |       |
        .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
        | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
        `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
          |             |             |         ^ |       | ^
          |             |             |  egress | |       | | egress
          |             |             | ingress | |       | | ingress
          |             |   .---------'         v |       | v
          `-----.       |   |   .-----------------'       |
                |       |   |   |   .---------------------'
                |       |   |   |   |
             .--+-------+---+---+---+--.
             | managed interconnection |
             `------------+------------'
                        ^ |
                ingress | |
                 egress | |
                        v |
                     .---(F)----.
                     | physical |
                     |  port 0  |
                     `----------'

Ingress and egress are defined as relative to the application creating the
flow rule.

For instance, matching traffic sent by VM 2 would be done through an ingress
flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
(**F**). This also applies to **C** and **A** respectively.

Transferring Traffic
~~~~~~~~~~~~~~~~~~~~

Without Port Representors
^^^^^^^^^^^^^^^^^^^^^^^^^

`Traffic direction`_ describes how an application could match traffic coming
from or going to a specific place reachable from a DPDK port ID. This makes
sense when the traffic in question is normally seen (i.e. sent or received)
by the application creating the flow rule (e.g. as in "redirect all traffic
coming from VF 1 to local queue 6").

However this does not force such traffic to take a specific route. Creating
a flow rule on **A** matching traffic coming from **D** is only meaningful
if it can be received by **A** in the first place, otherwise doing so simply
has no effect.

A new flow rule attribute named "transfer" is necessary for that. Combining
it with "ingress" or "egress" and a specific origin requests a flow rule to
be applied at the lowest level

::

             ingress only           :       ingress + transfer
                                    :
    .-------------. .-------------. : .-------------. .-------------.
    | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
    | application | | application | : | application | | application |
    `------+------' `--+----------' : `------+------' `--+----------'
           |           | | traffic  :        |           | | traffic
     .----(A)----.     | v          :  .----(A)----.     | v
     | port_id 3 |     |            :  | port_id 3 |     |
     `-----+-----'     |            :  `-----+-----'     |
           |           |            :        | ^         |
           |           |            :        | | traffic |
         .-+--.    .---+--.         :      .-+--.    .---+--.
         | PF |    | VF 1 |         :      | PF |    | VF 1 |
         `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
           |           | | traffic  :        | ^         | | traffic
           |           | v          :        | | traffic | v
        .--+-----------+--.         :     .--+-----------+--.
        | interconnection |         :     | interconnection |
        `--------+--------'         :     `--------+--------'
                 | | traffic        :              |
                 | v                :              |
            .---(F)----.            :         .---(F)----.
            | physical |            :         | physical |
            |  port 0  |            :         |  port 0  |
            `----------'            :         `----------'

With "ingress" only, traffic is matched on **A** thus still goes to physical
port **F** by default


::

   testpmd> flow create 3 ingress pattern vf id is 1 / end
              actions queue index 6 / end

With "ingress + transfer", traffic is matched on **D** and is therefore
successfully assigned to queue 6 on **A**


::

    testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
              actions queue index 6 / end


With Port Representors
^^^^^^^^^^^^^^^^^^^^^^

When port representors exist, implicit flow rules with the "transfer"
attribute (described in `without port representors`_) are assumed to exist
between them and their represented resources. These may be immutable.

In this case, traffic is received by default through the representor and
neither the "transfer" attribute nor traffic origin in flow rule patterns
are necessary. They simply have to be created on the representor port
directly and may target a different representor as described in `PORT_ID
action`_.

Implicit traffic flow with port representor

::

       .-------------.   .-------------.
       | hypervisor  |   |    VM 1     |
       | application |   | application |
       `--+-------+--'   `----------+--'
          |       | ^               | | traffic
          |       | | traffic       | v
          |       `-----.           |
          |             |           |
    .----(A)----. .----(B)----.     |
    | port_id 3 | | port_id 4 |     |
    `-----+-----' `-----+-----'     |
          |             |           |
        .-+--.    .-----+-----. .---+--.
        | PF |    | VF 1 rep. | | VF 1 |
        `-+--'    `-----+-----' `--(D)-'
          |             |           |
       .--|-------------|-----------|--.
       |  |             |           |  |
       |  |             `-----------'  |
       |  |              <-- traffic   |
       `--|----------------------------'
          |
     .---(F)----.
     | physical |
     |  port 0  |
     `----------'

Pattern Items And Actions
~~~~~~~~~~~~~~~~~~~~~~~~~

PORT Pattern Item
^^^^^^^^^^^^^^^^^

Matches traffic originating from (ingress) or going to (egress) a physical
port of the underlying device.

Using this pattern item without specifying a port index matches the physical
port associated with the current DPDK port ID by default. As described in
`traffic steering`_, specifying it should rarely be needed.

- Matches **F** in `traffic steering`_.

PORT Action
^^^^^^^^^^^

Directs matching traffic to a given physical port index.

- Targets **F** in `traffic steering`_.

PORT_ID Pattern Item
^^^^^^^^^^^^^^^^^^^^

Matches traffic originating from (ingress) or going to (egress) a given DPDK
port ID.

Normally only supported if the port ID in question is known by the
underlying PMD and related to the device the flow rule is created against.

This must not be confused with the `PORT pattern item`_ which refers to the
physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
object on the application side (also known as "port representor" depending
on the kind of underlying device).

- Matches **A**, **B** or **C** in `traffic steering`_.

PORT_ID Action
^^^^^^^^^^^^^^

Directs matching traffic to a given DPDK port ID.

Same restrictions as `PORT_ID pattern item`_.

- Targets **A**, **B** or **C** in `traffic steering`_.

PF Pattern Item
^^^^^^^^^^^^^^^

Matches traffic originating from (ingress) or going to (egress) the physical
function of the current device.

If supported, should work even if the physical function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using PF port ID.

- Matches **A** in `traffic steering`_.

PF Action
^^^^^^^^^

Directs matching traffic to the physical function of the current device.

Same restrictions as `PF pattern item`_.

- Targets **A** in `traffic steering`_.

VF Pattern Item
^^^^^^^^^^^^^^^

Matches traffic originating from (ingress) or going to (egress) a given
virtual function of the current device.

If supported, should work even if the virtual function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using VF port ID.

Note this pattern item does not match VF representors' traffic which, as
separate entities, should be addressed through their own port IDs.

- Matches **D** or **E** in `traffic steering`_.

VF Action
^^^^^^^^^

Directs matching traffic to a given virtual function of the current device.

Same restrictions as `VF pattern item`_.

- Targets **D** or **E** in `traffic steering`_.

\*_ENCAP actions
^^^^^^^^^^^^^^^^

These actions are named according to the protocol they encapsulate traffic
with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
VXLAN).

While they modify traffic and can be used multiple times (order matters),
unlike `PORT_ID action`_ and friends, they have no impact on steering.

As described in `actions order and repetition`_, this means they are
useless if used alone in an action list; the resulting traffic gets dropped
unless combined with either ``PASSTHRU`` or other endpoint-targeting
actions.

\*_DECAP actions
^^^^^^^^^^^^^^^^

They perform the reverse of `\*_ENCAP actions`_ by popping protocol headers
from traffic instead of pushing them. They can be used multiple times as
well.

Note that using these actions on non-matching traffic results in undefined
behavior. When using them, it is recommended to match the protocol headers
to decapsulate on the pattern side of the flow rule, or to otherwise make
sure only matching traffic goes through.

Actions Order and Repetition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Flow rules are currently restricted to at most a single action of each
supported type, performed in an unpredictable order (or all at once). To
repeat actions in a predictable fashion, applications have to make rules
pass-through and use priority levels.

It's now clear that PMD support for chaining multiple non-terminating flow
rules of varying priority levels is prohibitively difficult to implement
compared to simply allowing multiple identical actions performed in a
defined order by a single flow rule.

- This change is required to support protocol encapsulation offloads and the
  ability to perform them multiple times (e.g. VLAN then VXLAN).

- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
  be combined for duplication.

- The (non-)terminating property of actions must be discarded. Instead, flow
  rules themselves must be considered terminating by default (i.e. dropping
  traffic if there is no specific target) unless a ``PASSTHRU`` action is
  also specified.
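
The resulting drop/forward decision can be summarized by a small predicate.
This is an illustrative, self-contained sketch with made-up action
identifiers, not the **rte_flow** action enumeration.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   enum action_kind {
       ACT_PASSTHRU,    /* hand traffic over to the next rule/default path */
       ACT_QUEUE,       /* endpoint-targeting */
       ACT_PORT_ID,     /* endpoint-targeting */
       ACT_VXLAN_ENCAP, /* traffic modifier, no steering effect */
   };

   /* Flow rules are terminating by default: traffic is dropped unless the
    * action list targets an endpoint or contains PASSTHRU. */
   static bool
   rule_drops_traffic(const enum action_kind *actions, size_t n)
   {
       for (size_t i = 0; i < n; i++)
           if (actions[i] == ACT_PASSTHRU ||
               actions[i] == ACT_QUEUE ||
               actions[i] == ACT_PORT_ID)
               return false;
       return true;
   }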

Switching Examples
------------------

This section provides practical examples based on the established testpmd
flow command syntax [2]_, in the context described in `traffic steering`_

::

      .-------------.                 .-------------. .-------------.
      | hypervisor  |                 |    VM 1     | |    VM 2     |
      | application |                 | application | | application |
      `--+---+---+--'                 `----------+--' `--+----------'
         |   |   |                               |       |
         |   |   `-------------------.           |       |
         |   `---------.             |           |       |
         |             |             |           |       |
   .----(A)----. .----(B)----. .----(C)----.     |       |
   | port_id 3 | | port_id 4 | | port_id 5 |     |       |
   `-----+-----' `-----+-----' `-----+-----'     |       |
         |             |             |           |       |
       .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
       | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
       `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
         |             |             |           |       |
         |             |   .---------'           |       |
         `-----.       |   |   .-----------------'       |
               |       |   |   |   .---------------------'
               |       |   |   |   |
            .--|-------|---|---|---|--.
            |  |       |   `---|---'  |
            |  |       `-------'      |
            |  `---------.            |
            `------------|------------'
                         |
                    .---(F)----.
                    | physical |
                    |  port 0  |
                    `----------'

By default, PF (**A**) can communicate with the physical port it is
associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
and restricted to communicate with the hypervisor application through their
respective representors (**B** and **C**) if supported.

Examples in subsequent sections apply to hypervisor applications only and
are based on port representors **A**, **B** and **C**.

.. [2] `Flow syntax
    <http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`_

Associating VF 1 with Physical Port 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
their representors

::

   flow create 3 ingress pattern / end actions port_id id 4 / end
   flow create 4 ingress pattern / end actions port_id id 3 / end

A more practical example with MAC address restrictions

::

   flow create 3 ingress
       pattern eth dst is {VF 1 MAC} / end
       actions port_id id 4 / end

::

   flow create 4 ingress
       pattern eth src is {VF 1 MAC} / end
       actions port_id id 3 / end


Sharing Broadcasts
~~~~~~~~~~~~~~~~~~

From outside to PF and VFs

::

   flow create 3 ingress
      pattern eth dst is ff:ff:ff:ff:ff:ff / end
      actions port_id id 3 / port_id id 4 / port_id id 5 / end

Note ``port_id id 3`` is necessary, otherwise only VFs would receive
matching traffic.

From PF to outside and VFs

::

   flow create 3 egress
      pattern eth dst is ff:ff:ff:ff:ff:ff / end
      actions port / port_id id 4 / port_id id 5 / end

From VFs to outside and PF

::

   flow create 4 ingress
      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
      actions port_id id 3 / port_id id 5 / end

   flow create 5 ingress
      pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
      actions port_id id 3 / port_id id 4 / end

Similar ``33:33:*`` rules based on known MAC addresses should be added for
IPv6 traffic.

Encapsulating VF 2 Traffic in VXLAN
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming pass-through flow rules are supported

::

   flow create 5 ingress
      pattern eth / end
      actions vxlan_encap vni 42 / passthru / end

::

   flow create 5 egress
      pattern vxlan vni is 42 / end
      actions vxlan_decap / passthru / end

Here ``passthru`` is needed since, as described in `actions order and
repetition`_, flow rules are otherwise terminating; if supported, a rule
without a target endpoint will drop traffic.

Without pass-through support, ingress encapsulation on the destination
endpoint might not be supported and the action list must provide a target
endpoint

::

   flow create 5 ingress
      pattern eth src is {VF 2 MAC} / end
      actions vxlan_encap vni 42 / port_id id 3 / end

   flow create 3 ingress
      pattern vxlan vni is 42 / end
      actions vxlan_decap / port_id id 5 / end