VNET (VPP Network Stack)
========================

The files associated with the VPP network stack layer are located in the
*./src/vnet* folder. The Network Stack Layer is basically an
instantiation of the code in the other layers. This layer has a vnet
library that provides vectorized layer-2 and 3 networking graph nodes, a
packet generator, and a packet tracer.

In terms of building a packet processing application, vnet provides a
platform-independent subgraph to which one connects a couple of
device-driver nodes.

Typical RX connections include "ethernet-input" [full software
classification, feeds ipv4-input, ipv6-input, arp-input etc.] and
"ipv4-input-no-checksum" [used when the hardware can classify and has
already verified the ipv4 header checksum].

Effective graph dispatch function coding
----------------------------------------

Over the past 15 years, multiple coding styles have emerged: a
single/dual/quad loop coding model (with variations) and a
fully-pipelined coding model.

Single/dual loops
-----------------

The single/dual/quad loop model variations conveniently solve problems
where the number of items to process is not known in advance: typical
hardware RX-ring processing. This coding style is also very effective
when a given node will not need to cover a complex set of dependent
reads.

Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
units to convert buffer indices to buffer pointers:

.. code:: c

   static uword
   simulated_ethernet_interface_tx (vlib_main_t * vm,
                                    vlib_node_runtime_t * node,
                                    vlib_frame_t * frame)
   {
     u32 n_left_from, *from;
     u32 next_index = 0;
     u32 n_bytes;
     u32 thread_index = vm->thread_index;
     vnet_main_t *vnm = vnet_get_main ();
     vnet_interface_main_t *im = &vnm->interface_main;
     vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
     u16 nexts[VLIB_FRAME_SIZE], *next;

     n_left_from = frame->n_vectors;
     from = vlib_frame_vector_args (frame);

     /*
      * Convert up to VLIB_FRAME_SIZE indices in "from" to
      * buffer pointers in bufs[]
      */
     vlib_get_buffers (vm, from, bufs, n_left_from);
     b = bufs;
     next = nexts;

     /*
      * While we have at least 4 vector elements (pkts) to process..
      */
     while (n_left_from >= 4)
       {
         /* Prefetch next quad-loop iteration. */
         if (PREDICT_TRUE (n_left_from >= 8))
           {
             vlib_prefetch_buffer_header (b[4], STORE);
             vlib_prefetch_buffer_header (b[5], STORE);
             vlib_prefetch_buffer_header (b[6], STORE);
             vlib_prefetch_buffer_header (b[7], STORE);
           }

         /*
          * $$$ Process 4x packets right here...
          * set next[0..3] to send the packets where they need to go
          */

         do_something_to (b[0]);
         do_something_to (b[1]);
         do_something_to (b[2]);
         do_something_to (b[3]);

         /* Process the next 0..4 packets */
         b += 4;
         next += 4;
         n_left_from -= 4;
       }
     /*
      * Clean up 0...3 remaining packets at the end of the incoming frame
      */
     while (n_left_from > 0)
       {
         /*
          * $$$ Process one packet right here...
          * set next[0] to send the packet where it needs to go
          */
         do_something_to (b[0]);

         /* Process the next packet */
         b += 1;
         next += 1;
         n_left_from -= 1;
       }

     /*
      * Send the packets along their respective next-node graph arcs.
      * Considerable locality of reference is expected; most if not all
      * packets in the inbound vector will traverse the same next-node
      * arc.
      */
     vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);

     return frame->n_vectors;
   }

Given a packet processing task to implement, it pays to scout around
looking for similar tasks, and think about using the same coding
pattern. It is not uncommon to recode a given graph node dispatch
function several times during performance optimization.

Creating Packets from Scratch
-----------------------------

At times, it's necessary to create packets from scratch and send them.
Tasks like sending keepalives or actively opening connections come to
mind. It's not difficult, but accurate buffer metadata setup is
required.

Allocating Buffers
~~~~~~~~~~~~~~~~~~

Use vlib_buffer_alloc, which allocates a set of buffer indices. For
low-performance applications, it's OK to allocate one buffer at a time.
Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
below.

In high-performance cases, allocate a vector of buffer indices, and hand
them out from the end of the vector; decrement _vec_len(..) as buffer
indices are allocated. See tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for an example.
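
Here is a minimal sketch of that batch-allocation pattern, not
production code: my_main_t, its tx_buffers vector, and
my_get_free_buffer_index are hypothetical names used only for
illustration; see tcp_alloc_tx_buffers(…) and
tcp_get_free_buffer_index(…) for the real thing.

.. code:: c

   typedef struct
   {
     u32 *tx_buffers;          /* vector of pre-allocated buffer indices */
     vlib_main_t *vlib_main;
   } my_main_t;

   static int
   my_get_free_buffer_index (my_main_t * mm, u32 * bidx)
   {
     u32 *my_tx_buffers = mm->tx_buffers;
     u32 n_allocated;

     if (PREDICT_FALSE (vec_len (my_tx_buffers) == 0))
       {
         /* Refill: grab up to VLIB_FRAME_SIZE buffer indices in one call */
         vec_validate (my_tx_buffers, VLIB_FRAME_SIZE - 1);
         n_allocated = vlib_buffer_alloc (mm->vlib_main, my_tx_buffers,
                                          VLIB_FRAME_SIZE);
         if (n_allocated == 0)
           return -1;           /* out of buffers */
         /* Remember how many indices are actually valid */
         _vec_len (my_tx_buffers) = n_allocated;
         mm->tx_buffers = my_tx_buffers;
       }

     /* Hand out the last index and shrink the vector */
     *bidx = my_tx_buffers[vec_len (my_tx_buffers) - 1];
     _vec_len (my_tx_buffers) -= 1;
     return 0;
   }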

Buffer Initialization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows the **main points**, but is not to be
blindly cut-'n-pasted.

.. code:: c

   u32 bi0;
   vlib_buffer_t *b0;
   ip4_header_t *ip;
   udp_header_t *udp;

   /* Allocate a buffer */
   if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
     return -1;

   b0 = vlib_get_buffer (vm, bi0);

   /* At this point b0->current_data = 0, b0->current_length = 0 */

   /*
    * Copy data into the buffer. This example ASSUMES that data will fit
    * in a single buffer, and is e.g. an ip4 packet.
    */
   if (have_packet_rewrite)
     {
       clib_memcpy (b0->data, data, vec_len (data));
       b0->current_length = vec_len (data);
     }
   else
     {
       /* OR, build a udp-ip packet (for example) */
       ip = vlib_buffer_get_current (b0);
       udp = (udp_header_t *) (ip + 1);
       data_dst = (u8 *) (udp + 1);

       ip->ip_version_and_header_length = 0x45;
       ip->ttl = 254;
       ip->protocol = IP_PROTOCOL_UDP;
       ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
                                          vec_len (udp_data));
       ip->src_address.as_u32 = src_address->as_u32;
       ip->dst_address.as_u32 = dst_address->as_u32;
       udp->src_port = clib_host_to_net_u16 (src_port);
       udp->dst_port = clib_host_to_net_u16 (dst_port);
       udp->length = clib_host_to_net_u16 (sizeof (*udp) + vec_len (udp_data));
       clib_memcpy (data_dst, udp_data, vec_len (udp_data));

       if (compute_udp_checksum)
         {
           /* RFC 7011 section 10.3.2. */
           udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
           if (udp->checksum == 0)
             udp->checksum = 0xffff;
         }
       b0->current_length = sizeof (*ip) + sizeof (*udp) + vec_len (udp_data);
     }
   b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;

   /* sw_if_index 0 is the "local" interface, which always exists */
   vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;

   /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
   vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;

If your use-case calls for large packet transmission, use
vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
buffer chain.

Enqueueing packets for lookup and transmission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The simplest way to send a set of packets is to use
vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
or ip6_lookup_node, add the constructed buffer indices, and dispatch the
frame using vlib_put_frame_to_node(…).

.. code:: c

   vlib_frame_t *f;
   f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
   f->n_vectors = vec_len (buffer_indices_to_send);
   to_next = vlib_frame_vector_args (f);

   for (i = 0; i < vec_len (buffer_indices_to_send); i++)
     to_next[i] = buffer_indices_to_send[i];

   vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);

It is inefficient to allocate and schedule single packet frames. That
overhead is acceptable when sending, say, one packet per second, but it
should **not** occur in a for-loop!

Packet tracer
-------------

Vlib includes a frame element [packet] trace facility, with a simple
debug CLI interface. The cli is straightforward: "trace add
input-node-name count" to start capturing packet traces.

To trace 100 packets on a typical x86_64 system running the dpdk plugin:
"trace add dpdk-input 100". When using the packet generator: "trace add
pg-input 100"

To display the packet trace: "show trace"

Each graph node has the opportunity to capture its own trace data. It is
almost always a good idea to do so. The trace capture APIs are simple.

The packet capture APIs snapshot binary data, to minimize processing at
capture time. Each participating graph node initialization provides a
vppinfra format-style user function to pretty-print data when required
by the VLIB "show trace" command.

Set the VLIB node registration ".format_trace" member to the name of the
per-graph node format function.

Here's a simple example:

.. code:: c

   u8 * my_node_format_trace (u8 * s, va_list * args)
   {
     vlib_main_t * vm = va_arg (*args, vlib_main_t *);
     vlib_node_t * node = va_arg (*args, vlib_node_t *);
     my_node_trace_t * t = va_arg (*args, my_node_trace_t *);

     s = format (s, "My trace data was: %d", t-><whatever>);

     return s;
   }

The trace framework hands the per-node format function the data it
captured as the packet whizzed by. The format function pretty-prints the
data as desired.
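
The capture side is not shown above. The sketch below illustrates the
common pattern for recording per-packet trace data from a node dispatch
function, and for wiring the format function into the node registration;
"my-node", my_node_trace_t, and its fields are illustrative placeholders
rather than code from any particular node.

.. code:: c

   /* Illustrative per-packet trace record for a hypothetical "my-node" */
   typedef struct
   {
     u32 next_index;
     u32 sw_if_index;
   } my_node_trace_t;

   /*
    * Inside the dispatch loop, once next0 has been chosen for buffer b0:
    * capture trace data only when tracing is enabled for the node and
    * for this particular buffer.
    */
   if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE)
                      && (b0->flags & VLIB_BUFFER_IS_TRACED)))
     {
       my_node_trace_t *t = vlib_add_trace (vm, node, b0, sizeof (*t));
       t->next_index = next0;
       t->sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX];
     }

   /* The node registration ties the format function to the trace data */
   VLIB_REGISTER_NODE (my_node) = {
     .name = "my-node",
     .vector_size = sizeof (u32),
     .format_trace = my_node_format_trace,
     /* ... other members omitted ... */
   };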

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they're dispatched. The pcap captures are as follows:

::

   VPP graph dispatch trace record description:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Major Version | Minor Version | NStrings      | ProtoHint     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer index (big endian)                                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | VPP graph node name ...     ...               | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Metadata ... ...                       | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Opaque ... ...                         | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Opaque 2 ... ...                       | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Packet data (up to 16K)                                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.

Multiple records per packet are normal, and to be expected. Packets will
appear multiple times as they traverse the vpp forwarding graph. In this
way, vpp graph dispatch traces are significantly different from regular
network packet captures from an end-station. This property complicates
stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as "ethernet-input" seems likely to improve the situation.

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.

Here is the current set of protocol hints:

.. code:: c

   typedef enum
   {
     VLIB_NODE_PROTO_HINT_NONE = 0,
     VLIB_NODE_PROTO_HINT_ETHERNET,
     VLIB_NODE_PROTO_HINT_IP4,
     VLIB_NODE_PROTO_HINT_IP6,
     VLIB_NODE_PROTO_HINT_TCP,
     VLIB_NODE_PROTO_HINT_UDP,
     VLIB_NODE_N_PROTO_HINTS,
   } vlib_node_proto_hint_t;

Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an ipv6 packet header.

Downstream consumers of these data SHOULD pay attention to the protocol
hint. They MUST tolerate inaccurate hints, which MAY occur from time to
time.
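
To make the record layout concrete, here is a sketch of how a consumer
might walk a single graph dispatch trace record. It assumes p points at
the first byte of the record inside a pcap packet record, omits all
bounds checking, and uses purely illustrative helper names.

.. code:: c

   typedef struct
   {
     u8 major_version;           /* currently 1 */
     u8 minor_version;           /* currently 0 */
     u8 nstrings;                /* SHOULD be 4 or 5 */
     u8 protohint;               /* vlib_node_proto_hint_t */
     u32 buffer_index;           /* carried big endian on the wire */
   } dispatch_record_header_t;

   static const u8 *
   skip_nul_string (const u8 * p)
   {
     while (*p)
       p++;
     return p + 1;               /* step over the NULL octet as well */
   }

   static const u8 *
   parse_dispatch_record (const u8 * p)
   {
     dispatch_record_header_t h;
     int i;

     h.major_version = p[0];
     h.minor_version = p[1];
     h.nstrings = p[2];
     h.protohint = p[3];
     /* Buffer index is big endian */
     h.buffer_index = ((u32) p[4] << 24) | ((u32) p[5] << 16)
       | ((u32) p[6] << 8) | (u32) p[7];
     p += 8;

     /* node name, buffer metadata, opaque, opaque2, [ascii trace] */
     for (i = 0; i < h.nstrings; i++)
       p = skip_nul_string (p);

     /* p now points at the packet data (up to 16K) */
     return p;
   }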

Dispatch Pcap Trace Debug CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a dispatch trace capture of up to 10,000 trace records:

::

   pcap dispatch trace on max 10000 file dispatch.pcap

To start a dispatch trace which will also include standard vpp packet
tracing for packets which originate in dpdk-input:

::

   pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000

To save the pcap trace, e.g. in /tmp/dispatch.pcap:

::

   pcap dispatch trace off

Wireshark dissection of dispatch pcap traces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It almost goes without saying that we built a companion wireshark
dissector to display these traces. As of this writing, we have
upstreamed the wireshark dissector.

Since it will be a while before wireshark/master/latest makes it into
all of the popular Linux distros, please see the "How to build a vpp
dispatch trace aware Wireshark" page for build info.

Here is a sample packet dissection, with some fields omitted for
clarity. The point is that the wireshark dissector accurately displays
**all** of the vpp buffer metadata, and the name of the graph node in
question.

::

   Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
       Encapsulation type: USER 13 (58)
       [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
   VPP Dispatch Trace
       BufferIndex: 0x00036663
       NodeName: ethernet-input
   VPP Buffer Metadata
       Metadata: flags:
       Metadata: current_data: 0, current_length: 102
       Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
       Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
       Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
       Metadata: free_list_index: 0
       Metadata:
   VPP Buffer Opaque
       Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
       Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
       Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
       Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
       Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
       Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
       Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
       Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
       Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
       Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
       Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
       Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
       Opaque: l2.feature_bitmap_input: none configured, L2.feature_bitmap_output: none configured
       Opaque: l2t.next_index: 0, l2t.session_index: 0
       Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
       Opaque: policer.index: 0
       Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
       Opaque: map.mtu: 0
       Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
       Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
       Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
       Opaque: cop.current_config_index: 0
       Opaque: lisp.overlay_afi: 0
       Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
       Opaque: tcp.data_len: 0, tcp.flags: 0x0
       Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
       Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
       Opaque: snat.flags: 0x0
       Opaque:
   VPP Buffer Opaque2
       Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       Opaque2: qos.bits: 0, qos.source: 0
       Opaque2: loop_counter: 0
       Opaque2: gbp.flags: 0, gbp.src_epg: 0
       Opaque2: pg_replay_timestamp: 0
       Opaque2:
   Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
   Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
       Source Port: 22432
       Destination Port: 54084
       TCP payload (36 bytes)
   Data (36 bytes)

       0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
       0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
       0020  a8 98 36 5a                                        ..6Z
       Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
       [Length: 36]

It's a matter of a couple of mouse-clicks in Wireshark to filter the
trace to a specific buffer index. With that specific kind of filtration,
one can watch a packet walk through the forwarding graph, noting any/all
metadata changes, header checksum changes, and so forth.

This should be of significant value when developing new vpp graph nodes.
If new code mispositions b->current_data, it will be completely obvious
from looking at the dispatch trace in wireshark.

pcap rx, tx, and drop tracing
-----------------------------

vpp also supports rx, tx, and drop packet capture in pcap format,
through the "pcap trace" debug CLI command.

This command is used to start or stop a packet capture, or show the
status of packet capture. Each of "pcap trace rx", "pcap trace tx", and
"pcap trace drop" is implemented. Supply one or more of "rx", "tx", and
"drop" to enable multiple simultaneous capture types.

These commands have the following optional parameters:

- rx - trace received packets.

- tx - trace transmitted packets.

- drop - trace dropped packets.

- max *nnnn* - maximum number of packets to capture. Once *nnnn* packets
  have been received, the trace buffer is flushed to the indicated file.
  Defaults to 1000. Can only be updated if packet capture is off.

- max-bytes-per-pkt *nnnn* - maximum number of bytes to trace on a
  per-packet basis. Must be > 32 and less than 9000. Default value: 512.

- filter - Capture only packets which match the current pcap rx / tx /
  drop trace filter, which must be configured first; use "classify
  filter pcap ..." as described in the next section. The filter will
  only be executed if the per-interface or any-interface tests fail.

- intfc *interface* \| *any* - Used to specify a given interface, or
  use 'any' to run packet capture on all interfaces. 'any' is the
  default if not provided. Settings from a previous packet capture are
  preserved, so 'any' can be used to reset the interface setting.

- file *filename* - Used to specify the output filename. The file
  will be placed in the '/tmp' directory. If *filename* already exists,
  the file will be overwritten. If no filename is provided,
  '/tmp/rx.pcap' or '/tmp/tx.pcap' will be used, depending on capture
  direction. Can only be updated when pcap capture is off.

- status - Displays the current status and configured attributes
  associated with a packet capture. If packet capture is in progress,
  'status' also will return the number of packets currently in the
  buffer. Any additional attributes entered on the command line with a
  'status' request will be ignored.
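
For example, a plausible capture session for an interface named
GigabitEthernet3/0/0, limited to 1000 packets of at most 128 bytes each,
might look like this (illustrative only; consult the option list above
and the CLI help for the exact syntax on your VPP version):

::

   pcap trace rx tx max 1000 intfc GigabitEthernet3/0/0 file vppcapture.pcap max-bytes-per-pkt 128
   pcap trace status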

packet trace capture filtering
------------------------------

The "classify filter pcap \| <interface-name> \| trace" debug CLI
command constructs an arbitrary set of packet classifier tables for use
with "pcap rx \| tx \| drop trace", and with the vpp packet tracer on a
per-interface or system-wide basis.

Packets which match a rule in the classifier table chain will be traced.
The tables are automatically ordered so that matches in the most
specific table are tried first.

It's reasonably likely that folks will configure a single table with one
or two matches. As a result, we configure 8 hash buckets and 128K of
match rule space by default. One can override the defaults by specifying
"buckets <nnn>" and "memory-size <xxx>" as desired.

To build up complex filter chains, repeatedly issue the classify filter
debug CLI command. Each command must specify the desired mask and match
values. If a classifier table with a suitable mask already exists, the
CLI command adds a match rule to the existing table. If not, the CLI
command adds a new table with the indicated mask rule.

Configure a simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 filter

Configure a simple per-interface capture filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
   pcap trace rx max 100 intfc GigabitEthernet3/0/0

Note that per-interface capture filters are *always* applied.

Clear per-interface capture filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter GigabitEthernet3/0/0 del

Configure another fairly simple pcap classify filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   pcap trace tx max 100 filter

Configure a vpp packet tracer filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
   trace add dpdk-input 100 filter

Clear all current classifier filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   classify filter [pcap | <interface> | trace] del

To inspect the classifier tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   show classify table [verbose]

The verbose form displays all of the match rules, with hit-counters.

Terse description of the "mask <xxx>" syntax
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
   l3 ip4 <ip4-mask> ip6 <ip6-mask>
       <ip4-mask> version hdr_length src[/width] dst[/width]
                  tos length fragment_id ttl protocol checksum
       <ip6-mask> version traffic-class flow-label src dst proto
                  payload_length hop_limit protocol
   l4 tcp <tcp-mask> udp <udp_mask> src_port dst_port
       <tcp-mask> src dst  # ports
       <udp-mask> src_port dst_port

To construct **matches**, add the values to match after the indicated
keywords in the mask syntax. For example: "... mask l3 ip4 src" ->
"... match l3 ip4 src 192.168.1.11"

VPP Packet Generator
--------------------

We use the VPP packet generator to inject packets into the forwarding
graph. The packet generator can replay pcap traces, and generate packets
out of whole cloth at respectably high performance.

The VPP pg enables quite a variety of use-cases, ranging from functional
testing of new data-plane nodes to regression testing to performance
tuning.

PG setup scripts
----------------

PG setup scripts describe traffic in detail, and leverage vpp debug CLI
mechanisms. It's reasonably unusual to construct a pg setup script which
doesn't include a certain amount of interface and FIB configuration.

For example:

::

   loop create
   set int ip address loop0 192.168.1.1/24
   set int state loop0 up

   packet-generator new {
       name pg0
       limit 100
       rate 1e6
       size 300-300
       interface loop0
       node ethernet-input
       data { IP4: 1.2.3 -> 4.5.6
              UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
              UDP: 1234 -> 2345
              incrementing 286
       }
   }

A packet generator stream definition includes two major sections:

- Stream Parameter Setup

- Packet Data

Stream Parameter Setup
~~~~~~~~~~~~~~~~~~~~~~

Given the example above, let's look at how to set up stream parameters:

- **name pg0** - Name of the stream, in this case "pg0"

- **limit 100** - Number of packets to send when the stream is
  enabled. "limit 0" means send packets continuously.

- **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
  frames no larger than <nnn>. Useful for checking dual / quad loop
  codes.

- **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
  specified, the packet generator injects packets as fast as possible.

- **size 300-300** - Packet size range, in this case send 300-byte
  packets.

- **interface loop0** - Packets appear as if they were received on the
  specified interface. This datum is used in multiple ways: to select
  graph arc feature configuration, to select IP FIBs. Configure
  features e.g. on loop0 to exercise those features.

- **tx-interface <name>** - Packets will be transmitted on the
  indicated interface. Typically required only when injecting packets
  into post-IP-rewrite graph nodes.

- **pcap <filename>** - Replay packets from the indicated pcap capture
  file. "make test" makes extensive use of this feature: generate
  packets using scapy, save them in a .pcap file, then inject them into
  the vpp graph via a vpp pg "pcap <filename>" stream definition.

- **worker <nn>** - Generate packets for the stream using the indicated
  vpp worker thread. The vpp pg generates and injects O(10 MPPS /
  core). Use multiple stream definitions and worker threads to generate
  and inject enough traffic to easily fill a 40 gbit pipe with small
  packets.

Data definition
~~~~~~~~~~~~~~~

Packet generator data definitions make use of a layered implementation
strategy. Networking layers are specified in order, and the notation can
seem a bit counter-intuitive. In the example above, the data definition
stanza constructs a set of L2-L4 header layers, and uses an incrementing
fill pattern to round out the requested 300-byte packets.

- **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
  ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
  address of 00:04:00:05:00:06. MAC addresses may be specified in
  either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.

- **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
  incrementing set of L3 (IPv4) headers for successive packets with
  source addresses ranging from .10 to .254. All packets in the stream
  have a constant dest address of 192.168.2.10. Set the protocol field
  to 17, UDP.

- **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
  1234 and 2345, respectively.

- **incrementing 286** - Insert up to 286 incrementing data bytes.

Obvious variations involve "s/IP4/IP6/" in the above, along with
changing from IPv4 to IPv6 address notation.

The vpp pg can set any / all IPv4 header fields, including tos, packet
length, mf / df / fragment id and offset, ttl, protocol, checksum, and
src/dst addresses.
Take a look at ../src/vnet/ip/ip[46]_pg.c for details.

If all else fails, specify the entire packet data in hex:

- **hex 0xabcd…** - copy hex data verbatim into the packet

When replaying pcap files ("**pcap <filename>**"), do not specify a data
stanza.

Diagnosing "packet-generator new" parse failures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to inject packets into a brand-new graph node, remember to
tell the packet generator debug CLI how to parse the packet data stanza.

If the node expects L2 Ethernet MAC headers, specify ".unformat_buffer =
unformat_ethernet_header":

.. code:: c

   VLIB_REGISTER_NODE (ethernet_input_node) =
   {
     <snip>
     .unformat_buffer = unformat_ethernet_header,
     <snip>
   };

Beyond that, it may be necessary to set breakpoints in
…/src/vnet/pg/cli.c. A debug image is suggested.

When debugging new nodes, it may be far simpler to directly inject
ethernet frames - and add a corresponding vlib_buffer_advance in the new
node - than to modify the packet generator.

Debug CLI
---------

The descriptions above cover the "packet-generator new" debug CLI in
detail.

Additional debug CLI commands include:

::

   vpp# packet-generator enable [<stream-name>]

which enables the named stream, or all streams.

::

   vpp# packet-generator disable [<stream-name>]

disables the named stream, or all streams.

::

   vpp# packet-generator delete <stream-name>

Deletes the named stream.

::

   vpp# packet-generator configure <stream-name> [limit <nnn>]
        [rate <f64-pps>] [size <nn>-<nn>]

Changes stream parameters without having to recreate the entire stream
definition. Note that re-issuing a "packet-generator new" command will
correctly recreate the named stream.
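
For example, an illustrative sequence using the "pg0" stream defined
earlier, built only from the commands described above:

::

   vpp# packet-generator configure pg0 limit 1000 rate 1e4
   vpp# packet-generator enable pg0
   vpp# packet-generator disable pg0
   vpp# packet-generator delete pg0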