17 files changed, 4112 insertions, 0 deletions
diff --git a/docs/developer/corearchitecture/bihash.rst b/docs/developer/corearchitecture/bihash.rst
new file mode 100644
index 00000000000..9b62baaf9cf
--- /dev/null
+++ b/docs/developer/corearchitecture/bihash.rst
@@ -0,0 +1,313 @@
+Bounded-index Extensible Hashing (bihash)
+=========================================
+
+Vpp uses bounded-index extensible hashing to solve a variety of
+exact-match (key, value) lookup problems. Benefits of the current
+implementation:
+
+-  Very high record count scaling, tested to 100,000,000 records.
+-  Lookup performance degrades gracefully as the number of records
+   increases
+-  No reader locking required
+-  Template implementation: it’s easy to support arbitrary (key,value)
+   types
+
+Bounded-index extensible hashing has been widely used in databases for
+decades.
+
+Bihash uses a two-level data structure:
+
+::
+
+       +-----------------+
+       | bucket-0        |
+       |  log2_size      |
+       |  backing store  |
+       +-----------------+
+       | bucket-1        |
+       |  log2_size      |           +--------------------------------+
+       |  backing store  | --------> | KVP_PER_PAGE * key-value-pairs |
+       +-----------------+           | page 0                         |
+            ...                      +--------------------------------+
+       +-----------------+           | KVP_PER_PAGE * key-value-pairs |
+       | bucket-2**N-1   |           | page 1                         |
+       |  log2_size      |           +--------------------------------+
+       |  backing store  |                       ---
+       +-----------------+           +--------------------------------+
+                                     | KVP_PER_PAGE * key-value-pairs |
+                                     | page 2**(log2(size)) - 1       |
+                                     +--------------------------------+
+
+Discussion of the algorithm
+---------------------------
+
+This structure has a couple of major advantages. In practice, each
+bucket entry fits into a 64-bit integer. Coincidentally, vpp’s target
+CPU architectures support 64-bit atomic operations. When modifying the
+contents of a specific bucket, we do the following:
+
+-  Make a working copy of the bucket’s backing storage
+-  Atomically swap a pointer to the working copy into the bucket array
+-  Change the original backing store data
+-  Atomically swap back to the original
+
+So, no reader locking is required to search a bihash table.
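+
+The copy / swap / modify / swap-back sequence above can be sketched with
+C11 atomics. This is a toy under stated assumptions, not vppinfra code:
+a bucket is modelled as a single atomic pointer to one backing page (the
+real bucket word packs more than that into 64 bits), there is exactly
+one writer, and all ``toy_*`` names are hypothetical. Readers load the
+bucket pointer once and search whichever page it names, while the writer
+redirects them to the copy before touching the original.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdatomic.h>
+      #include <stdint.h>
+      #include <stdio.h>
+      #include <string.h>
+
+      #define TOY_KVP_PER_PAGE 4   /* hypothetical page capacity */
+
+      typedef struct { uint64_t key, value; } toy_kv_t;
+      typedef struct { toy_kv_t kvp[TOY_KVP_PER_PAGE]; } toy_page_t;
+
+      /* A bucket: readers follow this pointer without taking any lock */
+      typedef struct { _Atomic (toy_page_t *) backing; } toy_bucket_t;
+
+      /* Reader: load the pointer once, then search whichever page it names */
+      static int toy_search (toy_bucket_t *b, uint64_t key, uint64_t *value)
+      {
+        toy_page_t *p = atomic_load_explicit (&b->backing, memory_order_acquire);
+        for (int i = 0; i < TOY_KVP_PER_PAGE; i++)
+          if (p->kvp[i].key == key)
+            {
+              *value = p->kvp[i].value;
+              return 0;
+            }
+        return -1;
+      }
+
+      /* Single writer. Caveat: a reader that loaded 'orig' just before step 2
+         may still be scanning it during step 3; a production implementation
+         must account for that, which this toy does not. */
+      static void toy_update (toy_bucket_t *b, toy_page_t *scratch, int slot, toy_kv_t kv)
+      {
+        toy_page_t *orig = atomic_load_explicit (&b->backing, memory_order_relaxed);
+
+        memcpy (scratch, orig, sizeof (*orig));                             /* 1. working copy */
+        atomic_store_explicit (&b->backing, scratch, memory_order_release); /* 2. readers use copy */
+        orig->kvp[slot] = kv;                                               /* 3. edit original */
+        atomic_store_explicit (&b->backing, orig, memory_order_release);    /* 4. swap back */
+      }
+
+      int main (void)
+      {
+        static toy_page_t page, scratch;   /* zero-initialised */
+        toy_bucket_t bucket;
+        uint64_t v = 0;
+
+        atomic_init (&bucket.backing, &page);
+        toy_update (&bucket, &scratch, 0, (toy_kv_t) { 42, 4200 });
+
+        assert (toy_search (&bucket, 42, &v) == 0 && v == 4200);
+        assert (atomic_load (&bucket.backing) == &page);   /* original is live again */
+        printf ("42 -> %llu\n", (unsigned long long) v);
+        return 0;
+      }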
+
+At lookup time, the implementation computes a key hash code. We use the
+least-significant N bits of the hash to select the bucket.
+
+With the bucket in hand, we learn log2 (nBackingPages) for the selected
+bucket. At this point, we use the next log2_size bits from the hash code
+to select the specific backing page in which the (key,value) pair will
+be found.
+
+Net result: we search **one** backing page, not 2**log2_size pages. This
+is a key property of the algorithm.
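+
+The bit-slicing described above can be illustrated in a few lines. The
+sketch assumes a power-of-two bucket count and a per-bucket log2 page
+count, as the text describes; the helper names are hypothetical and the
+real implementation may arrange its fields differently.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      /* The low log2_nbuckets bits of the hash pick the bucket */
+      static uint32_t toy_bucket_index (uint64_t hash, uint32_t log2_nbuckets)
+      {
+        return (uint32_t) (hash & ((1ULL << log2_nbuckets) - 1));
+      }
+
+      /* The next log2_pages bits pick the single backing page to search */
+      static uint32_t toy_page_index (uint64_t hash, uint32_t log2_nbuckets,
+                                      uint32_t log2_pages)
+      {
+        return (uint32_t) ((hash >> log2_nbuckets) & ((1ULL << log2_pages) - 1));
+      }
+
+      int main (void)
+      {
+        uint64_t hash = 0xABCD;
+
+        /* 256 buckets: low 8 bits (0xCD) select bucket 205.
+           The bucket has 2**2 = 4 pages: the next 2 bits (0xAB & 3) select page 3. */
+        assert (toy_bucket_index (hash, 8) == 0xCD);
+        assert (toy_page_index (hash, 8, 2) == 3);
+        printf ("bucket %u, page %u\n", toy_bucket_index (hash, 8),
+                toy_page_index (hash, 8, 2));
+        return 0;
+      }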
+
+When sufficient collisions occur to fill the backing pages for a given
+bucket, we double the bucket size, rehash, and deal the bucket contents
+into a double-sized set of backing pages. In the future, we may
+represent the size as a linear combination of two powers-of-two, to
+increase space efficiency.
+
+To solve the “jackpot case” where a set of records collide under hashing
+in a bad way, the implementation will fall back to linear search across
+2**log2_size backing pages on a per-bucket basis.
+
+To maintain *space* efficiency, we should configure the bucket array so
+that backing pages are effectively utilized. Lookup performance tends to
+change *very little* if the bucket array is too small or too large.
+
+Bihash depends on selecting an effective hash function. If one were to
+use a truly broken hash function such as “return 1ULL;”, bihash would
+still work, but it would be equivalent to poorly-programmed linear
+search.
+
+We often use CPU intrinsic functions (think crc32) to rapidly compute a
+hash code with decent statistics.
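+
+As a portable stand-in for such an intrinsic, the sketch below computes
+CRC-32C (the Castagnoli polynomial that the x86 SSE4.2 ``crc32``
+instruction implements) one bit at a time, then keeps the low bits of
+the result as a bucket index. It is illustrative only: the function name
+is hypothetical, and vppinfra's hash functions use the hardware
+instruction where available and may seed or combine key words
+differently, so the values will not necessarily match.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stddef.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      /* Bitwise CRC-32C: reflected polynomial 0x82F63B78, init and final XOR ~0 */
+      static uint32_t toy_crc32c (const uint8_t *p, size_t n)
+      {
+        uint32_t crc = 0xFFFFFFFFu;
+        for (size_t i = 0; i < n; i++)
+          {
+            crc ^= p[i];
+            for (int k = 0; k < 8; k++)
+              crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
+          }
+        return crc ^ 0xFFFFFFFFu;
+      }
+
+      int main (void)
+      {
+        /* Standard CRC-32C check value for the ASCII string "123456789" */
+        assert (toy_crc32c ((const uint8_t *) "123456789", 9) == 0xE3069283u);
+
+        /* Hash an 8-byte key; the low 10 bits choose 1 of 1024 buckets */
+        uint64_t key = 0x0a000001;   /* hypothetical key, e.g. an IPv4 address */
+        uint32_t h = toy_crc32c ((const uint8_t *) &key, sizeof (key));
+        printf ("hash 0x%08x -> bucket %u\n", (unsigned) h, (unsigned) (h & 1023u));
+        return 0;
+      }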
+
+Bihash Cookbook
+---------------
+
+Using current (key,value) template instance types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It’s quite easy to use one of the template instance types. As of this
+writing, …/src/vppinfra provides pre-built templates for 8, 16, 20, 24,
+40, and 48 byte keys, u8 \* vector keys, and 8 byte values.
+
+See ``…/src/vppinfra/bihash_<key-size>_8.h``, for example
+``bihash_8_8.h``.
+
+To define the data types, #include a specific template instance, most
+often in a subsystem header file:
+
+.. code:: c
+
+        #include <vppinfra/bihash_8_8.h>
+
+If you’re building a standalone application, you’ll need to define the
+various functions by #including the method implementation file in a C
+source file:
+
+.. code:: c
+
+        #include <vppinfra/bihash_template.c>
+
+The core vpp engine currently uses most if not all of the known bihash
+types, so you probably won’t need to #include the method implementation
+file.
+
+Add an instance of the selected bihash data structure to e.g. a “main_t”
+structure:
+
+.. code:: c
+
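+       typedef struct
+       {
+         /* ... other members ... */
+
+         /* Template-macro form; requires BIHASH_TYPE to be defined */
+         BVT (clib_bihash) hash_table;
+
+         /* ...or, equivalently, the explicit type name:
+            clib_bihash_8_8_t hash_table; */
+
+         /* ... other members ... */
+       } my_main_t;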
+
+The BV macro concatenates its argument with the value of the
+preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its argument
+with the value of BIHASH_TYPE and the fixed string “_t”. So in the above
+example, BVT (clib_bihash) generates “clib_bihash_8_8_t”.
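+
+Two-level token pasting is what makes this work: the extra macro layer
+forces BIHASH_TYPE to expand before ``##`` glues the pieces together.
+The self-contained sketch below defines BV and BVT that way; the helper
+macros (``TOY_PASTE`` and friends) and ``toy_nbuckets`` are hypothetical,
+and vppinfra's actual definitions may differ in detail.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdio.h>
+      #include <string.h>
+
+      #define BIHASH_TYPE _8_8   /* as a template instance header would set it */
+
+      /* Indirection ensures BIHASH_TYPE is expanded before pasting */
+      #define TOY_PASTE(a, b)    a##b
+      #define TOY_XPASTE(a, b)   TOY_PASTE (a, b)
+      #define TOY_PASTE_T(a, b)  a##b##_t
+      #define TOY_XPASTE_T(a, b) TOY_PASTE_T (a, b)
+
+      #define BV(a)  TOY_XPASTE (a, BIHASH_TYPE)    /* x -> x_8_8   */
+      #define BVT(a) TOY_XPASTE_T (a, BIHASH_TYPE)  /* x -> x_8_8_t */
+
+      /* Stringify after expansion so we can see what the macros produce */
+      #define TOY_STR(x)  #x
+      #define TOY_XSTR(x) TOY_STR (x)
+
+      typedef struct { int nbuckets; } BVT (clib_bihash);  /* clib_bihash_8_8_t */
+
+      static int BV (toy_nbuckets) (BVT (clib_bihash) *h)  /* toy_nbuckets_8_8 */
+      {
+        return h->nbuckets;
+      }
+
+      int main (void)
+      {
+        clib_bihash_8_8_t table = { 64 };   /* the explicit name works too */
+
+        assert (strcmp (TOY_XSTR (BVT (clib_bihash)), "clib_bihash_8_8_t") == 0);
+        assert (toy_nbuckets_8_8 (&table) == 64);
+        printf ("%s\n", TOY_XSTR (BV (clib_bihash_add_del)));
+        return 0;
+      }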
+
+If you’re sure you won’t decide to change the template / type name
+later, it’s perfectly OK to code “clib_bihash_8_8_t” and so forth.
+
+In fact, if you #include multiple template instances in a single source
+file, you **must** use fully-enumerated type names. The macros stand no
+chance of working.
+
+Initializing a bihash table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Call the init function as shown. As a rough guide, pick a number of
+buckets which is approximately
+number_of_expected_records/BIHASH_KVP_PER_PAGE from the relevant
+template instance header-file. See previous discussion.
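+
+A worked version of this rule of thumb, assuming BIHASH_KVP_PER_PAGE is
+4 (check the header of the instance you actually use) and one million
+expected records: 1,000,000 / 4 = 250,000 buckets. The hypothetical
+helper below performs that division and then rounds up to a power of
+two, on the assumption that buckets are selected by the low-order hash
+bits as described earlier; the init code may well do this rounding
+itself.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      /* ceil (records / kvp_per_page), rounded up to the next power of two */
+      static uint32_t toy_suggest_nbuckets (uint64_t expected_records,
+                                            uint32_t kvp_per_page)
+      {
+        uint64_t want = (expected_records + kvp_per_page - 1) / kvp_per_page;
+        uint64_t n = 1;
+        while (n < want)
+          n <<= 1;
+        return (uint32_t) n;
+      }
+
+      int main (void)
+      {
+        /* 250,000 buckets wanted; the next power of two is 2**18 = 262,144 */
+        assert (toy_suggest_nbuckets (1000000, 4) == 262144);
+        printf ("%u buckets\n", (unsigned) toy_suggest_nbuckets (1000000, 4));
+        return 0;
+      }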
+
+The amount of memory selected should easily contain all of the records,
+with a generous allowance for hash collisions. Bihash memory is
+allocated separately from the main heap, and won’t cost anything except
+kernel PTEs until touched, so it’s OK to be reasonably generous.
+
+For example:
+
+.. code:: c
+
+       my_main_t *mm = &my_main;
+       clib_bihash_8_8_t *h;
+
+       h = &mm->hash_table;
+
+       clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
+                              (uword) memory_size);
+
+Add or delete a key/value pair
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use BV(clib_bihash_add_del), or the explicit type variant:
+
+.. code:: c
+
+      my_main_t *mm = &my_main;
+      clib_bihash_8_8_t *h;
+      clib_bihash_kv_8_8_t kv;
+
+      h = &mm->hash_table;
+      kv.key = key_to_add_or_delete;
+      kv.value = value_to_add_or_delete;   /* ignored when deleting */
+
+      clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
+
+In the delete case, kv.value is irrelevant. To change the value
+associated with an existing (key,value) pair, simply re-add the [new]
+pair.
+
+Simple search
+~~~~~~~~~~~~~
+
+The simplest possible (key, value) search goes like so:
+
+.. code:: c
+
+      my_main_t *mm = &my_main;
+      clib_bihash_8_8_t *h;
+      clib_bihash_kv_8_8_t search_kv, return_kv;
+
+      h = &mm->hash_table;
+      search_kv.key = key_to_search_for;
+
+      /* A negative return value means the key was not found */
+      if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
+        key_not_found ();
+      else
+        key_found ();   /* return_kv.value holds the value */
+
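+Note that it’s perfectly fine to collect the lookup result in the search
+key-value structure itself, i.e. to pass the same structure as both the
+search key and the result:
+
+.. code:: c
+
+      if (clib_bihash_search_8_8 (h, &search_kv, &search_kv) < 0)
+        key_not_found ();
+      else
+        key_found ();   /* search_kv.value now holds the value */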
+
+Bihash vector processing
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When processing a vector of packets which need a certain lookup
+performed, it’s worth the trouble to compute the key hash, and prefetch
+the correct bucket ahead of time.
+
+Here’s a sketch of one way to write the required code:
+
+Dual-loop:
+
+-  6 packets ahead, prefetch 2x vlib_buffer_t’s and 2x packet data
+   required to form the record keys
+-  4 packets ahead, form 2x record keys and call BV(clib_bihash_hash) or
+   the explicit hash function to calculate the record hashes. Call 2x
+   BV(clib_bihash_prefetch_bucket) to prefetch the buckets
+-  2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch 2x
+   (key,value) data pages.
+-  In the processing section, call 2x
+   BV(clib_bihash_search_inline_with_hash) to perform the search
+
+Programmer’s choice whether to stash the hash code somewhere in
+vnet_buffer(b) metadata, or to use local variables.
+
+Single-loop:
+
+-  Use simple search as shown above.
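+
+The shape of that pipeline, minus the vlib specifics, can be sketched in
+portable C. This is a hypothetical toy (a linear-probing table with one
+entry per slot, and the MurmurHash3 64-bit finalizer standing in for the
+bihash hash function), not bihash itself. What it shows is the ordering:
+compute the hash and prefetch the bucket a few keys ahead, then search
+later using the saved hash. ``__builtin_prefetch`` is a GCC/Clang
+builtin.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stddef.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      #define TOY_NBUCKETS 1024u   /* power of two; hypothetical table size */
+      #define TOY_LOOKAHEAD 4      /* hash + prefetch this many keys ahead */
+
+      typedef struct { uint64_t key, value; int used; } toy_slot_t;
+      static toy_slot_t toy_table[TOY_NBUCKETS];
+
+      /* MurmurHash3 fmix64 finalizer, standing in for the bihash hash */
+      static uint64_t toy_hash (uint64_t k)
+      {
+        k ^= k >> 33; k *= 0xff51afd7ed558ccdULL;
+        k ^= k >> 33; k *= 0xc4ceb9fe1a85ec53ULL;
+        k ^= k >> 33;
+        return k;
+      }
+
+      /* Linear-probing insert; the toy assumes the table never fills up */
+      static void toy_add (uint64_t key, uint64_t value)
+      {
+        uint32_t b = toy_hash (key) & (TOY_NBUCKETS - 1);
+        while (toy_table[b].used && toy_table[b].key != key)
+          b = (b + 1) & (TOY_NBUCKETS - 1);
+        toy_table[b] = (toy_slot_t) { key, value, 1 };
+      }
+
+      /* Search using a hash computed earlier */
+      static int toy_search_with_hash (uint64_t key, uint64_t hash, uint64_t *value)
+      {
+        uint32_t b = hash & (TOY_NBUCKETS - 1);
+        while (toy_table[b].used)
+          {
+            if (toy_table[b].key == key)
+              {
+                *value = toy_table[b].value;
+                return 0;
+              }
+            b = (b + 1) & (TOY_NBUCKETS - 1);
+          }
+        return -1;
+      }
+
+      static void toy_lookup_batch (const uint64_t *keys, size_t n,
+                                    uint64_t *values, int *rv)
+      {
+        uint64_t hashes[TOY_LOOKAHEAD];   /* ring of precomputed hashes */
+
+        /* Prologue: hash and prefetch the first TOY_LOOKAHEAD keys */
+        for (size_t i = 0; i < n && i < TOY_LOOKAHEAD; i++)
+          {
+            hashes[i] = toy_hash (keys[i]);
+            __builtin_prefetch (&toy_table[hashes[i] & (TOY_NBUCKETS - 1)]);
+          }
+
+        for (size_t i = 0; i < n; i++)
+          {
+            uint64_t h = hashes[i % TOY_LOOKAHEAD];   /* read before slot reuse */
+
+            /* Stage ahead: hash key i + TOY_LOOKAHEAD and prefetch its bucket */
+            if (i + TOY_LOOKAHEAD < n)
+              {
+                uint64_t ha = toy_hash (keys[i + TOY_LOOKAHEAD]);
+                hashes[i % TOY_LOOKAHEAD] = ha;
+                __builtin_prefetch (&toy_table[ha & (TOY_NBUCKETS - 1)]);
+              }
+
+            /* Processing stage: the bucket should already be in cache */
+            rv[i] = toy_search_with_hash (keys[i], h, &values[i]);
+          }
+      }
+
+      int main (void)
+      {
+        for (uint64_t k = 1; k <= 100; k++)
+          toy_add (k, k * 10);
+
+        uint64_t keys[] = { 5, 50, 1000, 7, 99, 2, 0 };
+        uint64_t values[7];
+        int rv[7];
+        toy_lookup_batch (keys, 7, values, rv);
+
+        assert (rv[0] == 0 && values[0] == 50);
+        assert (rv[2] == -1);   /* 1000 was never added */
+        for (int i = 0; i < 7; i++)
+          printf ("key %llu: %s\n", (unsigned long long) keys[i],
+                  rv[i] == 0 ? "found" : "not found");
+        return 0;
+      }
+
+The ring of saved hashes plays the role of the stashed hash codes
+mentioned above; in a real node they could equally live in buffer
+metadata.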
+
+Walking a bihash table
+~~~~~~~~~~~~~~~~~~~~~~
+
+A fairly common scenario to build “show” commands involves walking a
+bihash table. It’s simple enough:
+
+.. code:: c
+
+      my_main_t *mm = &my_main;
+      clib_bihash_8_8_t *h;
+      void callback_fn (clib_bihash_kv_8_8_t *, void *);
+
+      h = &mm->hash_table;
+
+      BV(clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
+
+To nobody’s great surprise: clib_bihash_foreach_key_value_pair iterates
+across the entire table, calling callback_fn with active entries.
+
+Bihash table iteration safety
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The iterator template “clib_bihash_foreach_key_value_pair” must be used
+with a certain amount of care. For one thing, the iterator template does
+*not* take the bihash hash table writer lock. If your use-case requires
+it, lock the table.
+
+For another, the iterator template is not safe under all conditions:
+
+-  It’s **OK to delete** bihash table entries during a table-walk. The
+   iterator checks whether the current bucket has been freed after each
+   *callback_fn(…)* invocation.
+
+-  It is **not OK to add** entries during a table-walk.
+
+The add-during-walk case involves a jackpot: while processing a
+key-value-pair in a particular bucket, add a certain number of entries.
+By bad luck, assume that one or more of the added entries causes the
+**current bucket** to split-and-rehash.
+
+Since we rehash KVP’s to different pages based on what amounts to a
+different hash function, either of these things can go wrong:
+
+-  We may revisit previously-visited entries. Depending on how one coded
+   the use-case, we could end up in a recursive-add situation.
+
+-  We may skip entries that have not been visited.
+
+One could build an add-safe iterator, at a significant cost in
+performance: copy the entire bucket, and walk the copy.
+
+It’s hard to imagine a worthwhile add-during-walk use-case in the first
+place, let alone one which couldn’t be implemented by walking the table
+without modifying it, then adding a set of records.
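+
+The walk-first, add-later approach can be sketched as follows. The
+fixed-size table, walk function and callback are hypothetical stand-ins
+rather than the bihash API; the sketch only demonstrates the control
+flow, in which the callback records the additions it wants in a side
+list and the adds happen after the walk has finished. In this toy,
+adding directly from the callback would make the walk visit the new even
+keys and keep adding, which is the recursive-add problem described
+above.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stddef.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      #define TOY_SLOTS 16
+
+      typedef struct { uint64_t key, value; int used; } toy_kv_t;
+      typedef struct { toy_kv_t slot[TOY_SLOTS]; } toy_table_t;
+      typedef struct { uint64_t keys[TOY_SLOTS]; int n; } toy_pending_t;
+
+      static int toy_add (toy_table_t *t, uint64_t key, uint64_t value)
+      {
+        for (int i = 0; i < TOY_SLOTS; i++)
+          if (!t->slot[i].used)
+            {
+              t->slot[i] = (toy_kv_t) { key, value, 1 };
+              return 0;
+            }
+        return -1;   /* table full */
+      }
+
+      static toy_kv_t *toy_find (toy_table_t *t, uint64_t key)
+      {
+        for (int i = 0; i < TOY_SLOTS; i++)
+          if (t->slot[i].used && t->slot[i].key == key)
+            return &t->slot[i];
+        return NULL;
+      }
+
+      /* Calls cb on every active entry; cb must not add to the table */
+      static void toy_foreach (toy_table_t *t, void (*cb) (toy_kv_t *, void *),
+                               void *arg)
+      {
+        for (int i = 0; i < TOY_SLOTS; i++)
+          if (t->slot[i].used)
+            cb (&t->slot[i], arg);
+      }
+
+      /* Example policy: every even key gets a companion entry key + 1000 */
+      static void collect_cb (toy_kv_t *kv, void *arg)
+      {
+        toy_pending_t *p = arg;
+        if (kv->key % 2 == 0 && p->n < TOY_SLOTS)
+          p->keys[p->n++] = kv->key + 1000;   /* defer the add */
+      }
+
+      int main (void)
+      {
+        static toy_table_t t;
+        toy_pending_t pending = { .n = 0 };
+
+        for (uint64_t k = 1; k <= 4; k++)
+          toy_add (&t, k, k);
+
+        toy_foreach (&t, collect_cb, &pending);   /* phase 1: read-only walk */
+        for (int i = 0; i < pending.n; i++)       /* phase 2: add afterwards */
+          toy_add (&t, pending.keys[i], 0);
+
+        assert (pending.n == 2);
+        assert (toy_find (&t, 1002) != NULL && toy_find (&t, 1004) != NULL);
+        assert (toy_find (&t, 2002) == NULL);     /* no recursive adds */
+        printf ("added %d entries after the walk\n", pending.n);
+        return 0;
+      }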
+
+Creating a new template instance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Creating a new template is easy. Use one of the existing templates as a
+model, and make the obvious changes. The hash and key_compare methods
+are performance-critical in multiple senses.
+
+If the key compare method is slow, every lookup will be slow. If the
+hash function is slow, same story. If the hash function has poor
+statistical properties, space efficiency will suffer. In the limit, a
+bad enough hash function will cause large portions of the table to
+revert to linear search.
+
+Use of the best available vector unit is well worth the trouble in the
+hash and key_compare functions.
diff --git a/docs/developer/corearchitecture/buffer_metadata.rst b/docs/developer/corearchitecture/buffer_metadata.rst
new file mode 100644
index 00000000000..545c31f3041
--- /dev/null
+++ b/docs/developer/corearchitecture/buffer_metadata.rst
@@ -0,0 +1,237 @@
+Buffer Metadata
+===============
+
+Each vlib_buffer_t (packet buffer) carries buffer metadata which
+describes the current packet-processing state. The underlying techniques
+have been used for decades, across multiple packet processing
+environments.
+
+We will examine vpp buffer metadata in some detail, but folks who need
+to manipulate and/or extend the scheme should expect to do a certain
+level of code inspection.
+
+Vlib (Vector library) primary buffer metadata
+---------------------------------------------
+
+The first 64 octets of each vlib_buffer_t carry the primary buffer
+metadata. See …/src/vlib/buffer.h for full details.
+
+Important fields:
+
+-  i16 current_data: the signed offset in data[], pre_data[] that we are
+   currently processing. If negative, the current header points into the
+   pre-data (rewrite space) area.
+-  u16 current_length: nBytes between current_data and the end of this
+   buffer.
+-  u32 flags: Buffer flag bits. Heavily used, not many bits left
+
+   -  src/vlib/buffer.h flag bits
+
+      -  VLIB_BUFFER_IS_TRACED: buffer is traced
+      -  VLIB_BUFFER_NEXT_PRESENT: buffer has multiple chunks
+      -  VLIB_BUFFER_TOTAL_LENGTH_VALID:
+         total_length_not_including_first_buffer is valid (see below)
+
+   -  src/vnet/buffer.h flag bits
+
+      -  VNET_BUFFER_F_L4_CHECKSUM_COMPUTED: tcp/udp checksum has been
+         computed
+      -  VNET_BUFFER_F_L4_CHECKSUM_CORRECT: tcp/udp checksum is correct
+      -  VNET_BUFFER_F_VLAN_2_DEEP: two vlan tags present
+      -  VNET_BUFFER_F_VLAN_1_DEEP: one vlan tag present
+      -  VNET_BUFFER_F_SPAN_CLONE: packet has already been cloned (span
+         feature)
+      -  VNET_BUFFER_F_LOOP_COUNTER_VALID: packet look-up loop count
+         valid
+      -  VNET_BUFFER_F_LOCALLY_ORIGINATED: packet built by vpp
+      -  VNET_BUFFER_F_IS_IP4: packet is ipv4, for checksum offload
+      -  VNET_BUFFER_F_IS_IP6: packet is ipv6, for checksum offload
+      -  VNET_BUFFER_F_OFFLOAD_IP_CKSUM: hardware ip checksum offload
+         requested
+      -  VNET_BUFFER_F_OFFLOAD_TCP_CKSUM: hardware tcp checksum offload
+         requested
+      -  VNET_BUFFER_F_OFFLOAD_UDP_CKSUM: hardware udp checksum offload
+         requested
+      -  VNET_BUFFER_F_IS_NATED: natted packet, skip input checks
+      -  VNET_BUFFER_F_L2_HDR_OFFSET_VALID: L2 header offset valid
+      -  VNET_BUFFER_F_L3_HDR_OFFSET_VALID: L3 header offset valid
+      -  VNET_BUFFER_F_L4_HDR_OFFSET_VALID: L4 header offset valid
+      -  VNET_BUFFER_F_FLOW_REPORT: packet is an ipfix packet
+      -  VNET_BUFFER_F_IS_DVR: packet to be reinjected into the l2
+         output path
+      -  VNET_BUFFER_F_QOS_DATA_VALID: QoS data valid in
+         vnet_buffer_opaque2
+      -  VNET_BUFFER_F_GSO: generic segmentation offload requested
+      -  VNET_BUFFER_F_AVAIL1: available bit
+      -  VNET_BUFFER_F_AVAIL2: available bit
+      -  VNET_BUFFER_F_AVAIL3: available bit
+      -  VNET_BUFFER_F_AVAIL4: available bit
+      -  VNET_BUFFER_F_AVAIL5: available bit
+      -  VNET_BUFFER_F_AVAIL6: available bit
+      -  VNET_BUFFER_F_AVAIL7: available bit
+
+-  u32 flow_id: generic flow identifier
+-  u8 ref_count: buffer reference / clone count (e.g. for span
+   replication)
+-  u8 buffer_pool_index: buffer pool index which owns this buffer
+-  vlib_error_t (u16) error: error code for buffers enqueued to error
+   handler
+-  u32 next_buffer: buffer index of next buffer in chain. Only valid if
+   VLIB_BUFFER_NEXT_PRESENT is set
+-  union
+
+   -  u32 current_config_index: current index on feature arc
+   -  u32 punt_reason: reason code once packet punted. Mutually
+      exclusive with current_config_index
+
+-  u32 opaque[10]: primary vnet-layer opaque data (see below)
+-  END of first cache line / data initialized by the buffer allocator
+-  u32 trace_index: buffer’s index in the packet trace subsystem
+-  u32 total_length_not_including_first_buffer: see
+   VLIB_BUFFER_TOTAL_LENGTH_VALID above
+-  u32 opaque2[14]: secondary vnet-layer opaque data (see below)
+-  u8 pre_data[VLIB_BUFFER_PRE_DATA_SIZE]: rewrite space, often used to
+   prepend tunnel encapsulations
+-  u8 data[0]: buffer data received from the wire. Ordinarily, hardware
+   devices use b->data[0] as the DMA target but there are exceptions. Do
+   not write code which blindly assumes that packet data starts in
+   b->data[0]. Use vlib_buffer_get_current(…).
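+
+To make the current_data / current_length bookkeeping concrete, here is
+a hypothetical, heavily simplified buffer (not the real vlib_buffer_t
+layout; names and sizes are illustrative). It shows how a negative
+current_data moves the current pointer back into the pre-data rewrite
+space so that an encapsulation can be prepended in front of the received
+packet, which is also why code should use vlib_buffer_get_current (…)
+rather than assume data[0].
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdint.h>
+      #include <stdio.h>
+      #include <string.h>
+
+      #define TOY_PRE_DATA_SIZE 128   /* stand-in for VLIB_BUFFER_PRE_DATA_SIZE */
+      #define TOY_DATA_SIZE 256
+
+      typedef struct
+      {
+        int16_t current_data;     /* signed offset relative to data[0] */
+        uint16_t current_length;  /* bytes from current_data to end of data */
+        /* The real struct has pre_data[] and data[] back to back; one array
+           keeps the negative-offset arithmetic within defined C behaviour. */
+        uint8_t storage[TOY_PRE_DATA_SIZE + TOY_DATA_SIZE];
+      } toy_buffer_t;
+
+      static uint8_t *toy_data (toy_buffer_t *b)   /* &data[0] */
+      {
+        return b->storage + TOY_PRE_DATA_SIZE;
+      }
+
+      static uint8_t *toy_get_current (toy_buffer_t *b)
+      {
+        return toy_data (b) + b->current_data;
+      }
+
+      /* Positive delta consumes a header; negative delta makes room to prepend */
+      static void toy_advance (toy_buffer_t *b, int delta)
+      {
+        b->current_data = (int16_t) (b->current_data + delta);
+        b->current_length = (uint16_t) (b->current_length - delta);
+      }
+
+      int main (void)
+      {
+        static toy_buffer_t b;
+        memset (toy_data (&b), 0xab, 64);   /* 64-byte packet at data[0] */
+        b.current_data = 0;
+        b.current_length = 64;
+
+        toy_advance (&b, -8);                /* prepend an 8-byte encapsulation */
+        memset (toy_get_current (&b), 0xee, 8);
+
+        assert (b.current_data == -8 && b.current_length == 72);
+        assert (toy_get_current (&b) == b.storage + TOY_PRE_DATA_SIZE - 8);
+        assert (toy_get_current (&b)[8] == 0xab);   /* payload follows header */
+        printf ("current_data %d, current_length %u\n", (int) b.current_data,
+                (unsigned) b.current_length);
+        return 0;
+      }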
+
+Vnet (network stack) primary buffer metadata
+--------------------------------------------
+
+Vnet primary buffer metadata occupies space reserved in the vlib opaque
+field shown above, and has the type name vnet_buffer_opaque_t.
+Ordinarily accessed using the vnet_buffer(b) macro. See
+../src/vnet/buffer.h for full details.
+
+Important fields:
+
+-  u32 sw_if_index[2]: RX and TX interface handles. At the ip lookup
+   stage, vnet_buffer(b)->sw_if_index[VLIB_TX] is interpreted as a FIB
+   index.
+-  i16 l2_hdr_offset: offset from b->data[0] of the packet L2 header.
+   Valid only if b->flags & VNET_BUFFER_F_L2_HDR_OFFSET_VALID is set
+-  i16 l3_hdr_offset: offset from b->data[0] of the packet L3 header.
+   Valid only if b->flags & VNET_BUFFER_F_L3_HDR_OFFSET_VALID is set
+-  i16 l4_hdr_offset: offset from b->data[0] of the packet L4 header.
+   Valid only if b->flags & VNET_BUFFER_F_L4_HDR_OFFSET_VALID is set
+-  u8 feature_arc_index: feature arc that the packet is currently
+   traversing
+-  union
+
+   -  ip
+
+      -  u32 adj_index[2]: adjacency from dest IP lookup in [VLIB_TX],
+         adjacency from source ip lookup in [VLIB_RX], set to ~0 until
+         source lookup done
+      -  union
+
+         -  generic fields
+         -  ICMP fields
+         -  reassembly fields
+
+   -  mpls fields
+   -  l2 bridging fields, only valid in the L2 path
+   -  l2tpv3 fields
+   -  l2 classify fields
+   -  vnet policer fields
+   -  MAP fields
+   -  MAP-T fields
+   -  ip fragmentation fields
+   -  COP (whitelist/blacklist filter) fields
+   -  LISP fields
+   -  TCP fields
+
+      -  connection index
+      -  sequence numbers
+      -  header and data offsets
+      -  data length
+      -  flags
+
+   -  SCTP fields
+   -  NAT fields
+   -  u32 unused[6]
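+
+The “valid only if the flag is set” rule for the header offsets can be
+sketched as follows. The struct, flag value and accessors are
+hypothetical stand-ins for b->flags, vnet_buffer (b)->l3_hdr_offset and
+VNET_BUFFER_F_L3_HDR_OFFSET_VALID; the real flag bit positions live in
+the buffer.h headers and are not reproduced here.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stddef.h>
+      #include <stdint.h>
+
+      #define TOY_F_L3_HDR_OFFSET_VALID (1u << 0)   /* hypothetical bit */
+
+      typedef struct
+      {
+        uint32_t flags;
+        int16_t l3_hdr_offset;   /* offset from data[0]; meaningful only if flagged */
+        uint8_t data[128];
+      } toy_buffer_t;
+
+      /* Returns the L3 header, or NULL if no node has recorded its offset */
+      static uint8_t *toy_l3_header (toy_buffer_t *b)
+      {
+        if (!(b->flags & TOY_F_L3_HDR_OFFSET_VALID))
+          return NULL;
+        return b->data + b->l3_hdr_offset;
+      }
+
+      /* Record the offset and set the flag together, never one without the other */
+      static void toy_set_l3_header (toy_buffer_t *b, int16_t offset)
+      {
+        b->l3_hdr_offset = offset;
+        b->flags |= TOY_F_L3_HDR_OFFSET_VALID;
+      }
+
+      int main (void)
+      {
+        toy_buffer_t b = { 0 };
+
+        assert (toy_l3_header (&b) == NULL);   /* a zero offset is not trusted */
+        toy_set_l3_header (&b, 14);             /* e.g. after a 14-byte Ethernet header */
+        assert (toy_l3_header (&b) == b.data + 14);
+        return 0;
+      }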
+
+Vnet (network stack) secondary buffer metadata
+----------------------------------------------
+
+Vnet secondary buffer metadata occupies space reserved in the vlib opaque2
+field shown above, and has the type name vnet_buffer_opaque2_t.
+Ordinarily accessed using the vnet_buffer2(b) macro. See
+../src/vnet/buffer.h for full details.
+
+Important fields:
+
+-  qos fields
+
+   -  u8 bits
+   -  u8 source
+
+-  u8 loop_counter: used to detect and report internal forwarding loops
+-  group-based policy fields
+
+   -  u8 flags
+   -  u16 sclass: the packet’s source class
+
+-  u16 gso_size: L4 payload size, persists all the way to
+   interface-output in case GSO is not enabled
+-  u16 gso_l4_hdr_sz: size of the L4 protocol header
+-  union
+
+   -  packet trajectory tracer (largely deprecated)
+
+      -  u16 \*trajectory_trace; only #if VLIB_BUFFER_TRACE_TRAJECTORY >
+         0
+
+   -  packet generator
+
+      -  u64 pg_replay_timestamp: timestamp for replayed pcap trace
+         packets
+
+   -  u32 unused[8]
+
+Buffer Metadata Extensions
+--------------------------
+
+Plugin developers may wish to extend either the primary or secondary
+vnet buffer opaque unions. Perform a careful manual live-variable
+analysis first; otherwise, nodes which use the shared buffer metadata
+space may break things.
+
+It’s not OK to add plugin or proprietary metadata to the core vpp engine
+header files named above. Instead, proceed as follows. The example
+concerns the vnet primary buffer opaque union vnet_buffer_opaque_t. It’s
+a very simple variation to use the vnet secondary buffer opaque union
+vnet_buffer_opaque2_t.
+
+In a plugin header file:
+
+::
+
+       /* Add arbitrary buffer metadata */
+       #include <vnet/buffer.h>
+
+       typedef struct
+       {
+         u32 my_stuff[6];
+       } my_buffer_opaque_t;
+
+       STATIC_ASSERT (sizeof (my_buffer_opaque_t) <=
+                      STRUCT_SIZE_OF (vnet_buffer_opaque_t, unused),
+                      "Custom meta-data too large for vnet_buffer_opaque_t");
+
+       #define my_buffer_opaque(b)  \
+         ((my_buffer_opaque_t *)((u8 *)((b)->opaque) + STRUCT_OFFSET_OF (vnet_buffer_opaque_t, unused)))
+
+To set data in the custom buffer opaque type given a vlib_buffer_t \*b:
+
+::
+
+       my_buffer_opaque (b)->my_stuff[2] = 123;
+
+To read data from the custom buffer opaque type:
+
+::
+
+       stuff0 = my_buffer_opaque (b)->my_stuff[2];
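+
+Outside of vpp, the same overlay technique can be reproduced with
+standard C11 ``_Static_assert`` and ``offsetof`` in place of vppinfra's
+STATIC_ASSERT, STRUCT_SIZE_OF and STRUCT_OFFSET_OF. Everything below is
+a hypothetical miniature (the field names and sizes are made up, not the
+real vnet_buffer_opaque_t layout); it shows the compile-time size check
+and the pointer arithmetic that a macro like my_buffer_opaque (b)
+performs.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stddef.h>
+      #include <stdint.h>
+      #include <stdio.h>
+
+      /* Miniature "core" opaque layout with a spare area at the end */
+      typedef struct
+      {
+        uint32_t sw_if_index[2];
+        uint32_t unused[6];
+      } toy_core_opaque_t;
+
+      typedef struct
+      {
+        uint32_t opaque[10];   /* raw storage, like vlib_buffer_t opaque[] */
+      } toy_buffer_t;
+
+      /* Plugin-private metadata, overlaid on the spare area */
+      typedef struct
+      {
+        uint32_t my_stuff[6];
+      } my_buffer_opaque_t;
+
+      #define TOY_FIELD_SIZE(t, f) sizeof (((t *) 0)->f)
+
+      _Static_assert (sizeof (my_buffer_opaque_t)
+                        <= TOY_FIELD_SIZE (toy_core_opaque_t, unused),
+                      "Custom meta-data too large for the spare area");
+      _Static_assert (sizeof (toy_core_opaque_t)
+                        <= TOY_FIELD_SIZE (toy_buffer_t, opaque),
+                      "Core opaque does not fit in opaque[]");
+
+      #define my_buffer_opaque(b) \
+        ((my_buffer_opaque_t *) ((uint8_t *) ((b)->opaque) \
+                                 + offsetof (toy_core_opaque_t, unused)))
+
+      int main (void)
+      {
+        toy_buffer_t b = { { 0 } };
+
+        my_buffer_opaque (&b)->my_stuff[2] = 123;               /* write */
+        uint32_t stuff0 = my_buffer_opaque (&b)->my_stuff[2];   /* read */
+
+        assert (stuff0 == 123);
+        /* unused[] starts 8 bytes in, so my_stuff[2] aliases opaque[4] */
+        assert (b.opaque[4] == 123);
+        printf ("%u\n", (unsigned) stuff0);
+        return 0;
+      }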
diff --git a/docs/developer/corearchitecture/buildsystem/buildrootmakefile.rst b/docs/developer/corearchitecture/buildsystem/buildrootmakefile.rst
new file mode 100644
index 00000000000..1eb4e6b5301
--- /dev/null
+++ b/docs/developer/corearchitecture/buildsystem/buildrootmakefile.rst
@@ -0,0 +1,353 @@
+Introduction to build-root/Makefile
+===================================
+
+The vpp build system consists of a top-level Makefile, a data-driven
+build-root/Makefile, and a set of makefile fragments. The various parts
+come together as the result of a set of well-thought-out conventions.
+
+This section describes build-root/Makefile in some detail.
+
+Repository Groups and Source Paths
+----------------------------------
+
+Current vpp workspaces comprise a single repository group. The file
+.../build-root/build-config.mk defines a key variable called
+SOURCE\_PATH. The SOURCE\_PATH variable names the set of repository
+groups. At the moment, there is only one repository group.
+
+Single pass build system, dependencies and components
+-----------------------------------------------------
+
+The vpp build system caters to components built with GNU autoconf /
+automake. Adding such components is a simple process. Dealing with
+components which use BSD-style raw Makefiles is more difficult.
+Dealing with toolchain components such as gcc, glibc, and binutils can
+be considerably more complicated.
+
+The vpp build system is a **single-pass** build system. A partial order
+must exist for any set of components: the set of (a before b) tuples
+must resolve to an ordered list. If you create a circular dependency of
+the form (a,b), (b,c), (c,a), gmake will try to build the target list,
+but there’s a 0.0% chance that the results will be pleasant. Cut-n-paste
+mistakes in the .mk files under .../build-data/packages/ can produce
+confusing failures.
+
+In a single-pass build system, it’s best to separate libraries and
+applications which instantiate them. For example, if vpp depends on
+libfoo.a, and myapp depends on both vpp and libfoo.a, it's best to place
+libfoo.a and myapp in separate components. The build system will build
+libfoo.a, vpp, and then (as a separate component) myapp. If you try to
+build libfoo.a and myapp from the same component, it won’t work.
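+
+To make the partial-order requirement concrete, the sketch below runs a
+simple form of Kahn's topological sort over (a before b) pairs, using
+the libfoo.a / vpp / myapp example. It is only an illustration of the
+constraint, not part of the vpp build system (gmake does its own
+dependency processing), and all names are hypothetical. The acyclic set
+yields the order libfoo, vpp, myapp; the circular set (a,b), (b,c),
+(c,a) cannot be ordered at all.
+
+.. code:: c
+
+      #include <assert.h>
+      #include <stdio.h>
+
+      #define MAX_COMPONENTS 8
+
+      typedef struct { int before, after; } toy_dep_t;
+
+      /* Fills order[] and returns how many components were placed;
+         a result smaller than n means a dependency cycle exists. */
+      static int toy_order (int n, const toy_dep_t *deps, int ndeps, int *order)
+      {
+        int indegree[MAX_COMPONENTS] = { 0 };
+        int placed[MAX_COMPONENTS] = { 0 };
+        int count = 0;
+
+        for (int i = 0; i < ndeps; i++)
+          indegree[deps[i].after]++;
+
+        for (int progress = 1; progress;)
+          {
+            progress = 0;
+            for (int c = 0; c < n; c++)
+              if (!placed[c] && indegree[c] == 0)
+                {
+                  placed[c] = 1;
+                  order[count++] = c;
+                  for (int i = 0; i < ndeps; i++)   /* release dependents */
+                    if (deps[i].before == c)
+                      indegree[deps[i].after]--;
+                  progress = 1;
+                }
+          }
+        return count;
+      }
+
+      int main (void)
+      {
+        const char *name[] = { "libfoo", "vpp", "myapp" };
+        int order[MAX_COMPONENTS];
+
+        /* libfoo before vpp, libfoo before myapp, vpp before myapp */
+        toy_dep_t ok[] = { { 0, 1 }, { 0, 2 }, { 1, 2 } };
+        int n = toy_order (3, ok, 3, order);
+        assert (n == 3);
+        for (int i = 0; i < n; i++)
+          printf ("%s\n", name[order[i]]);   /* libfoo, vpp, myapp */
+
+        /* (a,b) (b,c) (c,a): nothing has in-degree 0, so no order exists */
+        toy_dep_t cycle[] = { { 0, 1 }, { 1, 2 }, { 2, 0 } };
+        assert (toy_order (3, cycle, 3, order) < 3);
+        return 0;
+      }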
+
+If you absolutely, positively insist on having myapp and libfoo.a in the
+same source tree, you can create a pseudo-component in a separate .mk
+file in the .../build-data/packages/ directory. Define
+``phoneycomponent_source = realcomponent``, and provide manual
+configure/build/install targets.
+
+Separate components for myapp, libfoo.a, and vpp are the best and
+easiest solution. However, the “mumble\_source = realsource” degree of
+freedom exists to solve intractable circular dependencies, such as
+building gcc-bootstrap, followed by glibc, followed by “real” gcc/g++
+[which depends on glibc too].
+
+.../build-root
+--------------
+
+The .../build-root directory contains the repository group specification
+build-config.mk, the main Makefile, and the system-wide set of
+autoconf/automake variable overrides in config.site. We'll describe
+these files in some detail. To be clear about expectations: the main
+Makefile and config.site file are subtle and complex. It's unlikely that
+you'll need or want to modify them. Poorly planned changes in either
+place typically cause bugs that are difficult to solve.
+
+.../build-root/build-config.mk
+------------------------------
+
+As described above, the build-config.mk file is straightforward: it sets
+the make variable SOURCE\_PATH to a list of repository group absolute
+paths.
+
+If you choose to move a workspace, make sure to modify the paths
+defined by the SOURCE\_PATH variable. Those paths need to match changes
+you make in the workspace paths. For example, if you place the vpp
+directory in the workspace of a user named jsmith, you might change the
+SOURCE\_PATH to:
+
+::
+
+        SOURCE_PATH = /home/jsmith/workspace/vpp
+
+The "out of the box" setting should work 99.5% of the time:
+
+::
+
+        SOURCE_PATH = $(CURDIR)/..
+
+.../vpp/build-root/Makefile
+---------------------------
+
+The main Makefile is complex in a number of dimensions. If you think you
+need to modify it, it's a good idea to do some research, or ask for
+advice before you change it.
+
+The main Makefile was organized and designed to provide the following
+characteristics: excellent performance, accurate dependency processing,
+cache enablement, timestamp optimizations, git integration,
+extensibility, builds with cross-compilation tool chains, and builds
+with embedded Linux distributions.
+
+If you really need to do so, you can build double-cross tools with it,
+with a minimum amount of fuss. For example, you could: compile gdb on
+x86\_64, to run on PowerPC, to debug the Xtensa instruction set.
+
+The PLATFORM variable
+---------------------
+
+The PLATFORM make/environment variable controls a number of important
+characteristics, primarily:
+
+-  CPU architecture
+-  The list of images to build.
+
+With respect to .../build-root/Makefile, the list of images to build is
+specified by the target. For example:
+
+::
+
+       make PLATFORM=vpp TAG=vpp_debug install-deb
+
+builds vpp debug Debian packages.
+
+The main Makefile interprets $PLATFORM by attempting to "-include" the
+file /build-data/platforms.mk:
+
+::
+
+        $(foreach d,$(FULL_SOURCE_PATH), \
+          $(eval -include $(d)/platforms.mk))
+
+By convention, we don't define **platforms** in the
+...//build-data/platforms.mk file.
+
+In the vpp case, we search for platform definition makefile fragments in
+the .../vpp/build-data/platforms directory, as follows:
+
+::
+
+        $(foreach d,$(SOURCE_PATH_BUILD_DATA_DIRS), \
+             $(eval -include $(d)/platforms/*.mk))
+
+With vpp, which uses the "vpp" platform as discussed above, we end up
+"-include"-ing .../vpp/build-data/platforms/vpp.mk.
+
+The platform-specific .mk fragment
+----------------------------------
+
+Here are the contents of .../build-data/platforms/vpp.mk:
+
+::
+
+        MACHINE=$(shell uname -m)
+
+        vpp_arch = native
+        ifeq ($(TARGET_PLATFORM),thunderx)
+        vpp_dpdk_target = arm64-thunderx-linuxapp-gcc
+        endif
+        vpp_native_tools = vppapigen
+
+        vpp_uses_dpdk = yes
+
+        # Uncomment to enable building unit tests
+        # vpp_enable_tests = yes
+
+        vpp_root_packages = vpp
+
+        # DPDK configuration parameters
+        # vpp_uses_dpdk_mlx4_pmd = yes
+        # vpp_uses_dpdk_mlx5_pmd = yes
+        # vpp_uses_external_dpdk = yes
+        # vpp_dpdk_inc_dir = /usr/include/dpdk
+        # vpp_dpdk_lib_dir = /usr/lib
+        # vpp_dpdk_shared_lib = yes
+
+        # Use '--without-libnuma' for non-numa aware architecture
+        # Use '--enable-dlmalloc' to use dlmalloc instead of mheap
+        vpp_configure_args_vpp = --enable-dlmalloc
+        sample-plugin_configure_args_vpp = --enable-dlmalloc
+
+        # load balancer plugin is not portable on 32 bit platform
+        ifeq ($(MACHINE),i686)
+        vpp_configure_args_vpp += --disable-lb-plugin
+        endif
+
+        vpp_debug_TAG_CFLAGS = -g -O0 -DCLIB_DEBUG \
+            -fstack-protector-all -fPIC -Werror
+        vpp_debug_TAG_CXXFLAGS = -g -O0 -DCLIB_DEBUG \
+            -fstack-protector-all -fPIC -Werror
+        vpp_debug_TAG_LDFLAGS = -g -O0 -DCLIB_DEBUG \
+            -fstack-protector-all -fPIC -Werror
+
+        vpp_TAG_CFLAGS = -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror
+        vpp_TAG_CXXFLAGS = -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror
+        vpp_TAG_LDFLAGS = -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror -pie -Wl,-z,now
+
+        vpp_clang_TAG_CFLAGS = -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror
+        vpp_clang_TAG_LDFLAGS = -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror
+
+        vpp_gcov_TAG_CFLAGS = -g -O0 -DCLIB_DEBUG -fPIC -Werror -fprofile-arcs -ftest-coverage
+        vpp_gcov_TAG_LDFLAGS = -g -O0 -DCLIB_DEBUG -fPIC -Werror -coverage
+
+        vpp_coverity_TAG_CFLAGS = -g -O2 -fPIC -Werror -D__COVERITY__
+        vpp_coverity_TAG_LDFLAGS = -g -O2 -fPIC -Werror -D__COVERITY__
+
+Note the following variable settings:
+
+-  The variable ``<platform>_arch`` sets the CPU architecture used to
+   build the per-platform cross-compilation toolchain. With the
+   exception of the "native" architecture (used in our example), the vpp
+   build system produces cross-compiled binaries.
+
+-  The variable ``<platform>_native_tools`` lists the required set of
+   self-compiled build tools.
+
+-  The variable ``<platform>_root_packages`` lists the set of images to
+   build when specifying the target:
+   ``make PLATFORM=<platform> TAG=<tag> [install-deb | install-rpm]``.
+
+The TAG variable
+----------------
+
+The TAG variable indirectly sets CFLAGS and LDFLAGS, as well as the
+build and install directory names in the .../vpp/build-root directory.
+See definitions above.
+
+Important build-root/Makefile targets
+-------------------------------------
+
+The main Makefile and the various makefile fragments implement the
+following user-visible targets:
+
++------------------+----------------------+--------------------------------------------------------------------------------------+
+| Target           | ENV Variable Settings| Notes                                                                                |
+|                  |                      |                                                                                      |
++==================+======================+======================================================================================+
+| bootstrap-tools  | none                 |  Builds the set of native tools needed by the vpp build system to                    |
+|                  |                      |  build images. Example: vppapigen. In a full cross compilation case might include    |
+|                  |                      |  "make", "git", "find", and "tar".                                                   |
++------------------+----------------------+--------------------------------------------------------------------------------------+
+| install-tools    | PLATFORM             | Builds the tool chain for the indicated <platform>. Not used in vpp builds           |
++------------------+----------------------+--------------------------------------------------------------------------------------+
+| distclean        | none                 | Roto-rooters everything in sight: toolchains, images, and so forth.                  |
++------------------+----------------------+--------------------------------------------------------------------------------------+
+| install-deb      | PLATFORM and TAG     | Build Debian packages comprising components listed in <platform>_root_packages,      |
+|                  |                      | using compile / link options defined by TAG.                                         |
++------------------+----------------------+--------------------------------------------------------------------------------------+
+| install-rpm      | PLATFORM and TAG     | Build RPMs comprising components listed in <platform>_root_packages,                 |
+|                  |                      | using compile / link options defined by TAG.                                         |
++------------------+----------------------+--------------------------------------------------------------------------------------+
+
+Additional build-root/Makefile environment variable settings
+------------------------------------------------------------
+
+These variable settings may be of use:
+
++----------------------+------------------------------------------------------------------------------------------------------------+
+| ENV Variable         | Notes                                                                                                      |
++======================+============================================================================================================+
+| BUILD_DEBUG=vx       | Directs Makefile et al. to make a good-faith effort to show what's going on in excruciating detail.        |
+|                      | Use it as follows: "make ... BUILD_DEBUG=vx". Fairly effective in Makefile debug situations.               |
++----------------------+------------------------------------------------------------------------------------------------------------+
+| V=1                  | Print detailed cc / ld command lines. Useful for discovering if -DFOO=11 is in the command line or not.    |
++----------------------+------------------------------------------------------------------------------------------------------------+
+| CC=mygcc             | Override the configured C-compiler                                                                         |
++----------------------+------------------------------------------------------------------------------------------------------------+
+
+.../build-root/config.site
+--------------------------
+
+The contents of .../build-root/config.site override individual autoconf /
+automake default variable settings. Here are a few sample settings related to
+building a full toolchain:
+
+::
+
+    # glibc needs these settings for cross-compiling
+    libc_cv_forced_unwind=yes
+    libc_cv_c_cleanup=yes
+    libc_cv_ssp=no
+
+Determining which variables need to be overridden, and the override
+values, is a matter of trial and error. It should be unnecessary to
+modify this file for use with fd.io vpp.
+
+.../build-data/platforms.mk
+---------------------------
+
+Each repo group includes the platforms.mk file, which is included by
+the main Makefile. The vpp/build-data/platforms.mk file is not terribly
+complex. As of this writing, the .../build-data/platforms.mk file
+accomplishes two tasks.
+
+First, it includes vpp/build-data/platforms/\*.mk:
+
+::
+
+    # Pick up per-platform makefile fragments
+    $(foreach d,$(SOURCE_PATH_BUILD_DATA_DIRS),	\
+      $(eval -include $(d)/platforms/*.mk))
+
+This collects the set of platform definition makefile fragments, as discussed above.
+
+Second, platforms.mk implements the user-visible "install-deb" target.
+
+.../build-data/packages/\*.mk
+-----------------------------
+
+Each component needs a makefile fragment in order for the build system
+to recognize it. The per-component makefile fragments vary
+considerably in complexity. For a component built with GNU autoconf /
+automake which does not depend on other components, the make fragment
+can be empty. See .../build-data/packages/vpp.mk for an uncomplicated
+but fully realistic example.
+
+Here are some of the important variable settings in per-component makefile fragments:
+
++----------------------+------------------------------------------------------------------------------------------------------------+
+| Variable             | Notes                                                                                                      |
++======================+============================================================================================================+
+| xxx_configure_depend |  Lists the set of component build dependencies for the xxx component. In plain English: don't try to       |
+|                      |  configure this component until you've successfully built the indicated targets. Almost always,            |
+|                      |  xxx_configure_depend will list a set of "yyy-install" targets. Note the pattern:                          |
+|                      |  "variable names contain underscores, make target names contain hyphens"                                   |
++----------------------+------------------------------------------------------------------------------------------------------------+
+| xxx_configure_args   | (optional) Lists any additional arguments to pass to the xxx component "configure" script.                 |
+|                      | The main Makefile %-configure rule adds the required settings for --libdir, --prefix, and                  |
+|                      | --host (when cross-compiling)                                                                              |
++----------------------+------------------------------------------------------------------------------------------------------------+
+| xxx_CPPFLAGS         | Adds -I stanzas to CPPFLAGS for components upon which xxx depends.                                         |
+|                      | Almost invariably "xxx_CPPFLAGS = $(call installed_includes_fn, dep1 dep2 dep3)", where dep1, dep2, and    |
+|                      | dep3 are listed in xxx_configure_depend. It is bad practice to set "-g -O3" here. Those settings           |
+|                      | belong in a TAG.                                                                                           |
++----------------------+------------------------------------------------------------------------------------------------------------+
+| xxx_LDFLAGS          | Adds -Wl,-rpath -Wl,depN stanzas to LDFLAGS for components upon which xxx depends.                         |
+|                      | Almost invariably "xxx_LDFLAGS = $(call installed_lib_fn, dep1 dep2 dep3)", where dep1, dep2, and          |
+|                      | dep3 are listed in xxx_configure_depend. It is bad manners to set "-liberty-or-death" here.                |
+|                      | Those settings belong in Makefile.am.                                                                      |
++----------------------+------------------------------------------------------------------------------------------------------------+
+
+When dealing with "irritating" components built with raw Makefiles
+which only work when building in the source tree, we use a specific
+strategy in the xxx.mk file.
+
+The strategy is simple for those components: We copy the source tree
+into .../vpp/build-root/build-xxx. This works, but completely defeats
+dependency processing. This strategy is acceptable only for 3rd party
+software which won't need extensive (or preferably any) modifications.
+
+Take a look at .../vpp/build-data/packages/dpdk.mk. When invoked, the
+dpdk_configure variable copies source code into $(PACKAGE_BUILD_DIR),
+and performs the BSD equivalent of "autoreconf -i -f" to configure the
+build area. The rest of the file is similar: a bunch of hand-rolled
+glue code which manages to make the dpdk act like a good vpp build
+citizen even though it is not.
diff --git a/docs/developer/corearchitecture/buildsystem/cmakeandninja.rst b/docs/developer/corearchitecture/buildsystem/cmakeandninja.rst
new file mode 100644
index 00000000000..580d261bdac
--- /dev/null
+++ b/docs/developer/corearchitecture/buildsystem/cmakeandninja.rst
@@ -0,0 +1,186 @@
+Introduction to cmake and ninja
+===============================
+
+Cmake plus ninja is approximately equal to GNU autotools plus GNU
+make, respectively. Both cmake and GNU autotools support native and
+cross-compilation, and checking for required components and versions.
+
+- For a decent-sized project - such as vpp - build performance is drastically better with (cmake, ninja).
+
+- The cmake input language looks like an actual language, rather than a shell scripting scheme on steroids.
+
+- Ninja doesn't pretend to support manually-generated input files. Think of it as a fast, dumb robot which eats mildly legible byte-code.
+
+See the `cmake website <http://cmake.org>`_, and the `ninja website
+<https://ninja-build.org>`_ for additional information.
+
+vpp cmake configuration files
+-----------------------------
+
+The top of the vpp project cmake hierarchy lives in .../src/CMakeLists.txt.
+This file defines the vpp project, and (recursively) includes two kinds
+of files: rule/function definitions, and target lists.
+
+- Rule/function definitions live in .../src/cmake/{\*.cmake}. Although the contents of these files are simple enough to read, it shouldn't be necessary to modify them very often.
+
+- Build target lists come from CMakeLists.txt files found in subdirectories, which are named in the SUBDIRS list in .../src/CMakeLists.txt.
+
+::
+
+    ##############################################################################
+    # subdirs - order matters
+    ##############################################################################
+    if("${CMAKE_SYSTEM_NAME}" STREQUAL "Linux")
+      find_package(OpenSSL REQUIRED)
+      set(SUBDIRS
+        vppinfra svm vlib vlibmemory vlibapi vnet vpp vat vcl plugins
+        vpp-api tools/vppapigen tools/g2 tools/perftool)
+    elseif("${CMAKE_SYSTEM_NAME}" STREQUAL "Darwin")
+      set(SUBDIRS vppinfra)
+    else()
+      message(FATAL_ERROR "Unsupported system: ${CMAKE_SYSTEM_NAME}")
+    endif()
+
+    foreach(DIR ${SUBDIRS})
+      add_subdirectory(${DIR})
+    endforeach()
+
+- The vpp cmake configuration hierarchy discovers the list of plugins to be built by searching for subdirectories in .../src/plugins which contain CMakeLists.txt files
+
+
+::
+
+    ##############################################################################
+    # find and add all plugin subdirs
+    ##############################################################################
+    FILE(GLOB files RELATIVE
+      ${CMAKE_CURRENT_SOURCE_DIR}
+      ${CMAKE_CURRENT_SOURCE_DIR}/*/CMakeLists.txt
+    )
+    foreach (f ${files})
+      get_filename_component(dir ${f} DIRECTORY)
+      add_subdirectory(${dir})
+    endforeach()
+
+How to write a plugin CMakeLists.txt file
+-----------------------------------------
+
+It's really quite simple. Follow the pattern:
+
+::
+
+    add_vpp_plugin(mactime
+      SOURCES
+      mactime.c
+      node.c
+
+      API_FILES
+      mactime.api
+
+      INSTALL_HEADERS
+      mactime_all_api_h.h
+      mactime_msg_enum.h
+
+      API_TEST_SOURCES
+      mactime_test.c
+    )
+
+Adding a target elsewhere in the source tree
+--------------------------------------------
+
+Within reason, adding a subdirectory to the SUBDIRS list in
+.../src/CMakeLists.txt is perfectly OK. The indicated directory will
+need a CMakeLists.txt file.
+
+.. _building-g2:
+
+Here's how we build the g2 event data visualization tool:
+
+::
+
+    option(VPP_BUILD_G2 "Build g2 tool." OFF)
+    if(VPP_BUILD_G2)
+      find_package(GTK2 COMPONENTS gtk)
+      if(GTK2_FOUND)
+        include_directories(${GTK2_INCLUDE_DIRS})
+        add_vpp_executable(g2
+          SOURCES
+          clib.c
+          cpel.c
+          events.c
+          main.c
+          menu1.c
+          pointsel.c
+          props.c
+          g2version.c
+          view1.c
+
+          LINK_LIBRARIES vppinfra Threads::Threads m ${GTK2_LIBRARIES}
+          NO_INSTALL
+        )
+      endif()
+    endif()
+
+The g2 component is optional, and is not built by default. There are
+a couple of ways to tell cmake to include it in build.ninja [or in Makefile.]
+
+When invoking cmake manually [rarely done and not very easy], specify
+-DVPP_BUILD_G2=ON:
+
+::
+
+   $ cmake ... -DVPP_BUILD_G2=ON
+
+Take a good look at .../build-data/packages/vpp.mk to see where and
+how the top-level Makefile and .../build-root/Makefile set all of the
+cmake arguments. One strategy to enable an optional component is fairly
+obvious. Add -DVPP_BUILD_G2=ON to vpp_cmake_args.
+
+That would work, of course, but it's not a particularly elegant solution.
+
+Tinkering with build options: ccmake
+------------------------------------
+
+The easy way to set VPP_BUILD_G2 - or frankly **any** cmake
+parameter - is to install the "cmake-curses-gui" package and use
+it.
+
+- Do a straightforward vpp build using the top level Makefile, "make build" or "make build-release"
+- Adjourn to .../build-root/build-vpp-native/vpp or .../build-root/build-vpp_debug-native/vpp
+- Invoke "ccmake ." to reconfigure the project as desired
+
+Here's approximately what you'll see:
+
+::
+
+     CCACHE_FOUND                     /usr/bin/ccache
+     CMAKE_BUILD_TYPE
+     CMAKE_INSTALL_PREFIX             /scratch/vpp-gate/build-root/install-vpp-nati
+     DPDK_INCLUDE_DIR                 /scratch/vpp-gate/build-root/install-vpp-nati
+     DPDK_LIB                         /scratch/vpp-gate/build-root/install-vpp-nati
+     MBEDTLS_INCLUDE_DIR              /usr/include
+     MBEDTLS_LIB1                     /usr/lib/x86_64-linux-gnu/libmbedtls.so
+     MBEDTLS_LIB2                     /usr/lib/x86_64-linux-gnu/libmbedx509.so
+     MBEDTLS_LIB3                     /usr/lib/x86_64-linux-gnu/libmbedcrypto.so
+     MUSDK_INCLUDE_DIR                MUSDK_INCLUDE_DIR-NOTFOUND
+     MUSDK_LIB                        MUSDK_LIB-NOTFOUND
+     PRE_DATA_SIZE                    128
+     VPP_API_TEST_BUILTIN             ON
+     VPP_BUILD_G2                     OFF
+     VPP_BUILD_PERFTOOL               OFF
+     VPP_BUILD_VCL_TESTS              ON
+     VPP_BUILD_VPPINFRA_TESTS         OFF
+
+    CCACHE_FOUND: Path to a program.
+    Press [enter] to edit option Press [d] to delete an entry   CMake Version 3.10.2
+    Press [c] to configure
+    Press [h] for help           Press [q] to quit without generating
+    Press [t] to toggle advanced mode (Currently Off)
+
+Use the cursor to point at the VPP_BUILD_G2 line. Press the return key
+to change OFF to ON. Press "c" to regenerate build.ninja, etc.
+
+At that point "make build" or "make build-release" will build g2. And so on.
+
+Note that toggling advanced mode ["t"] gives access to substantially
+all of the cmake options, discovered directories and paths.
diff --git a/docs/developer/corearchitecture/buildsystem/index.rst b/docs/developer/corearchitecture/buildsystem/index.rst
new file mode 100644
index 00000000000..908e91e1fc1
--- /dev/null
+++ b/docs/developer/corearchitecture/buildsystem/index.rst
@@ -0,0 +1,14 @@
+.. _buildsystem:
+
+Build System
+============
+
+This guide describes the vpp build system in detail. As of this writing,
+the build system uses a mix of make / Makefiles, cmake, and ninja to
+achieve excellent build performance.
+
+.. toctree::
+
+    mainmakefile
+    cmakeandninja
+    buildrootmakefile
diff --git a/docs/developer/corearchitecture/buildsystem/mainmakefile.rst b/docs/developer/corearchitecture/buildsystem/mainmakefile.rst
new file mode 100644
index 00000000000..96b97496350
--- /dev/null
+++ b/docs/developer/corearchitecture/buildsystem/mainmakefile.rst
@@ -0,0 +1,2 @@
+Introduction to the top-level Makefile
+======================================
diff --git a/docs/developer/corearchitecture/featurearcs.rst b/docs/developer/corearchitecture/featurearcs.rst
new file mode 100644
index 00000000000..89c50e38dce
--- /dev/null
+++ b/docs/developer/corearchitecture/featurearcs.rst
@@ -0,0 +1,225 @@
+Feature Arcs
+============
+
+A significant number of vpp features are configurable on a per-interface
+or per-system basis. Rather than ask feature coders to manually
+construct the required graph arcs, we built a general mechanism to
+manage these mechanics.
+
+Specifically, feature arcs comprise ordered sets of graph nodes. Each
+feature node in an arc is independently controlled. Feature arc nodes
+are generally unaware of each other. Handing a packet to “the next
+feature node” is quite inexpensive.
+
+The feature arc implementation solves the problem of creating graph arcs
+used for steering.
+
+At the beginning of a feature arc, a bit of setup work is needed, but
+only if at least one feature is enabled on the arc.
+
+On a per-arc basis, individual feature definitions create a set of
+ordering dependencies. Feature infrastructure performs a topological
+sort of the ordering dependencies, to determine the actual feature
+order. Missing dependencies **will** lead to runtime disorder. See
+https://gerrit.fd.io/r/#/c/12753 for an example.
+
+If no partial order exists, vpp will refuse to run. Circular dependency
+loops of the form “a then b, b then c, c then a” are impossible to
+satisfy.
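+
+To make the ordering rule concrete, here is a minimal sketch of a
+topological sort over “runs before” constraints, using Kahn's algorithm.
+The constraint matrix and the function order_features() are hypothetical
+illustrations, not the vnet feature infrastructure's actual code. A
+dependency loop shows up as a failure to place every feature, which is
+the situation in which vpp refuses to run.
+
+.. code:: c
+
+   #include <assert.h>
+
+   #define MAX_FEATURES 8
+
+   /* Hypothetical constraints: before[a][b] != 0 means
+    * "feature a runs before feature b". Not vnet code. */
+   static int before[MAX_FEATURES][MAX_FEATURES];
+
+   /* Kahn's algorithm: writes a valid feature order into order[] and
+    * returns 0, or returns -1 if the constraints contain a cycle. */
+   static int
+   order_features (int n, int order[])
+   {
+     int indegree[MAX_FEATURES] = { 0 };
+     int queue[MAX_FEATURES];
+     int head = 0, tail = 0, placed = 0;
+
+     assert (n <= MAX_FEATURES);
+
+     /* Count how many features must precede each feature. */
+     for (int a = 0; a < n; a++)
+       for (int b = 0; b < n; b++)
+         if (before[a][b])
+           indegree[b]++;
+
+     /* Seed the queue with features that nothing must precede. */
+     for (int i = 0; i < n; i++)
+       if (indegree[i] == 0)
+         queue[tail++] = i;
+
+     while (head < tail)
+       {
+         int a = queue[head++];
+         order[placed++] = a;
+         for (int b = 0; b < n; b++)
+           if (before[a][b] && --indegree[b] == 0)
+             queue[tail++] = b;
+       }
+
+     /* Features left unplaced sit on a dependency loop. */
+     return placed == n ? 0 : -1;
+   }
+
+With features 0, 1 and 2 and the constraints “0 before 1” and “1 before
+2”, order_features (3, order) yields 0, 1, 2. Adding “2 before 0” closes
+a loop, and the call returns -1.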
+
+Adding a feature to an existing feature arc
+-------------------------------------------
+
+To nobody’s great surprise, we set up feature arcs using the typical
+“macro -> constructor function -> list of declarations” pattern:
+
+.. code:: c
+
+       VNET_FEATURE_INIT (mactime, static) =
+       {
+         .arc_name = "device-input",
+         .node_name = "mactime",
+         .runs_before = VNET_FEATURES ("ethernet-input"),
+       };
+
+This creates a “mactime” feature on the “device-input” arc.
+
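+Once per frame, dig up the vnet_feature_config_main_t corresponding to
+the feature arc. The snippet below happens to fetch the interface output
+arc rather than “device-input”, but the pattern is the same: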
+
+.. code:: c
+
+       vnet_main_t *vnm = vnet_get_main ();
+       vnet_interface_main_t *im = &vnm->interface_main;
+       u8 arc = im->output_feature_arc_index;
+       vnet_feature_config_main_t *fcm;
+
+       fcm = vnet_feature_get_config_main (arc);
+
+Note that in this case, we’ve stored the required arc index - assigned
+by the feature infrastructure - in the vnet_interface_main_t. Where to
+put the arc index is a programmer’s decision when creating a feature
+arc.
+
+Per packet, set next0 to steer packets to the next node they should
+visit:
+
+.. code:: c
+
+       vnet_get_config_data (&fcm->config_main,
+                             &b0->current_config_index /* value-result */,
+                             &next0, 0 /* # bytes of config data */);
+
+Configuration data is per-feature arc, and is often unused. Note that
+it’s normal to reset next0 to divert packets elsewhere; often, to drop
+them for cause:
+
+.. code:: c
+
+       next0 = MACTIME_NEXT_DROP;
+       b0->error = node->errors[DROP_CAUSE];
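+
+Conceptually, the value-result config index works like a cursor into a
+per-interface list of enabled features. Each lookup yields the next node
+and advances the cursor. The sketch below is a deliberately simplified
+toy model of that idea. The toy_feature_config_t type, the function
+get_next_feature() and the next-index values are hypothetical and do not
+reflect vnet's actual configuration layout.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+
+   typedef uint32_t u32;
+
+   /* Hypothetical per-interface config: next-node indices of the
+    * enabled features, in the order the topological sort produced. */
+   typedef struct
+   {
+     u32 next_index[4];
+     u32 n_features;
+   } toy_feature_config_t;
+
+   /* Return the next node for a packet and advance its cursor, loosely
+    * mirroring the value-result current_config_index argument. */
+   static u32
+   get_next_feature (const toy_feature_config_t *cfg, u32 *config_index)
+   {
+     assert (*config_index < cfg->n_features);
+     return cfg->next_index[(*config_index)++];
+   }
+
+Because the cursor travels with the packet, each feature node can stay
+unaware of its neighbors. It simply asks for “the next feature”.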
+
+Creating a feature arc
+----------------------
+
+Once again, we create feature arcs using constructor macros:
+
+.. code:: c
+
+       VNET_FEATURE_ARC_INIT (ip4_unicast, static) =
+       {
+         .arc_name = "ip4-unicast",
+         .start_nodes = VNET_FEATURES ("ip4-input", "ip4-input-no-checksum"),
+         .arc_index_ptr = &ip4_main.lookup_main.ucast_feature_arc_index,
+       };
+
+In this case, we configure two arc start nodes to handle the
+“hardware-verified ip checksum or not” cases. During initialization, the
+feature infrastructure stores the arc index as shown.
+
+In the head-of-arc node, do the following to send packets along the
+feature arc:
+
+.. code:: c
+
+       ip_lookup_main_t *lm = &im->lookup_main;
+       arc = lm->ucast_feature_arc_index;
+
+Once per packet, initialize packet metadata to walk the feature arc:
+
+.. code:: c
+
+   vnet_feature_arc_start (arc, sw_if_index0, &next, b0);
+
+Enabling / Disabling features
+-----------------------------
+
+Simply call vnet_feature_enable_disable to enable or disable a specific
+feature:
+
+.. code:: c
+
+       vnet_feature_enable_disable ("device-input", /* arc name */
+                                    "mactime",      /* feature name */
+                                    sw_if_index,    /* Interface sw_if_index */
+                                    enable_disable, /* 1 => enable */
+                                    0 /* (void *) feature_configuration */,
+                                    0 /* feature_configuration_nbytes */);
+
+The feature_configuration opaque is seldom used.
+
+If you wish to make a feature a *de facto* system-level concept, pass
+sw_if_index=0 at all times. Sw_if_index 0 is always valid, and
+corresponds to the “local” interface.
+
+Related “show” commands
+-----------------------
+
+To display the entire set of features, use “show features [verbose]”.
+The verbose form displays arc indices, and feature indices within the
+arcs:
+
+::
+
+   $ vppctl show features verbose
+   Available feature paths
+   <snip>
+   [14] ip4-unicast:
+     [ 0]: nat64-out2in-handoff
+     [ 1]: nat64-out2in
+     [ 2]: nat44-ed-hairpin-dst
+     [ 3]: nat44-hairpin-dst
+     [ 4]: ip4-dhcp-client-detect
+     [ 5]: nat44-out2in-fast
+     [ 6]: nat44-in2out-fast
+     [ 7]: nat44-handoff-classify
+     [ 8]: nat44-out2in-worker-handoff
+     [ 9]: nat44-in2out-worker-handoff
+     [10]: nat44-ed-classify
+     [11]: nat44-ed-out2in
+     [12]: nat44-ed-in2out
+     [13]: nat44-det-classify
+     [14]: nat44-det-out2in
+     [15]: nat44-det-in2out
+     [16]: nat44-classify
+     [17]: nat44-out2in
+     [18]: nat44-in2out
+     [19]: ip4-qos-record
+     [20]: ip4-vxlan-gpe-bypass
+     [21]: ip4-reassembly-feature
+     [22]: ip4-not-enabled
+     [23]: ip4-source-and-port-range-check-rx
+     [24]: ip4-flow-classify
+     [25]: ip4-inacl
+     [26]: ip4-source-check-via-rx
+     [27]: ip4-source-check-via-any
+     [28]: ip4-policer-classify
+     [29]: ipsec-input-ip4
+     [30]: vpath-input-ip4
+     [31]: ip4-vxlan-bypass
+     [32]: ip4-lookup
+   <snip>
+
+Here, we learn that the ip4-unicast feature arc has index 14, and that,
+e.g., ip4-inacl has index 25 (it is the 26th feature) in the generated
+feature order.
+
+To display the features currently active on a specific interface, use
+“show interface features”:
+
+::
+
+   $ vppctl show interface GigabitEthernet3/0/0 features
+   Feature paths configured on GigabitEthernet3/0/0...
+   <snip>
+   ip4-unicast:
+     nat44-out2in
+   <snip>
+
+Table of Feature Arcs
+---------------------
+
+Simply search for name-strings to track down the arc definition,
+location of the arc index, etc.
+
+::
+
+               |    Arc Name      |
+               |------------------|
+               | device-input     |
+               | ethernet-output  |
+               | interface-output |
+               | ip4-drop         |
+               | ip4-local        |
+               | ip4-multicast    |
+               | ip4-output       |
+               | ip4-punt         |
+               | ip4-unicast      |
+               | ip6-drop         |
+               | ip6-local        |
+               | ip6-multicast    |
+               | ip6-output       |
+               | ip6-punt         |
+               | ip6-unicast      |
+               | mpls-input       |
+               | mpls-output      |
+               | nsh-output       |
diff --git a/docs/developer/corearchitecture/index.rst b/docs/developer/corearchitecture/index.rst
new file mode 100644
index 00000000000..ecd5a3cdb08
--- /dev/null
+++ b/docs/developer/corearchitecture/index.rst
@@ -0,0 +1,21 @@
+.. _corearchitecture:
+
+=================
+Core Architecture
+=================
+
+.. toctree::
+    :maxdepth: 1
+
+    softwarearchitecture
+    infrastructure
+    vlib
+    vnet
+    featurearcs
+    buffer_metadata
+    multiarch/index
+    bihash
+    buildsystem/index
+    mem
+    multi_thread
+
diff --git a/docs/developer/corearchitecture/infrastructure.rst b/docs/developer/corearchitecture/infrastructure.rst
new file mode 100644
index 00000000000..b4e1065f81e
--- /dev/null
+++ b/docs/developer/corearchitecture/infrastructure.rst
@@ -0,0 +1,612 @@
+VPPINFRA (Infrastructure)
+=========================
+
+The files associated with the VPP Infrastructure layer are located in
+the ``./src/vppinfra`` folder.
+
+VPPinfra is a collection of basic C-library services, quite sufficient
+to build standalone programs to run directly on bare metal. It also
+provides high-performance dynamic arrays, hashes, bitmaps,
+high-precision real-time clock support, fine-grained event-logging, and
+data structure serialization.
+
+One fair comment / fair warning about vppinfra: you can't always tell a
+macro from an inline function from an ordinary function simply by name.
+Macros are used to avoid function calls in the typical case, and to
+cause (intentional) side-effects.
+
+Vppinfra has been around for almost 20 years and tends not to change
+frequently. The VPP Infrastructure layer contains the following
+functions:
+
+Vectors
+-------
+
+Vppinfra vectors are ubiquitous dynamically resized arrays with optional
+user-defined "headers". Many vppinfra data structures (e.g. hash, heap,
+pool) are vectors with various different headers.
+
+The memory layout looks like this:
+
+::
+
+                      User header (optional, uword aligned)
+                      Alignment padding (if needed)
+                      Vector length in elements
+    User's pointer -> Vector element 0
+                      Vector element 1
+                      ...
+                      Vector element N-1
+
+As shown above, the vector APIs deal with pointers to the 0th element of
+a vector. Null pointers are valid vectors of length zero.
+
+To avoid thrashing the memory allocator, one often resets the length of
+a vector to zero while retaining the memory allocation. Set the vector
+length field to zero via the vec_reset_length(v) macro. [Use the macro!
+It’s smart about NULL pointers.]
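+
+The layout above can be mimicked in a few lines of plain C. This is a
+minimal sketch under simplifying assumptions. A length (and capacity)
+word sits just before element 0, a NULL pointer is an empty vector, and
+resetting the length keeps the allocation. The toy_vec_* names are
+hypothetical. The real vppinfra macros (vec_len, vec_add1,
+vec_reset_length) also handle user headers, alignment and heap
+selection, which this sketch ignores.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+   #include <stdlib.h>
+
+   /* Hypothetical header stored immediately before element 0.
+    * No user header and no alignment padding in this sketch. */
+   typedef struct
+   {
+     uint32_t len;  /* elements in use */
+     uint32_t cap;  /* elements allocated */
+   } toy_vec_header_t;
+
+   #define toy_vec_hdr(v) ((toy_vec_header_t *) (v) - 1)
+
+   /* NULL is a valid, zero-length vector. */
+   static uint32_t
+   toy_vec_len (const void *v)
+   {
+     return v ? toy_vec_hdr (v)->len : 0;
+   }
+
+   /* Append one int, growing by roughly 3/2. Returns the (possibly
+    * moved) pointer to element 0, so callers must reassign it. */
+   static int *
+   toy_vec_add1_int (int *v, int x)
+   {
+     uint32_t len = toy_vec_len (v);
+     uint32_t cap = v ? toy_vec_hdr (v)->cap : 0;
+
+     if (len == cap)
+       {
+         uint32_t new_cap = cap < 4 ? 4 : cap + cap / 2;
+         toy_vec_header_t *h = realloc (v ? toy_vec_hdr (v) : NULL,
+                                        sizeof (*h) + new_cap * sizeof (int));
+         assert (h != NULL);
+         h->cap = new_cap;
+         h->len = len;
+         v = (int *) (h + 1);
+       }
+     v[len] = x;
+     toy_vec_hdr (v)->len = len + 1;
+     return v;
+   }
+
+   /* Keep the allocation, drop the contents; safe on NULL. */
+   static void
+   toy_vec_reset_length (void *v)
+   {
+     if (v)
+       toy_vec_hdr (v)->len = 0;
+   }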
+
+Typically, the user header is not present. User headers allow for other
+data structures to be built atop vppinfra vectors. Users may specify the
+alignment for the first data element of a vector via the
+``vec_*_aligned`` macros.
+
+Vector elements can be any C type (e.g. int, double, struct bar). This
+is also true for data types built atop vectors (e.g. heap, pool, etc.).
+Many macros have \_a variants supporting alignment of vector elements
+and \_h variants supporting non-zero-length vector headers. The \_ha
+variants support both. Additionally, cache-line alignment within a
+vector element structure can be specified using the
+``CLIB_CACHE_LINE_ALIGN_MARK`` macro.
+
+Inconsistent usage of header and/or alignment related macro variants
+will cause delayed, confusing failures.
+
+Standard programming error: memorize a pointer to the ith element of a
+vector, and then expand the vector. Vectors expand by 3/2, so such code
+may appear to work for a period of time. Correct code almost always
+memorizes vector **indices** which are invariant across reallocations.
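+
+A minimal sketch of the safe pattern, using plain realloc() with 3/2
+growth instead of the vppinfra vec_* macros. The caller keeps an
+element's index and recomputes its address after any operation that may
+reallocate. The session_t and session_vec_t types and session_add() are
+hypothetical.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdlib.h>
+
+   typedef struct
+   {
+     int id;
+     int packets;
+   } session_t;  /* hypothetical element type */
+
+   typedef struct
+   {
+     session_t *elts;
+     size_t len, cap;
+   } session_vec_t;
+
+   /* Append a session and return its INDEX, which stays valid across
+    * reallocation. A pointer to elts[i] would not. */
+   static size_t
+   session_add (session_vec_t *sv, int id)
+   {
+     if (sv->len == sv->cap)
+       {
+         size_t new_cap = sv->cap < 4 ? 4 : sv->cap + sv->cap / 2;
+         session_t *p = realloc (sv->elts, new_cap * sizeof (*p));
+         assert (p != NULL);
+         sv->elts = p;  /* every previously taken address is now stale */
+         sv->cap = new_cap;
+       }
+     sv->elts[sv->len] = (session_t) { .id = id, .packets = 0 };
+     return sv->len++;
+   }
+
+Holding &sv.elts[first] across later appends risks a dangling pointer.
+Re-indexing with sv.elts[first] each time stays correct.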
+
+In typical application images, one supplies a set of global functions
+designed to be called from gdb. Here are a few examples:
+
+-  vl(v) - prints vec_len(v)
+-  pe(p) - prints pool_elts(p)
+-  pifi(p, index) - prints pool_is_free_index(p, index)
+-  debug_hex_bytes (p, nbytes) - hex memory dump nbytes starting at p
+
+Use the “show gdb” debug CLI command to print the current set.
+
+Bitmaps
+-------
+
+Vppinfra bitmaps are dynamic, built using the vppinfra vector APIs.
+Quite handy for a variety of jobs.
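+
+A minimal sketch of a dynamically sized bitmap in the same spirit. It is
+an array of 64-bit words that grows on demand when a bit past the
+current end is set, with bits beyond the end reading as zero. The
+toy_bitmap_* names are hypothetical. Vppinfra's own clib_bitmap_* API is
+built on vectors and offers many more operations.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+   #include <stdlib.h>
+   #include <string.h>
+
+   typedef struct
+   {
+     uint64_t *words;
+     size_t n_words;
+   } toy_bitmap_t;  /* hypothetical; zero-initialize before use */
+
+   /* Set or clear bit i, growing with zeroed words if needed. */
+   static void
+   toy_bitmap_set (toy_bitmap_t *b, size_t i, int value)
+   {
+     size_t w = i / 64;
+     if (w >= b->n_words)
+       {
+         if (!value)
+           return;  /* bits past the end already read as zero */
+         uint64_t *p = realloc (b->words, (w + 1) * sizeof (*p));
+         assert (p != NULL);
+         memset (p + b->n_words, 0, (w + 1 - b->n_words) * sizeof (*p));
+         b->words = p;
+         b->n_words = w + 1;
+       }
+     uint64_t mask = (uint64_t) 1 << (i % 64);
+     if (value)
+       b->words[w] |= mask;
+     else
+       b->words[w] &= ~mask;
+   }
+
+   /* Return bit i; bits beyond the allocated words read as zero. */
+   static int
+   toy_bitmap_get (const toy_bitmap_t *b, size_t i)
+   {
+     size_t w = i / 64;
+     return w < b->n_words && ((b->words[w] >> (i % 64)) & 1);
+   }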
+
+Pools
+-----
+
+Vppinfra pools combine vectors and bitmaps to rapidly allocate and free
+fixed-size data structures with independent lifetimes. Pools are perfect
+for allocating per-session structures.
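+
+A minimal sketch of the pool idea, with a fixed capacity for brevity.
+Elements live in an array, freed indices go onto a free list for reuse,
+and a per-element flag (standing in for a free bitmap) records which
+slots are free. toy_pool_get() and toy_pool_put() are hypothetical. The
+real pool_get() and pool_put() macros grow the underlying vector and
+keep their bookkeeping in the vector header.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+
+   #define TOY_POOL_SIZE 16
+
+   typedef struct
+   {
+     int session_id;  /* hypothetical per-session payload */
+   } toy_session_t;
+
+   typedef struct
+   {
+     toy_session_t elts[TOY_POOL_SIZE];
+     uint8_t is_free[TOY_POOL_SIZE];        /* stands in for the free bitmap */
+     uint32_t free_indices[TOY_POOL_SIZE];  /* stack of reusable indices */
+     uint32_t n_free;
+     uint32_t n_used;                       /* high-water mark */
+   } toy_pool_t;
+
+   /* Allocate an element, preferring a previously freed slot.
+    * Returns its index, or UINT32_MAX if the pool is full. */
+   static uint32_t
+   toy_pool_get (toy_pool_t *p)
+   {
+     uint32_t i;
+     if (p->n_free > 0)
+       i = p->free_indices[--p->n_free];
+     else if (p->n_used < TOY_POOL_SIZE)
+       i = p->n_used++;
+     else
+       return UINT32_MAX;
+     p->is_free[i] = 0;
+     return i;
+   }
+
+   /* Release an element; its index may be handed out again later. */
+   static void
+   toy_pool_put (toy_pool_t *p, uint32_t i)
+   {
+     assert (i < p->n_used && !p->is_free[i]);
+     p->is_free[i] = 1;
+     p->free_indices[p->n_free++] = i;
+   }
+
+Callers keep the returned index (for example, as a hash table value)
+rather than a pointer, so the underlying storage is free to move.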
+
+Hashes
+------
+
+Vppinfra provides several hash flavors. Data plane problems involving
+packet classification / session lookup often use
+./src/vppinfra/bihash_template.[ch] bounded-index extensible hashes.
+These templates are instantiated multiple times, to efficiently service
+different fixed-key sizes.
+
+Bihashes are thread-safe. Read-locking is not required. A simple
+spin-lock ensures that only one thread writes an entry at a time.
+
+The original vppinfra hash implementation in ./src/vppinfra/hash.[ch]
+is simple to use, and is often used in control-plane code which needs
+exact-string-matching.
+
+In either case, one almost always looks up a key in a hash table to
+obtain an index in a related vector or pool. The APIs are simple enough,
+but one must take care when using the unmanaged arbitrary-sized key
+variant. The call hash_set_mem (hash_table, key_pointer, value)
+memorizes key_pointer itself, not a copy of the key. It is usually a
+bad mistake to pass the address of a vector element as the second
+argument to hash_set_mem, since the vector may later be reallocated. It
+is perfectly fine to memorize constant string addresses in the text
+segment.
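+
+The sketch below illustrates the key-pointer contract with a tiny,
+hypothetical open-addressing table that, like hash_set_mem, stores the
+caller's key pointer rather than a copy. It is not the vppinfra hash
+implementation. toy_hash_set_mem() and toy_hash_get_mem() are invented
+for illustration, and the stored value is an index, as is typical.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+   #include <stdlib.h>
+   #include <string.h>
+
+   #define TOY_HASH_SIZE 16  /* power of two, fixed for brevity */
+
+   typedef struct
+   {
+     const char *key;  /* memorized pointer: NOT a copy of the key */
+     uint32_t value;   /* typically an index into a vector or pool */
+   } toy_hash_entry_t;
+
+   static toy_hash_entry_t toy_table[TOY_HASH_SIZE];
+
+   static uint32_t
+   toy_hash_string (const char *s)
+   {
+     uint32_t h = 5381;  /* djb2 */
+     while (*s)
+       h = h * 33 + (unsigned char) *s++;
+     return h;
+   }
+
+   /* Remember key_pointer itself. The caller must keep the key bytes
+    * alive and unchanged for as long as the entry exists. */
+   static void
+   toy_hash_set_mem (const char *key_pointer, uint32_t value)
+   {
+     uint32_t i = toy_hash_string (key_pointer) & (TOY_HASH_SIZE - 1);
+     for (uint32_t n = 0; n < TOY_HASH_SIZE;
+          n++, i = (i + 1) & (TOY_HASH_SIZE - 1))
+       if (!toy_table[i].key || !strcmp (toy_table[i].key, key_pointer))
+         {
+           toy_table[i].key = key_pointer;
+           toy_table[i].value = value;
+           return;
+         }
+     assert (!"toy table full");
+   }
+
+   /* Returns 1 and fills *value if the key is present, else 0. */
+   static int
+   toy_hash_get_mem (const char *key, uint32_t *value)
+   {
+     uint32_t i = toy_hash_string (key) & (TOY_HASH_SIZE - 1);
+     for (uint32_t n = 0; n < TOY_HASH_SIZE && toy_table[i].key;
+          n++, i = (i + 1) & (TOY_HASH_SIZE - 1))
+       if (!strcmp (toy_table[i].key, key))
+         {
+           *value = toy_table[i].value;
+           return 1;
+         }
+     return 0;
+   }
+
+A string literal, or a dedicated heap copy that outlives the entry, is a
+safe key. A pointer into a vector that may later be resized is not.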
+
+Timekeeping
+-----------
+
+Vppinfra includes high-precision, low-cost timing services. The datatype
+clib_time_t and associated functions reside in ./src/vppinfra/time.[ch].
+Call clib_time_init (clib_time_t \*cp) to initialize the clib_time_t
+object.
+
+Clib_time_init(…) can use a variety of different ways to establish the
+hardware clock frequency. At the end of the day, vppinfra timekeeping
+takes the attitude that the operating system’s clock is the closest
+thing to a gold standard it has handy.
+
+When properly configured, NTP maintains kernel clock synchronization
+with a highly accurate off-premises reference clock. Notwithstanding
+network propagation delays, a synchronized NTP client will keep the
+kernel clock accurate to within 50ms or so.
+
+Why should one care? Simply put, oscillators used to generate CPU ticks
+aren’t super accurate. They work pretty well, but a 0.1% error wouldn’t
+be out of the question. That’s a minute and a half’s worth of error in 1
+day. The error changes constantly, due to temperature variation, and a
+host of other physical factors.
+
+It’s far too expensive to use system calls for timing, so we’re left
+with the problem of continuously adjusting our view of the CPU tick
+register’s clocks_per_second parameter.
+
+The clock rate adjustment algorithm measures the number of cpu ticks and
+the “gold standard” reference time across an interval of approximately
+16 seconds. We calculate clocks_per_second for the interval: use rdtsc
+(on x86_64) and a system call to get the latest cpu tick count and the
+kernel’s latest nanosecond timestamp. We subtract the previous interval
+end values, and use exponential smoothing to merge the new clock rate
+sample into the clocks_per_second parameter.
+
+As of this writing, we maintain the clock rate by way of the following
+first-order difference equation:
+
+.. math::
+
+   \text{clocks\_per\_second}(t) = K \cdot \text{clocks\_per\_second}(t-1)
+   + (1 - K) \cdot \text{sample\_cps}(t), \qquad K = e^{-1/3.75}
+
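+With one observation roughly every 16 seconds, 3.75 observations span
+about one minute. The smoothing therefore has a time constant (loosely,
+a “half-life”) of about 1 minute. Strictly, the half-life is ln 2 times
+that, or roughly 40 seconds. Empirically, the clock rate converges within
+5 minutes, and appears to maintain near-perfect agreement with the
+kernel clock in the face of ongoing NTP time adjustments.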
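+
+A minimal sketch of one rate-adjustment step, assuming the tick and
+reference-time deltas for the interval have already been measured. The
+function update_clocks_per_second() is hypothetical. See
+clib_time_verify_frequency(…) for the real code, which also rejects
+implausible samples. K is written as a precomputed literal so the
+sketch does not need libm.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+
+   /* K ~= e^(-1/3.75), precomputed. */
+   #define CPS_SMOOTHING_K 0.7659283
+
+   /* One exponential-smoothing step:
+    *   cps(t) = cps(t-1) * K + sample_cps(t) * (1 - K)
+    * where sample_cps is CPU ticks per reference-clock second over
+    * the last measurement interval. Hypothetical helper. */
+   static double
+   update_clocks_per_second (double cps_prev, uint64_t delta_ticks,
+                             double delta_reference_seconds)
+   {
+     assert (delta_reference_seconds > 0.0);
+     double sample_cps = (double) delta_ticks / delta_reference_seconds;
+     return cps_prev * CPS_SMOOTHING_K
+       + sample_cps * (1.0 - CPS_SMOOTHING_K);
+   }
+
+Each call moves the estimate about 23% of the way toward the latest
+sample. A single outlier therefore perturbs the estimate only modestly,
+while a sustained change in oscillator frequency is tracked within a
+few intervals.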
+
+See ./src/vppinfra/time.c:clib_time_verify_frequency(…) to look at the
+rate adjustment algorithm. The code rejects frequency samples
+corresponding to the sort of adjustment which might occur if someone
+changes the gold standard kernel clock by several seconds.
+
+Monotonic timebase support
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Particularly during system initialization, the “gold standard” system
+reference clock can change by a large amount, in an instant. It’s not a
+best practice to yank the reference clock - in either direction - by
+hours or days. In fact, some poorly-constructed use-cases do so.
+
+To deal with this reality, clib_time_now(…) returns the number of
+seconds since vpp started, *guaranteed to be monotonically increasing,
+no matter what happens to the system reference clock*.
+
+This is first-order important, to avoid breaking every active timer in
+the system. The vpp host stack alone may account for tens of millions of
+active timers. It’s utterly impractical to track down and fix timers, so
+we must deal with the issue at the timebase level.
+
+Here’s how it works. Prior to adjusting the clock rate, we collect the
+kernel reference clock and the cpu clock:
+
+.. code:: c
+
+     /* Ask the kernel and the CPU what time it is... */
+     now_reference = unix_time_now ();
+     now_clock = clib_cpu_time_now ();
+
+Compute changes for both clocks since the last rate adjustment, roughly
+15 seconds ago:
+
+.. code:: c
+
+     /* Compute change in the reference clock */
+     delta_reference = now_reference - c->last_verify_reference_time;
+
+     /* And change in the CPU clock */
+     delta_clock_in_seconds = (f64) (now_clock - c->last_verify_cpu_time) *
+       c->seconds_per_clock;
+
+``delta_reference`` is the key quantity. Almost 100% of the time,
+delta_reference and delta_clock_in_seconds are identical to within the
+duration of one system call. However, NTP or a privileged user can yank
+the system reference time, in either direction, by an hour, a day, or a decade.
+
+As described above, clib_time_now(…) must return monotonically
+increasing answers to the question “how long has it been since vpp
+started, in seconds.” To do that, the clock rate adjustment algorithm
+begins by recomputing the initial reference time:
+
+.. code:: c
+
+     c->init_reference_time += (delta_reference - delta_clock_in_seconds);
+
+It’s easy to convince yourself that if the reference clock changes by
+15.000000 seconds and the cpu clock tick time changes by 15.000000
+seconds, the initial reference time won’t change.
+
+If, on the other hand, the reference clock jumps backwards by exactly one
+day during a 15-second rate update interval, delta_reference is
+15.0 - 86400.0 = -86385.0 while delta_clock_in_seconds is 15.0, so we
+add -86400.0 to the initial reference time.
+
+Given the corrected initial reference time, we recompute the total
+number of cpu ticks which have occurred since the corrected initial
+reference time, at the current clock tick rate:
+
+.. code:: c
+
+     c->total_cpu_time = (now_reference - c->init_reference_time)
+       * c->clocks_per_second;
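+
+A minimal sketch of the correction, assuming a simplified structure in
+which the CPU clock has already been converted to seconds. The
+``time_sketch_t`` type and both functions are hypothetical, not the real
+``clib_time_t`` API. Elapsed time, measured against the corrected
+``init_reference_time``, keeps advancing when the reference clock jumps
+backwards by a day:
+
+.. code:: c
+
+   /* Hypothetical stand-in for the clib_time_t fields used above. */
+   typedef struct
+   {
+     double init_reference_time;
+     double last_verify_reference_time;
+     double last_verify_cpu_seconds;   /* CPU clock, in seconds */
+   } time_sketch_t;
+
+   /* One rate-adjustment step: fold any reference-clock jump into
+      init_reference_time, as in the assignment shown above. */
+   static void
+   time_sketch_verify (time_sketch_t * c, double now_reference,
+                       double now_cpu_seconds)
+   {
+     double delta_reference = now_reference - c->last_verify_reference_time;
+     double delta_clock_in_seconds =
+       now_cpu_seconds - c->last_verify_cpu_seconds;
+
+     c->init_reference_time += (delta_reference - delta_clock_in_seconds);
+     c->last_verify_reference_time = now_reference;
+     c->last_verify_cpu_seconds = now_cpu_seconds;
+   }
+
+   /* Seconds since start, measured against the corrected reference. */
+   static double
+   time_sketch_elapsed (const time_sketch_t * c, double now_reference)
+   {
+     return now_reference - c->init_reference_time;
+   }
+
+For example, starting at reference time 1000.0, a clean 15-second step
+leaves ``init_reference_time`` unchanged. If the reference clock then
+reads 1030.0 - 86400.0 fifteen seconds later, ``init_reference_time``
+drops by 86400.0 and the elapsed time still reads 30.0. Multiplying the
+elapsed time by clocks_per_second gives the recomputed total_cpu_time.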
+
+Timebase precision
+~~~~~~~~~~~~~~~~~~
+
+Cognoscenti may notice that vlib_time_now(…) and clib_time_now(…) return
+a 64-bit floating-point value: the number of seconds since vpp started.
+
+Please see `this Wikipedia
+article <https://en.wikipedia.org/wiki/Double-precision_floating-point_format>`__
+for more information. C double-precision floating point numbers (called
+f64 in the vpp code base) have a 53-bit effective mantissa, and can
+accurately represent 15 decimal digits’ worth of precision.
+
+There are 315,360,000.000001 seconds in ten years plus one microsecond.
+That string has exactly 15 decimal digits. By this conservative estimate,
+the vpp time base retains 1us precision for at least 30 years.
+
+vlib_time_now(…) and clib_time_now(…) do *not* provide precision in excess
+of 1e-6 seconds.
+If necessary, please use clib_cpu_time_now(…) for direct access to the
+CPU clock-cycle counter. Note that the number of CPU clock cycles per
+second varies significantly across CPU architectures.
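+
+The 15-digit argument above can be checked directly against the binary
+format. The sketch below measures the gap between a time value and the
+next representable f64. ``f64_spacing`` and ``years_to_seconds`` are
+hypothetical helpers, and years are 365 days, as in the arithmetic above.
+
+.. code:: c
+
+   #include <stdint.h>
+   #include <string.h>
+
+   /* Distance from t to the next larger double, for positive finite t.
+      For such values, incrementing the IEEE 754 bit pattern yields the
+      next representable double, and the subtraction below is exact. */
+   static double
+   f64_spacing (double t)
+   {
+     uint64_t bits;
+     double next;
+
+     memcpy (&bits, &t, sizeof bits);
+     bits++;
+     memcpy (&next, &bits, sizeof next);
+     return next - t;
+   }
+
+   /* Seconds in n 365-day years. */
+   static double
+   years_to_seconds (double n)
+   {
+     return n * 365.0 * 86400.0;
+   }
+
+At 30 years the spacing is 2^-23 s, about 0.12us, and it stays below 1us
+up to about 272 years, so the decimal-digit estimate is conservative.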
+
+Timer Wheels
+------------
+
+Vppinfra includes configurable timer wheel support. See the source code
+in ``…/src/vppinfra/tw_timer_template.[ch]``, as well as a considerable
+number of template instances defined in ``…/src/vppinfra/tw_timer_*.[ch]``.
+
+Instantiation of tw_timer_template.h generates named structures to
+implement specific timer wheel geometries. Choices include: number of
+timer wheels (currently, 1 or 2), number of slots per ring (a power of
+two), and the number of timers per “object handle”.
+
+Internally, user object/timer handles are 32-bit integers, so if one
+selects 16 timers/object (4 bits), the resulting timer wheel handle is
+limited to 2**28 objects.
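+
+A sketch of the handle arithmetic, assuming the layout implied by the
+expired-timer callback later in this section: timer id in the top bits,
+object pool index in the rest. The helper names are hypothetical and not
+part of the tw_timer template API.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   /* Hypothetical helpers. log2_timers plays the role of
+      LOG2_TW_TIMERS_PER_OBJECT and must be in 1..31 so every shift
+      count is valid; pool_index must fit in the remaining low bits. */
+   static uint32_t
+   tw_sketch_make_handle (uint32_t pool_index, uint32_t timer_id,
+                          unsigned log2_timers)
+   {
+     return (timer_id << (32 - log2_timers)) | pool_index;
+   }
+
+   static uint32_t
+   tw_sketch_pool_index (uint32_t handle, unsigned log2_timers)
+   {
+     return handle & ((UINT32_C (1) << (32 - log2_timers)) - 1);
+   }
+
+   static uint32_t
+   tw_sketch_timer_id (uint32_t handle, unsigned log2_timers)
+   {
+     return handle >> (32 - log2_timers);
+   }
+
+   /* Number of distinct objects a handle can address. */
+   static uint64_t
+   tw_sketch_max_objects (unsigned log2_timers)
+   {
+     return UINT64_C (1) << (32 - log2_timers);
+   }
+
+With ``log2_timers`` = 4, ``tw_sketch_max_objects`` returns 2**28; with
+``log2_timers`` = 1, decoding reduces to the ``& 0x7FFFFFFF`` and ``>> 31``
+operations used in the callback.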
+
+Here are the specific settings required to generate a single 2048 slot
+wheel which supports 2 timers per object:
+
+.. code:: c
+
+       #define TW_TIMER_WHEELS 1
+       #define TW_SLOTS_PER_RING 2048
+       #define TW_RING_SHIFT 11
+       #define TW_RING_MASK (TW_SLOTS_PER_RING -1)
+       #define TW_TIMERS_PER_OBJECT 2
+       #define LOG2_TW_TIMERS_PER_OBJECT 1
+       #define TW_SUFFIX _2t_1w_2048sl
+       #define TW_FAST_WHEEL_BITMAP 0
+       #define TW_TIMER_ALLOW_DUPLICATE_STOP 0
+
+See tw_timer_2t_1w_2048sl.h for a complete example.
+
+tw_timer_template.h is not intended to be #included directly. Client
+code can include multiple timer geometry header files, although extreme
+caution would be required to use the TW and TWT macros in such a case.
+
+API usage examples
+~~~~~~~~~~~~~~~~~~
+
+The unit test code in …/src/vppinfra/test_tw_timer.c provides a concrete
+API usage example. It uses a synthetic clock to rapidly exercise the
+underlying tw_timer_expire_timers(…) template.
+
+There are not many API routines to call.
+
+Initialize a two-timer, single 2048-slot wheel w/ a 1-second timer granularity
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: c
+
+       tw_timer_wheel_init_2t_1w_2048sl (&tm->single_wheel,
+                                        expired_timer_single_callback,
+                                        1.0 /* timer interval */ );
+
+Start a timer
+^^^^^^^^^^^^^
+
+.. code:: c
+
+       handle = tw_timer_start_2t_1w_2048sl (&tm->single_wheel, elt_index,
+                                             0 /* timer id: 0 or 1 */ ,
+                                             expiration_time_in_u32_ticks);
+
+Stop a timer
+^^^^^^^^^^^^
+
+.. code:: c
+
+       tw_timer_stop_2t_1w_2048sl (&tm->single_wheel, handle);
+
+An expired timer callback
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: c
+
+       static void
+       expired_timer_single_callback (u32 * expired_timers)
+       {
+         int i;
+         u32 pool_index, timer_id;
+         tw_timer_test_elt_t *e;
+         tw_timer_test_main_t *tm = &tw_timer_test_main;
+
+         for (i = 0; i < vec_len (expired_timers); i++)
+           {
+             /* Low 31 bits: user pool index; top bit: timer id (0 or 1) */
+             pool_index = expired_timers[i] & 0x7FFFFFFF;
+             timer_id = expired_timers[i] >> 31;
+
+             ASSERT (timer_id == 1);
+
+             e = pool_elt_at_index (tm->test_elts, pool_index);
+
+             if (e->expected_to_expire != tm->single_wheel.current_tick)
+               {
+                 fformat (stdout, "[%d] expired at %d not %d\n",
+                          e - tm->test_elts, tm->single_wheel.current_tick,
+                          e->expected_to_expire);
+               }
+             pool_put (tm->test_elts, e);
+           }
+       }
+
+We use wheel timers extensively in the vpp host stack. Each TCP session
+needs 5 timers, so supporting 10 million flows requires up to 50 million
+concurrent timers.
+
+Timers rarely expire, so it’s of utmost importance that stopping and
+restarting a timer costs as few clock cycles as possible.
+
+Stopping a timer costs a doubly-linked list dequeue. Starting a timer
+involves modular arithmetic to determine the correct timer wheel and
+slot, and a list head enqueue.
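+
+The modular arithmetic can be sketched as follows. This is a hypothetical
+two-wheel illustration, not the template's actual code. It reuses the
+2048-slot ring parameters from the example geometry above, files short
+intervals directly into the fast wheel, and parks longer ones on the slow
+wheel, which advances once per full turn of the fast wheel.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   /* Illustrative geometry matching the 2048-slot example above. */
+   #define SKETCH_SLOTS_PER_RING 2048
+   #define SKETCH_RING_SHIFT 11
+   #define SKETCH_RING_MASK (SKETCH_SLOTS_PER_RING - 1)
+
+   _Static_assert ((1u << SKETCH_RING_SHIFT) == SKETCH_SLOTS_PER_RING,
+                   "ring shift must equal log2 (slots per ring)");
+
+   typedef struct
+   {
+     uint32_t wheel;               /* 0 = fast wheel, 1 = slow wheel */
+     uint32_t slot;
+   } sketch_slot_t;
+
+   /* Hypothetical placement; assumes interval is below
+      SKETCH_SLOTS_PER_RING * SKETCH_SLOTS_PER_RING ticks. */
+   static sketch_slot_t
+   sketch_place_timer (uint64_t current_tick, uint32_t interval)
+   {
+     uint64_t expires = current_tick + interval;
+     sketch_slot_t s;
+
+     if (interval < SKETCH_SLOTS_PER_RING)
+       {
+         /* Expires within one turn of the fast wheel. */
+         s.wheel = 0;
+         s.slot = (uint32_t) (expires & SKETCH_RING_MASK);
+       }
+     else
+       {
+         /* Each slow-wheel slot covers one full fast-wheel turn; at wrap
+            time the timer would move to fast slot (expires & MASK). */
+         s.wheel = 1;
+         s.slot = (uint32_t) ((expires >> SKETCH_RING_SHIFT)
+                              & SKETCH_RING_MASK);
+       }
+     return s;
+   }
+
+Because the ring size is a power of two, the modulus is a single AND with
+the mask and the division is a single shift.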
+
+Expired timer processing generally involves bulk linked-list retirement
+with user callback presentation. There is some additional complexity at
+wheel-wrap time, to relocate timers from slower-turning timer wheels into
+faster-turning wheels.
+
+Format
+------
+
+Vppinfra format is roughly equivalent to printf.
+
+Format has a few properties worth mentioning. Format’s first argument is
+a (u8 \*) vector to which it appends the result of the current format
+operation. Chaining calls is very easy:
+
+.. code:: c
+
+       u8 * result;
+
+       result = format (0, "junk = %d, ", junk);
+       result = format (result, "more junk = %d\n", more_junk);
+
+As previously noted, NULL pointers are perfectly proper 0-length
+vectors. Format returns a (u8 \*) vector, **not** a C-string. If you
+wish to print a (u8 \*) vector, use the “%v” format string. If you need
+a (u8 \*) vector which is also a proper C-string, either of these
+schemes may be used:
+
+.. code:: c
+
+       /* Either append a NUL byte to the vector... */
+       vec_add1 (result, 0);
+       /* ...or format one in as the last character */
+       result = format (result, "<whatever>%c", 0);
+
+Remember to vec_free() the result if appropriate. Be careful not to pass
+format an uninitialized (u8 \*).
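+
+To make the vector-versus-C-string distinction concrete, here is a
+hypothetical, standard-C imitation of the append-and-return pattern. It is
+not vppinfra's implementation: ``sketch_vec_t`` and ``sketch_format`` are
+invented for illustration, and only mimic a length-tracked ``(u8 *)``
+vector in which NULL is a valid empty vector.
+
+.. code:: c
+
+   #include <stdarg.h>
+   #include <stdio.h>
+   #include <stdlib.h>
+
+   /* Hypothetical length-tracked byte buffer; NULL is an empty vector. */
+   typedef struct
+   {
+     size_t len;
+     char data[];
+   } sketch_vec_t;
+
+   /* Append printf-style output to v; return the (possibly moved) vector.
+      len counts only formatted bytes, so data is not a C-string. */
+   static sketch_vec_t *
+   sketch_format (sketch_vec_t * v, const char *fmt, ...)
+   {
+     size_t old_len = v ? v->len : 0;
+     va_list ap;
+
+     va_start (ap, fmt);
+     int n = vsnprintf (NULL, 0, fmt, ap);
+     va_end (ap);
+     if (n < 0)
+       return v;
+
+     /* One spare byte for vsnprintf's terminator, which len excludes. */
+     sketch_vec_t *nv = realloc (v, sizeof (*nv) + old_len + (size_t) n + 1);
+     if (!nv)
+       return v;
+
+     va_start (ap, fmt);
+     vsnprintf (nv->data + old_len, (size_t) n + 1, fmt, ap);
+     va_end (ap);
+     nv->len = old_len + (size_t) n;
+     return nv;
+   }
+
+As with format, chained calls pass the previous result back in, and the
+caller frees the result when done.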
+
+Format implements a particularly handy user-format scheme via the “%U”
+format specification. For example:
+
+.. code:: c
+
+       u8 * format_junk (u8 * s, va_list * va)
+       {
+         /* Must match the type the caller passed after format_junk */
+         char *junk = va_arg (*va, char *);
+         s = format (s, "%s", junk);
+         return s;
+       }
+
+       result = format (0, "junk = %U", format_junk, "This is some junk");
+
+format_junk() can invoke other user-format functions if desired. The
+programmer shoulders responsibility for argument type-checking. It is
+typical for user format functions to blow up spectacularly if the
+va_arg(va, type) macros don’t match the caller’s idea of reality.
+
+Unformat
+--------
+
+Vppinfra unformat is vaguely related to scanf, but considerably more
+general.
+
+A typical use case involves initializing an unformat_input_t from either
+a C-string or a (u8 \*) vector, then parsing via unformat() as follows:
+
+.. code:: c
+
+       unformat_input_t input;
+       char *s = "<some-C-string>";
+
+       unformat_init_string (&input, s, strlen (s));
+       /* or */
+       unformat_init_vector (&input, <u8-vector>);
+
+Then loop parsing individual elements:
+
+.. code:: c
+
+       while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT)
+       {
+         if (unformat (&input, "value1 %d", &value1))
+           ;/* unformat sets value1 */
+         else if (unformat (&input, "value2 %d", &value2))
+           ;/* unformat sets value2 */
+         else
+           return clib_error_return (0, "unknown input '%U'",
+                                     format_unformat_error, &input);
+       }
+
+As with format, unformat implements a user-unformat function capability
+via a “%U” user unformat function scheme. Generally, one can trivially
+transform ``format (s, "foo %d", foo)`` into
+``unformat (input, "foo %d", &foo)``.
+
+Unformat implements a couple of handy non-scanf-like format specifiers:
+
+.. code:: c
+
+       unformat (input, "enable %=", &enable, 1 /* defaults to 1 */);
+       unformat (input, "bitzero %|", &mask, (1<<0));
+       unformat (input, "bitone %|", &mask, (1<<1));
+       <etc>
+
+The phrase “enable %=” means “set the supplied variable to the default
+value” if unformat parses the “enable” keyword all by itself. If
+unformat parses “enable 123” set the supplied variable to 123.
+
+We could clean up a number of hand-rolled “verbose” + “verbose %d”
+argument parsing codes using “%=”.
+
+The phrase “bitzero %\|” means “set the specified bit in the supplied
+bitmask” if unformat parses “bitzero”. Although it looks like it could
+be fairly handy, it’s very lightly used in the code base.
+
+``%_`` toggles whether or not to skip input white space.
+
+When the format string switches from skip to no-skip mode partway
+through, unformat skips any input white space at that point. For
+example, the following:
+
+.. code:: c
+
+   char *fmt = "%_%d.%d%_->%_%d.%d%_";
+   unformat (input, fmt, &one, &two, &three, &four);
+
+matches input “1.2 -> 3.4”. Without this rule, the space after “->” would
+not be skipped.
+
+
+How to parse a single input line
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Debug CLI command functions MUST NOT accidentally consume input
+belonging to other debug CLI commands. Otherwise, it's impossible to
+script a set of debug CLI commands which "work fine" when issued one
+at a time.
+
+This bit of code is NOT correct:
+
+.. code:: c
+
+     /* Eats script input NOT belonging to it, and chokes! */
+     while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
+       {
+         if (unformat (input, ...))
+           ;
+         else if (unformat (input, ...))
+           ;
+         else
+           return clib_error_return (0, "parse error: '%U'",
+                                     format_unformat_error, input);
+       }
+
+When executed as part of a script, such a function will return “parse
+error: ‘’” every time, unless it happens to be the last command in the
+script.
+
+Instead, use “unformat_line_input” to consume the rest of a line’s worth
+of input - everything past the path specified in the VLIB_CLI_COMMAND
+declaration.
+
+For example, unformat_line_input with “my_command” set up as shown below
+and user input “my path is clear” will produce an unformat_input_t that
+contains “is clear”.
+
+.. code:: c
+
+       VLIB_CLI_COMMAND (...) = {
+           .path = "my path",
+       };
+
+Here’s a bit of code which shows the required mechanics, in full:
+
+.. code:: c
+
+       static clib_error_t *
+       my_command_fn (vlib_main_t * vm,
+                      unformat_input_t * input,
+                      vlib_cli_command_t * cmd)
+       {
+         unformat_input_t _line_input, *line_input = &_line_input;
+         u32 this, that;
+         clib_error_t *error = 0;
+
+         if (!unformat_user (input, unformat_line_input, line_input))
+           return 0;
+
+         /*
+          * Here, UNFORMAT_END_OF_INPUT is at the end of the line we consumed,
+          * not at the end of the script...
+          */
+         while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT)
+           {
+              if (unformat (line_input, "this %u", &this))
+                ;
+              else if (unformat (line_input, "that %u", &that))
+                ;
+              else
+                {
+                  error = clib_error_return (0, "parse error: '%U'",
+                                    format_unformat_error, line_input);
+                  goto done;
+                }
+             }
+
+         /* ... do something based on "this" and "that", etc ... */
+
+       done:
+         unformat_free (line_input);
+         return error;
+       }
+
+       VLIB_CLI_COMMAND (my_command, static) = {
+         .path = "my path",
+         .function = my_command_fn,
+       };
+
+Vppinfra errors and warnings
+----------------------------
+
+Many functions within the vpp dataplane have return-values of type
+clib_error_t \*. Clib_error_t’s are arbitrary strings with a bit of
+metadata [fatal, warning] and are easy to announce. Returning a NULL
+clib_error_t \* indicates “A-OK, no error.”
+
+Clib_warning(format-args) is a handy way to add debugging output; clib
+warnings prepend function:line info to unambiguously locate the message
+source. Clib_unix_warning() adds perror()-style Linux system-call
+information. In production images, clib_warnings result in syslog
+entries.
+
+Serialization
+-------------
+
+Vppinfra serialization support allows the programmer to easily serialize
+and unserialize complex data structures.
+
+The underlying primitive serialize/unserialize functions use network
+byte-order, so there are no structural issues serializing on a
+little-endian host and unserializing on a big-endian host.
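+
+A minimal sketch of why host byte order stops mattering, assuming nothing
+about the real vppinfra serialize API (``sketch_put_u32`` and
+``sketch_get_u32`` are hypothetical). Shifts act on values rather than on
+memory layout, so the bytes produced are identical on little- and
+big-endian hosts.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   /* Write v in network (big-endian) byte order. */
+   static void
+   sketch_put_u32 (uint8_t * out, uint32_t v)
+   {
+     out[0] = (uint8_t) (v >> 24);
+     out[1] = (uint8_t) (v >> 16);
+     out[2] = (uint8_t) (v >> 8);
+     out[3] = (uint8_t) v;
+   }
+
+   /* Read a network-order u32 back into a host value. */
+   static uint32_t
+   sketch_get_u32 (const uint8_t * in)
+   {
+     return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
+       | ((uint32_t) in[2] << 8) | (uint32_t) in[3];
+   }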
diff --git a/docs/developer/corearchitecture/mem.rst b/docs/developer/corearchitecture/mem.rst
new file mode 120000
index 00000000000..0fc53eab68c
--- /dev/null
+++ b/docs/developer/corearchitecture/mem.rst
@@ -0,0 +1 @@
+../../../src/vpp/mem/mem.rst
+\ No newline at end of file
diff --git a/docs/developer/corearchitecture/multi_thread.rst b/docs/developer/corearchitecture/multi_thread.rst
new file mode 100644
index 00000000000..195a9b791fd
--- /dev/null
+++ b/docs/developer/corearchitecture/multi_thread.rst
@@ -0,0 +1,169 @@
+.. _vpp_multi_thread:
+
+Multi-threading in VPP
+======================
+
+Modes
+-----
+
+VPP can work in 2 different modes:
+
+-  single-thread
+-  multi-thread with worker threads
+
+Single-thread
+~~~~~~~~~~~~~
+
+In single-thread mode, one main thread handles both packet processing
+and other management functions (Command-Line Interface (CLI), API,
+stats). This is the default setup; no special startup config is
+needed.
+
+Multi-thread with Worker Threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this mode, the main thread handles management functions (debug CLI,
+API, stats collection) and one or more worker threads handle packet
+processing from input to output of the packet.
+
+Each worker thread polls input queues on a subset of interfaces.
+
+With RSS (Receive Side Scaling) enabled, multiple threads can service one
+physical interface: the NIC's RSS function distributes traffic between
+different queues, which are serviced by different worker threads.
+
+Thread placement
+----------------
+
+Thread placement is defined in the startup config under the cpu { … }
+section.
+
+The VPP platform can place threads automatically or manually. Automatic
+placement works in the following way:
+
+-  if “skip-cores X” is defined, the first X cores will not be used
+-  if “main-core X” is defined, the VPP main thread will be placed on
+   core X; otherwise, the first available core will be used
+-  if “workers N” is defined, vpp will allocate the first N available
+   cores and run worker threads on them
+-  if “corelist-workers A,B1-Bn,C1-Cn” is defined, vpp will automatically
+   assign those CPU cores to worker threads
+
+Users can see the active core placement with the VPP debug CLI command
+``show threads``:
+
+.. code-block:: console
+
+   vpp# show threads
+   ID     Name                Type        LWP     lcore  Core   Socket State
+   0      vpe_main                        59723   2      2      0      wait
+   1      vpe_wk_0            workers     59755   4      4      0      running
+   2      vpe_wk_1            workers     59756   5      5      0      running
+   3      vpe_wk_2            workers     59757   6      0      1      running
+   4      vpe_wk_3            workers     59758   7      1      1      running
+   5                          stats       59775
+   vpp#
+
+The sample output above shows the main thread running on lcore 2 (core 2
+on CPU socket 0), and worker threads running on lcores 4-7 (cores 4 and 5
+on socket 0, cores 0 and 1 on socket 1).
+
+Sample Configurations
+---------------------
+
+By default, at start-up VPP uses configuration values from
+``/etc/vpp/startup.conf``.
+
+The following sections describe some of the additional changes that can
+be made to this file. This file is initially populated from the files
+located in the ``/vpp/vpp/conf/`` directory.
+
+Manual Placement
+~~~~~~~~~~~~~~~~
+
+Manual placement places the main thread on core 1, workers on cores
+4,5,20,21.
+
+.. code-block:: console
+
+   cpu {
+     main-core 1
+     corelist-workers  4-5,20-21
+   }
+
+Auto placement
+~~~~~~~~~~~~~~
+
+Auto placement is likely to place the main thread on core 1 and workers
+on cores 2,3,4.
+
+.. code-block:: console
+
+   cpu {
+     skip-cores 1
+     workers 3
+   }
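+
+The automatic placement rules listed earlier can be sketched as a simple
+scan over core numbers. This is a hypothetical illustration, not VPP's
+implementation: it assumes cores are numbered 0..n-1 and all are usable,
+and it ignores corelist-workers and NUMA topology.
+
+.. code:: c
+
+   #define SKETCH_MAX_WORKERS 64
+
+   typedef struct
+   {
+     int main_core;
+     int worker_cores[SKETCH_MAX_WORKERS];
+     int n_workers;
+   } placement_sketch_t;
+
+   /* main_core < 0 means "main-core" was not given. Returns 0 on
+      success, or -1 if there are not enough cores. */
+   static int
+   sketch_auto_place (int n_cores, int skip_cores, int main_core,
+                      int workers, placement_sketch_t * p)
+   {
+     if (workers > SKETCH_MAX_WORKERS)
+       return -1;
+
+     /* Main thread: the requested core, else the first available one. */
+     p->main_core = main_core >= 0 ? main_core : skip_cores;
+
+     /* Workers: the first N available cores after the skipped ones. */
+     p->n_workers = 0;
+     for (int core = skip_cores; core < n_cores && p->n_workers < workers;
+          core++)
+       {
+         if (core == p->main_core)
+           continue;
+         p->worker_cores[p->n_workers++] = core;
+       }
+     return p->n_workers == workers ? 0 : -1;
+   }
+
+With 8 cores, ``skip-cores 1`` and ``workers 3``, the sketch places the
+main thread on core 1 and workers on cores 2, 3 and 4, matching the
+expectation stated above.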
+
+Buffer Memory Allocation
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The VPP platform is NUMA aware. It can allocate memory for buffers on
+different CPU sockets (NUMA nodes). The amount of memory allocated can
+be defined in the startup config for each CPU socket by using the
+socket-mem A[[,B],C] statement inside the dpdk { … } section.
+
+For example:
+
+.. code-block:: console
+
+   dpdk {
+     socket-mem 1024,1024
+   }
+
+The above configuration allocates 1GB of memory on NUMA#0 and 1GB on
+NUMA#1. Each worker thread uses buffers which are local to its own NUMA
+node.
+
+Buffer memory is allocated from hugepages. VPP prefers 1G pages if they
+are available; if not, 2MB pages are used.
+
+VPP takes care of mounting/unmounting hugepages file-system
+automatically so there is no need to do that manually.
+
+**NOTE**: If you are running the latest VPP release, there is no need to
+specify socket-mem manually. VPP will discover all NUMA nodes and
+allocate 512M on each by default. socket-mem is only needed if a larger
+number of mbufs is required (the default is 16384 per socket, which can
+be changed with the num-mbufs startup config command).
+
+Interface Placement in Multi-thread Setup
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On startup, the VPP platform assigns interfaces (or interface, queue
+pairs if RSS is used) to different worker threads in round robin
+fashion.
+
+The following example shows debug CLI commands to show and change
+interface placement:
+
+.. code-block:: console
+
+   vpp# sh dpdk interface placement
+   Thread 1 (vpp_wk_0 at lcore 5):
+    TenGigabitEthernet2/0/0 queue 0
+    TenGigabitEthernet2/0/1 queue 0
+   Thread 2 (vpp_wk_1 at lcore 6):
+    TenGigabitEthernet2/0/0 queue 1
+    TenGigabitEthernet2/0/1 queue 1
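+
+The round-robin assignment can be sketched as follows. The type and
+function are hypothetical, not VPP's code. Iterating over interfaces and
+then over each interface's queues, with worker threads numbered from 1,
+reproduces the placement in the sample output above.
+
+.. code:: c
+
+   typedef struct
+   {
+     int interface;
+     int queue;
+     int thread;                   /* worker thread, numbered from 1 */
+   } rx_queue_sketch_t;
+
+   /* Fills out[] with n_interfaces * n_queues entries; returns the count. */
+   static int
+   sketch_assign_round_robin (int n_interfaces, int n_queues, int n_workers,
+                              rx_queue_sketch_t * out)
+   {
+     int k = 0;
+
+     for (int i = 0; i < n_interfaces; i++)
+       for (int q = 0; q < n_queues; q++, k++)
+         {
+           out[k].interface = i;
+           out[k].queue = q;
+           out[k].thread = 1 + k % n_workers;
+         }
+     return k;
+   }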
+
+The following shows an example of moving TenGigabitEthernet2/0/1 queue 1
+processing to the first worker thread:
+
+.. code-block:: console
+
+   vpp# set interface placement TenGigabitEthernet2/0/1 queue 1 thread 1
+
+   vpp# sh dpdk interface placement
+   Thread 1 (vpp_wk_0 at lcore 5):
+    TenGigabitEthernet2/0/0 queue 0
+    TenGigabitEthernet2/0/1 queue 0
+    TenGigabitEthernet2/0/1 queue 1
+   Thread 2 (vpp_wk_1 at lcore 6):
+    TenGigabitEthernet2/0/0 queue 1
diff --git a/docs/developer/corearchitecture/multiarch/arbfns.rst b/docs/developer/corearchitecture/multiarch/arbfns.rst
new file mode 100644
index 00000000000..d469bd8a140
--- /dev/null
+++ b/docs/developer/corearchitecture/multiarch/arbfns.rst
@@ -0,0 +1,87 @@
+Multi-Architecture Arbitrary Function Cookbook
+==============================================
+
+Optimizing arbitrary functions for multiple architectures is simple
+enough, and very similar to the process used to produce multi-architecture
+graph node dispatch functions.
+
+As with multi-architecture graph nodes, we compile source files
+multiple times, generating multiple implementations of the original
+function, and a public selector function.
+
+Details
+-------
+
+Decorate function definitions with CLIB_MARCH_FN macros. For example:
+
+Change the original function prototype...
+
+::
+
+   u32 vlib_frame_alloc_to_node (vlib_main_t * vm, u32 to_node_index,
+                                 u32 frame_flags)
+
+...by recasting the function name and return type as the first two
+arguments to the CLIB_MARCH_FN macro:
+
+::
+
+    CLIB_MARCH_FN (vlib_frame_alloc_to_node, u32, vlib_main_t * vm,
+                   u32 to_node_index, u32 frame_flags)
+
+In the actual vpp image, several versions of vlib_frame_alloc_to_node
+will appear: vlib_frame_alloc_to_node_avx2,
+vlib_frame_alloc_to_node_avx512, and so forth.
+
+
+For each multi-architecture function, use the CLIB_MARCH_FN_SELECT
+macro to help generate the one-and-only multi-architecture selector
+function:
+
+::
+
+    #ifndef CLIB_MARCH_VARIANT
+    u32
+    vlib_frame_alloc_to_node (vlib_main_t * vm, u32 to_node_index,
+    			  u32 frame_flags)
+    {
+      return CLIB_MARCH_FN_SELECT (vlib_frame_alloc_to_node)
+        (vm, to_node_index, frame_flags);
+    }
+    #endif /* CLIB_MARCH_VARIANT */
+
+Once bound, the multi-architecture selector function is about as
+expensive as an indirect function call; which is to say: not very
+expensive.
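+
+The cost argument is easiest to see in a self-contained sketch of the
+general technique: priority-ordered variants, each guarded by a
+CPU-feature check, bound once to a function pointer. This is not the
+CLIB_MARCH_FN_SELECT implementation; every name below is hypothetical,
+and ``cpu_always`` / ``cpu_never`` stand in for real CPU-feature probes.
+
+.. code:: c
+
+   typedef unsigned (*alloc_fn_t) (unsigned to_node_index);
+
+   typedef struct
+   {
+     const char *name;
+     int priority;                 /* higher wins */
+     int (*is_supported) (void);   /* stand-in for a CPU-feature probe */
+     alloc_fn_t fn;
+   } march_variant_sketch_t;
+
+   /* Two illustrative variants; real ones would be the same source
+      compiled with different target flags. */
+   static unsigned
+   alloc_variant_generic (unsigned to_node_index)
+   {
+     return to_node_index + 1000;
+   }
+
+   static unsigned
+   alloc_variant_wide (unsigned to_node_index)
+   {
+     return to_node_index + 2000;
+   }
+
+   static int cpu_always (void) { return 1; }
+   static int cpu_never (void) { return 0; }
+
+   /* Pick the highest-priority supported variant. Call once and cache
+      the pointer; every later call is a single indirect call. */
+   static alloc_fn_t
+   march_sketch_select (const march_variant_sketch_t * v, int n)
+   {
+     const march_variant_sketch_t *best = 0;
+
+     for (int i = 0; i < n; i++)
+       if (v[i].is_supported () && (!best || v[i].priority > best->priority))
+         best = &v[i];
+     return best ? best->fn : 0;
+   }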
+
+Modify CMakeLists.txt
+---------------------
+
+If the component in question already lists "MULTIARCH_SOURCES", simply
+add the indicated .c file to the list.  Otherwise, add as shown
+below. Note that the added file "multiarch_code.c" should appear in
+*both* SOURCES and MULTIARCH_SOURCES:
+
+::
+
+    add_vpp_plugin(myplugin
+      SOURCES
+      multiarch_code.c
+      ...
+
+      MULTIARCH_SOURCES
+      multiarch_code.c
+      ...
+     )
+
+A Word to the Wise
+------------------
+
+A file which liberally mixes functions worth compiling for multiple
+architectures and functions which are not will end up full of
+#ifndef CLIB_MARCH_VARIANT conditionals. This won't do a thing to make
+the code look any better.
+
+Depending on requirements, it may make sense to move functions to
+(new) files to reduce complexity and/or improve legibility of the
+resulting code.
diff --git a/docs/developer/corearchitecture/multiarch/index.rst b/docs/developer/corearchitecture/multiarch/index.rst
new file mode 100644
index 00000000000..824a8e68438
--- /dev/null
+++ b/docs/developer/corearchitecture/multiarch/index.rst
@@ -0,0 +1,12 @@
+.. _multiarch:
+
+Multi-architecture support
+==========================
+
+This reference guide describes how to use the vpp multi-architecture support scheme.
+
+.. toctree::
+   :maxdepth: 1
+
+   nodefns
+   arbfns
diff --git a/docs/developer/corearchitecture/multiarch/nodefns.rst b/docs/developer/corearchitecture/multiarch/nodefns.rst
new file mode 100644
index 00000000000..9647e64f08c
--- /dev/null
+++ b/docs/developer/corearchitecture/multiarch/nodefns.rst
@@ -0,0 +1,138 @@
+Multi-Architecture Graph Node Cookbook
+======================================
+
+In the context of graph node dispatch functions, it's easy enough to
+use the vpp multi-architecture support setup. The point of the scheme
+is simple: for performance-critical nodes, generate multiple CPU
+hardware-dependent versions of the node dispatch functions, and pick
+the best one at runtime.
+
+The vpp scheme is simple enough to use, but details matter.
+
+100,000 foot view
+-----------------
+
+We compile entire graph node dispatch function implementation files
+multiple times. These compilations give rise to multiple versions of
+the graph node dispatch functions. Per-node constructor-functions
+interrogate CPU hardware, select the node dispatch function variant to
+use, and set the vlib_node_registration_t ".function" member to the
+address of the selected variant.
+
+Details
+-------
+
+Declare the node dispatch function as shown, using the VLIB\_NODE\_FN macro. The
+name of the node function **MUST** match the name of the graph node.
+
+::
+
+    VLIB_NODE_FN (ip4_sdp_node) (vlib_main_t * vm, vlib_node_runtime_t * node,
+                                 vlib_frame_t * frame)
+    {
+      if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE))
+        return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
+    			    1 /* is_trace */ );
+      else
+        return ip46_sdp_inline (vm, node, frame, 1 /* is_ip4 */ ,
+    			    0 /* is_trace */ );
+    }
+
+We need to generate *precisely one copy* of the
+vlib_node_registration_t, error strings, and packet trace decode function.
+
+Simply bracket these items with "#ifndef CLIB_MARCH_VARIANT...#endif":
+
+::
+
+    #ifndef CLIB_MARCH_VARIANT
+    static u8 *
+    format_sdp_trace (u8 * s, va_list * args)
+    {
+       <snip>
+    }
+    #endif
+
+    ...
+
+    #ifndef CLIB_MARCH_VARIANT
+    static char *sdp_error_strings[] = {
+    #define _(sym,string) string,
+      foreach_sdp_error
+    #undef _
+    };
+    #endif
+
+    ...
+
+    #ifndef CLIB_MARCH_VARIANT
+    VLIB_REGISTER_NODE (ip4_sdp_node) =
+    {
+      // DO NOT set the .function structure member.
+      // The multiarch selection __attribute__((constructor)) function
+      // takes care of it at runtime
+      .name = "ip4-sdp",
+      .vector_size = sizeof (u32),
+      .format_trace = format_sdp_trace,
+      .type = VLIB_NODE_TYPE_INTERNAL,
+
+      .n_errors = ARRAY_LEN(sdp_error_strings),
+      .error_strings = sdp_error_strings,
+
+      .n_next_nodes = SDP_N_NEXT,
+
+      /* edit / add dispositions here */
+      .next_nodes =
+      {
+        [SDP_NEXT_DROP] = "ip4-drop",
+      },
+    };
+    #endif
+
+To belabor the point: *do not* set the ".function" member! That's the job of the multi-arch
+selection \_\_attribute\_\_((constructor)) function.
+
+Always inline node dispatch functions
+-------------------------------------
+
+It's typical for a graph dispatch function to contain one or more
+calls to an inline function. See above. If your node dispatch function
+is structured that way, make *ABSOLUTELY CERTAIN* to use the
+"always_inline" macro:
+
+::
+
+    always_inline uword
+    ip46_sdp_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
+                 vlib_frame_t * frame,
+    		 int is_ip4, int is_trace)
+    { ... }
+
+Otherwise, the compiler is highly likely NOT to build multiple
+versions of the guts of your dispatch function.
+
+It's fairly easy to spot this mistake in "perf top." If you see, for
+example, a bunch of functions with names of the form
+"xxx_node_fn_avx2" in the profile, *BUT* your brand-new node function
+shows up with a name of the form "xxx_inline.isra.1", it's quite likely
+that the inline was declared "static inline" instead of "always_inline".
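+
+A minimal sketch of the pattern, assuming GCC or Clang. vppinfra's
+always_inline macro is assumed to expand to an equivalent
+``static inline __attribute__ ((__always_inline__))`` declaration, and all
+function names are hypothetical. With the body force-inlined, each wrapper
+gets its own copy of it, in which ``is_trace`` is a constant.
+
+.. code:: c
+
+   /* Force-inlined worker taking a compile-time constant flag. */
+   static inline __attribute__ ((__always_inline__)) unsigned
+   sketch_process_inline (const unsigned *pkts, int n, int is_trace,
+                          unsigned *n_traced)
+   {
+     unsigned sum = 0;
+
+     for (int i = 0; i < n; i++)
+       {
+         sum += pkts[i];
+         if (is_trace)             /* folded away when is_trace is 0 */
+           (*n_traced)++;
+       }
+     return sum;
+   }
+
+   /* Thin wrappers: each receives its own inlined copy of the body. */
+   static unsigned
+   sketch_node_fn_notrace (const unsigned *pkts, int n)
+   {
+     return sketch_process_inline (pkts, n, 0, 0);
+   }
+
+   static unsigned
+   sketch_node_fn_trace (const unsigned *pkts, int n, unsigned *n_traced)
+   {
+     return sketch_process_inline (pkts, n, 1, n_traced);
+   }
+
+The same duplication is what lets each CLIB_MARCH_VARIANT compilation carry
+its own optimized copy of the body; an out-of-line helper would be compiled
+only once.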
+
+Modify CMakeLists.txt
+---------------------
+
+If the component in question already lists "MULTIARCH_SOURCES", simply
+add the indicated .c file to the list.  Otherwise, add as shown
+below. Note that the added file "new_multiarch_node.c" should appear in
+*both* SOURCES and MULTIARCH_SOURCES:
+
+::
+
+    add_vpp_plugin(myplugin
+      SOURCES
+      new_multiarch_node.c
+      ...
+
+      MULTIARCH_SOURCES
+      new_multiarch_node.c
+      ...
+     )
diff --git a/docs/developer/corearchitecture/softwarearchitecture.rst b/docs/developer/corearchitecture/softwarearchitecture.rst
new file mode 100644
index 00000000000..7f8a0e04645
--- /dev/null
+++ b/docs/developer/corearchitecture/softwarearchitecture.rst
@@ -0,0 +1,47 @@
+Software Architecture
+=====================
+
+The fd.io vpp implementation is a third-generation vector packet
+processing implementation specifically related to US Patent 7,961,636,
+as well as earlier work. Note that the Apache-2 license specifically
+grants non-exclusive patent licenses; we mention this patent as a point
+of historical interest.
+
+For performance, the vpp dataplane consists of a directed graph of
+forwarding nodes which process multiple packets per invocation. This
+schema enables a variety of micro-processor optimizations: pipelining
+and prefetching to cover dependent read latency, inherent I-cache phase
+behavior, vector instructions. Aside from hardware input and hardware
+output nodes, the entire forwarding graph is portable code.
+
+Depending on the scenario at hand, we often spin up multiple worker
+threads which process ingress-hashed packets from multiple queues using
+identical forwarding graph replicas.
+
+VPP Layers - Implementation Taxonomy
+------------------------------------
+
+.. figure:: /_images/VPP_Layering.png
+   :alt: VPP layering diagram
+
+-  VPP Infra - the VPP infrastructure layer, which contains the core
+   library source code. This layer performs memory functions, works with
+   vectors and rings, performs key lookups in hash tables, and works
+   with timers for dispatching graph nodes.
+-  VLIB - the vector processing library. The vlib layer also handles
+   various application management functions: buffer, memory and graph
+   node management, maintaining and exporting counters, thread
+   management, packet tracing. Vlib implements the debug CLI (command
+   line interface).
+-  VNET - works with VPP's networking interface (layers 2, 3, and 4),
+   performs session and traffic management, and works with devices and
+   the data control plane.
+-  Plugins - contains an increasingly rich set of data-plane plugins, as
+   noted in the above diagram.
+-  VPP - the container application linked against all of the above.
+
+It’s important to understand each of these layers in a certain amount of
+detail. Much of the implementation is best dealt with at the API level
+and otherwise left alone.
diff --git a/docs/developer/corearchitecture/vlib.rst b/docs/developer/corearchitecture/vlib.rst
new file mode 100644
index 00000000000..f542d33ebb8
--- /dev/null
+++ b/docs/developer/corearchitecture/vlib.rst
@@ -0,0 +1,888 @@
+VLIB (Vector Processing Library)
+================================
+
+The files associated with vlib are located in the ./src/{vlib, vlibapi,
+vlibmemory} folders. These libraries provide vector processing support
+including graph-node scheduling, reliable multicast support,
+ultra-lightweight cooperative multi-tasking threads, a CLI, plug in .DLL
+support, physical memory and Linux epoll support. Parts of this library
+embody US Patent 7,961,636.
+
+Init function discovery
+-----------------------
+
+vlib applications register for various [initialization] events by
+placing structures and \__attribute__((constructor)) functions into the
+image. At appropriate times, the vlib framework walks
+constructor-generated singly-linked structure lists, performs a
+topological sort based on specified constraints, and calls the indicated
+functions. Vlib applications create graph nodes, add CLI functions,
+start cooperative multi-tasking threads, etc. etc. using this mechanism.
+
+vlib applications invariably include a number of VLIB_INIT_FUNCTION
+(my_init_function) macros.
+
+Each init / configure / etc. function has the return type clib_error_t
+\*. Make sure that the function returns 0 if all is well, otherwise the
+framework will announce an error and exit.
+
+vlib applications must link against vppinfra, and often link against
+other libraries such as VNET. In the latter case, it may be necessary to
+explicitly reference symbol(s); otherwise, large portions of the library
+may be AWOL at runtime.
+
+Init function construction and constraint specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It’s easy to add an init function:
+
+.. code:: c
+
+      static clib_error_t *my_init_function (vlib_main_t *vm)
+      {
+         /* ... initialize things ... */
+
+         return 0; // or return clib_error_return (0, "BROKEN!");
+      }
+      VLIB_INIT_FUNCTION(my_init_function);
+
+As given, my_init_function will be executed “at some point,” but with no
+ordering guarantees.
+
+Specifying ordering constraints is easy:
+
+.. code:: c
+
+      VLIB_INIT_FUNCTION(my_init_function) =
+      {
+         .runs_before = VLIB_INITS("we_run_before_function_1",
+                                   "we_run_before_function_2"),
+         .runs_after = VLIB_INITS("we_run_after_function_1",
+                                  "we_run_after_function_2"),
+      };
+
+It’s also easy to specify bulk ordering constraints of the form “a then
+b then c then d”:
+
+.. code:: c
+
+      VLIB_INIT_FUNCTION(my_init_function) =
+      {
+         .init_order = VLIB_INITS("a", "b", "c", "d"),
+      };
+
+It’s OK to specify all three sorts of ordering constraints for a single
+init function, although it’s hard to imagine why it would be necessary.
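+The ordering step can be pictured as a topological sort over the declared
+constraints. The sketch below is not vlib's code: ``init_reg_t``,
+``sort_inits`` and the fixed-size arrays are hypothetical, and only
+``runs_after`` edges are modeled, on the assumption that each
+``runs_before`` constraint can be rewritten as the reverse edge.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified registration: a name plus the names of the
 * functions it must run after. "X runs_before Y" is modeled as
 * "Y runs_after X", so one edge type suffices. */
#define MAX_FNS 16
#define MAX_DEPS 4

typedef struct
{
  const char *name;
  const char *runs_after[MAX_DEPS]; /* unused slots are NULL */
} init_reg_t;

static int
find_reg (const init_reg_t *regs, int n, const char *name)
{
  for (int i = 0; i < n; i++)
    if (strcmp (regs[i].name, name) == 0)
      return i;
  return -1;
}

/* True once every known predecessor of regs[i] has been placed;
 * constraints naming unregistered functions are ignored. */
static int
deps_done (const init_reg_t *regs, int n, int i, const int done[])
{
  for (int d = 0; d < MAX_DEPS && regs[i].runs_after[d]; d++)
    {
      int j = find_reg (regs, n, regs[i].runs_after[d]);
      if (j >= 0 && !done[j])
        return 0;
    }
  return 1;
}

/* Naive topological sort: repeatedly place any function whose
 * predecessors are all placed. Writes indices into order[] and returns
 * how many were placed; a result below n means a constraint cycle. */
static int
sort_inits (const init_reg_t *regs, int n, int order[])
{
  int done[MAX_FNS] = { 0 };
  int placed = 0;

  assert (n <= MAX_FNS);
  while (placed < n)
    {
      int progress = 0;
      for (int i = 0; i < n; i++)
        if (!done[i] && deps_done (regs, n, i, done))
          {
            done[i] = 1;
            order[placed++] = i;
            progress = 1;
          }
      if (!progress)
        break; /* the remaining functions form a cycle */
    }
  return placed;
}
```
+
+Registered as "c after b, b after a", the three functions come out in
+the order a, b, c. Two functions that each wait for the other are never
+placed, which is the situation a real implementation has to report as
+an error.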
+
+Node Graph Initialization
+-------------------------
+
+vlib packet-processing applications invariably define a set of graph
+nodes to process packets.
+
+One constructs a vlib_node_registration_t, most often via the
+VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of
+such registrations into a directed graph. It is easy enough to add nodes
+to the graph at runtime. The framework does not support removing nodes.
+
+vlib provides several types of vector-processing graph nodes, primarily
+to control framework dispatch behaviors. The type member of the
+vlib_node_registration_t functions as follows:
+
+-  VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
+-  VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
+   nodes
+-  VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
+   adding pending frames for processing
+-  VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
+   “Process” nodes are actually cooperative multi-tasking threads. They
+   **must** explicitly suspend after a reasonably short period of time.
+
+For a precise understanding of the graph node dispatcher, please read
+./src/vlib/main.c:vlib_main_loop.
+
+Graph node dispatcher
+---------------------
+
+Vlib_main_loop() dispatches graph nodes. The basic vector processing
+algorithm is diabolically simple, but may not be obvious from even a
+long stare at the code. Here’s how it works: some input node, or set of
+input nodes, produce a vector of work to process. The graph node
+dispatcher pushes the work vector through the directed graph,
+subdividing it as needed, until the original work vector has been
+completely processed. At that point, the process recurs.
+
+This scheme yields a stable equilibrium in frame size, by construction.
+Here’s why: as the frame size increases, the per-frame-element
+processing time decreases. There are several related forces at work; the
+simplest to describe is the effect of vector processing on the CPU L1
+I-cache. The first frame element [packet] processed by a given node
+warms up the node dispatch function in the L1 I-cache. All subsequent
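+frame elements profit. As we increase the number of frame elements, the
+cost per element goes down. If the dispatcher falls behind, more work
+accumulates, the next frame is larger and cheaper per element to
+process, and the dispatcher catches up.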
+
+Under light load, it is a crazy waste of CPU cycles to run the graph
+node dispatcher flat-out. So, the graph node dispatcher arranges to wait
+for work by sitting in a timed epoll wait if the prevailing frame size
+is low. The scheme has a certain amount of hysteresis to avoid
+constantly toggling back and forth between interrupt and polling mode.
+Although the graph dispatcher supports interrupt and polling modes, our
+current default device drivers do not.
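+Hysteresis here means switching modes at two different thresholds
+rather than one. The sketch below illustrates the idea only; the
+threshold values, the ``toy_dispatcher_t`` type and the switching rule
+are assumptions, not the dispatcher's actual heuristics.
+
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds, in average vectors per frame. */
#define ENTER_INTERRUPT_BELOW 2.0
#define ENTER_POLLING_ABOVE 8.0

typedef struct
{
  bool interrupt_mode; /* true: sleep in a timed epoll wait between loops */
} toy_dispatcher_t;

/* Two thresholds instead of one: between them the current mode is kept,
 * so a frame size hovering near a single cut-off cannot flip the mode
 * on every pass through the main loop. */
static void
toy_update_mode (toy_dispatcher_t *d, double avg_vectors_per_frame)
{
  assert (avg_vectors_per_frame >= 0.0);
  if (d->interrupt_mode && avg_vectors_per_frame > ENTER_POLLING_ABOVE)
    d->interrupt_mode = false;
  else if (!d->interrupt_mode && avg_vectors_per_frame < ENTER_INTERRUPT_BELOW)
    d->interrupt_mode = true;
}
```
+
+A load of 5 vectors per frame leaves the dispatcher in whichever mode it
+was already in; only a move below 2 or above 8 changes it.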
+
+The graph node scheduler uses a hierarchical timer wheel to reschedule
+process nodes upon timer expiration.
+
+Graph dispatcher internals
+--------------------------
+
+This section may be safely skipped. It’s not necessary to understand
+graph dispatcher internals to create graph nodes.
+
+Vector Data Structure
+---------------------
+
+In vpp / vlib, we represent vectors as instances of the vlib_frame_t
+type:
+
+.. code:: c
+
+       typedef struct vlib_frame_t
+       {
+         /* Frame flags. */
+         u16 flags;
+
+         /* Number of scalar bytes in arguments. */
+         u8 scalar_size;
+
+         /* Number of bytes per vector argument. */
+         u8 vector_size;
+
+         /* Number of vector elements currently in frame. */
+         u16 n_vectors;
+
+         /* Scalar and vector arguments to next node. */
+         u8 arguments[0];
+       } vlib_frame_t;
+
+Note that one *could* construct all kinds of vectors - including vectors
+with some associated scalar data - using this structure. In the vpp
+application, vectors typically use a 4-byte vector element size, and
+zero bytes’ worth of associated per-frame scalar data.
+
+Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries. Frames
+have u32 indices which make use of the alignment property, so the
+maximum feasible main heap offset of a frame is CLIB_CACHE_LINE_BYTES \*
+0xFFFFFFFF. With 64-byte cache lines, that is 64 \* 4G = 256 Gbytes.
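+The arithmetic works because a cache-line-aligned offset always has its
+low bits clear, so storing offset / 64 in a u32 loses nothing. A minimal
+sketch, assuming 64-byte cache lines; the helper names are hypothetical,
+not vlib functions:
+
```c
#include <assert.h>
#include <stdint.h>

#define TOY_CACHE_LINE_BYTES 64u /* assumed cache-line size */

/* Frames start on cache-line boundaries, so offset / 64 is exact. */
static inline uint32_t
frame_offset_to_index (uint64_t offset)
{
  assert (offset % TOY_CACHE_LINE_BYTES == 0);
  assert (offset / TOY_CACHE_LINE_BYTES <= UINT32_MAX);
  return (uint32_t) (offset / TOY_CACHE_LINE_BYTES);
}

/* Inverse mapping: one multiply (or shift) recovers the heap offset. */
static inline uint64_t
frame_index_to_offset (uint32_t index)
{
  return (uint64_t) index * TOY_CACHE_LINE_BYTES;
}
```
+
+The largest index, 0xFFFFFFFF, maps to an offset 64 bytes short of
+256 Gbytes.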
+
+Scheduling Vectors
+------------------
+
+As you can see, vectors are not directly associated with graph nodes. We
+represent that association in a couple of ways. The simplest is the
+vlib_pending_frame_t:
+
+.. code:: c
+
+       /* A frame pending dispatch by main loop. */
+       typedef struct
+       {
+         /* Node and runtime for this frame. */
+         u32 node_runtime_index;
+
+         /* Frame index (in the heap). */
+         u32 frame_index;
+
+         /* Start of next frames for this node. */
+         u32 next_frame_index;
+
+         /* Special value for next_frame_index when there is no next frame. */
+       #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
+       } vlib_pending_frame_t;
+
+Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
+processes frames:
+
+.. code:: c
+
+         /*
+          * Input nodes may have added work to the pending vector.
+          * Process pending vector until there is nothing left.
+          * All pending vectors will be processed from input -> output.
+          */
+         for (i = 0; i < _vec_len (nm->pending_frames); i++)
+           cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
+         /* Reset pending vector for next iteration. */
+
+The pending frame node_runtime_index associates the frame with the node
+which will process it.
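+Note that the loop re-reads the pending-vector length on every
+iteration: dispatching one frame can append new pending frames, which
+are then handled in the same pass. The toy model below (all names
+hypothetical, and far simpler than vlib's data structures) pushes two
+frames through a three-node chain; six dispatches happen before the
+vector is reset.
+
```c
#include <assert.h>

#define MAX_PENDING 32

/* Toy "pending vector": each entry names the node that owns a frame. */
typedef struct
{
  int node[MAX_PENDING];
  int n_pending;
  int n_dispatched;
} toy_main_t;

static void
toy_add_pending (toy_main_t *tm, int node)
{
  assert (tm->n_pending < MAX_PENDING);
  tm->node[tm->n_pending++] = node;
}

/* Hypothetical three-node chain: node 0 feeds node 1, node 1 feeds
 * node 2, and node 2 consumes the frame. */
static void
toy_dispatch (toy_main_t *tm, int i)
{
  int node = tm->node[i];

  tm->n_dispatched++;
  if (node < 2)
    toy_add_pending (tm, node + 1);
}

static void
toy_run_pending (toy_main_t *tm)
{
  /* n_pending is re-read each time, so frames added by toy_dispatch
   * are processed in this same pass. */
  for (int i = 0; i < tm->n_pending; i++)
    toy_dispatch (tm, i);
  tm->n_pending = 0; /* reset pending vector for the next iteration */
}
```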
+
+Complications
+-------------
+
+Fasten your seatbelt. Here’s where the story - and the data structures -
+become quite complicated…
+
+At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
+graph. It’s really quite normal for a packet to visit ip[46]-lookup
+multiple times. The worst case: a graph node which enqueues packets to
+itself.
+
+To deal with this issue, the graph dispatcher must force allocation of a
+new frame if the current graph node’s dispatch function happens to
+enqueue a packet back to itself.
+
+There are no guarantees that a pending frame will be processed
+immediately, which means that more packets may be added to the
+underlying vlib_frame_t after it has been attached to a
+vlib_pending_frame_t. Care must be taken to allocate new frames and
+pending frames if a (pending_frame, frame) pair fills.
+
+Next frames, next frame ownership
+---------------------------------
+
+The vlib_next_frame_t is the last key graph dispatcher data structure:
+
+.. code:: c
+
+       typedef struct
+       {
+         /* Frame index. */
+         u32 frame_index;
+
+         /* Node runtime for this next. */
+         u32 node_runtime_index;
+
+         /* Next frame flags. */
+         u32 flags;
+
+         /* Reflects node frame-used flag for this next. */
+       #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
+         VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH
+
+         /* This next frame owns enqueue to node
+            corresponding to node_runtime_index. */
+       #define VLIB_FRAME_OWNER (1 << 15)
+
+         /* Set when frame has been allocated for this next. */
+       #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT
+
+         /* Set when frame has been added to pending vector. */
+       #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP
+
+         /* Set when frame is to be freed after dispatch. */
+       #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT
+
+         /* Set when frame has traced packets. */
+       #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE
+
+         /* Number of vectors enqueued to this next since last overflow. */
+         u32 vectors_since_last_overflow;
+       } vlib_next_frame_t;
+
+Graph node dispatch functions call vlib_get_next_frame (…) to set “(u32
+\*)to_next” to the right place in the vlib_frame_t corresponding to the
+ith arc (aka next0) from the current node to the indicated next node.
+
+After some scuffling around - two levels of macros - processing reaches
+vlib_get_next_frame_internal (…). Get-next-frame-internal digs up the
+vlib_next_frame_t corresponding to the desired graph arc.
+
+The next frame data structure amounts to a graph-arc-centric frame
+cache. Once a node finishes adding elements to a frame, it will acquire a
+vlib_pending_frame_t and end up on the graph dispatcher’s run-queue. But
+there’s no guarantee that more vector elements won’t be added to the
+underlying frame from the same (source_node, next_index) arc or from a
+different (source_node, next_index) arc.
+
+Maintaining consistency of the arc-to-frame cache is necessary. The
+first step in maintaining consistency is to make sure that only one
+graph node at a time thinks it “owns” the target vlib_frame_t.
+
+Back to the graph node dispatch function. In the usual case, a certain
+number of packets will be added to the vlib_frame_t acquired by calling
+vlib_get_next_frame (…).
+
+Before a dispatch function returns, it’s required to call
+vlib_put_next_frame (…) for all of the graph arcs it actually used. This
+action adds a vlib_pending_frame_t to the graph dispatcher’s pending
+frame vector.
+
+Vlib_put_next_frame makes a note in the pending frame of the frame
+index, and also of the vlib_next_frame_t index.
+
+dispatch_pending_node actions
+-----------------------------
+
+The main graph dispatch loop calls dispatch_pending_node as shown above.
+
+Dispatch_pending_node recovers the pending frame, and the graph node
+runtime / dispatch function. Further, it recovers the next_frame
+currently associated with the vlib_frame_t, and detaches the
+vlib_frame_t from the next_frame.
+
+In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:
+
+.. code:: c
+
+     /* Force allocation of new frame while current frame is being
+        dispatched. */
+     restore_frame_index = ~0;
+     if (nf->frame_index == p->frame_index)
+       {
+         nf->frame_index = ~0;
+         nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
+         if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
+           restore_frame_index = p->frame_index;
+       }
+
+dispatch_pending_node is worth a hard stare due to the several
+second-order optimizations it implements. Almost as an afterthought, it
+calls dispatch_node which actually calls the graph node dispatch
+function.
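+The effect of the "force allocation" stanza can be reduced to a few
+lines. In this toy model, which uses hypothetical names and ignores
+flags, pending-frame bookkeeping and frame recycling, an arc caches the
+frame currently being filled; detaching it before dispatch guarantees
+that a node which enqueues to itself gets a new frame instead of
+appending to the one it is reading.
+
```c
#include <assert.h>
#include <stdint.h>

#define NO_FRAME UINT32_MAX

/* Toy per-arc frame cache: the frame currently being filled on one arc. */
typedef struct
{
  uint32_t frame_index;
} toy_next_frame_t;

typedef struct
{
  uint32_t n_allocated; /* stands in for a real frame allocator */
} toy_frame_pool_t;

/* Reuse the arc's cached frame, or allocate a new one. */
static uint32_t
toy_get_next_frame (toy_frame_pool_t *pool, toy_next_frame_t *nf)
{
  if (nf->frame_index == NO_FRAME)
    nf->frame_index = pool->n_allocated++;
  return nf->frame_index;
}

/* Before running a pending frame, detach it from its arc so that
 * anything enqueued on that arc during dispatch (for example, by a node
 * feeding itself) lands in a fresh frame. */
static void
toy_begin_dispatch (toy_next_frame_t *nf, uint32_t pending_frame_index)
{
  if (nf->frame_index == pending_frame_index)
    nf->frame_index = NO_FRAME;
}
```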
+
+Process / thread model
+----------------------
+
+vlib provides an ultra-lightweight cooperative multi-tasking thread
+model. The graph node scheduler invokes these processes in much the same
+way as traditional vector-processing run-to-completion graph nodes;
+plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
+set the vlib_node_registration_t type field to VLIB_NODE_TYPE_PROCESS.
+Yes, process is a misnomer. These are cooperative multi-tasking threads.
+
+As of this writing, the default stack size is 1<<15 = 32 KB. Initialize
+the node registration’s process_log2_n_stack_bytes member as needed. The
+graph node dispatcher makes some effort to detect stack overrun, e.g. by
+mapping a no-access page below each thread stack.
+
+Process node dispatch functions are expected to be “while(1) { }” loops
+which suspend when not otherwise occupied, and which must not run for
+unreasonably long periods of time.
+
+“Unreasonably long” is an application-dependent concept. Over the years,
+we have constructed frame-size sensitive control-plane nodes which will
+use a much higher fraction of the available CPU bandwidth when the frame
+size is low. The classic example: modifying forwarding tables. So long
+as the table-builder leaves the forwarding tables in a valid state, one
+can suspend the table builder to avoid dropping packets as a result of
+control-plane activity.
+
+Process nodes can suspend for fixed amounts of time, or until another
+entity signals an event, or both. See the next section for a description
+of the vlib process event mechanism.
+
+When running in vlib process context, one must pay strict attention to
+loop-invariant issues. If one walks a data structure and calls a
+function which may suspend, one had best know by construction that it
+cannot change. Often, it’s best to simply make a snapshot copy of a data
+structure, walk the copy at leisure, then free the copy.
+
+Process events
+--------------
+
+The vlib process event mechanism API is extremely lightweight and easy
+to use. Here is a typical example:
+
+.. code:: c
+
+       vlib_main_t *vm = &vlib_global_main;
+       uword event_type, * event_data = 0;
+
+       while (1)
+       {
+          vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);
+
+          event_type = vlib_process_get_events (vm, &event_data);
+
+          switch (event_type) {
+          case EVENT1:
+              handle_event1s (event_data);
+              break;
+
+          case EVENT2:
+              handle_event2s (event_data);
+              break;
+
+          case ~0: /* 5-second idle/periodic */
+              handle_idle ();
+              break;
+
+          default: /* bug! */
+              ASSERT (0);
+          }
+
+          vec_reset_length(event_data);
+       }
+
+In this example, the VLIB process node waits for an event to occur, or
+for 5 seconds to elapse. The code demuxes on the event type, calling the
+appropriate handler function. Each call to vlib_process_get_events
+returns a vector of per-event-type data passed to successive
+vlib_process_signal_event calls; it is a serious error to process only
+event_data[0].
+
+Resetting the event_data vector-length to 0 [instead of calling
+vec_free] means that the event scheme doesn’t burn cycles continuously
+allocating and freeing the event data vector. This is a common vppinfra
+/ vlib coding pattern, well worth using when appropriate.
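+The reset-length pattern is easy to demonstrate with a minimal stand-in
+for a vppinfra vector (``toy_vec_t`` and its helpers are hypothetical,
+not vppinfra APIs). Across 100 wakeups, each delivering two event data,
+the storage is allocated exactly once, and the handler walks every datum
+rather than only the first.
+
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal growable vector standing in for a vppinfra vector. */
typedef struct
{
  uintptr_t *data;
  size_t len, cap;
  int n_allocs; /* number of (re)allocations, to show reuse */
} toy_vec_t;

static void
toy_vec_push (toy_vec_t *v, uintptr_t x)
{
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? 2 * v->cap : 4;
      uintptr_t *p = realloc (v->data, new_cap * sizeof *p);
      assert (p != NULL); /* a sketch: no real out-of-memory handling */
      v->data = p;
      v->cap = new_cap;
      v->n_allocs++;
    }
  v->data[v->len++] = x;
}

/* Handle every datum delivered since the last wakeup, not just data[0];
 * summing stands in for real per-event work. */
static uintptr_t
toy_handle_events (const toy_vec_t *v)
{
  uintptr_t sum = 0;
  for (size_t i = 0; i < v->len; i++)
    sum += v->data[i];
  return sum;
}

/* Analogue of vec_reset_length: forget the contents, keep the storage. */
static void
toy_vec_reset_length (toy_vec_t *v)
{
  v->len = 0;
}
```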
+
+Signaling an event is easy, for example:
+
+.. code:: c
+
+       vlib_process_signal_event (vm, process_node_index, EVENT1,
+           (uword)arbitrary_event1_data); /* and so forth */
+
+One can obtain the process node index either by construction (dig it
+out of the appropriate vlib_node_registration_t) or by finding the
+vlib_node_t with vlib_get_node_by_name(…).
+
+Buffers
+-------
+
+vlib buffering solves the usual set of packet-processing problems,
+with an emphasis on high performance. Key in terms of performance: one ordinarily
+allocates / frees N buffers at a time rather than one at a time. Except
+when operating directly on a specific buffer, one deals with buffers by
+index, not by pointer.
+
+Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.
+
+Packets comprise one or more vlib buffers, chained together as required.
+Multiple particle sizes are supported; hardware input nodes simply ask
+for the required size(s). Coalescing support is available. For obvious
+reasons one is discouraged from writing one’s own wild and wacky buffer
+chain traversal code.
+
+vlib buffer headers are allocated immediately prior to the buffer data
+area. In typical packet processing this saves a dependent read wait:
+given a buffer’s address, one can prefetch the buffer header [metadata]
+at the same time as the first cache line of buffer data.
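+Addressing buffers by index, with the header sitting directly in front
+of the data, can be sketched as follows. The layout, the 64-byte cache
+line and the ``toy_`` names are assumptions for illustration; the real
+vlib_buffer_t is larger and vlib's index-to-pointer translation is not
+reproduced here. The point is that a u32 index becomes a pointer with
+one shift and one add, and that one address locates both the metadata
+and the packet data.
+
```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 64-byte cache lines and a toy header placed directly in front
 * of the packet data, the whole buffer a multiple of one cache line. */
#define LOG2_CACHE_LINE_BYTES 6
#define TOY_DATA_BYTES 192

typedef struct
{
  int16_t current_data;    /* offset of the first valid byte in data[] */
  uint16_t current_length; /* valid bytes starting at current_data */
  uint32_t next_buffer;    /* index of the next buffer in a chain */
  uint8_t pad[56];         /* pad the header to exactly one cache line */
  uint8_t data[TOY_DATA_BYTES];
} toy_buffer_t;

_Static_assert (sizeof (toy_buffer_t) % 64 == 0, "whole cache lines");

static _Alignas (64) uint8_t toy_buffer_mem[16 * sizeof (toy_buffer_t)];

/* A u32 index counts cache lines from the start of buffer memory, so
 * turning it into a pointer is one shift and one add. */
static inline toy_buffer_t *
toy_get_buffer (uint32_t bi)
{
  uintptr_t offset = (uintptr_t) bi << LOG2_CACHE_LINE_BYTES;
  assert (offset + sizeof (toy_buffer_t) <= sizeof (toy_buffer_mem));
  return (toy_buffer_t *) (toy_buffer_mem + offset);
}
```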
+
+Buffer header metadata (vlib_buffer_t) includes the usual rewrite
+expansion space, a current_data offset, RX and TX interface indices,
+packet trace information, and opaque areas.
+
+The opaque data is intended to control packet processing in arbitrary
+subgraph-dependent ways. The programmer shoulders responsibility for
+data lifetime analysis, type-checking, etc.
+
+Buffers have reference-counts in support of e.g. multicast replication.
+
+Shared-memory message API
+-------------------------
+
+Local control-plane and application processes interact with the vpp
+dataplane via asynchronous message-passing in shared memory over
+unidirectional queues. The same application APIs are available via
+sockets.
+
+Capturing API traces and replaying them in a simulation environment
+requires a disciplined approach to the problem. This seems like a
+make-work task, but it is not. When something goes wrong in the
+control-plane after 300,000 or 3,000,000 operations, high-speed replay
+of the events leading up to the accident is a huge win.
+
+The shared-memory message API message allocator vl_api_msg_alloc uses a
+particularly cute trick. Since messages are processed in order, we try
+to allocate message buffering from a set of fixed-size, preallocated
+rings. Each ring item has a “busy” bit. Freeing one of the preallocated
+message buffers merely requires the message consumer to clear the busy
+bit. No locking required.
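+A minimal single-producer, single-consumer sketch of the busy-bit idea,
+using C11 atomics. This is not vl_api_msg_alloc: the single ring, its
+sizes, and the ``ring_alloc`` / ``ring_free`` names are hypothetical.
+Because messages are consumed in order, the allocator only needs to
+check the next slot, and the consumer frees a buffer with one release
+store.
+
```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ring geometry. */
#define RING_ITEMS 4
#define ITEM_BYTES 128

typedef struct
{
  atomic_int busy; /* set by the allocator, cleared by the consumer */
  unsigned char data[ITEM_BYTES];
} ring_item_t;

typedef struct
{
  ring_item_t items[RING_ITEMS];
  unsigned head; /* next slot to try; touched only by the producer */
} msg_ring_t;

/* In-order consumption means the oldest outstanding buffer is always
 * the next to come free, so checking a single slot is enough. */
static void *
ring_alloc (msg_ring_t *r, size_t nbytes)
{
  ring_item_t *it = &r->items[r->head % RING_ITEMS];

  if (nbytes > ITEM_BYTES
      || atomic_load_explicit (&it->busy, memory_order_acquire))
    return NULL; /* too big, or ring full: caller must fall back */
  atomic_store_explicit (&it->busy, 1, memory_order_relaxed);
  r->head++;
  return it->data;
}

/* Consumer side: freeing is a single store, no lock. */
static void
ring_free (void *msg)
{
  ring_item_t *it =
    (ring_item_t *) ((unsigned char *) msg - offsetof (ring_item_t, data));
  atomic_store_explicit (&it->busy, 0, memory_order_release);
}
```
+
+When the next slot is still busy, this sketch simply returns NULL and
+leaves any fallback allocation to the caller.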
+
+Debug CLI
+---------
+
+Adding debug CLI commands to VLIB applications is very simple.
+
+Here is a complete example:
+
+.. code:: c
+
+       static clib_error_t *
+       show_ip_tuple_match (vlib_main_t * vm,
+                            unformat_input_t * input,
+                            vlib_cli_command_t * cmd)
+       {
+           vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
+           return 0;
+       }
+
+       VLIB_CLI_COMMAND (show_ip_tuple_command, static) =
+       {
+           .path = "show ip tuple match",
+           .short_help = "Show ip 5-tuple match-and-broadcast tables",
+           .function = show_ip_tuple_match,
+       };
+
+This example implements the “show ip tuple match” debug cli command. In
+ordinary usage, the vlib cli is available via the “vppctl” application,
+which sends traffic to a named pipe. Debug CLI telnet access can also be
+enabled on a configurable port.
+
+The cli implementation has an output redirection facility which makes it
+simple to deliver cli output via shared-memory API messaging.
+
+Particularly for debug or “show tech support” type commands, it would be
+wasteful to write vlib application code to pack binary data, write more
+code elsewhere to unpack the data and finally print the answer. If a
+certain cli command has the potential to hurt packet processing
+performance by running for too long, do the work incrementally in a
+process node. The client can wait.
+
+Macro expansion
+~~~~~~~~~~~~~~~
+
+The vpp debug CLI engine includes a recursive macro expander. This is
+quite useful for factoring out address and/or interface name specifics:
+
+::
+
+      define ip1 192.168.1.1/24
+      define ip2 192.168.2.1/24
+      define iface1 GigabitEthernet3/0/0
+      define iface2 loop1
+
+      set int ip address $iface1 $ip1
+      set int ip address $iface2 $(ip2)
+
+      undefine ip1
+      undefine ip2
+      undefine iface1
+      undefine iface2
+
+Each socket (or telnet) debug CLI session has its own macro tables. All
+debug CLI sessions which use CLI_INBAND binary API messages share a
+single table.
+
+The macro expander recognizes circular definitions:
+
+::
+
+       define foo \$(bar)
+       define bar \$(mumble)
+       define mumble \$(foo)
+
+At 8 levels of recursion, the macro expander throws up its hands and
+replies “CIRCULAR.”
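+The depth limit is what turns a circular definition into a finite
+error. The toy expander below handles only values that are exactly one
+``$(name)`` reference; the table layout, the empty result for undefined
+names, and all identifiers are assumptions rather than the CLI engine's
+implementation. Expanding ``$(foo)`` with the three circular definitions
+above yields “CIRCULAR” after 8 levels.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy macro table; the real expander also handles surrounding text,
 * $name without parentheses, and per-session tables. */
typedef struct
{
  const char *name;
  const char *value;
} toy_macro_t;

#define MAX_DEPTH 8

static const char *
toy_lookup (const toy_macro_t *t, size_t n, const char *name, size_t len)
{
  for (size_t i = 0; i < n; i++)
    if (strlen (t[i].name) == len && strncmp (t[i].name, name, len) == 0)
      return t[i].value;
  return NULL;
}

/* Expand a value that is exactly "$(name)", following chains of such
 * references; give up with "CIRCULAR" after MAX_DEPTH levels. */
static const char *
toy_expand (const toy_macro_t *t, size_t n, const char *s, int depth)
{
  assert (s != NULL);
  size_t len = strlen (s);

  if (len < 3 || s[0] != '$' || s[1] != '(' || s[len - 1] != ')')
    return s; /* plain text: nothing to expand */
  if (depth >= MAX_DEPTH)
    return "CIRCULAR";
  const char *v = toy_lookup (t, n, s + 2, len - 3);
  return v ? toy_expand (t, n, v, depth + 1) : "";
}
```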
+
+Macro-related debug CLI commands
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to the “define” and “undefine” debug CLI commands, use “show
+macro [noevaluate]” to dump the macro table. The “echo” debug CLI
+command will evaluate and print its argument:
+
+::
+
+       vpp# define foo This\ Is\ Foo
+       vpp# echo $foo
+       This Is Foo
+
+Handing off buffers between threads
+-----------------------------------
+
+Vlib includes an easy-to-use mechanism for handing off buffers between
+worker threads. A typical use-case: software ingress flow hashing. At a
+high level, one creates a per-worker-thread queue which sends packets to
+a specific graph node in the indicated worker thread. With the queue in
+hand, enqueue packets to the worker thread of your choice.
+
+Initialize a handoff queue
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Simple enough, call vlib_frame_queue_main_init:
+
+.. code:: c
+
+      main_ptr->frame_queue_index
+          = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
+
+Frame_queue_size means what it says: the number of frames which may be
+queued. Since frames contain 1…256 packets, frame_queue_size should be a
+reasonably small number (32…64). If the frame queue producer(s) are
+faster than the frame queue consumer(s), congestion will occur. We
+suggest letting the enqueue operation deal with queue congestion, as
+shown in the enqueue example below.
+
+Under the floorboards, vlib_frame_queue_main_init creates an input queue
+for each worker thread.
+
+Please do NOT create frame queues until it’s clear that they will be
+used. Although the main dispatch loop is reasonably smart about how
+often it polls the (entire set of) frame queues, polling unused frame
+queues is a waste of clock cycles.
+
+Hand off packets
+~~~~~~~~~~~~~~~~
+
+The actual handoff mechanics are simple, and integrate nicely with a
+typical graph-node dispatch function:
+
+.. code:: c
+
+       always_inline uword
+       do_handoff_inline (vlib_main_t * vm,
+                          vlib_node_runtime_t * node, vlib_frame_t * frame,
+                          int is_ip4, int is_trace)
+       {
+         u32 n_left_from, *from;
+         vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
+         u16 thread_indices[VLIB_FRAME_SIZE];
+         u16 nexts[VLIB_FRAME_SIZE], *next;
+         u32 n_enq;
+         htest_main_t *hmp = &htest_main;
+         int i;
+
+         from = vlib_frame_vector_args (frame);
+         n_left_from = frame->n_vectors;
+
+         vlib_get_buffers (vm, from, bufs, n_left_from);
+         next = nexts;
+         b = bufs;
+
+         /*
+          * Typical frame traversal loop, details vary with
+          * use case. Make sure to set thread_indices[i] with
+          * the desired destination thread index. You may
+          * or may not bother to set next[i].
+          */
+
+         for (i = 0; i < frame->n_vectors; i++)
+           {
+             /* <snip> */
+             /* Pick a thread to handle this packet */
+             thread_indices[i] = f (packet_data_or_whatever);
+             /* <snip> */
+
+             b += 1;
+             next += 1;
+             n_left_from -= 1;
+           }
+
+         /* Enqueue buffers to threads */
+         n_enq =
+           vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
+                                          from, thread_indices, frame->n_vectors,
+                                          1 /* drop on congestion */);
+         /* Typical counters */
+         if (n_enq < frame->n_vectors)
+           vlib_node_increment_counter (vm, node->node_index,
+                                        XXX_ERROR_CONGESTION_DROP,
+                                        frame->n_vectors - n_enq);
+         vlib_node_increment_counter (vm, node->node_index,
+                                      XXX_ERROR_HANDED_OFF, n_enq);
+         return frame->n_vectors;
+       }
+
+Notes about calling vlib_buffer_enqueue_to_thread(…):
+
+-  If you pass “drop on congestion” non-zero, all packets in the inbound
+   frame will be consumed one way or the other. This is the recommended
+   setting.
+
+-  In the drop-on-congestion case, please don’t try to “help” in the
+   enqueue node by freeing dropped packets, or by pushing them to
+   “error-drop.” Either of those actions would be a severe error.
+
+-  It’s perfectly OK to enqueue packets to the current thread.
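+In the handoff loop above, ``f (packet_data_or_whatever)`` is left
+abstract. For ingress flow hashing, one reasonable choice is to hash the
+5-tuple so that every packet of a flow lands on the same worker. A
+sketch, assuming workers are numbered consecutively after the main
+thread (thread 0); ``flow_key_t``, ``flow_hash`` and ``pick_thread`` are
+hypothetical, and FNV-1a stands in for whatever hash an application
+would actually use.
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple; field names are illustrative only. */
typedef struct
{
  uint32_t src_ip, dst_ip;
  uint16_t src_port, dst_port;
  uint8_t protocol;
} flow_key_t;

/* 32-bit FNV-1a over the tuple fields (not raw struct bytes, so padding
 * never leaks into the hash). Every packet of a flow hashes alike. */
static uint32_t
flow_hash (const flow_key_t *k)
{
  uint32_t words[4] = { k->src_ip, k->dst_ip,
                        ((uint32_t) k->src_port << 16) | k->dst_port,
                        k->protocol };
  uint32_t h = 2166136261u;

  for (int i = 0; i < 4; i++)
    for (int byte = 0; byte < 4; byte++)
      {
        h ^= (words[i] >> (8 * byte)) & 0xff;
        h *= 16777619u;
      }
  return h;
}

/* Candidate for f(): map a flow onto worker threads
 * first_worker .. first_worker + n_workers - 1. */
static uint16_t
pick_thread (const flow_key_t *k, uint16_t first_worker, uint16_t n_workers)
{
  assert (n_workers > 0);
  return (uint16_t) (first_worker + flow_hash (k) % n_workers);
}
```
+
+Inside the loop, ``thread_indices[i] = pick_thread (&key, 1, n_workers);``
+would then replace the placeholder, with ``key`` extracted from the
+packet headers.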
+
+Handoff Demo Plugin
+-------------------
+
+Check out the sample plugin in …/src/examples/handoffdemo. If you want
+to build the handoff demo plugin:
+
+::
+
+   $ cd .../src/plugins
+   $ ln -s ../examples/handoffdemo
+
+This plugin provides a simple example of how to hand off packets between
+threads. We used it to debug packet-tracer handoff tracing support.
+
+Packet generator input script
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    packet-generator new {
+       name x
+       limit 5
+       size 128-128
+       interface local0
+       node handoffdemo-1
+       data {
+           incrementing 30
+       }
+    }
+
+Start vpp with 2 worker threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The demo plugin hands packets from worker 1 to worker 2.
+
+Enable tracing, and start the packet generator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+     trace add pg-input 100
+     packet-generator enable
+
+Sample Run
+~~~~~~~~~~
+
+::
+
+     DBGvpp# ex /tmp/pg_input_script
+     DBGvpp# pa en
+     DBGvpp# sh err
+      Count                    Node                  Reason
+            5              handoffdemo-1             packets handed off processed
+            5              handoffdemo-2             completed packets
+     DBGvpp# show run
+     Thread 1 vpp_wk_0 (lcore 0)
+     Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+       vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
+                  Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
+     handoffdemo-1                    active                  1               5               0          4.76e3            5.00
+     pg-input                        disabled                 2               5               0          5.58e4            2.50
+     unix-epoll-input                 polling             22760               0               0          2.14e7            0.00
+     ---------------
+     Thread 2 vpp_wk_1 (lcore 2)
+     Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+       vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
+                  Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
+     drop                             active                  1               5               0          1.35e4            5.00
+     error-drop                       active                  1               5               0          2.52e4            5.00
+     handoffdemo-2                    active                  1               5               0          2.56e4            5.00
+     unix-epoll-input                 polling             22406               0               0          2.18e7            0.00
+
+Enable the packet tracer and run it again…
+
+::
+
+     DBGvpp# trace add pg-input 100
+     DBGvpp# pa en
+     DBGvpp# sh trace
+     ------------------- Start of thread 0 vpp_main -------------------
+     No packets in trace buffer
+     ------------------- Start of thread 1 vpp_wk_0 -------------------
+     Packet 1
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 2
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 3
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 4
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 5
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     ------------------- Start of thread 2 vpp_wk_1 -------------------
+     Packet 1
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 0
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 2
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 1
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 3
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 2
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 4
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 3
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 5
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 4
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+    DBGvpp#
diff --git a/docs/developer/corearchitecture/vnet.rst b/docs/developer/corearchitecture/vnet.rst
new file mode 100644
index 00000000000..2740ac0ad03
--- /dev/null
+++ b/docs/developer/corearchitecture/vnet.rst
@@ -0,0 +1,807 @@
+VNET (VPP Network Stack)
+========================
+
+The files associated with the VPP network stack layer are located in the
+*./src/vnet* folder. The Network Stack Layer is basically an
+instantiation of the code in the other layers. This layer has a vnet
+library that provides vectorized layer-2 and layer-3 networking graph nodes, a
+packet generator, and a packet tracer.
+
+In terms of building a packet processing application, vnet provides a
+platform-independent subgraph to which one connects a couple of
+device-driver nodes.
+
+Typical RX connections include “ethernet-input” [full software
+classification, feeds ip4-input, ip6-input, arp-input etc.] and
+“ip4-input-no-checksum” [used if the hardware can classify packets and
+perform the IPv4 header checksum check].
+
+Effective graph dispatch function coding
+----------------------------------------
+
+Over the past 15 years, multiple coding styles have emerged: a
+single/dual/quad loop coding model (with variations) and a
+fully-pipelined coding model.
+
+Single/dual loops
+-----------------
+
+The single/dual/quad loop model variations conveniently solve problems
+where the number of items to process is not known in advance: typical
+hardware RX-ring processing. This coding style is also very effective
+when a given node will not need to cover a complex set of dependent
+reads.
+
+Here is a quad/single loop which can leverage SIMD vector units (up to
+AVX-512) to convert buffer indices to buffer pointers:
+
+.. code:: c
+
+      static uword
+      simulated_ethernet_interface_tx (vlib_main_t * vm,
+                                       vlib_node_runtime_t * node,
+                                       vlib_frame_t * frame)
+      {
+        u32 n_left_from, *from;
+        u32 next_index = 0;
+        u32 n_bytes;
+        u32 thread_index = vm->thread_index;
+        vnet_main_t *vnm = vnet_get_main ();
+        vnet_interface_main_t *im = &vnm->interface_main;
+        vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
+        u16 nexts[VLIB_FRAME_SIZE], *next;
+
+        n_left_from = frame->n_vectors;
+        from = vlib_frame_vector_args (frame);
+
+        /*
+         * Convert up to VLIB_FRAME_SIZE indices in "from" to
+         * buffer pointers in bufs[]
+         */
+        vlib_get_buffers (vm, from, bufs, n_left_from);
+        b = bufs;
+        next = nexts;
+
+        /*
+         * While we have at least 4 vector elements (pkts) to process..
+         */
+        while (n_left_from >= 4)
+          {
+            /* Prefetch next quad-loop iteration. */
+            if (PREDICT_TRUE (n_left_from >= 8))
+              {
+                vlib_prefetch_buffer_header (b[4], STORE);
+                vlib_prefetch_buffer_header (b[5], STORE);
+                vlib_prefetch_buffer_header (b[6], STORE);
+                vlib_prefetch_buffer_header (b[7], STORE);
+              }
+
+            /*
+             * $$$ Process 4x packets right here...
+             * set next[0..3] to send the packets where they need to go
+             */
+            do_something_to (b[0]);
+            do_something_to (b[1]);
+            do_something_to (b[2]);
+            do_something_to (b[3]);
+            next[0] = next[1] = next[2] = next[3] = next_index;
+
+            /* Process the next 0..4 packets */
+            b += 4;
+            next += 4;
+            n_left_from -= 4;
+          }
+
+        /*
+         * Clean up 0...3 remaining packets at the end of the incoming frame
+         */
+        while (n_left_from > 0)
+          {
+            /*
+             * $$$ Process one packet right here...
+             * set next[0] to send the packet where it needs to go
+             */
+            do_something_to (b[0]);
+            next[0] = next_index;
+
+            /* Process the next packet */
+            b += 1;
+            next += 1;
+            n_left_from -= 1;
+          }
+
+        /*
+         * Send the packets along their respective next-node graph arcs.
+         * Considerable locality of reference is expected: most if not all
+         * packets in the inbound vector will traverse the same next-node
+         * arc.
+         */
+        vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);
+
+        return frame->n_vectors;
+      }
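+
+The same quad-loop/single-loop shape can be exercised outside vpp. The
+following is a minimal, self-contained sketch in plain C with no vlib
+dependencies. process_one and its next-index rule are hypothetical
+stand-ins for real per-packet work, and the GCC/Clang builtin
+__builtin_prefetch stands in for vlib_prefetch_buffer_header.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   /* Hypothetical per-packet work: bump the value, then choose
+      next index 1 for odd results and 0 for even ones. */
+   static inline uint16_t
+   process_one (uint32_t *pkt)
+   {
+     *pkt += 1;
+     return (uint16_t) (*pkt & 1);
+   }
+
+   /* Quad loop over n items, then a single loop for the 0..3 leftovers.
+      Returns the number of items processed. */
+   static uint32_t
+   quad_single_loop (uint32_t *pkts, uint16_t *nexts, uint32_t n)
+   {
+     uint32_t *p = pkts;
+     uint16_t *next = nexts;
+     uint32_t n_left = n;
+
+     while (n_left >= 4)
+       {
+         /* Prefetch the items the next quad-loop iteration will touch. */
+         if (n_left >= 8)
+           {
+             __builtin_prefetch (&p[4], 1);
+             __builtin_prefetch (&p[5], 1);
+             __builtin_prefetch (&p[6], 1);
+             __builtin_prefetch (&p[7], 1);
+           }
+         next[0] = process_one (&p[0]);
+         next[1] = process_one (&p[1]);
+         next[2] = process_one (&p[2]);
+         next[3] = process_one (&p[3]);
+         p += 4;
+         next += 4;
+         n_left -= 4;
+       }
+
+     while (n_left > 0)
+       {
+         next[0] = process_one (&p[0]);
+         p += 1;
+         next += 1;
+         n_left -= 1;
+       }
+     return n;
+   }
+
+With seven items, the quad loop runs once and the single loop handles
+the remaining three.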
+
+Given a packet processing task to implement, it pays to scout around
+looking for similar tasks, and think about using the same coding
+pattern. It is not uncommon to recode a given graph node dispatch
+function several times during performance optimization.
+
+Creating Packets from Scratch
+-----------------------------
+
+At times, it’s necessary to create packets from scratch and send them.
+Tasks like sending keepalives or actively opening connections come to
+mind. It’s not difficult, but accurate buffer metadata setup is required.
+
+Allocating Buffers
+~~~~~~~~~~~~~~~~~~
+
+Use vlib_buffer_alloc, which allocates a set of buffer indices. For
+low-performance applications, it’s OK to allocate one buffer at a time.
+Note that vlib_buffer_alloc(…) does NOT initialize buffer metadata. See
+below.
+
+In high-performance cases, allocate a vector of buffer indices, and hand
+them out from the end of the vector; decrement \_vec_len(..) as buffer
+indices are allocated. See tcp_alloc_tx_buffers(…) and
+tcp_get_free_buffer_index(…) for an example.
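+
+The hand-out-from-the-end pattern can be sketched without vlib as
+follows. Everything here is hypothetical. bulk_alloc stands in for
+vlib_buffer_alloc and simply returns consecutive indices, CACHE_BATCH is
+an arbitrary refill size, and a fixed array with a length field plays
+the role of a vppinfra vector and its \_vec_len. Taking the last element
+is O(1), and the cost of each allocator call is amortized over
+CACHE_BATCH buffers.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   #define CACHE_BATCH 32 /* hypothetical refill size */
+
+   typedef struct
+   {
+     uint32_t indices[CACHE_BATCH];
+     uint32_t len;        /* number of cached indices (cf. vec_len) */
+     uint32_t next_fresh; /* state for the fake allocator below */
+   } buffer_cache_t;
+
+   /* Hypothetical stand-in for vlib_buffer_alloc: writes n_want fresh
+      indices to out and returns how many were allocated. */
+   static uint32_t
+   bulk_alloc (buffer_cache_t *c, uint32_t *out, uint32_t n_want)
+   {
+     for (uint32_t i = 0; i < n_want; i++)
+       out[i] = c->next_fresh++;
+     return n_want;
+   }
+
+   /* Hand out one buffer index from the end of the cache, refilling in
+      bulk when it is empty. Returns 0 on success, -1 on failure. */
+   static int
+   get_free_buffer_index (buffer_cache_t *c, uint32_t *bi)
+   {
+     if (c->len == 0)
+       {
+         c->len = bulk_alloc (c, c->indices, CACHE_BATCH);
+         if (c->len == 0)
+           return -1;
+       }
+     /* Take the last element and shrink the "vector" by one. */
+     *bi = c->indices[--c->len];
+     return 0;
+   }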
+
+Buffer Initialization Example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following example shows the **main points**, but is not to be
+blindly cut-’n-pasted.
+
+.. code:: c
+
+     u32 bi0;
+     vlib_buffer_t *b0;
+     ip4_header_t *ip;
+     udp_header_t *udp;
+     u8 *data_dst;
+
+     /* Allocate a buffer */
+     if (vlib_buffer_alloc (vm, &bi0, 1) != 1)
+       return -1;
+
+     b0 = vlib_get_buffer (vm, bi0);
+
+     /* At this point b0->current_data = 0, b0->current_length = 0 */
+
+     /*
+      * Copy data into the buffer. This example ASSUMES that data will fit
+      * in a single buffer, and is e.g. an ip4 packet.
+      */
+     if (have_packet_rewrite)
+       {
+         clib_memcpy (b0->data, data, vec_len (data));
+         b0->current_length = vec_len (data);
+       }
+     else
+       {
+         /* OR, build a udp-ip packet (for example) */
+         ip = vlib_buffer_get_current (b0);
+         udp = (udp_header_t *) (ip + 1);
+         data_dst = (u8 *) (udp + 1);
+
+         /* Buffer data is not zeroed: clear both headers first */
+         clib_memset (ip, 0, sizeof (*ip) + sizeof (*udp));
+
+         ip->ip_version_and_header_length = 0x45;
+         ip->ttl = 254;
+         ip->protocol = IP_PROTOCOL_UDP;
+         ip->length = clib_host_to_net_u16 (sizeof (*ip) + sizeof (*udp) +
+                                            vec_len (udp_data));
+         ip->src_address.as_u32 = src_address->as_u32;
+         ip->dst_address.as_u32 = dst_address->as_u32;
+         ip->checksum = ip4_header_checksum (ip);
+         udp->src_port = clib_host_to_net_u16 (src_port);
+         udp->dst_port = clib_host_to_net_u16 (dst_port);
+         /* The UDP length covers the UDP header plus the payload */
+         udp->length = clib_host_to_net_u16 (sizeof (*udp) + vec_len (udp_data));
+         clib_memcpy (data_dst, udp_data, vec_len (udp_data));
+
+         /* Set the length first: the UDP checksum code reads it */
+         b0->current_length = sizeof (*ip) + sizeof (*udp) + vec_len (udp_data);
+
+         if (compute_udp_checksum)
+           {
+             /* RFC 7011 section 10.3.2. */
+             udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip);
+             if (udp->checksum == 0)
+               udp->checksum = 0xffff;
+           }
+       }
+     b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;
+
+     /* sw_if_index 0 is the "local" interface, which always exists */
+     vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;
+
+     /* Use the default FIB index for tx lookup. Set non-zero to use another fib */
+     vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;
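+
+Both the IPv4 header checksum and the UDP checksum are RFC 1071 Internet
+checksums: the one's complement of the one's complement sum of 16-bit
+words. The standalone sketch below computes one over a byte buffer and
+applies the same 0 to 0xffff substitution as the example above.
+inet_checksum and udp_wire_checksum are hypothetical names, not
+vppinfra functions. The real UDP checksum also covers an IPv4
+pseudo-header, which this sketch does not build.
+
+.. code:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* RFC 1071 Internet checksum. 16-bit words are read in network
+      (big-endian) order; the result is in host order. */
+   static uint16_t
+   inet_checksum (const uint8_t *data, size_t len)
+   {
+     uint32_t sum = 0;
+     size_t i;
+
+     for (i = 0; i + 1 < len; i += 2)
+       sum += (uint32_t) ((data[i] << 8) | data[i + 1]);
+     if (len & 1)                  /* pad an odd trailing byte with zero */
+       sum += (uint32_t) data[len - 1] << 8;
+
+     /* Fold carries back into the low 16 bits. */
+     while (sum >> 16)
+       sum = (sum & 0xffff) + (sum >> 16);
+
+     return (uint16_t) ~sum;
+   }
+
+   /* UDP over IPv4: a computed checksum of 0 is sent as 0xffff, since
+      0 on the wire means "no checksum". */
+   static uint16_t
+   udp_wire_checksum (uint16_t computed)
+   {
+     return computed == 0 ? 0xffff : computed;
+   }
+
+For the 20-octet sample header 4500 0073 0000 4000 4011 0000 c0a8 0001
+c0a8 00c7, with the checksum field zeroed, the result is 0xb861.
+Re-summing the header with 0xb861 filled in yields 0. This is how a
+receiver verifies the header.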
+
+If your use-case calls for large packet transmission, use
+vlib_buffer_chain_append_data_with_alloc(…) to create the requisite
+buffer chain.
+
+Enqueueing packets for lookup and transmission
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The simplest way to send a set of packets is to use
+vlib_get_frame_to_node(…) to allocate fresh frame(s) to ip4_lookup_node
+or ip6_lookup_node, add the constructed buffer indices, and dispatch the
+frame using vlib_put_frame_to_node(…).
+
+.. code:: c
+
+       vlib_frame_t *f;
+       u32 *to_next;
+       int i;
+
+       /* A frame holds at most VLIB_FRAME_SIZE buffer indices */
+       ASSERT (vec_len (buffer_indices_to_send) <= VLIB_FRAME_SIZE);
+
+       f = vlib_get_frame_to_node (vm, ip4_lookup_node.index);
+       f->n_vectors = vec_len (buffer_indices_to_send);
+       to_next = vlib_frame_vector_args (f);
+
+       for (i = 0; i < vec_len (buffer_indices_to_send); i++)
+         to_next[i] = buffer_indices_to_send[i];
+
+       vlib_put_frame_to_node (vm, ip4_lookup_node.index, f);
+
+It is inefficient to allocate and schedule single-packet frames. That’s
+acceptable if you need to send one packet per second, but it should
+**not** occur in a for-loop!
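+
+When there are many locally generated buffers to send, filling frames to
+capacity avoids single-packet frames. This plain-C sketch models only the
+batching arithmetic. FRAME_SIZE is an assumed stand-in for
+VLIB_FRAME_SIZE (commonly 256), and send_frame_fn is a hypothetical
+callback playing the role of the vlib_get_frame_to_node(…) /
+vlib_put_frame_to_node(…) pair.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   #define FRAME_SIZE 256 /* assumed value of VLIB_FRAME_SIZE */
+
+   /* Hypothetical callback: consumes one frame of up to FRAME_SIZE indices. */
+   typedef void (*send_frame_fn) (const uint32_t *bi, uint32_t n, void *ctx);
+
+   /* Split n buffer indices into full frames plus at most one partial
+      frame. Returns the number of frames dispatched. */
+   static uint32_t
+   send_in_frames (const uint32_t *bi, uint32_t n, send_frame_fn send,
+                   void *ctx)
+   {
+     uint32_t n_frames = 0;
+
+     while (n > 0)
+       {
+         uint32_t n_this_frame = n < FRAME_SIZE ? n : FRAME_SIZE;
+         send (bi, n_this_frame, ctx);
+         bi += n_this_frame;
+         n -= n_this_frame;
+         n_frames++;
+       }
+     return n_frames;
+   }
+
+   /* Example callback: counts the indices it is handed. */
+   static void
+   count_indices (const uint32_t *bi, uint32_t n, void *ctx)
+   {
+     (void) bi;
+     *(uint32_t *) ctx += n;
+   }
+
+For 600 indices and a 256-entry frame, this dispatches three frames
+(256, 256 and 88 indices) instead of 600.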
+
+Packet tracer
+-------------
+
+Vlib includes a frame element [packet] trace facility, with a simple
+debug CLI interface. The CLI is straightforward: “trace add
+input-node-name count” starts capturing packet traces.
+
+To trace 100 packets on a typical x86_64 system running the dpdk plugin:
+“trace add dpdk-input 100”. When using the packet generator: “trace add
+pg-input 100”.
+
+To display the packet trace: “show trace”.
+
+Each graph node has the opportunity to capture its own trace data. It is
+almost always a good idea to do so. The trace capture APIs are simple.
+
+The packet capture APIs snapshot binary data, to minimize processing at
+capture time. Each participating graph node initialization provides a
+vppinfra format-style user function to pretty-print data when required
+by the VLIB “show trace” command.
+
+Set the VLIB node registration “.format_trace” member to the name of the
+per-graph node format function.
+
+Here’s a simple example:
+
+.. code:: c
+
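+       /* Hypothetical per-node trace record: use your node's own fields */
+       typedef struct
+       {
+         u32 next_index;
+       } my_node_trace_t;
+
+       u8 * my_node_format_trace (u8 * s, va_list * args)
+       {
+           CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *);
+           CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *);
+           my_node_trace_t * t = va_arg (*args, my_node_trace_t *);
+
+           s = format (s, "My trace data was: %d", t->next_index);
+
+           return s;
+       }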
+
+The trace framework hands the per-node format function the data it
+captured as the packet whizzed by. The format function pretty-prints the
+data as desired.
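+
+The split between cheap capture and deferred formatting can be
+illustrated without vlib. In this sketch, my_trace_t, its fields and both
+functions are hypothetical. Capture copies two words into a fixed-size
+record. Only the formatting path pays for string work, with snprintf
+standing in for vppinfra's format().
+
+.. code:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+
+   /* Hypothetical per-node trace record: small, fixed-size binary data. */
+   typedef struct
+   {
+     uint32_t sw_if_index;
+     uint32_t next_index;
+   } my_trace_t;
+
+   /* Capture time (per packet): copy a few words, no formatting. */
+   static void
+   my_trace_capture (my_trace_t *t, uint32_t sw_if_index,
+                     uint32_t next_index)
+   {
+     t->sw_if_index = sw_if_index;
+     t->next_index = next_index;
+   }
+
+   /* Display time ("show trace"): pretty-print the snapshot into buf.
+      Returns the number of characters that snprintf reports. */
+   static int
+   my_trace_format (char *buf, size_t len, const my_trace_t *t)
+   {
+     return snprintf (buf, len, "MY_NODE: sw_if_index %u, next index %u",
+                      (unsigned) t->sw_if_index, (unsigned) t->next_index);
+   }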
+
+Graph Dispatcher Pcap Tracing
+-----------------------------
+
+The vpp graph dispatcher knows how to capture vectors of packets in pcap
+format as they’re dispatched. The pcap captures are as follows:
+
+::
+
+       VPP graph dispatch trace record description:
+
+           0                   1                   2                   3
+           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Major Version | Minor Version | NStrings      | ProtoHint     |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Buffer index (big endian)                                     |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          + VPP graph node name ...     ...               | NULL octet    |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Buffer Metadata ... ...                       | NULL octet    |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Buffer Opaque ... ...                         | NULL octet    |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Buffer Opaque 2 ... ...                       | NULL octet    |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+          | Packet data (up to 16K)                                       |
+          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Graph dispatch records comprise a version stamp, an indication of how
+many NULL-terminated strings will follow the record header and precede
+packet data, and a protocol hint.
+
+The buffer index is an opaque 32-bit cookie which allows consumers of
+these data to easily filter/track single packets as they traverse the
+forwarding graph.
+
+Multiple records per packet are normal, and to be expected. Packets will
+appear multiple times as they traverse the vpp forwarding graph. In this
+way, vpp graph dispatch traces are significantly different from regular
+network packet captures from an end-station. This property complicates
+stateful packet analysis.
+
+Restricting stateful analysis to records from a single vpp graph node
+such as “ethernet-input” seems likely to improve the situation.
+
+As of this writing: major version = 1, minor version = 0. NStrings
+SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
+greater than 5. They MAY attempt to display the claimed number of
+strings, or they MAY treat the condition as an error.
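+
+Below is a consumer-side sketch of decoding one record. It assumes the
+pcap per-record header has already been stripped and that rec points at
+the Major Version octet. The sketch reads the fixed 8-octet header,
+converts the big-endian buffer index, skips NStrings NULL-terminated
+strings, and reports where packet data begins. The names are
+hypothetical. Rejecting NStrings outside 4..5 is the stricter of the
+two behaviors permitted above.
+
+.. code:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   typedef struct
+   {
+     uint8_t major, minor, n_strings, proto_hint;
+     uint32_t buffer_index; /* converted to host byte order */
+     size_t data_offset;    /* offset of packet data within the record */
+   } dispatch_record_t;
+
+   /* Returns 0 on success, -1 on a truncated or suspicious record. */
+   static int
+   parse_dispatch_record (const uint8_t *rec, size_t len,
+                          dispatch_record_t *r)
+   {
+     size_t off = 8;
+     uint8_t i;
+
+     if (len < 8)
+       return -1;
+     r->major = rec[0];
+     r->minor = rec[1];
+     r->n_strings = rec[2];
+     r->proto_hint = rec[3];
+     r->buffer_index = ((uint32_t) rec[4] << 24) | ((uint32_t) rec[5] << 16)
+                       | ((uint32_t) rec[6] << 8) | (uint32_t) rec[7];
+
+     /* Treat NStrings outside 4..5 as an error. */
+     if (r->n_strings < 4 || r->n_strings > 5)
+       return -1;
+
+     /* Skip node name, metadata, opaque, opaque2 [, ASCII trace]. */
+     for (i = 0; i < r->n_strings; i++)
+       {
+         while (off < len && rec[off] != 0)
+           off++;
+         if (off >= len)
+           return -1;      /* missing NULL octet */
+         off++;            /* step past the NULL octet */
+       }
+     r->data_offset = off;
+     return 0;
+   }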
+
+Here is the current set of protocol hints:
+
+.. code:: c
+
+       typedef enum
+         {
+           VLIB_NODE_PROTO_HINT_NONE = 0,
+           VLIB_NODE_PROTO_HINT_ETHERNET,
+           VLIB_NODE_PROTO_HINT_IP4,
+           VLIB_NODE_PROTO_HINT_IP6,
+           VLIB_NODE_PROTO_HINT_TCP,
+           VLIB_NODE_PROTO_HINT_UDP,
+           VLIB_NODE_N_PROTO_HINTS,
+         } vlib_node_proto_hint_t;
+
+Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
+data SHOULD carry version 6 in its upper nibble (typically 0x60), and
+should begin an ipv6 packet header.
+
+Downstream consumers of these data SHOULD pay attention to the protocol
+hint. They MUST tolerate inaccurate hints, which MAY occur from time to
+time.
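+
+A downstream consumer might sanity-check the hint before choosing a
+dissector. The sketch below assumes that checking the IP version nibble
+and a minimum header length is sufficient. It re-declares the hint
+values listed above so that it compiles on its own. For IPv6 it tests
+only the version nibble, because the low nibble of the first octet
+carries traffic-class bits. proto_hint_plausible is a hypothetical name.
+
+.. code:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* Mirrors the vlib_node_proto_hint_t values listed above. */
+   enum
+   {
+     PROTO_HINT_NONE = 0,
+     PROTO_HINT_ETHERNET,
+     PROTO_HINT_IP4,
+     PROTO_HINT_IP6,
+     PROTO_HINT_TCP,
+     PROTO_HINT_UDP,
+   };
+
+   /* Returns 1 if the data looks consistent with the hint, 0 if it does
+      not, and -1 for hints this sketch does not check. On 0, fall back
+      to generic decoding rather than trusting the hint. */
+   static int
+   proto_hint_plausible (uint8_t hint, const uint8_t *data, size_t len)
+   {
+     switch (hint)
+       {
+       case PROTO_HINT_ETHERNET:
+         return len >= 14;
+       case PROTO_HINT_IP4:
+         return len >= 20 && (data[0] >> 4) == 4;
+       case PROTO_HINT_IP6:
+         return len >= 40 && (data[0] >> 4) == 6;
+       default:
+         return -1;
+       }
+   }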
+
+Dispatch Pcap Trace Debug CLI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To start a dispatch trace capture of up to 10,000 trace records:
+
+::
+
+        pcap dispatch trace on max 10000 file dispatch.pcap
+
+To start a dispatch trace which will also include standard vpp packet
+tracing for packets which originate in dpdk-input:
+
+::
+
+        pcap dispatch trace on max 10000 file dispatch.pcap buffer-trace dpdk-input 1000
+
+To save the pcap trace, e.g. in /tmp/dispatch.pcap:
+
+::
+
+       pcap dispatch trace off
+
+Wireshark dissection of dispatch pcap traces
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It almost goes without saying that we built a companion wireshark
+dissector to display these traces. As of this writing, we have
+upstreamed the wireshark dissector.
+
+Since it will be a while before wireshark/master/latest makes it into
+all of the popular Linux distros, please see the “How to build a vpp
+dispatch trace aware Wireshark” page for build info.
+
+Here is a sample packet dissection, with some fields omitted for
+clarity. The point is that the wireshark dissector accurately displays
+**all** of the vpp buffer metadata, and the name of the graph node in
+question.
+
+::
+
+       Frame 1: 2216 bytes on wire (17728 bits), 2216 bytes captured (17728 bits)
+           Encapsulation type: USER 13 (58)
+           [Protocols in frame: vpp:vpp-metadata:vpp-opaque:vpp-opaque2:eth:ethertype:ip:tcp:data]
+       VPP Dispatch Trace
+           BufferIndex: 0x00036663
+       NodeName: ethernet-input
+       VPP Buffer Metadata
+           Metadata: flags:
+           Metadata: current_data: 0, current_length: 102
+           Metadata: current_config_index: 0, flow_id: 0, next_buffer: 0
+           Metadata: error: 0, n_add_refs: 0, buffer_pool_index: 0
+           Metadata: trace_index: 0, recycle_count: 0, len_not_first_buf: 0
+           Metadata: free_list_index: 0
+           Metadata:
+       VPP Buffer Opaque
+           Opaque: raw: 00000007 ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+           Opaque: sw_if_index[VLIB_RX]: 7, sw_if_index[VLIB_TX]: -1
+           Opaque: L2 offset 0, L3 offset 0, L4 offset 0, feature arc index 0
+           Opaque: ip.adj_index[VLIB_RX]: 0, ip.adj_index[VLIB_TX]: 0
+           Opaque: ip.flow_hash: 0x0, ip.save_protocol: 0x0, ip.fib_index: 0
+           Opaque: ip.save_rewrite_length: 0, ip.rpf_id: 0
+           Opaque: ip.icmp.type: 0 ip.icmp.code: 0, ip.icmp.data: 0x0
+           Opaque: ip.reass.next_index: 0, ip.reass.estimated_mtu: 0
+           Opaque: ip.reass.fragment_first: 0 ip.reass.fragment_last: 0
+           Opaque: ip.reass.range_first: 0 ip.reass.range_last: 0
+           Opaque: ip.reass.next_range_bi: 0x0, ip.reass.ip6_frag_hdr_offset: 0
+           Opaque: mpls.ttl: 0, mpls.exp: 0, mpls.first: 0, mpls.save_rewrite_length: 0, mpls.bier.n_bytes: 0
+           Opaque: l2.feature_bitmap: 00000000, l2.bd_index: 0, l2.l2_len: 0, l2.shg: 0, l2.l2fib_sn: 0, l2.bd_age: 0
+           Opaque: l2.feature_bitmap_input:   none configured, L2.feature_bitmap_output:   none configured
+           Opaque: l2t.next_index: 0, l2t.session_index: 0
+           Opaque: l2_classify.table_index: 0, l2_classify.opaque_index: 0, l2_classify.hash: 0x0
+           Opaque: policer.index: 0
+           Opaque: ipsec.flags: 0x0, ipsec.sad_index: 0
+           Opaque: map.mtu: 0
+           Opaque: map_t.v6.saddr: 0x0, map_t.v6.daddr: 0x0, map_t.v6.frag_offset: 0, map_t.v6.l4_offset: 0
+           Opaque: map_t.v6.l4_protocol: 0, map_t.checksum_offset: 0, map_t.mtu: 0
+           Opaque: ip_frag.mtu: 0, ip_frag.next_index: 0, ip_frag.flags: 0x0
+           Opaque: cop.current_config_index: 0
+           Opaque: lisp.overlay_afi: 0
+           Opaque: tcp.connection_index: 0, tcp.seq_number: 0, tcp.seq_end: 0, tcp.ack_number: 0, tcp.hdr_offset: 0, tcp.data_offset: 0
+           Opaque: tcp.data_len: 0, tcp.flags: 0x0
+           Opaque: sctp.connection_index: 0, sctp.sid: 0, sctp.ssn: 0, sctp.tsn: 0, sctp.hdr_offset: 0
+           Opaque: sctp.data_offset: 0, sctp.data_len: 0, sctp.subconn_idx: 0, sctp.flags: 0x0
+           Opaque: snat.flags: 0x0
+           Opaque:
+       VPP Buffer Opaque2
+           Opaque2: raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+           Opaque2: qos.bits: 0, qos.source: 0
+           Opaque2: loop_counter: 0
+           Opaque2: gbp.flags: 0, gbp.src_epg: 0
+           Opaque2: pg_replay_timestamp: 0
+           Opaque2:
+       Ethernet II, Src: 06:d6:01:41:3b:92 (06:d6:01:41:3b:92), Dst: IntelCor_3d:f6
+       Transmission Control Protocol, Src Port: 22432, Dst Port: 54084, Seq: 1, Ack: 1, Len: 36
+           Source Port: 22432
+           Destination Port: 54084
+           TCP payload (36 bytes)
+       Data (36 bytes)
+
+       0000  cf aa 8b f5 53 14 d4 c7 29 75 3e 56 63 93 9d 11   ....S...)u>Vc...
+       0010  e5 f2 92 27 86 56 4c 21 ce c5 23 46 d7 eb ec 0d   ...'.VL!..#F....
+       0020  a8 98 36 5a                                       ..6Z
+           Data: cfaa8bf55314d4c729753e5663939d11e5f2922786564c21…
+           [Length: 36]
+
+It’s a matter of a couple of mouse-clicks in Wireshark to filter the
+trace to a specific buffer index. With that specific kind of filtration,
+one can watch a packet walk through the forwarding graph; noting any/all
+metadata changes, header checksum changes, and so forth.
+
+This should be of significant value when developing new vpp graph nodes.
+If new code mispositions b->current_data, it will be completely obvious
+from looking at the dispatch trace in wireshark.
+
+pcap rx, tx, and drop tracing
+-----------------------------
+
+vpp also supports rx, tx, and drop packet capture in pcap format,
+through the “pcap trace” debug CLI command.
+
+This command is used to start or stop a packet capture, or show the
+status of packet capture. Each of “pcap trace rx”, “pcap trace tx”, and
+“pcap trace drop” is implemented. Supply one or more of “rx”, “tx”, and
+“drop” to enable multiple simultaneous capture types.
+
+These commands have the following optional parameters:
+
+-  rx - trace received packets.
+
+-  tx - trace transmitted packets.
+
+-  drop - trace dropped packets.
+
+-  max *nnnn*\  - file size, number of packet captures. Once *nnnn*
+   packets have been received, the trace buffer is flushed to the
+   indicated file. Defaults to 1000. Can only be updated if packet
+   capture is off.
+
+-  max-bytes-per-pkt *nnnn*\  - maximum number of bytes to trace on a
+   per-packet basis. Must be >32 and less than 9000. Default value:
+   512.
+
+-  filter - Capture packets which match the current pcap trace rx / tx
+   / drop filter set, which must be configured first. Use “classify
+   filter pcap …” to configure the filter; see the next section. The
+   filter will only be executed if the per-interface or any-interface
+   tests fail.
+
+-  intfc *interface* \| *any*\  - Used to specify a given interface, or
+   use ‘any’ to run packet capture on all interfaces. ‘any’ is the
+   default if not provided. Settings from a previous packet capture are
+   preserved, so ‘any’ can be used to reset the interface setting.
+
+-  file *filename*\  - Used to specify the output filename. The file
+   will be placed in the ‘/tmp’ directory. If *filename* already exists,
+   the file will be overwritten. If no filename is provided,
+   ‘/tmp/rx.pcap’ or ‘/tmp/tx.pcap’ will be used, depending on capture
+   direction. Can only be updated when pcap capture is off.
+
+-  status - Displays the current status and configured attributes
+   associated with a packet capture. If packet capture is in progress,
+   ‘status’ also will return the number of packets currently in the
+   buffer. Any additional attributes entered on command line with a
+   ‘status’ request will be ignored.
+
+packet trace capture filtering
+------------------------------
+
+The “classify filter pcap \| <interface> \| trace” debug CLI command
+constructs an arbitrary set of packet classifier tables for use with
+“pcap trace rx \| tx \| drop,” and with the vpp packet tracer on a
+per-interface or system-wide basis.
+
+Packets which match a rule in the classifier table chain will be traced.
+The tables are automatically ordered so that matches in the most
+specific table are tried first.
+
+It’s reasonably likely that folks will configure a single table with one
+or two matches. As a result, we configure 8 hash buckets and 128K of
+match rule space by default. One can override the defaults by specifying
+“buckets <nn>” and “memory-size <nn>” as desired.
+
+To build up complex filter chains, repeatedly issue the classify filter
+debug CLI command. Each command must specify the desired mask and match
+values. If a classifier table with a suitable mask already exists, the
+CLI command adds a match rule to the existing table. If not, the CLI
+command adds a new table with the indicated mask, plus the match rule.
+
+Configure a simple pcap classify filter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+       classify filter pcap mask l3 ip4 src match l3 ip4 src 192.168.1.11
+       pcap trace rx max 100 filter
+
+Configure a simple per-interface capture filter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+       classify filter GigabitEthernet3/0/0 mask l3 ip4 src match l3 ip4 src 192.168.1.11
+       pcap trace rx max 100 intfc GigabitEthernet3/0/0
+
+Note that per-interface capture filters are *always* applied.
+
+Clear per-interface capture filters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+       classify filter GigabitEthernet3/0/0 del
+
+Configure another fairly simple pcap classify filter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+      classify filter pcap mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
+      pcap trace tx max 100 filter
+
+Configure a vpp packet tracer filter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+      classify filter trace mask l3 ip4 src dst match l3 ip4 src 192.168.1.10 dst 192.168.2.10
+      trace add dpdk-input 100 filter
+
+Clear all current classifier filters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+       classify filter [pcap | <interface> | trace] del
+
+To inspect the classifier tables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+      show classify table [verbose]
+
+The verbose form displays all of the match rules, with hit-counters.
+
+Terse description of the “mask” syntax:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+       l2 src dst proto tag1 tag2 ignore-tag1 ignore-tag2 cos1 cos2 dot1q dot1ad
+       l3 ip4 <ip4-mask> ip6 <ip6-mask>
+       <ip4-mask> version hdr_length src[/width] dst[/width]
+                  tos length fragment_id ttl protocol checksum
+       <ip6-mask> version traffic-class flow-label src dst proto
+                  payload_length hop_limit protocol
+       l4 tcp <tcp-mask> udp <udp_mask> src_port dst_port
+       <tcp-mask> src dst  # ports
+       <udp-mask> src_port dst_port
+
+To construct **matches**, add the values to match after the indicated
+keywords in the mask syntax. For example: “… mask l3 ip4 src” -> “…
+match l3 ip4 src 192.168.1.11”
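+
+Conceptually, a packet matches a rule when the packet octets selected by
+the mask equal the match value. The sketch below applies that test to
+the IPv4 source address, which sits at octet offset 12 of the IPv4
+header. masked_match and ip4_src_matches are hypothetical helpers. The
+vnet classifier itself stores masked keys in hash tables rather than
+comparing rules one at a time.
+
+.. code:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* A rule matches if (packet & mask) == match, octet by octet. */
+   static int
+   masked_match (const uint8_t *pkt, const uint8_t *mask,
+                 const uint8_t *match, size_t len)
+   {
+     for (size_t i = 0; i < len; i++)
+       if ((pkt[i] & mask[i]) != match[i])
+         return 0;
+     return 1;
+   }
+
+   /* Hypothetical helper modelling "mask l3 ip4 src" /
+      "match l3 ip4 src a.b.c.d": compare IPv4 header octets 12..15. */
+   static int
+   ip4_src_matches (const uint8_t *ip4_hdr, const uint8_t src[4])
+   {
+     static const uint8_t mask[4] = { 0xff, 0xff, 0xff, 0xff };
+     return masked_match (ip4_hdr + 12, mask, src, 4);
+   }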
+
+VPP Packet Generator
+--------------------
+
+We use the VPP packet generator to inject packets into the forwarding
+graph. The packet generator can replay pcap traces, and generate packets
+out of whole cloth at respectably high performance.
+
+The VPP pg enables quite a variety of use-cases, ranging from functional
+testing of new data-plane nodes to regression testing to performance
+tuning.
+
+PG setup scripts
+----------------
+
+PG setup scripts describe traffic in detail, and leverage vpp debug CLI
+mechanisms. It’s reasonably unusual to construct a pg setup script which
+doesn’t include a certain amount of interface and FIB configuration.
+
+For example:
+
+::
+
+       loop create
+       set int ip address loop0 192.168.1.1/24
+       set int state loop0 up
+
+       packet-generator new {
+           name pg0
+           limit 100
+           rate 1e6
+           size 300-300
+           interface loop0
+           node ethernet-input
+           data { IP4: 1.2.3 -> 4.5.6
+                  UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10
+                  UDP: 1234 -> 2345
+                  incrementing 256
+           }
+       }
+
+A packet generator stream definition includes two major sections:
+
+-  Stream Parameter Setup
+-  Packet Data
+
+Stream Parameter Setup
+~~~~~~~~~~~~~~~~~~~~~~
+
+Given the example above, let’s look at how to set up stream parameters:
+
+-  **name pg0** - Name of the stream, in this case “pg0”
+
+-  **limit 100** - Number of packets to send when the stream is
+   enabled. “limit 0” means send packets continuously.
+
+-  **maxframe <nnn>** - Maximum frame size. Handy for injecting multiple
+   frames no larger than <nnn>. Useful for checking dual / quad loop
+   code.
+
+-  **rate 1e6** - Packet injection rate, in this case 1 MPPS. When not
+   specified, the packet generator injects packets as fast as possible.
+
+-  **size 300-300** - Packet size range, in this case send 300-byte
+   packets.
+
+-  **interface loop0** - Packets appear as if they were received on the
+   specified interface. This datum is used in multiple ways: to select
+   graph arc feature configuration, to select IP FIBs. Configure
+   features e.g. on loop0 to exercise those features.
+
+-  **tx-interface <name>** - Packets will be transmitted on the
+   indicated interface. Typically required only when injecting packets
+   into post-IP-rewrite graph nodes.
+
+-  **pcap <filename>** - Replay packets from the indicated pcap capture
+   file. “make test” makes extensive use of this feature: generate
+   packets using scapy, save them in a .pcap file, then inject them into
+   the vpp graph via a vpp pg “pcap <filename>” stream definition.
+
+-  **worker <nn>** - Generate packets for the stream using the indicated
+   vpp worker thread. The vpp pg generates and injects O(10 MPPS /
+   core). Use multiple stream definitions and worker threads to generate
+   and inject enough traffic to easily fill a 40 gbit pipe with small
+   packets.
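+
+This sketch puts “enough traffic to fill a 40 gbit pipe” into numbers.
+A minimum-size 64-byte Ethernet frame occupies 84 bytes on the wire once
+the 8-byte preamble/SFD and the 12-byte inter-frame gap are counted. 40
+Gbit/s is therefore roughly 59.5 million packets per second. The 10 MPPS
+per worker input is the O(10 MPPS / core) estimate from the text, not a
+measured value. Both function names are hypothetical.
+
+.. code:: c
+
+   #include <stdint.h>
+
+   /* Packets per second at line rate for a given frame size, counting
+      8 bytes of preamble/SFD and a 12-byte inter-frame gap per frame. */
+   static double
+   line_rate_pps (double link_bps, uint32_t frame_bytes)
+   {
+     return link_bps / ((frame_bytes + 8 + 12) * 8.0);
+   }
+
+   /* Workers needed if each one generates pps_per_worker packets/s,
+      rounded up to a whole worker. */
+   static uint32_t
+   workers_needed (double target_pps, double pps_per_worker)
+   {
+     uint32_t n = (uint32_t) (target_pps / pps_per_worker);
+     if (n * pps_per_worker < target_pps)
+       n++;
+     return n;
+   }
+
+Under those assumptions, line_rate_pps (40e9, 64) is about 59.52e6. At
+10e6 packets per second per worker, that calls for 6 worker threads.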
+
+Data definition
+~~~~~~~~~~~~~~~
+
+Packet generator data definitions make use of a layered implementation
+strategy. Networking layers are specified in order, and the notation can
+seem a bit counter-intuitive. In the example above, the data definition
+stanza constructs a set of L2-L4 header layers, and uses an
+incrementing fill pattern to round out the requested 300-byte packets.
+
+-  **IP4: 1.2.3 -> 4.5.6** - Construct an L2 (MAC) header with the ip4
+   ethertype (0x800), src MAC address of 00:01:00:02:00:03 and dst MAC
+   address of 00:04:00:05:00:06. MAC addresses may be specified in
+   either *xxxx.xxxx.xxxx* format or *xx:xx:xx:xx:xx:xx* format.
+
+-  **UDP: 192.168.1.10 - 192.168.1.254 -> 192.168.2.10** - Construct an
+   incrementing set of L3 (IPv4) headers for successive packets with
+   source addresses ranging from .10 to .254. All packets in the stream
+   have a constant dest address of 192.168.2.10. Set the protocol field
+   to 17, UDP.
+
+-  **UDP: 1234 -> 2345** - Set the UDP source and destination ports to
+   1234 and 2345, respectively.
+
+-  **incrementing 256** - Insert up to 256 incrementing data bytes.
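+
+The “IP4: 1.2.3 -> 4.5.6” shorthand reads each dot-separated field as
+one hexadecimal 16-bit group, which is how 1.2.3 becomes
+00:01:00:02:00:03. The sketch below performs that conversion. It also
+computes the per-packet source address for a range such as .10 - .254.
+Both helpers are hypothetical, not the vpp pg's own parsing code. The
+wrap-around to the first address after the last one is an assumption.
+
+.. code:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+
+   /* Parse "xxxx.xxxx.xxxx" (hex 16-bit groups, leading zeros optional)
+      into 6 MAC bytes. Returns 0 on success, -1 if malformed. */
+   static int
+   mac_from_dotted (const char *s, uint8_t mac[6])
+   {
+     unsigned int g[3];
+     int n_chars = 0;
+
+     if (sscanf (s, "%x.%x.%x%n", &g[0], &g[1], &g[2], &n_chars) != 3
+         || s[n_chars] != '\0')
+       return -1;
+     for (int i = 0; i < 3; i++)
+       {
+         if (g[i] > 0xffff)
+           return -1;
+         mac[2 * i] = (uint8_t) (g[i] >> 8);
+         mac[2 * i + 1] = (uint8_t) (g[i] & 0xff);
+       }
+     return 0;
+   }
+
+   /* Source address for packet k of a stream sweeping lo..hi (host-order
+      IPv4 addresses), assumed to wrap back to lo after hi. */
+   static uint32_t
+   ip4_src_for_packet (uint32_t lo, uint32_t hi, uint64_t k)
+   {
+     return lo + (uint32_t) (k % ((uint64_t) hi - lo + 1));
+   }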
+
+Obvious variations involve “s/IP4/IP6/” in the above, along with
+changing from IPv4 to IPv6 address notation.
+
+The vpp pg can set any / all IPv4 header fields, including tos, packet
+length, mf / df / fragment id and offset, ttl, protocol, checksum, and
+src/dst addresses. Take a look at ../src/vnet/ip/ip[46]_pg.c for
+details.
+
+If all else fails, specify the entire packet data in hex:
+
+-  **hex 0xabcd…** - copy hex data verbatim into the packet
+
+When replaying pcap files (“**pcap <filename>**”), do not specify a data
+stanza.
+
+Diagnosing “packet-generator new” parse failures
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to inject packets into a brand-new graph node, remember to
+tell the packet generator debug CLI how to parse the packet data stanza.
+
+If the node expects L2 Ethernet MAC headers, specify “.unformat_buffer =
+unformat_ethernet_header”:
+
+.. code:: c
+
+       VLIB_REGISTER_NODE (ethernet_input_node) =
+       {
+         /* ... other registration fields ... */
+         .unformat_buffer = unformat_ethernet_header,
+         /* ... */
+       };
+
+Beyond that, it may be necessary to set breakpoints in
+…/src/vnet/pg/cli.c. A debug image is recommended.
+
+When debugging new nodes, it may be far simpler to directly inject
+ethernet frames - and add a corresponding vlib_buffer_advance in the new
+node - than to modify the packet generator.
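+
+The effect of vlib_buffer_advance on buffer metadata can be shown with a
+simplified stand-in struct. Advancing by 14 moves current_data past an
+Ethernet header and shrinks current_length to match. A negative advance
+rewinds to expose the header again. toy_buffer_t and its functions are
+hypothetical. Unlike a real vlib_buffer_t, the toy has no pre_data
+headroom, so current_data must stay non-negative.
+
+.. code:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+
+   /* Simplified stand-in for the relevant vlib_buffer_t fields. */
+   typedef struct
+   {
+     int16_t current_data;    /* offset of the current position in data[] */
+     uint16_t current_length; /* bytes from the current position onward */
+     uint8_t data[256];
+   } toy_buffer_t;
+
+   static inline void *
+   toy_buffer_get_current (toy_buffer_t *b)
+   {
+     return b->data + b->current_data;
+   }
+
+   /* Move the current position by l bytes; a negative l rewinds. */
+   static inline void
+   toy_buffer_advance (toy_buffer_t *b, int l)
+   {
+     assert ((int) b->current_length >= l);
+     assert (b->current_data + l >= 0); /* no headroom in this toy */
+     b->current_data += l;
+     b->current_length -= l;
+   }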
+
+Debug CLI
+---------
+
+The sections above describe the “packet-generator new” debug CLI in
+detail.
+
+Additional debug CLI commands include:
+
+::
+
+       vpp# packet-generator enable [<stream-name>]
+
+Enables the named stream, or all streams if none is named.
+
+::
+
+       vpp# packet-generator disable [<stream-name>]
+
+Disables the named stream, or all streams if none is named.
+
+::
+
+       vpp# packet-generator delete <stream-name>
+
+Deletes the named stream.
+
+::
+
+       vpp# packet-generator configure <stream-name> [limit <nnn>]
+            [rate <f64-pps>] [size <nn>-<nn>]
+
+Changes stream parameters without having to recreate the entire stream
+definition. Note that re-issuing a “packet-generator new” command will
+correctly recreate the named stream.