1 files changed, 888 insertions, 0 deletions
diff --git a/docs/developer/corearchitecture/vlib.rst b/docs/developer/corearchitecture/vlib.rst
new file mode 100644
index 00000000000..f542d33ebb8
--- /dev/null
+++ b/docs/developer/corearchitecture/vlib.rst
@@ -0,0 +1,888 @@
+VLIB (Vector Processing Library)
+================================
+
+The files associated with vlib are located in the ./src/{vlib, vlibapi,
+vlibmemory} folders. These libraries provide vector processing support
+including graph-node scheduling, reliable multicast support,
+ultra-lightweight cooperative multi-tasking threads, a CLI, plug-in .DLL
+support, physical memory support, and Linux epoll support. Parts of this
+library embody US Patent 7,961,636.
+
+Init function discovery
+-----------------------
+
+vlib applications register for various initialization events by
+placing structures and \__attribute__((constructor)) functions into the
+image. At appropriate times, the vlib framework walks
+constructor-generated singly-linked structure lists, performs a
+topological sort based on specified constraints, and calls the indicated
+functions. vlib applications create graph nodes, add CLI functions,
+start cooperative multi-tasking threads, and so on using this mechanism.
+
+vlib applications invariably include a number of VLIB_INIT_FUNCTION
+(my_init_function) macros.
+
+Each init / configure / etc. function has the return type clib_error_t
+\*. Make sure that the function returns 0 if all is well, otherwise the
+framework will announce an error and exit.
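+
+To make the registration mechanism concrete, here is a minimal,
+self-contained sketch of constructor-based registration. It is not the
+vlib implementation: the demo-prefixed names, the DEMO_INIT_FUNCTION
+macro, and the use of a plain string in place of a clib_error_t pointer
+are assumptions made for illustration, and the sketch performs no
+topological sort.
+
+.. code:: c
+
+      #include <stddef.h>
+      #include <stdio.h>
+
+      /* Hypothetical stand-in for clib_error_t *: NULL means "all is well". */
+      typedef const char *demo_error_t;
+      typedef demo_error_t (*demo_init_fn_t) (void);
+
+      /* One list element per registered init function. */
+      typedef struct demo_init_elt
+      {
+        struct demo_init_elt *next;
+        demo_init_fn_t fn;
+        const char *name;
+      } demo_init_elt_t;
+
+      static demo_init_elt_t *demo_init_list;
+
+      /* Define a list element, plus a constructor which links it into
+         demo_init_list before main() runs (GCC / Clang extension). */
+      #define DEMO_INIT_FUNCTION(x)                                         \
+        static demo_init_elt_t demo_init_elt_##x;                           \
+        static void __attribute__ ((constructor)) demo_register_##x (void)  \
+        {                                                                   \
+          demo_init_elt_##x.next = demo_init_list;                          \
+          demo_init_list = &demo_init_elt_##x;                              \
+        }                                                                   \
+        static demo_init_elt_t demo_init_elt_##x = { NULL, x, #x }
+
+      static demo_error_t first_init (void) { return NULL; }
+      DEMO_INIT_FUNCTION (first_init);
+
+      static demo_error_t second_init (void) { return NULL; }
+      DEMO_INIT_FUNCTION (second_init);
+
+      /* Walk the list and call each function, stopping at the first error.
+         Returns the number of functions which succeeded. */
+      static int
+      demo_call_all_inits (demo_error_t *error)
+      {
+        int n_ok = 0;
+        *error = NULL;
+        for (demo_init_elt_t *e = demo_init_list; e; e = e->next)
+          {
+            *error = e->fn ();
+            if (*error)
+              {
+                fprintf (stderr, "%s: %s\n", e->name, *error);
+                break;
+              }
+            n_ok++;
+          }
+        return n_ok;
+      }
+
+A program would call demo_call_all_inits once from main. The order in
+which the two constructors run is not specified here, which mirrors the
+"no ordering guarantees" caveat for unconstrained init functions.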
+
+vlib applications must link against vppinfra, and often link against
+other libraries such as VNET. In the latter case, it may be necessary to
+explicitly reference symbol(s); otherwise, large portions of the library
+may be missing at runtime.
+
+Init function construction and constraint specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It’s easy to add an init function:
+
+.. code:: c
+
+      static clib_error_t *my_init_function (vlib_main_t *vm)
+      {
+         /* ... initialize things ... */
+
+         return 0; // or return clib_error_return (0, "BROKEN!");
+      }
+      VLIB_INIT_FUNCTION(my_init_function);
+
+As given, my_init_function will be executed “at some point,” but with no
+ordering guarantees.
+
+Specifying ordering constraints is easy:
+
+.. code:: c
+
+      VLIB_INIT_FUNCTION(my_init_function) =
+      {
+         .runs_before = VLIB_INITS("we_run_before_function_1",
+                                   "we_run_before_function_2"),
+         .runs_after = VLIB_INITS("we_run_after_function_1",
+                                  "we_run_after_function_2"),
+      };
+
+It’s also easy to specify bulk ordering constraints of the form “a then
+b then c then d”:
+
+.. code:: c
+
+      VLIB_INIT_FUNCTION(my_init_function) =
+      {
+         .init_order = VLIB_INITS("a", "b", "c", "d"),
+      };
+
+It’s OK to specify all three sorts of ordering constraints for a single
+init function, although it’s hard to imagine why it would be necessary.
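+
+For readers unfamiliar with ordering by constraint, the following
+sketch shows one generic way to turn "runs before" pairs into a call
+order. It uses a simple Kahn-style topological sort, which is not
+necessarily the algorithm vlib uses; all demo-prefixed names and the
+16-function limit are assumptions.
+
+.. code:: c
+
+      #include <string.h>
+
+      #define DEMO_MAX_FUNCS 16
+
+      /* "before" must run ahead of "after". */
+      typedef struct
+      {
+        const char *before;
+        const char *after;
+      } demo_constraint_t;
+
+      static int
+      demo_find (const char **names, int n, const char *s)
+      {
+        for (int i = 0; i < n; i++)
+          if (!strcmp (names[i], s))
+            return i;
+        return -1;
+      }
+
+      /* Kahn-style topological sort. Writes a valid call order into out[]
+         and returns n, or returns -1 on an unknown name, too many
+         functions, or a constraint cycle. */
+      static int
+      demo_sort_inits (const char **names, int n,
+                       const demo_constraint_t *c, int n_c, const char **out)
+      {
+        int n_blockers[DEMO_MAX_FUNCS] = { 0 };
+        int done[DEMO_MAX_FUNCS] = { 0 };
+        int n_out = 0;
+
+        if (n > DEMO_MAX_FUNCS)
+          return -1;
+
+        for (int i = 0; i < n_c; i++)
+          {
+            int after = demo_find (names, n, c[i].after);
+            if (after < 0 || demo_find (names, n, c[i].before) < 0)
+              return -1;
+            n_blockers[after]++;
+          }
+
+        while (n_out < n)
+          {
+            int pick = -1;
+            for (int i = 0; i < n && pick < 0; i++)
+              if (!done[i] && n_blockers[i] == 0)
+                pick = i;
+            if (pick < 0)
+              return -1;              /* cycle: nothing is runnable */
+
+            done[pick] = 1;
+            out[n_out++] = names[pick];
+            /* Everything which waited on this function is one step closer. */
+            for (int i = 0; i < n_c; i++)
+              if (!strcmp (c[i].before, names[pick]))
+                n_blockers[demo_find (names, n, c[i].after)]--;
+          }
+        return n_out;
+      }
+
+Given the names "d", "c", "b", "a" and the chain a-before-b,
+b-before-c, c-before-d, the sketch produces the order a, b, c, d.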
+
+Node Graph Initialization
+-------------------------
+
+vlib packet-processing applications invariably define a set of graph
+nodes to process packets.
+
+One constructs a vlib_node_registration_t, most often via the
+VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of
+such registrations into a directed graph. It is easy enough to add nodes
+to the graph at runtime. The framework does not support removing nodes.
+
+vlib provides several types of vector-processing graph nodes, primarily
+to control framework dispatch behaviors. The type member of the
+vlib_node_registration_t functions as follows:
+
+-  VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
+-  VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
+   nodes
+-  VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
+   adding pending frames for processing
+-  VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
+   “Process” nodes are actually cooperative multi-tasking threads. They
+   **must** explicitly suspend after a reasonably short period of time.
+
+For a precise understanding of the graph node dispatcher, please read
+./src/vlib/main.c:vlib_main_loop.
+
+Graph node dispatcher
+---------------------
+
+vlib_main_loop() dispatches graph nodes. The basic vector processing
+algorithm is diabolically simple, but may not be obvious from even a
+long stare at the code. Here’s how it works: some input node, or set of
+input nodes, produces a vector of work to process. The graph node
+dispatcher pushes the work vector through the directed graph,
+subdividing it as needed, until the original work vector has been
+completely processed. At that point, the process repeats.
+
+This scheme yields a stable equilibrium in frame size, by construction.
+Here’s why: as the frame size increases, the per-frame-element
+processing time decreases. There are several related forces at work; the
+simplest to describe is the effect of vector processing on the CPU L1
+I-cache. The first frame element [packet] processed by a given node
+warms up the node dispatch function in the L1 I-cache. All subsequent
+frame elements profit. As we increase the number of frame elements, the
+cost per element goes down.
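+
+The amortization argument can be written as a toy model. This is only
+an illustration: the function name is hypothetical, and real per-node
+costs depend on many more factors than a single warm-up term.
+
+.. code:: c
+
+      /* Toy amortization model: a fixed warm-up cost is paid once per
+         frame, then each element costs a constant amount.
+         n_elements must be >= 1. */
+      static double
+      demo_clocks_per_element (double warmup_clocks, double element_clocks,
+                               unsigned n_elements)
+      {
+        return warmup_clocks / (double) n_elements + element_clocks;
+      }
+
+With a hypothetical 1000-clock warm-up and 50 clocks per element, a
+1-element frame costs 1050 clocks per element, while a 256-element
+frame costs about 53.9 clocks per element.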
+
+Under light load, it is a crazy waste of CPU cycles to run the graph
+node dispatcher flat-out. So, the graph node dispatcher arranges to wait
+for work by sitting in a timed epoll wait if the prevailing frame size
+is low. The scheme has a certain amount of hysteresis to avoid
+constantly toggling back and forth between interrupt and polling mode.
+Although the graph dispatcher supports interrupt and polling modes, our
+current default device drivers do not.
+
+The graph node scheduler uses a hierarchical timer wheel to reschedule
+process nodes upon timer expiration.
+
+Graph dispatcher internals
+--------------------------
+
+This section may be safely skipped. It’s not necessary to understand
+graph dispatcher internals to create graph nodes.
+
+Vector Data Structure
+---------------------
+
+In vpp / vlib, we represent vectors as instances of the vlib_frame_t
+type:
+
+.. code:: c
+
+       typedef struct vlib_frame_t
+       {
+         /* Frame flags. */
+         u16 flags;
+
+         /* Number of scalar bytes in arguments. */
+         u8 scalar_size;
+
+         /* Number of bytes per vector argument. */
+         u8 vector_size;
+
+         /* Number of vector elements currently in frame. */
+         u16 n_vectors;
+
+         /* Scalar and vector arguments to next node. */
+         u8 arguments[0];
+       } vlib_frame_t;
+
+Note that one *could* construct all kinds of vectors - including vectors
+with some associated scalar data - using this structure. In the vpp
+application, vectors typically use a 4-byte vector element size, and
+zero bytes’ worth of associated per-frame scalar data.
+
+Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries. Frames
+have u32 indices which make use of the alignment property, so the
+maximum feasible main heap offset of a frame is CLIB_CACHE_LINE_BYTES \*
+0xFFFFFFFF. With 64-byte cache lines, that is roughly 64 \* 4G = 256
+Gbytes.
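+
+The index arithmetic can be sketched as follows, assuming a 64-byte
+cache line. The demo-prefixed names are hypothetical and are not vlib
+functions.
+
+.. code:: c
+
+      #include <stdint.h>
+
+      /* Assumed cache line size; the real value is CLIB_CACHE_LINE_BYTES. */
+      #define DEMO_CACHE_LINE_BYTES 64
+
+      /* Convert a cache-line-aligned heap offset to a u32 frame index. */
+      static inline uint32_t
+      demo_offset_to_frame_index (uint64_t heap_offset)
+      {
+        return (uint32_t) (heap_offset / DEMO_CACHE_LINE_BYTES);
+      }
+
+      /* Convert a u32 frame index back to a heap offset in bytes. */
+      static inline uint64_t
+      demo_frame_index_to_offset (uint32_t frame_index)
+      {
+        return (uint64_t) frame_index * DEMO_CACHE_LINE_BYTES;
+      }
+
+The largest index, 0xFFFFFFFF, maps to offset 0x3FFFFFFFC0, which is one
+cache line short of 256 Gbytes.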
+
+Scheduling Vectors
+------------------
+
+As you can see, vectors are not directly associated with graph nodes. We
+represent that association in a couple of ways. The simplest is the
+vlib_pending_frame_t:
+
+.. code:: c
+
+       /* A frame pending dispatch by main loop. */
+       typedef struct
+       {
+         /* Node and runtime for this frame. */
+         u32 node_runtime_index;
+
+         /* Frame index (in the heap). */
+         u32 frame_index;
+
+         /* Start of next frames for this node. */
+         u32 next_frame_index;
+
+         /* Special value for next_frame_index when there is no next frame. */
+       #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
+       } vlib_pending_frame_t;
+
+Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
+processes frames:
+
+.. code:: c
+
+         /*
+          * Input nodes may have added work to the pending vector.
+          * Process pending vector until there is nothing left.
+          * All pending vectors will be processed from input -> output.
+          */
+         for (i = 0; i < _vec_len (nm->pending_frames); i++)
+           cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
+         /* Reset pending vector for next iteration. */
+
+The pending frame node_runtime_index associates the frame with the node
+which will process it.
+
+Complications
+-------------
+
+Fasten your seatbelt. Here’s where the story - and the data structures -
+become quite complicated…
+
+At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
+graph. It’s really quite normal for a packet to visit ip[46]-lookup
+multiple times. The worst-case: a graph node which enqueues packets to
+itself.
+
+To deal with this issue, the graph dispatcher must force allocation of a
+new frame if the current graph node’s dispatch function happens to
+enqueue a packet back to itself.
+
+There are no guarantees that a pending frame will be processed
+immediately, which means that more packets may be added to the
+underlying vlib_frame_t after it has been attached to a
+vlib_pending_frame_t. Care must be taken to allocate new frames and
+pending frames if a (pending_frame, frame) pair fills.
+
+Next frames, next frame ownership
+---------------------------------
+
+The vlib_next_frame_t is the last key graph dispatcher data structure:
+
+.. code:: c
+
+       typedef struct
+       {
+         /* Frame index. */
+         u32 frame_index;
+
+         /* Node runtime for this next. */
+         u32 node_runtime_index;
+
+         /* Next frame flags. */
+         u32 flags;
+
+         /* Reflects node frame-used flag for this next. */
+       #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
+         VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH
+
+         /* This next frame owns enqueue to node
+            corresponding to node_runtime_index. */
+       #define VLIB_FRAME_OWNER (1 << 15)
+
+         /* Set when frame has been allocated for this next. */
+       #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT
+
+         /* Set when frame has been added to pending vector. */
+       #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP
+
+         /* Set when frame is to be freed after dispatch. */
+       #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT
+
+         /* Set when frame has traced packets. */
+       #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE
+
+         /* Number of vectors enqueue to this next since last overflow. */
+         u32 vectors_since_last_overflow;
+       } vlib_next_frame_t;
+
+Graph node dispatch functions call vlib_get_next_frame (…) to set “(u32
+\*)to_next” to the right place in the vlib_frame_t corresponding to the
+ith arc (aka next0) from the current node to the indicated next node.
+
+After some scuffling around - two levels of macros - processing reaches
+vlib_get_next_frame_internal (…). Get-next-frame-internal digs up the
+vlib_next_frame_t corresponding to the desired graph arc.
+
+The next frame data structure amounts to a graph-arc-centric frame
+cache. Once a node finishes adding elements to a frame, the frame
+acquires a vlib_pending_frame_t and ends up on the graph dispatcher’s
+run-queue. But there’s no guarantee that more vector elements won’t be
+added to the underlying frame from the same (source_node, next_index)
+arc or from a different (source_node, next_index) arc.
+
+Maintaining consistency of the arc-to-frame cache is necessary. The
+first step in maintaining consistency is to make sure that only one
+graph node at a time thinks it “owns” the target vlib_frame_t.
+
+Back to the graph node dispatch function. In the usual case, a certain
+number of packets will be added to the vlib_frame_t acquired by calling
+vlib_get_next_frame (…).
+
+Before a dispatch function returns, it’s required to call
+vlib_put_next_frame (…) for all of the graph arcs it actually used. This
+action adds a vlib_pending_frame_t to the graph dispatcher’s pending
+frame vector.
+
+vlib_put_next_frame makes a note in the pending frame of the frame
+index, and also of the vlib_next_frame_t index.
+
+dispatch_pending_node actions
+-----------------------------
+
+The main graph dispatch loop calls dispatch_pending_node as shown above.
+
+dispatch_pending_node recovers the pending frame, and the graph node
+runtime / dispatch function. Further, it recovers the next_frame
+currently associated with the vlib_frame_t, and detaches the
+vlib_frame_t from the next_frame.
+
+In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:
+
+.. code:: c
+
+     /* Force allocation of new frame while current frame is being
+        dispatched. */
+     restore_frame_index = ~0;
+     if (nf->frame_index == p->frame_index)
+       {
+         nf->frame_index = ~0;
+         nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
+         if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
+           restore_frame_index = p->frame_index;
+       }
+
+dispatch_pending_node is worth a hard stare due to the several
+second-order optimizations it implements. Almost as an afterthought, it
+calls dispatch_node which actually calls the graph node dispatch
+function.
+
+Process / thread model
+----------------------
+
+vlib provides an ultra-lightweight cooperative multi-tasking thread
+model. The graph node scheduler invokes these processes in much the same
+way as traditional vector-processing run-to-completion graph nodes;
+plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
+set the vlib_node_registration_t type field to VLIB_NODE_TYPE_PROCESS.
+Yes, process is a misnomer. These are cooperative multi-tasking threads.
+
+As of this writing, the default stack size is 1<<15 = 32kb. Initialize
+the node registration’s process_log2_n_stack_bytes member as needed. The
+graph node dispatcher makes some effort to detect stack overrun, e.g. by
+mapping a no-access page below each thread stack.
+
+Process node dispatch functions are expected to be “while(1) { }” loops
+which suspend when not otherwise occupied, and which must not run for
+unreasonably long periods of time.
+
+“Unreasonably long” is an application-dependent concept. Over the years,
+we have constructed frame-size sensitive control-plane nodes which will
+use a much higher fraction of the available CPU bandwidth when the frame
+size is low. The classic example: modifying forwarding tables. So long
+as the table-builder leaves the forwarding tables in a valid state, one
+can suspend the table builder to avoid dropping packets as a result of
+control-plane activity.
+
+Process nodes can suspend for fixed amounts of time, or until another
+entity signals an event, or both. See the next section for a description
+of the vlib process event mechanism.
+
+When running in vlib process context, one must pay strict attention to
+loop invariant issues. If one walks a data structure and calls a
+function which may suspend, one had best know by construction that it
+cannot change. Often, it’s best to simply make a snapshot copy of a data
+structure, walk the copy at leisure, then free the copy.
+
+Process events
+--------------
+
+The vlib process event mechanism API is extremely lightweight and easy
+to use. Here is a typical example:
+
+.. code:: c
+
+       vlib_main_t *vm = &vlib_global_main;
+       uword event_type, * event_data = 0;
+
+       while (1)
+       {
+          vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);
+
+          event_type = vlib_process_get_events (vm, &event_data);
+
+          switch (event_type) {
+          case EVENT1:
+              handle_event1s (event_data);
+              break;
+
+          case EVENT2:
+              handle_event2s (event_data);
+              break;
+
+          case ~0: /* 5-second idle/periodic */
+              handle_idle ();
+              break;
+
+          default: /* bug! */
+              ASSERT (0);
+          }
+
+          vec_reset_length(event_data);
+       }
+
+In this example, the VLIB process node waits for an event to occur, or
+for 5 seconds to elapse. The code demuxes on the event type, calling the
+appropriate handler function. Each call to vlib_process_get_events
+returns a vector of per-event-type data passed to successive
+vlib_process_signal_event calls; it is a serious error to process only
+event_data[0].
+
+Resetting the event_data vector-length to 0 [instead of calling
+vec_free] means that the event scheme doesn’t burn cycles continuously
+allocating and freeing the event data vector. This is a common vppinfra
+/ vlib coding pattern, well worth using when appropriate.
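+
+The reset-instead-of-free pattern is easy to see with a tiny
+stand-alone vector. This sketch does not use vppinfra; the demo_vec_t
+type and its helpers are hypothetical and exist only to show that
+resetting the length keeps the allocation for reuse.
+
+.. code:: c
+
+      #include <stdlib.h>
+
+      typedef struct
+      {
+        unsigned long *data;
+        size_t len;                 /* elements in use */
+        size_t cap;                 /* elements allocated */
+      } demo_vec_t;
+
+      /* Append one element, growing the allocation only when it is full.
+         Returns 0 on success, -1 if memory is exhausted. */
+      static int
+      demo_vec_add (demo_vec_t *v, unsigned long x)
+      {
+        if (v->len == v->cap)
+          {
+            size_t new_cap = v->cap ? 2 * v->cap : 4;
+            unsigned long *p = realloc (v->data, new_cap * sizeof (*p));
+            if (!p)
+              return -1;
+            v->data = p;
+            v->cap = new_cap;
+          }
+        v->data[v->len++] = x;
+        return 0;
+      }
+
+      /* Forget the contents but keep the storage for the next round. */
+      static void
+      demo_vec_reset_length (demo_vec_t *v)
+      {
+        v->len = 0;
+      }
+
+After a reset, subsequent adds reuse the existing storage until the
+previous high-water mark is exceeded, so a steady-state event loop
+stops allocating.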
+
+Signaling an event is easy, for example:
+
+.. code:: c
+
+       vlib_process_signal_event (vm, process_node_index, EVENT1,
+           (uword)arbitrary_event1_data); /* and so forth */
+
+One can either know the process node index by construction - dig it out
+of the appropriate vlib_node_registration_t - or find the vlib_node_t
+with vlib_get_node_by_name(…).
+
+Buffers
+-------
+
+vlib buffering solves the usual set of packet-processing problems,
+albeit at high performance. Key in terms of performance: one ordinarily
+allocates / frees N buffers at a time rather than one at a time. Except
+when operating directly on a specific buffer, one deals with buffers by
+index, not by pointer.
+
+Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.
+
+Packets comprise one or more vlib buffers, chained together as required.
+Multiple particle sizes are supported; hardware input nodes simply ask
+for the required size(s). Coalescing support is available. For obvious
+reasons one is discouraged from writing one’s own wild and wacky buffer
+chain traversal code.
+
+vlib buffer headers are allocated immediately prior to the buffer data
+area. In typical packet processing this saves a dependent read wait:
+given a buffer’s address, one can prefetch the buffer header [metadata]
+at the same time as the first cache line of buffer data.
+
+Buffer header metadata (vlib_buffer_t) includes the usual rewrite
+expansion space, a current_data offset, RX and TX interface indices,
+packet trace information, and opaque areas.
+
+The opaque data is intended to control packet processing in arbitrary
+subgraph-dependent ways. The programmer shoulders responsibility for
+data lifetime analysis, type-checking, etc.
+
+Buffers have reference-counts in support of e.g. multicast replication.
+
+Shared-memory message API
+-------------------------
+
+Local control-plane and application processes interact with the vpp
+dataplane via asynchronous message-passing in shared memory over
+unidirectional queues. The same application APIs are available via
+sockets.
+
+Capturing API traces and replaying them in a simulation environment
+requires a disciplined approach to the problem. This seems like a
+make-work task, but it is not. When something goes wrong in the
+control-plane after 300,000 or 3,000,000 operations, high-speed replay
+of the events leading up to the accident is a huge win.
+
+The shared-memory message API message allocator vl_msg_api_alloc uses a
+particularly cute trick. Since messages are processed in order, we try
+to allocate message buffering from a set of fixed-size, preallocated
+rings. Each ring item has a “busy” bit. Freeing one of the preallocated
+message buffers merely requires the message consumer to clear the busy
+bit. No locking required.
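+
+A minimal sketch of the busy-bit idea follows. It is not the vlibmemory
+code: the ring and slot sizes, the demo-prefixed names, and the
+assumption of a single allocating thread are all made up for
+illustration.
+
+.. code:: c
+
+      #include <stdatomic.h>
+      #include <stddef.h>
+
+      #define DEMO_RING_SLOTS 4
+      #define DEMO_MSG_BYTES 64
+
+      typedef struct
+      {
+        atomic_uchar busy;          /* set on alloc, cleared by the consumer */
+        unsigned char data[DEMO_MSG_BYTES];
+      } demo_ring_elt_t;
+
+      typedef struct
+      {
+        demo_ring_elt_t elts[DEMO_RING_SLOTS];
+        unsigned head;              /* next slot the allocator will try */
+      } demo_ring_t;
+
+      /* Allocate the next slot in ring order. Returns NULL if that slot is
+         still busy; a caller would then fall back to a slower allocator.
+         Assumes one allocating thread per ring. */
+      static void *
+      demo_ring_alloc (demo_ring_t *r)
+      {
+        demo_ring_elt_t *e = &r->elts[r->head];
+        if (atomic_load_explicit (&e->busy, memory_order_acquire))
+          return NULL;
+        atomic_store_explicit (&e->busy, 1, memory_order_relaxed);
+        r->head = (r->head + 1) % DEMO_RING_SLOTS;
+        return e->data;
+      }
+
+      /* Consumer side: freeing is nothing more than clearing the busy bit. */
+      static void
+      demo_ring_free (void *msg)
+      {
+        demo_ring_elt_t *e = (demo_ring_elt_t *)
+          ((unsigned char *) msg - offsetof (demo_ring_elt_t, data));
+        atomic_store_explicit (&e->busy, 0, memory_order_release);
+      }
+
+Because messages are consumed in order, the slot at the head of the
+ring is normally the one freed longest ago, so the allocator rarely
+finds it busy.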
+
+Debug CLI
+---------
+
+Adding debug CLI commands to VLIB applications is very simple.
+
+Here is a complete example:
+
+.. code:: c
+
+       static clib_error_t *
+       show_ip_tuple_match (vlib_main_t * vm,
+                            unformat_input_t * input,
+                            vlib_cli_command_t * cmd)
+       {
+           vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
+           return 0;
+       }
+
+       static VLIB_CLI_COMMAND (show_ip_tuple_command) =
+       {
+           .path = "show ip tuple match",
+           .short_help = "Show ip 5-tuple match-and-broadcast tables",
+           .function = show_ip_tuple_match,
+       };
+
+This example implements the “show ip tuple match” debug cli command. In
+ordinary usage, the vlib cli is available via the “vppctl” application,
+which sends traffic to a named pipe. One can configure debug CLI telnet
+access on a configurable port.
+
+The cli implementation has an output redirection facility which makes it
+simple to deliver cli output via shared-memory API messaging.
+
+Particularly for debug or “show tech support” type commands, it would be
+wasteful to write vlib application code to pack binary data, write more
+code elsewhere to unpack the data and finally print the answer. If a
+certain cli command has the potential to hurt packet processing
+performance by running for too long, do the work incrementally in a
+process node. The client can wait.
+
+Macro expansion
+~~~~~~~~~~~~~~~
+
+The vpp debug CLI engine includes a recursive macro expander. This is
+quite useful for factoring out address and/or interface name specifics:
+
+::
+
+      define ip1 192.168.1.1/24
+      define ip2 192.168.2.1/24
+      define iface1 GigabitEthernet3/0/0
+      define iface2 loop1
+
+      set int ip address $iface1 $ip1
+      set int ip address $iface2 $(ip2)
+
+      undefine ip1
+      undefine ip2
+      undefine iface1
+      undefine iface2
+
+Each socket (or telnet) debug CLI session has its own macro tables. All
+debug CLI sessions which use CLI_INBAND binary API messages share a
+single table.
+
+The macro expander recognizes circular definitions:
+
+::
+
+       define foo \$(bar)
+       define bar \$(mumble)
+       define mumble \$(foo)
+
+At 8 levels of recursion, the macro expander throws up its hands and
+replies “CIRCULAR.”
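+
+The depth limit can be illustrated with a small stand-alone expander.
+This is a sketch under stated assumptions, not the vpp macro code: it
+handles only the $(name) form, uses a flat hypothetical table type, and
+reports a circular definition through an error code rather than a
+string.
+
+.. code:: c
+
+      #include <stddef.h>
+      #include <string.h>
+
+      #define DEMO_MAX_DEPTH 8
+      #define DEMO_ERR_CIRCULAR (-1)
+      #define DEMO_ERR_NO_SPACE (-2)
+
+      typedef struct
+      {
+        const char *name;
+        const char *value;
+      } demo_macro_t;
+
+      static const char *
+      demo_lookup (const demo_macro_t *t, size_t n, const char *name,
+                   size_t len)
+      {
+        for (size_t i = 0; i < n; i++)
+          if (strlen (t[i].name) == len && !strncmp (t[i].name, name, len))
+            return t[i].value;
+        return NULL;
+      }
+
+      /* Expand every $(name) in "in" into "out", recursively. Returns 0
+         on success. Unknown names are copied through unchanged. */
+      static int
+      demo_expand (const demo_macro_t *t, size_t n, const char *in,
+                   char *out, size_t out_size, int depth)
+      {
+        size_t used = 0;
+
+        if (depth >= DEMO_MAX_DEPTH)
+          return DEMO_ERR_CIRCULAR;
+        if (out_size == 0)
+          return DEMO_ERR_NO_SPACE;
+
+        while (*in)
+          {
+            const char *close = NULL;
+            const char *value = NULL;
+
+            if (in[0] == '$' && in[1] == '(')
+              close = strchr (in + 2, ')');
+            if (close)
+              value = demo_lookup (t, n, in + 2, (size_t) (close - (in + 2)));
+            if (value)
+              {
+                int rv = demo_expand (t, n, value, out + used,
+                                      out_size - used, depth + 1);
+                if (rv)
+                  return rv;
+                used += strlen (out + used);
+                in = close + 1;
+                continue;
+              }
+            if (used + 1 >= out_size)
+              return DEMO_ERR_NO_SPACE;
+            out[used++] = *in++;
+          }
+        out[used] = 0;
+        return 0;
+      }
+
+With foo, bar and mumble defined in a cycle, expanding "$(foo)" from
+depth 0 returns DEMO_ERR_CIRCULAR once the nesting reaches 8 levels.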
+
+Macro-related debug CLI commands
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to the “define” and “undefine” debug CLI commands, use “show
+macro [noevaluate]” to dump the macro table. The “echo” debug CLI
+command will evaluate and print its argument:
+
+::
+
+       vpp# define foo This\ Is\ Foo
+       vpp# echo $foo
+       This Is Foo
+
+Handing off buffers between threads
+-----------------------------------
+
+Vlib includes an easy-to-use mechanism for handing off buffers between
+worker threads. A typical use-case: software ingress flow hashing. At a
+high level, one creates a per-worker-thread queue which sends packets to
+a specific graph node in the indicated worker thread. With the queue in
+hand, enqueue packets to the worker thread of your choice.
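+
+As an illustration of software flow hashing, the sketch below maps a
+flow key to a worker thread index. Everything in it is an assumption:
+the demo_flow_t layout, the choice of FNV-1a as the hash, and the
+convention that workers are numbered from first_worker upward. A real
+node would use whatever hash and thread numbering suit its use case.
+
+.. code:: c
+
+      #include <stddef.h>
+      #include <stdint.h>
+
+      /* Hypothetical flow key; a real node would extract it from headers. */
+      typedef struct
+      {
+        uint32_t src_ip, dst_ip;
+        uint16_t src_port, dst_port;
+        uint8_t protocol;
+      } demo_flow_t;
+
+      /* FNV-1a over a byte range, chosen here only because it is short. */
+      static uint32_t
+      demo_fnv1a (uint32_t hash, const void *data, size_t n_bytes)
+      {
+        const uint8_t *p = data;
+        for (size_t i = 0; i < n_bytes; i++)
+          {
+            hash ^= p[i];
+            hash *= 16777619u;
+          }
+        return hash;
+      }
+
+      /* Map a flow to one of n_workers thread indices, starting at
+         first_worker. n_workers must be non-zero. */
+      static uint16_t
+      demo_pick_thread (const demo_flow_t *f, uint16_t first_worker,
+                        uint16_t n_workers)
+      {
+        uint32_t hash = 2166136261u;        /* FNV offset basis */
+
+        /* Hash field by field so that struct padding never contributes. */
+        hash = demo_fnv1a (hash, &f->src_ip, sizeof (f->src_ip));
+        hash = demo_fnv1a (hash, &f->dst_ip, sizeof (f->dst_ip));
+        hash = demo_fnv1a (hash, &f->src_port, sizeof (f->src_port));
+        hash = demo_fnv1a (hash, &f->dst_port, sizeof (f->dst_port));
+        hash = demo_fnv1a (hash, &f->protocol, sizeof (f->protocol));
+        return (uint16_t) (first_worker + hash % n_workers);
+      }
+
+The important property is that every packet of a given flow lands on
+the same thread, which keeps per-flow state local to one worker.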
+
+Initialize a handoff queue
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Simply call vlib_frame_queue_main_init:
+
+.. code:: c
+
+      main_ptr->frame_queue_index
+          = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
+
+frame_queue_size means what it says: the number of frames which may be
+queued. Since frames contain 1…256 packets, frame_queue_size should be a
+reasonably small number (32…64). If the frame queue producer(s) are
+faster than the frame queue consumer(s), congestion will occur. We
+suggest letting the enqueue operation deal with queue congestion, as
+shown in the enqueue example below.
+
+Under the floorboards, vlib_frame_queue_main_init creates an input queue
+for each worker thread.
+
+Please do NOT create frame queues until it’s clear that they will be
+used. Although the main dispatch loop is reasonably smart about how
+often it polls the (entire set of) frame queues, polling unused frame
+queues is a waste of clock cycles.
+
+Hand off packets
+~~~~~~~~~~~~~~~~
+
+The actual handoff mechanics are simple, and integrate nicely with a
+typical graph-node dispatch function:
+
+.. code:: c
+
+       always_inline uword
+       do_handoff_inline (vlib_main_t * vm,
+                      vlib_node_runtime_t * node, vlib_frame_t * frame,
+                      int is_ip4, int is_trace)
+       {
+         u32 n_left_from, *from;
+         vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
+         u16 thread_indices [VLIB_FRAME_SIZE];
+         u16 nexts[VLIB_FRAME_SIZE], *next;
+         u32 n_enq;
+         htest_main_t *hmp = &htest_main;
+         int i;
+
+         from = vlib_frame_vector_args (frame);
+         n_left_from = frame->n_vectors;
+
+         vlib_get_buffers (vm, from, bufs, n_left_from);
+         next = nexts;
+         b = bufs;
+
+         /*
+          * Typical frame traversal loop, details vary with
+          * use case. Make sure to set thread_indices[i] with
+          * the desired destination thread index. You may
+          * or may not bother to set next[i].
+          */
+
+         for (i = 0; i < frame->n_vectors; i++)
+           {
+             <snip>
+             /* Pick a thread to handle this packet */
+             thread_indices[i] = f (packet_data_or_whatever);
+             <snip>
+
+             b += 1;
+             next += 1;
+             n_left_from -= 1;
+           }
+
+         /* Enqueue buffers to threads */
+         n_enq =
+           vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
+                                          from, thread_indices, frame->n_vectors,
+                                          1 /* drop on congestion */);
+         /* Typical counters */
+         if (n_enq < frame->n_vectors)
+           vlib_node_increment_counter (vm, node->node_index,
+                                        XXX_ERROR_CONGESTION_DROP,
+                                        frame->n_vectors - n_enq);
+         vlib_node_increment_counter (vm, node->node_index,
+                                      XXX_ERROR_HANDED_OFF, n_enq);
+         return frame->n_vectors;
+       }
+
+Notes about calling vlib_buffer_enqueue_to_thread(…):
+
+-  If you pass “drop on congestion” non-zero, all packets in the inbound
+   frame will be consumed one way or the other. This is the recommended
+   setting.
+
+-  In the drop-on-congestion case, please don’t try to “help” in the
+   enqueue node by freeing dropped packets, or by pushing them to
+   “error-drop.” Either of those actions would be a severe error.
+
+-  It’s perfectly OK to enqueue packets to the current thread.
+
+Handoff Demo Plugin
+-------------------
+
+Check out the sample (plugin) example in …/src/examples/handoffdemo. If
+you want to build the handoff demo plugin:
+
+::
+
+   $ cd .../src/plugins
+   $ ln -s ../examples/handoffdemo
+
+This plugin provides a simple example of how to hand off packets between
+threads. We used it to debug packet-tracer handoff tracing support.
+
+Packet generator input script
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    packet-generator new {
+       name x
+       limit 5
+       size 128-128
+       interface local0
+       node handoffdemo-1
+       data {
+           incrementing 30
+       }
+    }
+
+Start vpp with 2 worker threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The demo plugin hands packets from worker 1 to worker 2.
+
+Enable tracing, and start the packet generator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+     trace add pg-input 100
+     packet-generator enable
+
+Sample Run
+~~~~~~~~~~
+
+::
+
+     DBGvpp# ex /tmp/pg_input_script
+     DBGvpp# pa en
+     DBGvpp# sh err
+      Count                    Node                  Reason
+            5              handoffdemo-1             packets handed off processed
+            5              handoffdemo-2             completed packets
+     DBGvpp# show run
+     Thread 1 vpp_wk_0 (lcore 0)
+     Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+       vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
+                  Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
+     handoffdemo-1                    active                  1               5               0          4.76e3            5.00
+     pg-input                        disabled                 2               5               0          5.58e4            2.50
+     unix-epoll-input                 polling             22760               0               0          2.14e7            0.00
+     ---------------
+     Thread 2 vpp_wk_1 (lcore 2)
+     Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+       vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
+                  Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
+     drop                             active                  1               5               0          1.35e4            5.00
+     error-drop                       active                  1               5               0          2.52e4            5.00
+     handoffdemo-2                    active                  1               5               0          2.56e4            5.00
+     unix-epoll-input                 polling             22406               0               0          2.18e7            0.00
+
+Enable the packet tracer and run it again…
+
+::
+
+     DBGvpp# trace add pg-input 100
+     DBGvpp# pa en
+     DBGvpp# sh trace
+     sh trace
+     ------------------- Start of thread 0 vpp_main -------------------
+     No packets in trace buffer
+     ------------------- Start of thread 1 vpp_wk_0 -------------------
+     Packet 1
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 2
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 3
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 4
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     Packet 5
+
+     00:06:50:520688: pg-input
+       stream x, 128 bytes, 0 sw_if_index
+       current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
+       00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+       00000020: 0000000000000000000000000000000000000000000000000000000000000000
+       00000040: 0000000000000000000000000000000000000000000000000000000000000000
+       00000060: 0000000000000000000000000000000000000000000000000000000000000000
+     00:06:50:520762: handoffdemo-1
+       HANDOFFDEMO: current thread 1
+
+     ------------------- Start of thread 2 vpp_wk_1 -------------------
+     Packet 1
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 0
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 2
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 1
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 3
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 2
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 4
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 3
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+
+     Packet 5
+
+     00:06:50:520796: handoff_trace
+       HANDED-OFF: from thread 1 trace index 4
+     00:06:50:520796: handoffdemo-2
+       HANDOFFDEMO: current thread 2
+     00:06:50:520867: error-drop
+       rx:local0
+     00:06:50:520914: drop
+       handoffdemo-2: completed packets
+    DBGvpp#