summaryrefslogtreecommitdiffstats
path: root/docs/developer/corearchitecture/vlib.rst
diff options
context:
space:
mode:
authorNathan Skrzypczak <nathan.skrzypczak@gmail.com>2021-08-19 11:38:06 +0200
committerDave Wallace <dwallacelf@gmail.com>2021-10-13 23:22:32 +0000
commit9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160 (patch)
tree3cca19635417e28ae381d67ae31c75df2925032d /docs/developer/corearchitecture/vlib.rst
parentf47122e07e1ecd0151902a3cabe46c60a99bee8e (diff)
docs: better docs, mv doxygen to sphinx
This patch refactors the VPP sphinx docs in order to make it easier to consume for external readers as well as VPP developers. It also makes sphinx the single source of documentation, which simplifies maintenance and operation. Most important updates are: - reformat the existing documentation as rst - split RELEASE.md and move it into separate rst files - remove section 'events' - remove section 'archive' - remove section 'related projects' - remove section 'feature by release' - remove section 'Various links' - make (Configuration reference, CLI docs, developer docs) top level items in the list - move 'Use Cases' as part of 'About VPP' - move 'Troubleshooting' as part of 'Getting Started' - move test framework docs into 'Developer Documentation' - add a 'Contributing' section for gerrit, docs and other contributer related infos - deprecate doxygen and test-docs targets - redirect the "make doxygen" target to "make docs" Type: refactor Change-Id: I552a5645d5b7964d547f99b1336e2ac24e7c209f Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com> Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
Diffstat (limited to 'docs/developer/corearchitecture/vlib.rst')
-rw-r--r--docs/developer/corearchitecture/vlib.rst888
1 files changed, 888 insertions, 0 deletions
diff --git a/docs/developer/corearchitecture/vlib.rst b/docs/developer/corearchitecture/vlib.rst
new file mode 100644
index 00000000000..f542d33ebb8
--- /dev/null
+++ b/docs/developer/corearchitecture/vlib.rst
@@ -0,0 +1,888 @@
+VLIB (Vector Processing Library)
+================================
+
+The files associated with vlib are located in the ./src/{vlib, vlibapi,
+vlibmemory} folders. These libraries provide vector processing support
+including graph-node scheduling, reliable multicast support,
+ultra-lightweight cooperative multi-tasking threads, a CLI, plug in .DLL
+support, physical memory and Linux epoll support. Parts of this library
+embody US Patent 7,961,636.
+
+Init function discovery
+-----------------------
+
+vlib applications register for various [initialization] events by
+placing structures and \__attribute__((constructor)) functions into the
+image. At appropriate times, the vlib framework walks
+constructor-generated singly-linked structure lists, performs a
+topological sort based on specified constraints, and calls the indicated
+functions. Vlib applications create graph nodes, add CLI functions,
+start cooperative multi-tasking threads, etc. etc. using this mechanism.
+
+vlib applications invariably include a number of VLIB_INIT_FUNCTION
+(my_init_function) macros.
+
+Each init / configure / etc. function has the return type clib_error_t
+\*. Make sure that the function returns 0 if all is well, otherwise the
+framework will announce an error and exit.
+
+vlib applications must link against vppinfra, and often link against
+other libraries such as VNET. In the latter case, it may be necessary to
+explicitly reference symbol(s) otherwise large portions of the library
+may be AWOL at runtime.
+
+Init function construction and constraint specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It’s easy to add an init function:
+
+.. code:: c
+
+ static clib_error_t *my_init_function (vlib_main_t *vm)
+ {
+ /* ... initialize things ... */
+
+ return 0; // or return clib_error_return (0, "BROKEN!");
+ }
+ VLIB_INIT_FUNCTION(my_init_function);
+
+As given, my_init_function will be executed “at some point,” but with no
+ordering guarantees.
+
+Specifying ordering constraints is easy:
+
+.. code:: c
+
+ VLIB_INIT_FUNCTION(my_init_function) =
+ {
+ .runs_before = VLIB_INITS("we_run_before_function_1",
+ "we_run_before_function_2"),
+ .runs_after = VLIB_INITS("we_run_after_function_1",
+ "we_run_after_function_2),
+ };
+
+It’s also easy to specify bulk ordering constraints of the form “a then
+b then c then d”:
+
+.. code:: c
+
+ VLIB_INIT_FUNCTION(my_init_function) =
+ {
+ .init_order = VLIB_INITS("a", "b", "c", "d"),
+ };
+
+It’s OK to specify all three sorts of ordering constraints for a single
+init function, although it’s hard to imagine why it would be necessary.
+
+Node Graph Initialization
+-------------------------
+
+vlib packet-processing applications invariably define a set of graph
+nodes to process packets.
+
+One constructs a vlib_node_registration_t, most often via the
+VLIB_REGISTER_NODE macro. At runtime, the framework processes the set of
+such registrations into a directed graph. It is easy enough to add nodes
+to the graph at runtime. The framework does not support removing nodes.
+
+vlib provides several types of vector-processing graph nodes, primarily
+to control framework dispatch behaviors. The type member of the
+vlib_node_registration_t functions as follows:
+
+- VLIB_NODE_TYPE_PRE_INPUT - run before all other node types
+- VLIB_NODE_TYPE_INPUT - run as often as possible, after pre_input
+ nodes
+- VLIB_NODE_TYPE_INTERNAL - only when explicitly made runnable by
+ adding pending frames for processing
+- VLIB_NODE_TYPE_PROCESS - only when explicitly made runnable.
+ “Process” nodes are actually cooperative multi-tasking threads. They
+ **must** explicitly suspend after a reasonably short period of time.
+
+For a precise understanding of the graph node dispatcher, please read
+./src/vlib/main.c:vlib_main_loop.
+
+Graph node dispatcher
+---------------------
+
+Vlib_main_loop() dispatches graph nodes. The basic vector processing
+algorithm is diabolically simple, but may not be obvious from even a
+long stare at the code. Here’s how it works: some input node, or set of
+input nodes, produce a vector of work to process. The graph node
+dispatcher pushes the work vector through the directed graph,
+subdividing it as needed, until the original work vector has been
+completely processed. At that point, the process recurs.
+
+This scheme yields a stable equilibrium in frame size, by construction.
+Here’s why: as the frame size increases, the per-frame-element
+processing time decreases. There are several related forces at work; the
+simplest to describe is the effect of vector processing on the CPU L1
+I-cache. The first frame element [packet] processed by a given node
+warms up the node dispatch function in the L1 I-cache. All subsequent
+frame elements profit. As we increase the number of frame elements, the
+cost per element goes down.
+
+Under light load, it is a crazy waste of CPU cycles to run the graph
+node dispatcher flat-out. So, the graph node dispatcher arranges to wait
+for work by sitting in a timed epoll wait if the prevailing frame size
+is low. The scheme has a certain amount of hysteresis to avoid
+constantly toggling back and forth between interrupt and polling mode.
+Although the graph dispatcher supports interrupt and polling modes, our
+current default device drivers do not.
+
+The graph node scheduler uses a hierarchical timer wheel to reschedule
+process nodes upon timer expiration.
+
+Graph dispatcher internals
+--------------------------
+
+This section may be safely skipped. It’s not necessary to understand
+graph dispatcher internals to create graph nodes.
+
+Vector Data Structure
+---------------------
+
+In vpp / vlib, we represent vectors as instances of the vlib_frame_t
+type:
+
+.. code:: c
+
+ typedef struct vlib_frame_t
+ {
+ /* Frame flags. */
+ u16 flags;
+
+ /* Number of scalar bytes in arguments. */
+ u8 scalar_size;
+
+ /* Number of bytes per vector argument. */
+ u8 vector_size;
+
+ /* Number of vector elements currently in frame. */
+ u16 n_vectors;
+
+ /* Scalar and vector arguments to next node. */
+ u8 arguments[0];
+ } vlib_frame_t;
+
+Note that one *could* construct all kinds of vectors - including vectors
+with some associated scalar data - using this structure. In the vpp
+application, vectors typically use a 4-byte vector element size, and
+zero bytes’ worth of associated per-frame scalar data.
+
+Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries. Frames
+have u32 indices which make use of the alignment property, so the
+maximum feasible main heap offset of a frame is CLIB_CACHE_LINE_BYTES \*
+0xFFFFFFFF: 64*4 = 256 Gbytes.
+
+Scheduling Vectors
+------------------
+
+As you can see, vectors are not directly associated with graph nodes. We
+represent that association in a couple of ways. The simplest is the
+vlib_pending_frame_t:
+
+.. code:: c
+
+ /* A frame pending dispatch by main loop. */
+ typedef struct
+ {
+ /* Node and runtime for this frame. */
+ u32 node_runtime_index;
+
+ /* Frame index (in the heap). */
+ u32 frame_index;
+
+ /* Start of next frames for this node. */
+ u32 next_frame_index;
+
+ /* Special value for next_frame_index when there is no next frame. */
+ #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
+ } vlib_pending_frame_t;
+
+Here is the code in …/src/vlib/main.c:vlib_main_or_worker_loop() which
+processes frames:
+
+.. code:: c
+
+ /*
+ * Input nodes may have added work to the pending vector.
+ * Process pending vector until there is nothing left.
+ * All pending vectors will be processed from input -> output.
+ */
+ for (i = 0; i < _vec_len (nm->pending_frames); i++)
+ cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
+ /* Reset pending vector for next iteration. */
+
+The pending frame node_runtime_index associates the frame with the node
+which will process it.
+
+Complications
+-------------
+
+Fasten your seatbelt. Here’s where the story - and the data structures -
+become quite complicated…
+
+At 100,000 feet: vpp uses a directed graph, not a directed *acyclic*
+graph. It’s really quite normal for a packet to visit ip[46]-lookup
+multiple times. The worst-case: a graph node which enqueues packets to
+itself.
+
+To deal with this issue, the graph dispatcher must force allocation of a
+new frame if the current graph node’s dispatch function happens to
+enqueue a packet back to itself.
+
+There are no guarantees that a pending frame will be processed
+immediately, which means that more packets may be added to the
+underlying vlib_frame_t after it has been attached to a
+vlib_pending_frame_t. Care must be taken to allocate new frames and
+pending frames if a (pending_frame, frame) pair fills.
+
+Next frames, next frame ownership
+---------------------------------
+
+The vlib_next_frame_t is the last key graph dispatcher data structure:
+
+.. code:: c
+
+ typedef struct
+ {
+ /* Frame index. */
+ u32 frame_index;
+
+ /* Node runtime for this next. */
+ u32 node_runtime_index;
+
+ /* Next frame flags. */
+ u32 flags;
+
+ /* Reflects node frame-used flag for this next. */
+ #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
+ VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH
+
+ /* This next frame owns enqueue to node
+ corresponding to node_runtime_index. */
+ #define VLIB_FRAME_OWNER (1 << 15)
+
+ /* Set when frame has been allocated for this next. */
+ #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT
+
+ /* Set when frame has been added to pending vector. */
+ #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP
+
+ /* Set when frame is to be freed after dispatch. */
+ #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT
+
+ /* Set when frame has traced packets. */
+ #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE
+
+ /* Number of vectors enqueue to this next since last overflow. */
+ u32 vectors_since_last_overflow;
+ } vlib_next_frame_t;
+
+Graph node dispatch functions call vlib_get_next_frame (…) to set “(u32
+\*)to_next” to the right place in the vlib_frame_t corresponding to the
+ith arc (aka next0) from the current node to the indicated next node.
+
+After some scuffling around - two levels of macros - processing reaches
+vlib_get_next_frame_internal (…). Get-next-frame-internal digs up the
+vlib_next_frame_t corresponding to the desired graph arc.
+
+The next frame data structure amounts to a graph-arc-centric frame
+cache. Once a node finishes adding element to a frame, it will acquire a
+vlib_pending_frame_t and end up on the graph dispatcher’s run-queue. But
+there’s no guarantee that more vector elements won’t be added to the
+underlying frame from the same (source_node, next_index) arc or from a
+different (source_node, next_index) arc.
+
+Maintaining consistency of the arc-to-frame cache is necessary. The
+first step in maintaining consistency is to make sure that only one
+graph node at a time thinks it “owns” the target vlib_frame_t.
+
+Back to the graph node dispatch function. In the usual case, a certain
+number of packets will be added to the vlib_frame_t acquired by calling
+vlib_get_next_frame (…).
+
+Before a dispatch function returns, it’s required to call
+vlib_put_next_frame (…) for all of the graph arcs it actually used. This
+action adds a vlib_pending_frame_t to the graph dispatcher’s pending
+frame vector.
+
+Vlib_put_next_frame makes a note in the pending frame of the frame
+index, and also of the vlib_next_frame_t index.
+
+dispatch_pending_node actions
+-----------------------------
+
+The main graph dispatch loop calls dispatch pending node as shown above.
+
+Dispatch_pending_node recovers the pending frame, and the graph node
+runtime / dispatch function. Further, it recovers the next_frame
+currently associated with the vlib_frame_t, and detaches the
+vlib_frame_t from the next_frame.
+
+In …/src/vlib/main.c:dispatch_pending_node(…), note this stanza:
+
+.. code:: c
+
+ /* Force allocation of new frame while current frame is being
+ dispatched. */
+ restore_frame_index = ~0;
+ if (nf->frame_index == p->frame_index)
+ {
+ nf->frame_index = ~0;
+ nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
+ if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
+ restore_frame_index = p->frame_index;
+ }
+
+dispatch_pending_node is worth a hard stare due to the several
+second-order optimizations it implements. Almost as an afterthought, it
+calls dispatch_node which actually calls the graph node dispatch
+function.
+
+Process / thread model
+----------------------
+
+vlib provides an ultra-lightweight cooperative multi-tasking thread
+model. The graph node scheduler invokes these processes in much the same
+way as traditional vector-processing run-to-completion graph nodes;
+plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
+set the vlib_node_registration_t type field to vlib_NODE_TYPE_PROCESS.
+Yes, process is a misnomer. These are cooperative multi-tasking threads.
+
+As of this writing, the default stack size is 2<<15 = 32kb. Initialize
+the node registration’s process_log2_n_stack_bytes member as needed. The
+graph node dispatcher makes some effort to detect stack overrun, e.g. by
+mapping a no-access page below each thread stack.
+
+Process node dispatch functions are expected to be “while(1) { }” loops
+which suspend when not otherwise occupied, and which must not run for
+unreasonably long periods of time.
+
+“Unreasonably long” is an application-dependent concept. Over the years,
+we have constructed frame-size sensitive control-plane nodes which will
+use a much higher fraction of the available CPU bandwidth when the frame
+size is low. The classic example: modifying forwarding tables. So long
+as the table-builder leaves the forwarding tables in a valid state, one
+can suspend the table builder to avoid dropping packets as a result of
+control-plane activity.
+
+Process nodes can suspend for fixed amounts of time, or until another
+entity signals an event, or both. See the next section for a description
+of the vlib process event mechanism.
+
+When running in vlib process context, one must pay strict attention to
+loop invariant issues. If one walks a data structure and calls a
+function which may suspend, one had best know by construction that it
+cannot change. Often, it’s best to simply make a snapshot copy of a data
+structure, walk the copy at leisure, then free the copy.
+
+Process events
+--------------
+
+The vlib process event mechanism API is extremely lightweight and easy
+to use. Here is a typical example:
+
+.. code:: c
+
+ vlib_main_t *vm = &vlib_global_main;
+ uword event_type, * event_data = 0;
+
+ while (1)
+ {
+ vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);
+
+ event_type = vlib_process_get_events (vm, &event_data);
+
+ switch (event_type) {
+ case EVENT1:
+ handle_event1s (event_data);
+ break;
+
+ case EVENT2:
+ handle_event2s (event_data);
+ break;
+
+ case ~0: /* 5-second idle/periodic */
+ handle_idle ();
+ break;
+
+ default: /* bug! */
+ ASSERT (0);
+ }
+
+ vec_reset_length(event_data);
+ }
+
+In this example, the VLIB process node waits for an event to occur, or
+for 5 seconds to elapse. The code demuxes on the event type, calling the
+appropriate handler function. Each call to vlib_process_get_events
+returns a vector of per-event-type data passed to successive
+vlib_process_signal_event calls; it is a serious error to process only
+event_data[0].
+
+Resetting the event_data vector-length to 0 [instead of calling
+vec_free] means that the event scheme doesn’t burn cycles continuously
+allocating and freeing the event data vector. This is a common vppinfra
+/ vlib coding pattern, well worth using when appropriate.
+
+Signaling an event is easy, for example:
+
+.. code:: c
+
+ vlib_process_signal_event (vm, process_node_index, EVENT1,
+ (uword)arbitrary_event1_data); /* and so forth */
+
+One can either know the process node index by construction - dig it out
+of the appropriate vlib_node_registration_t - or by finding the
+vlib_node_t with vlib_get_node_by_name(…).
+
+Buffers
+-------
+
+vlib buffering solves the usual set of packet-processing problems,
+albeit at high performance. Key in terms of performance: one ordinarily
+allocates / frees N buffers at a time rather than one at a time. Except
+when operating directly on a specific buffer, one deals with buffers by
+index, not by pointer.
+
+Packet-processing frames are u32[] arrays, not vlib_buffer_t[] arrays.
+
+Packets comprise one or more vlib buffers, chained together as required.
+Multiple particle sizes are supported; hardware input nodes simply ask
+for the required size(s). Coalescing support is available. For obvious
+reasons one is discouraged from writing one’s own wild and wacky buffer
+chain traversal code.
+
+vlib buffer headers are allocated immediately prior to the buffer data
+area. In typical packet processing this saves a dependent read wait:
+given a buffer’s address, one can prefetch the buffer header [metadata]
+at the same time as the first cache line of buffer data.
+
+Buffer header metadata (vlib_buffer_t) includes the usual rewrite
+expansion space, a current_data offset, RX and TX interface indices,
+packet trace information, and a opaque areas.
+
+The opaque data is intended to control packet processing in arbitrary
+subgraph-dependent ways. The programmer shoulders responsibility for
+data lifetime analysis, type-checking, etc.
+
+Buffers have reference-counts in support of e.g. multicast replication.
+
+Shared-memory message API
+-------------------------
+
+Local control-plane and application processes interact with the vpp
+dataplane via asynchronous message-passing in shared memory over
+unidirectional queues. The same application APIs are available via
+sockets.
+
+Capturing API traces and replaying them in a simulation environment
+requires a disciplined approach to the problem. This seems like a
+make-work task, but it is not. When something goes wrong in the
+control-plane after 300,000 or 3,000,000 operations, high-speed replay
+of the events leading up to the accident is a huge win.
+
+The shared-memory message API message allocator vl_api_msg_alloc uses a
+particularly cute trick. Since messages are processed in order, we try
+to allocate message buffering from a set of fixed-size, preallocated
+rings. Each ring item has a “busy” bit. Freeing one of the preallocated
+message buffers merely requires the message consumer to clear the busy
+bit. No locking required.
+
+Debug CLI
+---------
+
+Adding debug CLI commands to VLIB applications is very simple.
+
+Here is a complete example:
+
+.. code:: c
+
+ static clib_error_t *
+ show_ip_tuple_match (vlib_main_t * vm,
+ unformat_input_t * input,
+ vlib_cli_command_t * cmd)
+ {
+ vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
+ return 0;
+ }
+
+ static VLIB_CLI_COMMAND (show_ip_tuple_command) =
+ {
+ .path = "show ip tuple match",
+ .short_help = "Show ip 5-tuple match-and-broadcast tables",
+ .function = show_ip_tuple_match,
+ };
+
+This example implements the “show ip tuple match” debug cli command. In
+ordinary usage, the vlib cli is available via the “vppctl” application,
+which sends traffic to a named pipe. One can configure debug CLI telnet
+access on a configurable port.
+
+The cli implementation has an output redirection facility which makes it
+simple to deliver cli output via shared-memory API messaging,
+
+Particularly for debug or “show tech support” type commands, it would be
+wasteful to write vlib application code to pack binary data, write more
+code elsewhere to unpack the data and finally print the answer. If a
+certain cli command has the potential to hurt packet processing
+performance by running for too long, do the work incrementally in a
+process node. The client can wait.
+
+Macro expansion
+~~~~~~~~~~~~~~~
+
+The vpp debug CLI engine includes a recursive macro expander. This is
+quite useful for factoring out address and/or interface name specifics:
+
+::
+
+ define ip1 192.168.1.1/24
+ define ip2 192.168.2.1/24
+ define iface1 GigabitEthernet3/0/0
+ define iface2 loop1
+
+ set int ip address $iface1 $ip1
+ set int ip address $iface2 $(ip2)
+
+ undefine ip1
+ undefine ip2
+ undefine iface1
+ undefine iface2
+
+Each socket (or telnet) debug CLI session has its own macro tables. All
+debug CLI sessions which use CLI_INBAND binary API messages share a
+single table.
+
+The macro expander recognizes circular definitions:
+
+::
+
+ define foo \$(bar)
+ define bar \$(mumble)
+ define mumble \$(foo)
+
+At 8 levels of recursion, the macro expander throws up its hands and
+replies “CIRCULAR.”
+
+Macro-related debug CLI commands
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to the “define” and “undefine” debug CLI commands, use “show
+macro [noevaluate]” to dump the macro table. The “echo” debug CLI
+command will evaluate and print its argument:
+
+::
+
+ vpp# define foo This\ Is\ Foo
+ vpp# echo $foo
+ This Is Foo
+
+Handing off buffers between threads
+-----------------------------------
+
+Vlib includes an easy-to-use mechanism for handing off buffers between
+worker threads. A typical use-case: software ingress flow hashing. At a
+high level, one creates a per-worker-thread queue which sends packets to
+a specific graph node in the indicated worker thread. With the queue in
+hand, enqueue packets to the worker thread of your choice.
+
+Initialize a handoff queue
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Simple enough, call vlib_frame_queue_main_init:
+
+.. code:: c
+
+ main_ptr->frame_queue_index
+ = vlib_frame_queue_main_init (dest_node.index, frame_queue_size);
+
+Frame_queue_size means what it says: the number of frames which may be
+queued. Since frames contain 1…256 packets, frame_queue_size should be a
+reasonably small number (32…64). If the frame queue producer(s) are
+faster than the frame queue consumer(s), congestion will occur. Suggest
+letting the enqueue operator deal with queue congestion, as shown in the
+enqueue example below.
+
+Under the floorboards, vlib_frame_queue_main_init creates an input queue
+for each worker thread.
+
+Please do NOT create frame queues until it’s clear that they will be
+used. Although the main dispatch loop is reasonably smart about how
+often it polls the (entire set of) frame queues, polling unused frame
+queues is a waste of clock cycles.
+
+Hand off packets
+~~~~~~~~~~~~~~~~
+
+The actual handoff mechanics are simple, and integrate nicely with a
+typical graph-node dispatch function:
+
+.. code:: c
+
+ always_inline uword
+ do_handoff_inline (vlib_main_t * vm,
+ vlib_node_runtime_t * node, vlib_frame_t * frame,
+ int is_ip4, int is_trace)
+ {
+ u32 n_left_from, *from;
+ vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
+ u16 thread_indices [VLIB_FRAME_SIZE];
+ u16 nexts[VLIB_FRAME_SIZE], *next;
+ u32 n_enq;
+ htest_main_t *hmp = &htest_main;
+ int i;
+
+ from = vlib_frame_vector_args (frame);
+ n_left_from = frame->n_vectors;
+
+ vlib_get_buffers (vm, from, bufs, n_left_from);
+ next = nexts;
+ b = bufs;
+
+ /*
+ * Typical frame traversal loop, details vary with
+ * use case. Make sure to set thread_indices[i] with
+ * the desired destination thread index. You may
+ * or may not bother to set next[i].
+ */
+
+ for (i = 0; i < frame->n_vectors; i++)
+ {
+ <snip>
+ /* Pick a thread to handle this packet */
+ thread_indices[i] = f (packet_data_or_whatever);
+ <snip>
+
+ b += 1;
+ next += 1;
+ n_left_from -= 1;
+ }
+
+ /* Enqueue buffers to threads */
+ n_enq =
+ vlib_buffer_enqueue_to_thread (vm, node, hmp->frame_queue_index,
+ from, thread_indices, frame->n_vectors,
+ 1 /* drop on congestion */);
+ /* Typical counters,
+ if (n_enq < frame->n_vectors)
+ vlib_node_increment_counter (vm, node->node_index,
+ XXX_ERROR_CONGESTION_DROP,
+ frame->n_vectors - n_enq);
+ vlib_node_increment_counter (vm, node->node_index,
+ XXX_ERROR_HANDED_OFF, n_enq);
+ return frame->n_vectors;
+ }
+
+Notes about calling vlib_buffer_enqueue_to_thread(…):
+
+- If you pass “drop on congestion” non-zero, all packets in the inbound
+ frame will be consumed one way or the other. This is the recommended
+ setting.
+
+- In the drop-on-congestion case, please don’t try to “help” in the
+ enqueue node by freeing dropped packets, or by pushing them to
+ “error-drop.” Either of those actions would be a severe error.
+
+- It’s perfectly OK to enqueue packets to the current thread.
+
+Handoff Demo Plugin
+-------------------
+
+Check out the sample (plugin) example in …/src/examples/handoffdemo. If
+you want to build the handoff demo plugin:
+
+::
+
+ $ cd .../src/plugins
+ $ ln -s ../examples/handoffdemo
+
+This plugin provides a simple example of how to hand off packets between
+threads. We used it to debug packet-tracer handoff tracing support.
+
+Packet generator input script
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ packet-generator new {
+ name x
+ limit 5
+ size 128-128
+ interface local0
+ node handoffdemo-1
+ data {
+ incrementing 30
+ }
+ }
+
+Start vpp with 2 worker threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The demo plugin hands packets from worker 1 to worker 2.
+
+Enable tracing, and start the packet generator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ trace add pg-input 100
+ packet-generator enable
+
+Sample Run
+~~~~~~~~~~
+
+::
+
+ DBGvpp# ex /tmp/pg_input_script
+ DBGvpp# pa en
+ DBGvpp# sh err
+ Count Node Reason
+ 5 handoffdemo-1 packets handed off processed
+ 5 handoffdemo-2 completed packets
+ DBGvpp# show run
+ Thread 1 vpp_wk_0 (lcore 0)
+ Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+ vector rates in 3.7331e-2, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
+ Name State Calls Vectors Suspends Clocks Vectors/Call
+ handoffdemo-1 active 1 5 0 4.76e3 5.00
+ pg-input disabled 2 5 0 5.58e4 2.50
+ unix-epoll-input polling 22760 0 0 2.14e7 0.00
+ ---------------
+ Thread 2 vpp_wk_1 (lcore 2)
+ Time 133.9, average vectors/node 5.00, last 128 main loops 0.00 per node 0.00
+ vector rates in 0.0000e0, out 0.0000e0, drop 3.7331e-2, punt 0.0000e0
+ Name State Calls Vectors Suspends Clocks Vectors/Call
+ drop active 1 5 0 1.35e4 5.00
+ error-drop active 1 5 0 2.52e4 5.00
+ handoffdemo-2 active 1 5 0 2.56e4 5.00
+ unix-epoll-input polling 22406 0 0 2.18e7 0.00
+
+Enable the packet tracer and run it again…
+
+::
+
+ DBGvpp# trace add pg-input 100
+ DBGvpp# pa en
+ DBGvpp# sh trace
+ sh trace
+ ------------------- Start of thread 0 vpp_main -------------------
+ No packets in trace buffer
+ ------------------- Start of thread 1 vpp_wk_0 -------------------
+ Packet 1
+
+ 00:06:50:520688: pg-input
+ stream x, 128 bytes, 0 sw_if_index
+ current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000000
+ 00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+ 00000020: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000040: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000060: 0000000000000000000000000000000000000000000000000000000000000000
+ 00:06:50:520762: handoffdemo-1
+ HANDOFFDEMO: current thread 1
+
+ Packet 2
+
+ 00:06:50:520688: pg-input
+ stream x, 128 bytes, 0 sw_if_index
+ current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000001
+ 00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+ 00000020: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000040: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000060: 0000000000000000000000000000000000000000000000000000000000000000
+ 00:06:50:520762: handoffdemo-1
+ HANDOFFDEMO: current thread 1
+
+ Packet 3
+
+ 00:06:50:520688: pg-input
+ stream x, 128 bytes, 0 sw_if_index
+ current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000002
+ 00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+ 00000020: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000040: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000060: 0000000000000000000000000000000000000000000000000000000000000000
+ 00:06:50:520762: handoffdemo-1
+ HANDOFFDEMO: current thread 1
+
+ Packet 4
+
+ 00:06:50:520688: pg-input
+ stream x, 128 bytes, 0 sw_if_index
+ current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000003
+ 00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+ 00000020: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000040: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000060: 0000000000000000000000000000000000000000000000000000000000000000
+ 00:06:50:520762: handoffdemo-1
+ HANDOFFDEMO: current thread 1
+
+ Packet 5
+
+ 00:06:50:520688: pg-input
+ stream x, 128 bytes, 0 sw_if_index
+ current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x1000004
+ 00000000: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d0000
+ 00000020: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000040: 0000000000000000000000000000000000000000000000000000000000000000
+ 00000060: 0000000000000000000000000000000000000000000000000000000000000000
+ 00:06:50:520762: handoffdemo-1
+ HANDOFFDEMO: current thread 1
+
+ ------------------- Start of thread 2 vpp_wk_1 -------------------
+ Packet 1
+
+ 00:06:50:520796: handoff_trace
+ HANDED-OFF: from thread 1 trace index 0
+ 00:06:50:520796: handoffdemo-2
+ HANDOFFDEMO: current thread 2
+ 00:06:50:520867: error-drop
+ rx:local0
+ 00:06:50:520914: drop
+ handoffdemo-2: completed packets
+
+ Packet 2
+
+ 00:06:50:520796: handoff_trace
+ HANDED-OFF: from thread 1 trace index 1
+ 00:06:50:520796: handoffdemo-2
+ HANDOFFDEMO: current thread 2
+ 00:06:50:520867: error-drop
+ rx:local0
+ 00:06:50:520914: drop
+ handoffdemo-2: completed packets
+
+ Packet 3
+
+ 00:06:50:520796: handoff_trace
+ HANDED-OFF: from thread 1 trace index 2
+ 00:06:50:520796: handoffdemo-2
+ HANDOFFDEMO: current thread 2
+ 00:06:50:520867: error-drop
+ rx:local0
+ 00:06:50:520914: drop
+ handoffdemo-2: completed packets
+
+ Packet 4
+
+ 00:06:50:520796: handoff_trace
+ HANDED-OFF: from thread 1 trace index 3
+ 00:06:50:520796: handoffdemo-2
+ HANDOFFDEMO: current thread 2
+ 00:06:50:520867: error-drop
+ rx:local0
+ 00:06:50:520914: drop
+ handoffdemo-2: completed packets
+
+ Packet 5
+
+ 00:06:50:520796: handoff_trace
+ HANDED-OFF: from thread 1 trace index 4
+ 00:06:50:520796: handoffdemo-2
+ HANDOFFDEMO: current thread 2
+ 00:06:50:520867: error-drop
+ rx:local0
+ 00:06:50:520914: drop
+ handoffdemo-2: completed packets
+ DBGvpp#