author    John DeNisco <jdenisco@cisco.com>  2018-07-26 12:45:10 -0400
committer Dave Barach <openvpp@barachs.net>  2018-07-26 18:34:47 +0000
commit    06dcd45ff81e06bc8cf40ed487c0b2652d346a5a (patch)
tree      71403f9d422c4e532b2871a66ab909bd6066b10b /docs/gettingstarted/developers
parent    1d65279ffecd0f540288187b94cb1a6b84a7a0c6 (diff)
Initial commit of Sphinx docs
Change-Id: I9fca8fb98502dffc2555f9de7f507b6f006e0e77
Signed-off-by: John DeNisco <jdenisco@cisco.com>
Diffstat (limited to 'docs/gettingstarted/developers')
-rw-r--r--  docs/gettingstarted/developers/bihash.md               273
-rw-r--r--  docs/gettingstarted/developers/building.rst            151
-rw-r--r--  docs/gettingstarted/developers/featurearcs.md          224
-rw-r--r--  docs/gettingstarted/developers/index.rst                18
-rw-r--r--  docs/gettingstarted/developers/infrastructure.md       330
-rw-r--r--  docs/gettingstarted/developers/plugins.md               11
-rw-r--r--  docs/gettingstarted/developers/softwarearchitecture.md  44
-rw-r--r--  docs/gettingstarted/developers/vlib.md                 496
-rw-r--r--  docs/gettingstarted/developers/vnet.md                 171
9 files changed, 1718 insertions, 0 deletions
diff --git a/docs/gettingstarted/developers/bihash.md b/docs/gettingstarted/developers/bihash.md
new file mode 100644
index 00000000000..3f53e7bbc3e
--- /dev/null
+++ b/docs/gettingstarted/developers/bihash.md
@@ -0,0 +1,273 @@
+Bounded-index Extensible Hashing (bihash)
+=========================================
+
+Vpp uses bounded-index extensible hashing to solve a variety of
+exact-match (key, value) lookup problems. Benefits of the current
+implementation:
+
+* Very high record count scaling, tested to 100,000,000 records.
+* Lookup performance degrades gracefully as the number of records increases
+* No reader locking required
+* Template implementation, it's easy to support arbitrary (key,value) types
+
+Bounded-index extensible hashing has been widely used in databases for
+decades.
+
+Bihash uses a two-level data structure:
+
+```
+ +-----------------+
+ | bucket-0 |
+ | log2_size |
+ | backing store |
+ +-----------------+
+ | bucket-1 |
+ | log2_size | +--------------------------------+
+ | backing store | --------> | KVP_PER_PAGE * key-value-pairs |
+ +-----------------+ | page 0 |
+ ... +--------------------------------+
+ +-----------------+ | KVP_PER_PAGE * key-value-pairs |
+ | bucket-2**N-1 | | page 1 |
+ | log2_size | +--------------------------------+
+ | backing store | ---
+ +-----------------+ +--------------------------------+
+ | KVP_PER_PAGE * key-value-pairs |
+ | page 2**(log2(size)) - 1 |
+ +--------------------------------+
+```
+
+Discussion of the algorithm
+---------------------------
+
+This structure has a couple of major advantages. In practice, each
+bucket entry fits into a 64-bit integer. Coincidentally, vpp's target
+CPU architectures support 64-bit atomic operations. When modifying the
+contents of a specific bucket, we do the following:
+
+* Make a working copy of the bucket's backing storage
+* Atomically swap a pointer to the working copy into the bucket array
+* Change the original backing store data
+* Atomically swap back to the original
+
+So, no reader locking is required to search a bihash table.
+
+At lookup time, the implementation computes a key hash code. We use
+the least-significant N bits of the hash to select the bucket.
+
+With the bucket in hand, we learn log2 (nBackingPages) for the
+selected bucket. At this point, we use the next log2_size bits from
+the hash code to select the specific backing page in which the
+(key,value) page will be found.
+
+Net result: we search **one** backing page, not 2**log2_size
+pages. This is a key property of the algorithm.
+
+When sufficient collisions occur to fill the backing pages for a given
+bucket, we double the bucket size, rehash, and deal the bucket
+contents into a double-sized set of backing pages. In the future, we
+may represent the size as a linear combination of two powers-of-two,
+to increase space efficiency.
+
+To solve the "jackpot case" where a set of records collide under
+hashing in a bad way, the implementation will fall back to linear
+search across 2**log2_size backing pages on a per-bucket basis.
+
+To maintain *space* efficiency, we should configure the bucket array
+so that backing pages are effectively utilized. Lookup performance
+tends to change *very little* if the bucket array is too small or too
+large.
+
+Bihash depends on selecting an effective hash function. If one were to
+use a truly broken hash function such as "return 1ULL," bihash would
+still work, but it would be equivalent to a poorly-programmed linear
+search.
+
+We often use cpu intrinsic functions - think crc32 - to rapidly
+compute a hash code which has decent statistics.
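+
+For instance, here is a hedged, standalone sketch of hashing an 8-byte
+key with the SSE4.2 crc32 intrinsic. It is illustrative only, not the
+exact hash any particular template instance uses:
+
+```c
+  #include <stdint.h>
+  #include <nmmintrin.h>   /* _mm_crc32_u64; compile with -msse4.2 */
+
+  static inline uint64_t
+  example_hash_u64 (uint64_t key)
+  {
+    /* One crc32c instruction: cheap, and with decent statistics */
+    return _mm_crc32_u64 (0ULL, key);
+  }
+```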
+
+Bihash Cookbook
+---------------
+
+### Using current (key,value) template instance types
+
+It's quite easy to use one of the template instance types. As of this
+writing, .../src/vppinfra provides pre-built templates for 8, 16, 20,
+24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.
+
+See .../src/vppinfra/{bihash_<key-size>_8}.h
+
+To define the data types, #include a specific template instance, most
+often in a subsystem header file:
+
+```c
+ #include <vppinfra/bihash_8_8.h>
+```
+
+If you're building a standalone application, you'll need to define the
+various functions by #including the method implementation file in a C
+source file:
+
+```c
+ #include <vppinfra/bihash_template.c>
+```
+
+The core vpp engine currently uses most if not all of the known bihash
+types, so you probably won't need to #include the method
+implementation file yourself.
+
+Add an instance of the selected bihash data structure to e.g. a
+"main_t" structure:
+
+```c
+ typedef struct
+ {
+ ...
+ BVT (clib_bihash) hash;
+ or
+ clib_bihash_8_8_t hash;
+ ...
+ } my_main_t;
+```
+
+The BV macro concatenates its argument with the value of the
+preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its
+argument with the value of BIHASH_TYPE and the fixed string "_t". So
+in the above example, BVT (clib_bihash) generates "clib_bihash_8_8_t".
+
+If you're sure you won't decide to change the template / type name
+later, it's perfectly OK to code "clib_bihash_8_8_t" and so forth.
+
+In fact, if you #include multiple template instances in a single
+source file, you **must** use fully-enumerated type names. The macros
+stand no chance of working.
+
+### Initializing a bihash table
+
+Call the init function as shown. As a rough guide, pick a number of
+buckets which is approximately
+number_of_expected_records/BIHASH_KVP_PER_PAGE from the relevant
+template instance header-file. See previous discussion.
+
+The amount of memory selected should easily contain all of the
+records, with a generous allowance for hash collisions. Bihash memory
+is allocated separately from the main heap, and won't cost anything
+except kernel PTE's until touched, so it's OK to be reasonably
+generous.
+
+For example:
+
+```c
+ my_main_t *mm = &my_main;
+ clib_bihash_8_8_t *h;
+
+ h = &mm->hash_table;
+
+ clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
+ (uword) memory_size);
+```
+
+### Add or delete a key/value pair
+
+Use BV(clib_bihash_add_del), or the explicit type variant:
+
+```c
+ clib_bihash_kv_8_8_t kv;
+ my_main_t *mm = &my_main;
+ clib_bihash_8_8_t *h;
+
+ h = &mm->hash_table;
+ kv.key = key_to_add_or_delete;
+ kv.value = value_to_add_or_delete;
+
+ clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
+```
+
+In the delete case, kv.value is irrelevant. To change the value associated
+with an existing (key,value) pair, simply re-add the [new] pair.
+
+### Simple search
+
+The simplest possible (key, value) search goes like so:
+
+```c
+ clib_bihash_kv_8_8_t search_kv, return_kv;
+ my_main_t *mm = &my_main;
+ clib_bihash_8_8_t *h;
+
+ h = &mm->hash_table;
+ search_kv.key = key_to_search_for;
+
+ if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
+   key_not_found ();
+ else
+   key_found ();
+```
+
+Note that it's perfectly fine to collect the lookup result in the search kv itself:
+
+```c
+ if (clib_bihash_search_8_8 (h, &search_kv, &search_kv))
+ key_not_found();
+ etc.
+```
+
+### Bihash vector processing
+
+When processing a vector of packets which need a certain lookup
+performed, it's worth the trouble to compute the key hash, and
+prefetch the correct bucket ahead of time.
+
+Here's a sketch of one way to write the required code:
+
+Dual-loop:
+* 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data
+ required to form the record keys
+* 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash)
+ or the explicit hash function to calculate the record hashes.
+ Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets
+* 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch
+ 2x (key,value) data pages.
+* In the processing section, call 2x BV(clib_bihash_search_inline_with_hash)
+ to perform the search
+
+Programmer's choice whether to stash the hash code somewhere in
+vnet_buffer(b) metadata, or to use local variables.
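+
+Here is a hedged, heavily-abridged sketch of a single-packet-at-a-time
+variant of that look-ahead pattern, using the explicit 8_8 names. The
+surrounding frame and next-node handling is omitted, and
+make_key_from_buffer() is a hypothetical helper:
+
+```c
+  clib_bihash_8_8_t *h = &my_main.hash_table;   /* the table to search */
+  clib_bihash_kv_8_8_t kvs[VLIB_FRAME_SIZE];
+  u64 hashes[VLIB_FRAME_SIZE];
+  u32 n_packets = frame->n_vectors;
+  u32 i;
+
+  /* Prime the pipeline: keys, hashes, bucket prefetches for pkts 0..1 */
+  for (i = 0; i < 2 && i < n_packets; i++)
+    {
+      make_key_from_buffer (&kvs[i], bufs[i]);  /* hypothetical helper */
+      hashes[i] = clib_bihash_hash_8_8 (&kvs[i]);
+      clib_bihash_prefetch_bucket_8_8 (h, hashes[i]);
+    }
+
+  for (i = 0; i < n_packets; i++)
+    {
+      /* 2 ahead: form the key, hash it, prefetch its bucket */
+      if (i + 2 < n_packets)
+        {
+          make_key_from_buffer (&kvs[i + 2], bufs[i + 2]);
+          hashes[i + 2] = clib_bihash_hash_8_8 (&kvs[i + 2]);
+          clib_bihash_prefetch_bucket_8_8 (h, hashes[i + 2]);
+        }
+
+      /* 1 ahead: prefetch the (key,value) data page */
+      if (i + 1 < n_packets)
+        clib_bihash_prefetch_data_8_8 (h, hashes[i + 1]);
+
+      /* Now: search the current packet using its precomputed hash */
+      if (clib_bihash_search_inline_with_hash_8_8 (h, hashes[i], &kvs[i]) < 0)
+        ; /* miss: steer next[i] to a drop / exception disposition */
+      else
+        ; /* hit: kvs[i].value holds the lookup result */
+    }
+```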
+
+Single-loop:
+* Use simple search as shown above.
+
+### Walking a bihash table
+
+A fairly common scenario, e.g. when building "show" commands, is to walk
+a bihash table. It's simple enough:
+
+```c
+ my_main_t *mm = &my_main;
+ clib_bihash_8_8_t *h;
+ void callback_fn (clib_bihash_kv_8_8_t *, void *);
+
+ h = &mm->hash_table;
+
+ BV(clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
+```
+
+To nobody's great surprise: clib_bihash_foreach_key_value_pair
+iterates across the entire table, calling callback_fn with active
+entries.
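+
+For instance, a minimal (hypothetical) callback that simply tallies the
+active entries might look like this:
+
+```c
+  static void
+  count_cb (clib_bihash_kv_8_8_t * kvp, void *arg)
+  {
+    u64 *counter = (u64 *) arg;   /* the (void *) arg passed to the walker */
+    *counter += 1;
+  }
+
+  ...
+
+  u64 n_active = 0;
+  BV (clib_bihash_foreach_key_value_pair) (h, count_cb, (void *) &n_active);
+```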
+
+### Creating a new template instance
+
+Creating a new template is easy. Use one of the existing templates as
+a model, and make the obvious changes. The hash and key_compare
+methods are performance-critical in multiple senses.
+
+If the key compare method is slow, every lookup will be slow. If the
+hash function is slow, same story. If the hash function has poor
+statistical properties, space efficiency will suffer. In the limit, a
+bad enough hash function will cause large portions of the table to
+revert to linear search.
+
+Use of the best available vector unit is well worth the trouble in the
+hash and key_compare functions.
diff --git a/docs/gettingstarted/developers/building.rst b/docs/gettingstarted/developers/building.rst
new file mode 100644
index 00000000000..18fa943a6fb
--- /dev/null
+++ b/docs/gettingstarted/developers/building.rst
@@ -0,0 +1,151 @@
+.. _building:
+
+.. toctree::
+
+Building VPP
+============
+
+To get started developing with VPP, you need to get the sources and build the packages.
+
+.. _setupproxies:
+
+Set up Proxies
+--------------
+
+Depending on the environment, proxies may need to be set.
+You may run these commands:
+
+.. code-block:: console
+
+ $ export http_proxy=http://<proxy-server-name>.com:<port-number>
+ $ export https_proxy=https://<proxy-server-name>.com:<port-number>
+
+
+Get the VPP Sources
+-------------------
+
+To get the VPP sources and get ready to build execute the following:
+
+.. code-block:: console
+
+ $ git clone https://gerrit.fd.io/r/vpp
+ $ cd vpp
+
+Build VPP Dependencies
+----------------------
+
+Before building, make sure there are no FD.io VPP or DPDK packages installed by entering the following
+commands:
+
+.. code-block:: console
+
+ $ dpkg -l | grep vpp
+ $ dpkg -l | grep dpdk
+
+There should be no output, and no packages shown, after each of the above commands.
+
+Run this to install the dependencies for FD.io VPP.
+If it hangs during downloading at any point, you may need to set up :ref:`proxies for this to work <setupproxies>`.
+
+.. code-block:: console
+
+ $ make install-dep
+ Hit:1 http://us.archive.ubuntu.com/ubuntu xenial InRelease
+ Get:2 http://us.archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
+ Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
+ Get:4 http://us.archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
+ Get:5 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [803 kB]
+ Get:6 http://us.archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages [732 kB]
+ ...
+ ...
+ Update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap to provide /usr/bin/jmap (jmap) in auto mode
+ Setting up default-jdk-headless (2:1.8-56ubuntu2) ...
+ Processing triggers for libc-bin (2.23-0ubuntu3) ...
+ Processing triggers for systemd (229-4ubuntu6) ...
+ Processing triggers for ureadahead (0.100.0-19) ...
+ Processing triggers for ca-certificates (20160104ubuntu1) ...
+ Updating certificates in /etc/ssl/certs...
+ 0 added, 0 removed; done.
+ Running hooks in /etc/ca-certificates/update.d...
+
+ done.
+ done.
+
+Build VPP (Debug Mode)
+----------------------
+
+This build version contains debug symbols, which are useful when modifying VPP. The command below builds the debug version of VPP.
+The built artifacts are placed under build-root/build-vpp_debug-native.
+
+.. code-block:: console
+
+ $ make build
+ make[1]: Entering directory '/home/vagrant/vpp-master/build-root'
+ @@@@ Arch for platform 'vpp' is native @@@@
+ @@@@ Finding source for dpdk @@@@
+ @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/dpdk.mk @@@@
+ @@@@ Source found in /home/vagrant/vpp-master/dpdk @@@@
+ @@@@ Arch for platform 'vpp' is native @@@@
+ @@@@ Finding source for vpp @@@@
+ @@@@ Makefile fragment found in /home/vagrant/vpp-master/build-data/packages/vpp.mk @@@@
+ @@@@ Source found in /home/vagrant/vpp-master/src @@@@
+ ...
+ ...
+ make[5]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
+ make[4]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp/vpp-api/java'
+ make[3]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
+ make[2]: Leaving directory '/home/vagrant/vpp-master/build-root/build-vpp_debug-native/vpp'
+ @@@@ Installing vpp: nothing to do @@@@
+ make[1]: Leaving directory '/home/vagrant/vpp-master/build-root'
+
+Build VPP (Release Version)
+---------------------------
+
+To build the release version of FD.io VPP, use the command below.
+This build is optimized and will not create debug symbols.
+The built artifacts are placed under build-root/build-vpp-native.
+
+.. code-block:: console
+
+ $ make release
+
+
+Building Necessary Packages
+---------------------------
+
+To build the packages, run one of the following commands, depending on the operating system:
+
+Building Debian Packages
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: console
+
+ $ make pkg-deb
+
+
+Building RPM Packages
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: console
+
+ $ make pkg-rpm
+
+The packages will be found in the build-root directory. If the packages
+built correctly, listing them should produce output similar to this:
+
+.. code-block:: console
+
+ $ ls *.deb
+
+ vpp_18.07-rc0~456-gb361076_amd64.deb vpp-dbg_18.07-rc0~456-gb361076_amd64.deb
+ vpp-api-java_18.07-rc0~456-gb361076_amd64.deb vpp-dev_18.07-rc0~456-gb361076_amd64.deb
+ vpp-api-lua_18.07-rc0~456-gb361076_amd64.deb vpp-lib_18.07-rc0~456-gb361076_amd64.deb
+ vpp-api-python_18.07-rc0~456-gb361076_amd64.deb vpp-plugins_18.07-rc0~456-gb361076_amd64.deb
+
+The built packages end up in the build-root directory. Finally, the commands below install all of the built packages.
+
+.. code-block:: console
+
+ $ sudo bash
+ # dpkg -i *.deb
diff --git a/docs/gettingstarted/developers/featurearcs.md b/docs/gettingstarted/developers/featurearcs.md
new file mode 100644
index 00000000000..f1e3ec47d05
--- /dev/null
+++ b/docs/gettingstarted/developers/featurearcs.md
@@ -0,0 +1,224 @@
+Feature Arcs
+============
+
+A significant number of vpp features are configurable on a per-interface
+or per-system basis. Rather than ask feature coders to manually
+construct the required graph arcs, we built a general mechanism to
+manage these mechanics.
+
+Specifically, feature arcs comprise ordered sets of graph nodes. Each
+feature node in an arc is independently controlled. Feature arc nodes
+are generally unaware of each other. Handing a packet to "the next
+feature node" is quite inexpensive.
+
+The feature arc implementation solves the problem of creating graph arcs
+used for steering.
+
+At the beginning of a feature arc, a bit of setup work is needed, but
+only if at least one feature is enabled on the arc.
+
+On a per-arc basis, individual feature definitions create a set of
+ordering dependencies. Feature infrastructure performs a topological
+sort of the ordering dependencies, to determine the actual feature
+order. Missing dependencies **will** lead to runtime disorder. See
+<https://gerrit.fd.io/r/#/c/12753> for an example.
+
+If no partial order exists, vpp will refuse to run. Circular dependency
+loops of the form "a then b, b then c, c then a" are impossible to
+satisfy.
+
+Adding a feature to an existing feature arc
+-------------------------------------------
+
+To nobody's great surprise, we set up feature arcs using the typical
+"macro -> constructor function -> list of declarations" pattern:
+
+```c
+ VNET_FEATURE_INIT (mactime, static) =
+ {
+ .arc_name = "device-input",
+ .node_name = "mactime",
+ .runs_before = VNET_FEATURES ("ethernet-input"),
+ };
+```
+
+This creates a "mactime" feature on the "device-input" arc.
+
+Once per frame, dig up the vnet\_feature\_config\_main\_t corresponding
+to the "device-input" feature arc:
+
+```c
+ vnet_main_t *vnm = vnet_get_main ();
+ vnet_interface_main_t *im = &vnm->interface_main;
+ u8 arc = im->output_feature_arc_index;
+ vnet_feature_config_main_t *fcm;
+
+ fcm = vnet_feature_get_config_main (arc);
+```
+
+Note that in this case, we've stored the required arc index - assigned
+by the feature infrastructure - in the vnet\_interface\_main\_t. Where
+to put the arc index is a programmer's decision when creating a feature
+arc.
+
+Per packet, set next0 to steer packets to the next node they should
+visit:
+
+```c
+ vnet_get_config_data (&fcm->config_main,
+ &b0->current_config_index /* value-result */,
+ &next0, 0 /* # bytes of config data */);
+```
+
+Configuration data is per-feature arc, and is often unused. Note that
+it's normal to reset next0 to divert packets elsewhere; often, to drop
+them for cause:
+
+```c
+ next0 = MACTIME_NEXT_DROP;
+ b0->error = node->errors[DROP_CAUSE];
+```
+
+Creating a feature arc
+----------------------
+
+Once again, we create feature arcs using constructor macros:
+
+```c
+ VNET_FEATURE_ARC_INIT (ip4_unicast, static) =
+ {
+ .arc_name = "ip4-unicast",
+ .start_nodes = VNET_FEATURES ("ip4-input", "ip4-input-no-checksum"),
+ .arc_index_ptr = &ip4_main.lookup_main.ucast_feature_arc_index,
+ };
+```
+
+In this case, we configure two arc start nodes to handle the
+"hardware-verified ip checksum or not" cases. During initialization,
+the feature infrastructure stores the arc index as shown.
+
+In the head-of-arc node, do the following to send packets along the
+feature arc:
+
+```c
+ ip_lookup_main_t *lm = &im->lookup_main;
+ arc = lm->ucast_feature_arc_index;
+```
+
+Once per packet, initialize packet metadata to walk the feature arc:
+
+```c
+vnet_feature_arc_start (arc, sw_if_index0, &next, b0);
+```
+
+Enabling / Disabling features
+-----------------------------
+
+Simply call vnet_feature_enable_disable to enable or disable a specific
+feature:
+
+```c
+ vnet_feature_enable_disable ("device-input", /* arc name */
+ "mactime", /* feature name */
+ sw_if_index, /* Interface sw_if_index */
+ enable_disable, /* 1 => enable */
+ 0 /* (void *) feature_configuration */,
+ 0 /* feature_configuration_nbytes */);
+```
+
+The feature_configuration opaque is seldom used.
+
+If you wish to make a feature a _de facto_ system-level concept, pass
+sw_if_index=0 at all times. Sw_if_index 0 is always valid, and
+corresponds to the "local" interface.
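+
+For example, a hedged sketch of enabling a hypothetical "my-feature" on
+the "ip4-unicast" arc as a de facto system-level feature:
+
+```c
+  vnet_feature_enable_disable ("ip4-unicast",  /* arc name */
+                               "my-feature",   /* feature name */
+                               0,   /* sw_if_index 0: the "local" interface */
+                               1,   /* 1 => enable */
+                               0, 0 /* no feature configuration */);
+```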
+
+Related "show" commands
+-----------------------
+
+To display the entire set of features, use "show features [verbose]". The
+verbose form displays arc indices, and feature indices within the arcs:
+
+```
+$ vppctl show features verbose
+Available feature paths
+<snip>
+[14] ip4-unicast:
+ [ 0]: nat64-out2in-handoff
+ [ 1]: nat64-out2in
+ [ 2]: nat44-ed-hairpin-dst
+ [ 3]: nat44-hairpin-dst
+ [ 4]: ip4-dhcp-client-detect
+ [ 5]: nat44-out2in-fast
+ [ 6]: nat44-in2out-fast
+ [ 7]: nat44-handoff-classify
+ [ 8]: nat44-out2in-worker-handoff
+ [ 9]: nat44-in2out-worker-handoff
+ [10]: nat44-ed-classify
+ [11]: nat44-ed-out2in
+ [12]: nat44-ed-in2out
+ [13]: nat44-det-classify
+ [14]: nat44-det-out2in
+ [15]: nat44-det-in2out
+ [16]: nat44-classify
+ [17]: nat44-out2in
+ [18]: nat44-in2out
+ [19]: ip4-qos-record
+ [20]: ip4-vxlan-gpe-bypass
+ [21]: ip4-reassembly-feature
+ [22]: ip4-not-enabled
+ [23]: ip4-source-and-port-range-check-rx
+ [24]: ip4-flow-classify
+ [25]: ip4-inacl
+ [26]: ip4-source-check-via-rx
+ [27]: ip4-source-check-via-any
+ [28]: ip4-policer-classify
+ [29]: ipsec-input-ip4
+ [30]: vpath-input-ip4
+ [31]: ip4-vxlan-bypass
+ [32]: ip4-lookup
+<snip>
+```
+
+Here, we learn that the ip4-unicast feature arc has index 14, and that
+e.g. ip4-inacl has feature index 25 in the generated partial order.
+
+To display the features currently active on a specific interface,
+use "show interface <name> features":
+
+```
+$ vppctl show interface GigabitEthernet3/0/0 features
+Feature paths configured on GigabitEthernet3/0/0...
+<snip>
+ip4-unicast:
+ nat44-out2in
+<snip>
+```
+
+Table of Feature Arcs
+---------------------
+
+Simply search for name-strings to track down the arc definition, location of
+the arc index, etc.
+
+```
+ | Arc Name |
+ |------------------|
+ | device-input |
+ | ethernet-output |
+ | interface-output |
+ | ip4-drop |
+ | ip4-local |
+ | ip4-multicast |
+ | ip4-output |
+ | ip4-punt |
+ | ip4-unicast |
+ | ip6-drop |
+ | ip6-local |
+ | ip6-multicast |
+ | ip6-output |
+ | ip6-punt |
+ | ip6-unicast |
+ | mpls-input |
+ | mpls-output |
+ | nsh-output |
+```
diff --git a/docs/gettingstarted/developers/index.rst b/docs/gettingstarted/developers/index.rst
new file mode 100644
index 00000000000..cccb18d731a
--- /dev/null
+++ b/docs/gettingstarted/developers/index.rst
@@ -0,0 +1,18 @@
+.. _gstarteddevel:
+
+##########
+Developers
+##########
+
+.. toctree::
+ :maxdepth: 2
+
+ building
+ softwarearchitecture
+ infrastructure
+ vlib
+ plugins
+ vnet
+ featurearcs
+ bihash
+
diff --git a/docs/gettingstarted/developers/infrastructure.md b/docs/gettingstarted/developers/infrastructure.md
new file mode 100644
index 00000000000..688c42133ed
--- /dev/null
+++ b/docs/gettingstarted/developers/infrastructure.md
@@ -0,0 +1,330 @@
+VPPINFRA (Infrastructure)
+=========================
+
+The files associated with the VPP Infrastructure layer are located in
+the ./src/vppinfra folder.
+
+VPPinfra is a collection of basic c-library services, quite
+sufficient to build standalone programs to run directly on bare metal.
+It also provides high-performance dynamic arrays, hashes, bitmaps,
+high-precision real-time clock support, fine-grained event-logging, and
+data structure serialization.
+
+One fair comment / fair warning about vppinfra: you can\'t always tell a
+macro from an inline function from an ordinary function simply by name.
+Macros are used to avoid function calls in the typical case, and to
+cause (intentional) side-effects.
+
+Vppinfra has been around for almost 20 years and tends not to change
+frequently. The VPP Infrastructure layer contains the following
+functions:
+
+Vectors
+-------
+
+Vppinfra vectors are ubiquitous, dynamically resized arrays with
+optional user-defined \"headers\". Many vppinfra data structures
+(e.g. hash, heap, pool) are vectors with various different headers.
+
+The memory layout looks like this:
+
+```
+ User header (optional, uword aligned)
+ Alignment padding (if needed)
+ Vector length in elements
+ User's pointer -> Vector element 0
+ Vector element 1
+ ...
+ Vector element N-1
+```
+
+As shown above, the vector APIs deal with pointers to the 0th element of
+a vector. Null pointers are valid vectors of length zero.
+
+To avoid thrashing the memory allocator, one often resets the length of
+a vector to zero while retaining the memory allocation. Set the vector
+length field to zero via the vec\_reset\_length(v) macro. \[Use the
+macro! It's smart about NULL pointers.\]
+
+Typically, the user header is not present. User headers allow for other
+data structures to be built atop vppinfra vectors. Users may specify the
+alignment for data elements via the vec\*\_aligned macros.
+
+Vector elements can be any C type e.g. (int, double, struct bar). This
+is also true for data types built atop vectors (e.g. heap, pool, etc.).
+Many macros have \_a variants supporting alignment of vector data and
+\_h variants supporting non-zero-length vector headers. The \_ha
+variants support both.
+
+Inconsistent usage of header and/or alignment related macro variants
+will cause delayed, confusing failures.
+
+Standard programming error: memorize a pointer to the ith element of a
+vector, and then expand the vector. Vectors expand by 3/2, so such code
+may appear to work for a period of time. Correct code almost always
+memorizes vector **indices** which are invariant across reallocations.
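+
+A hedged sketch of typical vector usage, illustrating the
+index-not-pointer rule and vec_reset_length(); the element type and
+names are arbitrary:
+
+```c
+  #include <vppinfra/vec.h>
+
+  typedef struct { u32 a, b; } my_elt_t;
+
+  my_elt_t *elts = 0;            /* NULL is a valid, zero-length vector */
+  my_elt_t new_elt = { .a = 1, .b = 2 };
+  u32 index;
+
+  vec_add1 (elts, new_elt);      /* append; may reallocate the vector */
+  index = vec_len (elts) - 1;    /* memorize the *index*, not &elts[index] */
+
+  /* ... later, even after further vec_add1() calls ... */
+  my_elt_t *e = vec_elt_at_index (elts, index);
+
+  vec_reset_length (elts);       /* logically empty, memory retained */
+  vec_free (elts);               /* release the memory when truly done */
+```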
+
+In typical application images, one supplies a set of global functions
+designed to be called from gdb. Here are a few examples:
+
+- vl(v) - prints vec\_len(v)
+- pe(p) - prints pool\_elts(p)
+- pifi(p, index) - prints pool\_is\_free\_index(p, index)
+- debug\_hex\_bytes (p, nbytes) - hex memory dump nbytes starting at p
+
+Use the "show gdb" debug CLI command to print the current set.
+
+Bitmaps
+-------
+
+Vppinfra bitmaps are dynamic, built using the vppinfra vector APIs.
+Quite handy for a variety of jobs.
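+
+A minimal, hedged sketch (the bitmap grows on demand; names are
+illustrative):
+
+```c
+  uword *enabled = 0;                         /* empty bitmap */
+
+  enabled = clib_bitmap_set (enabled, 41, 1); /* set bit 41, growing as needed */
+  if (clib_bitmap_get (enabled, 41))
+    ; /* bit 41 is set */
+
+  clib_bitmap_free (enabled);
+```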
+
+Pools
+-----
+
+Vppinfra pools combine vectors and bitmaps to rapidly allocate and free
+fixed-size data structures with independent lifetimes. Pools are perfect
+for allocating per-session structures.
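+
+A hedged sketch of the usual pool pattern, again memorizing pool
+indices rather than pointers; the session type is arbitrary:
+
+```c
+  #include <vppinfra/pool.h>
+
+  typedef struct { u32 moose; } session_t;
+
+  session_t *session_pool = 0;
+  session_t *s;
+  u32 session_index;
+
+  pool_get (session_pool, s);           /* allocate (or recycle) an element */
+  s->moose = 7;
+  session_index = s - session_pool;     /* the canonical session "handle" */
+
+  /* ... later, look the session up by index ... */
+  s = pool_elt_at_index (session_pool, session_index);
+
+  pool_put (session_pool, s);           /* free; the index may be recycled */
+```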
+
+Hashes
+------
+
+Vppinfra provides several hash flavors. Data plane problems involving
+packet classification / session lookup often use
+./src/vppinfra/bihash\_template.\[ch\] bounded-index extensible
+hashes. These templates are instantiated multiple times, to efficiently
+service different fixed-key sizes.
+
+Bihashes are thread-safe. Read-locking is not required. A simple
+spin-lock ensures that only one thread writes an entry at a time.
+
+The original vppinfra hash implementation in
+./src/vppinfra/hash.\[ch\] is simple to use, and is often used in
+control-plane code which needs exact-string-matching.
+
+In either case, one almost always looks up a key in a hash table to
+obtain an index in a related vector or pool. The APIs are simple enough,
+but one must take care when using the unmanaged arbitrary-sized key
+variant. Hash\_set\_mem (hash\_table, key\_pointer, value) memorizes
+key\_pointer. It is usually a bad mistake to pass the address of a
+vector element as the second argument to hash\_set\_mem. It is perfectly
+fine to memorize constant string addresses in the text segment.
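+
+A hedged sketch of the string-keyed flavor; as noted above, the key
+pointer is memorized, so a constant string is a safe choice here:
+
+```c
+  #include <vppinfra/hash.h>
+
+  uword *h, *p;
+  char *key = "steering-wheel";   /* constant string: safe to memorize */
+
+  /* keys are NULL-terminated C-strings, values are uwords */
+  h = hash_create_string (0 /* initial elts */, sizeof (uword));
+
+  hash_set_mem (h, key, 42);      /* 42: typically an index into a pool */
+
+  p = hash_get_mem (h, key);
+  if (p)
+    ; /* found: *p == 42 */
+
+  hash_unset_mem (h, key);
+  hash_free (h);
+```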
+
+Format
+------
+
+Vppinfra format is roughly equivalent to printf.
+
+Format has a few properties worth mentioning. Format's first argument is
+a (u8 \*) vector to which it appends the result of the current format
+operation. Chaining calls is very easy:
+
+```c
+ u8 * result;
+
+ result = format (0, "junk = %d, ", junk);
+ result = format (result, "more junk = %d\n", more_junk);
+```
+
+As previously noted, NULL pointers are perfectly proper 0-length
+vectors. Format returns a (u8 \*) vector, **not** a C-string. If you
+wish to print a (u8 \*) vector, use the "%v" format string. If you need
+a (u8 \*) vector which is also a proper C-string, either of these
+schemes may be used:
+
+```c
+ vec_add1 (result, 0)
+ or
+ result = format (result, "<whatever>%c", 0);
+```
+
+Remember to vec\_free() the result if appropriate. Be careful not to
+pass format an uninitialized (u8 \*).
+
+Format implements a particularly handy user-format scheme via the "%U"
+format specification. For example:
+
+```c
+ u8 * format_junk (u8 * s, va_list * va)
+ {
+   u8 * junk = va_arg (*va, u8 *);
+   s = format (s, "%s", junk);
+   return s;
+ }
+
+ result = format (0, "junk = %U", format_junk, "This is some junk");
+```
+
+format\_junk() can invoke other user-format functions if desired. The
+programmer shoulders responsibility for argument type-checking. It is
+typical for user format functions to blow up if the va\_arg(va,
+type) macros don't match the caller's idea of reality.
+
+Unformat
+--------
+
+Vppinfra unformat is vaguely related to scanf, but considerably more
+general.
+
+A typical use case involves initializing an unformat\_input\_t from
+either a C-string or a (u8 \*) vector, then parsing via unformat() as
+follows:
+
+```c
+ unformat_input_t input;
+
+ unformat_init_string (&input, "<some-C-string>");
+ /* or */
+ unformat_init_vector (&input, <u8-vector>);
+```
+
+Then loop parsing individual elements:
+
+```c
+ while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT)
+ {
+ if (unformat (&input, "value1 %d", &value1))
+ ;/* unformat sets value1 */
+ else if (unformat (&input, "value2 %d", &value2))
+ ;/* unformat sets value2 */
+ else
+ return clib_error_return (0, "unknown input '%U'",
+ format_unformat_error, input);
+ }
+```
+
+As with format, unformat implements a user-unformat function capability
+via a "%U" user unformat function scheme.
+
+Vppinfra errors and warnings
+----------------------------
+
+Many functions within the vpp dataplane have return-values of type
+clib\_error\_t \*. Clib\_error\_t's are arbitrary strings with a bit of
+metadata \[fatal, warning\] and are easy to announce. Returning a NULL
+clib\_error\_t \* indicates "A-OK, no error."
+
+Clib\_warning(format-args) is a handy way to add debugging
+output; clib warnings prepend function:line info to unambiguously locate
+the message source. Clib\_unix\_warning() adds perror()-style Linux
+system-call information. In production images, clib\_warnings result in
+syslog entries.
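+
+A hedged sketch of the typical pattern; the function, condition and
+messages are illustrative:
+
+```c
+  #include <vppinfra/error.h>
+
+  static clib_error_t *
+  check_configuration (int n_entries)
+  {
+    if (n_entries <= 0)
+      return clib_error_return (0, "expected a positive entry count, got %d",
+                                n_entries);
+
+    clib_warning ("configured %d entries", n_entries); /* function:line prepended */
+    return 0;   /* NULL means "A-OK, no error" */
+  }
+```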
+
+Serialization
+-------------
+
+Vppinfra serialization support allows the programmer to easily serialize
+and unserialize complex data structures.
+
+The underlying primitive serialize/unserialize functions use network
+byte-order, so there are no structural issues serializing on a
+little-endian host and unserializing on a big-endian host.
+
+Event-logger, graphical event log viewer
+----------------------------------------
+
+The vppinfra event logger provides very lightweight (sub-100ns)
+precisely time-stamped event-logging services. See
+./src/vppinfra/{elog.c, elog.h}
+
+Serialization support makes it easy to save and ultimately to combine a
+set of event logs. In a distributed system running NTP over a local LAN,
+we find that event logs collected from multiple system elements can be
+combined with a temporal uncertainty no worse than 50us.
+
+A typical event definition and logging call looks like this:
+
+```c
+ ELOG_TYPE_DECLARE (e) =
+ {
+ .format = "tx-msg: stream %d local seq %d attempt %d",
+ .format_args = "i4i4i4",
+ };
+ struct { u32 stream_id, local_sequence, retry_count; } * ed;
+ ed = ELOG_DATA (m->elog_main, e);
+ ed->stream_id = stream_id;
+ ed->local_sequence = local_sequence;
+ ed->retry_count = retry_count;
+```
+
+The ELOG\_DATA macro returns a pointer to 20 bytes worth of arbitrary
+event data, to be formatted (offline, not at runtime) as described by
+format\_args. Aside from obvious integer formats, the CLIB event logger
+provides a couple of interesting additions. The "t4" format
+pretty-prints enumerated values:
+
+```c
+ ELOG_TYPE_DECLARE (e) =
+ {
+ .format = "get_or_create: %s",
+ .format_args = "t4",
+ .n_enum_strings = 2,
+ .enum_strings = { "old", "new", },
+ };
+```
+
+The "t" format specifier indicates that the corresponding datum is an
+index in the event's set of enumerated strings, as shown in the previous
+event type definition.
+
+The “T” format specifier indicates that the corresponding datum is an
+index in the event log’s string heap. This allows the programmer to emit
+arbitrary formatted strings. One often combines this facility with a
+hash table to keep the event-log string heap from growing arbitrarily
+large.
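+
+A hedged sketch, assuming "em" points at the event log's elog_main_t
+and that the elog_string() helper is used to intern the string:
+
+```c
+  ELOG_TYPE_DECLARE (e) =
+  {
+    .format = "lookup: %s",
+    .format_args = "T4",
+  };
+  struct { u32 name_offset; } * ed;
+
+  ed = ELOG_DATA (em, e);   /* em: elog_main_t * */
+  ed->name_offset = elog_string (em, "%s", "some-interface-name");
+```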
+
+Noting the 20-octet limit per-log-entry data field, the event log
+formatter supports arbitrary combinations of these data types. As in:
+the ".format" field may contain one or more instances of the following:
+
+- i1 - 8-bit unsigned integer
+- i2 - 16-bit unsigned integer
+- i4 - 32-bit unsigned integer
+- i8 - 64-bit unsigned integer
+- f4 - float
+- f8 - double
+- s - NULL-terminated string - be careful
+- sN - N-byte character array
+- t1,2,4 - per-event enumeration ID
+- T4 - Event-log string table offset
+
+The vpp engine event log is thread-safe, and is shared by all threads.
+Take care not to serialize the computation. Although the event-logger is
+about as fast as practicable, it's not appropriate for per-packet use in
+hard-core data plane code. It's most appropriate for capturing rare
+events - link up-down events, specific control-plane events and so
+forth.
+
+The vpp engine has several debug CLI commands for manipulating its event
+log:
+
+```
+ vpp# event-logger clear
+ vpp# event-logger save <filename> # for security, writes into /tmp/<filename>.
+ # <filename> must not contain '.' or '/' characters
+ vpp# show event-logger [all] [<nnn>] # display the event log
+ # by default, the last 250 entries
+```
+
+The event log defaults to 128K entries. The command-line argument "...
+vlib { elog-events nnn } ..." configures the size of the event log.
+
+As described above, the vpp engine event log is thread-safe and shared.
+To avoid confusing non-appearance of events logged by worker threads,
+make sure to code vlib\_global\_main.elog\_main - instead of
+vm->elog\_main. The latter form is correct in the main thread, but
+will almost certainly produce bad results in worker threads.
+
+G2 graphical event viewer
+-------------------------
+
+The g2 graphical event viewer can display serialized vppinfra event logs
+directly, or via the c2cpel tool.
+
+<div class="admonition note">
+
+Todo: please convert wiki page and figures
+
+</div>
+
diff --git a/docs/gettingstarted/developers/plugins.md b/docs/gettingstarted/developers/plugins.md
new file mode 100644
index 00000000000..ba3a2446306
--- /dev/null
+++ b/docs/gettingstarted/developers/plugins.md
@@ -0,0 +1,11 @@
+
+Plugins
+=======
+
+vlib implements a straightforward plug-in DLL mechanism. VLIB client
+applications specify a directory to search for plug-in .DLLs, and a name
+filter to apply (if desired). VLIB needs to load plug-ins very early.
+
+Once loaded, the plug-in DLL mechanism uses dlsym to find and verify a
+vlib\_plugin\_registration data structure in the newly-loaded plug-in.
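+
+A hedged sketch of the registration a typical plug-in supplies (the
+description string is illustrative):
+
+```c
+  #include <vnet/plugin/plugin.h>
+  #include <vpp/app/version.h>
+
+  /* Emits the vlib_plugin_registration_t the loader finds via dlsym */
+  VLIB_PLUGIN_REGISTER () =
+  {
+    .version = VPP_BUILD_VER,
+    .description = "Example plug-in",
+  };
+```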
+
diff --git a/docs/gettingstarted/developers/softwarearchitecture.md b/docs/gettingstarted/developers/softwarearchitecture.md
new file mode 100644
index 00000000000..a663134cd46
--- /dev/null
+++ b/docs/gettingstarted/developers/softwarearchitecture.md
@@ -0,0 +1,44 @@
+Software Architecture
+=====================
+
+The fd.io vpp implementation is a third-generation vector packet
+processing implementation specifically related to US Patent 7,961,636,
+as well as earlier work. Note that the Apache-2 license specifically
+grants non-exclusive patent licenses; we mention this patent as a point
+of historical interest.
+
+For performance, the vpp dataplane consists of a directed graph of
+forwarding nodes which process multiple packets per invocation. This
+schema enables a variety of micro-processor optimizations: pipelining
+and prefetching to cover dependent read latency, inherent I-cache phase
+behavior, vector instructions. Aside from hardware input and hardware
+output nodes, the entire forwarding graph is portable code.
+
+Depending on the scenario at hand, we often spin up multiple worker
+threads which process ingress-hashed packets from multiple queues using
+identical forwarding graph replicas.
+
+VPP Layers - Implementation Taxonomy
+------------------------------------
+
+![image](/_images/VPP_Layering.png)
+
+- VPP Infra - the VPP infrastructure layer, which contains the core
+ library source code. This layer performs memory functions, works
+ with vectors and rings, performs key lookups in hash tables, and
+ works with timers for dispatching graph nodes.
+- VLIB - the vector processing library. The vlib layer also handles
+ various application management functions: buffer, memory and graph
+ node management, maintaining and exporting counters, thread
+ management, packet tracing. Vlib implements the debug CLI (command
+ line interface).
+- VNET - works with VPP\'s networking interface (layers 2, 3, and 4),
+ performs session and traffic management, and works with devices and
+ the data control plane.
+- Plugins - Contains an increasingly rich set of data-plane plugins,
+ as noted in the above diagram.
+- VPP - the container application linked against all of the above.
+
+It's important to understand each of these layers in a certain amount of
+detail. Much of the implementation is best dealt with at the API level
+and otherwise left alone.
diff --git a/docs/gettingstarted/developers/vlib.md b/docs/gettingstarted/developers/vlib.md
new file mode 100644
index 00000000000..9ef37fd2657
--- /dev/null
+++ b/docs/gettingstarted/developers/vlib.md
@@ -0,0 +1,496 @@
+
+VLIB (Vector Processing Library)
+================================
+
+The files associated with vlib are located in the ./src/{vlib,
+vlibapi, vlibmemory} folders. These libraries provide vector
+processing support including graph-node scheduling, reliable multicast
+support, ultra-lightweight cooperative multi-tasking threads, a CLI,
+plug-in .DLL support, physical memory and Linux epoll support. Parts of
+this library embody US Patent 7,961,636.
+
+Init function discovery
+-----------------------
+
+vlib applications register for various \[initialization\] events by
+placing structures and \_\_attribute\_\_((constructor)) functions into
+the image. At appropriate times, the vlib framework walks
+constructor-generated singly-linked structure lists, calling the
+indicated functions. vlib applications create graph nodes, add CLI
+functions, start cooperative multi-tasking threads, etc. etc. using this
+mechanism.
+
+vlib applications invariably include a number of VLIB\_INIT\_FUNCTION
+(my\_init\_function) macros.
+
+Each init / configure / etc. function has the return type clib\_error\_t
+\*. Make sure that the function returns 0 if all is well, otherwise the
+framework will announce an error and exit.
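+
+A minimal sketch of the pattern; the function name and body are
+illustrative:
+
+```c
+  static clib_error_t *
+  my_init_function (vlib_main_t * vm)
+  {
+    /* ... allocate state, register for events, etc. ... */
+    return 0;   /* 0 (NULL) means "no error" */
+  }
+
+  VLIB_INIT_FUNCTION (my_init_function);
+```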
+
+vlib applications must link against vppinfra, and often link against
+other libraries such as VNET. In the latter case, it may be necessary to
+explicitly reference symbol(s) otherwise large portions of the library
+may be AWOL at runtime.
+
+Node Graph Initialization
+-------------------------
+
+vlib packet-processing applications invariably define a set of graph
+nodes to process packets.
+
+One constructs a vlib\_node\_registration\_t, most often via the
+VLIB\_REGISTER\_NODE macro. At runtime, the framework processes the set
+of such registrations into a directed graph. It is easy enough to add
+nodes to the graph at runtime. The framework does not support removing
+nodes.
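+
+A hedged sketch of a typical internal-node registration; the node name,
+dispatch function and next-node list are illustrative:
+
+```c
+  VLIB_REGISTER_NODE (my_node) =
+  {
+    .function = my_node_fn,        /* dispatch function, defined elsewhere */
+    .name = "my-node",
+    .vector_size = sizeof (u32),   /* frames carry u32 buffer indices */
+    .type = VLIB_NODE_TYPE_INTERNAL,
+    .n_next_nodes = 1,
+    .next_nodes = {
+      [0] = "error-drop",          /* arc 0: drop */
+    },
+  };
+```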
+
+vlib provides several types of vector-processing graph nodes, primarily
+to control framework dispatch behaviors. The type member of the
+vlib\_node\_registration\_t functions as follows:
+
+- VLIB\_NODE\_TYPE\_PRE\_INPUT - run before all other node types
+- VLIB\_NODE\_TYPE\_INPUT - run as often as possible, after pre\_input
+ nodes
+- VLIB\_NODE\_TYPE\_INTERNAL - only when explicitly made runnable by
+ adding pending frames for processing
+- VLIB\_NODE\_TYPE\_PROCESS - only when explicitly made runnable.
+ "Process" nodes are actually cooperative multi-tasking threads. They
+ **must** explicitly suspend after a reasonably short period of time.
+
+For a precise understanding of the graph node dispatcher, please read
+./src/vlib/main.c:vlib\_main\_loop.
+
+Graph node dispatcher
+---------------------
+
+Vlib\_main\_loop() dispatches graph nodes. The basic vector processing
+algorithm is diabolically simple, but may not be obvious from even a
+long stare at the code. Here's how it works: some input node, or set of
+input nodes, produce a vector of work to process. The graph node
+dispatcher pushes the work vector through the directed graph,
+subdividing it as needed, until the original work vector has been
+completely processed. At that point, the process recurs.
+
+This scheme yields a stable equilibrium in frame size, by construction.
+Here's why: as the frame size increases, the per-frame-element
+processing time decreases. There are several related forces at work; the
+simplest to describe is the effect of vector processing on the CPU L1
+I-cache. The first frame element \[packet\] processed by a given node
+warms up the node dispatch function in the L1 I-cache. All subsequent
+frame elements profit. As we increase the number of frame elements, the
+cost per element goes down.
+
+Under light load, it is a crazy waste of CPU cycles to run the graph
+node dispatcher flat-out. So, the graph node dispatcher arranges to wait
+for work by sitting in a timed epoll wait if the prevailing frame size
+is low. The scheme has a certain amount of hysteresis to avoid
+constantly toggling back and forth between interrupt and polling mode.
+Although the graph dispatcher supports interrupt and polling modes, our
+current default device drivers do not.
+
+The graph node scheduler uses a hierarchical timer wheel to reschedule
+process nodes upon timer expiration.
+
+Graph dispatcher internals
+--------------------------
+
+This section may be safely skipped. It's not necessary to understand
+graph dispatcher internals to create graph nodes.
+
+Vector Data Structure
+---------------------
+
+In vpp / vlib, we represent vectors as instances of the vlib_frame_t type:
+
+```c
+ typedef struct vlib_frame_t
+ {
+ /* Frame flags. */
+ u16 flags;
+
+ /* Number of scalar bytes in arguments. */
+ u8 scalar_size;
+
+ /* Number of bytes per vector argument. */
+ u8 vector_size;
+
+ /* Number of vector elements currently in frame. */
+ u16 n_vectors;
+
+ /* Scalar and vector arguments to next node. */
+ u8 arguments[0];
+ } vlib_frame_t;
+```
+
+Note that one _could_ construct all kinds of vectors - including
+vectors with some associated scalar data - using this structure. In
+the vpp application, vectors typically use a 4-byte vector element
+size, and zero bytes' worth of associated per-frame scalar data.
+
+Frames are always allocated on CLIB_CACHE_LINE_BYTES boundaries.
+Frames have u32 indices which make use of the alignment property, so
+the maximum feasible main heap offset of a frame is
+CLIB_CACHE_LINE_BYTES * 0xFFFFFFFF: with 64-byte cache lines,
+64 bytes * 4 GBytes = 256 GBytes.
+
+Scheduling Vectors
+------------------
+
+As you can see, vectors are not directly associated with graph
+nodes. We represent that association in a couple of ways. The
+simplest is the vlib\_pending\_frame\_t:
+
+```c
+ /* A frame pending dispatch by main loop. */
+ typedef struct
+ {
+ /* Node and runtime for this frame. */
+ u32 node_runtime_index;
+
+ /* Frame index (in the heap). */
+ u32 frame_index;
+
+ /* Start of next frames for this node. */
+ u32 next_frame_index;
+
+ /* Special value for next_frame_index when there is no next frame. */
+ #define VLIB_PENDING_FRAME_NO_NEXT_FRAME ((u32) ~0)
+ } vlib_pending_frame_t;
+```
+
+Here is the code in .../src/vlib/main.c:vlib_main_or_worker_loop()
+which processes frames:
+
+```c
+ /*
+ * Input nodes may have added work to the pending vector.
+ * Process pending vector until there is nothing left.
+ * All pending vectors will be processed from input -> output.
+ */
+ for (i = 0; i < _vec_len (nm->pending_frames); i++)
+ cpu_time_now = dispatch_pending_node (vm, i, cpu_time_now);
+ /* Reset pending vector for next iteration. */
+```
+
+The pending frame node_runtime_index associates the frame with the
+node which will process it.
+
+Complications
+-------------
+
+Fasten your seatbelt. Here's where the story - and the data structures
+\- become quite complicated...
+
+At 100,000 feet: vpp uses a directed graph, not a directed _acyclic_
+graph. It's really quite normal for a packet to visit ip\[46\]-lookup
+multiple times. The worst-case: a graph node which enqueues packets to
+itself.
+
+To deal with this issue, the graph dispatcher must force allocation of
+a new frame if the current graph node's dispatch function happens to
+enqueue a packet back to itself.
+
+There are no guarantees that a pending frame will be processed
+immediately, which means that more packets may be added to the
+underlying vlib_frame_t after it has been attached to a
+vlib_pending_frame_t. Care must be taken to allocate new
+frames and pending frames if a (pending\_frame, frame) pair fills.
+
+Next frames, next frame ownership
+---------------------------------
+
+The vlib\_next\_frame\_t is the last key graph dispatcher data structure:
+
+```c
+ typedef struct
+ {
+ /* Frame index. */
+ u32 frame_index;
+
+ /* Node runtime for this next. */
+ u32 node_runtime_index;
+
+ /* Next frame flags. */
+ u32 flags;
+
+ /* Reflects node frame-used flag for this next. */
+ #define VLIB_FRAME_NO_FREE_AFTER_DISPATCH \
+ VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH
+
+ /* This next frame owns enqueue to node
+ corresponding to node_runtime_index. */
+ #define VLIB_FRAME_OWNER (1 << 15)
+
+ /* Set when frame has been allocated for this next. */
+ #define VLIB_FRAME_IS_ALLOCATED VLIB_NODE_FLAG_IS_OUTPUT
+
+ /* Set when frame has been added to pending vector. */
+ #define VLIB_FRAME_PENDING VLIB_NODE_FLAG_IS_DROP
+
+ /* Set when frame is to be freed after dispatch. */
+ #define VLIB_FRAME_FREE_AFTER_DISPATCH VLIB_NODE_FLAG_IS_PUNT
+
+ /* Set when frame has traced packets. */
+ #define VLIB_FRAME_TRACE VLIB_NODE_FLAG_TRACE
+
+ /* Number of vectors enqueue to this next since last overflow. */
+ u32 vectors_since_last_overflow;
+ } vlib_next_frame_t;
+```
+
+Graph node dispatch functions call vlib\_get\_next\_frame (...) to
+set "(u32 \*)to_next" to the right place in the vlib_frame_t
+corresponding to the ith arc (aka next0) from the current node to the
+indicated next node.
+
+After some scuffling around - two levels of macros - processing
+reaches vlib\_get\_next\_frame_internal (...). Get-next-frame-internal
+digs up the vlib\_next\_frame\_t corresponding to the desired graph
+arc.
+
+The next frame data structure amounts to a graph-arc-centric frame
+cache. Once a node finishes adding elements to a frame, it will acquire
+a vlib_pending_frame_t and end up on the graph dispatcher's
+run-queue. But there's no guarantee that more vector elements won't be
+added to the underlying frame from the same (source\_node,
+next\_index) arc or from a different (source\_node, next\_index) arc.
+
+Maintaining consistency of the arc-to-frame cache is necessary. The
+first step in maintaining consistency is to make sure that only one
+graph node at a time thinks it "owns" the target vlib\_frame\_t.
+
+Back to the graph node dispatch function. In the usual case, a certain
+number of packets will be added to the vlib\_frame\_t acquired by
+calling vlib\_get\_next\_frame (...).
+
+Before a dispatch function returns, it's required to call
+vlib\_put\_next\_frame (...) for all of the graph arcs it actually
+used. This action adds a vlib\_pending\_frame\_t to the graph
+dispatcher's pending frame vector.
+
+Vlib\_put\_next\_frame makes a note in the pending frame of the frame
+index, and also of the vlib\_next\_frame\_t index.
+
+dispatch\_pending\_node actions
+-------------------------------
+
+The main graph dispatch loop calls dispatch pending node as shown
+above.
+
+Dispatch\_pending\_node recovers the pending frame, and the graph node
+runtime / dispatch function. Further, it recovers the next\_frame
+currently associated with the vlib\_frame\_t, and detaches the
+vlib\_frame\_t from the next\_frame.
+
+In .../src/vlib/main.c:dispatch\_pending\_node(...), note this stanza:
+
+```c
+ /* Force allocation of new frame while current frame is being
+ dispatched. */
+ restore_frame_index = ~0;
+ if (nf->frame_index == p->frame_index)
+ {
+ nf->frame_index = ~0;
+ nf->flags &= ~VLIB_FRAME_IS_ALLOCATED;
+ if (!(n->flags & VLIB_NODE_FLAG_FRAME_NO_FREE_AFTER_DISPATCH))
+ restore_frame_index = p->frame_index;
+ }
+```
+
+dispatch\_pending\_node is worth a hard stare due to the several
+second-order optimizations it implements. Almost as an afterthought,
+it calls dispatch_node which actually calls the graph node dispatch
+function.
+
+Process / thread model
+----------------------
+
+vlib provides an ultra-lightweight cooperative multi-tasking thread
+model. The graph node scheduler invokes these processes in much the same
+way as traditional vector-processing run-to-completion graph nodes;
+plus-or-minus a setjmp/longjmp pair required to switch stacks. Simply
+set the vlib\_node\_registration\_t type field to
+VLIB\_NODE\_TYPE\_PROCESS. Yes, process is a misnomer. These are
+cooperative multi-tasking threads.
+
+As of this writing, the default stack size is 1 << 15 = 32kb.
+Initialize the node registration's process\_log2\_n\_stack\_bytes member
+as needed. The graph node dispatcher makes some effort to detect stack
+overrun, e.g. by mapping a no-access page below each thread stack.
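+
+A hedged sketch of a process-node registration with an enlarged stack;
+the names are illustrative:
+
+```c
+  VLIB_REGISTER_NODE (my_process_node) =
+  {
+    .function = my_process_fn,            /* the while(1) loop, see below */
+    .name = "my-process",
+    .type = VLIB_NODE_TYPE_PROCESS,
+    .process_log2_n_stack_bytes = 16,     /* 2^16 = 64kb stack */
+  };
+```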
+
+Process node dispatch functions are expected to be "while(1) { }" loops
+which suspend when not otherwise occupied, and which must not run for
+unreasonably long periods of time.
+
+"Unreasonably long" is an application-dependent concept. Over the years,
+we have constructed frame-size sensitive control-plane nodes which will
+use a much higher fraction of the available CPU bandwidth when the frame
+size is low. The classic example: modifying forwarding tables. So long
+as the table-builder leaves the forwarding tables in a valid state, one
+can suspend the table builder to avoid dropping packets as a result of
+control-plane activity.
+
+Process nodes can suspend for fixed amounts of time, or until another
+entity signals an event, or both. See the next section for a description
+of the vlib process event mechanism.
+
+When running in vlib process context, one must pay strict attention to
+loop invariant issues. If one walks a data structure and calls a
+function which may suspend, one had best know by construction that it
+cannot change. Often, it's best to simply make a snapshot copy of a data
+structure, walk the copy at leisure, then free the copy.
+
+Process events
+--------------
+
+The vlib process event mechanism API is extremely lightweight and easy
+to use. Here is a typical example:
+
+```c
+ vlib_main_t *vm = &vlib_global_main;
+ uword event_type, * event_data = 0;
+
+ while (1)
+ {
+ vlib_process_wait_for_event_or_clock (vm, 5.0 /* seconds */);
+
+ event_type = vlib_process_get_events (vm, &event_data);
+
+ switch (event_type) {
+ case EVENT1:
+ handle_event1s (event_data);
+ break;
+
+ case EVENT2:
+ handle_event2s (event_data);
+ break;
+
+ case ~0: /* 5-second idle/periodic */
+ handle_idle ();
+ break;
+
+ default: /* bug! */
+ ASSERT (0);
+ }
+
+ vec_reset_length(event_data);
+ }
+```
+
+In this example, the VLIB process node waits for an event to occur, or
+for 5 seconds to elapse. The code demuxes on the event type, calling
+the appropriate handler function. Each call to
+vlib\_process\_get\_events returns a vector of per-event-type data
+passed to successive vlib\_process\_signal\_event calls; it is a
+serious error to process only event\_data\[0\].
+
+Resetting the event\_data vector-length to 0 \[instead of calling
+vec\_free\] means that the event scheme doesn't burn cycles continuously
+allocating and freeing the event data vector. This is a common vppinfra
+/ vlib coding pattern, well worth using when appropriate.
+
+Signaling an event is easy, for example:
+
+```c
+ vlib_process_signal_event (vm, process_node_index, EVENT1,
+ (uword)arbitrary_event1_data); /* and so forth */
+```
+
+One can either know the process node index by construction - dig it out
+of the appropriate vlib\_node\_registration\_t - or by finding the
+vlib\_node\_t with vlib\_get\_node\_by\_name(...).
+
+Buffers
+-------
+
+vlib buffering solves the usual set of packet-processing problems,
+albeit at high performance. Key in terms of performance: one ordinarily
+allocates / frees N buffers at a time rather than one at a time. Except
+when operating directly on a specific buffer, one deals with buffers by
+index, not by pointer.
+
+Packet-processing frames are u32\[\] arrays, not
+vlib\_buffer\_t\[\] arrays.
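+
+A hedged sketch of the index-to-pointer dance inside a dispatch
+function, where "from" points at the frame's u32 buffer indices:
+
+```c
+  u32 bi0 = from[0];                        /* buffer index from the frame */
+  vlib_buffer_t *b0 = vlib_get_buffer (vm, bi0);
+
+  u8 *data = vlib_buffer_get_current (b0);  /* start of current packet data */
+  u16 len = b0->current_length;             /* bytes in this buffer */
+```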
+
+Packets comprise one or more vlib buffers, chained together as required.
+Multiple particle sizes are supported; hardware input nodes simply ask
+for the required size(s). Coalescing support is available. For obvious
+reasons one is discouraged from writing one's own wild and wacky buffer
+chain traversal code.
+
+vlib buffer headers are allocated immediately prior to the buffer data
+area. In typical packet processing this saves a dependent read wait:
+given a buffer's address, one can prefetch the buffer header
+\[metadata\] at the same time as the first cache line of buffer data.
+
+Buffer header metadata (vlib\_buffer\_t) includes the usual rewrite
+expansion space, a current\_data offset, RX and TX interface indices,
+packet trace information, and opaque areas.
+
+The opaque data is intended to control packet processing in arbitrary
+subgraph-dependent ways. The programmer shoulders responsibility for
+data lifetime analysis, type-checking, etc.
+
+Buffers have reference-counts in support of e.g. multicast replication.
+
+Shared-memory message API
+-------------------------
+
+Local control-plane and application processes interact with the vpp
+dataplane via asynchronous message-passing in shared memory over
+unidirectional queues. The same application APIs are available via
+sockets.
+
+Capturing API traces and replaying them in a simulation environment
+requires a disciplined approach to the problem. This seems like a
+make-work task, but it is not. When something goes wrong in the
+control-plane after 300,000 or 3,000,000 operations, high-speed replay
+of the events leading up to the accident is a huge win.
+
+The shared-memory message API message allocator vl\_api\_msg\_alloc uses
+a particularly cute trick. Since messages are processed in order, we try
+to allocate message buffering from a set of fixed-size, preallocated
+rings. Each ring item has a "busy" bit. Freeing one of the preallocated
+message buffers merely requires the message consumer to clear the busy
+bit. No locking required.
+
+Debug CLI
+---------
+
+Adding debug CLI commands to VLIB applications is very simple.
+
+Here is a complete example:
+
+```c
+ static clib_error_t *
+ show_ip_tuple_match (vlib_main_t * vm,
+ unformat_input_t * input,
+ vlib_cli_command_t * cmd)
+ {
+ vlib_cli_output (vm, "%U\n", format_ip_tuple_match_tables, &routing_main);
+ return 0;
+ }
+
+ /* *INDENT-OFF* */
+ static VLIB_CLI_COMMAND (show_ip_tuple_command) =
+ {
+ .path = "show ip tuple match",
+ .short_help = "Show ip 5-tuple match-and-broadcast tables",
+ .function = show_ip_tuple_match,
+ };
+ /* *INDENT-ON* */
+```
+
+This example implements the "show ip tuple match" debug cli
+command. In ordinary usage, the vlib cli is available via the "vppctl"
+application, which sends traffic to a named pipe. One can also enable
+debug CLI telnet access on a configurable port.
+
+The cli implementation has an output redirection facility which makes it
+simple to deliver cli output via shared-memory API messaging.
+
+Particularly for debug or "show tech support" type commands, it would be
+wasteful to write vlib application code to pack binary data, write more
+code elsewhere to unpack the data and finally print the answer. If a
+certain cli command has the potential to hurt packet processing
+performance by running for too long, do the work incrementally in a
+process node. The client can wait.
diff --git a/docs/gettingstarted/developers/vnet.md b/docs/gettingstarted/developers/vnet.md
new file mode 100644
index 00000000000..191a2a16969
--- /dev/null
+++ b/docs/gettingstarted/developers/vnet.md
@@ -0,0 +1,171 @@
+
+VNET (VPP Network Stack)
+========================
+
+The files associated with the VPP network stack layer are located in the
+./src/vnet folder. The Network Stack Layer is basically an
+instantiation of the code in the other layers. This layer has a vnet
+library that provides vectorized layer-2 and 3 networking graph nodes, a
+packet generator, and a packet tracer.
+
+In terms of building a packet processing application, vnet provides a
+platform-independent subgraph to which one connects a couple of
+device-driver nodes.
+
+Typical RX connections include "ethernet-input" \[full software
+classification, feeds ipv4-input, ipv6-input, arp-input etc.\] and
+"ipv4-input-no-checksum" \[if hardware can classify, perform ipv4 header
+checksum\].
+
+![image](/_images/VNET_Features.png)
+
+The diagram above shows the features and layer areas that VNET works with.
+
+Effective graph dispatch function coding
+----------------------------------------
+
+Over the years, multiple coding styles have emerged: a
+single/dual/quad loop coding model (with variations) and a
+fully-pipelined coding model.
+
+Single/dual loops
+-----------------
+
+The single/dual/quad loop model variations conveniently solve problems
+where the number of items to process is not known in advance: typical
+hardware RX-ring processing. This coding style is also very effective
+when a given node will not need to cover a complex set of dependent
+reads.
+
+Here is a quad/single loop which can leverage up-to-avx512 SIMD vector
+units to convert buffer indices to buffer pointers:
+
+```c
+ static uword
+ simulated_ethernet_interface_tx (vlib_main_t * vm,
+ vlib_node_runtime_t *
+ node, vlib_frame_t * frame)
+ {
+ u32 n_left_from, *from;
+ u32 next_index = 0;
+ u32 n_bytes;
+ u32 thread_index = vm->thread_index;
+ vnet_main_t *vnm = vnet_get_main ();
+ vnet_interface_main_t *im = &vnm->interface_main;
+ vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
+ u16 nexts[VLIB_FRAME_SIZE], *next;
+
+ n_left_from = frame->n_vectors;
+ from = vlib_frame_args (frame);
+
+ /*
+ * Convert up to VLIB_FRAME_SIZE indices in "from" to
+ * buffer pointers in bufs[]
+ */
+ vlib_get_buffers (vm, from, bufs, n_left_from);
+ b = bufs;
+ next = nexts;
+
+ /*
+ * While we have at least 4 vector elements (pkts) to process..
+ */
+ while (n_left_from >= 4)
+ {
+ /* Prefetch next quad-loop iteration. */
+ if (PREDICT_TRUE (n_left_from >= 8))
+ {
+ vlib_prefetch_buffer_header (b[4], STORE);
+ vlib_prefetch_buffer_header (b[5], STORE);
+ vlib_prefetch_buffer_header (b[6], STORE);
+ vlib_prefetch_buffer_header (b[7], STORE);
+ }
+
+ /*
+ * $$$ Process 4x packets right here...
+ * set next[0..3] to send the packets where they need to go
+ */
+
+ do_something_to (b[0]);
+ do_something_to (b[1]);
+ do_something_to (b[2]);
+ do_something_to (b[3]);
+
+ /* Process the next 0..4 packets */
+ b += 4;
+ next += 4;
+ n_left_from -= 4;
+ }
+ /*
+ * Clean up 0...3 remaining packets at the end of the incoming frame
+ */
+ while (n_left_from > 0)
+ {
+ /*
+ * $$$ Process one packet right here...
+ * set next[0] to send the packet where it needs to go
+ */
+ do_something_to (b[0]);
+
+ /* Process the next packet */
+ b += 1;
+ next += 1;
+ n_left_from -= 1;
+ }
+
+ /*
+ * Send the packets along their respective next-node graph arcs
+ * Considerable locality of reference is expected, most if not all
+ * packets in the inbound vector will traverse the same next-node
+ * arc
+ */
+ vlib_buffer_enqueue_to_next (vm, node, from, nexts, frame->n_vectors);
+
+ return frame->n_vectors;
+ }
+```
+
+Given a packet processing task to implement, it pays to scout around
+looking for similar tasks, and think about using the same coding
+pattern. It is not uncommon to recode a given graph node dispatch function
+several times during performance optimization.
+
+Packet tracer
+-------------
+
+Vlib includes a frame element \[packet\] trace facility, with a simple
+vlib cli interface. The cli is straightforward: "trace add
+input-node-name count".
+
+To trace 100 packets on a typical x86\_64 system running the dpdk
+plugin: "trace add dpdk-input 100". When using the packet generator:
+"trace add pg-input 100"
+
+Each graph node has the opportunity to capture its own trace data. It is
+almost always a good idea to do so. The trace capture APIs are simple.
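+
+A hedged sketch of the capture side, typically placed in the dispatch
+function's per-packet section; my_node_trace_t and its fields are
+illustrative:
+
+```c
+  typedef struct { u32 next_index; u32 sw_if_index; } my_node_trace_t;
+
+  /* In the dispatch function, per packet: */
+  if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED))
+    {
+      my_node_trace_t *t = vlib_add_trace (vm, node, b0, sizeof (*t));
+      t->next_index = next0;
+      t->sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX];
+    }
+```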
+
+The packet capture APIs snapshot binary data, to minimize processing at
+capture time. Each participating graph node initialization provides a
+vppinfra format-style user function to pretty-print data when required
+by the VLIB "show trace" command.
+
+Set the VLIB node registration ".format\_trace" member to the name of
+the per-graph node format function.
+
+Here's a simple example:
+
+```c
+ u8 * my_node_format_trace (u8 * s, va_list * args)
+ {
+ vlib_main_t * vm = va_arg (*args, vlib_main_t *);
+ vlib_node_t * node = va_arg (*args, vlib_node_t *);
+ my_node_trace_t * t = va_arg (*args, my_node_trace_t *);
+
+ s = format (s, "My trace data was: %d", t-><whatever>);
+
+ return s;
+ }
+```
+
+The trace framework hands the per-node format function the data it
+captured as the packet whizzed by. The format function pretty-prints the
+data as desired.