summaryrefslogtreecommitdiffstats
path: root/docs/gettingstarted/developers/infrastructure.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/gettingstarted/developers/infrastructure.md')
-rw-r--r--docs/gettingstarted/developers/infrastructure.md614
1 files changed, 0 insertions, 614 deletions
diff --git a/docs/gettingstarted/developers/infrastructure.md b/docs/gettingstarted/developers/infrastructure.md
deleted file mode 100644
index a61068c75cd..00000000000
--- a/docs/gettingstarted/developers/infrastructure.md
+++ /dev/null
@@ -1,614 +0,0 @@
-VPPINFRA (Infrastructure)
-=========================
-
-The files associated with the VPP Infrastructure layer are located in
-the ./src/vppinfra folder.
-
-VPPinfra is a collection of basic c-library services, quite
-sufficient to build standalone programs to run directly on bare metal.
-It also provides high-performance dynamic arrays, hashes, bitmaps,
-high-precision real-time clock support, fine-grained event-logging, and
-data structure serialization.
-
-One fair comment / fair warning about vppinfra: you can\'t always tell a
-macro from an inline function from an ordinary function simply by name.
-Macros are used to avoid function calls in the typical case, and to
-cause (intentional) side-effects.
-
-Vppinfra has been around for almost 20 years and tends not to change
-frequently. The VPP Infrastructure layer contains the following
-functions:
-
-Vectors
--------
-
-Vppinfra vectors are ubiquitous dynamically resized arrays with by user
-defined \"headers\". Many vpppinfra data structures (e.g. hash, heap,
-pool) are vectors with various different headers.
-
-The memory layout looks like this:
-
-```
- User header (optional, uword aligned)
- Alignment padding (if needed)
- Vector length in elements
- User's pointer -> Vector element 0
- Vector element 1
- ...
- Vector element N-1
-```
-
-As shown above, the vector APIs deal with pointers to the 0th element of
-a vector. Null pointers are valid vectors of length zero.
-
-To avoid thrashing the memory allocator, one often resets the length of
-a vector to zero while retaining the memory allocation. Set the vector
-length field to zero via the vec\_reset\_length(v) macro. \[Use the
-macro! It's smart about NULL pointers.\]
-
-Typically, the user header is not present. User headers allow for other
-data structures to be built atop vppinfra vectors. Users may specify the
-alignment for first data element of a vector via the \[vec\]()\*\_aligned
-macros.
-
-Vector elements can be any C type e.g. (int, double, struct bar). This
-is also true for data types built atop vectors (e.g. heap, pool, etc.).
-Many macros have \_a variants supporting alignment of vector elements
-and \_h variants supporting non-zero-length vector headers. The \_ha
-variants support both. Additionally cacheline alignment within a
-vector element structure can be specified using the
-\[CLIB_CACHE_LINE_ALIGN_MARK\]() macro.
-
-Inconsistent usage of header and/or alignment related macro variants
-will cause delayed, confusing failures.
-
-Standard programming error: memorize a pointer to the ith element of a
-vector, and then expand the vector. Vectors expand by 3/2, so such code
-may appear to work for a period of time. Correct code almost always
-memorizes vector **indices** which are invariant across reallocations.
-
-In typical application images, one supplies a set of global functions
-designed to be called from gdb. Here are a few examples:
-
-- vl(v) - prints vec\_len(v)
-- pe(p) - prints pool\_elts(p)
-- pifi(p, index) - prints pool\_is\_free\_index(p, index)
-- debug\_hex\_bytes (p, nbytes) - hex memory dump nbytes starting at p
-
-Use the "show gdb" debug CLI command to print the current set.
-
-Bitmaps
--------
-
-Vppinfra bitmaps are dynamic, built using the vppinfra vector APIs.
-Quite handy for a variety jobs.
-
-Pools
------
-
-Vppinfra pools combine vectors and bitmaps to rapidly allocate and free
-fixed-size data structures with independent lifetimes. Pools are perfect
-for allocating per-session structures.
-
-Hashes
-------
-
-Vppinfra provides several hash flavors. Data plane problems involving
-packet classification / session lookup often use
-./src/vppinfra/bihash\_template.\[ch\] bounded-index extensible
-hashes. These templates are instantiated multiple times, to efficiently
-service different fixed-key sizes.
-
-Bihashes are thread-safe. Read-locking is not required. A simple
-spin-lock ensures that only one thread writes an entry at a time.
-
-The original vppinfra hash implementation in
-./src/vppinfra/hash.\[ch\] are simple to use, and are often used in
-control-plane code which needs exact-string-matching.
-
-In either case, one almost always looks up a key in a hash table to
-obtain an index in a related vector or pool. The APIs are simple enough,
-but one must take care when using the unmanaged arbitrary-sized key
-variant. Hash\_set\_mem (hash\_table, key\_pointer, value) memorizes
-key\_pointer. It is usually a bad mistake to pass the address of a
-vector element as the second argument to hash\_set\_mem. It is perfectly
-fine to memorize constant string addresses in the text segment.
-
-Timekeeping
------------
-
-Vppinfra includes high-precision, low-cost timing services. The
-datatype clib_time_t and associated functions reside in
-./src/vppinfra/time.\[ch\]. Call clib_time_init (clib_time_t \*cp) to
-initialize the clib_time_t object.
-
-Clib_time_init(...) can use a variety of different ways to establish
-the hardware clock frequency. At the end of the day, vppinfra
-timekeeping takes the attitude that the operating system's clock is
-the closest thing to a gold standard it has handy.
-
-When properly configured, NTP maintains kernel clock synchronization
-with a highly accurate off-premises reference clock. Notwithstanding
-network propagation delays, a synchronized NTP client will keep the
-kernel clock accurate to within 50ms or so.
-
-Why should one care? Simply put, oscillators used to generate CPU
-ticks aren't super accurate. They work pretty well, but a 0.1% error
-wouldn't be out of the question. That's a minute and a half's worth of
-error in 1 day. The error changes constantly, due to temperature
-variation, and a host of other physical factors.
-
-It's far too expensive to use system calls for timing, so we're left
-with the problem of continously adjusting our view of the CPU tick
-register's clocks_per_second parameter.
-
-The clock rate adjustment algorithm measures the number of cpu ticks
-and the "gold standard" reference time across an interval of
-approximately 16 seconds. We calculate clocks_per_second for the
-interval: use rdtsc (on x86_64) and a system call to get the latest
-cpu tick count and the kernel's latest nanosecond timestamp. We
-subtract the previous interval end values, and use exponential
-smoothing to merge the new clock rate sample into the clocks_per_second
-parameter.
-
-As of this writing, we maintain the clock rate by way of the following
-first-order differential equation:
-
-
-```
- clocks_per_second(t) = clocks_per_second(t-1) * K + sample_cps(t)*(1-K)
- where K = e**(-1.0/3.75);
-```
-
-This yields a per observation "half-life" of 1 minute. Empirically,
-the clock rate converges within 5 minutes, and appears to maintain
-near-perfect agreement with the kernel clock in the face of ongoing
-NTP time adjustments.
-
-See ./src/vppinfra/time.c:clib_time_verify_frequency(...) to look at
-the rate adjustment algorithm. The code rejects frequency samples
-corresponding to the sort of adjustment which might occur if someone
-changes the gold standard kernel clock by several seconds.
-
-### Monotonic timebase support
-
-Particularly during system initialization, the "gold standard" system
-reference clock can change by a large amount, in an instant. It's not
-a best practice to yank the reference clock - in either direction - by
-hours or days. In fact, some poorly-constructed use-cases do so.
-
-To deal with this reality, clib_time_now(...) returns the number of
-seconds since vpp started, *guaranteed to be monotonically
-increasing, no matter what happens to the system reference clock*.
-
-This is first-order important, to avoid breaking every active timer in
-the system. The vpp host stack alone may account for tens of millions
-of active timers. It's utterly impractical to track down and fix
-timers, so we must deal with the issue at the timebase level.
-
-Here's how it works. Prior to adjusting the clock rate, we collect the
-kernel reference clock and the cpu clock:
-
-```
- /* Ask the kernel and the CPU what time it is... */
- now_reference = unix_time_now ();
- now_clock = clib_cpu_time_now ();
-```
-
-Compute changes for both clocks since the last rate adjustment,
-roughly 15 seconds ago:
-
-```
- /* Compute change in the reference clock */
- delta_reference = now_reference - c->last_verify_reference_time;
-
- /* And change in the CPU clock */
- delta_clock_in_seconds = (f64) (now_clock - c->last_verify_cpu_time) *
- c->seconds_per_clock;
-```
-
-Delta_reference is key. Almost 100% of the time, delta_reference and
-delta_clock_in_seconds are identical modulo one system-call
-time. However, NTP or a privileged user can yank the system reference
-time - in either direction - by an hour, a day, or a decade.
-
-As described above, clib_time_now(...) must return monotonically
-increasing answers to the question "how long has it been since vpp
-started, in seconds." To do that, the clock rate adjustment algorithm
-begins by recomputing the initial reference time:
-
-```
- c->init_reference_time += (delta_reference - delta_clock_in_seconds);
-```
-
-It's easy to convince yourself that if the reference clock changes by
-15.000000 seconds and the cpu clock tick time changes by 15.000000
-seconds, the initial reference time won't change.
-
-If, on the other hand, delta_reference is -86400.0 and delta clock is
-15.0 - reference time jumped backwards by exactly one day in a
-15-second rate update interval - we add -86415.0 to the initial
-reference time.
-
-Given the corrected initial reference time, we recompute the total
-number of cpu ticks which have occurred since the corrected initial
-reference time, at the current clock tick rate:
-
-```
- c->total_cpu_time = (now_reference - c->init_reference_time)
- * c->clocks_per_second;
-```
-
-### Timebase precision
-
-Cognoscenti may notice that vlib/clib\_time\_now(...) return a 64-bit
-floating-point value; the number of seconds since vpp started.
-
-Please see [this Wikipedia
-article](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
-for more information. C double-precision floating point numbers
-(called f64 in the vpp code base) have a 53-bit effective mantissa,
-and can accurately represent 15 decimal digits' worth of precision.
-
-There are 315,360,000.000001 seconds in ten years plus one
-microsecond. That string has exactly 15 decimal digits. The vpp time
-base retains 1us precision for roughly 30 years.
-
-vlib/clib\_time\_now do *not* provide precision in excess of 1e-6
-seconds. If necessary, please use clib_cpu_time_now(...) for direct
-access to the CPU clock-cycle counter. Note that the number of CPU
-clock cycles per second varies significantly across CPU architectures.
-
-Timer Wheels
-------------
-
-Vppinfra includes configurable timer wheel support. See the source
-code in .../src/vppinfra/tw_timer_template.[ch], as well as a
-considerable number of template instances defined in
-.../src/vppinfra/tw_timer_<wheel-geometry-spec>.[ch].
-
-Instantiation of tw_timer_template.h generates named structures to
-implement specific timer wheel geometries. Choices include: number of
-timer wheels (currently, 1 or 2), number of slots per ring (a power of
-two), and the number of timers per "object handle".
-
-Internally, user object/timer handles are 32-bit integers, so if one
-selects 16 timers/object (4 bits), the resulting timer wheel handle is
-limited to 2**28 objects.
-
-Here are the specific settings required to generate a single 2048 slot
-wheel which supports 2 timers per object:
-
-```
- #define TW_TIMER_WHEELS 1
- #define TW_SLOTS_PER_RING 2048
- #define TW_RING_SHIFT 11
- #define TW_RING_MASK (TW_SLOTS_PER_RING -1)
- #define TW_TIMERS_PER_OBJECT 2
- #define LOG2_TW_TIMERS_PER_OBJECT 1
- #define TW_SUFFIX _2t_1w_2048sl
- #define TW_FAST_WHEEL_BITMAP 0
- #define TW_TIMER_ALLOW_DUPLICATE_STOP 0
-```
-
-See tw_timer_2t_1w_2048sl.h for a complete
-example.
-
-tw_timer_template.h is not intended to be #included directly. Client
-codes can include multiple timer geometry header files, although
-extreme caution would required to use the TW and TWT macros in such a
-case.
-
-### API usage examples
-
-The unit test code in .../src/vppinfra/test_tw_timer.c provides a
-concrete API usage example. It uses a synthetic clock to rapidly
-exercise the underlying tw_timer_expire_timers(...) template.
-
-There are not many API routines to call.
-
-#### Initialize a two-timer, single 2048-slot wheel w/ a 1-second timer granularity
-
-```
- tw_timer_wheel_init_2t_1w_2048sl (&tm->single_wheel,
- expired_timer_single_callback,
- 1.0 / * timer interval * / );
-```
-
-#### Start a timer
-
-```
- handle = tw_timer_start_2t_1w_2048sl (&tm->single_wheel, elt_index,
- [0 | 1] / * timer id * / ,
- expiration_time_in_u32_ticks);
-```
-
-#### Stop a timer
-
-```
- tw_timer_stop_2t_1w_2048sl (&tm->single_wheel, handle);
-```
-
-#### An expired timer callback
-
-```
- static void
- expired_timer_single_callback (u32 * expired_timers)
- {
- int i;
- u32 pool_index, timer_id;
- tw_timer_test_elt_t *e;
- tw_timer_test_main_t *tm = &tw_timer_test_main;
-
- for (i = 0; i < vec_len (expired_timers);
- {
- pool_index = expired_timers[i] & 0x7FFFFFFF;
- timer_id = expired_timers[i] >> 31;
-
- ASSERT (timer_id == 1);
-
- e = pool_elt_at_index (tm->test_elts, pool_index);
-
- if (e->expected_to_expire != tm->single_wheel.current_tick)
- {
- fformat (stdout, "[%d] expired at %d not %d\n",
- e - tm->test_elts, tm->single_wheel.current_tick,
- e->expected_to_expire);
- }
- pool_put (tm->test_elts, e);
- }
- }
-```
-
-We use wheel timers extensively in the vpp host stack. Each TCP
-session needs 5 timers, so supporting 10 million flows requires up to
-50 million concurrent timers.
-
-Timers rarely expire, so it's of utmost important that stopping and
-restarting a timer costs as few clock cycles as possible.
-
-Stopping a timer costs a doubly-linked list dequeue. Starting a timer
-involves modular arithmetic to determine the correct timer wheel and
-slot, and a list head enqueue.
-
-Expired timer processing generally involves bulk link-list retirement
-with user callback presentation. Some additional complexity at wheel
-wrap time, to relocate timers from slower-turning timer wheels into
-faster-turning wheels.
-
-Format
-------
-
-Vppinfra format is roughly equivalent to printf.
-
-Format has a few properties worth mentioning. Format's first argument is
-a (u8 \*) vector to which it appends the result of the current format
-operation. Chaining calls is very easy:
-
-```c
- u8 * result;
-
- result = format (0, "junk = %d, ", junk);
- result = format (result, "more junk = %d\n", more_junk);
-```
-
-As previously noted, NULL pointers are perfectly proper 0-length
-vectors. Format returns a (u8 \*) vector, **not** a C-string. If you
-wish to print a (u8 \*) vector, use the "%v" format string. If you need
-a (u8 \*) vector which is also a proper C-string, either of these
-schemes may be used:
-
-```c
- vec_add1 (result, 0)
- or
- result = format (result, "<whatever>%c", 0);
-```
-
-Remember to vec\_free() the result if appropriate. Be careful not to
-pass format an uninitialized (u8 \*).
-
-Format implements a particularly handy user-format scheme via the "%U"
-format specification. For example:
-
-```c
- u8 * format_junk (u8 * s, va_list *va)
- {
- junk = va_arg (va, u32);
- s = format (s, "%s", junk);
- return s;
- }
-
- result = format (0, "junk = %U, format_junk, "This is some junk");
-```
-
-format\_junk() can invoke other user-format functions if desired. The
-programmer shoulders responsibility for argument type-checking. It is
-typical for user format functions to blow up spectacularly if the
-va\_arg(va, type) macros don't match the caller's idea of reality.
-
-Unformat
---------
-
-Vppinfra unformat is vaguely related to scanf, but considerably more
-general.
-
-A typical use case involves initializing an unformat\_input\_t from
-either a C-string or a (u8 \*) vector, then parsing via unformat() as
-follows:
-
-```c
- unformat_input_t input;
- u8 *s = "<some-C-string>";
-
- unformat_init_string (&input, (char *) s, strlen((char *) s));
- /* or */
- unformat_init_vector (&input, <u8-vector>);
-```
-
-Then loop parsing individual elements:
-
-```c
- while (unformat_check_input (&input) != UNFORMAT_END_OF_INPUT)
- {
- if (unformat (&input, "value1 %d", &value1))
- ;/* unformat sets value1 */
- else if (unformat (&input, "value2 %d", &value2)
- ;/* unformat sets value2 */
- else
- return clib_error_return (0, "unknown input '%U'",
- format_unformat_error, input);
- }
-```
-
-As with format, unformat implements a user-unformat function capability
-via a "%U" user unformat function scheme. Generally, one can trivially
-transform "format (s, "foo %d", foo) -> "unformat (input, "foo %d", &foo)".
-
-Unformat implements a couple of handy non-scanf-like format specifiers:
-
-```c
- unformat (input, "enable %=", &enable, 1 /* defaults to 1 */);
- unformat (input, "bitzero %|", &mask, (1<<0));
- unformat (input, "bitone %|", &mask, (1<<1));
- <etc>
-```
-
-The phrase "enable %=" means "set the supplied variable to the default
-value" if unformat parses the "enable" keyword all by itself. If
-unformat parses "enable 123" set the supplied variable to 123.
-
-We could clean up a number of hand-rolled "verbose" + "verbose %d"
-argument parsing codes using "%=".
-
-The phrase "bitzero %|" means "set the specified bit in the supplied
-bitmask" if unformat parses "bitzero". Although it looks like it could
-be fairly handy, it's very lightly used in the code base.
-
-`%_` toggles whether or not to skip input white space.
-
-For transition from skip to no-skip in middle of format string, skip input white space. For example, the following:
-
-```c
-fmt = "%_%d.%d%_->%_%d.%d%_"
-unformat (input, fmt, &one, &two, &three, &four);
-```
-matches input "1.2 -> 3.4".
-Without this, the space after -> does not get skipped.
-
-
-```
-
-### How to parse a single input line
-
-Debug CLI command functions MUST NOT accidentally consume input
-belonging to other debug CLI commands. Otherwise, it's impossible to
-script a set of debug CLI commands which "work fine" when issued one
-at a time.
-
-This bit of code is NOT correct:
-
-```c
- /* Eats script input NOT beloging to it, and chokes! */
- while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
- {
- if (unformat (input, ...))
- ;
- else if (unformat (input, ...))
- ;
- else
- return clib_error_return (0, "parse error: '%U'",
- format_unformat_error, input);
- }
- }
-```
-
-When executed as part of a script, such a function will return "parse
-error: '<next-command-text>'" every time, unless it happens to be the
-last command in the script.
-
-Instead, use "unformat_line_input" to consume the rest of a line's
-worth of input - everything past the path specified in the
-VLIB_CLI_COMMAND declaration.
-
-For example, unformat_line_input with "my_command" set up as shown
-below and user input "my path is clear" will produce an
-unformat_input_t that contains "is clear".
-
-```c
- VLIB_CLI_COMMAND (...) = {
- .path = "my path",
- };
-```
-
-Here's a bit of code which shows the required mechanics, in full:
-
-```c
- static clib_error_t *
- my_command_fn (vlib_main_t * vm,
- unformat_input_t * input,
- vlib_cli_command_t * cmd)
- {
- unformat_input_t _line_input, *line_input = &_line_input;
- u32 this, that;
- clib_error_t *error = 0;
-
- if (!unformat_user (input, unformat_line_input, line_input))
- return 0;
-
- /*
- * Here, UNFORMAT_END_OF_INPUT is at the end of the line we consumed,
- * not at the end of the script...
- */
- while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT)
- {
- if (unformat (line_input, "this %u", &this))
- ;
- else if (unformat (line_input, "that %u", &that))
- ;
- else
- {
- error = clib_error_return (0, "parse error: '%U'",
- format_unformat_error, line_input);
- goto done;
- }
- }
-
- <do something based on "this" and "that", etc>
-
- done:
- unformat_free (line_input);
- return error;
- }
- /* *INDENT-OFF* */
- VLIB_CLI_COMMAND (my_command, static) = {
- .path = "my path",
- .function = my_command_fn",
- };
- /* *INDENT-ON* */
-
-```
-
-
-Vppinfra errors and warnings
-----------------------------
-
-Many functions within the vpp dataplane have return-values of type
-clib\_error\_t \*. Clib\_error\_t's are arbitrary strings with a bit of
-metadata \[fatal, warning\] and are easy to announce. Returning a NULL
-clib\_error\_t \* indicates "A-OK, no error."
-
-Clib\_warning(format-args) is a handy way to add debugging
-output; clib warnings prepend function:line info to unambiguously locate
-the message source. Clib\_unix\_warning() adds perror()-style Linux
-system-call information. In production images, clib\_warnings result in
-syslog entries.
-
-Serialization
--------------
-
-Vppinfra serialization support allows the programmer to easily serialize
-and unserialize complex data structures.
-
-The underlying primitive serialize/unserialize functions use network
-byte-order, so there are no structural issues serializing on a
-little-endian host and unserializing on a big-endian host.