Diffstat (limited to 'docs/content/methodology/measurements')
-rw-r--r--docs/content/methodology/measurements/_index.md12
-rw-r--r--docs/content/methodology/measurements/data_plane_throughput/_index.md13
-rw-r--r--docs/content/methodology/measurements/data_plane_throughput/data_plane_throughput.md139
-rw-r--r--docs/content/methodology/measurements/data_plane_throughput/mlr_search.md112
-rw-r--r--docs/content/methodology/measurements/data_plane_throughput/mrr.md56
-rw-r--r--docs/content/methodology/measurements/data_plane_throughput/plr_search.md386
-rw-r--r--docs/content/methodology/measurements/packet_latency.md52
-rw-r--r--docs/content/methodology/measurements/telemetry.md158
8 files changed, 928 insertions, 0 deletions
diff --git a/docs/content/methodology/measurements/_index.md b/docs/content/methodology/measurements/_index.md
new file mode 100644
index 0000000000..21176fef80
--- /dev/null
+++ b/docs/content/methodology/measurements/_index.md
@@ -0,0 +1,12 @@
+---
+bookCollapseSection: true
+bookFlatSection: false
+title: "Measurements"
+weight: 2
+---
+
+# Measurements
+
+- [Data Plane Throughput]({{< relref "/methodology/measurements/data_plane_throughput" >}})
+- [Packet Latency]({{< relref "/methodology/measurements/packet_latency" >}})
+- [Telemetry]({{< relref "/methodology/measurements/telemetry" >}})
diff --git a/docs/content/methodology/measurements/data_plane_throughput/_index.md b/docs/content/methodology/measurements/data_plane_throughput/_index.md
new file mode 100644
index 0000000000..30b55f149d
--- /dev/null
+++ b/docs/content/methodology/measurements/data_plane_throughput/_index.md
@@ -0,0 +1,13 @@
+---
+bookCollapseSection: true
+bookFlatSection: false
+title: "Data Plane Throughput"
+weight: 1
+---
+
+# Data Plane Throughput
+
+- [Overview]({{< relref "/methodology/measurements/data_plane_throughput/data_plane_throughput" >}})
+- [MLR Search]({{< relref "/methodology/measurements/data_plane_throughput/mlr_search" >}})
+- [PLR Search]({{< relref "/methodology/measurements/data_plane_throughput/plr_search" >}})
+- [MRR]({{< relref "/methodology/measurements/data_plane_throughput/mrr" >}}) \ No newline at end of file
diff --git a/docs/content/methodology/measurements/data_plane_throughput/data_plane_throughput.md b/docs/content/methodology/measurements/data_plane_throughput/data_plane_throughput.md
new file mode 100644
index 0000000000..c7dce24c1a
--- /dev/null
+++ b/docs/content/methodology/measurements/data_plane_throughput/data_plane_throughput.md
@@ -0,0 +1,139 @@
+---
+title: "Overview"
+weight: 1
+---
+
+# Data Plane Throughput
+
+Network data plane throughput is measured using multiple test methods in
+order to obtain representative and repeatable results across the large
+set of performance test cases implemented and executed within CSIT.
+
+The following throughput test methods are used:
+
+- MLRsearch - Multiple Loss Ratio search, used in NDRPDR tests.
+- PLRsearch - Probabilistic Loss Ratio search, used in SOAK tests.
+- MRR - Maximum Receive Rate tests, a method based on FRMOL from RFC 2285.
+
+The description of each test method is followed by the generic test properties
+shared by all methods.
+
+## NDRPDR Tests
+
+These tests employ MLRsearch to find two conditional throughput values:
+NDR for the zero loss ratio goal and PDR for the 0.5% loss ratio goal.
+
+### Algorithm Details
+
+See the [MLRsearch]({{< ref "mlr_search/#MLRsearch" >}}) section for more detail.
+MLRsearch is being standardized in IETF in
+[draft-ietf-bmwg-mlrsearch](https://datatracker.ietf.org/doc/html/draft-ietf-bmwg-mlrsearch-06).
+
+### Description
+
+The Multiple Loss Ratio search (MLRsearch) algorithm can discover multiple
+conditional throughputs in a single search,
+reducing the overall test execution time compared to a binary search.
+In FD.io CSIT, conditional throughputs are discovered for two search goals:
+Non-Drop Rate (NDR, zero loss ratio goal)
+and Partial Drop Rate (PDR, 0.5% loss ratio goal).
+Other inputs are common for both goals:
+Goal width is 0.5%, trial duration is 1 second, duration sum goal is 21 seconds
+and exceed ratio is 50%.
+
+The main algorithm expresses the conditional throughput based on one-port load.
+The results presented in CSIT show the aggregate load
+(the value from the search is doubled if the test uses bidirectional traffic).
+
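+As an illustration of that load accounting, here is a small sketch (not actual
+CSIT code; the function name and values are hypothetical):
+
+```python
+def aggregate_rate(one_port_rate_pps, bidirectional=True):
+    """Convert the one-port rate used by the search to the reported aggregate."""
+    return 2.0 * one_port_rate_pps if bidirectional else one_port_rate_pps
+
+# A one-port PDR conditional throughput of 5.0 Mpps is reported as
+# 10.0 Mpps aggregate for a bidirectional test.
+print(aggregate_rate(5_000_000))  # 10000000.0
+```
+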
+### Usage
+
+MLRsearch tests are run to discover NDR and PDR rates for each VPP and
+DPDK release covered by the CSIT report. Results for small frame sizes
+(64B/78B, IMIX) are presented in packet throughput graphs
+(Box-and-Whisker Plots) with NDR and PDR rates plotted against the test
+cases covering popular VPP packet paths.
+
+Each test is executed at least 10 times to verify measurement
+repeatability, and results are compared between releases and test
+environments. NDR and PDR packet and bandwidth throughput results for
+all frame sizes and for all tests are presented in detailed results
+tables.
+
+## SOAK Tests
+
+These tests employ PLRsearch to find a critical load value.
+
+### Algorithm Details
+
+See the [PLRsearch]({{< ref "plr_search/#PLRsearch" >}}) methodology section for
+more detail. PLRsearch is being standardized in IETF in
+[draft-vpolak-bmwg-plrsearch](https://tools.ietf.org/html/draft-vpolak-bmwg-plrsearch).
+
+### Description
+
+Probabilistic Loss Ratio search (PLRsearch) tests discover a packet
+throughput rate associated with a configured Packet Loss Ratio (PLR)
+target for tests run over an extended period of time, a.k.a. soak
+testing. PLRsearch assumes that the system under test is probabilistic in
+nature, not deterministic.
+
+### Usage
+
+PLRsearch tests are run to discover a critical load for PLR=10^-7^
+(close to NDR) for each VPP release covered by the CSIT report. Results for small
+frame sizes (64B/78B) are presented in packet throughput graphs (Box
+Plots) for a small subset of baseline tests.
+
+Each soak test lasts 30 minutes and is executed at least twice.
+
+## MRR Tests
+
+### Algorithm Details
+
+See [MRR Throughput]({{< ref "mrr/#MRR" >}})
+section for more detail about MRR test configuration.
+
+FD.io CSIT performance dashboard includes complete description of
+[daily performance trending tests]({{< ref "../../trending/analysis" >}})
+and [VPP per patch tests]({{< ref "../../per_patch_testing.md" >}}).
+
+### Description
+
+Maximum Receive Rate (MRR) tests are complementary to MLRsearch tests,
+as they provide a maximum “raw” throughput benchmark for the development and
+testing community.
+
+MRR tests measure the packet forwarding rate under the maximum load
+offered by the traffic generator (dependent on link type and NIC model) over
+a set trial duration, regardless of packet loss. The maximum load for a
+specified Ethernet frame size is set to the bi-directional link rate.
+
+### Usage
+
+MRR tests are much faster than MLRsearch as they rely on
+a small set of trials with very short duration. It is this property
+that makes them suitable for continuous execution in daily performance
+trending jobs, enabling detection of performance anomalies (regressions,
+progressions) resulting from data plane code changes.
+
+MRR tests are also used for VPP per-patch performance jobs verifying
+patch performance vs. parent. CSIT reports include MRR throughput
+comparisons between releases and test environments. Only small frame
+sizes are used (64B/78B, IMIX).
+
+## Generic Test Properties
+
+All data plane throughput test methodologies share the following generic
+properties:
+
+- Tested L2 frame sizes (untagged Ethernet):
+
+ - IPv4 payload: 64B, IMIX (28x64B, 16x570B, 4x1518B), 1518B, 9000B.
+ - IPv6 payload: 78B, IMIX (28x78B, 16x570B, 4x1518B), 1518B, 9000B.
+ - All quoted sizes include frame CRC, but exclude per frame
+ transmission overhead of 20B (preamble, inter frame gap).
+
+- Offered packet load is always bi-directional and symmetric.
+- All measured and reported packet and bandwidth rates are aggregate
+ bi-directional rates reported from external Traffic Generator
+ perspective.
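+
+To make the overhead accounting above concrete, here is a small sketch
+(assumed helper name and example link rate, not CSIT code) computing the
+theoretical maximum frame rate from the quoted L2 frame size and the 20B
+per-frame overhead:
+
+```python
+def max_frame_rate_pps(frame_size_bytes, link_rate_bps):
+    """Theoretical max frame rate: quoted L2 size plus 20B preamble/IFG."""
+    bits_on_wire = (frame_size_bytes + 20) * 8
+    return link_rate_bps / bits_on_wire
+
+# 64B frames on a 10GE link: ~14.88 Mpps per direction,
+# i.e. ~29.76 Mpps aggregate bi-directional rate.
+per_direction = max_frame_rate_pps(64, 10e9)
+print(round(per_direction / 1e6, 2), round(2 * per_direction / 1e6, 2))
+```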
diff --git a/docs/content/methodology/measurements/data_plane_throughput/mlr_search.md b/docs/content/methodology/measurements/data_plane_throughput/mlr_search.md
new file mode 100644
index 0000000000..71e4471905
--- /dev/null
+++ b/docs/content/methodology/measurements/data_plane_throughput/mlr_search.md
@@ -0,0 +1,112 @@
+---
+title: "MLR Search"
+weight: 2
+---
+
+# MLR Search
+
+## Overview
+
+Multiple Loss Ratio search (MLRsearch) tests use an optimized search algorithm
+implemented in the FD.io CSIT project. MLRsearch discovers the conditional throughput
+corresponding to any number of loss ratio goals, within a single search.
+
+Two loss ratio goals are of interest in FD.io CSIT, leading to Non-Drop Rate
+(NDR, the loss ratio goal is exactly zero) and Partial Drop Rate
+(PDR, 0.5% loss ratio goal).
+Instead of a single long trial, a sequence of short (1s) trials is done.
+Thus, instead of a final trial duration, a duration sum (21s) is prescribed.
+This allows the algorithm to make a decision sooner,
+when the results are quite one-sided.
+Also, only one half of the trial results is required to meet
+the loss ratio requirement, making the conditional throughput more stable.
+The conditional throughput in this case is in principle the median forwarding rate
+among all trials at the relevant lower bound intended load.
+In practice, the search stops when missing trial results cannot
+disprove the load as a lower bound, so the conditional throughput
+is the worst forwarding rate among the measured good trials.
+
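+A simplified sketch of the conditional throughput computation just described
+(not the MLRsearch library API; the trial values are made up):
+
+```python
+def conditional_throughput(trials, goal_loss_ratio, exceed_ratio=0.5):
+    """Trials are (forwarding_rate_pps, loss_ratio) pairs at the lower-bound load."""
+    good_rates = [rate for rate, loss in trials if loss <= goal_loss_ratio]
+    # At most exceed_ratio of the trials may miss the loss ratio goal.
+    if len(trials) - len(good_rates) > exceed_ratio * len(trials):
+        return None  # the load cannot be classified as a lower bound
+    # Worst forwarding rate among the trials that met the goal.
+    return min(good_rates)
+
+trials = [(9.9e6, 0.0), (9.8e6, 0.004), (9.7e6, 0.012), (9.95e6, 0.0)]
+print(conditional_throughput(trials, goal_loss_ratio=0.005))  # 9800000.0
+```
+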
+MLRsearch discovers all the loads in a single search, reducing the required
+duration compared to separate `binary search`es[^1] for each rate. Overall
+search time is reduced even further by relying on shorter trial
+duration sums for intermediate targets, with only measurements for
+final targets requiring the full duration sum. This results in a
+shorter overall execution time when compared to a standard NDR/PDR binary
+search, while guaranteeing similar results.
+
+ Note: The conditional throughput is *always* reported by Robot code
+ as a bi-directional aggregate of two (usually symmetric)
+ uni-directional packet rates received and reported by an
+ external traffic generator (TRex), unless the test specifically requires
+ unidirectional traffic. The underlying Python library uses
+ unidirectional values instead, as min and max load are given for those.
+
+## Search Implementation
+
+A detailed description of the MLRsearch algorithm is included in the IETF
+draft
+[draft-ietf-bmwg-mlrsearch](https://datatracker.ietf.org/doc/html/draft-ietf-bmwg-mlrsearch)
+that is in the process of being standardized in the IETF Benchmarking
+Methodology Working Group (BMWG).
+
+MLRsearch is also available as a
+[PyPI (Python Package Index) library](https://pypi.org/project/MLRsearch/).
+
+## Algorithm Highlights
+
+MRR and receive rate at MRR load are used as initial guesses for the search.
+
+All previously measured trials (except the very first one which acts
+as a warm-up) are taken into consideration.
+
+For every loss ratio goal, the relevant upper and lower bound
+(intended loads, among loads of large enough duration sum) form an interval.
+The exit condition is given by that interval reaching a low enough relative width.
+A small enough width is achieved by bisecting the current interval.
+The bisection can be uneven, to save measurements based on information theory.
+The width value is 0.5%, the same as the PDR goal loss ratio,
+as smaller values may report a PDR conditional throughput smaller than NDR.
+
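+An illustrative sketch of the relative width check and a (possibly uneven)
+bisection step (names and loads are hypothetical, not CSIT code):
+
+```python
+def relative_width(lower_load, upper_load):
+    return (upper_load - lower_load) / upper_load
+
+def next_load(lower_load, upper_load, split=0.5):
+    # split=0.5 is an even bisection; other values give an uneven split.
+    return lower_load + split * (upper_load - lower_load)
+
+lower, upper = 9.5e6, 10.0e6
+if relative_width(lower, upper) > 0.005:  # 0.5% width goal not yet reached
+    print(next_load(lower, upper))        # measure at 9.75 Mpps next
+```
+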
+Switching to a higher trial duration sum generally requires additional trials
+at a load from the previous duration sum target.
+When this refinement does not confirm the previous bound classification
+(e.g. a lower bound for the preceding target
+becomes an upper bound of the new target due to new trial results),
+external search is used to find a close enough bound of the lost type.
+External search is a generalization of the first stage of
+`exponential search`[^2].
+
+A preceding target uses double the width goal of the next target,
+because one bisection is always safe before risking external search.
+
+As different search targets are interested in different loads,
+lower intended loads are measured first,
+as that approach saves more time when trial results are not very consistent.
+Other heuristics are present, aimed at preventing unnecessarily narrow intervals
+and at handling corner cases around min and max load.
+
+## Deviations from RFC 2544
+
+RFC 2544 implies a long final trial duration (just one long trial is needed
+for classification as a lower or upper bound, so the exceed ratio does not matter).
+With 1s trials and a 0.5 exceed ratio, NDR values reported by CSIT
+are likely higher than the RFC 2544 throughput (especially for less stable tests).
+
+CSIT does not have any explicit wait times before and after trial traffic.
+(But the TRex-based measurer takes almost half a second between targets.)
+
+A small difference between intended load and offered load is tolerated,
+mainly due to various time overheads preventing precise measurement
+of the traffic duration (and TRex can sometimes suffer from duration
+stretching). A large difference is reported as unsent packets
+(the measurement is forcibly stopped after a given time), counted as
+packet loss, so the search focuses on loads actually achievable by TRex.
+
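+A minimal sketch of how unsent packets are folded into the loss count
+(illustrative names, not the CSIT measurer code):
+
+```python
+def effective_loss(intended_count, offered_count, received_count):
+    unsent = intended_count - offered_count  # packets TRex failed to send in time
+    lost = offered_count - received_count    # packets sent but not received
+    return unsent + lost
+
+# 10M packets intended, 9.99M actually sent, 9.95M received back:
+print(effective_loss(10_000_000, 9_990_000, 9_950_000))  # 50000
+```
+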
+In some tests, a negative loss count is observed (TRex sees more packets
+coming back to it than it sent in the trial). CSIT code treats that
+as a packet loss (as if VPP duplicated the packets),
+but TRex does not check other packets for duplication
+(as many traffic profiles generate non-unique packets).
+
+[^1]: [binary search](https://en.wikipedia.org/wiki/Binary_search)
+[^2]: [exponential search](https://en.wikipedia.org/wiki/Exponential_search)
diff --git a/docs/content/methodology/measurements/data_plane_throughput/mrr.md b/docs/content/methodology/measurements/data_plane_throughput/mrr.md
new file mode 100644
index 0000000000..e8c3e62eb6
--- /dev/null
+++ b/docs/content/methodology/measurements/data_plane_throughput/mrr.md
@@ -0,0 +1,56 @@
+---
+title: "MRR"
+weight: 4
+---
+
+# MRR
+
+Maximum Receive Rate (MRR) tests are complementary to MLRsearch tests,
+as they provide a maximum "raw" throughput benchmark for the development and
+testing community. MRR tests measure the packet forwarding rate under
+the maximum load offered by the traffic generator over a set trial duration,
+regardless of packet loss.
+
+MRR tests are currently used for the following test jobs:
+
+- Report performance comparison: 64B, IMIX for vhost, memif.
+- Daily performance trending: 64B, IMIX for vhost, memif.
+- Per-patch performance verification: 64B.
+- Initial iterations of MLRsearch and PLRsearch: 64B.
+
+The maximum offered load for a specific L2 Ethernet frame size is set to
+either the maximum bi-directional link rate or the tested NIC model
+capacity, as follows:
+
+- For 10GE NICs the maximum packet rate load is 2x14.88 Mpps for 64B, a
+ 10GE bi-directional link rate.
+- For 25GE NICs the maximum packet rate load is 2x18.75 Mpps for 64B, a
+ 25GE bi-directional link sub-rate limited by 25GE NIC used on TRex TG,
+ XXV710.
+- For 40GE NICs the maximum packet rate load is 2x18.75 Mpps for 64B, a
+ 40GE bi-directional link sub-rate limited by 40GE NIC used on TRex
+ TG, XL710. Packet rate for other tested frame sizes is limited by
+ PCIeGen3 x8 bandwidth limitation of ~50Gbps.
+
+MRR test code implements multiple bursts of offered packet load and has
+two configurable burst parameters: individual trial duration and number
+of trials in a single burst. This enables more precise performance
+trending by providing more results data for analysis.
+
+Burst parameter settings vary between different tests using MRR:
+
+- MRR individual trial duration:
+
+ - Report performance comparison: 1 sec.
+ - Daily performance trending: 1 sec.
+ - Per-patch performance verification: 10 sec.
+ - Initial iteration for MLRsearch: 1 sec.
+ - Initial iteration for PLRsearch: 5.2 sec.
+
+- Number of MRR trials per burst:
+
+ - Report performance comparison: 10.
+ - Daily performance trending: 10.
+ - Per-patch performance verification: 5.
+ - Initial iteration for MLRsearch: 1.
+ - Initial iteration for PLRsearch: 1.
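+
+As a rough illustration of how these parameters combine, here is a minimal
+sketch of one MRR burst (the trial itself is a stub; `measure_trial()` is a
+placeholder, not a CSIT or TRex API):
+
+```python
+import random
+
+def measure_trial(offered_load_pps, duration_s):
+    """Placeholder for a TRex-driven trial; returns packets received."""
+    return offered_load_pps * duration_s * random.uniform(0.95, 1.0)
+
+def mrr_burst(max_load_pps, trial_duration_s, trials_per_burst):
+    # One burst = several short trials at the maximum offered load;
+    # every per-trial receive rate is kept for trending analysis.
+    return [measure_trial(max_load_pps, trial_duration_s) / trial_duration_s
+            for _ in range(trials_per_burst)]
+
+# Daily trending settings: 1 s trials, 10 trials per burst, 64B at 2x10GE load.
+print(mrr_burst(2 * 14.88e6, 1.0, 10))
+```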
diff --git a/docs/content/methodology/measurements/data_plane_throughput/plr_search.md b/docs/content/methodology/measurements/data_plane_throughput/plr_search.md
new file mode 100644
index 0000000000..6f208c1ece
--- /dev/null
+++ b/docs/content/methodology/measurements/data_plane_throughput/plr_search.md
@@ -0,0 +1,386 @@
+---
+title: "PLR Search"
+weight: 3
+---
+
+# PLR Search
+
+## Motivation for PLRsearch
+
+Network providers are interested in throughput a system can sustain.
+
+`RFC 2544`[^1] assumes the loss ratio is given by a deterministic function of
+offered load. But NFV software systems are not deterministic enough.
+This causes deterministic algorithms (such as `binary search`[^2] per RFC 2544
+and MLRsearch with a single trial) to return results
+which, when repeated, show a relatively high standard deviation,
+thus making it harder to tell what "the throughput" actually is.
+
+We need another algorithm, which takes this indeterminism into account.
+
+## Generic Algorithm
+
+A detailed description of the PLRsearch algorithm is included in the IETF
+draft `Probabilistic Loss Ratio Search for Packet Throughput`[^3] that is in the
+process of being standardized in the IETF Benchmarking Methodology Working Group
+(BMWG).
+
+### Terms
+
+The rest of this page assumes the reader is familiar with the following terms
+defined in the IETF draft:
+
++ Trial Order Independent System
++ Duration Independent System
++ Target Loss Ratio
++ Critical Load
++ Offered Load regions
+
+ + Zero Loss Region
+ + Non-Deterministic Region
+ + Guaranteed Loss Region
+
++ Fitting Function
+
+ + Stretch Function
+ + Erf Function
+
++ Bayesian Inference
+
+ + Prior distribution
+ + Posterior Distribution
+
++ Numeric Integration
+
+ + Monte Carlo
+ + Importance Sampling
+
+## FD.io CSIT Implementation Specifics
+
+The search receives min_rate and max_rate values, to avoid measurements
+at offered loads not supported by the traffic generator.
+
+The implemented test cases use bidirectional traffic.
+The algorithm stores each rate as a bidirectional rate (internally,
+the algorithm is agnostic to flows and directions,
+it only cares about aggregate counts of packets sent and packets lost),
+but debug output from traffic generator lists unidirectional values.
+
+In CSIT, tests that employ PLRsearch are identified as SOAK tests;
+the search time is set to 30 minutes.
+
+### Measurement Delay
+
+In a sample implementation in the FD.io CSIT project, there is roughly a 0.5
+second delay between trials due to restrictions imposed by the packet traffic
+generator in use (TRex).
+
+As measurement results come in, the posterior distribution computation takes
+more time (per sample), although there is a considerable constant part
+(mostly for inverting the fitting functions).
+
+Also, the integrator needs a fair number of samples to reach the region
+the posterior distribution is concentrated at.
+
+And of course, the speed of the integrator depends on computing power
+of the CPU the algorithm is able to use.
+
+All those timing related effects are addressed by arithmetically increasing
+trial durations with configurable coefficients
+(currently 5.1 seconds for the first trial,
+each subsequent trial being 0.1 second longer).
+
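+The schedule itself is simple enough to show directly (a sketch; the function
+name is illustrative):
+
+```python
+def trial_duration(trial_index, first=5.1, increment=0.1):
+    """Duration of the trial_index-th trial (0-based), in seconds."""
+    return round(first + increment * trial_index, 1)
+
+print([trial_duration(i) for i in range(4)])  # [5.1, 5.2, 5.3, 5.4]
+```
+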
+### Rounding Errors and Underflows
+
+In order to avoid them, the current implementation tracks the natural logarithm
+(instead of the original quantity) for any quantity which is never negative.
+The logarithm of zero is minus infinity (not supported by Python),
+so the special value "None" is used instead.
+Specific functions for frequent operations (such as "logarithm
+of sum of exponentials") are defined to handle None correctly.
+
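+A sketch of one such helper, the "logarithm of sum of exponentials", with None
+standing in for log(0) (simplified, not the exact CSIT code):
+
+```python
+import math
+
+def log_plus(log_a, log_b):
+    """Return log(exp(log_a) + exp(log_b)); None represents log(0)."""
+    if log_a is None:
+        return log_b
+    if log_b is None:
+        return log_a
+    hi, lo = max(log_a, log_b), min(log_a, log_b)
+    # Factor out the larger exponent to avoid overflow/underflow.
+    return hi + math.log1p(math.exp(lo - hi))
+
+print(log_plus(None, 0.0))                 # 0.0, since 0 + 1 = 1 and log(1) = 0
+print(log_plus(math.log(2), math.log(3)))  # ~1.609, i.e. log(5)
+```
+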
+### Fitting Functions
+
+The current implementation uses two fitting functions, called "stretch" and "erf".
+In general, their estimates for the critical rate differ,
+which adds a simple source of systematic error,
+on top of the randomness error reported by the integrator.
+Otherwise, the reported stdev of the critical rate estimate
+would be unrealistically low.
+
+Both functions are not only increasing, but also convex
+(meaning the rate of increase is also increasing).
+
+Both fitting functions have several mathematically equivalent formulas,
+each of which can lead to an arithmetic overflow or underflow in different sub-terms.
+Overflows can be eliminated by using different exact formulas
+for different argument ranges.
+Underflows can be avoided by using approximate formulas
+in the affected argument ranges; such ranges have their own formulas to compute.
+In the end, both fitting function implementations
+contain multiple "if" branches, so discontinuities are possible
+at range boundaries.
+
+### Prior Distributions
+
+The numeric integrator expects all the parameters to be distributed
+(independently and) uniformly on an interval (-1, 1).
+
+As both "mrr" and "spread" parameters are positive and not dimensionless,
+a transformation is needed. Dimentionality is inherited from max_rate value.
+
+The "mrr" parameter follows a `Lomax distribution`[^4]
+with alpha equal to one, but shifted so that mrr is always greater than 1
+packet per second.
+
+The "stretch" parameter is generated simply as the "mrr" value
+raised to a random power between zero and one;
+thus it follows a `reciprocal distribution`[^5].
+
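+An illustrative way to sample the two parameters from the priors described
+above (the max_rate scaling and the function names are assumptions, not the
+CSIT code):
+
+```python
+import random
+
+def sample_mrr(max_rate):
+    # Lomax (Pareto type II) with alpha = 1, scaled by max_rate,
+    # shifted so that mrr is always greater than 1 packet per second.
+    u = random.random()
+    return 1.0 + max_rate * u / (1.0 - u)
+
+def sample_stretch(mrr):
+    # The "mrr" value raised to a random power between zero and one,
+    # i.e. a reciprocal distribution on (1, mrr).
+    return mrr ** random.random()
+
+mrr = sample_mrr(max_rate=2 * 14.88e6)
+print(mrr, sample_stretch(mrr))
+```
+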
+### Integrator
+
+After a few measurements, the posterior distribution of fitting function
+arguments gets quite concentrated in a small area.
+The integrator uses `Monte Carlo`[^6] with `importance sampling`[^7],
+where the biased distribution is a `bivariate Gaussian`[^8] distribution
+with deliberately larger variance.
+If the generated sample falls outside the (-1, 1) interval,
+another sample is generated.
+
+The center and the covariance matrix for the biased distribution
+are based on the first and second moments of the samples seen so far
+(within the computation). The center is used directly, while the
+covariance matrix is scaled up by a heuristic constant (8.0 by default).
+The following additional features are applied,
+designed to avoid hyper-focused distributions.
+
+Each computation starts with the biased distribution inherited
+from the previous computation (a zero point and unit covariance matrix
+are used in the first computation), but the overall weight of the data
+is set to the weight of the first sample of the computation.
+Also, the center is set to the first sample point.
+When additional samples come, their weight (including the importance correction)
+is compared to the sum of the weights of the data seen so far (within the iteration).
+If the new sample is more than one e-fold more impactful, both weight values
+(for the data so far and for the new sample) are set to the (geometric) average
+of the two weights.
+
+This combination showed the best behavior, as the integrator usually follows
+two phases. The first phase (where the inherited biased distribution
+or a single big sample dominates) is mainly important
+for locating the new area the posterior distribution is concentrated at.
+The second phase (dominated by the whole sample population)
+is actually relevant for the critical rate estimation.
+
+### Offered Load Selection
+
+The first two measurements are hardcoded to happen at the middle of the rate
+interval and at max_rate. The next two measurements follow MRR-like logic:
+the offered load is decreased so that it would reach the target loss ratio
+if the offered load decrease led to an equal decrease of the loss rate.
+
+The rest of the measurements start directly at the average of the
+erf and stretch estimates.
+There is one workaround implemented, aimed at reducing the number of consecutive
+zero loss measurements (per fitting function). The workaround first stores
+every measurement result whose loss ratio was the target loss ratio or higher.
+A sorted list (called lossy loads) of such results is maintained.
+
+When a sequence of one or more zero loss measurement results is encountered,
+the smallest of the lossy loads is drained from the list.
+If the estimate average is smaller than the drained value,
+a weighted average of this estimate and the drained value is used
+as the next offered load. The weight of the estimate decreases exponentially
+with the number of consecutive zero loss results.
+
+This behavior helps the algorithm with convergence speed,
+as it does not need as many zero loss results to get near the critical region.
+Using the smallest (not yet drained) of the lossy loads makes it likely
+that the new offered load will not land in the big loss region.
+Draining even when the estimate is large enough helps to discard
+early measurements where loss happened at too low an offered load.
+The current implementation adds 4 copies of lossy loads and drains 3 of them,
+which leads to fairly stable behavior even for somewhat inconsistent SUTs.
+
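+A heavily simplified sketch of this selection workaround (the list handling and
+the weighting scheme are condensed; not the actual CSIT code):
+
+```python
+def next_offered_load(estimate_avg, lossy_loads, zero_loss_streak):
+    """Pick the next offered load after zero_loss_streak zero-loss trials."""
+    if zero_loss_streak == 0 or not lossy_loads:
+        return estimate_avg
+    drained = lossy_loads.pop(0)       # smallest lossy load seen so far
+    if estimate_avg >= drained:
+        return estimate_avg
+    weight = 2.0 ** -zero_loss_streak  # estimate weight decays exponentially
+    return weight * estimate_avg + (1.0 - weight) * drained
+
+lossy = [9.0e6, 9.4e6, 9.4e6]
+print(next_offered_load(8.0e6, lossy, zero_loss_streak=2))  # pulled toward 9.0e6
+```
+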
+### Caveats
+
+As high loss count measurements add many bits of information,
+they need a large number of small loss count measurements to balance them,
+making the algorithm converge quite slowly. Typically, this happens
+when a few initial measurements suggest a spread way bigger than later measurements.
+The workaround in offered load selection helps,
+but more intelligent workarounds could achieve faster convergence still.
+
+Some systems evidently do not follow the assumption of repeated measurements
+having the same average loss rate (when the offered load is the same).
+The idea of estimating the trend is not implemented at all,
+as the observed trends have varied characteristics.
+
+Probably, using more realistic fitting functions
+would give better estimates than trend analysis.
+
+## Bottom Line
+
+The notion of throughput is easy to grasp, but it is harder to measure
+with any accuracy for non-deterministic systems.
+
+Even though the notion of critical rate is harder to grasp than the notion
+of throughput, it is easier to measure using probabilistic methods.
+
+In testing, the difference between throughput measurements and critical
+rate measurements is usually small.
+
+In practice, rules of thumb such as "send at max 95% of purported throughput"
+are common. The correct benchmarking analysis should ask "Which notion is
+95% of throughput an approximation to?" before attempting to answer
+"Is 95% of critical rate safe enough?".
+
+## Algorithmic Analysis
+
+### Motivation
+
+While the estimation computation is based on hard probability science,
+the offered load selection part of PLRsearch logic is pure heuristics,
+motivated by what a human would do based on measurement and computation results.
+
+The quality of any heuristic is not affected by the soundness of its motivation,
+just by its ability to achieve the intended goals.
+In the case of offered load selection, the goal is to help the search converge
+to the long duration estimates sooner.
+
+But even those long duration estimates could still be of poor quality.
+Even though the estimate computation is Bayesian (so it is the best it could be
+within the applied assumptions), it can still be of poor quality when compared
+to what a human would estimate.
+
+One possible source of poor quality is the randomness inherently present
+in Monte Carlo numeric integration, but that can be suppressed
+by tweaking the time related input parameters.
+
+The most likely source of poor quality, then, is the assumptions.
+Most importantly, the number and the shape of fitting functions;
+but also others, such as trial order independence and duration independence.
+
+The result can have poor quality in basically two ways.
+One way is related to location. Both upper and lower bounds
+can be overestimates or underestimates, meaning the entire estimated interval
+between the lower bound and the upper bound lies above or below (respectively)
+the human-estimated interval.
+The other way is related to the estimation interval width.
+The interval can be too wide or too narrow, compared to human estimation.
+
+An estimate from a particular fitting function can be classified
+as an overestimate (or underestimate) just by looking at its time evolution
+(without a human examining measurement results). Overestimates
+decrease over time, underestimates increase over time (assuming
+the system performance stays constant).
+
+The quality of the width of the estimation interval needs human evaluation,
+and is unrelated to both the rate of narrowing (both good and bad estimate intervals
+get narrower at approximately the same relative rate) and the relative width
+(which depends heavily on the system being tested).
+
+### Graphical Examples
+
+The following pictures show the upper (red) and lower (blue) bound,
+as well as the average of the Stretch (pink) and Erf (light green) estimates,
+and the offered load chosen (grey), as computed by PLRsearch,
+after each trial measurement within the 30 minute duration of a test run.
+
+Both graphs focus on later estimates. Estimates computed from the
+few initial measurements are wildly off the y-axis range shown.
+
+The following analysis will rely on the frequency of zero loss measurements
+and the magnitude of the loss ratio if nonzero.
+
+The offered load selection strategy used implies that zero loss measurements
+can be gleaned from the graph by looking at offered load points.
+When the points move up farther from the lower estimate, it means
+the previous measurement had zero loss. After non-zero loss,
+the offered load starts again right between (the previous values of)
+the estimate curves.
+
+The very big loss ratio results are visible as noticeable jumps
+of both estimates downwards. Medium and small loss ratios are much harder
+to distinguish just by looking at the estimate curves,
+so the analysis is based on raw loss ratio measurement results.
+
+The following descriptions should explain why the graphs seem to signal a
+low quality estimate at first sight, but a more detailed look
+reveals the quality is good (considering the measurement results).
+
+#### L2 patch
+
+Both fitting functions give similar estimates; the graph shows
+"stochasticity" of measurements (estimates increase and decrease
+within small time regions), and an overall trend of decreasing estimates.
+
+At first look, the final interval looks fairly narrow,
+especially compared to the region the estimates have travelled
+during the search. But a look at the frequency of zero loss results shows
+this is not a case of overestimation. Measurements at around the same
+offered load have a higher probability of zero loss earlier
+(when performed farther from the upper bound), but a smaller probability later
+(when performed closer to the upper bound). That means it is the performance
+of the system under test that decreases (slightly) over time.
+
+With that in mind, the apparent narrowness of the interval
+is not a sign of low quality, just a consequence of PLRsearch assuming
+the performance stays constant.
+
+{{< figure src="/cdocs/PLR_patch.svg" >}}
+
+#### Vhost
+
+This test case shows what looks like a quite broad estimation interval,
+compared to other test cases with similar-looking zero loss frequencies.
+Notable features are infrequent high-loss measurement results
+causing big drops of estimates, and a lack of long-term convergence.
+
+Any convergence in medium-sized intervals (during zero loss results)
+is reverted by the big loss results, as they happen quite far
+from the critical load estimates, and the two fitting functions
+extrapolate differently.
+
+In other words, a human seeing only the estimates from one fitting function
+would expect a narrower end interval, but a human seeing the measured loss ratios
+agrees that the interval should be wider than that.
+
+{{< figure src="/cdocs/PLR_vhost.svg" >}}
+
+#### Summary
+
+The two graphs show the behavior of the PLRsearch algorithm applied to soak tests
+when some of the PLRsearch assumptions do not hold:
+
++ L2 patch measurement results violate the assumption
+ of performance not changing over time.
++ Vhost measurement results violate the assumption
+ of Poisson distribution matching the loss counts.
+
+The reported upper and lower bounds can have a distance larger or smaller
+than a first look by a human would expect, but a closer look reveals
+the quality is good, considering the circumstances.
+
+The critical load estimate is of questionable value
+when the assumptions are violated.
+
+Some improvements can be made via more specific workarounds;
+for example, the long term limit of L2 patch performance could be estimated
+by some heuristic.
+
+Other improvements can be achieved only by asking users
+whether loss patterns matter. Is it better to have single digit losses
+distributed fairly evenly over time (as a Poisson distribution would suggest),
+or is it better to have short periods of medium losses
+mixed with long periods of zero losses (as happens in the Vhost test)
+with the same overall loss ratio?
+
+[^1]: [RFC 2544: Benchmarking Methodology for Network Interconnect Devices](https://tools.ietf.org/html/rfc2544)
+[^2]: [Binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm)
+[^3]: [Probabilistic Loss Ratio Search for Packet Throughput](https://tools.ietf.org/html/draft-vpolak-bmwg-plrsearch-02)
+[^4]: [Lomax distribution](https://en.wikipedia.org/wiki/Lomax_distribution)
+[^5]: [Reciprocal distribution](https://en.wikipedia.org/wiki/Reciprocal_distribution)
+[^6]: [Monte Carlo](https://en.wikipedia.org/wiki/Monte_Carlo_integration)
+[^7]: [Importance sampling](https://en.wikipedia.org/wiki/Importance_sampling)
+[^8]: [Bivariate Gaussian](https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
diff --git a/docs/content/methodology/measurements/packet_latency.md b/docs/content/methodology/measurements/packet_latency.md
new file mode 100644
index 0000000000..f3606b5ffb
--- /dev/null
+++ b/docs/content/methodology/measurements/packet_latency.md
@@ -0,0 +1,52 @@
+---
+title: "Packet Latency"
+weight: 2
+---
+
+# Packet Latency
+
+TRex Traffic Generator (TG) is used for measuring one-way latency in
+2-Node and 3-Node physical testbed topologies. TRex integrates
+[High Dynamic Range Histogram (HDRH)](http://hdrhistogram.org/)
+functionality and reports per packet latency distribution for latency
+streams sent in parallel to the main load packet streams.
+
+The following methodology is used:
+
+- Only NDRPDR test type measures latency and only after NDR and PDR
+ values are determined. Other test types do not involve latency
+ streams.
+
+- Latency is measured at different background load packet rates:
+
+ - No-Load: latency streams only.
+ - Low-Load: at 10% PDR.
+ - Mid-Load: at 50% PDR.
+ - High-Load: at 90% PDR.
+
+- Latency is measured for all tested packet sizes except IMIX due to
+ TRex TG restriction.
+
+- TG sends dedicated latency streams, one per direction, each at the
+ rate of 9 kpps at the prescribed packet size; these are sent in
+ addition to the main load streams.
+
+- TG reports Min/Avg/Max and HDRH latency values distribution per stream
+ direction, hence two sets of latency values are reported per test case
+ (marked as E-W and W-E).
+
+- +/- 1 usec is the measurement accuracy of TRex TG and the data in HDRH
+ latency values distribution is rounded to microseconds.
+
+- TRex TG introduces a (background) always-on Tx + Rx latency bias of 4
+ usec on average per direction resulting from TRex software writing and
+ reading packet timestamps on CPU cores. Quoted values are based on TG
+ back-to-back latency measurements.
+
+- Latency graphs are not smoothed, each latency value has its own
+ horizontal line across corresponding packet percentiles.
+
+- Percentiles are shown on the X-axis using a logarithmic scale, so the
+  maximal latency value (ending at the 100% percentile) would be at
+  infinity. The graphs are cut at 99.9999% (hover information still
+  lists 100%).
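+
+A small sketch of the background load levels derived from a PDR value (names
+and the example PDR value are illustrative):
+
+```python
+LATENCY_STREAM_PPS = 9_000  # per direction, on top of the load streams
+
+def latency_loads(pdr_pps):
+    """Background load levels (aggregate pps) used for latency measurement."""
+    return {
+        "No-Load": 0.0,
+        "Low-Load": 0.10 * pdr_pps,
+        "Mid-Load": 0.50 * pdr_pps,
+        "High-Load": 0.90 * pdr_pps,
+    }
+
+print(latency_loads(10_000_000))
+```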
diff --git a/docs/content/methodology/measurements/telemetry.md b/docs/content/methodology/measurements/telemetry.md
new file mode 100644
index 0000000000..aed32d9e17
--- /dev/null
+++ b/docs/content/methodology/measurements/telemetry.md
@@ -0,0 +1,158 @@
+---
+title: "Telemetry"
+weight: 3
+---
+
+# Telemetry
+
+OpenMetrics specifies the de-facto standard for transmitting cloud-native
+metrics at scale, with support for both text representation and Protocol
+Buffers.
+
+## RFC
+
+- RFC2119
+- RFC5234
+- RFC8174
+- draft-richih-opsawg-openmetrics-00
+
+## Reference
+
+[OpenMetrics](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md)
+
+## Metric Types
+
+- Gauge
+- Counter
+- StateSet
+- Info
+- Histogram
+- GaugeHistogram
+- Summary
+- Unknown
+
+The telemetry module in CSIT currently supports only Gauge, Counter and Info.
+
+## Anatomy of CSIT Telemetry Implementation
+
+The existing implementation consists of several measurement building blocks:
+the main measuring block running search algorithms (MLR, PLR, SOAK, MRR, ...),
+the latency measuring block, and several telemetry blocks with or without
+traffic running in the background.
+
+The main measuring block must not be interrupted by any read operation that can
+impact data plane traffic processing during the throughput search algorithm. Thus,
+operational reads are done before (pre-stat) and after (post-stat) that block.
+
+Some operational reads must be done while traffic is running; these usually
+consist of two reads (pre-run-stat, post-run-stat) with a defined delay between
+them.
+
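+A sketch of where these reads sit relative to the measured trials, following
+the first MRR diagram below (all function names are placeholders, not the CSIT
+telemetry module API):
+
+```python
+import time
+
+def mrr_with_telemetry(start_traffic, stop_traffic, measure, read, delay_s=1.0):
+    start_traffic()          # background traffic for the runtime reads
+    read("pre_run_stat")     # e.g. vpp-clear-runtime
+    time.sleep(delay_s)      # defined delay between the two runtime reads
+    read("post_run_stat")    # e.g. vpp-show-runtime
+    stop_traffic()
+    read("pre_stat")         # e.g. vpp-clear-stats
+    result = measure()       # main measuring block, never interrupted by reads
+    read("post_stat")        # e.g. vpp-show-stats
+    return result
+```
+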
+## MRR measurement
+
+ traffic_start(r=mrr) traffic_stop |< measure >|
+ | | | (r=mrr) |
+ | pre_run_stat post_run_stat | pre_stat | | post_stat
+ | | | | | | | |
+ o--------o---------------o-------o------o------+---------------+------o------>
+ t
+ Legend:
+ - pre_run_stat
+ - vpp-clear-runtime
+ - post_run_stat
+ - vpp-show-runtime
+ - bash-perf-stat // if extended_debug == True
+ - pre_stat
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - vpp-enable-elog
+ - post_stat
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+ - vpp-show-elog
+
+ |< measure >|
+ | (r=mrr) |
+ | |
+ |< traffic_trial0 >|< traffic_trial1 >|< traffic_trialN >|
+ | (i=0,t=duration) | (i=1,t=duration) | (i=N,t=duration) |
+ | | | |
+ o-----------------------o------------------------o------------------------o--->
+ t
+
+
+## MLR measurement
+
+ |< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >|
+ | (r=mlr) | | | | | | .9/.5/.1/.0 |
+ | | | pre_run_stat post_run_stat | | pre_run_stat post_run_stat | | |
+ | | | | | | | | | | | |
+ +-------------+---o-------o---------------o--------o-------------o-------o---------------o--------o------------[-------------------]--->
+ t
+ Legend:
+ - pre_run_stat
+ - vpp-clear-runtime
+ - post_run_stat
+ - vpp-show-runtime
+ - bash-perf-stat // if extended_debug == True
+ - pre_stat
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - vpp-enable-elog
+ - post_stat
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+ - vpp-show-elog
+
+## MRR measurement
+
+ traffic_start(r=mrr) traffic_stop |< measure >|
+ | | | (r=mrr) |
+ | |< stat_runtime >| | stat_pre_trial | | stat_post_trial
+ | | | | | | | |
+ o---+------------------+---o------o------------+-------------+----o------------>
+ t
+ Legend:
+ - stat_runtime
+ - vpp-runtime
+ - stat_pre_trial
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - stat_post_trial
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+
+ |< measure >|
+ | (r=mrr) |
+ | |
+ |< traffic_trial0 >|< traffic_trial1 >|< traffic_trialN >|
+ | (i=0,t=duration) | (i=1,t=duration) | (i=N,t=duration) |
+ | | | |
+ o------------------------o------------------------o------------------------o--->
+ t
+
+ |< stat_runtime >|
+ | |
+ |< program0 >|< program1 >|< programN >|
+ | (@=params) | (@=params) | (@=params) |
+ | | | |
+ o------------------------o------------------------o------------------------o--->
+ t
+
+## MLR measurement
+
+ |< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >|
+ | (r=mlr) | | | | | | .9/.5/.1/.0 |
+ | | | |< stat_runtime >| | | |< stat_runtime >| | | |
+ | | | | | | | | | | | |
+ +-------------+---o---+------------------+---o--------------o---+------------------+---o-----------[-----------------]--->
+ t
+ Legend:
+ - stat_runtime
+ - vpp-runtime
+ - stat_pre_trial
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - stat_post_trial
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True