From 4f2d0c379b50b66e70d9615fc8425cd4772f7738 Mon Sep 17 00:00:00 2001
From: Vratko Polak <vrpolak@cisco.com>
Date: Tue, 16 Apr 2019 18:59:33 +0200
Subject: Add perpatch info to cpta methodology

Also, split the methodology file into multiple files, one per section.

Change-Id: I973b93d1a99205d7adb80996a3657215e05b8985
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
---
 docs/cpta/methodology/index.rst                    | 279 +--------------------
 docs/cpta/methodology/jenkins_jobs.rst             |  62 +++++
 docs/cpta/methodology/overview.rst                 |  14 ++
 docs/cpta/methodology/performance_tests.rst        |  36 +++
 .../methodology/perpatch_performance_tests.rst     |  86 +++++++
 docs/cpta/methodology/testbed_hw_configuration.rst |   5 +
 docs/cpta/methodology/trend_analysis.rst           | 106 ++++++++
 docs/cpta/methodology/trend_presentation.rst       |  42 ++++
 8 files changed, 360 insertions(+), 270 deletions(-)
 create mode 100644 docs/cpta/methodology/jenkins_jobs.rst
 create mode 100644 docs/cpta/methodology/overview.rst
 create mode 100644 docs/cpta/methodology/performance_tests.rst
 create mode 100644 docs/cpta/methodology/perpatch_performance_tests.rst
 create mode 100644 docs/cpta/methodology/testbed_hw_configuration.rst
 create mode 100644 docs/cpta/methodology/trend_analysis.rst
 create mode 100644 docs/cpta/methodology/trend_presentation.rst

(limited to 'docs/cpta')
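
Reviewer note (not part of the patch): the per-patch decision rule described
in perpatch_performance_tests.rst can be sketched in Python as follows. This is
a minimal sketch under stated assumptions. The function names and the
`two_groups_shorter` flag are hypothetical. The flag stands in for the outcome
of the Minimum Description Length comparison, which the job delegates to
jumpavg; no jumpavg API is used or implied here.

```python
from statistics import mean


def relative_difference_percent(parent, current):
    """Current average minus parent average, as a percentage of the parent average."""
    parent_avg = mean(parent)
    return (mean(current) - parent_avg) / parent_avg * 100.0


def classify(parent, current, two_groups_shorter):
    """Classify one test from its parent and current samples.

    two_groups_shorter is a hypothetical input: True when describing the
    samples as two groups takes fewer bits than describing them as one.
    """
    if not two_groups_shorter:
        return "normal"
    if mean(current) < mean(parent):
        return "regression"
    # A larger or equal current average counts as a progression.
    return "progression"


def job_passes(classifications, any_trial_failed=False):
    """The whole job fails on any failed trial or any regression."""
    return not any_trial_failed and "regression" not in classifications
```

The grouping decision is kept as an explicit input so that the sketch stays
independent of how the description lengths are actually computed.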

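Reviewer note (not part of the patch): the 2 * 14.88 Mpps figure quoted in
performance_tests.rst follows from the 64B frame size plus the 20B per-frame
transmission overhead. The arithmetic, and the MRR ceiling from the closing
note of that file, can be sketched in Python as below. The function and
parameter names are illustrative assumptions, not CSIT code.

```python
def max_frame_rate_pps(link_bps, frame_bytes, overhead_bytes=20):
    """Frames per second in one direction of an Ethernet link.

    frame_bytes includes the CRC; overhead_bytes is the per-frame
    transmission overhead (preamble and inter-frame gap).
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def expected_mrr_pps(vpp_rate_pps, bidir_link_rate_pps, nic_rate_pps):
    """MRR = min(VPP rate, bi-dir link rate, NIC rate)."""
    return min(vpp_rate_pps, bidir_link_rate_pps, nic_rate_pps)


# 10GE with 64B frames: roughly 14.88 Mpps per direction.
one_direction = max_frame_rate_pps(10e9, 64)
bidir = 2 * one_direction
```
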
diff --git a/docs/cpta/methodology/index.rst b/docs/cpta/methodology/index.rst
index 7d7604bee8..cbcfcb50cb 100644
--- a/docs/cpta/methodology/index.rst
+++ b/docs/cpta/methodology/index.rst
@@ -3,273 +3,12 @@
 Trending Methodology
 ====================
 
-Overview
---------
-
-This document describes a high-level design of a system for continuous
-performance measuring, trending and change detection for FD.io VPP SW
-data plane (and other performance tests run within CSIT sub-project).
-
-There is a Performance Trending (PT) CSIT module, and a separate
-Performance Analysis (PA) module ingesting results from PT and
-analysing, detecting and reporting any performance anomalies using
-historical data and statistical metrics. PA does also produce
-trending dashboard, list of failed tests and graphs with summary and
-drill-down views across all specified tests that can be reviewed and
-inspected regularly by FD.io developers and users community.
-
-Performance Tests
------------------
-
-Performance trending relies on Maximum Receive Rate (MRR) tests.
-MRR tests measure the packet forwarding rate, in multiple trials of set
-duration, under the maximum load offered by traffic generator
-regardless of packet loss. Maximum load for specified Ethernet frame
-size is set to the bi-directional link rate.
-
-Current parameters for performance trending MRR tests:
-
-- **Ethernet frame sizes**: 64B (78B for IPv6 tests) for all tests, IMIX for
-  selected tests (vhost, memif); all quoted sizes include frame CRC, but
-  exclude per frame transmission overhead of 20B (preamble, inter frame
-  gap).
-- **Maximum load offered**: 10GE and 40GE link (sub-)rates depending on NIC
-  tested, with the actual packet rate depending on frame size,
-  transmission overhead and traffic generator NIC forwarding capacity.
-
-  - For 10GE NICs the maximum packet rate load is 2* 14.88 Mpps for 64B,
-    a 10GE bi-directional link rate.
-  - For 40GE NICs the maximum packet rate load is 2* 18.75 Mpps for 64B,
-    a 40GE bi-directional link sub-rate limited by the packet forwarding
-    capacity of 2-port 40GE NIC model (XL710) used on T-Rex Traffic
-    Generator.
-
-- **Trial duration**: 1 sec.
-- **Number of trials per test**: 10.
-- **Test execution frequency**: twice a day, every 12 hrs (02:00,
-  14:00 UTC).
-
-Note: MRR tests should be reporting bi-directional link rate (or NIC
-rate, if lower) if tested VPP configuration can handle the packet rate
-higher than bi-directional link rate, e.g. large packet tests and/or
-multi-core tests. In other words MRR = min(VPP rate, bi-dir link rate,
-NIC rate).
-
-Trend Analysis
---------------
-
-All measured performance trend data is treated as time-series data that
-can be modelled as concatenation of groups, each group modelled
-using normal distribution. While sometimes the samples within a group
-are far from being distributed normally, currently we do not have a
-better tractable model.
-
-Here, "sample" should be the result of single trial measurement,
-with group boundaries set only at test run granularity.
-But in order to avoid detecting causes unrelated to VPP performance,
-the default presentation (without /new/ in URL)
-takes average of all trials within the run as the sample.
-Effectively, this acts as a single trial with aggregate duration.
-
-Performance graphs always show the run average (not all trial results).
-
-The group boundaries are selected based on `Minimum Description Length`_.
-
-Minimum Description Length
---------------------------
-
-`Minimum Description Length`_ (MDL) is a particular formalization
-of `Occam's razor`_ principle.
-
-The general formulation mandates to evaluate a large set of models,
-but for anomaly detection purposes, it is useful to consider
-a smaller set of models, so that scoring and comparing them is easier.
-
-For each candidate model, the data should be compressed losslessly,
-which includes model definitions, encoded model parameters,
-and the raw data encoded based on probabilities computed by the model.
-The model resulting in shortest compressed message is the "the" correct model.
-
-For our model set (groups of normally distributed samples),
-we need to encode group length (which penalizes too many groups),
-group average (more on that later), group stdev and then all the samples.
-
-Luckily, the "all the samples" part turns out to be quite easy to compute.
-If sample values are considered as coordinates in (multi-dimensional)
-Euclidean space, fixing stdev means the point with allowed coordinates
-lays on a sphere. Fixing average intersects the sphere with a (hyper)-plane,
-and Gaussian probability density on the resulting sphere is constant.
-So the only contribution is the "area" of the sphere, which only depends
-on the number of samples and stdev.
-
-A somehow ambiguous part is in choosing which encoding
-is used for group size, average and stdev.
-Different encodings cause different biases to large or small values.
-In our implementation we have chosen probability density
-corresponding to uniform distribution (from zero to maximal sample value)
-for stdev and average of the first group,
-but for averages of subsequent groups we have chosen a distribution
-which disourages delimiting groups with averages close together.
-
-Our implementation assumes that measurement precision is 1.0 pps.
-Thus it is slightly wrong for trial durations other than 1.0 seconds.
-Also, all the calculations assume 1.0 pps is totally negligible,
-compared to stdev value.
-
-The group selection algorithm currently has no parameters,
-all the aforementioned encodings and handling of precision is hardcoded.
-In principle, every group selection is examined, and the one encodable
-with least amount of bits is selected.
-As the bit amount for a selection is just sum of bits for every group,
-finding the best selection takes number of comparisons
-quadratically increasing with the size of data,
-the overall time complexity being probably cubic.
-
-The resulting group distribution looks good
-if samples are distributed normally enough within a group.
-But for obviously different distributions (for example `bimodal distribution`_)
-the groups tend to focus on less relevant factors (such as "outlier" density).
-
-Anomaly Detection
-`````````````````
-
-Once the trend data is divided into groups, each group has its population average.
-The start of the following group is marked as a regression (or progression)
-if the new group's average is lower (higher) then the previous group's.
-
-In the text below, "average at time <t>", shorthand "AVG[t]"
-means "the group average of the group the sample at time <t> belongs to".
-
-Trend Compliance
-````````````````
-
-Trend compliance metrics are targeted to provide an indication of trend
-changes over a short-term (i.e. weekly) and a long-term (i.e.
-quarterly), comparing the last group average AVG[last], to the one from week
-ago, AVG[last - 1week] and to the maximum of trend values over last
-quarter except last week, max(AVG[last - 3mths]..ANV[last - 1week]),
-respectively. This results in following trend compliance calculations:
-
-+-------------------------+---------------------------------+-----------+-------------------------------------------+
-| Trend Compliance Metric | Trend Change Formula            | Value     | Reference                                 |
-+=========================+=================================+===========+===========================================+
-| Short-Term Change       | (Value - Reference) / Reference | AVG[last] | AVG[last - 1week]                         |
-+-------------------------+---------------------------------+-----------+-------------------------------------------+
-| Long-Term Change        | (Value - Reference) / Reference | AVG[last] | max(AVG[last - 3mths]..AVG[last - 1week]) |
-+-------------------------+---------------------------------+-----------+-------------------------------------------+
-
-Trend Presentation
-------------------
-
-Performance Dashboard
-`````````````````````
-
-Dashboard tables list a summary of per test-case VPP MRR performance
-trend and trend compliance metrics and detected number of anomalies.
-
-Separate tables are generated for each testbed and each tested number of
-physical cores for VPP workers (1c, 2c, 4c). Test case names are linked to
-respective trending graphs for ease of navigation through the test data.
-
-Failed tests
-````````````
-
-The Failed tests tables list the tests which failed over the specified seven-
-day period together with the number of fails over the period and last failure
-details - Time, VPP-Build-Id and CSIT-Job-Build-Id.
-
-Separate tables are generated for each testbed. Test case names are linked to
-respective trending graphs for ease of navigation through the test data.
-
-Trendline Graphs
-````````````````
-
-Trendline graphs show measured per run averages of MRR values,
-group average values, and detected anomalies.
-The graphs are constructed as follows:
-
-- X-axis represents the date in the format MMDD.
-- Y-axis represents run-average MRR value in Mpps.
-- Markers to indicate anomaly classification:
-
-  - Regression - red circle.
-  - Progression - green circle.
-
-- The line shows average MRR value of each group.
-
-In addition the graphs show dynamic labels while hovering over graph
-data points, presenting the CSIT build date, measured MRR value, VPP
-reference, trend job build ID and the LF testbed ID.
-
-Jenkins Jobs
-------------
-
-Performance Trending (PT)
-`````````````````````````
-
-CSIT PT runs regular performance test jobs measuring and collecting MRR
-data per test case. PT is designed as follows:
-
-1. PT job triggers:
-
-   a) Periodic e.g. twice a day.
-   b) On-demand gerrit triggered.
-
-2. Measurements and data calculations per test case:
-
-  a) Max Received Rate (MRR) - for each trial measurement,
-     send packets at link rate for trial duration,
-     count total received packets, divide by trial duration.
-
-3. Archive MRR values per test case.
-4. Archive all counters collected at MRR.
-
-Performance Analysis (PA)
-`````````````````````````
-
-CSIT PA runs performance analysis
-including anomaly detection as described above.
-PA is defined as follows:
-
-1. PA job triggers:
-
-   a) By PT jobs at their completion.
-   b) On-demand gerrit triggered.
-
-2. Download and parse archived historical data and the new data:
-
-   a) Download RF output.xml files from latest PT job and compressed
-      archived data from nexus.
-   b) Parse out the data filtering test cases listed in PA specification
-      (part of CSIT PAL specification file).
-
-3. Re-calculate new groups and their averages.
-
-4. Evaluate new test data:
-
-   a) If the existing group is prolonged => Result = Pass,
-      Reason = Normal.
-   b) If a new group is detected with lower average =>
-      Result = Fail, Reason = Regression.
-   c) If a new group is detected with higher average =>
-      Result = Pass, Reason = Progression.
-
-5. Generate and publish results
-
-   a) Relay evaluation result to job result.
-   b) Generate a new set of trend summary dashboard, list of failed
-      tests and graphs.
-   c) Publish trend dashboard and graphs in html format on
-      https://docs.fd.io/.
-   d) Generate an alerting email. This email is sent by Jenkins to
-      csit-report@lists.fd.io
-
-Testbed HW configuration
-------------------------
-
-The testbed HW configuration is described on
-`this FD.IO wiki page <https://wiki.fd.io/view/CSIT/CSIT_LF_testbed#FD.IO_CSIT_testbed_-_Server_HW_Configuration>`_.
-
-.. _Minimum Description Length: https://en.wikipedia.org/wiki/Minimum_description_length
-.. _Occam's razor: https://en.wikipedia.org/wiki/Occam%27s_razor
-.. _bimodal distribution: https://en.wikipedia.org/wiki/Bimodal_distribution
+.. toctree::
+
+    overview
+    performance_tests
+    trend_analysis
+    trend_presentation
+    jenkins_jobs
+    testbed_hw_configuration
+    perpatch_performance_tests
diff --git a/docs/cpta/methodology/jenkins_jobs.rst b/docs/cpta/methodology/jenkins_jobs.rst
new file mode 100644
index 0000000000..677e0bc748
--- /dev/null
+++ b/docs/cpta/methodology/jenkins_jobs.rst
@@ -0,0 +1,62 @@
+Jenkins Jobs
+------------
+
+Performance Trending (PT)
+`````````````````````````
+
+CSIT PT runs regular performance test jobs measuring and collecting MRR
+data per test case. PT is designed as follows:
+
+1. PT job triggers:
+
+   a) Periodic e.g. twice a day.
+   b) On-demand gerrit triggered.
+
+2. Measurements and data calculations per test case:
+
+   a) Maximum Receive Rate (MRR) - for each trial measurement,
+      send packets at link rate for trial duration,
+      count total received packets, divide by trial duration.
+
+3. Archive MRR values per test case.
+4. Archive all counters collected at MRR.
+
+Performance Analysis (PA)
+`````````````````````````
+
+CSIT PA runs performance analysis
+including anomaly detection as described above.
+PA is defined as follows:
+
+1. PA job triggers:
+
+   a) By PT jobs at their completion.
+   b) On-demand gerrit triggered.
+
+2. Download and parse archived historical data and the new data:
+
+   a) Download RF output.xml files from latest PT job and compressed
+      archived data from nexus.
+   b) Parse out the data filtering test cases listed in PA specification
+      (part of CSIT PAL specification file).
+
+3. Re-calculate new groups and their averages.
+
+4. Evaluate new test data:
+
+   a) If the existing group is prolonged => Result = Pass,
+      Reason = Normal.
+   b) If a new group is detected with lower average =>
+      Result = Fail, Reason = Regression.
+   c) If a new group is detected with higher average =>
+      Result = Pass, Reason = Progression.
+
+5. Generate and publish results:
+
+   a) Relay evaluation result to job result.
+   b) Generate a new trend summary dashboard, list of failed
+      tests and graphs.
+   c) Publish trend dashboard and graphs in HTML format on
+      https://docs.fd.io/.
+   d) Generate an alerting email. This email is sent by Jenkins to
+      csit-report@lists.fd.io.
diff --git a/docs/cpta/methodology/overview.rst b/docs/cpta/methodology/overview.rst
new file mode 100644
index 0000000000..ecea051116
--- /dev/null
+++ b/docs/cpta/methodology/overview.rst
@@ -0,0 +1,14 @@
+Overview
+--------
+
+This document describes a high-level design of a system for continuous
+performance measuring, trending and change detection for FD.io VPP SW
+data plane (and other performance tests run within CSIT sub-project).
+
+There is a Performance Trending (PT) CSIT module, and a separate
+Performance Analysis (PA) module ingesting results from PT and
+analysing, detecting and reporting any performance anomalies using
+historical data and statistical metrics. PA also produces a
+trending dashboard, a list of failed tests, and graphs with summary and
+drill-down views across all specified tests that can be reviewed and
+inspected regularly by the FD.io developer and user community.
diff --git a/docs/cpta/methodology/performance_tests.rst b/docs/cpta/methodology/performance_tests.rst
new file mode 100644
index 0000000000..82e64f870a
--- /dev/null
+++ b/docs/cpta/methodology/performance_tests.rst
@@ -0,0 +1,36 @@
+Performance Tests
+-----------------
+
+Performance trending relies on Maximum Receive Rate (MRR) tests.
+MRR tests measure the packet forwarding rate, in multiple trials of set
+duration, under the maximum load offered by the traffic generator,
+regardless of packet loss. The maximum load for a specified Ethernet frame
+size is set to the bi-directional link rate.
+
+Current parameters for performance trending MRR tests:
+
+- **Ethernet frame sizes**: 64B (78B for IPv6 tests) for all tests, IMIX for
+  selected tests (vhost, memif); all quoted sizes include frame CRC, but
+  exclude per frame transmission overhead of 20B (preamble, inter frame
+  gap).
+- **Maximum load offered**: 10GE and 40GE link (sub-)rates depending on NIC
+  tested, with the actual packet rate depending on frame size,
+  transmission overhead and traffic generator NIC forwarding capacity.
+
+  - For 10GE NICs the maximum packet rate load is 2 * 14.88 Mpps for 64B,
+    a 10GE bi-directional link rate.
+  - For 40GE NICs the maximum packet rate load is 2 * 18.75 Mpps for 64B,
+    a 40GE bi-directional link sub-rate limited by the packet forwarding
+    capacity of the 2-port 40GE NIC model (XL710) used on the T-Rex Traffic
+    Generator.
+
+- **Trial duration**: 1 sec.
+- **Number of trials per test**: 10.
+- **Test execution frequency**: twice a day, every 12 hrs (02:00,
+  14:00 UTC).
+
+Note: MRR tests should report the bi-directional link rate (or the NIC
+rate, if lower) when the tested VPP configuration can handle a packet rate
+higher than the bi-directional link rate, e.g. in large packet tests and/or
+multi-core tests. In other words, MRR = min(VPP rate, bi-dir link rate,
+NIC rate).
diff --git a/docs/cpta/methodology/perpatch_performance_tests.rst b/docs/cpta/methodology/perpatch_performance_tests.rst
new file mode 100644
index 0000000000..c1d3d669b1
--- /dev/null
+++ b/docs/cpta/methodology/perpatch_performance_tests.rst
@@ -0,0 +1,86 @@
+Per-patch performance tests
+---------------------------
+
+Updated for CSIT git commit id: 661035ac4ce6e51649f302fe2b7a8218257c0587.
+
+A methodology similar to trending analysis is used for comparing performance
+before a DUT code change is merged. This can act as a verify job to disallow
+changes which would decrease performance without a good reason.
+
+Existing jobs
+`````````````
+
+VPP is the only project currently using such jobs.
+They are not started automatically; they must be triggered on demand.
+They allow full tag expressions, but some tags are enforced (such as MRR).
+
+Only the three types of testbed based on Xeon processors have jobs created.
+Their Gerrit trigger words are "perftest-3n-hsw", "perftest-3n-skx"
+and "perftest-2n-skx".
+
+If additional arguments are added to the Gerrit trigger, they are treated
+as Robot tag expressions to select tests to run. For more details
+on existing tags, see `tag documentation rst file`_.
+
+Basic operation
+```````````````
+
+The job builds VPP .deb packages for both the patch under test
+(called "current") and its parent patch (called "parent").
+
+For each test (from a set defined by tag expression),
+both builds are subjected to several trial measurements (BMRR).
+Measured samples are grouped into a "parent" sequence,
+followed by a "current" sequence. The same Minimum Description Length
+algorithm as in trending is used to decide whether it is one big group,
+or two smaller groups. If it is one group, a "normal" result
+is declared for the test. If it is two groups, and the current average
+is less than the parent average, the test is declared a regression.
+If it is two groups and the current average is larger or equal,
+the test is declared a progression.
+
+The whole job fails (giving -1) if some trial measurement failed,
+or if any test was declared a regression.
+
+Temporary specifics
+```````````````````
+
+The Minimum Description Length analysis is performed by
+jumpavg-0.1.3 available on PyPI.
+
+In the hope of strengthening the signal (code performance) compared to noise
+(all other factors influencing the measured values), several workarounds
+are applied.
+
+In contrast to trending, trial duration is set to 10 seconds,
+and only 5 samples are measured for each build.
+Both parameters are set in ci-management.
+
+This decreases sensitivity to regressions, but also decreases
+the probability of false positives.
+
+Console output
+``````````````
+
+The following information is visible towards the end of Jenkins console output,
+repeated for each analyzed test.
+
+The original 5 values are visible in the order they were measured.
+The 5 values after processing are also visible in output,
+this time sorted by value (so people can see minimum and maximum).
+
+The next output is the difference of averages. It is the current average
+minus the parent average, expressed as a percentage of the parent average.
+
+The next three outputs contain the jumpavg representation
+of the two groups and a combined group.
+Here, "bits" is the description length; for the "current" sequence
+it includes the effect of the "parent" average value
+(jumpavg-0.1.3 penalizes sequences with averages that are too close).
+
+Next comes a sentence describing which grouping description is shorter,
+and by how many bits.
+Finally, the test result classification is visible.
+
+The algorithm does not track test case names,
+so test cases are indexed (from 0).
diff --git a/docs/cpta/methodology/testbed_hw_configuration.rst b/docs/cpta/methodology/testbed_hw_configuration.rst
new file mode 100644
index 0000000000..7914de5674
--- /dev/null
+++ b/docs/cpta/methodology/testbed_hw_configuration.rst
@@ -0,0 +1,5 @@
+Testbed HW configuration
+------------------------
+
+The testbed HW configuration is described on
+`this FD.IO wiki page <https://wiki.fd.io/view/CSIT/CSIT_LF_testbed#FD.IO_CSIT_testbed_-_Server_HW_Configuration>`_.
diff --git a/docs/cpta/methodology/trend_analysis.rst b/docs/cpta/methodology/trend_analysis.rst
new file mode 100644
index 0000000000..9916f20350
--- /dev/null
+++ b/docs/cpta/methodology/trend_analysis.rst
@@ -0,0 +1,106 @@
+Trend Analysis
+--------------
+
+All measured performance trend data is treated as time-series data that
+can be modelled as a concatenation of groups, each group modelled
+using a normal distribution. While sometimes the samples within a group
+are far from being distributed normally, currently we do not have a
+better tractable model.
+
+Here, "sample" should be the result of a single trial measurement,
+with group boundaries set only at test run granularity.
+But in order to avoid detecting causes unrelated to VPP performance,
+the default presentation (without /new/ in URL)
+takes the average of all trials within the run as the sample.
+Effectively, this acts as a single trial with aggregate duration.
+
+Performance graphs always show the run average (not all trial results).
+
+The group boundaries are selected based on `Minimum Description Length`_.
+
+Minimum Description Length
+``````````````````````````
+
+`Minimum Description Length`_ (MDL) is a particular formalization
+of the `Occam's razor`_ principle.
+
+The general formulation mandates evaluating a large set of models,
+but for anomaly detection purposes, it is useful to consider
+a smaller set of models, so that scoring and comparing them is easier.
+
+For each candidate model, the data should be compressed losslessly,
+which includes model definitions, encoded model parameters,
+and the raw data encoded based on probabilities computed by the model.
+The model resulting in the shortest compressed message is "the" correct model.
+
+For our model set (groups of normally distributed samples),
+we need to encode group length (which penalizes too many groups),
+group average (more on that later), group stdev and then all the samples.
+
+Luckily, the "all the samples" part turns out to be quite easy to compute.
+If sample values are considered as coordinates in (multi-dimensional)
+Euclidean space, fixing stdev means the point with allowed coordinates
+lies on a sphere. Fixing average intersects the sphere with a (hyper)-plane,
+and Gaussian probability density on the resulting sphere is constant.
+So the only contribution is the "area" of the sphere, which only depends
+on the number of samples and stdev.
+
+A somewhat ambiguous part is in choosing which encoding
+is used for group size, average and stdev.
+Different encodings cause different biases towards large or small values.
+In our implementation we have chosen probability density
+corresponding to uniform distribution (from zero to maximal sample value)
+for stdev and average of the first group,
+but for averages of subsequent groups we have chosen a distribution
+which discourages delimiting groups with averages close together.
+
+Our implementation assumes that measurement precision is 1.0 pps.
+Thus it is slightly wrong for trial durations other than 1.0 seconds.
+Also, all the calculations assume 1.0 pps is totally negligible,
+compared to stdev value.
+
+The group selection algorithm currently has no parameters;
+all the aforementioned encodings and the handling of precision are hardcoded.
+In principle, every group selection is examined, and the one encodable
+with the fewest bits is selected.
+As the bit amount for a selection is just the sum of bits for every group,
+finding the best selection takes a number of comparisons
+quadratically increasing with the size of data,
+the overall time complexity being probably cubic.
+
+The resulting group distribution looks good
+if samples are distributed normally enough within a group.
+But for obviously different distributions (for example `bimodal distribution`_)
+the groups tend to focus on less relevant factors (such as "outlier" density).
+
+Anomaly Detection
+`````````````````
+
+Once the trend data is divided into groups, each group has its population average.
+The start of the following group is marked as a regression (or progression)
+if the new group's average is lower (higher) than the previous group's.
+
+In the text below, "average at time <t>", shorthand "AVG[t]"
+means "the group average of the group the sample at time <t> belongs to".
+
+Trend Compliance
+````````````````
+
+Trend compliance metrics are targeted to provide an indication of trend
+changes over a short-term (i.e. weekly) and a long-term (i.e.
+quarterly), comparing the last group average AVG[last], to the one from a week
+ago, AVG[last - 1week] and to the maximum of trend values over the last
+quarter except the last week, max(AVG[last - 3mths]..AVG[last - 1week]),
+respectively. This results in the following trend compliance calculations:
+
++-------------------------+---------------------------------+-----------+-------------------------------------------+
+| Trend Compliance Metric | Trend Change Formula            | Value     | Reference                                 |
++=========================+=================================+===========+===========================================+
+| Short-Term Change       | (Value - Reference) / Reference | AVG[last] | AVG[last - 1week]                         |
++-------------------------+---------------------------------+-----------+-------------------------------------------+
+| Long-Term Change        | (Value - Reference) / Reference | AVG[last] | max(AVG[last - 3mths]..AVG[last - 1week]) |
++-------------------------+---------------------------------+-----------+-------------------------------------------+
+
+.. _Minimum Description Length: https://en.wikipedia.org/wiki/Minimum_description_length
+.. _Occam's razor: https://en.wikipedia.org/wiki/Occam%27s_razor
+.. _bimodal distribution: https://en.wikipedia.org/wiki/Bimodal_distribution
diff --git a/docs/cpta/methodology/trend_presentation.rst b/docs/cpta/methodology/trend_presentation.rst
new file mode 100644
index 0000000000..e9918020c5
--- /dev/null
+++ b/docs/cpta/methodology/trend_presentation.rst
@@ -0,0 +1,42 @@
+Trend Presentation
+------------------
+
+Performance Dashboard
+`````````````````````
+
+Dashboard tables list a summary of per test-case VPP MRR performance
+trend and trend compliance metrics and detected number of anomalies.
+
+Separate tables are generated for each testbed and each tested number of
+physical cores for VPP workers (1c, 2c, 4c). Test case names are linked to
+respective trending graphs for ease of navigation through the test data.
+
+Failed tests
+````````````
+
+The Failed tests tables list the tests which failed over the specified
+seven-day period, together with the number of failures over the period and
+the last failure details: Time, VPP-Build-Id and CSIT-Job-Build-Id.
+
+Separate tables are generated for each testbed. Test case names are linked to
+respective trending graphs for ease of navigation through the test data.
+
+Trendline Graphs
+````````````````
+
+Trendline graphs show measured per run averages of MRR values,
+group average values, and detected anomalies.
+The graphs are constructed as follows:
+
+- X-axis represents the date in the format MMDD.
+- Y-axis represents run-average MRR value in Mpps.
+- Markers to indicate anomaly classification:
+
+  - Regression - red circle.
+  - Progression - green circle.
+
+- The line shows average MRR value of each group.
+
+In addition, the graphs show dynamic labels while hovering over graph
+data points, presenting the CSIT build date, measured MRR value, VPP
+reference, trend job build ID and the LF testbed ID.
-- 
cgit 1.2.3-korg