author | Vratko Polak <vrpolak@cisco.com> | 2018-06-14 14:03:44 +0200 |
---|---|---|
committer | Tibor Frank <tifrank@cisco.com> | 2018-06-20 06:43:22 +0000 |
commit | f2430562835ded8aeb66db2c36379cf3ea54c748 (patch) | |
tree | 0100045839806e1308c76ed7e89943fa96e616fb /docs/new/cpta | |
parent | 47b807e7268231b35982dc3b5a0c3108537d6432 (diff) |
CSIT-1110: Improve new detection methodology doc
Change-Id: I068fd4e9418f232ee1e1f13994e9c5c431478ec8
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
Diffstat (limited to 'docs/new/cpta')
-rw-r--r-- | docs/new/cpta/introduction/index.rst | 14 | ||||
-rw-r--r-- | docs/new/cpta/methodology/index.rst | 105 |
2 files changed, 82 insertions, 37 deletions
diff --git a/docs/new/cpta/introduction/index.rst b/docs/new/cpta/introduction/index.rst
index 991181aff4..229e9e3da9 100644
--- a/docs/new/cpta/introduction/index.rst
+++ b/docs/new/cpta/introduction/index.rst
@@ -8,17 +8,18 @@ Performance dashboard tables provide the latest VPP throughput trend,
 trend compliance and detected anomalies, all on a per VPP test case
 basis. Linked trendline graphs enable further drill-down into the
 trendline compliance, sequence and nature of anomalies, as well as
-pointers to performance test builds/logs and VPP builds. Performance
-trending is currently based on the Maximum Receive Rate (MRR) tests. MRR
-tests measure the packet forwarding rate under the maximum load offered
+pointers to performance test builds/logs and VPP (or DPDK) builds.
+Performance trending is currently based on the Maximum Receive Rate (MRR) tests.
+MRR tests measure the packet forwarding rate under the maximum load offered
 by traffic generator over a set trial duration, regardless of packet
 loss. See :ref:`trending_methodology` section for more detail including
 trend and anomaly calculations.

-Data samples are generated by the CSIT VPP performance trending jobs
+Data samples are generated by the CSIT VPP (and DPDK) performance trending jobs
 executed twice a day (target start: every 12 hrs, 02:00, 14:00 UTC). All
-trend and anomaly evaluation is based on a rolling window of <N=14> data
-samples, covering last 7 days.
+trend and anomaly evaluation is based on an algorithm which divides test runs
+into groups according to minimum description length principle.
+The trend value is the population average of the results within a group.

 Failed tests
 ------------
@@ -53,7 +54,6 @@ Legend to the tables:
 maximum of trend values over the last quarter except last week.
  - **Regressions [#]**: Number of regressions detected.
  - **Progressions [#]**: Number of progressions detected.
- - **Outliers [#]**: Number of outliers detected.
 Tested VPP worker-thread-core combinations (1t1c, 2t2c, 4t4c) are listed
 in separate tables in section 1.x. Followed by trending methodology in
diff --git a/docs/new/cpta/methodology/index.rst b/docs/new/cpta/methodology/index.rst
index ff69eb1f9a..612f6b32db 100644
--- a/docs/new/cpta/methodology/index.rst
+++ b/docs/new/cpta/methodology/index.rst
@@ -66,7 +66,63 @@ are far from being distributed normally, we do not have a better tractable model

 The group boundaries are selected based on `Minimum Description Length`_.

-TODO: Decide the level of detail for describing group selection.
+Minimum Description Length
+--------------------------
+
+`Minimum Description Length`_ (MDL) is a particular formalization
+of `Occam's razor`_ principle.
+
+The general formulation mandates to evaluate a large set of models,
+but for anomaly detection purposes, it is useful to consider
+a smaller set of models, so that scoring and comparing them is easier.
+
+For each candidate model, the data should be compressed losslessly,
+which includes model definitions, encoded model parameters,
+and the raw data encoded based on probabilities computed by the model.
+The model resulting in shortest compressed message is "the" correct model.
+
+For our model set (groups of normally distributed samples),
+we need to encode group length (which penalizes too many groups),
+group average (more on that later), group stdev and then all the samples.
+
+Luckily, the "all the samples" part turns out to be quite easy to compute.
+If sample values are considered as coordinates in (multi-dimensional)
+Euclidean space, fixing stdev means the point with allowed coordinates
+lies on a sphere. Fixing average intersects the sphere with a (hyper)-plane,
+and Gaussian probability density on the resulting sphere is constant.
+So the only contribution is the "area" of the sphere, which only depends
+on the number of samples and stdev.
+
+A somehow ambiguous part is in choosing which encoding
+is used for group size, average and stdev.
+Different encodings cause different biases to large or small values.
+In our implementation we have chosen probability density
+corresponding to uniform distribution (from zero to maximal sample value)
+for stdev and average of the first group,
+but for averages of subsequent groups we have chosen a distribution
+which discourages delimiting groups with averages close together.
+
+One part of our implementation which is not precise enough
+is handling of measurement precision.
+The minimal difference in MRR values is currently 0.1 pps
+(the difference of one packet over 10 second trial),
+but the code assumes the precision is 1.0.
+Also, all the calculations assume 1.0 is totally negligible,
+compared to stdev value.
+
+The group selection algorithm currently has no parameters,
+all the aforementioned encodings and handling of precision is hardcoded.
+In principle, every group selection is examined, and the one encodable
+with least amount of bits is selected.
+As the bit amount for a selection is just sum of bits for every group,
+finding the best selection takes number of comparisons
+quadratically increasing with the size of data,
+the overall time complexity being probably cubic.
+
+The resulting group distribution looks good
+if samples are distributed normally enough within a group.
+But for obviously different distributions (for example `bimodal distribution`_)
+the groups tend to focus on less relevant factors (such as "outlier" density).

 Anomaly Detection
 `````````````````
@@ -75,37 +131,23 @@ Once the trend data is divided into groups, each group has its population averag
 The start of the following group is marked as a regression (or progression)
 if the new group's average is lower (higher) than the previous group's.

-Metrics
-```````
-
-TODO: Only needed for current trend compliance.
-
-Following statistical metrics are used as performance trend indicators
-over the rolling window of last <N> sets of historical measurement data:
-
-- **TMM** : **Trimmed Moving Median**, median across the data set of <N>
-  values excluding the outliers. Used as a trending value and as a
-  reference for anomaly detection.
-
 Trend Compliance
 ````````````````

-TODO: Apply new detection algorithm also to trend compliance.
-
 Trend compliance metrics are targeted to provide an indication of trend
 changes over a short-term (i.e. weekly) and a long-term (i.e.
-quarterly), comparing the last trend value, TMM[last], to one from week
-ago, TMM[last - 1week] and to the maximum of trend values over last
-quarter except last week, max(TMM[(last - 3mths)..(last - 1week)]),
+quarterly), comparing the last group average AVG[last], to the one from week
+ago, AVG[last - 1week] and to the maximum of trend values over last
+quarter except last week, max(AVG[last - 3mths]..AVG[last - 1week]),
 respectively.
 This results in following trend compliance calculations:

-+-------------------------+---------------------------------+-----------+------------------------------------------+
-| Trend Compliance Metric | Trend Change Formula            | Value     | Reference                                |
-+=========================+=================================+===========+==========================================+
-| Short-Term Change       | (Value - Reference) / Reference | TMM[last] | TMM[last - 1week]                        |
-+-------------------------+---------------------------------+-----------+------------------------------------------+
-| Long-Term Change        | (Value - Reference) / Reference | TMM[last] | max(TMM[(last - 3mths)..(last - 1week)]) |
-+-------------------------+---------------------------------+-----------+------------------------------------------+
++-------------------------+---------------------------------+-----------+-------------------------------------------+
+| Trend Compliance Metric | Trend Change Formula            | Value     | Reference                                 |
++=========================+=================================+===========+===========================================+
+| Short-Term Change       | (Value - Reference) / Reference | AVG[last] | AVG[last - 1week]                         |
++-------------------------+---------------------------------+-----------+-------------------------------------------+
+| Long-Term Change        | (Value - Reference) / Reference | AVG[last] | max(AVG[last - 3mths]..AVG[last - 1week]) |
++-------------------------+---------------------------------+-----------+-------------------------------------------+

 Trend Presentation
 ------------------
@@ -131,10 +173,11 @@ associated group averages. The graphs are constructed as follows:
 - Y-axis represents MRR throughput in Mpps.
 - Markers to indicate anomaly classification:

-  - Outlier - gray circle around MRR value point.
   - Regression - red circle.
   - Progression - green circle.

+- The line shows average of each group.
+
 In addition the graphs show dynamic labels while hovering over graph data
 points, representing (trend job build Id, MRR value) and the actual
 vpp build number (b<XXX>) tested.
@@ -184,12 +227,12 @@ PA is defined as follows:

 3. Re-calculate new groups and their averages.

-4. Evaluate new test data (TODO: Update.):
+4. Evaluate new test data:

-   a) If within the range of (TMA +/- 3*TMSD) => Result = Pass,
+   a) If the existing group is prolonged => Result = Pass,
       Reason = Normal. (to be updated based on the final Jenkins code).
-   b) If below the range => Result = Fail, Reason = Regression.
-   c) If above the range => Result = Pass, Reason = Progression.
+   b) If a new group is detected with lower average => Result = Fail, Reason = Regression.
+   c) If a new group is detected with higher average => Result = Pass, Reason = Progression.

 5. Generate and publish results

@@ -206,3 +249,5 @@ The testbed HW configuration is described on `this FD.IO wiki page
 <https://wiki.fd.io/view/CSIT/CSIT_LF_testbed#FD.IO_CSIT_testbed_-_Server_HW_Configuration>`_.

 .. _Minimum Description Length: https://en.wikipedia.org/wiki/Minimum_description_length
+.. _Occam's razor: https://en.wikipedia.org/wiki/Occam%27s_razor
+.. _bimodal distribution: https://en.wikipedia.org/wiki/Bimodal_distribution
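As a rough illustration of the anomaly detection and trend compliance rules this patch documents, the Python sketch below labels group transitions and evaluates the `(Value - Reference) / Reference` formula from the table. It is not CSIT code: the function names, the per-sample `trend` list (each entry being the average of the group that sample belongs to) and the `samples_per_week` / `samples_per_quarter` parameters are assumptions made for this example, and the MDL-based grouping itself is taken as already done.

```python
from typing import List, Optional, Tuple


def classify_groups(group_averages: List[float]) -> List[Optional[str]]:
    """Label the start of each group by comparing it to the previous group.

    The first group has no predecessor, so its label is None.
    """
    labels: List[Optional[str]] = [None]
    for previous, current in zip(group_averages, group_averages[1:]):
        if current < previous:
            labels.append("regression")
        elif current > previous:
            labels.append("progression")
        else:
            labels.append(None)
    return labels


def relative_change(value: float, reference: float) -> float:
    """Trend change formula: (Value - Reference) / Reference."""
    return (value - reference) / reference


def trend_compliance(
    trend: List[float], samples_per_week: int, samples_per_quarter: int
) -> Tuple[float, float]:
    """Return (short-term change, long-term change) for the last sample.

    `trend` holds one group average per sample, oldest first (hypothetical
    layout chosen for this sketch).
    """
    last = len(trend) - 1
    if last < samples_per_week:
        raise ValueError("need more than one week of samples")
    week_ago = last - samples_per_week
    # Quarter window ends one week ago (inclusive) and is clipped at the start.
    quarter_start = max(0, last - samples_per_quarter)
    long_reference = max(trend[quarter_start:week_ago + 1])
    short_term = relative_change(trend[last], trend[week_ago])
    long_term = relative_change(trend[last], long_reference)
    return short_term, long_term


if __name__ == "__main__":
    print(classify_groups([10.0, 8.0, 9.5]))
    # Six samples in a group averaging 10.0, then four in a group averaging 8.0.
    print(trend_compliance([10.0] * 6 + [8.0] * 4, 2, 8))
```

With two trending runs per day, as stated in the introduction, one week would correspond to 14 samples; the much shorter windows in the demo only keep the example small.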