Introduction
============
Purpose
-------
With the increasing number of features and code changes in the FD.io VPP data
plane codebase, it is increasingly difficult to measure and detect VPP data
plane performance changes. Similarly, once a degradation is detected, it is
getting harder to bisect the source code in search of the bad code change or
addition. The problem is further escalated by the large number of compute
platforms that VPP runs and is used on, including Intel Xeon, Intel Atom and
ARM AArch64. Existing FD.io CSIT continuous performance trending test jobs
help, but they rely on human factors for anomaly detection, and as such are
error-prone and unreliable, as the volume of data generated by these jobs is
growing exponentially.

The proposed solution is to eliminate the human factor and fully automate
performance trending, regression and progression detection, as well as
bisecting.
This document describes a high-level design of a system for continuous
measurement, trending and performance change detection for the FD.io VPP SW
data plane. It builds upon the existing CSIT framework, with extensions to its
throughput testing methodology, the CSIT data analytics engine
(PAL – Presentation-and-Analytics-Layer) and the associated Jenkins job
definitions.

Continuous Performance Trending and Analysis
--------------------------------------------
The proposed design replaces the existing CSIT performance trending jobs and
tests with a new Performance Trending (PT) CSIT module and a separate
Performance Analysis (PA) module that ingests results from PT and analyses,
detects and reports any performance anomalies using historical trending data
and statistical metrics. PA also produces trending graphs with summary and
drill-down views across all specified tests, which can be reviewed and
inspected regularly by the FD.io developer and user community.

Trend Analysis
``````````````
All measured performance trend data is treated as time-series data that can be
modelled using a normal distribution. After trimming the outliers, the average
and deviations from the average are used for detecting performance change
anomalies following the three-sigma rule of thumb (a.k.a. the 68-95-99.7 rule).

Analysis Metrics
````````````````
The following statistical metrics are proposed as performance trend indicators
over a rolling window of the last <N> sets of historical measurement data:

#. Quartiles Q1, Q2, Q3 – three points dividing a ranked data set into four
   equal parts; Q2 is the median of the data.
#. Inter Quartile Range IQR = Q3 - Q1 – a measure of variability, used here
   to eliminate outliers.
#. Outliers – extreme values that are at least 1.5*IQR below Q1, or at least
   1.5*IQR above Q3.
#. Trimmed Moving Average (TMA) – average across the data set of the rolling
   window of <N> values without the outliers. Used here to calculate TMSD.
#. Trimmed Moving Standard Deviation (TMSD) – standard deviation over the
   data set of the rolling window of <N> values without the outliers;
   requires calculating TMA. Used here for anomaly detection.
#. Moving Median (MM) – median across the data set of the rolling window of
   <N> values with all data points, including the outliers. Used here for
   anomaly detection.
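
As an illustration, the metrics above can be sketched in Python using only the
standard library. The function and variable names here are illustrative, not
part of the CSIT PAL code:

```python
import statistics

def trend_metrics(window):
    """Compute trend metrics for one rolling window of <N> results.

    Returns (TMA, TMSD, MM): trimmed moving average, trimmed moving
    standard deviation, and moving median (median keeps the outliers).
    """
    # Quartiles Q1, Q2, Q3 divide the ranked data set into four parts.
    q1, _q2, q3 = statistics.quantiles(window, n=4)
    iqr = q3 - q1
    # Trim outliers: values more than 1.5*IQR below Q1 or above Q3.
    trimmed = [x for x in window
               if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    tma = statistics.mean(trimmed)
    tmsd = statistics.pstdev(trimmed)
    mm = statistics.median(window)  # all data points, outliers included
    return tma, tmsd, mm
```

With a window containing one extreme value, the outlier is excluded from TMA
and TMSD but still influences MM's ranking of the full data set.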
Anomaly Detection
`````````````````
Based on the assumption that all performance measurements can be modelled using
a normal distribution, the three-sigma rule of thumb is proposed as the main
criterion for anomaly detection.

The three-sigma rule of thumb, a.k.a. the 68-95-99.7 rule, is a shorthand used
to capture the percentage of values that lie within a band around the average
(mean) in a normal distribution with a width of two, four and six standard
deviations. More precisely, 68.27%, 95.45% and 99.73% of the result values
should lie within one, two or three standard deviations of the mean,
respectively; see the figure below.

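
The quoted percentages can be cross-checked with the Python standard library's
`statistics.NormalDist` (shown purely as a verification of the rule of thumb):

```python
from statistics import NormalDist

# Fraction of a normal distribution within k standard deviations of the mean.
nd = NormalDist()  # standard normal: mean 0, sigma 1
for k in (1, 2, 3):
    within = nd.cdf(k) - nd.cdf(-k)
    print(f"within {k} sigma: {within:.2%}")
# Prints 68.27%, 95.45% and 99.73% for k = 1, 2, 3.
```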
To evaluate a test result value X against the defined trend analysis metrics
and detect anomalies, three simple evaluation criteria are proposed:

::

    Test Result Evaluation   Reported Result   Reported Reason   Trending Graph Markers
    ===================================================================================
    Normal                   Pass              Normal            Part of plot line
    Regression               Fail              Regression        Red circle
    Progression              Pass              Progression       Green circle

Jenkins job cumulative results:

#. Pass - if all detection results are Pass or Warning.
#. Fail - if any detection result is Fail.
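
The three evaluation criteria and the cumulative job result can be sketched as
follows; `evaluate` and `job_result` are hypothetical helper names, not the
actual CSIT code:

```python
def evaluate(x, tma, tmsd):
    """Classify one result value against the trending range TMA +/- 3*TMSD."""
    if x < tma - 3 * tmsd:
        return "Fail", "Regression"    # red circle on the trending graph
    if x > tma + 3 * tmsd:
        return "Pass", "Progression"   # green circle on the trending graph
    return "Pass", "Normal"            # part of the plot line

def job_result(evaluations):
    """Cumulative Jenkins job result: Fail if any detection result is Fail."""
    return "Fail" if any(res == "Fail" for res, _ in evaluations) else "Pass"
```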
Performance Trending (PT)
`````````````````````````
CSIT PT runs regular performance test jobs finding MRR, PDR and NDR per test
case. PT is designed as follows:

#. PT job triggers:

   #. Periodic, e.g. daily.
   #. On-demand, gerrit triggered.
   #. Other periodic triggers TBD.

#. Measurements and calculations per test case:

   #. MRR - Maximum Received Rate:

      #. Measured: unlimited tolerance of packet loss.
      #. Send packets at link rate, count total received packets, divide
         by the test trial period.

   #. Optimized binary search bounds for PDR and NDR tests:

      #. Calculated: high and low bounds for the binary search based on
         MRR and a pre-defined Packet Loss Ratio (PLR).
      #. HighBound=MRR, LowBound=to-be-determined.
      #. PLR – acceptable loss ratio for PDR tests, currently set to 0.5%
         for all performance tests.

   #. PDR and NDR:

      #. Run a binary search within the calculated bounds to find PDR and
         NDR.
      #. Measured: PDR (Partial Drop Rate) – limited non-zero tolerance of
         packet loss.
      #. Measured: NDR (Non Drop Rate) – zero packet loss.

#. Archive MRR, PDR and NDR per test case.
#. Archive counters collected at MRR, PDR and NDR.

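
A minimal sketch of the binary search step, assuming a hypothetical
`measure_loss_ratio(rate)` trial driver (not the CSIT API) that returns the
packet loss ratio observed at a given offered rate:

```python
def binary_search(low, high, plr, measure_loss_ratio, precision=0.001):
    """Find the highest rate whose measured loss ratio does not exceed plr.

    With plr=0.0 this finds NDR (zero loss); with plr=0.005 (0.5%) it
    finds PDR. `low` and `high` are the pre-calculated bounds, with
    HighBound=MRR. `measure_loss_ratio` is a hypothetical trial driver.
    """
    best = low
    while (high - low) / high > precision:
        mid = (low + high) / 2.0
        if measure_loss_ratio(mid) <= plr:
            best, low = mid, mid   # loss acceptable, search higher rates
        else:
            high = mid             # too much loss, search lower rates
    return best
```

The relative `precision` termination keeps the number of trials logarithmic in
the width of the initial bounds.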
Performance Analysis (PA)
`````````````````````````
CSIT PA runs performance analysis, change detection and trending using the
specified trend analysis metrics over a rolling window of the last <N> sets of
historical measurement data. PA is defined as follows:

#. PA job triggers:

   #. By a PT job at its completion.
   #. Manually from the Jenkins UI.

#. Download and parse archived historical data and the new data:

   #. New data from the latest PT job is evaluated against the rolling
      window of <N> sets of historical data.
   #. Download RF output.xml files and compressed archived data.
   #. Parse out the data, filtering test cases listed in the PA
      specification (part of the CSIT PAL specification file).

#. Calculate trend metrics for the rolling window of <N> sets of historical
   data:

   #. Calculate quartiles Q1, Q2, Q3.
   #. Trim outliers using IQR.
   #. Calculate TMA and TMSD.
   #. Calculate the normal trending range per test case based on TMA and
      TMSD.

#. Evaluate the new test data against the trend metrics:

   #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
      Reason = Normal.
   #. If below the range => Result = Fail, Reason = Regression.
   #. If above the range => Result = Pass, Reason = Progression.

#. Generate and publish results:

   #. Relay the evaluation result to the job result.
   #. Generate a new set of trend analysis summary graphs and drill-down
      graphs:

      #. Summary graphs include the measured values with Normal,
         Progression and Regression markers, with MM shown in the
         background if possible.
      #. Drill-down graphs include MM, TMA and TMSD.

   #. Publish the trend analysis graphs in HTML format.
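
The metric-calculation and evaluation steps above (computing trend metrics per
rolling window, then classifying each new sample into a trending-graph marker)
can be sketched end to end; all names are illustrative, not the PAL
implementation:

```python
import statistics

def classify_series(samples, n=30):
    """For each sample, compare it against the metrics of the previous
    <n> results and emit its trending-graph marker (sketch only)."""
    markers = []
    for i, x in enumerate(samples):
        window = samples[max(0, i - n):i]
        if len(window) < 4:                 # not enough history yet
            markers.append("normal")
            continue
        q1, _q2, q3 = statistics.quantiles(window, n=4)
        iqr = q3 - q1
        trimmed = [v for v in window
                   if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
        tma = statistics.mean(trimmed)
        tmsd = statistics.pstdev(trimmed)
        if x < tma - 3 * tmsd:
            markers.append("regression")    # red circle
        elif x > tma + 3 * tmsd:
            markers.append("progression")   # green circle
        else:
            markers.append("normal")        # part of plot line
    return markers
```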
|