Introduction
============

Purpose
-------

With an increasing number of features and code changes in the FD.io VPP data
plane codebase, it is increasingly difficult to measure and detect VPP data
plane performance changes. Similarly, once a degradation is detected, it is
getting harder to bisect the source code in search of the bad code change or
addition. The problem is further compounded by the large number of compute
platforms that VPP runs on, including Intel Xeon, Intel Atom and ARM AArch64.

Existing FD.io CSIT continuous performance trending test jobs help, but they
rely on human review for anomaly detection. Because the volume of data
generated by these jobs is growing exponentially, this manual approach is
error prone and unreliable.

The proposed solution is to eliminate the human factor and fully automate
performance trending, regression and progression detection, as well as
bisecting.

This document describes a high-level design of a system for continuous
measuring, trending and performance change detection for the FD.io VPP SW data
plane. It builds upon the existing CSIT framework with extensions to its
throughput testing methodology, the CSIT data analytics engine
(PAL – Presentation-and-Analytics-Layer) and the associated Jenkins job
definitions.

Continuous Performance Trending and Analysis
--------------------------------------------

The proposed design replaces the existing CSIT performance trending jobs and
tests with a new Performance Trending (PT) CSIT module and a separate
Performance Analysis (PA) module. PA ingests results from PT, then analyses,
detects and reports any performance anomalies using historical trending data
and statistical metrics.

PA also produces trending graphs with summary and drill-down views across all
specified tests, which the FD.io developer and user community can review and
inspect regularly.

Trend Analysis
``````````````

All measured performance trend data is treated as time-series data that can be
modelled using a normal distribution. After trimming the outliers, the average
and the deviations from the average are used for detecting performance change
anomalies following the three-sigma rule of thumb (a.k.a. the 68-95-99.7 rule).

Analysis Metrics
````````````````

The following statistical metrics are proposed as performance trend indicators
over the rolling window of the last <N> sets of historical measurement data:

    #. Quartiles Q1, Q2, Q3 – three points dividing a ranked data set into
       four equal parts; Q2 is the median of the data.
    #. Inter Quartile Range IQR=Q3-Q1 – measure of variability, used here to
       eliminate outliers.
    #. Outliers – extreme values that are at least 1.5*IQR below Q1, or at
       least 1.5*IQR above Q3.
    #. Trimmed Moving Average (TMA) – average across the data set of the
       rolling window of <N> values without the outliers. Used here to
       calculate TMSD.
    #. Trimmed Moving Standard Deviation (TMSD) – standard deviation over the
       data set of the rolling window of <N> values without the outliers;
       requires calculating TMA. Used here for anomaly detection.
    #. Moving Median (MM) – median across the data set of the rolling window
       of <N> values with all data points, including the outliers. Used here
       for anomaly detection.

Anomaly Detection
`````````````````

Based on the assumption that all performance measurements can be modelled
using a normal distribution, the three-sigma rule of thumb is proposed as the
main criterion for anomaly detection.
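The Analysis Metrics that feed this three-sigma criterion can be computed for one test case with Python's standard ``statistics`` module. This is a minimal sketch, not CSIT PAL code. The ``TrendMetrics``/``trend_metrics`` names are hypothetical, and the sketch makes three assumptions about details the design leaves open. First, it uses the ``inclusive`` quartile interpolation method. Second, it uses strict outlier fences; the text's "at least 1.5*IQR" would call for ``<=``/``>=``, which flags every value when IQR is 0. Third, TMSD is the sample standard deviation rather than the population one.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TrendMetrics:
    q1: float
    q2: float
    q3: float
    iqr: float
    outliers: List[float]
    tma: float   # Trimmed Moving Average
    tmsd: float  # Trimmed Moving Standard Deviation
    mm: float    # Moving Median


def trend_metrics(window: Sequence[float]) -> TrendMetrics:
    """Compute the proposed trend metrics over one rolling window of <N> values."""
    if len(window) < 2:
        raise ValueError("the rolling window needs at least two values")
    # Quartiles Q1, Q2, Q3; the 'inclusive' interpolation method is an assumption.
    q1, q2, q3 = statistics.quantiles(window, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Outliers lie beyond the 1.5*IQR fences. Strict comparison (an assumption)
    # keeps a window whose values are all equal (IQR == 0) from being fully flagged.
    outliers = [x for x in window if x < low_fence or x > high_fence]
    kept = [x for x in window if low_fence <= x <= high_fence]
    # TMA and TMSD use only the trimmed values; sample stdev is an assumption.
    tma = statistics.fmean(kept)
    tmsd = statistics.stdev(kept) if len(kept) > 1 else 0.0
    # MM is the median of all values in the window, outliers included.
    mm = statistics.median(window)
    return TrendMetrics(q1, q2, q3, iqr, outliers, tma, tmsd, mm)


if __name__ == "__main__":
    # Hypothetical throughput samples (e.g. Mpps) for one test case.
    m = trend_metrics([10.0, 10.2, 9.9, 10.1, 10.0, 14.0])
    print(m.outliers, round(m.tma, 3), round(m.tmsd, 3), round(m.mm, 3))
```

For the sample window, 14.0 lies above the upper fence Q3 + 1.5*IQR. It is therefore excluded from TMA and TMSD, while MM (10.05) still counts it.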

Three-sigma rule of thumb, a.k.a. the 68–95–99.7 rule, is a shorthand used to
capture the percentage of values that lie within a band around the average
(mean) in a normal distribution with a width of two, four and six standard
deviations. More precisely, 68.27%, 95.45% and 99.73% of the result values
should lie within one, two or three standard deviations of the mean,
respectively; see figure below.

To verify compliance of a test result with value X against the defined trend
analysis metrics and detect anomalies, three simple evaluation criteria are
proposed:

::

    Test Result Evaluation   Reported Result   Reported Reason   Trending Graph Markers
    ===================================================================================
    Normal                   Pass              Normal            Part of plot line
    Regression               Fail              Regression        Red circle
    Progression              Pass              Progression       Green circle

Jenkins job cumulative results:

    #. Pass - if all detection results are Pass or Warning.
    #. Fail - if any detection result is Fail.

Performance Trending (PT)
`````````````````````````

CSIT PT runs regular performance test jobs that find MRR, PDR and NDR per test
case. PT is designed as follows:

    #. PT job triggers:

       #. Periodic, e.g. daily.
       #. On-demand, Gerrit triggered.
       #. Other periodic TBD.

    #. Measurements and calculations per test case:

       #. MRR (Max Received Rate):

          #. Measured: Unlimited tolerance of packet loss.
          #. Send packets at link rate, count total received packets, divide
             by test trial period.

       #. Optimized binary search bounds for PDR and NDR tests:

          #. Calculated: High and low bounds for binary search based on MRR
             and pre-defined Packet Loss Ratio (PLR).
          #. HighBound=MRR, LowBound=to-be-determined.
          #. PLR – acceptable loss ratio for PDR tests, currently set to 0.5%
             for all performance tests.

       #. PDR and NDR:

          #. Run binary search within the calculated bounds, find PDR and NDR.
          #. Measured: PDR Partial Drop Rate – limited non-zero tolerance of
             packet loss.
          #. Measured: NDR Non Drop Rate – zero packet loss.

    #. Archive MRR, PDR and NDR per test case.
    #. Archive counters collected at MRR, PDR and NDR.

Performance Analysis (PA)
`````````````````````````

CSIT PA runs performance analysis, change detection and trending using the
specified trend analysis metrics over the rolling window of the last <N> sets
of historical measurement data. PA is defined as follows:

    #. PA job triggers:

       #. By the PT job at its completion.
       #. On-demand, Gerrit triggered.
       #. Other periodic TBD.

    #. Download and parse archived historical data and the new data:

       #. New data from the latest PT job is evaluated against the rolling
          window of <N> sets of historical data.
       #. Download Robot Framework (RF) output.xml files and compressed
          archived data.
       #. Parse the data, filtering for the test cases listed in the PA
          specification (part of the CSIT PAL specification file).

    #. Calculate trend metrics for the rolling window of <N> sets of
       historical data:

       #. Calculate quartiles Q1, Q2, Q3.
       #. Trim outliers using IQR.
       #. Calculate TMA and TMSD.
       #. Calculate the normal trending range per test case based on TMA and
          TMSD.

    #. Evaluate new test data against trend metrics:

       #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
          Reason = Normal.
       #. If below the range => Result = Fail, Reason = Regression.
       #. If above the range => Result = Pass, Reason = Progression.

    #. Generate and publish results:

       #. Relay the evaluation result to the job result.
       #. Generate a new set of trend analysis summary graphs and drill-down
          graphs:

          #. Summary graphs to include measured values with Normal,
             Progression and Regression markers. MM shown in the background
             if possible.
          #. Drill-down graphs to include MM, TMA and TMSD.

       #. Publish trend analysis graphs in HTML format.
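PA step 4 evaluates each new value against the normal range. That evaluation, together with the Jenkins cumulative job result defined under Anomaly Detection, can be sketched as follows. This is a hypothetical illustration, not CSIT PAL code. The ``evaluate`` and ``job_result`` names and the sample numbers are made up. TMA and TMSD are assumed to be already computed as in step 3. A value lying exactly on the band edge is treated as within the range.

```python
from typing import Iterable, Tuple


def evaluate(value: float, tma: float, tmsd: float) -> Tuple[str, str]:
    """Return (Result, Reason) for a new value against the range TMA +/- 3*TMSD."""
    low = tma - 3 * tmsd
    high = tma + 3 * tmsd
    if value < low:
        return "Fail", "Regression"
    if value > high:
        return "Pass", "Progression"
    # Inside the band, including its edges (an assumption), is Normal.
    return "Pass", "Normal"


def job_result(results: Iterable[str]) -> str:
    """Cumulative Jenkins job result: Fail if any detection result is Fail."""
    return "Fail" if any(r == "Fail" for r in results) else "Pass"


if __name__ == "__main__":
    # Hypothetical inputs per test case: (new value, TMA, TMSD), e.g. in Mpps.
    cases = {
        "test-a": (10.10, 10.04, 0.114),
        "test-b": (9.50, 10.04, 0.114),
        "test-c": (12.60, 12.00, 0.150),
    }
    verdicts = {name: evaluate(*args) for name, args in cases.items()}
    for name, (result, reason) in verdicts.items():
        print(name, result, reason)
    print("job:", job_result(result for result, _ in verdicts.values()))
```

With these made-up inputs, ``test-b`` falls below TMA - 3*TMSD and is reported as Fail/Regression. The cumulative job result is therefore Fail, even though ``test-c`` is a Progression.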