diff options
Diffstat (limited to 'docs/cpta/methodology/perpatch_performance_tests.rst')
-rw-r--r-- | docs/cpta/methodology/perpatch_performance_tests.rst | 86 |
1 files changed, 0 insertions, 86 deletions
diff --git a/docs/cpta/methodology/perpatch_performance_tests.rst b/docs/cpta/methodology/perpatch_performance_tests.rst deleted file mode 100644 index c1d3d669b1..0000000000 --- a/docs/cpta/methodology/perpatch_performance_tests.rst +++ /dev/null @@ -1,86 +0,0 @@ -Per-patch performance tests ---------------------------- - -Updated for CSIT git commit id: 661035ac4ce6e51649f302fe2b7a8218257c0587. - -A methodology similar to trending analysis is used for comparing performance -before a DUT code change is merged. This can act as a verify job to disallow -changes which would decrease performance without a good reason. - -Existing jobs -````````````` - -VPP is the only project currently using such jobs. -They are not started automatically, must be triggered on demand. -They allow full tag expressions, but some tags are enforced (such as MRR). - -Only the three types of tesbed based on Xeon processors have jobs created. -Their Gerrit triggers words are "perftest-3n-hsw", "perftest-3n-skx" -and "perftest-2n-skx". - -If additional arguments are added to the Gerrit trigger, they are treated -as Robot tag expressions to select tests to run. For more details -on existing tags, see `tag documentation rst file`_. - -Basic operation -``````````````` - -The job builds VPP .deb packages for both the patch under test -(called "current") and its parent patch (called "parent"). - -For each test (from a set defined by tag expression), -both builds are subjected to several trial measurements (BMRR). -Measured samples are grouped to "parent" sequence, -followed by "current" sequence. The same Minimal Description Length -algorithm as in trending is used to decide whether it is one big group, -or two smaller gropus. If it is one group, a "normal" result -is declared for the test. If it is two groups, and current average -is less then parent average, the test is declared a regression. -If it is two groups and current average is larger or equal, -the test is declared a progression. - -The whole job fails (giving -1) if some trial measurement failed, -or if any test was declared a regression. - -Temporary specifics -``````````````````` - -The Minimal Description Length analysis is performed by -jumpavg-0.1.3 available on PyPI. - -In hopes of strengthening of signal (code performance) compared to noise -(all other factors influencing the measured values), several workarounds -are applied. - -In contrast to trending, trial duration is set to 10 seconds, -and only 5 samples are measured for each build. -Both parameters are set in ci-management. - -This decreases sensitivity to regressions, but also decreases -probability of false positives. - -Console output -`````````````` - -The following information as visible towards the end of Jenkins console output, -repeated for each analyzed test. - -The original 5 values are visible in order they were measured. -The 5 values after processing are also visible in output, -this time sorted by value (so people can see minimum and maximum). - -The next output is difference of averages. It is the current average -minus the parent average, expressed as percentage of the parent average. - -The next three outputs contain the jumpavg representation -of the two groups and a combined group. -Here, "bits" is the description length; for "current" sequence -it includes effect from "parent" average value -(jumpavg-0.1.3 penalizes sequences with too close averages). - -Next, a sentence describing which grouping description is shorter, -by how much bits. -Finally, the test result classification is visible. - -The algorithm does not track test case names, -so test cases are indexed (from 0). |