author Vratko Polak <vrpolak@cisco.com> 2024-01-23 15:37:14 +0100
committer Vratko Polak <vrpolak@cisco.com> 2024-01-24 08:59:29 +0000
commit 852f60f525fdc6080387fe6a3b297736c83f0834
tree d3b12ee6072995b83e348a592953ed141f22d2be
parent 064b326e365ad89b18b58d007d0b1f09ada65180
feat(PyPI): update metadata for jumpavg 0.4.2

+ Convert readme to .rst
+ Add Usage including a basic example.
- Detailed description points to C-DASH methodology.
+ That page is updated for new defaults and better description.
+ Unify readme layout and origins with MLRsearch metadata.
- Not releasing new MLRsearch version just for readme improvements.
+ Update the TODO file.

Change-Id: I76ac22b7f283f01349bf9a50459dc841e13b21ad
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
-rw-r--r-- PyPI/MLRsearch/README.rst | 52
-rw-r--r-- PyPI/jumpavg/LICENSE.txt (renamed from PyPI/jumpavg/LICENSE) | 0
-rw-r--r-- PyPI/jumpavg/README.md | 38
-rw-r--r-- PyPI/jumpavg/README.rst | 74
-rw-r--r-- PyPI/jumpavg/hints_and_todos.txt | 3
-rw-r--r-- PyPI/jumpavg/pyproject.toml | 21
-rw-r--r-- docs/content/methodology/trending/analysis.md | 25
7 files changed, 128 insertions, 85 deletions
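
Note on the analysis.md hunks: the added text fixes the measurement precision so that the max sample value spans 4096 (2^12) "pixels", and says the selected grouping is the one encodable with the fewest bits, where the cost of a selection is the sum of the per-group costs. The Python sketch below illustrates only that additive structure. It is not the jumpavg implementation: `pixel_size`, `toy_group_bits`, `best_partition` and the flat 16-bit group header are hypothetical names and values chosen for illustration, and the real encodings of group size, average and stdev are not reproduced here.

```python
import math
from statistics import pstdev


def pixel_size(max_value: float, bits: int = 12) -> float:
    """Size of one "pixel" when max_value spans 2**bits pixels."""
    return max_value / 2 ** bits


def toy_group_bits(samples, unit):
    """Hypothetical per-group bit cost (NOT the formula used by jumpavg).

    Charges a flat header for the group parameters, then for each sample
    the bits needed to locate it within +-stdev at resolution `unit`.
    """
    spread = max(pstdev(samples), unit)  # never finer than one pixel
    header = 16.0  # arbitrary stand-in for encoding size, average, stdev
    return header + len(samples) * math.log2(2.0 * spread / unit + 1.0)


def best_partition(values, unit):
    """Split values into consecutive groups minimizing total toy bits.

    Because the total cost is a sum over groups, the best split of a
    prefix can be built from the best splits of shorter prefixes.
    """
    n = len(values)
    best = [0.0] + [math.inf] * n  # best[i]: minimal bits for values[:i]
    cut = [0] * (n + 1)            # cut[i]: start index of the last group
    for end in range(1, n + 1):
        for start in range(end):
            bits = best[start] + toy_group_bits(values[start:end], unit)
            if bits < best[end]:
                best[end], cut[end] = bits, start
    # Walk the recorded cut points backwards to recover the groups.
    groups, end = [], n
    while end > 0:
        groups.append(values[cut[end]:end])
        end = cut[end]
    return groups[::-1]


values = [10.0, 10.1, 9.9, 10.0, 20.0, 20.1, 19.9, 20.0]
groups = best_partition(values, unit=pixel_size(max(values)))
print(groups)
```

With this toy cost, a group that straddles the jump from about 10 to about 20 pays roughly 11 bits per sample, while a group on one side pays under 5, so the search places the single boundary at the jump. The quadratic loop over all (start, end) pairs is the simplest way to examine every selection; it makes no claim about how the library prunes the search.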
diff --git a/PyPI/MLRsearch/README.rst b/PyPI/MLRsearch/README.rst
index 3df7756f10..bec1b7749e 100644
--- a/PyPI/MLRsearch/README.rst
+++ b/PyPI/MLRsearch/README.rst
@@ -9,30 +9,15 @@ in CSIT_ (Continuous System and Integration Testing) project of fd.io_
(Fast Data), one of LFN_ (Linux Foundation Networking) projects.
In order to make this code available in PyPI_ (Python Package Index),
-the setuputils stuff has been added,
-but after some discussion, the export directory_
-is only a symlink to the original place of tightly coupled CSIT code.
+the setuputils stuff (later converted to pyproject.toml) has been added,
+but after some discussion, that directory_ ended up having
+only a symlink to the original place of tightly coupled CSIT code.
-Change log
-----------
-
-1.2.1: Updated the readme document.
-
-1.2.0: Changed the output structure to use Goal Result as described in draft-05.
-
-1.1.0: Logic improvements, independent selectors, exceed ratio support,
-better width rounding, conditional throughput as output.
-Implementation relies more on dataclasses, code split into smaller files.
-API changed considerably, mainly to avoid long argument lists.
-
-0.4.0: Considarable logic improvements, more than two target ratios supported.
-API is not backward compatible with previous versions.
-
-0.3.0: Migrated to Python 3.6, small code quality improvements.
-
-0.2.0: Optional parameter "doublings" has been added.
+IETF documents
+--------------
-0.1.1: First officially released version.
+The currently published `IETF draft`_ describes the logic of version 1.2.0,
+earlier library and draft versions do not match each other that well.
Usage
-----
@@ -142,11 +127,26 @@ This is the screen capture of interactive python interpreter
PDR conditional throughput: 1000000.6730730429
>>>
-IETF documents
---------------
+Change log
+----------
-The currently published `IETF draft`_ describes the logic of version 1.2.0,
-earlier library and draft versions do not match each other that well.
+1.2.1: Updated the readme document.
+
+1.2.0: Changed the output structure to use Goal Result as described in draft-05.
+
+1.1.0: Logic improvements, independent selectors, exceed ratio support,
+better width rounding, conditional throughput as output.
+Implementation relies more on dataclasses, code split into smaller files.
+API changed considerably, mainly to avoid long argument lists.
+
+0.4.0: Considarable logic improvements, more than two target ratios supported.
+API is not backward compatible with previous versions.
+
+0.3.0: Migrated to Python 3.6, small code quality improvements.
+
+0.2.0: Optional parameter "doublings" has been added.
+
+0.1.1: First officially released version.
.. _CSIT: https://wiki.fd.io/view/CSIT
.. _fd.io: https://fd.io/
diff --git a/PyPI/jumpavg/LICENSE b/PyPI/jumpavg/LICENSE.txt
index 261eeb9e9f..261eeb9e9f 100644
--- a/PyPI/jumpavg/LICENSE
+++ b/PyPI/jumpavg/LICENSE.txt
diff --git a/PyPI/jumpavg/README.md b/PyPI/jumpavg/README.md
deleted file mode 100644
index e93e4dc13b..0000000000
--- a/PyPI/jumpavg/README.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# Jumpavg library
-
-## Origins
-
-This library was developed as anomaly detection logic
-for [PAL](https://wiki.fd.io/view/CSIT/Design_Optimizations#Presentation_and_Analytics_Layer "Presentation and Analysis Layer")
-of [CSIT](https://wiki.fd.io/view/CSIT "Continuous System and Integration Testing")
-project of [fd.io](https://fd.io/ "Fast Data"),
-one of [LFN](https://www.linuxfoundation.org/projects/networking/ "Linux Foundation Networking") projects.
-
-Currently still being primarily used in PAL's successor [CSIT-DASH](https://csit.fd.io).
-
-In order to make this code available in [PyPI](https://pypi.org/ "Python Package Index"),
-the setuputils stuff has been added,
-and the code has been moved into a separate [directory](https://gerrit.fd.io/r/gitweb?p=csit.git;a=tree;f=PyPI/jumpavg),
-in order to not intervere of otherwise tightly coupled CSIT code.
-
-## Usage
-
-TODO.
-
-## Change log
-
-TODO: Move into a separate file?
-
-+ 0.4.1: Fixed bug of not penalizing large stdev enough (at all for size 2 stats).
-
-+ 0.4.0: Added "unit" and "sbps" parameters so information content
- is reasonable even if sample values are below one.
-
-+ 0.3.0: Considerable speedup by avoiding unneeded copy. Dataclasses used.
- Mostly API compatible, but repr looks different.
-
-+ 0.2.0: API incompatible changes. Targeted to Python 3 now.
-
-+ 0.1.3: Changed stdev computation to avoid negative variance due to rounding errors.
-
-+ 0.1.2: First version published in PyPI.
diff --git a/PyPI/jumpavg/README.rst b/PyPI/jumpavg/README.rst
new file mode 100644
index 0000000000..b6b502c62b
--- /dev/null
+++ b/PyPI/jumpavg/README.rst
@@ -0,0 +1,74 @@
+Jumpavg library
+===============
+
+Origins
+-------
+
+This library was developed as anomaly detection logic for "PAL" component
+of CSIT_ (Continuous System and Integration Testing) project
+of fd.io_ ("Fast Data"), one of LFN_ (Linux Foundation Networking) projects.
+Currently still being primarily used in PAL's successor: CSIT-DASH_.
+
+In order to make this code available in PyPI_ (Python Package Index),
+the setuputils stuff (later converted to pyproject.toml) has been added,
+but after some discussion, that directory_ ended up having
+only a symlink to the original place of tightly coupled CSIT code.
+
+Usage
+-----
+
+High level description
+______________________
+
+The main method is "classify", which partitions the input sequence of values
+into consecutive "groups", so that standard deviation of samples within a group
+is small.
+
+The design decisions that went into the final algorithm are heavily influenced
+by typical results seen in CSIT testing, so it is better to read about
+the inner workings of the classification procedure in CSIT documentation,
+especially the Minimum Description Length sub-chapter of `trend analysis`_.
+
+Example
+_______
+
+A very basic example, showing some inputs and the structure of output.
+The output is a single line, here shown wrapped for readability.
+
+.. code-block:: python3
+
+ >>> from jumpavg import classify
+ >>> classify(values=[2.1, 3.1, 3.2], unit=0.1)
+ BitCountingGroupList(max_value=3.2, unit=0.1, group_list=[BitCountingGroup(run_list=
+ [2.1], max_value=3.2, unit=0.1, comment='normal', prev_avg=None, stats=AvgStdevStats
+ (size=1, avg=2.1, stdev=0.0), cached_bits=6.044394119358453), BitCountingGroup(run_l
+ ist=[3.1, 3.2], max_value=3.2, unit=0.1, comment='progression', prev_avg=2.1, stats=
+ AvgStdevStats(size=2, avg=3.1500000000000004, stdev=0.050000000000000044), cached_bi
+ ts=10.215241265313393)], bits_except_last=6.044394119358453)
+
+Change log
+----------
+
+0.4.2: Should no longer divide by zero on empty inputs.
+
+0.4.1: Fixed bug of not penalizing large stdev enough (at all for size 2 stats).
+
+0.4.0: Added "unit" and "sbps" parameters so information content
+is reasonable even if sample values are below one.
+
+0.3.0: Considerable speedup by avoiding unneeded copy. Dataclasses used.
+Mostly API compatible, but repr looks different.
+
+0.2.0: API incompatible changes. Targeted to Python 3 now.
+
+0.1.3: Changed stdev computation to avoid negative variance due to rounding errors.
+
+0.1.2: First version published in PyPI.
+
+.. _CSIT: https://wiki.fd.io/view/CSIT
+.. _CSIT-DASH: https://csit.fd.io
+.. _directory: https://gerrit.fd.io/r/gitweb?p=csit.git;a=tree;f=PyPI/jumpavg
+.. _fd.io: https://fd.io/
+.. _LFN: https://www.linuxfoundation.org/projects/networking/
+.. _PyPI: https://pypi.org/
+.. _trend analysis: https://csit.fd.io/cdocs/methodology/trending/analysis/#trend-analysis
diff --git a/PyPI/jumpavg/hints_and_todos.txt b/PyPI/jumpavg/hints_and_todos.txt
index 42072054e4..e829efa921 100644
--- a/PyPI/jumpavg/hints_and_todos.txt
+++ b/PyPI/jumpavg/hints_and_todos.txt
@@ -1,8 +1,7 @@
toml hint: https://flit.pypa.io/en/stable/pyproject_toml.html
-md hint: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
+rst hint: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html
build hint: https://packaging.python.org/en/latest/tutorials/packaging-projects/
-TODO: Copy improvements from MLRsearch metadata (readme and license extensions).
TODO: Include simulator and tests.
TODO: Test which Python versions is the code actually compatible with.
TODO: Create a separate webpage for jumpavg library.
diff --git a/PyPI/jumpavg/pyproject.toml b/PyPI/jumpavg/pyproject.toml
index ee6b4cabed..8aa906c4c3 100644
--- a/PyPI/jumpavg/pyproject.toml
+++ b/PyPI/jumpavg/pyproject.toml
@@ -1,16 +1,9 @@
[project]
name = "jumpavg"
-version = "0.4.1"
+version = "0.4.2"
description = "Library for locating changes in time series by grouping results."
-authors = [
- { name = "Cisco Systems Inc. and/or its affiliates", email = "csit-dev@lists.fd.io" },
-]
-maintainers = [
- { name = "Vratko Polak", email = "vrpolak@cisco.com" },
- { name = "Tibor Frank", email = "tifrank@cisco.com" },
-]
-keywords = ["progression", "regression", "anomaly detection", "statistics", "bits" ]
-readme = "README.md"
+license = { file = "LICENSE.txt" }
+readme = { file = "README.rst", content-type = "text/x-rst" }
requires-python = ">=3.8"
classifiers = [
"Development Status :: 3 - Alpha",
@@ -21,6 +14,14 @@ classifiers = [
"Programming Language :: Python :: 3.8",
"Topic :: Scientific/Engineering :: Information Analysis",
]
+keywords = ["progression", "regression", "anomaly detection", "statistics", "bits" ]
+authors = [
+ { name = "Cisco Systems Inc. and/or its affiliates", email = "csit-dev@lists.fd.io" },
+]
+maintainers = [
+ { name = "Vratko Polak", email = "vrpolak@cisco.com" },
+ { name = "Tibor Frank", email = "tifrank@cisco.com" },
+]
[project.urls]
"Bug Tracker" = "https://jira.fd.io/projects/CSIT/issues"
diff --git a/docs/content/methodology/trending/analysis.md b/docs/content/methodology/trending/analysis.md
index fe952259ab..eb1c8a741b 100644
--- a/docs/content/methodology/trending/analysis.md
+++ b/docs/content/methodology/trending/analysis.md
@@ -31,8 +31,9 @@ normally, currently we do not have a better tractable model.
Here, "sample" should be the result of single trial measurement, with group
boundaries set only at test run granularity. But in order to avoid detecting
causes unrelated to VPP performance, the current presentation takes average of
-all trials within the run as the sample. Effectively, this acts as a single
+all trials within the MRR run as the sample. Effectively, this acts as a single
trial with aggregate duration.
+(Trending of NDR or PDR results take just one sample, the conditional throughput).
Performance graphs show the run average as a dot (not all individual trial
results).
@@ -59,12 +60,20 @@ group average (more on that later), group stdev and then all the samples.
Luckily, the "all the samples" part turns out to be quite easy to compute.
If sample values are considered as coordinates in (multi-dimensional)
-Euclidean space, fixing stdev means the point with allowed coordinates
-lays on a sphere. Fixing average intersects the sphere with a (hyper)-plane,
-and Gaussian probability density on the resulting sphere is constant.
+Euclidean space, fixing average restrict possible values to a (hyper-)plane.
+Then, fixing stdev means the point with allowed coordinates
+lays on a sphere (centered the "all samples equal to average" point)
+within that hyper-plane.
+And the Gaussian probability density on the resulting sphere is constant.
So the only contribution is the "area" of the sphere, which only depends
on the number of samples and stdev.
+Still, to get the information content in bits, we need to know what "size"
+one "pixel" of that area is.
+Our implementation assumes that measurement precision is such that
+the max sample value is 4096 (2^12) pixels (inspired by 0.5% precision
+of NDRPDR tests, roughly two pixels around max value).
+
A somehow ambiguous part is in choosing which encoding
is used for group size, average and stdev.
Different encodings cause different biases to large or small values.
@@ -74,13 +83,11 @@ for stdev and average of the first group,
but for averages of subsequent groups we have chosen a distribution
which discourages delimiting groups with averages close together.
-Our implementation assumes that measurement precision is 1.0 pps.
-Thus it is slightly wrong for trial durations other than 1.0 seconds.
-Also, all the calculations assume 1.0 pps is totally negligible,
-compared to stdev value.
-
The group selection algorithm currently has no parameters,
all the aforementioned encodings and handling of precision is hard-coded.
+(Although the underlying library "jumpavg" allows users to change the precision,
+either in absolute units or in bits per max sample.)
+
In principle, every group selection is examined, and the one encodable
with least amount of bits is selected.
As the bit amount for a selection is just sum of bits for every group,