Jumpavg library
===============

Origins
-------

This library was developed as the anomaly detection logic for the "PAL" component
of the CSIT_ (Continuous System and Integration Testing) project
of fd.io_ ("Fast Data"), one of the LFN_ (Linux Foundation Networking) projects.
It is still used primarily in PAL's successor, CSIT-DASH_.

To make this code available on PyPI_ (Python Package Index),
setuptools packaging (later converted to pyproject.toml) was added,
but after some discussion, that directory_ ended up containing
only a symlink to the original location of the tightly coupled CSIT code.

Usage
-----

High level description
______________________

The main function is ``classify``, which partitions the input sequence of values
into consecutive "groups" such that the standard deviation of samples within
each group is small.
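
To make "small standard deviation within a group" concrete, here is a minimal
plain-Python sketch that computes the size, average and population standard
deviation of one group. The ``GroupStats`` and ``group_stats`` names are
hypothetical; they only loosely mirror the ``AvgStdevStats`` fields in the
example output below and are not the library's own implementation.
Population standard deviation is assumed because it reproduces the example's
value of roughly 0.05 for the group ``[3.1, 3.2]``.

..  code-block:: python3

    import math
    from typing import List, NamedTuple


    class GroupStats(NamedTuple):
        """Hypothetical stand-in for per-group statistics."""

        size: int
        avg: float
        stdev: float


    def group_stats(values: List[float]) -> GroupStats:
        """Return size, mean and population standard deviation of a group."""
        size = len(values)
        if size == 0:
            # Avoid dividing by zero on an empty group.
            return GroupStats(0, 0.0, 0.0)
        avg = sum(values) / size
        # Two-pass formula: sum of squared deviations from the mean.
        variance = sum((value - avg) ** 2 for value in values) / size
        return GroupStats(size, avg, math.sqrt(variance))


    if __name__ == "__main__":
        for group in ([2.1], [3.1, 3.2]):
            print(group, group_stats(group))
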

The design decisions behind the final algorithm are heavily influenced
by typical results seen in CSIT testing, so the inner workings
of the classification procedure are best read about in the CSIT documentation,
especially the Minimum Description Length sub-chapter of `trend analysis`_.
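
The CSIT documentation describes the actual Minimum Description Length criterion.
As a rough illustration of the underlying trade-off (each extra group costs
something to describe, while spread within a group also costs something),
the sketch below uses a swapped-in technique: classic optimal partitioning
by dynamic programming, with a fixed per-group penalty plus squared error.
The ``partition`` and ``segment_cost`` names and the penalty value are
hypothetical; jumpavg counts bits instead and its cost function differs.

..  code-block:: python3

    import math
    from typing import List


    def segment_cost(values: List[float], penalty: float) -> float:
        """Hypothetical cost of one group: fixed penalty plus squared error."""
        avg = sum(values) / len(values)
        return penalty + sum((value - avg) ** 2 for value in values)


    def partition(values: List[float], penalty: float) -> List[List[float]]:
        """Split values into consecutive groups minimizing the total cost.

        best[i] is the minimal cost of the prefix values[:i];
        start[i] is where the last group of that optimal prefix begins.
        """
        n = len(values)
        if n == 0:
            return []
        best = [0.0] + [math.inf] * n
        start = [0] * (n + 1)
        for end in range(1, n + 1):
            for begin in range(end):
                cost = best[begin] + segment_cost(values[begin:end], penalty)
                if cost < best[end]:
                    best[end] = cost
                    start[end] = begin
        # Walk back through the recorded group boundaries.
        groups = []
        end = n
        while end > 0:
            groups.append(values[start[end]:end])
            end = start[end]
        return groups[::-1]


    if __name__ == "__main__":
        print(partition([2.1, 3.1, 3.2], penalty=0.1))

With ``penalty=0.1`` this sketch happens to produce the same grouping as the
example below, ``[[2.1], [3.1, 3.2]]``; a larger penalty favours fewer groups.
The double loop is quadratic in the input length, which is acceptable for
an illustration but not tuned for speed.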

Example
_______

A very basic example showing some inputs and the structure of the output.
The output is a single line, shown here wrapped for readability.

..  code-block:: python3

    >>> from jumpavg import classify
    >>> classify(values=[2.1, 3.1, 3.2], unit=0.1)
    BitCountingGroupList(max_value=3.2, unit=0.1, group_list=[
    BitCountingGroup(run_list=[2.1], max_value=3.2, unit=0.1, comment='normal',
    prev_avg=None, stats=AvgStdevStats(size=1, avg=2.1, stdev=0.0),
    cached_bits=6.044394119358453),
    BitCountingGroup(run_list=[3.1, 3.2], max_value=3.2, unit=0.1,
    comment='progression', prev_avg=2.1, stats=AvgStdevStats(size=2,
    avg=3.1500000000000004, stdev=0.050000000000000044),
    cached_bits=10.215241265313393)], bits_except_last=6.044394119358453)
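
Each group in the output carries ``prev_avg`` (the previous group's average)
and a ``comment``. The sketch below shows one plausible way such labels could
be derived from consecutive group averages. It is an assumption, not code
taken from jumpavg: only ``'normal'`` and ``'progression'`` appear in the
output above, and the ``'regression'`` label, the tie rule and the
``label_groups`` name are hypothetical.

..  code-block:: python3

    from typing import List, Optional


    def label_groups(averages: List[float]) -> List[str]:
        """Label each group by comparing its average with the previous one.

        Assumed rule: the first group and ties are "normal",
        a higher average is "progression", a lower one is "regression".
        """
        labels = []
        prev_avg: Optional[float] = None
        for avg in averages:
            if prev_avg is None or avg == prev_avg:
                labels.append("normal")
            elif avg < prev_avg:
                labels.append("regression")
            else:
                labels.append("progression")
            prev_avg = avg
        return labels


    if __name__ == "__main__":
        print(label_groups([2.1, 3.15]))  # ['normal', 'progression']
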

Change log
----------

0.4.2: Should no longer divide by zero on empty inputs.

0.4.1: Fixed a bug where large stdev was not penalized enough (not at all for size-2 stats).

0.4.0: Added "unit" and "sbps" parameters so that information content
is reasonable even if sample values are below one.

0.3.0: Considerable speedup by avoiding an unneeded copy. Dataclasses are now used.
Mostly API compatible, but the repr looks different.

0.2.0: API-incompatible changes. Now targets Python 3.

0.1.3: Changed stdev computation to avoid negative variance due to rounding errors.

0.1.2: First version published on PyPI.

.. _CSIT: https://wiki.fd.io/view/CSIT
.. _CSIT-DASH: https://csit.fd.io
.. _directory: https://gerrit.fd.io/r/gitweb?p=csit.git;a=tree;f=PyPI/jumpavg
.. _fd.io: https://fd.io/
.. _LFN: https://www.linuxfoundation.org/projects/networking/
.. _PyPI: https://pypi.org/
.. _trend analysis: https://csit.fd.io/cdocs/methodology/trending/analysis/#trend-analysis