Presentation and Analytics
==========================

Overview
--------

The presentation and analytics layer (PAL) is the fourth layer of the CSIT
hierarchy. The PAL model consists of four sub-layers, listed bottom up:

 - sL1 - Data - input data to be processed:

   - Static content - .rst text files, .svg static figures, and other files
     stored in the CSIT git repository.
   - Data to process - .xml files generated by Jenkins jobs executing tests,
     stored as robot results files (output.xml).
   - Specification - .yaml file with the models of report elements (tables,
     plots, layout, ...) generated by this tool. There is also the configuration
     of the tool and the specification of input data (jobs and builds).

 - sL2 - Data processing

   - The data are read from the specified input files (.xml) and stored as
     multi-indexed `pandas.Series <https://pandas.pydata.org/pandas-docs/stable/
     generated/pandas.Series.html>`_.
   - This layer also provides an interface to the input data and filtering of
     the input data.

 - sL3 - Data presentation - This layer generates the elements specified in the
   specification file:

   - Tables: .csv files linked to static .rst files.
   - Plots: .html files generated using plot.ly linked to static .rst files.

 - sL4 - Report generation - Sphinx generates required formats and versions:

   - formats: html, pdf
   - versions: minimal, full (TODO: define the names and scope of versions)

.. only:: latex

    .. raw:: latex

        \begin{figure}[H]
            \centering
                \graphicspath{{../_tmp/src/csit_framework_documentation/}}
                \includegraphics[width=0.90\textwidth]{pal_layers}
                \label{fig:pal_layers}
        \end{figure}

.. only:: html

    .. figure:: pal_layers.svg
        :alt: PAL Layers
        :align: center

Data
----

Report Specification
````````````````````

The report specification file defines which data is used and which outputs are
generated. It is human readable and structured. It is easy to add / remove /
change items. The specification includes:

 - Specification of the environment.
 - Configuration of debug mode (optional).
 - Specification of input data (jobs, builds, files, ...).
 - Specification of the output.
 - What and how is generated:

   - What: plots, tables.
   - How: specification of all properties and parameters.

 - .yaml format.

Structure of the specification file
'''''''''''''''''''''''''''''''''''

The specification file is organized as a list of dictionaries distinguished by
the type:

::

    -
      type: "environment"
    -
      type: "configuration"
    -
      type: "debug"
    -
      type: "static"
    -
      type: "input"
    -
      type: "output"
    -
      type: "table"
    -
      type: "plot"
    -
      type: "file"

Each type represents a section. The sections "environment", "debug", "static",
"input" and "output" are listed only once in the specification; "table", "file"
and "plot" can appear multiple times.

Sections "debug", "table", "file" and "plot" are optional.

Table(s), file(s) and plot(s) are referred to as "elements" in this text. It is
possible to define and implement other elements if needed.


Section: Environment
''''''''''''''''''''

This section has the following parts:

 - type: "environment" - says that this is the section "environment".
 - configuration - configuration of the PAL.
 - paths - paths used by the PAL.
 - urls - urls pointing to the data sources.
 - make-dirs - a list of the directories to be created by the PAL while
   preparing the environment.
 - remove-dirs - a list of the directories to be removed while cleaning the
   environment.
 - build-dirs - a list of the directories where the results are stored.

The structure of the section "Environment" is as follows (example):

::

    -
      type: "environment"
      configuration:
        # Debug mode:
        # - Skip:
        #   - Download of input data files
        # - Do:
        #   - Read data from given zip / xml files
        #   - Set the configuration as it is done in normal mode
        # If the section "type: debug" is missing, CFG[DEBUG] is set to 0.
        CFG[DEBUG]: 0

      paths:
        # Top level directories:
        ## Working directory
        DIR[WORKING]: "_tmp"
        ## Build directories
        DIR[BUILD,HTML]: "_build"
        DIR[BUILD,LATEX]: "_build_latex"

        # Static .rst files
        DIR[RST]: "../../../docs/report"

        # Working directories
        ## Input data files (.zip, .xml)
        DIR[WORKING,DATA]: "{DIR[WORKING]}/data"
        ## Static source files from git
        DIR[WORKING,SRC]: "{DIR[WORKING]}/src"
        DIR[WORKING,SRC,STATIC]: "{DIR[WORKING,SRC]}/_static"

        # Static html content
        DIR[STATIC]: "{DIR[BUILD,HTML]}/_static"
        DIR[STATIC,VPP]: "{DIR[STATIC]}/vpp"
        DIR[STATIC,DPDK]: "{DIR[STATIC]}/dpdk"
        DIR[STATIC,ARCH]: "{DIR[STATIC]}/archive"

        # Detailed test results
        DIR[DTR]: "{DIR[WORKING,SRC]}/detailed_test_results"
        DIR[DTR,PERF,DPDK]: "{DIR[DTR]}/dpdk_performance_results"
        DIR[DTR,PERF,VPP]: "{DIR[DTR]}/vpp_performance_results"
        DIR[DTR,FUNC,VPP]: "{DIR[DTR]}/vpp_functional_results"
        DIR[DTR,FUNC,NSHSFC]: "{DIR[DTR]}/nshsfc_functional_results"
        DIR[DTR,PERF,VPP,IMPRV]: "{DIR[WORKING,SRC]}/vpp_performance_tests/performance_improvements"

        # Detailed test configurations
        DIR[DTC]: "{DIR[WORKING,SRC]}/test_configuration"
        DIR[DTC,PERF,VPP]: "{DIR[DTC]}/vpp_performance_configuration"
        DIR[DTC,FUNC,VPP]: "{DIR[DTC]}/vpp_functional_configuration"

        # Detailed tests operational data
        DIR[DTO]: "{DIR[WORKING,SRC]}/test_operational_data"
        DIR[DTO,PERF,VPP]: "{DIR[DTO]}/vpp_performance_operational_data"

        # .css patch file to fix tables generated by Sphinx
        DIR[CSS_PATCH_FILE]: "{DIR[STATIC]}/theme_overrides.css"
        DIR[CSS_PATCH_FILE2]: "{DIR[WORKING,SRC,STATIC]}/theme_overrides.css"

      urls:
        URL[JENKINS,CSIT]: "https://jenkins.fd.io/view/csit/job"
        URL[JENKINS,HC]: "https://jenkins.fd.io/view/hc2vpp/job"

      make-dirs:
      # List the directories which are created while preparing the environment.
      # All directories MUST be defined in "paths" section.
      - "DIR[WORKING,DATA]"
      - "DIR[STATIC,VPP]"
      - "DIR[STATIC,DPDK]"
      - "DIR[STATIC,ARCH]"
      - "DIR[BUILD,LATEX]"
      - "DIR[WORKING,SRC]"
      - "DIR[WORKING,SRC,STATIC]"

      remove-dirs:
      # List the directories which are deleted while cleaning the environment.
      # All directories MUST be defined in "paths" section.
      #- "DIR[BUILD,HTML]"

      build-dirs:
      # List the directories where the results (build) are stored.
      # All directories MUST be defined in "paths" section.
      - "DIR[BUILD,HTML]"
      - "DIR[BUILD,LATEX]"

It is possible to use defined items in the definition of other items, e.g.:

::

    DIR[WORKING,DATA]: "{DIR[WORKING]}/data"

will be automatically changed to

::

    DIR[WORKING,DATA]: "_tmp/data"
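
The substitution can be sketched in Python as follows. This is only an
illustration: the function name ``resolve_paths`` and the repeated-pass
approach are assumptions, not the actual PAL implementation.

.. code-block:: python

    import re

    # Matches a "{KEY}" placeholder such as "{DIR[WORKING]}".
    PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

    def resolve_paths(paths):
        """Expand "{KEY}" placeholders using other entries of the same dict."""
        resolved = dict(paths)
        # Repeat the pass so that chained references are resolved as well.
        for _ in range(len(resolved) + 1):
            changed = False
            for key, value in resolved.items():
                # Unknown placeholders are left untouched.
                new_value = PLACEHOLDER.sub(
                    lambda m: resolved.get(m.group(1), m.group(0)), value)
                if new_value != value:
                    resolved[key] = new_value
                    changed = True
            if not changed:
                break
        return resolved

    paths = {
        "DIR[WORKING]": "_tmp",
        "DIR[WORKING,DATA]": "{DIR[WORKING]}/data",
    }
    print(resolve_paths(paths)["DIR[WORKING,DATA]"])

A regular expression is used in this sketch because ``str.format`` would
interpret ``DIR[WORKING]`` as an item lookup on a field named ``DIR``.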


Section: Configuration
''''''''''''''''''''''

This section specifies the groups of parameters which are repeatedly used in the
elements defined later in the specification file. It has the following parts:

 - data sets - Specification of data sets used later in elements'
   specifications to define the input data.
 - plot layouts - Specification of plot layouts used later in plots'
   specifications to define the plot layout.
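
An element can then refer to a data set or a plot layout by its name instead of
repeating the definition. The following Python sketch shows one way such a
lookup might work. The function ``resolve_element`` and the rule that keys given
in the element override keys of the named layout are assumptions made for
illustration; they are not taken from the PAL source.

.. code-block:: python

    def resolve_element(element, configuration):
        """Return a copy of element with named data sets and layouts expanded."""
        resolved = dict(element)

        # A string in "data" is treated as the name of a data set.
        data = element.get("data")
        if isinstance(data, str):
            resolved["data"] = configuration["data-sets"][data]

        # A string in "layout: layout:" is treated as the name of a plot layout.
        layout = element.get("layout", {})
        layout_name = layout.get("layout")
        if isinstance(layout_name, str):
            merged = dict(configuration["plot-layouts"][layout_name])
            # Assumption: keys set in the element (e.g. "title") take priority.
            merged.update(
                {key: val for key, val in layout.items() if key != "layout"})
            resolved["layout"] = merged
        return resolved

    configuration = {
        "data-sets": {"vpp-perf-results": {"csit-vpp-perf-1710-all": [20, 23]}},
        "plot-layouts": {"plot-throughput": {"width": 700, "height": 1000}},
    }
    element = {
        "type": "plot",
        "data": "vpp-perf-results",
        "layout": {"title": "Example", "layout": "plot-throughput"},
    }
    print(resolve_element(element, configuration))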

The structure of the section "Configuration" is as follows (example):

::

    -
      type: "configuration"
      data-sets:
        plot-vpp-throughput-latency:
          csit-vpp-perf-1710-all:
          - 11
          - 12
          - 13
          - 14
          - 15
          - 16
          - 17
          - 18
          - 19
          - 20
        vpp-perf-results:
          csit-vpp-perf-1710-all:
          - 20
          - 23
      plot-layouts:
        plot-throughput:
          xaxis:
            autorange: True
            autotick: False
            fixedrange: False
            gridcolor: "rgb(238, 238, 238)"
            linecolor: "rgb(238, 238, 238)"
            linewidth: 1
            showgrid: True
            showline: True
            showticklabels: True
            tickcolor: "rgb(238, 238, 238)"
            tickmode: "linear"
            title: "Indexed Test Cases"
            zeroline: False
          yaxis:
            gridcolor: "rgb(238, 238, 238)"
            hoverformat: ".4s"
            linecolor: "rgb(238, 238, 238)"
            linewidth: 1
            range: []
            showgrid: True
            showline: True
            showticklabels: True
            tickcolor: "rgb(238, 238, 238)"
            title: "Packets Per Second [pps]"
            zeroline: False
          boxmode: "group"
          boxgroupgap: 0.5
          autosize: False
          margin:
            t: 50
            b: 20
            l: 50
            r: 20
          showlegend: True
          legend:
            orientation: "h"
          width: 700
          height: 1000

The definitions from this section are used in the elements, e.g.:

::

    -
      type: "plot"
      title: "VPP Performance 64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
      algorithm: "plot_performance_box"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/64B-1t1c-l2-sel1-ndrdisc"
      data:
        "plot-vpp-throughput-latency"
      filter: "'64B' and ('BASE' or 'SCALE') and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
      parameters:
      - "throughput"
      - "parent"
      traces:
        hoverinfo: "x+y"
        boxpoints: "outliers"
        whiskerwidth: 0
      layout:
        title: "64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
        layout:
          "plot-throughput"


Section: Debug mode
'''''''''''''''''''

This section is optional. It configures the debug mode, which is used to
process local files instead of downloading the input data files.

If the debug mode is configured, the "input" section is ignored.

This section has the following parts:

 - type: "debug" - says that this is the section "debug".
 - general:

   - input-format - xml or zip.
   - extract - if "zip" is defined as the input format, this file is extracted
     from the zip file, otherwise this parameter is ignored.

 - builds - list of builds from which the data is used. Must include a job
   name as a key and then a list of builds and their output files.

The structure of the section "Debug" is as follows (example):

::

    -
      type: "debug"
      general:
        input-format: "zip"  # zip or xml
        extract: "robot-plugin/output.xml"  # Only for zip
      builds:
        # The files must be in the directory DIR[WORKING,DATA]
        csit-dpdk-perf-1707-all:
        -
          build: 10
          file: "csit-dpdk-perf-1707-all__10.xml"
        -
          build: 9
          file: "csit-dpdk-perf-1707-all__9.xml"
        csit-nsh_sfc-verify-func-1707-ubuntu1604-virl:
        -
          build: 2
          file: "csit-nsh_sfc-verify-func-1707-ubuntu1604-virl-2.xml"
        csit-vpp-functional-1707-ubuntu1604-virl:
        -
          build: lastSuccessfulBuild
          file: "csit-vpp-functional-1707-ubuntu1604-virl-lastSuccessfulBuild.xml"
        hc2vpp-csit-integration-1707-ubuntu1604:
        -
          build: lastSuccessfulBuild
          file: "hc2vpp-csit-integration-1707-ubuntu1604-lastSuccessfulBuild.xml"
        csit-vpp-perf-1707-all:
        -
          build: 16
          file: "csit-vpp-perf-1707-all__16__output.xml"
        -
          build: 17
          file: "csit-vpp-perf-1707-all__17__output.xml"


Section: Static
'''''''''''''''

This section defines the static content which is stored in git and will be used
as a source to generate the report.

This section has these parts:

 - type: "static" - says that this section is the "static".
 - src-path - path to the static content.
 - dst-path - destination path where the static content is copied and then
   processed.

::

    -
      type: "static"
      src-path: "{DIR[RST]}"
      dst-path: "{DIR[WORKING,SRC]}"


Section: Input
''''''''''''''

This section defines the data used to generate elements. It is mandatory
if the debug mode is not used.

This section has the following parts:

 - type: "input" - says that this section is the "input".
 - general - parameters common to all builds:

   - file-name: file to be downloaded.
   - file-format: format of the downloaded file, ".zip" or ".xml" are supported.
   - download-path: path to be added to url pointing to the file, e.g.:
     "{job}/{build}/robot/report/*zip*/{filename}"; {job}, {build} and
     {filename} are replaced by proper values defined in this section.
   - extract: file to be extracted from downloaded zip file, e.g.: "output.xml";
     if xml file is downloaded, this parameter is ignored.

 - builds - list of jobs (keys) and numbers of builds whose output data will be
   downloaded.
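
The expansion of "download-path" can be illustrated with a short Python sketch.
The function ``build_download_url`` is hypothetical, and joining the path to a
base URL taken from the "urls" part of the "environment" section is an
assumption about how the pieces fit together.

.. code-block:: python

    def build_download_url(base_url, download_path, job, build, filename):
        """Fill in {job}, {build} and {filename} and append the path to the URL."""
        path = download_path.format(job=job, build=build, filename=filename)
        return f"{base_url.rstrip('/')}/{path}"

    url = build_download_url(
        base_url="https://jenkins.fd.io/view/csit/job",
        download_path="{job}/{build}/robot/report/*zip*/{filename}",
        job="csit-vpp-perf-1707-all",
        build=9,
        filename="robot-plugin.zip",
    )
    print(url)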

The structure of the section "Input" is as follows (example from 17.07 report):

::

    -
      type: "input"  # Ignored in debug mode
      general:
        file-name: "robot-plugin.zip"
        file-format: ".zip"
        download-path: "{job}/{build}/robot/report/*zip*/{filename}"
        extract: "robot-plugin/output.xml"
      builds:
        csit-vpp-perf-1707-all:
        - 9
        - 10
        - 13
        - 14
        - 15
        - 16
        - 17
        - 18
        - 19
        - 21
        - 22
        csit-dpdk-perf-1707-all:
        - 1
        - 2
        - 3
        - 4
        - 5
        - 6
        - 7
        - 8
        - 9
        - 10
        csit-vpp-functional-1707-ubuntu1604-virl:
        - lastSuccessfulBuild
        hc2vpp-csit-perf-master-ubuntu1604:
        - 8
        - 9
        hc2vpp-csit-integration-1707-ubuntu1604:
        - lastSuccessfulBuild
        csit-nsh_sfc-verify-func-1707-ubuntu1604-virl:
        - 2


Section: Output
'''''''''''''''

This section specifies which format(s) will be generated (html, pdf) and which
versions will be generated for each format.

This section has the following parts:

 - type: "output" - says that this section is the "output".
 - format: html or pdf.
 - version: defined for each format separately.

The structure of the section "Output" is as follows (example):

::

    -
      type: "output"
      format:
        html:
        - full
        pdf:
        - full
        - minimal

TODO: define the names of versions


Content of "minimal" version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO: define the name and content of this version


Section: Table
''''''''''''''

This section defines a table to be generated. There can be 0 or more "table"
sections.

This section has the following parts:

 - type: "table" - says that this section defines a table.
 - title: Title of the table.
 - algorithm: Algorithm which is used to generate the table. The other
   parameters in this section must provide all information needed by the used
   algorithm.
 - template: (optional) a .csv file used as a template while generating the
   table.
 - output-file-ext: extension of the output file.
 - output-file: file which the table will be written to.
 - columns: specification of table columns:

   - title: The title used in the table header.
   - data: Specification of the data, it has two parts - command and arguments:

     - command:

       - template - take the data from template, arguments:

         - number of column in the template.

       - data - take the data from the input data, arguments:

         - jobs and builds whose data will be used.

       - operation - performs an operation with the data already in the table,
         arguments:

         - operation to be done, e.g.: mean, stdev, relative_change (compute
           the relative change between two columns) and display the number of
           data samples ~= number of test jobs. The operations are implemented
           in utils.py.
           TODO: Move from utils.py to e.g. operations.py
         - numbers of the columns whose data will be used (optional).

 - data: Specify the jobs and builds whose data is used to generate the table.
 - filter: filter based on tags applied on the input data, if "template" is
   used, filtering is based on the template.
 - parameters: Only these parameters will be put to the output data structure.
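
The "command and arguments" form of a column's "data" value, and the operations
named above, can be sketched in Python. The helper names below are hypothetical
and do not mirror utils.py. The relative change is assumed to be expressed in
percent of the reference value, and ``statistics.stdev`` computes the sample
standard deviation, which may differ from what PAL uses.

.. code-block:: python

    from statistics import mean, stdev

    COMMANDS = ("template", "data", "operation")

    def parse_column_data(spec):
        """Split a column "data" string into its command and argument list."""
        command, *arguments = spec.split()
        if command not in COMMANDS:
            raise ValueError(f"Unknown command: {command!r}")
        return command, arguments

    def relative_change(reference, value):
        """Return the change of value relative to reference, in percent."""
        return (value - reference) / reference * 100.0

    samples = [9.0, 10.0, 11.0]  # made-up values, e.g. one sample per build
    print(parse_column_data("operation relative_change 5 4"))
    print(mean(samples), stdev(samples))
    print(relative_change(10.0, 12.5))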

The structure of the section "Table" is as follows (example of
"table_performance_improvements"):

::

    -
      type: "table"
      title: "Performance improvements"
      algorithm: "table_performance_improvements"
      template: "{DIR[DTR,PERF,VPP,IMPRV]}/tmpl_performance_improvements.csv"
      output-file-ext: ".csv"
      output-file: "{DIR[DTR,PERF,VPP,IMPRV]}/performance_improvements"
      columns:
      -
        title: "VPP Functionality"
        data: "template 1"
      -
        title: "Test Name"
        data: "template 2"
      -
        title: "VPP-16.09 mean [Mpps]"
        data: "template 3"
      -
        title: "VPP-17.01 mean [Mpps]"
        data: "template 4"
      -
        title: "VPP-17.04 mean [Mpps]"
        data: "template 5"
      -
        title: "VPP-17.07 mean [Mpps]"
        data: "data csit-vpp-perf-1707-all mean"
      -
        title: "VPP-17.07 stdev [Mpps]"
        data: "data csit-vpp-perf-1707-all stdev"
      -
        title: "17.04 to 17.07 change [%]"
        data: "operation relative_change 5 4"
      data:
        csit-vpp-perf-1707-all:
        - 9
        - 10
        - 13
        - 14
        - 15
        - 16
        - 17
        - 18
        - 19
        - 21
      filter: "template"
      parameters:
      - "throughput"

Example of "table_details" which generates "Detailed Test Results - VPP
Performance Results":

::

    -
      type: "table"
      title: "Detailed Test Results - VPP Performance Results"
      algorithm: "table_details"
      output-file-ext: ".csv"
      output-file: "{DIR[WORKING]}/vpp_performance_results"
      columns:
      -
        title: "Name"
        data: "data test_name"
      -
        title: "Documentation"
        data: "data test_documentation"
      -
        title: "Status"
        data: "data test_msg"
      data:
        csit-vpp-perf-1707-all:
        - 17
      filter: "all"
      parameters:
      - "parent"
      - "doc"
      - "msg"

Example of "table_details" which generates "Test configuration - VPP Performance
Test Configs":

::

    -
      type: "table"
      title: "Test configuration - VPP Performance Test Configs"
      algorithm: "table_details"
      output-file-ext: ".csv"
      output-file: "{DIR[WORKING]}/vpp_test_configuration"
      columns:
      -
        title: "Name"
        data: "data name"
      -
        title: "VPP API Test (VAT) Commands History - Commands Used Per Test Case"
        data: "data show-run"
      data:
        csit-vpp-perf-1707-all:
        - 17
      filter: "all"
      parameters:
      - "parent"
      - "name"
      - "show-run"


Section: Plot
'''''''''''''

This section defines a plot to be generated. There can be 0 or more "plot"
sections.

This section has these parts:

 - type: "plot" - says that this section defines a plot.
 - title: Plot title used in the logs. The title which is displayed is defined
   in the section "layout".
 - output-file-type: format of the output file.
 - output-file: file which the plot will be written to.
 - algorithm: Algorithm used to generate the plot. The other parameters in this
   section must provide all information needed by plot.ly to generate the plot.
   For example:

   - traces
   - layout

   These parameters are transparently passed to plot.ly.

 - data: Specify the jobs and numbers of builds whose data is used to generate
   the plot.
 - filter: filter applied on the input data.
 - parameters: Only these parameters will be put to the output data structure.

The structure of the section "Plot" is as follows (example of a plot showing
throughput in a box-and-whisker chart):

::

    -
      type: "plot"
      title: "VPP Performance 64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
      algorithm: "plot_performance_box"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/64B-1t1c-l2-sel1-ndrdisc"
      data:
        csit-vpp-perf-1707-all:
        - 9
        - 10
        - 13
        - 14
        - 15
        - 16
        - 17
        - 18
        - 19
        - 21
      # Keep this formatting, the filter is enclosed with " (quotation mark) and
      # each tag is enclosed with ' (apostrophe).
      filter: "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
      parameters:
      - "throughput"
      - "parent"
      traces:
        hoverinfo: "x+y"
        boxpoints: "outliers"
        whiskerwidth: 0
      layout:
        title: "64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
        xaxis:
          autorange: True
          autotick: False
          fixedrange: False
          gridcolor: "rgb(238, 238, 238)"
          linecolor: "rgb(238, 238, 238)"
          linewidth: 1
          showgrid: True
          showline: True
          showticklabels: True
          tickcolor: "rgb(238, 238, 238)"
          tickmode: "linear"
          title: "Indexed Test Cases"
          zeroline: False
        yaxis:
          gridcolor: "rgb(238, 238, 238)"
          hoverformat: ".4s"
          linecolor: "rgb(238, 238, 238)"
          linewidth: 1
          range: []
          showgrid: True
          showline: True
          showticklabels: True
          tickcolor: "rgb(238, 238, 238)"
          title: "Packets Per Second [pps]"
          zeroline: False
        boxmode: "group"
        boxgroupgap: 0.5
        autosize: False
        margin:
          t: 50
          b: 20
          l: 50
          r: 20
        showlegend: True
        legend:
          orientation: "h"
        width: 700
        height: 1000

The structure of the section "Plot" is as follows (example of a plot showing
latency in a box chart):

::

    -
      type: "plot"
      title: "VPP Latency 64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
      algorithm: "plot_latency_box"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/64B-1t1c-l2-sel1-ndrdisc-lat50"
      data:
        csit-vpp-perf-1707-all:
        - 9
        - 10
        - 13
        - 14
        - 15
        - 16
        - 17
        - 18
        - 19
        - 21
      filter: "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
      parameters:
      - "latency"
      - "parent"
      traces:
        boxmean: False
      layout:
        title: "64B-1t1c-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
        xaxis:
          autorange: True
          autotick: False
          fixedrange: False
          gridcolor: "rgb(238, 238, 238)"
          linecolor: "rgb(238, 238, 238)"
          linewidth: 1
          showgrid: True
          showline: True
          showticklabels: True
          tickcolor: "rgb(238, 238, 238)"
          tickmode: "linear"
          title: "Indexed Test Cases"
          zeroline: False
        yaxis:
          gridcolor: "rgb(238, 238, 238)"
          hoverformat: ""
          linecolor: "rgb(238, 238, 238)"
          linewidth: 1
          range: []
          showgrid: True
          showline: True
          showticklabels: True
          tickcolor: "rgb(238, 238, 238)"
          title: "Latency min/avg/max [uSec]"
          zeroline: False
        boxmode: "group"
        boxgroupgap: 0.5
        autosize: False
        margin:
          t: 50
          b: 20
          l: 50
          r: 20
        showlegend: True
        legend:
          orientation: "h"
        width: 700
        height: 1000

The structure of the section "Plot" is as follows (example of a plot showing
VPP HTTP server performance in a box chart with the pre-defined data set
"plot-vpp-httlp-server-performance" and the plot layout "plot-cps"):

::

    -
      type: "plot"
      title: "VPP HTTP Server Performance"
      algorithm: "plot_http_server_performance_box"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/http-server-performance-cps"
      data:
        "plot-vpp-httlp-server-performance"
      # Keep this formatting, the filter is enclosed with " (quotation mark) and
      # each tag is enclosed with ' (apostrophe).
      filter: "'HTTP' and 'TCP_CPS'"
      parameters:
      - "result"
      - "name"
      traces:
        hoverinfo: "x+y"
        boxpoints: "outliers"
        whiskerwidth: 0
      layout:
        title: "VPP HTTP Server Performance"
        layout:
          "plot-cps"


Section: file
'''''''''''''

This section defines a file to be generated. There can be 0 or more "file"
sections.

This section has the following parts:

 - type: "file" - says that this section defines a file.
 - title: Title of the file.
 - algorithm: Algorithm which is used to generate the file. The other
   parameters in this section must provide all information needed by the used
   algorithm.
 - output-file-ext: extension of the output file.
 - output-file: file which the output will be written to.
 - file-header: The header of the generated .rst file.
 - dir-tables: The directory with the tables.
 - data: Specify the jobs and builds whose data is used to generate the file.
 - filter: filter based on tags applied on the input data, if "all" is
   used, no filtering is done.
 - parameters: Only these parameters will be put to the output data structure.
 - chapters: the hierarchy of chapters in the generated file.
 - start-level: the level of the top-level chapter.

The structure of the section "file" is as follows (example):

::

    -
      type: "file"
      title: "VPP Performance Results"
      algorithm: "file_test_results"
      output-file-ext: ".rst"
      output-file: "{DIR[DTR,PERF,VPP]}/vpp_performance_results"
      file-header: "\n.. |br| raw:: html\n\n    <br />\n\n\n.. |prein| raw:: html\n\n    <pre>\n\n\n.. |preout| raw:: html\n\n    </pre>\n\n"
      dir-tables: "{DIR[DTR,PERF,VPP]}"
      data:
        csit-vpp-perf-1707-all:
        - 22
      filter: "all"
      parameters:
      - "name"
      - "doc"
      - "level"
      data-start-level: 2  # 0, 1, 2, ...
      chapters-start-level: 2  # 0, 1, 2, ...


Static content
``````````````

 - Manually created / edited files.
 - .rst files, static .csv files, static pictures (.svg), ...
 - Stored in CSIT git repository.

No more details about the static content in this document.


Data to process
```````````````

The PAL processes test results and other information produced by Jenkins jobs.
The data are now stored as robot results in Jenkins (TODO: store the data in
nexus) as .zip and / or .xml files.


Data processing
---------------

As the first step, the data are downloaded and stored locally (typically on a
Jenkins slave). If .zip files are used, the given .xml files are extracted for
further processing.
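
Extracting a single file from a downloaded archive can be done with the Python
standard library, as in the sketch below. The function name ``extract_member``
and the example paths are illustrative assumptions; the member name corresponds
to the "extract" parameter of the "input" section.

.. code-block:: python

    import zipfile

    def extract_member(zip_path, member, target_dir):
        """Extract one member of a zip archive and return the extracted path."""
        with zipfile.ZipFile(zip_path) as archive:
            # ZipFile.extract() recreates the member's directories in target_dir.
            return archive.extract(member, path=target_dir)

    # Example call (paths are hypothetical):
    # extract_member("_tmp/data/robot-plugin.zip",
    #                "robot-plugin/output.xml", "_tmp/data")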

Parsing of the .xml files is performed by a class derived from
"robot.api.ResultVisitor"; only the necessary methods are overridden. All the
necessary data, and only that data, is extracted from the .xml file and stored
in a structured form.

The parsed data are stored as the multi-indexed pandas.Series data type. Its
structure is as follows:

::

    <job name>
      <build>
        <metadata>
        <suites>
        <tests>

"job name", "build", "metadata", "suites", "tests" are indexes to access the
data. For example:

::

    data =

    job 1 name:
      build 1:
        metadata: metadata
        suites: suites
        tests: tests
      ...
      build N:
        metadata: metadata
        suites: suites
        tests: tests
    ...
    job M name:
      build 1:
        metadata: metadata
        suites: suites
        tests: tests
      ...
      build N:
        metadata: metadata
        suites: suites
        tests: tests

Using the indexes data["job 1 name"]["build 1"]["tests"] (e.g.:
data["csit-vpp-perf-1704-all"]["17"]["tests"]) we get a list of all tests with
all their data.

Data will not be accessible directly using indexes, but using getters and
filters.

**Structure of metadata:**

::

    "metadata": {
        "version": "VPP version",
        "job": "Jenkins job name",
        "build": "Information about the build"
    },

**Structure of suites:**

::

    "suites": {
        "Suite name 1": {
            "doc": "Suite 1 documentation",
            "parent": "Suite 1 parent"
        },
        "Suite name N": {
            "doc": "Suite N documentation",
            "parent": "Suite N parent"
        }
    }

**Structure of tests:**

Performance tests:

::

    "tests": {
        "ID": {
            "name": "Test name",
            "parent": "Name of the parent of the test",
            "doc": "Test documentation",
            "msg": "Test message",
            "tags": ["tag 1", "tag 2", "tag n"],
            "type": "PDR" | "NDR",
            "throughput": {
                "value": int,
                "unit": "pps" | "bps" | "percentage"
            },
            "latency": {
                "direction1": {
                    "100": {
                        "min": int,
                        "avg": int,
                        "max": int
                    },
                    "50": {  # Only for NDR
                        "min": int,
                        "avg": int,
                        "max": int
                    },
                    "10": {  # Only for NDR
                        "min": int,
                        "avg": int,
                        "max": int
                    }
                },
                "direction2": {
                    "100": {
                        "min": int,
                        "avg": int,
                        "max": int
                    },
                    "50": {  # Only for NDR
                        "min": int,
                        "avg": int,
                        "max": int
                    },
                    "10": {  # Only for NDR
                        "min": int,
                        "avg": int,
                        "max": int
                    }
                }
            },
            "lossTolerance": "lossTolerance",  # Only for PDR
            "vat-history": "DUT1 and DUT2 VAT History",
            "show-run": "Show Run"
        },
        "ID": {
            # next test
        }
    }

Functional tests:

::

    "tests": {
        "ID": {
            "name": "Test name",
            "parent": "Name of the parent of the test",
            "doc": "Test documentation",
            "msg": "Test message",
            "tags": ["tag 1", "tag 2", "tag n"],
            "vat-history": "DUT1 and DUT2 VAT History",
            "show-run": "Show Run",
            "status": "PASS" | "FAIL"
        },
        "ID": {
            # next test
        }
    }

Note: ID is the lowercase full path to the test.


Data filtering
``````````````

The first step when generating an element is getting the data needed to
construct the element. The data are filtered from the processed input data.

The data filtering is based on:

 - job name(s).
 - build number(s).
 - tag(s).
 - required data - only this data is included in the output.

WARNING: The filtering is based on tags, so be careful with tagging.

For example, an element whose specification includes:

::

    data:
      csit-vpp-perf-1707-all:
      - 9
      - 10
      - 13
      - 14
      - 15
      - 16
      - 17
      - 18
      - 19
      - 21
    filter: "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"

is constructed using data from the job "csit-vpp-perf-1707-all", from all
listed builds, and from the tests whose tags match the filter
condition.
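
The filter is a boolean expression over tag names. The following is a minimal
sketch of how such an expression could be evaluated against the tags of one
test. The function name ``tags_match`` is hypothetical, and this is not
necessarily how PAL implements the filtering:

.. code-block:: python

    import re


    def tags_match(filter_expr, tags):
        """Return True if the tags satisfy the boolean filter expression."""
        tag_set = set(tags)
        # Replace each quoted tag name with True or False (set membership).
        expr = re.sub(
            r"'([^']+)'",
            lambda match: str(match.group(1) in tag_set),
            filter_expr,
        )
        # After substitution only booleans, operators and parentheses remain.
        if not re.fullmatch(r"(?:True|False|and|or|not|[()\s])+", expr):
            raise ValueError("Unsupported token in filter: " + filter_expr)
        return eval(expr, {"__builtins__": {}})


    FILTER = (
        "'64B' and 'BASE' and 'NDRDISC' and '1T1C' and "
        "('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
    )

    # A test carrying the required tags is selected ...
    print(tags_match(FILTER, ["64B", "BASE", "NDRDISC", "1T1C", "L2XCFWD"]))
    # ... but the same test tagged with VHOST is rejected.
    print(tags_match(
        FILTER, ["64B", "BASE", "NDRDISC", "1T1C", "L2XCFWD", "VHOST"]))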

The output data structure for filtered test data is:

::

    - job 1
      - build 1
        - test 1
          - parameter 1
          - parameter 2
          ...
          - parameter n
        ...
        - test n
        ...
      ...
      - build n
    ...
    - job n
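
As an illustration of this structure, the sketch below keeps only the requested
jobs, builds and parameters from nested dictionaries. It assumes plain ``dict``
objects instead of the data types PAL uses internally. The function name
``select_parameters`` and the sample test IDs are hypothetical:

.. code-block:: python

    def select_parameters(data, jobs_builds, parameters):
        """Keep only the requested jobs, builds and test parameters."""
        filtered = {}
        for job, builds in jobs_builds.items():
            filtered[job] = {}
            for build in builds:
                tests = data.get(job, {}).get(build, {})
                filtered[job][build] = {
                    test_id: {
                        key: test[key] for key in parameters if key in test
                    }
                    for test_id, test in tests.items()
                }
        return filtered


    # Made-up input data: job -> build -> test ID -> parameters.
    DATA = {
        "csit-vpp-perf-1707-all": {
            9: {"tests.example.tc01": {"name": "tc01", "doc": "text"}},
            10: {"tests.example.tc01": {"name": "tc01", "doc": "text"}},
            11: {"tests.example.tc01": {"name": "tc01", "doc": "text"}},
        }
    }

    RESULT = select_parameters(
        DATA, {"csit-vpp-perf-1707-all": [9, 10]}, ["name"])
    print(RESULT)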


Data analytics
``````````````

Data analytics part implements:

 - methods to compute statistical data from the filtered input data.
 - trending.

Throughput Speedup Analysis - Multi-Core with Multi-Threading
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Throughput Speedup Analysis (TSA) calculates throughput speedup ratios
for tested 1-, 2- and 4-core multi-threaded VPP configurations using the
following formula:

::

                                N_core_throughput
    N_core_throughput_speedup = -----------------
                                1_core_throughput

Multi-core throughput speedup ratios are plotted in grouped bar graphs
for throughput tests with 64B/78B frame size, with number of cores on
X-axis and speedup ratio on Y-axis.

For easier comparison, data sets from multiple test results are plotted in
each graph:

    - graph type: grouped bars;
    - graph X-axis: (testcase index, number of cores);
    - graph Y-axis: speedup factor.

Only a subset of the existing performance tests is covered by TSA graphs.
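
The speedup formula can be illustrated with a short sketch. The function name
``speedup_ratios`` and the throughput values are made up for the example; only
the ratio itself comes from the formula above:

.. code-block:: python

    def speedup_ratios(throughput_by_cores):
        """Return {cores: N_core_throughput / 1_core_throughput}."""
        base = throughput_by_cores[1]
        if base <= 0:
            raise ValueError("1-core throughput must be positive")
        return {
            cores: throughput / base
            for cores, throughput in sorted(throughput_by_cores.items())
        }


    # Made-up throughput values in packets per second.
    MEASURED = {1: 10.0e6, 2: 19.0e6, 4: 36.0e6}
    print(speedup_ratios(MEASURED))

A ratio equal to the number of cores would mean perfectly linear scaling.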

**Model for TSA:**

::

    -
      type: "plot"
      title: "TSA: 64B-*-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
      algorithm: "plot_throughput_speedup_analysis"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/10ge2p1x520-64B-l2-tsa-ndrdisc"
      data:
        "plot-throughput-speedup-analysis"
      filter: "'NIC_Intel-X520-DA2' and '64B' and 'BASE' and 'NDRDISC' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST'"
      parameters:
      - "throughput"
      - "parent"
      - "tags"
      layout:
        title: "64B-*-(eth|dot1q|dot1ad)-(l2xcbase|l2bdbasemaclrn)-ndrdisc"
        layout:
          "plot-throughput-speedup-analysis"


Comparison of results from two sets of the same test executions
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

This algorithm enables comparison of results coming from two sets of the
same test executions. It is used to quantify performance changes across
all tests after test environment changes, e.g. operating system upgrades
or patches, or hardware changes.

It is assumed that each set of test executions includes multiple runs
of the same tests, 10 or more, to verify test results repeatability and
to yield statistically meaningful results data.

Comparison results are presented in a table with a specified number of
the best and the worst relative changes between the two sets. The following
table columns are defined:

    - name of the test;
    - throughput mean values of the reference set;
    - throughput standard deviation of the reference set;
    - throughput mean values of the set to compare;
    - throughput standard deviation of the set to compare;
    - relative change of the mean values.
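
A minimal sketch of how one table row could be computed from two sets of
throughput samples follows. It assumes the sample standard deviation and a
relative change expressed in percent of the reference mean. Both are
assumptions, as are the function name ``comparison_row`` and the key names:

.. code-block:: python

    from statistics import mean, stdev


    def comparison_row(name, reference, compare):
        """Build one row of the comparison table for a single test."""
        ref_mean = mean(reference)
        cmp_mean = mean(compare)
        return {
            "name": name,
            "ref_mean": ref_mean,
            "ref_stdev": stdev(reference),  # sample standard deviation
            "cmp_mean": cmp_mean,
            "cmp_stdev": stdev(compare),
            # Relative change of the means, in percent of the reference.
            "change_pct": (cmp_mean - ref_mean) / ref_mean * 100.0,
        }


    # Made-up throughput samples (Mpps) from two sets of runs.
    ROW = comparison_row("tc01", [9.0, 10.0, 11.0], [10.0, 11.0, 12.0])
    print(ROW)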

**The model**

The model specifies:

    - type: "table" means that this section defines a table.
    - title: Title of the table.
    - algorithm: Algorithm used to generate the table. The other parameters in
      this section must provide all information needed by this algorithm.
    - output-file-ext: Extension of the output file.
    - output-file: File which the table will be written to.
    - reference: The builds used as the reference for the comparison.
    - compare: The builds compared to the reference.
    - data: The sources (jobs and builds) providing data for generating the
      table.
    - filter: Filter based on tags, applied to the input data. If "template" is
      used, filtering is based on the template.
    - parameters: Only these parameters are put into the output data
      structure.
    - nr-of-tests-shown: Number of the best and the worst tests presented in the
      table. Use 0 (zero) to present all tests.
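
One plausible reading of ``nr-of-tests-shown`` is sketched below: rows are
sorted by relative change, then the N best and the N worst are kept, with 0
keeping everything. The function name ``select_rows`` and the ``change_pct``
key are hypothetical and the actual algorithm may differ:

.. code-block:: python

    def select_rows(rows, nr_of_tests_shown):
        """Return the N best and N worst rows, or all rows when N is 0."""
        ordered = sorted(
            rows, key=lambda row: row["change_pct"], reverse=True)
        if nr_of_tests_shown == 0 or 2 * nr_of_tests_shown >= len(ordered):
            return ordered
        return ordered[:nr_of_tests_shown] + ordered[-nr_of_tests_shown:]


    # Made-up relative changes in percent.
    ROWS = [
        {"name": "tc01", "change_pct": 5.0},
        {"name": "tc02", "change_pct": -3.0},
        {"name": "tc03", "change_pct": 1.0},
        {"name": "tc04", "change_pct": 0.0},
        {"name": "tc05", "change_pct": -7.0},
    ]
    print([row["name"] for row in select_rows(ROWS, 1)])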

*Example:*

::

    -
      type: "table"
      title: "Performance comparison"
      algorithm: "table_performance_comparison"
      output-file-ext: ".csv"
      output-file: "{DIR[DTR,PERF,VPP,IMPRV]}/vpp_performance_comparison"
      reference:
        title: "csit-vpp-perf-1801-all - 1"
        data:
          csit-vpp-perf-1801-all:
          - 1
          - 2
      compare:
        title: "csit-vpp-perf-1801-all - 2"
        data:
          csit-vpp-perf-1801-all:
          - 1
          - 2
      data:
        "vpp-perf-comparison"
      filter: "all"
      parameters:
      - "name"
      - "parent"
      - "throughput"
      nr-of-tests-shown: 20


Advanced data analytics
```````````````````````

In the future, advanced data analytics (ADA) will be added to analyze the
telemetry data collected from SUT telemetry sources and to correlate it with
performance test results.

:TODO:

    - describe the concept of ADA.
    - add specification.


Data presentation
-----------------

This layer generates the plots and tables according to the report models in
the specification file. The elements are generated using the algorithms and
data specified in their models.


Tables
``````

 - tables are generated by algorithms implemented in PAL; the model includes
   the algorithm and all necessary information.
 - output format: csv.
 - generated tables are stored in specified directories and linked to .rst
   files.


Plots
`````

 - `plot.ly <https://plot.ly/>`_ is currently used to generate plots; the model
   includes the type of plot and all the information necessary to render it.
 - output format: html.
 - generated plots are stored in specified directories and linked to .rst files.


Report generation
-----------------

The report is generated using Sphinx and the Read_the_Docs template. PAL
generates html and pdf formats. It is possible to define the content of the
report by specifying the version (TODO: define the names and content of
versions).


The process
```````````

1. Read the specification.
2. Read the input data.
3. Process the input data.
4. For each element (plot, table, file) defined in the specification:

   a. Get the data needed to construct the element using a filter.
   b. Generate the element.
   c. Store the element.

5. Generate the report.
6. Store the report (Nexus).

The process is model driven. The elements' models (tables, plots, files
and the report itself) are defined in the specification file. The script reads
the elements' models from the specification file and generates the elements.

It is easy to add elements to be generated in the report. If a new type
of an element is required, only a new algorithm needs to be implemented
and integrated.


Continuous Performance Measurements and Trending
------------------------------------------------

Performance analysis and trending execution sequence
````````````````````````````````````````````````````

CSIT Performance Analysis (PA) runs performance analysis, change detection and
trending using specified trend analysis metrics over the rolling window of the
last <N> sets of historical measurement data. PA is defined as follows:

    #. PA job triggers:

        #. By the Performance Trending (PT) job at its completion.
        #. Manually from Jenkins UI.

    #. Download and parse archived historical data and the new data:

        #. New data from the latest PT job is evaluated against the rolling
           window of <N> sets of historical data.
        #. Download Robot Framework (RF) output.xml files and compressed
           archived data.
        #. Parse out the data, filtering the test cases listed in the PA
           specification (part of the CSIT PAL specification file).

    #. Calculate trend metrics for the rolling window of <N> sets of historical
       data:

        #. Calculate quartiles Q1, Q2, Q3.
        #. Trim outliers using IQR.
        #. Calculate the trimmed moving average (TMA) and the trimmed moving
           standard deviation (TMSD).
        #. Calculate normal trending range per test case based on TMA and TMSD.

    #. Evaluate new test data against trend metrics:

        #. If within the range of (TMA +/- 3*TMSD) => Result = Pass,
           Reason = Normal.
        #. If below the range => Result = Fail, Reason = Regression.
        #. If above the range => Result = Pass, Reason = Progression.

    #. Generate and publish results:

        #. Relay evaluation result to job result.
        #. Generate a new set of trend analysis summary graphs and drill-down
           graphs.

            #. Summary graphs to include measured values with Normal,
               Progression and Regression markers. MM shown in the background if
               possible.
            #. Drill-down graphs to include MM, TMA and TMSD.

        #. Publish trend analysis graphs in html format on
           https://docs.fd.io/csit/master/trending/.
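
The metric calculation and evaluation steps above can be sketched for a single
test case as follows. The sketch assumes that TMA and TMSD are the mean and the
sample standard deviation of the window after outliers are trimmed, and that
outliers lie more than 1.5 * IQR outside the Q1..Q3 range. The function name
``evaluate`` is hypothetical and PAL's implementation may differ:

.. code-block:: python

    from statistics import mean, quantiles, stdev


    def evaluate(history, new_value, outlier_const=1.5):
        """Classify a new sample against the trimmed trend of the window."""
        # Quartiles Q1, Q2, Q3 of the rolling window.
        q1, _q2, q3 = quantiles(history, n=4)
        iqr = q3 - q1
        low = q1 - outlier_const * iqr
        high = q3 + outlier_const * iqr
        # Trim outliers using the IQR.
        trimmed = [value for value in history if low <= value <= high]
        tma = mean(trimmed)
        tmsd = stdev(trimmed)
        # Evaluate against the normal range (TMA +/- 3*TMSD).
        if new_value < tma - 3 * tmsd:
            return ("Fail", "Regression")
        if new_value > tma + 3 * tmsd:
            return ("Pass", "Progression")
        return ("Pass", "Normal")


    # Made-up window of historical results; 20.0 is an outlier.
    WINDOW = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 20.0]
    for sample in (10.1, 9.0, 11.5):
        print(sample, evaluate(WINDOW, sample))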


Parameters to specify
`````````````````````

*General section - parameters common to all plots:*

    - type: "cpta";
    - title: The title of this section;
    - output-file-type: only ".html" is supported;
    - output-file: path where the generated files will be stored.

*Plots section:*

    - plot title;
    - output file name;
    - input data for plots;

        - job to be monitored - the Jenkins job whose results are used as input
          data for this test;
        - builds used for trending plot(s) - specified by a list of build
          numbers or by a range of builds defined by the first and the last
          build number;

    - tests to be displayed in the plot defined by a filter;
    - list of parameters to extract from the data;
    - plot layout

*Example:*

::

    -
      type: "cpta"
      title: "Continuous Performance Trending and Analysis"
      output-file-type: ".html"
      output-file: "{DIR[STATIC,VPP]}/cpta"
      plots:

        - title: "VPP 1T1C L2 64B Packet Throughput - Trending"
          output-file-name: "l2-1t1c-x520"
          data: "plot-performance-trending-vpp"
          filter: "'NIC_Intel-X520-DA2' and 'MRR' and '64B' and ('BASE' or 'SCALE') and '1T1C' and ('L2BDMACSTAT' or 'L2BDMACLRN' or 'L2XCFWD') and not 'VHOST' and not 'MEMIF'"
          parameters:
          - "result"
          layout: "plot-cpta-vpp"

        - title: "DPDK 4T4C IMIX MRR Trending"
          output-file-name: "dpdk-imix-4t4c-xl710"
          data: "plot-performance-trending-dpdk"
          filter: "'NIC_Intel-XL710' and 'IMIX' and 'MRR' and '4T4C' and 'DPDK'"
          parameters:
          - "result"
          layout: "plot-cpta-dpdk"

The Dashboard
`````````````

Performance dashboard tables provide the latest VPP throughput trend, trend
compliance and detected anomalies, all on a per VPP test case basis.
The Dashboard is generated as three tables for 1t1c, 2t2c and 4t4c MRR tests.

First, the .csv tables are generated (only the table for 1t1c is shown):

::

    -
      type: "table"
      title: "Performance trending dashboard"
      algorithm: "table_performance_trending_dashboard"
      output-file-ext: ".csv"
      output-file: "{DIR[STATIC,VPP]}/performance-trending-dashboard-1t1c"
      data: "plot-performance-trending-all"
      filter: "'MRR' and '1T1C'"
      parameters:
      - "name"
      - "parent"
      - "result"
      ignore-list:
      - "tests.vpp.perf.l2.10ge2p1x520-eth-l2bdscale1mmaclrn-mrr.tc01-64b-1t1c-eth-l2bdscale1mmaclrn-ndrdisc"
      outlier-const: 1.5
      window: 14
      evaluated-window: 14
      long-trend-window: 180

Then, html tables stored inside .rst files are generated:

::

    -
      type: "table"
      title: "HTML performance trending dashboard 1t1c"
      algorithm: "table_performance_trending_dashboard_html"
      input-file: "{DIR[STATIC,VPP]}/performance-trending-dashboard-1t1c.csv"
      output-file: "{DIR[STATIC,VPP]}/performance-trending-dashboard-1t1c.rst"
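
As a rough illustration of the second step, the sketch below converts CSV text
into an html table embedded in an .rst file through the ``raw`` directive. The
function name ``csv_to_rst_html`` and the column names are made up, and the
layout produced by ``table_performance_trending_dashboard_html`` may differ:

.. code-block:: python

    import csv
    import io
    from html import escape


    def csv_to_rst_html(csv_text):
        """Render CSV text as an html table inside an rst raw directive."""
        rows = list(csv.reader(io.StringIO(csv_text)))
        lines = [".. raw:: html", "", "    <table>"]
        for index, row in enumerate(rows):
            # The first CSV row is treated as the table header.
            cell = "th" if index == 0 else "td"
            cells = "".join(
                f"<{cell}>{escape(value)}</{cell}>" for value in row)
            lines.append(f"        <tr>{cells}</tr>")
        lines.append("    </table>")
        return "\n".join(lines) + "\n"


    # Made-up dashboard content.
    print(csv_to_rst_html("Test Case,Trend\nl2xc <base>,10.5\n"))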

Root Cause Analysis
-------------------

Root Cause Analysis (RCA) re-analyses archived performance results for a
specified:

    - range of job builds,
    - set of specific tests and
    - PASS/FAIL criteria to detect performance change.

In addition, PAL generates trending plots to show performance over the specified
time interval.

Root Cause Analysis - Option 1: Analysing Archived VPP Results
``````````````````````````````````````````````````````````````

This option can be used to speed up the process, or when the existing data is
sufficient.
In this case, PAL uses existing data saved in Nexus, searches for performance
degradations and generates plots to show performance over the specified time
interval for the selected tests.

Execution Sequence
''''''''''''''''''

    #. Download and parse archived historical data and the new data.
    #. Calculate trend metrics.
    #. Find regression / progression.
    #. Generate and publish results:

        #. Summary graphs to include measured values with Progression and
           Regression markers.
        #. List the DUT build(s) where the anomalies were detected.

CSIT PAL Specification
''''''''''''''''''''''

    - What to test:

        - first build (Good); specified by the Jenkins job name and the build
          number
        - last build (Bad); specified by the Jenkins job name and the build
          number
        - step (1..n).

    - Data:

        - tests of interest; list of tests (full name is used) whose results
          are used

*Example:*

::

    TODO
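
Until the specification example is defined, the build selection described above
(first build, last build, step) can be sketched as follows. The function name
``builds_to_analyse`` is hypothetical, and always including the last (Bad)
build is an assumption:

.. code-block:: python

    def builds_to_analyse(first_good, last_bad, step=1):
        """List build numbers from the first to the last build, by step."""
        if step < 1 or first_good > last_bad:
            raise ValueError("expected step >= 1 and first <= last")
        builds = list(range(first_good, last_bad + 1, step))
        # Assumption: the last (Bad) build is always analysed.
        if builds[-1] != last_bad:
            builds.append(last_bad)
        return builds


    print(builds_to_analyse(100, 110, step=4))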


API
---

List of modules, classes, methods and functions
```````````````````````````````````````````````

::

    specification_parser.py

        class Specification

            Methods:
                read_specification
                set_input_state
                set_input_file_name

            Getters:
                specification
                environment
                debug
                is_debug
                input
                builds
                output
                tables
                plots
                files
                static


    input_data_parser.py

        class InputData

            Methods:
                read_data
                filter_data

            Getters:
                data
                metadata
                suites
                tests


    environment.py

        Functions:
            clean_environment

        class Environment

            Methods:
                set_environment

            Getters:
                environment


    input_data_files.py

        Functions:
            download_data_files
            unzip_files


    generator_tables.py

        Functions:
            generate_tables

        Functions implementing algorithms to generate particular types of
        tables (called by the function "generate_tables"):
            table_details
            table_performance_improvements


    generator_plots.py

        Functions:
            generate_plots

        Functions implementing algorithms to generate particular types of
        plots (called by the function "generate_plots"):
            plot_performance_box
            plot_latency_box


    generator_files.py

        Functions:
            generate_files

        Functions implementing algorithms to generate particular types of
        files (called by the function "generate_files"):
            file_test_results


    report.py

        Functions:
            generate_report

        Functions implementing algorithms to generate particular types of
        report (called by the function "generate_report"):
            generate_html_report
            generate_pdf_report

        Other functions called by the function "generate_report":
            archive_input_data
            archive_report


PAL functional diagram
``````````````````````

.. only:: latex

    .. raw:: latex

        \begin{figure}[H]
            \centering
                \graphicspath{{../_tmp/src/csit_framework_documentation/}}
                \includegraphics[width=0.90\textwidth]{pal_func_diagram}
                \label{fig:pal_func_diagram}
        \end{figure}

.. only:: html

    .. figure:: pal_func_diagram.svg
        :alt: PAL functional diagram
        :align: center


How to add an element
`````````````````````

An element can be added by adding its model to the specification file. If
the element is to be generated by an existing algorithm, only its
parameters must be set.

If a brand new type of element needs to be added, the algorithm must also
be implemented. Element generation algorithms are implemented in the files
whose names start with the "generator" prefix. The name of the function
implementing the algorithm and the name of the algorithm in the
specification file must be the same.
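
The name matching can be pictured as a lookup from the algorithm name in the
model to a function of the same name. The sketch below uses a hypothetical
algorithm ``table_example`` and a hypothetical registry ``ALGORITHMS``. It
shows the idea only and is not PAL's actual dispatch code:

.. code-block:: python

    def table_example(model, data):
        """Hypothetical algorithm: summarise the number of tests."""
        return f"{model['title']}: {len(data)} tests"


    # Registry keyed by function name, so that the "algorithm" value in the
    # specification selects the function of the same name.
    ALGORITHMS = {func.__name__: func for func in (table_example,)}


    def generate_elements(models, data):
        """Generate every element using the algorithm named in its model."""
        outputs = {}
        for model in models:
            name = model["algorithm"]
            if name not in ALGORITHMS:
                raise NotImplementedError(f"Unknown algorithm: {name}")
            outputs[model["output-file"]] = ALGORITHMS[name](model, data)
        return outputs


    MODELS = [
        {"algorithm": "table_example", "title": "Demo", "output-file": "demo"},
    ]
    print(generate_elements(MODELS, ["test 1", "test 2", "test 3"]))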