.. _telemetry:

OpenMetrics
-----------

OpenMetrics specifies the de-facto standard for transmitting cloud-native
metrics at scale, with support for both text representation and Protocol
Buffers.

RFC
~~~

- RFC2119
- RFC5234
- RFC8174
- draft-richih-opsawg-openmetrics-00

Reference
~~~~~~~~~

`OpenMetrics <https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md>`_

Metric Types
~~~~~~~~~~~~

- Gauge
- Counter
- StateSet
- Info
- Histogram
- GaugeHistogram
- Summary
- Unknown

Telemetry module in CSIT currently support only Gauge, Counter and Info.

Example metric file
~~~~~~~~~~~~~~~~~~~

```
  # HELP calls_total Number of calls total
  # TYPE calls_total counter
  calls_total{name="api-rx-from-ring",state="active",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0
  calls_total{name="fib-walk",state="any wait",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0
  calls_total{name="ip6-mld-process",state="any wait",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0
  calls_total{name="ip6-ra-process",state="any wait",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0
  calls_total{name="unix-epoll-input",state="polling",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 39584.0
  calls_total{name="avf-0/18/6/0-output",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="avf-0/18/6/0-tx",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="avf-input",state="polling",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="ethernet-input",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="ip4-input-no-checksum",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="ip4-lookup",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="ip4-rewrite",state="active",thread_id="1",thread_lcore="2",thread_name="vpp_wk_0"} 91.0
  calls_total{name="avf-0/18/2/0-output",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="avf-0/18/2/0-tx",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="avf-input",state="polling",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="ethernet-input",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="ip4-input-no-checksum",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="ip4-lookup",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="ip4-rewrite",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
  calls_total{name="unix-epoll-input",state="polling",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 1.0
```

Anatomy of existing CSIT telemetry implementation
-------------------------------------------------

Existing implementation consists of several measurment building blocks:
the main measuring block running search algorithms (MLR, PLR, SOAK, MRR, ...),
the latency measuring block and the several telemetry blocks with or without
traffic running on a background.

The main measuring block must not be interrupted by any read operation that can
impact data plane traffic processing during throughput search algorithm. Thus
operational reads are done before (pre-stat) and after (post-stat) that block.

Some operational reads must be done while traffic is running and usually
consists of two reads (pre-run-stat, post-run-stat) with defined delay between
them.

MRR measurement
~~~~~~~~~~~~~~~

```
  traffic_start(r=mrr)               traffic_stop       |<     measure     >|
    |                                  |                |      (r=mrr)      |
    |   pre_run_stat   post_run_stat   |    pre_stat    |                   |  post_stat
    |        |               |         |       |        |                   |      |
  --o--------o---------------o---------o-------o--------+-------------------+------o------------>
                                                                                              t

Legend:
  - pre_run_stat
    - vpp-clear-runtime
  - post_run_stat
    - vpp-show-runtime
    - bash-perf-stat            // if extended_debug == True
  - pre_stat
    - vpp-clear-stats
    - vpp-enable-packettrace    // if extended_debug == True
    - vpp-enable-elog
  - post_stat
    - vpp-show-stats
    - vpp-show-packettrace      // if extended_debug == True
    - vpp-show-elog
```

```
    |<                                measure                                 >|
    |                                 (r=mrr)                                  |
    |                                                                          |
    |<    traffic_trial0    >|<    traffic_trial1    >|<    traffic_trialN    >|
    |    (i=0,t=duration)    |    (i=1,t=duration)    |    (i=N,t=duration)    |
    |                        |                        |                        |
  --o------------------------o------------------------o------------------------o--->
                                                                                 t
```

MLR measurement
~~~~~~~~~~~~~~~

```
    |<     measure     >|   traffic_start(r=pdr)               traffic_stop   traffic_start(r=ndr)               traffic_stop  |< [    latency    ] >|
    |      (r=mlr)      |    |                                  |              |                                  |            |     .9/.5/.1/.0     |
    |                   |    |   pre_run_stat   post_run_stat   |              |   pre_run_stat   post_run_stat   |            |                     |
    |                   |    |        |               |         |              |        |               |         |            |                     |
  --+-------------------+----o--------o---------------o---------o--------------o--------o---------------o---------o------------[---------------------]--->
                                                                                                                                                       t

Legend:
  - pre_run_stat
    - vpp-clear-runtime
  - post_run_stat
    - vpp-show-runtime
    - bash-perf-stat          // if extended_debug == True
  - pre_stat
    - vpp-clear-stats
    - vpp-enable-packettrace  // if extended_debug == True
    - vpp-enable-elog
  - post_stat
    - vpp-show-stats
    - vpp-show-packettrace    // if extended_debug == True
    - vpp-show-elog
```


Improving existing solution
---------------------------

Improving existing CSIT telemetry implementaion including these areas.

- telemetry optimization
  - reducing ssh overhead
  - removing stats without added value
- telemetry scheduling
  - improve accuracy
  - improve configuration
- telemetry output
  - standardize output

Exesting stats implementation was abstracted to having pre-/post-run-stats
phases. Improvement will be done by merging pre-/post- logic implementation into
separated stat-runtime block configurable and locally executed on SUT.

This will increase precision, remove complexity and move implementation into
spearated module.

OpenMetric format for cloud native metric capturing will be used to ensure
integration with post processing module.

MRR measurement
~~~~~~~~~~~~~~~

```
    traffic_start(r=mrr)               traffic_stop                 |<     measure     >|
      |                                  |                          |      (r=mrr)      |
      |   |<      stat_runtime      >|   |          stat_pre_trial  |                   |  stat_post_trial
      |   |                          |   |             |            |                   |     |
  ----o---+--------------------------+---o-------------o------------+-------------------+-----o------------->
                                                                                                          t

Legend:
  - stat_runtime
    - vpp-runtime
  - stat_pre_trial
    - vpp-clear-stats
    - vpp-enable-packettrace  // if extended_debug == True
  - stat_post_trial
    - vpp-show-stats
    - vpp-show-packettrace    // if extended_debug == True
```

```
    |<                                measure                                 >|
    |                                 (r=mrr)                                  |
    |                                                                          |
    |<    traffic_trial0    >|<    traffic_trial1    >|<    traffic_trialN    >|
    |    (i=0,t=duration)    |    (i=1,t=duration)    |    (i=N,t=duration)    |
    |                        |                        |                        |
  --o------------------------o------------------------o------------------------o--->
                                                                                 t
```

```
    |<                              stat_runtime                              >|
    |                                                                          |
    |<       program0       >|<       program1       >|<       programN       >|
    |       (@=params)       |       (@=params)       |       (@=params)       |
    |                        |                        |                        |
  --o------------------------o------------------------o------------------------o--->
                                                                                 t
```


MLR measurement
~~~~~~~~~~~~~~~

```
    |<     measure     >|   traffic_start(r=pdr)               traffic_stop   traffic_start(r=ndr)               traffic_stop  |< [    latency    ] >|
    |      (r=mlr)      |     |                                  |              |                                  |           |     .9/.5/.1/.0     |
    |                   |     |   |<      stat_runtime      >|   |              |   |<      stat_runtime      >|   |           |                     |
    |                   |     |   |                          |   |              |   |                          |   |           |                     |
  --+-------------------+-----o---+--------------------------+---o--------------o---+--------------------------+---o-----------[---------------------]--->
                                                                                                                                                       t

Legend:
  - stat_runtime
    - vpp-runtime
  - stat_pre_trial
    - vpp-clear-stats
    - vpp-enable-packettrace  // if extended_debug == True
  - stat_post_trial
    - vpp-show-stats
    - vpp-show-packettrace    // if extended_debug == True
```


Tooling
-------

Prereqisities:
- bpfcc-tools
- python-bpfcc
- libbpfcc
- libbpfcc-dev
- libclang1-9 libllvm9

```bash
  $ sudo apt install bpfcc-tools python3-bpfcc libbpfcc libbpfcc-dev libclang1-9 libllvm9
```


Configuration
-------------

```yaml
  logging:
    version: 1
    formatters:
      console:
        format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
      prom:
        format: '%(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        level: INFO
        formatter: console
        stream: ext://sys.stdout
      prom:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: prom
        filename: /tmp/metric.prom
        mode: w
    loggers:
      prom:
        handlers: [prom]
        level: INFO
        propagate: False
    root:
      level: INFO
      handlers: [console]
  scheduler:
    duration: 1
  programs:
    - name: bundle_bpf
      metrics:
        counter:
          - name: cpu_cycle
            documentation: Cycles processed by CPUs
            namespace: bpf
            labelnames:
              - name
              - cpu
              - pid
          - name: cpu_instruction
            documentation: Instructions retired by CPUs
            namespace: bpf
            labelnames:
              - name
              - cpu
              - pid
          - name: llc_reference
            documentation: Last level cache operations by type
            namespace: bpf
            labelnames:
              - name
              - cpu
              - pid
          - name: llc_miss
            documentation: Last level cache operations by type
            namespace: bpf
            labelnames:
              - name
              - cpu
              - pid
      events:
        - type: 0x0 # HARDWARE
          name: 0x0 # PERF_COUNT_HW_CPU_CYCLES
          target: on_cpu_cycle
          table: cpu_cycle
        - type: 0x0 # HARDWARE
          name: 0x1 # PERF_COUNT_HW_INSTRUCTIONS
          target: on_cpu_instruction
          table: cpu_instruction
        - type: 0x0 # HARDWARE
          name: 0x2 # PERF_COUNT_HW_CACHE_REFERENCES
          target: on_cache_reference
          table: llc_reference
        - type: 0x0 # HARDWARE
          name: 0x3 # PERF_COUNT_HW_CACHE_MISSES
          target: on_cache_miss
          table: llc_miss
      code: |
        #include <linux/ptrace.h>
        #include <uapi/linux/bpf_perf_event.h>

        const int max_cpus = 256;

        struct key_t {
            int cpu;
            int pid;
            char name[TASK_COMM_LEN];
        };

        BPF_HASH(llc_miss, struct key_t);
        BPF_HASH(llc_reference, struct key_t);
        BPF_HASH(cpu_instruction, struct key_t);
        BPF_HASH(cpu_cycle, struct key_t);

        static inline __attribute__((always_inline)) void get_key(struct key_t* key) {
            key->cpu = bpf_get_smp_processor_id();
            key->pid = bpf_get_current_pid_tgid();
            bpf_get_current_comm(&(key->name), sizeof(key->name));
        }

        int on_cpu_cycle(struct bpf_perf_event_data *ctx) {
            struct key_t key = {};
            get_key(&key);

            cpu_cycle.increment(key, ctx->sample_period);
            return 0;
        }
        int on_cpu_instruction(struct bpf_perf_event_data *ctx) {
            struct key_t key = {};
            get_key(&key);

            cpu_instruction.increment(key, ctx->sample_period);
            return 0;
        }
        int on_cache_reference(struct bpf_perf_event_data *ctx) {
            struct key_t key = {};
            get_key(&key);

            llc_reference.increment(key, ctx->sample_period);
            return 0;
        }
        int on_cache_miss(struct bpf_perf_event_data *ctx) {
            struct key_t key = {};
            get_key(&key);

            llc_miss.increment(key, ctx->sample_period);
            return 0;
        }
```

CSIT captured metrics
---------------------

SUT
~~~

Compute resource
________________

- BPF /process
  - BPF_HASH(llc_miss, struct key_t);
  - BPF_HASH(llc_reference, struct key_t);
  - BPF_HASH(cpu_instruction, struct key_t);
  - BPF_HASH(cpu_cycle, struct key_t);

Memory resource
_______________

- BPF /process
  - tbd

Network resource
________________

- BPF /process
  - tbd

DUT VPP metrics
~~~~~~~~~~~~~~~

Compute resource
________________

- runtime /node `show runtime`
  - calls
  - vectors
  - suspends
  - clocks
  - vectors_calls
- perfmon /bundle
  - inst-and-clock      node      intel-core          instructions/packet, cycles/packet and IPC
  - cache-hierarchy     node      intel-core          cache hits and misses
  - context-switches    thread    linux               per-thread context switches
  - branch-mispred      node      intel-core          Branches, branches taken and mis-predictions
  - page-faults         thread    linux               per-thread page faults
  - load-blocks         node      intel-core          load operations blocked due to various uarch reasons
  - power-licensing     node      intel-core          Thread power licensing
  - memory-bandwidth    system    intel-uncore        memory reads and writes per memory controller channel

Memory resource - tbd
_____________________

- memory /segment `show memory verbose api-segment stats-segment main-heap`
  - total
  - used
  - free
  - trimmable
  - free-chunks
  - free-fastbin-blks
  - max-total-allocated
- physmem `show physmem`
  - pages
  - subpage-size

Network resource
________________

- counters /node `show node counters`
  - count
  - severity
- hardware /interface `show interface`
  - rx_stats
  - tx_stats
- packets /interface `show hardware`
  - rx_packets
  - rx_bytes
  - rx_errors
  - tx_packets
  - tx_bytes
  - tx_errors
  - drops
  - punt
  - ip4
  - ip6
  - rx_no_buf
  - rx_miss


DUT DPDK metrics - tbd
~~~~~~~~~~~~~~~~~~~~~~

Compute resource
________________

- BPF /process
  - BPF_HASH(llc_miss, struct key_t);
  - BPF_HASH(llc_reference, struct key_t);
  - BPF_HASH(cpu_instruction, struct key_t);
  - BPF_HASH(cpu_cycle, struct key_t);

Memory resource
_______________

- BPF /process
  - tbd

Network resource
________________

- packets /interface
  - inPackets
  - outPackets
  - inBytes
  - outBytes
  - outErrorPackets
  - dropPackets