author pmikus <peter.mikus@protonmail.ch> 2022-10-03 14:42:34 +0200
committer Peter Mikus <peter.mikus@protonmail.ch> 2022-10-03 14:09:00 +0000
commit 14fe52e81c59a5a3558f4f587e56b75e388191dc
tree fda33cb74b18dc0f6dcdde168b1113badeaf5a5b /docs/report/introduction/methodology_telemetry.rst
parent c6bfe865e4a62dda2c5e635df53083e909a6558b
fix(docs): Static content
Signed-off-by: pmikus <peter.mikus@protonmail.ch>
Change-Id: I29c2c108cec7badbe302095d5797d39c0d34ad58
Diffstat (limited to 'docs/report/introduction/methodology_telemetry.rst')
-rw-r--r-- docs/report/introduction/methodology_telemetry.rst 399
1 files changed, 67 insertions, 332 deletions
diff --git a/docs/report/introduction/methodology_telemetry.rst b/docs/report/introduction/methodology_telemetry.rst
index dcd2d06541..c10f99affe 100644
--- a/docs/report/introduction/methodology_telemetry.rst
+++ b/docs/report/introduction/methodology_telemetry.rst
@@ -37,7 +37,8 @@ Telemetry module in CSIT currently support only Gauge, Counter and Info.
Example metric file
~~~~~~~~~~~~~~~~~~~
-```
+::
+
# HELP calls_total Number of calls total
# TYPE calls_total counter
calls_total{name="api-rx-from-ring",state="active",thread_id="0",thread_lcore="1",thread_name="vpp_main"} 0.0
@@ -60,7 +61,7 @@ Example metric file
calls_total{name="ip4-lookup",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
calls_total{name="ip4-rewrite",state="active",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 91.0
calls_total{name="unix-epoll-input",state="polling",thread_id="2",thread_lcore="0",thread_name="vpp_wk_1"} 1.0
-```
+
Anatomy of existing CSIT telemetry implementation
-------------------------------------------------
@@ -81,7 +82,8 @@ them.
MRR measurement
~~~~~~~~~~~~~~~
-```
+::
+
traffic_start(r=mrr) traffic_stop |< measure >|
| | | (r=mrr) |
| pre_run_stat post_run_stat | pre_stat | | post_stat
@@ -89,23 +91,23 @@ MRR measurement
--o--------o---------------o---------o-------o--------+-------------------+------o------------>
t
-Legend:
- - pre_run_stat
- - vpp-clear-runtime
- - post_run_stat
- - vpp-show-runtime
- - bash-perf-stat // if extended_debug == True
- - pre_stat
- - vpp-clear-stats
- - vpp-enable-packettrace // if extended_debug == True
- - vpp-enable-elog
- - post_stat
- - vpp-show-stats
- - vpp-show-packettrace // if extended_debug == True
- - vpp-show-elog
-```
-
-```
+ Legend:
+ - pre_run_stat
+ - vpp-clear-runtime
+ - post_run_stat
+ - vpp-show-runtime
+ - bash-perf-stat // if extended_debug == True
+ - pre_stat
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - vpp-enable-elog
+ - post_stat
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+ - vpp-show-elog
+
+::
+
|< measure >|
| (r=mrr) |
| |
@@ -114,12 +116,13 @@ Legend:
| | | |
--o------------------------o------------------------o------------------------o--->
t
-```
+
MLR measurement
~~~~~~~~~~~~~~~
-```
+::
+
|< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >|
| (r=mlr) | | | | | | .9/.5/.1/.0 |
| | | pre_run_stat post_run_stat | | pre_run_stat post_run_stat | | |
@@ -127,21 +130,20 @@ MLR measurement
--+-------------------+----o--------o---------------o---------o--------------o--------o---------------o---------o------------[---------------------]--->
t
-Legend:
- - pre_run_stat
- - vpp-clear-runtime
- - post_run_stat
- - vpp-show-runtime
- - bash-perf-stat // if extended_debug == True
- - pre_stat
- - vpp-clear-stats
- - vpp-enable-packettrace // if extended_debug == True
- - vpp-enable-elog
- - post_stat
- - vpp-show-stats
- - vpp-show-packettrace // if extended_debug == True
- - vpp-show-elog
-```
+ Legend:
+ - pre_run_stat
+ - vpp-clear-runtime
+ - post_run_stat
+ - vpp-show-runtime
+ - bash-perf-stat // if extended_debug == True
+ - pre_stat
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - vpp-enable-elog
+ - post_stat
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+ - vpp-show-elog
Improving existing solution
@@ -171,7 +173,8 @@ integration with post processing module.
MRR measurement
~~~~~~~~~~~~~~~
-```
+::
+
traffic_start(r=mrr) traffic_stop |< measure >|
| | | (r=mrr) |
| |< stat_runtime >| | stat_pre_trial | | stat_post_trial
@@ -179,18 +182,19 @@ MRR measurement
----o---+--------------------------+---o-------------o------------+-------------------+-----o------------->
t
-Legend:
- - stat_runtime
- - vpp-runtime
- - stat_pre_trial
- - vpp-clear-stats
- - vpp-enable-packettrace // if extended_debug == True
- - stat_post_trial
- - vpp-show-stats
- - vpp-show-packettrace // if extended_debug == True
-```
-
-```
+ Legend:
+ - stat_runtime
+ - vpp-runtime
+ - stat_pre_trial
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - stat_post_trial
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
+
+
+::
+
|< measure >|
| (r=mrr) |
| |
@@ -199,9 +203,9 @@ Legend:
| | | |
--o------------------------o------------------------o------------------------o--->
t
-```
-```
+::
+
|< stat_runtime >|
| |
|< program0 >|< program1 >|< programN >|
@@ -209,13 +213,13 @@ Legend:
| | | |
--o------------------------o------------------------o------------------------o--->
t
-```
MLR measurement
~~~~~~~~~~~~~~~
-```
+::
+
|< measure >| traffic_start(r=pdr) traffic_stop traffic_start(r=ndr) traffic_stop |< [ latency ] >|
| (r=mlr) | | | | | | .9/.5/.1/.0 |
| | | |< stat_runtime >| | | |< stat_runtime >| | | |
@@ -223,281 +227,12 @@ MLR measurement
--+-------------------+-----o---+--------------------------+---o--------------o---+--------------------------+---o-----------[---------------------]--->
t
-Legend:
- - stat_runtime
- - vpp-runtime
- - stat_pre_trial
- - vpp-clear-stats
- - vpp-enable-packettrace // if extended_debug == True
- - stat_post_trial
- - vpp-show-stats
- - vpp-show-packettrace // if extended_debug == True
-```
-
-
-Tooling
--------
-
-Prereqisities:
-- bpfcc-tools
-- python-bpfcc
-- libbpfcc
-- libbpfcc-dev
-- libclang1-9 libllvm9
-
-```bash
- $ sudo apt install bpfcc-tools python3-bpfcc libbpfcc libbpfcc-dev libclang1-9 libllvm9
-```
-
-
-Configuration
--------------
-
-```yaml
- logging:
- version: 1
- formatters:
- console:
- format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
- prom:
- format: '%(message)s'
- handlers:
- console:
- class: logging.StreamHandler
- level: INFO
- formatter: console
- stream: ext://sys.stdout
- prom:
- class: logging.handlers.RotatingFileHandler
- level: INFO
- formatter: prom
- filename: /tmp/metric.prom
- mode: w
- loggers:
- prom:
- handlers: [prom]
- level: INFO
- propagate: False
- root:
- level: INFO
- handlers: [console]
- scheduler:
- duration: 1
- programs:
- - name: bundle_bpf
- metrics:
- counter:
- - name: cpu_cycle
- documentation: Cycles processed by CPUs
- namespace: bpf
- labelnames:
- - name
- - cpu
- - pid
- - name: cpu_instruction
- documentation: Instructions retired by CPUs
- namespace: bpf
- labelnames:
- - name
- - cpu
- - pid
- - name: llc_reference
- documentation: Last level cache operations by type
- namespace: bpf
- labelnames:
- - name
- - cpu
- - pid
- - name: llc_miss
- documentation: Last level cache operations by type
- namespace: bpf
- labelnames:
- - name
- - cpu
- - pid
- events:
- - type: 0x0 # HARDWARE
- name: 0x0 # PERF_COUNT_HW_CPU_CYCLES
- target: on_cpu_cycle
- table: cpu_cycle
- - type: 0x0 # HARDWARE
- name: 0x1 # PERF_COUNT_HW_INSTRUCTIONS
- target: on_cpu_instruction
- table: cpu_instruction
- - type: 0x0 # HARDWARE
- name: 0x2 # PERF_COUNT_HW_CACHE_REFERENCES
- target: on_cache_reference
- table: llc_reference
- - type: 0x0 # HARDWARE
- name: 0x3 # PERF_COUNT_HW_CACHE_MISSES
- target: on_cache_miss
- table: llc_miss
- code: |
- #include <linux/ptrace.h>
- #include <uapi/linux/bpf_perf_event.h>
-
- const int max_cpus = 256;
-
- struct key_t {
- int cpu;
- int pid;
- char name[TASK_COMM_LEN];
- };
-
- BPF_HASH(llc_miss, struct key_t);
- BPF_HASH(llc_reference, struct key_t);
- BPF_HASH(cpu_instruction, struct key_t);
- BPF_HASH(cpu_cycle, struct key_t);
-
- static inline __attribute__((always_inline)) void get_key(struct key_t* key) {
- key->cpu = bpf_get_smp_processor_id();
- key->pid = bpf_get_current_pid_tgid();
- bpf_get_current_comm(&(key->name), sizeof(key->name));
- }
-
- int on_cpu_cycle(struct bpf_perf_event_data *ctx) {
- struct key_t key = {};
- get_key(&key);
-
- cpu_cycle.increment(key, ctx->sample_period);
- return 0;
- }
- int on_cpu_instruction(struct bpf_perf_event_data *ctx) {
- struct key_t key = {};
- get_key(&key);
-
- cpu_instruction.increment(key, ctx->sample_period);
- return 0;
- }
- int on_cache_reference(struct bpf_perf_event_data *ctx) {
- struct key_t key = {};
- get_key(&key);
-
- llc_reference.increment(key, ctx->sample_period);
- return 0;
- }
- int on_cache_miss(struct bpf_perf_event_data *ctx) {
- struct key_t key = {};
- get_key(&key);
-
- llc_miss.increment(key, ctx->sample_period);
- return 0;
- }
-```
-
-CSIT captured metrics
----------------------
-
-SUT
-~~~
-
-Compute resource
-________________
-
-- BPF /process
- - BPF_HASH(llc_miss, struct key_t);
- - BPF_HASH(llc_reference, struct key_t);
- - BPF_HASH(cpu_instruction, struct key_t);
- - BPF_HASH(cpu_cycle, struct key_t);
-
-Memory resource
-_______________
-
-- BPF /process
- - tbd
-
-Network resource
-________________
-
-- BPF /process
- - tbd
-
-DUT VPP metrics
-~~~~~~~~~~~~~~~
-
-Compute resource
-________________
-
-- runtime /node `show runtime`
- - calls
- - vectors
- - suspends
- - clocks
- - vectors_calls
-- perfmon /bundle
- - inst-and-clock node intel-core instructions/packet, cycles/packet and IPC
- - cache-hierarchy node intel-core cache hits and misses
- - context-switches thread linux per-thread context switches
- - branch-mispred node intel-core Branches, branches taken and mis-predictions
- - page-faults thread linux per-thread page faults
- - load-blocks node intel-core load operations blocked due to various uarch reasons
- - power-licensing node intel-core Thread power licensing
- - memory-bandwidth system intel-uncore memory reads and writes per memory controller channel
-
-Memory resource - tbd
-_____________________
-
-- memory /segment `show memory verbose api-segment stats-segment main-heap`
- - total
- - used
- - free
- - trimmable
- - free-chunks
- - free-fastbin-blks
- - max-total-allocated
-- physmem `show physmem`
- - pages
- - subpage-size
-
-Network resource
-________________
-
-- counters /node `show node counters`
- - count
- - severity
-- hardware /interface `show interface`
- - rx_stats
- - tx_stats
-- packets /interface `show hardware`
- - rx_packets
- - rx_bytes
- - rx_errors
- - tx_packets
- - tx_bytes
- - tx_errors
- - drops
- - punt
- - ip4
- - ip6
- - rx_no_buf
- - rx_miss
-
-
-DUT DPDK metrics - tbd
-~~~~~~~~~~~~~~~~~~~~~~
-
-Compute resource
-________________
-
-- BPF /process
- - BPF_HASH(llc_miss, struct key_t);
- - BPF_HASH(llc_reference, struct key_t);
- - BPF_HASH(cpu_instruction, struct key_t);
- - BPF_HASH(cpu_cycle, struct key_t);
-
-Memory resource
-_______________
-
-- BPF /process
- - tbd
-
-Network resource
-________________
-
-- packets /interface
- - inPackets
- - outPackets
- - inBytes
- - outBytes
- - outErrorPackets
- - dropPackets
+ Legend:
+ - stat_runtime
+ - vpp-runtime
+ - stat_pre_trial
+ - vpp-clear-stats
+ - vpp-enable-packettrace // if extended_debug == True
+ - stat_post_trial
+ - vpp-show-stats
+ - vpp-show-packettrace // if extended_debug == True
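
Note on the example metric file in the first hunk: its sample lines follow the Prometheus text format (`name{label="value",...} value`). A minimal sketch of reading such a line is given below. The helper `parse_sample` and both regular expressions are hypothetical and not part of CSIT, and the sketch assumes label values contain no escaped quotes and that lines carry no trailing timestamp.

```python
import re

# Hypothetical helper, not part of CSIT. Assumes no escaped quotes inside
# label values and no trailing timestamp on the sample line.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)$'                    # sample value
)
LABEL_RE = re.compile(r'(?P<key>[A-Za-z_][A-Za-z0-9_]*)="(?P<val>[^"]*)"')


def parse_sample(line):
    """Return (name, labels, value), or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # "# HELP" / "# TYPE" metadata and empty lines
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    sample = ('calls_total{name="ip4-lookup",state="active",thread_id="2",'
              'thread_lcore="0",thread_name="vpp_wk_1"} 91.0')
    print(parse_sample(sample))
```

For anything beyond a quick check, a maintained Prometheus client library parser is likely a better choice than hand-written regular expressions.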