aboutsummaryrefslogtreecommitdiffstats
path: root/docs/test_specs/csit-nat44ed-cps-spec.md
blob: c39ba9eced8b6a9b30418c59f7f9958333c4f188 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
## Content

<!-- MarkdownTOC autolink="true" -->

- [Tests for NAT44ED](#tests-for-nat44ed)
- [CPS Test Objectives](#cps-test-objectives)
- [Input Parameters](#input-parameters)
- [Stateful traffic profiles](#stateful-traffic-profiles)
- [UDP CPS Tests](#udp-cps-tests)
  - [UDP TRex Measurements](#udp-trex-measurements)
    - [Counters](#counters)
    - [Calculations](#calculations)
    - [CPS-MRR](#cps-mrr)
    - [CPS-PDR](#cps-pdr)
    - [CPS-NDR](#cps-ndr)
  - [UDP VPP Telemetry](#udp-vpp-telemetry)
    - [Counters](#counters-1)
    - [Errors](#errors)
- [TCP/IP CPS Tests](#tcpip-cps-tests)
  - [TCP/IP TRex Measurements](#tcpip-trex-measurements)
    - [Counters](#counters-2)
    - [Calculations](#calculations-1)
    - [CPS Trial PASS](#cps-trial-pass)
    - [CPS-MRR](#cps-mrr-1)
    - [CPS-PDR](#cps-pdr-1)
    - [CPS-NDR](#cps-ndr-1)
  - [TCP/IP VPP Telemetry](#tcpip-vpp-telemetry)
    - [Counters](#counters-3)
    - [Errors](#errors-1)

<!-- /MarkdownTOC -->

## Tests for NAT44ED

Two types of stateful tests are developed for NAT44ED (source network address
and port translation IPv4 to IPv4 with 5-tuple session state):

- Connections-Per-Second (CPS), discovering the maximum rate of creating
  NAT44ED sessions. Measured separately for UDP and TCP connections and
  for different session scale.

- Packets-Per-Second (PPS), discovering the maximum rate of
simultaneously creating NAT44ED sessions and transfering bulk of data
packets across the corresponding connections. Measured separately for
UDP and TCP connections with different session scale and different data
packet sizes per each connection. Current code is using 64B only for UDP
and default MSS 1460B for TCP/IP.

This note describes CPS tests.

## CPS Test Objectives

Discover DUT's highest sustain rate of creating fully functional NAT44ED
5-tuple stateful session entries. Session entry is considered fully
functional, if packets associated with this entry are NAT44ED processed
by DUT and forwarded in both directions without loss.

Similarly to packet throughput tests, three CPS rates are discovered:

- CPS-MRR, verified connection rate at maximal connection attempt rate,
  regardless of an amount of not established connections. (Connections
  per Second - Maximum Receive Rate.)
- CPS-NDR, maximal connection attempt rate at which all connections get
  established. (Connections per Second - Non Drop Rate.)
- CPS-PDR, maximal connection attempt rate at which ratio of not
  established connections to attempted connections is below configured
  threshold. (Connections per Second - Partial Drop Rate.)

## Input Parameters

- `max_cps_rate`, maximum rate of attempting connections, to be used by
  traffic generator, limited by traffic generator capabilities, Ethernet
  link(s) rate and NIC model.
- `min_cps_rate`, minimum rate of establishing connections to be used for
  measurements. Search fails if lower transmit rate needs to be used to
  meet search criteria.
- `target_session_number`, maximum number of sessions to be established and
  tested.
- `target_loss_ratio`, maximum acceptable connections loss ratio search
  criteria for PDR measurements with UDP tests. Indicates packet drop
  impact on connection establishment rate.
- `final_relative_width`, required measurement resolution expressed as
  (lower_bound, upper_bound) interval width relative to upper_bound.
- stateful traffic profiles, TRex ASTF program defining the connection
  per L4 protocol tested (TCP, UDP), including connect and
  close sequence.

## Stateful traffic profiles

TRex ASTF program defines following TCP and UDP transactions for
discovering NAT44ED CPS limits:

- CPS with TCP
  - connect(syn,syn-ack,ack)
    - pkts client tx 2, rx 1
    - pkts server tx 1, rx 2
  - delay (note: optional, currently not implemented)
    - no packets
  - close(fin,fin-ack,ack,ack)
    - pkts client tx 2, rx 2
    - pkts server tx 1, rx 2
- CPS with UDP
  - connect_and_close(req,ack)
    - pkts client tx 1, rx 1
    - pkts server tx 1, rx 1

TRex ASTF program configuration parameters:

- `limit` of connections, set to `target_session_number`.
- `multiplier`, represents `trial_cps_rate`, a number of connections per
  second to be executed per trial. Multiplier applies to connect phases.
  Close phases occur automatically based on arrival of the last packet
  expected per session.
- IPv4 source and destination address and port ranges matching the
  limit of connections.
  - Source and destination addresses changing packet-by-packet with two
    separate profiles i) incrementing sequentially pair-wise
    (implemented) and ii) changed randomly (with seed) pair-wise (not
    implemented yet).
  - Source port changing randomly within the range.
- `trial_duration`, function of `target_session_number` and `multiplier`
  - `multiplier`, subject of the search, value in the range (`min_cps_rate`,`max_cps_rate`)
  - `target_setup_duration` = `target_session_number` / `trial_cps_rate`
  - For UDP:
    - `trial_duration` = `target_setup_duration` + `late_traffic_start_correction`
    - `late_traffic_start_correction` = 0.1115 seconds (hardcoded for now)
  - For TCP:
    - `trial_duration` = 2 * `target_setup_duration` + `late_traffic_start_correction`
    - `late_traffic_start_correction` = 0.1115 seconds (hardcoded for now)

## UDP CPS Tests

### UDP TRex Measurements

#### Counters

Following TRex ASTF counters are collected by UDP CPS tests for automated
results evaluation (r) and debugging purposes (d):

- Interface 1 Client
  - (r) `opackets`, TRex UDP transaction start
  - (r) `ipackets`, TRex UDP transaction finish
- Interface 2 Server
  - (d) `opackets`
  - (d) `ipackets`
- Traffic Client
  - (d) `m_active_flows`
  - (d) `m_est_flows`
  - (d) `m_traffic_duration`, includes TRex ramp-up overhead, and it can
    be quite far from the actual traffic duration
  - (d) `udps_connects`
  - (d) `udps_closed`
  - (d) `udps_sndbyte`
  - (d) `udps_sndpkt`
  - (d) `udps_rcvbyte`
  - (d) `udps_rcvpkt`
  - (d) `udps_keepdrops`, TRex out of capacity, dropping UDP KAs(?)
<!--
Vratko Polak: Yes, although the traffic profile should have set large
enough keepalive value so zero KA packets are actually sent within the
trial. I did not actually check the value is large enough for the worst
case (ndrpdr search hitting min multiplier of 9001).
-->
  - (d) `err_rx_throttled`, TRex out of capacity, throttling workers due
    to Rx overload(?)
<!--
Vratko Polak: I think this is TRex receiving the packet on L2 level, but
then dropping it because L7 buffers are full. Such packet increases
ipackets, but does not increase any L7 counter (even if traffic profile
wants to receive that packet). But this is just me guessing. TRex docs
say "rx thread was throttled due too many packets in NIC rx queue", and
I did no experiments/investigation to confirm my hypothesis fits with
the observed counters.
-->
  - (d) `err_c_nf_throttled`, Number of client side flows that were not
    opened due to flow-table overflow(?)
  - (d) `err_flow_overflow`, too many flows(?)
- Traffic Server
  - (d) `m_active_flows`
  - (d) `m_est_flows`
  - (r) `m_traffic_duration`
  - (d) `udps_accepts`
  - (d) `udps_closed`
  - (d) `udps_sndbyte`
  - (d) `udps_sndpkt`
  - (d) `udps_rcvbyte`
  - (d) `udps_rcvpkt`
  - (d) `err_rx_throttled`, TRex out of capacity, throttling workers due
    to Rx overload(?)

[TRex ASTF counters reference](https://trex-tgn.cisco.com/trex/doc/trex_astf.html#_counters_reference).

TRex counters are polled once TRex confirms traffic is stopped, after it
is explicitly instructed to stop it. Early attempts to use periodic TRex
counter polling affected TRex behaviour and test results, hence counter
polling is consider as invasive.

#### Calculations

- Interface packet loss
  - pktloss_ratio = (c_opackets - c_ipackets) / c_opackets
- UDP session packet loss (currently not used)
- UDP session byte loss (currently not used)
- UDP session integrity (currently not used)

#### CPS-MRR

Reported MRR values are calculated as follows:

CPS-MRR = `c_ipackets` / `s_traffic_duration`, where
`s_traffic_duration` = TRex Traffic Server `m_traffic_duration`.

In order to ensure a determnistic region of TRex ASTF operation, a
separate set of tests is run for each traffic profile, with vpp-ip4base
DUT instead of vpp-nat44ed, to auto-discover the maximum rate TRex ASTF
traffic profile is capable of. Result of this test is used as a side
reference to compare with the results of NAT44ed CPS-MRR tests.

#### CPS-PDR

CPS-PDR values are discovered using MLRsearch, a binary search optimized
for the overall test duration.

CPS-PDR = max(`trial_cps_rate`) found for `pktloss_ratio` <
`target_loss_ratio`, according to MLRsearch criteria for PDR.

Measurements to be reported in the CPS-PDR result test message:

- PDR_LOWER

#### CPS-NDR

CPS-NDR values are also discovered using MLRsearch.

CPS-NDR = max(`trial_cps_rate`) found for `pktloss_ratio` = 0, according
to MLRsearch criteria for PDR.

Measurements to be reported in the CPS-NDR result test message:

- NDR_LOWER

### UDP VPP Telemetry

#### Counters

- VPP show nat44 summary

  ```
  max translations per thread: 81920
  max translations per user: 81920
  total timed out sessions: 0
  total sessions: 64514
  total tcp sessions: 0
  total tcp established sessions: 0
  total tcp transitory sessions: 0
  total tcp transitory (WAIT-CLOSED) sessions: 0
  total tcp transitory (CLOSED) sessions: 0
  total udp sessions: 64514
  total icmp sessions: 0
  ```

- VPP show interface

  ```
  show hardware verbose (10.30.51.54 - /run/vpp/api.sock):
  Name                Idx   Link  Hardware
  avf-0/3b/2/0                       1     up   avf-0/3b/2/0
    Link speed: 25 Gbps
    Ethernet address 3c:fe:bd:f9:00:00
    flags: initialized admin-up vaddr-dma link-up rx-interrupts
    offload features: l2 vlan rx-polling rss-pf
    num-queue-pairs 3 max-vectors 5 max-mtu 0 rss-key-size 52 rss-lut-size 64
    speed
    stats:
      rx bytes             69368896
      rx unicast           135301620
      rx discards          94585780
      tx bytes             2401281120
      tx unicast           40021352
  avf-0/3b/a/0                       2     up   avf-0/3b/a/0
    Link speed: 25 Gbps
    Ethernet address 3c:fe:bd:f9:01:00
    flags: initialized admin-up vaddr-dma link-up rx-interrupts
    offload features: l2 vlan rx-polling rss-pf
    num-queue-pairs 3 max-vectors 5 max-mtu 0 rss-key-size 52 rss-lut-size 64
    speed
    stats:
      rx bytes             40912192
      rx unicast           134856987
      rx discards          94835635
      tx bytes             2442955680
      tx unicast           40715928
  ```

- VPP show runtime

  ```
  Thread 1 vpp_wk_0 (lcore 2)
  Time 21.5, 10 sec internal node vector rate 0.00 loops/sec 6740197.88
    vector rates in 4.2183e3, out 3.7118e3, drop 0.0000e0, punt 0.0000e0
               Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
  avf-0/3b/2/0-output              active                277           34387               0          1.96e1          124.14
  avf-0/3b/2/0-tx                  active                277           34387               0          3.54e1          124.14
  avf-0/3b/a/0-output              active                380           45245               0          1.92e1          119.07
  avf-0/3b/a/0-tx                  active                380           45245               0          3.36e1          119.07
  avf-input                        polling         144384995           90499               0          3.03e5            0.00
  ethernet-input                   active                381           90499               0          1.91e1          237.53
  ip4-input-no-checksum            active                381           90499               0          4.94e1          237.53
  ip4-lookup                       active                521           79632               0          3.76e1          152.84
  ip4-rewrite                      active                521           79632               0          4.19e1          152.84
  ip4-sv-reassembly-feature        active                381           90499               0          3.78e1          237.53
  nat44-ed-in2out                  active                380           45245               0          1.98e2          119.07
  nat44-ed-in2out-slowpath         active                380           45245               0          2.31e3          119.07
  nat44-ed-out2in                  active                277           34387               0          1.89e2          124.14
  nat44-in2out-worker-handoff      active                381           90499               0          9.42e1          237.53
  unix-epoll-input                 polling            140863               0               0          1.61e3            0.00
  ---------------
  Thread 2 vpp_wk_1 (lcore 58)
  Time 21.5, 10 sec internal node vector rate 0.00 loops/sec 6733488.17
    vector rates in 3.3365e3, out 3.5604e3, drop 0.0000e0, punt 0.0000e0
               Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
  avf-0/3b/2/0-output              active                276           31129               0          2.03e1          112.79
  avf-0/3b/2/0-tx                  active                276           31129               0          3.63e1          112.79
  avf-0/3b/a/0-output              active                332           45254               0          1.87e1          136.31
  avf-0/3b/a/0-tx                  active                332           45254               0          3.48e1          136.31
  avf-input                        polling         166439403           71581               0          4.42e5            0.00
  ethernet-input                   active                277           65516               0          1.89e1          236.52
  ip4-input-no-checksum            active                277           65516               0          4.95e1          236.52
  ip4-lookup                       active                455           76383               0          3.75e1          167.87
  ip4-rewrite                      active                455           76383               0          4.20e1          167.87
  ip4-sv-reassembly-feature        active                277           65516               0          3.85e1          236.52
  nat44-ed-in2out                  active                377           45254               0          1.97e2          120.04
  nat44-ed-in2out-slowpath         active                332           45254               0          2.39e3          136.31
  nat44-ed-out2in                  active                276           31129               0          1.83e2          112.79
  nat44-out2in-worker-handoff      active                277           65516               0          2.17e2          236.52
  unix-epoll-input                 polling            140817               0               0          1.60e3            0.00
  ```

#### Errors

- VPP show errors

  ```
  Count                    Node                  Reason
       32258       nat44-in2out-worker-handoff      same worker
       32256       nat44-in2out-worker-handoff      do handoff
       32258             nat44-ed-out2in            good out2in packets processed
       32258             nat44-ed-out2in            UDP packets
       32258        nat44-ed-in2out-slowpath        good in2out packets processed
       32258        nat44-ed-in2out-slowpath        UDP packets
       32256       nat44-out2in-worker-handoff      same worker
       32258       nat44-out2in-worker-handoff      do handoff
       32256             nat44-ed-out2in            good out2in packets processed
       32256             nat44-ed-out2in            UDP packets
       32256        nat44-ed-in2out-slowpath        good in2out packets processed
       32256        nat44-ed-in2out-slowpath        UDP packets
  ```

## TCP/IP CPS Tests

### TCP/IP TRex Measurements

#### Counters

Following TRex ASTF counters are collected by UDP CPS tests for automated
results evaluation (r) and debugging purposes (d):

- Interface 1 Client
  - (d) `opackets`
  - (d) `packets`
- Interface 2 Server
  - (d) `opackets`
  - (d) `packets`
- Traffic Client
  - (d) `m_active_flows`
  - (d) `m_est_flows`
  - (d) `m_traffic_duration`
  - (r) `tcps_connattempt`
  - (d) `tcps_connects`
  - (d) `tcps_closed`
- Traffic Server
  - (d) `m_active_flows`
  - (d) `m_est_flows`
  - (r) `m_traffic_duration`
  - (d) `tcps_accepts`
  - (r) `tcps_connects`
  - (d) `tcps_closed`
  - (d) `err_no_template`, server can’t match L7 template no destination port or IP range

[TRex ASTF counters reference](https://trex-tgn.cisco.com/trex/doc/trex_astf.html#_counters_reference).

TRex counters are polled only once by CSIT after traffic is stopped.

#### Calculations

TODO WIP Note: Currently s_tcp_connects is used for counting successful
sessions. But now I am not sure whether it is correct, as already
c_tcps_connects counts NAT sessions that got established (even though
TCP is not fully connected yet). Not sure how the counters behave when
the third packet is lost and retransmitted.

- Interface packet loss
  - `pktloss_c_s` = `c_opackets` - `s_ipackets`
  - `pktloss_s_c` = `s_opackets` - `c_ipackets`
  - `pktloss_ratio` = (`pktloss_s_c` + `pktloss_c_s`) / (`c_opackets` + `s_opackets`)
- TCP session integrity
  - `tcp_attempted_connection_count` = `c_tcps_connattempt`
  - `tcp_failed_connection_count` = `c_tcps_connects` - `c_tcps_connattempt`

#### CPS Trial PASS

TODO WIP Note: Currently any trial measurement fails only if TRex itself
fails, or if we fail to parse some counter. No criteria mentioned here
is currently planned to be implemented; we rely on bad things leading to
too few (maybe zero) passed transactions.

<!--
PASS of TCP CPS test trial is conditioned on all of the following criteria being met:

- PASS-C1 TRex must attempt all configured `target_session_number` in `target_setup_duration` time
   - IOW TRex must send connect packets at configured `trial_cps_rate`.
- PASS-C2 Following TRex errors ARE NOT recorded in Target-Counters:
  - Traffic Client
    - No errors recorded so far
  - Traffic Server
    - `err_no_template`, server can’t match L7 template no destination port or IP range
-->

#### CPS-MRR

Reported MRR values are equal to the following TRex counters from Target-Counters:
- `c_m_est_flows`
- `s_m_est_flows`

TODO Add description of separate set of tests for discovering a **safe**
CPS-MTR value (Maximum Transmit Rate) for TRex, where TRex errors **are not**
observed in Target-Counters.

#### CPS-PDR

CPS-PDR values are discovered using MLRsearch, a binary search optimized
for the overall test duration.

CPS-PDR = `trial_cps_rate`, if all of the following conditions are met:

- `tcp_failed_connection_count` < `target_loss_ratio`
- `pktloss_ratio` < `target_loss_ratio`

Measurements to be reported in the CPS-PDR result test message:

- `trial_cps_rate`
- `c_m_est_flows`
- `s_m_est_flows`

#### CPS-NDR

CPS-NDR values are discovered using MLRsearch, a binary search optimized
for the overall test duration.

CPS-NDR = `trial_cps_rate`, if all of the following conditions are met:

- `tcp_failed_connection_count` = 0
- `pktloss_ratio` = 0

Measurements to be reported in the CPS-PDR result test message:

- `trial_cps_rate`
- `c_m_est_flows`
- `s_m_est_flows`

### TCP/IP VPP Telemetry

#### Counters

- VPP show nat44 summary

  ```
  <TODO add sample output>
  ```

- VPP show interface

  ```
  <TODO add sample output>
  ```

- VPP show runtime

  ```
  <TODO add sample output>
  ```

#### Errors

- VPP show errors

  ```
  <TODO add sample output>
  ```