docs/ietf/draft-ietf-bmwg-mlrsearch-04.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479

---
title: Multiple Loss Ratio Search
abbrev: MLRsearch
docname: draft-ietf-bmwg-mlrsearch-04
date: 2023-07-10

ipr: trust200902
area: ops
wg: Benchmarking Working Group
kw: Internet-Draft
cat: info

coding: us-ascii
pi:    # can use array (if all yes) or hash here
  toc: yes
  sortrefs:   # defaults to yes
  symrefs: yes

author:
      -
        ins: M. Konstantynowicz
        name: Maciek Konstantynowicz
        org: Cisco Systems
        email: mkonstan@cisco.com
      -
        ins: V. Polak
        name: Vratko Polak
        org: Cisco Systems
        email: vrpolak@cisco.com

normative:
  RFC1242:
  RFC2285:
  RFC2544:
  RFC9004:

informative:
  TST009:
    target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
    title: "TST 009"
  FDio-CSIT-MLRsearch:
    target: https://csit.fd.io/cdocs/methodology/measurements/data_plane_throughput/mlr_search/
    title: "FD.io CSIT Test Methodology - MLRsearch"
    date: 2022-11
  PyPI-MLRsearch:
    target: https://pypi.org/project/MLRsearch/0.4.0/
    title: "MLRsearch 0.4.0, Python Package Index"
    date: 2021-04

--- abstract

This document proposes improvements to [RFC2544] throughput search by
defining a new methodology called Multiple Loss Ratio search
(MLRsearch). The main objectives for MLRsearch are to minimize the
total test duration, search for multiple loss ratios and improve
results repeatibility and comparability.

The main motivation behind MLRsearch is the new set of challenges and
requirements posed by testing Network Function Virtualization
(NFV) systems and other software based network data planes.

MLRsearch offers several ways to address these challenges, giving user
configuration options to select their preferred way.

--- middle

{::comment}
    As we use kramdown to convert from markdown,
    we use this way of marking comments not to be visible in rendered draft.
    https://stackoverflow.com/a/42323390
    If other engine is used, convert to this way:
    https://stackoverflow.com/a/20885980
{:/comment}

# Purpose and Scope

The purpose of this document is to describe Multiple Loss Ratio search
(MLRsearch), a throughput search methodology optimized for software
DUTs.

Applying vanilla [RFC2544] throughput bisection to software DUTs
results in a number of problems:

- Binary search takes too long as most of trials are done far from the
  eventually found throughput.
- The required final trial duration (and pauses between trials) also
  prolong the overall search duration.
- Software DUTs show noisy trial results (noisy neighbor problem),
  leading to big spread of possible discovered throughput values.
- Throughput requires loss of exactly zero packets, but the industry
  frequently allows for small but non-zero losses.
- The definition of throughput is not clear when trial results are
  inconsistent.

MLRsearch aims to address these problems by applying the following set
of enhancements:

- Allow searching for multiple search goals, with differing goal loss ratios.
  - Each trial result can affect any search goal in principle
    (trial reuse).
- Multiple preceding targets for each search goal, earlier ones need
  to spend less time on trials.
  - Earlier targets also aim at lesser precision.
  - Use Forwarding Rate (FR) at maximum offered load
    [RFC2285] (section 3.6.2) to initialize the initial targets.
- Take care when dealing with inconsistent trial results.
  - Loss ratios goals are handled in an order that minimizes the chance
    of interference from later trials to earlier goals.
- Apply several load selection heuristics to save even more time
  by trying hard to avoid unnecessarily narrow bounds.

MLRsearch configuration options are flexible enough to
support both conservative settings (unconditionally compliant with [RFC2544],
but longer search duration and worse repeatability) and aggressive
settings (shorter search duration and better repeatability but not
compliant with [RFC2544]).

No part of [RFC2544] is intended to be obsoleted by this document.

# Terminology

When a subsection is defining a term, the first paragraph
acts as a definition. Other paragraphs are treated as a description,
they provide additional details without being needed to define the term.

Definitions should form a directed acyclic graph of dependencies.
If a section contains subsections, the section definition
may depend on the subsection definitions.
Otherwise, any definition may depend on preceding definitions.
In other words, if the section definition were to come after subsections,
there would be no forward dependencies for people reading just definitions
from start to finish.

Descriptions provide motivations and explanations,
they frequently reference terms defined only later.
Motivations in section descriptions are the reason
why section text comes before subsection text.

## General notions

General notions are the terms defined in this section.

It is useful to define the following notions
before delving into MLRsearch architecture,
as the notions appear in multiple places
with no place being special enough to host definition.

### General and specific quantities

General quantity is a quantity that may appear multiple times
in MLRsearch specification, perhaps each time in a different role.
The quantity when appearing in a single role is called
a specific quantity.

It is useful to define the general quantity,
so definitions of specific quantities may refer to it.
We say a specific quantity is based on a general quantity,
if the specific quantity definition refers to and
relies on the general quantity definition.

It is natural to name specific quantities by adding an adjective
(or a noun) to the name of the general quantity.
But existing RFCs typically explicitly define a term acting
in a specific role, so the RFC name directly refers to a specific
quantity, while the corresponding general quantity
is defined only implicitly.
Therefore this documents defines general quantities explicitly,
even if the same term already appears in an RFC.

In practice, it is required to know which unit of measurement
is used to accompany a numeric value of each quantity.
The choice of a particular unit of measurement is not important
for MLRsearch specification though, so specific units
mentioned in this document are just examples or recommendations,
not requirements.

When reporting, it is REQUIRED to state the units used.

### Composite

A composite is a set of named attributes.
Each attribute is either a specific quantity or a composite.

MLRsearch specification frequently groups multiple specific quantities
into a composite. Description of such a composite brings an insight
to motivations why this or other terms are defined as they are.
Such insight will be harder to communicate
with the specific quantities alone.

Also, it simplifies naming of specific quantities, as they usually can
share a noun or adjective referring to their common composite.
Most of relations between composites and their specific quantities
can be described using plain English.

Perhaps the only exception involves referring to specific quantities
as attributes. For example if there is a composite called 'target',
and one of its specific quantities is 'target width' defined using
a general quantity 'width', we can say 'width is one of target attributes'.

### SUT

As defined in RFC 2285:
The collective set of network devices to which stimulus is offered
as a single entity and response measured.

While RFC 2544 mostly refers to DUT as a single
(network interconnecting) device, section 19 makes it clear
multiple DUTs can be treated as a single system,
so most of RFC 2544 also applies to testing SUT.

MLRsearch specification only refers to SUT (not DUT),
even if it consists of just a single device.

### Trial

A trial is the part of test described in RFC 2544 section 23.

When traffic has been sent and SUT response has been observed,
we say the trial has been performed, or the trial has been measured.
Before that happens, multiple possibilities for upcoming trial
may be under consideration.

### Load

Intended, constant load for a trial, usually in frames per second.

Load is the general quantity implied by Constant Load of RFC 1242,
Data Rate of RFC 2544 and Intended Load of RFC 2285.
All three specify this value applies to one (input or output) interface,
so we can talk about unidirectional load also
when bidirectional or multi-port traffic is applied.

MLRsearch does not rely on this distinction, it works also if
the load values correspond to an aggregate rate
(sum over all SUT tested input or output interface unidirectional loads),
as long as all loads share the same semantics.

Several RFCs define useful quantities based on Offered Load
(instead of Intended Load), but MLRsearch specification
works only with (intended) load. Those useful quantities
still serve as motivations for few specific quantities used in MLRsearch
specification.

MLRsearch assumes most load values are positive.
For some (but not all) specific quantities based on load,
zero may also be a valid value.

### Duration

Intended duration of the traffic for a trial, usually in seconds.

This general quantity does not include any preparation nor waiting
described in section 23 of RFC 2544.
Section 24 of RFC 2544 places additional restrictions on duration,
but those restriction apply only to some of the specific quantities based
on duration.

Duration is always positive in MLRsearch.

### Duration sum

For a specific set of trials, this is the sum of their durations.

Some of specific quantities based on duration sum are derived quantities,
without a specific set of trials to sum their durations.

Duration sum is never negative in MLRsearch.

### Width

General quantity defined for an ordered pair (lower and higher)
of load values, which describes a distance between the two values.

The motivation for the name comes from binary search.
The binary search tries to approximate an unknown value
by repeatedly bisecting an interval of possible values,
until the interval becomes narrow enough.
Width of the interval is a specific quantity
and the termination condition compares that
to another specific quantity acting as the threshold.
The threshold value does not have a specific interval associated,
but corresponds to a 'size' of the compared interval.
As size is a word already used in definition of frame size,
a more natural word describing interval is width.

The MLRsearch specification does use (analogues of) upper bound
and lower bound, but does not actually need to talk about intervals.
Still, the intervals are implicitly there, so width is the natural name.

Actually, there are two popular options for defining width.
Absolute width is based on load, the value is the higher load
minus the lower load.
Relative width is dimensionless, the value is the absolute width
divided by the higher load. As intended loads for trials are positive,
relative width is between 0.0 (including) and 1.0 (excluding).

Relative width as a threshold value may be useful for users
who do not presume what is the typical performance of SUT,
but absolute width may be a more familiar concept.

MLRsearch specification does not prescribe which width has to be used,
but widths MUST be either all absolute or all relative,
and it MUST be clear from report which option was used
(it is implied from the unit of measurement of any width value).

### Loss ratio

The loss ratio is a general quantity, dimensionless floating point value
assumed to be between 0.0 and 1.0, both including.
It is computed as the number of frames forwarded by SUT, divided by
the number of frames that should have been forwarded during the trial.

If the number of frames that should have been forwarded is zero,
the loss ratio is considered to be zero
(but it is better to use high enough loads to prevent this).

Loss ratio is basically the same quantity as Frame Loss Rate of RFC 1242,
just not expressed in percents.

RFC1242 Frame Loss Rate:
Percentage of frames that should have been forwarded
by a network device under steady state (constant)
load that were not forwarded due to lack of
resources.

(RFC2544 restricts Frame Loss Rate to a type of benchmark,
for loads 100% of 'maximum rate', 90% and so on.)

### Exceed ratio

This general quantity is a dimensionless floating point value,
defined using two duration sum quantities.
One duration sum is referred to as the good duration sum,
the other is referred to as the bad duration sum.
The exceed ratio value is computed as the bad duration sum value
divided by the sum of the two sums. If both sums are zero,
the exceed ratio is undefined.

As there are no negative duration sums in MLRsearch,
exceed ratio values are between 0.0 and 1.0 (both including).

## Architecture

MLRsearch architecture consists of three main components:
the manager, the controller and the measurer.

The search algorithm is implemented in the controller,
and it is the main focus of this document.

Most implementation details of the manager and the measurer are
out of scope of this document, except when describing
how do those components interface with the controller.

### Manager

The manager is the component that initializes SUT, traffic generator
(called tester in RFC 2544), the measurer and the controller
with intended configurations. It then handles the execution
to the controller and receives its result.

Managers can range from simple CLI utilities to complex
Continuous Integration systems. From the controller point of view
it is important that no additional configuration (nor warmup)
is needed for SUT and the measurer to perform trials.

The interface between the manager and the controller
is defined in the controller section.

One execution of the controller is called a search.
Some benchmarks may execute multiple searches on the same SUT
(for example when confirming the performance is stable over time),
but in this document only one invocation is concerned
(others may be understood as the part of SUT preparation).

Creation of reports of appropriate format can also be understood
as the responsibility of the manager. This document places requirements
on which information has to be reported.

### Measurer

The measurer is the component which performs one trial
as described in RFC 2544 section 23, when requested by the controller.

From the controller point of view, it is a function that accepts
trial input and returns trial output.

This is the only way the controller can interact with SUT.
In practice, the measurer has to do subtle decisions
when converting the observed SUT behavior into a single
trial loss ratio value. For example how to deal with
out of order frames or duplicate frames.

On software implementation level, the measurer is a callable,
injected by the manager into the controller instance.

The act of performing one trial (act of turning trial input
to trial output) is called a measurement, or trial measurement.
This way we can talk about trials that were measured already
and trials that are merely planned (not measured yet).

#### Trial input

The load and duration to use in an upcoming trial.

This is a composite.

Other quantities needed by the measurer are assumed to be constant
and set up by the manager before search starts (see traffic profile),
so they do not count as trial input attributes.

##### Trial load

Trial load is the intended load for the trial.

This is a specific quantity based on load,
directly corresponding to RFC 2285 intended load.

##### Trial duration

Trial duration is the intended duration for the trial.

This is a specific quantity based on duration, so it specifies
only the traffic part of the trial, not the waiting parts.

#### Traffic profile

Any other configuration values needed by the measurer to perform a trial.

The measurer needs both trial input and traffic profile to perform the trial.
As trial input contains the only values that vary during one the search,
traffic profile remains constant during the search.

Traffic profile when understood as a composite is REQUIRED by RFC 2544
to contain some specific quantities (for example frame size).
Several more specific quantities may be RECOMMENDED.

Depending on SUT configuration (e.g. when testing specific protocols),
additional values need to be included in the traffic profile
and in the test report. (See other IETF documents.)

#### Trial ouput

A composite consisting of trial loss ratio
and trial forwarding rate.

Those are the only two specific quantities (among other quantities
possibly measured in the trial, for example offered load)
that are important for MLRsearch.

##### Trial loss ratio

Trial loss ratio is a specific quantity based on loss ratio.
The value is related to a particular measured trial,
as measured by the measurer.

##### Trial forwarding rate

Trial forwarding rate is a derived quantity.
It is computed as one minus trial loss ratio,
that multiplied by trial load.

Despite the name, the general quantity this specific quantity
corresponds to is load (not rate).
The name is inspired by RFC 2285, which defines Forwarding Rate
specific to one output interface.

As the definition of loss ratio is not neccessarily per-interface
(one of details left for the measurer), using the definition above
(instead of RFC 2285) makes sure trial forwarding rate
is always between zero and the trial load (both including).

#### Trial result

Trial result is a composite consisting of trial input attributes
and trial output attributes.

Those are all specific quantites related to a measured trial MLRsearch needs.

While distinction between trial input and output is important
when defining the interface between the controller and the measurer,
it is easier to talk about trial result
when describing how measured trials influence the controller behavior.

### Controller

The component of MLRsearch architecture that calls the measurer
and returns conditional throughputs to the manager.

This component implements the search algorithm,
the main content of this document.

Contrary to Throughput as defined in RFC 1242,
the definition of conditional throughput is quite sensitive
to the controller input (as provided by the manager),
and its full definition needs several terms
which would otherwise be hidden as internals of the controller
implementation.

The ability of conditional throughput to be less sensitive
to performance variance, and the ability of the controller
to find conditional throughputs for multiple search goals
within one search (and in short overall search time)
are strong enough motivations for the need of increased complexity.

### Controller input

A composite of max load, min load, and a set of search goals.

The search goals (as elements of the set of search goals)
are usually not named and unordered.

It is fine if all search goals of the set have the same value
of a particular attribute. In that case, the common value
may be treated as a global attribute (similarly to max and min load).

The set of search goals MUST NOT be empty.
Two search goals within the set MUST differ in at least one attribute.
The manager MAY avoid both issues by presenting empty report
or de-duplicating the search goals, but it is RECOMMENDED
for the manager to raise an error to its caller,
as the two conditions suggest the test is improperly configured.

#### Max load

Max load is a specific quantity based on load.
No trial load is ever higher than this value.

RFC 2544 section 20 defines maximum frame rate
based on theoretical maximum rate for the frame size on the media.
RFC 2285 section 3.5.3 specifies Maximum offered load (MOL)
which may be lower than maximum frame rate.
There may be other limitations preventing high loads,
for examples resources available to traffic generator.

The manager is expected to provide a value that is not greater
than any known limitation. Alternatively, the measurer
is expected to work at max load, possibly reporting as lost
any frames that were not able to leave Traffic Generator.

From the controller point of view, this is merely a global upper limit
for any trial load candidates.

#### Min load

Min load is a specific quantity based on load.
No trial load is ever lower than this value.

The motivation of this quantity is to prevent trials
with too few frames sent to SUT.

Also, practically if a SUT is able to reach only very small
forwarding rates (min load indirectly serves as a threshold for how small),
it may be considered faulty (or perhaps the test is misconfigured).

#### Search goal

A composite of 7 attributes (see subsections).

If not otherwise specified, 'goal' always refers to a search goal
in this document.

The controller input may contain multiple search goals.
The name Multiple Loss Ratio search was created back when
goal loss ratio was the only attribute allowed to vary between goals.

Each goal will get its conditional throughput discovered
and reported at the end of the search.

The definitions of the 7 attributes are not very informative by themselves.
Their motivation (and naming) becomes more clear
from the impact they have on conditional throughput.

##### Goal loss ratio

A specific quantity based on loss ratio.
A threshold value for trial loss ratios.
MUST be lower than one.

Trial loss ratio values will be compared to this value,
a trial will be considered bad if its loss ratio is higher than this.

For example, RFC 2544 throughput has goal loss ratio of zero,
a trial is bad once a sigle frame is lost.

Loss ratio of one would classify each trial as good (regardless of loss),
which is not useful.

##### Goal initial trial duration

A specific quantity based on duration.
A threshold value for trial durations.
MUST be positive.

MLRsearch is allowed to use trials as short as this when focusing
on this goal.
The conditional throughput may be influenced by shorter trials,
(measured when focusing on other search goals).

{::comment}
    FIXME: Should shorter trials be explicitly ignored?
{:/comment}

##### Goal final trial duration

A specific quantity based on duration.
A threshold value for trial durations.
MUST be no smaller than goal initial trial duration.

MLRsearch is allowed to use trials as long as this when focusing
on this goal. If more data is needed, repeated trials
at the same load and duration are requested by the controller.

##### Goal min duration sum

A specific quantity based on duration sum.
A threshold value for a particular duration sum.

MLRsearch requires at least this amount of (effective) trials
for a particular load to become part of MLRsearch outputs.

It is possible (though maybe not prectical) for goal min duration sum
to be smaller than goal final trial duration.

In practice, the sum of durations actually spent on trial measurement
can be smaller (when trial results are quite one-sided) or even larger
(in presence of shorter-than-final trial duration results at the same load).

If the sum of all (good and bad) long trials is at least this,
and there are no short trials, then the load is guaranteed
to be classified as either an upper or a lower bound.

In some cases, the classification is known sooner,
when the 'missing' trials cannot change the outcome.

When short trials are present, the logic is more complicated.

##### Goal exceed ratio

A specific quantity based on exceed ratio.
A threshold value for particulat sets of trials.

An attribute used for classifying loads into upper and lower bounds.

If the duration sum of all (current duration) trials is at least
min duration sum, and more than this percentage of the duration sum
comes from bad trials, this load is an upper bound.

If there are shorter duration trials, the logic is more complicated.

##### Goal width

A specific quantity based on width.
A threshold value for a particular width.
MUST be positive.

This defines the exit condition for this search goal.

Relevant bounds (of the final target) need to be this close
before conditional throughput can be reported.

##### Preceding targets

A non-negative integer affecting the behavior of the controller.

How many additional non-final targets to add.
Each next preceding target has double width
and min duration sum geometrically closer to initial trial duration.

The usage of preceding targets is an important source
of MLRsearch time savings (compared to simpler search algorithms).

Having this value configurable lets the manager
tweak the overall search duration based on presumed knowledge
of SUT performance stability.

### Controller internals

Terms not directly corresponding to the controller's input nor output,
but needed indirectly as dependencies of the conditional throughput
definition.

Following these definitions specifies virtually all of the controller
(MLRsearch algorithm) logic.

#### Pre-initial trials

Up to three special trials executed at the start of the search.
The first trial load is max load,
subsequent trial load are computed from preceding trial
forwarding rate.

The main loop of the controller logic needs at least one trial result,
and time is saved if the trial results are close to future conditional
throughput values.

The exact way to compute load for second and third trial
(and whether even measure second or third trial)
are not specified here, as the implementation details
have negligible effect on the reported conditional throughput.

{::comment}
    TODO: Still, recommend something like this:
    Loads need to fit several different initial targets at once.
    Duration is the largest among initial trial durations,
    loads are computed from forwarding rate an smallest loss ratio goal.
    Also, the initial target current width is set based on these.
{:/comment}

#### Search target

A composite of 5 specific quantites (see subsections).
Frequently called just target.

Similar to (but distinct from) the search goal.

Each search goal prescribes a final target,
probably with a chain of preceding targets.

More details in the Derived targets section.

##### Target loss ratio

Same as loss ratio of the corresponding goal.

##### Target exceed ratio

Same as exceed ratio of the corresponding goal.

##### Target width

Similar to goal width attribute.
Doubled from goal width for each level of preceding target.

##### Target min duration sum

Similar to goal min duration sum attribute.
Geometrically interpolated between
initial target duration and goal min duration sum.

##### Target trial duration

When MLRsearch focuses on this target, it measures trials
with this duration.
The value is equal to the minimum of goal final trial duration
and target min duration sum.

Also, this value is used to classify trial results
as short (if trial duration is shorter than this) or long.

#### Derived targets

After receiving the set of search goals,
MLRsearch internally derives a set of search targets.

The derived targets can be seen as forming a chain,
from initial target to final target.
The chain is linked by a reference from a target to its preceding
(towarsds initial) target.

The reference may be implemented as 6th attribute od target.

##### Final target

The final target is the target where the most of attribute values
are directly copied from the coresponding search goal.
Final target width is the same as goal width,
final target trial duration is the same as goal final trial duration,
and final target min duration sum is the same
as the goal min duration sum.

The conditional throughput is found when focusing on the final target.
All non-final targets do not directly affect the conditional throughput,
they are there just as an optimization.

##### Preceding target

Each target may have a preceding target.
Goal attribute Preceding targets governs how many targets are created
in addition to the final target corresponding to the search goal.

Any preceding target has double width, meaning one balanced bisection
is needed to reduce preceding target width to the next target width.

Preceding target min duration sum is exponentially smaller,
aiming for prescribed initial target min duration sum.

Preceding target trial duration is either its min duration sum,
or the corresponding goal's final trial duration, whichever is smaller.

As the preceding min duration sum is shorter than the next duration sum,
MLRsearch is able to achieve the preceding target width
sooner (than with the next target min duration sum).

This way an approximation of the conditional throughput is found,
with the next target needing not as much time to improve the approximation
(compared to not starting with the approximation).

##### Initial target

Initial target is a target without any other target preceding it.
Initial target min duration sum is equal to the corresponding goal's
initial trial duration.

As a consequence, initial target trial duration is equal to its min duration sum.

#### Trial classification

Any trial result can be classified according to any target along two axes.

The two classifications are independent.

This classification is important for defining the conditional throughput.

##### Short trial

If the (measured) trial duration is shorter than
the target trial duration, the trial is called long.

##### Long trial

If the (measured) trial duration is not shorter than
the target trial duration, the trial is called long.

##### Bad trial

If the (measured) trial loss ratio is larger than the target loss ratio,
the trial is called bad.

For example, if the target loss ratio is zero,
a trial is bad as soon as one frame was lost.

##### Good trial

If the (measured) trial loss ratio is not larger than the target loss ratio,
the trial is called good.

For example, if the target loss ratio is zero,
a trial is good only when there were no frames lost.

#### Load stat

A composite of 8 quantities (see subsections)
The quantites depend on a target and a load,
and are computed from all trials measured at that load so far.

The MLRsearch output is the conditional througput,
which is a specific quantity based on load.
As MLRsearch may measure multiple trials at the same load,
and those trials may not have the same duration,
we need a way to classify a set of trial results at the same load.

As the logic is not as straightforward as in other parts
of MLRsearch algorithm, it is best defined using the following
derived quantities.

Load stat is the composite for one load and one target.
Set of load stats for one load an all targets is commonly called load stats.

##### Long good duration sum

Sum of durations of all long good trials
(at this load, according to this target).

##### Long bad duration sum

Sum of durations of all long bad trials
(at this load, according to this target).

##### Short good duration sum

Sum of durations of all short good trials
(at this load, according to this target).

##### Short bad duration sum

Sum of durations of all short bad trials
(at this load, according to this target).

##### Effective bad duration sum

One divided by tagret exceed ratio, that plus one.
Short good duration sum divided by that.
Short bad duration sum minus that, or zero if that would be negative.
Long bad duration sum plus that is the effective bad duration sum.

Effective bad duration sum is the long bad duration sum
plus some fraction of short bad duration sum.
The fraction is between zero and one (both possibly including).

If there are no short good trials, effective bad duration sum
becomes the duration sum of all bad trials (long or short).

If an exceed ratio computed from short good duration sum
and short bad duration sum is equal or smaller than the target exceed ratio,
effective bad duration sum is equal to just long bad duration sum.

Basically, short good trials can only lessen the impact
of short bad trials, while short bad trials directly contribute
(unless lessened).

A typical example of why a goal needs higher final trial duration
than initial trial duration is when SUT is expected to have large buffers,
so a trial may be too short to see frame losses due to
a buffer becoming full. So a short good trial does not give strong information.
On the other hand, short bad trial is a strong hint SUT would lose many frames
at that load and long duration.
But if there is a mix of short bad and short good trials,
MLRsearch should not cherry-pick only the short bad ones.

The presented way of computing the effective bad duration sum
aims to be a fair treatment of short good trials.

If the target exceed ratio is zero, the given definition contains
positive infinty as an intermediate value, but still simplifies
to a finite result (long bad duration sum plus short bad duration sum).

##### Missing duration sum

The target min duration sum minus effective bad duration sum
and minus long good duration sum, or zero if that would be negative.

MLRsearch may need up to this duration sum of additional long trials
before classifing the load.

##### Optimistic exceed ratio

The specific quantity based on exceed ratio, where bad duration sum is
the effective bad duration sum, and good duration sum is
the long good duration sum plus the missing duration sum.

This is the value MLRsearch would compare to target exceed ratio
assuming all of the missing duration sum ends up consisting of long good trials.

If there was a bad long trial, optimistic exceed ratio
becomes larger than zero.
Additionally, if the target exceed ratio is zero, optimistic exceed ratio
becomes larger than zero even on one short bad trial.

##### Pessimistic exceed ratio

The specific quantity based on exceed ratio, where bad duration sum is
the effective bad duration sum plus the missing duration sum,
and good duration sum is the long good duration sum.

This is the value MLRsearch would compare to target exceed ratio
assuming all of the missing duration sum ends up consisting of bad good trials.

Note that if the missing duration sum is zero,
optimistic exceed ratio becomes equal to pessimistic exceed ratio.

This is the role target min duration sum has,
it guarantees the two load exceed ratios eventually become the same.
Otherwise, pessimistic exceed ratio
is always bigger than the optimistic exceed ratio.

Depending on trial results, the missing duration sum may not be large enough
to change optimistic (or pessimistic) exceed ratio
to move to the other side compared to target exceed ratio.
In that case, MLRsearch does not need to measure more trials
at this load when focusing on this target.

#### Target bounds

With respect to a target, some loads may be classified as upper or lower bound,
and some of the bounds are treated as relevant.

The subsequent parts of MLRsearch rely only on relevant bounds,
without the need to classify other loads.

##### Upper bound

A load is classified as an upper bound for a target,
if and only if both optimistic exceed ratio
and pessimstic load exceed ratio are larger than the target exceed ratio.

During the search, it is possible there is no upper bound,
for example because every measured load still has too high
missing duration sum.

If the target exceed ratio is zero, and the load has at least one bad trial
(short or long), the load becomes an upper bound.

##### Lower bound

A load is classified as a lower bound for a target,
if and only if both optimistic exceed ratio
and pessimstic load exceed ratio are no larger than the target exceed ratio.

During the search, it is possible there is no lower bound,
for example because every measured load still has too high
missing duration sum.

If the target exceed ratio is zero, all trials at the load of
a lower bound must be good trials (short or long).

Note that so far it is possible for a lower bound to be higher
than an upper bound.

##### Relevant upper bound

For a target, a load is the relevant upper bound,
if and only if it is an upper bound, and all other upper bounds
are larger (as loads).

In some cases, the max load when classified as a lower bound
is also effectively treated as the relevant upper bound.
(In that case both relevant bounds are equal.)

If that happens for a final target at the end of the search,
the controller output may contain max load as the relevant upper bound
(even if the goal exceed ratio was not exceeded),
signalling SUT performs well even at max load.

If the target exceed ratio is zero, the relevant upper bound
is the smallest load where a bad trial (short or long) has been measured.

##### Relevant lower bound

For a target, a load is the relevant lower bound if two conditions hold.
Both optimistic exceed ratio and pessimstic load exceed ratio
are no larger than the target exceed ratio,
and there is no smaller load classified as an upper bound.

This is a second place where MLRsearch is not symmetric
(the first place was effective bad duration sum).

While it is not likely for a MLRsearch to find a smaller upper bound
and a larger load satisfying first condition for the lower bound,
it still may happen and MLRsearch has to deal with it.
The second condition makes sure the relevant lower bound
is smaller than the relevant upper bound.

In some cases, the min load when classified as an upper bound
is also effectively treated as the relevant lower bound.
(In that case both relevant bounds are equal.)

If that happens for a final target at the end of the search,
the controller output may contain min load as the relevant lower bound
even if the exceed ratio was 'overstepped',
signalizing the SUT does not even reach the minimal required performance.

The manager has to make sure this is distingushed in report
from cases where min rate is a legitimate conditional throughput
(e.g. the exceed ratio was not overstepped at the min load).

##### Relevant bounds

The pair of the relevant lower bound and the relevant upper bound.

Useful for determining the width of the relevant bounds.
Any of the bounds may be the effective one (max load or min load).

A goal is achieved (at the end of the search) when the final target's
relevant bounds have width no larger than the goal width.

#### Candidate selector

A stateful object (a finite state machine)
focusing on a single target, used to determine next trial input.

Initialized for a pair of targets:
the current target and its preceding target (if any).

Private state (not shared with other selectors) consists of mode and flags.
Public state (shared with all selectors) is the actual relevant bounds
for both targets (current and precedinig).

After accepting a trial result, each selector can nominate
one candidate (or no candidate) for the next trial measurement.

##### Current target

This is the target this selector tries to achieve.

##### Preceding target

The target (if any) preceding to the current target.

While this selector does not focus on the preceding target,
the relevant bounds for the preceding target are used as hints
when the current bound does not have enough of its relevant bounds.

##### Candidate

The trial input (if any) this selecor nominates.

The trial duration attribute is always the current target trial duration.
The trial load attribute depends on the selector state.

Candidates have defined ordering, to simplify finding the winner.
If load differs, the candidate with lower load is preferred.
If load is the same but duration differs, the candidate
with larger duration is preferred.

##### Selector mode

During its lifetime, selector proceeds through the following modes.
In order, but some modes may be skipped or revisited.

Each mode has its own strategy of determining the candidate load (if any).

###### Waiting

Not enough relevant bounds (even for the preceding target).
In this mode, the selector abstains from nominating a candidate.

This selector leaves this mode when preceding target's selector is done.

###### Halving

Candidate is in the middle of the relevant bounds of the preceding target.

If the relevant bounds are narrow enough already, this mode is skipped.
As the preceding target had double width, just one halving load
needs to be measured.

Selector uses a flag to avoid re-entering this mode
once it finished measuring the halved load.

###### Upgrading

This mode activates when one relevant bound for the current target is present
and there is a matching relevant bound of the preceding target
within the current target width.
Candidate is the load of the matching bound from the preceding target.

At most one bound load is measured, depending on halving outcome.
Private flags are used to avoid upgrading at later times
once selector finished measuring the upgraded load.

###### Extending

Refined already but the other relevant bound for the current target
is still missing.
Nominate new candidate according to external search.
Initial target selectors skip all previous modes.

A private value is used to track the width to be used in next load extension
(increasing geometrically).
For initial target selectors, the starting width may be chosen
based on pre-initial trial results.

If both relevant bounds are present at the current load,
but the lower bound is far away (compared to tracked width),
the candidate from this mode is preferred (as long as the load
is larger than the candidate load of bisecting mode).

###### Bisecting

Both relevant bounds for the current target are available, but they are too far
from each other. Candidate is in the middle.

Contrary to halving, the candidate load does not need to be at the exact middle.
For example if the width of the current relevant bounds
is three times as large as the target width,
it is advantageous to split the interval in 1:2 ratio
(choosing the lower candidate load), as it can save one bisect.

###### Done

Both relevant bounds for the current target are available,
the width is no larger than the target width.
No candidate.

If a selector reaches the done state,
it is still possible later trials invalidate its relevant lower bound
(by proving a lower load is in fact a new uper bound),
making the selector transition into extending or bisecting mode.

##### Active selector

Derived from a common goal, the earliest selector which nominates a candidate
is considered to be the active selector for this goal.
Candidates from other selectors of the same goal are ignored.

It is quite possible selectors focusing on other goals
have already found a lower bound relevant to multiple targets in a chain.
In that case, we want the most-initial of the target selectors
(not already in done mode) to have the nomination.

Otherwise (when in extending mode and missun relevant upper bound)
the closer-to-final selectors would nominate candidates
at lower load but at too high duration sum,
preventing some of the time savings.

##### Winner

If the candidate previously nominated by a selector was the one
that got measured, the candidate is called a winner.

A selector observing its previous candidate was a winer
can use simplified logic when determining the mode,
as it knows no other selectors may have changed the relevant loads unexpectedly.

### Controller output

The output object the controller returns to the manager
is a mapping assigning each search goal its conditional output (if it exists).

The controller MAY include more information (if manager accepts it),
for example load stat at relevant bounds.

There MAY be several ways how to communicate the fact a conditional output
does not exist (e.g. min load is classified as an upper bound).
The manager MUST NOT present min load as a conditional output in that case.

If max load is a lower bound, it leads to a valid conditional output value.

#### Conditional throughput

The conditional throughput is the average of trial forwarding rates
across long good trials measured at the (offered load classified as)
relevant lower bound (for the goal, at the end of the search).
The average is the weighted arithmetic mean, weighted by trial duration.

If the goal exceed ratio is zero, the definition of the relevant bounds
simplifies significantly.
If additionally the goal loss ratio is zero,
and the goal min duration sum is equal to goal final trial duration,
conditional throughput becomes conditionally compliant with RFC 2544 throughput.
If the goal final trial duration is at least 60 seconds,
the conditional througput becomes unconditionally compliant
with RFC 2544 throughput.

# Problems

## Long Test Duration

Emergence of software DUTs, with frequent software updates and a
number of different packet processing modes and configurations, drives
the requirement of continuous test execution and bringing down the test
execution time.

In the context of characterising particular DUT's network performance, this
calls for improving the time efficiency of throughput search.
A vanilla bisection (at 60sec trial duration for unconditional [RFC2544]
compliance) is slow, because most trials spend time quite far from the
eventual throughput.

[RFC2544] does not specify any stopping condition for throughput search,
so users can trade-off between search duration and achieved precision.
But, due to exponential behavior of bisection, small improvement
in search duration needs relatively big sacrifice in the throughput precision.

## DUT within SUT

[RFC2285] defines:
- *DUT* as
  - The network forwarding device to which stimulus is offered and
    response measured [RFC2285] (section 3.1.1).
- *SUT* as
  - The collective set of network devices to which stimulus is offered
    as a single entity and response measured [RFC2285] (section 3.1.2).

[RFC2544] specifies a test setup with an external tester stimulating the
networking system, treating it either as a single DUT, or as a system
of devices, an SUT.

In case of software networking, the SUT consists of a software program
processing packets (device of interest, the DUT),
running on a server hardware and using operating system functions as appropriate,
with server hardware resources shared across all programs
and the operating system.

DUT is effectively "nested" within SUT.

Due to a shared multi-tenant nature of SUT, DUT is subject to
interference (noise) coming from the operating system and any other
software running on the same server. Some sources of noise can be
eliminated (e.g. by pinning DUT program threads to specific CPU cores
and isolating those cores to avoid context switching). But some
noise remains after all such reasonable precautions are applied. This
noise does negatively affect DUT's network performance. We refer to it
as an *SUT noise*.

DUT can also exhibit fluctuating performance itself, e.g. while performing
some "stop the world" internal stateful processing. In many cases this
may be an expected per-design behavior, as it would be observable even
in a hypothetical scenario where all sources of SUT noise are
eliminated. Such behavior affects trial results in a way similar to SUT
noise. We use *noise* as a shorthand covering both *DUT fluctuations* and
genuine SUT noise.

A simple model of SUT performance consists of a baseline *noiseless performance*,
and an additional noise. The baseline is assumed to be constant (enough).
The noise varies in time, sometimes wildly. The noise can sometimes be negligible,
but frequently it lowers the observed SUT performance in a trial.

In this model, SUT does not have a single performance value, it has a spectrum.
One end of the spectrum is the noiseless baseline,
the other end is a *noiseful performance*. In practice, trial results
close to the noiseful end of the spectrum happen only rarely.
The worse performance, the more rarely it is seen in a trial.

Focusing on DUT, the benchmarking effort should aim
at eliminating only the SUT noise from SUT measurement.
But that is not really possible, as there are no realistic enough models
able to distinguish SUT noise from DUT fluctuations.

However, assuming that a well-constructed SUT has the DUT as its
performance bottleneck, the "DUT noiseless performance" can be defined
as the noiseless end of SUT performance spectrum. (At least for
throughput. For other quantities such as latency there will be an
additive difference.) By this definition, DUT noiseless performance
also minimizes the impact of DUT fluctuations.

In this document, we reduce the "DUT within SUT" problem to estimating
the noiseless end of SUT performance spectrum from a limited number of
trial results.

Any improvements to throughput search algorithm, aimed for better
dealing with software networking SUT and DUT setup, should employ
strategies recognizing the presence of SUT noise, and allow discovery of
(proxies for) DUT noiseless performance
at different levels of sensitivity to SUT noise.

## Repeatability and Comparability

[RFC2544] does not suggest to repeat throughput search. And from just one
throughput value, it cannot be determined how repeatable that value is.
In practice, poor repeatability is also the main cause of poor
comparability, e.g. different benchmarking teams can test the same SUT
but get different throughput values.

[RFC2544] throughput requirements (60s trial, no tolerance to single frame loss)
force the search to fluctuate close the noiseful end of SUT performance
spectrum. As that end is affected by rare trials of significantly low
performance, the resulting throughput repeatability is poor.

The repeatability problem is the problem of defining a search procedure
which reports more stable results
(even if they can no longer be called "throughput" in [RFC2544] sense).
According to baseline (noiseless) and noiseful model, better repeatability
will be at the noiseless end of the spectrum.
Therefore, solutions to the "DUT within SUT" problem
will help also with the repeatability problem.

Conversely, any alteration to [RFC2544] throughput search
that improves repeatability should be considered
as less dependent on the SUT noise.

An alternative option is to simply run a search multiple times, and report some
statistics (e.g. average and standard deviation). This can be used
for "important" tests, but it makes the search duration problem even
more pronounced.

## Throughput with Non-Zero Loss

[RFC1242] (section 3.17) defines throughput as:
    The maximum rate at which none of the offered frames
    are dropped by the device.

and then it says:
    Since even the loss of one frame in a
    data stream can cause significant delays while
    waiting for the higher level protocols to time out,
    it is useful to know the actual maximum data
    rate that the device can support.

Contrary to that, many benchmarking teams settle with non-zero
(small) loss ratio as the goal for a "throughput rate".

Motivations are many: modern protocols tolerate frame loss better;
trials nowadays send way more frames within the same duration;
impact of rare noise bursts is smaller as the baseline performance
can compensate somewhat by keeping the loss ratio below the goal;
if SUT noise with "ideal DUT" is known, it can be set as the loss ratio goal.

Regardless of validity of any and all similar motivations,
support for non-zero loss goals makes any search algorithm more user-friendly.
[RFC2544] throughput is not friendly in this regard.

Searching for multiple goal loss ratios also helps to describe the SUT
performance better than a single goal result. Repeated wide gap between
zero and non-zero loss conditional throughputs indicates
the noise has a large impact on the overall SUT performance.

It is easy to modify the vanilla bisection to find a lower bound
for intended load that satisfies a non-zero-loss goal,
but it is not that obvious how to search for multiple goals at once,
hence the support for multiple loss goals remains a problem.

## Inconsistent Trial Results

While performing throughput search by executing a sequence of
measurement trials, there is a risk of encountering inconsistencies
between trial results.

The plain bisection never encounters inconsistent trials.
But [RFC2544] hints about possibility if inconsistent trial results in two places.
The first place is section 24 where full trial durations are required, presumably
because they can be inconsistent with results from shorter trial durations.
The second place is section 26.3 where two successive zero-loss trials
are recommended, presumably because after one zero-loss trial
there can be subsequent inconsistent non-zero-loss trial.

Examples include:

- a trial at the same load (same or different trial duration) results
  in a different packet loss ratio.
- a trial at higher load (same or different trial duration) results
  in a smaller packet loss ratio.

Any robust throughput search algorithm needs to decide how to continue
the search in presence of such inconsistencies.
Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
to imply a unique way of handling such inconsistencies.

Ideally, there will be a definition of a quantity which both generalizes
throughput for non-zero-loss (and other possible repeatibility enhancements),
while being precise enough to force a specific way to resolve trial
inconsistencies.
But until such definition is agreed upon, the correct way to handle
inconsistent trial results remains an open problem.

# How the problems are addressed

Configurable loss ratio in MLRsearch search goals are there
in direct support for non-zero-loss conditional throughput.
In practice the conditional throughput results' stability
increases with higher loss ratio goals.

Multiple trials with noise tolerance enhancement,
as implemented in MLRsearch using non-zero goal exceed ratio value,
also indirectly increases the result stability.
That allows MLRsearch to achieve all the benefits
of Binary Search with Loss Verification,
as recommended in [RFC9004] (section 6.2)
and specified in [TST009] (section 12.3.3).

The main factor improving the overall search time is the introduction
of preceding targets. Less impactful time savings
are achieved by pre-initial trials, halving mode
and smart splitting in bisecting mode.

In several places, MLRsearch is "conservative" when handling
(potentially) inconsistent results. This includes the requirement
for the relevant lower bound to be smaller than any upper bound,
the unequal handling of good and bad short trials,
and preference to lower load when choosing the winner among candidates.

While this does no guarantee good search stability
(goals focusing on higher loads may still invalidate existing bounds
simply by requiring larger min duration sums),
it lowers the change of SUT having an area of poorer performance
below the reported conditional througput loads.
In any case, the definition of conditional throughput
is precise enough to dictate "conservative" handling
of trial inconsistencies.

# IANA Considerations

No requests of IANA.

# Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization of a DUT/SUT using controlled stimuli in a
laboratory environment, with dedicated address space and the constraints
specified in the sections above.

The benchmarking network topology will be an independent test setup and
MUST NOT be connected to devices that may forward the test traffic into
a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT/SUT.

Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT/SUT SHOULD be identical in the lab and in production
networks.

# Acknowledgements

Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
review and numerous useful comments and suggestions.

--- back