Diffstat (limited to 'docs')
-rw-r--r-- docs/ietf/draft-ietf-bmwg-mlrsearch-02.md 1359
-rw-r--r-- docs/ietf/draft-ietf-bmwg-mlrsearch-03.md 501
-rw-r--r-- docs/ietf/process.txt 4
3 files changed, 504 insertions, 1360 deletions
diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md
deleted file mode 100644
index fef146618c..0000000000
--- a/docs/ietf/draft-ietf-bmwg-mlrsearch-02.md
+++ /dev/null
@@ -1,1359 +0,0 @@
----
-title: Multiple Loss Ratio Search for Packet Throughput (MLRsearch)
-abbrev: Multiple Loss Ratio Search
-docname: draft-ietf-bmwg-mlrsearch-02
-date: 2022-03-07
-
-ipr: trust200902
-area: ops
-wg: Benchmarking Working Group
-kw: Internet-Draft
-cat: info
-
-coding: us-ascii
-pi: # can use array (if all yes) or hash here
- toc: yes
- sortrefs: # defaults to yes
- symrefs: yes
-
-author:
- -
- ins: M. Konstantynowicz
- name: Maciek Konstantynowicz
- org: Cisco Systems
- role: editor
- email: mkonstan@cisco.com
- -
- ins: V. Polak
- name: Vratko Polak
- org: Cisco Systems
- email: vrpolak@cisco.com
-
-normative:
- RFC2544:
-
-informative:
- FDio-CSIT-MLRsearch:
- target: https://s3-docs.fd.io/csit/rls2110/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.html#mlrsearch-tests
- title: "FD.io CSIT Test Methodology - MLRsearch"
- date: 2021-11
- PyPI-MLRsearch:
- target: https://pypi.org/project/MLRsearch/0.4.0/
- title: "MLRsearch 0.4.0, Python Package Index"
- date: 2021-04
-
---- abstract
-
-TODO: Update after all sections are ready.
-
-This document proposes changes to [RFC2544], specifically to packet
-throughput search methodology, by defining a new search algorithm
-referred to as Multiple Loss Ratio search (MLRsearch for short). Instead
-of relying on binary search with pre-set starting offered load, it
-proposes a novel approach discovering the starting point in the initial
-phase, and then searching for packet throughput based on defined packet
-loss ratio (PLR) input criteria and defined final trial duration time.
-One of the key design principles behind MLRsearch is minimizing the
-total test duration and searching for multiple packet throughput rates
-(each with a corresponding PLR) concurrently, instead of doing it
-sequentially.
-
-The main motivation behind MLRsearch is the new set of challenges and
-requirements posed by NFV (Network Function Virtualization),
-specifically software based implementations of NFV data planes. In the
-experience of the authors, using [RFC2544] often yields end results that
-are neither repeatable nor replicable, due to a large number of factors
-that are out of scope for this draft. MLRsearch aims to address this challenge
-in a simple way of getting the same result sooner, so more repetitions
-can be done to describe the replicability.
-
---- middle
-
-{::comment}
- As we use kramdown to convert from markdown,
- we use this way of marking comments not to be visible in rendered draft.
- https://stackoverflow.com/a/42323390
- If other engine is used, convert to this way:
- https://stackoverflow.com/a/20885980
-{:/comment}
-
-# Terminology
-
-TODO: Update after most other sections are updated.
-
-{::comment}
- The following is probably not needed (or defined elsewhere).
-
- * Frame size: size of an Ethernet Layer-2 frame on the wire, including
- any VLAN tags (dot1q, dot1ad) and Ethernet FCS, but excluding Ethernet
- preamble and inter-frame gap. Measured in bytes (octets).
- * Packet size: same as frame size, both terms used interchangeably.
- * Device Under Test (DUT): In software networking, "device" denotes a
- specific piece of software tasked with packet processing. Such device
- is surrounded with other software components (such as operating system
- kernel). It is not possible to run devices without also running the
- other components, and hardware resources are shared between both. For
- purposes of testing, the whole set of hardware and software components
- is called "system under test" (SUT). As SUT is the part of the whole
- test setup performance of which can be measured by [RFC2544] methods,
- this document uses SUT instead of [RFC2544] DUT. Device under test
- (DUT) can be re-introduced when analysing test results using whitebox
- techniques, but this document sticks to blackbox testing.
- * System Under Test (SUT): System under test (SUT) is a part of the
- whole test setup whose performance is to be benchmarked. The complete
- test setup contains other parts, whose performance is either already
- established, or not affecting the benchmarking result.
- * Bi-directional throughput tests: involve packets/frames flowing in
- both transmit and receive directions over every tested interface of
- SUT/DUT. Packet flow metrics are measured per direction, and can be
- reported as aggregate for both directions and/or separately
- for each measured direction. In most cases bi-directional tests
- use the same (symmetric) load in both directions.
- * Uni-directional throughput tests: involve packets/frames flowing in
- only one direction, i.e. either transmit or receive direction, over
- every tested interface of SUT/DUT. Packet flow metrics are measured
- and are reported for measured direction.
- * Packet Throughput Rate: maximum packet offered load DUT/SUT forwards
- within the specified Packet Loss Ratio (PLR). In many cases the rate
- depends on the frame size processed by DUT/SUT. Hence packet
- throughput rate MUST be quoted with specific frame size as received by
- DUT/SUT during the measurement. For bi-directional tests, packet
- throughput rate should be reported as aggregate for both directions.
- Measured in packets-per-second (pps) or frames-per-second (fps),
- equivalent metrics.
- * Bandwidth Throughput Rate: a secondary metric calculated from packet
- throughput rate using formula: bw_rate = pkt_rate * (frame_size +
- L1_overhead) * 8, where L1_overhead for Ethernet includes preamble (8
- octets) and inter-frame gap (12 octets). For bi-directional tests,
- bandwidth throughput rate should be reported as aggregate for both
- directions. Expressed in bits-per-second (bps).
- * TODO do we need this as it is identical to RFC2544 Throughput?
- Non Drop Rate (NDR): maximum packet/bandwidth throughput rate sustained
- by DUT/SUT at PLR equal zero (zero packet loss) specific to tested
- frame size(s). MUST be quoted with specific packet size as received by
- DUT/SUT during the measurement. Packet NDR measured in
- packets-per-second (or fps), bandwidth NDR expressed in
- bits-per-second (bps).
- * TODO if needed, reformulate to make it clear there can be multiple rates
- for multiple (non-zero) loss ratios.
- : Partial Drop Rate (PDR): maximum packet/bandwidth throughput rate
- sustained by DUT/SUT at PLR greater than zero (non-zero packet loss)
- specific to tested frame size(s). MUST be quoted with specific packet
- size as received by DUT/SUT during the measurement. Packet PDR
- measured in packets-per-second (or fps), bandwidth PDR expressed in
- bits-per-second (bps).
- * TODO: Refer to FRMOL instead.
- Maximum Receive Rate (MRR): packet/bandwidth rate regardless of PLR
- sustained by DUT/SUT under specified Maximum Transmit Rate (MTR)
- packet load offered by traffic generator. MUST be quoted with both
- specific packet size and MTR as received by DUT/SUT during the
- measurement. Packet MRR measured in packets-per-second (or fps),
- bandwidth MRR expressed in bits-per-second (bps).
- * TODO just keep using "trial measurement"?
- Trial: a single measurement step. See [RFC2544] section 23.
- * TODO already defined in RFC2544:
- Trial duration: amount of time over which packets are transmitted
- in a single measurement step.
-{:/comment}
-{::comment}
-{:/comment}
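The bandwidth throughput rate formula quoted in the commented-out terminology above (`bw_rate = pkt_rate * (frame_size + L1_overhead) * 8`) can be checked with a short worked example. This is only an illustrative sketch: the function name and the sample packet rate are assumptions, not part of the draft.

```python
# Ethernet L1 overhead as stated in the draft text: preamble (8 octets)
# plus inter-frame gap (12 octets).
ETHERNET_L1_OVERHEAD_OCTETS = 8 + 12


def bandwidth_rate_bps(pkt_rate_pps, frame_size_octets,
                       l1_overhead_octets=ETHERNET_L1_OVERHEAD_OCTETS):
    """Convert a packet rate (pps) into a bandwidth rate (bps).

    Hypothetical helper; implements bw_rate = pkt_rate * (frame_size +
    L1_overhead) * 8 from the terminology list.
    """
    return pkt_rate_pps * (frame_size_octets + l1_overhead_octets) * 8


# Illustrative input: 14,880,952 pps of 64-octet frames.
print(bandwidth_rate_bps(14_880_952, 64))
```

With 64-octet frames, each frame occupies 84 octets on the wire, so the sample rate corresponds to just under 10 Gbps.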
-
-* TODO: The current text uses Throughput for the zero loss ratio load.
- Is the capital T needed/useful?
-* DUT and SUT: see the definitions in https://gerrit.fd.io/r/c/csit/+/35545
-* Traffic Generator (TG) and Traffic Analyzer (TA): see
- https://datatracker.ietf.org/doc/html/rfc6894#section-4
- TODO: Maybe there is an earlier RFC?
-* Overall search time: the time it takes to find all required loads within
- their precision goals, starting from zero trials measured at given
- DUT configuration and traffic profile.
-* TODO: traffic profile?
-* Intended load: https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.1
-* Offered load: https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.2
-* Maximum offered load (MOL): see
- https://datatracker.ietf.org/doc/html/rfc2285#section-3.5.3
-* Forwarding rate at maximum offered load (FRMOL)
- https://datatracker.ietf.org/doc/html/rfc2285#section-3.6.2
-* Trial Loss Count: the number of frames transmitted
- minus the number of frames received. Negative count is possible,
- e.g. when SUT duplicates some frames.
-* Trial Loss Ratio: ratio of frames lost (the trial loss count) relative
- to frames transmitted over the trial duration.
- For bi-directional throughput tests, the aggregate ratio is calculated,
- based on the aggregate number of frames transmitted and received.
- If the trial loss count is negative, its absolute value MUST be used
- to keep compliance with RFC2544.
-* Safe load: any value, such that trial measurement at this (or lower)
- intended load is correctly handled by both TG and TA, regardless of SUT behavior.
- Frequently, it is not known what the safe load is.
-* Max load (TODO rename?): Maximal intended load to be used during search.
- Benchmarking team decides which value is low enough
- to guarantee values reported by TG and TA are reliable.
- It has to be a safe load, but it can be lower than a safe load estimate
- for added safety.
- See the subsection on unreliable test equipment below.
- This value MUST NOT be higher than MOL, which itself MUST NOT
- be higher than Maximum Frame Rate
- https://datatracker.ietf.org/doc/html/rfc2544#section-20
-* Min load: Minimal intended load to be used during search.
- Benchmarking team decides which value is high enough
- to guarantee the trial measurement results are valid.
- E.g. considerable overall search time can be saved by declaring SUT
- faulty if min load trial shows too high loss rate.
- Zero frames per second is a valid min load value.
-* Effective loss ratio: a corrected value of trial loss ratio
- chosen to avoid difficulties if SUT exhibits decreasing loss ratio
- with increasing load. It is the maximum of trial loss ratios
- measured at the same duration on all loads smaller than (and including)
- the current one.
-* Target loss ratio: a loss ratio value acting as an input for the search.
- The search is finding tight enough lower and upper bounds in intended load,
- so that the measurement at the lower bound has smaller or equal
- trial loss ratio, and upper bound has strictly larger trial loss ratio.
- For the tightest upper bound, the effective loss ratio is the same as
- trial loss ratio at that upper bound load.
- For the tightest lower bound, the effective loss ratio can be higher
- than the trial loss ratio at that lower bound, but still not larger
- than the target loss ratio.
-* TODO: Search algorithm.
-* TODO: Precision goal.
-* TODO: Define a "benchmarking group".
-* TODO: Upper and lower bound.
-* TODO: Valid and invalid bound?
-* TODO: Interval and interval width?
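The definitions of trial loss count, trial loss ratio and effective loss ratio above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MLRsearch library API: all function names are hypothetical, and the trial results are assumed to be available as a mapping from intended load to trial loss ratio, all measured at the same trial duration.

```python
def trial_loss_count(frames_sent, frames_received):
    """Frames transmitted minus frames received (may be negative)."""
    return frames_sent - frames_received


def trial_loss_ratio(frames_sent, frames_received):
    """Loss count relative to frames transmitted.

    A negative loss count is replaced by its absolute value,
    as the definition above requires.
    """
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return abs(trial_loss_count(frames_sent, frames_received)) / frames_sent


def effective_loss_ratios(ratio_by_load):
    """Map each load to the maximum trial loss ratio seen at that load
    or at any smaller load (same trial duration assumed)."""
    effective = {}
    running_max = 0.0
    for load in sorted(ratio_by_load):
        running_max = max(running_max, ratio_by_load[load])
        effective[load] = running_max
    return effective
```

In this sketch, a load whose own trial loss ratio is lower than that of a smaller load inherits the larger value, which is the correction the effective loss ratio definition describes for SUTs with decreasing loss ratio at increasing load.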
-
-TODO: Mention NIC/PCI bandwidth/pps limits can be lower than bandwidth of medium.
-
-# Intentions of this document
-
-{::comment}
- Instead of talking about DUTs being non-deterministic
- and vendors "gaming" in order to get better Throughput results,
- Maciek and Vratko currently prefer to talk about result repeatability.
-{:/comment}
-
-The intention of this document is to provide recommendations for:
-* optimizing search for multiple target loss ratios at once,
-* speeding up the overall search time,
-* improving repeatability and comparability of search results.
-
-No part of RFC2544 is intended to be obsoleted by this document.
-
-{::comment}
- This document may contain examples which contradict RFC2544 requirements
- and suggestions.
- That is not an encouragement for benchmarking groups
- to stop being compliant with RFC2544.
-{:/comment}
-
-# RFC2544
-
-## Throughput search
-
-It is useful to restate the key requirements of RFC2544
-using the new terminology (see section Terminology).
-
-The following sections of RFC2544 are of interest for this document.
-
-* https://datatracker.ietf.org/doc/html/rfc2544#section-20
- Mentions the max load SHOULD NOT be larger than the theoretical
- maximum rate for the frame size on the media.
-
-* https://datatracker.ietf.org/doc/html/rfc2544#section-23
- Lists the actions to be done for each trial measurement,
- it also mentions loss rate as an example of trial measurement results.
- This document uses loss count instead, as that is the quantity
- that is easier for the current test equipment to measure,
- e.g. it is not affected by the real traffic duration.
- TODO: Time uncertainty again.
-
-* https://datatracker.ietf.org/doc/html/rfc2544#section-24
- Mentions "full length trials" leading to the Throughput found,
- as opposed to shorter trial durations, allowed in an attempt
- to "minimize the length of search procedure".
- This document talks about "final trial duration" and aims to
- "optimize overall search time".
-
-* https://datatracker.ietf.org/doc/html/rfc2544#section-26.1
- with https://www.rfc-editor.org/errata/eid422
- finally states requirements for the search procedure.
- It boils down to "increase intended load upon zero trial loss
- and decrease intended load upon non-zero trial loss".
-
-No additional constraints are placed on the load selection,
-and there is no mention of an exit condition, e.g. when there are enough
-trial measurements to proclaim the largest load with zero trial loss
-(and final trial duration) to be the Throughput found.
-
-{::comment}
- The following section is probably not useful enough.
-
- ## Generalized search
-
- Note that the Throughput search can be restated as a "conditional
- load search" with a specific condition.
-
- "increase intended load upon trial result satisfying the condition
- and decrease intended load upon trial result not satisfying the condition"
- where the Throughput condition is "trial loss count is zero".
-
- This works for any condition that can be evaluated from a single
- trial measurement result, and is likely to be true at low loads
- and false at high loads.
-
- MLRsearch can incorporate multiple different conditions,
- as long as there is total logical ordering between them
- (e.g. if a condition for a target loss ratio is not satisfied,
- it is also not satisfied for any other condition which uses
- larger target loss ratio).
-
- TODO: How to call a "load associated with this particular condition"?
-{:/comment}
-
-{::comment}
-
- TODO: Not sure if this subsection is needed and where.
-
- ## Simple bisection
-
- There is one obvious and simple search algorithm which conforms
- to throughput search requirements: simple bisection.
-
- Input: target precision, in frames per second.
-
- Procedure:
-
- 1. Choose min load to be zero.
- 1. No need to measure, loss count has to be zero.
- 2. Use the zero load as the current lower bound.
- 2. Choose max load to be the max value allowed by bandwidth of the medium.
- 1. Perform a trial measurement (at the full length duration) at max load.
- 2. If there is zero trial loss count, return max load as Throughput.
- 3. Use max load as the current upper bound.
- 3. Repeat until the difference between lower bound and upper bound is
- smaller or equal to the precision goal.
- 1. If it is not larger, return the current lower bound as Throughput.
- 2. Else: Choose new load as the arithmetic average of lower and upper bound.
- 3. Perform a trial measurement (at the full length duration) at this load.
- 4. If the trial loss rate is zero, consider the load as new lower bound.
- 5. Else consider the load as the new upper bound.
- 6. Jump back to the repeat at 3.
-
- Another possible stop condition is the overall search time so far,
- but that is not really a different condition, as the time for search to reach
- the precision goal is just a function of precision goal, trial duration
- and the difference between max and min load.
-
- While this algorithm can be adapted to search for multiple
- target loss ratios "at the same time" (see somewhere below),
- it is still missing multiple improvements which give MLRsearch
- considerably better overall search time in practice.
-
-{:/comment}
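The simple bisection procedure described in the commented-out subsection above can be sketched as follows. This is a hedged illustration, not the MLRsearch implementation: `measure_loss_count` is a hypothetical callback that performs one full-length trial at the given intended load and returns the trial loss count, and loads and the precision goal are assumed to be in frames per second.

```python
def simple_bisection(measure_loss_count, max_load, precision):
    """Return the Throughput found by plain bisection.

    measure_loss_count: hypothetical callable(load) -> trial loss count.
    max_load: upper limit of the search, e.g. limited by the medium.
    precision: largest acceptable final interval width.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    # Zero load is the initial lower bound; no measurement is needed,
    # as the loss count at zero load has to be zero.
    lower = 0.0
    # Measure at max load; zero loss there ends the search immediately.
    if measure_loss_count(max_load) == 0:
        return max_load
    upper = max_load
    # Halve the interval until it is within the precision goal.
    while upper - lower > precision:
        load = (lower + upper) / 2.0
        if measure_loss_count(load) == 0:
            lower = load  # lossless trial: new lower bound
        else:
            upper = load  # lossy trial: new upper bound
    return lower
```

The number of trials in this sketch depends only on the ratio of the initial interval width to the precision goal, which matches the remark above that overall search time is not really an independent stop condition.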
-
-# Problems
-
-## Repeatability and Comparability
-
-RFC2544 does not suggest repeating the Throughput search,
-{::comment}probably because the full set of tests already takes long{:/comment}
-and from just one Throughput value, it cannot be determined
-how repeatable that value is (how likely it is for a repeated Throughput search
-to end up with a value less than the precision goal away from the first value).
-
-Depending on SUT behavior, different benchmark groups
-can report significantly different Throughput values,
-even when using identical SUT and test equipment,
-just because of minor differences in their search algorithm
-(e.g. different max load value).
-
-While repeatability can be addressed by repeating the search several times,
-the differences in the comparability scenario may be systematic,
-e.g. seeming like a bias in one or both benchmark groups.
-
-MLRsearch algorithm does not really help with the repeatability problem.
-This document RECOMMENDS to repeat a selection of "important" tests
-ten times, so users can ascertain the repeatability of the results.
-
-TODO: How to report? Average and standard deviation?
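If average and standard deviation were chosen as the reporting format (the draft leaves this as an open question), the summary of repeated searches could be computed as in the sketch below. The function name is hypothetical, and the sample standard deviation is an assumption; the draft does not say which estimator to use.

```python
import statistics


def summarize_repeated_results(throughputs_pps):
    """Return (mean, sample standard deviation) of repeated results.

    throughputs_pps: Throughput values from repeated searches, e.g. the
    ten repetitions recommended above. At least two values are needed
    for a sample standard deviation.
    """
    if len(throughputs_pps) < 2:
        raise ValueError("at least two repeated results are needed")
    return statistics.mean(throughputs_pps), statistics.stdev(throughputs_pps)
```

Comparing the standard deviation against the precision goal gives a rough indication of whether the repeated results agree within that goal.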
-
-Following MLRsearch algorithm leaves less freedom for the benchmark groups
-to encounter the comparability problem,
-although more research is needed to determine the effect
-of MLRsearch's tweakable parameters.
-
-{::comment}
- Possibly, the old DUTs were quite sharply consistent in their performance,
- and/or precision goals were quite large in order to save overall search time.
-
- With software DUTs and with time-efficient search algorithms,
- nowadays the repeatability of Throughput can be quite low,
- as in standard deviation of repeated Throughput results
- is considerably higher than the precision goal.
-{:/comment}
-
-{::comment}
- TODO: Unify with PLRsearch draft.
- TODO: No-loss region, random region, lossy region.
- TODO: Tweaks with respect to non-zero loss ratio goal.
- TODO: Duration dependence?
-
- Both RFC2544 and MLRsearch return Throughput somewhere inside the random region,
- or at most the precision goal below it.
-{:/comment}
-
-{::comment}
- TODO: Make sure this is covered elsewhere, then delete.
-
- ## Search repeatability
-
- The goal of RFC1242 and RFC2544 is to limit how vendors benchmark their DUTs,
- in order to force them to report values that have higher chance
- to be confirmed by independent benchmarking groups following the same RFCs.
-
- This works well for deterministic DUTs.
-
- But for non-deterministic DUTs, the RFC2544 Throughput value
- is only guaranteed to fall somewhere below the lossy region (TODO define).
- It is possible to arrive at a value positioned likely high in the random region
- at the cost of increased overall search duration,
- simply by lowering the load by very small amounts (instead of exact halving)
- upon lossy trial and increasing by large amounts upon lossless trial.
-
- Prescribing an exact search algorithm (bisection or MLRsearch or other)
- will force vendors to report less "gamey" Throughput values.
-{:/comment}
-
-{::comment}
- ## Extensions
-
- The following two sections are probably out of scope,
- as they do not affect MLRsearch design choices.
-
- ### Direct and inverse measurements
-
- TODO expand: Direct measurement is single trial measurement,
- with prescribed inputs and outputs turned directly into the quantity of interest.
- Examples:
- Latency https://datatracker.ietf.org/doc/html/rfc2544#section-26.2
- is a single direct measurement.
- Frame loss rate https://datatracker.ietf.org/doc/html/rfc2544#section-26.3
- is a sequence of direct measurements.
-
- TODO expand: Indirect measurement aims to solve an "inverse function problem",
- meaning (a part of) trial measurement output is prescribed, and the quantity
- of interest is (derived from) the input parameters of trial measurement
- that achieves the prescribed output.
- In general this is a hard problem, but if the unknown input parameter
- is just a one-dimensional quantity, algorithms such as bisection
- do converge regardless of outputs seen.
- We call any such algorithm examining a one-dimensional input a "search".
- Of course, some exit condition is needed for the search to end.
- In case of Throughput, the bisection algorithm tracks both upper bound
- and lower bound, with the lower bound at the end of the search being the quantity
- satisfying the definition of Throughput.
-
- ### Metrics other than frames
-
- TODO expand: Small TCP transaction can succeed even if some frames are lost.
-
- TODO expand: It is possible for loss ratio to use different metric than load.
- E.g. pps loss ratio when traffic profile uses higher level transactions per second.
-
- ### TODO: Stateful DUT
-
- ### TODO: Stateful traffic
-{:/comment}
-
-## Non-Zero Target Loss Ratios
-
-https://datatracker.ietf.org/doc/html/rfc1242#section-3.17
-defines Throughput as:
- The maximum rate at which none of the offered frames
- are dropped by the device.
-
-and then it says:
- Since even the loss of one frame in a
- data stream can cause significant delays while
- waiting for the higher level protocols to time out,
- it is useful to know the actual maximum data
- rate that the device can support.
-
-{::comment}
-
- While this may still be true for some protocols,
- research has been performed...
-
- TODO: Add this link properly: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Y.1541-201112-I!!PDF-E&type=items
- TODO: List values from that document, from 10^-3 to 4*10^-6.
-
- ...on other protocols and use cases,
- resulting in some small but non-zero loss ratios being considered
- as acceptable. Unfortunately, the acceptable value depends on use case
- and properties such as TCP window size and round trip time,
- so no single value of target loss rate (other than zero)
- is considered to be universally applicable.
-
-{:/comment}
-
-New "software DUTs" (traffic forwarding programs running on
-commercial-off-the-shelf compute server hardware) frequently exhibit quite
-low repeatability of Throughput results per above definition.
-
-This is due to, in general, throughput rates of software DUTs (programs)
-being sensitive to server resource allocation by OS during runtime,
-as well as any interrupts or blocking of software threads involved
-in packet processing.
-
-To deal with this, this document recommends discovery of multiple throughput rates of interest for software DUTs that run on general purpose COTS servers (with x86, AArch64 Instruction Set Architectures):
-* throughput rate with target of zero packet loss ratio.
-* at least one throughput rate with target of non-zero packet loss ratio.
-
-
-In our experience, the higher the target loss ratio is,
-the better is the repeatability of the corresponding load found.
-
-TODO: Define a good name for a load corresponding to a specific non-zero
-target loss ratio, while keeping Throughput for the load corresponding
-to zero target loss ratio.
-
-This document RECOMMENDS the benchmark groups to search for corresponding loads
-to at least one non-zero target loss ratio.
-This document does not suggest any particular non-zero target loss ratio value
-to search the corresponding load for.
-
-{::comment}
- What is worse, some benchmark groups (which groups?; citation needed)
- started reporting loads that achieved only "approximate zero loss",
- while still calling that a Throughput (and thus becoming non-compliant
- with RFC2544).
-{:/comment}
-
-# Solution ideas
-
-This document gives several independent ideas on how to lower the (average)
-overall search time, while remaining unconditionally compliant with RFC2544
-(and adding some extensions).
-
-This document also specifies one particular way to combine all the ideas
-into a single search algorithm class (single logic with few tweakable parameters).
-
-Little to no research has been done into the question of which combination
-of ideas achieves the best compromise with respect to overall search time,
-high repeatability and high comparability.
-
-TODO: How important is it to discuss particular implementation choices,
-especially when motivated by non-deterministic SUT behavior?
-
-## Short duration trials
-
-https://datatracker.ietf.org/doc/html/rfc2544#section-24
-already mentions the possibility of using shorter duration
-for trials that are not part of "final determination".
-
-Obviously, the upper and lower bound from a smaller duration trial
-can be used as the initial upper and lower bound for the final determination.
-
-MLRsearch makes it clear a re-measurement is always needed
-(new trial measurement with the same load but longer duration).
-It also specifies what to do if the longer trial is no longer a valid bound
-(TODO define?), e.g. start an external search.
-Additionally, one halving can be saved during the shorter duration search.
-
-## FRMOL as reasonable start
-
-TODO expand: Overall search ends with "final determination" search,
-preceded by "shorter duration search" preceded by "bound initialization",
-where the bounds can be considerably different from min and max load.
-
-For SUTs with high repeatability, the FRMOL is usually a good approximation
-of Throughput. But for less repeatable SUTs, forwarding rate (TODO define)
-is frequently a bad approximation to Throughput, therefore halving
-and other robust-to-worst-case approaches have to be used.
-Still, forwarding rate at FRMOL load can be a good initial bound.
-
-## Non-zero loss ratios
-
-See the "Non-Zero Target Loss Ratios" section above.
-
-TODO: Define "trial measurement result classification criteria",
-or keep reusing long phrases without definitions?
-
-A search for a load corresponding to a non-zero target loss rate
-is very similar to a search for Throughput,
-just the criterion when to increase or decrease the intended load
-for the next trial measurement uses the comparison of trial loss ratio
-to the target loss ratio (instead of comparing loss count to zero).
-Any search algorithm that works for Throughput can be easily used also for
-non-zero target loss rates, perhaps with small modifications
-in places where the measured forwarding rate is used.
-
-Note that it is possible to search for multiple loss ratio goals if needed.
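The changed criterion described above can be sketched as a single comparison. The function and parameter names are assumptions for illustration only. With a target loss ratio of zero the sketch reduces to the Throughput criterion of comparing the loss count to zero.

```python
def next_load_direction(frames_sent, loss_count, target_loss_ratio):
    """Decide whether the next intended load should go up or down.

    Returns "increase" when the trial loss ratio does not exceed the
    target loss ratio, and "decrease" otherwise.
    """
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    # A negative loss count is taken in absolute value, following the
    # trial loss ratio definition in the Terminology section.
    loss_ratio = abs(loss_count) / frames_sent
    return "increase" if loss_ratio <= target_loss_ratio else "decrease"
```

The comparison is non-strict on purpose: the Terminology section treats a trial whose loss ratio equals the target as a valid lower bound.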
-
-## Concurrent ratio search
-
-A single trial measurement result can act as an upper bound for a lower
-target loss ratio, and as a lower bound for a higher target loss ratio
-at the same time. This is an example of how
-it can be advantageous to search for all loss ratio goals "at once",
-or at least "reuse" trial measurement results done so far.
-
-Even when a search algorithm is fully deterministic in load selection
-while focusing on a single loss ratio and trial duration,
-the choice of iteration order between target loss ratios and trial durations
-can affect the obtained results in subtle ways.
-MLRsearch offers one particular ordering.
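The reuse of one trial measurement result across several target loss ratios can be illustrated with a small classification helper. This is a sketch under assumptions, not the ordering MLRsearch actually implements; the function name and the string labels are hypothetical.

```python
def classify_trial(loss_ratio, target_loss_ratios):
    """Classify one trial result against every target loss ratio.

    The measured load is a lower bound for each target the trial
    satisfies (loss ratio not above the target) and an upper bound
    for each target it violates.
    """
    return {
        target: "lower bound" if loss_ratio <= target else "upper bound"
        for target in target_loss_ratios
    }


# One trial with loss ratio 0.002, checked against targets 0 and 0.005.
print(classify_trial(0.002, [0.0, 0.005]))
```

In this example the same trial narrows the interval for both targets at once, from above for the zero target and from below for the 0.005 target.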
-
-{::comment}
- It is not clear if the current ordering is "best",
- it is not even clear how to measure how good an ordering is.
- We would need several models for bad SUT behaviors,
- bug-free implementations of different orderings,
- simulator to show the distribution of rates found,
- distribution of overall durations,
- and a criterion of which rate distribution is "bad"
- and whether it is worth the time saved.
-{:/comment}
-{::comment}
-{:/comment}
-
-## Load selection heuristics and shortcuts
-
-Aside from the two heuristics already mentioned (FRMOL based initial bounds
-and saving one halving when increasing trial duration),
-there are other tricks that can save some overall search time
-at the cost of keeping the difference between final lower and upper bound
-intentionally large (but still within the precision goal).
-
-TODO: Refer to implementation subsections on:
-* Uneven splits.
-* Rounding the interval width up.
-* Using old invalid bounds for interval width guessing.
-
-The impact on overall duration is probably small,
-and the effect on result distribution may be even smaller.
-TODO: Is the two-liner above useful at all?
-
-# Non-compliance with RFC2544
-
-It is possible to achieve even faster search times by abandoning
-some requirements and suggestions of RFC2544,
-mainly by reducing the wait times at start and end of trial.
-
-Such results are therefore no longer compliant with RFC2544
-(or at least not unconditionally),
-but they may still be useful for internal usage, or for comparing
-results of different DUTs achieved with an identical non-compliant algorithm.
-
-TODO: Refer to the subsection with CSIT customizations.
-
-# Additional Requirements
-
-RFC2544 can be understood as having a number of implicit requirements.
-They are made explicit in this section
-(as requirements for this document, not for RFC2544).
-
-Recommendations on how to properly address the implicit requirements
-are out of scope of this document.
-
-{::comment}
-
- Although some (insufficient) ideas are proposed.
-
-{:/comment}
-
-## TODO: Search Stop Criteria
-
-TODO: Mention the timeout parameter?
-
-{::comment}
-
- TODO: highlight importance of results consistency
- for SUT performance trending and anomaly detection.
-
-{:/comment}
-
-## Reliability of Test Equipment
-
-Both TG and TA MUST be able to handle correctly
-every intended load used during the search.
-
-On TG side, the difference between Intended Load and Offered Load
-MUST be small.
-
-TODO: How small? Difference of one packet may not be measurable
-due to time uncertainties.
-
-{::comment}
-
- Maciek: 1 packet out of 10M, that's 10**-7 accuracy.
-
- Vratko: For example, TRex uses several "worker" threads, each doing its own
- rounding on how many packets to send, separately per each traffic stream.
- For high loads and durations, the observed number of frames transmitted
- can differ from the expected (fractional) value by tens of frames.
-
-{:/comment}
-
-TODO expand: time uncertainty.
-
-To ensure that, max load (see Terminology) has to be set to a low enough value.
-Benchmark groups MAY list the max load value used,
-especially if the Throughput value is equal (or close) to the max load.
-
-{::comment}
-
- The following is probably out of scope of this document,
- but can be useful when put into a separate document.
-
- TODO expand: If it results in smaller Throughput reported,
- it is not a big issue. Treat similarly to bandwidth and PPS limits of NICs.
-
- TODO expand: TA dropping packets when loaded only lowers Throughput,
- so not an issue.
-
- TODO expand: TG sending fewer packets but stopping at target duration
- is also fine, as long as the forwarding rate is used as Throughput value,
- not the higher intended load.
-
- TODO expand: Duration stretching is not fine.
- Neither "check for actual duration" nor "start+sleep+stop"
- are reliable solutions due to time overheads and uncertainty
- of TG starting/stopping traffic (and TA stopping counting packets).
-
-{:/comment}
-
-Solutions (even problem formulations) for the following open problems
-are outside of the scope of this document:
-* Detecting when the test equipment operates above its safe load.
-* Finding a large but safe load value.
-* Correcting any result affected by max load value not being a safe load.
-
-{::comment}
-
- TODO: Mention 90% of self-test as an idea:
- https://datatracker.ietf.org/doc/html/rfc8219#section-9.2.1
-
- This is pointing to DNS testing, nothing to do with throughput,
- so how is it relevant here?
-
-{:/comment}
-
-{::comment}
-
- Part of discussion on BMWG mailing list (with small edits):
-
- This is a hard issue.
- The algorithm as described has no way of knowing
- which part of the whole system is limiting the performance.
-
- It could be SUT only (no problem, testing SUT as expected),
- it could be TG only (can be mitigated by TG self-test
- and using small enough loads).
-
- But it could also be an interaction between DUT and TG.
- Imagine a TG (the Traffic Analyzer part) which is only able
- to handle incoming traffic up to some rate,
- but passes the self-test as the Generator part has maximal rate
- not larger than that. But what if SUT turns that steady rate
- into long-enough bursts of a higher rate (with delays between bursts
- large enough, so average forwarding rate matches the load).
- This way TA will see some packets as missing (when its buffers
- fill up), even though SUT has processed them correctly.
-
-{:/comment}
-
-### Very late frames
-
-{::comment}
-
- In CSIT we are aggressive at skipping all wait times around trial,
- but few of DUTs have large enough buffers.
- Or there is another reason why we are seeing negative loss counts.
-
-{:/comment}
-
-
-RFC2544 requires quite conservative time delays
-(see https://datatracker.ietf.org/doc/html/rfc2544#section-23)
-to prevent frames buffered in one trial measurement
-from being counted as received in a subsequent trial measurement.
-
-However, for some SUTs it may still be possible to buffer enough frames,
-so they are still sending them (perhaps in bursts)
-when the next trial measurement starts.
-Sometimes, this can be detected as a negative trial loss count, e.g. TA receiving
-more frames than TG has sent during this trial measurement. Frame duplication
-is another way of causing the negative trial loss count.
-
-https://datatracker.ietf.org/doc/html/rfc2544#section-10
-recommends using sequence numbers in frame payloads,
-but generating and verifying them requires test equipment resources,
-which may not be sufficient at high loads.
-(Using a low enough max load would work, but frequently that would be
-smaller than the SUT's actual Throughput.)
-
-RFC2544 does not offer any solution to the negative loss problem,
-except implicitly treating negative trial loss counts
-the same way as positive trial loss counts.
-
-This document also does not offer any practical solution.
-
-Instead, this document SUGGESTS the search algorithm to take any precaution
-necessary to avoid very late frames.
-
-This document also REQUIRES any detected duplicate frames to be counted
-as additional lost frames.
-This document also REQUIRES any negative trial loss ratio
-to be treated as a positive trial loss ratio of the same absolute value.
-
-{::comment}
-
- !!! Make sure this is covered elsewhere, at least in better comments. !!!
-
- ## TODO: Bad behavior of SUT
-
- (Highest load with always zero loss can be quite far from lowest load
- with always nonzero loss.)
- (Non-determinism: warm up, periodic "stalls", perf decrease over time, ...)
-
- Big buffers:
- http://www.hit.bme.hu/~lencse/publications/ECC-2017-B-M-DNS64-revised.pdf
- See page 8 and search for the word "gaming".
-
-{:/comment}
-
-!!! Nothing below is up-to-date with draft v02. !!!
-
-# MLRsearch Background
-
-TODO: Old section, probably obsoleted by preceding section(s).
-
-Multiple Loss Ratio search (MLRsearch) is a packet throughput search
-algorithm suitable for deterministic systems (as opposed to
-probabilistic systems). MLRsearch discovers multiple packet throughput
-rates in a single search, with each rate associated with a distinct
-Packet Loss Ratio (PLR) criterion.
-
-For cases when multiple rates need to be found, this property makes
-MLRsearch more efficient in terms of execution time, compared to
-traditional throughput search algorithms that discover a single packet
-rate per defined search criteria (e.g. a binary search specified by
-[RFC2544]). MLRsearch reduces execution time even further by relying on
-shorter trial durations of intermediate steps, with only the final
-measurements conducted at the specified final trial duration. This
-results in a shorter overall search execution time when compared to a
-traditional binary search, while guaranteeing the same results for
-deterministic systems.
-
-In practice, two rates with distinct PLRs are commonly used for packet
-throughput measurements of NFV systems: Non Drop Rate (NDR) with PLR=0
-and Partial Drop Rate (PDR) with PLR>0. The rest of this document
-describes MLRsearch with NDR and PDR pair as an example.
-
-Similarly to other throughput search approaches like binary search,
-MLRsearch is effective for SUTs/DUTs with PLR curve that is
-non-decreasing with growing offered load. It may not be as
-effective for SUTs/DUTs with abnormal PLR curves, although
-it will always converge to some value.
-
-MLRsearch relies on the traffic generator to qualify the received packet
-stream as error-free, and to invalidate the results if any disqualifying
-errors are present, e.g. out-of-sequence frames.
-
-MLRsearch can be applied to both uni-directional and bi-directional
-throughput tests.
-
-For bi-directional tests, MLRsearch rates and ratios are aggregates of
-both directions, based on the following assumptions:
-
-* Traffic transmitted by traffic generator and received by SUT/DUT
- has the same packet rate in each direction,
- in other words the offered load is symmetric.
-* SUT/DUT packet processing capacity is the same in both directions,
- resulting in the same packet loss under load.
-
-MLRsearch can be applied even without those assumptions,
-but in that case the aggregate loss ratio is less useful as a metric.
-
-MLRsearch can be used for network transactions consisting of more than
-just one packet, or anything else that has intended load as input
-and loss ratio as output (duration as input is optional).
-This text uses mostly packet-centric language.
-
-# MLRsearch Overview
-
-The main properties of MLRsearch:
-
-* MLRsearch is a duration aware multi-phase multi-rate search algorithm:
- * Initial Phase determines promising starting interval for the search.
- * Intermediate Phases progress towards defined final search criteria.
- * Final Phase executes measurements according to the final search
- criteria.
- * Final search criteria are defined by following inputs:
- * Target PLRs (e.g. 0.0 and 0.005 when searching for NDR and PDR).
- * Final trial duration.
- * Measurement resolution.
-* Initial Phase:
- * Measure MRR over initial trial duration.
- * Measured MRR is used as an input to the first intermediate phase.
-* Multiple Intermediate Phases:
- * Trial duration:
- * Start with initial trial duration in the first intermediate phase.
- * Converge geometrically towards the final trial duration.
- * Track all previous trial measurement results:
- * Duration, offered load and loss ratio are tracked.
- * Effective loss ratios are tracked.
- * While in practice, real loss ratios can decrease with increasing load,
- effective loss ratios never decrease. This is achieved by sorting
- results by load, and using the effective loss ratio of the previous load
- if the current loss ratio is smaller than that.
- * The algorithm queries the results to find best lower and upper bounds.
- * Effective loss ratios are always used.
- * The phase ends if all target loss ratios have tight enough bounds.
- * Search:
- * Iterate over target loss ratios in increasing order.
- * If both upper and lower bound are in measurement results for this duration,
- apply bisect until the bounds are tight enough,
- and continue with next loss ratio.
- * If a bound is missing for this duration, but there exists a bound
- from the previous duration (compatible with the other bound
- at this duration), re-measure at the current duration.
- * If a bound in one direction (upper or lower) is missing for this duration,
- and the previous duration does not have a compatible bound,
- compute the current "interval size" from the second tightest bound
- in the other direction (lower or upper respectively)
- for the current duration, and choose next offered load for external search.
- * The logic guarantees that a measurement is never repeated with both
- duration and offered load being the same.
- * The logic guarantees that measurements for higher target loss ratio
- iterations (still within the same phase duration) do not affect validity
- and tightness of bounds for previous target loss ratio iterations
- (at the same duration).
- * Use of internal and external searches:
- * External search:
- * It is a variant of "exponential search".
- * The "interval size" is multiplied by a configurable constant
- (powers of two work well with the subsequent internal search).
- * Internal search:
- * A variant of binary search that measures at offered load between
- the previously found bounds.
- * The interval does not need to be split into exact halves,
- if another split can get to the target width goal faster.
- * The idea is to avoid returning interval narrower than the current
- width goal. See sample implementation details, below.
-* Final Phase:
- * Executed with the final test trial duration, and the final width
- goal that determines resolution of the overall search.
-* Intermediate Phases together with the Final Phase are called
- Non-Initial Phases.
-* The returned bounds stay within prescribed min_rate and max_rate.
- * When returning min_rate or max_rate, the returned bounds may be invalid.
- * E.g. upper bound at max_rate may come from a measurement
- with loss ratio still not higher than the target loss ratio.
-
-The main benefits of MLRsearch vs. binary search include:
-
-* In general, MLRsearch is likely to execute more trials overall, but
- likely fewer trials at a set final trial duration.
-* In well behaving cases, e.g. when results do not depend on trial
- duration, it greatly reduces (>50%) the overall duration compared to a
- single PDR (or NDR) binary search over duration, while finding
- multiple drop rates.
-* In all cases MLRsearch yields the same or similar results to binary
- search.
-* Note: both binary search and MLRsearch are susceptible to reporting
- non-repeatable results across multiple runs for very badly behaving
- cases.
-
-Caveats:
-
-* Worst case MLRsearch can take longer than a binary search, e.g. in case of
- drastic changes in behaviour for trials at varying durations.
- * Re-measurement at higher duration can trigger a long external search.
- That never happens in binary search, which uses the final duration
- from the start.
-
-# Sample Implementation
-
-Following is a brief description of a sample MLRsearch implementation,
-which is a simplified version of the existing implementation.
-
-## Input Parameters
-
-1. **max_rate** - Maximum Transmit Rate (MTR) of packets to
- be used by external traffic generator implementing MLRsearch,
- limited by the actual Ethernet link(s) rate, NIC model or traffic
- generator capabilities.
-2. **min_rate** - minimum packet transmit rate to be used for
- measurements. MLRsearch fails if lower transmit rate needs to be
- used to meet search criteria.
-3. **final_trial_duration** - required trial duration for final rate
- measurements.
-4. **initial_trial_duration** - trial duration for initial MLRsearch phase.
-5. **final_relative_width** - required measurement resolution expressed as
- (lower_bound, upper_bound) interval width relative to upper_bound.
-6. **packet_loss_ratios** - list of maximum acceptable PLR search criteria.
-7. **number_of_intermediate_phases** - number of phases between the initial
- phase and the final phase. Impacts the overall MLRsearch duration.
- Fewer phases are required for well behaving cases, more phases
- may be needed to reduce the overall search duration for worse behaving cases.
-
-## Initial Phase
-
-1. First trial measures at configured maximum transmit rate (MTR) and
- discovers maximum receive rate (MRR).
- * IN: trial_duration = initial_trial_duration.
- * IN: offered_transmit_rate = maximum_transmit_rate.
- * DO: single trial.
- * OUT: measured loss ratio.
- * OUT: MRR = measured receive rate.
- Received rate is computed as intended load multiplied by pass ratio
- (which is one minus loss ratio). This is useful when loss ratio is computed
- from a different metric than intended load. For example, intended load
- can be in transactions (multiple packets each), but loss ratio is computed
- on level of packets, not transactions.
-
- * Example: If MTR is 10 transactions per second, and each transaction has
- 10 packets, and receive rate is 90 packets per second, then loss ratio
- is 10%, and MRR is computed to be 9 transactions per second.
-
- If MRR is too close to MTR, MRR is set below MTR so that interval width
- is equal to the width goal of the first intermediate phase.
- If MRR is less than min_rate, min_rate is used.
-2. Second trial measures at MRR and discovers MRR2.
- * IN: trial_duration = initial_trial_duration.
- * IN: offered_transmit_rate = MRR.
- * DO: single trial.
- * OUT: measured loss ratio.
- * OUT: MRR2 = measured receive rate.
- If MRR2 is less than min_rate, min_rate is used.
- If loss ratio is less than or equal to the smallest target loss ratio,
- MRR2 is set to a value above MRR, so that interval width is equal
- to the width goal of the first intermediate phase.
- MRR2 could end up being equal to MTR (for example if both measurements so far
- had zero loss), which was already measured; step 3 is skipped in that case.
-3. Third trial measures at MRR2.
- * IN: trial_duration = initial_trial_duration.
- * IN: offered_transmit_rate = MRR2.
- * DO: single trial.
- * OUT: measured loss ratio.
- * OUT: MRR3 = measured receive rate.
- If MRR3 is less than min_rate, min_rate is used.
- If step 3 is not skipped, the first trial measurement is forgotten.
- This is done because in practice (if MRR2 is above MRR), external search
- from MRR and MRR2 is likely to lead to a faster intermediate phase
- than a bisect between MRR2 and MTR.
-
-## Non-Initial Phases
-
-1. Main phase loop:
- 1. IN: trial_duration for the current phase. Set to
- initial_trial_duration for the first intermediate phase; to
- final_trial_duration for the final phase; or to the element of
- interpolating geometric sequence for other intermediate phases.
- For example with two intermediate phases, trial_duration of the
- second intermediate phase is the geometric average of
- initial_trial_duration and final_trial_duration.
- 2. IN: relative_width_goal for the current phase. Set to
- final_relative_width for the final phase; doubled for each
- preceding phase. For example with two intermediate phases, the
- first intermediate phase uses quadruple of final_relative_width
- and the second intermediate phase uses double of
- final_relative_width.
- 3. IN: Measurement results from the previous phase (previous duration).
- 4. Internal target ratio loop:
- 1. IN: Target loss ratio for this iteration of ratio loop.
- 2. IN: Measurement results from all previous ratio loop iterations
- of current phase (current duration).
- 3. DO: According to the procedure described in point 2:
- 1. either exit the phase (by jumping to 1.5),
- 2. or exit loop iteration (by continuing with next target loss ratio,
- jumping to 1.4.1),
- 3. or calculate new transmit rate to measure with.
- 4. DO: Perform the trial measurement at the new transmit rate and
- current trial duration, compute its loss ratio.
- 5. DO: Add the result and go to next iteration (1.4.1),
- including the added trial result in 1.4.2.
- 5. OUT: Measurement results from this phase.
- 6. OUT: In the final phase, bounds for each target loss ratio
- are extracted and returned.
- 1. If a valid bound does not exist, use min_rate or max_rate.
-2. New transmit rate (or exit) calculation (for point 1.4.3):
- 1. If the previous duration has the best upper and lower bound,
- select the middle point as the new transmit rate.
- 1. See 2.5.3. below for the exact splitting logic.
- 2. This can be a no-op if interval is narrow enough already,
- in that case continue with 2.2.
- 3. Discussion, assuming the middle point is selected and measured:
- 1. Regardless of loss rate measured, the result becomes
- either best upper or best lower bound at current duration.
- 2. So this condition is satisfied at most once per iteration.
- 3. This also explains why previous phase has double width goal:
- 1. We avoid one more bisection at previous phase.
- 2. At most one bound (per iteration) is re-measured
- with current duration.
- 3. Each re-measurement can trigger an external search.
- 4. Such surprising external searches are the main hurdle
- in achieving low overall search durations.
- 5. Even without 1.1, there is at most one external search
- per phase and target loss ratio.
- 6. But without 1.1 there can be two re-measurements,
- each coming with a risk of triggering external search.
- 2. If the previous duration has one bound best, select its transmit rate.
- In deterministic case this is the last measurement needed this iteration.
- 3. If only upper bound exists in current duration results:
- 1. This can only happen for the smallest target loss ratio.
- 2. If the upper bound was measured at min_rate,
- exit the whole phase early (not investigating other target loss ratios).
- 3. Select new transmit rate using external search:
- 1. For computing previous interval size, use:
- 1. second tightest bound at current duration,
- 2. or tightest bound of previous duration,
- if compatible and giving a more narrow interval,
- 3. or target interval width if none of the above is available.
- 4. In any case increase to target interval width if smaller.
- 2. Quadruple the interval width.
- 3. Use min_rate if the new transmit rate is lower.
- 4. If only lower bound exists in current duration results:
- 1. If the lower bound was measured at max_rate,
- exit this iteration (continue with next lowest target loss ratio).
- 2. Select new transmit rate using external search:
- 1. For computing previous interval size, use:
- 1. second tightest bound at current duration,
- 2. or tightest bound of previous duration,
- if compatible and giving a more narrow interval,
- 3. or target interval width if none of the above is available.
- 4. In any case increase to target interval width if smaller.
- 2. Quadruple the interval width.
- 3. Use max_rate if the new transmit rate is higher.
- 5. The only remaining option is both bounds in current duration results.
- 1. This can happen in two ways, depending on how the lower bound
- was chosen.
- 1. It could have been selected for the current loss ratio,
- e.g. in re-measurement (2.2) or in initial bisect (2.1).
- 2. It could have been found as an upper bound for the previous smaller
- target loss ratio, in which case it might be too low.
- 3. The algorithm does not track which one is the case,
- as the decision logic works well regardless.
- 2. Compute "extending down" candidate transmit rate exactly as in 2.3.
- 3. Compute "bisecting" candidate transmit rate:
- 1. Compute the current interval width from the two bounds.
- 2. Express the width as a (float) multiple of the target width goal
- for this phase.
- 3. If the multiple is not higher than one, it means the width goal
- is met. Exit this iteration and continue with next higher
- target loss ratio.
- 4. If the multiple is two or less, use half of that
- for new width of the lower subinterval.
- 5. Otherwise, round the multiple up to nearest even integer.
- 6. Use half of that for new width of the lower subinterval.
- 7. Example: If lower bound is 2.0 and upper bound is 5.0, and width
- goal is 1.0, the new candidate transmit rate will be 4.0.
- This can save a measurement when 4.0 has small loss.
- Selecting the average (3.5) would never save a measurement,
- giving more narrow bounds instead.
- 4. If either candidate computation wants to exit the iteration,
- do as bisecting candidate computation says.
- 5. The remaining case is both candidates wanting to measure at some rate.
- Use the higher rate. This prefers external search down narrow enough
- interval, competing with perfectly sized lower bisect subinterval.
-
-# FD.io CSIT Implementation
-
-The only known working implementation of MLRsearch is in
-the open-source code running in Linux Foundation
-FD.io CSIT project [FDio-CSIT-MLRsearch] as part of
-a Continuous Integration / Continuous Development (CI/CD) framework.
-
-MLRsearch is also available as a Python package in [PyPI-MLRsearch].
-
-## Additional details
-
-This document so far has been describing a simplified version of
-MLRsearch algorithm. The full algorithm as implemented in CSIT contains
-additional logic, which makes some of the details (but not general
-ideas) above incorrect. Here is a short description of the additional
-logic as a list of principles, explaining their main differences from
-(or additions to) the simplified description, but without detailing
-their mutual interaction.
-
-1. Logarithmic transmit rate.
- * In order to better fit the relative width goal, the interval
- doubling and halving is done differently.
- * For example, the middle of 2 and 8 is 4, not 5.
-2. Timeout for bad cases.
- * The worst case for MLRsearch is when each phase converges to
- intervals way different than the results of the previous phase.
- * Rather than suffer total search time several times larger than pure
- binary search, the implemented tests fail themselves when the
- search takes too long (given by argument *timeout*).
-3. Intended count.
- * The number of packets to send during the trial should be equal to
- the intended load multiplied by the duration.
- * Also multiplied by a coefficient, if loss ratio is calculated
- from a different metric.
- * Example: If a successful transaction uses 10 packets,
- load is given in transactions per second, but loss ratio is calculated
- from packets, so the coefficient to get intended count of packets
- is 10.
- * But in practice that does not work.
- * It could result in a fractional number of packets,
- * so it has to be rounded in a way traffic generator chooses,
- * which may depend on the number of traffic flows
- and traffic generator worker threads.
-4. Attempted count. As the real number of intended packets is not known exactly,
- the computation uses the number of packets traffic generator reports as sent.
- Unless overridden by the next point.
-5. Duration stretching.
- * In some cases, traffic generator may get overloaded,
- causing it to take significantly longer (than duration) to send all packets.
- * The implementation uses an explicit stop,
- * causing lower attempted count in those cases.
- * The implementation tolerates some small difference between
- attempted count and intended count.
- * 10 microseconds worth of traffic is sufficient for our tests.
- * If the difference is higher, the unsent packets are counted as lost.
- * This forces the search to avoid the regions of high duration stretching.
- * The final bounds describe the performance of not just SUT,
- but of the whole system, including the traffic generator.
-6. Excess packets.
- * In some tests (e.g. using TCP flows) the traffic generator reacts to packet loss
- by retransmission. Usually, such packet loss is already affecting loss ratio.
- If a test also wants to treat retransmissions due to heavily delayed packets
- as a failure, this is once again visible as a mismatch between
- the intended count and the attempted count.
- * The CSIT implementation simply looks at absolute value of the difference,
- so it offers the same small tolerance before it starts marking a "loss".
-7. For result processing, we use lower bounds and ignore upper bounds.
-
-### FD.io CSIT Input Parameters
-
-1. **max_rate** - Typical values: 2 * 14.88 Mpps for 64B
- 10GE link rate, 2 * 18.75 Mpps for 64B 40GE NIC (specific model).
-2. **min_rate** - Value: 2 * 9001 pps (we reserve 9000 pps
- for latency measurements).
-3. **final_trial_duration** - Value: 30.0 seconds.
-4. **initial_trial_duration** - Value: 1.0 second.
-5. **final_relative_width** - Value: 0.005 (0.5%).
-6. **packet_loss_ratios** - Value: 0.0, 0.005 (0.0% for NDR, 0.5% for PDR).
-7. **number_of_intermediate_phases** - Value: 2.
- The value has been chosen based on limited experimentation to date.
- More experimentation is needed to arrive at clearer guidelines.
-8. **timeout** - Limit for the overall search duration (for one search).
- If MLRsearch oversteps this limit, it immediately declares the test failed,
- to avoid wasting even more time on a misbehaving SUT.
- Value: 600.0 (seconds).
-9. **expansion_coefficient** - Width multiplier for external search.
- Value: 4.0 (interval width is quadrupled).
- Value of 2.0 is best for well-behaved SUTs, but value of 4.0 has been found
- to decrease overall search time for worse-behaved SUT configurations,
- contributing more to the overall set of different SUT configurations tested.
-
-
-## Example MLRsearch Run
-
-
-The following list describes a search from a real test run in CSIT
-(using the default input values as above).
-
-* Initial phase, trial duration 1.0 second.
-
-Measurement 1, intended load 18750000.0 pps (MTR),
-measured loss ratio 0.7089514628479618 (valid upper bound for both NDR and PDR).
-
-Measurement 2, intended load 5457160.071600716 pps (MRR),
-measured loss ratio 0.018650817320118702 (new tightest upper bounds).
-
-Measurement 3, intended load 5348832.933500009 pps (slightly less than MRR2
-in preparation for first intermediate phase target interval width),
-measured loss ratio 0.00964383362905351 (new tightest upper bounds).
-
-* First intermediate phase starts, trial duration still 1.0 seconds.
-
-Measurement 4, intended load 4936605.579021453 pps (no lower bound,
-performing external search downwards, for NDR),
-measured loss ratio 0.0 (valid lower bound for both NDR and PDR).
-
-Measurement 5, intended load 5138587.208637197 pps (bisecting for NDR),
-measured loss ratio 0.0 (new tightest lower bounds).
-
-Measurement 6, intended load 5242656.244044665 pps (bisecting),
-measured loss ratio 0.013523745379347257 (new tightest upper bounds).
-
-* Both intervals are narrow enough.
-* Second intermediate phase starts, trial duration 5.477225575051661 seconds.
-
-Measurement 7, intended load 5190360.904111567 pps (initial bisect for NDR),
-measured loss ratio 0.0023533920869969953 (NDR upper bound, PDR lower bound).
-
-Measurement 8, intended load 5138587.208637197 pps (re-measuring NDR lower bound),
-measured loss ratio 1.2080222912800403e-06 (new tightest NDR upper bound).
-
-* The two intervals have separate bounds from now on.
-
-Measurement 9, intended load 4936605.381062318 pps (external NDR search down),
-measured loss ratio 0.0 (new valid NDR lower bound).
-
-Measurement 10, intended load 5036583.888432355 pps (NDR bisect),
-measured loss ratio 0.0 (new tightest NDR lower bound).
-
-Measurement 11, intended load 5087329.903232804 pps (NDR bisect),
-measured loss ratio 0.0 (new tightest NDR lower bound).
-
-* NDR interval is narrow enough, PDR interval not ready yet.
-
-Measurement 12, intended load 5242656.244044665 pps (re-measuring PDR upper bound),
-measured loss ratio 0.0101174866190136 (still valid PDR upper bound).
-
-* Also PDR interval is narrow enough, with valid bounds for this duration.
-* Final phase starts, trial duration 30.0 seconds.
-
-Measurement 13, intended load 5112894.3238511775 pps (initial bisect for NDR),
-measured loss ratio 0.0 (new tightest NDR lower bound).
-
-Measurement 14, intended load 5138587.208637197 pps (re-measuring NDR upper bound),
-measured loss ratio 2.030389804256833e-06 (still valid NDR upper bound).
-
-* NDR interval is narrow enough, PDR interval not yet.
-
-Measurement 15, intended load 5216443.04126728 pps (initial bisect for PDR),
-measured loss ratio 0.005620871287975237 (new tightest PDR upper bound).
-
-Measurement 16, intended load 5190360.904111567 pps (re-measuring PDR lower bound),
-measured loss ratio 0.0027629971184465604 (still valid PDR lower bound).
-
-* PDR interval is also narrow enough.
-* Returning bounds:
-* NDR_LOWER = 5112894.3238511775 pps; NDR_UPPER = 5138587.208637197 pps;
-* PDR_LOWER = 5190360.904111567 pps; PDR_UPPER = 5216443.04126728 pps.
-
-# IANA Considerations
-
-No requests of IANA.
-
-# Security Considerations
-
-Benchmarking activities as described in this memo are limited to
-technology characterization of a DUT/SUT using controlled stimuli in a
-laboratory environment, with dedicated address space and the constraints
-specified in the sections above.
-
-The benchmarking network topology will be an independent test setup and
-MUST NOT be connected to devices that may forward the test traffic into
-a production network or misroute traffic to the test management network.
-
-Further, benchmarking is performed on a "black-box" basis, relying
-solely on measurements observable external to the DUT/SUT.
-
-Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
-benchmarking purposes. Any implications for network security arising
-from the DUT/SUT SHOULD be identical in the lab and in production
-networks.
-
-# Acknowledgements
-
-Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
-review and numerous useful comments and suggestions.
-
---- back
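
The phase schedule and the bisecting rule from the removed draft-02 text (Non-Initial Phases, points 1.1, 1.2 and 2.5.3) can be illustrated with a minimal Python sketch. The names `phase_schedule` and `bisect_candidate` are hypothetical; they are not taken from the CSIT implementation or from the MLRsearch PyPI package. The sketch assumes that the "round up to nearest even integer" step applies only when the multiple exceeds two, and it uses absolute interval widths, as the draft's own 2.0/5.0 example does, even though the draft's width goals are relative.

```python
import math


def phase_schedule(initial_duration, final_duration, final_width,
                   intermediate_phases):
    """Return (trial_duration, width_goal) for each non-initial phase.

    Hypothetical helper. Durations interpolate geometrically from the
    initial to the final trial duration; the width goal is doubled for
    each phase preceding the final one.
    """
    phases = []
    for index in range(intermediate_phases + 1):
        if intermediate_phases == 0:
            # Only the final phase exists.
            duration = final_duration
        else:
            exponent = index / intermediate_phases
            ratio = final_duration / initial_duration
            duration = initial_duration * ratio ** exponent
        width_goal = final_width * 2 ** (intermediate_phases - index)
        phases.append((duration, width_goal))
    return phases


def bisect_candidate(lower, upper, width_goal):
    """Return the next load for internal search, or None if the goal is met.

    Assumption: widths are absolute, as in the draft's 2.0/5.0 example.
    """
    multiple = (upper - lower) / width_goal
    if multiple <= 1.0:
        # Interval is already narrow enough, no measurement needed.
        return None
    if multiple <= 2.0:
        # Plain halving of the interval.
        lower_part = multiple / 2.0
    else:
        # Round the multiple up to the nearest even integer, take half.
        lower_part = math.ceil(multiple / 2.0)
    return lower + lower_part * width_goal
```

With the CSIT input values listed in the draft (1.0 s initial duration, 30.0 s final duration, final relative width 0.005, two intermediate phases), the middle duration is the geometric mean of 1.0 and 30.0, about 5.477 s, which matches the second intermediate phase of the example run. `bisect_candidate(2.0, 5.0, 1.0)` returns 4.0, as in the draft's example.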
diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-03.md b/docs/ietf/draft-ietf-bmwg-mlrsearch-03.md
new file mode 100644
index 0000000000..40180dc55b
--- /dev/null
+++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-03.md
@@ -0,0 +1,501 @@
+---
+title: Multiple Loss Ratio Search
+abbrev: MLRsearch
+docname: draft-ietf-bmwg-mlrsearch-03
+date: 2022-11-09
+
+ipr: trust200902
+area: ops
+wg: Benchmarking Working Group
+kw: Internet-Draft
+cat: info
+
+coding: us-ascii
+pi: # can use array (if all yes) or hash here
+ toc: yes
+ sortrefs: # defaults to yes
+ symrefs: yes
+
+author:
+ -
+ ins: M. Konstantynowicz
+ name: Maciek Konstantynowicz
+ org: Cisco Systems
+ email: mkonstan@cisco.com
+ -
+ ins: V. Polak
+ name: Vratko Polak
+ org: Cisco Systems
+ email: vrpolak@cisco.com
+
+normative:
+ RFC1242:
+ RFC2285:
+ RFC2544:
+ RFC9004:
+
+informative:
+ TST009:
+ target: https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/009/03.04.01_60/gs_NFV-TST009v030401p.pdf
+ title: "TST 009"
+ FDio-CSIT-MLRsearch:
+ target: https://s3-docs.fd.io/csit/rls2110/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.html#mlrsearch-tests
+ title: "FD.io CSIT Test Methodology - MLRsearch"
+ date: 2021-11
+ PyPI-MLRsearch:
+ target: https://pypi.org/project/MLRsearch/0.3.0/
+ title: "MLRsearch 0.3.0, Python Package Index"
+ date: 2021-04
+
+--- abstract
+
+This document proposes improvements to [RFC2544] throughput search by
+defining a new methodology called Multiple Loss Ratio search
+(MLRsearch). The main objectives for MLRsearch are to minimize the
+total test duration, search for multiple loss ratios and improve
+result repeatability and comparability.
+
+The main motivation behind MLRsearch is the new set of challenges and
+requirements posed by testing Network Function Virtualization
+(NFV) systems and other software based network data planes.
+
+MLRsearch offers several ways to address these challenges, giving users
+configuration options to choose among them.
+
+--- middle
+
+{::comment}
+ As we use kramdown to convert from markdown,
+ we use this way of marking comments not to be visible in rendered draft.
+ https://stackoverflow.com/a/42323390
+ If other engine is used, convert to this way:
+ https://stackoverflow.com/a/20885980
+{:/comment}
+
+# Purpose and Scope
+
+The purpose of this document is to describe Multiple Loss Ratio search
+(MLRsearch), a throughput search methodology optimized for software
+DUTs.
+
+Applying vanilla [RFC2544] throughput bisection to software DUTs
+results in a number of problems:
+
+- Binary search takes too long as most trials are done far from the
+ eventually found throughput.
+- The required final trial duration and pauses between trials also
+ prolong the overall search duration.
+- Software DUTs show noisy trial results (noisy neighbor problem),
+ leading to a big spread of possible discovered throughput values.
+- Throughput requires loss of exactly zero packets, but the industry
+ frequently allows for small but non-zero losses.
+- The definition of throughput is not clear when trial results are
+ inconsistent.
+
+MLRsearch aims to address these problems by applying the following set
+of enhancements:
+
+- Allow searching with multiple loss ratio goals.
+ - Each trial result can affect any search goal in principle
+ (trial reuse).
+- Use multiple phases within one loss ratio goal search; the middle phases
+ spend less time on trials.
+ - Middle phases also aim at lesser precision.
+ - Use Forwarding Rate (FR) at maximum offered load
+ [RFC2285] (section 3.6.2) to initialize the first middle phase.
+- Take care when dealing with inconsistent trial results.
+ - Loss ratio goals are handled in an order that precludes any
+ interference from later trials to earlier goals.
+- Apply several load selection heuristics to save even more time
+ by trying hard to avoid unnecessarily narrow intervals.
+
+MLRsearch configuration options are flexible enough to
+support both conservative settings (unconditionally compliant with [RFC2544],
+but longer search duration and worse repeatability) and aggressive
+settings (shorter search duration and better repeatability but not
+compliant with [RFC2544]).
+
+No part of [RFC2544] is intended to be obsoleted by this document.
+
+# Problems
+
+## Long Test Duration
+
+The emergence of software DUTs, with frequent software updates and a
+number of different packet processing modes and configurations, drives
+the requirement for continuous test execution and for bringing down the test
+execution time.
+
+In the context of characterizing a particular DUT's network performance, this
+calls for improving the time efficiency of throughput search.
+A vanilla bisection (at 60 sec trial duration for unconditional [RFC2544]
+compliance) is slow, because most trials are performed at loads quite far from the
+eventual throughput.
+
+[RFC2544] does not specify any stopping condition for throughput search,
+so users can trade off search duration against the precision goal.
+But, due to the exponential behavior of bisection, a small improvement
+in search duration requires a relatively big sacrifice in result precision.
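
The trade-off can be illustrated with a minimal sketch. It assumes plain bisection in which every trial halves the current interval; the function name `bisection_trials` and all numbers (interval width, width goals) are hypothetical and are not taken from [RFC2544].

```python
import math


def bisection_trials(initial_width, target_width):
    """Return how many halvings narrow initial_width down to target_width."""
    if target_width >= initial_width:
        return 0
    return math.ceil(math.log2(initial_width / target_width))


# Hypothetical search over a 10 Mpps wide interval with 60 s trials.
for target_width in (80_000, 40_000, 20_000, 10_000):
    trials = bisection_trials(10_000_000, target_width)
    print(f"width goal {target_width} pps: {trials} trials, {trials * 60} s")
```

Under these assumptions, each trial removed from the search doubles the width of the final interval, while shortening the search by only one trial duration.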
+
+## DUT within SUT
+
+[RFC2285] defines:
+- *DUT* as
+ - The network forwarding device to which stimulus is offered and
+ response measured [RFC2285] (section 3.1.1).
+- *SUT* as
+ - The collective set of network devices to which stimulus is offered
+ as a single entity and response measured [RFC2285] (section 3.1.2).
+
+[RFC2544] specifies a test setup with an external tester stimulating the
+networking system, treating it either as a single DUT, or as a system
+of devices, an SUT.
+
+In case of software networking, the SUT consists of a software program
+processing packets (device of interest, the DUT),
+running on server hardware and using operating system functions as appropriate,
+with server hardware resources shared across all programs
+and the operating system.
+
+The DUT is effectively "nested" within the SUT.
+
+Due to the shared multi-tenant nature of the SUT, the DUT is subject to
+interference (noise) coming from the operating system and any other
+software running on the same server. Some sources of noise can be
+eliminated (e.g. by pinning DUT program threads to specific CPU cores
+and isolating those cores to avoid context switching). But some
+noise remains after all such reasonable precautions are applied. This
+noise does negatively affect the DUT's network performance. We refer to it
+as *SUT noise*.
+
+DUT can also exhibit fluctuating performance itself, e.g. while performing
+some "stop the world" internal stateful processing. In many cases this
+may be an expected per-design behavior, as it would be observable even
+in a hypothetical scenario where all sources of SUT noise are
+eliminated. Such behavior affects trial results in a way similar to SUT
+noise. We use *noise* as a shorthand covering both *DUT fluctuations* and
+genuine SUT noise.
+
+A simple model of SUT performance consists of a baseline *noiseless performance*,
+and an additional noise. The baseline is assumed to be constant (enough).
+The noise varies in time, sometimes wildly. The noise can sometimes be negligible,
+but frequently it lowers the observed SUT performance in a trial.
+
+In this model, SUT does not have a single performance value, it has a spectrum.
+One end of the spectrum is the noiseless baseline,
+the other end is a *noiseful performance*. In practice, trial results
+close to the noiseful end of the spectrum happen only rarely.
+The worse the performance, the more rarely it is seen.
+
+Focusing on DUT, the benchmarking effort should aim
+at eliminating only the SUT noise from SUT measurement.
+But that is not really possible, as there are no realistic enough models
+able to distinguish SUT noise from DUT fluctuations.
+
+However, assuming that a well-constructed SUT has the DUT as its
+performance bottleneck, the "DUT noiseless performance" can be defined
+as the noiseless end of SUT performance spectrum. (At least for
+throughput. For other quantities such as latency there will be an
+additive difference.) By this definition, DUT noiseless performance
+also minimizes the impact of DUT fluctuations.
+
+In this document, we reduce the "DUT within SUT" problem to estimating
+the noiseless end of SUT performance spectrum from a limited number of
+trial results.
+
+Any improvement to the throughput search algorithm aimed at dealing
+better with software networking SUT and DUT setups should employ
+strategies that recognize the presence of SUT noise and allow discovery of
+(proxies for) DUT noiseless performance
+at different levels of sensitivity to SUT noise.
+
+## Repeatability and Comparability
+
+[RFC2544] does not suggest repeating the throughput search, and from just one
+throughput value, it cannot be determined how repeatable that value is.
+In practice, poor repeatability is also the main cause of poor
+comparability, e.g. different benchmarking teams can test the same DUT
+but get different throughput values.
+
+[RFC2544] throughput requirements (60s trial, no tolerance to single frame loss)
+force the search to converge around the noiseful end of SUT performance
+spectrum. As that end is affected by rare trials of significantly low
+performance, the resulting throughput repeatability is poor.
+
+The repeatability problem is the problem of defining a search procedure
+which reports more stable results
+(even if they can no longer be called "throughput" in [RFC2544] sense).
+According to the baseline (noiseless) and noiseful model, better repeatability
+will be found at the noiseless end of the spectrum.
+Therefore, solutions to the "DUT within SUT" problem
+will also help with the repeatability problem.
+
+Conversely, any alteration to [RFC2544] throughput search
+that improves repeatability should be considered
+as less dependent on the SUT noise.
+
+An alternative option is to simply run a search multiple times, and report some
+statistics (e.g. average and standard deviation). This can be used
+for "important" tests, but it makes the search duration problem even
+bigger.
+
+## Throughput with Non-Zero Loss
+
+[RFC1242] (section 3.17) defines throughput as:
+ The maximum rate at which none of the offered frames
+ are dropped by the device.
+
+and then it says:
+ Since even the loss of one frame in a
+ data stream can cause significant delays while
+ waiting for the higher level protocols to time out,
+ it is useful to know the actual maximum data
+ rate that the device can support.
+
+Contrary to that, many benchmarking teams settle for a small non-zero
+loss ratio as the goal for a "throughput rate".
+
+Motivations are many: modern protocols tolerate frame loss better;
+trials nowadays send many more frames within the same duration;
+the impact of rare noise bursts is smaller as the baseline performance
+can compensate somewhat by keeping the loss ratio below the goal;
+if SUT noise with an "ideal DUT" is known, it can be set as the loss ratio goal.
+
+Regardless of the validity of any and all similar motivations,
+support for non-zero loss goals makes any search algorithm more user-friendly.
+[RFC2544] throughput is not friendly in this regard.
+
+Searching for multiple loss ratio goals also helps to describe the SUT
+performance better than a single goal result. A repeated wide gap between
+zero and non-zero loss loads indicates the noise has a large impact on
+the overall SUT performance.
+
+It is easy to modify the vanilla bisection to find a lower bound
+for intended load that satisfies a non-zero-loss goal,
+but it is not that obvious how to search for multiple goals at once,
+hence the support for multiple loss goals remains a problem.
+
+## Inconsistent Trial Results
+
+While performing throughput search by executing a sequence of
+measurement trials, there is a risk of encountering inconsistencies
+between trial results.
+
+The plain bisection never encounters inconsistent trials.
+But [RFC2544] hints at the possibility of inconsistent trial results in two places.
+The first place is section 24 where full trial durations are required, presumably
+because they can be inconsistent with results from shorter trial durations.
+The second place is section 26.3 where two successive zero-loss trials
+are recommended, presumably because after one zero-loss trial
+there can be a subsequent inconsistent non-zero-loss trial.
+
+Examples include:
+
+- a trial at the same load (same or different trial duration) results
+ in a different packet loss ratio.
+- a trial at higher load (same or different trial duration) results
+ in a smaller packet loss ratio.
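
A minimal sketch of how the two example inconsistencies could be detected in a set of recorded trial results. The function name `find_inconsistencies` and the `(load, loss_ratio)` tuple layout are assumptions made for illustration; trial duration is ignored here for brevity.

```python
def find_inconsistencies(trials):
    """Return descriptions of inconsistent pairs among (load, loss_ratio) tuples."""
    found = []
    # Sorting puts lower loads first, so the second trial of each pair
    # is always at the same or a higher load.
    ordered = sorted(trials)
    for index, (load_a, loss_a) in enumerate(ordered):
        for load_b, loss_b in ordered[index + 1:]:
            if load_a == load_b and loss_a != loss_b:
                found.append(("same load, different loss ratio", load_a, load_b))
            elif load_b > load_a and loss_b < loss_a:
                found.append(("higher load, smaller loss ratio", load_a, load_b))
    return found


# Hypothetical trial results: loads in pps, loss ratios as fractions.
results = [(1_000_000, 0.0), (1_000_000, 0.001), (2_000_000, 0.0)]
for description, load_a, load_b in find_inconsistencies(results):
    print(description, load_a, load_b)
```

Detecting an inconsistency is the easy part; how the search should continue afterwards is the open question discussed in this section.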
+
+Any robust throughput search algorithm needs to decide how to continue
+the search in presence of such inconsistencies.
+Definitions of throughput in [RFC1242] and [RFC2544] are not specific enough
+to imply a unique way of handling such inconsistencies.
+
+Ideally, there will be a definition of a quantity which both generalizes
+throughput for non-zero-loss (and other possible repeatability enhancements),
+and is precise enough to force a specific way to resolve trial
+inconsistencies.
+But until such a definition is agreed upon, the correct way to handle
+inconsistent trial results remains an open problem.
+
+# MLRsearch Approach
+
+The following description intentionally leaves out some important implementation
+details. This is both to hide complexity that is not important for overall
+understanding, and to allow future improvements in the implementation.
+
+## Terminology
+
+- *trial duration*: Amount of time over which frames are transmitted
+ towards SUT and DUT in a single measurement step.
+ - **MLRsearch input parameter** for final MLRsearch measurements.
+- *loss ratio*: Ratio of the count of frames lost to the count of frames
+ transmitted over a trial duration, a.k.a. packet loss ratio. Related
+ to packet loss rate [RFC1242] (section 3.6).
+ In MLRsearch loss ratio can mean either a trial result or a goal:
+ - *trial loss ratio*: Loss ratio measured during a trial.
+ - *loss ratio goal*: **MLRsearch input parameter**.
+ - If *trial loss ratio* is smaller than or equal to this,
+ the trial **satisfies** the loss ratio goal.
+- *load*: Constant offered load stimulating the SUT and DUT. Consistent
+ with offered load [RFC2285] (section 3.5.2).
+ - MLRsearch works with intended load instead, as it cannot deal with
+ situations where the offered load is considerably different from
+ intended load.
+- *throughput*: The maximum load at which none of the offered frames are
+ dropped by the SUT and DUT. Consistent with [RFC1242] (section 3.17).
+- *conditional throughput*: The forwarding rate measured at the maximum
+ load at which a list of specified conditions is met, i.e. loss ratio
+ goal and trial duration.
+ - Throughput is then a special case of conditional throughput
+ for zero loss ratio goal and long enough trial duration.
+ - Conditional throughput is aligned with forwarding rate (FR)
+ [RFC2285] (section 3.6.1), adding trial duration to offered load
+ required when reporting FR.
+- *lower bound*: One of the values tracked by MLRsearch during the search runtime.
+ It is specific to the current trial duration and current loss ratio goal.
+ It represents a load value with at least one trial result available.
+ If the trial satisfies the current loss ratio goal,
+ it is a *valid* bound (else *invalid*).
+- *upper bound*: One of the values tracked by MLRsearch during the search runtime.
+ It is specific to the current trial duration and current loss ratio goal.
+ It represents a load value with at least one trial result available.
+ If the trial satisfies the current loss ratio goal,
+ it is an *invalid* bound (else *valid*).
+- *interval*: The span between lower and upper bound loads.
+- *precision goal*: **MLRsearch input parameter**, acting as a search
+ stop condition, given as either absolute or relative width goal. An
+ interval meets precision goal if:
+ - The difference of upper and lower bound loads (in pps)
+ is not more than the absolute width goal.
+ - The difference as above, divided by upper bound load (in pps)
+ is not more than the relative width goal.
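
The goal satisfaction and precision goal definitions above can be sketched as two small checks. This is an illustration only: the function names and argument layout are hypothetical, and the sketch assumes that exactly one of the two width goals is configured.

```python
def satisfies_goal(frames_sent, frames_lost, loss_ratio_goal):
    """Return True if the trial loss ratio does not exceed the loss ratio goal."""
    trial_loss_ratio = frames_lost / frames_sent
    return trial_loss_ratio <= loss_ratio_goal


def meets_precision(lower_pps, upper_pps, abs_width_goal=None, rel_width_goal=None):
    """Return True if the interval is narrow enough for the configured width goal."""
    width = upper_pps - lower_pps
    if abs_width_goal is not None:
        # Absolute width goal: compare the difference of the bounds in pps.
        return width <= abs_width_goal
    if rel_width_goal is not None:
        # Relative width goal: the difference is divided by the upper bound.
        return width / upper_pps <= rel_width_goal
    raise ValueError("either an absolute or a relative width goal is required")


# Hypothetical values: a zero-loss trial and a 1 Mpps wide interval.
print(satisfies_goal(1_000_000, 0, 0.0))
print(meets_precision(9_000_000, 10_000_000, rel_width_goal=0.005))
```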
+
+## Description
+
+The MLRsearch approach to address the identified problems is based
+on the following main strategies:
+
+- MLRsearch main inputs include the following search goals and parameters:
+ - One or more **loss ratio goals**.
+ - e.g. a zero-loss goal and one (or more) non-zero-loss goals.
+ - **Target trial duration** condition governing required trial duration
+ for final measurements.
+ - **Target precision** condition governing how close final lower and
+ upper bound load values must be to each other for final
+ measurements.
+- Search is executed as a sequence of phases:
+ - *Initial phase* initializes bounds for the first middle phase.
+ - *Middle phase*s narrow down the bounds, using shorter trial
+ durations and lower precision goals. Several middle phases can
+ precede each final phase.
+ - *Final phase* (one per loss ratio goal) finds bounds matching input
+ goals and parameters to serve as the overall search output.
+- Each search phase produces its *ending* upper bound and lower bound:
+ - Initial phase may produce invalid bounds.
+ - Middle and final phases produce valid bounds.
+ - Middle and final phases need at least two values to act as
+ *starting* bounds (may be invalid).
+ - Each phase may perform several trial measurements, until phase's
+ ending conditions are all met.
+ - Trial results from previous phases may be re-used.
+- Initial phase establishes the starting values for bounds, using
+ forwarding rates (FR) [RFC2285] (section 3.6.1)
+ from a few trials of minimal duration, as follows:
+ - 1st trial is done at *maximum offered load (MOL)* [RFC2285] (section 3.5.3),
+ resulting in Forwarding rate at maximum offered load (FRMOL)
+ [RFC2285] (section 3.6.2).
+ - 2nd trial is done at *FRMOL*, resulting in forwarding rate at FRMOL (FRFRMOL),
+ newly defined here.
+ - 3rd trial is done at *FRFRMOL*, so its results are available for the next phase.
+ - By default, FRMOL is used as an upper bound, FRFRMOL as a lower bound.
+ - Adjustments may apply here for some cases e.g. when 2nd trial got
+ zero loss or if FRFRMOL is too close to FRMOL.
+- Middle phases produce ending bounds by improving upon starting bounds:
+ - Each middle phase uses the same loss ratio goal as the final phase it precedes.
+ - Called *current loss ratio goal* for upper and lower bound purposes.
+ - Each middle phase has its own *current trial duration*
+ and *current precision goal* parameters, computed from
+ MLRsearch input parameters.
+ As phases progress, these parameters approach MLRsearch main input values.
+ - Current trial duration starts from a configurable minimum (e.g. 1 sec)
+ and increases in a geometric sequence.
+ - Current precision goal always allows twice as wide intervals
+ as the following phase.
+ - The starting bounds are usually the ending bounds from the preceding phase.
+ - Unless there are many previous trial results that are more promising.
+ - Each middle phase operates in a sequence of four actions:
+ 1. Perform trial at the load between the starting bounds.
+ - Depending on the trial result this becomes the first
+ new valid upper or lower bound for current phase.
+ 2. Re-measure at the remaining starting lower or upper (respectively) bound.
+ 3. If that did not result in a valid bound, start an *external search*.
+ - That is a variant of exponential search.
+ - The "growth" is given by input parameter *expansion_coefficient*.
+ - This action ends when a new valid bound is found.
+ - Or if an already existing valid bound becomes close enough.
+ 4. Repeatedly bisect the current interval until the bounds are close enough.
+- Final search phase operates in exactly the same way as middle phases.
+ There are two reasons why it is named differently:
+ - The current trial duration and current precision goal within the phase
+ are equal to the target trial duration and target precision input parameters.
+ - The forwarding rates of the ending bounds become the output of MLRsearch.
+ - Specifically, the forwarding rates of the final lower bounds
+ are the conditional throughput values per given loss ratio goals.
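
Two parts of the description above can be sketched in code: the per-phase parameters and the initial phase. This is a simplified illustration, not the FD.io CSIT or PyPI implementation. The names `phase_schedule`, `initial_phase`, `middle_phases` and the `measure_forwarding_rate(load, duration)` callable are hypothetical, the geometric sequence is assumed to end exactly at the target trial duration, and the adjustments for special cases (e.g. zero loss in the 2nd trial) are omitted.

```python
def phase_schedule(min_duration, target_duration, target_width, middle_phases):
    """Return (trial duration, width goal) pairs for middle phases and the final phase."""
    schedule = []
    for index in range(middle_phases):
        # Durations grow geometrically from the minimum towards the target.
        growth = (target_duration / min_duration) ** (index / middle_phases)
        # Each phase allows intervals twice as wide as the following phase.
        width = target_width * 2 ** (middle_phases - index)
        schedule.append((min_duration * growth, width))
    # The final phase uses the target values directly.
    schedule.append((target_duration, target_width))
    return schedule


def initial_phase(measure_forwarding_rate, max_offered_load, min_duration):
    """Return (lower, upper) starting bounds obtained from three short trials."""
    frmol = measure_forwarding_rate(max_offered_load, min_duration)
    frfrmol = measure_forwarding_rate(frmol, min_duration)
    # The 3rd trial is done at FRFRMOL so its result is available to the next phase.
    measure_forwarding_rate(frfrmol, min_duration)
    return frfrmol, frmol


def toy_sut(load, duration):
    """Hypothetical SUT: forwards at most 6 Mpps and always drops 50 kpps."""
    return min(load, 6_000_000) - 50_000


print(initial_phase(toy_sut, 10_000_000, 1.0))
for duration, width in phase_schedule(1.0, 60.0, 10_000, 2):
    print(f"trial duration {duration:.2f} s, width goal {width} pps")
```

The measurement callable is passed in as a parameter so that the same sketch can be driven by a real traffic generator or by a toy model such as `toy_sut`.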
+
+## Enhancement: Multiple trials per load
+
+An enhancement of MLRsearch is to introduce a *noise tolerance* input parameter.
+The idea is to perform several medium-length trials (instead of a single long trial)
+and tolerate a configurable fraction of them not satisfying the loss ratio goal.
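
As a rough sketch of the idea, a load could be classified from several trial loss ratios as shown below. The function name and the exact comparison are assumptions made for illustration; the FD.io CSIT implementation may define noise tolerance differently.

```python
def load_satisfies_goal(trial_loss_ratios, loss_ratio_goal, noise_tolerance):
    """Return True if the fraction of trials exceeding the goal is within tolerance."""
    failing = sum(1 for ratio in trial_loss_ratios if ratio > loss_ratio_goal)
    return failing / len(trial_loss_ratios) <= noise_tolerance


# Hypothetical results of four medium-length trials at one load.
print(load_satisfies_goal([0.0, 0.0, 0.001, 0.0], 0.0, 0.25))
```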
+
+MLRsearch implementation with this enhancement exists in FD.io CSIT project
+and test results of VPP and DPDK (testpmd, l3fwd) DUTs look promising.
+
+This enhancement would make the description of MLRsearch approach
+considerably more complicated, so this document version only describes
+MLRsearch without this enhancement.
+
+# How the problems are addressed
+
+Configurable loss ratio goals directly support non-zero-loss conditional throughput.
+In practice the conditional throughput results' stability
+increases with higher loss ratio goals.
+
+Multiple trials with noise tolerance enhancement will also indirectly
+increase result stability and it will allow MLRsearch
+to add all the benefits of Binary Search with Loss Verification,
+as recommended in [RFC9004] (section 6.2)
+and specified in [TST009] (section 12.3.3).
+
+The main factor improving the overall search time is the introduction
+of middle phases. The full implementation can bring a large number of
+heuristics related to how exactly the next trial load should be chosen,
+but the impact of those is not as big.
+
+The Description subsection lacks any details on how to handle inconsistent
+trial results. In practice, there tends to be a three-way trade-off
+between i) short overall search time, ii) result stability
+and iii) how simple the definition of the returned conditional throughput can be.
+The third one is important for comparability between different MLRsearch
+implementations.
+
+# IANA Considerations
+
+No requests of IANA.
+
+# Security Considerations
+
+Benchmarking activities as described in this memo are limited to
+technology characterization of a DUT/SUT using controlled stimuli in a
+laboratory environment, with dedicated address space and the constraints
+specified in the sections above.
+
+The benchmarking network topology will be an independent test setup and
+MUST NOT be connected to devices that may forward the test traffic into
+a production network or misroute traffic to the test management network.
+
+Further, benchmarking is performed on a "black-box" basis, relying
+solely on measurements observable external to the DUT/SUT.
+
+Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
+benchmarking purposes. Any implications for network security arising
+from the DUT/SUT SHOULD be identical in the lab and in production
+networks.
+
+# Acknowledgements
+
+Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
+review and numerous useful comments and suggestions.
+
+--- back
diff --git a/docs/ietf/process.txt b/docs/ietf/process.txt
index e170352cb9..261756fc8a 100644
--- a/docs/ietf/process.txt
+++ b/docs/ietf/process.txt
@@ -19,4 +19,6 @@ $ kdrfc --version
$ sudo gem install kramdown-rfc2629
Main:
-$ kdrfc draft-ietf-bmwg-mlrsearch-02.md
+$ kdrfc draft-ietf-bmwg-mlrsearch-03.md
+
+If that complains, do it manually at https://author-tools.ietf.org/
\ No newline at end of file