author    | Vratko Polak <vrpolak@cisco.com>            | 2024-07-17 20:12:47 +0200
committer | Maciek Konstantynowicz <mkonstan@cisco.com> | 2024-07-21 14:18:57 +0000
commit    | 25e7fb3b4fe955d1eea55ff282544d340ebcfe2b
tree      | 8cce39c7275fcfc3f8bbef93fb37b079f0367869 /docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt
parent    | 901fba157df6da09800866c7eb80014ae2ffbf5b
feat(ietf): MLRsearch draft-07
+ date change 17jul to 18jul due to ietf submission tool reqs
+ .xml and .txt
Change-Id: I8b6fbba3a412ce41773b73592722905a9f361861
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
Signed-off-by: Maciek Konstantynowicz <mkonstan@cisco.com>
Diffstat (limited to 'docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt')
-rw-r--r-- | docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt | 2800 |
1 files changed, 2800 insertions, 0 deletions
diff --git a/docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt b/docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt
new file mode 100644
index 0000000000..c5c94410a8
--- /dev/null
+++ b/docs/ietf/draft-ietf-bmwg-mlrsearch-07.txt
@@ -0,0 +1,2800 @@

Benchmarking Working Group                            M. Konstantynowicz
Internet-Draft                                                  V. Polak
Intended status: Informational                             Cisco Systems
Expires: 18 January 2025                                    18 July 2024


                       Multiple Loss Ratio Search
                      draft-ietf-bmwg-mlrsearch-07

Abstract

   This document proposes extensions to [RFC2544] throughput search by
   defining a new methodology called Multiple Loss Ratio search
   (MLRsearch).  MLRsearch aims to minimize search duration, support
   multiple loss ratio searches, and enhance result repeatability and
   comparability.

   The primary reason for extending [RFC2544] is to address the
   challenges and requirements presented by the evaluation and testing
   of software-based networking systems' data planes.

   To give users more freedom, MLRsearch provides additional
   configuration options such as allowing multiple short trials per
   load instead of one large trial, tolerating a certain percentage of
   trial results with higher loss, and supporting the search for
   multiple goals with varying loss ratios.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 18 January 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Purpose and Scope
   2.  Identified Problems
     2.1.  Long Search Duration
     2.2.  DUT in SUT
     2.3.  Repeatability and Comparability
     2.4.  Throughput with Non-Zero Loss
     2.5.  Inconsistent Trial Results
   3.  MLRsearch Specification
     3.1.  Overview
     3.2.  Measurement Quantities
     3.3.  Existing Terms
       3.3.1.  SUT
       3.3.2.  DUT
       3.3.3.  Trial
     3.4.  Trial Terms
       3.4.1.  Trial Duration
       3.4.2.  Trial Load
       3.4.3.  Trial Input
       3.4.4.  Traffic Profile
       3.4.5.  Trial Forwarding Ratio
       3.4.6.  Trial Loss Ratio
       3.4.7.  Trial Forwarding Rate
       3.4.8.  Trial Effective Duration
       3.4.9.  Trial Output
       3.4.10. Trial Result
     3.5.  Goal Terms
       3.5.1.  Goal Final Trial Duration
       3.5.2.  Goal Duration Sum
       3.5.3.  Goal Loss Ratio
       3.5.4.  Goal Exceed Ratio
       3.5.5.  Goal Width
       3.5.6.  Search Goal
       3.5.7.  Controller Input
     3.6.  Search Goal Examples
       3.6.1.  RFC2544 Goal
       3.6.2.  TST009 Goal
     3.7.  Result Terms
       3.7.1.  Relevant Upper Bound
       3.7.2.  Relevant Lower Bound
       3.7.3.  Conditional Throughput
       3.7.4.  Goal Result
       3.7.5.  Search Result
       3.7.6.  Controller Output
     3.8.  MLRsearch Architecture
       3.8.1.  Measurer
       3.8.2.  Controller
       3.8.3.  Manager
     3.9.  Implementation Compliance
   4.  Additional Considerations
     4.1.  MLRsearch Versions
     4.2.  Stopping Conditions
     4.3.  Load Classification
     4.4.  Loss Ratios
     4.5.  Loss Inversion
     4.6.  Exceed Ratio
     4.7.  Duration Sum
     4.8.  Short Trials
     4.9.  Throughput
     4.10. Search Time
     4.11. RFC2544 Compliance
   5.  Logic of Load Classification
     5.1.  Introductory Remarks
     5.2.  Performance Spectrum
       5.2.1.  First Example
       5.2.2.  Second Example
       5.2.3.  Third Example
       5.2.4.  Summary
     5.3.  Trials with Single Duration
     5.4.  Trials with Short Duration
       5.4.1.  Scenarios
       5.4.2.  Classification Logic
     5.5.  Trials with Longer Duration
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Appendix A: Load Classification
   10. Appendix B: Conditional Throughput
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Purpose and Scope

   The purpose of this document is to describe Multiple Loss Ratio
   search (MLRsearch), a data plane throughput search methodology
   optimized for software networking DUTs.

   Applying vanilla [RFC2544] throughput bisection to software DUTs
   results in several problems:

   *  Binary search takes too long, as most trials are done far from
      the eventually found throughput.

   *  The required final trial duration and pauses between trials
      prolong the overall search duration.

   *  Software DUTs show noisy trial results, leading to a big spread
      of possible discovered throughput values.

   *  Throughput requires a loss of exactly zero frames, but the
      industry frequently allows for small but non-zero losses.

   *  The definition of throughput is not clear when trial results are
      inconsistent.

   To address the problems mentioned above, the MLRsearch test
   methodology specification employs the following enhancements:

   *  Allow multiple short trials instead of one big trial per load.

      -  Optionally, tolerate a percentage of trial results with higher
         loss.

   *  Allow searching for multiple Search Goals, with differing loss
      ratios.

      -  Any trial result can affect each Search Goal in principle.

   *  Insert multiple coarse targets for each Search Goal, where
      earlier targets need to spend less time on trials.

      -  Earlier targets also aim for lesser precision.

      -  Use Forwarding Rate (FR) at maximum offered load [RFC2285]
         (section 3.6.2) to initialize the initial targets.

   *  Take care when dealing with inconsistent trial results.

      -  Reported throughput is smaller than the smallest load with
         high loss.

      -  Smaller load candidates are measured first.

   *  Apply several load selection heuristics to save even more time,
      by trying hard to avoid unnecessarily narrow bounds.

   Some of these enhancements are formalized as the MLRsearch
   specification; the remaining enhancements are treated as
   implementation details, thus achieving high comparability without
   limiting future improvements.

   MLRsearch configuration options are flexible enough to support both
   conservative and aggressive settings.  Conservative settings lead
   to results unconditionally compliant with [RFC2544], but with
   longer search duration and worse repeatability.  Conversely,
   aggressive settings lead to shorter search duration and better
   repeatability, but the results are not compliant with [RFC2544].

   No part of [RFC2544] is intended to be obsoleted by this document.

2.  Identified Problems

   This chapter describes the problems affecting usability of various
   performance testing methodologies, mainly a binary search for
   [RFC2544] unconditionally compliant throughput.

2.1.  Long Search Duration

   The emergence of software DUTs, with frequent software updates and
   a number of different frame processing modes and configurations,
   has increased both the number of performance tests required to
   verify each DUT update and the frequency of running those tests.
   This makes the overall test execution time even more important than
   before.

   The current [RFC2544] throughput definition restricts the potential
   for time-efficiency improvements.  A more generalized throughput
   concept could enable further enhancements while maintaining the
   precision of simpler methods.

   The bisection method, when unconditionally compliant with
   [RFC2544], is excessively slow.  This is because a significant
   amount of time is spent on trials with loads that, in retrospect,
   are far from the final determined throughput.

   [RFC2544] does not specify any stopping condition for throughput
   search, so users already have access to a limited trade-off between
   search duration and achieved precision.  However, each full
   60-second trial doubles the precision, so not many trials can be
   removed without a substantial loss of precision.

2.2.  DUT in SUT

   [RFC2285] defines:

   *  DUT as "The network forwarding device to which stimulus is
      offered and response measured" [RFC2285] (section 3.1.1).

   *  SUT as "The collective set of network devices to which stimulus
      is offered as a single entity and response measured" [RFC2285]
      (section 3.1.2).

   [RFC2544] specifies a test setup with an external tester
   stimulating the networking system, treating it either as a single
   DUT, or as a system of devices, an SUT.

   In the case of software networking, the SUT consists not only of
   the DUT as a software program processing frames, but also of server
   hardware and operating system functions, with the server hardware
   resources shared across all programs including the operating
   system.

   Given that the SUT is a shared multi-tenant environment
   encompassing the DUT and other components, the DUT might
   inadvertently experience interference from the operating system or
   other software operating on the same server.

   Some of this interference can be mitigated.  For instance, pinning
   DUT program threads to specific CPU cores and isolating those cores
   can prevent context switching.

   Despite taking all feasible precautions, some adverse effects may
   still impact the DUT's network performance.  In this document,
   these effects are collectively referred to as SUT noise, even if
   the effects are not as unpredictable as what other engineering
   disciplines call noise.

   A DUT can also exhibit fluctuating performance itself, for reasons
   not related to the rest of the SUT, for example due to pauses in
   execution needed for internal stateful processing.
   In many cases this may be an expected per-design behavior, as it
   would be observable even in a hypothetical scenario where all
   sources of SUT noise are eliminated.  Such behavior affects trial
   results in a way similar to SUT noise.  As the two phenomena are
   hard to distinguish, in this document the term "noise" is used to
   encompass both the internal performance fluctuations of the DUT and
   the genuine noise of the SUT.

   A simple model of SUT performance consists of an idealized
   noiseless performance, and additional noise effects.  For a
   specific SUT, the noiseless performance is assumed to be constant,
   with all observed performance variations being attributed to noise.
   The impact of the noise can vary in time, sometimes wildly, even
   within a single trial.  The noise can sometimes be negligible, but
   frequently it lowers the observed SUT performance as observed in
   trial results.

   In this model, the SUT does not have a single performance value, it
   has a spectrum.  One end of the spectrum is the idealized noiseless
   performance value, the other end can be called a noiseful
   performance.  In practice, a trial result close to the noiseful end
   of the spectrum happens only rarely.  The worse the performance
   value is, the more rarely it is seen in a trial.  Therefore, the
   extreme noiseful end of the SUT spectrum is not observable among
   trial results.  The extreme noiseless end of the SUT spectrum is
   also unlikely to be observable, this time because some small noise
   effects are likely to occur multiple times during a trial.

   Unless specified otherwise, this document's focus is on the
   potentially observable ends of the SUT performance spectrum, as
   opposed to the extreme ones.

   When focusing on the DUT, the benchmarking effort should ideally
   aim to eliminate only the SUT noise from SUT measurements.
   However, this is currently not feasible in practice, as there are
   no realistic enough models available to distinguish SUT noise from
   DUT fluctuations, based on the authors' experience and available
   literature.

   Assuming a well-constructed SUT, the DUT is likely its primary
   performance bottleneck.  In this case, we can define the DUT's
   ideal noiseless performance as the noiseless end of the SUT
   performance spectrum, especially for throughput.  However, other
   performance metrics, such as latency, may require additional
   considerations.

   Note that by this definition, DUT noiseless performance also
   minimizes the impact of DUT fluctuations, as much as realistically
   possible for a given trial duration.

   The MLRsearch methodology aims to solve the DUT in SUT problem by
   estimating the noiseless end of the SUT performance spectrum using
   a limited number of trial results.

   Any improvements to the throughput search algorithm, aimed at
   better dealing with software networking SUT and DUT setups, should
   employ strategies recognizing the presence of SUT noise, allowing
   the discovery of (proxies for) DUT noiseless performance at
   different levels of sensitivity to SUT noise.

2.3.  Repeatability and Comparability

   [RFC2544] does not suggest repeating the throughput search.
   From just one discovered throughput value, it cannot be determined
   how repeatable that value is.  Poor repeatability then leads to
   poor comparability, as different benchmarking teams may obtain
   varying throughput values for the same SUT, exceeding the expected
   differences from search precision.

   [RFC2544] throughput requirements (60-second trials and no
   tolerance of a single frame loss) affect the throughput results in
   the following way.  The SUT behavior close to the noiseful end of
   its performance spectrum consists of rare occasions of
   significantly low performance, but the long trial duration makes
   those occasions not so rare on the trial level.  Therefore, binary
   search results tend to wander away from the noiseless end of the
   SUT performance spectrum, more frequently and more widely than
   short trials would, thus causing poor throughput repeatability.

   The repeatability problem can be addressed by defining a search
   procedure that identifies a consistent level of performance, even
   if it does not meet the strict definition of throughput in
   [RFC2544].

   According to the SUT performance spectrum model, better
   repeatability will be at the noiseless end of the spectrum.
   Therefore, solutions to the DUT in SUT problem will also help with
   the repeatability problem.

   Conversely, any alteration to [RFC2544] throughput search that
   improves repeatability should be considered as less dependent on
   the SUT noise.

   An alternative option is to simply run a search multiple times, and
   report some statistics (e.g. average and standard deviation).  This
   can be used for a subset of tests deemed more important, but it
   makes the search duration problem even more pronounced.

2.4.  Throughput with Non-Zero Loss

   [RFC1242] (section 3.17 Throughput) defines throughput as:

      The maximum rate at which none of the offered frames are dropped
      by the device.

   Then, it says:

      Since even the loss of one frame in a data stream can cause
      significant delays while waiting for the higher level protocols
      to time out, it is useful to know the actual maximum data rate
      that the device can support.

   However, many benchmarking teams accept a small, non-zero loss
   ratio as the goal for their load search.

   Motivations are many:

   *  Modern protocols tolerate frame loss better, compared to the
      time when [RFC1242] and [RFC2544] were specified.

   *  Trials nowadays send way more frames within the same duration,
      increasing the chance of a small SUT performance fluctuation
      being enough to cause frame loss.

   *  Small bursts of frame loss caused by noise have otherwise
      smaller impact on the average frame loss ratio observed in the
      trial, as during other parts of the same trial the SUT may work
      more closely to its noiseless performance, thus perhaps lowering
      the Trial Loss Ratio below the Goal Loss Ratio value.

   *  If an approximation of the SUT noise impact on the Trial Loss
      Ratio is known, it can be set as the Goal Loss Ratio.

   Regardless of the validity of all similar motivations, support for
   non-zero loss goals makes any search algorithm more user-friendly.
   [RFC2544] throughput is not user-friendly in this regard.

   Furthermore, allowing users to specify multiple loss ratio values,
   and enabling a single search to find all relevant bounds,
   significantly enhances the usefulness of the search algorithm.

   Searching for multiple Search Goals also helps to describe the SUT
   performance spectrum better than the result of a single Search
   Goal.  For example, a repeated wide gap between zero-loss and
   non-zero-loss loads indicates the noise has a large impact on the
   observed performance, which is not evident from a single goal load
   search procedure result.

   It is easy to modify the vanilla bisection to find a lower bound
   for the intended load that satisfies a non-zero Goal Loss Ratio.
   But it is not that obvious how to search for multiple goals at
   once, hence the support for multiple Search Goals remains a
   problem.

2.5.  Inconsistent Trial Results

   While performing throughput search by executing a sequence of
   measurement trials, there is a risk of encountering inconsistencies
   between trial results.

   The plain bisection never encounters inconsistent trials.  But
   [RFC2544] hints at the possibility of inconsistent trial results,
   in two places in its text.  The first place is section 24, where
   full trial durations are required, presumably because they can be
   inconsistent with results from short trial durations.  The second
   place is section 26.3, where two successive zero-loss trials are
   recommended, presumably because after one zero-loss trial there can
   be a subsequent inconsistent non-zero-loss trial.

   Examples include:

   *  A trial at the same load (same or different trial duration)
      results in a different Trial Loss Ratio.

   *  A trial at a higher load (same or different trial duration)
      results in a smaller Trial Loss Ratio.

   Any robust throughput search algorithm needs to decide how to
   continue the search in the presence of such inconsistencies.
   Definitions of throughput in [RFC1242] and [RFC2544] are not
   specific enough to imply a unique way of handling such
   inconsistencies.

   Ideally, there would be a definition of a new quantity which both
   generalizes throughput for non-zero-loss conditions (and other
   possible repeatability enhancements), and is precise enough to
   force a specific way to resolve trial result inconsistencies.  But
   until such a definition is agreed upon, the correct way to handle
   inconsistent trial results remains an open problem.

3.  MLRsearch Specification

   This section describes the MLRsearch specification, including all
   technical definitions needed for evaluating whether a particular
   test procedure complies with it.

3.1.  Overview

   The MLRsearch specification describes a set of abstract system
   components, acting as functions with specified inputs and outputs.

   A test procedure is said to comply with the MLRsearch specification
   if it can be conceptually divided into analogous components, each
   satisfying the requirements for the corresponding MLRsearch
   component.

   The Measurer component is tasked to perform trials, the Controller
   component is tasked to select trial loads and durations, and the
   Manager component is tasked to pre-configure everything and to
   produce the test report.  The test report explicitly states Search
   Goals (as Controller Inputs) and corresponding Goal Results
   (Controller Outputs).

   The Manager calls the Controller once, and the Controller keeps
   calling the Measurer until all stopping conditions are met.
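
   As an informal illustration (not part of the specification), the
   calling pattern described above can be made concrete with a
   deliberately oversimplified single-goal Controller (plain
   bisection, one full-length trial per load), written in Python.  All
   names and the stub SUT behavior are assumptions of this sketch:

      def stub_measurer(trial_input):
          """Pretend the SUT forwards everything below 6 Mfps."""
          lossy = trial_input["load"] > 6.0e6
          return {"loss_ratio": 0.01 if lossy else 0.0}

      def toy_controller(max_load, goal_width, measurer):
          """Keep calling the Measurer until the bounds are close."""
          lower, upper = 0.0, max_load
          while upper - lower > goal_width:      # stopping condition
              load = (lower + upper) / 2.0       # trial load selection
              output = measurer({"load": load, "duration": 60.0})
              if output["loss_ratio"] > 0.0:     # load classification
                  upper = load                   # new upper bound
              else:
                  lower = load                   # new lower bound
          return {"lower_bound": lower, "upper_bound": upper}

      def toy_manager():
          """Configure components, call the Controller exactly once."""
          result = toy_controller(10.0e6, 0.1e6, stub_measurer)
          print("Test report:", result)          # reporting step

   Real MLRsearch Controllers differ from this toy in every selection
   and classification detail, but the division of responsibilities
   between the three components is the same.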

   The part where the Controller calls the Measurer is called the
   search.  Any activity done by the Manager before it calls the
   Controller (or after the Controller returns) is not considered to
   be part of the search.

   The MLRsearch specification prescribes regular search results and
   recommends their stopping conditions.  Irregular search results are
   also allowed; they may have different requirements and stopping
   conditions.

   Search results are based on load classification.  When measured
   enough, any chosen load either achieves or fails each search goal,
   thus becoming a lower or an upper bound for that goal.  When the
   relevant bounds are at loads that are close enough (according to
   goal precision), the regular result is found.  The search stops
   when all regular results are found (or if some goals are proven to
   have only irregular results).

3.2.  Measurement Quantities

   The MLRsearch specification uses a number of measurement
   quantities.

   In general, the MLRsearch specification does not require particular
   units to be used, but it is REQUIRED for the test report to state
   all the units.  For example, ratio quantities can be dimensionless
   numbers between zero and one, but may be expressed as percentages
   instead.

   For convenience, a group of quantities can be treated as a
   composite quantity.  One constituent of a composite quantity is
   called an attribute, and a group of attribute values is called an
   instance of that composite quantity.

   Some attributes are not independent from others, and they can be
   calculated from other attributes.  Such quantities are called
   derived quantities.
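
   As an informal illustration (not part of the specification), the
   attribute, instance and derived quantity terminology maps naturally
   onto a Python dataclass; the quantity chosen here is only an
   assumed example:

      from dataclasses import dataclass

      @dataclass
      class RatioQuantity:
          """A composite quantity with one attribute and one derived
          quantity."""
          ratio: float  # attribute, dimensionless, 0.0 to 1.0

          @property
          def percentage(self) -> float:
              """Derived quantity: the same value in other units."""
              return self.ratio * 100.0

      instance = RatioQuantity(ratio=0.005)
      print(instance.percentage)  # 0.5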

3.3.  Existing Terms

   RFC 1242 "Benchmarking Terminology for Network Interconnect
   Devices" contains basic definitions, and RFC 2544 "Benchmarking
   Methodology for Network Interconnect Devices" contains discussions
   of a number of terms and additional methodology requirements.
   RFC 2285 adds more terms and discussions, describing some known
   situations in a more precise way.

   All three documents should be consulted before attempting to make
   use of this document.

   Definitions of some central terms are copied and discussed in
   subsections.

3.3.1.  SUT

   Defined in [RFC2285] (section 3.1.2 System Under Test (SUT)) as
   follows.

   Definition:

      The collective set of network devices to which stimulus is
      offered as a single entity and response measured.

   Discussion:

   An SUT consisting of a single network device is also allowed.

3.3.2.  DUT

   Defined in [RFC2285] (section 3.1.1 Device Under Test (DUT)) as
   follows.

   Definition:

      The network forwarding device to which stimulus is offered and
      response measured.

   Discussion:

   The DUT, as a sub-component of the SUT, is only indirectly
   mentioned in the MLRsearch specification, but is of key relevance
   for its motivation.

3.3.3.  Trial

   A trial is the part of the test described in [RFC2544] (section 23.
   Trial description).

   Definition:

      A particular test consists of multiple trials.  Each trial
      returns one piece of information, for example the loss rate at a
      particular input frame rate.  Each trial consists of a number of
      phases:

      a) If the DUT is a router, send the routing update to the
      "input" port and pause two seconds to be sure that the routing
      has settled.

      b) Send the "learning frames" to the "output" port and wait 2
      seconds to be sure that the learning has settled.  Bridge
      learning frames are frames with source addresses that are the
      same as the destination addresses used by the test frames.
      Learning frames for other protocols are used to prime the
      address resolution tables in the DUT.  The formats of the
      learning frame that should be used are shown in the Test Frame
      Formats document.

      c) Run the test trial.

      d) Wait for two seconds for any residual frames to be received.

      e) Wait for at least five seconds for the DUT to restabilize.

   Discussion:

   The definition describes some traits, and it is not clear whether
   all of them are REQUIRED, or whether some of them are only
   RECOMMENDED.

   For the purposes of the MLRsearch specification, it is ALLOWED for
   the test procedure to deviate from the [RFC2544] description, but
   any such deviation MUST be made explicit in the test report.

   Trials are the only stimuli the SUT is expected to experience
   during the search.

   In some discussion paragraphs, it is useful to consider the traffic
   as sent and received by a tester, as implicitly defined in
   [RFC2544] (section 6. Test set up).

   An example of deviation from [RFC2544] is using shorter wait times.

3.4.  Trial Terms

   This section defines new terms, and redefines existing ones, for
   quantities relevant as inputs or outputs of a trial, as used by the
   Measurer component.

3.4.1.  Trial Duration

   Definition:

   Trial duration is the intended duration of the traffic for a trial.

   Discussion:

   In general, this quantity does not include any preparation nor
   waiting described in [RFC2544] (section 23. Trial description).

   While any positive real value may be provided, some Measurer
   implementations MAY limit the possible values, e.g. by rounding
   down to the nearest integer number of seconds.  In that case, it is
   RECOMMENDED to give such inputs to the Controller, so the
   Controller only proposes the accepted values.  Alternatively, the
   test report MUST present the rounded values as Search Goal
   attributes.

3.4.2.  Trial Load

   Definition:

   The trial load is the intended load for a trial.

   Discussion:

   For test report purposes, it is assumed that this is a constant
   load by default.  This MAY be only an average load, e.g. when the
   traffic is intended to be bursty, as suggested in [RFC2544]
   (section 21. Bursty traffic), but the test report MUST explicitly
   mention how non-constant the traffic is.

   Trial load is the quantity defined as Constant Load in [RFC1242]
   (section 3.4 Constant Load), Data Rate in [RFC2544] (section 14.
   Bidirectional traffic) and Intended Load in [RFC2285] (section
   3.5.1 Intended load (Iload)).  All three definitions specify that
   this value applies to one (input or output) interface.

   For test report purposes, a multi-interface aggregate load MAY be
   reported; this is understood as the same quantity expressed using
   different units.  From the report it MUST be clear whether a
   particular trial load value is per one interface, or an aggregate
   over all interfaces.

   Similarly to trial duration, some Measurers may limit the possible
   values of trial load.  Contrary to trial duration, the test report
   is NOT REQUIRED to document such behavior.

   It is ALLOWED to combine trial load and trial duration in a way
   that would not be possible to achieve using any integer number of
   data frames.

3.4.3.  Trial Input

   Definition:

   Trial Input is a composite quantity, consisting of two attributes:
   trial duration and trial load.

   Discussion:

   When talking about multiple trials, it is common to say "Trial
   Inputs" to denote all corresponding Trial Input instances.

   A Trial Input instance acts as the input for one call of the
   Measurer component.

   Contrary to other composite quantities, MLRsearch implementations
   are NOT ALLOWED to add optional attributes here.  This improves
   interoperability between various implementations of the Controller
   and the Measurer.

3.4.4.  Traffic Profile

   Definition:

   Traffic profile is a composite quantity containing attributes other
   than trial load and trial duration, needed for unique determination
   of the trial to be performed.

   Discussion:

   All its attributes are assumed to be constant during the search,
   and the composite is configured on the Measurer by the Manager
   before the search starts.  This is why the traffic profile is not
   part of the Trial Input.

   As a consequence, implementations of the Manager and the Measurer
   must be aware of their common set of capabilities, so that the
   traffic profile uniquely defines the traffic during the search.
   The important fact is that none of those capabilities have to be
   known by the Controller implementations.

   The traffic profile SHOULD contain some specific quantities; for
   example, [RFC2544] (section 9. Frame sizes) governs data link frame
   size as defined in [RFC1242] (section 3.5 Data link frame size).

   Several more specific quantities may be RECOMMENDED, depending on
   media type.  For example, [RFC2544] (Appendix C) lists frame
   formats and protocol addresses, as recommended in [RFC2544]
   (section 8. Frame formats) and [RFC2544] (section 12. Protocol
   addresses).

   Depending on SUT configuration, e.g. when testing specific
   protocols, additional attributes MUST be included in the traffic
   profile and in the test report.

   Example: [RFC8219] (section 5.3. Traffic Setup) introduces traffic
   setups consisting of a mix of IPv4 and IPv6 traffic - the implied
   traffic profile therefore must include an attribute for their
   percentage.

   Other traffic properties that need to be somehow specified in the
   Traffic Profile include: [RFC2544] (section 14. Bidirectional
   traffic), [RFC2285] (section 3.3.3 Fully meshed traffic), and
   [RFC2544] (section 11. Modifiers).

3.4.5.  Trial Forwarding Ratio

   Definition:

   The trial forwarding ratio is a dimensionless floating point value.
   It MUST range between 0.0 and 1.0, both inclusive.  It is
   calculated by dividing the number of frames successfully forwarded
   by the SUT by the total number of frames expected to be forwarded
   during the trial.

   Discussion:

   For most traffic profiles, "expected to be forwarded" means
   "intended to get transmitted from Tester towards SUT".

   Trial forwarding ratio MAY be expressed in other units (e.g. as a
   percentage) in the test report.

   Note that, contrary to loads, frame counts used to compute the
   trial forwarding ratio are aggregates over all SUT output
   interfaces.

   Questions around what is the correct number of frames that should
   have been forwarded are generally outside of the scope of this
   document.

3.4.6.  Trial Loss Ratio

   Definition:

   The Trial Loss Ratio is equal to one minus the trial forwarding
   ratio.

   Discussion:

   It is 100% minus the trial forwarding ratio, when expressed as a
   percentage.

   This is almost identical to Frame Loss Rate of [RFC1242] (section
   3.6 Frame Loss Rate); the only minor difference is that the Trial
   Loss Ratio does not need to be expressed as a percentage.

3.4.7.  Trial Forwarding Rate

   Definition:

   The trial forwarding rate is a derived quantity, calculated by
   multiplying the trial load by the trial forwarding ratio.

   Discussion:

   It is important to note that while similar, this quantity is not
   identical to the Forwarding Rate as defined in [RFC2285] (section
   3.6.1 Forwarding rate (FR)).  The latter is specific to one output
   interface only, whereas the trial forwarding ratio is based on
   frame counts aggregated over all SUT output interfaces.

3.4.8.  Trial Effective Duration

   Definition:

   Trial effective duration is a time quantity related to the trial,
   by default equal to the trial duration.

   Discussion:

   This is an optional feature.  If the Measurer does not return any
   trial effective duration value, the Controller MUST use the trial
   duration value instead.

   Trial effective duration may be any time quantity chosen by the
   Measurer to be used for time-based decisions in the Controller.

   The test report MUST explain how the Measurer computes the returned
   trial effective duration values, if they are not always equal to
   the trial duration.

   This feature can be beneficial for users who wish to manage the
   overall search duration, rather than solely the traffic portion of
   it.  Simply measure the duration of the whole trial (waits
   included) and use that as the trial effective duration.

   Also, this is a way for the Measurer to inform the Controller about
   its surprising behavior, for example when rounding the trial
   duration value.

3.4.9.  Trial Output

   Definition:

   Trial Output is a composite quantity.  The REQUIRED attributes are
   Trial Loss Ratio, trial effective duration and trial forwarding
   rate.

   Discussion:

   When talking about multiple trials, it is common to say "Trial
   Outputs" to denote all corresponding Trial Output instances.

   Implementations may provide additional (optional) attributes.  The
   Controller implementations MUST ignore values of any optional
   attribute they are not familiar with, except when passing Trial
   Output instances to the Manager.

   Example of an optional attribute: the aggregate number of frames
   expected to be forwarded during the trial, especially if it is not
   just (a rounded-up value) implied by trial load and trial duration.

   While [RFC2285] (section 3.5.2 Offered load (Oload)) requires the
   offered load value to be reported for forwarding rate measurements,
   it is NOT REQUIRED in the MLRsearch specification.

3.4.10.  Trial Result

   Definition:

   Trial result is a composite quantity, consisting of the Trial Input
   and the Trial Output.

   Discussion:

   When talking about multiple trials, it is common to say "trial
   results" to denote all corresponding trial result instances.

   While implementations SHOULD NOT include additional attributes with
   independent values, they MAY include derived quantities.
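
   As an informal illustration (not part of the specification), the
   trial terms of this section could be modeled as Python dataclasses;
   the names and the helper function are only assumptions of this
   sketch:

      from dataclasses import dataclass

      @dataclass(frozen=True)
      class TrialInput:
          """Exactly two attributes, as required by the spec."""
          trial_duration: float  # seconds
          trial_load: float      # e.g. frames per second per interface

      @dataclass(frozen=True)
      class TrialOutput:
          """The three REQUIRED attributes of Trial Output."""
          trial_loss_ratio: float          # 1.0 - forwarding ratio
          trial_effective_duration: float  # seconds
          trial_forwarding_rate: float     # load * forwarding ratio

      @dataclass(frozen=True)
      class TrialResult:
          """Trial Input together with its Trial Output."""
          trial_input: TrialInput
          trial_output: TrialOutput

      def make_trial_output(trial_input, frames_expected,
                            frames_forwarded, effective_duration=None):
          """Derive the output quantities from assumed frame counts."""
          forwarding_ratio = frames_forwarded / frames_expected
          return TrialOutput(
              trial_loss_ratio=1.0 - forwarding_ratio,
              trial_effective_duration=(
                  effective_duration if effective_duration is not None
                  else trial_input.trial_duration),
              trial_forwarding_rate=(
                  trial_input.trial_load * forwarding_ratio),
          )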

3.5.  Goal Terms

   This section defines new terms, and redefines existing ones, for
   quantities indirectly relevant for inputs or outputs of the
   Controller component.

   Several goal attributes are defined before introducing the main
   composite quantity: the Search Goal.

3.5.1.  Goal Final Trial Duration

   Definition:

   A threshold value for trial durations.

   Discussion:

   This attribute value MUST be positive.

   A trial with a Trial Duration at least as long as the Goal Final
   Trial Duration is called a full-length trial (with respect to the
   given Search Goal).

   A trial that is not full-length is called a short trial.

   Informally, while MLRsearch is allowed to perform short trials, the
   results from such short trials have only limited impact on search
   results.

   One trial may be full-length for some Search Goals, but not for
   others.

   The full relation of this goal attribute to the Controller Output
   is defined later in this document, in subsections of [Goal Result]
   (#Goal-Result).  For example, the Conditional Throughput for this
   goal is computed only from full-length trial results.

3.5.2.  Goal Duration Sum

   Definition:

   A threshold value for a particular sum of trial effective
   durations.

   Discussion:

   This attribute value MUST be positive.

   Informally, even when looking only at full-length trials, MLRsearch
   may spend up to this time measuring the same load value.

   If the Goal Duration Sum is larger than the Goal Final Trial
   Duration, multiple full-length trials may need to be performed at
   the same load.

   See [TST009 Example] (#TST009-Example) for an example where the
   possibility of multiple full-length trials at the same load is
   intended.

   A Goal Duration Sum value lower than the Goal Final Trial Duration
   (of the same goal) could save some search time, but is NOT
   RECOMMENDED.  See [Relevant Upper Bound] (#Relevant-Upper-Bound)
   for a partial explanation.

3.5.3.  Goal Loss Ratio

   Definition:

   A threshold value for Trial Loss Ratios.

   Discussion:

   The attribute value MUST be non-negative and smaller than one.

   A trial with a Trial Loss Ratio larger than a Goal Loss Ratio value
   is called a lossy trial, with respect to the given Search Goal.

   Informally, if a load causes too many lossy trials, the Relevant
   Lower Bound for this goal will be smaller than that load.

   If a trial is not lossy, it is called a low-loss trial, or
   (specifically for a zero Goal Loss Ratio value) a zero-loss trial.

3.5.4.  Goal Exceed Ratio

   Definition:

   A threshold value for a particular ratio of sums of Trial Effective
   Durations.

   Discussion:

   The attribute value MUST be non-negative and smaller than one.

   See later sections for details on which sums.
   Specifically, the direct usage is only in [Appendix A: Load
   Classification] (#Appendix-A:-Load-Classification) and [Appendix B:
   Conditional Throughput] (#Appendix-B:-Conditional-Throughput).  The
   impact of that usage is discussed in subsections leading to [Goal
   Result] (#Goal-Result).

   Informally, the impact of lossy trials is controlled by this value.
   Effectively, the Goal Exceed Ratio is the percentage of full-length
   trials that may be lossy without the load being classified as the
   [Relevant Upper Bound] (#Relevant-Upper-Bound).

3.5.5.  Goal Width

   Definition:

   A value used as a threshold for deciding whether two trial load
   values are close enough.

   Discussion:

   If present, the value MUST be positive.

   Informally, this acts as a stopping condition, controlling the
   precision of the search.  The search stops if every goal has
   reached its precision.

   Implementations without this attribute MUST give the Controller
   other ways to control the search stopping conditions.

   Absolute load difference and relative load difference are two
   popular choices, but implementations may choose a different way to
   specify width.

   The test report MUST make it clear what specific quantity is used
   as Goal Width.

   It is RECOMMENDED to set the Goal Width (as a relative difference)
   to a value no smaller than the Goal Loss Ratio.  (The reason is not
   obvious; see [Throughput] (#Throughput) if interested.)

3.5.6.  Search Goal

   Definition:

   The Search Goal is a composite quantity consisting of several
   attributes, some of them required.

   Required attributes:

   -  Goal Final Trial Duration

   -  Goal Duration Sum

   -  Goal Loss Ratio

   -  Goal Exceed Ratio

   Optional attribute:

   -  Goal Width

   Discussion:

   Implementations MAY add their own attributes.  Those additional
   attributes may be required by the implementation even if they are
   not required by the MLRsearch specification.  But it is RECOMMENDED
   for those implementations to support missing values by computing
   reasonable defaults.

   The meaning of the listed attributes is formally given only by
   their indirect effect on the search results.

   Informally, later sections provide additional intuitions and
   examples of the Search Goal attribute values.

   An example of additional attributes required by some
   implementations is Goal Initial Trial Duration, together with
   another attribute that controls possible intermediate Trial
   Duration values.  The reasonable default in this case is using the
   Goal Final Trial Duration and no intermediate values.

3.5.7.  Controller Input

   Definition:

   Controller Input is a composite quantity required as an input for
   the Controller.  The only REQUIRED attribute is a list of Search
   Goal instances.

   Discussion:

   MLRsearch implementations MAY use additional attributes.  Those
   additional attributes may be required by the implementation even if
   they are not required by the MLRsearch specification.

   Formally, the Manager does not apply any Controller configuration
   apart from one Controller Input instance.

   For example, the Traffic Profile is configured on the Measurer by
   the Manager (without explicit assistance of the Controller).

   The order of Search Goal instances in the list SHOULD NOT have a
   big impact on the Controller Output (see section [Controller
   Output] (#Controller-Output)), but MLRsearch implementations MAY
   base their behavior on the order of Search Goal instances in the
   list.

   An example of an optional attribute (outside the list of Search
   Goals) required by some implementations is Max Load.  While this is
   a frequently used configuration parameter, already governed by
   [RFC2544] (section 20. Maximum frame rate) and [RFC2285] (section
   3.5.3 Maximum offered load (MOL)), some implementations may detect
   or discover it instead.

   In the MLRsearch specification, the [Relevant Upper Bound]
   (#Relevant-Upper-Bound) is added as a required attribute precisely
   because that makes the search result independent of the Max Load
   value.
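
   As an informal illustration (not part of the specification), the
   Search Goal and Controller Input composites could be modeled as
   Python dataclasses; the attribute names and the default handling of
   the optional width are only assumptions of this sketch:

      from dataclasses import dataclass
      from typing import List, Optional

      @dataclass(frozen=True)
      class SearchGoal:
          """The four required attributes plus the optional width."""
          goal_final_trial_duration: float    # seconds, positive
          goal_duration_sum: float            # seconds, positive
          goal_loss_ratio: float              # 0.0 <= value < 1.0
          goal_exceed_ratio: float            # 0.0 <= value < 1.0
          goal_width: Optional[float] = None  # e.g. relative width

      @dataclass(frozen=True)
      class ControllerInput:
          """The list of Search Goal instances is the only REQUIRED
          attribute."""
          search_goals: List[SearchGoal]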

3.6.  Search Goal Examples

3.6.1.  RFC2544 Goal

   The following set of values makes the search result unconditionally
   compliant with [RFC2544] (section 24 Trial duration):

   *  Goal Final Trial Duration = 60 seconds

   *  Goal Duration Sum = 60 seconds

   *  Goal Loss Ratio = 0%

   *  Goal Exceed Ratio = 0%

   The latter two attributes are enough to make the search goal
   conditionally compliant; adding the first attribute makes it
   unconditionally compliant.

   The second attribute (Goal Duration Sum) only prevents MLRsearch
   from repeating zero-loss full-length trials.

   A non-zero exceed ratio could prolong the search and allow loss
   inversion between a lower-load lossy short trial and a higher-load
   full-length zero-loss trial.  From [RFC2544] alone, it is not clear
   whether that higher load could be considered as compliant
   throughput.

3.6.2.  TST009 Goal

   One of the alternatives to [RFC2544] is described in [TST009]
   (section 12.3.3 Binary search with loss verification).  The idea
   there is to repeat lossy trials, hoping for zero loss on the second
   try, so the results are closer to the noiseless end of the
   performance spectrum, and are more repeatable and comparable.

   Only the variant with "z = infinity" is achievable with MLRsearch.

   For example, for the "r = 2" variant, the following search goal
   should be used:

   *  Goal Final Trial Duration = 60 seconds

   *  Goal Duration Sum = 120 seconds

   *  Goal Loss Ratio = 0%

   *  Goal Exceed Ratio = 50%

   If the first 60s trial has zero loss, it is enough for MLRsearch to
   stop measuring at that load, as even a second lossy trial would
   still fit within the exceed ratio.

   But if the first trial is lossy, MLRsearch needs to perform also
   the second trial to classify that load.  As the Goal Duration Sum
   is twice as long as the Goal Final Trial Duration, a third
   full-length trial is never needed.
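
   As an informal illustration (not part of the specification), the
   two example goals above can be written as instances of the
   SearchGoal sketch from the previous illustration:

      # Unconditionally RFC2544-compliant goal.
      rfc2544_goal = SearchGoal(
          goal_final_trial_duration=60.0,
          goal_duration_sum=60.0,
          goal_loss_ratio=0.0,
          goal_exceed_ratio=0.0,
      )

      # TST009 "binary search with loss verification", r = 2 variant.
      tst009_goal = SearchGoal(
          goal_final_trial_duration=60.0,
          goal_duration_sum=120.0,
          goal_loss_ratio=0.0,
          goal_exceed_ratio=0.5,
      )

      # A single search can pursue both goals at once.
      controller_input = ControllerInput(
          search_goals=[rfc2544_goal, tst009_goal])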

3.7.  Result Terms

   Before defining the output of the Controller, it is useful to
   define what the Goal Result is.

   The Goal Result is a composite quantity.  The following subsections
   define its attributes first, before describing the Goal Result
   quantity itself.

   There is a correspondence between Search Goals and Goal Results.
   Most of the following subsections refer to a given Search Goal when
   defining attributes of the Goal Result.  Conversely, at the end of
   the search, each Search Goal has its corresponding Goal Result.

   Conceptually, the search can be seen as a process of load
   classification, where the Controller attempts to classify some
   loads as an Upper Bound or a Lower Bound with respect to some
   Search Goal.

   Before defining the real attributes of the Goal Result, it is
   useful to define bounds in general.

3.7.1.  Relevant Upper Bound

   Definition:

   The Relevant Upper Bound is the smallest trial load value that is
   classified at the end of the search as an upper bound (see
   [Appendix A: Load Classification] (#Appendix-A:-Load-
   Classification)) for the given Search Goal.

   Discussion:

   One search goal can have many different loads classified as an
   upper bound.  At the end of the search, one of those loads will be
   the smallest, becoming the relevant upper bound for that goal.

   In more detail, the set of all trial outputs (both short and
   full-length, enough of them according to the Goal Duration Sum)
   performed at that smallest load failed to uphold all the
   requirements of the given Search Goal, mainly the Goal Loss Ratio
   in combination with the Goal Exceed Ratio.

   If the Max Load does not cause enough lossy trials, the Relevant
   Upper Bound does not exist.  Conversely, if the Relevant Upper
   Bound exists, it is not affected by the Max Load value.

3.7.2.  Relevant Lower Bound

   Definition:

   The Relevant Lower Bound is the largest trial load value among
   those smaller than the Relevant Upper Bound, that got classified at
   the end of the search as a lower bound (see [Appendix A: Load
   Classification] (#Appendix-A:-Load-Classification)) for the given
   Search Goal.

   Discussion:

   Only among loads smaller than the relevant upper bound, the largest
   load becomes the relevant lower bound.  With loss inversion, the
   stricter upper bound matters.

   In more detail, the set of all trial outputs (both short and
   full-length, enough of them according to the Goal Duration Sum)
   performed at that largest load managed to uphold all the
   requirements of the given Search Goal, mainly the Goal Loss Ratio
   in combination with the Goal Exceed Ratio.

   If no load had enough low-loss trials, the relevant lower bound MAY
   not exist.

   Strictly speaking, if the Relevant Upper Bound does not exist, the
   Relevant Lower Bound also does not exist.  In that case, the Max
   Load is classified as a lower bound, but it is not clear whether a
   higher lower bound would be found if the search used a higher Max
   Load value.

   For a regular Goal Result, the distance between the Relevant Lower
   Bound and the Relevant Upper Bound MUST NOT be larger than the Goal
   Width, if the implementation offers width as a goal attribute.

   Searching for another search goal may cause a loss inversion
   phenomenon, where a lower load is classified as an upper bound, but
   also a higher load is classified as a lower bound for the same
   search goal.  The definition of the Relevant Lower Bound ignores
   such high lower bounds.

3.7.3.  Conditional Throughput

   Definition:

   The Conditional Throughput is the quantity computed as described in
   [Appendix B: Conditional Throughput] (#Appendix-B:-Conditional-
   Throughput), evaluated at the Relevant Lower Bound of the given
   Search Goal at the end of the search.

   Discussion:

   Informally, this is a typical trial forwarding rate, expected to be
   seen at the Relevant Lower Bound of the given Search Goal.

   But frequently it is only a conservative estimate thereof, as
   MLRsearch implementations tend to stop gathering more data as soon
   as they confirm the value cannot get worse than this estimate
   within the Goal Duration Sum.

   This value is RECOMMENDED to be used when evaluating repeatability
   and comparability of different MLRsearch implementations.
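
   As an informal illustration (not part of the specification), one
   possible reading of the Conditional Throughput idea is a
   duration-weighted quantile of full-length trial forwarding rates,
   with the quantile given by the Goal Exceed Ratio.  The normative
   algorithm is the one in Appendix B; this sketch only assumes its
   general shape:

      def conditional_throughput(full_length_trials, goal_exceed_ratio):
          """full_length_trials: (forwarding_rate, effective_duration)
          pairs measured at the Relevant Lower Bound.

          Returns the rate such that at most the exceed-ratio share of
          the duration sum saw a worse forwarding rate."""
          trials = sorted(full_length_trials)  # worst rate first
          total = sum(duration for _, duration in trials)
          remaining = total * goal_exceed_ratio  # tolerated as worse
          for rate, duration in trials:
              remaining -= duration
              if remaining < 0.0:
                  return rate
          return trials[-1][0]  # unreachable for non-empty input

      # TST009-style goal (exceed ratio 0.5), one lossy and one
      # zero-loss 60s trial: the zero-loss trial's rate is returned.
      print(conditional_throughput([(9.9e6, 60.0), (10.0e6, 60.0)], 0.5))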

3.7.4.  Goal Result

   Definition:

   The Goal Result is a composite quantity consisting of several
   attributes.  Relevant Upper Bound and Relevant Lower Bound are
   REQUIRED attributes, Conditional Throughput is a RECOMMENDED
   attribute.

   Discussion:

   Depending on SUT behavior, it is possible that one or both relevant
   bounds do not exist.  A goal result instance where the required
   attribute values exist is informally called a Regular Goal Result
   instance, so we can say some goals reached Irregular Goal Results.

   A typical Irregular Goal Result is when all trials at the Max Load
   have zero loss, as the Relevant Upper Bound does not exist in that
   case.

   It is RECOMMENDED that the test report displays such results
   appropriately, although the MLRsearch specification does not
   prescribe how.

   Anything else regarding Irregular Goal Results, including their
   role in stopping conditions of the search, is outside the scope of
   this document.

3.7.5.  Search Result

   Definition:

   The Search Result is a single composite object that maps each
   Search Goal instance to a corresponding Goal Result instance.

   Discussion:

   Alternatively, the Search Result can be implemented as an ordered
   list of the Goal Result instances, matching the order of Search
   Goal instances.

   The Search Result (as a mapping) MUST map from all the Search Goal
   instances present in the Controller Input.

3.7.6.  Controller Output

   Definition:

   The Controller Output is a composite quantity returned from the
   Controller to the Manager at the end of the search.  The Search
   Result instance is its only REQUIRED attribute.

   Discussion:

   An MLRsearch implementation MAY return additional data in the
   Controller Output.
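
   As an informal illustration (not part of the specification), the
   result composites could be modeled as follows, reusing the
   SearchGoal sketch from an earlier illustration; Optional fields
   stand for bounds that may not exist (Irregular Goal Results):

      from dataclasses import dataclass
      from typing import Dict, Optional

      @dataclass(frozen=True)
      class GoalResult:
          """Bounds are REQUIRED for a Regular Goal Result; None here
          marks an Irregular Goal Result."""
          relevant_upper_bound: Optional[float]
          relevant_lower_bound: Optional[float]
          conditional_throughput: Optional[float]  # RECOMMENDED

      @dataclass(frozen=True)
      class ControllerOutput:
          """The Search Result mapping is the only REQUIRED
          attribute; it must cover every Search Goal instance."""
          search_result: Dict[SearchGoal, GoalResult]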

3.8.  MLRsearch Architecture

   The MLRsearch architecture consists of three main system
   components: the Manager, the Controller, and the Measurer.

   The architecture also implies the presence of other components,
   such as the SUT and the Tester (as a sub-component of the
   Measurer).

   Protocols of communication between components are generally left
   unspecified.  For example, when the MLRsearch specification
   mentions "Controller calls Measurer", it is possible that the
   Controller notifies the Manager to call the Measurer indirectly
   instead.  This way the Measurer implementations can be fully
   independent from the Controller implementations, e.g. programmed in
   different programming languages.

3.8.1.  Measurer

   Definition:

   The Measurer is an abstract system component that, when called with
   a [Trial Input] (#Trial-Input) instance, performs one [Trial]
   (#Trial), and returns a [Trial Output] (#Trial-Output) instance.

   Discussion:

   This definition assumes the Measurer is already initialized.  In
   practice, there may be additional steps before the search, e.g.
   when the Manager configures the traffic profile (either on the
   Measurer or on its tester sub-component directly) and performs a
   warmup (if the tester requires one).

   It is the responsibility of the Measurer implementation to uphold
   any requirements and assumptions present in the MLRsearch
   specification, e.g. the trial forwarding ratio not being larger
   than one.

   Implementers have some freedom.  For example, [RFC2544] (section
   10. Verifying received frames) gives some suggestions (but not
   requirements) related to duplicated or reordered frames.
   Implementations are RECOMMENDED to document their behavior related
   to such freedoms in as detailed a way as possible.

   It is RECOMMENDED to benchmark the test equipment first, e.g.
   connect sender and receiver directly (without any SUT in the path),
   find a load value that guarantees the offered load is not too far
   from the intended load, and use that value as the Max Load value.
   When testing the real SUT, it is RECOMMENDED to turn any big
   difference between the intended load and the offered load into an
   increased Trial Loss Ratio.

   Neither of the two recommendations is made into a requirement,
   because it is not easy to tell when the difference is big enough,
   in a way that would be disentangled from other Measurer freedoms.

3.8.2.  Controller

   Definition:

   The Controller is an abstract system component that, when called
   with a Controller Input instance, repeatedly computes Trial Input
   instances for the Measurer, obtains corresponding Trial Output
   instances, and eventually returns a Controller Output instance.

   Discussion:

   Informally, the Controller has big freedom in the selection of
   Trial Inputs, and the implementations want to achieve the Search
   Goals in the shortest expected time.

   The Controller's role in optimizing the overall search time
   distinguishes MLRsearch algorithms from simpler search procedures.

   Informally, each implementation can have different stopping
   conditions.  Goal Width is only one example.  In practice,
   implementation details do not matter, as long as the Goal Results
   are regular.

3.8.3.  Manager

   Definition:

   The Manager is an abstract system component that is responsible for
   configuring other components, calling the Controller component
   once, and for creating the test report following the reporting
   format as defined in [RFC2544] (section 26. Benchmarking tests).

   Discussion:

   The Manager initializes the SUT, the Measurer (and the Tester if
   independent) with their intended configurations before calling the
   Controller.

   The Manager does not need to be able to tweak any Search Goal
   attributes, but it MUST report all applied attribute values even if
   not tweaked.

   In principle, there should be a "user" (human or CI) that "starts"
   or "calls" the Manager and receives the report.  The Manager MAY be
   able to be called more than once this way.
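
   As an informal illustration (not part of the specification), the
   three components can be seen as the following Python interfaces,
   reusing the types sketched in earlier illustrations; the method
   names are only assumptions of this sketch:

      from abc import ABC, abstractmethod

      class AbstractMeasurer(ABC):
          """Performs one trial per call."""
          @abstractmethod
          def measure(self, trial_input: TrialInput) -> TrialOutput:
              ...

      class AbstractController(ABC):
          """Repeatedly calls the Measurer, returns the output."""
          @abstractmethod
          def search(self, controller_input: ControllerInput,
                     measurer: AbstractMeasurer) -> ControllerOutput:
              ...

      class Manager:
          """Configures components, calls the Controller exactly
          once."""
          def __init__(self, controller, measurer):
              self.controller = controller
              self.measurer = measurer  # assumed already configured

          def run(self, controller_input: ControllerInput):
              output = self.controller.search(
                  controller_input, self.measurer)
              return output  # a real Manager would render a report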
   These components can be seen as abstractions present in any testing
   procedure.  For example, there can be a single component acting
   both as the Manager and the Controller, but as long as the values
   of the required attributes of Search Goals and Goal Results are
   visible in the test report, the Controller Input and Controller
   Output instances are implied.

   For example, any setup for conditionally (or unconditionally)
   compliant [RFC2544] throughput testing can be understood as an
   MLRsearch architecture, assuming there is enough data to
   reconstruct the Relevant Upper Bound.

   See the [RFC2544 Goal] (#RFC2544-Goal) subsection for the
   equivalent Search Goal.

   Any test procedure that can be understood as (one call to the
   Manager of) the MLRsearch architecture is said to be compliant with
   the MLRsearch specification.

4.  Additional Considerations

   This section focuses on additional considerations, intuitions and
   motivations pertaining to the MLRsearch methodology.

4.1.  MLRsearch Versions

   The MLRsearch algorithm has been developed in a code-first
   approach: a Python library was created, debugged, used in
   production, and published on PyPI before the first (even informal)
   descriptions appeared.

   But the code (and hence the description) was evolving over time.
   Multiple versions of the library were used over the past several
   years, and later code was usually not compatible with earlier
   descriptions.

   The code in (some version of) the MLRsearch library fully
   determines the search process (for a given set of configuration
   parameters), leaving no space for deviations.

   This historic meaning of MLRsearch, as a family of search algorithm
   implementations, leaves plenty of space for future improvements, at
   the cost of poor comparability of results between search algorithm
   implementations.

   There are two competing needs.  There is the need for
   standardization in areas critical to comparability.  There is also
   the need to allow flexibility for implementations to innovate and
   improve in other areas.  This document defines MLRsearch as a new
   specification in a manner that aims to fairly balance both needs.

4.2.  Stopping Conditions

   [RFC2544] prescribes that after performing one trial at a specific
   offered load, the next offered load should be larger or smaller,
   based on frame loss.

   The usual implementation uses binary search.  Here a lossy trial
   becomes a new upper bound, a lossless trial becomes a new lower
   bound.  The span of values between the tightest lower bound and the
   tightest upper bound (including both values) forms an interval of
   possible results, and after each trial the width of that interval
   halves.

   Usually the binary search implementation tracks only the two
   tightest bounds, simply calling them bounds.  But the old values
   still remain valid bounds, just not as tight as the new ones.

   After some number of trials, the tightest lower bound becomes the
   throughput.  [RFC2544] does not specify when, if ever, the search
   should stop.
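   As an illustration, here is a minimal sketch of such a binary
   search, with a width-based stopping condition already added.  The
   measure_is_lossy callable and the assumption that min_load is a
   valid lower bound and max_load a valid upper bound are
   simplifications made just for this sketch:

   def binary_search(measure_is_lossy, min_load, max_load, width):
       """[RFC2544]-style bisection, one full-length trial per load."""
       lower, upper = min_load, max_load
       while upper - lower > width:
           middle = (lower + upper) / 2.0
           if measure_is_lossy(middle):
               upper = middle  # lossy trial: new tightest upper bound
           else:
               lower = middle  # lossless trial: new tightest lower bound
       # The tightest lower bound becomes the throughput.
       return lower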
   MLRsearch introduces the concept of [Goal Width] (#Goal-Width).
   The search stops when the distance between the tightest upper bound
   and the tightest lower bound is smaller than a user-configured
   value, called Goal Width from now on.  In other words, the interval
   width at the end of the search has to be no larger than the Goal
   Width.

   This Goal Width value therefore determines the precision of the
   result.  Because the MLRsearch specification requires a particular
   structure of the result (see the [Trial Result] (#Trial-Result)
   section), the result itself contains enough information to
   determine its precision, so it is not required to report the Goal
   Width value.

   This allows MLRsearch implementations to use stopping conditions
   different from Goal Width.

4.3.  Load Classification

   MLRsearch keeps the basic logic of binary search (tracking the
   tightest bounds, measuring at the middle), perhaps with minor
   technical differences.

   The MLRsearch algorithm chooses an intended load (as opposed to the
   offered load), the interval between bounds does not need to be
   split into exactly two equal halves, and the final reported
   structure specifies both bounds.

   The biggest difference is that to classify a load as an upper or
   lower bound, MLRsearch may need more than one trial (depending on
   configuration options) to be performed at the same intended load.

   In consequence, even a load that already has a few trial results
   may still be classified as undecided, neither a lower bound nor an
   upper bound.

   An explanation of the classification logic is given in the next
   section [Logic of Load Classification] (#Logic-of-Load-
   Classification), as it heavily relies on other subsections of this
   section.

   For repeatability and comparability reasons, it is important that,
   given a set of trial results, all implementations of MLRsearch
   classify the load equivalently.

4.4.  Loss Ratios

   Another difference between MLRsearch and [RFC2544] binary search is
   in the goals of the search.  [RFC2544] has a single goal, based on
   classifying full-length trials as either lossless or lossy.

   MLRsearch, as the name suggests, can search for multiple goals,
   differing in their loss ratios.  The precise definition of the Goal
   Loss Ratio will be given later.  The [RFC2544] throughput goal then
   simply becomes a zero Goal Loss Ratio.  Different goals may also
   have different Goal Widths.
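   For illustration, multiple goals might be expressed along the
   following lines.  The SearchGoal container and all attribute values
   below are hypothetical examples, not defaults mandated by this
   document:

   from dataclasses import dataclass

   @dataclass(frozen=True)
   class SearchGoal:
       """Illustrative container for selected Search Goal attributes."""
       loss_ratio: float            # Goal Loss Ratio
       exceed_ratio: float          # Goal Exceed Ratio
       width: float                 # Goal Width, relative
       final_trial_duration: float  # seconds
       duration_sum: float          # Goal Duration Sum, seconds

   goals = [
       # An [RFC2544]-like zero-loss goal using 60-second trials.
       SearchGoal(0.0, 0.0, 0.005, 60.0, 60.0),
       # A 0.5% loss goal tolerating half of the trial seconds bad.
       SearchGoal(0.005, 0.5, 0.005, 1.0, 21.0),
   ]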
   A set of trial results for one specific intended load value can
   classify the load as an upper bound for some goals, but a lower
   bound for some other goals, and undecided for the rest of the
   goals.

   Therefore, the load classification depends not only on trial
   results, but also on the goal.  The overall search procedure
   becomes more complicated when compared to binary search with a
   single goal, but most of the complications do not affect the final
   result, except for one phenomenon: loss inversion.

4.5.  Loss Inversion

   In [RFC2544] throughput search using bisection, any load with a
   lossy trial becomes a hard upper bound, meaning every subsequent
   trial has a smaller intended load.

   But in MLRsearch, a load that is classified as an upper bound for
   one goal may still be a lower bound for another goal, and due to
   the other goal MLRsearch will probably perform trials at even
   higher loads.  What to do when all such higher load trials happen
   to have zero loss?  Does it mean the earlier upper bound was not
   real?  Does it mean the later lossless trials are not considered a
   lower bound?  Surely we do not want to have an upper bound at a
   load smaller than a lower bound.

   MLRsearch is conservative in these situations.  The upper bound is
   considered real, and the lossless trials at higher loads are
   considered to be a coincidence, at least when computing the final
   result.

   This is formalized using new notions, the [Relevant Upper Bound]
   (#Relevant-Upper-Bound) and the [Relevant Lower Bound] (#Relevant-
   Lower-Bound).  Load classification is still based just on the set
   of trial results at a given intended load (trials at other loads
   are ignored), making it possible to have a lower load classified as
   an upper bound, and a higher load classified as a lower bound (for
   the same goal).  The Relevant Upper Bound (for a goal) is the
   smallest load classified as an upper bound.  But the Relevant Lower
   Bound is not simply the largest among lower bounds.  It is the
   largest load among loads that are lower bounds while also being
   smaller than the Relevant Upper Bound.

   With these definitions, the Relevant Lower Bound is always smaller
   than the Relevant Upper Bound (if both exist), and the two relevant
   bounds are used analogously to the two tightest bounds in the
   binary search.  When they are less than the Goal Width apart, the
   relevant bounds are used in the output.

   One consequence is that every trial result can have an impact on
   the search result.  That means if your SUT (or your traffic
   generator) needs a warmup, be sure to warm it up before starting
   the search.

4.6.  Exceed Ratio

   The idea of performing multiple trials at the same load comes from
   a model where some trial results (those with high loss) are
   affected by infrequent effects, causing poor repeatability of
   [RFC2544] throughput results.  See the discussion about the
   noiseful and noiseless ends of the SUT performance spectrum in
   section [DUT in SUT] (#DUT-in-SUT).  Stable results are closer to
   the noiseless end of the SUT performance spectrum, so MLRsearch may
   need to allow some frequency of high-loss trials to ignore the rare
   but big effects near the noiseful end.

   MLRsearch can do such trial result filtering, but it needs a
   configuration option to tell it how frequent the infrequent big
   loss can be.  This option is called the exceed ratio.  It tells
   MLRsearch what ratio of trials (more exactly, what ratio of trial
   seconds) can have a [Trial Loss Ratio] (#Trial-Loss-Ratio) larger
   than the Goal Loss Ratio and still be classified as a lower bound.
   A zero exceed ratio means all trials have to have a Trial Loss
   Ratio equal to or smaller than the Goal Loss Ratio.

   For explainability reasons, the RECOMMENDED value for the exceed
   ratio is 0.5, as it simplifies some later concepts by relating them
   to the concept of the median.
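   A small sketch may help here: the load exceed ratio computed below
   is the quantity compared against the Goal Exceed Ratio.  The helper
   name and the list of (trial loss ratio, duration) pairs are
   invented for this example:

   def load_exceed_ratio(trials, goal_loss_ratio):
       """Ratio of trial seconds with loss above the Goal Loss Ratio."""
       bad = sum(d for lr, d in trials if lr > goal_loss_ratio)
       total = sum(d for _, d in trials)
       return bad / total

   # Three 1-second trials, one of them with an infrequent big loss:
   trials = [(0.0, 1.0), (0.02, 1.0), (0.001, 1.0)]
   # 0.333... is not larger than 0.5, so for a goal with a 0.5% Goal
   # Loss Ratio and a 0.5 exceed ratio the load can still become a
   # lower bound.
   print(load_exceed_ratio(trials, goal_loss_ratio=0.005))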
4.7.  Duration Sum

   When more than one trial is intended to classify a load, MLRsearch
   also needs something that controls the number of trials needed.
   Therefore, each goal also has an attribute called duration sum.

   The meaning of a [Goal Duration Sum] (#Goal-Duration-Sum) is that
   when a load has (full-length) trials whose durations sum up to a
   value at least as large as the Goal Duration Sum, the load is
   guaranteed to be classified either as an upper bound or as a lower
   bound for that goal.

   Because the duration sum has a big impact on the overall search
   duration, and [RFC2544] prescribes wait intervals around trial
   traffic, the MLRsearch algorithm is allowed to sum durations that
   differ from the actual trial traffic durations.

   In the MLRsearch specification, these different duration values are
   called the [Trial Effective Duration] (#Trial-Effective-Duration).

4.8.  Short Trials

   MLRsearch requires each goal to specify its final trial duration.
   A full-length trial is a shorter name for a trial whose intended
   trial duration is equal to (or longer than) the goal final trial
   duration.

   Section 24 of [RFC2544] already anticipates possible time savings
   when short trials (shorter than full-length trials) are used.
   Full-length trials are the opposite of short trials, so they may
   also be called long trials.

   Any MLRsearch implementation may include its own configuration
   options which control when and how MLRsearch chooses to use short
   trial durations.

   For explainability reasons, when an exceed ratio of 0.5 is used, it
   is recommended for the Goal Duration Sum to be an odd multiple of
   the full-length trial duration, so the Conditional Throughput
   becomes identical to a median of a particular set of trial
   forwarding rates.

   The presence of short trial results complicates the load
   classification logic.

   Full details are given later in section [Logic of Load
   Classification] (#Logic-of-Load-Classification).  In a nutshell,
   results from short trials may cause a load to be classified as an
   upper bound.  This may cause loss inversion, and thus lower the
   Relevant Lower Bound below what the classification would yield when
   considering full-length trials only.

4.9.  Throughput

   Because testing equipment takes the intended load as an input
   parameter for a trial measurement, any load search algorithm needs
   to deal with intended load values internally.

   But in the presence of goals with a non-zero loss ratio, the
   intended load usually does not match the user's intuition of what a
   throughput is.  The forwarding rate (as defined in [RFC2285]
   section 3.6.1) is better, but it is not obvious how to generalize
   it for loads with multiple trial results and a non-zero [Goal Loss
   Ratio] (#Goal-Loss-Ratio).

   The best example is also the main motivation: hard limit
   performance.  Even if the medium allows higher performance, the SUT
   interfaces may have their own additional limitations, e.g. a
   specific fps limit on the NIC (a very common occurrence).

   Ideally, those should be known and used when computing Max Load.
   But if Max Load is higher than what the interface can receive or
   transmit, a "hard limit" will be observed in trial results.
   Imagine the hard limit is at 100 Mfps, Max Load is higher, and the
   goal loss ratio is 0.5%.  If the DUT has no additional losses, a
   0.5% loss ratio will be achieved at 100.5025 Mfps (the relevant
   lower bound).  But it is not intuitive to report SUT performance as
   a value that is larger than the known hard limit.  We need a
   generalization of the [RFC2544] throughput, different from just the
   relevant lower bound.
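   The arithmetic of this example can be spelled out as follows; the
   code only restates the numbers from the paragraph above:

   hard_limit = 100.0e6     # NIC limit, frames per second
   goal_loss_ratio = 0.005  # 0.5%

   # Offered load at which the loss ratio exactly reaches the goal,
   # assuming the DUT loses nothing below the hard limit:
   relevant_lower_bound = hard_limit / (1.0 - goal_loss_ratio)
   print(relevant_lower_bound)  # ~100.5025e6 fps, above the hard limit

   # The forwarding rate at that load stays at the hard limit:
   forwarding_rate = relevant_lower_bound * (1.0 - goal_loss_ratio)
   print(forwarding_rate)       # 100.0e6 fps, the intuitive value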
   MLRsearch defines one such generalization, called the Conditional
   Throughput.  It is the trial forwarding rate from one of the trials
   performed at the load in question.  Which trial exactly is
   determined as defined in [MLRsearch Specification] (#MLRsearch-
   Specification), and in [Appendix B: Conditional Throughput]
   (#Appendix-B:-Conditional-Throughput).

   In the hard limit example, the 100.5 Mfps load will still have only
   a 100.0 Mfps forwarding rate, nicely confirming the known
   limitation.

   The Conditional Throughput is partially related to load
   classification.  If a load is classified as a lower bound for a
   goal, the Conditional Throughput can be calculated from trial
   results, and is guaranteed to show a loss ratio no larger than the
   Goal Loss Ratio.

   Note that when comparing the best case (all trials with zero loss)
   and the worst case (all trials with loss just below the Goal Loss
   Ratio), the same Relevant Lower Bound value may result in
   Conditional Throughput values differing by up to the Goal Loss
   Ratio.

   Therefore it is rarely needed to set the Goal Width (if expressed
   as the relative difference of loads) below the Goal Loss Ratio.  In
   other words, setting the Goal Width below the Goal Loss Ratio may
   cause the Conditional Throughput for a goal with a larger Goal Loss
   Ratio to become smaller than the Conditional Throughput for a goal
   with a smaller Goal Loss Ratio, which is counter-intuitive,
   considering they come from the same search.  Therefore it is
   RECOMMENDED to set the Goal Width to a value no smaller than the
   Goal Loss Ratio.

   Overall, this Conditional Throughput does behave well for
   comparability purposes.

4.10.  Search Time

   MLRsearch was primarily developed to reduce the time required to
   determine a throughput, either the [RFC2544] compliant one, or some
   generalization thereof.  The art of achieving short search times is
   mainly in the smart selection of intended loads (and intended
   durations) for the next trial to perform.

   While there is an indirect impact of the load selection on the
   reported values, in practice such impact tends to be small, even
   for SUTs with quite a broad performance spectrum.

   A typical example of two approaches to load selection leading to
   different Relevant Lower Bounds is when the interval is split in a
   very uneven way.  Any implementation choosing loads very close to
   the current Relevant Lower Bound is quite likely to eventually
   stumble upon a trial result with poor performance (due to SUT
   noise).  For an implementation choosing loads very close to the
   current Relevant Upper Bound, this is unlikely, as the loads it
   examines mostly see performance close to the noiseless end of the
   SUT performance spectrum.

   However, as even splits optimize the search duration at a given
   precision, MLRsearch implementations that prioritize minimizing
   search time are unlikely to suffer from any such bias.

   Therefore, this document remains quite vague on load selection and
   other optimization details, and on the configuration attributes
   related to them.  Assuming users prefer libraries that achieve a
   short overall search time, the definition of the Relevant Lower
   Bound should be strict enough to ensure result repeatability and
   comparability between different implementations, while not
   restricting future implementations much.
4.11.  [RFC2544] Compliance

   Some Search Goal instances lead to results compliant with
   [RFC2544].  See [RFC2544 Goal] (#RFC2544-Goal) for more details
   regarding both conditional and unconditional compliance.

   The presence of other Search Goals does not affect the compliance
   of this Goal Result.  The Relevant Lower Bound and the Conditional
   Throughput are in this case equal to each other, and the value is
   the [RFC2544] throughput.

5.  Logic of Load Classification

5.1.  Introductory Remarks

   This chapter continues with explanations, but this time more
   precise definitions are needed for readers to follow the
   explanations.

   Descriptions in this section are wordy, and implementers should
   read the [MLRsearch Specification] (#MLRsearch-Specification)
   section and the Appendices for more concise definitions.

   The two areas of focus here are load classification and the
   Conditional Throughput.

   To start with, the [Performance Spectrum] (#Performance-Spectrum)
   subsection contains definitions needed to gain insight into what
   the Conditional Throughput means.  The remaining subsections
   discuss load classification.

   For load classification, it is useful to define *good trials* and
   *bad trials*:

   *  *Bad trial*: A trial is called bad (according to a goal) if its
      [Trial Loss Ratio] (#Trial-Loss-Ratio) is larger than the [Goal
      Loss Ratio] (#Goal-Loss-Ratio).

   *  *Good trial*: A trial that is not bad is called good.

5.2.  Performance Spectrum

   There are several equivalent ways to explain the Conditional
   Throughput computation.  One of the ways relies on the performance
   spectrum.

   Take an intended load value, a trial duration value, and a finite
   set of trial results, with all trials measured at that load value
   and duration value.

   The performance spectrum is the function that maps any non-negative
   real number to the sum of durations of those trials in the set
   whose trial forwarding rate equals that number (mapping to zero if
   no trial in the set has that particular forwarding rate).

   A related function, defined if there is at least one trial in the
   set, is the performance spectrum divided by the sum of the
   durations of all trials in the set.

   That function is called the performance probability function, as it
   satisfies all the requirements for a probability mass function of a
   discrete probability distribution, the one-dimensional random
   variable being the trial forwarding rate.

   These functions are related to the SUT performance spectrum, as
   sampled by the trials in the set.

   Take a set of all full-length trials performed at the Relevant
   Lower Bound, sorted by decreasing trial forwarding rate.  The sum
   of the durations of those trials may be less than the Goal Duration
   Sum, or not.  If it is less, add an imaginary trial result with
   zero trial forwarding rate, such that the new sum of durations is
   equal to the Goal Duration Sum.  This is the set of trials to use.

   Then take the quantile of the performance probability function (as
   constructed from this set) corresponding to the Goal Exceed Ratio:
   the largest trial forwarding rate such that the duration-weighted
   fraction of trials with a smaller forwarding rate is no larger than
   the Goal Exceed Ratio.  If the quantile touches two trials, the
   larger trial forwarding rate (from the trial result sorted earlier)
   is used.

   The resulting quantity is the Conditional Throughput of the goal in
   question.
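   The following sketch restates the description above for full-length
   trials only; it is an illustration, not the normative computation
   (see [Appendix B: Conditional Throughput] (#Appendix-B:-
   Conditional-Throughput)).  The list of (trial forwarding rate,
   duration) pairs is an input format assumed just for this sketch:

   def conditional_throughput(trials, goal_exceed_ratio,
                              goal_duration_sum):
       """Quantile of the performance spectrum at the exceed ratio."""
       total = sum(duration for _, duration in trials)
       if total < goal_duration_sum:
           # Pad with the imaginary zero-rate trial.
           trials = trials + [(0.0, goal_duration_sum - total)]
           total = goal_duration_sum
       remaining = total * (1.0 - goal_exceed_ratio)
       result = None
       for rate, duration in sorted(trials, reverse=True):
           result = rate
           remaining -= duration
           if remaining <= 0.0:
               break  # ties resolve to the larger forwarding rate
       return result

   # Third Example below: one 1-second zero-loss trial at 100 fps,
   # Goal Exceed Ratio 0.5, Goal Duration Sum 2 seconds:
   print(conditional_throughput([(100.0, 1.0)], 0.5, 2.0))  # 100.0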
   A set of examples follows.

5.2.1.  First Example

   *  [Goal Exceed Ratio] (#Goal-Exceed-Ratio) = 0 and [Goal Duration
      Sum] (#Goal-Duration-Sum) has been reached.

   *  The Conditional Throughput is the smallest trial forwarding rate
      among the trials.

5.2.2.  Second Example

   *  Goal Exceed Ratio = 0 and Goal Duration Sum has not been reached
      yet.

   *  Due to the missing duration sum, the worst case may still
      happen, so the Conditional Throughput is zero.

   *  This is not reported to the user, as this load cannot become the
      Relevant Lower Bound yet.

5.2.3.  Third Example

   *  Goal Exceed Ratio = 50% and Goal Duration Sum is two seconds.

   *  One trial is present with the duration of one second and zero
      loss.

   *  The imaginary trial is added with the duration of one second and
      zero trial forwarding rate.

   *  The median would touch both trials, so the Conditional
      Throughput is the trial forwarding rate of the one non-imaginary
      trial.

   *  As that trial had zero loss, the value is equal to the offered
      load.

5.2.4.  Summary

   While the Conditional Throughput is a generalization of the trial
   forwarding rate, its definition is not an obvious one.

   Other than the trial forwarding rate, the other source of intuition
   is the quantile in general, and the median in the recommended case.

5.3.  Trials with Single Duration

   When goal attributes are chosen in such a way that every trial has
   the same intended duration, the load classification is simpler.

   The following description follows the motivation of the Goal Loss
   Ratio, the Goal Exceed Ratio, and the Goal Duration Sum.

   If the sum of the durations of all trials (at the given load) is
   less than the Goal Duration Sum, imagine two scenarios:

   *  *best case scenario*: all subsequent trials having zero loss,
      and

   *  *worst case scenario*: all subsequent trials having 100% loss.

   Here we assume there are as many subsequent trials as needed to
   make the sum of all trials equal to the Goal Duration Sum.

   The exceed ratio is defined using sums of durations (the number of
   trials does not matter), so it does not matter whether the
   "subsequent trials" can consist of an integer number of full-length
   trials.

   In any of the two scenarios, best case and worst case, we can
   compute the load exceed ratio, as the duration sum of bad trials
   divided by the duration sum of all trials, in both cases including
   the assumed trials.

   If, even in the best case scenario, the load exceed ratio is larger
   than the Goal Exceed Ratio, the load is an upper bound.

   If, even in the worst case scenario, the load exceed ratio is not
   larger than the Goal Exceed Ratio, the load is a lower bound.

   More specifically (a worked example in code follows the list):

   *  Take all trials measured at a given load.

   *  The sum of the durations of all bad full-length trials is called
      the bad sum.

   *  The sum of the durations of all good full-length trials is
      called the good sum.

   *  The result of adding the bad sum plus the good sum is called the
      measured sum.

   *  The larger of the measured sum and the Goal Duration Sum is
      called the whole sum.

   *  The whole sum minus the measured sum is called the missing sum.

   *  The optimistic exceed ratio is the bad sum divided by the whole
      sum.

   *  The pessimistic exceed ratio is the bad sum plus the missing
      sum, divided by the whole sum.

   *  If the optimistic exceed ratio is larger than the Goal Exceed
      Ratio, the load is classified as an upper bound.

   *  If the pessimistic exceed ratio is not larger than the Goal
      Exceed Ratio, the load is classified as a lower bound.

   *  Else, the load is classified as undecided.
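   The classification can be followed numerically; the wrapper
   function and the sample duration values are invented for this
   example:

   def classify(bad_sum, good_sum, goal_duration_sum,
                goal_exceed_ratio):
       """Single-duration classification, following the list above."""
       measured_sum = bad_sum + good_sum
       whole_sum = max(measured_sum, goal_duration_sum)
       missing_sum = whole_sum - measured_sum
       optimistic_ratio = bad_sum / whole_sum
       pessimistic_ratio = (bad_sum + missing_sum) / whole_sum
       if optimistic_ratio > goal_exceed_ratio:
           return "upper bound"
       if pessimistic_ratio <= goal_exceed_ratio:
           return "lower bound"
       return "undecided"

   # Goal Duration Sum 21 s, Goal Exceed Ratio 0.5:
   print(classify(3.0, 8.0, 21.0, 0.5))   # undecided (10 s missing)
   print(classify(3.0, 18.0, 21.0, 0.5))  # lower bound
   print(classify(11.0, 8.0, 21.0, 0.5))  # upper bound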
   The definition of the pessimistic exceed ratio is compatible with
   the logic in the Conditional Throughput computation, so in this
   single trial duration case, a load is a lower bound if and only if
   the Conditional Throughput loss ratio is not larger than the Goal
   Loss Ratio.

   If it is larger, the load is either an upper bound or undecided.

5.4.  Trials with Short Duration

5.4.1.  Scenarios

   Trials with an intended duration smaller than the goal final trial
   duration are called short trials.  The motivation for the load
   classification logic in the presence of short trials is based
   around a counter-factual case: What would the trial result be if a
   short trial had been measured as a full-length trial instead?

   There are three main scenarios where human intuition guides the
   intended behavior of load classification.

5.4.1.1.  False Good Scenario

   The user had their reason for not configuring a shorter goal final
   trial duration.  Perhaps the SUT has buffers that may get full at
   longer trial durations.  Perhaps the SUT shows periodic decreases
   in performance that the user does not want to be treated as noise.

   In any case, many good short trials may become bad full-length
   trials in the counter-factual case.

   In extreme cases, there are plenty of good short trials and no bad
   short trials.

   In this scenario, we want the load classification NOT to classify
   the load as a lower bound, despite the abundance of good short
   trials.

   Effectively, we want the good short trials to be ignored, so they
   do not contribute to comparisons with the Goal Duration Sum.

5.4.1.2.  True Bad Scenario

   When there is a frame loss in a short trial, the counter-factual
   full-length trial is expected to lose at least as many frames.

   In practice, bad short trials rarely turn into good full-length
   trials.

   In extreme cases, there are no good short trials.

   In this scenario, we want the load classification to classify the
   load as an upper bound just based on the abundance of bad short
   trials.

   Effectively, we want the bad short trials to contribute to
   comparisons with the Goal Duration Sum, so the load can be
   classified sooner.

5.4.1.3.  Balanced Scenario

   Some SUTs are quite indifferent to trial duration.  The performance
   probability function constructed from short trial results is likely
   to be similar to the performance probability function constructed
   from full-length trial results (perhaps with larger dispersion, but
   without a big impact on the median quantiles overall).

   For a moderate Goal Exceed Ratio value, this may mean there are
   both good short trials and bad short trials.

   This scenario is there just to invalidate a simple heuristic of
   always ignoring good short trials and never ignoring bad short
   trials, as that simple heuristic would be too biased.
   Yes, the bad short trials are likely to turn into bad full-length
   trials in the counter-factual case, but there is no information on
   what the good short trials would turn into.

   The only way to decide safely is to do more trials at full length,
   the same as in the False Good Scenario.

5.4.2.  Classification Logic

   MLRsearch picks a particular logic for load classification in the
   presence of short trials, but it is still RECOMMENDED to use
   configurations that imply no short trials, so the possible
   inefficiencies in that logic do not affect the result, and the
   result has better explainability.

   With that said, the logic differs from the single trial duration
   case only in the definition of the bad sum.  The good sum is still
   the sum across all good full-length trials.

   A few more notions are needed for defining the new bad sum, listed
   below (a compact sketch in code follows the list):

   *  The sum of durations of all bad full-length trials is called the
      bad long sum.

   *  The sum of durations of all bad short trials is called the bad
      short sum.

   *  The sum of durations of all good short trials is called the good
      short sum.

   *  One minus the Goal Exceed Ratio is called the subceed ratio.

   *  The Goal Exceed Ratio divided by the subceed ratio is called the
      exceed coefficient.

   *  The good short sum multiplied by the exceed coefficient is
      called the balancing sum.

   *  The bad short sum minus the balancing sum is called the excess
      sum.

   *  If the excess sum is negative, the bad sum is equal to the bad
      long sum.

   *  Otherwise, the bad sum is equal to the bad long sum plus the
      excess sum.
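   As a compact restatement of the list above (matching the first
   lines of the Appendix A pseudocode), the new bad sum can be
   sketched as:

   def new_bad_sum(bad_long_sum, bad_short_sum, good_short_sum,
                   goal_exceed_ratio):
       """Bad sum in the presence of short trials."""
       subceed_ratio = 1.0 - goal_exceed_ratio
       exceed_coefficient = goal_exceed_ratio / subceed_ratio
       balancing_sum = good_short_sum * exceed_coefficient
       excess_sum = bad_short_sum - balancing_sum
       # A negative excess sum has no impact.
       return bad_long_sum + max(0.0, excess_sum)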
   Here is how the new definition of the bad sum fares in the three
   scenarios, where the load is close to what the relevant bounds
   would be if only full-length trials were used for the search.

5.4.2.1.  False Good Scenario

   If the duration is too short, we expect to see a higher frequency
   of good short trials.  This could lead to a negative excess sum,
   which has no impact, hence the load classification is given just by
   full-length trials.  Thus, MLRsearch using too short trials has no
   detrimental effect on result comparability in this scenario.  But
   using short trials also does not help with the overall search
   duration, probably making it worse.

5.4.2.2.  True Bad Scenario

   Settings with a small exceed ratio have a small exceed coefficient,
   so the impact of the good short sum is small, and the bad short sum
   is almost wholly converted into the excess sum, thus bad short
   trials have almost as big an impact as bad full-length trials.  The
   same conclusion applies to moderate exceed ratio values when the
   good short sum is small.  Thus, short trials can cause a load to
   get classified as an upper bound earlier, bringing time savings
   (while not affecting comparability).

5.4.2.3.  Balanced Scenario

   Here the excess sum is small in absolute value, as the balancing
   sum is expected to be similar to the bad short sum.  Once again,
   full-length trials are needed for the final load classification;
   but the usage of short trials probably means MLRsearch needed a
   shorter overall search time before selecting this load for
   measurement, thus bringing time savings (while not affecting
   comparability).

   Note that in the presence of short trial results, the compatibility
   between the load classification and the Conditional Throughput is
   only partial.  The Conditional Throughput still comes from a good
   long trial, but a load higher than the Relevant Lower Bound may
   also compute to a good value.

5.5.  Trials with Longer Duration

   If there are trial results with an intended duration larger than
   the goal final trial duration, the precise definitions in Appendix
   A and Appendix B treat them in exactly the same way as trials with
   a duration equal to the goal final trial duration.

   But in configurations with a moderate (including 0.5) or small Goal
   Exceed Ratio and a small Goal Loss Ratio (especially zero), bad
   trials with longer than goal durations may bias the search towards
   lower load values, as the noiseful end of the spectrum gets a
   larger probability of causing loss within the longer trials.

6.  IANA Considerations

   This document makes no requests of IANA.

7.  Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization of a DUT/SUT using controlled stimuli
   in a laboratory environment, with dedicated address space and the
   constraints specified in the sections above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis, relying
   solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically
   for benchmarking purposes.  Any implications for network security
   arising from the DUT/SUT SHOULD be identical in the lab and in
   production networks.

8.  Acknowledgements

   Some phrases and statements in this document were created with the
   help of Mistral AI (mistral.ai).

   Many thanks to Alec Hothan of the OPNFV NFVbench project for a
   thorough review and numerous useful comments and suggestions on the
   earlier versions of this document.

   Special wholehearted gratitude and thanks to the late Al Morton for
   his thorough reviews filled with very specific feedback and
   constructive guidelines.  Thank you Al for the close collaboration
   over the years, for your continuous unwavering encouragement full
   of empathy and positive attitude.  Al, you are dearly missed.

9.  Appendix A: Load Classification

   This section specifies how to perform the load classification.

   Any intended load value can be classified, according to a given
   [Search Goal] (#Search-Goal).

   The algorithm uses (some subsets of) the set of all available trial
   results from trials measured at a given intended load at the end of
   the search.  All durations are those returned by the Measurer.

   The block at the end of this appendix holds pseudocode which
   computes two values, stored in variables named optimistic and
   pessimistic.

   The pseudocode happens to be valid Python code.

   If the values of both variables are computed to be true, the load
   in question is classified as a lower bound according to the given
   Search Goal.  If the values of both variables are false, the load
   is classified as an upper bound.  Otherwise, the load is classified
   as undecided.
   The pseudocode expects the following variables to hold values as
   follows:

   *  goal_duration_sum: The duration sum value of the given Search
      Goal.

   *  goal_exceed_ratio: The exceed ratio value of the given Search
      Goal.

   *  good_long_sum: Sum of durations across trials with trial
      duration at least equal to the goal final trial duration and
      with a Trial Loss Ratio not higher than the Goal Loss Ratio.

   *  bad_long_sum: Sum of durations across trials with trial duration
      at least equal to the goal final trial duration and with a Trial
      Loss Ratio higher than the Goal Loss Ratio.

   *  good_short_sum: Sum of durations across trials with trial
      duration shorter than the goal final trial duration and with a
      Trial Loss Ratio not higher than the Goal Loss Ratio.

   *  bad_short_sum: Sum of durations across trials with trial
      duration shorter than the goal final trial duration and with a
      Trial Loss Ratio higher than the Goal Loss Ratio.

   The code works correctly also when there are no trial results at a
   given load.

   # Good short trials can balance out some of the bad short trials.
   balancing_sum = good_short_sum * goal_exceed_ratio / (1.0 - goal_exceed_ratio)
   effective_bad_sum = bad_long_sum + max(0.0, bad_short_sum - balancing_sum)
   effective_whole_sum = max(good_long_sum + effective_bad_sum, goal_duration_sum)
   quantile_duration_sum = effective_whole_sum * goal_exceed_ratio
   # Optimistic: assume all the missing duration sum would be good.
   optimistic = effective_bad_sum <= quantile_duration_sum
   # Pessimistic: assume all the missing duration sum would be bad.
   pessimistic = (effective_whole_sum - good_long_sum) <= quantile_duration_sum
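   For illustration only, the pseudocode above can be exercised by
   wrapping it in a function; the wrapper and the sample values below
   are hypothetical, not part of the specification:

   def classify_load(goal_duration_sum, goal_exceed_ratio,
                     good_long_sum, bad_long_sum,
                     good_short_sum, bad_short_sum):
       """Wrap the Appendix A pseudocode, return the classification."""
       balancing_sum = good_short_sum * goal_exceed_ratio / (1.0 - goal_exceed_ratio)
       effective_bad_sum = bad_long_sum + max(0.0, bad_short_sum - balancing_sum)
       effective_whole_sum = max(good_long_sum + effective_bad_sum, goal_duration_sum)
       quantile_duration_sum = effective_whole_sum * goal_exceed_ratio
       optimistic = effective_bad_sum <= quantile_duration_sum
       pessimistic = (effective_whole_sum - good_long_sum) <= quantile_duration_sum
       if optimistic and pessimistic:
           return "lower bound"
       if not optimistic and not pessimistic:
           return "upper bound"
       return "undecided"

   # 10 s good and 2 s bad full-length trials, 1 s bad short trials,
   # Goal Duration Sum 21 s, Goal Exceed Ratio 0.5:
   print(classify_load(21.0, 0.5, 10.0, 2.0, 0.0, 1.0))  # undecided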
10.  Appendix B: Conditional Throughput

   This section specifies how to compute the Conditional Throughput,
   as referred to in section [Conditional Throughput] (#Conditional-
   Throughput).

   Any intended load value can be used as the basis for the following
   computation, but only the Relevant Lower Bound (at the end of the
   search) leads to the value called the Conditional Throughput for a
   given Search Goal.

   The algorithm uses (some subsets of) the set of all available trial
   results from trials measured at a given intended load at the end of
   the search.  All durations are those returned by the Measurer.

   The block at the end of this appendix holds pseudocode which
   computes a value stored as variable conditional_throughput.

   The pseudocode happens to be valid Python code.

   The pseudocode expects the following variables to hold values as
   follows:

   *  intended_load: The intended load of the trials measured at the
      given load.

   *  goal_duration_sum: The duration sum value of the given Search
      Goal.

   *  goal_exceed_ratio: The exceed ratio value of the given Search
      Goal.

   *  good_long_sum: Sum of durations across trials with trial
      duration at least equal to the goal final trial duration and
      with a Trial Loss Ratio not higher than the Goal Loss Ratio.

   *  bad_long_sum: Sum of durations across trials with trial duration
      at least equal to the goal final trial duration and with a Trial
      Loss Ratio higher than the Goal Loss Ratio.

   *  long_trials: An iterable of all trial results from trials with
      trial duration at least equal to the goal final trial duration,
      sorted by increasing Trial Loss Ratio.  A trial result is a
      composite with the following two attributes available:

      -  trial.loss_ratio: The Trial Loss Ratio as measured for this
         trial.

      -  trial.duration: The trial duration of this trial.

   The code works correctly only if there is at least one trial result
   measured at the given load.

   all_long_sum = max(goal_duration_sum, good_long_sum + bad_long_sum)
   # Spend (1 - exceed ratio) of the duration sum, starting from the
   # trials with the smallest loss ratios (highest forwarding rates).
   remaining = all_long_sum * (1.0 - goal_exceed_ratio)
   quantile_loss_ratio = None
   for trial in long_trials:
       if quantile_loss_ratio is None or remaining > 0.0:
           quantile_loss_ratio = trial.loss_ratio
           remaining -= trial.duration
       else:
           break
   else:
       # Not enough measured trials: treat the missing duration as lossy.
       if remaining > 0.0:
           quantile_loss_ratio = 1.0
   conditional_throughput = intended_load * (1.0 - quantile_loss_ratio)

11.  References

11.1.  Normative References

   [RFC1242]  Bradner, S., "Benchmarking Terminology for Network
              Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242,
              July 1991, <https://www.rfc-editor.org/info/rfc1242>.

   [RFC2285]  Mandeville, R., "Benchmarking Terminology for LAN
              Switching Devices", RFC 2285, DOI 10.17487/RFC2285,
              February 1998, <https://www.rfc-editor.org/info/rfc2285>.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

   [RFC8219]  Georgescu, M., Pislaru, L., and G. Lencse, "Benchmarking
              Methodology for IPv6 Transition Technologies", RFC 8219,
              DOI 10.17487/RFC8219, August 2017,
              <https://www.rfc-editor.org/info/rfc8219>.

   [RFC9004]  Morton, A., "Updates for the Back-to-Back Frame Benchmark
              in RFC 2544", RFC 9004, DOI 10.17487/RFC9004, May 2021,
              <https://www.rfc-editor.org/info/rfc9004>.

11.2.  Informative References

   [FDio-CSIT-MLRsearch]
              "FD.io CSIT Test Methodology - MLRsearch", October 2023,
              <https://csit.fd.io/cdocs/methodology/measurements/
              data_plane_throughput/mlr_search/>.

   [PyPI-MLRsearch]
              "MLRsearch 1.2.1, Python Package Index", October 2023,
              <https://pypi.org/project/MLRsearch/1.2.1/>.

   [TST009]   "TST 009", n.d., <https://www.etsi.org/deliver/etsi_gs/
              NFV-TST/001_099/009/03.04.01_60/gs_NFV-
              TST009v030401p.pdf>.

Authors' Addresses

   Maciek Konstantynowicz
   Cisco Systems
   Email: mkonstan@cisco.com

   Vratko Polak
   Cisco Systems
   Email: vrpolak@cisco.com