From 3c23979e8770e5d5e6dc104d36f20ea5697f3fcc Mon Sep 17 00:00:00 2001 From: Vratko Polak Date: Mon, 3 Feb 2020 18:51:14 +0100 Subject: Report: Edit minor details in methodology docs Change-Id: I9bbb97e635b6ef438dcb8bed3f69617bb98e9779 Signed-off-by: Vratko Polak --- .../methodology_data_plane_throughput.rst | 2 +- .../methodology_mlrsearch_tests.rst | 15 ++++++--------- .../methodology_mrr_throughput.rst | 8 +++++--- .../methodology_plrsearch.rst | 8 ++++---- .../introduction/methodology_kvm_vms_vhost_user.rst | 2 +- .../introduction/methodology_multi_core_speedup.rst | 8 ++++---- .../introduction/methodology_nfv_service_density.rst | 4 ++-- docs/report/introduction/methodology_packet_latency.rst | 4 ++-- .../report/introduction/methodology_quic_with_vppecho.rst | 3 ++- docs/report/introduction/methodology_reconf.rst | 11 ++++------- docs/report/introduction/methodology_tcp_with_iperf3.rst | 8 ++++---- docs/report/introduction/methodology_terminology.rst | 10 +++++----- .../introduction/methodology_trex_traffic_generator.rst | 9 ++++----- .../introduction/methodology_tunnel_encapsulations.rst | 4 ++-- .../introduction/methodology_vpp_device_functional.rst | 8 ++++---- 15 files changed, 50 insertions(+), 54 deletions(-) (limited to 'docs') diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst index 202b4281b7..764e198d0f 100644 --- a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst +++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst @@ -111,7 +111,7 @@ PLRsearch are run to discover a sustained throughput for PLR=10^-7 frame sizes (64b/78B) are presented in packet throughput graphs (Box Plots) for a small subset of baseline tests. -Each soak test lasts 2hrs and is executed at least twice. 
Results are
+Each soak test lasts 30 minutes and is executed at least twice. Results are
 compared against NDR and PDR rates discovered with MLRsearch.
 
 Details
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
index acc974841d..1209697195 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
@@ -16,15 +16,15 @@ with zero packet loss, PLR=0) and Partial Drop Rate (PDR, with packet
 loss rate not greater than the configured non-zero PLR).
 
 MLRsearch discovers NDR and PDR in a single pass reducing required time
-duration compared to separate binary searches for NDR and PDR. Overall
+duration compared to separate `binary search`_es for NDR and PDR. Overall
 search time is reduced even further by relying on shorter trial
 durations of intermediate steps, with only the final measurements
 conducted at the specified final trial duration. This results in the
 shorter overall execution time when compared to standard NDR/PDR binary
 search, while guaranteeing similar results.
 
-If needed, MLRsearch can be easily adopted to discover more throughput
-rates with different pre-defined PLRs.
+If needed, the next version of MLRsearch can be easily adapted
+to discover more throughput rates with different pre-defined PLRs.
 
 .. Note:: All throughput rates are *always* bi-directional aggregates of
    two equal (symmetric) uni-directional packet rates
@@ -45,11 +45,8 @@ MLRsearch is also available as a `PyPI (Python Package Index) library
 
 Implementation Deviations
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-FD.io CSIT implementation of MLRsearch so far is fully based on the -01
-version of the `draft-vpolak-mkonstan-mlrsearch-01
-`_.
+FD.io CSIT implementation of MLRsearch so far is fully based on the -02 +version of the `draft-vpolak-mkonstan-mlrsearch-02 +`_. .. _binary search: https://en.wikipedia.org/wiki/Binary_search -.. _exponential search: https://en.wikipedia.org/wiki/Exponential_search -.. _estimation of standard deviation: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation -.. _simplified error propagation formula: https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Simplification diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst index fd4baca2f3..4e8000b161 100644 --- a/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst +++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst @@ -14,7 +14,7 @@ MRR tests are currently used for following test jobs: - Report performance comparison: 64B, IMIX for vhost, memif. - Daily performance trending: 64B, IMIX for vhost, memif. - Per-patch performance verification: 64B. -- PLRsearch soaking tests: 64B. +- Initial iterations of MLRsearch and PLRsearch: 64B. Maximum offered load for specific L2 Ethernet frame size is set to either the maximum bi-directional link rate or tested NIC model @@ -42,11 +42,13 @@ Burst parameter settings vary between different tests using MRR: - Report performance comparison: 1 sec. - Daily performance trending: 1 sec. - Per-patch performance verification: 10 sec. - - PLRsearch soaking tests: 5.2 sec. + - Initial iteration for MLRsearch: 1 sec. + - Initial iteration for PLRsearch: 5.2 sec. - Number of MRR trials per burst: - Report performance comparison: 10. - Daily performance trending: 10. - Per-patch performance verification: 5. - - PLRsearch soaking tests: 1. \ No newline at end of file + - Initial iteration for MLRsearch: 1. + - Initial iteration for PLRsearch: 1. 
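The burst parameters above reduce to a simple computation: MRR is the receive rate averaged over the trials of one burst. A minimal Python sketch of that averaging — the function name and packet counts are invented for illustration, this is not code from the CSIT repository:

```python
def mrr_from_burst(rx_counts, trial_duration):
    """Average receive rate over the MRR trials of one burst, in pps.

    rx_counts: packets received in each trial of the burst.
    trial_duration: length of a single trial, in seconds.
    """
    rates = [rx / trial_duration for rx in rx_counts]
    return sum(rates) / len(rates)

# Report/trending style burst: 10 trials of 1 second each.
print(mrr_from_burst([14_000_000] * 10, 1.0))  # -> 14000000.0
# Initial iteration for PLRsearch: a single 5.2 second trial.
print(mrr_from_burst([72_800_000], 5.2))       # roughly 14 Mpps
```

The per-patch verification job would use the same averaging with 5 trials of 10 seconds each, per the parameter list above.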
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst index 65165b31c7..68f30bc562 100644 --- a/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst +++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst @@ -102,7 +102,7 @@ of sum of exponentials") are defined to handle None correctly. Fitting Functions ````````````````` -Current implementation uses two fitting functions. +Current implementation uses two fitting functions, called "stretch" and "erf". In general, their estimates for critical rate differ, which adds a simple source of systematic error, on top of randomness error reported by integrator. @@ -113,7 +113,7 @@ Both functions are not only increasing, but also convex (meaning the rate of increase is also increasing). Both fitting functions have several mathematically equivalent formulas, -each can lead to an overflow or underflow in different sub-terms. +each can lead to an arithmetic overflow or underflow in different sub-terms. Overflows can be eliminated by using different exact formulas for different argument ranges. Underflows can be avoided by using approximate formulas @@ -128,7 +128,7 @@ Prior Distributions The numeric integrator expects all the parameters to be distributed (independently and) uniformly on an interval (-1, 1). -As both "mrr" and "spread" parameters are positive and not not dimensionless, +As both "mrr" and "spread" parameters are positive and not dimensionless, a transformation is needed. Dimentionality is inherited from max_rate value. The "mrr" parameter follows a `Lomax distribution`_ @@ -303,7 +303,7 @@ The following analysis will rely on frequency of zero loss measurements and magnitude of loss ratio if nonzero. 
The offered load selection strategy used implies zero loss measurements -can be gleamed from the graph by looking at offered load points. +can be gleaned from the graph by looking at offered load points. When the points move up farther from lower estimate, it means the previous measurement had zero loss. After non-zero loss, the offered load starts again right between (the previous values of) diff --git a/docs/report/introduction/methodology_kvm_vms_vhost_user.rst b/docs/report/introduction/methodology_kvm_vms_vhost_user.rst index e6a98596da..216d461911 100644 --- a/docs/report/introduction/methodology_kvm_vms_vhost_user.rst +++ b/docs/report/introduction/methodology_kvm_vms_vhost_user.rst @@ -3,7 +3,7 @@ KVM VMs vhost-user QEMU is used for KVM VM vhost-user testing enviroment. By default, standard QEMU version is used, preinstalled from OS repositories -(qemu-2.11.1 for Ubuntu 18.04, qemu-2.5.0 for Ubuntu 16.04). The path +(qemu-2.11.1 for Ubuntu 18.04). The path to the QEMU binary can be adjusted in `Constants.py`. FD.io CSIT performance lab is testing VPP vhost-user with KVM VMs using diff --git a/docs/report/introduction/methodology_multi_core_speedup.rst b/docs/report/introduction/methodology_multi_core_speedup.rst index b42bf42f92..095f0f7796 100644 --- a/docs/report/introduction/methodology_multi_core_speedup.rst +++ b/docs/report/introduction/methodology_multi_core_speedup.rst @@ -1,7 +1,7 @@ Multi-Core Speedup ------------------ -All performance tests are executed with single processor core and with +All performance tests are executed with single physical core and with multiple cores scenarios. Intel Hyper-Threading (HT) @@ -16,7 +16,7 @@ making it impractical for continuous changes of HT mode of operation. |csit-release| performance tests are executed with server SUTs' Intel XEON processors configured with Intel Hyper-Threading Disabled for all Xeon Haswell testbeds (3n-hsw) and with Intel Hyper-Threading Enabled -for all Xeon Skylake testbeds. 
+for all Xeon Skylake and Xeon Cascadelake testbeds. More information about physical testbeds is provided in :ref:`tested_physical_topologies`. @@ -34,8 +34,8 @@ thread and physical core configurations: #. 2t2c - 2 VPP worker threads on 2 physical cores. #. 4t4c - 4 VPP worker threads on 4 physical cores. -#. Intel Xeon Skylake testbeds (2n-skx, 3n-skx) with Intel HT enabled - (2 logical CPU cores per each physical core): +#. Intel Xeon Skylake and Cascadelake testbeds (2n-skx, 3n-skx, 2n-clx) + with Intel HT enabled (2 logical CPU cores per each physical core): #. 2t1c - 2 VPP worker threads on 1 physical core. #. 4t2c - 4 VPP worker threads on 2 physical cores. diff --git a/docs/report/introduction/methodology_nfv_service_density.rst b/docs/report/introduction/methodology_nfv_service_density.rst index b09c1be629..c5407b5125 100644 --- a/docs/report/introduction/methodology_nfv_service_density.rst +++ b/docs/report/introduction/methodology_nfv_service_density.rst @@ -16,8 +16,8 @@ service chain forwarding context(s). In order to provide a most complete picture, each network topology and service configuration is tested in different service density setups by varying two parameters: -- Number of service instances (e.g. 1,2,4..10). -- Number of NFs per service instance (e.g. 1,2,4..10). +- Number of service instances (e.g. 1, 2, 4, 6, 8, 10). +- Number of NFs per service instance (e.g. 1, 2, 4, 6, 8, 10). 
 Implementation of NFV service density tests in |csit-release| is using
 two NF applications:
 
diff --git a/docs/report/introduction/methodology_packet_latency.rst b/docs/report/introduction/methodology_packet_latency.rst
index b8df660539..1f7ad7f633 100644
--- a/docs/report/introduction/methodology_packet_latency.rst
+++ b/docs/report/introduction/methodology_packet_latency.rst
@@ -1,7 +1,7 @@
 Packet Latency
 --------------
 
-TRex Traffic Generator (TG) is used for measuring latency across 2-Node
+TRex Traffic Generator (TG) is used for measuring latency across 2-Node
 and 3-Node SUT server topologies. TRex integrates `A High Dynamic Range
 Histogram (HDRH) `_ code providing per packet
 latency distribution for latency streams sent in parallel to the main
 load packet streams.
@@ -30,4 +30,4 @@ methodology:
   setup used.
 - TG setup introduces an always-on Tx/Rx interface latency of about 2
   * 2 usec per direction induced by TRex SW writing and reading packet
-  timestamps on CPU cores.
\ No newline at end of file
+  timestamps on CPU cores.
diff --git a/docs/report/introduction/methodology_quic_with_vppecho.rst b/docs/report/introduction/methodology_quic_with_vppecho.rst
index 12b64203db..5579fb5954 100644
--- a/docs/report/introduction/methodology_quic_with_vppecho.rst
+++ b/docs/report/introduction/methodology_quic_with_vppecho.rst
@@ -32,6 +32,7 @@ where,
   measurements for all streams and the sum of all streams.
 
 Test cases include
+
 1. 1 QUIC Connection with 1 Stream
 2. 1 QUIC connection with 10 Streams
 3. 10 QUIC connetions with 1 Stream
@@ -39,5 +40,5 @@ with stream sizes to provide reasonable test durations. The VPP Host
 Stack QUIC transport is configured to utilize the picotls encryption
- library. In the future, tests utilizing addtional encryption
+ library. In the future, tests utilizing additional encryption
 algorithms will be added.
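The value of per-packet latency distributions (as opposed to bare min/avg/max) is that they expose the tail. A rough pure-Python sketch of percentile reporting from per-packet samples — the sample values are invented, and TRex's actual HDRH encoding is a far more compact binary format:

```python
def latency_summary(samples_usec):
    """Summarize a list of per-packet latency samples (in microseconds)."""
    ordered = sorted(samples_usec)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile on the sorted samples.
        rank = max(0, min(n - 1, round(p / 100 * n) - 1))
        return ordered[rank]

    return {
        "min": ordered[0],
        "avg": sum(ordered) / n,
        "max": ordered[-1],
        "p50": percentile(50),
        "p99": percentile(99),
    }

# 100 invented samples: mostly ~20 usec with a long tail.
samples = [20] * 90 + [40] * 9 + [500]
summary = latency_summary(samples)
print(summary["min"], summary["max"], summary["p99"])  # -> 20 500 40
```

Note how the single 500 usec outlier dominates "max" while leaving p99 at 40 usec — exactly the distinction a histogram-based report preserves and a min/avg/max triple loses.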
diff --git a/docs/report/introduction/methodology_reconf.rst b/docs/report/introduction/methodology_reconf.rst
index 32e0fd7561..1a1f4cc98c 100644
--- a/docs/report/introduction/methodology_reconf.rst
+++ b/docs/report/introduction/methodology_reconf.rst
@@ -25,7 +25,7 @@ with somewhat long durations, and the re-configuration process can also
 be long, finding an offered load which would result in zero loss
 during the re-configuration process would be time-consuming.
 
-Instead, reconf tests find a througput value (lower bound for NDR)
+Instead, reconf tests first find a throughput value (lower bound for NDR)
 without re-configuration, and then maintain that ofered load during
 re-configuration. The measured loss count is then assumed to be caused
 by the re-configuration process. The result published by reconf tests
@@ -38,16 +38,16 @@ Current Implementation
 
 Each reconf suite is based on a similar MLRsearch performance suite.
 MLRsearch parameters are changed to speed up the throughput discovery.
-For example, PDR is not searched for, and final trial duration is shorter.
+For example, PDR is not searched for, and the final trial duration is shorter.
 
 The MLRsearch suite has to contain a configuration parameter
-that can be scaled up, e.g. number of routes or number of service chains.
+that can be scaled up, e.g. number of tunnels or number of service chains.
 Currently, only increasing the scale is supported as the re-configuration
 operation. In future, scale decrease or other operations can be implemented.
 
 The traffic profile is not changed, so the traffic present is processed
-only by the smaller scale configuration. The added routes / chains
+only by the smaller scale configuration. The added tunnels / chains
 are not targetted by the traffic.
 
 For the re-configuration, the same Robot Framework and Python libraries
@@ -73,6 +73,3 @@ are expected without re-configuration. But different suites show different
 allowing full NIC buffers to drain quickly between worker pauses.
For other suites, lower bound for NDR still has quite a large probability of non-zero packet loss even without re-configuration. - -But the results show very high effective blocked time, -so the two objections related to NDR lower bound are negligible in comparison. diff --git a/docs/report/introduction/methodology_tcp_with_iperf3.rst b/docs/report/introduction/methodology_tcp_with_iperf3.rst index ef28dec4a3..288da004a5 100644 --- a/docs/report/introduction/methodology_tcp_with_iperf3.rst +++ b/docs/report/introduction/methodology_tcp_with_iperf3.rst @@ -1,11 +1,11 @@ Hoststack Throughput Testing over TCP/IP with iperf3 ---------------------------------------------------- -`iperf3 bandwidth measurement tool `_ -is used for measuring the maximum attainable bandwidth of the VPP Host +`iperf3 goodput measurement tool `_ +is used for measuring the maximum attainable goodput of the VPP Host Stack connection across two instances of VPP running on separate DUT nodes. iperf3 is a popular open source tool for active measurements -of the maximum achievable bandwidth on IP networks. +of the maximum achievable goodput on IP networks. Because iperf3 utilizes the POSIX socket interface APIs, the current test configuration utilizes the LD_PRELOAD mechanism in the linux @@ -14,7 +14,7 @@ Communications Library (VCL) LD_PRELOAD library (libvcl_ldpreload.so). In the future, a forked version of iperf3 which has been modified to directly use the VCL application APIs may be added to determine the -difference in performance of 'VCL Native' applications .vs. utilizing +difference in performance of 'VCL Native' applications versus utilizing LD_PRELOAD which inherently has more overhead and other limitations. 
The test configuration is as follows: diff --git a/docs/report/introduction/methodology_terminology.rst b/docs/report/introduction/methodology_terminology.rst index db76827a5a..33ab116491 100644 --- a/docs/report/introduction/methodology_terminology.rst +++ b/docs/report/introduction/methodology_terminology.rst @@ -27,13 +27,13 @@ Terminology methodology contains other parts, whose performance is either already established, or not affecting the benchmarking result. - **Bi-directional throughput tests**: involve packets/frames flowing in - both transmit and receive directions over every tested interface of + both east-west and west-east directions over every tested interface of SUT/DUT. Packet flow metrics are measured per direction, and can be reported as aggregate for both directions (i.e. throughput) and/or separately for each measured direction (i.e. latency). In most cases bi-directional tests use the same (symmetric) load in both directions. - **Uni-directional throughput tests**: involve packets/frames flowing in - only one direction, i.e. either transmit or receive direction, over + only one direction, i.e. either east-west or west-east direction, over every tested interface of SUT/DUT. Packet flow metrics are measured and are reported for measured direction. - **Packet Loss Ratio (PLR)**: ratio of packets received relative to packets @@ -50,8 +50,8 @@ Terminology Measured in packets-per-second (pps) or frames-per-second (fps), equivalent metrics. - **Bandwidth Throughput Rate**: a secondary metric calculated from packet - throughput rate using formula: bw_rate = pkt_rate - (frame_size + - L1_overhead) - 8, where L1_overhead for Ethernet includes preamble (8 + throughput rate using formula: bw_rate = pkt_rate * (frame_size + + L1_overhead) * 8, where L1_overhead for Ethernet includes preamble (8 Bytes) and inter-frame gap (12 Bytes). For bi-directional tests, bandwidth throughput rate should be reported as aggregate for both directions. 
Expressed in bits-per-second (bps). @@ -75,4 +75,4 @@ Terminology bandwidth MRR expressed in bits-per-second (bps). - **Trial**: a single measurement step. - **Trial duration**: amount of time over which packets are transmitted and - received in a single throughput measurement step. + received in a single measurement step. diff --git a/docs/report/introduction/methodology_trex_traffic_generator.rst b/docs/report/introduction/methodology_trex_traffic_generator.rst index 0d19c2cf78..d9e7df57d3 100644 --- a/docs/report/introduction/methodology_trex_traffic_generator.rst +++ b/docs/report/introduction/methodology_trex_traffic_generator.rst @@ -4,13 +4,12 @@ TRex Traffic Generator Usage ~~~~~ -`TRex traffic generator `_ is used for all +`TRex traffic generator `_ is used for all CSIT performance tests. TRex stateless mode is used to measure NDR and PDR throughputs using MLRsearch and to measure maximum transer rate in MRR tests. -TRex is installed and run on the TG compute node. The typical procedure -is: +TRex is installed and run on the TG compute node. The typical procedure is: - If the TRex is not already installed on TG, it is installed in the suite setup phase - see `TRex installation`_. @@ -22,7 +21,7 @@ is: - TRex is started in the background mode :: - $ sh -c 'cd /scripts/ && sudo nohup ./t-rex-64 -i -c 7 --prefix $(hostname) --hdrh > /tmp/trex.log 2>&1 &' > /dev/null + $ sh -c 'cd /scripts/ && sudo nohup ./t-rex-64 -i --prefix $(hostname) --hdrh --no-scapy-server > /tmp/trex.log 2>&1 &' > /dev/null - There are traffic streams dynamically prepared for each test, based on traffic profiles. The traffic is sent and the statistics obtained using @@ -49,4 +48,4 @@ Measuring Latency If measurement of latency is requested, two more packet streams are created (one for each direction) with TRex flow_stats parameter set to STLFlowLatencyStats. In that case, returned statistics will also include -min/avg/max latency values. 
+min/avg/max latency values and encoded HDRHistogram data.
diff --git a/docs/report/introduction/methodology_tunnel_encapsulations.rst b/docs/report/introduction/methodology_tunnel_encapsulations.rst
index d9e2f42f25..c61df171ac 100644
--- a/docs/report/introduction/methodology_tunnel_encapsulations.rst
+++ b/docs/report/introduction/methodology_tunnel_encapsulations.rst
@@ -15,7 +15,7 @@ VPP is tested in the following IPv4 tunnel baseline configurations:
 - *ip4lispip4-ip4base*: LISP over IPv4 tunnels with IPv4 routing.
 - *ip4lispip6-ip6base*: LISP over IPv4 tunnels with IPv6 routing.
 
-In all cases listed above low number of MAC, IPv4, IPv6 flows (254 or 253 per
+In all cases listed above low number of MAC, IPv4, IPv6 flows (253 or 254 per
 direction) is switched or routed by VPP.
 
 In addition selected IPv4 tunnels are tested at scale:
@@ -34,5 +34,5 @@ VPP is tested in the following IPv6 tunnel baseline configurations:
 - *ip6lispip4-ip4base*: LISP over IPv4 tunnels with IPv4 routing.
 - *ip6lispip6-ip6base*: LISP over IPv4 tunnels with IPv6 routing.
 
-In all cases listed above low number of IPv4, IPv6 flows (253 per
+In all cases listed above low number of IPv4, IPv6 flows (253 or 254 per
 direction) is routed by VPP.
diff --git a/docs/report/introduction/methodology_vpp_device_functional.rst b/docs/report/introduction/methodology_vpp_device_functional.rst
index 0c29624419..ff6f3fb03b 100644
--- a/docs/report/introduction/methodology_vpp_device_functional.rst
+++ b/docs/report/introduction/methodology_vpp_device_functional.rst
@@ -5,7 +5,7 @@ VPP_Device Functional
 device tests integrated into LFN CI/CD infrastructure. VPP_Device tests
 run on 1-Node testbeds (1n-skx, 1n-arm) and rely on Linux SRIOV Virtual
 Function (VF), dot1q VLAN tagging and external loopback cables to
-facilitate packet passing over exernal physical links. Initial focus is
-on few baseline tests. Existing CSIT Performance tests can be moved to
-VPP_Device framework. 
RF test definition code stays unchanged with the
-exception of traffic generator related L2 KWs.
+facilitate packet passing over external physical links. Initial focus is
+on a few baseline tests. New device tests can be added by small edits
+to existing CSIT Performance (2-node) tests. RF test definition code
+stays unchanged with the exception of traffic generator related L2 KWs.
-- 
cgit 1.2.3-korg
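As a closing sanity check on the corrected bandwidth formula in the terminology hunk (bw_rate = pkt_rate * (frame_size + L1_overhead) * 8), the numbers reproduce the well-known 64B line rate of a 10GE link. This is an illustrative sketch, not code from the repository:

```python
ETH_L1_OVERHEAD = 8 + 12  # preamble (8 B) + inter-frame gap (12 B)

def bandwidth_bps(pkt_rate_pps, frame_size_bytes):
    """L1 bandwidth corresponding to a packet rate, per the formula above."""
    return pkt_rate_pps * (frame_size_bytes + ETH_L1_OVERHEAD) * 8

def packet_rate_pps(bw_bps, frame_size_bytes):
    """Inverse: maximum packet rate a given L1 bandwidth can carry."""
    return bw_bps / ((frame_size_bytes + ETH_L1_OVERHEAD) * 8)

line_rate = 10e9  # 10GE
pps_64b = packet_rate_pps(line_rate, 64)
print(round(pps_64b))  # -> 14880952, the classic 10GE 64B line rate
```

Converting the packet rate back through bandwidth_bps recovers the 10 Gbps line rate, which is why the original subtraction-based wording of the formula had to be corrected to multiplication.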