author Vratko Polak <vrpolak@cisco.com> 2020-02-03 18:51:14 +0100
committer Peter Mikus <pmikus@cisco.com> 2020-02-22 11:47:04 +0000
commit 3c23979e8770e5d5e6dc104d36f20ea5697f3fcc
tree 04ea5139ad3eccba15b12b513a6767a1d4748672
parent 815cbb45dff3fd759f2bd4608bb45ee7949dfc55
Report: Edit minor details in methodology docs
Change-Id: I9bbb97e635b6ef438dcb8bed3f69617bb98e9779
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
-rw-r--r-- docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst 2
-rw-r--r-- docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst 15
-rw-r--r-- docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst 8
-rw-r--r-- docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst 8
-rw-r--r-- docs/report/introduction/methodology_kvm_vms_vhost_user.rst 2
-rw-r--r-- docs/report/introduction/methodology_multi_core_speedup.rst 8
-rw-r--r-- docs/report/introduction/methodology_nfv_service_density.rst 4
-rw-r--r-- docs/report/introduction/methodology_packet_latency.rst 4
-rw-r--r-- docs/report/introduction/methodology_quic_with_vppecho.rst 3
-rw-r--r-- docs/report/introduction/methodology_reconf.rst 11
-rw-r--r-- docs/report/introduction/methodology_tcp_with_iperf3.rst 8
-rw-r--r-- docs/report/introduction/methodology_terminology.rst 10
-rw-r--r-- docs/report/introduction/methodology_trex_traffic_generator.rst 9
-rw-r--r-- docs/report/introduction/methodology_tunnel_encapsulations.rst 4
-rw-r--r-- docs/report/introduction/methodology_vpp_device_functional.rst 8
15 files changed, 50 insertions, 54 deletions
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
index 202b4281b7..764e198d0f 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
@@ -111,7 +111,7 @@ PLRsearch are run to discover a sustained throughput for PLR=10^-7
frame sizes (64b/78B) are presented in packet throughput graphs (Box
Plots) for a small subset of baseline tests.
-Each soak test lasts 2hrs and is executed at least twice. Results are
+Each soak test lasts 30 minutes and is executed at least twice. Results are
compared against NDR and PDR rates discovered with MLRsearch.
Details
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
index acc974841d..1209697195 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_mlrsearch_tests.rst
@@ -16,15 +16,15 @@ with zero packet loss, PLR=0) and Partial Drop Rate (PDR, with packet
loss rate not greater than the configured non-zero PLR).
MLRsearch discovers NDR and PDR in a single pass reducing required time
-duration compared to separate binary searches for NDR and PDR. Overall
+duration compared to separate `binary search`_es for NDR and PDR. Overall
search time is reduced even further by relying on shorter trial
durations of intermediate steps, with only the final measurements
conducted at the specified final trial duration. This results in the
shorter overall execution time when compared to standard NDR/PDR binary
search, while guaranteeing similar results.
-If needed, MLRsearch can be easily adopted to discover more throughput
-rates with different pre-defined PLRs.
+If needed, next version of MLRsearch can be easily adopted
+to discover more throughput rates with different pre-defined PLRs.
.. Note:: All throughput rates are *always* bi-directional
aggregates of two equal (symmetric) uni-directional packet rates
@@ -45,11 +45,8 @@ MLRsearch is also available as a `PyPI (Python Package Index) library
Implementation Deviations
~~~~~~~~~~~~~~~~~~~~~~~~~
-FD.io CSIT implementation of MLRsearch so far is fully based on the -01
-version of the `draft-vpolak-mkonstan-mlrsearch-01
-<https://tools.ietf.org/html/draft-vpolak-mkonstan-bmwg-mlrsearch-01>`_.
+FD.io CSIT implementation of MLRsearch so far is fully based on the -02
+version of the `draft-vpolak-mkonstan-mlrsearch-02
+<https://tools.ietf.org/html/draft-vpolak-mkonstan-bmwg-mlrsearch-02>`_.
.. _binary search: https://en.wikipedia.org/wiki/Binary_search
-.. _exponential search: https://en.wikipedia.org/wiki/Exponential_search
-.. _estimation of standard deviation: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
-.. _simplified error propagation formula: https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Simplification
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst
index fd4baca2f3..4e8000b161 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_mrr_throughput.rst
@@ -14,7 +14,7 @@ MRR tests are currently used for following test jobs:
- Report performance comparison: 64B, IMIX for vhost, memif.
- Daily performance trending: 64B, IMIX for vhost, memif.
- Per-patch performance verification: 64B.
-- PLRsearch soaking tests: 64B.
+- Initial iterations of MLRsearch and PLRsearch: 64B.
Maximum offered load for specific L2 Ethernet frame size is set to
either the maximum bi-directional link rate or tested NIC model
@@ -42,11 +42,13 @@ Burst parameter settings vary between different tests using MRR:
- Report performance comparison: 1 sec.
- Daily performance trending: 1 sec.
- Per-patch performance verification: 10 sec.
- - PLRsearch soaking tests: 5.2 sec.
+ - Initial iteration for MLRsearch: 1 sec.
+ - Initial iteration for PLRsearch: 5.2 sec.
- Number of MRR trials per burst:
- Report performance comparison: 10.
- Daily performance trending: 10.
- Per-patch performance verification: 5.
- - PLRsearch soaking tests: 1.
\ No newline at end of file
+ - Initial iteration for MLRsearch: 1.
+ - Initial iteration for PLRsearch: 1.
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst
index 65165b31c7..68f30bc562 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_plrsearch.rst
@@ -102,7 +102,7 @@ of sum of exponentials") are defined to handle None correctly.
Fitting Functions
`````````````````
-Current implementation uses two fitting functions.
+Current implementation uses two fitting functions, called "stretch" and "erf".
In general, their estimates for critical rate differ,
which adds a simple source of systematic error,
on top of randomness error reported by integrator.
@@ -113,7 +113,7 @@ Both functions are not only increasing, but also convex
(meaning the rate of increase is also increasing).
Both fitting functions have several mathematically equivalent formulas,
-each can lead to an overflow or underflow in different sub-terms.
+each can lead to an arithmetic overflow or underflow in different sub-terms.
Overflows can be eliminated by using different exact formulas
for different argument ranges.
Underflows can be avoided by using approximate formulas
@@ -128,7 +128,7 @@ Prior Distributions
The numeric integrator expects all the parameters to be distributed
(independently and) uniformly on an interval (-1, 1).
-As both "mrr" and "spread" parameters are positive and not not dimensionless,
+As both "mrr" and "spread" parameters are positive and not dimensionless,
a transformation is needed. Dimentionality is inherited from max_rate value.
The "mrr" parameter follows a `Lomax distribution`_
@@ -303,7 +303,7 @@ The following analysis will rely on frequency of zero loss measurements
and magnitude of loss ratio if nonzero.
The offered load selection strategy used implies zero loss measurements
-can be gleamed from the graph by looking at offered load points.
+can be gleaned from the graph by looking at offered load points.
When the points move up farther from lower estimate, it means
the previous measurement had zero loss. After non-zero loss,
the offered load starts again right between (the previous values of)
diff --git a/docs/report/introduction/methodology_kvm_vms_vhost_user.rst b/docs/report/introduction/methodology_kvm_vms_vhost_user.rst
index e6a98596da..216d461911 100644
--- a/docs/report/introduction/methodology_kvm_vms_vhost_user.rst
+++ b/docs/report/introduction/methodology_kvm_vms_vhost_user.rst
@@ -3,7 +3,7 @@ KVM VMs vhost-user
QEMU is used for KVM VM vhost-user testing enviroment. By default,
standard QEMU version is used, preinstalled from OS repositories
-(qemu-2.11.1 for Ubuntu 18.04, qemu-2.5.0 for Ubuntu 16.04). The path
+(qemu-2.11.1 for Ubuntu 18.04). The path
to the QEMU binary can be adjusted in `Constants.py`.
FD.io CSIT performance lab is testing VPP vhost-user with KVM VMs using
diff --git a/docs/report/introduction/methodology_multi_core_speedup.rst b/docs/report/introduction/methodology_multi_core_speedup.rst
index b42bf42f92..095f0f7796 100644
--- a/docs/report/introduction/methodology_multi_core_speedup.rst
+++ b/docs/report/introduction/methodology_multi_core_speedup.rst
@@ -1,7 +1,7 @@
Multi-Core Speedup
------------------
-All performance tests are executed with single processor core and with
+All performance tests are executed with single physical core and with
multiple cores scenarios.
Intel Hyper-Threading (HT)
@@ -16,7 +16,7 @@ making it impractical for continuous changes of HT mode of operation.
|csit-release| performance tests are executed with server SUTs' Intel
XEON processors configured with Intel Hyper-Threading Disabled for all
Xeon Haswell testbeds (3n-hsw) and with Intel Hyper-Threading Enabled
-for all Xeon Skylake testbeds.
+for all Xeon Skylake and Xeon Cascadelake testbeds.
More information about physical testbeds is provided in
:ref:`tested_physical_topologies`.
@@ -34,8 +34,8 @@ thread and physical core configurations:
#. 2t2c - 2 VPP worker threads on 2 physical cores.
#. 4t4c - 4 VPP worker threads on 4 physical cores.
-#. Intel Xeon Skylake testbeds (2n-skx, 3n-skx) with Intel HT enabled
- (2 logical CPU cores per each physical core):
+#. Intel Xeon Skylake and Cascadelake testbeds (2n-skx, 3n-skx, 2n-clx)
+ with Intel HT enabled (2 logical CPU cores per each physical core):
#. 2t1c - 2 VPP worker threads on 1 physical core.
#. 4t2c - 4 VPP worker threads on 2 physical cores.
diff --git a/docs/report/introduction/methodology_nfv_service_density.rst b/docs/report/introduction/methodology_nfv_service_density.rst
index b09c1be629..c5407b5125 100644
--- a/docs/report/introduction/methodology_nfv_service_density.rst
+++ b/docs/report/introduction/methodology_nfv_service_density.rst
@@ -16,8 +16,8 @@ service chain forwarding context(s). In order to provide a most complete
picture, each network topology and service configuration is tested in
different service density setups by varying two parameters:
-- Number of service instances (e.g. 1,2,4..10).
-- Number of NFs per service instance (e.g. 1,2,4..10).
+- Number of service instances (e.g. 1, 2, 4, 6, 8, 10).
+- Number of NFs per service instance (e.g. 1, 2, 4, 6, 8, 10).
Implementation of NFV service density tests in |csit-release| is using two NF
applications:
diff --git a/docs/report/introduction/methodology_packet_latency.rst b/docs/report/introduction/methodology_packet_latency.rst
index b8df660539..1f7ad7f633 100644
--- a/docs/report/introduction/methodology_packet_latency.rst
+++ b/docs/report/introduction/methodology_packet_latency.rst
@@ -1,7 +1,7 @@
Packet Latency
--------------
-TRex Traffic Generator (TG) is used for measuring latency across 2-Node
+TRex Traffic Generator (TG) is used for measuring latency across 2-Node
and 3-Node SUT server topologies. TRex integrates `A High Dynamic Range
Histogram (HDRH) <http://hdrhistogram.org/>`_ code providing per packet
latency distribution for latency streams sent in parallel to the main
@@ -30,4 +30,4 @@ methodology:
setup used.
- TG setup introduces an always-on Tx/Rx interface latency of about 2
* 2 usec per direction induced by TRex SW writing and reading packet
- timestamps on CPU cores.
\ No newline at end of file
+ timestamps on CPU cores.
diff --git a/docs/report/introduction/methodology_quic_with_vppecho.rst b/docs/report/introduction/methodology_quic_with_vppecho.rst
index 12b64203db..5579fb5954 100644
--- a/docs/report/introduction/methodology_quic_with_vppecho.rst
+++ b/docs/report/introduction/methodology_quic_with_vppecho.rst
@@ -32,6 +32,7 @@ where,
measurements for all streams and the sum of all streams.
Test cases include
+
1. 1 QUIC Connection with 1 Stream
2. 1 QUIC connection with 10 Streams
3. 10 QUIC connetions with 1 Stream
@@ -39,5 +40,5 @@ where,
with stream sizes to provide reasonable test durations. The VPP Host
Stack QUIC transport is configured to utilize the picotls encryption
- library. In the future, tests utilizing addtional encryption
+ library. In the future, tests utilizing addtional encryption
algorithms will be added.
diff --git a/docs/report/introduction/methodology_reconf.rst b/docs/report/introduction/methodology_reconf.rst
index 32e0fd7561..1a1f4cc98c 100644
--- a/docs/report/introduction/methodology_reconf.rst
+++ b/docs/report/introduction/methodology_reconf.rst
@@ -25,7 +25,7 @@ with somewhat long durations, and the re-configuration process can also be long,
finding an offered load which would result in zero loss
during the re-configuration process would be time-consuming.
-Instead, reconf tests find a througput value (lower bound for NDR)
+Instead, reconf tests first find a througput value (lower bound for NDR)
without re-configuration, and then maintain that ofered load
during re-configuration. The measured loss count is then assumed to be caused
by the re-configuration process. The result published by reconf tests
@@ -38,16 +38,16 @@ Current Implementation
Each reconf suite is based on a similar MLRsearch performance suite.
MLRsearch parameters are changed to speed up the throughput discovery.
-For example, PDR is not searched for, and final trial duration is shorter.
+For example, PDR is not searched for, and the final trial duration is shorter.
The MLRsearch suite has to contain a configuration parameter
-that can be scaled up, e.g. number of routes or number of service chains.
+that can be scaled up, e.g. number of tunnels or number of service chains.
Currently, only increasing the scale is supported
as the re-configuration operation. In future, scale decrease
or other operations can be implemented.
The traffic profile is not changed, so the traffic present is processed
-only by the smaller scale configuration. The added routes / chains
+only by the smaller scale configuration. The added tunnels / chains
are not targetted by the traffic.
For the re-configuration, the same Robot Framework and Python libraries
@@ -73,6 +73,3 @@ are expected without re-configuration. But different suites show different
allowing full NIC buffers to drain quickly between worker pauses.
For other suites, lower bound for NDR still has quite a large probability
of non-zero packet loss even without re-configuration.
-
-But the results show very high effective blocked time,
-so the two objections related to NDR lower bound are negligible in comparison.
diff --git a/docs/report/introduction/methodology_tcp_with_iperf3.rst b/docs/report/introduction/methodology_tcp_with_iperf3.rst
index ef28dec4a3..288da004a5 100644
--- a/docs/report/introduction/methodology_tcp_with_iperf3.rst
+++ b/docs/report/introduction/methodology_tcp_with_iperf3.rst
@@ -1,11 +1,11 @@
Hoststack Throughput Testing over TCP/IP with iperf3
----------------------------------------------------
-`iperf3 bandwidth measurement tool <https://github.com/esnet/iperf>`_
-is used for measuring the maximum attainable bandwidth of the VPP Host
+`iperf3 goodput measurement tool <https://github.com/esnet/iperf>`_
+is used for measuring the maximum attainable goodput of the VPP Host
Stack connection across two instances of VPP running on separate DUT
nodes. iperf3 is a popular open source tool for active measurements
-of the maximum achievable bandwidth on IP networks.
+of the maximum achievable goodput on IP networks.
Because iperf3 utilizes the POSIX socket interface APIs, the current
test configuration utilizes the LD_PRELOAD mechanism in the linux
@@ -14,7 +14,7 @@ Communications Library (VCL) LD_PRELOAD library (libvcl_ldpreload.so).
In the future, a forked version of iperf3 which has been modified to
directly use the VCL application APIs may be added to determine the
-difference in performance of 'VCL Native' applications .vs. utilizing
+difference in performance of 'VCL Native' applications versus utilizing
LD_PRELOAD which inherently has more overhead and other limitations.
The test configuration is as follows:
diff --git a/docs/report/introduction/methodology_terminology.rst b/docs/report/introduction/methodology_terminology.rst
index db76827a5a..33ab116491 100644
--- a/docs/report/introduction/methodology_terminology.rst
+++ b/docs/report/introduction/methodology_terminology.rst
@@ -27,13 +27,13 @@ Terminology
methodology contains other parts, whose performance is either already
established, or not affecting the benchmarking result.
- **Bi-directional throughput tests**: involve packets/frames flowing in
- both transmit and receive directions over every tested interface of
+ both east-west and west-east directions over every tested interface of
SUT/DUT. Packet flow metrics are measured per direction, and can be
reported as aggregate for both directions (i.e. throughput) and/or
separately for each measured direction (i.e. latency). In most cases
bi-directional tests use the same (symmetric) load in both directions.
- **Uni-directional throughput tests**: involve packets/frames flowing in
- only one direction, i.e. either transmit or receive direction, over
+ only one direction, i.e. either east-west or west-east direction, over
every tested interface of SUT/DUT. Packet flow metrics are measured
and are reported for measured direction.
- **Packet Loss Ratio (PLR)**: ratio of packets received relative to packets
@@ -50,8 +50,8 @@ Terminology
Measured in packets-per-second (pps) or frames-per-second (fps),
equivalent metrics.
- **Bandwidth Throughput Rate**: a secondary metric calculated from packet
- throughput rate using formula: bw_rate = pkt_rate - (frame_size +
- L1_overhead) - 8, where L1_overhead for Ethernet includes preamble (8
+ throughput rate using formula: bw_rate = pkt_rate * (frame_size +
+ L1_overhead) * 8, where L1_overhead for Ethernet includes preamble (8
Bytes) and inter-frame gap (12 Bytes). For bi-directional tests,
bandwidth throughput rate should be reported as aggregate for both
directions. Expressed in bits-per-second (bps).
@@ -75,4 +75,4 @@ Terminology
bandwidth MRR expressed in bits-per-second (bps).
- **Trial**: a single measurement step.
- **Trial duration**: amount of time over which packets are transmitted and
- received in a single throughput measurement step.
+ received in a single measurement step.
diff --git a/docs/report/introduction/methodology_trex_traffic_generator.rst b/docs/report/introduction/methodology_trex_traffic_generator.rst
index 0d19c2cf78..d9e7df57d3 100644
--- a/docs/report/introduction/methodology_trex_traffic_generator.rst
+++ b/docs/report/introduction/methodology_trex_traffic_generator.rst
@@ -4,13 +4,12 @@ TRex Traffic Generator
Usage
~~~~~
-`TRex traffic generator <https://wiki.fd.io/view/TRex>`_ is used for all
+`TRex traffic generator <https://trex-tgn.cisco.com>`_ is used for all
CSIT performance tests. TRex stateless mode is used to measure NDR and
PDR throughputs using MLRsearch and to measure maximum transer rate
in MRR tests.
-TRex is installed and run on the TG compute node. The typical procedure
-is:
+TRex is installed and run on the TG compute node. The typical procedure is:
- If the TRex is not already installed on TG, it is installed in the
suite setup phase - see `TRex installation`_.
@@ -22,7 +21,7 @@ is:
- TRex is started in the background mode
::
- $ sh -c 'cd <t-rex-install-dir>/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --prefix $(hostname) --hdrh > /tmp/trex.log 2>&1 &' > /dev/null
+ $ sh -c 'cd <t-rex-install-dir>/scripts/ && sudo nohup ./t-rex-64 -i --prefix $(hostname) --hdrh --no-scapy-server > /tmp/trex.log 2>&1 &' > /dev/null
- There are traffic streams dynamically prepared for each test, based on traffic
profiles. The traffic is sent and the statistics obtained using
@@ -49,4 +48,4 @@ Measuring Latency
If measurement of latency is requested, two more packet streams are
created (one for each direction) with TRex flow_stats parameter set to
STLFlowLatencyStats. In that case, returned statistics will also include
-min/avg/max latency values.
+min/avg/max latency values and encoded HDRHstogram data.
diff --git a/docs/report/introduction/methodology_tunnel_encapsulations.rst b/docs/report/introduction/methodology_tunnel_encapsulations.rst
index d9e2f42f25..c61df171ac 100644
--- a/docs/report/introduction/methodology_tunnel_encapsulations.rst
+++ b/docs/report/introduction/methodology_tunnel_encapsulations.rst
@@ -15,7 +15,7 @@ VPP is tested in the following IPv4 tunnel baseline configurations:
- *ip4lispip4-ip4base*: LISP over IPv4 tunnels with IPv4 routing.
- *ip4lispip6-ip6base*: LISP over IPv4 tunnels with IPv6 routing.
-In all cases listed above low number of MAC, IPv4, IPv6 flows (254 or 253 per
+In all cases listed above low number of MAC, IPv4, IPv6 flows (253 or 254 per
direction) is switched or routed by VPP.
In addition selected IPv4 tunnels are tested at scale:
@@ -34,5 +34,5 @@ VPP is tested in the following IPv6 tunnel baseline configurations:
- *ip6lispip4-ip4base*: LISP over IPv4 tunnels with IPv4 routing.
- *ip6lispip6-ip6base*: LISP over IPv4 tunnels with IPv6 routing.
-In all cases listed above low number of IPv4, IPv6 flows (253 per
+In all cases listed above low number of IPv4, IPv6 flows (253 or 254 per
direction) is routed by VPP.
diff --git a/docs/report/introduction/methodology_vpp_device_functional.rst b/docs/report/introduction/methodology_vpp_device_functional.rst
index 0c29624419..ff6f3fb03b 100644
--- a/docs/report/introduction/methodology_vpp_device_functional.rst
+++ b/docs/report/introduction/methodology_vpp_device_functional.rst
@@ -5,7 +5,7 @@ VPP_Device Functional
device tests integrated into LFN CI/CD infrastructure. VPP_Device tests
run on 1-Node testbeds (1n-skx, 1n-arm) and rely on Linux SRIOV Virtual
Function (VF), dot1q VLAN tagging and external loopback cables to
-facilitate packet passing over exernal physical links. Initial focus is
-on few baseline tests. Existing CSIT Performance tests can be moved to
-VPP_Device framework. RF test definition code stays unchanged with the
-exception of traffic generator related L2 KWs.
+facilitate packet passing over external physical links. Initial focus is
+on few baseline tests. New device tests can be added by small edits
+to existing CSIT Performance (2-node) test. RF test definition code
+stays unchanged with the exception of traffic generator related L2 KWs.
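
Note on the methodology_terminology.rst hunk: the patch corrects the bandwidth formula to bw_rate = pkt_rate * (frame_size + L1_overhead) * 8. A minimal Python sketch of that arithmetic follows. The function and constant names are hypothetical and are not taken from CSIT code; the 20-byte L1 overhead is the Ethernet preamble (8 Bytes) plus inter-frame gap (12 Bytes) stated in the patched text.

```python
# Hypothetical helper illustrating the corrected formula from the patch:
#   bw_rate = pkt_rate * (frame_size + L1_overhead) * 8
# Names below are assumptions for illustration, not CSIT identifiers.

PREAMBLE_BYTES = 8          # Ethernet preamble, per the patched text
INTER_FRAME_GAP_BYTES = 12  # Ethernet inter-frame gap, per the patched text
L1_OVERHEAD_BYTES = PREAMBLE_BYTES + INTER_FRAME_GAP_BYTES


def bandwidth_rate_bps(pkt_rate_pps, frame_size_bytes,
                       l1_overhead_bytes=L1_OVERHEAD_BYTES):
    """Convert a packet throughput rate (pps) into bits-per-second."""
    return pkt_rate_pps * (frame_size_bytes + l1_overhead_bytes) * 8


if __name__ == "__main__":
    # 1 Mpps of 64B frames: each frame occupies 84 Bytes on the wire.
    print(bandwidth_rate_bps(1_000_000, 64))
```

With the pre-patch formula (subtraction instead of multiplication) the result would not have units of bits-per-second, which is why the sign change in the hunk matters.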