author    Maciek Konstantynowicz <mkonstan@cisco.com>  2018-08-13 21:17:10 +0100
committer Maciek Konstantynowicz <mkonstan@cisco.com>  2018-08-14 14:05:05 +0000
commit    25a40550e90c036dddf17698103a9a3c34ff6799 (patch)
tree      adfa59a51262b3feeb717ad156db3f2ae96593bb
parent    7ae94f2578699fcb50544742d6c27b59edc74abb (diff)
1807 report: added HW calibration sections to test_environment plus editing nits.
Change-Id: I66698ae70d1bbbde6992e5663bc64c30249f7f79 Signed-off-by: Maciek Konstantynowicz <mkonstan@cisco.com>
-rw-r--r--  docs/report/introduction/methodology.rst                        3
-rw-r--r--  docs/report/introduction/test_environment_intro.rst            72
-rw-r--r--  docs/report/introduction/test_environment_sut_calib_hsw.rst   223
-rw-r--r--  docs/report/introduction/test_environment_sut_calib_skx.rst   213
-rw-r--r--  docs/report/introduction/test_environment_sut_conf_1.rst       18
-rw-r--r--  docs/report/introduction/test_environment_sut_conf_2.rst       13
-rw-r--r--  docs/report/introduction/test_environment_sut_conf_3.rst       18
-rw-r--r--  docs/report/introduction/test_environment_tg.rst               22
-rw-r--r--  docs/report/vpp_performance_tests/test_environment.rst         30
9 files changed, 559 insertions, 53 deletions
diff --git a/docs/report/introduction/methodology.rst b/docs/report/introduction/methodology.rst
index ff5714c259..28bcf68257 100644
--- a/docs/report/introduction/methodology.rst
+++ b/docs/report/introduction/methodology.rst
@@ -1,3 +1,6 @@
+
+.. _performance_test_methodology:
+
Performance Test Methodology
============================
diff --git a/docs/report/introduction/test_environment_intro.rst b/docs/report/introduction/test_environment_intro.rst
index d80ecdffe0..19dac90b96 100644
--- a/docs/report/introduction/test_environment_intro.rst
+++ b/docs/report/introduction/test_environment_intro.rst
@@ -3,16 +3,62 @@
Test Environment
================
-CSIT performance tests are executed on physical testbeds hosted by
-:abbr:`LF (Linux Foundation)` for FD.io project. Each testbed consists of
-either one (2-node) or two (3-node) servers acting as Systems Under Test (SUT)
-and one server acting as Traffic Generator (TG).
-
-Server Specification and Configuration
---------------------------------------
-
-Complete specification and configuration of compute servers used in CSIT
-physical testbeds is maintained on wiki page `CSIT testbed - Server HW
-Configuration (Haswell) <https://wiki.fd.io/view/CSIT/CSIT_LF_testbed>`_ and
-`CSIT testbed - Server HW Configuration (Skylake/ARM)
-<https://wiki.fd.io/view/CSIT/fdio_csit_lab_ext_lld_draft>`_.
+Physical Testbeds
+-----------------
+
+FD.io CSIT performance tests are executed in physical testbeds hosted by
+:abbr:`LF (Linux Foundation)` for FD.io project.
+
+Two physical testbed topology types are used:
+
+- **3-Node Topology**: Consisting of two servers acting as SUTs
+ (Systems Under Test) and one server as TG (Traffic Generator), all
+ connected in ring topology.
+- **2-Node Topology**: Consisting of one server acting as SUT and one
+  server as TG, both connected in ring topology.
+
+Tested SUT servers are based on a range of processors including Intel
+Xeon Haswell-SP, Intel Xeon Skylake-SP, Arm and Intel Atom. A more
+detailed description is provided in
+:ref:`tested_physical_topologies`.
+
+Tested logical topologies are described in
+:ref:`tested_logical_topologies`.
+
+Server Specifications
+---------------------
+
+Complete technical specifications of compute servers used in CSIT
+physical testbeds are maintained on FD.io wiki pages: `CSIT/Testbeds:
+Xeon Hsw, VIRL
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Hsw,_VIRL.#FD.io_CSIT_testbeds_-_Xeon_Haswell.2C_VIRL>`_
+and `CSIT Testbeds: Xeon Skx, Arm, Atom
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Skx,_Arm,_Atom.#Server_Specification>`_.
+
+Pre-Test Server Calibration
+---------------------------
+
+A number of SUT server sub-system runtime parameters have been
+identified as impacting data plane performance tests. Calibrating those
+parameters is part of FD.io CSIT pre-test activities, and includes
+measuring and reporting the following:
+
+#. System level core jitter – measure the duration of core interrupts
+   by Linux in clock cycles and how often they happen, using the
+   `CPU core jitter tool <https://git.fd.io/pma_tools/tree/jitter>`_.
+
+#. Memory bandwidth – measure bandwidth with the `Intel MLC tool
+   <https://software.intel.com/en-us/articles/intelr-memory-latency-checker>`_.
+
+#. Memory latency – measure memory latency with the Intel MLC tool.
+
+#. Cache latency at all levels (L1, L2, and Last Level Cache) – measure
+   cache latency with the Intel MLC tool.
+
+Measured values of the listed parameters are especially important for
+repeatable zero packet loss throughput measurements across multiple
+system instances. More generally, they are useful as background data
+for comparing data plane performance results across disparate servers.
+
+The following sections include measured calibration data for the Intel
+Xeon Haswell and Intel Xeon Skylake testbeds.
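+
+For illustration, the calibration steps listed above can be collected
+with a small shell script along the following lines (a sketch only;
+the tool locations match the commands shown in the sections below,
+while the output file name is an assumption):
+
+::
+
+    #!/usr/bin/env bash
+    # Sketch: collect SUT calibration data into a single log file.
+    OUT=calibration_$(hostname)_$(date +%Y%m%d).txt
+    {
+        # System level core jitter, pinned to an isolated core.
+        sudo taskset -c 3 /home/testuser/pma_tools/jitter/jitter -i 30
+        # Memory bandwidth and latency with the Intel MLC tool.
+        sudo /home/testuser/mlc --bandwidth_matrix
+        sudo /home/testuser/mlc --peak_injection_bandwidth
+        sudo /home/testuser/mlc --max_bandwidth
+        sudo /home/testuser/mlc --latency_matrix
+        sudo /home/testuser/mlc --idle_latency
+        sudo /home/testuser/mlc --loaded_latency
+        # Cache latency at all levels (L1/L2/LLC).
+        sudo /home/testuser/mlc --c2c_latency
+    } | tee "$OUT"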
diff --git a/docs/report/introduction/test_environment_sut_calib_hsw.rst b/docs/report/introduction/test_environment_sut_calib_hsw.rst
new file mode 100644
index 0000000000..b5ebdd25e5
--- /dev/null
+++ b/docs/report/introduction/test_environment_sut_calib_hsw.rst
@@ -0,0 +1,223 @@
+Calibration Data - Haswell
+--------------------------
+
+The following sections include sample calibration data measured on the
+t1-sut1 server running in one of the Intel Xeon Haswell testbeds, as
+specified in `CSIT/Testbeds: Xeon Hsw, VIRL
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Hsw,_VIRL.#FD.io_CSIT_testbeds_-_Xeon_Haswell.2C_VIRL>`_.
+
+Calibration data obtained from all other servers in Haswell testbeds
+shows the same or similar values.
+
+
+Linux cmdline
+~~~~~~~~~~~~~
+
+::
+
+ $ cat /proc/cmdline
+ BOOT_IMAGE=/vmlinuz-4.4.0-72-generic root=UUID=efb7e8b3-3548-4440-98f6-6ebe102e9ec6 ro isolcpus=1-17,19-35 nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35 intel_pstate=disable console=tty0 console=ttyS0,115200n8
+
+
+Linux uname
+~~~~~~~~~~~
+
+::
+
+ $ uname -a
+ Linux t3-sut2 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
+
+
+System-level core jitter
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ $ sudo taskset -c 3 /home/testuser/pma_tools/jitter/jitter -i 30
+ Linux Jitter testing program version 1.8
+ Iterations=30
+ The pragram will execute a dummy function 80000 times
+ Display is updated every 20000 displayUpdate intervals
+ Timings are in CPU Core cycles
+ Inst_Min: Minimum Excution time during the display update interval(default is ~1 second)
+ Inst_Max: Maximum Excution time during the display update interval(default is ~1 second)
+ Inst_jitter: Jitter in the Excution time during rhe display update interval. This is the value of interest
+ last_Exec: The Excution time of last iteration just before the display update
+ Abs_Min: Absolute Minimum Excution time since the program started or statistics were reset
+ Abs_Max: Absolute Maximum Excution time since the program started or statistics were reset
+ tmp: Cumulative value calcualted by the dummy function
+ Interval: Time interval between the display updates in Core Cycles
+ Sample No: Sample number
+
+ Inst_Min Inst_Max Inst_jitter last_Exec Abs_min Abs_max tmp Interval Sample No
+ 160024 172636 12612 160028 160024 172636 1573060608 3205463144 1
+ 160024 188236 28212 160028 160024 188236 958595072 3205500844 2
+ 160024 185676 25652 160028 160024 188236 344129536 3205485976 3
+ 160024 172608 12584 160024 160024 188236 4024631296 3205472740 4
+ 160024 179260 19236 160028 160024 188236 3410165760 3205502164 5
+ 160024 172432 12408 160024 160024 188236 2795700224 3205452036 6
+ 160024 178820 18796 160024 160024 188236 2181234688 3205455408 7
+ 160024 172512 12488 160028 160024 188236 1566769152 3205461528 8
+ 160024 172636 12612 160028 160024 188236 952303616 3205478820 9
+ 160024 173676 13652 160028 160024 188236 337838080 3205470412 10
+ 160024 178776 18752 160028 160024 188236 4018339840 3205481472 11
+ 160024 172788 12764 160028 160024 188236 3403874304 3205492336 12
+ 160024 174616 14592 160028 160024 188236 2789408768 3205474904 13
+ 160024 174440 14416 160028 160024 188236 2174943232 3205479448 14
+ 160024 178748 18724 160024 160024 188236 1560477696 3205482668 15
+ 160024 172588 12564 169404 160024 188236 946012160 3205510496 16
+ 160024 172636 12612 160024 160024 188236 331546624 3205472204 17
+ 160024 172480 12456 160024 160024 188236 4012048384 3205455864 18
+ 160024 172740 12716 160028 160024 188236 3397582848 3205464932 19
+ 160024 179200 19176 160028 160024 188236 2783117312 3205476012 20
+ 160024 172480 12456 160028 160024 188236 2168651776 3205465632 21
+ 160024 172728 12704 160024 160024 188236 1554186240 3205497204 22
+ 160024 172620 12596 160028 160024 188236 939720704 3205466972 23
+ 160024 172640 12616 160028 160024 188236 325255168 3205471216 24
+ 160024 172484 12460 160028 160024 188236 4005756928 3205467388 25
+ 160024 172636 12612 160028 160024 188236 3391291392 3205482748 26
+ 160024 179056 19032 160024 160024 188236 2776825856 3205467152 27
+ 160024 172672 12648 160024 160024 188236 2162360320 3205483268 28
+ 160024 176932 16908 160024 160024 188236 1547894784 3205488536 29
+ 160024 172452 12428 160028 160024 188236 933429248 3205440636 30
+
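+The Inst_jitter column is the value of interest. Assuming the output
+above has been saved to a file, a one-line summary can be extracted for
+example as follows (an illustrative sketch, not part of the CSIT test
+flow; the log file name is an assumption):
+
+::
+
+    $ awk '/^[[:space:]]*[0-9]/ {if ($3 > max) max = $3; sum += $3; n++} END {printf "samples=%d avg_jitter=%.0f max_jitter=%d cycles\n", n, sum/n, max}' jitter_hsw.log
+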
+
+Memory bandwidth
+~~~~~~~~~~~~~~~~
+
+::
+
+ $ sudo /home/testuser/mlc --bandwidth_matrix
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --bandwidth_matrix
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+ Measuring Memory Bandwidths between nodes within system
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using Read-only traffic type
+ Numa node
+ Numa node 0 1
+ 0 57935.5 30265.2
+ 1 30284.6 58409.9
+
+::
+
+ $ sudo /home/testuser/mlc --peak_injection_bandwidth
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --peak_injection_bandwidth
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Peak Injection Memory Bandwidths for the system
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using traffic with the following read-write ratios
+ ALL Reads : 115762.2
+ 3:1 Reads-Writes : 106242.2
+ 2:1 Reads-Writes : 103031.8
+ 1:1 Reads-Writes : 87943.7
+ Stream-triad like: 100048.4
+
+::
+
+ $ sudo /home/testuser/mlc --max_bandwidth
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --max_bandwidth
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Maximum Memory Bandwidths for the system
+ Will take several minutes to complete as multiple injection rates will be tried to get the best bandwidth
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using traffic with the following read-write ratios
+ ALL Reads : 115782.41
+ 3:1 Reads-Writes : 105965.78
+ 2:1 Reads-Writes : 103162.38
+ 1:1 Reads-Writes : 88255.82
+ Stream-triad like: 105608.10
+
+
+Memory latency
+~~~~~~~~~~~~~~
+
+::
+
+ $ sudo /home/testuser/mlc --latency_matrix
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --latency_matrix
+
+ Using buffer size of 200.000MB
+ Measuring idle latencies (in ns)...
+ Numa node
+ Numa node 0 1
+ 0 101.0 132.0
+ 1 141.2 98.8
+
+::
+
+ $ sudo /home/testuser/mlc --idle_latency
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --idle_latency
+
+ Using buffer size of 200.000MB
+ Each iteration took 227.2 core clocks ( 99.0 ns)
+
+::
+
+ $ sudo /home/testuser/mlc --loaded_latency
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --loaded_latency
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Loaded Latencies for the system
+ Using all the threads from each core if Hyper-threading is enabled
+ Using Read-only traffic type
+ Inject Latency Bandwidth
+ Delay (ns) MB/sec
+ ==========================
+ 00000 294.08 115841.6
+ 00002 294.27 115851.5
+ 00008 293.67 115821.8
+ 00015 278.92 115587.5
+ 00050 246.80 113991.2
+ 00100 206.86 104508.1
+ 00200 123.72 72873.6
+ 00300 113.35 52641.1
+ 00400 108.89 41078.9
+ 00500 108.11 33699.1
+ 00700 106.19 24878.0
+ 01000 104.75 17948.1
+ 01300 103.72 14089.0
+ 01700 102.95 11013.6
+ 02500 102.25 7756.3
+ 03500 101.81 5749.3
+ 05000 101.46 4230.4
+ 09000 101.05 2641.4
+ 20000 100.77 1542.5
+
+
+L1/L2/LLC latency
+~~~~~~~~~~~~~~~~~
+
+::
+
+ $ sudo /home/testuser/mlc --c2c_latency
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --c2c_latency
+
+ Measuring cache-to-cache transfer latency (in ns)...
+ Local Socket L2->L2 HIT latency 42.1
+ Local Socket L2->L2 HITM latency 47.0
+ Remote Socket L2->L2 HITM latency (data address homed in writer socket)
+ Reader Numa Node
+ Writer Numa Node 0 1
+ 0 - 108.0
+ 1 106.9 -
+ Remote Socket L2->L2 HITM latency (data address homed in reader socket)
+ Reader Numa Node
+ Writer Numa Node 0 1
+ 0 - 107.7
+ 1 106.6 - \ No newline at end of file
diff --git a/docs/report/introduction/test_environment_sut_calib_skx.rst b/docs/report/introduction/test_environment_sut_calib_skx.rst
new file mode 100644
index 0000000000..2496e7a0d9
--- /dev/null
+++ b/docs/report/introduction/test_environment_sut_calib_skx.rst
@@ -0,0 +1,213 @@
+Calibration Data - Skylake
+--------------------------
+
+The following sections include sample calibration data measured on the
+s11-t31-sut1 server running in one of the Intel Xeon Skylake testbeds,
+as specified in `CSIT Testbeds: Xeon Skx, Arm, Atom
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Skx,_Arm,_Atom.#Server_Specification>`_.
+
+Calibration data obtained from all other servers in Skylake testbeds
+shows the same or similar values.
+
+
+Linux cmdline
+~~~~~~~~~~~~~
+
+::
+
+ $ cat /proc/cmdline
+ BOOT_IMAGE=/vmlinuz-4.15.0-23-generic root=UUID=759ad671-ad46-441b-a75b-9f54e81837bb ro isolcpus=1-27,29-55,57-83,85-111 nohz_full=1-27,29-55,57-83,85-111 rcu_nocbs=1-27,29-55,57-83,85-111 numa_balancing=disable intel_pstate=disable intel_iommu=on iommu=pt nmi_watchdog=0 audit=0 nosoftlockup processor.max_cstate=1 intel_idle.max_cstate=1 hpet=disable tsc=reliable mce=off console=tty0 console=ttyS0,115200n8
+
+
+Linux uname
+~~~~~~~~~~~
+
+::
+
+ $ uname -a
+ Linux s5-t22-sut1 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
+
+
+System-level core jitter
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ $ sudo taskset -c 3 /home/testuser/pma_tools/jitter/jitter -i 20
+ Linux Jitter testing program version 1.8
+ Iterations=20
+ The pragram will execute a dummy function 80000 times
+ Display is updated every 20000 displayUpdate intervals
+ Timings are in CPU Core cycles
+ Inst_Min: Minimum Excution time during the display update interval(default is ~1 second)
+ Inst_Max: Maximum Excution time during the display update interval(default is ~1 second)
+ Inst_jitter: Jitter in the Excution time during rhe display update interval. This is the value of interest
+ last_Exec: The Excution time of last iteration just before the display update
+ Abs_Min: Absolute Minimum Excution time since the program started or statistics were reset
+ Abs_Max: Absolute Maximum Excution time since the program started or statistics were reset
+ tmp: Cumulative value calcualted by the dummy function
+ Interval: Time interval between the display updates in Core Cycles
+ Sample No: Sample number
+
+ Inst_Min Inst_Max Inst_jitter last_Exec Abs_min Abs_max tmp Interval Sample No
+ 160022 171330 11308 160022 160022 171330 2538733568 3204142750 1
+ 160022 167294 7272 160026 160022 171330 328335360 3203873548 2
+ 160022 167560 7538 160026 160022 171330 2412904448 3203878736 3
+ 160022 169000 8978 160024 160022 171330 202506240 3203864588 4
+ 160022 166572 6550 160026 160022 171330 2287075328 3203866224 5
+ 160022 167460 7438 160026 160022 171330 76677120 3203854632 6
+ 160022 168134 8112 160024 160022 171330 2161246208 3203874674 7
+ 160022 169094 9072 160022 160022 171330 4245815296 3203878798 8
+ 160022 172460 12438 160024 160022 172460 2035417088 3204112010 9
+ 160022 167862 7840 160030 160022 172460 4119986176 3203856800 10
+ 160022 168398 8376 160024 160022 172460 1909587968 3203854192 11
+ 160022 167548 7526 160024 160022 172460 3994157056 3203847442 12
+ 160022 167562 7540 160026 160022 172460 1783758848 3203862936 13
+ 160022 167604 7582 160024 160022 172460 3868327936 3203859346 14
+ 160022 168262 8240 160024 160022 172460 1657929728 3203851120 15
+ 160022 169700 9678 160024 160022 172460 3742498816 3203877690 16
+ 160022 170476 10454 160026 160022 172460 1532100608 3204088480 17
+ 160022 167798 7776 160024 160022 172460 3616669696 3203862072 18
+ 160022 166540 6518 160024 160022 172460 1406271488 3203836904 19
+ 160022 167516 7494 160024 160022 172460 3490840576 3203848120 20
+
+
+Memory bandwidth
+~~~~~~~~~~~~~~~~
+
+::
+
+ $ sudo /home/testuser/mlc --bandwidth_matrix
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --bandwidth_matrix
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+ Measuring Memory Bandwidths between nodes within system
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using Read-only traffic type
+ Numa node
+ Numa node 0 1
+ 0 107947.7 50951.5
+ 1 50834.6 108183.4
+
+::
+
+ $ sudo /home/testuser/mlc --peak_injection_bandwidth
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --peak_injection_bandwidth
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Peak Injection Memory Bandwidths for the system
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using traffic with the following read-write ratios
+ ALL Reads : 215733.9
+ 3:1 Reads-Writes : 182141.9
+ 2:1 Reads-Writes : 178615.7
+ 1:1 Reads-Writes : 149911.3
+ Stream-triad like: 159533.6
+
+::
+
+ $ sudo /home/testuser/mlc --max_bandwidth
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --max_bandwidth
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Maximum Memory Bandwidths for the system
+ Will take several minutes to complete as multiple injection rates will be tried to get the best bandwidth
+ Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
+ Using all the threads from each core if Hyper-threading is enabled
+ Using traffic with the following read-write ratios
+ ALL Reads : 216875.73
+ 3:1 Reads-Writes : 182615.14
+ 2:1 Reads-Writes : 178745.67
+ 1:1 Reads-Writes : 149485.27
+ Stream-triad like: 180057.87
+
+
+Memory latency
+~~~~~~~~~~~~~~
+
+::
+
+ $ sudo /home/testuser/mlc --latency_matrix
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --latency_matrix
+
+ Using buffer size of 2000.000MB
+ Measuring idle latencies (in ns)...
+ Numa node
+ Numa node 0 1
+ 0 81.4 131.1
+ 1 131.1 81.3
+
+::
+
+ $ sudo /home/testuser/mlc --idle_latency
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --idle_latency
+
+ Using buffer size of 2000.000MB
+ Each iteration took 202.0 core clocks ( 80.8 ns)
+
+::
+
+ $ sudo /home/testuser/mlc --loaded_latency
+ Intel(R) Memory Latency Checker - v3.5
+ Command line parameters: --loaded_latency
+
+ Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes
+
+ Measuring Loaded Latencies for the system
+ Using all the threads from each core if Hyper-threading is enabled
+ Using Read-only traffic type
+ Inject Latency Bandwidth
+ Delay (ns) MB/sec
+ ==========================
+ 00000 282.66 215712.8
+ 00002 282.14 215757.4
+ 00008 280.21 215868.1
+ 00015 279.20 216313.2
+ 00050 275.25 216643.0
+ 00100 227.05 215075.0
+ 00200 121.92 160242.9
+ 00300 101.21 111587.4
+ 00400 95.48 85019.7
+ 00500 94.46 68717.3
+ 00700 92.27 49742.2
+ 01000 91.03 35264.8
+ 01300 90.11 27396.3
+ 01700 89.34 21178.7
+ 02500 90.15 14672.8
+ 03500 89.00 10715.7
+ 05000 82.00 7788.2
+ 09000 81.46 4684.0
+ 20000 81.40 2541.9
+
+
+L1/L2/LLC latency
+~~~~~~~~~~~~~~~~~
+
+::
+
+    $ sudo /home/testuser/mlc --c2c_latency
+    Intel(R) Memory Latency Checker - v3.5
+    Command line parameters: --c2c_latency
+
+    Measuring cache-to-cache transfer latency (in ns)...
+    Local Socket L2->L2 HIT latency 53.7
+    Local Socket L2->L2 HITM latency 53.7
+    Remote Socket L2->L2 HITM latency (data address homed in writer socket)
+                        Reader Numa Node
+    Writer Numa Node     0       1
+                0        -   113.9
+                1    113.9       -
+    Remote Socket L2->L2 HITM latency (data address homed in reader socket)
+                        Reader Numa Node
+    Writer Numa Node     0       1
+                0        -   177.9
+                1    177.6       - \ No newline at end of file
diff --git a/docs/report/introduction/test_environment_sut_conf_1.rst b/docs/report/introduction/test_environment_sut_conf_1.rst
index 2c44d6bb02..c6803e07a3 100644
--- a/docs/report/introduction/test_environment_sut_conf_1.rst
+++ b/docs/report/introduction/test_environment_sut_conf_1.rst
@@ -1,5 +1,5 @@
-SUT Configuration - Host OS Linux
----------------------------------
+SUT Settings - Linux
+--------------------
System provisioning is done by a combination of PXE boot unattended
install and
@@ -7,7 +7,7 @@ install and
Below is a subset of the running configuration:
-#. Haswell - Ubuntu 16.04.1 LTS
+1. Xeon Haswell - Ubuntu 16.04.1 LTS
::
@@ -18,7 +18,7 @@ Below a subset of the running configuration:
Release: 16.04
Codename: xenial
-#. Skylake - Ubuntu 18.04 LTS
+2. Xeon Skylake - Ubuntu 18.04 LTS
::
@@ -29,7 +29,8 @@ Below a subset of the running configuration:
Release: 18.04
Codename: bionic
-**Kernel boot parameters used in CSIT performance testbeds**
+Linux Boot Parameters
+~~~~~~~~~~~~~~~~~~~~~
- **isolcpus=<cpu number>-<cpu number>** used for all cpu cores apart from
first core of each socket used for running VPP worker threads and Qemu/LXC
@@ -67,16 +68,17 @@ Below a subset of the running configuration:
virtualized environment.
- **hpet=disable** - [X86-32,HPET] Disable HPET and use PIT instead.
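+
+As an illustration (not necessarily how CSIT provisioning applies
+them), on Ubuntu these parameters are typically set via GRUB and take
+effect on the next boot:
+
+::
+
+    # /etc/default/grub (example values matching the Haswell testbed cmdline)
+    GRUB_CMDLINE_LINUX="isolcpus=1-17,19-35 nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35 intel_pstate=disable"
+
+    # Regenerate the grub configuration and reboot to apply.
+    $ sudo update-grub
+    $ sudo reboot
+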
-**Applied command line boot parameters:**
+Applied Boot Cmdline
+~~~~~~~~~~~~~~~~~~~~
-#. Haswell - Ubuntu 16.04.1 LTS
+1. Xeon Haswell - Ubuntu 16.04.1 LTS
::
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.4.0-72-generic root=UUID=35ea11e4-e44f-4f67-8cbe-12f09c49ed90 ro isolcpus=1-17,19-35 nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35 intel_pstate=disable console=tty0 console=ttyS0,115200n8
-#. Skylake - Ubuntu 18.04 LTS
+2. Xeon Skylake - Ubuntu 18.04 LTS
::
diff --git a/docs/report/introduction/test_environment_sut_conf_2.rst b/docs/report/introduction/test_environment_sut_conf_2.rst
index 482c09d5fb..79aaff660e 100644
--- a/docs/report/introduction/test_environment_sut_conf_2.rst
+++ b/docs/report/introduction/test_environment_sut_conf_2.rst
@@ -1,9 +1,10 @@
-**Host CFS optimizations (QEMU+VPP)**
+Linux CFS tunings
+~~~~~~~~~~~~~~~~~
-Applying CFS scheduler tuning on all Qemu vcpu worker threads (those are
-handling testpmd - pmd threads) and VPP PMD worker threads. List of VPP PMD
-threads can be obtained e.g. from:
+Linux CFS scheduler tunings are applied to all QEMU vCPU worker threads
+(the ones handling testpmd PMD threads) and to VPP data plane worker
+threads. A list of VPP data plane threads can be obtained by running:
::
@@ -21,7 +22,7 @@ Or:
$ cat /proc/`pidof vpp`/task/*/stat | awk '{print $1" "$2" "$39}'
-Applying Round-robin scheduling with highest priority
+CFS round-robin scheduling with highest priority is applied using:
::
@@ -33,5 +34,5 @@ Applying Round-robin scheduling with highest priority
$ done
$ done
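+
+A complete form of the pinning loop above might look as follows (an
+illustrative sketch; the thread selection and the priority value are
+assumptions based on the fragments shown):
+
+::
+
+    $ for pid in $(pgrep vpp); do
+    $     for tid in $(ls /proc/$pid/task); do
+    $         sudo chrt -r -p 1 $tid
+    $     done
+    $ done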
-More information about Linux CFS can be found in: `Sched manual pages
+More information about Linux CFS can be found in `Sched manual pages
<http://man7.org/linux/man-pages/man7/sched.7.html>`_.
diff --git a/docs/report/introduction/test_environment_sut_conf_3.rst b/docs/report/introduction/test_environment_sut_conf_3.rst
index e50a08eb98..f09327d531 100644
--- a/docs/report/introduction/test_environment_sut_conf_3.rst
+++ b/docs/report/introduction/test_environment_sut_conf_3.rst
@@ -1,26 +1,26 @@
-**Host IRQ affinity**
+Host IRQ Affinity
+~~~~~~~~~~~~~~~~~
-Changing the default pinning of every IRQ to core 0. (Same does apply on both
-guest VM and host OS)
+IRQs are pinned to core 0. The same configuration is applied in host Linux and guest VM.
::
$ for l in `ls /proc/irq`; do echo 1 | sudo tee /proc/irq/$l/smp_affinity; done
-**Host RCU affinity**
+Host RCU Affinity
+~~~~~~~~~~~~~~~~~
-Changing the default pinning of RCU to core 0. (Same does apply on both guest VM
-and host OS)
+RCUs are pinned to core 0. The same configuration is applied in host Linux and guest VM.
::
$ for i in `pgrep rcu[^c]` ; do sudo taskset -pc 0 $i ; done
-**Host Writeback affinity**
+Host Writeback Affinity
+~~~~~~~~~~~~~~~~~~~~~~~
-Changing the default pinning of writebacks to core 0. (Same does apply on both
-guest VM and host OS)
+Writebacks are pinned to core 0. The same configuration is applied in host Linux and guest VM.
::
diff --git a/docs/report/introduction/test_environment_tg.rst b/docs/report/introduction/test_environment_tg.rst
index 4ec30e5e80..7e0d3ddb80 100644
--- a/docs/report/introduction/test_environment_tg.rst
+++ b/docs/report/introduction/test_environment_tg.rst
@@ -1,19 +1,23 @@
-TG Configuration - TRex
------------------------
+TG Settings - TRex
+------------------
-**TG Version**
+TG Version
+~~~~~~~~~~
|trex-release|
-**DPDK version**
+DPDK version
+~~~~~~~~~~~~
DPDK v17.11
-**TG Build Script used**
+TG Build Script used
+~~~~~~~~~~~~~~~~~~~~
`TRex intallation`_
-**TG Startup Configuration**
+TG Startup Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~
::
@@ -27,12 +31,14 @@ DPDK v17.11
- dest_mac : [0x3c,0xfd,0xfe,0x9c,0xee,0xf4]
src_mac : [0x3c,0xfd,0xfe,0x9c,0xee,0xf5]
-**TG Startup Command**
+TG Startup Command
+~~~~~~~~~~~~~~~~~~
::
$ sh -c 'cd <t-rex-install-dir>/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --iom 0 > /tmp/trex.log 2>&1 &'> /dev/null
-**TG common API - pointer to driver**
+TG API Driver
+~~~~~~~~~~~~~
`TRex driver`_
diff --git a/docs/report/vpp_performance_tests/test_environment.rst b/docs/report/vpp_performance_tests/test_environment.rst
index 11308e8706..131e51dea3 100644
--- a/docs/report/vpp_performance_tests/test_environment.rst
+++ b/docs/report/vpp_performance_tests/test_environment.rst
@@ -5,6 +5,10 @@
.. include:: ../introduction/test_environment_intro.rst
+.. include:: ../introduction/test_environment_sut_calib_hsw.rst
+
+.. include:: ../introduction/test_environment_sut_calib_skx.rst
+
.. include:: ../introduction/test_environment_sut_conf_1.rst
.. include:: ../introduction/test_environment_sut_conf_2.rst
@@ -12,28 +16,33 @@
.. include:: ../introduction/test_environment_sut_conf_3.rst
-DUT Configuration - VPP
------------------------
+DUT Settings - VPP
+------------------
-**VPP Version**
+VPP Version
+~~~~~~~~~~~
|vpp-release|
-**VPP Compile Parameters**
+VPP Compile Parameters
+~~~~~~~~~~~~~~~~~~~~~~
`FD.io VPP compile job`_
-**VPP Install Parameters**
+VPP Install Parameters
+~~~~~~~~~~~~~~~~~~~~~~
::
$ dpkg -i --force-all vpp*
-**VPP Startup Configuration**
+VPP Startup Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~
-VPP startup configuration changes per test case with different settings for
-`$$CORELIST_WORKERS`, `$$NUM_RX_QUEUES`, `$$UIO_DRIVER`, `$$NUM-MBUFS` and
-`$$NO_MULTI_SEG` parameter. Default template:
+VPP startup configuration varies per test case, with different settings
+for the `$$CORELIST_WORKERS`, `$$NUM_RX_QUEUES`, `$$UIO_DRIVER`,
+`$$NUM-MBUFS` and `$$NO_MULTI_SEG` parameters. Default template is
+provided below:
::
@@ -89,4 +98,7 @@ VPP startup configuration changes per test case with different settings for
dev $$DEV_2
}
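+
+As an illustration, the placeholders in the template might be filled in
+for a 2-worker-thread test case along the following lines (the values,
+file names, PCI addresses and the sed-based substitution are
+assumptions made for this example only, not the mechanism used by the
+CSIT framework):
+
+::
+
+    $ sed -e 's|$$CORELIST_WORKERS|2,3|g' \
+          -e 's|$$NUM_RX_QUEUES|1|g' \
+          -e 's|$$UIO_DRIVER|uio_pci_generic|g' \
+          -e 's|$$NUM-MBUFS|16384|g' \
+          -e 's|$$NO_MULTI_SEG|no-multi-seg|g' \
+          -e 's|$$DEV_1|0000:05:00.0|g' \
+          -e 's|$$DEV_2|0000:05:00.1|g' \
+          startup.conf.template > startup.conf
+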
+Description of VPP startup settings used in CSIT is provided in
+:ref:`performance_test_methodology`.
+
.. include:: ../introduction/test_environment_tg.rst