author     Maciek Konstantynowicz <mkonstan@cisco.com>   2018-08-13 21:17:10 +0100
committer  Maciek Konstantynowicz <mkonstan@cisco.com>   2018-08-14 14:05:05 +0000
commit     25a40550e90c036dddf17698103a9a3c34ff6799 (patch)
tree       adfa59a51262b3feeb717ad156db3f2ae96593bb
parent     7ae94f2578699fcb50544742d6c27b59edc74abb (diff)
1807 report: added HW calibration sections to test_environment plus editing nits.
Change-Id: I66698ae70d1bbbde6992e5663bc64c30249f7f79
Signed-off-by: Maciek Konstantynowicz <mkonstan@cisco.com>
9 files changed, 559 insertions, 53 deletions
diff --git a/docs/report/introduction/methodology.rst b/docs/report/introduction/methodology.rst
index ff5714c259..28bcf68257 100644
--- a/docs/report/introduction/methodology.rst
+++ b/docs/report/introduction/methodology.rst
@@ -1,3 +1,6 @@
+
+.. _performance_test_methodology:
+
 Performance Test Methodology
 ============================
 
diff --git a/docs/report/introduction/test_environment_intro.rst b/docs/report/introduction/test_environment_intro.rst
index d80ecdffe0..19dac90b96 100644
--- a/docs/report/introduction/test_environment_intro.rst
+++ b/docs/report/introduction/test_environment_intro.rst
@@ -3,16 +3,62 @@
 Test Environment
 ================
 
-CSIT performance tests are executed on physical testbeds hosted by
-:abbr:`LF (Linux Foundation)` for FD.io project. Each testbed consists of
-either one (2-node) or two (3-node) servers acting as Systems Under Test (SUT)
-and one server acting as Traffic Generator (TG).
-
-Server Specification and Configuration
---------------------------------------
-
-Complete specification and configuration of compute servers used in CSIT
-physical testbeds is maintained on wiki page `CSIT testbed - Server HW
-Configuration (Haswell) <https://wiki.fd.io/view/CSIT/CSIT_LF_testbed>`_ and
-`CSIT testbed - Server HW Configuration (Skylake/ARM)
-<https://wiki.fd.io/view/CSIT/fdio_csit_lab_ext_lld_draft>`_.
+Physical Testbeds
+-----------------
+
+FD.io CSIT performance tests are executed in physical testbeds hosted by
+:abbr:`LF (Linux Foundation)` for the FD.io project.
+
+Two physical testbed topology types are used:
+
+- **3-Node Topology**: Consisting of two servers acting as SUTs
+  (Systems Under Test) and one server as TG (Traffic Generator), all
+  connected in ring topology.
+- **2-Node Topology**: Consisting of one server acting as SUT and one
+  server as TG, both connected in ring topology.
+
+Tested SUT servers are based on a range of processors including Intel
+Xeon Haswell-SP, Intel Xeon Skylake-SP, Arm and Intel Atom. A more
+detailed description is provided in :ref:`tested_physical_topologies`.
+
+Tested logical topologies are described in
+:ref:`tested_logical_topologies`.
+
+Server Specifications
+---------------------
+
+Complete technical specifications of compute servers used in CSIT
+physical testbeds are maintained on FD.io wiki pages: `CSIT/Testbeds:
+Xeon Hsw, VIRL
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Hsw,_VIRL.#FD.io_CSIT_testbeds_-_Xeon_Haswell.2C_VIRL>`_
+and `CSIT Testbeds: Xeon Skx, Arm, Atom
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Skx,_Arm,_Atom.#Server_Specification>`_.
+
+Pre-Test Server Calibration
+---------------------------
+
+A number of SUT server sub-system runtime parameters have been
+identified as impacting data plane performance tests. Calibrating those
+parameters is part of FD.io CSIT pre-test activities and includes
+measuring and reporting the following:
+
+#. System-level core jitter – measure the duration of core interrupts
+   by Linux in clock cycles and how often they happen, using the
+   `CPU core jitter tool <https://git.fd.io/pma_tools/tree/jitter>`_.
+
+#. Memory bandwidth – measure bandwidth with the `Intel MLC tool
+   <https://software.intel.com/en-us/articles/intelr-memory-latency-checker>`_.
+
+#. Memory latency – measure memory latency with the Intel MLC tool.
+
+#. Cache latency at all levels (L1, L2, and Last Level Cache) – measure
+   cache latency with the Intel MLC tool.
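The four calibration steps above can be run as a single pre-test pass. Below is a minimal shell sketch of such a pass; it reuses the tool invocations shown verbatim in the calibration sections later in this document, and it assumes the jitter tool and the Intel MLC binary are installed under /home/testuser (as on the quoted CSIT servers), that core 3 is one of the isolated cores, and that the output file names are arbitrary. It is an illustration, not the CSIT automation itself.

::

    #!/usr/bin/env bash
    # Pre-test calibration sketch: core jitter, memory bandwidth/latency and
    # cache latency. Paths, core number and output file names are assumptions.
    set -euo pipefail

    CORE=3                                          # an isolated (isolcpus) core
    JITTER=/home/testuser/pma_tools/jitter/jitter
    MLC=/home/testuser/mlc

    # 1. System-level core jitter (~30 one-second display updates)
    sudo taskset -c "${CORE}" "${JITTER}" -i 30   | tee jitter.txt

    # 2. Memory bandwidth
    sudo "${MLC}" --bandwidth_matrix              | tee mlc_bw_matrix.txt
    sudo "${MLC}" --peak_injection_bandwidth      | tee mlc_bw_peak.txt

    # 3. Memory latency
    sudo "${MLC}" --latency_matrix                | tee mlc_lat_matrix.txt
    sudo "${MLC}" --idle_latency                  | tee mlc_lat_idle.txt
    sudo "${MLC}" --loaded_latency                | tee mlc_lat_loaded.txt

    # 4. L1/L2/LLC cache latency
    sudo "${MLC}" --c2c_latency                   | tee mlc_cache_lat.txt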
+
+Measured values of the listed parameters are especially important for
+repeatable zero packet loss throughput measurements across multiple
+system instances. They are also generally useful as background data
+for comparing data plane performance results across disparate servers.
+
+The following sections include measured calibration data for the Intel
+Xeon Haswell and Intel Xeon Skylake testbeds.
diff --git a/docs/report/introduction/test_environment_sut_calib_hsw.rst b/docs/report/introduction/test_environment_sut_calib_hsw.rst
new file mode 100644
index 0000000000..b5ebdd25e5
--- /dev/null
+++ b/docs/report/introduction/test_environment_sut_calib_hsw.rst
@@ -0,0 +1,223 @@
+Calibration Data - Haswell
+--------------------------
+
+The following sections include sample calibration data measured on the
+t1-sut1 server running in one of the Intel Xeon Haswell testbeds as
+specified in `CSIT/Testbeds: Xeon Hsw, VIRL
+<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Hsw,_VIRL.#FD.io_CSIT_testbeds_-_Xeon_Haswell.2C_VIRL>`_.
+
+Calibration data obtained from all other servers in Haswell testbeds
+shows the same or similar values.
+
+
+Linux cmdline
+~~~~~~~~~~~~~
+
+::
+
+    $ cat /proc/cmdline
+    BOOT_IMAGE=/vmlinuz-4.4.0-72-generic root=UUID=efb7e8b3-3548-4440-98f6-6ebe102e9ec6 ro isolcpus=1-17,19-35 nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35 intel_pstate=disable console=tty0 console=ttyS0,115200n8
+
+
+Linux uname
+~~~~~~~~~~~
+
+::
+
+    $ uname -a
+    Linux t3-sut2 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
+
+
+System-level core jitter
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    $ sudo taskset -c 3 /home/testuser/pma_tools/jitter/jitter -i 30
+    Linux Jitter testing program version 1.8
+    Iterations=30
+    The pragram will execute a dummy function 80000 times
+    Display is updated every 20000 displayUpdate intervals
+    Timings are in CPU Core cycles
+    Inst_Min:    Minimum Excution time during the display update interval(default is ~1 second)
+    Inst_Max:    Maximum Excution time during the display update interval(default is ~1 second)
+    Inst_jitter: Jitter in the Excution time during rhe display update interval.
This is the value of interest + last_Exec: The Excution time of last iteration just before the display update + Abs_Min: Absolute Minimum Excution time since the program started or statistics were reset + Abs_Max: Absolute Maximum Excution time since the program started or statistics were reset + tmp: Cumulative value calcualted by the dummy function + Interval: Time interval between the display updates in Core Cycles + Sample No: Sample number + + Inst_Min Inst_Max Inst_jitter last_Exec Abs_min Abs_max tmp Interval Sample No + 160024 172636 12612 160028 160024 172636 1573060608 3205463144 1 + 160024 188236 28212 160028 160024 188236 958595072 3205500844 2 + 160024 185676 25652 160028 160024 188236 344129536 3205485976 3 + 160024 172608 12584 160024 160024 188236 4024631296 3205472740 4 + 160024 179260 19236 160028 160024 188236 3410165760 3205502164 5 + 160024 172432 12408 160024 160024 188236 2795700224 3205452036 6 + 160024 178820 18796 160024 160024 188236 2181234688 3205455408 7 + 160024 172512 12488 160028 160024 188236 1566769152 3205461528 8 + 160024 172636 12612 160028 160024 188236 952303616 3205478820 9 + 160024 173676 13652 160028 160024 188236 337838080 3205470412 10 + 160024 178776 18752 160028 160024 188236 4018339840 3205481472 11 + 160024 172788 12764 160028 160024 188236 3403874304 3205492336 12 + 160024 174616 14592 160028 160024 188236 2789408768 3205474904 13 + 160024 174440 14416 160028 160024 188236 2174943232 3205479448 14 + 160024 178748 18724 160024 160024 188236 1560477696 3205482668 15 + 160024 172588 12564 169404 160024 188236 946012160 3205510496 16 + 160024 172636 12612 160024 160024 188236 331546624 3205472204 17 + 160024 172480 12456 160024 160024 188236 4012048384 3205455864 18 + 160024 172740 12716 160028 160024 188236 3397582848 3205464932 19 + 160024 179200 19176 160028 160024 188236 2783117312 3205476012 20 + 160024 172480 12456 160028 160024 188236 2168651776 3205465632 21 + 160024 172728 12704 160024 160024 188236 1554186240 3205497204 22 + 160024 172620 12596 160028 160024 188236 939720704 3205466972 23 + 160024 172640 12616 160028 160024 188236 325255168 3205471216 24 + 160024 172484 12460 160028 160024 188236 4005756928 3205467388 25 + 160024 172636 12612 160028 160024 188236 3391291392 3205482748 26 + 160024 179056 19032 160024 160024 188236 2776825856 3205467152 27 + 160024 172672 12648 160024 160024 188236 2162360320 3205483268 28 + 160024 176932 16908 160024 160024 188236 1547894784 3205488536 29 + 160024 172452 12428 160028 160024 188236 933429248 3205440636 30 + + +Memory bandwidth +~~~~~~~~~~~~~~~~ + +:: + + $ sudo /home/testuser/mlc --bandwidth_matrix + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --bandwidth_matrix + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + Measuring Memory Bandwidths between nodes within system + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads from each core if Hyper-threading is enabled + Using Read-only traffic type + Numa node + Numa node 0 1 + 0 57935.5 30265.2 + 1 30284.6 58409.9 + +:: + + $ sudo /home/testuser/mlc --peak_injection_bandwidth + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --peak_injection_bandwidth + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Peak Injection Memory Bandwidths for the system + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads from each core if 
Hyper-threading is enabled + Using traffic with the following read-write ratios + ALL Reads : 115762.2 + 3:1 Reads-Writes : 106242.2 + 2:1 Reads-Writes : 103031.8 + 1:1 Reads-Writes : 87943.7 + Stream-triad like: 100048.4 + +:: + + $ sudo /home/testuser/mlc --max_bandwidth + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --max_bandwidth + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Maximum Memory Bandwidths for the system + Will take several minutes to complete as multiple injection rates will be tried to get the best bandwidth + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads from each core if Hyper-threading is enabled + Using traffic with the following read-write ratios + ALL Reads : 115782.41 + 3:1 Reads-Writes : 105965.78 + 2:1 Reads-Writes : 103162.38 + 1:1 Reads-Writes : 88255.82 + Stream-triad like: 105608.10 + + +Memory latency +~~~~~~~~~~~~~~ + +:: + + $ sudo /home/testuser/mlc --latency_matrix + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --latency_matrix + + Using buffer size of 200.000MB + Measuring idle latencies (in ns)... + Numa node + Numa node 0 1 + 0 101.0 132.0 + 1 141.2 98.8 + +:: + + $ sudo /home/testuser/mlc --idle_latency + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --idle_latency + + Using buffer size of 200.000MB + Each iteration took 227.2 core clocks ( 99.0 ns) + +:: + + $ sudo /home/testuser/mlc --loaded_latency + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --loaded_latency + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Loaded Latencies for the system + Using all the threads from each core if Hyper-threading is enabled + Using Read-only traffic type + Inject Latency Bandwidth + Delay (ns) MB/sec + ========================== + 00000 294.08 115841.6 + 00002 294.27 115851.5 + 00008 293.67 115821.8 + 00015 278.92 115587.5 + 00050 246.80 113991.2 + 00100 206.86 104508.1 + 00200 123.72 72873.6 + 00300 113.35 52641.1 + 00400 108.89 41078.9 + 00500 108.11 33699.1 + 00700 106.19 24878.0 + 01000 104.75 17948.1 + 01300 103.72 14089.0 + 01700 102.95 11013.6 + 02500 102.25 7756.3 + 03500 101.81 5749.3 + 05000 101.46 4230.4 + 09000 101.05 2641.4 + 20000 100.77 1542.5 + + +L1/L2/LLC latency +~~~~~~~~~~~~~~~~~ + +:: + + $ sudo /home/testuser/mlc --c2c_latency + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --c2c_latency + + Measuring cache-to-cache transfer latency (in ns)... + Local Socket L2->L2 HIT latency 42.1 + Local Socket L2->L2 HITM latency 47.0 + Remote Socket L2->L2 HITM latency (data address homed in writer socket) + Reader Numa Node + Writer Numa Node 0 1 + 0 - 108.0 + 1 106.9 - + Remote Socket L2->L2 HITM latency (data address homed in reader socket) + Reader Numa Node + Writer Numa Node 0 1 + 0 - 107.7 + 1 106.6 -
\ No newline at end of file diff --git a/docs/report/introduction/test_environment_sut_calib_skx.rst b/docs/report/introduction/test_environment_sut_calib_skx.rst new file mode 100644 index 0000000000..2496e7a0d9 --- /dev/null +++ b/docs/report/introduction/test_environment_sut_calib_skx.rst @@ -0,0 +1,213 @@ +Calibration Data - Skylake +-------------------------- + +Following sections include sample calibration data measured on +s11-t31-sut1 server running in one of the Intel Xeon Skylake testbeds as +specified in `CSIT Testbeds: Xeon Skx, Arm, Atom +<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Skx,_Arm,_Atom.#Server_Specification>`_. + +Calibration data obtained from all other servers in Skylake testbeds +shows the same or similar values. + + +Linux cmdline +~~~~~~~~~~~~~ + +:: + + $ cat /proc/cmdline + BOOT_IMAGE=/vmlinuz-4.15.0-23-generic root=UUID=759ad671-ad46-441b-a75b-9f54e81837bb ro isolcpus=1-27,29-55,57-83,85-111 nohz_full=1-27,29-55,57-83,85-111 rcu_nocbs=1-27,29-55,57-83,85-111 numa_balancing=disable intel_pstate=disable intel_iommu=on iommu=pt nmi_watchdog=0 audit=0 nosoftlockup processor.max_cstate=1 intel_idle.max_cstate=1 hpet=disable tsc=reliable mce=off console=tty0 console=ttyS0,115200n8 + + +Linux uname +~~~~~~~~~~~ + +:: + + $ uname -a + Linux s5-t22-sut1 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux + + +System-level core jitter +~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + $ sudo taskset -c 3 /home/testuser/pma_tools/jitter/jitter -i 20 + Linux Jitter testing program version 1.8 + Iterations=20 + The pragram will execute a dummy function 80000 times + Display is updated every 20000 displayUpdate intervals + Timings are in CPU Core cycles + Inst_Min: Minimum Excution time during the display update interval(default is ~1 second) + Inst_Max: Maximum Excution time during the display update interval(default is ~1 second) + Inst_jitter: Jitter in the Excution time during rhe display update interval. 
This is the value of interest + last_Exec: The Excution time of last iteration just before the display update + Abs_Min: Absolute Minimum Excution time since the program started or statistics were reset + Abs_Max: Absolute Maximum Excution time since the program started or statistics were reset + tmp: Cumulative value calcualted by the dummy function + Interval: Time interval between the display updates in Core Cycles + Sample No: Sample number + + Inst_Min Inst_Max Inst_jitter last_Exec Abs_min Abs_max tmp Interval Sample No + 160022 171330 11308 160022 160022 171330 2538733568 3204142750 1 + 160022 167294 7272 160026 160022 171330 328335360 3203873548 2 + 160022 167560 7538 160026 160022 171330 2412904448 3203878736 3 + 160022 169000 8978 160024 160022 171330 202506240 3203864588 4 + 160022 166572 6550 160026 160022 171330 2287075328 3203866224 5 + 160022 167460 7438 160026 160022 171330 76677120 3203854632 6 + 160022 168134 8112 160024 160022 171330 2161246208 3203874674 7 + 160022 169094 9072 160022 160022 171330 4245815296 3203878798 8 + 160022 172460 12438 160024 160022 172460 2035417088 3204112010 9 + 160022 167862 7840 160030 160022 172460 4119986176 3203856800 10 + 160022 168398 8376 160024 160022 172460 1909587968 3203854192 11 + 160022 167548 7526 160024 160022 172460 3994157056 3203847442 12 + 160022 167562 7540 160026 160022 172460 1783758848 3203862936 13 + 160022 167604 7582 160024 160022 172460 3868327936 3203859346 14 + 160022 168262 8240 160024 160022 172460 1657929728 3203851120 15 + 160022 169700 9678 160024 160022 172460 3742498816 3203877690 16 + 160022 170476 10454 160026 160022 172460 1532100608 3204088480 17 + 160022 167798 7776 160024 160022 172460 3616669696 3203862072 18 + 160022 166540 6518 160024 160022 172460 1406271488 3203836904 19 + 160022 167516 7494 160024 160022 172460 3490840576 3203848120 20 + + +Memory bandwidth +~~~~~~~~~~~~~~~~ + +:: + + $ sudo /home/testuser/mlc --bandwidth_matrix + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --bandwidth_matrix + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + Measuring Memory Bandwidths between nodes within system + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads from each core if Hyper-threading is enabled + Using Read-only traffic type + Numa node + Numa node 0 1 + 0 107947.7 50951.5 + 1 50834.6 108183.4 + +:: + + $ sudo /home/testuser/mlc --peak_injection_bandwidth + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --peak_injection_bandwidth + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Peak Injection Memory Bandwidths for the system + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads from each core if Hyper-threading is enabled + Using traffic with the following read-write ratios + ALL Reads : 215733.9 + 3:1 Reads-Writes : 182141.9 + 2:1 Reads-Writes : 178615.7 + 1:1 Reads-Writes : 149911.3 + Stream-triad like: 159533.6 + +:: + + $ sudo /home/testuser/mlc --max_bandwidth + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --max_bandwidth + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Maximum Memory Bandwidths for the system + Will take several minutes to complete as multiple injection rates will be tried to get the best bandwidth + Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) + Using all the threads 
from each core if Hyper-threading is enabled + Using traffic with the following read-write ratios + ALL Reads : 216875.73 + 3:1 Reads-Writes : 182615.14 + 2:1 Reads-Writes : 178745.67 + 1:1 Reads-Writes : 149485.27 + Stream-triad like: 180057.87 + + +Memory latency +~~~~~~~~~~~~~~ + +:: + + $ sudo /home/testuser/mlc --latency_matrix + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --latency_matrix + + Using buffer size of 2000.000MB + Measuring idle latencies (in ns)... + Numa node + Numa node 0 1 + 0 81.4 131.1 + 1 131.1 81.3 + +:: + + $ sudo /home/testuser/mlc --idle_latency + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --idle_latency + + Using buffer size of 2000.000MB + Each iteration took 202.0 core clocks ( 80.8 ns) + +:: + + $ sudo /home/testuser/mlc --loaded_latency + Intel(R) Memory Latency Checker - v3.5 + Command line parameters: --loaded_latency + + Using buffer size of 100.000MB/thread for reads and an additional 100.000MB/thread for writes + + Measuring Loaded Latencies for the system + Using all the threads from each core if Hyper-threading is enabled + Using Read-only traffic type + Inject Latency Bandwidth + Delay (ns) MB/sec + ========================== + 00000 282.66 215712.8 + 00002 282.14 215757.4 + 00008 280.21 215868.1 + 00015 279.20 216313.2 + 00050 275.25 216643.0 + 00100 227.05 215075.0 + 00200 121.92 160242.9 + 00300 101.21 111587.4 + 00400 95.48 85019.7 + 00500 94.46 68717.3 + 00700 92.27 49742.2 + 01000 91.03 35264.8 + 01300 90.11 27396.3 + 01700 89.34 21178.7 + 02500 90.15 14672.8 + 03500 89.00 10715.7 + 05000 82.00 7788.2 + 09000 81.46 4684.0 + 20000 81.40 2541.9 + + +L1/L2/LLC latency +~~~~~~~~~~~~~~~~~ + +:: + +$ sudo /home/testuser/mlc --c2c_latency +Intel(R) Memory Latency Checker - v3.5 +Command line parameters: --c2c_latency + +Measuring cache-to-cache transfer latency (in ns)... +Local Socket L2->L2 HIT latency 53.7 +Local Socket L2->L2 HITM latency 53.7 +Remote Socket L2->L2 HITM latency (data address homed in writer socket) + Reader Numa Node +Writer Numa Node 0 1 + 0 - 113.9 + 1 113.9 - +Remote Socket L2->L2 HITM latency (data address homed in reader socket) + Reader Numa Node +Writer Numa Node 0 1 + 0 - 177.9 + 1 177.6 -
\ No newline at end of file diff --git a/docs/report/introduction/test_environment_sut_conf_1.rst b/docs/report/introduction/test_environment_sut_conf_1.rst index 2c44d6bb02..c6803e07a3 100644 --- a/docs/report/introduction/test_environment_sut_conf_1.rst +++ b/docs/report/introduction/test_environment_sut_conf_1.rst @@ -1,5 +1,5 @@ -SUT Configuration - Host OS Linux ---------------------------------- +SUT Settings - Linux +-------------------- System provisioning is done by combination of PXE boot unattented install and @@ -7,7 +7,7 @@ install and Below a subset of the running configuration: -#. Haswell - Ubuntu 16.04.1 LTS +1. Xeon Haswell - Ubuntu 16.04.1 LTS :: @@ -18,7 +18,7 @@ Below a subset of the running configuration: Release: 16.04 Codename: xenial -#. Skylake - Ubuntu 18.04 LTS +2. Xeon Skylake - Ubuntu 18.04 LTS :: @@ -29,7 +29,8 @@ Below a subset of the running configuration: Release: 18.04 Codename: bionic -**Kernel boot parameters used in CSIT performance testbeds** +Linux Boot Parameters +~~~~~~~~~~~~~~~~~~~~~ - **isolcpus=<cpu number>-<cpu number>** used for all cpu cores apart from first core of each socket used for running VPP worker threads and Qemu/LXC @@ -67,16 +68,17 @@ Below a subset of the running configuration: virtualized environment. - **hpet=disable** - [X86-32,HPET] Disable HPET and use PIT instead. -**Applied command line boot parameters:** +Applied Boot Cmdline +~~~~~~~~~~~~~~~~~~~~ -#. Haswell - Ubuntu 16.04.1 LTS +1. Xeon Haswell - Ubuntu 16.04.1 LTS :: $ cat /proc/cmdline BOOT_IMAGE=/vmlinuz-4.4.0-72-generic root=UUID=35ea11e4-e44f-4f67-8cbe-12f09c49ed90 ro isolcpus=1-17,19-35 nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35 intel_pstate=disable console=tty0 console=ttyS0,115200n8 -#. Skylake - Ubuntu 18.04 LTS +2. Xeon Skylake - Ubuntu 18.04 LTS :: diff --git a/docs/report/introduction/test_environment_sut_conf_2.rst b/docs/report/introduction/test_environment_sut_conf_2.rst index 482c09d5fb..79aaff660e 100644 --- a/docs/report/introduction/test_environment_sut_conf_2.rst +++ b/docs/report/introduction/test_environment_sut_conf_2.rst @@ -1,9 +1,10 @@ -**Host CFS optimizations (QEMU+VPP)** +Linux CFS tunings +~~~~~~~~~~~~~~~~~ -Applying CFS scheduler tuning on all Qemu vcpu worker threads (those are -handling testpmd - pmd threads) and VPP PMD worker threads. List of VPP PMD -threads can be obtained e.g. from: +Linux CFS scheduler tunings are applied to all QEMU vCPU worker threads +(the ones handling testpmd PMD threads) and VPP data plane worker +threads. List of VPP data plane threads can be obtained by running: :: @@ -21,7 +22,7 @@ Or: $ cat /proc/`pidof vpp`/task/*/stat | awk '{print $1" "$2" "$39}' -Applying Round-robin scheduling with highest priority +CFS round-robin scheduling with highest priority is applied using: :: @@ -33,5 +34,5 @@ Applying Round-robin scheduling with highest priority $ done $ done -More information about Linux CFS can be found in: `Sched manual pages +More information about Linux CFS can be found in `Sched manual pages <http://man7.org/linux/man-pages/man7/sched.7.html>`_. diff --git a/docs/report/introduction/test_environment_sut_conf_3.rst b/docs/report/introduction/test_environment_sut_conf_3.rst index e50a08eb98..f09327d531 100644 --- a/docs/report/introduction/test_environment_sut_conf_3.rst +++ b/docs/report/introduction/test_environment_sut_conf_3.rst @@ -1,26 +1,26 @@ -**Host IRQ affinity** +Host IRQ Affinity +~~~~~~~~~~~~~~~~~ -Changing the default pinning of every IRQ to core 0. 
(Same does apply on both
-guest VM and host OS)
+IRQs are pinned to core 0. The same configuration is applied in host Linux and guest VM.
 
 ::
 
   $ for l in `ls /proc/irq`; do echo 1 | sudo tee /proc/irq/$l/smp_affinity; done
 
-**Host RCU affinity**
+Host RCU Affinity
+~~~~~~~~~~~~~~~~~
 
-Changing the default pinning of RCU to core 0. (Same does apply on both guest VM
-and host OS)
+RCUs are pinned to core 0. The same configuration is applied in host Linux and guest VM.
 
 ::
 
   $ for i in `pgrep rcu[^c]` ; do sudo taskset -pc 0 $i ; done
 
-**Host Writeback affinity**
+Host Writeback Affinity
+~~~~~~~~~~~~~~~~~~~~~~~
 
-Changing the default pinning of writebacks to core 0. (Same does apply on both
-guest VM and host OS)
+Writebacks are pinned to core 0. The same configuration is applied in host Linux and guest VM.
 
 ::
 
diff --git a/docs/report/introduction/test_environment_tg.rst b/docs/report/introduction/test_environment_tg.rst
index 4ec30e5e80..7e0d3ddb80 100644
--- a/docs/report/introduction/test_environment_tg.rst
+++ b/docs/report/introduction/test_environment_tg.rst
@@ -1,19 +1,23 @@
-TG Configuration - TRex
------------------------
+TG Settings - TRex
+------------------
 
-**TG Version**
+TG Version
+~~~~~~~~~~
 
 |trex-release|
 
-**DPDK version**
+DPDK version
+~~~~~~~~~~~~
 
 DPDK v17.11
 
-**TG Build Script used**
+TG Build Script used
+~~~~~~~~~~~~~~~~~~~~
 
 `TRex intallation`_
 
-**TG Startup Configuration**
+TG Startup Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~
 
 ::
 
@@ -27,12 +31,14 @@ DPDK v17.11
       - dest_mac        :   [0x3c,0xfd,0xfe,0x9c,0xee,0xf4]
         src_mac         :   [0x3c,0xfd,0xfe,0x9c,0xee,0xf5]
 
-**TG Startup Command**
+TG Startup Command
+~~~~~~~~~~~~~~~~~~
 
 ::
 
   $ sh -c 'cd <t-rex-install-dir>/scripts/ && sudo nohup ./t-rex-64 -i -c 7 --iom 0 > /tmp/trex.log 2>&1 &'> /dev/null
 
-**TG common API - pointer to driver**
+TG API Driver
+~~~~~~~~~~~~~
 
 `TRex driver`_
diff --git a/docs/report/vpp_performance_tests/test_environment.rst b/docs/report/vpp_performance_tests/test_environment.rst
index 11308e8706..131e51dea3 100644
--- a/docs/report/vpp_performance_tests/test_environment.rst
+++ b/docs/report/vpp_performance_tests/test_environment.rst
@@ -5,6 +5,10 @@
 
 .. include:: ../introduction/test_environment_intro.rst
 
+.. include:: ../introduction/test_environment_sut_calib_hsw.rst
+
+.. include:: ../introduction/test_environment_sut_calib_skx.rst
+
 .. include:: ../introduction/test_environment_sut_conf_1.rst
 
 .. include:: ../introduction/test_environment_sut_conf_2.rst
@@ -12,28 +16,33 @@
 
 .. include:: ../introduction/test_environment_sut_conf_3.rst
 
-DUT Configuration - VPP
------------------------
+DUT Settings - VPP
+------------------
 
-**VPP Version**
+VPP Version
+~~~~~~~~~~~
 
 |vpp-release|
 
-**VPP Compile Parameters**
+VPP Compile Parameters
+~~~~~~~~~~~~~~~~~~~~~~
 
 `FD.io VPP compile job`_
 
-**VPP Install Parameters**
+VPP Install Parameters
+~~~~~~~~~~~~~~~~~~~~~~
 
 ::
 
   $ dpkg -i --force-all vpp*
 
-**VPP Startup Configuration**
+VPP Startup Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
-VPP startup configuration changes per test case with different settings for
-`$$CORELIST_WORKERS`, `$$NUM_RX_QUEUES`, `$$UIO_DRIVER`, `$$NUM-MBUFS` and
-`$$NO_MULTI_SEG` parameter. Default template:
+VPP startup configuration varies per test case, with different settings
+for the `$$CORELIST_WORKERS`, `$$NUM_RX_QUEUES`, `$$UIO_DRIVER`,
+`$$NUM-MBUFS` and `$$NO_MULTI_SEG` parameters. Default template is
+provided below:
 
 ::
 
@@ -89,4 +98,7 @@ VPP startup configuration changes per test case with different settings for
     dev $$DEV_2
   }
 
+A description of VPP startup settings used in CSIT is provided in
+:ref:`performance_test_methodology`.
+
 .. include:: ../introduction/test_environment_tg.rst