author | Vratko Polak <vrpolak@cisco.com> | 2020-11-19 14:06:56 +0100 |
---|---|---|
committer | Vratko Polak <vrpolak@cisco.com> | 2020-11-19 15:21:10 +0000 |
commit | 209c8d3fa2c6a1e0e8b8fe1b93076bb081cf4c74 | |
tree | c6c75393291321d70c8070169b5fd8e7da78ec28 | |
parent | c07fd95dbd33f463ddc05d8874380fd4a9d0c9c3 | |
Methodology: Trex modes and transactions
Change-Id: I43423dea499bce3a298dbbba752c2aee2a322836
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
3 files changed, 199 insertions, 49 deletions
diff --git a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
index 6389353a65..00dcb0b40e 100644
--- a/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
+++ b/docs/report/introduction/methodology_data_plane_throughput/methodology_data_plane_throughput.rst
@@ -13,6 +13,9 @@ Following throughput test methods are used:
 - MRR - Maximum Receive Rate
 - PLRsearch - Probabilistic Loss Ratio search
 
+..
+   TODO: Add RECONF.
+
 Description of each test method is followed by generic test
 properties shared by all methods.
diff --git a/docs/report/introduction/methodology_trex_traffic_generator.rst b/docs/report/introduction/methodology_trex_traffic_generator.rst
index aea4d3236d..9813b28025 100644
--- a/docs/report/introduction/methodology_trex_traffic_generator.rst
+++ b/docs/report/introduction/methodology_trex_traffic_generator.rst
@@ -5,59 +5,194 @@ Usage
 ~~~~~
 
 `TRex traffic generator <https://trex-tgn.cisco.com>`_ is used for majority of
-CSIT performance tests. TRex stateless mode is used to measure NDR and PDR
-throughputs using MLRsearch and to measure maximum transfer rate in MRR tests.
-
-TRex is installed and run on the TG compute node. The typical procedure is:
-
-- TRex configuration is set in its configuration file
-
-  ::
-
-    $ sudo -E -S sh -c 'cat << EOF > /etc/trex_cfg.yaml
-    - version: 2
-      c: 8
-      limit_memory: 8192
-      interfaces: ["${pci1}","${pci2}"]
-      port_info:
-        - dest_mac: [${dest_mac1}]
-          src_mac: [${src_mac1}]
-        - dest_mac: [${dest_mac2}]
-          src_mac: [${src_mac2}]
-      platform :
-        master_thread_id: 0
-        latency_thread_id: 9
-        dual_if:
-          - socket: 0
-            threads: [1, 2, 3, 4, 5, 6, 7, 8]
-    EOF'
-
-- TRex is started in the interactive mode as a background service
-
-  ::
-
-    $ sh -c 'cd <t-rex-install-dir>/scripts/ && \
-    sudo nohup ./t-rex-64 -i --prefix $(hostname) --hdrh --no-scapy-server \
-    > /tmp/trex.log 2>&1 &' > /dev/null
-
-- There are traffic streams dynamically prepared for each test, based on traffic
-  profiles. The traffic is sent and the statistics obtained using API
-  :command:`trex.stl.api.STLClient`.
-
-Measuring Packet Loss
+CSIT performance tests. TRex is used in multiple types of performance tests,
+see :ref:`data_plane_throughput` for more detail.
+
+TRex is installed and run on the TG compute node.
+Versioning, installation and startup are documented in
+:ref:`test_environment_tg`.
+
+Traffic modes
+~~~~~~~~~~~~~
+
+TRex is primarily used in two (mutually incompatible) modes.
+
+Stateless mode
+______________
+
+Sometimes abbreviated as STL.
+A mode with high performance, which is unable to react to incoming traffic.
+We use this mode whenever possible.
+A typical test where this mode is not applicable is NAT44ED,
+as the DUT does not assign deterministic outside address+port combinations,
+so we are unable to create traffic that does not lose packets
+in the out2in direction.
+
+Measurement results are based on simple L2 counters
+(opackets, ipackets) for each traffic direction.
+
+Stateful mode
+_____________
+
+A mode capable of reacting to incoming traffic.
+Unlike the stateless mode, only UDP and TCP are supported
+(carried over IPv4 or IPv6 packets).
+Performance is limited, as TRex needs to do more CPU processing.
+TRex supports two subtypes of stateful traffic;
+CSIT uses ASTF (Advanced STateFul mode).
+
+This mode is suitable for NAT44ED tests, as clients send packets from inside,
+and servers react to them, so they see the outside address and port to respond to.
+Also, they do not send traffic before NAT44ED has opened the sessions.
+
+When possible, L2 counters (opackets, ipackets) are used.
+Some tests need L7 counters, which track protocol state (e.g. TCP),
+but the values are less than reliable at high loads.
+
+Traffic continuity
+~~~~~~~~~~~~~~~~~~
+
+Generated traffic is either continuous or limited.
+Both modes support both continuities in principle.
+
+Continuous traffic
+__________________
+
+Traffic is started without any size goal.
+Traffic is ended based on the time duration hinted by the search algorithm.
+This is useful when DUT behavior does not depend on the traffic duration.
+This is the default for stateless mode.
+
+Limited traffic
+_______________
+
+Traffic has a defined size goal, and the duration is computed based on the goal.
+Traffic is ended when the size goal is reached,
+or when the computed duration is reached.
+This is useful when DUT behavior depends on traffic size,
+e.g. a target number of sessions, each to be hit once.
+This is used mainly for stateful mode.
+
+Traffic synchronicity
 ~~~~~~~~~~~~~~~~~~~~~
 
-Following sequence is followed to measure packet loss:
+Traffic can be generated synchronously (test waits for duration)
+or asynchronously (test operates during traffic and stops traffic explicitly).
+
+Synchronous traffic
+___________________
+
+Trial measurement is driven by a given (or precomputed) duration,
+with no activity from the test driver during the traffic.
+Used for most trials.
 
-- Create an instance of STLClient.
-- Connect to the client.
-- Add all streams.
-- Clear statistics.
-- Send the traffic for defined time.
-- Get the statistics.
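The synchronous trial and the stateless L2 counters (opackets, ipackets) described above can be illustrated with a minimal sketch. It is not CSIT code: the per-port statistics dictionary shape, the counter values and the names `direction_loss`, `TrialResult`, `result_from_l2_counters` and `limited_traffic_duration` are all hypothetical. Loss in one direction is taken as packets sent from one TG port minus packets received on the other, and the limited-traffic duration assumes the size goal is a transaction count and the rate is given in transactions per second.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed (hypothetical) shape: {port_index: {"opackets": int, "ipackets": int}}
PortStats = Dict[int, Dict[str, int]]


def direction_loss(port_stats: PortStats, tx_port: int, rx_port: int) -> int:
    """Packets sent on tx_port that did not arrive on rx_port."""
    return port_stats[tx_port]["opackets"] - port_stats[rx_port]["ipackets"]


@dataclass
class TrialResult:
    """Hypothetical trial summary, not a CSIT class."""
    sent: int
    received: int
    duration: float  # seconds the traffic was intended to run

    @property
    def loss_count(self) -> int:
        return self.sent - self.received

    @property
    def loss_ratio(self) -> float:
        return self.loss_count / self.sent if self.sent else 0.0


def result_from_l2_counters(port_stats: PortStats, duration: float) -> TrialResult:
    """Sum both traffic directions into one synchronous trial result."""
    sent = sum(stats["opackets"] for stats in port_stats.values())
    received = sum(stats["ipackets"] for stats in port_stats.values())
    return TrialResult(sent, received, duration)


def limited_traffic_duration(transactions: int, tps: float) -> float:
    """Limited traffic: time needed to run each transaction once."""
    return transactions / tps


if __name__ == "__main__":
    stats = {0: {"opackets": 1000, "ipackets": 995},
             1: {"opackets": 1000, "ipackets": 1000}}
    print(direction_loss(stats, 1, 0))                     # 5
    print(result_from_l2_counters(stats, 1.0).loss_count)  # 5
    print(limited_traffic_duration(65536, 32768.0))        # 2.0
```

Keeping the per-direction view separate from the summed result mirrors the idea that counters are read for each traffic direction, while a search algorithm typically consumes a single loss figure per trial.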
+Asynchronous traffic
+____________________
 
-If there is a warm-up phase required, the traffic is sent also before
-test and the statistics are ignored.
+Traffic is started, but then the test driver is free to perform
+other actions, before stopping the traffic explicitly.
+This is used mainly by reconf tests, but also by some trials
+used for runtime telemetry.
+
+Traffic profiles
+~~~~~~~~~~~~~~~~
+
+TRex supports several ways to define the traffic.
+CSIT uses small Python modules based on Scapy as definitions.
+Details of traffic profiles depend on the mode (STL or ASTF),
+but some are common to both modes.
+
+Search algorithms are intentionally unaware of the traffic mode used,
+so CSIT defines some terms to use instead of mode-specific TRex terms.
+
+Transactions
+____________
+
+A TRex traffic profile defines a small number of behaviors,
+called transaction templates in CSIT. Traffic profiles also instruct
+TRex how to create a large number of transactions based on the templates.
+
+Continuous traffic loops over the generated transactions.
+Limited traffic usually executes each transaction once.
+
+Currently, ASTF profiles define one transaction template each.
+The number of packets expected per transaction varies based on profile details,
+as does the criterion for when a transaction is considered successful.
+
+Stateless transactions are just one packet (sent from one TG port,
+successful if received on the other TG port).
+Thus unidirectional stateless profiles define one transaction template,
+and bidirectional stateless profiles define two transaction templates.
+
+TPS multiplier
+______________
+
+TRex aims to open the transactions specified by the profile at a steady rate.
+While TRex allows the transaction template to define its intended "cps" value,
+CSIT does not specify it, so the default value of 1 is applied,
+meaning TRex will open one transaction per second (and transaction template)
+by default. But CSIT passes the "multiplier" (mult) argument
+when starting the traffic; it multiplies the cps value,
+so it acts as the TPS (transactions per second) input.
+
+With a slight abuse of nomenclature, bidirectional stateless tests
+set the "packets per transaction" value to 2, just to keep the TPS semantics
+as a unidirectional input value.
+
+Duration stretching
+___________________
+
+TRex can be IO-bound, CPU-bound, or otherwise unable
+to generate the traffic at the requested TPS.
+Some conditions are detected, leading to TRex failure,
+for example when the bandwidth does not fit into the line capacity.
+But many causes are not detected.
+
+Unfortunately, TRex frequently reacts by not honoring the duration
+in synchronous mode, taking longer to send the traffic,
+leading to a lower than requested load offered to the DUT.
+This usually breaks assumptions used in search algorithms,
+so it has to be avoided.
+
+For stateless traffic, the behavior is quite deterministic,
+so the workaround is to apply a fictional TPS limit (max_rate)
+to search algorithms, usually depending only on the NIC used.
+
+For stateful traffic the behavior is not deterministic enough,
+for example the limit for TCP traffic depends on DUT packet loss.
+In CSIT we decided to use logic similar to asynchronous traffic.
+The traffic driver sleeps for a time, then stops the traffic explicitly.
+The library that parses counters into measurement results
+then usually treats unsent packets as lost.
+
+We have added an IP4base test for every NAT44ED test,
+so that users can compare results.
+If the results are very similar, it is probable TRex was the bottleneck.
+
+Startup delay
+_____________
+
+By investigating TRex behavior, it was found that TRex does not start
+the traffic in ASTF mode immediately. There is a delay of zero traffic,
+after which the traffic rate ramps up to the defined TPS value.
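The TPS multiplier and duration-stretching rules described above reduce to simple arithmetic, sketched below under stated assumptions. The `Trial` class and the helper names are hypothetical, the max_rate value is an invented example rather than a real NIC limit, and the template "cps" is assumed to stay at its default of 1 so that "mult" equals TPS.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical trial description; names are illustrative, not CSIT's."""
    tps: float                    # transactions per second, passed as TRex "mult"
    packets_per_transaction: int  # 2 for bidirectional stateless profiles
    duration: float               # intended duration of real traffic, seconds


def clamp_tps(requested_tps: float, max_rate: float) -> float:
    """Stateless workaround: never request more than the assumed max_rate."""
    return min(requested_tps, max_rate)


def intended_packets(trial: Trial) -> int:
    """Packets the DUT should receive if TRex honors the requested TPS."""
    # With the template cps left at its default of 1, mult equals TPS.
    return round(trial.tps * trial.packets_per_transaction * trial.duration)


def loss_count(trial: Trial, received_packets: int) -> int:
    """Loss for a "sleep and stop" trial.

    Packets TRex did not manage to send before the explicit stop are
    counted as lost, so a TG bottleneck shows up as loss instead of as
    a silently stretched trial with a lower offered load.
    """
    return intended_packets(trial) - received_packets


if __name__ == "__main__":
    # Hypothetical numbers: 20 MTPS requested, 18.75 MTPS assumed max_rate.
    trial = Trial(tps=clamp_tps(20e6, 18.75e6), packets_per_transaction=2,
                  duration=1.0)
    print(intended_packets(trial))        # 37500000
    print(loss_count(trial, 37_000_000))  # 500000
```

Counting against the intended packet total, rather than against what TRex actually sent, is what makes an overloaded generator visible to the search algorithm.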
+
+It is possible to poll for counters during the traffic
+(the first nonzero value means traffic has started),
+but that was found to influence the NDR results.
+
+Thus the "sleep and stop" strategy is used, which needs a correction
+to the computed duration so traffic is stopped after the intended
+duration of real traffic. Luckily, it turns out this correction
+does not depend on the traffic profile or on the CPU used by TRex,
+so a fixed constant (0.1115 seconds) works well.
+
+The result computations need a precise enough duration of the real traffic;
+luckily, the server side of TRex has a precise enough counter for that.
+
+It is unknown whether stateless traffic profiles also exhibit a startup delay.
+Unfortunately, stateless mode does not have a similarly precise duration counter,
+so some results (mostly MRR) are affected by less precise duration measurement
+in the Python part of CSIT code.
 
 Measuring Latency
 ~~~~~~~~~~~~~~~~~
diff --git a/docs/report/introduction/test_environment_tg.rst b/docs/report/introduction/test_environment_tg.rst
index 28b233a574..24df4deb63 100644
--- a/docs/report/introduction/test_environment_tg.rst
+++ b/docs/report/introduction/test_environment_tg.rst
@@ -1,3 +1,5 @@
+.. _test_environment_tg:
+
 TG Settings - TRex
 ------------------
 
@@ -54,6 +56,16 @@ Also, Python client is now starting traffic with:
 
     core_mask=STLClient.CORE_MASK_PIN
 
+TG Startup Command (Stateful Mode)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+  $ sudo -E -S sh -c "cd '${trex_install_dir}/scripts/' && \
+    nohup ./t-rex-64 -i --prefix $(hostname) --astf --hdrh --no-scapy-server \
+    --mbuf-factor 32 > /tmp/trex.log 2>&1 &" > /dev/null
+
+
 TG API Driver
 ~~~~~~~~~~~~~
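The ASTF "sleep and stop" strategy with the fixed startup correction, described in the methodology diff above, can be sketched as follows. Only the 0.1115 second constant comes from the text; `start_traffic`, `stop_traffic`, `sleep_and_stop` and `measured_tps` are hypothetical placeholders, not the CSIT or TRex API. The `sleep` function is injected so the sketch can be exercised without real traffic.

```python
import time
from typing import Callable

# Startup-delay correction quoted in the methodology text, in seconds.
ASTF_STARTUP_DELAY = 0.1115


def sleep_and_stop(start_traffic: Callable[[], None],
                   stop_traffic: Callable[[], None],
                   duration: float,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run asynchronous ASTF traffic for `duration` seconds of real traffic.

    start_traffic and stop_traffic are hypothetical stand-ins for the TRex
    client calls; they are not the CSIT or TRex API.
    """
    start_traffic()
    # TRex sends nothing for a short while after the start, so sleep for
    # the intended duration plus the fixed correction instead of polling
    # counters (polling was found to influence NDR results).
    sleep(duration + ASTF_STARTUP_DELAY)
    stop_traffic()


def measured_tps(completed_transactions: int, traffic_duration: float) -> float:
    """Rate computed from a precise duration of real traffic.

    traffic_duration is assumed to come from the TRex server-side counter,
    not from the wall-clock time the test driver slept.
    """
    return completed_transactions / traffic_duration


if __name__ == "__main__":
    events = []
    sleep_and_stop(lambda: events.append("start"),
                   lambda: events.append("stop"),
                   duration=10.0,
                   sleep=lambda seconds: events.append(seconds))
    print(events)  # start, then the corrected sleep time, then stop
    print(measured_tps(1_000_000, 10.0))  # 100000.0
```

Dividing by the server-side traffic duration, rather than by the slept time, keeps the startup correction out of the computed rate.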