CORE MASK doc

author: imarom <imarom@cisco.com> 2016-08-21 03:37:40 +0300
committer: imarom <imarom@cisco.com> 2016-08-21 03:39:52 +0300
commit: 34151a07567c5eb2d8af0ee46e52de3c09718469 (patch)
tree: 6a337af10044b7fa1d8f2d973036a90b02fd0ba7
parent: 666dde01548e89416789fd50439ceceb6bbaf020 (diff)
4 files changed, 192 insertions, 1 deletions
diff --git a/images/core_mask_pin.png b/images/core_mask_pin.png
new file mode 100644
index 00000000..63893484
--- /dev/null
+++ b/images/core_mask_pin.png
diff --git a/images/core_mask_split.png b/images/core_mask_split.png
new file mode 100644
index 00000000..7b0a456c
--- /dev/null
+++ b/images/core_mask_split.png
diff --git a/trex_rpc_server_spec.asciidoc b/trex_rpc_server_spec.asciidoc
index 7d016112..15bfc97f 100755
--- a/trex_rpc_server_spec.asciidoc
+++ b/trex_rpc_server_spec.asciidoc
@@ -1115,6 +1115,7 @@ In case rx_stats feature is enabled, rx_object **must include** all rx_stats obj
 * *Parameters*
 ** *handler* ['string'] - unique connection handler
 ** *port_id* ['int'] - port id on which to start traffic
+** *core_mask* ['uint64'] [*optional*] - a non-zero bitmask specifying which cores are active during TX; if no value is provided, all bits are set (MAX_UINT64)
 
 * *Result* ['object'] - {}
 
@@ -1128,9 +1129,10 @@ In case rx_stats feature is enabled, rx_object **must include** all rx_stats obj
     "jsonrpc": "2.0",
     "method": "start_traffic",
     "params": {
-	"api_h": "SPhoCDIV",
+        "api_h": "SPhoCDIV",
         "handler": "37JncCHr",
         "port_id": 3
+        "core_mask": 0xff
     }
 
 'Response':
diff --git a/trex_stateless.asciidoc b/trex_stateless.asciidoc
index 88eed076..c55b6520 100755
--- a/trex_stateless.asciidoc
+++ b/trex_stateless.asciidoc
@@ -3673,6 +3673,195 @@ try:
 
 ----
 
+=== Performance Tweaking
+This section describes advanced features that help get the most out of TRex performance.
+These features are not active out of the box because they might affect
+other areas and, in general, might sacrifice one or more properties,
+so the user must explicitly opt in and accept that trade-off.
+
+==== Caching MBUFs
+*<TBD>*
+
+==== Core masking per interface
+By default, TRex handles any TX command with a **greedy approach**:
+all the DP cores associated with the port are assigned to it in order to produce the maximum
+throughput.
+
+image::images/core_mask_split.png[title="Greedy Approach - Splitting",align="left",width={p_width}, link="images/core_mask_split.png"]
+
+However, in some cases it might be beneficial to provide a port with a subset of the cores to use.
+
+
+For example, when injecting traffic on two ports and the following conditions are met:
+
+* the two ports are adjacent
+* the profile is symmetric
+
+Due to the TRex architecture, adjacent ports (e.g. port 0 and port 1) share the same cores,
+and using the greedy approach will cause all the cores to transmit on both port 0 and port 1.
+
+When the profile is *symmetric*, it is wiser to pin half the cores to port 0 and half
+the cores to port 1, and thus avoid cache thrashing and bouncing.
+If the profile is not symmetric, static pinning may deny CPU cycles to the more heavily loaded port.
+
+image::images/core_mask_pin.png[title="Pinning Cores To Ports",align="left",width={p_width}, link="images/core_mask_pin.png"]
+
+TRex provides this in two ways:
+
+
+==== Predefined modes
+
+As described above, the default mode is 'split', but you can select a predefined mode called 'pin'.
+This can be done from both the API and the console:
+
+[source,bash]
+----
+
+trex>start -f stl/syn_attack.py -m 40mpps --total -p 0 1 --pin        <-- provide '--pin' to the command
+
+Removing all streams from port(s) [0, 1]:                    [SUCCESS]
+
+
+Attaching 1 streams to port(s) [0]:                          [SUCCESS]
+
+
+Attaching 1 streams to port(s) [1]:                          [SUCCESS]
+
+
+Starting traffic on port(s) [0, 1]:                          [SUCCESS]
+
+60.20 [ms]
+
+trex>
+
+----
+
+The CPU utilization shown in the TUI window confirms
+that each core was reserved for a single interface:
+
+[source,bash]
+----
+
+Global Stats:
+
+Total Tx L2  : 20.49 Gb/sec
+Total Tx L1  : 26.89 Gb/sec
+Total Rx     : 20.49 Gb/sec
+Total Pps    : 40.01 Mpkt/sec       <-- performance meets the requested rate
+Drop Rate    : 0.00 b/sec
+Queue Full   : 0 pkts
+
+
+Cpu Util(%)
+
+  Thread   | Avg | Latest | -1  | -2  | -3  | -4  | -5  | -6  | -7  | -8 
+
+ 0   (0)   |  92 |     92 |  92 |  91 |  91 |  92 |  91 |  92 |  93 |  94
+ 1 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 2   (1)   |  96 |     95 |  95 |  96 |  96 |  96 |  96 |  95 |  94 |  95
+ 3 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 4   (0)   |  92 |     93 |  93 |  91 |  91 |  93 |  93 |  93 |  93 |  93
+ 5 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 6   (1)   |  88 |     88 |  88 |  88 |  88 |  88 |  88 |  88 |  87 |  87
+ 7 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+
+----
+
+
+Had we used the *default mode*, the table would have looked like this, with
+much worse performance:
+
+[source,bash]
+----
+
+Global Stats:
+
+Total Tx L2  : 12.34 Gb/sec
+Total Tx L1  : 16.19 Gb/sec
+Total Rx     : 12.34 Gb/sec
+Total Pps    : 24.09 Mpkt/sec       <-- performance is well below the requested rate
+Drop Rate    : 0.00 b/sec
+Queue Full   : 0 pkts
+
+Cpu Util(%)
+
+  Thread   | Avg | Latest | -1  | -2  | -3  | -4  | -5  | -6  | -7  | -8  
+
+ 0  (0,1)  | 100 |    100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 
+ 1 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 
+ 2  (0,1)  | 100 |    100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 
+ 3 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 
+ 4  (0,1)  | 100 |    100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 
+ 5 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 
+ 6  (0,1)  | 100 |    100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 
+ 7 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 
+
+----
+
+This feature is also available from the Python API by passing
+*CORE_MASK_SPLIT* or *CORE_MASK_PIN* to the start API.
+
+
+==== Manual mask
+Sometimes, for debugging or for more advanced core scheduling, you might want
+to provide a manual mask that tells the server which cores to use.
+
+For example, assume a profile that sends 95% of the traffic in one direction
+and 5% of the traffic in the other direction.
+Assume also that 8 cores are assigned to the two interfaces.
+
+We want to assign 3 cores to interface 0 and only 1 core to interface 1.
+
+We can provide this line to the console (or, from the API, pass a list of masks to the start
+command):
+
+[source,bash]
+----
+trex>start -f stl/syn_attack.py -m 10mpps --total -p 0 1 --core_mask 0xE 0x1
+
+Removing all streams from port(s) [0, 1]:                    [SUCCESS]
+
+
+Attaching 1 streams to port(s) [0]:                          [SUCCESS]
+
+
+Attaching 1 streams to port(s) [1]:                          [SUCCESS]
+
+
+Starting traffic on port(s) [0, 1]:                          [SUCCESS]
+
+37.19 [ms]
+
+trex>                   
+----
+
+The TUI CPU utilization window then shows the following output:
+
+[source,bash]
+----
+
+Total Tx L2  : 5.12 Gb/sec
+Total Tx L1  : 6.72 Gb/sec
+Total Rx     : 5.12 Gb/sec
+Total Pps    : 10.00 Mpkt/sec
+Drop Rate    : 0.00 b/sec
+Queue Full   : 0 pkts
+
+Cpu Util(%)
+
+  Thread   | Avg | Latest | -1  | -2  | -3  | -4  | -5  | -6  | -7  | -8 
+
+ 0   (1)   |  45 |     45 |  45 |  45 |  45 |  45 |  46 |  45 |  46 |  45
+ 1 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 2   (0)   |  15 |     15 |  14 |  15 |  15 |  14 |  14 |  14 |  14 |  14
+ 3 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 4   (0)   |  14 |     14 |  14 |  14 |  14 |  14 |  14 |  14 |  15 |  14
+ 5 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+ 6   (0)   |  15 |     15 |  15 |  15 |  15 |  15 |  15 |  15 |  15 |  15
+ 7 (IDLE)  |   0 |      0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
+
+----
+
 === Reference
 
 Additional profiles and examples are available in the `stl/hlt` folder.
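
Note on the manual masks used in the patch above: the console example passes `--core_mask 0xE 0x1`, one mask per port. The sketch below is only an illustration of how such masks can be read; it is not TRex code. It assumes that each set bit selects one DP core of the port pair, which is consistent with the TUI output in the patch (three threads serving port 0, one serving port 1), and that a valid mask is a non-zero value that fits in a uint64, as stated in the RPC spec change. The helper names (`validate_core_mask`, `cores_in_mask`, `masks_overlap`) are hypothetical and are not part of the TRex API.

```python
MAX_UINT64 = (1 << 64) - 1  # documented default for core_mask: all bits on


def validate_core_mask(mask):
    """Return mask unchanged if it is a non-zero value that fits in a uint64."""
    if not isinstance(mask, int) or not 0 < mask <= MAX_UINT64:
        raise ValueError("core_mask must be a non-zero uint64, got %r" % (mask,))
    return mask


def cores_in_mask(mask, core_count):
    """List the 0-based bit positions set in mask among the first core_count bits."""
    validate_core_mask(mask)
    return [bit for bit in range(core_count) if mask & (1 << bit)]


def masks_overlap(mask_a, mask_b):
    """Return True if the two masks select at least one common bit."""
    return (mask_a & mask_b) != 0


if __name__ == "__main__":
    # Masks from the console example; 4 active DP cores is an assumption
    # read off the TUI output (threads 0, 2, 4 and 6 are non-idle).
    print(cores_in_mask(0xE, 4))    # bit positions selected for port 0
    print(cores_in_mask(0x1, 4))    # bit positions selected for port 1
    print(masks_overlap(0xE, 0x1))  # whether the two ports share a core
```

Checking that the per-port masks do not overlap is a quick way to confirm that a manual assignment really pins disjoint cores to each port, which is the situation the 'pin' mode is described as achieving automatically.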