1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
|
.. SPDX-License-Identifier: BSD-3-Clause
Copyright(c) 2010-2017 Intel Corporation.
Quota and Watermark Sample Application
======================================
The Quota and Watermark sample application is a simple example of packet
processing using Data Plane Development Kit (DPDK) that showcases the use
of a quota as the maximum number of packets enqueue/dequeue at a time and
low and high thresholds, or watermarks, to signal low and high ring usage
respectively.
Additionally, it shows how the thresholds can be used to feedback congestion notifications to data producers by
temporarily stopping processing overloaded rings and sending Ethernet flow control frames.
This sample application is split in two parts:
* qw - The core quota and watermark sample application
* qwctl - A command line tool to alter quota and watermarks while qw is running
Overview
--------
The Quota and Watermark sample application performs forwarding for each packet that is received on a given port.
The destination port is the adjacent port from the enabled port mask, that is,
if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other,
and ports 2 and 3 forward into each other.
The MAC addresses of the forwarded Ethernet frames are not affected.
Internally, packets are pulled from the ports by the master logical core and put on a variable length processing pipeline,
each stage of which being connected by rings, as shown in :numref:`figure_pipeline_overview`.
.. _figure_pipeline_overview:
.. figure:: img/pipeline_overview.*
Pipeline Overview
An adjustable quota value controls how many packets are being moved through the pipeline per enqueue and dequeue.
Adjustable threshold values associated with the rings control a back-off mechanism that
tries to prevent the pipeline from being overloaded by:
* Stopping enqueuing on rings for which the usage has crossed the high watermark threshold
* Sending Ethernet pause frames
* Only resuming enqueuing on a ring once its usage goes below a global low watermark threshold
This mechanism allows congestion notifications to go up the ring pipeline and
eventually lead to an Ethernet flow control frame being send to the source.
On top of serving as an example of quota and watermark usage,
this application can be used to benchmark ring based processing pipelines performance using a traffic- generator,
as shown in :numref:`figure_ring_pipeline_perf_setup`.
.. _figure_ring_pipeline_perf_setup:
.. figure:: img/ring_pipeline_perf_setup.*
Ring-based Processing Pipeline Performance Setup
Compiling the Application
-------------------------
To compile the sample application see :doc:`compiling`.
The application is located in the ``quota_watermark`` sub-directory.
Running the Application
-----------------------
The core application, qw, has to be started first.
Once it is up and running, one can alter quota and watermarks while it runs using the control application, qwctl.
Running the Core Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The application requires a single command line option:
.. code-block:: console
./qw/build/qw [EAL options] -- -p PORTMASK
where,
-p PORTMASK: A hexadecimal bitmask of the ports to configure
To run the application in a linuxapp environment with four logical cores and ports 0 and 2,
issue the following command:
.. code-block:: console
./qw/build/qw -l 0-3 -n 4 -- -p 5
Refer to the *DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.
Running the Control Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The control application requires a number of command line options:
.. code-block:: console
./qwctl/build/qwctl [EAL options] --proc-type=secondary
The --proc-type=secondary option is necessary for the EAL to properly initialize the control application to
use the same huge pages as the core application and thus be able to access its rings.
To run the application in a linuxapp environment on logical core 0, issue the following command:
.. code-block:: console
./qwctl/build/qwctl -l 0 -n 4 --proc-type=secondary
Refer to the *DPDK Getting Started* Guide for general information on running applications and
the Environment Abstraction Layer (EAL) options.
qwctl is an interactive command line that let the user change variables in a running instance of qw.
The help command gives a list of available commands:
.. code-block:: console
$ qwctl > help
Code Overview
-------------
The following sections provide a quick guide to the application's source code.
Core Application - qw
~~~~~~~~~~~~~~~~~~~~~
EAL and Drivers Setup
^^^^^^^^^^^^^^^^^^^^^
The EAL arguments are parsed at the beginning of the main() function:
.. code-block:: c
ret = rte_eal_init(argc, argv);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");
argc -= ret;
argv += ret;
Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers:
.. code-block:: c
void
init_dpdk(void)
{
int ret;
/* Bind the drivers to usable devices */
ret = rte_pci_probe();
if (ret < 0)
rte_exit(EXIT_FAILURE, "rte_pci_probe(): error %d\n", ret);
if (rte_eth_dev_count_avail() < 2)
rte_exit(EXIT_FAILURE, "Not enough Ethernet port available\n");
}
To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver*
in the *DPDK Getting Started Guide* and the *DPDK API Reference*.
Shared Variables Setup
^^^^^^^^^^^^^^^^^^^^^^
The quota and high and low watermark shared variables are put into an rte_memzone using a call to setup_shared_variables():
.. code-block:: c
void
setup_shared_variables(void)
{
const struct rte_memzone *qw_memzone;
qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME,
3 * sizeof(int), rte_socket_id(), 0);
if (qw_memzone == NULL)
rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));
quota = qw_memzone->addr;
low_watermark = (unsigned int *) qw_memzone->addr + 1;
high_watermark = (unsigned int *) qw_memzone->addr + 2;
}
These three variables are initialized to a default value in main() and
can be changed while qw is running using the qwctl control program.
Application Arguments
^^^^^^^^^^^^^^^^^^^^^
The qw application only takes one argument: a port mask that specifies which ports should be used by the application.
At least two ports are needed to run the application and there should be an even number of ports given in the port mask.
The port mask parsing is done in parse_qw_args(), defined in args.c.
Mbuf Pool Initialization
^^^^^^^^^^^^^^^^^^^^^^^^
Once the application's arguments are parsed, an mbuf pool is created.
It contains a set of mbuf objects that are used by the driver and the application to store network packets:
.. code-block:: c
/* Create a pool of mbuf to store packets */
mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", MBUF_PER_POOL, 32, 0,
MBUF_DATA_SIZE, rte_socket_id());
if (mbuf_pool == NULL)
rte_panic("%s\n", rte_strerror(rte_errno));
The rte_mempool is a generic structure used to handle pools of objects.
In this case, it is necessary to create a pool that will be used by the driver.
The number of allocated pkt mbufs is MBUF_PER_POOL, with a data room size
of MBUF_DATA_SIZE each.
A per-lcore cache of 32 mbufs is kept.
The memory is allocated in on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket.
The rte_pktmbuf_pool_create() function uses the default mbuf pool and mbuf
initializers, respectively rte_pktmbuf_pool_init() and rte_pktmbuf_init().
An advanced application may want to use the mempool API to create the
mbuf pool with more control.
Ports Configuration and Pairing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings.
This ring is the first in the pipeline and will hold the packets directly coming from the port.
.. code-block:: c
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
if (is_bit_set(port_id, portmask)) {
configure_eth_port(port_id);
init_ring(master_lcore_id, port_id);
}
pair_ports();
The configure_eth_port() and init_ring() functions are used to configure a port and a ring respectively and are defined in init.c.
They make use of the DPDK APIs defined in rte_eth.h and rte_ring.h.
pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports.
It is defined in init.c.
Logical Cores Assignment
^^^^^^^^^^^^^^^^^^^^^^^^
The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.
Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
pipeline_stage() on core X dequeues packets from core X-1's rings and enqueue them on its own rings. See :numref:`figure_threads_pipelines`.
.. code-block:: c
/* Start pipeline_stage() on all the available slave lcore but the last */
for (lcore_id = 0 ; lcore_id < last_lcore_id; lcore_id++) {
if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) {
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
if (is_bit_set(port_id, portmask))
init_ring(lcore_id, port_id);
rte_eal_remote_launch(pipeline_stage, NULL, lcore_id);
}
}
The last available logical core runs send_stage(),
which is the last stage of the pipeline dequeuing packets from the last ring in the pipeline and
sending them out on the destination port setup by pair_ports().
.. code-block:: c
/* Start send_stage() on the last slave core */
rte_eal_remote_launch(send_stage, NULL, last_lcore_id);
Receive, Process and Transmit Packets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. _figure_threads_pipelines:
.. figure:: img/threads_pipelines.*
Threads and Pipelines
In the receive_stage() function running on the master logical core,
the main task is to read ingress packets from the RX ports and enqueue them
on the port's corresponding first ring in the pipeline.
This is done using the following code:
.. code-block:: c
lcore_id = rte_lcore_id();
/* Process each port round robin style */
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
if (!is_bit_set(port_id, portmask))
continue;
ring = rings[lcore_id][port_id];
if (ring_state[port_id] != RING_READY) {
if (rte_ring_count(ring) > *low_watermark)
continue;
else
ring_state[port_id] = RING_READY;
}
/* Enqueue received packets on the RX ring */
nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts,
(uint16_t) *quota);
ret = rte_ring_enqueue_bulk(ring, (void *) pkts,
nb_rx_pkts, &free);
if (RING_SIZE - free > *high_watermark) {
ring_state[port_id] = RING_OVERLOADED;
send_pause_frame(port_id, 1337);
}
if (ret == 0) {
/*
* Return mbufs to the pool,
* effectively dropping packets
*/
for (i = 0; i < nb_rx_pkts; i++)
rte_pktmbuf_free(pkts[i]);
}
}
For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked:
* If it is in the RING_READY state, \*quota packets are grabbed from the port and put on the ring.
Should this operation make the ring's usage cross its high watermark,
the ring is marked as overloaded and an Ethernet flow control frame is sent to the source.
* If it is not in the RING_READY state, this port is ignored until the ring's usage crosses the \*low_watermark value.
The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage.
This thread is running on most of the logical cores to create and arbitrarily long pipeline.
.. code-block:: c
lcore_id = rte_lcore_id();
previous_lcore_id = get_previous_lcore_id(lcore_id);
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
if (!is_bit_set(port_id, portmask))
continue;
tx = rings[lcore_id][port_id];
rx = rings[previous_lcore_id][port_id];
if (ring_state[port_id] != RING_READY) {
if (rte_ring_count(tx) > *low_watermark)
continue;
else
ring_state[port_id] = RING_READY;
}
/* Dequeue up to quota mbuf from rx */
nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
*quota, NULL);
if (unlikely(nb_dq_pkts < 0))
continue;
/* Enqueue them on tx */
ret = rte_ring_enqueue_bulk(tx, pkts,
nb_dq_pkts, &free);
if (RING_SIZE - free > *high_watermark)
ring_state[port_id] = RING_OVERLOADED;
if (ret == 0) {
/*
* Return mbufs to the pool,
* effectively dropping packets
*/
for (i = 0; i < nb_dq_pkts; i++)
rte_pktmbuf_free(pkts[i]);
}
}
The thread's logic works mostly like receive_stage(),
except that packets are moved from ring to ring instead of port to ring.
In this example, no actual processing is done on the packets,
but pipeline_stage() is an ideal place to perform any processing required by the application.
Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and
send them on the destination port defined in the port_pairs[] array.
It is running on the last available logical core only.
.. code-block:: c
lcore_id = rte_lcore_id();
previous_lcore_id = get_previous_lcore_id(lcore_id);
for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
if (!is_bit_set(port_id, portmask)) continue;
dest_port_id = port_pairs[port_id];
tx = rings[previous_lcore_id][port_id];
if (rte_ring_empty(tx)) continue;
/* Dequeue packets from tx and send them */
nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota);
nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
}
For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and
sent on the destination port paired with the current port.
Control Application - qwctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that
can be used to modify and inspect parameters in a running qw application.
Those parameters are the global quota and low_watermark value as well as each ring's built-in high watermark.
Command Definitions
^^^^^^^^^^^^^^^^^^^
The available commands are defined in commands.c.
It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library.
Accessing Shared Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^
The setup_shared_variables() function retrieves the shared variables quota and
low_watermark from the rte_memzone previously created by qw.
.. code-block:: c
static void
setup_shared_variables(void)
{
const struct rte_memzone *qw_memzone;
qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME);
if (qw_memzone == NULL)
rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");
quota = qw_memzone->addr;
low_watermark = (unsigned int *) qw_memzone->addr + 1;
high_watermark = (unsigned int *) qw_memzone->addr + 2;
}
|