..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Event Device Library
====================
The DPDK Event device library is an abstraction that provides the application
with features to schedule events. This is achieved using the PMD architecture
similar to the ethdev or cryptodev APIs, which may already be familiar to the
reader.
The eventdev framework introduces the event driven programming model. In a
polling model, lcores poll ethdev ports and associated Rx queues directly
to look for a packet. By contrast, in an event driven model, lcores call the
scheduler, which selects packets for them based on programmer-specified criteria.
The Eventdev library adds support for an event driven programming model, which
offers applications automatic multicore scaling, dynamic load balancing,
pipelining, packet ingress order maintenance and synchronization services to
simplify application packet processing.
By introducing an event driven programming model, DPDK can support both polling
and event driven programming models for packet processing, and applications are
free to choose whatever model (or combination of the two) best suits their
needs.
A step-by-step walk-through of the eventdev design is available in the `API
Walk-through`_ section later in this document.
Event struct
------------
The eventdev API represents each event with a generic struct, which contains a
payload and metadata required for scheduling by an eventdev. The
``rte_event`` struct is a 16-byte C structure, defined in
``lib/librte_eventdev/rte_eventdev.h``.
Event Metadata
~~~~~~~~~~~~~~
The rte_event structure contains the following metadata fields, which the
application fills in to have the event scheduled as required:

* ``flow_id`` - The targeted flow identifier for the enq/deq operation.
* ``event_type`` - The source of this event, e.g. RTE_EVENT_TYPE_ETHDEV or
  RTE_EVENT_TYPE_CPU.
* ``sub_event_type`` - Distinguishes events inside the application that have
  the same event_type (see above).
* ``op`` - This field takes one of the RTE_EVENT_OP_* values, and tells the
  eventdev about the status of the event - valid values are NEW, FORWARD or
  RELEASE.
* ``sched_type`` - Represents the type of scheduling that should be performed
  on this event; valid values are RTE_SCHED_TYPE_ORDERED, RTE_SCHED_TYPE_ATOMIC
  and RTE_SCHED_TYPE_PARALLEL.
* ``queue_id`` - The identifier for the event queue that the event is sent to.
* ``priority`` - The priority of this event, see the RTE_EVENT_DEV_PRIORITY_*
  values.

Event Payload
~~~~~~~~~~~~~
The rte_event struct contains a union for payload, allowing flexibility in what
the actual event being scheduled is. The payload is a union of the following:
* ``uint64_t u64``
* ``void *event_ptr``
* ``struct rte_mbuf *mbuf``
These three items in a union occupy the same 64 bits at the end of the rte_event
structure. The application can utilize the 64 bits directly by accessing the
u64 variable, while the event_ptr and mbuf are provided as convenience
variables. For example, the mbuf pointer in the union can be used to schedule a
DPDK packet.
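
The layout described above can be pictured with a minimal, DPDK-free sketch.
The ``demo_event`` and ``demo_mbuf`` types and the field widths below are
illustrative assumptions, not the real ``rte_event`` definition (which packs
its metadata into narrower bit-fields). The sketch only shows 64 bits of
metadata followed by a 64-bit payload union whose members share the same
storage, assuming pointers are no wider than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf, so the sketch builds without DPDK. */
struct demo_mbuf {
	uint16_t data_len;
};

/* Simplified model of an event: 8 bytes of metadata + 8 bytes of payload. */
struct demo_event {
	/* Metadata (the real rte_event uses narrower bit-fields). */
	uint32_t flow_id;
	uint8_t event_type;
	uint8_t sched_type;
	uint8_t queue_id;
	uint8_t priority;
	/* Payload: three views of the same 64 bits. */
	union {
		uint64_t u64;
		void *event_ptr;
		struct demo_mbuf *mbuf;
	};
};

/* Like the struct described in the text, the model is 16 bytes in total. */
_Static_assert(sizeof(struct demo_event) == 16, "demo_event must be 16 bytes");

/* Build an event that carries a packet buffer as its payload. */
static inline struct demo_event
demo_event_from_mbuf(struct demo_mbuf *m, uint32_t flow_id, uint8_t queue_id)
{
	struct demo_event ev = {
		.flow_id = flow_id,
		.queue_id = queue_id,
	};

	ev.u64 = 0;   /* clear all 64 payload bits first */
	ev.mbuf = m;  /* then store the pointer in the same storage */
	return ev;
}
```

Because the payload members alias each other, an application that stores an
mbuf pointer can still hand the same event to code that only looks at
``event_ptr`` or ``u64``.
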
Queues
~~~~~~
An event queue is a queue containing events that are scheduled by the event
device. An event queue contains events of different flows associated with
scheduling types, such as atomic, ordered, or parallel.
Queue All Types Capable
^^^^^^^^^^^^^^^^^^^^^^^
If the RTE_EVENT_DEV_CAP_QUEUE_ALL_TYPES capability bit is set in the event
device, then events of any type may be sent to any queue. Otherwise, each queue
only supports events of the type that it was created with.
Queue All Types Incapable
^^^^^^^^^^^^^^^^^^^^^^^^^
In this case, each stage has a specified scheduling type. The application
configures each queue for a specific type of scheduling, and just enqueues all
events to the eventdev. An example of a PMD of this type is the eventdev
software PMD.
The Eventdev API supports the following scheduling types per queue:
* Atomic
* Ordered
* Parallel
Atomic, Ordered and Parallel are load-balanced scheduling types: the output
of the queue can be spread out over multiple CPU cores.
Atomic scheduling on a queue ensures that a single flow is not present on two
different CPU cores at the same time. Ordered allows sending all flows to any
core, but the scheduler must ensure that on egress the packets are returned to
ingress order on downstream queue enqueue. Parallel allows sending all flows
to all CPU cores, without any re-ordering guarantees.
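
The atomic guarantee can be illustrated with a toy model. The sketch below is
not how any eventdev PMD is implemented, and ``toy_sched`` and its functions
are hypothetical names. It pins a flow to one worker for as long as that flow
has events in flight, which is the property described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_FLOWS 1024
#define TOY_NO_OWNER  (-1)

/* Per-flow state: which worker holds the flow and how many events it holds. */
struct toy_sched {
	int owner[TOY_MAX_FLOWS];
	unsigned int inflight[TOY_MAX_FLOWS];
};

static inline void
toy_sched_init(struct toy_sched *s)
{
	memset(s->inflight, 0, sizeof(s->inflight));
	for (int i = 0; i < TOY_MAX_FLOWS; i++)
		s->owner[i] = TOY_NO_OWNER;
}

/*
 * Choose the worker for the next event of an atomic flow. 'candidate' is
 * the worker a load balancer would like to use; it is only honoured when
 * the flow currently has no events in flight.
 */
static inline int
toy_sched_atomic(struct toy_sched *s, uint32_t flow_id, int candidate)
{
	uint32_t f = flow_id % TOY_MAX_FLOWS;

	if (s->inflight[f] == 0)
		s->owner[f] = candidate;   /* flow is idle: free to migrate */
	s->inflight[f]++;
	return s->owner[f];                /* busy flows stay on their owner */
}

/* A worker has finished with one event of the flow. */
static inline void
toy_sched_release(struct toy_sched *s, uint32_t flow_id)
{
	uint32_t f = flow_id % TOY_MAX_FLOWS;

	assert(s->inflight[f] > 0);
	if (--s->inflight[f] == 0)
		s->owner[f] = TOY_NO_OWNER;
}
```

In this model a parallel queue would simply return ``candidate`` every time,
and an ordered queue would do the same while also recording ingress order so
that it can be restored on the downstream enqueue.
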
Single Link Flag
^^^^^^^^^^^^^^^^
There is a SINGLE_LINK flag which allows an application to indicate that only
one port will be connected to a queue. Queues configured with the single-link
flag follow a FIFO-like structure that maintains ordering, but they can only
be linked to a single port (see below for port and queue linking details).
Ports
~~~~~
Ports are the points of contact between worker cores and the eventdev. The
general use-case will see one CPU core using one port to enqueue and dequeue
events from an eventdev. Ports are linked to queues in order to retrieve events
from those queues (more details in `Linking Queues and Ports`_ below).
API Walk-through
----------------
This section will introduce the reader to the eventdev API, showing how to
create and configure an eventdev and use it for a two-stage atomic pipeline
with a single core for TX. The diagram below shows the final state of the
application after this walk-through:

.. _figure_eventdev-usage1:

.. figure:: img/eventdev_usage.*

   Sample eventdev usage, with RX, two atomic stages and a single-link to TX.

A high-level overview of the setup steps is:
* rte_event_dev_configure()
* rte_event_queue_setup()
* rte_event_port_setup()
* rte_event_port_link()
* rte_event_dev_start()
Init and Config
~~~~~~~~~~~~~~~
The eventdev library uses vdev options to add devices to the DPDK application.
The ``--vdev`` EAL option allows adding eventdev instances to your DPDK
application, using the name of the eventdev PMD as an argument.
For example, to create an instance of the software eventdev scheduler, the
following vdev arguments should be provided to the application EAL command line:

.. code-block:: console

   ./dpdk_application --vdev="event_sw0"

In the following code, we configure an eventdev instance with 3 queues
and 6 ports. The 3 queues consist of 2 Atomic and 1 Single-Link,
while the 6 ports consist of 4 workers, 1 RX and 1 TX.

.. code-block:: c

        const struct rte_event_dev_config config = {
                .nb_event_queues = 3,
                .nb_event_ports = 6,
                .nb_events_limit = 4096,
                .nb_event_queue_flows = 1024,
                .nb_event_port_dequeue_depth = 128,
                .nb_event_port_enqueue_depth = 128,
        };
        int err = rte_event_dev_configure(dev_id, &config);

The remainder of this walk-through assumes that dev_id is 0.

Setting up Queues
~~~~~~~~~~~~~~~~~
Once the eventdev itself is configured, the next step is to configure queues.
This is done by setting the appropriate values in a queue_conf structure, and
calling the setup function. Repeat this step for each queue, starting from
0 and ending at ``nb_event_queues - 1`` from the event_dev config above.

.. code-block:: c

        struct rte_event_queue_conf atomic_conf = {
                .event_queue_cfg = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY,
                .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
                .nb_atomic_flows = 1024,
                .nb_atomic_order_sequences = 1024,
        };
        int dev_id = 0;
        int queue_id = 0;
        int err = rte_event_queue_setup(dev_id, queue_id, &atomic_conf);

The remainder of this walk-through assumes that the queues are configured as
follows:
* id 0, atomic queue #1
* id 1, atomic queue #2
* id 2, single-link queue
Setting up Ports
~~~~~~~~~~~~~~~~
Once queues are set up successfully, create the ports as required. Each port
should be set up with the port_conf that matches its role: worker_conf for
worker cores, and rx_conf and tx_conf for the RX and TX cores:

.. code-block:: c

        struct rte_event_port_conf rx_conf = {
                .dequeue_depth = 128,
                .enqueue_depth = 128,
                .new_event_threshold = 1024,
        };
        struct rte_event_port_conf worker_conf = {
                .dequeue_depth = 16,
                .enqueue_depth = 64,
                .new_event_threshold = 4096,
        };
        struct rte_event_port_conf tx_conf = {
                .dequeue_depth = 128,
                .enqueue_depth = 128,
                .new_event_threshold = 4096,
        };
        int dev_id = 0;
        int port_id = 0;
        /* CORE_FUNCTION_conf is a placeholder for rx_conf, worker_conf or tx_conf */
        int err = rte_event_port_setup(dev_id, port_id, &CORE_FUNCTION_conf);

It is now assumed that:
* port 0: RX core
* ports 1,2,3,4: Workers
* port 5: TX core
Linking Queues and Ports
~~~~~~~~~~~~~~~~~~~~~~~~
The final step is to "wire up" the ports to the queues. After this, the
eventdev is capable of scheduling events, and when cores request work to do,
the correct events are provided to that core. Note that the RX core takes its
input from a source such as a NIC, so it is not linked to any eventdev queues.
Linking all workers to atomic queues, and the TX core to the single-link queue
can be achieved like this:

.. code-block:: c

        uint8_t port_id = 0;
        uint8_t atomic_qs[] = {0, 1};
        uint8_t single_link_q = 2;
        uint8_t tx_port_id = 5;
        uint8_t priority = RTE_EVENT_DEV_PRIORITY_NORMAL;

        /* Link each worker port (1-4) to both atomic queues */
        for (int i = 0; i < 4; i++) {
                int worker_port = i + 1;
                int links_made = rte_event_port_link(dev_id, worker_port, atomic_qs, NULL, 2);
        }
        /* Link the TX port to the single-link queue */
        int links_made = rte_event_port_link(dev_id, tx_port_id, &single_link_q, &priority, 1);

Starting the EventDev
~~~~~~~~~~~~~~~~~~~~~
A single function call tells the eventdev instance to start processing
events. Note that every queue must be linked to a port before the instance is
started: if a queue has no links, events enqueued to it are never dequeued, so
the application experiences backpressure and eventually stalls once the
eventdev runs out of space.

.. code-block:: c

        int err = rte_event_dev_start(dev_id);

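
Since an unlinked queue eventually stalls the application, a sanity check over
the application's own link bookkeeping before starting the device may be
useful. The sketch below is DPDK-free and hypothetical: ``demo_links`` is a
table the application would fill in alongside its ``rte_event_port_link()``
calls, and is not part of the eventdev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_PORTS  8
#define DEMO_MAX_QUEUES 32

/* One bitmask per port: bit q is set when the port is linked to queue q. */
struct demo_links {
	uint8_t nb_ports;
	uint8_t nb_queues;
	uint32_t queue_mask[DEMO_MAX_PORTS];
};

/* Record that 'port' has been linked to 'queue'. */
static inline void
demo_link(struct demo_links *l, uint8_t port, uint8_t queue)
{
	assert(port < DEMO_MAX_PORTS && queue < DEMO_MAX_QUEUES);
	l->queue_mask[port] |= UINT32_C(1) << queue;
}

/* Return true when every queue has at least one port linked to it. */
static inline bool
demo_all_queues_linked(const struct demo_links *l)
{
	uint32_t seen = 0;

	for (uint8_t p = 0; p < l->nb_ports; p++)
		seen |= l->queue_mask[p];

	for (uint8_t q = 0; q < l->nb_queues; q++)
		if (!(seen & (UINT32_C(1) << q)))
			return false;
	return true;
}
```

With the links from this walk-through (ports 1 to 4 on queues 0 and 1, port 5
on queue 2) the check passes; leaving out the TX link makes it fail, which is
the situation the note above warns about.
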
Ingress of New Events
~~~~~~~~~~~~~~~~~~~~~
Now that the eventdev is set up and ready to receive events, the RX core must
enqueue some events into the system for it to schedule. The events to be
scheduled are ordinary DPDK packets, received from rte_eth_rx_burst() as normal.
The following code shows how those packets can be enqueued into the eventdev:

.. code-block:: c

        const uint16_t nb_rx = rte_eth_rx_burst(eth_port, 0, mbufs, BATCH_SIZE);

        for (i = 0; i < nb_rx; i++) {
                ev[i].flow_id = mbufs[i]->hash.rss;
                ev[i].op = RTE_EVENT_OP_NEW;
                ev[i].sched_type = RTE_SCHED_TYPE_ATOMIC;
                ev[i].queue_id = 0;
                ev[i].event_type = RTE_EVENT_TYPE_ETHDEV;
                ev[i].sub_event_type = 0;
                ev[i].priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
                ev[i].mbuf = mbufs[i];
        }

        /* port_id is the RX core's port (port 0 in this walk-through) */
        const int nb_tx = rte_event_enqueue_burst(dev_id, port_id, ev, nb_rx);
        if (nb_tx != nb_rx) {
                /* Free the packets that the eventdev did not accept */
                for (i = nb_tx; i < nb_rx; i++)
                        rte_pktmbuf_free(mbufs[i]);
        }

Forwarding of Events
~~~~~~~~~~~~~~~~~~~~
Now that the RX core has injected events, there is work to be done by the
workers. Note that each worker will dequeue as many events as it can in a burst,
process each one individually, and then burst the packets back into the
eventdev.
The worker can look up the event's source from ``event.queue_id``, which should
indicate to the worker what workload needs to be performed on the event.
Once done, the worker can update the ``event.queue_id`` to a new value, to send
the event to the next stage in the pipeline.

.. code-block:: c

        int timeout = 0;
        struct rte_event events[BATCH_SIZE];
        uint16_t nb_rx = rte_event_dequeue_burst(dev_id, worker_port_id, events, BATCH_SIZE, timeout);

        for (i = 0; i < nb_rx; i++) {
                /* process mbuf using events[i].queue_id as pipeline stage */
                struct rte_mbuf *mbuf = events[i].mbuf;
                /* Send event to next stage in pipeline */
                events[i].queue_id++;
        }

        /* Enqueue on the same worker port that the events were dequeued from */
        uint16_t nb_tx = rte_event_enqueue_burst(dev_id, worker_port_id, events, nb_rx);

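
How ``queue_id`` doubles as the pipeline stage can be sketched without DPDK.
In the sketch below, ``stage_event``, the two stage handlers and
``STAGE_TX_QUEUE`` are hypothetical names; the queue numbering (0 and 1 atomic,
2 single-link to TX) follows this walk-through.

```c
#include <assert.h>
#include <stdint.h>

#define STAGE_TX_QUEUE 2   /* single-link queue in this walk-through */

/* Minimal stand-in for the event fields a worker touches. */
struct stage_event {
	uint64_t u64;       /* payload; a plain value here instead of an mbuf */
	uint8_t queue_id;   /* queue the event was dequeued from */
};

/* Hypothetical per-stage work: each stage changes the payload differently. */
static inline void stage0_work(struct stage_event *ev) { ev->u64 += 1; }
static inline void stage1_work(struct stage_event *ev) { ev->u64 *= 2; }

/*
 * Process a dequeued burst in place: run the work for the stage named by
 * queue_id, then point each event at the next queue in the pipeline.
 * Returns the number of events to enqueue back to the eventdev.
 */
static inline uint16_t
stage_process_burst(struct stage_event *evs, uint16_t nb_rx)
{
	for (uint16_t i = 0; i < nb_rx; i++) {
		switch (evs[i].queue_id) {
		case 0:
			stage0_work(&evs[i]);
			break;
		case 1:
			stage1_work(&evs[i]);
			break;
		default:
			/* Workers are only linked to queues 0 and 1. */
			assert(0 && "unexpected queue_id");
		}
		evs[i].queue_id++;   /* 0 -> 1, 1 -> STAGE_TX_QUEUE */
	}
	return nb_rx;
}
```

Keeping the stage in ``queue_id`` means any worker can handle any stage, so
the scheduler is free to balance both atomic queues across all four worker
ports.
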
Egress of Events
~~~~~~~~~~~~~~~~
Finally, when the packet is ready for egress or needs to be dropped, we need
to inform the eventdev that the packet is no longer being handled by the
application. This can be done by calling dequeue() or dequeue_burst(), which
indicates that the previous burst of packets is no longer in use by the
application.
An event driven worker thread has the following typical workflow on the fastpath:

.. code-block:: c

       while (1) {
               rte_event_dequeue_burst(...);
               /* event processing */
               rte_event_enqueue_burst(...);
       }

Summary
-------
The eventdev library allows an application to easily schedule events as it
requires, either using a run-to-completion or pipeline processing model. The
queues and ports abstract the logical functionality of an eventdev, providing
the application with a generic method to schedule events. With the flexible
PMD infrastructure, applications benefit from improvements to existing eventdevs
and from the addition of new ones without modification.