..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Exception Path Sample Application
=================================

The Exception Path sample application is a simple example that demonstrates the use of the DPDK
to set up an exception path for packets to go through the Linux* kernel.
This is done by using virtual TAP network interfaces.
These can be read from and written to by the DPDK application and
appear to the kernel as a standard network interface.

Overview
--------

The application creates two threads for each NIC port being used.
One thread reads from the port and writes the data unmodified to a thread-specific TAP interface.
The second thread reads from a TAP interface and writes the data unmodified to the NIC port.

The packet flow through the exception path application is as shown in the following figure.

.. _figure_exception_path_example:

.. figure:: img/exception_path_example.*

   Packet Flow


To make throughput measurements, kernel bridges must be set up to forward data between the bridges appropriately.

Compiling the Application
-------------------------

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/exception_path

#.  Set the target (a default target will be used if not specified).
    For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    This application is intended for linuxapp targets only.
    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    ./build/exception_path [EAL options] -- -p PORTMASK -i IN_CORES -o OUT_CORES

where:

*   -p PORTMASK: A hex bitmask of ports to use

*   -i IN_CORES: A hex bitmask of cores which read from NIC

*   -o OUT_CORES: A hex bitmask of cores which write to NIC

Refer to the *DPDK Getting Started Guide* for general information on running applications
and the Environment Abstraction Layer (EAL) options.

The number of bits set in each bitmask must be the same.
The coremask -c parameter of the EAL options should include IN_CORES and OUT_CORES.
The same bit must not be set in IN_CORES and OUT_CORES.
The affinities between ports and cores are set beginning with the least significant bit of each mask, that is,
the port represented by the lowest bit in PORTMASK is read from by the core represented by the lowest bit in IN_CORES,
and written to by the core represented by the lowest bit in OUT_CORES.

For example, to run the application with two ports and four cores:

.. code-block:: console

    ./build/exception_path -c f -n 4 -- -p 3 -i 3 -o c
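
The pairing rule described above can be illustrated with a small standalone program.
This is only a sketch and is not taken from the sample application:
the helper names ``nth_set_bit()`` and ``print_affinities()`` are hypothetical,
and the masks are the ones from the example command line (``-p 3 -i 3 -o c``).

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: index of the n-th set bit (n starts at 0), or -1. */
    static int nth_set_bit(uint64_t mask, unsigned n)
    {
        unsigned bit;

        for (bit = 0; bit < 64; bit++) {
            if (mask & (1ULL << bit)) {
                if (n == 0)
                    return (int)bit;
                n--;
            }
        }
        return -1;
    }

    /*
     * Hypothetical helper: pair the n-th port with the n-th input core and
     * the n-th output core. Returns the number of ports paired, or -1 if
     * the masks do not have the same number of bits set.
     */
    static int print_affinities(uint64_t ports, uint64_t in_cores,
                                uint64_t out_cores)
    {
        unsigned n;

        for (n = 0; ; n++) {
            int port = nth_set_bit(ports, n);
            int in = nth_set_bit(in_cores, n);
            int out = nth_set_bit(out_cores, n);

            if (port < 0 && in < 0 && out < 0)
                return (int)n;
            if (port < 0 || in < 0 || out < 0)
                return -1;
            printf("port %d: read by lcore %d, written by lcore %d\n",
                   port, in, out);
        }
    }

    int main(void)
    {
        /* Masks from the example: -p 3 -i 3 -o c */
        return print_affinities(0x3, 0x3, 0xc) == 2 ? 0 : 1;
    }

With these masks, port 0 is read by lcore 0 and written by lcore 2,
and port 1 is read by lcore 1 and written by lcore 3.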

Getting Statistics
~~~~~~~~~~~~~~~~~~

While the application is running, statistics on packets sent and
received can be displayed by sending the SIGUSR1 signal to the application from another terminal:

.. code-block:: console

    killall -USR1 exception_path

The statistics can be reset by sending a SIGUSR2 signal in a similar way.
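
One common way to react to such signals is sketched below.
This is an illustration under stated assumptions, not the sample application's code:
the names ``demo_stats``, ``on_signal()`` and ``service_signals()`` are hypothetical,
and the handler only sets flags that a main loop is assumed to poll.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L

    #include <inttypes.h>
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical counters standing in for the per-lcore statistics. */
    struct demo_counters {
        uint64_t rx;
        uint64_t tx;
        uint64_t dropped;
    };

    static struct demo_counters demo_stats;
    static volatile sig_atomic_t print_requested;
    static volatile sig_atomic_t reset_requested;

    /* The handler only records which request arrived. */
    static void on_signal(int signum)
    {
        if (signum == SIGUSR1)
            print_requested = 1;
        else if (signum == SIGUSR2)
            reset_requested = 1;
    }

    /* Assumed to be called regularly from the application's main loop. */
    static void service_signals(void)
    {
        if (print_requested) {
            print_requested = 0;
            printf("rx=%" PRIu64 " tx=%" PRIu64 " dropped=%" PRIu64 "\n",
                   demo_stats.rx, demo_stats.tx, demo_stats.dropped);
        }
        if (reset_requested) {
            reset_requested = 0;
            memset(&demo_stats, 0, sizeof(demo_stats));
        }
    }

    int main(void)
    {
        signal(SIGUSR1, on_signal);
        signal(SIGUSR2, on_signal);

        demo_stats.rx = 10;
        demo_stats.tx = 9;
        demo_stats.dropped = 1;

        raise(SIGUSR1);      /* same effect as: killall -USR1 <program> */
        service_signals();   /* prints the counters */

        raise(SIGUSR2);
        service_signals();   /* resets the counters */

        return demo_stats.rx == 0 ? 0 : 1;
    }

Deferring the work to the main loop keeps the handler restricted to
async-signal-safe operations; this is a design choice of the sketch.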

Explanation
-----------

The following sections provide some explanation of the code.

Initialization
~~~~~~~~~~~~~~

Setup of the mbuf pool, driver and queues is similar to the setup done in the :ref:`l2_fwd_app_real_and_virtual`.
In addition, the TAP interfaces must also be created.
A TAP interface is created for each lcore that is being used.
The code for creating the TAP interface is as follows:

.. code-block:: c

    /*
     *   Create a tap network interface, or use existing one with same name.
     *   If name[0]='\0' then a name is automatically assigned and returned in name.
     */

    static int tap_create(char *name)
    {
        struct ifreq ifr;
        int fd, ret;

        fd = open("/dev/net/tun", O_RDWR);
        if (fd < 0)
            return fd;

        memset(&ifr, 0, sizeof(ifr));

        /* TAP device without packet information */
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
        if (name && *name)
            rte_snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name);

        ret = ioctl(fd, TUNSETIFF, (void *) &ifr);
        if (ret < 0) {
            close(fd);
            return ret;
        }

        /* Return the interface name chosen by the kernel to the caller */
        if (name)
            snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name);

        return fd;
    }

The other step in the initialization process that is unique to this sample application
is the association of each port with two cores:

*   One core to read from the port and write to a TAP interface

*   A second core to read from a TAP interface and write to the port

This is done using an array called port_ids[], which is indexed by the lcore IDs.
The population of this array is shown below:

.. code-block:: c

    tx_port = 0;
    rx_port = 0;

    RTE_LCORE_FOREACH(i) {
        if (input_cores_mask & (1ULL << i)) {
            /* Skip ports that are not enabled */
            while ((ports_mask & (1 << rx_port)) == 0) {
                rx_port++;
                if (rx_port > (sizeof(ports_mask) * 8))
                    goto fail; /* not enough ports */
            }
            port_ids[i] = rx_port++;
        } else if (output_cores_mask & (1ULL << i)) {
            /* Skip ports that are not enabled */
            while ((ports_mask & (1 << tx_port)) == 0) {
                tx_port++;
                if (tx_port > (sizeof(ports_mask) * 8))
                    goto fail; /* not enough ports */
            }
            port_ids[i] = tx_port++;
        }
    }

Packet Forwarding
~~~~~~~~~~~~~~~~~

After the initialization steps are complete, the main_loop() function is run on each lcore.
This function first checks the lcore_id against the user provided input_cores_mask and output_cores_mask to see
if this core is reading from or writing to a TAP interface.

For the case that reads from a NIC port, the packet reception is the same as in the L2 Forwarding sample application
(see :ref:`l2_fwd_app_rx_tx_packets`).
The packet transmission is done by calling write() with the file descriptor of the appropriate TAP interface
and then explicitly freeing the mbuf back to the pool.

..  code-block:: c

    /* Loop forever reading from NIC and writing to tap */

    for (;;) {
        struct rte_mbuf *pkts_burst[PKT_BURST_SZ];
        unsigned i;

        const unsigned nb_rx = rte_eth_rx_burst(port_ids[lcore_id], 0, pkts_burst, PKT_BURST_SZ);

        lcore_stats[lcore_id].rx += nb_rx;

        for (i = 0; likely(i < nb_rx); i++) {
            struct rte_mbuf *m = pkts_burst[i];
            int ret = write(tap_fd, rte_pktmbuf_mtod(m, void *),
                            rte_pktmbuf_data_len(m));

            rte_pktmbuf_free(m);
            if (unlikely(ret < 0))
                lcore_stats[lcore_id].dropped++;
            else
                lcore_stats[lcore_id].tx++;
        }
    }

For the other case that reads from a TAP interface and writes to a NIC port,
packets are retrieved by doing a read() from the file descriptor of the appropriate TAP interface.
The read() call copies the packet data into the mbuf; the remaining mbuf fields are then set manually.
The packet can then be transmitted as normal.

.. code-block:: c

    /* Loop forever reading from tap and writing to NIC */

    for (;;) {
        int ret;
        struct rte_mbuf *m = rte_pktmbuf_alloc(pktmbuf_pool);

        if (m == NULL)
            continue;

        ret = read(tap_fd, m->pkt.data, MAX_PACKET_SZ);
        lcore_stats[lcore_id].rx++;
        if (unlikely(ret < 0)) {
            FATAL_ERROR("Reading from %s interface failed", tap_name);
        }

        m->pkt.nb_segs = 1;
        m->pkt.next = NULL;
        m->pkt.data_len = (uint16_t)ret;

        ret = rte_eth_tx_burst(port_ids[lcore_id], 0, &m, 1);
        if (unlikely(ret < 1)) {
            rte_pktmbuf_free(m);
            lcore_stats[lcore_id].dropped++;
        }
        else {
            lcore_stats[lcore_id].tx++;
        }
    }

To set up loops for measuring throughput, TAP interfaces can be connected using bridging.
The steps to do this are described in the section that follows.

Managing TAP Interfaces and Bridges
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Exception Path sample application creates TAP interfaces with names of the format tap_dpdk_nn,
where nn is the lcore ID. These TAP interfaces need to be configured for use:

.. code-block:: console

    ifconfig tap_dpdk_00 up
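
The naming scheme can be reproduced with a short standalone program.
This is a sketch only: the helper ``tap_name()`` is hypothetical and the format string
``"tap_dpdk_%.2u"`` is an assumption chosen to match the names shown in this section.

.. code-block:: c

    #include <net/if.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: build the TAP interface name for an lcore ID.
     * The format string is an assumption that matches names such as
     * tap_dpdk_00.
     */
    static void tap_name(char *buf, size_t len, unsigned lcore_id)
    {
        snprintf(buf, len, "tap_dpdk_%.2u", lcore_id);
    }

    int main(void)
    {
        /* IF_NAMESIZE is the POSIX name for the IFNAMSIZ limit on Linux */
        char name[IF_NAMESIZE];

        tap_name(name, sizeof(name), 0);
        printf("%s\n", name);   /* tap_dpdk_00 */

        tap_name(name, sizeof(name), 3);
        printf("%s\n", name);   /* tap_dpdk_03 */

        return 0;
    }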

To set up a bridge between two interfaces so that packets sent to one interface can be read from another,
use the brctl tool:

.. code-block:: console

    brctl addbr "br0"
    brctl addif br0 tap_dpdk_00
    brctl addif br0 tap_dpdk_03
    ifconfig br0 up

The TAP interfaces created by this application exist only when the application is running,
so the steps above need to be repeated each time the application is run.
To avoid this, persistent TAP interfaces can be created using openvpn:

.. code-block:: console

    openvpn --mktun --dev tap_dpdk_00

If this method is used, then the steps above have to be done only once and
the same TAP interfaces can be reused each time the application is run.
To remove bridges and persistent TAP interfaces, the following commands are used:

.. code-block:: console

    ifconfig br0 down
    brctl delbr br0
    openvpn --rmtun --dev tap_dpdk_00