diff options
author | C.J. Collier <cjcollier@linuxfoundation.org> | 2016-06-14 07:50:17 -0700 |
---|---|---|
committer | C.J. Collier <cjcollier@linuxfoundation.org> | 2016-06-14 12:17:54 -0700 |
commit | 97f17497d162afdb82c8704bf097f0fee3724b2e (patch) | |
tree | 1c6269614c0c15ffef8451c58ae8f8b30a1bc804 /doc/guides/sample_app_ug/ipv4_multicast.rst | |
parent | e04be89c2409570e0055b2cda60bd11395bb93b0 (diff) |
Imported Upstream version 16.04
Change-Id: I77eadcd8538a9122e4773cbe55b24033dc451757
Signed-off-by: C.J. Collier <cjcollier@linuxfoundation.org>
Diffstat (limited to 'doc/guides/sample_app_ug/ipv4_multicast.rst')
-rw-r--r-- | doc/guides/sample_app_ug/ipv4_multicast.rst | 372 |
1 files changed, 372 insertions, 0 deletions
diff --git a/doc/guides/sample_app_ug/ipv4_multicast.rst b/doc/guides/sample_app_ug/ipv4_multicast.rst new file mode 100644 index 00000000..67ea944e --- /dev/null +++ b/doc/guides/sample_app_ug/ipv4_multicast.rst @@ -0,0 +1,372 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +IPv4 Multicast Sample Application +================================= + +The IPv4 Multicast application is a simple example of packet processing +using the Data Plane Development Kit (DPDK). +The application performs L3 multicasting. + +Overview +-------- + +The application demonstrates the use of zero-copy buffers for packet forwarding. +The initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`. +This guide highlights the differences between the two applications. +There are two key differences from the L2 Forwarding sample application: + +* The IPv4 Multicast sample application makes use of indirect buffers. + +* The forwarding decision is taken based on information read from the input packet's IPv4 header. + +The lookup method is the Four-byte Key (FBK) hash-based method. +The lookup table is composed of pairs of destination IPv4 address (the FBK) +and a port mask associated with that IPv4 address. + +For convenience and simplicity, this sample application does not take IANA-assigned multicast addresses into account, +but instead equates the last four bytes of the multicast group (that is, the last four bytes of the destination IP address) +with the mask of ports to multicast packets to. +Also, the application does not consider the Ethernet addresses; +it looks only at the IPv4 destination address for any given packet. + +Building the Application +------------------------ + +To compile the application: + +#. Go to the sample application directory: + + .. code-block:: console + + export RTE_SDK=/path/to/rte_sdk + cd ${RTE_SDK}/examples/ipv4_multicast + +#. Set the target (a default target is used if not specified). For example: + + .. code-block:: console + + export RTE_TARGET=x86_64-native-linuxapp-gcc + +See the *DPDK Getting Started Guide* for possible RTE_TARGET values. + +#. Build the application: + + .. code-block:: console + + make + +.. note:: + + The compiled application is written to the build subdirectory. + To have the application written to a different location, + the O=/path/to/build/directory option may be specified in the make command. + +Running the Application +----------------------- + +The application has a number of command line options: + +.. code-block:: console + + ./build/ipv4_multicast [EAL options] -- -p PORTMASK [-q NQ] + +where, + +* -p PORTMASK: Hexadecimal bitmask of ports to configure + +* -q NQ: determines the number of queues per lcore + +.. note:: + + Unlike the basic L2/L3 Forwarding sample applications, + NUMA support is not provided in the IPv4 Multicast sample application. + +Typically, to run the IPv4 Multicast sample application, issue the following command (as root): + +.. code-block:: console + + ./build/ipv4_multicast -c 0x00f -n 3 -- -p 0x3 -q 1 + +In this command: + +* The -c option enables cores 0, 1, 2 and 3 + +* The -n option specifies 3 memory channels + +* The -p option enables ports 0 and 1 + +* The -q option assigns 1 queue to each lcore + +Refer to the *DPDK Getting Started Guide* for general information on running applications +and the Environment Abstraction Layer (EAL) options. + +Explanation +----------- + +The following sections provide some explanation of the code. +As mentioned in the overview section, +the initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`. +The following sections describe aspects that are specific to the IPv4 Multicast sample application. + +Memory Pool Initialization +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The IPv4 Multicast sample application uses three memory pools. +Two of the pools are for indirect buffers used for packet duplication purposes. +Memory pools for indirect buffers are initialized differently from the memory pool for direct buffers: + +.. code-block:: c + + packet_pool = rte_mempool_create("packet_pool", NB_PKT_MBUF, PKT_MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private), + rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); + + header_pool = rte_mempool_create("header_pool", NB_HDR_MBUF, HDR_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); + clone_pool = rte_mempool_create("clone_pool", NB_CLONE_MBUF, + CLONE_MBUF_SIZE, 32, 0, NULL, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); + +The reason for this is because indirect buffers are not supposed to hold any packet data and +therefore can be initialized with lower amount of reserved memory for each buffer. + +Hash Initialization +~~~~~~~~~~~~~~~~~~~ + +The hash object is created and loaded with the pre-configured entries read from a global array: + +.. code-block:: c + + static int + + init_mcast_hash(void) + { + uint32_t i; + mcast_hash_params.socket_id = rte_socket_id(); + + mcast_hash = rte_fbk_hash_create(&mcast_hash_params); + if (mcast_hash == NULL){ + return -1; + } + + for (i = 0; i < N_MCAST_GROUPS; i ++){ + if (rte_fbk_hash_add_key(mcast_hash, mcast_group_table[i].ip, mcast_group_table[i].port_mask) < 0) { + return -1; + } + } + return 0; + } + +Forwarding +~~~~~~~~~~ + +All forwarding is done inside the mcast_forward() function. +Firstly, the Ethernet* header is removed from the packet and the IPv4 address is extracted from the IPv4 header: + +.. code-block:: c + + /* Remove the Ethernet header from the input packet */ + + iphdr = (struct ipv4_hdr *)rte_pktmbuf_adj(m, sizeof(struct ether_hdr)); + RTE_MBUF_ASSERT(iphdr != NULL); + dest_addr = rte_be_to_cpu_32(iphdr->dst_addr); + +Then, the packet is checked to see if it has a multicast destination address and +if the routing table has any ports assigned to the destination address: + +.. code-block:: c + + if (!IS_IPV4_MCAST(dest_addr) || + (hash = rte_fbk_hash_lookup(mcast_hash, dest_addr)) <= 0 || + (port_mask = hash & enabled_port_mask) == 0) { + rte_pktmbuf_free(m); + return; + } + +Then, the number of ports in the destination portmask is calculated with the help of the bitcnt() function: + +.. code-block:: c + + /* Get number of bits set. */ + + static inline uint32_t bitcnt(uint32_t v) + { + uint32_t n; + + for (n = 0; v != 0; v &= v - 1, n++) + ; + return n; + } + +This is done to determine which forwarding algorithm to use. +This is explained in more detail in the next section. + +Thereafter, a destination Ethernet address is constructed: + +.. code-block:: c + + /* construct destination Ethernet address */ + + dst_eth_addr = ETHER_ADDR_FOR_IPV4_MCAST(dest_addr); + +Since Ethernet addresses are also part of the multicast process, each outgoing packet carries the same destination Ethernet address. +The destination Ethernet address is constructed from the lower 23 bits of the multicast group OR-ed +with the Ethernet address 01:00:5e:00:00:00, as per RFC 1112: + +.. code-block:: c + + #define ETHER_ADDR_FOR_IPV4_MCAST(x) \ + (rte_cpu_to_be_64(0x01005e000000ULL | ((x) & 0x7fffff)) >> 16) + +Then, packets are dispatched to the destination ports according to the portmask associated with a multicast group: + +.. code-block:: c + + for (port = 0; use_clone != port_mask; port_mask >>= 1, port++) { + /* Prepare output packet and send it out. */ + + if ((port_mask & 1) != 0) { + if (likely ((mc = mcast_out_pkt(m, use_clone)) != NULL)) + mcast_send_pkt(mc, &dst_eth_addr.as_addr, qconf, port); + else if (use_clone == 0) + rte_pktmbuf_free(m); + } + } + +The actual packet transmission is done in the mcast_send_pkt() function: + +.. code-block:: c + + static inline void mcast_send_pkt(struct rte_mbuf *pkt, struct ether_addr *dest_addr, struct lcore_queue_conf *qconf, uint8_t port) + { + struct ether_hdr *ethdr; + uint16_t len; + + /* Construct Ethernet header. */ + + ethdr = (struct ether_hdr *)rte_pktmbuf_prepend(pkt, (uint16_t) sizeof(*ethdr)); + + RTE_MBUF_ASSERT(ethdr != NULL); + + ether_addr_copy(dest_addr, ðdr->d_addr); + ether_addr_copy(&ports_eth_addr[port], ðdr->s_addr); + ethdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4); + + /* Put new packet into the output queue */ + + len = qconf->tx_mbufs[port].len; + qconf->tx_mbufs[port].m_table[len] = pkt; + qconf->tx_mbufs[port].len = ++len; + + /* Transmit packets */ + + if (unlikely(MAX_PKT_BURST == len)) + send_burst(qconf, port); + } + +Buffer Cloning +~~~~~~~~~~~~~~ + +This is the most important part of the application since it demonstrates the use of zero- copy buffer cloning. +There are two approaches for creating the outgoing packet and although both are based on the data zero-copy idea, +there are some differences in the detail. + +The first approach creates a clone of the input packet, for example, +walk though all segments of the input packet and for each of segment, +create a new buffer and attach that new buffer to the segment +(refer to rte_pktmbuf_clone() in the rte_mbuf library for more details). +A new buffer is then allocated for the packet header and is prepended to the cloned buffer. + +The second approach does not make a clone, it just increments the reference counter for all input packet segment, +allocates a new buffer for the packet header and prepends it to the input packet. + +Basically, the first approach reuses only the input packet's data, but creates its own copy of packet's metadata. +The second approach reuses both input packet's data and metadata. + +The advantage of first approach is that each outgoing packet has its own copy of the metadata, +so we can safely modify the data pointer of the input packet. +That allows us to skip creation if the output packet is for the last destination port +and instead modify input packet's header in place. +For example, for N destination ports, we need to invoke mcast_out_pkt() (N-1) times. + +The advantage of the second approach is that there is less work to be done for each outgoing packet, +that is, the "clone" operation is skipped completely. +However, there is a price to pay. +The input packet's metadata must remain intact, so for N destination ports, +we need to invoke mcast_out_pkt() (N) times. + +Therefore, for a small number of outgoing ports (and segments in the input packet), +first approach is faster. +As the number of outgoing ports (and/or input segments) grows, the second approach becomes more preferable. + +Depending on the number of segments or the number of ports in the outgoing portmask, +either the first (with cloning) or the second (without cloning) approach is taken: + +.. code-block:: c + + use_clone = (port_num <= MCAST_CLONE_PORTS && m->pkt.nb_segs <= MCAST_CLONE_SEGS); + +It is the mcast_out_pkt() function that performs the packet duplication (either with or without actually cloning the buffers): + +.. code-block:: c + + static inline struct rte_mbuf *mcast_out_pkt(struct rte_mbuf *pkt, int use_clone) + { + struct rte_mbuf *hdr; + + /* Create new mbuf for the header. */ + + if (unlikely ((hdr = rte_pktmbuf_alloc(header_pool)) == NULL)) + return NULL; + + /* If requested, then make a new clone packet. */ + + if (use_clone != 0 && unlikely ((pkt = rte_pktmbuf_clone(pkt, clone_pool)) == NULL)) { + rte_pktmbuf_free(hdr); + return NULL; + } + + /* prepend new header */ + + hdr->pkt.next = pkt; + + /* update header's fields */ + + hdr->pkt.pkt_len = (uint16_t)(hdr->pkt.data_len + pkt->pkt.pkt_len); + hdr->pkt.nb_segs = (uint8_t)(pkt->pkt.nb_segs + 1); + + /* copy metadata from source packet */ + + hdr->pkt.in_port = pkt->pkt.in_port; + hdr->pkt.vlan_macip = pkt->pkt.vlan_macip; + hdr->pkt.hash = pkt->pkt.hash; + hdr->ol_flags = pkt->ol_flags; + rte_mbuf_sanity_check(hdr, RTE_MBUF_PKT, 1); + + return hdr; + } |