From 8b25d1ad5d2264bdfc2818c7bda74ee2697df6db Mon Sep 17 00:00:00 2001 From: Christian Ehrhardt Date: Wed, 6 Jul 2016 09:22:35 +0200 Subject: Imported Upstream version 16.07-rc1 Change-Id: I40a523e52f12e8496fdd69e902824b0226c303de Signed-off-by: Christian Ehrhardt --- doc/guides/prog_guide/env_abstraction_layer.rst | 4 +- doc/guides/prog_guide/index.rst | 1 + doc/guides/prog_guide/ivshmem_lib.rst | 2 + doc/guides/prog_guide/mempool_lib.rst | 42 ++++- doc/guides/prog_guide/pdump_lib.rst | 123 +++++++++++++ doc/guides/prog_guide/poll_mode_drv.rst | 21 ++- doc/guides/prog_guide/vhost_lib.rst | 223 +++++++++++++++++------- 7 files changed, 338 insertions(+), 78 deletions(-) create mode 100644 doc/guides/prog_guide/pdump_lib.rst (limited to 'doc/guides/prog_guide') diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 4737dc2e..4b9895e4 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -322,8 +322,8 @@ Known Issues The rte_mempool uses a per-lcore cache inside the mempool. For non-EAL pthreads, ``rte_lcore_id()`` will not return a valid number. - So for now, when rte_mempool is used with non-EAL pthreads, the put/get operations will bypass the mempool cache and there is a performance penalty because of this bypass. - Support for non-EAL mempool cache is currently being enabled. + So for now, when rte_mempool is used with non-EAL pthreads, the put/get operations will bypass the default mempool cache and there is a performance penalty because of this bypass. + Only user-owned external caches can be used in a non-EAL context in conjunction with ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()`` that accept an explicit cache parameter. + rte_ring diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index b862d0cf..07a4d354 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -52,6 +52,7 @@ Programmer's Guide packet_distrib_lib reorder_lib ip_fragment_reassembly_lib + pdump_lib multi_proc_support kernel_nic_interface thread_safety_dpdk_functions diff --git a/doc/guides/prog_guide/ivshmem_lib.rst b/doc/guides/prog_guide/ivshmem_lib.rst index 9401ccf2..b8a32e4c 100644 --- a/doc/guides/prog_guide/ivshmem_lib.rst +++ b/doc/guides/prog_guide/ivshmem_lib.rst @@ -79,6 +79,8 @@ The following is a simple guide to using the IVSHMEM Library API: Only data structures fully residing in DPDK hugepage memory work correctly. Supported data structures created by malloc(), mmap() or otherwise using non-DPDK memory cause undefined behavior and even a segmentation fault. + Specifically, because the memzone field in an rte_ring refers to a memzone structure residing in local memory, + accessing the memzone field in a shared rte_ring will cause an immediate segmentation fault. IVSHMEM Environment Configuration --------------------------------- diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst index 5fae79ab..59466752 100644 --- a/doc/guides/prog_guide/mempool_lib.rst +++ b/doc/guides/prog_guide/mempool_lib.rst @@ -34,13 +34,12 @@ Mempool Library =============== A memory pool is an allocator of a fixed-sized object. -In the DPDK, it is identified by name and uses a ring to store free objects. +In the DPDK, it is identified by name and uses a mempool handler to store free objects. +The default mempool handler is ring based. It provides some other optional services such as a per-core object cache and an alignment helper to ensure that objects are padded to spread them equally on all DRAM or DDR3 channels. -This library is used by the -:ref:`Mbuf Library ` and the -:ref:`Environment Abstraction Layer ` (for logging history). +This library is used by the :ref:`Mbuf Library `. Cookies ------- @@ -116,7 +115,7 @@ While this may mean a number of buffers may sit idle on some core's cache, the speed at which a core can access its own cache for a specific memory pool without locks provides performance gains. The cache is composed of a small, per-core table of pointers and its length (used as a stack). -This cache can be enabled or disabled at creation of the pool. +This internal cache can be enabled or disabled at creation of the pool. The maximum size of the cache is static and is defined at compilation time (CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE). @@ -128,6 +127,39 @@ The maximum size of the cache is static and is defined at compilation time (CONF A mempool in Memory with its Associated Ring +Alternatively to the internal default per-lcore local cache, an application can create and manage external caches through the ``rte_mempool_cache_create()``, ``rte_mempool_cache_free()`` and ``rte_mempool_cache_flush()`` calls. +These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()``. +The ``rte_mempool_default_cache()`` call returns the default internal cache if any. +In contrast to the default caches, user-owned caches can be used by non-EAL threads too. + +Mempool Handlers +------------------------ + +This allows external memory subsystems, such as external hardware memory +management systems and software based memory allocators, to be used with DPDK. + +There are two aspects to a mempool handler. + +* Adding the code for your new mempool operations (ops). This is achieved by + adding a new mempool ops code, and using the ``REGISTER_MEMPOOL_OPS`` macro. + +* Using the new API to call ``rte_mempool_create_empty()`` and + ``rte_mempool_set_ops_byname()`` to create a new mempool and specifying which + ops to use. + +Several different mempool handlers may be used in the same application. A new +mempool can be created by using the ``rte_mempool_create_empty()`` function, +then using ``rte_mempool_set_ops_byname()`` to point the mempool to the +relevant mempool handler callback (ops) structure. + +Legacy applications may continue to use the old ``rte_mempool_create()`` API +call, which uses a ring based mempool handler by default. These applications +will need to be modified to use a new mempool handler. + +For applications that use ``rte_pktmbuf_create()``, there is a config setting +(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of +an alternative mempool handler. + Use Cases --------- diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst new file mode 100644 index 00000000..580ffcbd --- /dev/null +++ b/doc/guides/prog_guide/pdump_lib.rst @@ -0,0 +1,123 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +.. _pdump_library: + +The librte_pdump Library +======================== + +The ``librte_pdump`` library provides a framework for packet capturing in DPDK. +The library does the complete copy of the Rx and Tx mbufs to a new mempool and +hence it slows down the performance of the applications, so it is recommended +to use this library for debugging purposes. + +The library provides the following APIs to initialize the packet capture framework, to enable +or disable the packet capture, and to uninitialize it: + +* ``rte_pdump_init()``: + This API initializes the packet capture framework. + +* ``rte_pdump_enable()``: + This API enables the packet capture on a given port and queue. + Note: The filter option in the API is a place holder for future enhancements. + +* ``rte_pdump_enable_by_deviceid()``: + This API enables the packet capture on a given device id (``vdev name or pci address``) and queue. + Note: The filter option in the API is a place holder for future enhancements. + +* ``rte_pdump_disable()``: + This API disables the packet capture on a given port and queue. + +* ``rte_pdump_disable_by_deviceid()``: + This API disables the packet capture on a given device id (``vdev name or pci address``) and queue. + +* ``rte_pdump_uninit()``: + This API uninitializes the packet capture framework. + +* ``rte_pdump_set_socket_dir()``: + This API sets the server and client socket paths. + Note: This API is not thread-safe. + + +Operation +--------- + +The ``librte_pdump`` library works on a client/server model. The server is responsible for enabling or +disabling the packet capture and the clients are responsible for requesting the enabling or disabling of +the packet capture. + +The packet capture framework, as part of its initialization, creates the pthread and the server socket in +the pthread. The application that calls the framework initialization will have the server socket created, +either under the path that the application has passed or under the default path i.e. either ``/var/run`` for +root user or ``$HOME`` for non root user. + +Applications that request enabling or disabling of the packet capture will have the client socket created either under +the path that the application has passed or under the default path i.e. either ``/var/run/`` for root user or ``$HOME`` +for not root user to send the requests to the server. +The server socket will listen for client requests for enabling or disabling the packet capture. + + +Implementation Details +---------------------- + +The library API ``rte_pdump_init()``, initializes the packet capture framework by creating the pthread and the server +socket. The server socket in the pthread context will be listening to the client requests to enable or disable the +packet capture. + +The library APIs ``rte_pdump_enable()`` and ``rte_pdump_enable_by_deviceid()`` enables the packet capture. +On each call to these APIs, the library creates a separate client socket, creates the "pdump enable" request and sends +the request to the server. The server that is listening on the socket will take the request and enable the packet capture +by registering the Ethernet RX and TX callbacks for the given port or device_id and queue combinations. +Then the server will mirror the packets to the new mempool and enqueue them to the rte_ring that clients have passed +to these APIs. The server also sends the response back to the client about the status of the request that was processed. +After the response is received from the server, the client socket is closed. + +The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture. +On each call to these APIs, the library creates a separate client socket, creates the "pdump disable" request and sends +the request to the server. The server that is listening on the socket will take the request and disable the packet +capture by removing the Ethernet RX and TX callbacks for the given port or device_id and queue combinations. The server +also sends the response back to the client about the status of the request that was processed. After the response is +received from the server, the client socket is closed. + +The library API ``rte_pdump_uninit()``, uninitializes the packet capture framework by closing the pthread and the +server socket. + +The library API ``rte_pdump_set_socket_dir()``, sets the given path as either server socket path +or client socket path based on the ``type`` argument of the API. +If the given path is ``NULL``, default path will be selected, i.e. either ``/var/run/`` for root user or ``$HOME`` +for non root user. Clients also need to call this API to set their server socket path if the server socket +path is different from default path. + + +Use Case: Packet Capturing +-------------------------- + +The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK. +Users can use this as an example to develop their own packet capturing tools. diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst index 76986924..bf3ea9fd 100644 --- a/doc/guides/prog_guide/poll_mode_drv.rst +++ b/doc/guides/prog_guide/poll_mode_drv.rst @@ -299,10 +299,23 @@ Extended Statistics API ~~~~~~~~~~~~~~~~~~~~~~~ The extended statistics API allows each individual PMD to expose a unique set -of statistics. The client of the API provides an array of -``struct rte_eth_xstats`` type. Each ``struct rte_eth_xstats`` contains a -string and value pair. The amount of xstats exposed, and position of the -statistic in the array must remain constant during runtime. +of statistics. Accessing these from application programs is done via two +functions: + +* ``rte_eth_xstats_get``: Fills in an array of ``struct rte_eth_xstat`` + with extended statistics. +* ``rte_eth_xstats_get_names``: Fills in an array of + ``struct rte_eth_xstat_name`` with extended statistic name lookup + information. + +Each ``struct rte_eth_xstat`` contains an identifier and value pair, and +each ``struct rte_eth_xstat_name`` contains a string. Each identifier +within the ``struct rte_eth_xstat`` lookup array must have a corresponding +entry in the ``struct rte_eth_xstat_name`` lookup array. Within the latter +the index of the entry is the identifier the string is associated with. +These identifiers, as well as the number of extended statistic exposed, must +remain constant during runtime. Note that extended statistic identifiers are +driver-specific, and hence might not be the same for different ports. A naming scheme exists for the strings exposed to clients of the API. This is to allow scraping of the API for statistics of interest. The naming scheme uses diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 48e1fffb..14d5e675 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -1,5 +1,5 @@ .. BSD LICENSE - Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + Copyright(c) 2010-2016 Intel Corporation. All rights reserved. All rights reserved. Redistribution and use in source and binary forms, with or without @@ -31,105 +31,194 @@ Vhost Library ============= -The vhost library implements a user space vhost driver. It supports both vhost-cuse -(cuse: user space character device) and vhost-user(user space socket server). -It also creates, manages and destroys vhost devices for corresponding virtio -devices in the guest. Vhost supported vSwitch could register callbacks to this -library, which will be called when a vhost device is activated or deactivated -by guest virtual machine. +The vhost library implements a user space virtio net server allowing the user +to manipulate the virtio ring directly. In another words, it allows the user +to fetch/put packets from/to the VM virtio net device. To achieve this, a +vhost library should be able to: + +* Access the guest memory: + + For QEMU, this is done by using the ``-object memory-backend-file,share=on,...`` + option. Which means QEMU will create a file to serve as the guest RAM. + The ``share=on`` option allows another process to map that file, which + means it can access the guest RAM. + +* Know all the necessary information about the vring: + + Information such as where the available ring is stored. Vhost defines some + messages to tell the backend all the information it needs to know how to + manipulate the vring. + +Currently, there are two ways to pass these messages and as a result there are +two Vhost implementations in DPDK: *vhost-cuse* (where the character devices +are in user space) and *vhost-user*. + +Vhost-cuse creates a user space character device and hook to a function ioctl, +so that all ioctl commands that are sent from the frontend (QEMU) will be +captured and handled. + +Vhost-user creates a Unix domain socket file through which messages are +passed. + +.. Note:: + + Since DPDK v2.2, the majority of the development effort has gone into + enhancing vhost-user, such as multiple queue, live migration, and + reconnect. Thus, it is strongly advised to use vhost-user instead of + vhost-cuse. + Vhost API Overview ------------------ -* Vhost driver registration +The following is an overview of the Vhost API functions: + +* ``rte_vhost_driver_register(path, flags)`` + + This function registers a vhost driver into the system. For vhost-cuse, a + ``/dev/path`` character device file will be created. For vhost-user server + mode, a Unix domain socket file ``path`` will be created. + + Currently two flags are supported (these are valid for vhost-user only): + + - ``RTE_VHOST_USER_CLIENT`` + + DPDK vhost-user will act as the client when this flag is given. See below + for an explanation. + + - ``RTE_VHOST_USER_NO_RECONNECT`` + + When DPDK vhost-user acts as the client it will keep trying to reconnect + to the server (QEMU) until it succeeds. This is useful in two cases: + + * When QEMU is not started yet. + * When QEMU restarts (for example due to a guest OS reboot). + + This reconnect option is enabled by default. However, it can be turned off + by setting this flag. - rte_vhost_driver_register registers the vhost driver into the system. - For vhost-cuse, character device file will be created under the /dev directory. - Character device name is specified as the parameter. - For vhost-user, a Unix domain socket server will be created with the parameter as - the local socket path. +* ``rte_vhost_driver_session_start()`` -* Vhost session start + This function starts the vhost session loop to handle vhost messages. It + starts an infinite loop, therefore it should be called in a dedicated + thread. - rte_vhost_driver_session_start starts the vhost session loop. - Vhost session is an infinite blocking loop. - Put the session in a dedicate DPDK thread. +* ``rte_vhost_driver_callback_register(virtio_net_device_ops)`` -* Callback register + This function registers a set of callbacks, to let DPDK applications take + the appropriate action when some events happen. The following events are + currently supported: - Vhost supported vSwitch could call rte_vhost_driver_callback_register to - register two callbacks, new_destory and destroy_device. - When virtio device is activated or deactivated by guest virtual machine, - the callback will be called, then vSwitch could put the device onto data - core or remove the device from data core by setting or unsetting - VIRTIO_DEV_RUNNING on the device flags. + * ``new_device(int vid)`` -* Read/write packets from/to guest virtual machine + This callback is invoked when a virtio net device becomes ready. ``vid`` + is the virtio net device ID. - rte_vhost_enqueue_burst transmit host packets to guest. - rte_vhost_dequeue_burst receives packets from guest. + * ``destroy_device(int vid)`` -* Feature enable/disable + This callback is invoked when a virtio net device shuts down (or when the + vhost connection is broken). - Now one negotiate-able feature in vhost is merge-able. - vSwitch could enable/disable this feature for performance consideration. + * ``vring_state_changed(int vid, uint16_t queue_id, int enable)`` -Vhost Implementation --------------------- + This callback is invoked when a specific queue's state is changed, for + example to enabled or disabled. -Vhost cuse implementation +* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)`` + + Transmits (enqueues) ``count`` packets from host to guest. + +* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)`` + + Receives (dequeues) ``count`` packets from guest, and stored them at ``pkts``. + +* ``rte_vhost_feature_disable/rte_vhost_feature_enable(feature_mask)`` + + This function disables/enables some features. For example, it can be used to + disable mergeable buffers and TSO features, which both are enabled by + default. + + +Vhost Implementations +--------------------- + +Vhost-cuse implementation ~~~~~~~~~~~~~~~~~~~~~~~~~ + When vSwitch registers the vhost driver, it will register a cuse device driver into the system and creates a character device file. This cuse driver will -receive vhost open/release/IOCTL message from QEMU simulator. +receive vhost open/release/IOCTL messages from the QEMU simulator. -When the open call is received, vhost driver will create a vhost device for the -virtio device in the guest. +When the open call is received, the vhost driver will create a vhost device +for the virtio device in the guest. -When VHOST_SET_MEM_TABLE IOCTL is received, vhost searches the memory region -to find the starting user space virtual address that maps the memory of guest -virtual machine. Through this virtual address and the QEMU pid, vhost could -find the file QEMU uses to map the guest memory. Vhost maps this file into its -address space, in this way vhost could fully access the guest physical memory, -which means vhost could access the shared virtio ring and the guest physical -address specified in the entry of the ring. +When the ``VHOST_SET_MEM_TABLE`` ioctl is received, vhost searches the memory +region to find the starting user space virtual address that maps the memory of +the guest virtual machine. Through this virtual address and the QEMU pid, +vhost can find the file QEMU uses to map the guest memory. Vhost maps this +file into its address space, in this way vhost can fully access the guest +physical memory, which means vhost could access the shared virtio ring and the +guest physical address specified in the entry of the ring. The guest virtual machine tells the vhost whether the virtio device is ready -for processing or is de-activated through VHOST_NET_SET_BACKEND message. -The registered callback from vSwitch will be called. +for processing or is de-activated through the ``VHOST_NET_SET_BACKEND`` +message. The registered callback from vSwitch will be called. -When the release call is released, vhost will destroy the device. +When the release call is made, vhost will destroy the device. -Vhost user implementation +Vhost-user implementation ~~~~~~~~~~~~~~~~~~~~~~~~~ -When vSwitch registers a vhost driver, it will create a Unix domain socket server -into the system. This server will listen for a connection and process the vhost message from -QEMU simulator. -When there is a new socket connection, it means a new virtio device has been created in -the guest virtual machine, and the vhost driver will create a vhost device for this virtio device. +Vhost-user uses Unix domain sockets for passing messages. This means the DPDK +vhost-user implementation has two options: + +* DPDK vhost-user acts as the server. + + DPDK will create a Unix domain socket server file and listen for + connections from the frontend. + + Note, this is the default mode, and the only mode before DPDK v16.07. + + +* DPDK vhost-user acts as the client. + + Unlike the server mode, this mode doesn't create the socket file; + it just tries to connect to the server (which responses to create the + file instead). + + When the DPDK vhost-user application restarts, DPDK vhost-user will try to + connect to the server again. This is how the "reconnect" feature works. + + Note: the "reconnect" feature requires **QEMU v2.7** (or above). + +No matter which mode is used, once a connection is established, DPDK +vhost-user will start receiving and processing vhost messages from QEMU. + +For messages with a file descriptor, the file descriptor can be used directly +in the vhost process as it is already installed by the Unix domain socket. -For messages with a file descriptor, the file descriptor could be directly used in the vhost -process as it is already installed by Unix domain socket. +The supported vhost messages are: - * VHOST_SET_MEM_TABLE - * VHOST_SET_VRING_KICK - * VHOST_SET_VRING_CALL - * VHOST_SET_LOG_FD - * VHOST_SET_VRING_ERR +* ``VHOST_SET_MEM_TABLE`` +* ``VHOST_SET_VRING_KICK`` +* ``VHOST_SET_VRING_CALL`` +* ``VHOST_SET_LOG_FD`` +* ``VHOST_SET_VRING_ERR`` -For VHOST_SET_MEM_TABLE message, QEMU will send us information for each memory region and its -file descriptor in the ancillary data of the message. The fd is used to map that region. +For ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each +memory region and its file descriptor in the ancillary data of the message. +The file descriptor is used to map that region. -There is no VHOST_NET_SET_BACKEND message as in vhost cuse to signal us whether virtio device -is ready or should be stopped. -VHOST_SET_VRING_KICK is used as the signal to put the vhost device onto data plane. -VHOST_GET_VRING_BASE is used as the signal to remove vhost device from data plane. +There is no ``VHOST_NET_SET_BACKEND`` message as in vhost-cuse to signal +whether the virtio device is ready or stopped. Instead, +``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into +the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove +the vhost device from the data plane. When the socket connection is closed, vhost will destroy the device. Vhost supported vSwitch reference --------------------------------- -For more vhost details and how to support vhost in vSwitch, please refer to vhost example in the -DPDK Sample Applications Guide. +For more vhost details and how to support vhost in vSwitch, please refer to +the vhost example in the DPDK Sample Applications Guide. -- cgit 1.2.3-korg