diff options
Diffstat (limited to 'doc/guides/prog_guide/compressdev.rst')
-rw-r--r-- | doc/guides/prog_guide/compressdev.rst | 623 |
1 files changed, 623 insertions, 0 deletions
diff --git a/doc/guides/prog_guide/compressdev.rst b/doc/guides/prog_guide/compressdev.rst new file mode 100644 index 00000000..87e26490 --- /dev/null +++ b/doc/guides/prog_guide/compressdev.rst @@ -0,0 +1,623 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2017-2018 Cavium Networks. + +Compression Device Library +=========================== + +The compression framework provides a generic set of APIs to perform compression services +as well as to query and configure compression devices both physical(hardware) and virtual(software) +to perform those services. The framework currently only supports lossless compression schemes: +Deflate and LZS. + +Device Management +----------------- + +Device Creation +~~~~~~~~~~~~~~~ + +Physical compression devices are discovered during the bus probe of the EAL function +which is executed at DPDK initialization, based on their unique device identifier. +For eg. PCI devices can be identified using PCI BDF (bus/bridge, device, function). +Specific physical compression devices, like other physical devices in DPDK can be +white-listed or black-listed using the EAL command line options. + +Virtual devices can be created by two mechanisms, either using the EAL command +line options or from within the application using an EAL API directly. + +From the command line using the --vdev EAL option + +.. code-block:: console + + --vdev '<pmd name>,socket_id=0' + +.. Note:: + + * If DPDK application requires multiple software compression PMD devices then required + number of ``--vdev`` with appropriate libraries are to be added. + + * An Application with multiple compression device instances exposed by the same PMD must + specify a unique name for each device. + + Example: ``--vdev 'pmd0' --vdev 'pmd1'`` + +Or, by using the rte_vdev_init API within the application code. + +.. code-block:: c + + rte_vdev_init("<pmd_name>","socket_id=0") + +All virtual compression devices support the following initialization parameters: + +* ``socket_id`` - socket on which to allocate the device resources on. + +Device Identification +~~~~~~~~~~~~~~~~~~~~~ + +Each device, whether virtual or physical is uniquely designated by two +identifiers: + +- A unique device index used to designate the compression device in all functions + exported by the compressdev API. + +- A device name used to designate the compression device in console messages, for + administration or debugging purposes. + +Device Configuration +~~~~~~~~~~~~~~~~~~~~ + +The configuration of each compression device includes the following operations: + +- Allocation of resources, including hardware resources if a physical device. +- Resetting the device into a well-known default state. +- Initialization of statistics counters. + +The ``rte_compressdev_configure`` API is used to configure a compression device. + +The ``rte_compressdev_config`` structure is used to pass the configuration +parameters. + +See *DPDK API Reference* for details. + +Configuration of Queue Pairs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Each compression device queue pair is individually configured through the +``rte_compressdev_queue_pair_setup`` API. + +The ``max_inflight_ops`` is used to pass maximum number of +rte_comp_op that could be present in a queue at-a-time. +PMD then can allocate resources accordingly on a specified socket. + +See *DPDK API Reference* for details. + +Logical Cores, Memory and Queues Pair Relationships +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Library supports NUMA similarly as described in Cryptodev library section. + +A queue pair cannot be shared and should be exclusively used by a single processing +context for enqueuing operations or dequeuing operations on the same compression device +since sharing would require global locks and hinder performance. It is however possible +to use a different logical core to dequeue an operation on a queue pair from the logical +core on which it was enqueued. This means that a compression burst enqueue/dequeue +APIs are a logical place to transition from one logical core to another in a +data processing pipeline. + +Device Features and Capabilities +--------------------------------- + +Compression devices define their functionality through two mechanisms, global device +features and algorithm features. Global devices features identify device +wide level features which are applicable to the whole device such as supported hardware +acceleration and CPU features. List of compression device features can be seen in the +RTE_COMPDEV_FF_XXX macros. + +The algorithm features lists individual algo feature which device supports per-algorithm, +such as a stateful compression/decompression, checksums operation etc. List of algorithm +features can be seen in the RTE_COMP_FF_XXX macros. + +Capabilities +~~~~~~~~~~~~ +Each PMD has a list of capabilities, including algorithms listed in +enum ``rte_comp_algorithm`` and its associated feature flag and +sliding window range in log base 2 value. Sliding window tells +the minimum and maximum size of lookup window that algorithm uses +to find duplicates. + +See *DPDK API Reference* for details. + +Each Compression poll mode driver defines its array of capabilities +for each algorithm it supports. See PMD implementation for capability +initialization. + +Capabilities Discovery +~~~~~~~~~~~~~~~~~~~~~~ + +PMD capability and features are discovered via ``rte_compressdev_info_get`` function. + +The ``rte_compressdev_info`` structure contains all the relevant information for the device. + +See *DPDK API Reference* for details. + +Compression Operation +---------------------- + +DPDK compression supports two types of compression methodologies: + +- Stateless, data associated to a compression operation is compressed without any reference + to another compression operation. + +- Stateful, data in each compression operation is compressed with reference to previous compression + operations in the same data stream i.e. history of data is maintained between the operations. + +For more explanation, please refer RFC https://www.ietf.org/rfc/rfc1951.txt + +Operation Representation +~~~~~~~~~~~~~~~~~~~~~~~~ + +Compression operation is described via ``struct rte_comp_op``, which contains both input and +output data. The operation structure includes the operation type (stateless or stateful), +the operation status and the priv_xform/stream handle, source, destination and checksum buffer +pointers. It also contains the source mempool from which the operation is allocated. +PMD updates consumed field with amount of data read from source buffer and produced +field with amount of data of written into destination buffer along with status of +operation. See section *Produced, Consumed And Operation Status* for more details. + +Compression operations mempool also has an ability to allocate private memory with the +operation for application's purposes. Application software is responsible for specifying +all the operation specific fields in the ``rte_comp_op`` structure which are then used +by the compression PMD to process the requested operation. + + +Operation Management and Allocation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The compressdev library provides an API set for managing compression operations which +utilize the Mempool Library to allocate operation buffers. Therefore, it ensures +that the compression operation is interleaved optimally across the channels and +ranks for optimal processing. + +A ``rte_comp_op`` contains a field indicating the pool it originated from. + +``rte_comp_op_alloc()`` and ``rte_comp_op_bulk_alloc()`` are used to allocate +compression operations from a given compression operation mempool. +The operation gets reset before being returned to a user so that operation +is always in a good known state before use by the application. + +``rte_comp_op_free()`` is called by the application to return an operation to +its allocating pool. + +See *DPDK API Reference* for details. + +Passing source data as mbuf-chain +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If input data is scattered across several different buffers, then +Application can either parse through all such buffers and make one +mbuf-chain and enqueue it for processing or, alternatively, it can +make multiple sequential enqueue_burst() calls for each of them +processing them statefully. See *Compression API Stateful Operation* +for stateful processing of ops. + +Operation Status +~~~~~~~~~~~~~~~~ +Each operation carries a status information updated by PMD after it is processed. +following are currently supported status: + +- RTE_COMP_OP_STATUS_SUCCESS, + Operation is successfully completed + +- RTE_COMP_OP_STATUS_NOT_PROCESSED, + Operation has not yet been processed by the device + +- RTE_COMP_OP_STATUS_INVALID_ARGS, + Operation failed due to invalid arguments in request + +- RTE_COMP_OP_STATUS_ERROR, + Operation failed because of internal error + +- RTE_COMP_OP_STATUS_INVALID_STATE, + Operation is invoked in invalid state + +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED, + Output buffer ran out of space during processing. Error case, + PMD cannot continue from here. + +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE, + Output buffer ran out of space before operation completed, but this + is not an error case. Output data up to op.produced can be used and + next op in the stream should continue on from op.consumed+1. + +Produced, Consumed And Operation Status +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- If status is RTE_COMP_OP_STATUS_SUCCESS, + consumed = amount of data read from input buffer, and + produced = amount of data written in destination buffer +- If status is RTE_COMP_OP_STATUS_FAILURE, + consumed = produced = 0 or undefined +- If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED, + consumed = 0 and + produced = usually 0, but in decompression cases a PMD may return > 0 + i.e. amount of data successfully produced until out of space condition + hit. Application can consume output data in this case, if required. +- If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE, + consumed = amount of data read, and + produced = amount of data successfully produced until + out of space condition hit. PMD has ability to recover + from here, so application can submit next op from + consumed+1 and a destination buffer with available space. + +Transforms +---------- + +Compression transforms (``rte_comp_xform``) are the mechanism +to specify the details of the compression operation such as algorithm, +window size and checksum. + +Compression API Hash support +---------------------------- + +Compression API allows application to enable digest calculation +alongside compression and decompression of data. A PMD reflects its +support for hash algorithms via capability algo feature flags. +If supported, PMD calculates digest always on plaintext i.e. +before compression and after decompression. + +Currently supported list of hash algos are SHA-1 and SHA2 family +SHA256. + +See *DPDK API Reference* for details. + +If required, application should set valid hash algo in compress +or decompress xforms during ``rte_compressdev_stream_create()`` +or ``rte_compressdev_private_xform_create()`` and pass a valid +output buffer in ``rte_comp_op`` hash field struct to store the +resulting digest. Buffer passed should be contiguous and large +enough to store digest which is 20 bytes for SHA-1 and +32 bytes for SHA2-256. + +Compression API Stateless operation +------------------------------------ + +An op is processed stateless if it has +- op_type set to RTE_COMP_OP_STATELESS +- flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL +(required only on compression side), +- All required input in source buffer + +When all of the above conditions are met, PMD initiates stateless processing +and releases acquired resources after processing of current operation is +complete. Application can enqueue multiple stateless ops in a single burst +and must attach priv_xform handle to such ops. + +priv_xform in Stateless operation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +priv_xform is PMD internally managed private data that it maintains to do stateless processing. +priv_xforms are initialized provided a generic xform structure by an application via making call +to ``rte_comp_private_xform_create``, at an output PMD returns an opaque priv_xform reference. +If PMD support SHAREABLE priv_xform indicated via algorithm feature flag, then application can +attach same priv_xform with many stateless ops at-a-time. If not, then application needs to +create as many priv_xforms as it expects to have stateless operations in-flight. + +.. figure:: img/stateless-op.* + + Stateless Ops using Non-Shareable priv_xform + + +.. figure:: img/stateless-op-shared.* + + Stateless Ops using Shareable priv_xform + + +Application should call ``rte_compressdev_private_xform_create()`` and attach to stateless op before +enqueuing them for processing and free via ``rte_compressdev_private_xform_free()`` during termination. + +An example pseudocode to setup and process NUM_OPS stateless ops with each of length OP_LEN +using priv_xform would look like: + +.. code-block:: c + + /* + * pseudocode for stateless compression + */ + + uint8_t cdev_id = rte_compdev_get_dev_id(<pmd name>); + + /* configure the device. */ + if (rte_compressdev_configure(cdev_id, &conf) < 0) + rte_exit(EXIT_FAILURE, "Failed to configure compressdev %u", cdev_id); + + if (rte_compressdev_queue_pair_setup(cdev_id, 0, NUM_MAX_INFLIGHT_OPS, + socket_id()) < 0) + rte_exit(EXIT_FAILURE, "Failed to setup queue pair\n"); + + if (rte_compressdev_start(cdev_id) < 0) + rte_exit(EXIT_FAILURE, "Failed to start device\n"); + + /* setup compress transform */ + struct rte_compress_compress_xform compress_xform = { + .type = RTE_COMP_COMPRESS, + .compress = { + .algo = RTE_COMP_ALGO_DEFLATE, + .deflate = { + .huffman = RTE_COMP_HUFFMAN_DEFAULT + }, + .level = RTE_COMP_LEVEL_PMD_DEFAULT, + .chksum = RTE_COMP_CHECKSUM_NONE, + .window_size = DEFAULT_WINDOW_SIZE, + .hash_algo = RTE_COMP_HASH_ALGO_NONE + } + }; + + /* create priv_xform and initialize it for the compression device. */ + void *priv_xform = NULL; + rte_compressdev_info_get(cdev_id, &dev_info); + if(dev_info.capability->comps_feature_flag & RTE_COMP_FF_SHAREABLE_PRIV_XFORM) { + rte_comp_priv_xform_create(cdev_id, &compress_xform, &priv_xform); + } else { + shareable = 0; + } + + /* create operation pool via call to rte_comp_op_pool_create and alloc ops */ + rte_comp_op_bulk_alloc(op_pool, comp_ops, NUM_OPS); + + /* prepare ops for compression operations */ + for (i = 0; i < NUM_OPS; i++) { + struct rte_comp_op *op = comp_ops[i]; + if (!shareable) + rte_priv_xform_create(cdev_id, &compress_xform, &op->priv_xform) + else + op->priv_xform = priv_xform; + op->type = RTE_COMP_OP_STATELESS; + op->flush = RTE_COMP_FLUSH_FINAL; + + op->src.offset = 0; + op->dst.offset = 0; + op->src.length = OP_LEN; + op->input_chksum = 0; + setup op->m_src and op->m_dst; + } + num_enqd = rte_compressdev_enqueue_burst(cdev_id, 0, comp_ops, NUM_OPS); + /* wait for this to complete before enqueing next*/ + do { + num_deque = rte_compressdev_dequeue_burst(cdev_id, 0 , &processed_ops, NUM_OPS); + } while (num_dqud < num_enqd); + + +Stateless and OUT_OF_SPACE +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +OUT_OF_SPACE is a condition when output buffer runs out of space and where PMD +still has more data to produce. If PMD runs into such condition, then PMD returns +RTE_COMP_OP_OUT_OF_SPACE_TERMINATED error. In such case, PMD resets itself and can set +consumed=0 and produced=amount of output it could produce before hitting out_of_space. +Application would need to resubmit the whole input with a larger output buffer, if it +wants the operation to be completed. + +Hash in Stateless +~~~~~~~~~~~~~~~~~ +If hash is enabled, digest buffer will contain valid data after op is successfully +processed i.e. dequeued with status = RTE_COMP_OP_STATUS_SUCCESS. + +Checksum in Stateless +~~~~~~~~~~~~~~~~~~~~~ +If checksum is enabled, checksum will only be available after op is successfully +processed i.e. dequeued with status = RTE_COMP_OP_STATUS_SUCCESS. + +Compression API Stateful operation +----------------------------------- + +Compression API provide RTE_COMP_FF_STATEFUL_COMPRESSION and +RTE_COMP_FF_STATEFUL_DECOMPRESSION feature flag for PMD to reflect +its support for Stateful operations. + +A Stateful operation in DPDK compression means application invokes enqueue +burst() multiple times to process related chunk of data because +application broke data into several ops. + +In such case +- ops are setup with op_type RTE_COMP_OP_STATEFUL, +- all ops except last set to flush value = RTE_COMP_NO/SYNC_FLUSH +and last set to flush value RTE_COMP_FULL/FINAL_FLUSH. + +In case of either one or all of the above conditions, PMD initiates +stateful processing and releases acquired resources after processing +operation with flush value = RTE_COMP_FLUSH_FULL/FINAL is complete. +Unlike stateless, application can enqueue only one stateful op from +a particular stream at a time and must attach stream handle +to each op. + +Stream in Stateful operation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +`stream` in DPDK compression is a logical entity which identifies related set of ops, say, a one large +file broken into multiple chunks then file is represented by a stream and each chunk of that file is +represented by compression op `rte_comp_op`. Whenever application wants a stateful processing of such +data, then it must get a stream handle via making call to ``rte_comp_stream_create()`` +with xform, at an output the target PMD will return an opaque stream handle to application which +it must attach to all of the ops carrying data of that stream. In stateful processing, every op +requires previous op data for compression/decompression. A PMD allocates and set up resources such +as history, states, etc. within a stream, which are maintained during the processing of the related ops. + +Unlike priv_xforms, stream is always a NON_SHAREABLE entity. One stream handle must be attached to only +one set of related ops and cannot be reused until all of them are processed with status Success or failure. + +.. figure:: img/stateful-op.* + + Stateful Ops + + +Application should call ``rte_comp_stream_create()`` and attach to op before +enqueuing them for processing and free via ``rte_comp_stream_free()`` during +termination. All ops that are to be processed statefully should carry *same* stream. + +See *DPDK API Reference* document for details. + +An example pseudocode to set up and process a stream having NUM_CHUNKS with each chunk size of CHUNK_LEN would look like: + +.. code-block:: c + + /* + * pseudocode for stateful compression + */ + + uint8_t cdev_id = rte_compdev_get_dev_id(<pmd name>); + + /* configure the device. */ + if (rte_compressdev_configure(cdev_id, &conf) < 0) + rte_exit(EXIT_FAILURE, "Failed to configure compressdev %u", cdev_id); + + if (rte_compressdev_queue_pair_setup(cdev_id, 0, NUM_MAX_INFLIGHT_OPS, + socket_id()) < 0) + rte_exit(EXIT_FAILURE, "Failed to setup queue pair\n"); + + if (rte_compressdev_start(cdev_id) < 0) + rte_exit(EXIT_FAILURE, "Failed to start device\n"); + + /* setup compress transform. */ + struct rte_compress_compress_xform compress_xform = { + .type = RTE_COMP_COMPRESS, + .compress = { + .algo = RTE_COMP_ALGO_DEFLATE, + .deflate = { + .huffman = RTE_COMP_HUFFMAN_DEFAULT + }, + .level = RTE_COMP_LEVEL_PMD_DEFAULT, + .chksum = RTE_COMP_CHECKSUM_NONE, + .window_size = DEFAULT_WINDOW_SIZE, + .hash_algo = RTE_COMP_HASH_ALGO_NONE + } + }; + + /* create stream */ + rte_comp_stream_create(cdev_id, &compress_xform, &stream); + + /* create an op pool and allocate ops */ + rte_comp_op_bulk_alloc(op_pool, comp_ops, NUM_CHUNKS); + + /* Prepare source and destination mbufs for compression operations */ + unsigned int i; + for (i = 0; i < NUM_CHUNKS; i++) { + if (rte_pktmbuf_append(mbufs[i], CHUNK_LEN) == NULL) + rte_exit(EXIT_FAILURE, "Not enough room in the mbuf\n"); + comp_ops[i]->m_src = mbufs[i]; + if (rte_pktmbuf_append(dst_mbufs[i], CHUNK_LEN) == NULL) + rte_exit(EXIT_FAILURE, "Not enough room in the mbuf\n"); + comp_ops[i]->m_dst = dst_mbufs[i]; + } + + /* Set up the compress operations. */ + for (i = 0; i < NUM_CHUNKS; i++) { + struct rte_comp_op *op = comp_ops[i]; + op->stream = stream; + op->m_src = src_buf[i]; + op->m_dst = dst_buf[i]; + op->type = RTE_COMP_OP_STATEFUL; + if(i == NUM_CHUNKS-1) { + /* set to final, if last chunk*/ + op->flush = RTE_COMP_FLUSH_FINAL; + } else { + /* set to NONE, for all intermediary ops */ + op->flush = RTE_COMP_FLUSH_NONE; + } + op->src.offset = 0; + op->dst.offset = 0; + op->src.length = CHUNK_LEN; + op->input_chksum = 0; + num_enqd = rte_compressdev_enqueue_burst(cdev_id, 0, &op[i], 1); + /* wait for this to complete before enqueing next*/ + do { + num_deqd = rte_compressdev_dequeue_burst(cdev_id, 0 , &processed_ops, 1); + } while (num_deqd < num_enqd); + /* push next op*/ + } + + +Stateful and OUT_OF_SPACE +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If PMD supports stateful operation, then OUT_OF_SPACE status is not an actual +error for the PMD. In such case, PMD returns with status +RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE with consumed = number of input bytes +read and produced = length of complete output buffer. +Application should enqueue next op with source starting at consumed+1 and an +output buffer with available space. + +Hash in Stateful +~~~~~~~~~~~~~~~~ +If enabled, digest buffer will contain valid digest after last op in stream +(having flush = RTE_COMP_OP_FLUSH_FINAL) is successfully processed i.e. dequeued +with status = RTE_COMP_OP_STATUS_SUCCESS. + +Checksum in Stateful +~~~~~~~~~~~~~~~~~~~~ +If enabled, checksum will only be available after last op in stream +(having flush = RTE_COMP_OP_FLUSH_FINAL) is successfully processed i.e. dequeued +with status = RTE_COMP_OP_STATUS_SUCCESS. + +Burst in compression API +------------------------- + +Scheduling of compression operations on DPDK's application data path is +performed using a burst oriented asynchronous API set. A queue pair on a compression +device accepts a burst of compression operations using enqueue burst API. On physical +devices the enqueue burst API will place the operations to be processed +on the device's hardware input queue, for virtual devices the processing of the +operations is usually completed during the enqueue call to the compression +device. The dequeue burst API will retrieve any processed operations available +from the queue pair on the compression device, from physical devices this is usually +directly from the devices processed queue, and for virtual device's from a +``rte_ring`` where processed operations are place after being processed on the +enqueue call. + +A burst in DPDK compression can be a combination of stateless and stateful operations with a condition +that for stateful ops only one op at-a-time should be enqueued from a particular stream i.e. no-two ops +should belong to same stream in a single burst. However a burst may contain multiple stateful ops as long +as each op is attached to a different stream i.e. a burst can look like: + ++---------------+--------------+--------------+-----------------+--------------+--------------+ +| enqueue_burst | op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush | ++---------------+--------------+--------------+-----------------+--------------+--------------+ + +Where, op1 .. op5 all belong to different independent data units. op1, op2, op4, op5 must be stateful +as stateless ops can only use flush full or final and op3 can be of type stateless or stateful. +Every op with type set to RTE_COMP_OP_TYPE_STATELESS must be attached to priv_xform and +Every op with type set to RTE_COMP_OP_TYPE_STATEFUL *must* be attached to stream. + +Since each operation in a burst is independent and thus can be completed +out-of-order, applications which need ordering, should setup per-op user data +area with reordering information so that it can determine enqueue order at +dequeue. + +Also if multiple threads calls enqueue_burst() on same queue pair then it’s +application onus to use proper locking mechanism to ensure exclusive enqueuing +of operations. + +Enqueue / Dequeue Burst APIs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The burst enqueue API uses a compression device identifier and a queue pair +identifier to specify the compression device queue pair to schedule the processing on. +The ``nb_ops`` parameter is the number of operations to process which are +supplied in the ``ops`` array of ``rte_comp_op`` structures. +The enqueue function returns the number of operations it actually enqueued for +processing, a return value equal to ``nb_ops`` means that all packets have been +enqueued. + +The dequeue API uses the same format as the enqueue API but +the ``nb_ops`` and ``ops`` parameters are now used to specify the max processed +operations the user wishes to retrieve and the location in which to store them. +The API call returns the actual number of processed operations returned, this +can never be larger than ``nb_ops``. + +Sample code +----------- + +There are unit test applications that show how to use the compressdev library inside +test/test/test_compressdev.c + +Compression Device API +~~~~~~~~~~~~~~~~~~~~~~ + +The compressdev Library API is described in the *DPDK API Reference* document. |