From b585097048884e063ac25aecc26a6802ee3faa4d Mon Sep 17 00:00:00 2001
From: Chris Luke
Date: Tue, 3 May 2016 16:34:59 -0400
Subject: VPP-57 Add Doxygen to VPP

- Configures Doxygen.
- Adds a source filter to do magic on our use of the preprocessor
  for constructor stuff, to make Doxygen grok it better.
- Adds a convenience helper to the root Makefile.
- Adds a README.md to the root directory (which Doxygen uses as its
  "mainpage").
- Adds several other documentation files.
- Currently using SVG for call graphs, though this may have a load-time
  performance impact in browsers.

Change-Id: I25fc6fb5bf634319dcb36a7f0e32031921c125ac
Signed-off-by: Chris Luke
---
 README.md | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 00000000..2a71d27e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,94 @@
+Vector Packet Processing
+========================
+
+## Introduction.
+
+The VPP platform is an extensible framework that provides out-of-the-box
+production quality switch/router functionality. It is the open source version
+of Cisco's Vector Packet Processing (VPP) technology: a high performance,
+packet-processing stack that can run on commodity CPUs.
+
+The benefits of this implementation of VPP are its high performance, proven
+technology, its modularity and flexibility, and its rich feature set.
+
+For more information on VPP and its features please visit the
+[FD.io website](http://fd.io/) and
+[What is VPP?](https://wiki.fd.io/view/VPP/What_is_VPP%3F) pages.
+
+
+## Directory layout.
+
+Directory name | Description
+---------------------- | -------------------------------------------
+ build-data | Build metadata
+ build-root | Build output directory
+ doxygen | Documentation generator configuration
+ dpdk | DPDK patches and build infrastructure
+ g2 | Event log visualization tool
+ gmod | perf related?
+ perftool | Performance tool
+ sample-plugin | A sample plugin
+@ref svm | Shared virtual memory allocation library
+ test | Unit tests
+@ref vlib | VPP application library source
+@ref vlib-api | VPP API library source
+@ref vnet | VPP networking source
+@ref vpp | VPP application source
+@ref vpp-api | VPP application API source
+ vppapigen | VPP API generator source
+ vpp-api-test | VPP API test program source
+@ref vppinfra | VPP core library source
+
+(If the page you are viewing is not generated by Doxygen then
+ignore any @@ref labels in the above table.)
+
+
+## Getting started.
+
+In general anyone interested in building, developing or running VPP should
+consult the [VPP wiki](https://wiki.fd.io/view/VPP) for more complete
+documentation.
+
+In particular, readers are recommended to take a look at [Pulling, Building,
+Running, Hacking, Pushing](https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code)
+which provides extensive step-by-step coverage of the topic.
+
+For the impatient, some salient information is distilled below.
+
+
+### Quick-start: On an existing Linux host.
+
+To install system dependencies, build VPP and then install it, simply run the
+build script. 
This should be performed by a non-privileged user with `sudo`
+access from the project base directory:
+
+    ./build-root/vagrant/build.sh
+
+If you want a more fine-grained approach because you intend to do some
+development work, the `Makefile` in the root directory of the source tree
+provides several convenience shortcuts as `make` targets that may be of
+interest. To see the available targets run:
+
+    make
+
+
+### Quick-start: Vagrant.
+
+The directory `build-root/vagrant` contains a `VagrantFile` and supporting
+scripts to bootstrap a working VPP inside a Vagrant-managed Virtual Machine.
+This VM can then be used to test concepts with VPP or as a development
+platform to extend VPP. Some obvious caveats apply when using a VM for VPP
+since its performance will never match that of bare metal; if your work is
+timing or performance sensitive, consider using bare metal in addition to,
+or instead of, the VM.
+
+For this to work you will need a working installation of Vagrant. Instructions
+for this can be found
+[on the Setting up Vagrant wiki page](https://wiki.fd.io/view/DEV/Setting_Up_Vagrant).
+
+
+## More information.
+
+Visit the [VPP wiki](https://wiki.fd.io/view/VPP) for details on more
+advanced building strategies and development notes.

From b52c40abf5c50cedfd80964c093bce779815d280 Mon Sep 17 00:00:00 2001
From: Chris Luke
Date: Tue, 26 Jul 2016 12:02:46 -0400
Subject: VPP-165 Update directory list in README

Change-Id: I7bc983860d6cc40f264e509e74b92978667324bc
Signed-off-by: Chris Luke
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2a71d27e..ccc832c4 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ Directory name | Description
  g2 | Event log visualization tool
  gmod | perf related?
  perftool | Performance tool
- sample-plugin | A sample plugin
+ plugins | VPP bundled plugins directory
 @ref svm | Shared virtual memory allocation library
  test | Unit tests
 @ref vlib | VPP application library source

From 5ccbaa97a86715bfee2ab44523075a93a8abb698 Mon Sep 17 00:00:00 2001
From: Chris Luke
Date: Wed, 14 Sep 2016 12:42:54 -0400
Subject: Add structure to some of the documentation; VPP-223

Moves the random .md files, when rendered by Doxygen, into a config
examples tree. We may later flesh this out into a more complete user
documentation section.
Change-Id: If423b82f1047f1c84f90876a786313054b5f7c77 Signed-off-by: Chris Luke --- README.md | 5 +- doxygen/Makefile | 6 +- doxygen/user_doc.md | 10 + plugins/ioam-plugin/ioam/Readme.md | 465 ---------------------------- plugins/ioam-plugin/ioam/ioam_plugin_doc.md | 464 +++++++++++++++++++++++++++ plugins/lb-plugin/README.md | 141 --------- plugins/lb-plugin/lb_plugin_doc.md | 141 +++++++++ vnet/vnet/ipsec-gre/ipsec_gre_doc.md | 2 +- vnet/vnet/lldp/lldp.md | 84 ----- vnet/vnet/lldp/lldp_doc.md | 84 +++++ vnet/vnet/map/map_doc.md | 2 +- 11 files changed, 710 insertions(+), 694 deletions(-) create mode 100644 doxygen/user_doc.md delete mode 100644 plugins/ioam-plugin/ioam/Readme.md create mode 100644 plugins/ioam-plugin/ioam/ioam_plugin_doc.md delete mode 100644 plugins/lb-plugin/README.md create mode 100644 plugins/lb-plugin/lb_plugin_doc.md delete mode 100644 vnet/vnet/lldp/lldp.md create mode 100644 vnet/vnet/lldp/lldp_doc.md (limited to 'README.md') diff --git a/README.md b/README.md index ccc832c4..bf26c9de 100644 --- a/README.md +++ b/README.md @@ -27,7 +27,7 @@ Directory name | Description g2 | Event log visualization tool gmod | perf related? perftool | Performance tool - plugins | VPP bundled plugins directory +@ref plugins | VPP bundled plugins directory @ref svm | Shared virtual memory allocation library test | Unit tests @ref vlib | VPP application library source @@ -90,5 +90,8 @@ for this can be found [on the Setting up Vagrant wiki page] ## More information. +Several modules provide documentation, see @subpage user_doc for more +information. + Visit the [VPP wiki](https://wiki.fd.io/view/VPP) for details on more advanced building strategies and development notes. diff --git a/doxygen/Makefile b/doxygen/Makefile index 954191eb..0a69c2df 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -40,10 +40,14 @@ DOXY_SRC_DIRECTORIES = \ # Input directories and files DOXY_INPUT ?= \ - README.md \ + $(wildcard $(WS_ROOT)/*.md) \ + $(wildcard $(DOXY_DIR)/*.md) \ $(DOXY_SRC_DIRECTORIES) \ plugins +# Strip leading workspace path from input names +DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) + # Files to exclude, from pre-Doxygen steps, eg because they're # selectively compiled. # Examples would be to exclude non-DPDK related sources when diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md new file mode 100644 index 00000000..b08ac86a --- /dev/null +++ b/doxygen/user_doc.md @@ -0,0 +1,10 @@ +User Configuration Examples {#user_doc} +=========================== + +Several modules provide operational documentation. + +- @subpage ipsec_gre_doc +- @subpage map_doc +- @subpage lldp_doc +- @subpage ioam_plugin_doc +- @subpage lb_plugin_doc diff --git a/plugins/ioam-plugin/ioam/Readme.md b/plugins/ioam-plugin/ioam/Readme.md deleted file mode 100644 index 469a15a1..00000000 --- a/plugins/ioam-plugin/ioam/Readme.md +++ /dev/null @@ -1,465 +0,0 @@ - -## VPP Inband OAM (iOAM) - -In-band OAM (iOAM) is an implementation study to record operational -information in the packet while the packet traverses a path between -two points in the network. - -Overview of iOAM can be found in [iOAM-Devnet] page. 
-The following IETF drafts detail the motivation and mechanism for -recording operational information: - - [iOAM-ietf-requirements] - Describes motivation and usecases for iOAM - - [iOAM-ietf-data] - Describes data records that can be collected using iOAM - - [iOAM-ietf-transport] - Lists out the transport protocols - and mechanism to carry iOAM data records - - [iOAM-ietf-proof-of-transit] - Describes the idea of Proof of Transit (POT) - and mechanisms to operationalize the idea - -## Terminology -In-band OAM is expected to be deployed in a specific domain rather -than on the overall Internet. The part of the network which employs in-band OAM -is referred to as **"in-band OAM-domain"**. - -In-band OAM data is added to a packet on entering the in-band OAM-domain -and is removed from the packet when exiting the domain. -Within the in-band OAM-domain, network nodes that the packet traverses -may update the in-band OAM data records. - -- The node which adds in-band OAM data to the packet is called the -**"in-band OAM encapsulating node"**. - -- The node which removes the in-band OAM data is referred to as the -**"in-band OAM decapsulating node"**. - -- Nodes within the domain which are aware of in-band OAM data and read -and/or write or process the in-band OAM data are called -**"in-band OAM transit nodes"**. - -## Features supported in the current release -VPP can function as in-band OAM encapsulating, transit and decapsulating node. -In this version of VPP in-band OAM data is transported as options in an -IPv6 hop-by-hop extension header. Hence in-band OAM can be enabled -for IPv6 traffic. - -The following iOAM features are supported: - -- **In-band OAM Tracing** : In-band OAM supports multiple data records to be -recorded in the packet as the packet traverses the network. -These data records offer insights into the operational behavior of the network. -The following information can be collected in the tracing -data from the nodes a packet traverses: - - Node ID - - Ingress interface ID - - Egress interface ID - - Timestamp - - Pre-configured application data - -- **In-band OAM Proof of Transit (POT)**: Proof of transit iOAM data is -added to every packet for verifying that a packet traverses a specific -set of nodes. -In-band OAM data is updated at every node that is enabled with iOAM -proof of transit and is used to verify whether a packet traversed -all the specified nodes. When the verifier receives each packet, -it can validate whether the packet traversed the specified nodes. - - -## Configuration -Configuring iOAM involves: -- Selecting the packets for which iOAM data must be inserted, updated or removed - - Selection of packets for iOAM data insertion on iOAM encapsulating node. - Selection of packets is done by 5-tuple based classification - - Selection of packets for updating iOAM data is implicitly done on the - presence of iOAM options in the packet - - Selection of packets for removing the iOAM data is done on 5-tuple - based classification -- The kind of data to be collected - - Tracing data - - Proof of transit -- Additional details for processing iOAM data to be collected - - For trace data - trace type, number of nodes to be recorded in the trace, - time stamp precision, etc. - - For POT data - configuration of POT profile required to process the POT data - -The CLI for configuring iOAM is explained here followed by detailed steps -and examples to deploy iOAM on VPP as an encapsulating, transit or -decapsulating iOAM node in the subsequent sub-sections. 
- -VPP iOAM configuration for enabling trace and POT is as follows: - - set ioam rewrite trace-type <0x1f|0x7|0x9|0x11|0x19> - trace-elts trace-tsp <0|1|2|3> - node-id app-data [pot] - -A description of each of the options of the CLI follows: -- trace-type : An entry in the "Node data List" array of the trace option -can have different formats, following the needs of the a deployment. -For example: Some deployments might only be interested -in recording the node identifiers, whereas others might be interested -in recording node identifier and timestamp. -The following types are currently supported: - - 0x1f : Node data to include hop limit (8 bits), node ID (24 bits), - ingress and egress interface IDs (16 bits each), timestamp (32 bits), - application data (32 bits) - - 0x7 : Node data to include hop limit (8 bits), node ID (24 bits), - ingress and egress interface IDs (16 bits each) - - 0x9 : Node data to include hop limit (8 bits), node ID (24 bits), - timestamp (32 bits) - - 0x11: Node data to include hop limit (8 bits), node ID (24 bits), - application data (32 bits) - - 0x19: Node data to include hop limit (8 bits), node ID (24 bits), - timestamp (32 bits), application data (32 bits) -- trace-elts : Defines the length of the node data array in the trace option. -- trace-tsp : Defines the timestamp precision to use with the enumerated value - for precision as follows: - - 0 : 32bits timestamp in seconds - - 1 : 32bits timestamp in milliseconds - - 2 : 32bits timestamp in microseconds - - 3 : 32bits timestamp in nanoseconds -- node-id : Unique identifier for the node, included in the node ID - field of the node data in trace option. -- app-data : The value configured here is included as is in -application data field of node data in trace option. -- pot : Enables POT option to be included in the iOAM options. - -### Trace configuration - -#### On in-band OAM encapsulating node - - **Configure classifier and apply ACL** to select packets for - iOAM data insertion - - Example to enable iOAM data insertion for all the packets - towards IPv6 address db06::06: - - vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst - - vpp# classify session acl-hit-next node ip6-add-hop-by-hop - table-index 0 match l3 ip6 dst db06::06 - - vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0 - - - **Enable tracing** : Specify node ID, maximum number of nodes for which - trace data should be recorded, type of data to be included for recording, - optionally application data to be included - - Example to enable tracing with a maximum of 4 nodes recorded - and the data to be recorded to include - hop limit, node id, - ingress and egress interface IDs, timestamp (millisecond precision), - application data (0x1234): - - - vpp# set ioam rewrite trace-type 0x1f trace-elts 4 trace-tsp 1 - node-id 0x1 app-data 0x1234 - - - -#### On in-band OAM transit node -- The transit node requires trace type, timestamp precision, node ID and -optionally application data to be configured, -to update its node data in the trace option. - -Example: - - vpp# set ioam rewrite trace-type 0x1f trace-elts 4 trace-tsp 1 - node-id 0x2 app-data 0x1234 - -#### On the In-band OAM decapsulating node -- The decapsulating node similar to encapsulating node requires -**classification** of the packets to remove iOAM data from. 
- - Example to decapsulate iOAM data for packets towards - db06::06, configure classifier and enable it as an ACL as follows: - - - vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst - - vpp# classify session acl-hit-next node ip6-lookup table-index 0 - match l3 ip6 dst db06::06 opaque-index 100 - - vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0 - - -- Decapsulating node requires trace type, timestamp precision, -node ID and optionally application data to be configured, -to update its node data in the trace option before it is decapsulated. - -Example: - - vpp# set ioam rewrite trace-type 0x1f trace-elts 4 - trace-tsp 1 node-id 0x3 app-data 0x1234 - - -### Proof of Transit configuration - -For details on proof-of-transit, -see the IETF draft [iOAM-ietf-proof-of-transit]. -To enable Proof of Transit all the nodes that participate -and hence are verified for transit need a proof of transit profile. -A script to generate a proof of transit profile as per the mechanism -described in [iOAM-ietf-proof-of-transit] will be available at [iOAM-Devnet]. - -The Proof of transit mechanism implemented here is based on -Shamir's Secret Sharing algorithm. -The overall algorithm uses two polynomials -POLY-1 and POLY-2. The degree of polynomials depends on number of nodes -to be verified for transit. -POLY-1 is secret and constant. Each node gets a point on POLY-1 -at setup-time and keeps it secret. -POLY-2 is public, random and per packet. -Each node is assigned a point on POLY-1 and POLY-2 with the same x index. -Each node derives its point on POLY-2 each time a packet arrives at it. -A node then contributes its points on POLY-1 and POLY-2 to construct -POLY-3 (POLY-3 = POLY-1 + POLY-2) using lagrange extrapolation and -forwards it towards the verifier by updating POT data in the packet. -The verifier constructs POLY-3 from the accumulated value from all the nodes -and its own points on POLY-1 and POLY-2 and verifies whether -POLY-3 = POLY-1 + POLY-2. Only the verifier knows POLY-1. -The solution leverages finite field arithmetic in a field of size "prime number" -for reasons explained in description of Shamir's secret sharing algorithm. - -Here is an explanation of POT profile list and profile configuration CLI to -realize the above mechanism. -It is best to use the script provided at [iOAM-Devnet] to generate -this configuration. -- **Create POT profile** : set pot profile name id [0-1] -[validator-key 0xu64] prime-number 0xu64 secret_share 0xu64 -lpc 0xu64 polynomial2 0xu64 bits-in-random [0-64] - - name : Profile list name. - - id : Profile id, it can be 0 or 1. - A maximum of two profiles can be configured per profile list. - - validator-key : Secret key configured only on the - verifier/decapsulating node used to compare and verify proof of transit. - - prime-number : Prime number for finite field arithmetic as required by the - proof of transit mechanism. - - secret_share : Unique point for each node on the secret polynomial POLY-1. - - lpc : Lagrange Polynomial Constant(LPC) calculated per node based on - its point (x value used for evaluating the points on the polynomial) - on the polynomial used in lagrange extrapolation - for reconstructing polynomial (POLY-3). - - polynomial2 : Is the pre-evaluated value of the point on - 2nd polynomial(POLY-2). This is unique for each node. - It is pre-evaluated for all the coefficients of POLY-2 except - for the constant part of the polynomial that changes per packet - and is received as part of the POT data in the packet. 
- - bits-in-random : To control the size of the random number to be - generated. This number has to match the other numbers generated and used - in the profile as per the algorithm. - -- **Set a configured profile as active/in-use** : -set pot profile-active name ID [0-1] - - name : Name of the profile list to be used for computing - POT data per packet. - - ID : Identifier of the profile within the list to be used. - -#### On In-band OAM encapsulating node - - Configure the classifier and apply ACL to select packets for iOAM data insertion. - - Example to enable iOAM data insertion for all the packet towards - IPv6 address db06::06 - - - - vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst - - vpp# classify session acl-hit-next node - ip6-add-hop-by-hop table-index 0 match l3 ip6 dst db06::06 - - vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0 - - - - Configure the proof of transit profile list with profiles. -Each profile list referred to by a name can contain 2 profiles, -only one is in use for updating proof of transit data at any time. - - Example profile list example with a profile generated from the - script to verify transit through 3 nodes is: - - - vpp# set pot profile name example id 0 prime-number 0x7fff0000fa884685 - secret_share 0x6c22eff0f45ec56d lpc 0x7fff0000fa884682 - polynomial2 0xffb543d4a9c bits-in-random 63 - - - Enable one of the profiles from the configured profile list as active - so that is will be used for calculating proof of transit - -Example enable profile ID 0 from profile list example configured above: - - - vpp# set pot profile-active name example ID 0 - - - - Enable POT option to be inserted - - - vpp# set ioam rewrite pot - - -#### On in-band OAM transit node - - Configure the proof of transit profile list with profiles for transit node. -Example: - - - vpp# set pot profile name example id 0 prime-number 0x7fff0000fa884685 - secret_share 0x564cdbdec4eb625d lpc 0x1 - polynomial2 0x23f3a227186a bits-in-random 63 - -#### On in-band OAM decapsulating node / verifier -- The decapsulating node, similar to the encapsulating node requires -classification of the packets to remove iOAM data from. - - Example to decapsulate iOAM data for packets towards db06::06 - configure classifier and enable it as an ACL as follows: - - - vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst - - vpp# classify session acl-hit-next node ip6-lookup table-index 0 - match l3 ip6 dst db06::06 opaque-index 100 - - vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0 - -- To update and verify the proof of transit, POT profile list should be configured. 
- - Example POT profile list configured as follows: - - vpp# set pot profile name example id 0 validate-key 0x7fff0000fa88465d - prime-number 0x7fff0000fa884685 secret_share 0x7a08fbfc5b93116d lpc 0x3 - polynomial2 0x3ff738597ce bits-in-random 63 - -## Operational data - -Following CLIs are available to check iOAM operation: -- To check iOAM configuration that are effective use "show ioam summary" - -Example: - - vpp# show ioam summary - REWRITE FLOW CONFIGS - Not configured - HOP BY HOP OPTIONS - TRACE CONFIG - - Trace Type : 0x1f (31) - Trace timestamp precision : 1 (Milliseconds) - Num of trace nodes : 4 - Node-id : 0x2 (2) - App Data : 0x1234 (4660) - POT OPTION - 1 (Enabled) - Try 'show ioam pot and show pot profile' for more information - -- To find statistics about packets for which iOAM options were -added (encapsulating node) and removed (decapsulating node) execute -*show errors* - -Example on encapsulating node: - - - vpp# show error - Count Node Reason - 1208804706 ip6-inacl input ACL hits - 1208804706 ip6-add-hop-by-hop Pkts w/ added ip6 hop-by-hop options - -Example on decapsulating node: - - vpp# show error - Count Node Reason - 69508569 ip6-inacl input ACL hits - 69508569 ip6-pop-hop-by-hop Pkts w/ removed ip6 hop-by-hop options - -- To check the POT profiles use "show pot profile" - -Example: - - vpp# show pot profile - Profile list in use : example - POT Profile at index: 0 - ID : 0 - Validator : False (0) - Secret share : 0x564cdbdec4eb625d (6218586935324795485) - Prime number : 0x7fff0000fa884685 (9223090566081300101) - 2nd polynomial(eval) : 0x23f3a227186a (39529304496234) - LPC : 0x1 (1) - Bit mask : 0x7fffffffffffffff (9223372036854775807) - Profile index in use: 0 - Pkts passed : 0x36 (54) - -- To get statistics of POT for packets use "show ioam pot" - -Example at encapsulating or transit node: - - vpp# show ioam pot - Pkts with ip6 hop-by-hop POT options - 54 - Pkts with ip6 hop-by-hop POT options but no profile set - 0 - Pkts with POT in Policy - 0 - Pkts with POT out of Policy - 0 - - -Example at decapsulating/verification node: - - - vpp# show ioam pot - Pkts with ip6 hop-by-hop POT options - 54 - Pkts with ip6 hop-by-hop POT options but no profile set - 0 - Pkts with POT in Policy - 54 - Pkts with POT out of Policy - 0 - -- Tracing - enable trace of IPv6 packets to view the data inserted and -collected. 
- -Example when the nodes are receiving data over a DPDK interface: -Enable tracing using "trace add dpdk-input 20" and -execute "show trace" to view the iOAM data collected: - - - vpp# trace add dpdk-input 20 - - vpp# show trace - - ------------------- Start of thread 0 vpp_main ------------------- - - Packet 1 - - 00:00:19:294697: dpdk-input - GigabitEthernetb/0/0 rx queue 0 - buffer 0x10e6b: current data 0, length 214, free-list 0, totlen-nifb 0, trace 0x0 - PKT MBUF: port 0, nb_segs 1, pkt_len 214 - buf_len 2176, data_len 214, ol_flags 0x0, data_off 128, phys_addr 0xe9a35a00 - packet_type 0x0 - IP6: 00:50:56:9c:df:72 -> 00:50:56:9c:be:55 - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - tos 0x00, flow label 0x0, hop limit 63, payload length 160 - 00:00:19:294737: ethernet-input - IP6: 00:50:56:9c:df:72 -> 00:50:56:9c:be:55 - 00:00:19:294753: ip6-input - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - tos 0x00, flow label 0x0, hop limit 63, payload length 160 - 00:00:19:294757: ip6-lookup - fib 0 adj-idx 15 : indirect via db05::2 flow hash: 0x00000000 - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - tos 0x00, flow label 0x0, hop limit 63, payload length 160 - 00:00:19:294802: ip6-hop-by-hop - IP6_HOP_BY_HOP: next index 5 len 96 traced 96 Trace Type 0x1f , 1 elts left - [0] ttl 0x0 node ID 0x0 ingress 0x0 egress 0x0 ts 0x0 - app 0x0 - [1] ttl 0x3e node ID 0x3 ingress 0x1 egress 0x2 ts 0xb68c2213 - app 0x1234 - [2] ttl 0x3f node ID 0x2 ingress 0x1 egress 0x2 ts 0xb68c2204 - app 0x1234 - [3] ttl 0x40 node ID 0x1 ingress 0x5 egress 0x6 ts 0xb68c2200 - app 0x1234 - POT opt present - random = 0x577a916946071950, Cumulative = 0x10b46e78a35a392d, Index = 0x0 - 00:00:19:294810: ip6-rewrite - tx_sw_if_index 1 adj-idx 14 : GigabitEthernetb/0/0 - IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 flow hash: 0x00000000 - IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - tos 0x00, flow label 0x0, hop limit 62, payload length 160 - 00:00:19:294814: GigabitEthernetb/0/0-output - GigabitEthernetb/0/0 - IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - tos 0x00, flow label 0x0, hop limit 62, payload length 160 - 00:00:19:294820: GigabitEthernetb/0/0-tx - GigabitEthernetb/0/0 tx queue 0 - buffer 0x10e6b: current data 0, length 214, free-list 0, totlen-nifb 0, trace 0x0 - IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 - - IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 - - tos 0x00, flow label 0x0, hop limit 62, payload length 160 - - -[iOAM-Devnet]: -[iOAM-ietf-requirements]: -[iOAM-ietf-transport]: -[iOAM-ietf-data]: -[iOAM-ietf-proof-of-transit]: diff --git a/plugins/ioam-plugin/ioam/ioam_plugin_doc.md b/plugins/ioam-plugin/ioam/ioam_plugin_doc.md new file mode 100644 index 00000000..343abcf7 --- /dev/null +++ b/plugins/ioam-plugin/ioam/ioam_plugin_doc.md @@ -0,0 +1,464 @@ +## VPP Inband OAM (iOAM) {#ioam_plugin_doc} + +In-band OAM (iOAM) is an implementation study to record operational +information in the packet while the packet traverses a path between +two points in the network. + +Overview of iOAM can be found in [iOAM-Devnet] page. 
+The following IETF drafts detail the motivation and mechanism for +recording operational information: + - [iOAM-ietf-requirements] - Describes motivation and usecases for iOAM + - [iOAM-ietf-data] - Describes data records that can be collected using iOAM + - [iOAM-ietf-transport] - Lists out the transport protocols + and mechanism to carry iOAM data records + - [iOAM-ietf-proof-of-transit] - Describes the idea of Proof of Transit (POT) + and mechanisms to operationalize the idea + +## Terminology +In-band OAM is expected to be deployed in a specific domain rather +than on the overall Internet. The part of the network which employs in-band OAM +is referred to as **"in-band OAM-domain"**. + +In-band OAM data is added to a packet on entering the in-band OAM-domain +and is removed from the packet when exiting the domain. +Within the in-band OAM-domain, network nodes that the packet traverses +may update the in-band OAM data records. + +- The node which adds in-band OAM data to the packet is called the +**"in-band OAM encapsulating node"**. + +- The node which removes the in-band OAM data is referred to as the +**"in-band OAM decapsulating node"**. + +- Nodes within the domain which are aware of in-band OAM data and read +and/or write or process the in-band OAM data are called +**"in-band OAM transit nodes"**. + +## Features supported in the current release +VPP can function as in-band OAM encapsulating, transit and decapsulating node. +In this version of VPP in-band OAM data is transported as options in an +IPv6 hop-by-hop extension header. Hence in-band OAM can be enabled +for IPv6 traffic. + +The following iOAM features are supported: + +- **In-band OAM Tracing** : In-band OAM supports multiple data records to be +recorded in the packet as the packet traverses the network. +These data records offer insights into the operational behavior of the network. +The following information can be collected in the tracing +data from the nodes a packet traverses: + - Node ID + - Ingress interface ID + - Egress interface ID + - Timestamp + - Pre-configured application data + +- **In-band OAM Proof of Transit (POT)**: Proof of transit iOAM data is +added to every packet for verifying that a packet traverses a specific +set of nodes. +In-band OAM data is updated at every node that is enabled with iOAM +proof of transit and is used to verify whether a packet traversed +all the specified nodes. When the verifier receives each packet, +it can validate whether the packet traversed the specified nodes. + + +## Configuration +Configuring iOAM involves: +- Selecting the packets for which iOAM data must be inserted, updated or removed + - Selection of packets for iOAM data insertion on iOAM encapsulating node. + Selection of packets is done by 5-tuple based classification + - Selection of packets for updating iOAM data is implicitly done on the + presence of iOAM options in the packet + - Selection of packets for removing the iOAM data is done on 5-tuple + based classification +- The kind of data to be collected + - Tracing data + - Proof of transit +- Additional details for processing iOAM data to be collected + - For trace data - trace type, number of nodes to be recorded in the trace, + time stamp precision, etc. + - For POT data - configuration of POT profile required to process the POT data + +The CLI for configuring iOAM is explained here followed by detailed steps +and examples to deploy iOAM on VPP as an encapsulating, transit or +decapsulating iOAM node in the subsequent sub-sections. 
+
+VPP iOAM configuration for enabling trace and POT is as follows:
+
+    set ioam rewrite trace-type <0x1f|0x7|0x9|0x11|0x19>
+    trace-elts <number of trace elements> trace-tsp <0|1|2|3>
+    node-id <node ID in hex> app-data <application data in hex> [pot]
+
+A description of each of the options of the CLI follows:
+- trace-type : An entry in the "Node data List" array of the trace option
+can have different formats, following the needs of a deployment.
+For example: some deployments might only be interested
+in recording the node identifiers, whereas others might be interested
+in recording the node identifier and timestamp.
+The following types are currently supported:
+    - 0x1f : Node data to include hop limit (8 bits), node ID (24 bits),
+      ingress and egress interface IDs (16 bits each), timestamp (32 bits),
+      application data (32 bits)
+    - 0x7 : Node data to include hop limit (8 bits), node ID (24 bits),
+      ingress and egress interface IDs (16 bits each)
+    - 0x9 : Node data to include hop limit (8 bits), node ID (24 bits),
+      timestamp (32 bits)
+    - 0x11: Node data to include hop limit (8 bits), node ID (24 bits),
+      application data (32 bits)
+    - 0x19: Node data to include hop limit (8 bits), node ID (24 bits),
+      timestamp (32 bits), application data (32 bits)
+- trace-elts : Defines the length of the node data array in the trace option.
+- trace-tsp : Defines the timestamp precision to use, with the enumerated
+  values for precision as follows:
+    - 0 : 32-bit timestamp in seconds
+    - 1 : 32-bit timestamp in milliseconds
+    - 2 : 32-bit timestamp in microseconds
+    - 3 : 32-bit timestamp in nanoseconds
+- node-id : Unique identifier for the node, included in the node ID
+  field of the node data in the trace option.
+- app-data : The value configured here is included as-is in the
+  application data field of the node data in the trace option.
+- pot : Enables the POT option to be included in the iOAM options.
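+
+As an aside, the supported trace-type values decompose as a bit mask over the
+five data fields. The following C sketch shows one decoding that is consistent
+with the five formats listed above; it is an illustration only, and the macro
+names are assumptions rather than the identifiers used in the VPP source:
+
+    /* Illustrative bit assignments, inferred from the trace-type values
+     * documented above (0x1f, 0x7, 0x9, 0x11, 0x19). */
+    #define BIT_TTL_NODEID    (1 << 0)  /* hop limit + node ID  */
+    #define BIT_ING_INTERFACE (1 << 1)  /* ingress interface ID */
+    #define BIT_EGR_INTERFACE (1 << 2)  /* egress interface ID  */
+    #define BIT_TIMESTAMP     (1 << 3)  /* timestamp            */
+    #define BIT_APPDATA       (1 << 4)  /* application data     */
+
+    /* Every format listed above records hop limit + node ID, so a sanity
+     * check only needs to verify that bit and reject unknown bits. */
+    static inline int trace_type_plausible (unsigned t)
+    {
+      return (t & BIT_TTL_NODEID) && !(t & ~0x1fu);
+    }
+
+For example, 0x9 is `BIT_TTL_NODEID | BIT_TIMESTAMP`, matching the
+"node ID plus timestamp" format above.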
+
+### Trace configuration
+
+#### On in-band OAM encapsulating node
+ - **Configure classifier and apply ACL** to select packets for
+   iOAM data insertion
+ - Example to enable iOAM data insertion for all the packets
+   towards IPv6 address db06::06:
+
+    vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst
+
+    vpp# classify session acl-hit-next node ip6-add-hop-by-hop
+    table-index 0 match l3 ip6 dst db06::06
+
+    vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0
+
+ - **Enable tracing** : Specify the node ID, the maximum number of nodes for
+   which trace data should be recorded, the type of data to be recorded and,
+   optionally, the application data to be included
+ - Example to enable tracing with a maximum of 4 nodes recorded
+   and the data to be recorded to include - hop limit, node ID,
+   ingress and egress interface IDs, timestamp (millisecond precision),
+   application data (0x1234):
+
+    vpp# set ioam rewrite trace-type 0x1f trace-elts 4 trace-tsp 1
+    node-id 0x1 app-data 0x1234
+
+#### On in-band OAM transit node
+- The transit node requires the trace type, timestamp precision, node ID and,
+optionally, application data to be configured,
+to update its node data in the trace option.
+
+Example:
+
+    vpp# set ioam rewrite trace-type 0x1f trace-elts 4 trace-tsp 1
+    node-id 0x2 app-data 0x1234
+
+#### On the in-band OAM decapsulating node
+- The decapsulating node, similar to the encapsulating node, requires
+**classification** of the packets to remove iOAM data from.
+ - Example to decapsulate iOAM data for packets towards
+   db06::06, configure classifier and enable it as an ACL as follows:
+
+    vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst
+
+    vpp# classify session acl-hit-next node ip6-lookup table-index 0
+    match l3 ip6 dst db06::06 opaque-index 100
+
+    vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0
+
+- The decapsulating node requires the trace type, timestamp precision,
+node ID and, optionally, application data to be configured,
+to update its node data in the trace option before it is decapsulated.
+
+Example:
+
+    vpp# set ioam rewrite trace-type 0x1f trace-elts 4
+    trace-tsp 1 node-id 0x3 app-data 0x1234
+
+
+### Proof of Transit configuration
+
+For details on proof-of-transit,
+see the IETF draft [iOAM-ietf-proof-of-transit].
+To enable Proof of Transit, all the nodes that participate
+(and hence are verified for transit) need a proof of transit profile.
+A script to generate a proof of transit profile as per the mechanism
+described in [iOAM-ietf-proof-of-transit] will be available at [iOAM-Devnet].
+
+The Proof of Transit mechanism implemented here is based on
+Shamir's Secret Sharing algorithm.
+The overall algorithm uses two polynomials,
+POLY-1 and POLY-2. The degree of the polynomials depends on the number of
+nodes to be verified for transit.
+POLY-1 is secret and constant. Each node gets a point on POLY-1
+at setup-time and keeps it secret.
+POLY-2 is public, random and per packet.
+Each node is assigned a point on POLY-1 and POLY-2 with the same x index.
+Each node derives its point on POLY-2 each time a packet arrives at it.
+A node then contributes its points on POLY-1 and POLY-2 to construct
+POLY-3 (POLY-3 = POLY-1 + POLY-2) using Lagrange extrapolation and
+forwards it towards the verifier by updating the POT data in the packet.
+The verifier constructs POLY-3 from the accumulated value from all the nodes
+and its own points on POLY-1 and POLY-2, and verifies whether
+POLY-3 = POLY-1 + POLY-2. Only the verifier knows POLY-1.
+The solution leverages finite field arithmetic in a field of size "prime
+number" for reasons explained in the description of Shamir's secret
+sharing algorithm.
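+
+To make the per-node update concrete, here is a toy C model of the cumulative
+POT computation described above. It is a simplified illustration of the
+arithmetic only (the function names are invented, and `__uint128_t` is a
+GCC/Clang extension), not the VPP implementation:
+
+    #include <stdint.h>
+
+    /* Modular multiplication in the prime field. */
+    static uint64_t mul_mod (uint64_t a, uint64_t b, uint64_t prime)
+    {
+      return (uint64_t) (((__uint128_t) a * b) % prime);
+    }
+
+    /* One node's contribution: add lpc * (POLY-1 share + POLY-2 point),
+     * all modulo the prime, into the cumulative value carried in the
+     * packet's POT option. The verifier checks the final cumulative
+     * value against its own evaluation of POLY-3 = POLY-1 + POLY-2. */
+    static uint64_t pot_update (uint64_t cumulative, uint64_t secret_share,
+                                uint64_t poly2_eval, uint64_t lpc,
+                                uint64_t prime)
+    {
+      uint64_t point = (secret_share + poly2_eval) % prime;
+      return (cumulative + mul_mod (lpc, point, prime)) % prime;
+    }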
+
+Here is an explanation of the POT profile list and the profile configuration
+CLI used to realize the above mechanism.
+It is best to use the script provided at [iOAM-Devnet] to generate
+this configuration.
+- **Create POT profile** : set pot profile name <name> id [0-1]
+[validator-key 0xu64] prime-number 0xu64 secret_share 0xu64
+lpc 0xu64 polynomial2 0xu64 bits-in-random [0-64]
+  - name : Profile list name.
+  - id : Profile id; it can be 0 or 1.
+    A maximum of two profiles can be configured per profile list.
+  - validator-key : Secret key configured only on the
+    verifier/decapsulating node, used to compare and verify proof of transit.
+  - prime-number : Prime number for finite field arithmetic as required by
+    the proof of transit mechanism.
+  - secret_share : Unique point for each node on the secret polynomial POLY-1.
+  - lpc : Lagrange Polynomial Constant (LPC) calculated per node based on
+    its point (the x value used for evaluating the points on the polynomial)
+    on the polynomial used in Lagrange extrapolation
+    for reconstructing the polynomial (POLY-3).
+  - polynomial2 : The pre-evaluated value of the point on the
+    2nd polynomial (POLY-2). This is unique for each node.
+    It is pre-evaluated for all the coefficients of POLY-2 except
+    for the constant part of the polynomial, which changes per packet
+    and is received as part of the POT data in the packet.
+  - bits-in-random : Controls the size of the random number to be
+    generated. This number has to match the other numbers generated and used
+    in the profile as per the algorithm.
+
+- **Set a configured profile as active/in-use** :
+set pot profile-active name <name> ID [0-1]
+  - name : Name of the profile list to be used for computing
+    POT data per packet.
+  - ID : Identifier of the profile within the list to be used.
+
+#### On in-band OAM encapsulating node
+ - Configure the classifier and apply ACL to select packets for iOAM data
+   insertion.
+ - Example to enable iOAM data insertion for all the packets towards
+   IPv6 address db06::06:
+
+    vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst
+
+    vpp# classify session acl-hit-next node
+    ip6-add-hop-by-hop table-index 0 match l3 ip6 dst db06::06
+
+    vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0
+
+ - Configure the proof of transit profile list with profiles.
+Each profile list, referred to by a name, can contain 2 profiles;
+only one is in use for updating proof of transit data at any time.
+ - An example profile list, with a profile generated from the
+   script to verify transit through 3 nodes, is:
+
+    vpp# set pot profile name example id 0 prime-number 0x7fff0000fa884685
+    secret_share 0x6c22eff0f45ec56d lpc 0x7fff0000fa884682
+    polynomial2 0xffb543d4a9c bits-in-random 63
+
+ - Enable one of the profiles from the configured profile list as active
+   so that it will be used for calculating proof of transit.
+
+Example: enable profile ID 0 from the profile list configured above:
+
+    vpp# set pot profile-active name example ID 0
+
+ - Enable the POT option to be inserted:
+
+    vpp# set ioam rewrite pot
+
+#### On in-band OAM transit node
+ - Configure the proof of transit profile list with profiles for the
+transit node. Example:
+
+    vpp# set pot profile name example id 0 prime-number 0x7fff0000fa884685
+    secret_share 0x564cdbdec4eb625d lpc 0x1
+    polynomial2 0x23f3a227186a bits-in-random 63
+
+#### On in-band OAM decapsulating node / verifier
+- The decapsulating node, similar to the encapsulating node, requires
+classification of the packets to remove iOAM data from.
+ - Example to decapsulate iOAM data for packets towards db06::06,
+   configure classifier and enable it as an ACL as follows:
+
+    vpp# classify table miss-next node ip6-lookup mask l3 ip6 dst
+
+    vpp# classify session acl-hit-next node ip6-lookup table-index 0
+    match l3 ip6 dst db06::06 opaque-index 100
+
+    vpp# set int input acl intfc GigabitEthernet0/0/0 ip6-table 0
+
+- To update and verify the proof of transit, a POT profile list should be
+configured.
+ - Example POT profile list configured as follows: + + vpp# set pot profile name example id 0 validate-key 0x7fff0000fa88465d + prime-number 0x7fff0000fa884685 secret_share 0x7a08fbfc5b93116d lpc 0x3 + polynomial2 0x3ff738597ce bits-in-random 63 + +## Operational data + +Following CLIs are available to check iOAM operation: +- To check iOAM configuration that are effective use "show ioam summary" + +Example: + + vpp# show ioam summary + REWRITE FLOW CONFIGS - Not configured + HOP BY HOP OPTIONS - TRACE CONFIG - + Trace Type : 0x1f (31) + Trace timestamp precision : 1 (Milliseconds) + Num of trace nodes : 4 + Node-id : 0x2 (2) + App Data : 0x1234 (4660) + POT OPTION - 1 (Enabled) + Try 'show ioam pot and show pot profile' for more information + +- To find statistics about packets for which iOAM options were +added (encapsulating node) and removed (decapsulating node) execute +*show errors* + +Example on encapsulating node: + + + vpp# show error + Count Node Reason + 1208804706 ip6-inacl input ACL hits + 1208804706 ip6-add-hop-by-hop Pkts w/ added ip6 hop-by-hop options + +Example on decapsulating node: + + vpp# show error + Count Node Reason + 69508569 ip6-inacl input ACL hits + 69508569 ip6-pop-hop-by-hop Pkts w/ removed ip6 hop-by-hop options + +- To check the POT profiles use "show pot profile" + +Example: + + vpp# show pot profile + Profile list in use : example + POT Profile at index: 0 + ID : 0 + Validator : False (0) + Secret share : 0x564cdbdec4eb625d (6218586935324795485) + Prime number : 0x7fff0000fa884685 (9223090566081300101) + 2nd polynomial(eval) : 0x23f3a227186a (39529304496234) + LPC : 0x1 (1) + Bit mask : 0x7fffffffffffffff (9223372036854775807) + Profile index in use: 0 + Pkts passed : 0x36 (54) + +- To get statistics of POT for packets use "show ioam pot" + +Example at encapsulating or transit node: + + vpp# show ioam pot + Pkts with ip6 hop-by-hop POT options - 54 + Pkts with ip6 hop-by-hop POT options but no profile set - 0 + Pkts with POT in Policy - 0 + Pkts with POT out of Policy - 0 + + +Example at decapsulating/verification node: + + + vpp# show ioam pot + Pkts with ip6 hop-by-hop POT options - 54 + Pkts with ip6 hop-by-hop POT options but no profile set - 0 + Pkts with POT in Policy - 54 + Pkts with POT out of Policy - 0 + +- Tracing - enable trace of IPv6 packets to view the data inserted and +collected. 
+ +Example when the nodes are receiving data over a DPDK interface: +Enable tracing using "trace add dpdk-input 20" and +execute "show trace" to view the iOAM data collected: + + + vpp# trace add dpdk-input 20 + + vpp# show trace + + ------------------- Start of thread 0 vpp_main ------------------- + + Packet 1 + + 00:00:19:294697: dpdk-input + GigabitEthernetb/0/0 rx queue 0 + buffer 0x10e6b: current data 0, length 214, free-list 0, totlen-nifb 0, trace 0x0 + PKT MBUF: port 0, nb_segs 1, pkt_len 214 + buf_len 2176, data_len 214, ol_flags 0x0, data_off 128, phys_addr 0xe9a35a00 + packet_type 0x0 + IP6: 00:50:56:9c:df:72 -> 00:50:56:9c:be:55 + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + tos 0x00, flow label 0x0, hop limit 63, payload length 160 + 00:00:19:294737: ethernet-input + IP6: 00:50:56:9c:df:72 -> 00:50:56:9c:be:55 + 00:00:19:294753: ip6-input + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + tos 0x00, flow label 0x0, hop limit 63, payload length 160 + 00:00:19:294757: ip6-lookup + fib 0 adj-idx 15 : indirect via db05::2 flow hash: 0x00000000 + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + tos 0x00, flow label 0x0, hop limit 63, payload length 160 + 00:00:19:294802: ip6-hop-by-hop + IP6_HOP_BY_HOP: next index 5 len 96 traced 96 Trace Type 0x1f , 1 elts left + [0] ttl 0x0 node ID 0x0 ingress 0x0 egress 0x0 ts 0x0 + app 0x0 + [1] ttl 0x3e node ID 0x3 ingress 0x1 egress 0x2 ts 0xb68c2213 + app 0x1234 + [2] ttl 0x3f node ID 0x2 ingress 0x1 egress 0x2 ts 0xb68c2204 + app 0x1234 + [3] ttl 0x40 node ID 0x1 ingress 0x5 egress 0x6 ts 0xb68c2200 + app 0x1234 + POT opt present + random = 0x577a916946071950, Cumulative = 0x10b46e78a35a392d, Index = 0x0 + 00:00:19:294810: ip6-rewrite + tx_sw_if_index 1 adj-idx 14 : GigabitEthernetb/0/0 + IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 flow hash: 0x00000000 + IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + tos 0x00, flow label 0x0, hop limit 62, payload length 160 + 00:00:19:294814: GigabitEthernetb/0/0-output + GigabitEthernetb/0/0 + IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + tos 0x00, flow label 0x0, hop limit 62, payload length 160 + 00:00:19:294820: GigabitEthernetb/0/0-tx + GigabitEthernetb/0/0 tx queue 0 + buffer 0x10e6b: current data 0, length 214, free-list 0, totlen-nifb 0, trace 0x0 + IP6: 00:50:56:9c:be:55 -> 00:50:56:9c:df:72 + + IP6_HOP_BY_HOP_OPTIONS: db05::2 -> db06::6 + + tos 0x00, flow label 0x0, hop limit 62, payload length 160 + + +[iOAM-Devnet]: +[iOAM-ietf-requirements]: +[iOAM-ietf-transport]: +[iOAM-ietf-data]: +[iOAM-ietf-proof-of-transit]: diff --git a/plugins/lb-plugin/README.md b/plugins/lb-plugin/README.md deleted file mode 100644 index 4effe533..00000000 --- a/plugins/lb-plugin/README.md +++ /dev/null @@ -1,141 +0,0 @@ -# Load Balancer plugin for VPP - -## Version - -The load balancer plugin is currently in *beta* version. -Both CLIs and APIs are subject to *heavy* changes. -Wich also means feedback is really welcome regarding features, apis, etc... - -## Overview - -This plugin provides load balancing for VPP in a way that is largely inspired -from Google's MagLev: http://research.google.com/pubs/pub44824.html - -The load balancer is configured with a set of Virtual IPs (VIP, which can be -prefixes), and for each VIP, with a set of Application Server addresses (ASs). 
- -Traffic received for a given VIP (or VIP prefix) is tunneled using GRE towards -the different ASs in a way that (tries to) ensure that a given session will -always be tunneled to the same AS. - -Both VIPs or ASs can be IPv4 or IPv6, but for a given VIP, all ASs must be using -the same encap. type (i.e. IPv4+GRE or IPv6+GRE). Meaning that for a given VIP, -all AS addresses must be of the same family. - -## Performances - -The load balancer has been tested up to 1 millions flows and still forwards more -than 3Mpps per core in such circumstances. -Although 3Mpps seems already good, it is likely that performances will be improved -in next versions. - -## Configuration - -### Global LB parameters - -The load balancer needs to be configured with some parameters: - - lb conf [ip4-src-address ] [ip6-src-address ] - [buckets ] [timeout ] - -ip4-src-address: the source address used to send encap. packets using IPv4. - -ip6-src-address: the source address used to send encap. packets using IPv6. - -buckets: the *per-thread* established-connexions-table number of buckets. - -timeout: the number of seconds a connection will remain in the - established-connexions-table while no packet for this flow - is received. - - -### Configure the VIPs - - lb vip [encap (gre6|gre4)] [new_len ] [del] - -new_len is the size of the new-connection-table. It should be 1 or 2 orders of -magnitude bigger than the number of ASs for the VIP in order to ensure a good -load balancing. - -Examples: - - lb vip 2002::/16 encap gre6 new_len 1024 - lb vip 2003::/16 encap gre4 new_len 2048 - lb vip 80.0.0.0/8 encap gre6 new_len 16 - lb vip 90.0.0.0/8 encap gre4 new_len 1024 - -### Configure the ASs (for each VIP) - - lb as [
[
[...]]] [del] - -You can add (or delete) as many ASs at a time (for a single VIP). -Note that the AS address family must correspond to the VIP encap. IP family. - -Examples: - - lb as 2002::/16 2001::2 2001::3 2001::4 - lb as 2003::/16 10.0.0.1 10.0.0.2 - lb as 80.0.0.0/8 2001::2 - lb as 90.0.0.0/8 10.0.0.1 - - - -## Monitoring - -The plugin provides quite a bunch of counters and information. -These are still subject to quite significant changes. - - show lb - show lb vip - show lb vip verbose - - show node counters - - -## Design notes - -### Multi-Threading - -MagLev is a distributed system which pseudo-randomly generates a -new-connections-table based on AS names such that each server configured with -the same set of ASs ends up with the same table. Connection stickyness is then -ensured with an established-connections-table. Using ECMP, it is assumed (but -not relied on) that servers will mostly receive traffic for different flows. - -This implementation pushes the parallelism a little bit further by using -one established-connections table per thread. This is equivalent to assuming -that RSS will make a job similar to ECMP, and is pretty useful as threads don't -need to get a lock in order to write in the table. - -### Hash Table - -A load balancer requires an efficient read and write hash table. The hash table -used by ip6-forward is very read-efficient, but not so much for writing. In -addition, it is not a big deal if writing into the hash table fails (again, -MagLev uses a flow table but does not heaviliy relies on it). - -The plugin therefore uses a very specific (and stupid) hash table. - - Fixed (and power of 2) number of buckets (configured at runtime) - - Fixed (and power of 2) elements per buckets (configured at compilation time) - -### Reference counting - -When an AS is removed, there is two possible ways to react. - - Keep using the AS for established connections - - Change AS for established connections (likely to cause error for TCP) - -In the first case, although an AS is removed from the configuration, its -associated state needs to stay around as long as it is used by at least one -thread. - -In order to avoid locks, a specific reference counter is used. The design is quite -similar to clib counters but: - - It is possible to decrease the value - - Summing will not zero the per-thread counters - - Only the thread can reallocate its own counters vector (to avoid concurrency issues) - -This reference counter is lock free, but reading a count of 0 does not mean -the value can be freed unless it is ensured by *other* means that no other thread -is concurrently referencing the object. In the case of this plugin, it is assumed -that no concurrent event will take place after a few seconds. - diff --git a/plugins/lb-plugin/lb_plugin_doc.md b/plugins/lb-plugin/lb_plugin_doc.md new file mode 100644 index 00000000..c7885ffb --- /dev/null +++ b/plugins/lb-plugin/lb_plugin_doc.md @@ -0,0 +1,141 @@ +# Load Balancer plugin for VPP {#lb_plugin_doc} + +## Version + +The load balancer plugin is currently in *beta* version. +Both CLIs and APIs are subject to *heavy* changes. +Wich also means feedback is really welcome regarding features, apis, etc... + +## Overview + +This plugin provides load balancing for VPP in a way that is largely inspired +from Google's MagLev: http://research.google.com/pubs/pub44824.html + +The load balancer is configured with a set of Virtual IPs (VIP, which can be +prefixes), and for each VIP, with a set of Application Server addresses (ASs). 
+
+Traffic received for a given VIP (or VIP prefix) is tunneled using GRE towards
+the different ASs in a way that (tries to) ensure that a given session will
+always be tunneled to the same AS.
+
+Both VIPs and ASs can be IPv4 or IPv6, but for a given VIP, all ASs must be
+using the same encap. type (i.e. IPv4+GRE or IPv6+GRE), meaning that for a
+given VIP, all AS addresses must be of the same family.
+
+## Performance
+
+The load balancer has been tested with up to 1 million flows and still
+forwards more than 3Mpps per core in such circumstances.
+Although 3Mpps is already good, performance is likely to improve
+in future versions.
+
+## Configuration
+
+### Global LB parameters
+
+The load balancer needs to be configured with some parameters:
+
+    lb conf [ip4-src-address <addr>] [ip6-src-address <addr>]
+            [buckets <n>] [timeout <seconds>]
+
+ip4-src-address: the source address used to send encap. packets using IPv4.
+
+ip6-src-address: the source address used to send encap. packets using IPv6.
+
+buckets: the *per-thread* established-connections-table number of buckets.
+
+timeout: the number of seconds a connection will remain in the
+    established-connections-table while no packet for this flow
+    is received.
+
+
+### Configure the VIPs
+
+    lb vip <prefix> [encap (gre6|gre4)] [new_len <n>] [del]
+
+new_len is the size of the new-connection-table. It should be 1 or 2 orders
+of magnitude bigger than the number of ASs for the VIP in order to ensure
+good load balancing.
+
+Examples:
+
+    lb vip 2002::/16 encap gre6 new_len 1024
+    lb vip 2003::/16 encap gre4 new_len 2048
+    lb vip 80.0.0.0/8 encap gre6 new_len 16
+    lb vip 90.0.0.0/8 encap gre4 new_len 1024
+
+### Configure the ASs (for each VIP)
+
+    lb as <vip-prefix> [<address> [<address> [...]]] [del]
+
+You can add (or delete) as many ASs at a time as you like (for a single VIP).
+Note that the AS address family must correspond to the VIP encap. IP family.
+
+Examples:
+
+    lb as 2002::/16 2001::2 2001::3 2001::4
+    lb as 2003::/16 10.0.0.1 10.0.0.2
+    lb as 80.0.0.0/8 2001::2
+    lb as 90.0.0.0/8 10.0.0.1
+
+
+## Monitoring
+
+The plugin provides a number of counters and other information.
+These are still subject to quite significant changes.
+
+    show lb
+    show lb vip
+    show lb vip verbose
+
+    show node counters
+
+
+## Design notes
+
+### Multi-Threading
+
+MagLev is a distributed system which pseudo-randomly generates a
+new-connections-table based on AS names such that each server configured with
+the same set of ASs ends up with the same table. Connection stickiness is then
+ensured with an established-connections-table. Using ECMP, it is assumed (but
+not relied on) that servers will mostly receive traffic for different flows.
+
+This implementation pushes the parallelism a little bit further by using
+one established-connections table per thread. This is equivalent to assuming
+that RSS will do a job similar to ECMP, and is pretty useful as threads don't
+need to take a lock in order to write to the table.
+
+### Hash Table
+
+A load balancer requires an efficient read and write hash table. The hash
+table used by ip6-forward is very read-efficient, but not so much for
+writing. In addition, it is not a big deal if writing into the hash table
+fails (again, MagLev uses a flow table but does not heavily rely on it).
+
+The plugin therefore uses a very specific (and stupid) hash table
+(a simplified C sketch appears after this patch fragment).
+ - Fixed (and power of 2) number of buckets (configured at runtime)
+ - Fixed (and power of 2) elements per bucket (configured at compilation time)
+
+### Reference counting
+
+When an AS is removed, there are two possible ways to react.
+ - Keep using the AS for established connections
+ - Change the AS for established connections (likely to cause errors for TCP)
+
+In the first case, although an AS is removed from the configuration, its
+associated state needs to stay around as long as it is used by at least one
+thread.
+
+In order to avoid locks, a specific reference counter is used. The design is
+quite similar to clib counters but:
+ - It is possible to decrease the value
+ - Summing will not zero the per-thread counters
+ - Only the owning thread can reallocate its own counters vector (to avoid
+   concurrency issues)
+
+This reference counter is lock-free, but reading a count of 0 does not mean
+the value can be freed unless it is ensured by *other* means that no other
+thread is concurrently referencing the object. In the case of this plugin,
+it is assumed that no concurrent event will take place after a few seconds.

diff --git a/vnet/vnet/ipsec-gre/ipsec_gre_doc.md b/vnet/vnet/ipsec-gre/ipsec_gre_doc.md
index e9e5fdd9..e1bb9cda 100644
--- a/vnet/vnet/ipsec-gre/ipsec_gre_doc.md
+++ b/vnet/vnet/ipsec-gre/ipsec_gre_doc.md
@@ -1,4 +1,4 @@
-# VPP L2-GRE over IPsec implementation
+# VPP L2-GRE over IPsec implementation {#ipsec_gre_doc}
 
 This is a memo intended to contain documentation of the VPP L2-GRE over IPsec
 implementation. Everything that is not directly obvious should come here.
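As a footnote to the load balancer design notes above: the "very specific
(and stupid)" hash table can be sketched in a few lines of C. The structure
below is a minimal illustration under assumed names and layout
(LB_FLOW_PER_BUCKET, lb_flow_bucket_t, and so on are illustrative, not the
plugin's actual definitions):

    #include <stdint.h>

    #define LB_FLOW_PER_BUCKET 4          /* fixed at compilation time */

    typedef struct {
      uint64_t hash[LB_FLOW_PER_BUCKET];  /* 0 marks a free slot */
      uint32_t as_index[LB_FLOW_PER_BUCKET];
    } lb_flow_bucket_t;

    typedef struct {
      lb_flow_bucket_t *buckets;          /* one table per thread */
      uint32_t mask;                      /* bucket count - 1 (power of 2) */
    } lb_flow_table_t;

    /* Returns 0 and sets *as on a hit, -1 on a miss. */
    static int lb_flow_lookup (lb_flow_table_t *t, uint64_t h, uint32_t *as)
    {
      lb_flow_bucket_t *b = &t->buckets[h & t->mask];
      for (int i = 0; i < LB_FLOW_PER_BUCKET; i++)
        if (b->hash[i] == h)
          {
            *as = b->as_index[i];
            return 0;
          }
      return -1;
    }

    /* May fail when the bucket is full; the caller then simply falls
     * back to the new-connections table, as described above. */
    static int lb_flow_insert (lb_flow_table_t *t, uint64_t h, uint32_t as)
    {
      lb_flow_bucket_t *b = &t->buckets[h & t->mask];
      for (int i = 0; i < LB_FLOW_PER_BUCKET; i++)
        if (b->hash[i] == 0)
          {
            b->hash[i] = h;
            b->as_index[i] = as;
            return 0;
          }
      return -1;
    }

Because each thread owns its table, no locking is needed on the write path,
which matches the multi-threading note above.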
diff --git a/vnet/vnet/lldp/lldp.md b/vnet/vnet/lldp/lldp.md deleted file mode 100644 index 55de8f06..00000000 --- a/vnet/vnet/lldp/lldp.md +++ /dev/null @@ -1,84 +0,0 @@ -# VPP Link Layer Discovery Protocol (LLDP) implementation - -This is a memo intended to contain documentation of the VPP LLDP implementation -Everything that is not directly obvious should come here. - - -## LLDP -LLDP is a link layer protocol to advertise the capabilities and current status of the system. - -There are 2 nodes handling LLDP - -1.) input-node which processes incoming packets and updates the local database -2.) process-node which is responsible for sending out LLDP packets from VPP side - - -### Configuration - -LLDP has a global configuration and a per-interface enable setting. - -Global configuration is modified using the "set lldp" command - -set lldp [system-name ] [tx-hold ] [tx-interval ] - -system-name: the name of the VPP system sent to peers in the system-name TLV -tx-hold: multiplier for tx-interval when setting time-to-live (TTL) value in the LLDP packets (TTL = tx-hold * tx-interval + 1, if TTL > 65535, then TTL = 65535) -tx-interval: time interval between sending out LLDP packets - -Per interface setting is done using the "set interface lldp" command - -set interface lldp (enable | disable) - -interface: the name of the interface for which to enable/disable LLDP - - -### Configuration example - -Configure system-name as "VPP" and transmit interval to 10 seconds: - -set lldp system-name VPP tx-interval 10 - -Enable LLDP on interface TenGigabitEthernet5/0/1 - -set interface lldp TenGigabitEthernet5/0/1 enable - - -### Operational data - -The list of LLDP-enabled interfaces which are up can be shown using "show lldp" command - -Example: -DBGvpp# show lldp -Local interface Peer chassis ID Remote port ID Last heard Last sent Status -GigabitEthernet2/0/1 never 27.0s ago inactive -TenGigabitEthernet5/0/1 8c:60:4f:dd:ca:52 Eth1/3/3 20.1s ago 18.3s ago active - -All LLDP configuration data with all LLDP-enabled interfaces can be shown using "show lldp detail" command - -Example: -DBGvpp# show lldp detail -LLDP configuration: -Configured system name: vpp -Configured tx-hold: 4 -Configured tx-interval: 30 - -LLDP-enabled interface table: - -Interface name: GigabitEthernet2/0/1 -Interface/peer state: inactive(timeout) -Last known peer chassis ID: -Last known peer port ID: -Last packet sent: 12.4s ago -Last packet received: never - -Interface name: GigabitEthernet2/0/2 -Interface/peer state: interface down -Last packet sent: never - -Interface name: TenGigabitEthernet5/0/1 -Interface/peer state: active -Peer chassis ID: 8c:60:4f:dd:ca:52(MAC address) -Remote port ID: Eth1/3/3(Locally assigned) -Last packet sent: 3.6s ago -Last packet received: 5.5s ago - diff --git a/vnet/vnet/lldp/lldp_doc.md b/vnet/vnet/lldp/lldp_doc.md new file mode 100644 index 00000000..bac480a5 --- /dev/null +++ b/vnet/vnet/lldp/lldp_doc.md @@ -0,0 +1,84 @@ +# VPP Link Layer Discovery Protocol (LLDP) implementation {#lldp_doc} + +This is a memo intended to contain documentation of the VPP LLDP implementation +Everything that is not directly obvious should come here. + + +## LLDP +LLDP is a link layer protocol to advertise the capabilities and current status of the system. + +There are 2 nodes handling LLDP + +1.) input-node which processes incoming packets and updates the local database +2.) 
process-node which is responsible for sending out LLDP packets from VPP side + + +### Configuration + +LLDP has a global configuration and a per-interface enable setting. + +Global configuration is modified using the "set lldp" command + +set lldp [system-name ] [tx-hold ] [tx-interval ] + +system-name: the name of the VPP system sent to peers in the system-name TLV +tx-hold: multiplier for tx-interval when setting time-to-live (TTL) value in the LLDP packets (TTL = tx-hold * tx-interval + 1, if TTL > 65535, then TTL = 65535) +tx-interval: time interval between sending out LLDP packets + +Per interface setting is done using the "set interface lldp" command + +set interface lldp (enable | disable) + +interface: the name of the interface for which to enable/disable LLDP + + +### Configuration example + +Configure system-name as "VPP" and transmit interval to 10 seconds: + +set lldp system-name VPP tx-interval 10 + +Enable LLDP on interface TenGigabitEthernet5/0/1 + +set interface lldp TenGigabitEthernet5/0/1 enable + + +### Operational data + +The list of LLDP-enabled interfaces which are up can be shown using "show lldp" command + +Example: +DBGvpp# show lldp +Local interface Peer chassis ID Remote port ID Last heard Last sent Status +GigabitEthernet2/0/1 never 27.0s ago inactive +TenGigabitEthernet5/0/1 8c:60:4f:dd:ca:52 Eth1/3/3 20.1s ago 18.3s ago active + +All LLDP configuration data with all LLDP-enabled interfaces can be shown using "show lldp detail" command + +Example: +DBGvpp# show lldp detail +LLDP configuration: +Configured system name: vpp +Configured tx-hold: 4 +Configured tx-interval: 30 + +LLDP-enabled interface table: + +Interface name: GigabitEthernet2/0/1 +Interface/peer state: inactive(timeout) +Last known peer chassis ID: +Last known peer port ID: +Last packet sent: 12.4s ago +Last packet received: never + +Interface name: GigabitEthernet2/0/2 +Interface/peer state: interface down +Last packet sent: never + +Interface name: TenGigabitEthernet5/0/1 +Interface/peer state: active +Peer chassis ID: 8c:60:4f:dd:ca:52(MAC address) +Remote port ID: Eth1/3/3(Locally assigned) +Last packet sent: 3.6s ago +Last packet received: 5.5s ago + diff --git a/vnet/vnet/map/map_doc.md b/vnet/vnet/map/map_doc.md index 230c52df..17f3c511 100644 --- a/vnet/vnet/map/map_doc.md +++ b/vnet/vnet/map/map_doc.md @@ -1,4 +1,4 @@ -# VPP MAP and Lw4o6 implementation +# VPP MAP and Lw4o6 implementation {#map_doc} This is a memo intended to contain documentation of the VPP MAP and Lw4o6 implementations. Everything that is not directly obvious should come here. -- cgit 1.2.3-korg From f304552665ddd0658b361698424b815ff8237ef0 Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Wed, 21 Sep 2016 09:35:01 -0400 Subject: Copy the 16.09 release notes to master - Copy the 16.09 release notes to master - Add some notes for 16.06 (from the press release) - Add some structure around notes for each release - Add skeleton for next release Change-Id: Id5a5d8bf02fce1bbaed303e6c6e4f8908c7e7d75 Signed-off-by: Chris Luke --- README.md | 19 +++-- RELEASE.md | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++ doxygen/user_doc.md | 1 + 3 files changed, 213 insertions(+), 7 deletions(-) create mode 100644 RELEASE.md (limited to 'README.md') diff --git a/README.md b/README.md index bf26c9de..cf43a4df 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ Vector Packet Processing ======================== -## Introduction. 
+## Introduction
 
 The VPP platform is an extensible framework that provides out-of-the-box
 production quality switch/router functionality. It is the open source version
@@ -16,7 +16,13 @@ For more information on VPP and its features please visit the
 [What is VPP?](https://wiki.fd.io/view/VPP/What_is_VPP%3F) pages.
 
 
-## Directory layout.
+## Changes
+
+Details of the changes leading up to this version of VPP can be found under
+@ref release_notes.
+
+
+## Directory layout
 
 Directory name | Description
 ---------------------- | -------------------------------------------
@@ -25,7 +31,6 @@ Directory name | Description
 doxygen | Documentation generator configuration
 dpdk | DPDK patches and build infrastructure
 g2 | Event log visualization tool
- gmod | perf related?
 perftool | Performance tool
@ref plugins | VPP bundled plugins directory
@ref svm | Shared virtual memory allocation library
@@ -43,7 +48,7 @@ Directory name | Description
 ignore any @@ref labels in the above table.)
 
 
-## Getting started.
+## Getting started
 
 In general anyone interested in building, developing or running VPP should
 consult the [VPP wiki](https://wiki.fd.io/view/VPP) for more complete
@@ -57,7 +62,7 @@ coverage of the topic.
 
 For the impatient, some salient information is distilled below.
 
-### Quick-start: On an existing Linux host.
+### Quick-start: On an existing Linux host
 
 To install system dependencies, build VPP and then install it, simply run the
 build script. This should be performed by a non-privileged user with `sudo`
@@ -73,7 +78,7 @@ interest. To see the available targets run:
 
     make
 
-### Quick-start: Vagrant.
+### Quick-start: Vagrant
 
 The directory `build-root/vagrant` contains a `VagrantFile` and supporting
 scripts to bootstrap a working VPP inside a Vagrant-managed Virtual Machine.
@@ -88,7 +93,7 @@ for this can be found [on the Setting up Vagrant wiki page]
 (https://wiki.fd.io/view/DEV/Setting_Up_Vagrant).
 
 
-## More information.
+## More information
 
 Several modules provide documentation, see @subpage user_doc for more
 information.
diff --git a/RELEASE.md b/RELEASE.md
new file mode 100644
index 00000000..3ed72d06
--- /dev/null
+++ b/RELEASE.md
@@ -0,0 +1,200 @@
+# Release Notes {#release_notes}
+
+* @subpage release_notes_1701
+* @subpage release_notes_1609
+* @subpage release_notes_1606
+
+@page release_notes_1701 Release notes for VPP 17.01
+
+@note This release was for a while known as 16.12.
+@todo Release 17.01 needs release notes.
+
+## Features
+
+## Known issues
+
+## Issues fixed
+
+
+@page release_notes_1609 Release notes for VPP 16.09
+
+## Features
+
+- [Integrated July 2016 DPDK release](http://www.dpdk.org/doc/guides/rel_notes/release_16_07.html)
+  - DPDK-vhost is deprecated pending a complete rework of the original integration and
+    addressing of rx performance deltas.
+  - Patches required for DPDK 16.07:
+    - Correctly setting the Packet Type in the IGB, IXGBE and i40e drivers.
+    - Correctly setting checksum in the i40e driver.
+    - NXP DPAA2 PMD Driver.
+    - rte_delay (yield) functionality.
+
+- Add “in tree” plugins:
+  - IPv6 ILA.
+  - iOAM.
+  - Load Balancer.
+  - SNAT.
+
+- High-performance (line-rate) “neutron like” L4 port-filtering.
+
+- API refactoring - addressing some of the issues around JVPP bindings.
+  - Accommodating plugins [(e.g.
NSH_SFC)](https://wiki.fd.io/view/NSH_SFC) + - Binding for [python](https://wiki.fd.io/view/VPP/Python_API) + +- LISP + - L2 LISP overlays + - Multitenancy + - Multihoming + - RTR mode + - Map-resolver failover algorithm + +- Support 64-bit vector lengths, huge shared-memory segments. + +- Dynamic IP Feature ordering + - IP Features can now specify features they appear before and after + +- 16.09 Builds + - Ubuntu 14.04 LTS - Trusty Tahr + - Ubuntu 16.04 LTS - Xenial Xerus + - CentOS 7 + - More information on [VPP wiki](https://wiki.fd.io/view/VPP/Installing_VPP_binaries_from_packages) + +- Performance, characterize and document performance for this release + [(more information on CSIT page)](https://wiki.fd.io/view/CSIT) + + - IPv4 and IPv6 Scale - performance tests. + - Bidirectional 10k/100k/1M flows. + - 64B,570B, 1518B,9000B packet sizes. + - IPv6 iACL - performance + - DUT1 and DUT2 are configured with IPv6 routing, two static IPv6 /64 routes and IPv6 iAcl + security whitelist ingress /64 filter entries applied on links. + - TG traffic profile contains two L3 flow-groups (flow-group per direction, 253 flows per + flow-group) with all packets containing Ethernet header, IPv6 header and generated payload. + MAC addresses are matching MAC addresses of the TG node interfaces. + + - L2XC VXLANoIPv4 - performance + - DUT1 and DUT2 are configured with L2 cross-connect. VXLAN tunnels are configured between + L2XCs on DUT1 and DUT2. + - TG traffic profile contains two L3 flow-groups (flow-group per direction, 253 flows per + flow-group) with all packets containing Ethernet header, IPv4 header with IP protocol=61 + and generated payload. MAC addresses are matching MAC addresses of the TG node interfaces. + +- Documentation + - Autogenerated CLI documentation. + - Using doxygen to automate API/Node documentation. + - [(available online)](https://docs.fd.io/vpp/16.09/) + +- Resolved all static analysis issues found by Coverity + - Beginning of 16.09 cycle: 505 issues. + - Release: 0 outstanding issues. + + +## Known issues + +Issues in fd.io are tracked in [JIRA](https://jira.fd.io). + +Issue | Description +--- | --- +VPP-391 | vpp debug version assert appeared in the process of start +VPP-380 | Mapping algorithm compute wrong ea-bits when IPv4 prefix 0.0.0.0/0 +VPP-371 | load_one_plugin:63: Loaded plugin: message from vppctl +VPP-367 | vpp packages need to depend on specific versions of each other +VPP-312 | IP6 FIB gets in indeterminate state by duplicating commands +VPP-224 | Lookup-in-vrf can not be set correctly +VPP-206 | Fix classify table delete +VPP-203 | Fix binary API for reading vpp node graph +VPP-147 | Inconsistent behaviour when adding L2 FIB filter entry +VPP-99 | VPP doesn't discard DHCPOFFER message with wrong XID + + +## Issues fixed + +Issues in fd.io are tracked in [JIRA](https://jira.fd.io). 
+ +Issue | Description +--- | --- +VPP-396 | Ubuntu systems Graphviz bug +VPP-390 | vpp-lib rpm fails to include *.so symlinks, causing linking problems with out of tree builds +VPP-388 | IPSec output feature assumes packets have been ethernet rewritten +VPP-385 | ARP for indirect adjacencies not working correctly +VPP-361 | Memory leak on delete of VXLAN over IPv6 tunnel +VPP-357 | VNI not set correctly when removing LISP fwd entries +VPP-349 | sw_interface_vhost_user_dump not working +VPP-345 | net/enic: bad L4 checksum ptype set on ICMP packets +VPP-340 | MAP-T wrong destination address +VPP-330 | Use fifo to store LISP pending map-requests +VPP-326 | map_add_domain VAT command: unable to configure domain with mtu parameter +VPP-318 | The map_add_domain VAT command accepts invalid arguments +VPP-315 | Fix "show vxlan-gpe" issue +VPP-310 | Mapping algorithm compute wrong ea-bits +VPP-239 | LISP IP forwarding does not tag packets that hit negative mapping entries +VPP-235 | Invalid help in VAT for sw_interface_set_l2_bridge +VPP-228 | Mapping algorithm sends packet to wrong IPv6 address +VPP-214 | vpp-api-test: api_ipsec_sad_add_del_entry: vector "ck" not initialized +VPP-200 | VPP - TAP port create problem +VPP-189 | Coverity Issues for 16.09 +VPP-184 | u16 translating to char ,not short +VPP-179 | Adjacency share-count botch +VPP-163 | "show ip6 interface" ignores non-global addresses +VPP-155 | Netmap: Inconsistency in interface state between "show hardware" and "show interface" +VPP-145 | Dynamically compute IP feature ordering based on constraints +VPP-137 | VPP sends ARP with wrong requested IP +VPP-118 | JVpp: 0 length arrays not handled properly in VPP responses +VPP-112 | linux kernel info missing from build log +VPP-110 | vxlan encap node should never touch a deleted tunnel +VPP-107 | RPM build broken in master +VPP-92 | segment routing is not properly filling out the segment list +VPP-91 | segment routing add/del tunnel lookup doesn't work +VPP-84 | af_packet throws a fatal error on EAGAIN +VPP-74 | Clang compile fails due to warning in vlib/unix/cli.c +VPP-64 | Top level "make pkg-deb" fails if CDPATH is set in user env. +VPP-48 | Traceroute does not terminate when VPP is the target +VPP-23 | CLI pager does not gracefully handle lines longer than the terminal width + + +@page release_notes_1606 Release notes for VPP 16.06 + + +The FD.io Project, relentlessly focused on data IO speed and efficiency +supporting the creation of high performance, flexible, and scalable software +defined infrastructures, announces the availability of the community’s first +software release (16.06). + +In the four months since launching, FD.io has brought together more than 75 +developers from 11 different companies including network operators, solution +providers chip vendors, and network equipment vendors who are collaborating to +enhance and innovate around the Vector Packet Processing (VPP) technology. The +FD.io community has quickly formed to grow the number of projects from the +initial VPP project to an additional 6 projects addressing a diverse set of +requirements and usability across a variety of deployment environments. + +The 16.06 release brings unprecedented performance: 480Gbps/200mpps with 8 +million routes and 2k whitelist entries on standard high volume x86 servers. + + +## Features + +In addition to the existing full suite of vswitch/vrouter features, the new +16.06 release adds: + +* Enhanced Switching and Routing: + * IPv6 Segment Routing multicast support. 
* LISP xTR support.
+  * VXLAN over IPv6 underlay.
+  * Per interface whitelists.
+  * Shared adjacencies in FIB.
+
+* New and improved interface support:
+  * Jumbo frame support for vhost-user.
+  * Netmap interface support.
+  * AF_Packet interface support.
+
+* Expanded and improved programmability:
+  * Python API bindings.
+  * Enhanced JVPP Java API bindings.
+  * Debugging CLI.
+
+* Expanded Hardware and Software Support:
+  * Support for ARM 32 targets including the Raspberry Pi single-board computer.
+  * Support for DPDK 16.04.
+
diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md
index b08ac86a..48a9fff3 100644
--- a/doxygen/user_doc.md
+++ b/doxygen/user_doc.md
@@ -3,6 +3,7 @@ User Configuration Examples {#user_doc}
 
 Several modules provide operational documentation.
 
+- [GUI guided user demo](https://wiki.fd.io/view/VPP_Sandbox/vpp-userdemo)
 - @subpage ipsec_gre_doc
 - @subpage map_doc
 - @subpage lldp_doc
-- cgit 1.2.3-korg


From fdc62abdc113ea63dc867375bd49ef3043dcd290 Mon Sep 17 00:00:00 2001
From: Chris Luke
Date: Wed, 28 Dec 2016 09:44:47 -0500
Subject: Repair Doxygen build infrastructure

After Gerrit 4430 much of the documentation failed to build, but
silently, so it was easily missed; similarly, it went unnoticed
that several paths had been broken for a while.

- Correct paths after directory tree changes.
- Doxygen now bails when input paths don't exist.
- Fix up some of the less deranged entries in the documentation index.
- Exclude the LUA tree; its documentation is a mess.

Change-Id: I35e6b433feee5e05bca772d93aa1635c724db734
Signed-off-by: Chris Luke
---
 README.md                 | 30 +++++++++++-----------
 doxygen/Makefile          | 38 +++++++++++++++++++--------
 doxygen/user_doc.md       |  2 ++
 src/vnet/bfd/bfd_doc.md   |  4 ++-
 src/vnet/span/span.md     | 65 -----------------------------------------------
 src/vnet/span/span_doc.md | 65 +++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 112 insertions(+), 92 deletions(-)
 delete mode 100644 src/vnet/span/span.md
 create mode 100644 src/vnet/span/span_doc.md

(limited to 'README.md')

diff --git a/README.md b/README.md
index cf43a4df..8520b395 100644
--- a/README.md
+++ b/README.md
@@ -30,22 +30,22 @@ Directory name | Description
 build-root | Build output directory
 doxygen | Documentation generator configuration
 dpdk | DPDK patches and build infrastructure
- g2 | Event log visualization tool
- perftool | Performance tool
-@ref plugins | VPP bundled plugins directory
-@ref svm | Shared virtual memory allocation library
+@ref plugins | Not-yet-relocated VPP bundled plugins directory
+@ref src | VPP source code
+@ref src/plugins | VPP bundled plugins directory
+@ref src/svm | Shared virtual memory allocation library
+ src/tests | Unit tests
+ src/vat | VPP API test program
+@ref src/vlib | VPP application library
+@ref src/vlibapi | VPP API library
+@ref src/vlibmemory | VPP Memory management
+@ref src/vlibsocket | VPP Socket I/O
+@ref src/vnet | VPP networking
+@ref src/vpp | VPP application
+@ref src/vpp-api | VPP application API bindings
+@ref src/vppinfra | VPP core library
 test | Unit tests
-@ref vlib | VPP application library source
-@ref vlib-api | VPP API library source
-@ref vnet | VPP networking source
-@ref vpp | VPP application source
-@ref vpp-api | VPP application API source
- vppapigen | VPP API generator source
- vpp-api-test | VPP API test program source
-@ref vppinfra | VPP core library source
-
-(If the page you are viewing is not generated by Doxygen then
-ignore any @@ref labels in the above table.)
+@ref vpp-api | Not-yet-relocated API bindings ## Getting started diff --git a/doxygen/Makefile b/doxygen/Makefile index 92fa3635..ffce3c13 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -42,13 +42,17 @@ DOC_MAC_PY_DEPENDS = pyparsing jinja2 DOXY_DIR ?= $(WS_ROOT)/doxygen # Primary source directories +DOXY_SRC ?= src DOXY_SRC_DIRECTORIES = \ - vppinfra \ - svm \ - vlib \ - vlib-api \ - vnet \ - vpp \ + $(DOXY_SRC)/vppinfra \ + $(DOXY_SRC)/svm \ + $(DOXY_SRC)/vlib \ + $(DOXY_SRC)/vlibapi \ + $(DOXY_SRC)/vlibmemory \ + $(DOXY_SRC)/vlibsocket \ + $(DOXY_SRC)/vnet \ + $(DOXY_SRC)/vpp \ + $(DOXY_SRC)/vpp-api \ vpp-api # Input directories and files @@ -56,6 +60,7 @@ DOXY_INPUT ?= \ $(wildcard $(WS_ROOT)/*.md) \ $(wildcard $(DOXY_DIR)/*.md) \ $(DOXY_SRC_DIRECTORIES) \ + $(DOXY_SRC)/plugins \ plugins # Strip leading workspace path from input names @@ -67,9 +72,10 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # there's a DPDK equivalent that conflicts. # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ - vlib/vlib/buffer.c \ - vlib/example \ - plugins/sample-plugin + $(DOXY_SRC)/vlib/vlib/buffer.c \ + $(DOXY_SRC)/vlib/example \ + plugins/sample-plugin \ + vpp-api/lua # Generate a regexp for filenames to exclude DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | sed -e 's/ /|/g'))) @@ -164,9 +170,11 @@ endif bootstrap-doxygen: $(BR)/.doxygen-bootstrap.ok .DELETE_ON_ERROR: $(BR)/.doxygen-siphon.dep -$(BR)/.doxygen-siphon.dep: Makefile +$(BR)/.doxygen-siphon.dep: Makefile \ + $(addprefix,$(WSROOT),$(DOXY_INPUT)) @echo "Building siphon dependencies..." - @set -e; rm -f "$@"; for input in $(DOXY_INPUT); do \ + @rm -f "$@"; for input in $(DOXY_INPUT); do \ + [ -e "$(WS_ROOT)/$$input" ] && \ find "$(WS_ROOT)/$$input" -type f \ \( -name '*.[ch]' -or -name '*.dox' \) -print \ | grep -v -E '^$(WS_ROOT)/$(DOXY_EXCLUDE_REGEXP)' \ @@ -182,7 +190,15 @@ $(BR)/.doxygen-siphon.dep: Makefile .NOTPARALLEL: $(SIPHON_FILES) $(SIPHON_FILES): $(BR)/.doxygen-bootstrap.ok \ $(DOXY_DIR)/siphon-generate \ + $(addprefix,$(WSROOT),$(DOXY_INPUT)) \ $(wildcard $(DOXY_DIR)/siphon/*.py) + @echo "Validating source tree..." + @set -e; for input in $(DOXY_INPUT); do \ + if [ ! -e "$(WS_ROOT)/$$input" ]; then \ + echo "ERROR: Input path '$$input' does not exist." >&2; \ + exit 1; \ + fi; \ + done @rm -rf "$(SIPHON_INPUT)" "$(SIPHON_OUTPUT)" @mkdir -p "$(SIPHON_INPUT)" "$(SIPHON_OUTPUT)" @touch $(SIPHON_INPUT)/files diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 2e87c877..40303439 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -12,3 +12,5 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage ioam_plugin_doc - @subpage lb_plugin_doc - @subpage flowperpkt_plugin_doc +- @subpage span_doc +- @subpage bfd_doc diff --git a/src/vnet/bfd/bfd_doc.md b/src/vnet/bfd/bfd_doc.md index 1333ed77..3e86b178 100644 --- a/src/vnet/bfd/bfd_doc.md +++ b/src/vnet/bfd/bfd_doc.md @@ -1 +1,3 @@ -TODO +# BFD Notes {#bfd_doc} + +@todo Someone needs to produce this or remove the stub file. diff --git a/src/vnet/span/span.md b/src/vnet/span/span.md deleted file mode 100644 index ee3f814f..00000000 --- a/src/vnet/span/span.md +++ /dev/null @@ -1,65 +0,0 @@ -# VPP SPAN implementation - -This is a memo intended to contain documentation of the VPP SPAN implementation. -Everything that is not directly obvious should come here. 
-
-
-## Switched Port Analyzer (SPAN)
-Port mirroring is used on a network switch to send a copy of network packets seen on one switch port to a network monitoring connection on another switch port.
-Can be used by network engineers or administrators to measure performnce, analyze and debug data or diagnose errors on a network.
-
-### RX traffic node
-There is one static node to mirror incomming packets.
-* span-input: Creates a copy of incomming buffer due to incomming buffers can be reused internally.
-
-Chaining: dpdk-input -> span-input ->
-* original buffer is sent to ethernet-input for processing
-* buffer copy is sent to interface-output
-
-### Configuration
-SPAN supports the following CLI configuration commands:
-
-#### Enable/Disable SPAN (CLI)
- set interface span [disable | destination ]
-
-: mirrored interface name
-destination : monitoring interface name
-disable: delete mirroring
-
-#### Enable/Disabl SPAN (API)
-SPAN supports the following API configuration command:
- sw_interface_span_enable_disable src GigabitEthernet0/8/0 dst GigabitEthernet0/9/0
- sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2
-
-src/src_sw_if_index: mirrored interface name
-dst/dst_sw_if_index: monitoring interface name
-
-#### Remove SPAN entry (API)
-SPAN supports the following API configuration command:
- sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 disable
-
-src_sw_if_index: mirrored interface name
-dst_sw_if_index: monitoring interface name
-
-### Configuration example
-
-Mirror all packets on interface GigabitEthernet0/10/0 to interface GigabitEthernet0/11/0.
-
-Configure IPv4 addresses on mirrored interface:
-set interface ip address GigabitEthernet0/10/0 192.168.1.13/24
-set interface state GigabitEthernet0/10/0 up
-
-Configure IPv4 addresses on monitoring interface:
-set interface ip address GigabitEthernet0/11/0 192.168.2.13/24
-set interface state GigabitEthernet0/11/0 up
-
-Configure SPAN
-set span src GigabitEthernet0/10/0 dst GigabitEthernet0/11/0
-
-### Operational data
-
-Active SPAN mirroring CLI show command:
- show interfaces span
-
-Active SPAN mirroring API dump command:
- sw_interface_span_dump
diff --git a/src/vnet/span/span_doc.md b/src/vnet/span/span_doc.md
new file mode 100644
index 00000000..46480b28
--- /dev/null
+++ b/src/vnet/span/span_doc.md
@@ -0,0 +1,65 @@
+# VPP SPAN implementation {#span_doc}
+
+This is a memo intended to contain documentation of the VPP SPAN implementation.
+Everything that is not directly obvious should come here.
+
+
+## Switched Port Analyzer (SPAN)
+Port mirroring is used on a network switch to send a copy of network packets seen on one switch port to a network monitoring connection on another switch port.
+It can be used by network engineers or administrators to measure performance, analyze and debug data, or diagnose errors on a network.
+
+### RX traffic node
+There is one static node to mirror incoming packets.
+* span-input: Creates a copy of the incoming buffer, because incoming buffers can be reused internally (see the sketch below).
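The copy-then-forward step in span-input can be sketched as follows. This is a hypothetical illustration rather than the node's real batched processing loop; it assumes VPP's vlib_buffer_copy() and vlib_get_buffer_index() helpers behave as their names suggest, and it omits error counters:

    /*
     * Hypothetical sketch of the span-input copy step. Error handling
     * and the node's real multi-buffer loop are omitted; the function
     * name is illustrative.
     */
    #include <vlib/vlib.h>
    #include <vnet/vnet.h>

    static inline void
    span_sketch_mirror_one (vlib_main_t * vm, u32 bi0,
                            u32 mirror_sw_if_index,
                            u32 * mirror_buffers, u32 * n_mirror)
    {
      vlib_buffer_t *b0 = vlib_get_buffer (vm, bi0);
      vlib_buffer_t *c0 = vlib_buffer_copy (vm, b0);    /* deep copy */

      if (c0)
        {
          /* Aim the copy at the monitoring interface; interface-output
             reads the TX interface from the buffer metadata. */
          vnet_buffer (c0)->sw_if_index[VLIB_TX] = mirror_sw_if_index;
          mirror_buffers[(*n_mirror)++] = vlib_get_buffer_index (vm, c0);
        }
      /* b0 itself continues unchanged towards ethernet-input. */
    }

The copy is necessary because once the original buffer continues through the graph it may be rewritten or freed; mirroring by reference instead of by copy would race with that.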
+
+Chaining: dpdk-input -> span-input ->
+* original buffer is sent to ethernet-input for processing
+* buffer copy is sent to interface-output
+
+### Configuration
+SPAN supports the following CLI configuration commands:
+
+#### Enable/Disable SPAN (CLI)
+    set interface span <interface> [disable | destination <interface>]
+
+<interface>: mirrored interface name
+destination <interface>: monitoring interface name
+disable: delete mirroring
+
+#### Enable/Disable SPAN (API)
+SPAN supports the following API configuration command:
+    sw_interface_span_enable_disable src GigabitEthernet0/8/0 dst GigabitEthernet0/9/0
+    sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2
+
+src/src_sw_if_index: mirrored interface name
+dst/dst_sw_if_index: monitoring interface name
+
+#### Remove SPAN entry (API)
+SPAN supports the following API configuration command:
+    sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 disable
+
+src_sw_if_index: mirrored interface name
+dst_sw_if_index: monitoring interface name
+
+### Configuration example
+
+Mirror all packets on interface GigabitEthernet0/10/0 to interface GigabitEthernet0/11/0.
+
+Configure IPv4 addresses on the mirrored interface:
+    set interface ip address GigabitEthernet0/10/0 192.168.1.13/24
+    set interface state GigabitEthernet0/10/0 up
+
+Configure IPv4 addresses on the monitoring interface:
+    set interface ip address GigabitEthernet0/11/0 192.168.2.13/24
+    set interface state GigabitEthernet0/11/0 up
+
+Configure SPAN:
+    set span src GigabitEthernet0/10/0 dst GigabitEthernet0/11/0
+
+### Operational data
+
+Active SPAN mirroring CLI show command:
+    show interfaces span
+
+Active SPAN mirroring API dump command:
+    sw_interface_span_dump
-- cgit 1.2.3-korg


From c8c5335e3a1d1ba7ec1248b0ecbf00cc74ea57bb Mon Sep 17 00:00:00 2001
From: Dave Wallace
Date: Fri, 17 Feb 2017 00:10:53 -0500
Subject: Add ref to test framework docs in doxygen output.

Change-Id: If3081c4a9dde00cd522d1fc5a7daa9b1849684bf
Signed-off-by: Dave Wallace
---
 README.md                     | 8 ++++++--
 doxygen/Makefile              | 6 +-----
 doxygen/test_framework_doc.md | 7 +++++++
 3 files changed, 14 insertions(+), 7 deletions(-)
 create mode 100644 doxygen/test_framework_doc.md

(limited to 'README.md')

diff --git a/README.md b/README.md
index 8520b395..b70103fb 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,6 @@ Directory name | Description
 build-root | Build output directory
 doxygen | Documentation generator configuration
 dpdk | DPDK patches and build infrastructure
-@ref plugins | Not-yet-relocated VPP bundled plugins directory
 @ref src | VPP source code
 @ref src/plugins | VPP bundled plugins directory
 @ref src/svm | Shared virtual memory allocation library
@@ -45,7 +44,7 @@ Directory name | Description
 @ref src/vpp-api | VPP application API bindings
 @ref src/vppinfra | VPP core library
 test | Unit tests
-@ref vpp-api | Not-yet-relocated API bindings
+@ref src/vpp/api | Not-yet-relocated API bindings


 ## Getting started
@@ -100,3 +99,8 @@ Visit the [VPP wiki](https://wiki.fd.io/view/VPP) for details on more
 advanced building strategies and development notes.
+
+
+## Test Framework
+
+There is PyDoc-generated documentation available for the VPP test framework. See @subpage test_framework_doc for details.
diff --git a/doxygen/Makefile b/doxygen/Makefile index 069d97fb..973ca07f 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -82,10 +82,6 @@ DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | se # Include all the normal source directories in the include file path DOXY_INCLUDE_PATH = $(DOXY_SRC_DIRECTORIES) -# Also include any plugin directories that exist -DOXY_INCLUDE_PATH += \ - $(shell find $(WS_ROOT)/plugins -maxdepth 1 -type d | sed -e 's@^$(WS_ROOT)/*@@') - # Find API header directories and include them in the header path. # This is only useful if VPP and plugins are already built; nothing # here depends on those targets. We don't build documentation for these @@ -140,7 +136,7 @@ ifeq ($(OS_ID),ubuntu) sudo apt-get $(CONFIRM) $(FORCE) install $$inst; \ fi @if [ ! -s /usr/lib/graphviz/config6a ]; then \ - echo "Rebuidlding system Graphviz configuration."; \ + echo "Rebuilding system Graphviz configuration."; \ sudo dot -c; \ fi else ifneq ("$(wildcard /etc/redhat-release)","") diff --git a/doxygen/test_framework_doc.md b/doxygen/test_framework_doc.md new file mode 100644 index 00000000..b2873c5f --- /dev/null +++ b/doxygen/test_framework_doc.md @@ -0,0 +1,7 @@ +Test Framework Documentation {#test_framework_doc} +============================ + +PyDoc generated documentation for the "make test" framework is available for the following releases + +- [Test framework documentation for VPP 17.04](https://docs.fd.io/vpp/17.04/vpp_make_test/html) +- [Test framework documentation for VPP 17.01](https://docs.fd.io/vpp/17.01/vpp_make_test/html) -- cgit 1.2.3-korg From 48009e4c5732180ecc951ac6359b9b96d38b8dc2 Mon Sep 17 00:00:00 2001 From: Damjan Marion Date: Thu, 20 Apr 2017 11:43:56 +0200 Subject: Move vagrant stuff to extras/ Change-Id: I7e3d6ecc3f23d862004c273e23e36e234ceb6015 Signed-off-by: Damjan Marion --- README.md | 4 +- build-root/vagrant | 1 + build-root/vagrant/.gitignore | 1 - build-root/vagrant/README | 28 --------- build-root/vagrant/Vagrantfile | 113 ---------------------------------- build-root/vagrant/WELCOME | 61 ------------------ build-root/vagrant/build.sh | 74 ---------------------- build-root/vagrant/clearinterfaces.sh | 17 ----- build-root/vagrant/env.sh | 6 -- build-root/vagrant/install.sh | 30 --------- build-root/vagrant/run.sh | 23 ------- build-root/vagrant/update.sh | 48 --------------- extras/vagrant/.gitignore | 1 + extras/vagrant/README | 28 +++++++++ extras/vagrant/Vagrantfile | 113 ++++++++++++++++++++++++++++++++++ extras/vagrant/WELCOME | 61 ++++++++++++++++++ extras/vagrant/build.sh | 74 ++++++++++++++++++++++ extras/vagrant/clearinterfaces.sh | 17 +++++ extras/vagrant/env.sh | 6 ++ extras/vagrant/install.sh | 30 +++++++++ extras/vagrant/run.sh | 23 +++++++ extras/vagrant/update.sh | 48 +++++++++++++++ 22 files changed, 404 insertions(+), 403 deletions(-) create mode 120000 build-root/vagrant delete mode 100644 build-root/vagrant/.gitignore delete mode 100644 build-root/vagrant/README delete mode 100644 build-root/vagrant/Vagrantfile delete mode 100644 build-root/vagrant/WELCOME delete mode 100755 build-root/vagrant/build.sh delete mode 100755 build-root/vagrant/clearinterfaces.sh delete mode 100644 build-root/vagrant/env.sh delete mode 100644 build-root/vagrant/install.sh delete mode 100755 build-root/vagrant/run.sh delete mode 100755 build-root/vagrant/update.sh create mode 100644 extras/vagrant/.gitignore create mode 100644 extras/vagrant/README create mode 100644 extras/vagrant/Vagrantfile create mode 100644 
extras/vagrant/WELCOME create mode 100755 extras/vagrant/build.sh create mode 100755 extras/vagrant/clearinterfaces.sh create mode 100644 extras/vagrant/env.sh create mode 100644 extras/vagrant/install.sh create mode 100755 extras/vagrant/run.sh create mode 100755 extras/vagrant/update.sh (limited to 'README.md') diff --git a/README.md b/README.md index b70103fb..7f429d12 100644 --- a/README.md +++ b/README.md @@ -67,7 +67,7 @@ To install system dependencies, build VPP and then install it, simply run the build script. This should be performed a non-privileged user with `sudo` access from the project base directory: - ./build-root/vagrant/build.sh + ./extras/vagrant/build.sh If you want a more fine-grained approach because you intend to do some development work, the `Makefile` in the root directory of the source tree @@ -79,7 +79,7 @@ interest. To see the available targets run: ### Quick-start: Vagrant -The directory `build-root/vagrant` contains a `VagrantFile` and supporting +The directory `extras/vagrant` contains a `VagrantFile` and supporting scripts to bootstrap a working VPP inside a Vagrant-managed Virtual Machine. This VM can then be used to test concepts with VPP or as a development platform to extend VPP. Some obvious caveats apply when using a VM for VPP diff --git a/build-root/vagrant b/build-root/vagrant new file mode 120000 index 00000000..ada25b83 --- /dev/null +++ b/build-root/vagrant @@ -0,0 +1 @@ +../extras/vagrant \ No newline at end of file diff --git a/build-root/vagrant/.gitignore b/build-root/vagrant/.gitignore deleted file mode 100644 index a977916f..00000000 --- a/build-root/vagrant/.gitignore +++ /dev/null @@ -1 +0,0 @@ -.vagrant/ diff --git a/build-root/vagrant/README b/build-root/vagrant/README deleted file mode 100644 index 238c90ce..00000000 --- a/build-root/vagrant/README +++ /dev/null @@ -1,28 +0,0 @@ -INTRO: - -This is a vagrant environment for VPP. 
- -VPP currently works under Linux and has support for: - -- Ubuntu 14.04, Ubuntu 16.04 and Centos7.2 - -The VM builds VPP from source which can be located at /vpp - -VM PARTICULARS: -This vagrant environment creates a VM based on environment variables found in ./env.sh -To use, edit env.sh then - source ./env.sh - vagrant up - -By default, the VM created is/has: -- Ubuntu 14.04 -- 2 vCPUs -- 4G of RAM -- 2 NICs (1 x NAT - host access, 1 x VPP DPDK enabled) - -PROVIDERS: - -Supported vagrant providers are: - -- Virtualbox, VMware Fusion/Workstation, Libvirt - diff --git a/build-root/vagrant/Vagrantfile b/build-root/vagrant/Vagrantfile deleted file mode 100644 index b463d646..00000000 --- a/build-root/vagrant/Vagrantfile +++ /dev/null @@ -1,113 +0,0 @@ -# -*- mode: ruby -*- -# vi: set ft=ruby : - -Vagrant.configure(2) do |config| - - # Pick the right distro and bootstrap, default is ubuntu1604 - distro = ( ENV['VPP_VAGRANT_DISTRO'] || "ubuntu1604") - if distro == 'centos7' - config.vm.box = "puppetlabs/centos-7.2-64-nocm" - config.ssh.insert_key = false - elsif distro == 'ubuntu1404' - config.vm.box = "puppetlabs/ubuntu-14.04-64-nocm" - else - config.vm.box = "puppetlabs/ubuntu-16.04-64-nocm" - end - config.vm.box_check_update = false - - config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"update.sh") - config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"build.sh"), :args => "/vpp vagrant" - - post_build = ( ENV['VPP_VAGRANT_POST_BUILD'] ) - if post_build == "test" - config.vm.provision "shell", inline: "echo Testing VPP; cd /vpp; make test" - elsif post_build == "install" - config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"install.sh"), :args => "/vpp" - config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"clearinterfaces.sh") - config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"run.sh") - end - - # Add .gnupg dir in so folks can sign patches - # Note, as gnupg puts socket files in that dir, we have - # to be cautious and make sure we are dealing with a plain file - homedir = File.expand_path("~/") - Dir["#{homedir}/.gnupg/**/*"].each do |fname| - if File.file?(fname) - destname = fname.sub(Regexp.escape("#{homedir}/"),'') - config.vm.provision "file", source: fname, destination: destname - end - end - - # Copy in the .gitconfig if it exists - if File.file?(File.expand_path("~/.gitconfig")) - config.vm.provision "file", source: "~/.gitconfig", destination: ".gitconfig" - end - - # vagrant-cachier caches apt/yum etc to speed subsequent - # vagrant up - # to enable, run - # vagrant plugin install vagrant-cachier - # - if Vagrant.has_plugin?("vagrant-cachier") - config.cache.scope = :box - end - - # Define some physical ports for your VMs to be used by DPDK - nics = (ENV['VPP_VAGRANT_NICS'] || "2").to_i(10) - for i in 1..nics - config.vm.network "private_network", type: "dhcp" - end - - # use http proxy if avaiable - if ENV['http_proxy'] && Vagrant.has_plugin?("vagrant-proxyconf") - config.proxy.http = ENV['http_proxy'] - config.proxy.https = ENV['https_proxy'] - config.proxy.no_proxy = "localhost,127.0.0.1" - end - - vmcpu=(ENV['VPP_VAGRANT_VMCPU'] || 2) - vmram=(ENV['VPP_VAGRANT_VMRAM'] || 4096) - - config.ssh.forward_agent = true - config.ssh.forward_x11 = true - - config.vm.provider "virtualbox" do |vb| - vb.customize ["modifyvm", :id, "--ioapic", "on"] - vb.memory = "#{vmram}" - vb.cpus = "#{vmcpu}" - - # rsync the vpp directory if provision hasn't happened yet - unless File.exist? 
(".vagrant/machines/default/virtualbox/action_provision") - config.vm.synced_folder "../../", "/vpp", type: "rsync", - rsync__auto: false, - rsync__exclude: [ - "build-root/build*/", - "build-root/install*/", - "build-root/images*/", - "build-root/*.deb", - "build-root/*.rpm", - "build-root/*.changes", - "build-root/python", - "build-root/deb/debian/*.dkms", - "build-root/deb/debian/*.install", - "build-root/deb/debian/changes", - "build-root/tools"] - end - - #support for the SSE4.x instruction is required in some versions of VB. - vb.customize ["setextradata", :id, "VBoxInternal/CPUM/SSE4.1", "1"] - vb.customize ["setextradata", :id, "VBoxInternal/CPUM/SSE4.2", "1"] - end - config.vm.provider "vmware_fusion" do |fusion,override| - fusion.vmx["memsize"] = "#{vmram}" - fusion.vmx["numvcpus"] = "#{vmcpu}" - end - config.vm.provider "libvirt" do |lv| - lv.memory = "#{vmram}" - lv.cpus = "#{vmcpu}" - end - config.vm.provider "vmware_workstation" do |vws,override| - vws.vmx["memsize"] = "#{vmram}" - vws.vmx["numvcpus"] = "#{vmcpu}" - end -end diff --git a/build-root/vagrant/WELCOME b/build-root/vagrant/WELCOME deleted file mode 100644 index eb6aa2fd..00000000 --- a/build-root/vagrant/WELCOME +++ /dev/null @@ -1,61 +0,0 @@ -VPP has now been built, installed, and started. - -To give it a spin, we can create a tap interface and try a simple ping -(with trace). - -Make sure you have run: - -$ vagrant ssh - -To get to the vagrant VM: - -vagrant@localhost:~$ - -Confirm that vpp is running with - -vagrant@localhost:~$ sudo status vpp -vpp start/running, process 25202 - -To create the tap: - -vagrant@localhost:~$ sudo vppctl tap connect foobar -Created tap-0 for Linux tap 'foobar' -vagrant@localhost:~$ sudo vppctl show int - -To assign it an ip address (and 'up' the interface): - -vagrant@localhost:~$ sudo vppctl set int ip address tap-0 192.168.1.1/24 -vagrant@localhost:~$ sudo vppctl set int state tap-0 up - -To turn on packet tracing for the tap interface: -vagrant@localhost:~$ sudo vppctl trace add tapcli-rx 10 - -Now, to set up and try the other end: -vagrant@localhost:~$ sudo ip addr add 192.168.1.2/24 dev foobar -vagrant@localhost:~$ ping -c 3 192.168.1.1 - -To look at the trace: -vagrant@localhost:~$ sudo vppctl show trace - -And to stop tracing: - -vagrant@localhost:~$ sudo vppctl clear trace - -Other fun things to look at: - -The vlib packet processing graph: -vagrant@localhost:~$ sudo vppctl show vlib graph - -which will produce output like: - - Name Next Previous -ip4-icmp-input error-punt [0] ip4-local - ip4-icmp-echo-request [1] - vpe-icmp4-oam [2] - -To read this, the first column (Name) is the name of the node. -The second column (Next) is the name of the children of that node. -The third column (Previous) is the name of the parents of this node. - -To see this README again: -cat /vagrant/README diff --git a/build-root/vagrant/build.sh b/build-root/vagrant/build.sh deleted file mode 100755 index 76838e28..00000000 --- a/build-root/vagrant/build.sh +++ /dev/null @@ -1,74 +0,0 @@ -#!/bin/bash - -# Get Command Line arguements if present -VPP_DIR=$1 -if [ "x$1" != "x" ]; then - VPP_DIR=$1 -else - VPP_DIR=`dirname $0`/../../ -fi - -if [ "x$2" != "x" ]; then - SUDOCMD="sudo -H -u $2" -fi -echo 0:$0 -echo 1:$1 -echo 2:$2 -echo VPP_DIR: $VPP_DIR -echo SUDOCMD: $SUDOCMD - -# Figure out what system we are running on -if [ -f /etc/lsb-release ];then - . 
/etc/lsb-release -elif [ -f /etc/redhat-release ];then - sudo yum install -y redhat-lsb - DISTRIB_ID=`lsb_release -si` - DISTRIB_RELEASE=`lsb_release -sr` - DISTRIB_CODENAME=`lsb_release -sc` - DISTRIB_DESCRIPTION=`lsb_release -sd` -fi -KERNEL_OS=`uname -o` -KERNEL_MACHINE=`uname -m` -KERNEL_RELEASE=`uname -r` -KERNEL_VERSION=`uname -v` - -echo KERNEL_OS: $KERNEL_OS -echo KERNEL_MACHINE: $KERNEL_MACHINE -echo KERNEL_RELEASE: $KERNEL_RELEASE -echo KERNEL_VERSION: $KERNEL_VERSION -echo DISTRIB_ID: $DISTRIB_ID -echo DISTRIB_RELEASE: $DISTRIB_RELEASE -echo DISTRIB_CODENAME: $DISTRIB_CODENAME -echo DISTRIB_DESCRIPTION: $DISTRIB_DESCRIPTION - -# Install dependencies -cd $VPP_DIR -make UNATTENDED=yes install-dep - -# Really really clean things up so we can be sure -# that the build works even when switching distros -$SUDOCMD make wipe -(cd build-root/;$SUDOCMD make distclean) -rm -f build-root/.bootstrap.ok - -if [ $DISTRIB_ID == "CentOS" ]; then - echo rpm -V apr-devel - rpm -V apr-devel - if [ $? != 0 ]; then sudo yum reinstall -y apr-devel;fi - echo rpm -V ganglia-devel - rpm -V ganglia-devel - if [ $? != 0 ]; then sudo yum reinstall -y ganglia-devel;fi - echo rpm -V libconfuse-devel - rpm -V libconfuse-devel - if [ $? != 0 ]; then sudo yum reinstall -y libconfuse-devel;fi -fi - -# Build and install packaging -$SUDOCMD make bootstrap -if [ $DISTRIB_ID == "Ubuntu" ]; then - $SUDOCMD make pkg-deb -elif [ $DISTRIB_ID == "CentOS" ]; then - (cd $VPP_DIR/vnet ;$SUDOCMD aclocal;$SUDOCMD automake -a) - $SUDOCMD make pkg-rpm -fi - diff --git a/build-root/vagrant/clearinterfaces.sh b/build-root/vagrant/clearinterfaces.sh deleted file mode 100755 index 78f6705c..00000000 --- a/build-root/vagrant/clearinterfaces.sh +++ /dev/null @@ -1,17 +0,0 @@ -#!/bin/bash - -# Capture all the interface IPs, in case we need them later -ip -o addr show > ~vagrant/ifconfiga -chown vagrant:vagrant ~vagrant/ifconfiga - -# Disable all ethernet interfaces other than the default route -# interface so VPP will use those interfaces. The VPP auto-blacklist -# algorithm prevents the use of any physical interface contained in the -# routing table (i.e. "route --inet --inet6") preventing the theft of -# the management ethernet interface by VPP from the kernel. -for intf in $(ls /sys/class/net) ; do - if [ -d /sys/class/net/$intf/device ] && - [ "$(route --inet --inet6 | grep default | grep $intf)" == "" ] ; then - ifconfig $intf down - fi -done diff --git a/build-root/vagrant/env.sh b/build-root/vagrant/env.sh deleted file mode 100644 index f0edfd88..00000000 --- a/build-root/vagrant/env.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env bash - -export VPP_VAGRANT_DISTRO="ubuntu1404" -export VPP_VAGRANT_NICS=2 -export VPP_VAGRANT_VMCPU=4 -export VPP_VAGRANT_VMRAM=4096 diff --git a/build-root/vagrant/install.sh b/build-root/vagrant/install.sh deleted file mode 100644 index a53faa4d..00000000 --- a/build-root/vagrant/install.sh +++ /dev/null @@ -1,30 +0,0 @@ -#!/bin/bash - -# Get Command Line arguements if present -VPP_DIR=$1 -if [ "x$1" != "x" ]; then - VPP_DIR=$1 -else - VPP_DIR=`dirname $0`/../../ -fi - -# Figure out what system we are running on -if [ -f /etc/lsb-release ];then - . 
/etc/lsb-release -elif [ -f /etc/redhat-release ];then - sudo yum install -y redhat-lsb - DISTRIB_ID=`lsb_release -si` - DISTRIB_RELEASE=`lsb_release -sr` - DISTRIB_CODENAME=`lsb_release -sc` - DISTRIB_DESCRIPTION=`lsb_release -sd` -fi -echo DISTRIB_ID: $DISTRIB_ID -echo DISTRIB_RELEASE: $DISTRIB_RELEASE -echo DISTRIB_CODENAME: $DISTRIB_CODENAME -echo DISTRIB_DESCRIPTION: $DISTRIB_DESCRIPTION - -if [ $DISTRIB_ID == "Ubuntu" ]; then - (cd ${VPP_DIR}/build-root/;sudo dpkg -i *.deb) -elif [ $DISTRIB_ID == "CentOS" ]; then - (cd ${VPP_DIR}/build-root/;sudo rpm -Uvh *.rpm) -fi \ No newline at end of file diff --git a/build-root/vagrant/run.sh b/build-root/vagrant/run.sh deleted file mode 100755 index 1cd33826..00000000 --- a/build-root/vagrant/run.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/bin/bash - -# Figure out what system we are running on -if [ -f /etc/lsb-release ];then - . /etc/lsb-release -elif [ -f /etc/redhat-release ];then - yum install -y redhat-lsb - DISTRIB_ID=`lsb_release -si` - DISTRIB_RELEASE=`lsb_release -sr` - DISTRIB_CODENAME=`lsb_release -sc` - DISTRIB_DESCRIPTION=`lsb_release -sd` -fi - -if [ $DISTRIB_ID == "CentOS" ]; then - # Install uio-pci-generic - modprobe uio_pci_generic -fi -echo "Starting VPP..." -if [ $DISTRIB_ID == "Ubuntu" ] && [ $DISTRIB_CODENAME = "trusty" ] ; then - start vpp -else - service vpp start -fi diff --git a/build-root/vagrant/update.sh b/build-root/vagrant/update.sh deleted file mode 100755 index f4139d74..00000000 --- a/build-root/vagrant/update.sh +++ /dev/null @@ -1,48 +0,0 @@ -#!/bin/bash - -# Make sure that we get the hugepages we need on provision boot -# Note: The package install should take care of this at the end -# But sometimes after all the work of provisioning, we can't -# get the requested number of hugepages without rebooting. -# So do it here just in case -sysctl -w vm.nr_hugepages=1024 -HUGEPAGES=`sysctl -n vm.nr_hugepages` -if [ $HUGEPAGES != 1024 ]; then - echo "ERROR: Unable to get 1024 hugepages, only got $HUGEPAGES. Cannot finish." - exit -fi - -# Figure out what system we are running on -if [ -f /etc/lsb-release ];then - . 
/etc/lsb-release -elif [ -f /etc/redhat-release ];then - yum install -y redhat-lsb - DISTRIB_ID=`lsb_release -si` - DISTRIB_RELEASE=`lsb_release -sr` - DISTRIB_CODENAME=`lsb_release -sc` - DISTRIB_DESCRIPTION=`lsb_release -sd` -fi - -# Do initial setup for the system -if [ $DISTRIB_ID == "Ubuntu" ]; then - - export DEBIAN_PRIORITY=critical - export DEBIAN_FRONTEND=noninteractive - export DEBCONF_NONINTERACTIVE_SEEN=true - APT_OPTS="--assume-yes --no-install-suggests --no-install-recommends -o Dpkg::Options::=\"--force-confdef\" -o Dpkg::Options::=\"--force-confold\"" - - # Standard update + upgrade dance - apt-get update ${APT_OPTS} >/dev/null - apt-get upgrade ${APT_OPTS} >/dev/null - - # Fix the silly notion that /bin/sh should point to dash by pointing it to bash - - update-alternatives --install /bin/sh sh /bin/bash 100 - - # Install useful but non-mandatory tools - apt-get install -y emacs x11-utils git-review gdb gdbserver -elif [ $DISTRIB_ID == "CentOS" ]; then - # Standard update + upgrade dance - yum check-update - yum update -y -fi diff --git a/extras/vagrant/.gitignore b/extras/vagrant/.gitignore new file mode 100644 index 00000000..a977916f --- /dev/null +++ b/extras/vagrant/.gitignore @@ -0,0 +1 @@ +.vagrant/ diff --git a/extras/vagrant/README b/extras/vagrant/README new file mode 100644 index 00000000..238c90ce --- /dev/null +++ b/extras/vagrant/README @@ -0,0 +1,28 @@ +INTRO: + +This is a vagrant environment for VPP. + +VPP currently works under Linux and has support for: + +- Ubuntu 14.04, Ubuntu 16.04 and Centos7.2 + +The VM builds VPP from source which can be located at /vpp + +VM PARTICULARS: +This vagrant environment creates a VM based on environment variables found in ./env.sh +To use, edit env.sh then + source ./env.sh + vagrant up + +By default, the VM created is/has: +- Ubuntu 14.04 +- 2 vCPUs +- 4G of RAM +- 2 NICs (1 x NAT - host access, 1 x VPP DPDK enabled) + +PROVIDERS: + +Supported vagrant providers are: + +- Virtualbox, VMware Fusion/Workstation, Libvirt + diff --git a/extras/vagrant/Vagrantfile b/extras/vagrant/Vagrantfile new file mode 100644 index 00000000..b463d646 --- /dev/null +++ b/extras/vagrant/Vagrantfile @@ -0,0 +1,113 @@ +# -*- mode: ruby -*- +# vi: set ft=ruby : + +Vagrant.configure(2) do |config| + + # Pick the right distro and bootstrap, default is ubuntu1604 + distro = ( ENV['VPP_VAGRANT_DISTRO'] || "ubuntu1604") + if distro == 'centos7' + config.vm.box = "puppetlabs/centos-7.2-64-nocm" + config.ssh.insert_key = false + elsif distro == 'ubuntu1404' + config.vm.box = "puppetlabs/ubuntu-14.04-64-nocm" + else + config.vm.box = "puppetlabs/ubuntu-16.04-64-nocm" + end + config.vm.box_check_update = false + + config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"update.sh") + config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"build.sh"), :args => "/vpp vagrant" + + post_build = ( ENV['VPP_VAGRANT_POST_BUILD'] ) + if post_build == "test" + config.vm.provision "shell", inline: "echo Testing VPP; cd /vpp; make test" + elsif post_build == "install" + config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"install.sh"), :args => "/vpp" + config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"clearinterfaces.sh") + config.vm.provision :shell, :path => File.join(File.dirname(__FILE__),"run.sh") + end + + # Add .gnupg dir in so folks can sign patches + # Note, as gnupg puts socket files in that dir, we have + # to be cautious and make sure we are dealing with a plain file + homedir = 
File.expand_path("~/") + Dir["#{homedir}/.gnupg/**/*"].each do |fname| + if File.file?(fname) + destname = fname.sub(Regexp.escape("#{homedir}/"),'') + config.vm.provision "file", source: fname, destination: destname + end + end + + # Copy in the .gitconfig if it exists + if File.file?(File.expand_path("~/.gitconfig")) + config.vm.provision "file", source: "~/.gitconfig", destination: ".gitconfig" + end + + # vagrant-cachier caches apt/yum etc to speed subsequent + # vagrant up + # to enable, run + # vagrant plugin install vagrant-cachier + # + if Vagrant.has_plugin?("vagrant-cachier") + config.cache.scope = :box + end + + # Define some physical ports for your VMs to be used by DPDK + nics = (ENV['VPP_VAGRANT_NICS'] || "2").to_i(10) + for i in 1..nics + config.vm.network "private_network", type: "dhcp" + end + + # use http proxy if avaiable + if ENV['http_proxy'] && Vagrant.has_plugin?("vagrant-proxyconf") + config.proxy.http = ENV['http_proxy'] + config.proxy.https = ENV['https_proxy'] + config.proxy.no_proxy = "localhost,127.0.0.1" + end + + vmcpu=(ENV['VPP_VAGRANT_VMCPU'] || 2) + vmram=(ENV['VPP_VAGRANT_VMRAM'] || 4096) + + config.ssh.forward_agent = true + config.ssh.forward_x11 = true + + config.vm.provider "virtualbox" do |vb| + vb.customize ["modifyvm", :id, "--ioapic", "on"] + vb.memory = "#{vmram}" + vb.cpus = "#{vmcpu}" + + # rsync the vpp directory if provision hasn't happened yet + unless File.exist? (".vagrant/machines/default/virtualbox/action_provision") + config.vm.synced_folder "../../", "/vpp", type: "rsync", + rsync__auto: false, + rsync__exclude: [ + "build-root/build*/", + "build-root/install*/", + "build-root/images*/", + "build-root/*.deb", + "build-root/*.rpm", + "build-root/*.changes", + "build-root/python", + "build-root/deb/debian/*.dkms", + "build-root/deb/debian/*.install", + "build-root/deb/debian/changes", + "build-root/tools"] + end + + #support for the SSE4.x instruction is required in some versions of VB. + vb.customize ["setextradata", :id, "VBoxInternal/CPUM/SSE4.1", "1"] + vb.customize ["setextradata", :id, "VBoxInternal/CPUM/SSE4.2", "1"] + end + config.vm.provider "vmware_fusion" do |fusion,override| + fusion.vmx["memsize"] = "#{vmram}" + fusion.vmx["numvcpus"] = "#{vmcpu}" + end + config.vm.provider "libvirt" do |lv| + lv.memory = "#{vmram}" + lv.cpus = "#{vmcpu}" + end + config.vm.provider "vmware_workstation" do |vws,override| + vws.vmx["memsize"] = "#{vmram}" + vws.vmx["numvcpus"] = "#{vmcpu}" + end +end diff --git a/extras/vagrant/WELCOME b/extras/vagrant/WELCOME new file mode 100644 index 00000000..eb6aa2fd --- /dev/null +++ b/extras/vagrant/WELCOME @@ -0,0 +1,61 @@ +VPP has now been built, installed, and started. + +To give it a spin, we can create a tap interface and try a simple ping +(with trace). 
+ +Make sure you have run: + +$ vagrant ssh + +To get to the vagrant VM: + +vagrant@localhost:~$ + +Confirm that vpp is running with + +vagrant@localhost:~$ sudo status vpp +vpp start/running, process 25202 + +To create the tap: + +vagrant@localhost:~$ sudo vppctl tap connect foobar +Created tap-0 for Linux tap 'foobar' +vagrant@localhost:~$ sudo vppctl show int + +To assign it an ip address (and 'up' the interface): + +vagrant@localhost:~$ sudo vppctl set int ip address tap-0 192.168.1.1/24 +vagrant@localhost:~$ sudo vppctl set int state tap-0 up + +To turn on packet tracing for the tap interface: +vagrant@localhost:~$ sudo vppctl trace add tapcli-rx 10 + +Now, to set up and try the other end: +vagrant@localhost:~$ sudo ip addr add 192.168.1.2/24 dev foobar +vagrant@localhost:~$ ping -c 3 192.168.1.1 + +To look at the trace: +vagrant@localhost:~$ sudo vppctl show trace + +And to stop tracing: + +vagrant@localhost:~$ sudo vppctl clear trace + +Other fun things to look at: + +The vlib packet processing graph: +vagrant@localhost:~$ sudo vppctl show vlib graph + +which will produce output like: + + Name Next Previous +ip4-icmp-input error-punt [0] ip4-local + ip4-icmp-echo-request [1] + vpe-icmp4-oam [2] + +To read this, the first column (Name) is the name of the node. +The second column (Next) is the name of the children of that node. +The third column (Previous) is the name of the parents of this node. + +To see this README again: +cat /vagrant/README diff --git a/extras/vagrant/build.sh b/extras/vagrant/build.sh new file mode 100755 index 00000000..76838e28 --- /dev/null +++ b/extras/vagrant/build.sh @@ -0,0 +1,74 @@ +#!/bin/bash + +# Get Command Line arguements if present +VPP_DIR=$1 +if [ "x$1" != "x" ]; then + VPP_DIR=$1 +else + VPP_DIR=`dirname $0`/../../ +fi + +if [ "x$2" != "x" ]; then + SUDOCMD="sudo -H -u $2" +fi +echo 0:$0 +echo 1:$1 +echo 2:$2 +echo VPP_DIR: $VPP_DIR +echo SUDOCMD: $SUDOCMD + +# Figure out what system we are running on +if [ -f /etc/lsb-release ];then + . /etc/lsb-release +elif [ -f /etc/redhat-release ];then + sudo yum install -y redhat-lsb + DISTRIB_ID=`lsb_release -si` + DISTRIB_RELEASE=`lsb_release -sr` + DISTRIB_CODENAME=`lsb_release -sc` + DISTRIB_DESCRIPTION=`lsb_release -sd` +fi +KERNEL_OS=`uname -o` +KERNEL_MACHINE=`uname -m` +KERNEL_RELEASE=`uname -r` +KERNEL_VERSION=`uname -v` + +echo KERNEL_OS: $KERNEL_OS +echo KERNEL_MACHINE: $KERNEL_MACHINE +echo KERNEL_RELEASE: $KERNEL_RELEASE +echo KERNEL_VERSION: $KERNEL_VERSION +echo DISTRIB_ID: $DISTRIB_ID +echo DISTRIB_RELEASE: $DISTRIB_RELEASE +echo DISTRIB_CODENAME: $DISTRIB_CODENAME +echo DISTRIB_DESCRIPTION: $DISTRIB_DESCRIPTION + +# Install dependencies +cd $VPP_DIR +make UNATTENDED=yes install-dep + +# Really really clean things up so we can be sure +# that the build works even when switching distros +$SUDOCMD make wipe +(cd build-root/;$SUDOCMD make distclean) +rm -f build-root/.bootstrap.ok + +if [ $DISTRIB_ID == "CentOS" ]; then + echo rpm -V apr-devel + rpm -V apr-devel + if [ $? != 0 ]; then sudo yum reinstall -y apr-devel;fi + echo rpm -V ganglia-devel + rpm -V ganglia-devel + if [ $? != 0 ]; then sudo yum reinstall -y ganglia-devel;fi + echo rpm -V libconfuse-devel + rpm -V libconfuse-devel + if [ $? 
!= 0 ]; then sudo yum reinstall -y libconfuse-devel;fi +fi + +# Build and install packaging +$SUDOCMD make bootstrap +if [ $DISTRIB_ID == "Ubuntu" ]; then + $SUDOCMD make pkg-deb +elif [ $DISTRIB_ID == "CentOS" ]; then + (cd $VPP_DIR/vnet ;$SUDOCMD aclocal;$SUDOCMD automake -a) + $SUDOCMD make pkg-rpm +fi + diff --git a/extras/vagrant/clearinterfaces.sh b/extras/vagrant/clearinterfaces.sh new file mode 100755 index 00000000..78f6705c --- /dev/null +++ b/extras/vagrant/clearinterfaces.sh @@ -0,0 +1,17 @@ +#!/bin/bash + +# Capture all the interface IPs, in case we need them later +ip -o addr show > ~vagrant/ifconfiga +chown vagrant:vagrant ~vagrant/ifconfiga + +# Disable all ethernet interfaces other than the default route +# interface so VPP will use those interfaces. The VPP auto-blacklist +# algorithm prevents the use of any physical interface contained in the +# routing table (i.e. "route --inet --inet6") preventing the theft of +# the management ethernet interface by VPP from the kernel. +for intf in $(ls /sys/class/net) ; do + if [ -d /sys/class/net/$intf/device ] && + [ "$(route --inet --inet6 | grep default | grep $intf)" == "" ] ; then + ifconfig $intf down + fi +done diff --git a/extras/vagrant/env.sh b/extras/vagrant/env.sh new file mode 100644 index 00000000..f0edfd88 --- /dev/null +++ b/extras/vagrant/env.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +export VPP_VAGRANT_DISTRO="ubuntu1404" +export VPP_VAGRANT_NICS=2 +export VPP_VAGRANT_VMCPU=4 +export VPP_VAGRANT_VMRAM=4096 diff --git a/extras/vagrant/install.sh b/extras/vagrant/install.sh new file mode 100644 index 00000000..a53faa4d --- /dev/null +++ b/extras/vagrant/install.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +# Get Command Line arguements if present +VPP_DIR=$1 +if [ "x$1" != "x" ]; then + VPP_DIR=$1 +else + VPP_DIR=`dirname $0`/../../ +fi + +# Figure out what system we are running on +if [ -f /etc/lsb-release ];then + . /etc/lsb-release +elif [ -f /etc/redhat-release ];then + sudo yum install -y redhat-lsb + DISTRIB_ID=`lsb_release -si` + DISTRIB_RELEASE=`lsb_release -sr` + DISTRIB_CODENAME=`lsb_release -sc` + DISTRIB_DESCRIPTION=`lsb_release -sd` +fi +echo DISTRIB_ID: $DISTRIB_ID +echo DISTRIB_RELEASE: $DISTRIB_RELEASE +echo DISTRIB_CODENAME: $DISTRIB_CODENAME +echo DISTRIB_DESCRIPTION: $DISTRIB_DESCRIPTION + +if [ $DISTRIB_ID == "Ubuntu" ]; then + (cd ${VPP_DIR}/build-root/;sudo dpkg -i *.deb) +elif [ $DISTRIB_ID == "CentOS" ]; then + (cd ${VPP_DIR}/build-root/;sudo rpm -Uvh *.rpm) +fi \ No newline at end of file diff --git a/extras/vagrant/run.sh b/extras/vagrant/run.sh new file mode 100755 index 00000000..1cd33826 --- /dev/null +++ b/extras/vagrant/run.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +# Figure out what system we are running on +if [ -f /etc/lsb-release ];then + . /etc/lsb-release +elif [ -f /etc/redhat-release ];then + yum install -y redhat-lsb + DISTRIB_ID=`lsb_release -si` + DISTRIB_RELEASE=`lsb_release -sr` + DISTRIB_CODENAME=`lsb_release -sc` + DISTRIB_DESCRIPTION=`lsb_release -sd` +fi + +if [ $DISTRIB_ID == "CentOS" ]; then + # Install uio-pci-generic + modprobe uio_pci_generic +fi +echo "Starting VPP..." 
+if [ $DISTRIB_ID == "Ubuntu" ] && [ $DISTRIB_CODENAME = "trusty" ] ; then + start vpp +else + service vpp start +fi diff --git a/extras/vagrant/update.sh b/extras/vagrant/update.sh new file mode 100755 index 00000000..f4139d74 --- /dev/null +++ b/extras/vagrant/update.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# Make sure that we get the hugepages we need on provision boot +# Note: The package install should take care of this at the end +# But sometimes after all the work of provisioning, we can't +# get the requested number of hugepages without rebooting. +# So do it here just in case +sysctl -w vm.nr_hugepages=1024 +HUGEPAGES=`sysctl -n vm.nr_hugepages` +if [ $HUGEPAGES != 1024 ]; then + echo "ERROR: Unable to get 1024 hugepages, only got $HUGEPAGES. Cannot finish." + exit +fi + +# Figure out what system we are running on +if [ -f /etc/lsb-release ];then + . /etc/lsb-release +elif [ -f /etc/redhat-release ];then + yum install -y redhat-lsb + DISTRIB_ID=`lsb_release -si` + DISTRIB_RELEASE=`lsb_release -sr` + DISTRIB_CODENAME=`lsb_release -sc` + DISTRIB_DESCRIPTION=`lsb_release -sd` +fi + +# Do initial setup for the system +if [ $DISTRIB_ID == "Ubuntu" ]; then + + export DEBIAN_PRIORITY=critical + export DEBIAN_FRONTEND=noninteractive + export DEBCONF_NONINTERACTIVE_SEEN=true + APT_OPTS="--assume-yes --no-install-suggests --no-install-recommends -o Dpkg::Options::=\"--force-confdef\" -o Dpkg::Options::=\"--force-confold\"" + + # Standard update + upgrade dance + apt-get update ${APT_OPTS} >/dev/null + apt-get upgrade ${APT_OPTS} >/dev/null + + # Fix the silly notion that /bin/sh should point to dash by pointing it to bash + + update-alternatives --install /bin/sh sh /bin/bash 100 + + # Install useful but non-mandatory tools + apt-get install -y emacs x11-utils git-review gdb gdbserver +elif [ $DISTRIB_ID == "CentOS" ]; then + # Standard update + upgrade dance + yum check-update + yum update -y +fi -- cgit 1.2.3-korg From 583dc8d3e23a780c85ebe48ea59f0338aad4df17 Mon Sep 17 00:00:00 2001 From: Ray Kinsella Date: Thu, 8 Jun 2017 15:54:19 +0100 Subject: Sample plugin: Add sample plugin documentation Added some user documentation to sample plugin. Change-Id: I518910f80499307e8fcac8dcef7baaeab5ea8e35 Signed-off-by: Ray Kinsella --- README.md | 2 +- doxygen/Makefile | 3 +- doxygen/user_doc.md | 1 + src/examples/sample-plugin/sample/sample.c | 30 ++++++++--- src/examples/sample-plugin/sample_plugin_doc.md | 66 +++++++++++++++++++++++++ 5 files changed, 92 insertions(+), 10 deletions(-) create mode 100644 src/examples/sample-plugin/sample_plugin_doc.md (limited to 'README.md') diff --git a/README.md b/README.md index 7f429d12..596494b9 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Directory name | Description @ref src/vppinfra | VPP core library test | Unit tests @ref src/vpp/api | Not-yet-relocated API bindings - +@ref src/examples | VPP example code ## Getting started diff --git a/doxygen/Makefile b/doxygen/Makefile index abaca185..b6ba5887 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -74,8 +74,7 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # These must be left-anchored paths for the regexp below to work. 
 DOXY_EXCLUDE ?= \
 	$(DOXY_SRC)/vlib/buffer.c \
-	$(DOXY_SRC)/vpp-api/lua \
-	$(DOXY_SRC)/examples/sample-plugin
+	$(DOXY_SRC)/vpp-api/lua
 
 # Generate a regexp for filenames to exclude
 DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | sed -e 's/ /|/g')))
diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md
index becc2e0a..c0a05bd9 100644
--- a/doxygen/user_doc.md
+++ b/doxygen/user_doc.md
@@ -16,3 +16,4 @@ Several modules provide operational, dataplane-user focused documentation.
 - @subpage span_doc
 - @subpage srv6_doc
 - @subpage srmpls_doc
+- @subpage sample_plugin_doc
diff --git a/src/examples/sample-plugin/sample/sample.c b/src/examples/sample-plugin/sample/sample.c
index 2f8ac4c9..3929ac23 100644
--- a/src/examples/sample-plugin/sample/sample.c
+++ b/src/examples/sample-plugin/sample/sample.c
@@ -12,10 +12,9 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-/*
- *------------------------------------------------------------------
- * sample.c - simple MAC-swap API / debug CLI handling
- *------------------------------------------------------------------
+/**
+ * @file
+ * @brief Sample Plugin, plugin API / trace / CLI handling.
  */
 
 #include <vnet/vnet.h>
@@ -65,7 +64,11 @@ VLIB_PLUGIN_REGISTER () = {
 };
 /* *INDENT-ON* */
 
-/* Action function shared between message handler and debug CLI */
+/**
+ * @brief Enable/disable the macswap plugin.
+ *
+ * Action function shared between message handler and debug CLI.
+ */
 int sample_macswap_enable_disable (sample_main_t * sm, u32 sw_if_index,
                                    int enable_disable)
 {
@@ -135,6 +138,9 @@ macswap_enable_disable_command_fn (vlib_main_t * vm,
   return 0;
 }
 
+/**
+ * @brief CLI command to enable/disable the sample macswap plugin.
+ */
 VLIB_CLI_COMMAND (sr_content_command, static) = {
     .path = "sample macswap",
     .short_help =
    "sample macswap <interface-name> [disable]",
     .function = macswap_enable_disable_command_fn,
 };
 
-/* API message handler */
+/**
+ * @brief Plugin API message handler.
+ */
 static void vl_api_sample_macswap_enable_disable_t_handler
 (vl_api_sample_macswap_enable_disable_t * mp)
 {
@@ -156,7 +164,9 @@ static void vl_api_sample_macswap_enable_disable_t_handler
   REPLY_MACRO(VL_API_SAMPLE_MACSWAP_ENABLE_DISABLE_REPLY);
 }
 
-/* Set up the API message handling tables */
+/**
+ * @brief Set up the API message handling tables.
+ */
 static clib_error_t *
 sample_plugin_api_hookup (vlib_main_t *vm)
 {
@@ -188,6 +198,9 @@ setup_message_id_table (sample_main_t * sm, api_main_t *am)
 #undef _
 }
 
+/**
+ * @brief Initialize the sample plugin.
+ */
 static clib_error_t * sample_init (vlib_main_t * vm)
 {
   sample_main_t * sm = &sample_main;
@@ -214,6 +227,9 @@ static clib_error_t * sample_init (vlib_main_t * vm)
 
 VLIB_INIT_FUNCTION (sample_init);
 
+/**
+ * @brief Hook the sample plugin into the VPP graph hierarchy.
+ */
 VNET_FEATURE_INIT (sample, static) =
 {
   .arc_name = "device-input",
diff --git a/src/examples/sample-plugin/sample_plugin_doc.md b/src/examples/sample-plugin/sample_plugin_doc.md
new file mode 100644
index 00000000..9348094c
--- /dev/null
+++ b/src/examples/sample-plugin/sample_plugin_doc.md
@@ -0,0 +1,66 @@
+# Sample plugin for VPP    {#sample_plugin_doc}
+
+## Overview
+
+This VPP sample plugin demonstrates how to create a new plugin that integrates
+with VPP. The sample code implements a trivial MAC-swap algorithm that
+demonstrates plugin runtime integration with the VPP graph hierarchy, API and CLI.
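The hunks above show only the doxygen comment added over `sample_macswap_enable_disable()`, not its body. For orientation, here is a rough sketch of what such a shared action function conventionally does in a VPP plugin: validate the interface, then toggle the plugin's node on the `device-input` feature arc named in the `VNET_FEATURE_INIT` block. The details are a plausible reconstruction, not a quote from sample.c; it assumes the plugin's usual headers and a `vnet_main` pointer in `sample_main_t`.

```C
/* Sketch only: action function shared by the API handler and debug CLI. */
int
sample_macswap_enable_disable (sample_main_t * sm, u32 sw_if_index,
                               int enable_disable)
{
  /* Reject software interface indices that do not exist. */
  if (pool_is_free_index (sm->vnet_main->interface_main.sw_interfaces,
                          sw_if_index))
    return VNET_API_ERROR_INVALID_SW_IF_INDEX;

  /* Arc/node names must match the VNET_FEATURE_INIT registration. */
  vnet_feature_enable_disable ("device-input", "sample",
                               sw_if_index, enable_disable, 0, 0);
  return 0;
}
```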
+
+For a deeper dive, see the annotations in the sample code itself: [sample.c](@ref sample.c)
+
+## How to build and run the sample plugin.
+
+First wipe any previous build products:
+
+    $ make wipe
+
+Define the environment variable 'VPP_WITH_SAMPLE_PLUGIN=yes' with a process scope
+
+    $ VPP_WITH_SAMPLE_PLUGIN=yes make build
+
+or a session scope, and build VPP.
+
+    $ export VPP_WITH_SAMPLE_PLUGIN=yes
+    $ make build
+
+Now run VPP and make sure the plugin is loaded.
+
+    $ make run
+    ...
+    load_one_plugin:184: Loaded plugin: memif_plugin.so (Packet Memory Interface (experimetal))
+    load_one_plugin:184: Loaded plugin: sample_plugin.so (Sample of VPP Plugin)
+    load_one_plugin:184: Loaded plugin: snat_plugin.so (Network Address Translation)
+    ...
+    DBGvpp#
+
+## How to create a new plugin
+
+To create a new plugin based on the sample plugin, copy and rename the sample plugin directory and automake config.
+
+    cp -r src/examples/sample-plugin/sample src/plugins/newplugin
+    cp src/examples/sample-plugin/sample.am src/plugins/newplugin.am
+
+Add the following entry to the plugins section of `src/configure.ac`.
+
+    PLUGIN_ENABLED(newplugin)
+
+Add the following entry to the plugins section of `src/plugins/Makefile.am`
+
+    if ENABLE_NEWPLUGIN
+    include newplugin.am
+    endif
+
+Now (re)build VPP.
+
+    $ make wipe
+    $ make build
+
+## Configuration
+
+To enable the sample plugin
+
+    sample macswap <interface name>
+
+To disable the sample plugin
+
+    sample macswap <interface name> disable
-- 
cgit 1.2.3-korg


From d84f2ef54a457918f21bb0ee82392274b62611fe Mon Sep 17 00:00:00 2001
From: Chris Luke
Date: Wed, 27 Sep 2017 15:09:48 -0400
Subject: General documentation updates

- We now have several developer-focused docs, so create an index
  page for them.
- Rework several docs to fit into the index structure.
- Experiment with code highlighting; tweak the CSS slightly to make it
  slightly nicer to look at.
Change-Id: I4185a18f84fa0764745ca7a3148276064a3155c6 Signed-off-by: Chris Luke (cherry picked from commit 64ebb5ff1338140d94c7f9ee72138fe84d89de2e) --- README.md | 15 +- doxygen/Makefile | 4 +- doxygen/assets/doxy-vpp.css | 6 + doxygen/dev_doc.md | 12 + doxygen/doxygen.cfg | 3 +- doxygen/test_framework_doc.md | 1 + doxygen/user_doc.md | 2 - extras/libmemif/README.md | 72 ------ extras/libmemif/docs/BuildInstructions.md | 54 ---- extras/libmemif/docs/GettingStarted.md | 215 ---------------- extras/libmemif/docs/buildinstructions_doc.md | 55 ++++ extras/libmemif/docs/gettingstarted_doc.md | 219 ++++++++++++++++ extras/libmemif/examples/ExampleSetup.md | 206 --------------- extras/libmemif/examples/README.md | 16 -- extras/libmemif/examples/example_setup_doc.md | 207 +++++++++++++++ extras/libmemif/examples/examples_doc.md | 18 ++ extras/libmemif/libmemif_doc.md | 74 ++++++ src/plugins/acl/acl-plugin.md | 347 ------------------------- src/plugins/acl/acl_hash_lookup_doc.md | 241 ++++++++++++++++++ src/plugins/acl/acl_multicore_doc.md | 349 ++++++++++++++++++++++++++ src/plugins/acl/hash_lookup.md | 241 ------------------ src/vlibapi/api_doc.md | 77 +++--- 22 files changed, 1234 insertions(+), 1200 deletions(-) create mode 100644 doxygen/dev_doc.md delete mode 100644 extras/libmemif/README.md delete mode 100644 extras/libmemif/docs/BuildInstructions.md delete mode 100644 extras/libmemif/docs/GettingStarted.md create mode 100644 extras/libmemif/docs/buildinstructions_doc.md create mode 100644 extras/libmemif/docs/gettingstarted_doc.md delete mode 100644 extras/libmemif/examples/ExampleSetup.md delete mode 100644 extras/libmemif/examples/README.md create mode 100644 extras/libmemif/examples/example_setup_doc.md create mode 100644 extras/libmemif/examples/examples_doc.md create mode 100644 extras/libmemif/libmemif_doc.md delete mode 100644 src/plugins/acl/acl-plugin.md create mode 100644 src/plugins/acl/acl_hash_lookup_doc.md create mode 100644 src/plugins/acl/acl_multicore_doc.md delete mode 100644 src/plugins/acl/hash_lookup.md (limited to 'README.md') diff --git a/README.md b/README.md index 596494b9..090dd35f 100644 --- a/README.md +++ b/README.md @@ -30,10 +30,11 @@ Directory name | Description build-root | Build output directory doxygen | Documentation generator configuration dpdk | DPDK patches and build infrastructure -@ref src | VPP source code +@ref extras/libmemif | Client library for memif +@ref src/examples | VPP example code @ref src/plugins | VPP bundled plugins directory @ref src/svm | Shared virtual memory allocation library - src/tests | Unit tests + src/tests | Standalone tests (not part of test harness) src/vat | VPP API test program @ref src/vlib | VPP application library @ref src/vlibapi | VPP API library @@ -43,9 +44,8 @@ Directory name | Description @ref src/vpp | VPP application @ref src/vpp-api | VPP application API bindings @ref src/vppinfra | VPP core library - test | Unit tests @ref src/vpp/api | Not-yet-relocated API bindings -@ref src/examples | VPP example code + test | Unit tests and Python test harness ## Getting started @@ -95,12 +95,13 @@ for this can be found [on the Setting up Vagrant wiki page] ## More information Several modules provide documentation, see @subpage user_doc for more -information. +end-user-oriented information. Also see @subpage dev_doc for developer notes. Visit the [VPP wiki](https://wiki.fd.io/view/VPP) for details on more -advanced building strategies and development notes. +advanced building strategies and other development notes. 
## Test Framework -There is PyDoc generated documentation available for the VPP test framework. See @subpage test_framework_doc for details. +There is PyDoc generated documentation available for the VPP test framework. +See @ref test_framework_doc for details. diff --git a/doxygen/Makefile b/doxygen/Makefile index b6ba5887..5870530f 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -62,7 +62,8 @@ DOXY_INPUT ?= \ $(wildcard $(WS_ROOT)/*.md) \ $(wildcard $(DOXY_DIR)/*.md) \ $(DOXY_SRC_DIRECTORIES) \ - $(DOXY_SRC)/plugins + $(DOXY_SRC)/plugins \ + extras # Strip leading workspace path from input names DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) @@ -73,7 +74,6 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # there's a DPDK equivalent that conflicts. # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ - $(DOXY_SRC)/vlib/buffer.c \ $(DOXY_SRC)/vpp-api/lua # Generate a regexp for filenames to exclude diff --git a/doxygen/assets/doxy-vpp.css b/doxygen/assets/doxy-vpp.css index 3b74eb6f..108600ac 100644 --- a/doxygen/assets/doxy-vpp.css +++ b/doxygen/assets/doxy-vpp.css @@ -45,3 +45,9 @@ pre { a.el { font-family: Consolas, Courier, monospace; } + +div.fragment { + padding: 2px; + margin-left: 8px; +} + diff --git a/doxygen/dev_doc.md b/doxygen/dev_doc.md new file mode 100644 index 00000000..d1d22f76 --- /dev/null +++ b/doxygen/dev_doc.md @@ -0,0 +1,12 @@ +Developer Documentation {#dev_doc} +======================= + +Programming notes for developers. + +- @subpage test_framework_doc +- @subpage sample_plugin_doc +- @subpage api_doc +- @subpage vapi_doc +- @subpage acl_hash_lookup +- @subpage acl_multicore +- @subpage libmemif_doc diff --git a/doxygen/doxygen.cfg b/doxygen/doxygen.cfg index 7675b693..f2d39112 100644 --- a/doxygen/doxygen.cfg +++ b/doxygen/doxygen.cfg @@ -882,7 +882,8 @@ EXAMPLE_RECURSIVE = NO # that contain images that are to be included in the documentation (see the # \image command). -IMAGE_PATH = $(ROOT)/doxygen/assets +IMAGE_PATH = $(ROOT)/doxygen/assets \ + $(ROOT)/extras/libmemif/docs # The INPUT_FILTER tag can be used to specify a program that doxygen should # invoke to filter for each input file. Doxygen will invoke the filter program diff --git a/doxygen/test_framework_doc.md b/doxygen/test_framework_doc.md index b2873c5f..755a29e4 100644 --- a/doxygen/test_framework_doc.md +++ b/doxygen/test_framework_doc.md @@ -3,5 +3,6 @@ Test Framework Documentation {#test_framework_doc} PyDoc generated documentation for the "make test" framework is available for the following releases +- [Test framework documentation for VPP 17.10](https://docs.fd.io/vpp/17.10/vpp_make_test/html) - [Test framework documentation for VPP 17.04](https://docs.fd.io/vpp/17.04/vpp_make_test/html) - [Test framework documentation for VPP 17.01](https://docs.fd.io/vpp/17.01/vpp_make_test/html) diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 65b09f98..f7d7936b 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -16,6 +16,4 @@ Several modules provide operational, dataplane-user focused documentation. 
- @subpage span_doc - @subpage srv6_doc - @subpage srmpls_doc -- @subpage sample_plugin_doc - @subpage nat64_doc -- @subpage api_doc diff --git a/extras/libmemif/README.md b/extras/libmemif/README.md deleted file mode 100644 index 0cccfb37..00000000 --- a/extras/libmemif/README.md +++ /dev/null @@ -1,72 +0,0 @@ -Shared Memory Packet Interface (memif) Library -============================================== -## Introduction - -Shared memory packet interface (memif) provides high performance packet transmit and receive between user application and Vector Packet Processing (VPP) or multiple user applications. Using libmemif, user application can create shared memory interface in master or slave mode and connect to VPP or another application using libmemif. Once the connection is established, user application can receive or transmit packets using libmemif API. - -![Architecture](docs/architecture.png) - -## Features - -- [x] Slave mode - - [x] Connect to VPP over memif - - [x] ICMP responder example app -- [x] Transmit/receive packets -- [x] Interrupt mode support -- [x] File descriptor event polling in libmemif (optional) - - [x] Simplify file descriptor event polling (one handler for control and interrupt channel) -- [x] Multiple connections -- [x] Multiple queues - - [x] Multi-thread support -- [x] Master mode - - [ ] Multiple regions (TODO) -- [ ] Performance testing (TODO) - -## Quickstart - -This setup will run libmemif ICMP responder example app in container. Install [docker](https://docs.docker.com/engine/installation) engine. -Useful link: [Docker documentation](https://docs.docker.com/get-started). - -Pull image: -``` -# docker pull ligato/libmemif-sample-service -``` - -Now you should be able to see ligato/libmemif-sample-service image on your local machine (IMAGE ID in this README may be outdated): -``` -# docker images -REPOSITORY TAG IMAGE ID CREATED SIZE -ligato/libmemif-sample-service latest 32ecc2f9d013 About a minute ago 468MB -... -``` - -Run container: -``` -# docker run -it --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service -``` -Example application will start in debug mode. Output should look like this: -``` -ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll -MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder -ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll -LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) -============================== -libmemif version: 1.0 (debug) -memif version: 256 -commands: - help - prints this help - exit - exit app - conn - create memif (slave-mode) - del - delete memif - show - show connection details - ip-set - set interface ip address - rx-mode - set queue rx mode -``` - -Continue with [Example setup](examples/ExampleSetup.md) which contains instructions on how to set up conenction between icmpr-epoll example app and VPP-memif. - -#### Next steps - -- [Build instructions](docs/BuildInstructions.md) Instructions on how to build/install libmemif. -- [Examples](examples/README.md) More example apps presenting different features. -- [Getting started](docs/GettingStarted.md) Introduction to libmemif API. Explaining library usage in custom app. 
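The README deleted above (and the getting-started text re-added later in this patch) describe an init/create/poll workflow. As a road map before the relocated per-file docs, the sketch below strings those calls together for a minimal slave-mode client, using only the signatures shown elsewhere in this document; the callbacks are stubbed, error handling is elided, and the `<libmemif.h>` include path is an assumption taken from the library source referenced there.

```C
#include <stdint.h>
#include <string.h>
#include <libmemif.h>

/* Stub callbacks; a real client would arm rx handling in on_connect and
 * call memif_rx_burst () from on_interrupt. */
static int on_connect (memif_conn_handle_t conn, void *ctx) { return 0; }
static int on_disconnect (memif_conn_handle_t conn, void *ctx) { return 0; }
static int on_interrupt (memif_conn_handle_t conn, void *ctx, uint16_t qid)
{ return 0; }

int
main (void)
{
  memif_conn_handle_t conn = NULL;
  memif_conn_args_t args;

  /* NULL fd-update callback: libmemif polls its own descriptors. */
  memif_init (NULL, "ICMP_Responder");

  memset (&args, 0, sizeof (args));
  args.interface_id = 0;   /* pairs with "create memif id 0 master" on VPP */
  memif_create (&conn, &args, on_connect, on_disconnect, on_interrupt, NULL);

  /* Drives connection establishment and rx interrupt events. */
  while (memif_poll_event (-1) >= 0)
    ;
  return 0;
}
```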
diff --git a/extras/libmemif/docs/BuildInstructions.md b/extras/libmemif/docs/BuildInstructions.md deleted file mode 100644 index 9bb59b94..00000000 --- a/extras/libmemif/docs/BuildInstructions.md +++ /dev/null @@ -1,54 +0,0 @@ -## Build Instructions - -Install dependencies -``` -# sudo apt-get install -y git autoconf pkg_config libtool check -``` - -Clone repository to your local machine. -``` -# git clone https://github.com/JakubGrajciar/libmemif.git -``` - -From root directory execute: -For debug build: -``` -# ./bootstrap -# ./configure -# make -# make install -``` - -For release build: -``` -# ./bootstrap -# ./configure -# make release -# make install -``` -Verify installation: -``` -# ./.libs/icmpr-epoll -``` -> Make sure to run the binary file from ./.libs. File ./icmp\_responder in libmemif root directory is script that links the library, so it only verifies successful build. Default install path is /usr/lib. -Use _help_ command to display build information and commands: -``` -ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll -MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder -ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll -LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) -============================== -libmemif version: 1.0 (debug) -memif version: 256 -commands: - help - prints this help - exit - exit app - conn - create memif (slave-mode) - del - delete memif - show - show connection details - ip-set - set interface ip address - rx-mode - set queue rx mode -``` -#### Examples - -Once the library is build/installed, refer to [Examples](../examples/README.md) and [Getting started](GettingStarted.md) for additional information on basic use cases and API usage. diff --git a/extras/libmemif/docs/GettingStarted.md b/extras/libmemif/docs/GettingStarted.md deleted file mode 100644 index 3cc0ceb0..00000000 --- a/extras/libmemif/docs/GettingStarted.md +++ /dev/null @@ -1,215 +0,0 @@ -## Getting started - -#### Concept (Connecting to VPP) - -For detailed information on api calls and structures please refer to [libmemif.h](../src/libmemif.h) - -1. Initialize memif - - Declare callback function handling file descriptor event polling. -```C -int -control_fd_update (int fd, uint8_t events) -{ -... -} -``` - - Call memif initialization function. memif\_init -```C -err = memif_init (control_fd_update, APP_NAME); -``` - -> If event occurres on any file descriptor returned by this callback, call memif\_control\_fd\_handler function. -```C -memif_err = memif_control_fd_handler (evt.data.fd, events); -``` -> If callback function parameter for memif\_init function is set to NULL, libmemif will handle file descriptor event polling. - Api call memif\_poll\_event will call epoll\_pwait with user defined timeout to poll event on file descriptors opened by libmemif. -```C -/* main loop */ - while (1) - { - if (memif_poll_event (-1) < 0) - { - DBG ("poll_event error!"); - } - } -``` - -> Memif initialization function will initialize internal structures and create timer file descriptor, which will be used for sending periodic connection requests. Timer is disarmed if no memif interface is created. - -2. Creating interface - - Declare memif connction handle. -```C -memif_conn_handle_t c; -``` -> example app uses struct that contains connection handle, rx/tx buffers and other connection specific information. - - - Specify connection arguments. 
-```C -memif_conn_args_t args; -memset (&args, 0, sizeof (args)); -args.is_master = is_master; -args.log2_ring_size = 10; -args.buffer_size = 2048; -args.num_s2m_rings = 2; -args.num_m2s_rings = 2; -strncpy ((char *) args.interface_name, IF_NAME, strlen (IF_NAME)); -strncpy ((char *) args.instance_name, APP_NAME, strlen (APP_NAME)); -args.mode = 0; -args.interface_id = 0; -``` - - Declare callback functions called on connected/disconnected/interrupted status changed. -```C -int -on_connect (memif_conn_handle_t conn, void *private_ctx) -{ -... -} - -int -on_disconnect (memif_conn_handle_t conn, void *private_ctx) -{ - INFO ("memif connected!"); - return 0; -} -``` - - Call memif interface create function. memif\_create -```C -err = memif_create (&c->conn, - &args, on_connect, on_disconnect, on_interrupt, &ctx[index]); -``` -> If connection is in slave mode, arms timer file descriptor. -> If on interrupt callback is set to NULL, user will not be notified about interrupt. Use memif\_get\_queue\_efd call to get interrupt file descriptor for specific queue. -```C -int fd = -1; -err = memif_get_queue_efd (c->conn, data->qid, &fd); -``` - -3. Connection establishment - - User application will poll events on all file descriptors returned in memif\_control\_fd\_update\_t callback. - - On event call memif\_control\_fd\_handler. - - Everything else regarding connection establishment will be done internally. - - Once connection has been established, a callback will inform the user about connection status change. - -4. Interrupt packet receive - - If event is polled on interrupt file descriptor, libmemif will call memif\_interrupt\_t callback specified for every connection instance. -```C -int -on_interrupt (memif_conn_handle_t conn, void *private_ctx, uint16_t qid) -{ -... -} -``` - -6. Memif buffers - - Packet data are stored in memif\_buffer\_t. Pointer _data_ points to shared memory buffer, and unsigned integer *data\_len* contains packet data length. -```C -typedef struct -{ - uint16_t desc_index; - uint32_t buffer_len; - uint32_t data_len; - void *data; -} memif_buffer_t; -``` - -5. Packet receive - - Api call memif\_rx\_burst will set all required fields in memif buffers provided by user application. -```C -err = memif_rx_burst (c->conn, qid, c->rx_bufs, MAX_MEMIF_BUFS, &rx); -``` - - User application can then process packets. - - Api call memif\_buffer\_free will make supplied memif buffers ready for next receive and mark shared memory buffers as free. -```C -err = memif_buffer_free (c->conn, qid, c->rx_bufs, rx, &fb); -``` - -6. Packet transmit - - Api call memif\_buffer\_alloc will set all required fields in memif buffers provided by user application. -```C -err = memif_buffer_alloc (c->conn, qid, c->tx_bufs, n, &r); -``` - - User application can populate shared memory buffers with packets. - - Api call memif\_tx\_burst will inform peer interface (master memif on VPP) that there are packets ready to receive and mark memif buffers as free. -```C -err = memif_tx_burst (c->conn, qid, c->tx_bufs, c->tx_buf_num, &r); -``` - -7. Helper functions - - Memif details - - Api call memif\_get\_details will return details about connection. -```C -err = memif_get_details (c->conn, &md, buf, buflen); -``` - - Memif error messages - - Every api call returns error code (integer value) mapped to error string. - - Call memif\_strerror will return error message assigned to specific error code. 
-```C
-if (err != MEMIF_ERR_SUCCESS)
-    INFO ("memif_get_details: %s", memif_strerror (err));
-```
-      - Not all syscall errors are translated to memif error codes. If error code 1 (MEMIF\_ERR\_SYSCALL) is returned then libmemif needs to be compiled with -DMEMIF_DBG flag to print error message. Use _make -B_ to rebuild libmemif in debug mode.
-
-#### Example app (libmemif fd event polling):
-
-- [ICMP Responder](../examples/icmp_responder/main.c)
-> Optional argument: transmit queue id.
-```
-icmpr 1
-```
-> Set transmit queue id to 1. Default is 0.
-> Application will create memif interface in slave mode and try to connect to VPP. Exit using Ctrl+C. Application will handle SIGINT signal, free allocated memory and exit with EXIT_SUCCESS.
-
-#### Example app:
-
-- [ICMP Responder custom fd event polling](../examples/icmp_responder-epoll/main.c)
-
-#### Example app (multi-thread queue polling)
-
-- [ICMP Responder multi-thread](../examples/icmp_responder-mt/main.c)
-
-> Simple example of libmemif multi-thread usage. Connection establishment is handled by main thread. There are two rx queues in this example. One in polling mode and second in interrupt mode.
-
-VPP config:
-```
-# create memif id 0 master
-# set int state memif0 up
-# set int ip address memif0 192.168.1.1/24
-# ping 192.168.1.2
-```
-For multiple rings (queues) support run VPP with worker threads:
-example startup.conf:
-```
-unix {
-  interactive
-  nodaemon
-  full-coredump
-}
-
-cpu {
-  workers 2
-}
-```
-VPP config:
-```
-# create memif id 0 master
-# set int state memif0 up
-# set int ip address memif0 192.168.1.1/24
-# ping 192.168.1.2
-```
-> Master mode queue number is limited by worker threads. Slave mode interface needs to specify number of queues.
-```
-# create memif id 0 slave rx-queues 2 tx-queues 2
-```
-> Example applications use VPP default socket file for memif: /run/vpp/memif.sock
-> For master mode, socket directory must exist prior to memif\_create call.
-
-#### Unit tests
-
-Unit tests use [Check](https://libcheck.github.io/check/index.html) framework. This framework must be installed in order to build *unit\_test* binary.
-Ubuntu/Debian:
-```
-sudo apt-get install check
-```
-[More platforms](https://libcheck.github.io/check/web/install.html)
-
diff --git a/extras/libmemif/docs/buildinstructions_doc.md b/extras/libmemif/docs/buildinstructions_doc.md
new file mode 100644
index 00000000..838e835a
--- /dev/null
+++ b/extras/libmemif/docs/buildinstructions_doc.md
@@ -0,0 +1,55 @@
+## Build Instructions    {#libmemif_build_doc}
+
+Install dependencies
+```
+# sudo apt-get install -y git autoconf pkg-config libtool check
+```
+
+Clone repository to your local machine.
+```
+# git clone https://github.com/JakubGrajciar/libmemif.git
+```
+
+From root directory execute:
+For debug build:
+```
+# ./bootstrap
+# ./configure
+# make
+# make install
+```
+
+For release build:
+```
+# ./bootstrap
+# ./configure
+# make release
+# make install
+```
+Verify installation:
+```
+# ./.libs/icmpr-epoll
+```
+> Make sure to run the binary file from ./.libs. File ./icmp\_responder in libmemif root directory is a script that links the library, so it only verifies a successful build. Default install path is /usr/lib.
+Use _help_ command to display build information and commands:
+```
+ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll
+MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder
+ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll
+LIBMEMIF EXAMPLE APP: ICMP_Responder (debug)
+==============================
+libmemif version: 1.0 (debug)
+memif version: 256
+commands:
+	help - prints this help
+	exit - exit app
+	conn - create memif (slave-mode)
+	del - delete memif
+	show - show connection details
+	ip-set - set interface ip address
+	rx-mode - set queue rx mode
+```
+
+#### Examples
+
+Once the library is built/installed, refer to @ref libmemif_examples_doc and @ref libmemif_gettingstarted_doc for additional information on basic use cases and API usage.
diff --git a/extras/libmemif/docs/gettingstarted_doc.md b/extras/libmemif/docs/gettingstarted_doc.md
new file mode 100644
index 00000000..e3ae6e56
--- /dev/null
+++ b/extras/libmemif/docs/gettingstarted_doc.md
@@ -0,0 +1,219 @@
+## Getting started    {#libmemif_gettingstarted_doc}
+
+#### Concept (Connecting to VPP)
+
+For detailed information on api calls and structures please refer to @ref libmemif.h.
+
+1. Initialize memif
+    - Declare callback function handling file descriptor event polling.
+```C
+int
+control_fd_update (int fd, uint8_t events)
+{
+...
+}
+```
+    - Call memif initialization function. memif\_init
+```C
+err = memif_init (control_fd_update, APP_NAME);
+```
+
+> If an event occurs on any file descriptor returned by this callback, call memif\_control\_fd\_handler function.
+```C
+memif_err = memif_control_fd_handler (evt.data.fd, events);
+```
+> If callback function parameter for memif\_init function is set to NULL, libmemif will handle file descriptor event polling.
+  Api call memif\_poll\_event will call epoll\_pwait with user defined timeout to poll event on file descriptors opened by libmemif.
+```C
+/* main loop */
+  while (1)
+    {
+      if (memif_poll_event (-1) < 0)
+        {
+          DBG ("poll_event error!");
+        }
+    }
+```
+
+> Memif initialization function will initialize internal structures and create timer file descriptor, which will be used for sending periodic connection requests. Timer is disarmed if no memif interface is created.
+
+2. Creating interface
+    - Declare memif connection handle.
+```C
+memif_conn_handle_t c;
+```
+> The example app uses a struct that contains the connection handle, rx/tx buffers and other connection specific information.
+
+    - Specify connection arguments.
+```C
+memif_conn_args_t args;
+memset (&args, 0, sizeof (args));
+args.is_master = is_master;
+args.log2_ring_size = 10;
+args.buffer_size = 2048;
+args.num_s2m_rings = 2;
+args.num_m2s_rings = 2;
+strncpy ((char *) args.interface_name, IF_NAME, strlen (IF_NAME));
+strncpy ((char *) args.instance_name, APP_NAME, strlen (APP_NAME));
+args.mode = 0;
+args.interface_id = 0;
+```
+    - Declare callback functions called on connected/disconnected/interrupted status changed.
+```C
+int
+on_connect (memif_conn_handle_t conn, void *private_ctx)
+{
+...
+}
+
+int
+on_disconnect (memif_conn_handle_t conn, void *private_ctx)
+{
+  INFO ("memif disconnected!");
+  return 0;
+}
+```
+    - Call memif interface create function. memif\_create
+```C
+err = memif_create (&c->conn,
+        &args, on_connect, on_disconnect, on_interrupt, &ctx[index]);
+```
+> If connection is in slave mode, arms timer file descriptor.
+> If on interrupt callback is set to NULL, user will not be notified about interrupt. Use memif\_get\_queue\_efd call to get interrupt file descriptor for specific queue.
+```C
+int fd = -1;
+err = memif_get_queue_efd (c->conn, data->qid, &fd);
+```
+
+3. Connection establishment
+    - User application will poll events on all file descriptors returned in memif\_control\_fd\_update\_t callback.
+    - On event call memif\_control\_fd\_handler.
+    - Everything else regarding connection establishment will be done internally.
+    - Once connection has been established, a callback will inform the user about connection status change.
+
+4. Interrupt packet receive
+    - If event is polled on interrupt file descriptor, libmemif will call memif\_interrupt\_t callback specified for every connection instance.
+```C
+int
+on_interrupt (memif_conn_handle_t conn, void *private_ctx, uint16_t qid)
+{
+...
+}
+```
+
+5. Memif buffers
+    - Packet data are stored in memif\_buffer\_t. Pointer _data_ points to shared memory buffer, and unsigned integer *data\_len* contains packet data length.
+```C
+typedef struct
+{
+    uint16_t desc_index;
+    uint32_t buffer_len;
+    uint32_t data_len;
+    void *data;
+} memif_buffer_t;
+```
+
+6. Packet receive
+    - Api call memif\_rx\_burst will set all required fields in memif buffers provided by user application.
+```C
+err = memif_rx_burst (c->conn, qid, c->rx_bufs, MAX_MEMIF_BUFS, &rx);
+```
+    - User application can then process packets.
+    - Api call memif\_buffer\_free will make supplied memif buffers ready for next receive and mark shared memory buffers as free.
+```C
+err = memif_buffer_free (c->conn, qid, c->rx_bufs, rx, &fb);
+```
+
+7. Packet transmit
+    - Api call memif\_buffer\_alloc will set all required fields in memif buffers provided by user application.
+```C
+err = memif_buffer_alloc (c->conn, qid, c->tx_bufs, n, &r);
+```
+    - User application can populate shared memory buffers with packets.
+    - Api call memif\_tx\_burst will inform peer interface (master memif on VPP) that there are packets ready to receive and mark memif buffers as free.
+```C
+err = memif_tx_burst (c->conn, qid, c->tx_bufs, c->tx_buf_num, &r);
+```
+
+8. Helper functions
+    - Memif details
+      - Api call memif\_get\_details will return details about connection.
+```C
+err = memif_get_details (c->conn, &md, buf, buflen);
+```
+    - Memif error messages
+      - Every api call returns error code (integer value) mapped to error string.
+      - Call memif\_strerror will return error message assigned to specific error code.
+```C
+if (err != MEMIF_ERR_SUCCESS)
+    INFO ("memif_get_details: %s", memif_strerror (err));
+```
+      - Not all syscall errors are translated to memif error codes. If error code 1 (MEMIF\_ERR\_SYSCALL) is returned then libmemif needs to be compiled with -DMEMIF_DBG flag to print error message. Use _make -B_ to rebuild libmemif in debug mode.
+
+#### Example app (libmemif fd event polling):
+
+- @ref extras/libmemif/examples/icmp_responder
+
+> Optional argument: transmit queue id.
+```
+icmpr 1
+```
+> Set transmit queue id to 1. Default is 0.
+> Application will create memif interface in slave mode and try to connect to VPP. Exit using Ctrl+C. Application will handle SIGINT signal, free allocated memory and exit with EXIT_SUCCESS.
+
+#### Example app:
+
+ICMP Responder custom fd event polling.
+
+- @ref extras/libmemif/examples/icmp_responder-epoll
+
+#### Example app (multi-thread queue polling)
+
+ICMP Responder multi-thread.
+- @ref extras/libmemif/examples/icmp_responder-mt
+
+> Simple example of libmemif multi-thread usage. Connection establishment is handled by main thread. 
There are two rx queues in this example. One in polling mode and second in interrupt mode. + +VPP config: +``` +# create memif id 0 master +# set int state memif0 up +# set int ip address memif0 192.168.1.1/24 +# ping 192.168.1.2 +``` +For multiple rings (queues) support run VPP with worker threads: +example startup.conf: +``` +unix { + interactive + nodaemon + full-coredump +} + +cpu { + workers 2 +} +``` +VPP config: +``` +# create memif id 0 master +# set int state memif0 up +# set int ip address memif0 192.168.1.1/24 +# ping 192.168.1.2 +``` +> Master mode queue number is limited by worker threads. Slave mode interface needs to specify number of queues. +``` +# create memif id 0 slave rx-queues 2 tx-queues 2 +``` +> Example applications use VPP default socket file for memif: /run/vpp/memif.sock +> For master mode, socket directory must exist prior to memif\_create call. + +#### Unit tests + +Unit tests use [Check](https://libcheck.github.io/check/index.html) framework. This framework must be installed in order to build *unit\_test* binary. +Ubuntu/Debian: +``` +sudo apt-get install check +``` +[More platforms](https://libcheck.github.io/check/web/install.html) + diff --git a/extras/libmemif/examples/ExampleSetup.md b/extras/libmemif/examples/ExampleSetup.md deleted file mode 100644 index a02506dd..00000000 --- a/extras/libmemif/examples/ExampleSetup.md +++ /dev/null @@ -1,206 +0,0 @@ -## Example setup - -#### VPP-memif master icmp_responder slave - -> Libmemif example app(s) use memif default socket file: /run/vpp/memif.sock. - -Run VPP and icmpr-epoll example (default example when running in container). -> Other examples work similar to icmpr-epoll. Brief explanation can be found in [Examples readme](README.md) file. - -VPP-side config: -``` -DBGvpp# create memif id 0 master -DBGvpp# set int state memif0/0 up -DBGvpp# set int ip address memif0/0 192.168.1.1/24 -``` -icmpr-epoll: -``` -conn 0 0 -``` -Memif in slave mode will try to connect every 2 seconds. If connection establishment is successfull, a message will show. -``` -INFO: memif connected! -``` -> Error messages like "unmatched interface id" are printed only in debug mode. - -Check connected status. 
-Use show command in icmpr-epoll: -``` -show -MEMIF DETAILS -============================== -interface index: 0 - interface ip: 192.168.1.2 - interface name: memif_connection - app name: ICMP_Responder - remote interface name: memif0/0 - remote app name: VPP 17.10-rc0~132-g62f9cdd - id: 0 - secret: - role: slave - mode: ethernet - socket filename: /run/vpp/memif.sock - rx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - tx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - link: up -interface index: 1 - no connection - -``` -Use sh memif command in VPP: -``` -DBGvpp# sh memif -interface memif0/0 - remote-name "ICMP_Responder" - remote-interface "memif_connection" - id 0 mode ethernet file /run/vpp/memif.sock - flags admin-up connected - listener-fd 12 conn-fd 13 - num-s2m-rings 1 num-m2s-rings 1 buffer-size 0 - master-to-slave ring 0: - region 0 offset 32896 ring-size 1024 int-fd 16 - head 0 tail 0 flags 0x0000 interrupts 0 - master-to-slave ring 0: - region 0 offset 0 ring-size 1024 int-fd 15 - head 0 tail 0 flags 0x0001 interrupts 0 -``` - -Send ping from VPP to icmpr-epoll: -``` -DBGvpp# ping 192.168.1.2 -64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1888 ms -64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.1985 ms -64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1813 ms -64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1929 ms - -Statistics: 5 sent, 4 received, 20% packet loss -``` -#### multiple queues VPP-memif slave icmp_responder master - -Run icmpr-epoll as in previous example setup. -Run VPP with startup conf, enabling 2 worker threads. -Example startup.conf: -``` -unix { - interactive - nodaemon - full-coredump -} - -cpu { - workers 2 -} -``` -VPP-side config: -``` -DBGvpp# create memif id 0 slave rx-queues 2 tx-queues 2 -DBGvpp# set int state memif0/0 up -DBGvpp# set int ip address memif0/0 192.168.1.1/24 -``` -icmpr-epoll: -``` -conn 0 1 -``` -When connection is established a message will print: -``` -INFO: memif connected! -``` -> Error messages like "unmatched interface id" are printed only in debug mode. - -Check connected status. 
-Use show command in icmpr-epoll: -``` -show -MEMIF DETAILS -============================== -interface index: 0 - interface ip: 192.168.1.2 - interface name: memif_connection - app name: ICMP_Responder - remote interface name: memif0/0 - remote app name: VPP 17.10-rc0~132-g62f9cdd - id: 0 - secret: - role: master - mode: ethernet - socket filename: /run/vpp/memif.sock - rx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - queue id: 1 - ring size: 1024 - buffer size: 2048 - tx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - queue id: 1 - ring size: 1024 - buffer size: 2048 - link: up -interface index: 1 - no connection - -``` -Use sh memif command in VPP: -``` -DBGvpp# sh memif -interface memif0/0 - remote-name "ICMP_Responder" - remote-interface "memif_connection" - id 0 mode ethernet file /run/vpp/memif.sock - flags admin-up slave connected - listener-fd -1 conn-fd 12 - num-s2m-rings 2 num-m2s-rings 2 buffer-size 2048 - slave-to-master ring 0: - region 0 offset 0 ring-size 1024 int-fd 14 - head 0 tail 0 flags 0x0000 interrupts 0 - slave-to-master ring 1: - region 0 offset 32896 ring-size 1024 int-fd 15 - head 0 tail 0 flags 0x0000 interrupts 0 - slave-to-master ring 0: - region 0 offset 65792 ring-size 1024 int-fd 16 - head 0 tail 0 flags 0x0001 interrupts 0 - slave-to-master ring 1: - region 0 offset 98688 ring-size 1024 int-fd 17 - head 0 tail 0 flags 0x0001 interrupts 0 - -``` -Send ping from VPP to icmpr-epoll: -``` -DBGvpp# ping 192.168.1.2 -64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1439 ms -64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.2184 ms -64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1458 ms -64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1687 ms - -Statistics: 5 sent, 4 received, 20% packet loss -``` - -#### icmp_responder master icmp_responder slave - -> Example apps can only repond to ping. This setup creates connection between two applications using libmemif. Traffic functionality is the same as when connection to VPP. App can receive ARP/ICMP request and transmit response, but can not send ARP/ICMP request. - -Run two instances of icmpr-epoll example. -> If not running in container, make sure folder /run/vpp/ exists before creating memif master. -Instance 1 will be in master mode, instance 2 in slave mode. -instance 1: -``` -conn 0 1 -``` -instance 2: -``` -conn 0 0 -``` -In 2 seconds, both instances should print connected! message: -``` -INFO: memif connected! -``` -Check peer interface names using show command. diff --git a/extras/libmemif/examples/README.md b/extras/libmemif/examples/README.md deleted file mode 100644 index 1375d27a..00000000 --- a/extras/libmemif/examples/README.md +++ /dev/null @@ -1,16 +0,0 @@ -## Examples - -After build, root folder will contain scripts linking binary examples with library (same name as example apps). These scripts can be executed to run example apps without installing the library. Example apps binaries can be found in _libs_ filder. To run binaries directly, make sure that libmemif library is installed. - -#### Run in container -ligato/libmemif-sample-service image contains built and installed libmemf. To run different examples, override docker CMD to start container in bash: -``` -# docker run -it --entrypoint=/bin/bash -i --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service -``` -Current WORKDIR is set to root repository directory. 
Example apps can be run from this directory (a script linking binary with library), or browse to ./.libs folder and execute binary directly.
-
-Example app | Description
-------------|------------
-[icmpr](../examples/icmp_responder/main.c) | Simplest implementaion. Event polling is handled by libmemif. Single memif conenction in slave mode is created (id 0). Use Ctrl + C to exit app. Memif receive mode: interrupt.
-[icmpr-epoll](../examples/icmp_responder-epoll/main.c) (run in container by default) | Supports multiple connections and master mode. User can create/delete connections, set ip addresses, print connection information. [Example setup](ExampleSetup.md) contains instructions on basic connection use cases setups. Memif receive mode: interrupt. App provides functionality to disable interrupts for specified queue/s for testing purposes. Polling mode is not implemented in this example.
-[icmpr-mt](../examples/icmp_responder-mt/main.c) | Multi-thread example, very similar to icmpr-epoll. Packets are handled in threads assigned to specific queues. Slave mode only. Memif receive mode: polling (memif_rx_poll function), interrupt (memif_rx_interrupt function). Receive modes differ per queue.
diff --git a/extras/libmemif/examples/example_setup_doc.md b/extras/libmemif/examples/example_setup_doc.md
new file mode 100644
index 00000000..89a0b10e
--- /dev/null
+++ b/extras/libmemif/examples/example_setup_doc.md
@@ -0,0 +1,207 @@
+## Example setup    {#libmemif_example_setup_doc}
+
+#### VPP-memif master icmp_responder slave
+
+> Libmemif example app(s) use memif default socket file: `/run/vpp/memif.sock`.
+
+Run VPP and icmpr-epoll example (default example when running in container).
+
+> Other examples work similarly to icmpr-epoll. A brief explanation can be found in @ref libmemif_examples_doc .
+
+VPP-side config:
+```
+DBGvpp# create memif id 0 master
+DBGvpp# set int state memif0/0 up
+DBGvpp# set int ip address memif0/0 192.168.1.1/24
+```
+icmpr-epoll:
+```
+conn 0 0
+```
+Memif in slave mode will try to connect every 2 seconds. If connection establishment is successful, a message will show.
+```
+INFO: memif connected!
+```
+> Error messages like "unmatched interface id" are printed only in debug mode.
+
+Check connected status.
+Use show command in icmpr-epoll: +``` +show +MEMIF DETAILS +============================== +interface index: 0 + interface ip: 192.168.1.2 + interface name: memif_connection + app name: ICMP_Responder + remote interface name: memif0/0 + remote app name: VPP 17.10-rc0~132-g62f9cdd + id: 0 + secret: + role: slave + mode: ethernet + socket filename: /run/vpp/memif.sock + rx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + tx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + link: up +interface index: 1 + no connection + +``` +Use sh memif command in VPP: +``` +DBGvpp# sh memif +interface memif0/0 + remote-name "ICMP_Responder" + remote-interface "memif_connection" + id 0 mode ethernet file /run/vpp/memif.sock + flags admin-up connected + listener-fd 12 conn-fd 13 + num-s2m-rings 1 num-m2s-rings 1 buffer-size 0 + master-to-slave ring 0: + region 0 offset 32896 ring-size 1024 int-fd 16 + head 0 tail 0 flags 0x0000 interrupts 0 + master-to-slave ring 0: + region 0 offset 0 ring-size 1024 int-fd 15 + head 0 tail 0 flags 0x0001 interrupts 0 +``` + +Send ping from VPP to icmpr-epoll: +``` +DBGvpp# ping 192.168.1.2 +64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1888 ms +64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.1985 ms +64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1813 ms +64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1929 ms + +Statistics: 5 sent, 4 received, 20% packet loss +``` +#### multiple queues VPP-memif slave icmp_responder master + +Run icmpr-epoll as in previous example setup. +Run VPP with startup conf, enabling 2 worker threads. +Example startup.conf: +``` +unix { + interactive + nodaemon + full-coredump +} + +cpu { + workers 2 +} +``` +VPP-side config: +``` +DBGvpp# create memif id 0 slave rx-queues 2 tx-queues 2 +DBGvpp# set int state memif0/0 up +DBGvpp# set int ip address memif0/0 192.168.1.1/24 +``` +icmpr-epoll: +``` +conn 0 1 +``` +When connection is established a message will print: +``` +INFO: memif connected! +``` +> Error messages like "unmatched interface id" are printed only in debug mode. + +Check connected status. 
+Use show command in icmpr-epoll: +``` +show +MEMIF DETAILS +============================== +interface index: 0 + interface ip: 192.168.1.2 + interface name: memif_connection + app name: ICMP_Responder + remote interface name: memif0/0 + remote app name: VPP 17.10-rc0~132-g62f9cdd + id: 0 + secret: + role: master + mode: ethernet + socket filename: /run/vpp/memif.sock + rx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + queue id: 1 + ring size: 1024 + buffer size: 2048 + tx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + queue id: 1 + ring size: 1024 + buffer size: 2048 + link: up +interface index: 1 + no connection + +``` +Use sh memif command in VPP: +``` +DBGvpp# sh memif +interface memif0/0 + remote-name "ICMP_Responder" + remote-interface "memif_connection" + id 0 mode ethernet file /run/vpp/memif.sock + flags admin-up slave connected + listener-fd -1 conn-fd 12 + num-s2m-rings 2 num-m2s-rings 2 buffer-size 2048 + slave-to-master ring 0: + region 0 offset 0 ring-size 1024 int-fd 14 + head 0 tail 0 flags 0x0000 interrupts 0 + slave-to-master ring 1: + region 0 offset 32896 ring-size 1024 int-fd 15 + head 0 tail 0 flags 0x0000 interrupts 0 + slave-to-master ring 0: + region 0 offset 65792 ring-size 1024 int-fd 16 + head 0 tail 0 flags 0x0001 interrupts 0 + slave-to-master ring 1: + region 0 offset 98688 ring-size 1024 int-fd 17 + head 0 tail 0 flags 0x0001 interrupts 0 + +``` +Send ping from VPP to icmpr-epoll: +``` +DBGvpp# ping 192.168.1.2 +64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1439 ms +64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.2184 ms +64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1458 ms +64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1687 ms + +Statistics: 5 sent, 4 received, 20% packet loss +``` + +#### icmp_responder master icmp_responder slave + +> Example apps can only repond to ping. This setup creates connection between two applications using libmemif. Traffic functionality is the same as when connection to VPP. App can receive ARP/ICMP request and transmit response, but can not send ARP/ICMP request. + +Run two instances of icmpr-epoll example. +> If not running in container, make sure folder /run/vpp/ exists before creating memif master. +Instance 1 will be in master mode, instance 2 in slave mode. +instance 1: +``` +conn 0 1 +``` +instance 2: +``` +conn 0 0 +``` +In 2 seconds, both instances should print connected! message: +``` +INFO: memif connected! +``` +Check peer interface names using show command. diff --git a/extras/libmemif/examples/examples_doc.md b/extras/libmemif/examples/examples_doc.md new file mode 100644 index 00000000..11937c57 --- /dev/null +++ b/extras/libmemif/examples/examples_doc.md @@ -0,0 +1,18 @@ +## Examples {#libmemif_examples_doc} + +After build, root folder will contain scripts linking binary examples with library (same name as example apps). These scripts can be executed to run example apps without installing the library. Example apps binaries can be found in _libs_ filder. To run binaries directly, make sure that libmemif library is installed. + +#### Run in container + +`ligato/libmemif-sample-service` image contains built and installed libmemf. To run different examples, override docker CMD to start container in bash: + +``` +# docker run -it --entrypoint=/bin/bash -i --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service +``` +Current WORKDIR is set to root repository directory. 
Example apps can be run from this directory (a script linking binary with library), or browse to `./.libs` folder and execute binary directly.
+
+Example app | Description
+------------|------------
+@ref extras/libmemif/examples/icmp_responder | Simplest implementation. Event polling is handled by libmemif. Single memif connection in slave mode is created (id 0). Use Ctrl + C to exit app. Memif receive mode: interrupt.
+@ref extras/libmemif/examples/icmp_responder-epoll (run in container by default) | Supports multiple connections and master mode. User can create/delete connections, set ip addresses, print connection information. @ref libmemif_example_setup_doc contains instructions on basic connection use cases setups. Memif receive mode: interrupt. App provides functionality to disable interrupts for specified queue/s for testing purposes. Polling mode is not implemented in this example.
+@ref extras/libmemif/examples/icmp_responder-mt | Multi-thread example, very similar to icmpr-epoll. Packets are handled in threads assigned to specific queues. Slave mode only. Memif receive mode: polling (memif_rx_poll function), interrupt (memif_rx_interrupt function). Receive modes differ per queue.
diff --git a/extras/libmemif/libmemif_doc.md b/extras/libmemif/libmemif_doc.md
new file mode 100644
index 00000000..fb4a18af
--- /dev/null
+++ b/extras/libmemif/libmemif_doc.md
@@ -0,0 +1,74 @@
+Shared Memory Packet Interface (memif) Library    {#libmemif_doc}
+==============================================
+
+## Introduction
+
+Shared memory packet interface (memif) provides high performance packet transmit and receive between user application and Vector Packet Processing (VPP) or multiple user applications. Using libmemif, user application can create shared memory interface in master or slave mode and connect to VPP or another application using libmemif. Once the connection is established, user application can receive or transmit packets using libmemif API.
+
+![Architecture](docs/architecture.png)
+
+## Features
+
+- [x] Slave mode
+  - [x] Connect to VPP over memif
+  - [x] ICMP responder example app
+- [x] Transmit/receive packets
+- [x] Interrupt mode support
+- [x] File descriptor event polling in libmemif (optional)
+  - [x] Simplify file descriptor event polling (one handler for control and interrupt channel)
+- [x] Multiple connections
+- [x] Multiple queues
+  - [x] Multi-thread support
+- [x] Master mode
+  - [ ] Multiple regions (TODO)
+- [ ] Performance testing (TODO)
+
+## Quickstart
+
+This setup will run libmemif ICMP responder example app in container. Install [docker](https://docs.docker.com/engine/installation) engine.
+Useful link: [Docker documentation](https://docs.docker.com/get-started).
+
+Pull image:
+```
+# docker pull ligato/libmemif-sample-service
+```
+
+Now you should be able to see ligato/libmemif-sample-service image on your local machine (IMAGE ID in this README may be outdated):
+```
+# docker images
+REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
+ligato/libmemif-sample-service   latest              32ecc2f9d013        About a minute ago  468MB
+...
+```
+
+Run container:
+```
+# docker run -it --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service
+```
+Example application will start in debug mode. Output should look like this:
+```
+ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll
+MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder
+ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll
+LIBMEMIF EXAMPLE APP: ICMP_Responder (debug)
+==============================
+libmemif version: 1.0 (debug)
+memif version: 256
+commands:
+	help - prints this help
+	exit - exit app
+	conn - create memif (slave-mode)
+	del - delete memif
+	show - show connection details
+	ip-set - set interface ip address
+	rx-mode - set queue rx mode
+```
+
+Continue with @ref libmemif_example_setup_doc which contains instructions on how to set up connection between icmpr-epoll example app and VPP-memif.
+
+#### Next steps
+
+- @subpage libmemif_build_doc
+- @subpage libmemif_examples_doc
+- @subpage libmemif_example_setup_doc
+- @subpage libmemif_gettingstarted_doc
diff --git a/src/plugins/acl/acl-plugin.md b/src/plugins/acl/acl-plugin.md
deleted file mode 100644
index 1b44bca9..00000000
--- a/src/plugins/acl/acl-plugin.md
+++ /dev/null
@@ -1,347 +0,0 @@
-Multicore support for ACL plugin
-================================
-
-This captures some considerations and design decisions that I have made,
-both for my own memory later on ("what the hell was I thinking?!?"),
-and for anyone interested to criticize/improve/hack on this code.
-
-One of the factors taken into account while making these decisions,
-was the relative emphasis on the multi-thread vs. single-thread
-use cases: the latter is the vastly more prevalent. But,
-one can not optimize the single-thread performance without
-having a functioning code for multi-thread.
-
-stateless ACLs
-==============
-
-The stateless trivially parallelizes, and the only potential for the
-race between the different threads is during the reconfiguration,
-at the time of replacing the old ACL being checked, with
-the new ACL.
-
-In case an acl_add_replace is being used to replace the rules
-within the existing entry, a reallocation of am->acls[X].rules
-vector will happen and potentially a change in count.
-
-acl_match_5tuple() has the following code:
-
-  a = am->acls + acl_index;
-  for (i = 0; i < a->count; i++)
-    {
-      r = a->rules + i;
-     . . .
-
-Ideally we should be immune from a->rules changing,
-but the problem arises if the count changes in flight,
-and the new ruleset is smaller - then we will attempt
-to "match" against the free memory.
-
-This can(?) be solved by replacing the for() with while(),
-so the comparison happens at each iteration.
-
-full_acl_match_5tuple(), which iterates over the list
-of ACLs, is a bit less immune, since it takes the pointer
-to the vector to iterate and keeps a local copy of
-that pointer.
-
-This race can be solved by checking the
-current pointer to the vector with the source pointer,
-and seeing if there is an (unlikely) change, and if
-there is, return the "deny" action, or, better,
-restart the check.
-
-Since the check reloads the ACL list on a per-packet basis,
-there is only a window of opportunity of one packet to
-"match" packet against an incorrect rule set.
-The workers also do not change anything, only read.
-Therefore, it looks like building special structures
-to ensure that it does not happen at all might be not
-worth it.
-
-At least not until we have a unit-test able to
-reliably catch this condition and test that
-the measures applied are effective. Adding the code
-which is not possible to exercise is worse than
-not adding any code at all.
-
-So, I opt for "do-nothing" here for the moment.
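As a sketch of the for()-to-while() fix suggested above: re-reading `a->count` in the loop condition bounds every pass by the vector's current length, so a concurrent shrink can no longer carry the walk into freed memory; at worst one packet is matched against a half-updated ruleset, as the text notes. The type names here are illustrative, not lifted from the plugin.

```C
/* Illustrative fragment mirroring the snippet above. */
acl_list_t *a = am->acls + acl_index;
u32 i = 0;

while (i < a->count)          /* a->count is re-read on every iteration */
  {
    acl_rule_t *r = a->rules + i;
    /* ... evaluate the packet against rule *r ... */
    i++;
  }
```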
- -reflexive ACLs: single-thread -============================= - -Before we talk multi-thread, is worth revisiting the -design of the reflexive ACLs in the plugin, and -the history of their evolution. - -The very first version of the ACL plugin, shipped in -1701, mostly did the job using the existing components -and gluing them together. Because it needed to work -in bridged forwarding path only, using L2 classifier -as an insertion point appeared natural, also L2 classifier, -being a table with sessions, seemed like a good place -to hold the sessions. - -So, the original design had two conceptual nodes: -one, pointed by the next_miss from the L2 classifier table, -was checking the actual ACL, and inserting session into -the L2 classifier table, and the other one, pointed -to by the next_match within the specific session rule, -was checking the existing session. The timing out -of the existing connections was done in the datapath, -by periodically calling the aging function. - -This decision to use the existing components, -with its attrativeness, did bring a few limitations as well: - -* L2 classifier is a simple mask-and-value match, with -a fixed mask across the table. So, sanely supporting IPv6 -packets with extension headers in that framework was impossible. - -* There is no way to get a backpressure from L2 classifier -depending on memory usage. When it runs out of memory, -it simply crashes the box. When it runs out of memory ? -We don't really know. Depends on how it allocates it. - -* Since we need to match the *reflected* traffic, -we had to create *two* full session entries -in two different directions, which is quite wasteful memory-wise. - -* (showstopper): the L2 classifier runs only in -the bridged data path, so supporting routed data path -would require creating something else entirely different, -which would mean much more headaches support-wise going forward. - -Because of that, I have moved to a different model of -creating a session-5-tuple from the packet data - once, -and then doing all the matching just on that 5-tuple. - -This has allowed to add support for skipping IPv6 extension headers. - -Also, this new version started to store the sessions in a dedicated -bihash-per-interface, with the session key data being -aligned for the ingress packets, and being mirrored for the -egress packets. This allows of significant savings in memory, -because now we need to keep only one copy of the session table per -interface instead of two, and also to only have ONE node for all the lookups, -(L2/L3 path, in/out, IPv4/IPv6) - significantly reducing the code complexity. - -Unfortunately, bihash still has the "lack of backpressure" problem, -in a sense that if you try to insert too many entries and run out -of memory in the heap you supplied, you get a crash. - -To somewhat workaround against that, there is a "maximum tested number of sessions" -value, which tracks the currently inserted sessions in the bihash, -and if this number is being approached, a more aggressive cleanup -can happen. If this number is reached, two behaviors are possible: - -* attempt to do the stateless ACL matching and permit the packet - if it succeeds - -* deny the packet - -Currently I have opted for a second one, since it allows for -a better defined behavior, and if you have to permit -the traffic in both directions, why using stateful anyway ? - -In order to be able to do the cleanup, we need to discriminate between -the session types, with each session type having its own idle timeout. 
-In order to do that, we keep three lists, defined in enum acl_timeout_e: -ACL_TIMEOUT_UDP_IDLE, ACL_TIMEOUT_TCP_IDLE, ACL_TIMEOUT_TCP_TRANSIENT. - -The first one is hopefully obvious - it is just all UDP connections. -They have an idle timeout of 600 seconds. - -The second and third is a bit more subtle. TCP is a complicated protocol, -and we need to tread the fine line between doing too little and doing -too much, and triggering the potential compatibility issues because of -being a "middlebox". - -I decided to split the TCP connections into two classes: -established, and everything else. "Established", means we have seen -the SYN and ACK from both sides (with PUSH obviously masked out). -This is the "active" state of any TCP connection and we would like -to ensure we do not screw it up. So, the connections in this state -have the default idle timer of 24 hours. - -All the rest of the connections have the idle timeout of 2 minutes, -(inspired by an old value of MSL) and based on the observation -that the states this class represent are usually very short lived. - -Once we have these three baskets of connections, it is trivial to -imagine a simple cleanup mechanism to deal with this: take a -TCP transient connection that has been hanging around. - -It is debatable whether we want to do discrimination between the -different TCP transient connections. Assuming we do FIFO (and -the lists allow us to do just that), it means a given connection -on the head of the list has been hanging around for longest. -Thus, if we are short on resources, we might just go ahead and -reuse it within the datapath. - -This is where we are slowly approaching the question -"Why in the world have not you used timer wheel or such ?" - -The answer is simple: within the above constraints, it does -not buy me much. - -Also, timer wheel creates a leaky abstraction with a difficult -to manage corner case. Which corner case ? - -We have a set of objects (sessions) with an event that may -or may not happen (idle timeout timer firing), and a -necessity to reset the idle timeout when there is -activity on the session. - -In the worst case, where we had a 10000 of one-packet -UDP sessions just created 10 minutes ago, we would need -to deal with a spike of 10000 expired timers. - -Of course, if we have the active traffic on all -of these 10000 connections, then we will not have -to deal with that ? Right, but we will still have to deal -with canceling and requeueing the timers. - -In the best possible case, requeueing a timer is -going to be something along the lines of a linked-list -removal and reinsertion. - -However, keep in mind we already need to classify the -connections for reuse, so therefore we already have -the linked lists! - -And if we just check these linked lists periodically in -a FIFO fashion, we can get away with a very simple per-packet operation: -writing back the timestamp of "now" into the connection structure. - -Then rather than requeueing the list on a per-packet or per-frame -basis, we can defer this action until the time this session -appears on the head of the FIFO list, and the cleaning -routine makes the decision about whether to discard -the session (because the interval since last activity is bigger -than the idle timeout), or to requeue the session back to -the end of the list (because the last activity was less -than idle timeout ago). 
- -So, rather than using the timers, we can simply reuse our classification -FIFOs, with the following heuristic: do not look at the session that was -enqueued at time X until X+session_timeout. If we enqueue the sessions -in the order of their initial activity, then we can simply use enqueue -timestamp of the head session as a decision criterion for when we need -to get back at looking at it for the timeout purposes. - -Since the number of FIFOs is small, we get a slightly worse check -performance than with timers, but still O(1). - -We seemingly do quite a few "useless" operations of requeueing the items -back to the tail of the list - but, these are the operations we do not -have to do in the active data path, so overall it is a win. - -(Diversion: I believe this problem is congruent to poll vs. epoll or -events vs. threads, some reading on this subject: -http://web.archive.org/web/20120225022154/http://sheddingbikes.com/posts/1280829388.html) - -We can also can run a TCP-like scheme for adaptively changing -the wait period in the routine that deals with the connection timeouts: -we can attempt to check the connections a couple of times per second -(same as we would advance the timer wheel), and then if we have requeued -close to a max-per-quantum number of connections, we can half the waiting -interval, and if we did not requeue any, we can slowly increment the waiting -interval - which at a steady state should stabilize similar to what the TCP rate -does. - -reflexive ACLs: multi-thread -============================= - -The single-threaded implementation in 1704 used a separate "cleaner" process -to deal with the timing out of the connections. -It is all good and great when you know that there is only a single core -to run everything on, but the existence of the lists proves to be -a massive difficulty when it comes to operating from multiple threads. - -Initial study shows that with a few assumptions (e.g. that the cleaner running in main thread -and the worker have a demarcation point in time where either one or the other one touches -the session in the list) it might be possible to make it work, but the resulting -trickiness of doing it neatly with all the corner cases is quite large. - -So, for the multi-threaded scenario, we need to move the connection -aging back to the same CPU as its creation. - -Luckily we can do this with the help of the interrupts. - -So, the design is as follows: the aging thread (acl_fa_session_cleaner_process) -periodically fires the interrupts to the workers interrupt nodes (acl_fa_worker_session_cleaner_process_node.index), -using vlib_node_set_interrupt_pending(), and -the interrupt node acl_fa_worker_conn_cleaner_process() calls acl_fa_check_idle_sessions() -which does the actual job of advancing the lists. And within the actual datapath the only thing we will be -doing is putting the items onto FIFO, and updating the last active time on the existing connection. - -The one "delicate" part is that the worker for one leg of the connection might be different from -the worker of another leg of the connection - but, even if the "owner" tries to free the connection, -nothing terrible can happen - worst case the element of the pool (which is nominally free for a short period) -will get the timestamp updated - same thing about the TCP flags seen. 
- -A slightly trickier issue arises when the packet initially seen by one worker (thus owned by that worker), -and the return packet processed by another worker, and as a result changes the -the class of the connection (e.g. becomes TCP_ESTABLISHED from TCP_TRANSIENT or vice versa). -If the class changes from one with the shorter idle time to the one with the longer idle time, -then unless we are in the starvation mode where the transient connections are recycled, -we can simply do nothing and let the normal requeue mechanism kick in. If the class changes from the longer idle -timer to the shorter idle timer, then we risk keeping the connection around for longer than needed, which -will affect the resource usage. - -One solution to that is to have NxN ring buffers (where N is the number of workers), such that the non-owner -can signal to the owner the connection# that needs to be requeued out of order. - -A simpler solution though, is to ensure that each FIFO's period is equal to that of a shortest timer. -This way the resource starvation problem is taken care of, at an expense of some additional work. - -This all looks sufficiently nice and simple until a skeleton falls out of the closet: -sometimes we want to clean the connections en masse before they expire. - -There few potential scenarios: -1) removal of an ACL from the interface -2) removal of an interface -3) manual action of an operator (in the future). - -In order to tackle this, we need to modify the logic which decides whether to requeue the -connection on the end of the list, or to delete it due to idle timeout: - -We define a point in time, and have each worker thread fast-forward through its FIFO, -in the process looking for sessions that satisfy the criteria, and either keeping them or requeueing them. - -To keep the ease of appearance to the outside world, we still process this as an event -within the connection cleaner thread, but this event handler does as follows: -1) it creates the bitmap of the sw_if_index values requested to be cleared -2) for each worker, it waits to ensure there is no cleanup operation in progress (and if there is one, -it waits), and then makes a copy of the bitmap, sets the per-worker flag of a cleanup operation, and sends an interrupt. -3) wait until all cleanup operations have completed. - -Within the worker interrupt node, we check if the "cleanup in progress" is set, -and if it is, we check the "fast forward time" value. If unset, we initialize it to value now, and compare the -requested bitmap of sw_if_index values (pending_clear_sw_if_index_bitmap) with the bitmap of sw_if_index that this worker deals with. - -(we set the bit in the bitmap every time we enqueue the packet onto a FIFO - serviced_sw_if_index_bitmap in acl_fa_conn_list_add_session). - -If the result of this AND operation is zero - then we can clear the flag of cleanup in progress and return. -Else we kick off the quantum of cleanup, and make sure we get another interrupt ASAP if that cleanup operation returns non-zero, -meaning there is more work to do. -When that operation returns zero, everything has been processed, we can clear the "cleanup-in-progress" flag, and -zeroize the bitmap of sw_if_index-es requested to be cleaned. - -The interrupt node signals its wish to receive an interrupt ASAP by setting interrupt_is_needed -flag within the per-worker structure. The main thread, while waiting for the -cleanup operation to complete, checks if there is a request for interrupt, -and if there is - it sends one. 
-
-This approach gives us a way to mass-clean the connections which is reusing the code of the regular idle
-connection cleanup.
-
-One potential inefficiency is the bitmap values set by the session insertion
-in the data path - there is nothing to clear them.
-
-So, if one rearranges the interface placement with the workers, then the cleanups will cause some unnecessary work.
-For now, we consider it an acceptable limitation. It can be resolved by having another per-worker bitmap, which, when set,
-would trigger the cleanup of the bits in the serviced_sw_if_index_bitmap).
-
-=== the end ===
-
diff --git a/src/plugins/acl/acl_hash_lookup_doc.md b/src/plugins/acl/acl_hash_lookup_doc.md
new file mode 100644
index 00000000..cb93df04
--- /dev/null
+++ b/src/plugins/acl/acl_hash_lookup_doc.md
@@ -0,0 +1,241 @@
+ACL plugin constant-time lookup design {#acl_hash_lookup}
+======================================
+
+The initial implementation of the ACL plugin performs a trivial for() cycle,
+going through the assigned ACLs on a per-packet basis. This is not very
+efficient, even though for very short ACLs its simplicity lets it beat
+more advanced methods.
+
+However, to cover the case of longer ACLs with acceptable performance,
+we need to have a better way of matching. This write-up proposes
+a mechanism to reduce the lookup cost from O(M), where M is the number
+of entries, to O(N), where N is the number of different mask combinations.
+
+Preparation of ACL(s)
+---------------------
+
+The ACL plugin will maintain a global list of "mask types", i.e. the specific
+configurations of "do not care" bits within the ACEs.
+Upon the creation of a new ACL, a pass will be made through all the
+ACEs, to assign and possibly allocate the "mask type number".
+
+Each ACL has a structure *hash_acl_info_t* representing the "hash-based"
+parts of information related to that ACL, primarily the array of
+*hash_ace_info_t* structures - each of the members of that array
+corresponding to one of the rules (ACEs) in the original ACL;
+for this purpose they carry a pair of *(acl_index, ace_index)*,
+predominantly for debugging.
+
+Why do we need a whole separate structure, rather than adding new fields
+to the existing rule structure? First, encapsulation, to minimize
+the pollution of the main ACL code with the hash-based lookup artifacts.
+
+Second, one rule may correspond to more than one "hash-based" ACE.
+In fact, most of the rules do correspond to two of those. Why?
+
+Consider that the current ACL lookup logic is that if a packet
+is not the initial fragment, and there is an L4 entry acting on the packet,
+the comparison will be made only on the L4 protocol field value rather
+than on the protocol and port values. This behaviour is governed by
+the *l4_match_nonfirst_fragment* flag in the *acl_main*, and was needed to
+maintain the compatibility with the existing software switch implementation.
+
+While for the sequential check in *single_acl_match_5tuple()*
+it is very easy to implement by just breaking out at the right moment,
+in the case of hash-based matching this costs us two checks:
+one on the full 5-tuple with the flag *pkt.is_nonfirst_fragment* being zero,
+the second on the 3-tuple with the flag *pkt.is_nonfirst_fragment* being one,
+with the second check triggered by the *acl_main.l4_match_nonfirst_fragment*
+setting being the default 1. This dictates the necessity of having a "match"
+field in a given *hash_ace_info_t* element, which would reflect the value
+we are supposed to match after applying the mask.
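+
+As an aside, purely for illustration - these are not the actual plugin
+structures, and all names in this sketch are hypothetical - the
+relationship described above could be pictured roughly like this:
+
+```{.c}
+/* Sketch only: per-ACE information kept for the hash-based lookup. */
+typedef struct
+{
+  u32 acl_index;       /* back-pointer to the original ACL... */
+  u32 ace_index;       /* ...and the rule within it, for debugging */
+  u32 mask_type_index; /* which "do not care" bit pattern applies */
+  fa_5tuple_t match;   /* the value to compare after applying the mask */
+} hash_ace_info_sketch_t;
+
+/* A rule acting on L4 ports typically expands into two such entries:
+     1) a full 5-tuple match, expecting pkt.is_nonfirst_fragment == 0;
+     2) a 3-tuple match (ports masked out), expecting
+        pkt.is_nonfirst_fragment == 1. */
+```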
+
+There can be other circumstances when it might be beneficial to expand
+the given rule in the original ACL into multiple ones - for example, as an
+optimization within the port range handling for small port ranges
+(this is not done as of the time of writing).
+
+Assigning ACLs to an interface
+------------------------------
+
+Once the ACL list is assigned to an interface, or, rather, a new ACL
+is added to the list of the existing ACLs applied to the interface,
+we need to update the bihash accelerating the lookup.
+
+All the entries for the lookups are stored within a single *48_8* bihash,
+which captures the 5-tuple from the packet as well as the miscellaneous
+per-packet information flags, e.g. *l4_valid*, *is_non_first_fragment*,
+and so on. To facilitate the use of the single bihash by all the interfaces,
+the *is_ip6*, *is_input*, *sw_if_index* are part of the key,
+as well as *mask_type_index* - the latter being necessary because
+there can be entries with the same value but different masks, e.g.:
+`permit ::/0, permit ::/128`.
+
+At the moment of an ACL being applied to an interface, we need to
+walk the list of *hash_ace_info_t* entries corresponding to that ACL,
+and update the bihash with the keys corresponding to the match
+values in these entries.
+
+The value of the hash match contains the index into a per-*sw_if_index* vector
+of *applied_ace_hash_entry_t* elements, as well as a couple of flags:
+*shadowed* (optimization: if this flag on a matched entry is zero, it means
+we can stop the lookup early and declare a match - see below),
+and *need_portrange_check* - meaning that what matched was a superset
+of the actual match, and we need to perform an extra check.
+
+Also, upon insertion, we must keep in mind there can be
+multiple *applied_ace_hash_entry_t* entries for the same key and must keep
+a list of those. This is necessary to incrementally apply/unapply
+the ACLs as part of the ACL vector: say, two ACLs have
+"permit 2001:db8::1/128 any" - we should be able to retain the entry
+for the second ACL even if we have deleted the first one.
+Also, in case there are two entries with the same key but
+different port ranges, say 0..42 and 142..65535 - we need
+to be able to sequentially match on those if we decide not
+to expand them into individual port-specific entries.
+
+Per-packet lookup
+-----------------
+
+The simple single-packet lookup is defined in
+*multi_acl_match_get_applied_ace_index*, which returns the index
+of the applied hash ACE if there was a match, or ~0 if there wasn't.
+
+The future optimized per-packet lookup may be batched in three phases:
+
+1. Prepare the keys in the per-worker vector by doing a logical AND of the
+   original 5-tuple record with the elements of the mask vector.
+2. Look up the keys in the bihash in a batch manner, collecting the
+   result with the lowest u64 (acl index within vector, ACE index) from
+   the hash lookup value, and performing the list walk if necessary
+   (for port ranges).
+3. Take the action from the ACL record as defined by (ACL#, ACE#) from the
+   resulting lookup winner, or, if no match was found, perform the default deny.
+
+Shadowed/independent/redundant ACEs
+------------------------------------
+
+During the phase of combining multiple ACLs into one rulebase, when they
+are applied to an interface, we can also perform several optimizations.
+
+If a given ACE is a strict subset of another ACE located higher up in the
+linear search order, we can ignore this ACE completely - because by definition
+it will never match. We will call such an ACE *redundant*. Here is an example:
+
+```
+permit 2001:db8:1::/48 2001:db8:2::/48   (B)
+deny 2001:db8:1:1::/64 2001:db8:2:1::/64 (A)
+```
+
+A bit more formally, we can define this relationship of an ACE A to ACE B as:
+
+```
+redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA)
+                          && contains(dstB, dstA) && is_after(A, B))
+```
+
+Here "contains" denotes an operation on the sets defined by
+the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition)
+respectively, returning true if all the elements represented by
+the second argument are represented by the first argument. The "is_after"
+is true if A is located below B in the ruleset.
+
+If a given ACE does not intersect at all with any other ACE
+in front of it, we can mark it as such.
+
+Then, during the sequence of lookups, a successful hit on this ACE means
+we do not need to look up other mask combinations - thus potentially
+significantly speeding up the match process. Here is an example,
+assuming we have the following ACL:
+
+```
+permit 2001:db8:1::/48 2001:db8:2::/48    (B)
+deny 2001:db8:3::/48 2001:db8:2:1::/64    (A)
+```
+
+In this case if we match the second entry, we do not need to check whether
+we have matched the first one - the source addresses are completely
+different. We call such an ACE *independent* from another.
+
+We can define this as
+
+```
+independent(aceA, aceB) := (!intersect(protoA, protoB) ||
+                            !intersect(srcA, srcB) ||
+                            !intersect(dstA, dstB))
+```
+
+where intersect is defined as an operation returning true if there are
+elements belonging to the sets of both arguments.
+
+If the entry A is neither redundant nor independent from B, and is below
+B in the ruleset, we call such an entry *shadowed* by B; here is an example:
+
+```
+deny tcp 2001:db8:1::/48 2001:db8:2::/48   (B)
+permit 2001:db8:1:1::/64 2001:db8:2:1::/64 (A)
+```
+
+This means the earlier rule "carves out" a subset of A, thus leaving
+a "shadow". (Evidently, the action needs to be different for the shadow
+to have an effect, but for the terminology's sake we do not care).
+
+The more formal definition:
+
+```
+shadowed(aceA, aceB) := !redundant(aceA, aceB) &&
+                        !independent(aceA, aceB) &&
+                        is_after(aceA, aceB)
+```
+
+Using this terminology, any ruleset can be represented as
+a DAG (Directed Acyclic Graph), with the bottom being the implicit
+"deny any", pointing to the set of rules shadowing it or the ones
+it is redundant for.
+
+These rules may in turn be shadowing each other. There are no cycles in
+this graph because of the natural order of the rules - the rule located
+closer to the end of the ruleset can never shadow or make redundant a rule
+higher up.
+
+The optimization this enables is to skip matching certain
+masks on a per-lookup basis - if a given rule has matched,
+the only adjustment that can still happen is a match with one of
+the shadowing rules.
+
+Also, another avenue for optimization can be starting the lookup process
+with the mask type that maximizes the chances of an independent ACE match,
+thus resulting in an ACE lookup being a single hash table hit.
+
+
+Plumbing
+--------
+
+All the new routines are located in a separate file,
+so we can cleanly experiment with a different approach if this
+does not fit all of the use cases.
+
+The constant-time lookup within the data path has the API with
+the same signature as:
+
+```
+u8
+multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2,
+                       int is_ip6, int is_input, u32 * acl_match_p,
+                       u32 * rule_match_p, u32 * trace_bitmap)
+```
+
+There should be a new upper-level function with the same signature, which
+will make a decision whether to use a linear lookup, or to use the
+constant-time lookup implemented by this work, or to add some other
+optimizations (e.g. by keeping the cache of the last N lookups).
+
+The calls to the routine doing preparatory work should happen
+in `acl_add_list()` after creating the linear-lookup structures,
+and the routine doing the preparatory work populating the hashtable
+should be called from `acl_interface_add_del_inout_acl()` or its callees.
+
+The initial implementation will be geared towards looking up a single
+match at a time, with subsequent optimizations making it possible to
+look up more than one packet at a time.
+
diff --git a/src/plugins/acl/acl_multicore_doc.md b/src/plugins/acl/acl_multicore_doc.md
new file mode 100644
index 00000000..b2cf7b9c
--- /dev/null
+++ b/src/plugins/acl/acl_multicore_doc.md
@@ -0,0 +1,349 @@
+Multicore support for ACL plugin {#acl_multicore}
+================================
+
+This captures some considerations and design decisions that I have made,
+both for my own memory later on ("what the hell was I thinking?!?"),
+and for anyone interested to criticize/improve/hack on this code.
+
+One of the factors taken into account while making these decisions
+was the relative emphasis on the multi-thread vs. single-thread
+use cases: the latter is vastly more prevalent. But
+one cannot optimize the single-thread performance without
+having functioning code for multi-thread.
+
+stateless ACLs
+==============
+
+The stateless case trivially parallelizes, and the only potential for a
+race between the different threads is during the reconfiguration,
+at the time of replacing the old ACL being checked with
+the new ACL.
+
+In case an acl_add_replace is being used to replace the rules
+within the existing entry, a reallocation of the `am->acls[X].rules`
+vector will happen, and potentially a change in count.
+
+acl_match_5tuple() has the following code:
+
+```{.c}
+  a = am->acls + acl_index;
+  for (i = 0; i < a->count; i++)
+    {
+      r = a->rules + i;
+      . . .
+```
+
+Ideally we should be immune from a->rules changing,
+but the problem arises if the count changes in flight,
+and the new ruleset is smaller - then we will attempt
+to "match" against the free memory.
+
+This can(?) be solved by replacing the for() with while(),
+so the comparison happens at each iteration.
+
+full_acl_match_5tuple(), which iterates over the list
+of ACLs, is a bit less immune, since it takes the pointer
+to the vector to iterate and keeps a local copy of
+that pointer.
+
+This race can be solved by checking the
+current pointer to the vector against the source pointer,
+seeing if there is an (unlikely) change, and if
+there is, returning the "deny" action, or, better,
+restarting the check.
+
+Since the check reloads the ACL list on a per-packet basis,
+there is only a window of opportunity of one packet to
+"match" a packet against an incorrect rule set.
+The workers also do not change anything, only read.
+Therefore, it looks like building special structures
+to ensure that it does not happen at all might not be
+worth it.
+
+At least not until we have a unit-test able to
+reliably catch this condition and test that
+the measures applied are effective. Adding code
+which is not possible to exercise is worse than
+not adding any code at all.
+
+So, I opt for "do-nothing" here for the moment.
+
+reflexive ACLs: single-thread
+=============================
+
+Before we talk multi-thread, it is worth revisiting the
+design of the reflexive ACLs in the plugin, and
+the history of their evolution.
+
+The very first version of the ACL plugin, shipped in
+1701, mostly did the job using the existing components
+and gluing them together. Because it needed to work
+in the bridged forwarding path only, using the L2 classifier
+as an insertion point appeared natural; also the L2 classifier,
+being a table with sessions, seemed like a good place
+to hold the sessions.
+
+So, the original design had two conceptual nodes:
+one, pointed to by the next_miss from the L2 classifier table,
+was checking the actual ACL, and inserting a session into
+the L2 classifier table, and the other one, pointed
+to by the next_match within the specific session rule,
+was checking the existing session. The timing out
+of the existing connections was done in the datapath,
+by periodically calling the aging function.
+
+This decision to use the existing components,
+for all its attractiveness, did bring a few limitations as well:
+
+* L2 classifier is a simple mask-and-value match, with
+a fixed mask across the table. So, sanely supporting IPv6
+packets with extension headers in that framework was impossible.
+
+* There is no way to get backpressure from the L2 classifier
+depending on memory usage. When it runs out of memory,
+it simply crashes the box. When does it run out of memory?
+We don't really know. It depends on how it allocates it.
+
+* Since we need to match the *reflected* traffic,
+we had to create *two* full session entries
+in two different directions, which is quite wasteful memory-wise.
+
+* (showstopper): the L2 classifier runs only in
+the bridged data path, so supporting the routed data path
+would require creating something else entirely different,
+which would mean much more headaches support-wise going forward.
+
+Because of that, I have moved to a different model of
+creating a session-5-tuple from the packet data - once,
+and then doing all the matching just on that 5-tuple.
+
+This made it possible to add support for skipping IPv6 extension headers.
+
+Also, this new version started to store the sessions in a dedicated
+bihash-per-interface, with the session key data being
+aligned for the ingress packets, and being mirrored for the
+egress packets. This allows significant savings in memory,
+because now we need to keep only one copy of the session table per
+interface instead of two, and also to only have ONE node for all the lookups
+(L2/L3 path, in/out, IPv4/IPv6) - significantly reducing the code complexity.
+
+Unfortunately, bihash still has the "lack of backpressure" problem,
+in the sense that if you try to insert too many entries and run out
+of memory in the heap you supplied, you get a crash.
+
+To work around that somewhat, there is a "maximum tested number of sessions"
+value, which tracks the currently inserted sessions in the bihash,
+and if this number is being approached, a more aggressive cleanup
+can happen. If this number is reached, two behaviors are possible:
+
+* attempt to do the stateless ACL matching and permit the packet
+  if it succeeds
+
+* deny the packet
+
+Currently I have opted for the second one, since it allows for
+a better defined behavior, and if you have to permit
+the traffic in both directions, why use stateful anyway?
+
+In order to be able to do the cleanup, we need to discriminate between
+the session types, with each session type having its own idle timeout.
+In order to do that, we keep three lists, defined in enum acl_timeout_e:
+ACL_TIMEOUT_UDP_IDLE, ACL_TIMEOUT_TCP_IDLE, ACL_TIMEOUT_TCP_TRANSIENT.
+
+The first one is hopefully obvious - it is just all UDP connections.
+They have an idle timeout of 600 seconds.
+
+The second and third are a bit more subtle. TCP is a complicated protocol,
+and we need to tread the fine line between doing too little and doing
+too much, and triggering the potential compatibility issues because of
+being a "middlebox".
+
+I decided to split the TCP connections into two classes:
+established, and everything else. "Established" means we have seen
+the SYN and ACK from both sides (with PUSH obviously masked out).
+This is the "active" state of any TCP connection and we would like
+to ensure we do not screw it up. So, the connections in this state
+have the default idle timer of 24 hours.
+
+All the rest of the connections have an idle timeout of 2 minutes
+(inspired by an old value of MSL), based on the observation
+that the states this class represents are usually very short lived.
+
+Once we have these three baskets of connections, it is trivial to
+imagine a simple cleanup mechanism to deal with this: take a
+TCP transient connection that has been hanging around.
+
+It is debatable whether we want to discriminate between the
+different TCP transient connections. Assuming we do FIFO (and
+the lists allow us to do just that), it means a given connection
+on the head of the list has been hanging around for longest.
+Thus, if we are short on resources, we might just go ahead and
+reuse it within the datapath.
+
+This is where we are slowly approaching the question
+"Why in the world have you not used a timer wheel or such?"
+
+The answer is simple: within the above constraints, it does
+not buy me much.
+
+Also, a timer wheel creates a leaky abstraction with a difficult
+to manage corner case. Which corner case?
+
+We have a set of objects (sessions) with an event that may
+or may not happen (idle timeout timer firing), and a
+necessity to reset the idle timeout when there is
+activity on the session.
+
+In the worst case, where we had 10000 one-packet
+UDP sessions just created 10 minutes ago, we would need
+to deal with a spike of 10000 expired timers.
+
+Of course, if we have active traffic on all
+of these 10000 connections, then we will not have
+to deal with that? Right, but we will still have to deal
+with canceling and requeueing the timers.
+
+In the best possible case, requeueing a timer is
+going to be something along the lines of a linked-list
+removal and reinsertion.
+
+However, keep in mind we already need to classify the
+connections for reuse, therefore we already have
+the linked lists!
+
+And if we just check these linked lists periodically in
+a FIFO fashion, we can get away with a very simple per-packet operation:
+writing back the timestamp of "now" into the connection structure.
+
+Then, rather than requeueing the list on a per-packet or per-frame
+basis, we can defer this action until the time this session
+appears on the head of the FIFO list, and the cleaning
+routine makes the decision about whether to discard
+the session (because the interval since last activity is bigger
+than the idle timeout), or to requeue the session back to
+the end of the list (because the last activity was less
+than idle timeout ago).
+
+So, rather than using the timers, we can simply reuse our classification
+FIFOs, with the following heuristic: do not look at the session that was
+enqueued at time X until X+session_timeout. If we enqueue the sessions
+in the order of their initial activity, then we can simply use the enqueue
+timestamp of the head session as a decision criterion for when we need
+to get back to looking at it for timeout purposes.
+
+Since the number of FIFOs is small, we get a slightly worse check
+performance than with timers, but still O(1).
+
+We seemingly do quite a few "useless" operations of requeueing the items
+back to the tail of the list - but these are the operations we do not
+have to do in the active data path, so overall it is a win.
+
+(Diversion: I believe this problem is congruent to poll vs. epoll or
+events vs. threads, some reading on this subject:
+http://web.archive.org/web/20120225022154/http://sheddingbikes.com/posts/1280829388.html)
+
+We can also run a TCP-like scheme for adaptively changing
+the wait period in the routine that deals with the connection timeouts:
+we can attempt to check the connections a couple of times per second
+(same as we would advance the timer wheel), and then if we have requeued
+close to a max-per-quantum number of connections, we can halve the waiting
+interval, and if we did not requeue any, we can slowly increment the waiting
+interval - which at a steady state should stabilize, similar to what the TCP
+rate does.
+
+reflexive ACLs: multi-thread
+=============================
+
+The single-threaded implementation in 1704 used a separate "cleaner" process
+to deal with the timing out of the connections.
+It is all good and great when you know that there is only a single core
+to run everything on, but the existence of the lists proves to be
+a massive difficulty when it comes to operating from multiple threads.
+
+An initial study shows that with a few assumptions (e.g. that the cleaner running in the main thread
+and the worker have a demarcation point in time where either one or the other one touches
+the session in the list) it might be possible to make it work, but the resulting
+trickiness of doing it neatly with all the corner cases is quite large.
+
+So, for the multi-threaded scenario, we need to move the connection
+aging back to the same CPU as its creation.
+
+Luckily we can do this with the help of interrupts.
+
+So, the design is as follows: the aging thread (acl_fa_session_cleaner_process)
+periodically fires interrupts to the workers' interrupt nodes (acl_fa_worker_session_cleaner_process_node.index),
+using vlib_node_set_interrupt_pending(), and
+the interrupt node acl_fa_worker_conn_cleaner_process() calls acl_fa_check_idle_sessions(),
+which does the actual job of advancing the lists. And within the actual datapath the only thing we will be
+doing is putting the items onto the FIFO, and updating the last active time on the existing connection.
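+
+As a minimal sketch (with hypothetical types and helper names - this is
+not the actual acl_fa_check_idle_sessions() code), the list-advancing
+logic described above could look like this:
+
+```{.c}
+/* Sketch only: minimal types for illustration. */
+typedef struct conn_ {
+  struct conn_ *next;
+  u64 enqueue_time;      /* when this element was put on the FIFO */
+  u64 last_active_time;  /* written back by the datapath per packet */
+} conn_t;
+
+typedef struct { conn_t *head, *tail; } conn_list_t;
+
+/* Advance one timeout FIFO; 'now' is the current timestamp.
+   delete_session() and enqueue_at_tail() are hypothetical helpers. */
+static void
+advance_conn_fifo (conn_list_t * list, u64 now, u64 idle_timeout)
+{
+  while (list->head)
+    {
+      conn_t *conn = list->head;
+      /* Sessions are enqueued in order, so if the head is not yet
+         due for a check, neither is anything behind it. */
+      if (conn->enqueue_time + idle_timeout > now)
+        break;
+      list->head = conn->next;   /* dequeue the head */
+      if (conn->last_active_time + idle_timeout <= now)
+        delete_session (conn);   /* idle for too long - reclaim it */
+      else
+        {
+          /* Activity since enqueue: requeue to the tail, using the
+             last activity time as the new enqueue time. */
+          conn->enqueue_time = conn->last_active_time;
+          enqueue_at_tail (list, conn);
+        }
+    }
+}
+```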
+
+The one "delicate" part is that the worker for one leg of the connection might be different from
+the worker for another leg of the connection - but even if the "owner" tries to free the connection,
+nothing terrible can happen - worst case the element of the pool (which is nominally free for a short period)
+will get the timestamp updated - same thing about the TCP flags seen.
+
+A slightly trickier issue arises when the packet is initially seen by one worker (thus owned by that worker),
+and the return packet is processed by another worker, and as a result changes the
+class of the connection (e.g. becomes TCP_ESTABLISHED from TCP_TRANSIENT or vice versa).
+If the class changes from one with the shorter idle time to the one with the longer idle time,
+then unless we are in the starvation mode where the transient connections are recycled,
+we can simply do nothing and let the normal requeue mechanism kick in. If the class changes from the longer idle
+timer to the shorter idle timer, then we risk keeping the connection around for longer than needed, which
+will affect the resource usage.
+
+One solution to that is to have NxN ring buffers (where N is the number of workers), such that the non-owner
+can signal to the owner the connection# that needs to be requeued out of order.
+
+A simpler solution, though, is to ensure that each FIFO's period is equal to that of the shortest timer.
+This way the resource starvation problem is taken care of, at the expense of some additional work.
+
+This all looks sufficiently nice and simple until a skeleton falls out of the closet:
+sometimes we want to clean the connections en masse before they expire.
+
+There are a few potential scenarios:
+1) removal of an ACL from the interface
+2) removal of an interface
+3) manual action of an operator (in the future).
+
+In order to tackle this, we need to modify the logic which decides whether to requeue the
+connection at the end of the list, or to delete it due to idle timeout:
+
+We define a point in time, and have each worker thread fast-forward through its FIFO,
+in the process looking for sessions that satisfy the criteria, and either keeping them or requeueing them.
+
+To keep the appearance to the outside world simple, we still process this as an event
+within the connection cleaner thread, but this event handler does as follows:
+1) it creates the bitmap of the sw_if_index values requested to be cleared
+2) for each worker, it ensures there is no cleanup operation in progress (and if there is one,
+it waits), and then makes a copy of the bitmap, sets the per-worker flag of a cleanup operation, and sends an interrupt.
+3) it waits until all cleanup operations have completed.
+
+Within the worker interrupt node, we check if the "cleanup in progress" flag is set,
+and if it is, we check the "fast forward time" value. If it is unset, we initialize it to the value of now, and compare the
+requested bitmap of sw_if_index values (pending_clear_sw_if_index_bitmap) with the bitmap of sw_if_index that this worker deals with.
+
+(we set the bit in the bitmap every time we enqueue the packet onto a FIFO - serviced_sw_if_index_bitmap in acl_fa_conn_list_add_session).
+
+If the result of this AND operation is zero - then we can clear the flag of cleanup in progress and return.
+Else we kick off the quantum of cleanup, and make sure we get another interrupt ASAP if that cleanup operation returns non-zero,
+meaning there is more work to do.
+When that operation returns zero, everything has been processed; we can clear the "cleanup-in-progress" flag, and
+zeroize the bitmap of sw_if_index-es requested to be cleaned.
+
+The interrupt node signals its wish to receive an interrupt ASAP by setting the interrupt_is_needed
+flag within the per-worker structure. The main thread, while waiting for the
+cleanup operation to complete, checks if there is a request for an interrupt,
+and if there is - it sends one.
+
+This approach gives us a way to mass-clean the connections which reuses the code of the regular idle
+connection cleanup.
+
+One potential inefficiency is that the bitmap values are set by the session insertion
+in the data path - but there is nothing to clear them.
+
+So, if one rearranges the interface placement with the workers, then the cleanups will cause some unnecessary work.
+For now, we consider it an acceptable limitation. It can be resolved by having another per-worker bitmap, which, when set,
+would trigger the cleanup of the bits in the serviced_sw_if_index_bitmap.
+
+=== the end ===
+
diff --git a/src/plugins/acl/hash_lookup.md b/src/plugins/acl/hash_lookup.md
deleted file mode 100644
index 95524643..00000000
--- a/src/plugins/acl/hash_lookup.md
+++ /dev/null
@@ -1,241 +0,0 @@
-ACL plugin constant-time lookup design
-======================================
-
-The initial implementation of ACL plugin performs a trivial for() cycle,
-going through the assigned ACLs on a per-packet basis. This is not very
-efficient, even if for very short ACLs due to its simplicity it can beat
-more advanced methods.
-
-However, to cover the case of longer ACLs with acceptable performance,
-we need to have a better way of matching. This write-up proposes
-a mechanism to make a lookup from O(M) where M is number of entries
-to O(N) where N is number of different mask combinations.
-
-Preparation of ACL(s)
----------------------
-
-The ACL plugin will maintain a global list of "mask types", i.e. the specific
-configurations of "do not care" bits within the ACEs.
-Upon the creation of a new ACL, a pass will be made through all the
-ACEs, to assign and possibly allocate the "mask type number".
-
-Each ACL has a structure *hash_acl_info_t* representing the "hash-based"
-parts of information related to that ACL, primarily the array of
-*hash_ace_info_t* structures - each of the members of that array
-corresponding to one of the rules (ACEs) in the original ACL,
-for this they have a pair of *(acl_index, ace_index)* to keep track,
-predominantly for the debugging.
-
-Why do we need a whole separate structure, and are not adding new fields
-to the existing rile structure ? First, encapsulation, to minimize
-the pollution of the main ACL code with the hash-based lookup artifacts.
-
-Second, one rule may correspond to more than one "hash-based" ACE.
-In fact, most of the rules do correspond to two of those. Why ?
-
-Consider that the current ACL lookup logic is that if a packet
-is not the initial fragment, and there is an L4 entry acting on the packet,
-the comparison will be made only on the L4 protocol field value rather
-than on the protocol and port values. This beaviour is governed by
-*l4_match_nonfirst_fragment* flag in the *acl_main*, and was needed to
-maintain the compatibility with the existing software switch implementation.
- -While for the sequential check in *single_acl_match_5tuple()* -it is very easy to implement by just breaking out at the right moment, -in case of hash-based matching this cost us two checks: -one on full 5-tuple and the flag *pkt.is_nonfirst_fragment* being zero, -the second on 3-tuple and the flag *pkt.is_nonfirst_fragment* being one, -with the second check triggered by the *acl_main.l4_match_nonfirst_fragment* -setting being the default 1. This dictates the necessity of having a "match" -field in a given *hash_ace_info_t* element, which would reflect the value -we are supposed to match after applying the mask. - -There can be other circumstances when it might be beneficial to expand -the given rule in the original ACL into multiple - for example, as an -optimization within the port range handling for small port ranges -(this is not done as of the time of writing). - -Assigning ACLs to an interface ------------------------------- - -Once the ACL list is assigned to an interface, or, rather, a new ACL -is added to the list of the existing ACLs applied to the interface, -we need to update the bihash accelerating the lookup. - -All the entries for the lookups are stored within a single *48_8* bihash, -which captures the 5-tuple from the packet as well as the miscellaneous -per-packet information flags, e.g. *l4_valid*, *is_non_first_fragment*, -and so on. To facilitate the use of the single bihash by all the interfaces, -the *is_ip6*, *is_input*, *sw_if_index* are part of the key, -as well as *mask_type_index* - the latter being necessary because -there can be entries with the same value but different masks, e.g.: -`permit ::/0, permit::/128`. - -At the moment of an ACL being applied to an interface, we need to -walk the list of *hash_ace_info_t* entries corresponding to that ACL, -and update the bihash with the keys corresponding to the match -values in these entries. - -The value of the hash match contains the index into a per-*sw_if_index* vector -of *applied_ace_hash_entry_t* elements, as well as a couple of flags: -*shadowed* (optimization: if this flag on a matched entry is zero, means -we can stop the lookup early and declare a match - see below), -and *need_portrange_check* - meaning that what matched was a superset -of the actual match, and we need to perform an extra check. - -Also, upon insertion, we must keep in mind there can be -multiple *applied_ace_hash_entry_t* for the same key and must keep -a list of those. This is necessary to incrementally apply/unapply -the ACLs as part of the ACL vector: say, two ACLs have -"permit 2001:db8::1/128 any" - we should be able to retain the entry -for the second ACL even if we have deleted the first one. -Also, in case there are two entries with the same key but -different port ranges, say 0..42 and 142..65535 - we need -to be able to sequentially match on those if we decide not -to expand them into individual port-specific entries. - -Per-packet lookup ------------------ - -The simple single-packet lookup is defined in -*multi_acl_match_get_applied_ace_index*, which returns the index -of the applied hash ACE if there was a match, or ~0 if there wasn't. - -The future optimized per-packet lookup may be batched in three phases: - -1. Prepare the keys in the per-worker vector by doing logical AND of - original 5-tuple record with the elements of the mask vector. -2. 
Lookup the keys in the bihash in a batch manner, collecting the - result with lowest u64 (acl index within vector, ACE index) from - the hash lookup value, and performing the list walk if necessary - (for portranges) -3. Take the action from the ACL record as defined by (ACL#, ACE#) from the - resulting lookup winner, or, if no match found, then perform default deny. - -Shadowed/independent/redundant ACEs ------------------------------------- - -During the phase of combining multiple ACLs into one rulebase, when they -are applied to interface, we also can perform several optimizations. - -If a given ACE is a strict subset of another ACE located up in the linear -search order, we can ignore this ACE completely - because by definition -it will never match. We will call such an ACE *redundant*. Here is an example: - -``` -permit 2001:db8:1::/48 2001:db8:2::/48 (B) -deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) -``` - -A bit more formally, we can define this relationship of an ACE A to ACE B as: - -``` -redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA) - && contains(dstB, dstA) && is_after(A, B)) -``` - -Here as "contains" we define an operation operating on the sets defined by -the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition) -respectively, and returning true if all the elements represented by -the second argument are represented by the first argument. The "is_after" -is true if A is located below B in the ruleset. - -If a given ACE does not intersect at all with any other ACE -in front of it, we can mark it as such. - -Then during the sequence of the lookups the successful hit on this ACE means -we do not need to look up other mask combinations - thus potentially -significantly speeding up the match process. Here is an example, -assuming we have the following ACL: - -``` -permit 2001:db8:1::/48 2001:db8:2::/48 (B) -deny 2001:db8:3::/48 2001:db8:2:1::/64 (A) -``` - -In this case if we match the second entry, we do not need to check whether -we have matched the first one - the source addresses are completely -different. We call such an ACE *independent* from another. - -We can define this as - -``` -independent(aceA, aceB) := (!intersect(protoA, protoB) || - !intersect(srcA, srcB) || - !intersect(dstA, dstB)) -``` - -where intersect is defined as operation returning true if there are -elements belonging to the sets of both arguments. - -If the entry A is neither redundant nor independent from B, and is below -B in the ruleset, we call such an entry *shadowed* by B, here is an example: - -``` -deny tcp 2001:db8:1::/48 2001:db8:2::/48 (B) -permit 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) -``` - -This means the earlier rule "carves out" a subset of A, thus leaving -a "shadow". (Evidently, the action needs to be different for the shadow -to have an effect, but for for the terminology sake we do not care). - -The more formal definition: - -``` -shadowed(aceA, aceB) := !redundante(aceA, aceB) && - !independent(aceA, aceB) && - is_after(aceA, aceB) -``` - -Using this terminology, any ruleset can be represented as -a DAG (Directed Acyclic Graph), with the bottom being the implicit -"deny any", pointing to the set of rules shadowing it or the ones -it is redundant for. - -These rules may in turn be shadowing each other. There is no cycles in -this graph because of the natural order of the rules - the rule located -closer to the end of the ruleset can never shadow or make redundant a rule -higher up. 
- -The optimization that enables can allow for is to skip matching certain -masks on a per-lookup basis - if a given rule has matched, -the only adjustments that can happen is the match with one of -the shadowing rules. - -Also, another avenue for the optimization can be starting the lookup process -with the mask type that maximizes the chances of the independent ACE match, -thus resulting in an ACE lookup being a single hash table hit. - - -Plumbing --------- - -All the new routines are located in a separate file, -so we can cleanly experiment with a different approach if this -does not fit all of the use cases. - -The constant-time lookup within the data path has the API with -the same signature as: - -``` -u8 -multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2, - int is_ip6, int is_input, u32 * acl_match_p, - u32 * rule_match_p, u32 * trace_bitmap) -``` - -There should be a new upper-level function with the same signature, which -will make a decision whether to use a linear lookup, or to use the -constant-time lookup implemented by this work, or to add some other -optimizations (e.g. by keeping the cache of the last N lookups). - -The calls to the routine doing preparatory work should happen -in `acl_add_list()` after creating the linear-lookup structures, -and the routine doing the preparatory work populating the hashtable -should be called from `acl_interface_add_del_inout_acl()` or its callees. - -The initial implementation will be geared towards looking up a single -match at a time, with the subsequent optimizations possible to make -the lookup for more than one packet. - diff --git a/src/vlibapi/api_doc.md b/src/vlibapi/api_doc.md index e620ee12..2e7ae09a 100644 --- a/src/vlibapi/api_doc.md +++ b/src/vlibapi/api_doc.md @@ -6,12 +6,12 @@ APIs. Messages are defined in `*.api` files. Today, there are about 50 api files, with more arriving as folks add programmable features. The API file compiler -sources reside in @ref src/tools/vppapigen . +sources reside in @ref src/tools/vppapigen. -Here's a typical request/response message definition, from -@ref src/vnet/interface.api : +From @ref src/vnet/interface.api, here's a typical request/response message +definition: -``` +```{.c} autoreply define sw_interface_set_flags { u32 client_index; @@ -22,10 +22,10 @@ Here's a typical request/response message definition, from }; ``` -To a first approximation, the API compiler renders this definition as -follows: +To a first approximation, the API compiler renders this definition into +`build-root/.../vpp/include/vnet/interface.api.h` as follows: -``` +```{.c} /****** Message ID / handler enum ******/ #ifdef vl_msg_id vl_msg_id(VL_API_SW_INTERFACE_SET_FLAGS, vl_api_sw_interface_set_flags_t_handler) @@ -60,10 +60,13 @@ follows: u32 context; i32 retval; }) vl_api_sw_interface_set_flags_reply_t; + + ... + #endif /* vl_typedefs */ ``` To change the admin state of an interface, a binary api client sends a -@ref vl_api_sw_interface_set_flags_t to vpp, which will respond with a +@ref vl_api_sw_interface_set_flags_t to VPP, which will respond with a @ref vl_api_sw_interface_set_flags_reply_t message. Multiple layers of software, transport types, and shared libraries @@ -76,7 +79,7 @@ implement a variety of features: message handlers. Correctly-coded message handlers know nothing about the transport used to -deliver messages to/from vpp. It's reasonably straighforward to use multiple +deliver messages to/from VPP. 
It's reasonably straightforward to use multiple
 API message transport types simultaneously.
 
 For historical reasons, binary api messages are (putatively) sent in network
@@ -90,12 +93,12 @@ Since binary API messages are always processed in order, we allocate
 messages using a ring allocator whenever possible. This scheme is
 extremely fast when compared with a traditional memory allocator, and
 doesn't cause heap fragmentation. See
-@ref src/vlibmemory/memory_shared.c @ref vl_msg_api_alloc_internal() .
+@ref src/vlibmemory/memory_shared.c @ref vl_msg_api_alloc_internal().
 
 Regardless of transport, binary api messages always follow a @ref msgbuf_t
 header:
 
-```
+```{.c}
 typedef struct msgbuf_
 {
   unix_shared_memory_queue_t *q;
@@ -109,7 +112,7 @@ This structure makes it easy to trace messages without having to
 decode them - simply save data_len bytes - and allows @ref
 vl_msg_api_free() to rapidly dispose of message buffers:
 
-```
+```{.c}
 void
 vl_msg_api_free (void *a)
 {
@@ -130,34 +133,34 @@ decode them - simply save data_len bytes - and allows
       return;
     }
-  } 
+  }
 ```
 
 ## Message Tracing and Replay
 
-It's extremely important that vpp can capture and replay sizeable binary API
+It's extremely important that VPP can capture and replay sizeable binary API
 traces. System-level issues involving hundreds of thousands of API
 transactions can be re-run in a second or less. Partial replay allows one to
 binary-search for the point where the wheels fall off. One can add scaffolding
 to the data plane, to trigger when complex conditions obtain.
 
 With binary API trace, print, and replay, system-level bug reports of the form
-"after 300,000 API transactions, the vpp data-plane stopped forwarding
+"after 300,000 API transactions, the VPP data-plane stopped forwarding
 traffic, FIX IT!" can be solved offline.
 
 More often than not, one discovers that a control-plane client
 misprograms the data plane after a long time or under complex
 circumstances. Without direct evidence, "it's a data-plane problem!"
 
-See @ref src/vlibmemory/memory_vlib.c @ref vl_msg_api_process_file() ,
-and @ref src/vlibapi/api_shared.c . See also the debug CLI command "api trace"
+See @ref src/vlibmemory/memory_vlib.c @ref vl_msg_api_process_file(),
+and @ref src/vlibapi/api_shared.c. See also the debug CLI command "api trace"
 
 ## Client connection details
 
-Establishing a binary API connection to vpp from a C-language client
+Establishing a binary API connection to VPP from a C-language client
 is easy:
 
-```
+```{.c}
 int
 connect_to_vpe (char *client_name, int client_message_queue_length)
 {
@@ -176,9 +179,9 @@ is easy:
 }
 ```
 
-32 is a typical value for client_message_queue_length. Vpp cannot
+32 is a typical value for client_message_queue_length. VPP cannot
 block when it needs to send an API message to a binary API client, and
-the vpp-side binary API message handlers are very fast. When sending
+the VPP-side binary API message handlers are very fast. When sending
 asynchronous messages, make sure to scrape the binary API rx ring with
 some enthusiasm.
 
@@ -187,7 +190,7 @@ some enthusiasm.
 Calling @ref vl_client_connect_to_vlib spins up a binary API message RX
 pthread:
 
-```
+```{.c}
 static void *
 rx_thread_fn (void *arg)
 {
@@ -214,31 +217,31 @@ To handle the binary API message queue yourself, use
 @ref vl_client_connect_to_vlib_no_rx_pthread.
 
 In turn, vl_msg_api_queue_handler(...) uses mutex/condvar signalling
-to wake up, process vpp -> client traffic, then sleep. Vpp supplies a
-condvar broadcast when the vpp -> client API message queue transitions
+to wake up, process VPP -> client traffic, then sleep. VPP supplies a
+condvar broadcast when the VPP -> client API message queue transitions
 from empty to nonempty.
 
-Vpp checks its own binary API input queue at a very high rate. Vpp
+VPP checks its own binary API input queue at a very high rate. VPP
 invokes message handlers in "process" context [aka cooperative
 multitasking thread context] at a variable rate, depending on
 data-plane packet processing requirements.
 
 ## Client disconnection details
 
-To disconnect from vpp, call @ref vl_client_disconnect_from_vlib
-. Please arrange to call this function if the client application
-terminates abnormally. Vpp makes every effort to hold a decent funeral
-for dead clients, but vpp can't guarantee to free leaked memory in the
+To disconnect from VPP, call @ref vl_client_disconnect_from_vlib.
+Please arrange to call this function if the client application
+terminates abnormally. VPP makes every effort to hold a decent funeral
+for dead clients, but VPP can't guarantee to free leaked memory in the
 shared binary API segment.
 
-## Sending binary API messages to vpp
+## Sending binary API messages to VPP
 
-The point of the exercise is to send binary API messages to vpp, and
-to receive replies from vpp. Many vpp binary APIs comprise a client
+The point of the exercise is to send binary API messages to VPP, and
+to receive replies from VPP. Many VPP binary APIs comprise a client
 request message, and a simple status reply. For example, to
 set the admin status of an interface, one codes:
 
-```
+```{.c}
 vl_api_sw_interface_set_flags_t *mp;
 
 mp = vl_msg_api_alloc (sizeof (*mp));
@@ -262,9 +265,9 @@ Key points:
   network byte order
 
 * The client-library global data structure @ref api_main keeps track
-  of sufficient pointers and handles used to communicate with vpp
+  of sufficient pointers and handles used to communicate with VPP
 
-## Receiving binary API messages from vpp
+## Receiving binary API messages from VPP
 
 Unless you've made other arrangements (see @ref
 vl_client_connect_to_vlib_no_rx_pthread), *messages are received on a
@@ -273,7 +276,7 @@ thread is the responsibility of the application!
 
 Set up message handlers about as follows:
 
-```
+```{.c}
 #define vl_typedefs /* define message structures */
 #include <vpp/api/vpe_all_api_h.h>
 #undef vl_typedefs
@@ -319,7 +322,7 @@ vectors in the @ref api_main_t structure. As of this writing: not all
 vector element values can be set through the API. You'll see sporadic
 API message registrations followed by minor adjustments of this form:
 
-```
+```{.c}
 /*
  * Thread-safe API messages
  */
-- 
cgit 1.2.3-korg