From 5ccbaa97a86715bfee2ab44523075a93a8abb698 Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Wed, 14 Sep 2016 12:42:54 -0400 Subject: Add structure to some of the documentation; VPP-223 Moves the random .md files, when rendered by Doxygen, into a config examples tree. We may later flesh this out into a more complete user documentation section. Change-Id: If423b82f1047f1c84f90876a786313054b5f7c77 Signed-off-by: Chris Luke --- doxygen/user_doc.md | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doxygen/user_doc.md (limited to 'doxygen/user_doc.md') diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md new file mode 100644 index 00000000..b08ac86a --- /dev/null +++ b/doxygen/user_doc.md @@ -0,0 +1,10 @@ +User Configuration Examples {#user_doc} +=========================== + +Several modules provide operational documentation. + +- @subpage ipsec_gre_doc +- @subpage map_doc +- @subpage lldp_doc +- @subpage ioam_plugin_doc +- @subpage lb_plugin_doc -- cgit 1.2.3-korg From f304552665ddd0658b361698424b815ff8237ef0 Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Wed, 21 Sep 2016 09:35:01 -0400 Subject: Copy the 16.09 release notes to master - Copy the 16.09 release notes to master - Add some notes for 16.06 (from the press release) - Add some structure around notes for each release - Add skeleton for next release Change-Id: Id5a5d8bf02fce1bbaed303e6c6e4f8908c7e7d75 Signed-off-by: Chris Luke --- README.md | 19 +++-- RELEASE.md | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++ doxygen/user_doc.md | 1 + 3 files changed, 213 insertions(+), 7 deletions(-) create mode 100644 RELEASE.md (limited to 'doxygen/user_doc.md') diff --git a/README.md b/README.md index bf26c9de..cf43a4df 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ Vector Packet Processing ======================== -## Introduction. +## Introduction The VPP platform is an extensible framework that provides out-of-the-box production quality switch/router functionality. It is the open source version @@ -16,7 +16,13 @@ For more information on VPP and its features please visit the [What is VPP?](https://wiki.fd.io/view/VPP/What_is_VPP%3F) pages. -## Directory layout. +## Changes + +Details of the changes leading up to this version of VPP can be found under +@ref release_notes. + + +## Directory layout Directory name | Description ---------------------- | ------------------------------------------- @@ -25,7 +31,6 @@ Directory name | Description doxygen | Documentation generator configuration dpdk | DPDK patches and build infrastructure g2 | Event log visualization tool - gmod | perf related? perftool | Performance tool @ref plugins | VPP bundled plugins directory @ref svm | Shared virtual memory allocation library @@ -43,7 +48,7 @@ Directory name | Description ignore any @@ref labels in the above table.) -## Getting started. +## Getting started In general anyone interested in building, developing or running VPP should consult the [VPP wiki](https://wiki.fd.io/view/VPP) for more complete @@ -57,7 +62,7 @@ coverage of the topic. For the impatient, some salient information is distilled below. -### Quick-start: On an existing Linux host. +### Quick-start: On an existing Linux host To install system dependencies, build VPP and then install it, simply run the build script. This should be performed a non-privileged user with `sudo` @@ -73,7 +78,7 @@ interest. To see the available targets run: make -### Quick-start: Vagrant. +### Quick-start: Vagrant The directory `build-root/vagrant` contains a `VagrantFile` and supporting scripts to bootstrap a working VPP inside a Vagrant-managed Virtual Machine. @@ -88,7 +93,7 @@ for this can be found [on the Setting up Vagrant wiki page] (https://wiki.fd.io/view/DEV/Setting_Up_Vagrant). -## More information. +## More information Several modules provide documentation, see @subpage user_doc for more information. diff --git a/RELEASE.md b/RELEASE.md new file mode 100644 index 00000000..3ed72d06 --- /dev/null +++ b/RELEASE.md @@ -0,0 +1,200 @@ +# Release Notes {#release_notes} + +* @subpage release_notes_1701 +* @subpage release_notes_1609 +* @subpage release_notes_1606 + +@page release_notes_1701 Release notes for VPP 17.01 + +@note This release was for a while known as 16.12. +@todo Release 17.01 needs release notes. + +## Features + +## Known issues + +## Issues fixed + + +@page release_notes_1609 Release notes for VPP 16.09 + +## Features + +- [Integrated July 2016 DPDK release](http://www.dpdk.org/doc/guides/rel_notes/release_16_07.html) + - DPDK-vhost is depreciated pending a complete rework of the original integration and + addressing of rx performance deltas. + - Patches required for DPDK 16.07: + - Correctly setting the Packet Type in the IGB, IXGBE and i40e drivers. + - Correctly setting checksum in the i40e driver. + - NXP DPAA2 PMD Driver. + - rte_delay (yield) functionality. + +- Add “in tree” plugins: + - IPv6 ILA. + - iOAM. + - Load Balancer. + - SNAT. + +- High-performance (line-rate) “neutron like” L4 port-filtering. + +- API refactoring - addressing some of the issues around JVPP bindings. + - Accommodating plugins [(e.g. NSH_SFC)](https://wiki.fd.io/view/NSH_SFC) + - Binding for [python](https://wiki.fd.io/view/VPP/Python_API) + +- LISP + - L2 LISP overlays + - Multitenancy + - Multihoming + - RTR mode + - Map-resolver failover algorithm + +- Support 64-bit vector lengths, huge shared-memory segments. + +- Dynamic IP Feature ordering + - IP Features can now specify features they appear before and after + +- 16.09 Builds + - Ubuntu 14.04 LTS - Trusty Tahr + - Ubuntu 16.04 LTS - Xenial Xerus + - CentOS 7 + - More information on [VPP wiki](https://wiki.fd.io/view/VPP/Installing_VPP_binaries_from_packages) + +- Performance, characterize and document performance for this release + [(more information on CSIT page)](https://wiki.fd.io/view/CSIT) + + - IPv4 and IPv6 Scale - performance tests. + - Bidirectional 10k/100k/1M flows. + - 64B,570B, 1518B,9000B packet sizes. + - IPv6 iACL - performance + - DUT1 and DUT2 are configured with IPv6 routing, two static IPv6 /64 routes and IPv6 iAcl + security whitelist ingress /64 filter entries applied on links. + - TG traffic profile contains two L3 flow-groups (flow-group per direction, 253 flows per + flow-group) with all packets containing Ethernet header, IPv6 header and generated payload. + MAC addresses are matching MAC addresses of the TG node interfaces. + + - L2XC VXLANoIPv4 - performance + - DUT1 and DUT2 are configured with L2 cross-connect. VXLAN tunnels are configured between + L2XCs on DUT1 and DUT2. + - TG traffic profile contains two L3 flow-groups (flow-group per direction, 253 flows per + flow-group) with all packets containing Ethernet header, IPv4 header with IP protocol=61 + and generated payload. MAC addresses are matching MAC addresses of the TG node interfaces. + +- Documentation + - Autogenerated CLI documentation. + - Using doxygen to automate API/Node documentation. + - [(available online)](https://docs.fd.io/vpp/16.09/) + +- Resolved all static analysis issues found by Coverity + - Beginning of 16.09 cycle: 505 issues. + - Release: 0 outstanding issues. + + +## Known issues + +Issues in fd.io are tracked in [JIRA](https://jira.fd.io). + +Issue | Description +--- | --- +VPP-391 | vpp debug version assert appeared in the process of start +VPP-380 | Mapping algorithm compute wrong ea-bits when IPv4 prefix 0.0.0.0/0 +VPP-371 | load_one_plugin:63: Loaded plugin: message from vppctl +VPP-367 | vpp packages need to depend on specific versions of each other +VPP-312 | IP6 FIB gets in indeterminate state by duplicating commands +VPP-224 | Lookup-in-vrf can not be set correctly +VPP-206 | Fix classify table delete +VPP-203 | Fix binary API for reading vpp node graph +VPP-147 | Inconsistent behaviour when adding L2 FIB filter entry +VPP-99 | VPP doesn't discard DHCPOFFER message with wrong XID + + +## Issues fixed + +Issues in fd.io are tracked in [JIRA](https://jira.fd.io). + +Issue | Description +--- | --- +VPP-396 | Ubuntu systems Graphviz bug +VPP-390 | vpp-lib rpm fails to include *.so symlinks, causing linking problems with out of tree builds +VPP-388 | IPSec output feature assumes packets have been ethernet rewritten +VPP-385 | ARP for indirect adjacencies not working correctly +VPP-361 | Memory leak on delete of VXLAN over IPv6 tunnel +VPP-357 | VNI not set correctly when removing LISP fwd entries +VPP-349 | sw_interface_vhost_user_dump not working +VPP-345 | net/enic: bad L4 checksum ptype set on ICMP packets +VPP-340 | MAP-T wrong destination address +VPP-330 | Use fifo to store LISP pending map-requests +VPP-326 | map_add_domain VAT command: unable to configure domain with mtu parameter +VPP-318 | The map_add_domain VAT command accepts invalid arguments +VPP-315 | Fix "show vxlan-gpe" issue +VPP-310 | Mapping algorithm compute wrong ea-bits +VPP-239 | LISP IP forwarding does not tag packets that hit negative mapping entries +VPP-235 | Invalid help in VAT for sw_interface_set_l2_bridge +VPP-228 | Mapping algorithm sends packet to wrong IPv6 address +VPP-214 | vpp-api-test: api_ipsec_sad_add_del_entry: vector "ck" not initialized +VPP-200 | VPP - TAP port create problem +VPP-189 | Coverity Issues for 16.09 +VPP-184 | u16 translating to char ,not short +VPP-179 | Adjacency share-count botch +VPP-163 | "show ip6 interface" ignores non-global addresses +VPP-155 | Netmap: Inconsistency in interface state between "show hardware" and "show interface" +VPP-145 | Dynamically compute IP feature ordering based on constraints +VPP-137 | VPP sends ARP with wrong requested IP +VPP-118 | JVpp: 0 length arrays not handled properly in VPP responses +VPP-112 | linux kernel info missing from build log +VPP-110 | vxlan encap node should never touch a deleted tunnel +VPP-107 | RPM build broken in master +VPP-92 | segment routing is not properly filling out the segment list +VPP-91 | segment routing add/del tunnel lookup doesn't work +VPP-84 | af_packet throws a fatal error on EAGAIN +VPP-74 | Clang compile fails due to warning in vlib/unix/cli.c +VPP-64 | Top level "make pkg-deb" fails if CDPATH is set in user env. +VPP-48 | Traceroute does not terminate when VPP is the target +VPP-23 | CLI pager does not gracefully handle lines longer than the terminal width + + +@page release_notes_1606 Release notes for VPP 16.06 + + +The FD.io Project, relentlessly focused on data IO speed and efficiency +supporting the creation of high performance, flexible, and scalable software +defined infrastructures, announces the availability of the community’s first +software release (16.06). + +In the four months since launching, FD.io has brought together more than 75 +developers from 11 different companies including network operators, solution +providers chip vendors, and network equipment vendors who are collaborating to +enhance and innovate around the Vector Packet Processing (VPP) technology. The +FD.io community has quickly formed to grow the number of projects from the +initial VPP project to an additional 6 projects addressing a diverse set of +requirements and usability across a variety of deployment environments. + +The 16.06 release brings unprecedented performance: 480Gbps/200mpps with 8 +million routes and 2k whitelist entries on standard high volume x86 servers. + + +## Features + +In addition to the existing full suite of vswitch/vrouter features, the new +16.06 release adds: + +* Enhanced Switching and Routing: + * IPv6 Segment Routing multicast support. + * LISP xTR support. + * VXLAN over IPv6 underlay. + * Per interface whitelists. + * Shared adjacencies in FIB. + +* New and improved interface support: + * Jumbo frame support for vhost-user. + * Netmap interface support. + * AF_Packet interface support. + +* Expanded and improved programmability: + * Python API bindings. + * Enhanced JVPP Java API bindings. + * Debugging CLI. + +* Expanded Hardware and Software Support: + * Support for ARM 32 targets including Rasberry Pi single-board computer. + * Support for DPDK 16.04. + diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index b08ac86a..48a9fff3 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -3,6 +3,7 @@ User Configuration Examples {#user_doc} Several modules provide operational documentation. +- [GUI guided user demo](https://wiki.fd.io/view/VPP_Sandbox/vpp-userdemo) - @subpage ipsec_gre_doc - @subpage map_doc - @subpage lldp_doc -- cgit 1.2.3-korg From d82799863a6c02ae83daf9e231a4a1de5400731e Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Tue, 4 Oct 2016 16:17:56 -0400 Subject: Minor tweaks to hqos doc - correct formatting errors - add to user_doc tree - unix line endings, not windows(!) Change-Id: I9aa5511c398fc33ecb25fb33ae64c8e666c56dd8 Signed-off-by: Chris Luke --- doxygen/user_doc.md | 7 +- vnet/vnet/devices/dpdk/qos_doc.md | 691 ++++++++++++++++++++++---------------- 2 files changed, 408 insertions(+), 290 deletions(-) (limited to 'doxygen/user_doc.md') diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 48a9fff3..e6cd5bbe 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -1,9 +1,10 @@ -User Configuration Examples {#user_doc} -=========================== +User Documentation {#user_doc} +================== -Several modules provide operational documentation. +Several modules provide operational, dataplane-user focused documentation. - [GUI guided user demo](https://wiki.fd.io/view/VPP_Sandbox/vpp-userdemo) +- @subpage qos_doc - @subpage ipsec_gre_doc - @subpage map_doc - @subpage lldp_doc diff --git a/vnet/vnet/devices/dpdk/qos_doc.md b/vnet/vnet/devices/dpdk/qos_doc.md index 1b58a77f..9bd0659d 100644 --- a/vnet/vnet/devices/dpdk/qos_doc.md +++ b/vnet/vnet/devices/dpdk/qos_doc.md @@ -1,287 +1,404 @@ -# Quality-of-Service (QoS)- Hierarchcal Scheduler - -The Quality-of-Service (QoS) scheduler performs egress-traffic management by prioritizing the transmission of the packets of different type services and subcribers based on the Service Level Agreements (SLAs). The QoS scheduler can be enabled on one or more NIC output interfaces depending upon the requirement. - -## Overview - -The QoS schdeuler supports a number of scheduling and shaping levels which construct hierarchical-tree. The first level in the hierarchy is port (i.e. the physical interface) that constitutes the root node of the tree. The subsequent level is subport which represents the group of the users/subscribers. The individual user/subscriber is represented by the pipe at the next level. Each user can have different traffic type based on the criteria of specific loss rate, jitter, and latency. These traffic types are represented at the traffic-class level in the form of different traffic-classes. The last level contains number of queues which are grouped together to host the packets of the specific class type traffic. - -The QoS scheduler implementation requires flow classification, enqueue and dequeue operations. The flow classification is mandatory stage for HQoS where incoming packets are classified by mapping the packet fields information to 5-tuple (HQoS subport, pipe, traffic class, queue within traffic class, and color) and storing that information in mbuf sched field. The enqueue operation uses this information to determine the queue for storing the packet, and at this stage, if the specific queue is full, QoS drops the packet. The dequeue operation consists of scheduling the packet based on its length and available credits, and handing over the scheduled packet to the output interface. - -for more information on QoS Scheduler, please refer DPDK Programmer's Guide- http://dpdk.org/doc/guides/prog_guide/qos_framework.html - -### QoS Schdeuler Parameters - -Following illustrates the default HQoS configuration for each 10GbE output port: - -Single subport (subport 0): - - Subport rate set to 100% of port rate - - Each of the 4 traffic classes has rate set to 100% of port rate -4K pipes per subport 0 (pipes 0 .. 4095) with identical configuration: - - Pipe rate set to 1/4K of port rate - - Each of the 4 traffic classes has rate set to 100% of pipe rate - - Within each traffic class, the byte-level WRR weights for the 4 queues are set to 1:1:1:1 - -#### Port configuration -port { - rate 1250000000 /* Assuming 10GbE port */ - frame_overhead 24 /* Overhead fields per Ethernet frame */ - /* 7B (Preamble) + 1B (Start of Frame Delimiter (SFD)) + 4B (Frame Check Sequence (FCS)) + 12B (Inter Frame Gap (IFG)) */ - mtu 1522 /* Assuming Ethernet/IPv4 pkt (Ethernet FCS not included) */ - n_subports_per_port 1 /* Number of subports per output interface */ - n_pipes_per_subport 4096 /* Number of pipes (users/subscribers) */ - queue_sizes 64 64 64 64 /* Packet queue size for each traffic class. All queues within the same pipe traffic class have the same size. - Queues from different pipes serving the same traffic class have the same size.*/ -} - -#### Subport configuration -subport 0 { - tb_rate 1250000000 /* Subport level token bucket rate (bytes per second) */ - tb_size 1000000 /* Subport level token bucket size (bytes) */ - tc0_rate 1250000000 /* Subport level token bucket rate for traffic class 0 (bytes per second) */ - tc1_rate 1250000000 /* Subport level token bucket rate for traffic class 1 (bytes per second) */ - tc2_rate 1250000000 /* Subport level token bucket rate for traffic class 2 (bytes per second) */ - tc3_rate 1250000000 /* Subport level token bucket rate for traffic class 3 (bytes per second) */ - tc_period 10 /* Time interval for refilling the token bucket associated with traffic class (Milliseconds) */ - pipe 0 4095 profile 0 /* pipes (users/subscribers) configured with pipe profile 0 */ -} - -#### Pipe configuration -pipe_profile 0 { - tb_rate 305175 /* Pipe level token bucket rate (bytes per second) */ - tb_size 1000000 /* Pipe level token bucket size (bytes) */ - tc0_rate 305175 /* Pipe level token bucket rate for traffic class 0 (bytes per second) */ - tc1_rate 305175 /* Pipe level token bucket rate for traffic class 1 (bytes per second) */ - tc2_rate 305175 /* Pipe level token bucket rate for traffic class 2 (bytes per second) */ - tc3_rate 305175 /* Pipe level token bucket rate for traffic class 3 (bytes per second) */ - tc_period 40 /* Time interval for refilling the token bucket associated with traffic class at pipe level (Milliseconds) */ - tc3_oversubscription_weight 1 /* Weight traffic class 3 oversubscription */ - tc0_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 0 */ - tc1_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 1 */ - tc2_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 2 */ - tc3_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 3 */ -} - -#### Random Early Detection (RED) parameters per traffic class and color (Green / Yellow / Red) -red { - tc0_wred_min 48 40 32 /* Minimum threshold for traffic class 0 queue (min_th) in number of packets */ - tc0_wred_max 64 64 64 /* Maximum threshold for traffic class 0 queue (max_th) in number of packets */ - tc0_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 0 queue (maxp = 1 / maxp_inv) */ - tc0_wred_weight 9 9 9 /* Traffic Class 0 queue weight */ - tc1_wred_min 48 40 32 /* Minimum threshold for traffic class 1 queue (min_th) in number of packets */ - tc1_wred_max 64 64 64 /* Maximum threshold for traffic class 1 queue (max_th) in number of packets */ - tc1_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 1 queue (maxp = 1 / maxp_inv) */ - tc1_wred_weight 9 9 9 /* Traffic Class 1 queue weight */ - tc2_wred_min 48 40 32 /* Minimum threshold for traffic class 2 queue (min_th) in number of packets */ - tc2_wred_max 64 64 64 /* Maximum threshold for traffic class 2 queue (max_th) in number of packets */ - tc2_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 2 queue (maxp = 1 / maxp_inv) */ - tc2_wred_weight 9 9 9 /* Traffic Class 2 queue weight */ - tc3_wred_min 48 40 32 /* Minimum threshold for traffic class 3 queue (min_th) in number of packets */ - tc3_wred_max 64 64 64 /* Maximum threshold for traffic class 3 queue (max_th) in number of packets */ - tc3_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 3 queue (maxp = 1 / maxp_inv) */ - tc3_wred_weight 9 9 9 /* Traffic Class 3 queue weight */ -} - -### DPDK QoS Scheduler Integration in VPP - -The Hierarchical Quaity-of-Service (HQoS) scheduler object could be seen as part of the logical NIC output interface. To enable HQoS on specific output interface, vpp startup.conf file has to be configured accordingly. The output interface that requires HQoS, should have "hqos" parameter specified in dpdk section. Another optional parameter "hqos-thread" has been defined which can be used to associate the output interface with specific hqos thread. In cpu section of the config file, "corelist-hqos-threads" is introduced to assign logical cpu cores to run the HQoS threads. A HQoS thread can run multiple HQoS objects each associated with different output interfaces. All worker threads instead of writing packets to NIC TX queue directly, write the packets to a software queues. The hqos_threads read the software queues, and enqueue the packets to HQoS objects, as well as dequeue packets from HQOS objects and write them to NIC output interfaces. The worker threads need to be able to send the packets to any output interface, therefore, each HQoS object associated with NIC output interface should have software queues equal to worker threads count. - -Following illustrates the sample startup configuration file with 4x worker threads feeding 2x hqos threads that handle each QoS scheduler for 1x output interface. - -dpdk { - socket-mem 16384,16384 - - dev 0000:02:00.0 { - num-rx-queues 2 - hqos - } - dev 0000:06:00.0 { - num-rx-queues 2 - hqos - } - - num-mbufs 1000000 -} - -cpu { - main-core 0 - corelist-workers 1, 2, 3, 4 - corelist-hqos-threads 5, 6 -} - -### QoS scheduler CLI Commands - Each QoS scheduler instance is initialised with default parameters required to configure hqos port, subport, pipe and queues. Some of the parameters can be re-configured in run-time through CLI commands. - -#### Configuration - Following commands can be used to configure QoS scheduler parameters- - - * The command below can be used to set the subport level parameters such as token bucket rate (bytes per seconds), token bucket size (bytes), - traffic class rates (bytes per seconds) and token update period (Milliseconds). - set dpdk interface hqos subport subport [rate ] [bktsize ] [tc0 ] [tc1 ] [tc2 ] [tc3 ] [period ] - - * For setting the pipe profile, following command can be used. - set dpdk interface hqos pipe subport pipe profile - - * To assign QoS scheduler instance to the specific thread, following command can be used. - set dpdk interface hqos placement thread - - * The command below is used to set the packet fields required for classifiying the incoming packet. As a result of classification process, - packet field information will be mapped to 5 tuples (subport, pipe, traffic class, pipe, color) and stored in packet mbuf. - set dpdk interface hqos pktfield id offset mask - - * The DSCP table entries used for idenfiying the traffic class and queue can be set using the command below; - set dpdk interface hqos tctbl entry tc queue - -#### Show Command - - * The QoS Scheduler configuration can displayed using the command below. - - vpp# show dpdk interface hqos TenGigabitEthernet2/0/0 - Thread: - Input SWQ size = 4096 packets - Enqueue burst size = 256 packets - Dequeue burst size = 220 packets - Packet field 0: slab position = 0, slab bitmask = 0x0000000000000000 - Packet field 1: slab position = 40, slab bitmask = 0x0000000fff000000 - Packet field 2: slab position = 8, slab bitmask = 0x00000000000000fc - Packet field 2 translation table: - [ 0 .. 15]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - [16 .. 31]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - [32 .. 47]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - [48 .. 63]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - Port: - Rate = 1250000000 bytes/second - MTU = 1514 bytes - Frame overhead = 24 bytes - Number of subports = 1 - Number of pipes per subport = 4096 - Packet queue size: TC0 = 64, TC1 = 64, TC2 = 64, TC3 = 64 packets - Number of pipe profiles = 1 - Pipe profile 0: - Rate = 305175 bytes/second - Token bucket size = 1000000 bytes - Traffic class rate: TC0 = 305175, TC1 = 305175, TC2 = 305175, TC3 = 305175 bytes/second - TC period = 40 milliseconds - TC0 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 - TC1 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 - TC2 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 - TC3 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 - - - * The QoS Scheduler placement over the logical cpu cores can be displayed using below command. - - vpp# show dpdk interface hqos placement - Thread 5 (vpp_hqos-threads_0 at lcore 5): - TenGigabitEthernet2/0/0 queue 0 - Thread 6 (vpp_hqos-threads_1 at lcore 6): - TenGigabitEthernet4/0/1 queue 0 - - -### QoS Scheduler Binary APIs - - This section explans the available binary APIs for configuring QoS scheduler parameters in run-time. - -** The following API can be used to set the pipe profile of pipe that belongs to subport id- - - sw_interface_set_dpdk_hqos_pipe rx | sw_if_index subport pipe - profile - - The data structures used for set the pipe profile parameter are as follows; - - /** \brief DPDK interface HQoS pipe profile set request - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param sw_if_index - the interface - @param subport - subport ID - @param pipe - pipe ID within its subport - @param profile - pipe profile ID - */ - define sw_interface_set_dpdk_hqos_pipe { - u32 client_index; - u32 context; - u32 sw_if_index; - u32 subport; - u32 pipe; - u32 profile; - }; - - /** \brief DPDK interface HQoS pipe profile set reply - @param context - sender context, to match reply w/ request - @param retval - request return code - */ - define sw_interface_set_dpdk_hqos_pipe_reply { - u32 context; - i32 retval; - }; - - -** The following API can be used to set the subport level parameters, for example- token bucket rate (bytes per seconds), tocken bucket size (bytes), - traffic class rate (bytes per seconds) and tokens update period. - - sw_interface_set_dpdk_hqos_subport rx | sw_if_index subport [rate ] - [bktsize ] [tc0 ] [tc1 ] [tc2 ] [tc3 ] [period ] - - The data structures used for set the subport level parameter are as follows; - - /** \brief DPDK interface HQoS subport parameters set request - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param sw_if_index - the interface - @param subport - subport ID - @param tb_rate - subport token bucket rate (measured in bytes/second) - @param tb_size - subport token bucket size (measured in credits) - @param tc_rate - subport traffic class 0 .. 3 rates (measured in bytes/second) - @param tc_period - enforcement period for rates (measured in milliseconds) - */ - define sw_interface_set_dpdk_hqos_subport { - u32 client_index; - u32 context; - u32 sw_if_index; - u32 subport; - u32 tb_rate; - u32 tb_size; - u32 tc_rate[4]; - u32 tc_period; - }; - - /** \brief DPDK interface HQoS subport parameters set reply - @param context - sender context, to match reply w/ request - @param retval - request return code - */ - define sw_interface_set_dpdk_hqos_subport_reply { - u32 context; - i32 retval; - }; - - -** The following API can be used set the DSCP table entry. The DSCP table have 64 entries to map the packet DSCP field onto traffic class and hqos input queue. - - sw_interface_set_dpdk_hqos_tctbl rx | sw_if_index entry tc queue - - The data structures used for setting DSCP table entries are given below. - - /** \brief DPDK interface HQoS tctbl entry set request - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param sw_if_index - the interface - @param entry - entry index ID - @param tc - traffic class (0 .. 3) - @param queue - traffic class queue (0 .. 3) - */ - define sw_interface_set_dpdk_hqos_tctbl { - u32 client_index; - u32 context; - u32 sw_if_index; - u32 entry; - u32 tc; - u32 queue; - }; - - /** \brief DPDK interface HQoS tctbl entry set reply - @param context - sender context, to match reply w/ request - @param retval - request return code - */ - define sw_interface_set_dpdk_hqos_tctbl_reply { - u32 context; - i32 retval; - }; +# QoS Hierarchical Scheduler {#qos_doc} + +The Quality-of-Service (QoS) scheduler performs egress-traffic management by +prioritizing the transmission of the packets of different type services and +subcribers based on the Service Level Agreements (SLAs). The QoS scheduler can +be enabled on one or more NIC output interfaces depending upon the +requirement. + + +## Overview + +The QoS schdeuler supports a number of scheduling and shaping levels which +construct hierarchical-tree. The first level in the hierarchy is port (i.e. +the physical interface) that constitutes the root node of the tree. The +subsequent level is subport which represents the group of the +users/subscribers. The individual user/subscriber is represented by the pipe +at the next level. Each user can have different traffic type based on the +criteria of specific loss rate, jitter, and latency. These traffic types are +represented at the traffic-class level in the form of different traffic- +classes. The last level contains number of queues which are grouped together +to host the packets of the specific class type traffic. + +The QoS scheduler implementation requires flow classification, enqueue and +dequeue operations. The flow classification is mandatory stage for HQoS where +incoming packets are classified by mapping the packet fields information to +5-tuple (HQoS subport, pipe, traffic class, queue within traffic class, and +color) and storing that information in mbuf sched field. The enqueue operation +uses this information to determine the queue for storing the packet, and at +this stage, if the specific queue is full, QoS drops the packet. The dequeue +operation consists of scheduling the packet based on its length and available +credits, and handing over the scheduled packet to the output interface. + +For more information on QoS Scheduler, please refer DPDK Programmer's Guide- +http://dpdk.org/doc/guides/prog_guide/qos_framework.html + + +### QoS Schdeuler Parameters + +Following illustrates the default HQoS configuration for each 10GbE output +port: + +Single subport (subport 0): + - Subport rate set to 100% of port rate + - Each of the 4 traffic classes has rate set to 100% of port rate + +4K pipes per subport 0 (pipes 0 .. 4095) with identical configuration: + - Pipe rate set to 1/4K of port rate + - Each of the 4 traffic classes has rate set to 100% of pipe rate + - Within each traffic class, the byte-level WRR weights for the 4 queues are set to 1:1:1:1 + + +#### Port configuration + +``` +port { + rate 1250000000 /* Assuming 10GbE port */ + frame_overhead 24 /* Overhead fields per Ethernet frame: + * 7B (Preamble) + + * 1B (Start of Frame Delimiter (SFD)) + + * 4B (Frame Check Sequence (FCS)) + + * 12B (Inter Frame Gap (IFG)) + */ + mtu 1522 /* Assuming Ethernet/IPv4 pkt (FCS not included) */ + n_subports_per_port 1 /* Number of subports per output interface */ + n_pipes_per_subport 4096 /* Number of pipes (users/subscribers) */ + queue_sizes 64 64 64 64 /* Packet queue size for each traffic class. + * All queues within the same pipe traffic class + * have the same size. Queues from different + * pipes serving the same traffic class have + * the same size. */ +} +``` + + +#### Subport configuration + +``` +subport 0 { + tb_rate 1250000000 /* Subport level token bucket rate (bytes per second) */ + tb_size 1000000 /* Subport level token bucket size (bytes) */ + tc0_rate 1250000000 /* Subport level token bucket rate for traffic class 0 (bytes per second) */ + tc1_rate 1250000000 /* Subport level token bucket rate for traffic class 1 (bytes per second) */ + tc2_rate 1250000000 /* Subport level token bucket rate for traffic class 2 (bytes per second) */ + tc3_rate 1250000000 /* Subport level token bucket rate for traffic class 3 (bytes per second) */ + tc_period 10 /* Time interval for refilling the token bucket associated with traffic class (Milliseconds) */ + pipe 0 4095 profile 0 /* pipes (users/subscribers) configured with pipe profile 0 */ +} +``` + + +#### Pipe configuration + +``` +pipe_profile 0 { + tb_rate 305175 /* Pipe level token bucket rate (bytes per second) */ + tb_size 1000000 /* Pipe level token bucket size (bytes) */ + tc0_rate 305175 /* Pipe level token bucket rate for traffic class 0 (bytes per second) */ + tc1_rate 305175 /* Pipe level token bucket rate for traffic class 1 (bytes per second) */ + tc2_rate 305175 /* Pipe level token bucket rate for traffic class 2 (bytes per second) */ + tc3_rate 305175 /* Pipe level token bucket rate for traffic class 3 (bytes per second) */ + tc_period 40 /* Time interval for refilling the token bucket associated with traffic class at pipe level (Milliseconds) */ + tc3_oversubscription_weight 1 /* Weight traffic class 3 oversubscription */ + tc0_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 0 */ + tc1_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 1 */ + tc2_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 2 */ + tc3_wrr_weights 1 1 1 1 /* Pipe queues WRR weights for traffic class 3 */ +} +``` + + +#### Random Early Detection (RED) parameters per traffic class and color (Green / Yellow / Red) + +``` +red { + tc0_wred_min 48 40 32 /* Minimum threshold for traffic class 0 queue (min_th) in number of packets */ + tc0_wred_max 64 64 64 /* Maximum threshold for traffic class 0 queue (max_th) in number of packets */ + tc0_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 0 queue (maxp = 1 / maxp_inv) */ + tc0_wred_weight 9 9 9 /* Traffic Class 0 queue weight */ + tc1_wred_min 48 40 32 /* Minimum threshold for traffic class 1 queue (min_th) in number of packets */ + tc1_wred_max 64 64 64 /* Maximum threshold for traffic class 1 queue (max_th) in number of packets */ + tc1_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 1 queue (maxp = 1 / maxp_inv) */ + tc1_wred_weight 9 9 9 /* Traffic Class 1 queue weight */ + tc2_wred_min 48 40 32 /* Minimum threshold for traffic class 2 queue (min_th) in number of packets */ + tc2_wred_max 64 64 64 /* Maximum threshold for traffic class 2 queue (max_th) in number of packets */ + tc2_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 2 queue (maxp = 1 / maxp_inv) */ + tc2_wred_weight 9 9 9 /* Traffic Class 2 queue weight */ + tc3_wred_min 48 40 32 /* Minimum threshold for traffic class 3 queue (min_th) in number of packets */ + tc3_wred_max 64 64 64 /* Maximum threshold for traffic class 3 queue (max_th) in number of packets */ + tc3_wred_inv_prob 10 10 10 /* Inverse of packet marking probability for traffic class 3 queue (maxp = 1 / maxp_inv) */ + tc3_wred_weight 9 9 9 /* Traffic Class 3 queue weight */ +} +``` + + +### DPDK QoS Scheduler Integration in VPP + +The Hierarchical Quaity-of-Service (HQoS) scheduler object could be seen as +part of the logical NIC output interface. To enable HQoS on specific output +interface, vpp startup.conf file has to be configured accordingly. The output +interface that requires HQoS, should have "hqos" parameter specified in dpdk +section. Another optional parameter "hqos-thread" has been defined which can +be used to associate the output interface with specific hqos thread. In cpu +section of the config file, "corelist-hqos-threads" is introduced to assign +logical cpu cores to run the HQoS threads. A HQoS thread can run multiple HQoS +objects each associated with different output interfaces. All worker threads +instead of writing packets to NIC TX queue directly, write the packets to a +software queues. The hqos_threads read the software queues, and enqueue the +packets to HQoS objects, as well as dequeue packets from HQOS objects and +write them to NIC output interfaces. The worker threads need to be able to +send the packets to any output interface, therefore, each HQoS object +associated with NIC output interface should have software queues equal to +worker threads count. + +Following illustrates the sample startup configuration file with 4x worker +threads feeding 2x hqos threads that handle each QoS scheduler for 1x output +interface. + +``` +dpdk { + socket-mem 16384,16384 + + dev 0000:02:00.0 { + num-rx-queues 2 + hqos + } + dev 0000:06:00.0 { + num-rx-queues 2 + hqos + } + + num-mbufs 1000000 +} + +cpu { + main-core 0 + corelist-workers 1, 2, 3, 4 + corelist-hqos-threads 5, 6 +} +``` + + +### QoS scheduler CLI Commands + +Each QoS scheduler instance is initialised with default parameters required to +configure hqos port, subport, pipe and queues. Some of the parameters can be +re-configured in run-time through CLI commands. + + +#### Configuration + +Following commands can be used to configure QoS scheduler parameters. + +The command below can be used to set the subport level parameters such as +token bucket rate (bytes per seconds), token bucket size (bytes), traffic +class rates (bytes per seconds) and token update period (Milliseconds). + +``` +set dpdk interface hqos subport subport [rate ] + [bktsize ] [tc0 ] [tc1 ] [tc2 ] [tc3 ] [period ] +``` + +For setting the pipe profile, following command can be used. + +``` +set dpdk interface hqos pipe subport pipe profile +``` + +To assign QoS scheduler instance to the specific thread, following command can +be used. + +``` +set dpdk interface hqos placement thread +``` + +The command below is used to set the packet fields required for classifiying +the incoming packet. As a result of classification process, packet field +information will be mapped to 5 tuples (subport, pipe, traffic class, pipe, +color) and stored in packet mbuf. + +``` +set dpdk interface hqos pktfield id offset mask +``` + +The DSCP table entries used for idenfiying the traffic class and queue can be set using the command below; + +``` +set dpdk interface hqos tctbl entry tc queue +``` + + +#### Show Command + +The QoS Scheduler configuration can displayed using the command below. + +``` + vpp# show dpdk interface hqos TenGigabitEthernet2/0/0 + Thread: + Input SWQ size = 4096 packets + Enqueue burst size = 256 packets + Dequeue burst size = 220 packets + Packet field 0: slab position = 0, slab bitmask = 0x0000000000000000 + Packet field 1: slab position = 40, slab bitmask = 0x0000000fff000000 + Packet field 2: slab position = 8, slab bitmask = 0x00000000000000fc + Packet field 2 translation table: + [ 0 .. 15]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + [16 .. 31]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + [32 .. 47]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + [48 .. 63]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + Port: + Rate = 1250000000 bytes/second + MTU = 1514 bytes + Frame overhead = 24 bytes + Number of subports = 1 + Number of pipes per subport = 4096 + Packet queue size: TC0 = 64, TC1 = 64, TC2 = 64, TC3 = 64 packets + Number of pipe profiles = 1 + Pipe profile 0: + Rate = 305175 bytes/second + Token bucket size = 1000000 bytes + Traffic class rate: TC0 = 305175, TC1 = 305175, TC2 = 305175, TC3 = 305175 bytes/second + TC period = 40 milliseconds + TC0 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 + TC1 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 + TC2 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 + TC3 WRR weights: Q0 = 1, Q1 = 1, Q2 = 1, Q3 = 1 +``` + +The QoS Scheduler placement over the logical cpu cores can be displayed using +below command. + +``` + vpp# show dpdk interface hqos placement + Thread 5 (vpp_hqos-threads_0 at lcore 5): + TenGigabitEthernet2/0/0 queue 0 + Thread 6 (vpp_hqos-threads_1 at lcore 6): + TenGigabitEthernet4/0/1 queue 0 +``` + + +### QoS Scheduler Binary APIs + +This section explans the available binary APIs for configuring QoS scheduler +parameters in run-time. + +The following API can be used to set the pipe profile of a pipe that belongs +to a given subport: + +``` +sw_interface_set_dpdk_hqos_pipe rx | sw_if_index + subport pipe profile +``` + +The data structures used for set the pipe profile parameter are as follows; + +``` + /** \\brief DPDK interface HQoS pipe profile set request + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param sw_if_index - the interface + @param subport - subport ID + @param pipe - pipe ID within its subport + @param profile - pipe profile ID + */ + define sw_interface_set_dpdk_hqos_pipe { + u32 client_index; + u32 context; + u32 sw_if_index; + u32 subport; + u32 pipe; + u32 profile; + }; + + /** \\brief DPDK interface HQoS pipe profile set reply + @param context - sender context, to match reply w/ request + @param retval - request return code + */ + define sw_interface_set_dpdk_hqos_pipe_reply { + u32 context; + i32 retval; + }; +``` + +The following API can be used to set the subport level parameters, for +example- token bucket rate (bytes per seconds), token bucket size (bytes), +traffic class rate (bytes per seconds) and tokens update period. + +``` +sw_interface_set_dpdk_hqos_subport rx | sw_if_index + subport [rate ] [bktsize ] + [tc0 ] [tc1 ] [tc2 ] [tc3 ] [period ] +``` + +The data structures used for set the subport level parameter are as follows; + +``` + /** \\brief DPDK interface HQoS subport parameters set request + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param sw_if_index - the interface + @param subport - subport ID + @param tb_rate - subport token bucket rate (measured in bytes/second) + @param tb_size - subport token bucket size (measured in credits) + @param tc_rate - subport traffic class 0 .. 3 rates (measured in bytes/second) + @param tc_period - enforcement period for rates (measured in milliseconds) + */ + define sw_interface_set_dpdk_hqos_subport { + u32 client_index; + u32 context; + u32 sw_if_index; + u32 subport; + u32 tb_rate; + u32 tb_size; + u32 tc_rate[4]; + u32 tc_period; + }; + + /** \\brief DPDK interface HQoS subport parameters set reply + @param context - sender context, to match reply w/ request + @param retval - request return code + */ + define sw_interface_set_dpdk_hqos_subport_reply { + u32 context; + i32 retval; + }; +``` + +The following API can be used set the DSCP table entry. The DSCP table have +64 entries to map the packet DSCP field onto traffic class and hqos input +queue. + +``` +sw_interface_set_dpdk_hqos_tctbl rx | sw_if_index + entry tc queue +``` + +The data structures used for setting DSCP table entries are given below. + +``` + /** \\brief DPDK interface HQoS tctbl entry set request + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param sw_if_index - the interface + @param entry - entry index ID + @param tc - traffic class (0 .. 3) + @param queue - traffic class queue (0 .. 3) + */ + define sw_interface_set_dpdk_hqos_tctbl { + u32 client_index; + u32 context; + u32 sw_if_index; + u32 entry; + u32 tc; + u32 queue; + }; + + /** \\brief DPDK interface HQoS tctbl entry set reply + @param context - sender context, to match reply w/ request + @param retval - request return code + */ + define sw_interface_set_dpdk_hqos_tctbl_reply { + u32 context; + i32 retval; + }; +``` -- cgit 1.2.3-korg From ad41b351bdbc706bd984a7b7162b4794deb2f589 Mon Sep 17 00:00:00 2001 From: Dave Barach Date: Fri, 7 Oct 2016 08:30:23 -0700 Subject: Per-packet IPFIX record generation plugin Change-Id: Ia790aa018e70d67ed343e3e466d1b33b22560fc0 Signed-off-by: Dave Barach --- doxygen/user_doc.md | 1 + plugins/Makefile.am | 4 + plugins/configure.ac | 1 + plugins/flowperpkt-plugin/Makefile.am | 52 +++ plugins/flowperpkt-plugin/configure.ac | 9 + .../flowperpkt-plugin/flowperpkt/flowperpkt.api | 42 ++ plugins/flowperpkt-plugin/flowperpkt/flowperpkt.c | 500 ++++++++++++++++++++ plugins/flowperpkt-plugin/flowperpkt/flowperpkt.h | 71 +++ .../flowperpkt/flowperpkt_all_api_h.h | 18 + .../flowperpkt/flowperpkt_msg_enum.h | 31 ++ .../flowperpkt-plugin/flowperpkt/flowperpkt_test.c | 231 ++++++++++ plugins/flowperpkt-plugin/flowperpkt/node.c | 504 +++++++++++++++++++++ plugins/flowperpkt-plugin/flowperpkt_plugin_doc.md | 13 + vlib/vlib/buffer.h | 3 + vnet/vnet/flow/flow_report.c | 2 +- 15 files changed, 1481 insertions(+), 1 deletion(-) create mode 100644 plugins/flowperpkt-plugin/Makefile.am create mode 100644 plugins/flowperpkt-plugin/configure.ac create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt.api create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt.c create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt.h create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt_all_api_h.h create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt_msg_enum.h create mode 100644 plugins/flowperpkt-plugin/flowperpkt/flowperpkt_test.c create mode 100644 plugins/flowperpkt-plugin/flowperpkt/node.c create mode 100644 plugins/flowperpkt-plugin/flowperpkt_plugin_doc.md (limited to 'doxygen/user_doc.md') diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index e6cd5bbe..59063764 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -10,3 +10,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage lldp_doc - @subpage ioam_plugin_doc - @subpage lb_plugin_doc +- @subpage flowperpkt_plugin_doc diff --git a/plugins/Makefile.am b/plugins/Makefile.am index 5293e6e7..117f11dd 100644 --- a/plugins/Makefile.am +++ b/plugins/Makefile.am @@ -51,3 +51,7 @@ endif if ENABLE_lb_PLUGIN SUBDIRS += lb-plugin endif + +if ENABLE_flowperpkt_PLUGIN +SUBDIRS += flowperpkt-plugin +endif diff --git a/plugins/configure.ac b/plugins/configure.ac index 6ee064e8..c3151919 100644 --- a/plugins/configure.ac +++ b/plugins/configure.ac @@ -58,6 +58,7 @@ PLUGIN_ENABLED(ioam) PLUGIN_ENABLED(snat) PLUGIN_ENABLED(ila) PLUGIN_ENABLED(lb) +PLUGIN_ENABLED(flowperpkt) # Disabled plugins, require --enable-XXX-plugin PLUGIN_DISABLED(vcgn) diff --git a/plugins/flowperpkt-plugin/Makefile.am b/plugins/flowperpkt-plugin/Makefile.am new file mode 100644 index 00000000..fe0d8b27 --- /dev/null +++ b/plugins/flowperpkt-plugin/Makefile.am @@ -0,0 +1,52 @@ + +# Copyright (c) +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +AUTOMAKE_OPTIONS = foreign subdir-objects + +AM_CFLAGS = -Wall +AM_LDFLAGS = -module -shared -avoid-version + +vppapitestpluginsdir = ${libdir}/vpp_api_test_plugins +vpppluginsdir = ${libdir}/vpp_plugins + +vppplugins_LTLIBRARIES = flowperpkt_plugin.la +vppapitestplugins_LTLIBRARIES = flowperpkt_test_plugin.la + +flowperpkt_plugin_la_SOURCES = flowperpkt/flowperpkt.c \ + flowperpkt/node.c \ + flowperpkt/flowperpkt_plugin.api.h +flowperpkt_plugin_la_LDFLAGS = -module + +BUILT_SOURCES = flowperpkt/flowperpkt.api.h + +SUFFIXES = .api.h .api + +%.api.h: %.api + mkdir -p `dirname $@` ; \ + $(CC) $(CPPFLAGS) -E -P -C -x c $^ \ + | vppapigen --input - --output $@ --show-name $@ + +noinst_HEADERS = \ + flowperpkt/flowperpkt_all_api_h.h \ + flowperpkt/flowperpkt_msg_enum.h \ + flowperpkt/flowperpkt.api.h + +flowperpkt_test_plugin_la_SOURCES = \ + flowperpkt/flowperpkt_test.c flowperpkt/flowperpkt_plugin.api.h + +# Remove *.la files +install-data-hook: + @(cd $(vpppluginsdir) && $(RM) $(vppplugins_LTLIBRARIES)) + @(cd $(vppapitestpluginsdir) && $(RM) $(vppapitestplugins_LTLIBRARIES)) + diff --git a/plugins/flowperpkt-plugin/configure.ac b/plugins/flowperpkt-plugin/configure.ac new file mode 100644 index 00000000..80546169 --- /dev/null +++ b/plugins/flowperpkt-plugin/configure.ac @@ -0,0 +1,9 @@ + +AC_INIT(flowperpkt_plugin, 1.0) +AM_INIT_AUTOMAKE +AM_SILENT_RULES([yes]) + +AC_PROG_LIBTOOL +AC_PROG_CC + +AC_OUTPUT([Makefile]) diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.api b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.api new file mode 100644 index 00000000..8933b585 --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.api @@ -0,0 +1,42 @@ +/* Define a simple enable-disable binary API to control the feature */ + +/** \file + This file defines the vpp control-plane API messages + used to control the flowperpkt plugin +*/ + +/** \brief Enable / disable per-packet IPFIX recording on an interface + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param is_add - add address if non-zero, else delete + @param is_ipv6 - if non-zero the address is ipv6, else ipv4 + @param sw_if_index - index of the interface +*/ +manual_print define flowperpkt_tx_interface_add_del +{ + /* Client identifier, set from api_main.my_client_index */ + u32 client_index; + + /* Arbitrary context, so client can match reply to request */ + u32 context; + + /* Enable / disable the feature */ + u8 is_add; + u8 is_ipv6; + + /* Interface handle */ + u32 sw_if_index; +}; + +/** \brief Reply to enable/disable per-packet IPFIX recording messages + @param context - returned sender context, to match reply w/ request + @param retval - return code +*/ +define flowperpkt_tx_interface_add_del_reply +{ + /* From the request */ + u32 context; + + /* Return value, zero means all OK */ + i32 retval; +}; diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.c b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.c new file mode 100644 index 00000000..3aef24a2 --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.c @@ -0,0 +1,500 @@ +/* + * flowperpkt.c - per-packet data capture flow report plugin + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Per-packet IPFIX flow record generator plugin + * + * This file implements vpp plugin registration mechanics, + * debug CLI, and binary API handling. + */ + +#include +#include +#include + +#include +#include +#include + +/* define message IDs */ +#include + +/* define message structures */ +#define vl_typedefs +#include +#undef vl_typedefs + +/* define generated endian-swappers */ +#define vl_endianfun +#include +#undef vl_endianfun + +/* instantiate all the print functions we know about */ +#define vl_print(handle, ...) vlib_cli_output (handle, __VA_ARGS__) +#define vl_printfun +#include +#undef vl_printfun + +flowperpkt_main_t flowperpkt_main; + +/* Get the API version number */ +#define vl_api_version(n,v) static u32 api_version=(v); +#include +#undef vl_api_version + +/* Define the per-interface configurable feature */ +/* *INDENT-OFF* */ +VNET_IP4_TX_FEATURE_INIT (flow_perpacket, static) = { + .node_name = "flowperpkt", + .runs_before = (char *[]){"interface-output", 0}, + .feature_index = &flowperpkt_main.ip4_tx_feature_index, +}; +/* *INDENT-ON* */ + +/* + * A handy macro to set up a message reply. + * Assumes that the following variables are available: + * mp - pointer to request message + * rmp - pointer to reply message type + * rv - return value + */ +#define REPLY_MACRO(t) \ +do { \ + unix_shared_memory_queue_t * q = \ + vl_api_client_index_to_input_queue (mp->client_index); \ + if (!q) \ + return; \ + \ + rmp = vl_msg_api_alloc (sizeof (*rmp)); \ + rmp->_vl_msg_id = ntohs((t)+fm->msg_id_base); \ + rmp->context = mp->context; \ + rmp->retval = ntohl(rv); \ + \ + vl_msg_api_send_shmem (q, (u8 *)&rmp); \ +} while(0); + +/* Macro to finish up custom dump fns */ +#define FINISH \ + vec_add1 (s, 0); \ + vl_print (handle, (char *)s); \ + vec_free (s); \ + return handle; + +#define VALIDATE_SW_IF_INDEX(mp) \ + do { u32 __sw_if_index = ntohl(mp->sw_if_index); \ + vnet_main_t *__vnm = vnet_get_main(); \ + if (pool_is_free_index(__vnm->interface_main.sw_interfaces, \ + __sw_if_index)) { \ + rv = VNET_API_ERROR_INVALID_SW_IF_INDEX; \ + goto bad_sw_if_index; \ + } \ +} while(0); + +#define BAD_SW_IF_INDEX_LABEL \ +do { \ +bad_sw_if_index: \ + ; \ +} while (0); + +/** + * @brief Create an IPFIX template packet rewrite string + * @param frm flow_report_main_t * + * @param fr flow_report_t * + * @param collector_address ip4_address_t * the IPFIX collector address + * @param src_address ip4_address_t * the source address we should use + * @param collector_port u16 the collector port we should use, host byte order + * @returns u8 * vector containing the indicated IPFIX template packet + */ +u8 * +flowperpkt_template_rewrite (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, u16 collector_port) +{ + ip4_header_t *ip; + udp_header_t *udp; + ipfix_message_header_t *h; + ipfix_set_header_t *s; + ipfix_template_header_t *t; + ipfix_field_specifier_t *f; + ipfix_field_specifier_t *first_field; + u8 *rewrite = 0; + ip4_ipfix_template_packet_t *tp; + u32 field_count = 0; + flow_report_stream_t *stream; + + stream = &frm->streams[fr->stream_index]; + + /* + * Supported Fields: + * + * egressInterface, TLV type 14, u32 + * ipClassOfService, TLV type 5, u8 + * flowStartNanoseconds, TLV type 156, dateTimeNanoseconds (f64) + * Implementation: f64 nanoseconds since VPP started + * warning: wireshark doesn't really understand this TLV + * dataLinkFrameSize, TLV type 312, u16 + * warning: wireshark doesn't understand this TLV at all + */ + + /* Currently 4 fields */ + field_count += 4; + + /* allocate rewrite space */ + vec_validate_aligned (rewrite, + sizeof (ip4_ipfix_template_packet_t) + + field_count * sizeof (ipfix_field_specifier_t) - 1, + CLIB_CACHE_LINE_BYTES); + + tp = (ip4_ipfix_template_packet_t *) rewrite; + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + h = (ipfix_message_header_t *) (udp + 1); + s = (ipfix_set_header_t *) (h + 1); + t = (ipfix_template_header_t *) (s + 1); + first_field = f = (ipfix_field_specifier_t *) (t + 1); + + ip->ip_version_and_header_length = 0x45; + ip->ttl = 254; + ip->protocol = IP_PROTOCOL_UDP; + ip->src_address.as_u32 = src_address->as_u32; + ip->dst_address.as_u32 = collector_address->as_u32; + udp->src_port = clib_host_to_net_u16 (stream->src_port); + udp->dst_port = clib_host_to_net_u16 (collector_port); + udp->length = clib_host_to_net_u16 (vec_len (rewrite) - sizeof (*ip)); + + /* FIXUP: message header export_time */ + /* FIXUP: message header sequence_number */ + h->domain_id = clib_host_to_net_u32 (stream->domain_id); + + /* Add TLVs to the template */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , egressInterface, + 4); + f++; + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , ipClassOfService, + 1); + f++; + f->e_id_length = + ipfix_e_id_length (0 /* enterprise */ , flowStartNanoseconds, + 8); + f++; + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , dataLinkFrameSize, + 2); + f++; + /* Extend in the obvious way, right here... */ + + /* Back to the template packet... */ + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + + ASSERT (f - first_field); + /* Field count in this template */ + t->id_count = ipfix_id_count (fr->template_id, f - first_field); + + /* set length in octets */ + s->set_id_length = + ipfix_set_id_length (2 /* set_id */ , (u8 *) f - (u8 *) s); + + /* message length in octets */ + h->version_length = version_length ((u8 *) f - (u8 *) h); + + ip->length = clib_host_to_net_u16 ((u8 *) f - (u8 *) ip); + ip->checksum = ip4_header_checksum (ip); + + return rewrite; +} + +/** + * @brief Flush accumulated data + * @param frm flow_report_main_t * + * @param fr flow_report_t * + * @param f vlib_frame_t * + * + * Notes: + * This function must simply return the incoming frame, or no template packets + * will be sent. + */ +vlib_frame_t * +flowperpkt_data_callback (flow_report_main_t * frm, + flow_report_t * fr, + vlib_frame_t * f, u32 * to_next, u32 node_index) +{ + flowperpkt_flush_callback (); + return f; +} + +/** + * @brief configure / deconfigure the IPFIX flow-per-packet + * @param fm flowperpkt_main_t * fm + * @param sw_if_index u32 the desired interface + * @param is_add int 1 to enable the feature, 0 to disable it + * @returns 0 if successful, non-zero otherwise + */ + +static int flowperpkt_tx_interface_add_del_feature + (flowperpkt_main_t * fm, u32 sw_if_index, int is_add) +{ + u32 ci; + ip4_main_t *im = &ip4_main; + ip_lookup_main_t *lm = &im->lookup_main; + ip_config_main_t *cm = &lm->feature_config_mains[VNET_IP_TX_FEAT]; + u32 feature_index; + flow_report_main_t *frm = &flow_report_main; + vnet_flow_report_add_del_args_t _a, *a = &_a; + int rv; + + if (!fm->report_created) + { + memset (a, 0, sizeof (*a)); + a->rewrite_callback = flowperpkt_template_rewrite; + a->flow_data_callback = flowperpkt_data_callback; + a->is_add = 1; + a->domain_id = 1; /*$$$$ config parameter */ + a->src_port = 4739; /*$$$$ config parameter */ + fm->report_created = 1; + + rv = vnet_flow_report_add_del (frm, a); + if (rv) + { + clib_warning ("vnet_flow_report_add_del returned %d", rv); + return -1; + } + } + + feature_index = fm->ip4_tx_feature_index; + + ci = cm->config_index_by_sw_if_index[sw_if_index]; + ci = (is_add + ? vnet_config_add_feature + : vnet_config_del_feature) + (fm->vlib_main, &cm->config_main, + ci, feature_index, 0 /* config struct */ , + 0 /* sizeof config struct */ ); + cm->config_index_by_sw_if_index[sw_if_index] = ci; + + vnet_config_update_tx_feature_count (lm, cm, sw_if_index, is_add); + return 0; +} + +/** + * @brief API message handler + * @param mp vl_api_flowperpkt_tx_interface_add_del_t * mp the api message + */ +void vl_api_flowperpkt_tx_interface_add_del_t_handler + (vl_api_flowperpkt_tx_interface_add_del_t * mp) +{ + flowperpkt_main_t *fm = &flowperpkt_main; + vl_api_flowperpkt_tx_interface_add_del_reply_t *rmp; + u32 sw_if_index = ntohl (mp->sw_if_index); + int rv = 0; + + VALIDATE_SW_IF_INDEX (mp); + + rv = flowperpkt_tx_interface_add_del_feature (fm, sw_if_index, mp->is_add); + + BAD_SW_IF_INDEX_LABEL; + + REPLY_MACRO (VL_API_FLOWPERPKT_TX_INTERFACE_ADD_DEL_REPLY); +} + +/** + * @brief API message custom-dump function + * @param mp vl_api_flowperpkt_tx_interface_add_del_t * mp the api message + * @param handle void * print function handle + * @returns u8 * output string + */ +static void *vl_api_flowperpkt_tx_interface_add_del_t_print + (vl_api_flowperpkt_tx_interface_add_del_t * mp, void *handle) +{ + u8 *s; + + s = format (0, "SCRIPT: flowperpkt_tx_interface_add_del "); + s = format (s, "sw_if_index %d is_add %d is_ipv6 %d ", + clib_host_to_net_u32 (mp->sw_if_index), + (int) mp->is_add, (int) mp->is_ipv6); + FINISH; +} + +/* List of message types that this plugin understands */ +#define foreach_flowperpkt_plugin_api_msg \ +_(FLOWPERPKT_TX_INTERFACE_ADD_DEL, flowperpkt_tx_interface_add_del) + +/** + * @brief plugin-api required function + * @param vm vlib_main_t * vlib main data structure pointer + * @param h vlib_plugin_handoff_t * handoff structure + * @param from_early_init int notused + * + * Notes: + * This routine exists to convince the vlib plugin framework that + * we haven't accidentally copied a random .dll into the plugin directory. + * + * Also collects global variable pointers passed from the vpp engine + */ +clib_error_t * +vlib_plugin_register (vlib_main_t * vm, vnet_plugin_handoff_t * h, + int from_early_init) +{ + flowperpkt_main_t *fm = &flowperpkt_main; + clib_error_t *error = 0; + + fm->vlib_main = vm; + fm->vnet_main = h->vnet_main; + + return error; +} + +static clib_error_t * +flowperpkt_tx_interface_add_del_feature_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + flowperpkt_main_t *fm = &flowperpkt_main; + u32 sw_if_index = ~0; + int is_add = 1; + + int rv; + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "disable")) + is_add = 0; + else if (unformat (input, "%U", unformat_vnet_sw_interface, + fm->vnet_main, &sw_if_index)) + ; + else + break; + } + + if (sw_if_index == ~0) + return clib_error_return (0, "Please specify an interface..."); + + rv = flowperpkt_tx_interface_add_del_feature (fm, sw_if_index, is_add); + switch (rv) + { + case 0: + break; + + case VNET_API_ERROR_INVALID_SW_IF_INDEX: + return clib_error_return + (0, "Invalid interface, only works on physical ports"); + break; + + case VNET_API_ERROR_UNIMPLEMENTED: + return clib_error_return (0, "ip6 not supported"); + break; + + default: + return clib_error_return (0, "flowperpkt_enable_disable returned %d", + rv); + } + return 0; +} + +/*? + * 'flowperpkt feature add-del' commands to enable/disable + * per-packet IPFIX flow record generation on an interface + * + * @cliexpar + * @parblock + * To enable per-packet IPFIX flow-record generation on an interface: + * @cliexcmd{flowperpkt feature add-del GigabitEthernet2/0/0} + * + * To disable per-packet IPFIX flow-record generation on an interface: + * @cliexcmd{flowperpkt feature add-del GigabitEthernet2/0/0 disable} + * @cliexend + * @endparblock +?*/ +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (flowperpkt_enable_disable_command, static) = { + .path = "flowperpkt feature add-del", + .short_help = + "flowperpkt feature add-del [disable]", + .function = flowperpkt_tx_interface_add_del_feature_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief Set up the API message handling tables + * @param vm vlib_main_t * vlib main data structure pointer + * @returns 0 to indicate all is well + */ +static clib_error_t * +flowperpkt_plugin_api_hookup (vlib_main_t * vm) +{ + flowperpkt_main_t *fm = &flowperpkt_main; +#define _(N,n) \ + vl_msg_api_set_handlers((VL_API_##N + fm->msg_id_base), \ + #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_flowperpkt_plugin_api_msg; +#undef _ + + return 0; +} + +/** + * @brief Set up the API message handling tables + * @param vm vlib_main_t * vlib main data structure pointer + * @returns 0 to indicate all is well, or a clib_error_t + */ +static clib_error_t * +flowperpkt_init (vlib_main_t * vm) +{ + flowperpkt_main_t *fm = &flowperpkt_main; + vlib_thread_main_t *tm = &vlib_thread_main; + clib_error_t *error = 0; + u32 num_threads; + u8 *name; + + /* Construct the API name */ + name = format (0, "flowperpkt_%08x%c", api_version, 0); + + /* Ask for a correctly-sized block of API message decode slots */ + fm->msg_id_base = vl_msg_api_get_msg_ids + ((char *) name, VL_MSG_FIRST_AVAILABLE); + + /* Hook up message handlers */ + error = flowperpkt_plugin_api_hookup (vm); + + vec_free (name); + + /* Decide how many worker threads we have */ + num_threads = 1 /* main thread */ + tm->n_eal_threads; + + /* Allocate per worker thread vectors */ + vec_validate (fm->buffers_per_worker, num_threads - 1); + vec_validate (fm->frames_per_worker, num_threads - 1); + vec_validate (fm->next_record_offset_per_worker, num_threads - 1); + + return error; +} + +VLIB_INIT_FUNCTION (flowperpkt_init); + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.h b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.h new file mode 100644 index 00000000..54d978ab --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt.h @@ -0,0 +1,71 @@ +/* + * flowperpkt.h - skeleton vpp engine plug-in header file + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef __included_flowperpkt_h__ +#define __included_flowperpkt_h__ + +#include +#include +#include + +#include +#include +#include +#include + +/** + * @file + * @brief flow-per-packet plugin header file + */ +typedef struct +{ + /** API message ID base */ + u16 msg_id_base; + + /** ip4 feature index */ + u32 ip4_tx_feature_index; + + /** Has the report been created? */ + int report_created; + + /** ipfix buffers under construction, per-worker thread */ + vlib_buffer_t **buffers_per_worker; + /** frames containing ipfix buffers, per-worker thread */ + vlib_frame_t **frames_per_worker; + /** next record offset, per worker thread */ + u16 *next_record_offset_per_worker; + + /** convenience vlib_main_t pointer */ + vlib_main_t *vlib_main; + /** convenience vnet_main_t pointer */ + vnet_main_t *vnet_main; +} flowperpkt_main_t; + +extern flowperpkt_main_t flowperpkt_main; + +vlib_node_registration_t flowperpkt_node; + +void flowperpkt_flush_callback (void); + +#endif /* __included_flowperpkt_h__ */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_all_api_h.h b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_all_api_h.h new file mode 100644 index 00000000..329c375a --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_all_api_h.h @@ -0,0 +1,18 @@ +/* + * flowperpkt_all_api_h.h - plug-in api #include file + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/* Include the generated file, see BUILT_SOURCES in Makefile.am */ +#include diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_msg_enum.h b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_msg_enum.h new file mode 100644 index 00000000..3177e77a --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_msg_enum.h @@ -0,0 +1,31 @@ +/* + * flowperpkt_msg_enum.h - vpp engine plug-in message enumeration + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef included_flowperpkt_msg_enum_h +#define included_flowperpkt_msg_enum_h + +#include + +#define vl_msg_id(n,h) n, +typedef enum +{ +#include + /* We'll want to know how many messages IDs we need... */ + VL_MSG_FIRST_AVAILABLE, +} vl_msg_id_t; +#undef vl_msg_id + +#endif /* included_flowperpkt_msg_enum_h */ diff --git a/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_test.c b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_test.c new file mode 100644 index 00000000..3c1cd227 --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/flowperpkt_test.c @@ -0,0 +1,231 @@ +/* + * flowperpkt.c - skeleton vpp-api-test plug-in + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include + +/** + * @file vpp_api_test plugin + */ + +uword unformat_sw_if_index (unformat_input_t * input, va_list * args); + +/* Declare message IDs */ +#include + +/* define message structures */ +#define vl_typedefs +#include +#undef vl_typedefs + +/* declare message handlers for each api */ + +#define vl_endianfun /* define message structures */ +#include +#undef vl_endianfun + +/* instantiate all the print functions we know about */ +#define vl_print(handle, ...) +#define vl_printfun +#include +#undef vl_printfun + +/* Get the API version number. */ +#define vl_api_version(n,v) static u32 api_version=(v); +#include +#undef vl_api_version + +typedef struct +{ + /** API message ID base */ + u16 msg_id_base; + /** vat_main_t pointer */ + vat_main_t *vat_main; +} flowperpkt_test_main_t; + +flowperpkt_test_main_t flowperpkt_test_main; + +#define foreach_standard_reply_retval_handler \ +_(flowperpkt_tx_interface_add_del_reply) + +#define _(n) \ + static void vl_api_##n##_t_handler \ + (vl_api_##n##_t * mp) \ + { \ + vat_main_t * vam = flowperpkt_test_main.vat_main; \ + i32 retval = ntohl(mp->retval); \ + if (vam->async_mode) { \ + vam->async_errors += (retval < 0); \ + } else { \ + vam->retval = retval; \ + vam->result_ready = 1; \ + } \ + } +foreach_standard_reply_retval_handler; +#undef _ + +/* + * Table of message reply handlers, must include boilerplate handlers + * we just generated + */ +#define foreach_vpe_api_reply_msg \ +_(FLOWPERPKT_TX_INTERFACE_ADD_DEL_REPLY, \ + flowperpkt_tx_interface_add_del_reply) + + +/* M: construct, but don't yet send a message */ + +#define M(T,t) \ +do { \ + vam->result_ready = 0; \ + mp = vl_msg_api_alloc(sizeof(*mp)); \ + memset (mp, 0, sizeof (*mp)); \ + mp->_vl_msg_id = ntohs (VL_API_##T + sm->msg_id_base); \ + mp->client_index = vam->my_client_index; \ +} while(0); + +#define M2(T,t,n) \ +do { \ + vam->result_ready = 0; \ + mp = vl_msg_api_alloc(sizeof(*mp)+(n)); \ + memset (mp, 0, sizeof (*mp)); \ + mp->_vl_msg_id = ntohs (VL_API_##T + sm->msg_id_base); \ + mp->client_index = vam->my_client_index; \ +} while(0); + +/* S: send a message */ +#define S (vl_msg_api_send_shmem (vam->vl_input_queue, (u8 *)&mp)) + +/* W: wait for results, with timeout */ +#define W \ +do { \ + timeout = vat_time_now (vam) + 1.0; \ + \ + while (vat_time_now (vam) < timeout) { \ + if (vam->result_ready == 1) { \ + return (vam->retval); \ + } \ + } \ + return -99; \ +} while(0); + +static int +api_flowperpkt_tx_interface_add_del (vat_main_t * vam) +{ + flowperpkt_test_main_t *sm = &flowperpkt_test_main; + unformat_input_t *i = vam->input; + f64 timeout; + int enable_disable = 1; + u32 sw_if_index = ~0; + vl_api_flowperpkt_tx_interface_add_del_t *mp; + + /* Parse args required to build the message */ + while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT) + { + if (unformat (i, "%U", unformat_sw_if_index, vam, &sw_if_index)) + ; + else if (unformat (i, "sw_if_index %d", &sw_if_index)) + ; + else if (unformat (i, "disable")) + enable_disable = 0; + else + break; + } + + if (sw_if_index == ~0) + { + errmsg ("missing interface name / explicit sw_if_index number \n"); + return -99; + } + + /* Construct the API message */ + M (FLOWPERPKT_TX_INTERFACE_ADD_DEL, flowperpkt_tx_interface_add_del); + mp->sw_if_index = ntohl (sw_if_index); + mp->is_add = enable_disable; + mp->is_ipv6 = 0; /* $$$$ */ + + /* send it... */ + S; + + /* Wait for a reply... */ + W; +} + +/* + * List of messages that the api test plugin sends, + * and that the data plane plugin processes + */ +#define foreach_vpe_api_msg \ +_(flowperpkt_tx_interface_add_del, " [disable]") + +void +vat_api_hookup (vat_main_t * vam) +{ + flowperpkt_test_main_t *sm = &flowperpkt_test_main; + /* Hook up handlers for replies from the data plane plug-in */ +#define _(N,n) \ + vl_msg_api_set_handlers((VL_API_##N + sm->msg_id_base), \ + #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_vpe_api_reply_msg; +#undef _ + + /* API messages we can send */ +#define _(n,h) hash_set_mem (vam->function_by_name, #n, api_##n); + foreach_vpe_api_msg; +#undef _ + + /* Help strings */ +#define _(n,h) hash_set_mem (vam->help_by_name, #n, h); + foreach_vpe_api_msg; +#undef _ +} + +clib_error_t * +vat_plugin_register (vat_main_t * vam) +{ + flowperpkt_test_main_t *sm = &flowperpkt_test_main; + u8 *name; + + sm->vat_main = vam; + + /* Ask the vpp engine for the first assigned message-id */ + name = format (0, "flowperpkt_%08x%c", api_version, 0); + sm->msg_id_base = vl_client_get_first_plugin_msg_id ((char *) name); + + /* Don't attempt to hook up API messages if the data plane plugin is AWOL */ + if (sm->msg_id_base != (u16) ~ 0) + vat_api_hookup (vam); + + vec_free (name); + + return 0; +} + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/plugins/flowperpkt-plugin/flowperpkt/node.c b/plugins/flowperpkt-plugin/flowperpkt/node.c new file mode 100644 index 00000000..4deaae1a --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt/node.c @@ -0,0 +1,504 @@ +/* + * node.c - skeleton vpp engine plug-in dual-loop node skeleton + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include + +/** + * @file flow record generator graph node + */ + +typedef struct +{ + /** interface handle */ + u32 sw_if_index; + /** ToS bits */ + u8 tos; + /** packet timestamp */ + f64 timestamp; + /** size of the buffer */ + u16 buffer_size; +} flowperpkt_trace_t; + +/* packet trace format function */ +static u8 * +format_flowperpkt_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + flowperpkt_trace_t *t = va_arg (*args, flowperpkt_trace_t *); + + s = format (s, + "FLOWPERPKT: sw_if_index %d, tos %0x2, timestamp %.6f, size %d", + t->sw_if_index, t->tos, t->timestamp, t->buffer_size); + return s; +} + +vlib_node_registration_t flowperpkt_node; + +#define foreach_flowperpkt_error \ +_(SWAPPED, "Mac swap packets processed") + +typedef enum +{ +#define _(sym,str) FLOWPERPKT_ERROR_##sym, + foreach_flowperpkt_error +#undef _ + FLOWPERPKT_N_ERROR, +} flowperpkt_error_t; + +static char *flowperpkt_error_strings[] = { +#define _(sym,string) string, + foreach_flowperpkt_error +#undef _ +}; + +typedef enum +{ + FLOWPERPKT_NEXT_DROP, + FLOWPERPKT_N_NEXT, +} flowperpkt_next_t; + +/** + * @brief add an entry to the flow record under construction + * @param vm vlib_main_t * current worker thread main structure pointer + * @param fm flowperpkt_main_t * flow-per-packet main structure pointer + * @param sw_if_index u32 interface handle + * @param tos u8 ToS bits from the packet + * @param timestamp f64 timestamp, floating-point seconds since vpp started + * @param length u16 ip length of the packet + * @param do_flush int 1 = flush all cached records, 0 = construct a record + */ + +static inline void +add_to_flow_record (vlib_main_t * vm, + flowperpkt_main_t * fm, + u32 sw_if_index, + u8 tos, f64 timestamp, u16 length, int do_flush) +{ + u32 my_cpu_number = vm->cpu_index; + flow_report_main_t *frm = &flow_report_main; + ip4_header_t *ip; + udp_header_t *udp; + ip4_ipfix_template_packet_t *tp; + ipfix_message_header_t *h; + ipfix_set_header_t *s; + vlib_frame_t *f; + vlib_buffer_t *b0; + u16 offset; + u32 bi0; + + /* Find or allocate a buffer */ + b0 = fm->buffers_per_worker[my_cpu_number]; + + /* Need to allocate a buffer? */ + if (PREDICT_FALSE (b0 == 0)) + { + /* Nothing to flush */ + if (do_flush) + return; + + /* $$$$ drop counter? */ + if (vlib_buffer_alloc (vm, &bi0, 1) != 1) + return; + b0 = fm->buffers_per_worker[my_cpu_number] = vlib_get_buffer (vm, bi0); + offset = 0; + } + else + { + /* use the current buffer */ + bi0 = vlib_get_buffer_index (vm, b0); + offset = fm->next_record_offset_per_worker[my_cpu_number]; + } + + /* Find or allocate a frame */ + f = fm->frames_per_worker[my_cpu_number]; + if (PREDICT_FALSE (f == 0)) + { + u32 *to_next; + f = vlib_get_frame_to_node (vm, ip4_lookup_node.index); + fm->frames_per_worker[my_cpu_number] = f; + + /* Enqueue the buffer */ + to_next = vlib_frame_vector_args (f); + to_next[0] = bi0; + f->n_vectors = 1; + } + + /* Fresh packet, construct header */ + if (PREDICT_FALSE (offset == 0)) + { + flow_report_stream_t *stream; + + stream = &frm->streams[0]; + + b0->current_data = 0; + b0->current_length = sizeof (*ip) + sizeof (*udp) + sizeof (*h) + + sizeof (*s); + b0->flags |= (VLIB_BUFFER_TOTAL_LENGTH_VALID | VLIB_BUFFER_FLOW_REPORT); + vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0; + vnet_buffer (b0)->sw_if_index[VLIB_TX] = frm->fib_index; + + tp = vlib_buffer_get_current (b0); + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + h = (ipfix_message_header_t *) (udp + 1); + s = (ipfix_set_header_t *) (h + 1); + + ip->ip_version_and_header_length = 0x45; + ip->ttl = 254; + ip->protocol = IP_PROTOCOL_UDP; + ip->src_address.as_u32 = frm->src_address.as_u32; + ip->dst_address.as_u32 = frm->ipfix_collector.as_u32; + udp->src_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); + udp->dst_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); + + /* FIXUP: message header export_time */ + h->export_time = (u32) + (((f64) frm->unix_time_0) + + (vlib_time_now (frm->vlib_main) - frm->vlib_time_0)); + h->export_time = clib_host_to_net_u32 (h->export_time); + h->domain_id = clib_host_to_net_u32 (stream->domain_id); + + /* FIXUP: message header sequence_number */ + h->sequence_number = stream->sequence_number++; + h->sequence_number = clib_host_to_net_u32 (h->sequence_number); + + offset = (u32) (((u8 *) (s + 1)) - (u8 *) tp); + } + + /* Add data, unless we're flushing stale data */ + if (PREDICT_TRUE (do_flush == 0)) + { + + /* Add data */ + /* Egress interface */ + { + u32 egress_interface = clib_host_to_net_u32 (sw_if_index); + clib_memcpy (b0->data + offset, &egress_interface, + sizeof (egress_interface)); + offset += sizeof (egress_interface); + } + /* ToS */ + b0->data[offset++] = tos; + + /* Timestamp */ + clib_memcpy (b0->data + offset, ×tamp, sizeof (f64)); + offset += sizeof (f64); + + /* pkt size */ + { + u16 pkt_size = clib_host_to_net_u16 (length); + clib_memcpy (b0->data + offset, &pkt_size, sizeof (pkt_size)); + offset += sizeof (pkt_size); + } + + b0->current_length += + /* sw_if_index + tos + timestamp + length = 15 */ + sizeof (u32) + sizeof (u8) + sizeof (f64) + sizeof (u16); + + } + /* Time to flush the buffer? */ + if (PREDICT_FALSE + (do_flush || (offset + sizeof (u32) + sizeof (u8) + + sizeof (f64)) > frm->path_mtu)) + { + tp = vlib_buffer_get_current (b0); + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + h = (ipfix_message_header_t *) (udp + 1); + s = (ipfix_set_header_t *) (h + 1); + + s->set_id_length = ipfix_set_id_length (256, + b0->current_length - + (sizeof (*ip) + sizeof (*udp) + + sizeof (*h))); + h->version_length = version_length (b0->current_length - + (sizeof (*ip) + sizeof (*udp))); + + ip->length = clib_host_to_net_u16 (b0->current_length); + + ip->checksum = ip4_header_checksum (ip); + udp->length = clib_host_to_net_u16 (b0->current_length - sizeof (*ip)); + + if (frm->udp_checksum) + { + /* RFC 7011 section 10.3.2. */ + udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip); + if (udp->checksum == 0) + udp->checksum = 0xffff; + } + + ASSERT (ip->checksum == ip4_header_checksum (ip)); + + vlib_put_frame_to_node (vm, ip4_lookup_node.index, + fm->frames_per_worker[my_cpu_number]); + fm->frames_per_worker[my_cpu_number] = 0; + fm->buffers_per_worker[my_cpu_number] = 0; + offset = 0; + } + + fm->next_record_offset_per_worker[my_cpu_number] = offset; +} + +void +flowperpkt_flush_callback (void) +{ + vlib_main_t *vm = vlib_get_main (); + flowperpkt_main_t *fm = &flowperpkt_main; + + add_to_flow_record (vm, fm, 0 /* sw_if_index */ , + 0 /* ToS */ , + 0.0 /* timestamp */ , + 0 /* length */ , + 1 /* do_flush */ ); +} + + +static uword +flowperpkt_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, vlib_frame_t * frame) +{ + u32 n_left_from, *from, *to_next; + flowperpkt_next_t next_index; + ip4_main_t *im = &ip4_main; + ip_lookup_main_t *lm = &im->lookup_main; + ip_config_main_t *cm = &lm->feature_config_mains[VNET_IP_TX_FEAT]; + flowperpkt_main_t *fm = &flowperpkt_main; + f64 now = vlib_time_now (vm); + + from = vlib_frame_vector_args (frame); + n_left_from = frame->n_vectors; + next_index = node->cached_next_index; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from >= 4 && n_left_to_next >= 2) + { + u32 next0 = FLOWPERPKT_NEXT_DROP; + u32 next1 = FLOWPERPKT_NEXT_DROP; + ip4_header_t *ip0, *ip1; + u16 len0, len1; + u32 bi0, bi1; + vlib_buffer_t *b0, *b1; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p2, *p3; + + p2 = vlib_get_buffer (vm, from[2]); + p3 = vlib_get_buffer (vm, from[3]); + + vlib_prefetch_buffer_header (p2, LOAD); + vlib_prefetch_buffer_header (p3, LOAD); + + CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + /* speculatively enqueue b0 and b1 to the current next frame */ + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + from += 2; + to_next += 2; + n_left_from -= 2; + n_left_to_next -= 2; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + + + vnet_get_config_data (&cm->config_main, + &b0->current_config_index, + &next0, 0 /* sizeof config data */ ); + + vnet_get_config_data (&cm->config_main, + &b1->current_config_index, + &next1, 0 /* sizeof config data */ ); + + ip0 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b0) + + vnet_buffer (b0)->ip.save_rewrite_length); + + len0 = vlib_buffer_length_in_chain (vm, b0); + + if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + add_to_flow_record (vm, fm, + vnet_buffer (b0)->sw_if_index[VLIB_TX], + ip0->tos, now, len0, 0 /* flush */ ); + + ip1 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b1) + + vnet_buffer (b1)->ip.save_rewrite_length); + len1 = vlib_buffer_length_in_chain (vm, b1); + + if (PREDICT_TRUE ((b1->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + add_to_flow_record (vm, fm, + vnet_buffer (b1)->sw_if_index[VLIB_TX], + ip1->tos, now, len1, 0 /* flush */ ); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (b0->flags & VLIB_BUFFER_IS_TRACED) + { + flowperpkt_trace_t *t = + vlib_add_trace (vm, node, b0, sizeof (*t)); + t->sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; + t->tos = ip0->tos; + t->timestamp = now; + t->buffer_size = len0; + + } + if (b1->flags & VLIB_BUFFER_IS_TRACED) + { + flowperpkt_trace_t *t = + vlib_add_trace (vm, node, b1, sizeof (*t)); + + t->sw_if_index = vnet_buffer (b1)->sw_if_index[VLIB_TX]; + t->tos = ip1->tos; + t->timestamp = now; + t->buffer_size = len1; + } + } + + /* verify speculative enqueues, maybe switch current next frame */ + vlib_validate_buffer_enqueue_x2 (vm, node, next_index, + to_next, n_left_to_next, + bi0, bi1, next0, next1); + } + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + u32 next0 = FLOWPERPKT_NEXT_DROP; + ip4_header_t *ip0; + u16 len0; + + /* speculatively enqueue b0 to the current next frame */ + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + + vnet_get_config_data (&cm->config_main, + &b0->current_config_index, + &next0, 0 /* sizeof config data */ ); + + ip0 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b0) + + vnet_buffer (b0)->ip.save_rewrite_length); + /* + * egressInterface, TLV type 14, u32 + * ipClassOfService, TLV type 5, u8 + * flowStartNanoseconds, TLV type 156, dateTimeNanoseconds (f64) + * Implementation: f64 nanoseconds since VPP started + * dataLinkFrameSize, TLV type 312, u16 + */ + len0 = vlib_buffer_length_in_chain (vm, b0); + + if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + add_to_flow_record (vm, fm, + vnet_buffer (b0)->sw_if_index[VLIB_TX], + ip0->tos, now, len0, 0 /* do flush */ ); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE) + && (b0->flags & VLIB_BUFFER_IS_TRACED))) + { + flowperpkt_trace_t *t = + vlib_add_trace (vm, node, b0, sizeof (*t)); + t->sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; + t->tos = ip0->tos; + t->timestamp = now; + t->buffer_size = len0; + } + + /* verify speculative enqueue, maybe switch current next frame */ + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, + to_next, n_left_to_next, + bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + return frame->n_vectors; +} + +/** + * @brief IPFIX flow-per-packet graph node + * @node flowperpkt + * + * This is the IPFIX flow-record-per-packet node. + * + * @param vm vlib_main_t corresponding to the current thread. + * @param node vlib_node_runtime_t data for this node. + * @param frame vlib_frame_t whose contents should be dispatched. + * + * @par Graph mechanics: buffer metadata, next index usage + * + * Uses: + * - vnet_buffer(b)->ip.save_rewrite_length + * - tells the node the length of the rewrite which was applied in + * ip4/6_rewrite_inline, allows the code to find the IP header without + * having to parse L2 headers, or make stupid assumptions about their + * length. + * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT + * - Used to suppress flow record generation for flow record packets. + * + * Sets: + * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT + * - To suppress flow record generation for flow record packets + * + * Next Index: + * - Next configured output feature on the interface, usually + * "interface-output." Generated flow records head for ip4-lookup + */ + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (flowperpkt_node) = { + .function = flowperpkt_node_fn, + .name = "flowperpkt", + .vector_size = sizeof (u32), + .format_trace = format_flowperpkt_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + + .n_errors = ARRAY_LEN(flowperpkt_error_strings), + .error_strings = flowperpkt_error_strings, + + .n_next_nodes = FLOWPERPKT_N_NEXT, + + /* edit / add dispositions here */ + .next_nodes = { + [FLOWPERPKT_NEXT_DROP] = "error-drop", + }, +}; +/* *INDENT-ON* */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/plugins/flowperpkt-plugin/flowperpkt_plugin_doc.md b/plugins/flowperpkt-plugin/flowperpkt_plugin_doc.md new file mode 100644 index 00000000..ed76c45c --- /dev/null +++ b/plugins/flowperpkt-plugin/flowperpkt_plugin_doc.md @@ -0,0 +1,13 @@ +Per-packet IPFIX flow record plugin {#flowperpkt_plugin_doc} +=================================== + +## Introduction + +This plugin generates one ipfix record entry per packet transmitted +on interfaces which have the feature enabled + +## Sample configuration + +set ipfix exporter collector 192.168.6.2 src 192.168.6.1 template-interval 20 port 4739 path-mtu 1500 + +flowperpkt feature add-del GigabitEthernet2/3/0 diff --git a/vlib/vlib/buffer.h b/vlib/vlib/buffer.h index e4755466..314ed193 100644 --- a/vlib/vlib/buffer.h +++ b/vlib/vlib/buffer.h @@ -88,6 +88,8 @@ typedef struct
VLIB_BUFFER_TOTAL_LENGTH_VALID: as it says
VLIB_BUFFER_REPL_FAIL: packet replication failure
VLIB_BUFFER_RECYCLE: as it says +
VLIB_BUFFER_FLOW_REPORT: buffer is a flow report, + set to avoid adding it to a flow report
VLIB_BUFFER_FLAG_USER(n): user-defined bit N */ #define VLIB_BUFFER_IS_TRACED (1 << 0) @@ -97,6 +99,7 @@ typedef struct #define VLIB_BUFFER_TOTAL_LENGTH_VALID (1 << 3) #define VLIB_BUFFER_REPL_FAIL (1 << 4) #define VLIB_BUFFER_RECYCLE (1 << 5) +#define VLIB_BUFFER_FLOW_REPORT (1 << 6) /* User defined buffer flags. */ #define LOG2_VLIB_BUFFER_FLAG_USER(n) (32 - (n)) diff --git a/vnet/vnet/flow/flow_report.c b/vnet/vnet/flow/flow_report.c index cbf1313d..6750105c 100644 --- a/vnet/vnet/flow/flow_report.c +++ b/vnet/vnet/flow/flow_report.c @@ -114,7 +114,7 @@ int send_template_packet (flow_report_main_t *frm, clib_memcpy (b0->data, fr->rewrite, vec_len (fr->rewrite)); b0->current_data = 0; b0->current_length = vec_len (fr->rewrite); - b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID; + b0->flags |= (VLIB_BUFFER_TOTAL_LENGTH_VALID | VLIB_BUFFER_FLOW_REPORT); vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0; vnet_buffer (b0)->sw_if_index[VLIB_TX] = frm->fib_index; -- cgit 1.2.3-korg From a10f62b11e7a710fde628ae75fe5791e54caba0a Mon Sep 17 00:00:00 2001 From: Sergio Gonzalez Monroy Date: Fri, 25 Nov 2016 13:36:12 +0000 Subject: dpdk: add ipsec cryptodev support DPDK Cryptodev support and related IPsec ESP nodes using DPDK Cryptodev APIs. When DPDK Cryptodev support is enabled, the node graph is modified by adding and replacing some of the nodes. The following nodes are replaced: * esp-encrypt -> dpdk-esp-encrypt * esp-decrypt -> dpdk-esp-decrypt The following nodes are added: * dpdk-crypto-input : polling input node * dpdk-esp-encrypt-post : internal node * dpdk-esp-decrypt-post : internal node Change-Id: I6dca9a890abaf4fb2a4fffce3fd08ac013e4d701 Signed-off-by: Zhang, Roy Fan Signed-off-by: Sergio Gonzalez Monroy --- build-data/packages/dpdk.mk | 4 + build-data/packages/vnet.mk | 3 + build-data/packages/vpp.mk | 3 + build-data/platforms/vpp.mk | 1 + doxygen/user_doc.md | 1 + dpdk/Makefile | 4 + vnet/Makefile.am | 17 +- vnet/configure.ac | 8 + vnet/vnet/devices/dpdk/ipsec/cli.c | 141 ++++++ vnet/vnet/devices/dpdk/ipsec/crypto_node.c | 209 ++++++++ vnet/vnet/devices/dpdk/ipsec/dir.dox | 18 + .../devices/dpdk/ipsec/dpdk_crypto_ipsec_doc.md | 73 +++ vnet/vnet/devices/dpdk/ipsec/esp.h | 268 ++++++++++ vnet/vnet/devices/dpdk/ipsec/esp_decrypt.c | 549 ++++++++++++++++++++ vnet/vnet/devices/dpdk/ipsec/esp_encrypt.c | 554 +++++++++++++++++++++ vnet/vnet/devices/dpdk/ipsec/ipsec.c | 313 ++++++++++++ vnet/vnet/devices/dpdk/ipsec/ipsec.h | 216 ++++++++ vnet/vnet/ipsec-gre/interface.c | 9 +- vnet/vnet/ipsec/esp.h | 151 ++++++ vnet/vnet/ipsec/esp_decrypt.c | 121 ----- vnet/vnet/ipsec/esp_encrypt.c | 25 - vnet/vnet/ipsec/ipsec.c | 31 +- vnet/vnet/ipsec/ipsec.h | 5 + vnet/vnet/ipsec/ipsec_cli.c | 1 - vnet/vnet/ipsec/ipsec_if.c | 39 ++ vnet/vnet/ipsec/ipsec_if_in.c | 10 +- vnet/vnet/ipsec/ipsec_if_out.c | 7 +- vnet/vnet/ipsec/ipsec_input.c | 8 +- vnet/vnet/ipsec/ipsec_output.c | 8 +- vpp/Makefile.am | 5 +- vpp/configure.ac | 8 + 31 files changed, 2651 insertions(+), 159 deletions(-) create mode 100644 vnet/vnet/devices/dpdk/ipsec/cli.c create mode 100644 vnet/vnet/devices/dpdk/ipsec/crypto_node.c create mode 100644 vnet/vnet/devices/dpdk/ipsec/dir.dox create mode 100644 vnet/vnet/devices/dpdk/ipsec/dpdk_crypto_ipsec_doc.md create mode 100644 vnet/vnet/devices/dpdk/ipsec/esp.h create mode 100644 vnet/vnet/devices/dpdk/ipsec/esp_decrypt.c create mode 100644 vnet/vnet/devices/dpdk/ipsec/esp_encrypt.c create mode 100644 vnet/vnet/devices/dpdk/ipsec/ipsec.c create mode 100644 vnet/vnet/devices/dpdk/ipsec/ipsec.h (limited to 'doxygen/user_doc.md') diff --git a/build-data/packages/dpdk.mk b/build-data/packages/dpdk.mk index f9163c87..a529e365 100644 --- a/build-data/packages/dpdk.mk +++ b/build-data/packages/dpdk.mk @@ -22,6 +22,10 @@ DPDK_MAKE_ARGS = -C $(call find_source_fn,$(PACKAGE_SOURCE)) \ DPDK_TUNE=$(DPDK_TUNE) \ DPDK_DEBUG=$(DPDK_DEBUG) +DPDK_CRYPTO_PMD=$(strip $($(PLATFORM)_uses_dpdk_cryptodev)) +ifneq ($(DPDK_CRYPTO_PMD),) +DPDK_MAKE_ARGS += DPDK_CRYPTO_PMD=y +endif DPDK_PLATFORM_TARGET=$(strip $($(PLATFORM)_dpdk_target)) ifneq ($(DPDK_PLATFORM_TARGET),) diff --git a/build-data/packages/vnet.mk b/build-data/packages/vnet.mk index cafb9393..399ca1b4 100644 --- a/build-data/packages/vnet.mk +++ b/build-data/packages/vnet.mk @@ -38,4 +38,7 @@ vnet_configure_depend += dpdk-install vnet_CPPFLAGS += $(call installed_includes_fn, dpdk) vnet_LDFLAGS += $(call installed_libs_fn, dpdk) endif +ifeq ($($(PLATFORM)_uses_dpdk_cryptodev),yes) +vnet_configure_args += --with-dpdk-crypto +endif endif diff --git a/build-data/packages/vpp.mk b/build-data/packages/vpp.mk index be10e17d..6831c6b8 100644 --- a/build-data/packages/vpp.mk +++ b/build-data/packages/vpp.mk @@ -45,4 +45,7 @@ vpp_configure_depend += dpdk-install vpp_CPPFLAGS += $(call installed_includes_fn, dpdk) vpp_LDFLAGS += $(call installed_libs_fn, dpdk) endif +ifeq ($($(PLATFORM)_uses_dpdk_cryptodev),yes) +vpp_configure_args += --with-dpdk-crypto +endif endif diff --git a/build-data/platforms/vpp.mk b/build-data/platforms/vpp.mk index 4fe7cc8f..15d4dc39 100644 --- a/build-data/platforms/vpp.mk +++ b/build-data/platforms/vpp.mk @@ -41,6 +41,7 @@ vlib_configure_args_vpp = --with-pre-data=128 plugins_configure_args_vpp = --with-dpdk # DPDK configuration parameters +# vpp_uses_dpdk_cryptodev = yes # vpp_uses_external_dpdk = yes # vpp_dpdk_inc_dir = /usr/include/dpdk # vpp_dpdk_lib_dir = /usr/lib diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 59063764..2e87c877 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -6,6 +6,7 @@ Several modules provide operational, dataplane-user focused documentation. - [GUI guided user demo](https://wiki.fd.io/view/VPP_Sandbox/vpp-userdemo) - @subpage qos_doc - @subpage ipsec_gre_doc +- @subpage dpdk_crypto_ipsec_doc - @subpage map_doc - @subpage lldp_doc - @subpage ioam_plugin_doc diff --git a/dpdk/Makefile b/dpdk/Makefile index 2f5037df..46cc3db1 100644 --- a/dpdk/Makefile +++ b/dpdk/Makefile @@ -21,6 +21,7 @@ DPDK_DOWNLOAD_DIR ?= $(HOME)/Downloads DPDK_MARCH ?= native DPDK_TUNE ?= generic DPDK_DEBUG ?= n +DPDK_CRYPTO_PMD ?= n B := $(DPDK_BUILD_DIR) I := $(DPDK_INSTALL_DIR) @@ -118,6 +119,9 @@ $(B)/custom-config: $(B)/.patch.ok Makefile $(call set,RTE_LIBRTE_VMXNET3_DEBUG_INIT,$(DPDK_DEBUG)) $(call set,RTE_LIBRTE_PMD_BOND,y) $(call set,RTE_LIBRTE_IP_FRAG,y) + $(call set,RTE_LIBRTE_PMD_AESNI_MB,$(DPDK_CRYPTO_PMD)) + $(call set,RTE_LIBRTE_PMD_AESNI_GCM,$(DPDK_CRYPTO_PMD)) + $(call set,RTE_LIBRTE_PMD_QAT,$(DPDK_CRYPTO_PMD)) @# not needed $(call set,RTE_LIBRTE_TIMER,n) $(call set,RTE_LIBRTE_CFGFILE,n) diff --git a/vnet/Makefile.am b/vnet/Makefile.am index 86b73906..d806785f 100644 --- a/vnet/Makefile.am +++ b/vnet/Makefile.am @@ -13,7 +13,7 @@ AUTOMAKE_OPTIONS = foreign subdir-objects -AM_CFLAGS = -Wall -Werror @DPDK@ @IPSEC@ @IPV6SR@ +AM_CFLAGS = -Wall -Werror @DPDK@ @DPDK_CRYPTO@ @IPSEC@ @IPV6SR@ libvnet_la_SOURCES = libvnetplugin_la_SOURCES = @@ -345,7 +345,15 @@ libvnet_la_SOURCES += \ vnet/ipsec/ikev2_cli.c \ vnet/ipsec/ikev2_payload.c \ vnet/ipsec/ikev2_format.c -endif +if WITH_DPDK_CRYPTO +libvnet_la_SOURCES += \ + vnet/devices/dpdk/ipsec/esp_encrypt.c \ + vnet/devices/dpdk/ipsec/esp_decrypt.c \ + vnet/devices/dpdk/ipsec/crypto_node.c \ + vnet/devices/dpdk/ipsec/cli.c \ + vnet/devices/dpdk/ipsec/ipsec.c +endif +endif libvnet_la_SOURCES += \ vnet/ipsec/ipsec_output.c @@ -355,6 +363,11 @@ nobase_include_HEADERS += \ vnet/ipsec/esp.h \ vnet/ipsec/ikev2.h \ vnet/ipsec/ikev2_priv.h +if WITH_DPDK_CRYPTO +nobase_include_HEADERS += \ + vnet/devices/dpdk/ipsec/ipsec.h \ + vnet/devices/dpdk/ipsec/esp.h +endif ######################################## # Layer 3 protocol: osi diff --git a/vnet/configure.ac b/vnet/configure.ac index 80de43af..6a5281b0 100644 --- a/vnet/configure.ac +++ b/vnet/configure.ac @@ -12,6 +12,11 @@ AC_ARG_WITH(dpdk, [with_dpdk=1], [with_dpdk=0]) +AC_ARG_WITH(dpdk_crypto, + AC_HELP_STRING([--with-dpdk-crypto],[Use DPDK cryptodev]), + [with_dpdk_crypto=1], + [with_dpdk_crypto=0]) + AC_ARG_WITH(ipsec, AC_HELP_STRING([--without-ipsec],[Disable ipsec]), [with_ipsec=0], @@ -27,6 +32,9 @@ AC_ARG_ENABLE(tests, [enable_tests=1], [enable_tests=0]) +AM_CONDITIONAL(WITH_DPDK_CRYPTO, test "$with_dpdk_crypto" = "1") +AC_SUBST(DPDK_CRYPTO,[-DDPDK_CRYPTO=${with_dpdk_crypto}]) + AM_CONDITIONAL(WITH_DPDK, test "$with_dpdk" = "1") AC_SUBST(DPDK,[-DDPDK=${with_dpdk}]) diff --git a/vnet/vnet/devices/dpdk/ipsec/cli.c b/vnet/vnet/devices/dpdk/ipsec/cli.c new file mode 100644 index 00000000..3b634e03 --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/cli.c @@ -0,0 +1,141 @@ +/* + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +static void +dpdk_ipsec_show_mapping (vlib_main_t * vm, u16 detail_display) +{ + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + vlib_thread_main_t *tm = vlib_get_thread_main (); + u32 i, skip_master; + + if (detail_display) + vlib_cli_output (vm, "worker\t%10s\t%15s\tdir\tdev\tqp\n", + "cipher", "auth"); + else + vlib_cli_output (vm, "worker\tcrypto device id(type)\n"); + + skip_master = vlib_num_workers () > 0; + + for (i = 0; i < tm->n_vlib_mains; i++) + { + uword key, data; + u32 cpu_index = vlib_mains[i]->cpu_index; + crypto_worker_main_t *cwm = &dcm->workers_main[cpu_index]; + u8 *s = 0; + + if (skip_master) + { + skip_master = 0; + continue; + } + + if (!detail_display) + { + i32 last_cdev = -1; + crypto_qp_data_t *qpd; + + s = format (s, "%u\t", cpu_index); + + /* *INDENT-OFF* */ + vec_foreach (qpd, cwm->qp_data) + { + u32 dev_id = qpd->dev_id; + + if ((u16) last_cdev != dev_id) + { + struct rte_cryptodev_info cdev_info; + + rte_cryptodev_info_get (dev_id, &cdev_info); + + s = format(s, "%u(%s)\t", dev_id, cdev_info.feature_flags & + RTE_CRYPTODEV_FF_HW_ACCELERATED ? "HW" : "SW"); + } + last_cdev = dev_id; + } + /* *INDENT-ON* */ + vlib_cli_output (vm, "%s", s); + } + else + { + char cipher_str[15], auth_str[15]; + struct rte_cryptodev_capabilities cap; + crypto_worker_qp_key_t *p_key = (crypto_worker_qp_key_t *) & key; + /* *INDENT-OFF* */ + hash_foreach (key, data, cwm->algo_qp_map, + ({ + cap.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC; + cap.sym.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER; + cap.sym.cipher.algo = p_key->cipher_algo; + check_algo_is_supported (&cap, cipher_str); + cap.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC; + cap.sym.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH; + cap.sym.auth.algo = p_key->auth_algo; + check_algo_is_supported (&cap, auth_str); + vlib_cli_output (vm, "%u\t%10s\t%15s\t%3s\t%u\t%u\n", + vlib_mains[i]->cpu_index, cipher_str, auth_str, + p_key->is_outbound ? "out" : "in", + cwm->qp_data[data].dev_id, + cwm->qp_data[data].qp_id); + })); + /* *INDENT-ON* */ + } + } +} + +static clib_error_t * +lcore_cryptodev_map_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + unformat_input_t _line_input, *line_input = &_line_input; + u16 detail = 0; + + if (!unformat_user (input, unformat_line_input, line_input)) + return 0; + + while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (line_input, "verbose")) + detail = 1; + else + return clib_error_return (0, "parse error: '%U'", + format_unformat_error, line_input); + } + + unformat_free (line_input); + + dpdk_ipsec_show_mapping (vm, detail); + + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (lcore_cryptodev_map, static) = { + .path = "show crypto device mapping", + .short_help = + "show cryptodev device mapping ", + .function = lcore_cryptodev_map_fn, +}; +/* *INDENT-ON* */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/devices/dpdk/ipsec/crypto_node.c b/vnet/vnet/devices/dpdk/ipsec/crypto_node.c new file mode 100644 index 00000000..37d5e5fa --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/crypto_node.c @@ -0,0 +1,209 @@ +/* + *------------------------------------------------------------------ + * crypto_node.c - DPDK Cryptodev input node + * + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *------------------------------------------------------------------ + */ + +#include +#include +#include +#include + +#include + +#define foreach_dpdk_crypto_input_next \ + _(DROP, "error-drop") \ + _(ENCRYPT_POST, "dpdk-esp-encrypt-post") \ + _(DECRYPT_POST, "dpdk-esp-decrypt-post") + +typedef enum +{ +#define _(f,s) DPDK_CRYPTO_INPUT_NEXT_##f, + foreach_dpdk_crypto_input_next +#undef _ + DPDK_CRYPTO_INPUT_N_NEXT, +} dpdk_crypto_input_next_t; + +#define foreach_dpdk_crypto_input_error \ + _(DQ_COPS, "Crypto ops dequeued") \ + _(COP_FAILED, "Crypto op failed") + +typedef enum +{ +#define _(f,s) DPDK_CRYPTO_INPUT_ERROR_##f, + foreach_dpdk_crypto_input_error +#undef _ + DPDK_CRYPTO_INPUT_N_ERROR, +} dpdk_crypto_input_error_t; + +static char *dpdk_crypto_input_error_strings[] = { +#define _(n, s) s, + foreach_dpdk_crypto_input_error +#undef _ +}; + +vlib_node_registration_t dpdk_crypto_input_node; + +typedef struct +{ + u32 cdev; + u32 qp; + u32 status; + u32 sa_idx; + u32 next_index; +} dpdk_crypto_input_trace_t; + +static u8 * +format_dpdk_crypto_input_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + dpdk_crypto_input_trace_t *t = va_arg (*args, dpdk_crypto_input_trace_t *); + + s = format (s, "dpdk_crypto: cryptodev-id %u queue-pair %u next-index %d", + t->cdev, t->qp, t->next_index); + + s = format (s, "status %u sa-idx %u\n", t->status, t->sa_idx); + + return s; +} + +static_always_inline u32 +dpdk_crypto_dequeue (vlib_main_t * vm, vlib_node_runtime_t * node, + crypto_qp_data_t * qpd) +{ + u32 n_deq, *to_next = 0, next_index, n_cops, def_next_index; + struct rte_crypto_op **cops = qpd->cops; + + if (qpd->inflights == 0) + return 0; + + if (qpd->is_outbound) + def_next_index = DPDK_CRYPTO_INPUT_NEXT_ENCRYPT_POST; + else + def_next_index = DPDK_CRYPTO_INPUT_NEXT_DECRYPT_POST; + + n_cops = rte_cryptodev_dequeue_burst (qpd->dev_id, qpd->qp_id, + cops, VLIB_FRAME_SIZE); + n_deq = n_cops; + next_index = def_next_index; + + qpd->inflights -= n_cops; + ASSERT (qpd->inflights >= 0); + + while (n_cops > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_cops > 0 && n_left_to_next > 0) + { + u32 bi0, next0; + vlib_buffer_t *b0 = 0; + struct rte_crypto_op *cop; + struct rte_crypto_sym_op *sym_cop; + + cop = cops[0]; + cops += 1; + n_cops -= 1; + n_left_to_next -= 1; + + next0 = def_next_index; + + if (PREDICT_FALSE (cop->status != RTE_CRYPTO_OP_STATUS_SUCCESS)) + { + next0 = DPDK_CRYPTO_INPUT_NEXT_DROP; + vlib_node_increment_counter (vm, dpdk_crypto_input_node.index, + DPDK_CRYPTO_INPUT_ERROR_COP_FAILED, + 1); + } + cop->status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED; + + sym_cop = (struct rte_crypto_sym_op *) (cop + 1); + b0 = vlib_buffer_from_rte_mbuf (sym_cop->m_src); + bi0 = vlib_get_buffer_index (vm, b0); + + to_next[0] = bi0; + to_next += 1; + + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + dpdk_crypto_input_trace_t *tr; + tr = vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->cdev = qpd->dev_id; + tr->qp = qpd->qp_id; + tr->status = cop->status; + tr->next_index = next0; + tr->sa_idx = vnet_buffer (b0)->ipsec.sad_index; + } + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + crypto_free_cop (qpd, qpd->cops, n_deq); + + vlib_node_increment_counter (vm, dpdk_crypto_input_node.index, + DPDK_CRYPTO_INPUT_ERROR_DQ_COPS, n_deq); + return n_deq; +} + +static uword +dpdk_crypto_input_fn (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * frame) +{ + u32 cpu_index = os_get_cpu_number (); + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + crypto_worker_main_t *cwm = &dcm->workers_main[cpu_index]; + crypto_qp_data_t *qpd; + u32 n_deq = 0; + + /* *INDENT-OFF* */ + vec_foreach (qpd, cwm->qp_data) + n_deq += dpdk_crypto_dequeue(vm, node, qpd); + /* *INDENT-ON* */ + + return n_deq; +} + +VLIB_REGISTER_NODE (dpdk_crypto_input_node) = +{ + .function = dpdk_crypto_input_fn,.name = "dpdk-crypto-input",.format_trace = + format_dpdk_crypto_input_trace,.type = VLIB_NODE_TYPE_INPUT,.state = + VLIB_NODE_STATE_DISABLED,.n_errors = + DPDK_CRYPTO_INPUT_N_ERROR,.error_strings = + dpdk_crypto_input_error_strings,.n_next_nodes = + DPDK_CRYPTO_INPUT_N_NEXT,.next_nodes = + { +#define _(s,n) [DPDK_CRYPTO_INPUT_NEXT_##s] = n, + foreach_dpdk_crypto_input_next +#undef _ + } +,}; + +#if DPDK_CRYPTO==1 +VLIB_NODE_FUNCTION_MULTIARCH (dpdk_crypto_input_node, dpdk_crypto_input_fn) +#endif +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/devices/dpdk/ipsec/dir.dox b/vnet/vnet/devices/dpdk/ipsec/dir.dox new file mode 100644 index 00000000..ffebfc4d --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/dir.dox @@ -0,0 +1,18 @@ +/* + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** + @dir vnet/vnet/devices/dpdk/ipsec + @brief IPSec ESP encrypt/decrypt using DPDK Cryptodev API +*/ diff --git a/vnet/vnet/devices/dpdk/ipsec/dpdk_crypto_ipsec_doc.md b/vnet/vnet/devices/dpdk/ipsec/dpdk_crypto_ipsec_doc.md new file mode 100644 index 00000000..8089696f --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/dpdk_crypto_ipsec_doc.md @@ -0,0 +1,73 @@ +# VPP IPSec implementation using DPDK Cryptodev API {#dpdk_crypto_ipsec_doc} + +This document is meant to contain all related information about implementation and usability. + + +## VPP IPsec with DPDK Cryptodev + +DPDK Cryptodev is an asynchronous crypto API that supports both Hardware and Software implementations (for more details refer to [DPDK Cryptography Device Library documentation](http://dpdk.org/doc/guides/prog_guide/cryptodev_lib.html)). + +When DPDK Cryptodev support is enabled, the node graph is modified by adding and replacing some of the nodes. + +The following nodes are replaced: +* esp-encrypt -> dpdk-esp-encrypt +* esp-decrypt -> dpdk-esp-decrypt + +The following nodes are added: +* dpdk-crypto-input : polling input node, basically dequeuing from crypto devices. +* dpdk-esp-encrypt-post : internal node. +* dpdk-esp-decrypt-post : internal node. + + +### How to enable VPP IPSec with DPDK Cryptodev support + +To enable DPDK Cryptodev support (disabled by default), we need the following env option: + + vpp_uses_dpdk_cryptodev=yes + +A couple of ways to achive this: +* uncomment/add it in the platforms config (ie. build-data/platforms/vpp.mk) +* set the option when building vpp (ie. make vpp_uses_dpdk_cryptodev=yes build-release) + + +### Crypto Resources allocation + +VPP allocates crypto resources based on a best effort approach: +* first allocate Hardware crypto resources, then Software. +* if there are not enough crypto resources for all workers, all packets will be dropped if they reach ESP encrypt/decrypt nodes, displaying the warning: + + 0: dpdk_ipsec_init: not enough cryptodevs for ipsec + + +### Configuration example + +No especial IPsec configuration is required. + +Once DPDK Cryptodev is enabled, the user just needs to provide cryptodevs in the startup.conf. + +Example startup.conf: + +``` +dpdk { + socket-mem 1024,1024 + num-mbufs 131072 + dev 0000:81:00.0 + dev 0000:81:00.1 + dev 0000:85:01.0 + dev 0000:85:01.1 + vdev cryptodev_aesni_mb_pmd,socket_id=1 + vdev cryptodev_aesni_mb_pmd,socket_id=1 +} +``` + +In the above configuration: +* 0000:85:01.0 and 0000:85:01.1 are crypto BDFs and they require the same driver binding as DPDK Ethernet devices but they do not support any extra configuration options. +* Two AESNI-MB Software Cryptodev PMDs are created in NUMA node 1. + +For further details refer to [DPDK Crypto Device Driver documentation](http://dpdk.org/doc/guides/cryptodevs/index.html) + +### Operational data + +The following CLI command displays the Cryptodev/Worker mapping: + + show crypto device mapping [verbose] diff --git a/vnet/vnet/devices/dpdk/ipsec/esp.h b/vnet/vnet/devices/dpdk/ipsec/esp.h new file mode 100644 index 00000000..71282ac0 --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/esp.h @@ -0,0 +1,268 @@ +/* + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef __DPDK_ESP_H__ +#define __DPDK_ESP_H__ + +#include +#include +#include + +typedef struct +{ + enum rte_crypto_cipher_algorithm algo; + u8 key_len; + u8 iv_len; +} dpdk_esp_crypto_alg_t; + +typedef struct +{ + enum rte_crypto_auth_algorithm algo; + u8 trunc_size; +} dpdk_esp_integ_alg_t; + +typedef struct +{ + dpdk_esp_crypto_alg_t *esp_crypto_algs; + dpdk_esp_integ_alg_t *esp_integ_algs; +} dpdk_esp_main_t; + +dpdk_esp_main_t dpdk_esp_main; + +static_always_inline void +dpdk_esp_init () +{ + dpdk_esp_main_t *em = &dpdk_esp_main; + dpdk_esp_integ_alg_t *i; + dpdk_esp_crypto_alg_t *c; + + vec_validate (em->esp_crypto_algs, IPSEC_CRYPTO_N_ALG - 1); + + c = &em->esp_crypto_algs[IPSEC_CRYPTO_ALG_AES_CBC_128]; + c->algo = RTE_CRYPTO_CIPHER_AES_CBC; + c->key_len = 16; + c->iv_len = 16; + + c = &em->esp_crypto_algs[IPSEC_CRYPTO_ALG_AES_CBC_192]; + c->algo = RTE_CRYPTO_CIPHER_AES_CBC; + c->key_len = 24; + c->iv_len = 16; + + c = &em->esp_crypto_algs[IPSEC_CRYPTO_ALG_AES_CBC_256]; + c->algo = RTE_CRYPTO_CIPHER_AES_CBC; + c->key_len = 32; + c->iv_len = 16; + + vec_validate (em->esp_integ_algs, IPSEC_INTEG_N_ALG - 1); + + i = &em->esp_integ_algs[IPSEC_INTEG_ALG_SHA1_96]; + i->algo = RTE_CRYPTO_AUTH_SHA1_HMAC; + i->trunc_size = 12; + + i = &em->esp_integ_algs[IPSEC_INTEG_ALG_SHA_256_96]; + i->algo = RTE_CRYPTO_AUTH_SHA256_HMAC; + i->trunc_size = 12; + + i = &em->esp_integ_algs[IPSEC_INTEG_ALG_SHA_256_128]; + i->algo = RTE_CRYPTO_AUTH_SHA256_HMAC; + i->trunc_size = 16; + + i = &em->esp_integ_algs[IPSEC_INTEG_ALG_SHA_384_192]; + i->algo = RTE_CRYPTO_AUTH_SHA384_HMAC; + i->trunc_size = 24; + + i = &em->esp_integ_algs[IPSEC_INTEG_ALG_SHA_512_256]; + i->algo = RTE_CRYPTO_AUTH_SHA512_HMAC; + i->trunc_size = 32; +} + +static_always_inline int +add_del_sa_sess (u32 sa_index, u8 is_add) +{ + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + crypto_worker_main_t *cwm; + u8 skip_master = vlib_num_workers () > 0; + + /* *INDENT-OFF* */ + vec_foreach (cwm, dcm->workers_main) + { + crypto_sa_session_t *sa_sess; + u8 is_outbound; + + if (skip_master) + { + skip_master = 0; + continue; + } + + for (is_outbound = 0; is_outbound < 2; is_outbound++) + { + if (is_add) + { + pool_get (cwm->sa_sess_d[is_outbound], sa_sess); + } + else + { + u8 dev_id; + + sa_sess = pool_elt_at_index (cwm->sa_sess_d[is_outbound], sa_index); + dev_id = cwm->qp_data[sa_sess->qp_index].dev_id; + + if (!sa_sess->sess) + continue; + + if (rte_cryptodev_sym_session_free(dev_id, sa_sess->sess)) + { + clib_warning("failed to free session"); + return -1; + } + memset(sa_sess, 0, sizeof(sa_sess[0])); + } + } + } + /* *INDENT-OFF* */ + + return 0; +} + +static_always_inline int +translate_crypto_algo(ipsec_crypto_alg_t crypto_algo, + struct rte_crypto_sym_xform *cipher_xform) +{ + switch (crypto_algo) + { + case IPSEC_CRYPTO_ALG_NONE: + cipher_xform->cipher.algo = RTE_CRYPTO_CIPHER_NULL; + break; + case IPSEC_CRYPTO_ALG_AES_CBC_128: + case IPSEC_CRYPTO_ALG_AES_CBC_192: + case IPSEC_CRYPTO_ALG_AES_CBC_256: + cipher_xform->cipher.algo = RTE_CRYPTO_CIPHER_AES_CBC; + break; + default: + return -1; + } + + cipher_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER; + + return 0; +} + +static_always_inline int +translate_integ_algo(ipsec_integ_alg_t integ_alg, + struct rte_crypto_sym_xform *auth_xform) +{ + switch (integ_alg) { + case IPSEC_INTEG_ALG_NONE: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_NULL; + auth_xform->auth.digest_length = 0; + break; + case IPSEC_INTEG_ALG_SHA1_96: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_SHA1_HMAC; + auth_xform->auth.digest_length = 12; + break; + case IPSEC_INTEG_ALG_SHA_256_96: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_SHA256_HMAC; + auth_xform->auth.digest_length = 12; + break; + case IPSEC_INTEG_ALG_SHA_256_128: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_SHA256_HMAC; + auth_xform->auth.digest_length = 16; + break; + case IPSEC_INTEG_ALG_SHA_384_192: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_SHA384_HMAC; + auth_xform->auth.digest_length = 24; + break; + case IPSEC_INTEG_ALG_SHA_512_256: + auth_xform->auth.algo = RTE_CRYPTO_AUTH_SHA512_HMAC; + auth_xform->auth.digest_length = 32; + break; + default: + return -1; + } + + auth_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH; + + return 0; +} + +static_always_inline int +create_sym_sess(ipsec_sa_t *sa, crypto_sa_session_t *sa_sess, u8 is_outbound) +{ + u32 cpu_index = os_get_cpu_number(); + dpdk_crypto_main_t * dcm = &dpdk_crypto_main; + crypto_worker_main_t *cwm = &dcm->workers_main[cpu_index]; + struct rte_crypto_sym_xform cipher_xform = {0}; + struct rte_crypto_sym_xform auth_xform = {0}; + struct rte_crypto_sym_xform *xfs; + uword key = 0, *data; + crypto_worker_qp_key_t *p_key = (crypto_worker_qp_key_t *)&key; + + cipher_xform.type = RTE_CRYPTO_SYM_XFORM_CIPHER; + cipher_xform.cipher.key.data = sa->crypto_key; + cipher_xform.cipher.key.length = sa->crypto_key_len; + + auth_xform.type = RTE_CRYPTO_SYM_XFORM_AUTH; + auth_xform.auth.key.data = sa->integ_key; + auth_xform.auth.key.length = sa->integ_key_len; + + if (translate_crypto_algo(sa->crypto_alg, &cipher_xform) < 0) + return -1; + p_key->cipher_algo = cipher_xform.cipher.algo; + + if (translate_integ_algo(sa->integ_alg, &auth_xform) < 0) + return -1; + p_key->auth_algo = auth_xform.auth.algo; + + if (is_outbound) + { + cipher_xform.cipher.op = RTE_CRYPTO_CIPHER_OP_ENCRYPT; + auth_xform.auth.op = RTE_CRYPTO_AUTH_OP_GENERATE; + cipher_xform.next = &auth_xform; + xfs = &cipher_xform; + } + else + { + cipher_xform.cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT; + auth_xform.auth.op = RTE_CRYPTO_AUTH_OP_VERIFY; + auth_xform.next = &cipher_xform; + xfs = &auth_xform; + } + + p_key->is_outbound = is_outbound; + + data = hash_get(cwm->algo_qp_map, key); + if (!data) + return -1; + + sa_sess->sess = + rte_cryptodev_sym_session_create(cwm->qp_data[*data].dev_id, xfs); + + if (!sa_sess->sess) + return -1; + + sa_sess->qp_index = (u8)*data; + + return 0; +} + +#endif /* __DPDK_ESP_H__ */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/devices/dpdk/ipsec/esp_decrypt.c b/vnet/vnet/devices/dpdk/ipsec/esp_decrypt.c new file mode 100644 index 00000000..c898d05e --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/esp_decrypt.c @@ -0,0 +1,549 @@ +/* + * esp_decrypt.c : IPSec ESP Decrypt node using DPDK Cryptodev + * + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include +#include +#include + +#define foreach_esp_decrypt_next \ +_(DROP, "error-drop") \ +_(IP4_INPUT, "ip4-input") \ +_(IP6_INPUT, "ip6-input") + +#define _(v, s) ESP_DECRYPT_NEXT_##v, +typedef enum { + foreach_esp_decrypt_next +#undef _ + ESP_DECRYPT_N_NEXT, +} esp_decrypt_next_t; + +#define foreach_esp_decrypt_error \ + _(RX_PKTS, "ESP pkts received") \ + _(DECRYPTION_FAILED, "ESP decryption failed") \ + _(REPLAY, "SA replayed packet") \ + _(NOT_IP, "Not IP packet (dropped)") \ + _(ENQ_FAIL, "Enqueue failed (buffer full)") \ + _(NO_CRYPTODEV, "Cryptodev not configured") \ + _(BAD_LEN, "Invalid ciphertext length") \ + _(UNSUPPORTED, "Cipher/Auth not supported") + + +typedef enum { +#define _(sym,str) ESP_DECRYPT_ERROR_##sym, + foreach_esp_decrypt_error +#undef _ + ESP_DECRYPT_N_ERROR, +} esp_decrypt_error_t; + +static char * esp_decrypt_error_strings[] = { +#define _(sym,string) string, + foreach_esp_decrypt_error +#undef _ +}; + +vlib_node_registration_t dpdk_esp_decrypt_node; + +typedef struct { + ipsec_crypto_alg_t crypto_alg; + ipsec_integ_alg_t integ_alg; +} esp_decrypt_trace_t; + +/* packet trace format function */ +static u8 * format_esp_decrypt_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + esp_decrypt_trace_t * t = va_arg (*args, esp_decrypt_trace_t *); + + s = format (s, "esp: crypto %U integrity %U", + format_ipsec_crypto_alg, t->crypto_alg, + format_ipsec_integ_alg, t->integ_alg); + return s; +} + +static uword +dpdk_esp_decrypt_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, *from, *to_next, next_index; + ipsec_main_t *im = &ipsec_main; + u32 cpu_index = os_get_cpu_number(); + dpdk_crypto_main_t * dcm = &dpdk_crypto_main; + dpdk_esp_main_t * em = &dpdk_esp_main; + u32 i; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + if (PREDICT_FALSE(!dcm->workers_main)) + { + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_NO_CRYPTODEV, n_left_from); + vlib_buffer_free(vm, from, n_left_from); + return n_left_from; + } + + crypto_worker_main_t *cwm = vec_elt_at_index(dcm->workers_main, cpu_index); + u32 n_qps = vec_len(cwm->qp_data); + struct rte_crypto_op ** cops_to_enq[n_qps]; + u32 n_cop_qp[n_qps], * bi_to_enq[n_qps]; + + for (i = 0; i < n_qps; i++) + { + bi_to_enq[i] = cwm->qp_data[i].bi; + cops_to_enq[i] = cwm->qp_data[i].cops; + } + + memset(n_cop_qp, 0, n_qps * sizeof(u32)); + + crypto_alloc_cops(); + + next_index = ESP_DECRYPT_NEXT_DROP; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0, sa_index0 = ~0, seq, icv_size; + vlib_buffer_t * b0; + esp_header_t * esp0; + ipsec_sa_t * sa0; + struct rte_mbuf * mb0 = 0; + const int BLOCK_SIZE = 16; + const int IV_SIZE = 16; + crypto_sa_session_t * sa_sess; + void * sess; + u16 qp_index; + struct rte_crypto_op * cop = 0; + + bi0 = from[0]; + from += 1; + n_left_from -= 1; + + b0 = vlib_get_buffer (vm, bi0); + esp0 = vlib_buffer_get_current (b0); + + sa_index0 = vnet_buffer(b0)->ipsec.sad_index; + sa0 = pool_elt_at_index (im->sad, sa_index0); + + seq = clib_host_to_net_u32(esp0->seq); + + /* anti-replay check */ + if (sa0->use_anti_replay) + { + int rv = 0; + + if (PREDICT_TRUE(sa0->use_esn)) + rv = esp_replay_check_esn(sa0, seq); + else + rv = esp_replay_check(sa0, seq); + + if (PREDICT_FALSE(rv)) + { + clib_warning ("anti-replay SPI %u seq %u", sa0->spi, seq); + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_REPLAY, 1); + to_next[0] = bi0; + to_next += 1; + n_left_to_next -= 1; + goto trace; + } + } + + if (PREDICT_FALSE(sa0->integ_alg == IPSEC_INTEG_ALG_NONE) || + PREDICT_FALSE(sa0->crypto_alg == IPSEC_CRYPTO_ALG_NONE)) + { + clib_warning ("SPI %u : only cipher + auth supported", sa0->spi); + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_UNSUPPORTED, 1); + to_next[0] = bi0; + to_next += 1; + n_left_to_next -= 1; + goto trace; + } + + sa_sess = pool_elt_at_index(cwm->sa_sess_d[0], sa_index0); + + if (PREDICT_FALSE(!sa_sess->sess)) + { + int ret = create_sym_sess(sa0, sa_sess, 0); + ASSERT(ret == 0); + } + + sess = sa_sess->sess; + qp_index = sa_sess->qp_index; + + ASSERT (vec_len (vec_elt (cwm->qp_data, qp_index).free_cops) > 0); + cop = vec_pop (vec_elt (cwm->qp_data, qp_index).free_cops); + ASSERT (cop->status == RTE_CRYPTO_OP_STATUS_NOT_PROCESSED); + + cops_to_enq[qp_index][0] = cop; + cops_to_enq[qp_index] += 1; + n_cop_qp[qp_index] += 1; + bi_to_enq[qp_index][0] = bi0; + bi_to_enq[qp_index] += 1; + + rte_crypto_op_attach_sym_session(cop, sess); + + icv_size = em->esp_integ_algs[sa0->integ_alg].trunc_size; + + /* Convert vlib buffer to mbuf */ + mb0 = rte_mbuf_from_vlib_buffer(b0); + mb0->data_len = b0->current_length; + mb0->pkt_len = b0->current_length; + mb0->data_off = RTE_PKTMBUF_HEADROOM + b0->current_data; + + /* Outer IP header has already been stripped */ + u16 payload_len = rte_pktmbuf_pkt_len(mb0) - sizeof (esp_header_t) - + IV_SIZE - icv_size; + + if ((payload_len & (BLOCK_SIZE - 1)) || (payload_len <= 0)) + { + clib_warning ("payload %u not multiple of %d\n", + payload_len, BLOCK_SIZE); + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_BAD_LEN, 1); + vec_add (vec_elt (cwm->qp_data, qp_index).free_cops, &cop, 1); + bi_to_enq[qp_index] -= 1; + cops_to_enq[qp_index] -= 1; + n_cop_qp[qp_index] -= 1; + to_next[0] = bi0; + to_next += 1; + n_left_to_next -= 1; + goto trace; + } + + struct rte_crypto_sym_op *sym_cop = (struct rte_crypto_sym_op *)(cop + 1); + + sym_cop->m_src = mb0; + sym_cop->cipher.data.offset = sizeof (esp_header_t) + IV_SIZE; + sym_cop->cipher.data.length = payload_len; + + sym_cop->cipher.iv.data = + rte_pktmbuf_mtod_offset(mb0, void*, sizeof (esp_header_t)); + sym_cop->cipher.iv.phys_addr = + rte_pktmbuf_mtophys_offset(mb0, sizeof (esp_header_t)); + sym_cop->cipher.iv.length = IV_SIZE; + + if (sa0->use_esn) + { + dpdk_cop_priv_t* priv = (dpdk_cop_priv_t*) (sym_cop + 1); + u8* payload_end = + rte_pktmbuf_mtod_offset(mb0, u8*, sizeof(esp_header_t) + IV_SIZE + + payload_len); + + memcpy (priv->icv, payload_end, icv_size); + *((u32*) payload_end) = sa0->seq_hi; + sym_cop->auth.data.offset = 0; + sym_cop->auth.data.length = + sizeof(esp_header_t) + IV_SIZE + payload_len + sizeof(sa0->seq_hi); + sym_cop->auth.digest.data = priv->icv; + sym_cop->auth.digest.phys_addr = + cop->phys_addr + (uintptr_t) priv->icv - (uintptr_t) cop; + sym_cop->auth.digest.length = icv_size; + } + else + { + sym_cop->auth.data.offset = 0; + sym_cop->auth.data.length = sizeof(esp_header_t) + + IV_SIZE + payload_len; + + sym_cop->auth.digest.data = + rte_pktmbuf_mtod_offset(mb0, void*, + rte_pktmbuf_pkt_len(mb0) - icv_size); + sym_cop->auth.digest.phys_addr = + rte_pktmbuf_mtophys_offset(mb0, + rte_pktmbuf_pkt_len(mb0) - icv_size); + sym_cop->auth.digest.length = icv_size; + } + +trace: + if (PREDICT_FALSE(b0->flags & VLIB_BUFFER_IS_TRACED)) + { + esp_decrypt_trace_t *tr = vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->crypto_alg = sa0->crypto_alg; + tr->integ_alg = sa0->integ_alg; + } + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_RX_PKTS, + from_frame->n_vectors); + crypto_qp_data_t *qpd; + /* *INDENT-OFF* */ + vec_foreach_index (i, cwm->qp_data) + { + u32 enq; + + qpd = vec_elt_at_index(cwm->qp_data, i); + enq = rte_cryptodev_enqueue_burst(qpd->dev_id, qpd->qp_id, + qpd->cops, n_cop_qp[i]); + qpd->inflights += enq; + + if (PREDICT_FALSE(enq < n_cop_qp[i])) + { + crypto_free_cop (qpd, &qpd->cops[enq], n_cop_qp[i] - enq); + vlib_buffer_free (vm, &qpd->bi[enq], n_cop_qp[i] - enq); + + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_ENQ_FAIL, + n_cop_qp[i] - enq); + } + } + /* *INDENT-ON* */ + + return from_frame->n_vectors; +} + +VLIB_REGISTER_NODE (dpdk_esp_decrypt_node) = { + .function = dpdk_esp_decrypt_node_fn, + .name = "dpdk-esp-decrypt", + .vector_size = sizeof (u32), + .format_trace = format_esp_decrypt_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + + .n_errors = ARRAY_LEN(esp_decrypt_error_strings), + .error_strings = esp_decrypt_error_strings, + + .n_next_nodes = ESP_DECRYPT_N_NEXT, + .next_nodes = { +#define _(s,n) [ESP_DECRYPT_NEXT_##s] = n, + foreach_esp_decrypt_next +#undef _ + }, +}; + +VLIB_NODE_FUNCTION_MULTIARCH (dpdk_esp_decrypt_node, dpdk_esp_decrypt_node_fn) + +/* + * Decrypt Post Node + */ + +#define foreach_esp_decrypt_post_error \ + _(PKTS, "ESP post pkts") + +typedef enum { +#define _(sym,str) ESP_DECRYPT_POST_ERROR_##sym, + foreach_esp_decrypt_post_error +#undef _ + ESP_DECRYPT_POST_N_ERROR, +} esp_decrypt_post_error_t; + +static char * esp_decrypt_post_error_strings[] = { +#define _(sym,string) string, + foreach_esp_decrypt_post_error +#undef _ +}; + +vlib_node_registration_t dpdk_esp_decrypt_post_node; + +static u8 * format_esp_decrypt_post_trace (u8 * s, va_list * args) +{ + return s; +} + +static uword +dpdk_esp_decrypt_post_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, *from, *to_next = 0, next_index; + ipsec_sa_t * sa0; + u32 sa_index0 = ~0; + ipsec_main_t *im = &ipsec_main; + dpdk_esp_main_t *em = &dpdk_esp_main; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from > 0 && n_left_to_next > 0) + { + esp_footer_t * f0; + const u32 IV_SIZE = 16; + u32 bi0, next0, icv_size; + vlib_buffer_t * b0 = 0; + ip4_header_t *ih4 = 0, *oh4 = 0; + ip6_header_t *ih6 = 0, *oh6 = 0; + u8 tunnel_mode = 1; + u8 transport_ip6 = 0; + + next0 = ESP_DECRYPT_NEXT_DROP; + + bi0 = from[0]; + from += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + + sa_index0 = vnet_buffer(b0)->ipsec.sad_index; + sa0 = pool_elt_at_index (im->sad, sa_index0); + + to_next[0] = bi0; + to_next += 1; + + icv_size = em->esp_integ_algs[sa0->integ_alg].trunc_size; + + if (sa0->use_anti_replay) + { + esp_header_t * esp0 = vlib_buffer_get_current (b0); + u32 seq; + seq = clib_host_to_net_u32(esp0->seq); + if (PREDICT_TRUE(sa0->use_esn)) + esp_replay_advance_esn(sa0, seq); + else + esp_replay_advance(sa0, seq); + } + + ih4 = (ip4_header_t *) (b0->data + sizeof(ethernet_header_t)); + vlib_buffer_advance (b0, sizeof (esp_header_t) + IV_SIZE); + + b0->current_length -= (icv_size + 2); + b0->flags = VLIB_BUFFER_TOTAL_LENGTH_VALID; + f0 = (esp_footer_t *) ((u8 *) vlib_buffer_get_current (b0) + + b0->current_length); + b0->current_length -= f0->pad_length; + + /* transport mode */ + if (PREDICT_FALSE(!sa0->is_tunnel && !sa0->is_tunnel_ip6)) + { + tunnel_mode = 0; + + if (PREDICT_TRUE((ih4->ip_version_and_header_length & 0xF0) != 0x40)) + { + if (PREDICT_TRUE((ih4->ip_version_and_header_length & 0xF0) == 0x60)) + transport_ip6 = 1; + else + { + clib_warning("next header: 0x%x", f0->next_header); + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_NOT_IP, 1); + goto trace; + } + } + } + + if (PREDICT_TRUE (tunnel_mode)) + { + if (PREDICT_TRUE(f0->next_header == IP_PROTOCOL_IP_IN_IP)) + next0 = ESP_DECRYPT_NEXT_IP4_INPUT; + else if (f0->next_header == IP_PROTOCOL_IPV6) + next0 = ESP_DECRYPT_NEXT_IP6_INPUT; + else + { + clib_warning("next header: 0x%x", f0->next_header); + vlib_node_increment_counter (vm, dpdk_esp_decrypt_node.index, + ESP_DECRYPT_ERROR_DECRYPTION_FAILED, + 1); + goto trace; + } + } + /* transport mode */ + else + { + if (PREDICT_FALSE(transport_ip6)) + { + next0 = ESP_DECRYPT_NEXT_IP6_INPUT; + ih6 = (ip6_header_t *) (b0->data + sizeof(ethernet_header_t)); + vlib_buffer_advance (b0, -sizeof(ip6_header_t)); + oh6 = vlib_buffer_get_current (b0); + memmove(oh6, ih6, sizeof(ip6_header_t)); + + oh6->protocol = f0->next_header; + oh6->payload_length = + clib_host_to_net_u16 (vlib_buffer_length_in_chain(vm, b0) - + sizeof (ip6_header_t)); + } + else + { + next0 = ESP_DECRYPT_NEXT_IP4_INPUT; + vlib_buffer_advance (b0, -sizeof(ip4_header_t)); + oh4 = vlib_buffer_get_current (b0); + + oh4->ip_version_and_header_length = 0x45; + oh4->tos = ih4->tos; + oh4->fragment_id = 0; + oh4->flags_and_fragment_offset = 0; + oh4->ttl = ih4->ttl; + oh4->protocol = f0->next_header; + oh4->src_address.as_u32 = ih4->src_address.as_u32; + oh4->dst_address.as_u32 = ih4->dst_address.as_u32; + oh4->length = + clib_host_to_net_u16 (vlib_buffer_length_in_chain (vm, b0)); + oh4->checksum = ip4_header_checksum (oh4); + } + } + + vnet_buffer (b0)->sw_if_index[VLIB_TX] = (u32)~0; + +trace: + if (PREDICT_FALSE(b0->flags & VLIB_BUFFER_IS_TRACED)) + { + esp_decrypt_trace_t *tr = vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->crypto_alg = sa0->crypto_alg; + tr->integ_alg = sa0->integ_alg; + } + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, + to_next, n_left_to_next, bi0, next0); + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + vlib_node_increment_counter (vm, dpdk_esp_decrypt_post_node.index, + ESP_DECRYPT_POST_ERROR_PKTS, + from_frame->n_vectors); + + return from_frame->n_vectors; +} + +VLIB_REGISTER_NODE (dpdk_esp_decrypt_post_node) = { + .function = dpdk_esp_decrypt_post_node_fn, + .name = "dpdk-esp-decrypt-post", + .vector_size = sizeof (u32), + .format_trace = format_esp_decrypt_post_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + + .n_errors = ARRAY_LEN(esp_decrypt_post_error_strings), + .error_strings = esp_decrypt_post_error_strings, + + .n_next_nodes = ESP_DECRYPT_N_NEXT, + .next_nodes = { +#define _(s,n) [ESP_DECRYPT_NEXT_##s] = n, + foreach_esp_decrypt_next +#undef _ + }, +}; + +VLIB_NODE_FUNCTION_MULTIARCH (dpdk_esp_decrypt_post_node, dpdk_esp_decrypt_post_node_fn) diff --git a/vnet/vnet/devices/dpdk/ipsec/esp_encrypt.c b/vnet/vnet/devices/dpdk/ipsec/esp_encrypt.c new file mode 100644 index 00000000..aef4b90d --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/esp_encrypt.c @@ -0,0 +1,554 @@ +/* + * esp_encrypt.c : IPSec ESP encrypt node using DPDK Cryptodev + * + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include +#include +#include + +#define foreach_esp_encrypt_next \ +_(DROP, "error-drop") \ +_(IP4_INPUT, "ip4-input") \ +_(IP6_INPUT, "ip6-input") \ +_(INTERFACE_OUTPUT, "interface-output") + +#define _(v, s) ESP_ENCRYPT_NEXT_##v, +typedef enum +{ + foreach_esp_encrypt_next +#undef _ + ESP_ENCRYPT_N_NEXT, +} esp_encrypt_next_t; + +#define foreach_esp_encrypt_error \ + _(RX_PKTS, "ESP pkts received") \ + _(SEQ_CYCLED, "sequence number cycled") \ + _(ENQ_FAIL, "Enqueue failed (buffer full)") \ + _(NO_CRYPTODEV, "Cryptodev not configured") \ + _(UNSUPPORTED, "Cipher/Auth not supported") + + +typedef enum +{ +#define _(sym,str) ESP_ENCRYPT_ERROR_##sym, + foreach_esp_encrypt_error +#undef _ + ESP_ENCRYPT_N_ERROR, +} esp_encrypt_error_t; + +static char *esp_encrypt_error_strings[] = { +#define _(sym,string) string, + foreach_esp_encrypt_error +#undef _ +}; + +vlib_node_registration_t dpdk_esp_encrypt_node; + +typedef struct +{ + u32 spi; + u32 seq; + ipsec_crypto_alg_t crypto_alg; + ipsec_integ_alg_t integ_alg; +} esp_encrypt_trace_t; + +/* packet trace format function */ +static u8 * +format_esp_encrypt_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + esp_encrypt_trace_t *t = va_arg (*args, esp_encrypt_trace_t *); + + s = format (s, "esp: spi %u seq %u crypto %U integrity %U", + t->spi, t->seq, + format_ipsec_crypto_alg, t->crypto_alg, + format_ipsec_integ_alg, t->integ_alg); + return s; +} + +static uword +dpdk_esp_encrypt_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, *from, *to_next, next_index; + ipsec_main_t *im = &ipsec_main; + u32 cpu_index = os_get_cpu_number (); + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + dpdk_esp_main_t *em = &dpdk_esp_main; + u32 i; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + if (PREDICT_FALSE (!dcm->workers_main)) + { + /* Likely there are not enough cryptodevs, so drop frame */ + vlib_node_increment_counter (vm, dpdk_esp_encrypt_node.index, + ESP_ENCRYPT_ERROR_NO_CRYPTODEV, + n_left_from); + vlib_buffer_free (vm, from, n_left_from); + return n_left_from; + } + + crypto_worker_main_t *cwm = vec_elt_at_index (dcm->workers_main, cpu_index); + u32 n_qps = vec_len (cwm->qp_data); + struct rte_crypto_op **cops_to_enq[n_qps]; + u32 n_cop_qp[n_qps], *bi_to_enq[n_qps]; + + for (i = 0; i < n_qps; i++) + { + bi_to_enq[i] = cwm->qp_data[i].bi; + cops_to_enq[i] = cwm->qp_data[i].cops; + } + + memset (n_cop_qp, 0, n_qps * sizeof (u32)); + + crypto_alloc_cops (); + + next_index = ESP_ENCRYPT_NEXT_DROP; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0, next0; + vlib_buffer_t *b0 = 0; + u32 sa_index0; + ipsec_sa_t *sa0; + ip4_and_esp_header_t *ih0, *oh0 = 0; + ip6_and_esp_header_t *ih6_0, *oh6_0 = 0; + struct rte_mbuf *mb0 = 0; + esp_footer_t *f0; + u8 is_ipv6; + u8 ip_hdr_size; + u8 next_hdr_type; + u8 transport_mode = 0; + const int BLOCK_SIZE = 16; + const int IV_SIZE = 16; + u16 orig_sz; + crypto_sa_session_t *sa_sess; + void *sess; + struct rte_crypto_op *cop = 0; + u16 qp_index; + + bi0 = from[0]; + from += 1; + n_left_from -= 1; + + b0 = vlib_get_buffer (vm, bi0); + sa_index0 = vnet_buffer (b0)->ipsec.sad_index; + sa0 = pool_elt_at_index (im->sad, sa_index0); + + if (PREDICT_FALSE (esp_seq_advance (sa0))) + { + clib_warning ("sequence number counter has cycled SPI %u", + sa0->spi); + vlib_node_increment_counter (vm, dpdk_esp_encrypt_node.index, + ESP_ENCRYPT_ERROR_SEQ_CYCLED, 1); + //TODO: rekey SA + to_next[0] = bi0; + to_next += 1; + n_left_to_next -= 1; + goto trace; + } + + sa_sess = pool_elt_at_index (cwm->sa_sess_d[1], sa_index0); + if (PREDICT_FALSE (!sa_sess->sess)) + { + int ret = create_sym_sess (sa0, sa_sess, 1); + ASSERT (ret == 0); + } + + qp_index = sa_sess->qp_index; + sess = sa_sess->sess; + + ASSERT (vec_len (vec_elt (cwm->qp_data, qp_index).free_cops) > 0); + cop = vec_pop (vec_elt (cwm->qp_data, qp_index).free_cops); + ASSERT (cop->status == RTE_CRYPTO_OP_STATUS_NOT_PROCESSED); + + cops_to_enq[qp_index][0] = cop; + cops_to_enq[qp_index] += 1; + n_cop_qp[qp_index] += 1; + bi_to_enq[qp_index][0] = bi0; + bi_to_enq[qp_index] += 1; + + ssize_t adv; + ih0 = vlib_buffer_get_current (b0); + orig_sz = b0->current_length; + is_ipv6 = (ih0->ip4.ip_version_and_header_length & 0xF0) == 0x60; + /* is ipv6 */ + if (PREDICT_TRUE (sa0->is_tunnel)) + { + if (PREDICT_TRUE (!is_ipv6)) + adv = -sizeof (ip4_and_esp_header_t); + else + adv = -sizeof (ip6_and_esp_header_t); + } + else + { + adv = -sizeof (esp_header_t); + if (PREDICT_TRUE (!is_ipv6)) + orig_sz -= sizeof (ip4_header_t); + else + orig_sz -= sizeof (ip6_header_t); + } + + /*transport mode save the eth header before it is overwritten */ + if (PREDICT_FALSE (!sa0->is_tunnel)) + { + ethernet_header_t *ieh0 = (ethernet_header_t *) + ((u8 *) vlib_buffer_get_current (b0) - + sizeof (ethernet_header_t)); + ethernet_header_t *oeh0 = + (ethernet_header_t *) ((u8 *) ieh0 + (adv - IV_SIZE)); + clib_memcpy (oeh0, ieh0, sizeof (ethernet_header_t)); + } + + vlib_buffer_advance (b0, adv - IV_SIZE); + + /* XXX IP6/ip4 and IP4/IP6 not supported, only IP4/IP4 and IP6/IP6 */ + + /* is ipv6 */ + if (PREDICT_FALSE (is_ipv6)) + { + ih6_0 = (ip6_and_esp_header_t *) ih0; + ip_hdr_size = sizeof (ip6_header_t); + oh6_0 = vlib_buffer_get_current (b0); + + if (PREDICT_TRUE (sa0->is_tunnel)) + { + next_hdr_type = IP_PROTOCOL_IPV6; + oh6_0->ip6.ip_version_traffic_class_and_flow_label = + ih6_0->ip6.ip_version_traffic_class_and_flow_label; + } + else + { + next_hdr_type = ih6_0->ip6.protocol; + memmove (oh6_0, ih6_0, sizeof (ip6_header_t)); + } + + oh6_0->ip6.protocol = IP_PROTOCOL_IPSEC_ESP; + oh6_0->ip6.hop_limit = 254; + oh6_0->esp.spi = clib_net_to_host_u32 (sa0->spi); + oh6_0->esp.seq = clib_net_to_host_u32 (sa0->seq); + } + else + { + ip_hdr_size = sizeof (ip4_header_t); + next_hdr_type = IP_PROTOCOL_IP_IN_IP; + oh0 = vlib_buffer_get_current (b0); + + oh0->ip4.ip_version_and_header_length = 0x45; + oh0->ip4.tos = ih0->ip4.tos; + oh0->ip4.fragment_id = 0; + oh0->ip4.flags_and_fragment_offset = 0; + oh0->ip4.ttl = 254; + oh0->ip4.protocol = IP_PROTOCOL_IPSEC_ESP; + oh0->esp.spi = clib_net_to_host_u32 (sa0->spi); + oh0->esp.seq = clib_net_to_host_u32 (sa0->seq); + } + + if (PREDICT_TRUE (sa0->is_tunnel && !sa0->is_tunnel_ip6)) + { + oh0->ip4.src_address.as_u32 = sa0->tunnel_src_addr.ip4.as_u32; + oh0->ip4.dst_address.as_u32 = sa0->tunnel_dst_addr.ip4.as_u32; + + /* in tunnel mode send it back to FIB */ + next0 = ESP_ENCRYPT_NEXT_IP4_INPUT; + vnet_buffer (b0)->sw_if_index[VLIB_TX] = (u32) ~ 0; + } + else if (sa0->is_tunnel && sa0->is_tunnel_ip6) + { + oh6_0->ip6.src_address.as_u64[0] = + sa0->tunnel_src_addr.ip6.as_u64[0]; + oh6_0->ip6.src_address.as_u64[1] = + sa0->tunnel_src_addr.ip6.as_u64[1]; + oh6_0->ip6.dst_address.as_u64[0] = + sa0->tunnel_dst_addr.ip6.as_u64[0]; + oh6_0->ip6.dst_address.as_u64[1] = + sa0->tunnel_dst_addr.ip6.as_u64[1]; + + /* in tunnel mode send it back to FIB */ + next0 = ESP_ENCRYPT_NEXT_IP6_INPUT; + vnet_buffer (b0)->sw_if_index[VLIB_TX] = (u32) ~ 0; + } + else + { + next0 = ESP_ENCRYPT_NEXT_INTERFACE_OUTPUT; + transport_mode = 1; + /*ipv6 already handled */ + if (PREDICT_TRUE (!is_ipv6)) + { + next_hdr_type = ih0->ip4.protocol; + oh0->ip4.src_address.as_u32 = ih0->ip4.src_address.as_u32; + oh0->ip4.dst_address.as_u32 = ih0->ip4.dst_address.as_u32; + } + } + + ASSERT (sa0->crypto_alg < IPSEC_CRYPTO_N_ALG); + ASSERT (sa0->crypto_alg != IPSEC_CRYPTO_ALG_NONE); + + int blocks = 1 + (orig_sz + 1) / BLOCK_SIZE; + + /* pad packet in input buffer */ + u8 pad_bytes = BLOCK_SIZE * blocks - 2 - orig_sz; + u8 i; + u8 *padding = vlib_buffer_get_current (b0) + b0->current_length; + + for (i = 0; i < pad_bytes; ++i) + padding[i] = i + 1; + + f0 = vlib_buffer_get_current (b0) + b0->current_length + pad_bytes; + f0->pad_length = pad_bytes; + f0->next_header = next_hdr_type; + b0->current_length += pad_bytes + 2 + + em->esp_integ_algs[sa0->integ_alg].trunc_size; + + vnet_buffer (b0)->sw_if_index[VLIB_RX] = + vnet_buffer (b0)->sw_if_index[VLIB_RX]; + b0->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID; + + struct rte_crypto_sym_op *sym_cop; + sym_cop = (struct rte_crypto_sym_op *) (cop + 1); + + dpdk_cop_priv_t *priv = (dpdk_cop_priv_t *) (sym_cop + 1); + + vnet_buffer (b0)->unused[0] = next0; + priv->iv[0] = sa0->seq; + priv->iv[1] = sa0->seq_hi; + + mb0 = rte_mbuf_from_vlib_buffer (b0); + mb0->data_len = b0->current_length; + mb0->pkt_len = b0->current_length; + mb0->data_off = RTE_PKTMBUF_HEADROOM + b0->current_data; + + rte_crypto_op_attach_sym_session (cop, sess); + + sym_cop->m_src = mb0; + sym_cop->cipher.data.offset = ip_hdr_size + sizeof (esp_header_t); + sym_cop->cipher.data.length = BLOCK_SIZE * blocks + IV_SIZE; + + sym_cop->cipher.iv.data = (u8 *) priv->iv; + sym_cop->cipher.iv.phys_addr = cop->phys_addr + + (uintptr_t) priv->iv - (uintptr_t) cop; + sym_cop->cipher.iv.length = IV_SIZE; + + ASSERT (sa0->integ_alg < IPSEC_INTEG_N_ALG); + ASSERT (sa0->integ_alg != IPSEC_INTEG_ALG_NONE); + + sym_cop->auth.data.offset = ip_hdr_size; + sym_cop->auth.data.length = b0->current_length - ip_hdr_size - + em->esp_integ_algs[sa0->integ_alg].trunc_size; + + sym_cop->auth.digest.data = vlib_buffer_get_current (b0) + + b0->current_length - + em->esp_integ_algs[sa0->integ_alg].trunc_size; + sym_cop->auth.digest.phys_addr = rte_pktmbuf_mtophys_offset (mb0, + b0->current_length + - + em->esp_integ_algs + [sa0->integ_alg].trunc_size); + sym_cop->auth.digest.length = + em->esp_integ_algs[sa0->integ_alg].trunc_size; + + if (PREDICT_FALSE (sa0->use_esn)) + { + u8 *payload_end = + vlib_buffer_get_current (b0) + b0->current_length; + *((u32 *) payload_end) = sa0->seq_hi; + sym_cop->auth.data.length += sizeof (sa0->seq_hi); + } + + if (PREDICT_FALSE (is_ipv6)) + { + oh6_0->ip6.payload_length = + clib_host_to_net_u16 (vlib_buffer_length_in_chain (vm, b0) - + sizeof (ip6_header_t)); + } + else + { + oh0->ip4.length = + clib_host_to_net_u16 (vlib_buffer_length_in_chain (vm, b0)); + oh0->ip4.checksum = ip4_header_checksum (&oh0->ip4); + } + + if (transport_mode) + vlib_buffer_advance (b0, -sizeof (ethernet_header_t)); + + trace: + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + esp_encrypt_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->spi = sa0->spi; + tr->seq = sa0->seq - 1; + tr->crypto_alg = sa0->crypto_alg; + tr->integ_alg = sa0->integ_alg; + } + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + vlib_node_increment_counter (vm, dpdk_esp_encrypt_node.index, + ESP_ENCRYPT_ERROR_RX_PKTS, + from_frame->n_vectors); + crypto_qp_data_t *qpd; + /* *INDENT-OFF* */ + vec_foreach_index (i, cwm->qp_data) + { + u32 enq; + + qpd = vec_elt_at_index(cwm->qp_data, i); + enq = rte_cryptodev_enqueue_burst(qpd->dev_id, qpd->qp_id, + qpd->cops, n_cop_qp[i]); + qpd->inflights += enq; + + if (PREDICT_FALSE(enq < n_cop_qp[i])) + { + crypto_free_cop (qpd, &qpd->cops[enq], n_cop_qp[i] - enq); + vlib_buffer_free (vm, &qpd->bi[enq], n_cop_qp[i] - enq); + + vlib_node_increment_counter (vm, dpdk_esp_encrypt_node.index, + ESP_ENCRYPT_ERROR_ENQ_FAIL, + n_cop_qp[i] - enq); + } + } + /* *INDENT-ON* */ + + return from_frame->n_vectors; +} + +VLIB_REGISTER_NODE (dpdk_esp_encrypt_node) = +{ + .function = dpdk_esp_encrypt_node_fn,.name = "dpdk-esp-encrypt",.flags = + VLIB_NODE_FLAG_IS_OUTPUT,.vector_size = sizeof (u32),.format_trace = + format_esp_encrypt_trace,.n_errors = + ARRAY_LEN (esp_encrypt_error_strings),.error_strings = + esp_encrypt_error_strings,.n_next_nodes = 1,.next_nodes = + { + [ESP_ENCRYPT_NEXT_DROP] = "error-drop",} +}; + +VLIB_NODE_FUNCTION_MULTIARCH (dpdk_esp_encrypt_node, dpdk_esp_encrypt_node_fn) +/* + * ESP Encrypt Post Node + */ +#define foreach_esp_encrypt_post_error \ + _(PKTS, "ESP post pkts") + typedef enum + { +#define _(sym,str) ESP_ENCRYPT_POST_ERROR_##sym, + foreach_esp_encrypt_post_error +#undef _ + ESP_ENCRYPT_POST_N_ERROR, + } esp_encrypt_post_error_t; + + static char *esp_encrypt_post_error_strings[] = { +#define _(sym,string) string, + foreach_esp_encrypt_post_error +#undef _ + }; + +vlib_node_registration_t dpdk_esp_encrypt_post_node; + +static u8 * +format_esp_encrypt_post_trace (u8 * s, va_list * args) +{ + return s; +} + +static uword +dpdk_esp_encrypt_post_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, *from, *to_next = 0, next_index; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0, next0; + vlib_buffer_t *b0 = 0; + + bi0 = from[0]; + from += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + + to_next[0] = bi0; + to_next += 1; + + next0 = vnet_buffer (b0)->unused[0]; + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, + to_next, n_left_to_next, bi0, + next0); + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + vlib_node_increment_counter (vm, dpdk_esp_encrypt_post_node.index, + ESP_ENCRYPT_POST_ERROR_PKTS, + from_frame->n_vectors); + + return from_frame->n_vectors; +} + +VLIB_REGISTER_NODE (dpdk_esp_encrypt_post_node) = +{ + .function = dpdk_esp_encrypt_post_node_fn,.name = + "dpdk-esp-encrypt-post",.vector_size = sizeof (u32),.format_trace = + format_esp_encrypt_post_trace,.type = VLIB_NODE_TYPE_INTERNAL,.n_errors = + ARRAY_LEN (esp_encrypt_post_error_strings),.error_strings = + esp_encrypt_post_error_strings,.n_next_nodes = + ESP_ENCRYPT_N_NEXT,.next_nodes = + { +#define _(s,n) [ESP_ENCRYPT_NEXT_##s] = n, + foreach_esp_encrypt_next +#undef _ + } +}; + +VLIB_NODE_FUNCTION_MULTIARCH (dpdk_esp_encrypt_post_node, + dpdk_esp_encrypt_post_node_fn) +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/devices/dpdk/ipsec/ipsec.c b/vnet/vnet/devices/dpdk/ipsec/ipsec.c new file mode 100644 index 00000000..de253f02 --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/ipsec.c @@ -0,0 +1,313 @@ +/* + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include +#include +#include + +#define DPDK_CRYPTO_NB_OBJS 2048 +#define DPDK_CRYPTO_CACHE_SIZE 512 +#define DPDK_CRYPTO_PRIV_SIZE 128 +#define DPDK_CRYPTO_N_QUEUE_DESC 512 +#define DPDK_CRYPTO_NB_COPS (1024 * 4) + +/* + * return: + * -1: update failed + * 0: already exist + * 1: mapped + */ +static int +update_qp_data (crypto_worker_main_t * cwm, + u8 cdev_id, u16 qp_id, u8 is_outbound, u16 * idx) +{ + crypto_qp_data_t *qpd; + + /* *INDENT-OFF* */ + vec_foreach_index (*idx, cwm->qp_data) + { + qpd = vec_elt_at_index(cwm->qp_data, *idx); + + if (qpd->dev_id == cdev_id && qpd->qp_id == qp_id && + qpd->is_outbound == is_outbound) + return 0; + } + /* *INDENT-ON* */ + + vec_add2 (cwm->qp_data, qpd, 1); + + qpd->dev_id = cdev_id; + qpd->qp_id = qp_id; + qpd->is_outbound = is_outbound; + + return 1; +} + +/* + * return: + * -1: error + * 0: already exist + * 1: mapped + */ +static int +add_mapping (crypto_worker_main_t * cwm, + u8 cdev_id, u16 qp, u8 is_outbound, + const struct rte_cryptodev_capabilities *cipher_cap, + const struct rte_cryptodev_capabilities *auth_cap) +{ + int mapped; + u16 qp_index; + uword key = 0, data, *ret; + crypto_worker_qp_key_t *p_key = (crypto_worker_qp_key_t *) & key; + + p_key->cipher_algo = (u8) cipher_cap->sym.cipher.algo; + p_key->auth_algo = (u8) auth_cap->sym.auth.algo; + p_key->is_outbound = is_outbound; + + ret = hash_get (cwm->algo_qp_map, key); + if (ret) + return 0; + + mapped = update_qp_data (cwm, cdev_id, qp, is_outbound, &qp_index); + if (mapped < 0) + return -1; + + data = (uword) qp_index; + + ret = hash_set (cwm->algo_qp_map, key, data); + if (!ret) + rte_panic ("Failed to insert hash table\n"); + + return mapped; +} + +/* + * return: + * 0: already exist + * 1: mapped + */ +static int +add_cdev_mapping (crypto_worker_main_t * cwm, + struct rte_cryptodev_info *dev_info, u8 cdev_id, + u16 qp, u8 is_outbound) +{ + const struct rte_cryptodev_capabilities *i, *j; + u32 mapped = 0; + + for (i = dev_info->capabilities; i->op != RTE_CRYPTO_OP_TYPE_UNDEFINED; i++) + { + if (i->sym.xform_type != RTE_CRYPTO_SYM_XFORM_CIPHER) + continue; + + if (check_algo_is_supported (i, NULL) != 0) + continue; + + for (j = dev_info->capabilities; j->op != RTE_CRYPTO_OP_TYPE_UNDEFINED; + j++) + { + int status = 0; + + if (j->sym.xform_type != RTE_CRYPTO_SYM_XFORM_AUTH) + continue; + + if (check_algo_is_supported (j, NULL) != 0) + continue; + + status = add_mapping (cwm, cdev_id, qp, is_outbound, i, j); + if (status == 1) + mapped += 1; + if (status < 0) + return status; + } + } + + return mapped; +} + +static int +check_cryptodev_queues () +{ + u32 n_qs = 0; + u8 cdev_id; + u32 n_req_qs = 2; + + if (vlib_num_workers () > 0) + n_req_qs = vlib_num_workers () * 2; + + for (cdev_id = 0; cdev_id < rte_cryptodev_count (); cdev_id++) + { + struct rte_cryptodev_info cdev_info; + + rte_cryptodev_info_get (cdev_id, &cdev_info); + + if (! + (cdev_info.feature_flags & RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING)) + continue; + + n_qs += cdev_info.max_nb_queue_pairs; + } + + if (n_qs >= n_req_qs) + return 0; + else + return -1; +} + +static clib_error_t * +dpdk_ipsec_init (vlib_main_t * vm) +{ + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + vlib_thread_main_t *tm = vlib_get_thread_main (); + struct rte_cryptodev_config dev_conf; + struct rte_cryptodev_qp_conf qp_conf; + struct rte_cryptodev_info cdev_info; + struct rte_mempool *rmp; + i32 dev_id, ret; + u32 i, skip_master; + + if (check_cryptodev_queues () < 0) + return clib_error_return (0, "not enough cryptodevs for ipsec"); + + vec_alloc (dcm->workers_main, tm->n_vlib_mains); + _vec_len (dcm->workers_main) = tm->n_vlib_mains; + + fprintf (stdout, "DPDK Cryptodevs info:\n"); + fprintf (stdout, "dev_id\tn_qp\tnb_obj\tcache_size\n"); + /* HW cryptodevs have higher dev_id, use HW first */ + for (dev_id = rte_cryptodev_count () - 1; dev_id >= 0; dev_id--) + { + u16 max_nb_qp, qp = 0; + skip_master = vlib_num_workers () > 0; + + rte_cryptodev_info_get (dev_id, &cdev_info); + + if (! + (cdev_info.feature_flags & RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING)) + continue; + + max_nb_qp = cdev_info.max_nb_queue_pairs; + + for (i = 0; i < tm->n_vlib_mains; i++) + { + u8 is_outbound; + crypto_worker_main_t *cwm; + uword *map; + + if (skip_master) + { + skip_master = 0; + continue; + } + + cwm = vec_elt_at_index (dcm->workers_main, i); + map = cwm->algo_qp_map; + + if (!map) + { + map = hash_create (0, sizeof (crypto_worker_qp_key_t)); + if (!map) + return clib_error_return (0, "unable to create hash table " + "for worker %u", + vlib_mains[i]->cpu_index); + cwm->algo_qp_map = map; + } + + for (is_outbound = 0; is_outbound < 2 && qp < max_nb_qp; + is_outbound++) + { + int mapped = add_cdev_mapping (cwm, &cdev_info, + dev_id, qp, is_outbound); + if (mapped > 0) + qp++; + + if (mapped < 0) + return clib_error_return (0, + "too many queues for one worker"); + } + } + + if (qp == 0) + continue; + + dev_conf.socket_id = rte_cryptodev_socket_id (dev_id); + dev_conf.nb_queue_pairs = cdev_info.max_nb_queue_pairs; + dev_conf.session_mp.nb_objs = DPDK_CRYPTO_NB_OBJS; + dev_conf.session_mp.cache_size = DPDK_CRYPTO_CACHE_SIZE; + + ret = rte_cryptodev_configure (dev_id, &dev_conf); + if (ret < 0) + return clib_error_return (0, "cryptodev %u config error", dev_id); + + qp_conf.nb_descriptors = DPDK_CRYPTO_N_QUEUE_DESC; + for (qp = 0; qp < dev_conf.nb_queue_pairs; qp++) + { + ret = rte_cryptodev_queue_pair_setup (dev_id, qp, &qp_conf, + dev_conf.socket_id); + if (ret < 0) + return clib_error_return (0, "cryptodev %u qp %u setup error", + dev_id, qp); + } + fprintf (stdout, "%u\t%u\t%u\t%u\n", dev_id, dev_conf.nb_queue_pairs, + DPDK_CRYPTO_NB_OBJS, DPDK_CRYPTO_CACHE_SIZE); + } + + u32 socket_id = rte_socket_id (); + + vec_validate_aligned (dcm->cop_pools, socket_id, CLIB_CACHE_LINE_BYTES); + + /* pool already exists, nothing to do */ + if (dcm->cop_pools[socket_id]) + return 0; + + u8 *pool_name = format (0, "crypto_op_pool_socket%u%c", socket_id, 0); + + rmp = rte_crypto_op_pool_create ((char *) pool_name, + RTE_CRYPTO_OP_TYPE_SYMMETRIC, + DPDK_CRYPTO_NB_COPS * + (1 + vlib_num_workers ()), + DPDK_CRYPTO_CACHE_SIZE, + DPDK_CRYPTO_PRIV_SIZE, socket_id); + vec_free (pool_name); + + if (!rmp) + return clib_error_return (0, "failed to allocate mempool on socket %u", + socket_id); + dcm->cop_pools[socket_id] = rmp; + + dpdk_esp_init (); + + if (vec_len (vlib_mains) == 0) + vlib_node_set_state (&vlib_global_main, dpdk_crypto_input_node.index, + VLIB_NODE_STATE_POLLING); + else + for (i = 1; i < tm->n_vlib_mains; i++) + vlib_node_set_state (vlib_mains[i], dpdk_crypto_input_node.index, + VLIB_NODE_STATE_POLLING); + + return 0; +} + +VLIB_MAIN_LOOP_ENTER_FUNCTION (dpdk_ipsec_init); + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/devices/dpdk/ipsec/ipsec.h b/vnet/vnet/devices/dpdk/ipsec/ipsec.h new file mode 100644 index 00000000..e103655c --- /dev/null +++ b/vnet/vnet/devices/dpdk/ipsec/ipsec.h @@ -0,0 +1,216 @@ +/* + * Copyright (c) 2016 Intel and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef __DPDK_IPSEC_H__ +#define __DPDK_IPSEC_H__ + +#include + +#undef always_inline +#include +#include + +#if CLIB_DEBUG > 0 +#define always_inline static inline +#else +#define always_inline static inline __attribute__ ((__always_inline__)) +#endif + + +#define MAX_QP_PER_LCORE 16 + +typedef struct +{ + u32 iv[4]; + u8 icv[64]; +} dpdk_cop_priv_t; + +typedef struct +{ + u8 cipher_algo; + u8 auth_algo; + u8 is_outbound; +} crypto_worker_qp_key_t; + +typedef struct +{ + u16 dev_id; + u16 qp_id; + u16 is_outbound; + i16 inflights; + u32 bi[VLIB_FRAME_SIZE]; + struct rte_crypto_op *cops[VLIB_FRAME_SIZE]; + struct rte_crypto_op **free_cops; +} crypto_qp_data_t; + +typedef struct +{ + u8 qp_index; + void *sess; +} crypto_sa_session_t; + +typedef struct +{ + crypto_sa_session_t *sa_sess_d[2]; + crypto_qp_data_t *qp_data; + uword *algo_qp_map; +} crypto_worker_main_t; + +typedef struct +{ + struct rte_mempool **cop_pools; + crypto_worker_main_t *workers_main; +} dpdk_crypto_main_t; + +dpdk_crypto_main_t dpdk_crypto_main; + +extern vlib_node_registration_t dpdk_crypto_input_node; + +#define CRYPTO_N_FREE_COPS (VLIB_FRAME_SIZE * 3) + +static_always_inline void +crypto_alloc_cops () +{ + dpdk_crypto_main_t *dcm = &dpdk_crypto_main; + u32 cpu_index = os_get_cpu_number (); + crypto_worker_main_t *cwm = &dcm->workers_main[cpu_index]; + unsigned socket_id = rte_socket_id (); + crypto_qp_data_t *qpd; + + /* *INDENT-OFF* */ + vec_foreach (qpd, cwm->qp_data) + { + u32 l = vec_len (qpd->free_cops); + + if (PREDICT_FALSE (l < VLIB_FRAME_SIZE)) + { + u32 n_alloc; + + if (PREDICT_FALSE (!qpd->free_cops)) + vec_alloc (qpd->free_cops, CRYPTO_N_FREE_COPS); + + n_alloc = rte_crypto_op_bulk_alloc (dcm->cop_pools[socket_id], + RTE_CRYPTO_OP_TYPE_SYMMETRIC, + &qpd->free_cops[l], + CRYPTO_N_FREE_COPS - l - 1); + + _vec_len (qpd->free_cops) = l + n_alloc; + } + } + /* *INDENT-ON* */ +} + +static_always_inline void +crypto_free_cop (crypto_qp_data_t * qpd, struct rte_crypto_op **cops, u32 n) +{ + u32 l = vec_len (qpd->free_cops); + + if (l + n >= CRYPTO_N_FREE_COPS) + { + l -= VLIB_FRAME_SIZE; + rte_mempool_put_bulk (cops[0]->mempool, + (void **) &qpd->free_cops[l], VLIB_FRAME_SIZE); + } + clib_memcpy (&qpd->free_cops[l], cops, sizeof (*cops) * n); + + _vec_len (qpd->free_cops) = l + n; +} + +static_always_inline int +check_algo_is_supported (const struct rte_cryptodev_capabilities *cap, + char *name) +{ + struct + { + uint8_t cipher_algo; + enum rte_crypto_sym_xform_type type; + union + { + enum rte_crypto_auth_algorithm auth; + enum rte_crypto_cipher_algorithm cipher; + }; + char *name; + } supported_algo[] = + { + { + .type = RTE_CRYPTO_SYM_XFORM_CIPHER,.cipher = + RTE_CRYPTO_CIPHER_NULL,.name = "NULL"}, + { + .type = RTE_CRYPTO_SYM_XFORM_CIPHER,.cipher = + RTE_CRYPTO_CIPHER_AES_CBC,.name = "AES_CBC"}, + { + .type = RTE_CRYPTO_SYM_XFORM_CIPHER,.cipher = + RTE_CRYPTO_CIPHER_AES_CTR,.name = "AES_CTR"}, + { + .type = RTE_CRYPTO_SYM_XFORM_CIPHER,.cipher = + RTE_CRYPTO_CIPHER_3DES_CBC,.name = "3DES-CBC"}, + { + .type = RTE_CRYPTO_SYM_XFORM_CIPHER,.auth = + RTE_CRYPTO_CIPHER_AES_GCM,.name = "AES-GCM"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_SHA1_HMAC,.name = "HMAC-SHA1"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_SHA256_HMAC,.name = "HMAC-SHA256"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_SHA384_HMAC,.name = "HMAC-SHA384"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_SHA512_HMAC,.name = "HMAC-SHA512"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_AES_XCBC_MAC,.name = "AES-XCBC-MAC"}, + { + .type = RTE_CRYPTO_SYM_XFORM_AUTH,.auth = + RTE_CRYPTO_AUTH_AES_GCM,.name = "AES-GCM"}, + { + /* tail */ + .type = RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED},}; + uint32_t i = 0; + + if (cap->op != RTE_CRYPTO_OP_TYPE_SYMMETRIC) + return -1; + + while (supported_algo[i].type != RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED) + { + if (cap->sym.xform_type == supported_algo[i].type) + { + if ((cap->sym.xform_type == RTE_CRYPTO_SYM_XFORM_CIPHER && + cap->sym.cipher.algo == supported_algo[i].cipher) || + (cap->sym.xform_type == RTE_CRYPTO_SYM_XFORM_AUTH && + cap->sym.auth.algo == supported_algo[i].auth)) + { + if (name) + strcpy (name, supported_algo[i].name); + return 0; + } + } + + i++; + } + + return -1; +} + +#endif /* __DPDK_IPSEC_H__ */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/vnet/vnet/ipsec-gre/interface.c b/vnet/vnet/ipsec-gre/interface.c index fae79b97..56832ee1 100644 --- a/vnet/vnet/ipsec-gre/interface.c +++ b/vnet/vnet/ipsec-gre/interface.c @@ -27,7 +27,14 @@ #include #include #include + +#if DPDK_CRYPTO==1 +#include +#define ESP_NODE "dpdk-esp-encrypt" +#else #include +#define ESP_NODE "esp-encrypt" +#endif u8 * format_ipsec_gre_tunnel (u8 * s, va_list * args) @@ -186,7 +193,7 @@ vnet_ipsec_gre_add_del_tunnel (vnet_ipsec_gre_add_del_tunnel_args_t * a, hash_set (igm->tunnel_by_key, key, t - igm->tunnels); slot = vlib_node_add_named_next_with_slot - (vnm->vlib_main, hi->tx_node_index, "esp-encrypt", + (vnm->vlib_main, hi->tx_node_index, ESP_NODE, IPSEC_GRE_OUTPUT_NEXT_ESP_ENCRYPT); ASSERT (slot == IPSEC_GRE_OUTPUT_NEXT_ESP_ENCRYPT); diff --git a/vnet/vnet/ipsec/esp.h b/vnet/vnet/ipsec/esp.h index b9feacbb..50cac806 100644 --- a/vnet/vnet/ipsec/esp.h +++ b/vnet/vnet/ipsec/esp.h @@ -12,6 +12,8 @@ * See the License for the specific language governing permissions and * limitations under the License. */ +#ifndef __ESP_H__ +#define __ESP_H__ #include #include @@ -77,6 +79,154 @@ typedef struct esp_main_t esp_main; +#define ESP_WINDOW_SIZE (64) +#define ESP_SEQ_MAX (4294967295UL) + + +always_inline int +esp_replay_check (ipsec_sa_t * sa, u32 seq) +{ + u32 diff; + + if (PREDICT_TRUE (seq > sa->last_seq)) + return 0; + + diff = sa->last_seq - seq; + + if (ESP_WINDOW_SIZE > diff) + return (sa->replay_window & (1ULL << diff)) ? 1 : 0; + else + return 1; + + return 0; +} + +always_inline int +esp_replay_check_esn (ipsec_sa_t * sa, u32 seq) +{ + u32 tl = sa->last_seq; + u32 th = sa->last_seq_hi; + u32 diff = tl - seq; + + if (PREDICT_TRUE (tl >= (ESP_WINDOW_SIZE - 1))) + { + if (seq >= (tl - ESP_WINDOW_SIZE + 1)) + { + sa->seq_hi = th; + if (seq <= tl) + return (sa->replay_window & (1ULL << diff)) ? 1 : 0; + else + return 0; + } + else + { + sa->seq_hi = th + 1; + return 0; + } + } + else + { + if (seq >= (tl - ESP_WINDOW_SIZE + 1)) + { + sa->seq_hi = th - 1; + return (sa->replay_window & (1ULL << diff)) ? 1 : 0; + } + else + { + sa->seq_hi = th; + if (seq <= tl) + return (sa->replay_window & (1ULL << diff)) ? 1 : 0; + else + return 0; + } + } + + return 0; +} + +/* TODO seq increment should be atomic to be accessed by multiple workers */ +always_inline void +esp_replay_advance (ipsec_sa_t * sa, u32 seq) +{ + u32 pos; + + if (seq > sa->last_seq) + { + pos = seq - sa->last_seq; + if (pos < ESP_WINDOW_SIZE) + sa->replay_window = ((sa->replay_window) << pos) | 1; + else + sa->replay_window = 1; + sa->last_seq = seq; + } + else + { + pos = sa->last_seq - seq; + sa->replay_window |= (1ULL << pos); + } +} + +always_inline void +esp_replay_advance_esn (ipsec_sa_t * sa, u32 seq) +{ + int wrap = sa->seq_hi - sa->last_seq_hi; + u32 pos; + + if (wrap == 0 && seq > sa->last_seq) + { + pos = seq - sa->last_seq; + if (pos < ESP_WINDOW_SIZE) + sa->replay_window = ((sa->replay_window) << pos) | 1; + else + sa->replay_window = 1; + sa->last_seq = seq; + } + else if (wrap > 0) + { + pos = ~seq + sa->last_seq + 1; + if (pos < ESP_WINDOW_SIZE) + sa->replay_window = ((sa->replay_window) << pos) | 1; + else + sa->replay_window = 1; + sa->last_seq = seq; + sa->last_seq_hi = sa->seq_hi; + } + else if (wrap < 0) + { + pos = ~seq + sa->last_seq + 1; + sa->replay_window |= (1ULL << pos); + } + else + { + pos = sa->last_seq - seq; + sa->replay_window |= (1ULL << pos); + } +} + +always_inline int +esp_seq_advance (ipsec_sa_t * sa) +{ + if (PREDICT_TRUE (sa->use_esn)) + { + if (PREDICT_FALSE (sa->seq == ESP_SEQ_MAX)) + { + if (PREDICT_FALSE + (sa->use_anti_replay && sa->seq_hi == ESP_SEQ_MAX)) + return 1; + sa->seq_hi++; + } + sa->seq++; + } + else + { + if (PREDICT_FALSE (sa->use_anti_replay && sa->seq == ESP_SEQ_MAX)) + return 1; + sa->seq++; + } + + return 0; +} + always_inline void esp_init () { @@ -159,6 +309,7 @@ hmac_calc (ipsec_integ_alg_t alg, return em->esp_integ_algs[alg].trunc_size; } +#endif /* __ESP_H__ */ /* * fd.io coding-style-patch-verification: ON diff --git a/vnet/vnet/ipsec/esp_decrypt.c b/vnet/vnet/ipsec/esp_decrypt.c index 07a5edac..e69cd851 100644 --- a/vnet/vnet/ipsec/esp_decrypt.c +++ b/vnet/vnet/ipsec/esp_decrypt.c @@ -22,8 +22,6 @@ #include #include -#define ESP_WINDOW_SIZE 64 - #define foreach_esp_decrypt_next \ _(DROP, "error-drop") \ _(IP4_INPUT, "ip4-input") \ @@ -109,125 +107,6 @@ esp_decrypt_aes_cbc (ipsec_crypto_alg_t alg, EVP_DecryptFinal_ex (ctx, out + out_len, &out_len); } -always_inline int -esp_replay_check (ipsec_sa_t * sa, u32 seq) -{ - u32 diff; - - if (PREDICT_TRUE (seq > sa->last_seq)) - return 0; - - diff = sa->last_seq - seq; - - if (ESP_WINDOW_SIZE > diff) - return (sa->replay_window & (1ULL << diff)) ? 1 : 0; - else - return 1; - - return 0; -} - -always_inline int -esp_replay_check_esn (ipsec_sa_t * sa, u32 seq) -{ - u32 tl = sa->last_seq; - u32 th = sa->last_seq_hi; - u32 diff = tl - seq; - - if (PREDICT_TRUE (tl >= (ESP_WINDOW_SIZE - 1))) - { - if (seq >= (tl - ESP_WINDOW_SIZE + 1)) - { - sa->seq_hi = th; - if (seq <= tl) - return (sa->replay_window & (1ULL << diff)) ? 1 : 0; - else - return 0; - } - else - { - sa->seq_hi = th + 1; - return 0; - } - } - else - { - if (seq >= (tl - ESP_WINDOW_SIZE + 1)) - { - sa->seq_hi = th - 1; - return (sa->replay_window & (1ULL << diff)) ? 1 : 0; - } - else - { - sa->seq_hi = th; - if (seq <= tl) - return (sa->replay_window & (1ULL << diff)) ? 1 : 0; - else - return 0; - } - } - - return 0; -} - -always_inline void -esp_replay_advance (ipsec_sa_t * sa, u32 seq) -{ - u32 pos; - - if (seq > sa->last_seq) - { - pos = seq - sa->last_seq; - if (pos < ESP_WINDOW_SIZE) - sa->replay_window = ((sa->replay_window) << pos) | 1; - else - sa->replay_window = 1; - sa->last_seq = seq; - } - else - { - pos = sa->last_seq - seq; - sa->replay_window |= (1ULL << pos); - } -} - -always_inline void -esp_replay_advance_esn (ipsec_sa_t * sa, u32 seq) -{ - int wrap = sa->seq_hi - sa->last_seq_hi; - u32 pos; - - if (wrap == 0 && seq > sa->last_seq) - { - pos = seq - sa->last_seq; - if (pos < ESP_WINDOW_SIZE) - sa->replay_window = ((sa->replay_window) << pos) | 1; - else - sa->replay_window = 1; - sa->last_seq = seq; - } - else if (wrap > 0) - { - pos = ~seq + sa->last_seq + 1; - if (pos < ESP_WINDOW_SIZE) - sa->replay_window = ((sa->replay_window) << pos) | 1; - else - sa->replay_window = 1; - sa->last_seq = seq; - sa->last_seq_hi = sa->seq_hi; - } - else if (wrap < 0) - { - pos = ~seq + sa->last_seq + 1; - sa->replay_window |= (1ULL << pos); - } - else - { - pos = sa->last_seq - seq; - sa->replay_window |= (1ULL << pos); - } -} - static uword esp_decrypt_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * from_frame) diff --git a/vnet/vnet/ipsec/esp_encrypt.c b/vnet/vnet/ipsec/esp_encrypt.c index b7186e3d..b947611e 100644 --- a/vnet/vnet/ipsec/esp_encrypt.c +++ b/vnet/vnet/ipsec/esp_encrypt.c @@ -22,7 +22,6 @@ #include #include -#define ESP_SEQ_MAX (4294967295UL) #define foreach_esp_encrypt_next \ _(DROP, "error-drop") \ @@ -111,30 +110,6 @@ esp_encrypt_aes_cbc (ipsec_crypto_alg_t alg, EVP_EncryptFinal_ex (ctx, out + out_len, &out_len); } -always_inline int -esp_seq_advance (ipsec_sa_t * sa) -{ - if (PREDICT_TRUE (sa->use_esn)) - { - if (PREDICT_FALSE (sa->seq == ESP_SEQ_MAX)) - { - if (PREDICT_FALSE - (sa->use_anti_replay && sa->seq_hi == ESP_SEQ_MAX)) - return 1; - sa->seq_hi++; - } - sa->seq++; - } - else - { - if (PREDICT_FALSE (sa->use_anti_replay && sa->seq == ESP_SEQ_MAX)) - return 1; - sa->seq++; - } - - return 0; -} - static uword esp_encrypt_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * from_frame) diff --git a/vnet/vnet/ipsec/ipsec.c b/vnet/vnet/ipsec/ipsec.c index 223440ec..ee85c402 100644 --- a/vnet/vnet/ipsec/ipsec.c +++ b/vnet/vnet/ipsec/ipsec.c @@ -21,9 +21,25 @@ #include #include -#include #include +#if DPDK_CRYPTO==1 +#include +#define ESP_NODE "dpdk-esp-encrypt" +#else +#include +#define ESP_NODE "esp-encrypt" +#endif + +#if DPDK_CRYPTO==0 +/* dummy function */ +static int +add_del_sa_sess (u32 sa_index, u8 is_add) +{ + return 0; +} +#endif + u32 ipsec_get_sa_index_by_sa_id (u32 sa_id) { @@ -433,6 +449,7 @@ ipsec_add_del_sa (vlib_main_t * vm, ipsec_sa_t * new_sa, int is_add) return VNET_API_ERROR_SYSCALL_ERROR_1; /* sa used in policy */ } hash_unset (im->sa_index_by_sa_id, sa->id); + add_del_sa_sess (sa_index, is_add); pool_put (im->sad, sa); } else /* create new SA */ @@ -441,6 +458,8 @@ ipsec_add_del_sa (vlib_main_t * vm, ipsec_sa_t * new_sa, int is_add) clib_memcpy (sa, new_sa, sizeof (*sa)); sa_index = sa - im->sad; hash_set (im->sa_index_by_sa_id, sa->id, sa_index); + if (add_del_sa_sess (sa_index, is_add) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; } return 0; } @@ -476,6 +495,12 @@ ipsec_set_sa_key (vlib_main_t * vm, ipsec_sa_t * sa_update) sa->integ_key_len = sa_update->integ_key_len; } + if (sa->crypto_key_len + sa->integ_key_len > 0) + { + if (add_del_sa_sess (sa_index, 0) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; + } + return 0; } @@ -522,7 +547,8 @@ ipsec_init (vlib_main_t * vm) ASSERT (node); im->error_drop_node_index = node->index; - node = vlib_get_node_by_name (vm, (u8 *) "esp-encrypt"); + node = vlib_get_node_by_name (vm, (u8 *) ESP_NODE); + ASSERT (node); im->esp_encrypt_node_index = node->index; @@ -530,7 +556,6 @@ ipsec_init (vlib_main_t * vm) ASSERT (node); im->ip4_lookup_node_index = node->index; - if ((error = vlib_call_init_function (vm, ipsec_cli_init))) return error; diff --git a/vnet/vnet/ipsec/ipsec.h b/vnet/vnet/ipsec/ipsec.h index d33df383..65d7bad3 100644 --- a/vnet/vnet/ipsec/ipsec.h +++ b/vnet/vnet/ipsec/ipsec.h @@ -12,6 +12,9 @@ * See the License for the specific language governing permissions and * limitations under the License. */ +#ifndef __IPSEC_H__ +#define __IPSEC_H__ + #define IPSEC_FLAG_IPSEC_GRE_TUNNEL (1 << 0) #define foreach_ipsec_policy_action \ @@ -307,6 +310,8 @@ get_next_output_feature_node_index (vlib_buffer_t * b, return node->next_nodes[next]; } +#endif /* __IPSEC_H__ */ + /* * fd.io coding-style-patch-verification: ON * diff --git a/vnet/vnet/ipsec/ipsec_cli.c b/vnet/vnet/ipsec/ipsec_cli.c index 785e040b..8920924d 100644 --- a/vnet/vnet/ipsec/ipsec_cli.c +++ b/vnet/vnet/ipsec/ipsec_cli.c @@ -768,7 +768,6 @@ VLIB_CLI_COMMAND (set_interface_key_command, static) = { }; /* *INDENT-ON* */ - clib_error_t * ipsec_cli_init (vlib_main_t * vm) { diff --git a/vnet/vnet/ipsec/ipsec_if.c b/vnet/vnet/ipsec/ipsec_if.c index 77d5d19a..a8da046f 100644 --- a/vnet/vnet/ipsec/ipsec_if.c +++ b/vnet/vnet/ipsec/ipsec_if.c @@ -20,6 +20,20 @@ #include #include +#if DPDK_CRYPTO==1 +#include +#else +#include +#endif + +#if DPDK_CRYPTO==0 +/* dummy function */ +static int +add_del_sa_sess (u32 sa_index, u8 is_add) +{ + return 0; +} +#endif void vl_api_rpc_call_main_thread (void *fp, u8 * data, u32 data_length); @@ -124,6 +138,8 @@ ipsec_add_del_tunnel_if_internal (vnet_main_t * vnm, args->remote_crypto_key_len); } + add_del_sa_sess (t->input_sa_index, args->is_add); + pool_get (im->sad, sa); memset (sa, 0, sizeof (*sa)); t->output_sa_index = sa - im->sad; @@ -149,6 +165,8 @@ ipsec_add_del_tunnel_if_internal (vnet_main_t * vnm, args->local_crypto_key_len); } + add_del_sa_sess (t->output_sa_index, args->is_add); + hash_set (im->ipsec_if_pool_index_by_key, key, t - im->tunnel_interfaces); @@ -192,8 +210,17 @@ ipsec_add_del_tunnel_if_internal (vnet_main_t * vnm, /* delete input and output SA */ sa = pool_elt_at_index (im->sad, t->input_sa_index); + + if (add_del_sa_sess (t->input_sa_index, args->is_add) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; + pool_put (im->sad, sa); + sa = pool_elt_at_index (im->sad, t->output_sa_index); + + if (add_del_sa_sess (t->output_sa_index, args->is_add) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; + pool_put (im->sad, sa); hash_unset (im->ipsec_if_pool_index_by_key, key); @@ -282,6 +309,9 @@ ipsec_set_interface_key (vnet_main_t * vnm, u32 hw_if_index, sa->crypto_alg = alg; sa->crypto_key_len = vec_len (key); clib_memcpy (sa->crypto_key, key, vec_len (key)); + + if (add_del_sa_sess (t->input_sa_index, 0) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; } else if (type == IPSEC_IF_SET_KEY_TYPE_LOCAL_INTEG) { @@ -289,6 +319,9 @@ ipsec_set_interface_key (vnet_main_t * vnm, u32 hw_if_index, sa->integ_alg = alg; sa->integ_key_len = vec_len (key); clib_memcpy (sa->integ_key, key, vec_len (key)); + + if (add_del_sa_sess (t->output_sa_index, 0) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; } else if (type == IPSEC_IF_SET_KEY_TYPE_REMOTE_CRYPTO) { @@ -296,6 +329,9 @@ ipsec_set_interface_key (vnet_main_t * vnm, u32 hw_if_index, sa->crypto_alg = alg; sa->crypto_key_len = vec_len (key); clib_memcpy (sa->crypto_key, key, vec_len (key)); + + if (add_del_sa_sess (t->input_sa_index, 0) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; } else if (type == IPSEC_IF_SET_KEY_TYPE_REMOTE_INTEG) { @@ -303,6 +339,9 @@ ipsec_set_interface_key (vnet_main_t * vnm, u32 hw_if_index, sa->integ_alg = alg; sa->integ_key_len = vec_len (key); clib_memcpy (sa->integ_key, key, vec_len (key)); + + if (add_del_sa_sess (t->output_sa_index, 0) < 0) + return VNET_API_ERROR_SYSCALL_ERROR_1; } else return VNET_API_ERROR_INVALID_VALUE; diff --git a/vnet/vnet/ipsec/ipsec_if_in.c b/vnet/vnet/ipsec/ipsec_if_in.c index 93cedce2..db75ab92 100644 --- a/vnet/vnet/ipsec/ipsec_if_in.c +++ b/vnet/vnet/ipsec/ipsec_if_in.c @@ -22,6 +22,12 @@ #include #include +#if DPDK_CRYPTO==1 +#define ESP_NODE "dpdk-esp-decrypt" +#else +#define ESP_NODE "esp-decrypt" +#endif + /* Statistics (not really errors) */ #define foreach_ipsec_if_input_error \ _(RX, "good packets received") @@ -153,8 +159,8 @@ VLIB_REGISTER_NODE (ipsec_if_input_node) = { .n_next_nodes = IPSEC_IF_INPUT_N_NEXT, .next_nodes = { - [IPSEC_IF_INPUT_NEXT_ESP_DECRYPT] = "esp-decrypt", - [IPSEC_IF_INPUT_NEXT_DROP] = "error-drop", + [IPSEC_IF_INPUT_NEXT_ESP_DECRYPT] = ESP_NODE, + [IPSEC_IF_INPUT_NEXT_DROP] = "error-drop", }, }; /* *INDENT-ON* */ diff --git a/vnet/vnet/ipsec/ipsec_if_out.c b/vnet/vnet/ipsec/ipsec_if_out.c index a605874e..8f062828 100644 --- a/vnet/vnet/ipsec/ipsec_if_out.c +++ b/vnet/vnet/ipsec/ipsec_if_out.c @@ -21,6 +21,11 @@ #include +#if DPDK_CRYPTO==1 +#define ESP_NODE "dpdk-esp-encrypt" +#else +#define ESP_NODE "esp-encrypt" +#endif /* Statistics (not really errors) */ #define foreach_ipsec_if_output_error \ @@ -140,7 +145,7 @@ VLIB_REGISTER_NODE (ipsec_if_output_node) = { .n_next_nodes = IPSEC_IF_OUTPUT_N_NEXT, .next_nodes = { - [IPSEC_IF_OUTPUT_NEXT_ESP_ENCRYPT] = "esp-encrypt", + [IPSEC_IF_OUTPUT_NEXT_ESP_ENCRYPT] = ESP_NODE, [IPSEC_IF_OUTPUT_NEXT_DROP] = "error-drop", }, }; diff --git a/vnet/vnet/ipsec/ipsec_input.c b/vnet/vnet/ipsec/ipsec_input.c index 8360a1d5..4662c1a1 100644 --- a/vnet/vnet/ipsec/ipsec_input.c +++ b/vnet/vnet/ipsec/ipsec_input.c @@ -23,9 +23,15 @@ #include #include +#if DPDK_CRYPTO==1 +#define ESP_NODE "dpdk-esp-decrypt" +#else +#define ESP_NODE "esp-decrypt" +#endif + #define foreach_ipsec_input_next \ _(DROP, "error-drop") \ -_(ESP_DECRYPT, "esp-decrypt") +_(ESP_DECRYPT, ESP_NODE) #define _(v, s) IPSEC_INPUT_NEXT_##v, typedef enum diff --git a/vnet/vnet/ipsec/ipsec_output.c b/vnet/vnet/ipsec/ipsec_output.c index 3810520d..97977899 100644 --- a/vnet/vnet/ipsec/ipsec_output.c +++ b/vnet/vnet/ipsec/ipsec_output.c @@ -21,11 +21,17 @@ #include +#if DPDK_CRYPTO==1 +#define ESP_NODE "dpdk-esp-encrypt" +#else +#define ESP_NODE "esp-encrypt" +#endif + #if IPSEC > 0 #define foreach_ipsec_output_next \ _(DROP, "error-drop") \ -_(ESP_ENCRYPT, "esp-encrypt") +_(ESP_ENCRYPT, ESP_NODE) #define _(v, s) IPSEC_OUTPUT_NEXT_##v, typedef enum diff --git a/vpp/Makefile.am b/vpp/Makefile.am index 52ce0655..214cc743 100644 --- a/vpp/Makefile.am +++ b/vpp/Makefile.am @@ -13,7 +13,7 @@ AUTOMAKE_OPTIONS = foreign subdir-objects -AM_CFLAGS = -Wall @DPDK@ @IPSEC@ @VCGN@ @IPV6SR@ +AM_CFLAGS = -Wall @DPDK@ @DPDK_CRYPTO@ @IPSEC@ @VCGN@ @IPV6SR@ noinst_PROGRAMS = BUILT_SOURCES = @@ -110,6 +110,9 @@ else vpp_LDFLAGS += -l:libdpdk.a endif vpp_LDFLAGS += -Wl,--no-whole-archive +if WITH_DPDK_CRYPTO +vpp_LDADD += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB +endif endif vpp_LDADD += -lvppinfra -lm -lpthread -ldl diff --git a/vpp/configure.ac b/vpp/configure.ac index 0bc13e63..d36695cc 100644 --- a/vpp/configure.ac +++ b/vpp/configure.ac @@ -9,6 +9,11 @@ AC_ARG_WITH(dpdk, [with_dpdk=1], [with_dpdk=0]) +AC_ARG_WITH(dpdk_crypto, + AC_HELP_STRING([--with-dpdk-crypto],[Use DPDK cryptodev]), + [with_dpdk_crypto=1], + [with_dpdk_crypto=0]) + AC_ARG_ENABLE(dpdk-shared, AC_HELP_STRING([--enable-dpdk-shared],[Link with DPDK shared lib]), [enable_dpdk_shared=1], @@ -32,6 +37,9 @@ AC_ARG_WITH(ipv6sr, AM_CONDITIONAL(WITH_DPDK, test "$with_dpdk" = "1") AM_CONDITIONAL(ENABLE_DPDK_SHARED, test "$enable_dpdk_shared" = "1") AC_SUBST(DPDK,["-DDPDK=${with_dpdk} -DDPDK_SHARED_LIB=${enable_dpdk_shared}"]) +AM_CONDITIONAL(WITH_DPDK_CRYPTO, test "$with_dpdk_crypto" = "1") +AC_SUBST(DPDK_CRYPTO,[-DDPDK_CRYPTO=${with_dpdk_crypto}]) + AM_COND_IF( [ENABLE_DPDK_SHARED], -- cgit 1.2.3-korg From fdc62abdc113ea63dc867375bd49ef3043dcd290 Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Wed, 28 Dec 2016 09:44:47 -0500 Subject: Repair Doxygen build infrastructure After Gerrit 4430 much of the documentation failed to build, but silently so it was easily missed; equally missing that several paths have been missing for a while. - Correct paths after directory tree changes. - Doxygen now bails when input paths don't exist. - Fix up some of the less deranged entries in the documentation index. - Exclude the LUA tree, its documentation is a mess. Change-Id: I35e6b433feee5e05bca772d93aa1635c724db734 Signed-off-by: Chris Luke --- README.md | 30 +++++++++++----------- doxygen/Makefile | 38 +++++++++++++++++++-------- doxygen/user_doc.md | 2 ++ src/vnet/bfd/bfd_doc.md | 4 ++- src/vnet/span/span.md | 65 ----------------------------------------------- src/vnet/span/span_doc.md | 65 +++++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 112 insertions(+), 92 deletions(-) delete mode 100644 src/vnet/span/span.md create mode 100644 src/vnet/span/span_doc.md (limited to 'doxygen/user_doc.md') diff --git a/README.md b/README.md index cf43a4df..8520b395 100644 --- a/README.md +++ b/README.md @@ -30,22 +30,22 @@ Directory name | Description build-root | Build output directory doxygen | Documentation generator configuration dpdk | DPDK patches and build infrastructure - g2 | Event log visualization tool - perftool | Performance tool -@ref plugins | VPP bundled plugins directory -@ref svm | Shared virtual memory allocation library +@ref plugins | Not-yet-relocated VPP bundled plugins directory +@ref src | VPP source code +@ref src/plugins | VPP bundled plugins directory +@ref src/svm | Shared virtual memory allocation library + src/tests | Unit tests + src/vat | VPP API test program +@ref src/vlib | VPP application library +@ref src/vlibapi | VPP API library +@ref src/vlibmemory | VPP Memory management +@ref src/vlibsocket | VPP Socket I/O +@ref src/vnet | VPP networking +@ref src/vpp | VPP application +@ref src/vpp-api | VPP application API bindings +@ref src/vppinfra | VPP core library test | Unit tests -@ref vlib | VPP application library source -@ref vlib-api | VPP API library source -@ref vnet | VPP networking source -@ref vpp | VPP application source -@ref vpp-api | VPP application API source - vppapigen | VPP API generator source - vpp-api-test | VPP API test program source -@ref vppinfra | VPP core library source - -(If the page you are viewing is not generated by Doxygen then -ignore any @@ref labels in the above table.) +@ref vpp-api | Not-yet-relocated API bindings ## Getting started diff --git a/doxygen/Makefile b/doxygen/Makefile index 92fa3635..ffce3c13 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -42,13 +42,17 @@ DOC_MAC_PY_DEPENDS = pyparsing jinja2 DOXY_DIR ?= $(WS_ROOT)/doxygen # Primary source directories +DOXY_SRC ?= src DOXY_SRC_DIRECTORIES = \ - vppinfra \ - svm \ - vlib \ - vlib-api \ - vnet \ - vpp \ + $(DOXY_SRC)/vppinfra \ + $(DOXY_SRC)/svm \ + $(DOXY_SRC)/vlib \ + $(DOXY_SRC)/vlibapi \ + $(DOXY_SRC)/vlibmemory \ + $(DOXY_SRC)/vlibsocket \ + $(DOXY_SRC)/vnet \ + $(DOXY_SRC)/vpp \ + $(DOXY_SRC)/vpp-api \ vpp-api # Input directories and files @@ -56,6 +60,7 @@ DOXY_INPUT ?= \ $(wildcard $(WS_ROOT)/*.md) \ $(wildcard $(DOXY_DIR)/*.md) \ $(DOXY_SRC_DIRECTORIES) \ + $(DOXY_SRC)/plugins \ plugins # Strip leading workspace path from input names @@ -67,9 +72,10 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # there's a DPDK equivalent that conflicts. # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ - vlib/vlib/buffer.c \ - vlib/example \ - plugins/sample-plugin + $(DOXY_SRC)/vlib/vlib/buffer.c \ + $(DOXY_SRC)/vlib/example \ + plugins/sample-plugin \ + vpp-api/lua # Generate a regexp for filenames to exclude DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | sed -e 's/ /|/g'))) @@ -164,9 +170,11 @@ endif bootstrap-doxygen: $(BR)/.doxygen-bootstrap.ok .DELETE_ON_ERROR: $(BR)/.doxygen-siphon.dep -$(BR)/.doxygen-siphon.dep: Makefile +$(BR)/.doxygen-siphon.dep: Makefile \ + $(addprefix,$(WSROOT),$(DOXY_INPUT)) @echo "Building siphon dependencies..." - @set -e; rm -f "$@"; for input in $(DOXY_INPUT); do \ + @rm -f "$@"; for input in $(DOXY_INPUT); do \ + [ -e "$(WS_ROOT)/$$input" ] && \ find "$(WS_ROOT)/$$input" -type f \ \( -name '*.[ch]' -or -name '*.dox' \) -print \ | grep -v -E '^$(WS_ROOT)/$(DOXY_EXCLUDE_REGEXP)' \ @@ -182,7 +190,15 @@ $(BR)/.doxygen-siphon.dep: Makefile .NOTPARALLEL: $(SIPHON_FILES) $(SIPHON_FILES): $(BR)/.doxygen-bootstrap.ok \ $(DOXY_DIR)/siphon-generate \ + $(addprefix,$(WSROOT),$(DOXY_INPUT)) \ $(wildcard $(DOXY_DIR)/siphon/*.py) + @echo "Validating source tree..." + @set -e; for input in $(DOXY_INPUT); do \ + if [ ! -e "$(WS_ROOT)/$$input" ]; then \ + echo "ERROR: Input path '$$input' does not exist." >&2; \ + exit 1; \ + fi; \ + done @rm -rf "$(SIPHON_INPUT)" "$(SIPHON_OUTPUT)" @mkdir -p "$(SIPHON_INPUT)" "$(SIPHON_OUTPUT)" @touch $(SIPHON_INPUT)/files diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 2e87c877..40303439 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -12,3 +12,5 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage ioam_plugin_doc - @subpage lb_plugin_doc - @subpage flowperpkt_plugin_doc +- @subpage span_doc +- @subpage bfd_doc diff --git a/src/vnet/bfd/bfd_doc.md b/src/vnet/bfd/bfd_doc.md index 1333ed77..3e86b178 100644 --- a/src/vnet/bfd/bfd_doc.md +++ b/src/vnet/bfd/bfd_doc.md @@ -1 +1,3 @@ -TODO +# BFD Notes {#bfd_doc} + +@todo Someone needs to produce this or remove the stub file. diff --git a/src/vnet/span/span.md b/src/vnet/span/span.md deleted file mode 100644 index ee3f814f..00000000 --- a/src/vnet/span/span.md +++ /dev/null @@ -1,65 +0,0 @@ -# VPP SPAN implementation - -This is a memo intended to contain documentation of the VPP SPAN implementation. -Everything that is not directly obvious should come here. - - -## Switched Port Analyzer (SPAN) -Port mirroring is used on a network switch to send a copy of network packets seen on one switch port to a network monitoring connection on another switch port. -Can be used by network engineers or administrators to measure performnce, analyze and debug data or diagnose errors on a network. - -### RX traffic node -There is one static node to mirror incomming packets. -* span-input: Creates a copy of incomming buffer due to incomming buffers can be reused internally. - -Chaining: dpdk-input -> span-input -> -* original buffer is sent to ethernet-input for processing -* buffer copy is sent to interface-output - -### Configuration -SPAN supports the following CLI configuration commands: - -#### Enable/Disable SPAN (CLI) - set interface span [disable | destination ] - -: mirrored interface name -destination : monitoring interface name -disable: delete mirroring - -#### Enable/Disabl SPAN (API) -SPAN supports the following API configuration command: - sw_interface_span_enable_disable src GigabitEthernet0/8/0 dst GigabitEthernet0/9/0 - sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 - -src/src_sw_if_index: mirrored interface name -dst/dst_sw_if_index: monitoring interface name - -#### Remove SPAN entry (API) -SPAN supports the following API configuration command: - sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 disable - -src_sw_if_index: mirrored interface name -dst_sw_if_index: monitoring interface name - -### Configuration example - -Mirror all packets on interface GigabitEthernet0/10/0 to interface GigabitEthernet0/11/0. - -Configure IPv4 addresses on mirrored interface: -set interface ip address GigabitEthernet0/10/0 192.168.1.13/24 -set interface state GigabitEthernet0/10/0 up - -Configure IPv4 addresses on monitoring interface: -set interface ip address GigabitEthernet0/11/0 192.168.2.13/24 -set interface state GigabitEthernet0/11/0 up - -Configure SPAN -set span src GigabitEthernet0/10/0 dst GigabitEthernet0/11/0 - -### Operational data - -Active SPAN mirroring CLI show command: - show interfaces span - -Active SPAN mirroring API dump command: - sw_interface_span_dump diff --git a/src/vnet/span/span_doc.md b/src/vnet/span/span_doc.md new file mode 100644 index 00000000..46480b28 --- /dev/null +++ b/src/vnet/span/span_doc.md @@ -0,0 +1,65 @@ +# VPP SPAN implementation {#span_doc} + +This is a memo intended to contain documentation of the VPP SPAN implementation. +Everything that is not directly obvious should come here. + + +## Switched Port Analyzer (SPAN) +Port mirroring is used on a network switch to send a copy of network packets seen on one switch port to a network monitoring connection on another switch port. +Can be used by network engineers or administrators to measure performnce, analyze and debug data or diagnose errors on a network. + +### RX traffic node +There is one static node to mirror incomming packets. +* span-input: Creates a copy of incomming buffer due to incomming buffers can be reused internally. + +Chaining: dpdk-input -> span-input -> +* original buffer is sent to ethernet-input for processing +* buffer copy is sent to interface-output + +### Configuration +SPAN supports the following CLI configuration commands: + +#### Enable/Disable SPAN (CLI) + set interface span [disable | destination ] + +: mirrored interface name +destination : monitoring interface name +disable: delete mirroring + +#### Enable/Disabl SPAN (API) +SPAN supports the following API configuration command: + sw_interface_span_enable_disable src GigabitEthernet0/8/0 dst GigabitEthernet0/9/0 + sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 + +src/src_sw_if_index: mirrored interface name +dst/dst_sw_if_index: monitoring interface name + +#### Remove SPAN entry (API) +SPAN supports the following API configuration command: + sw_interface_span_enable_disable src_sw_if_index 1 dst_sw_if_index 2 disable + +src_sw_if_index: mirrored interface name +dst_sw_if_index: monitoring interface name + +### Configuration example + +Mirror all packets on interface GigabitEthernet0/10/0 to interface GigabitEthernet0/11/0. + +Configure IPv4 addresses on mirrored interface: +set interface ip address GigabitEthernet0/10/0 192.168.1.13/24 +set interface state GigabitEthernet0/10/0 up + +Configure IPv4 addresses on monitoring interface: +set interface ip address GigabitEthernet0/11/0 192.168.2.13/24 +set interface state GigabitEthernet0/11/0 up + +Configure SPAN +set span src GigabitEthernet0/10/0 dst GigabitEthernet0/11/0 + +### Operational data + +Active SPAN mirroring CLI show command: + show interfaces span + +Active SPAN mirroring API dump command: + sw_interface_span_dump -- cgit 1.2.3-korg From 4521afa9a643983c789b6b3a77b14c0a95d5e0b2 Mon Sep 17 00:00:00 2001 From: Pablo Camarillo Date: Thu, 16 Mar 2017 10:43:05 +0100 Subject: Bugfixing and documentation for SRv6 - Fixed three coverity issues - Linked SRv6 docs - Moved sample plugin to examples folder - Fixed bug with hash. Now everything is using mhash. Potentially in the future we want to do bihash. Change-Id: Ie03a13c8fecb1e315e67d0596cbd23220779aaf2 Signed-off-by: Pablo Camarillo --- doxygen/Makefile | 6 +- doxygen/user_doc.md | 13 +- src/configure.ac | 1 - src/examples/srv6-sample-localsid/node.c | 200 +++++++++++++++++++++ .../srv6-sample-localsid/srv6_localsid_sample.c | 179 ++++++++++++++++++ .../srv6-sample-localsid/srv6_localsid_sample.h | 61 +++++++ .../srv6_sample_localsid_doc.md | 30 ++++ src/plugins/Makefile.am | 4 - src/plugins/sample_srv6_localsid.am | 23 --- src/plugins/srv6-localsid/node.c | 200 --------------------- src/plugins/srv6-localsid/srv6_localsid_sample.c | 179 ------------------ src/plugins/srv6-localsid/srv6_localsid_sample.h | 61 ------- .../srv6-localsid/srv6_sample_localsid_doc.md | 38 ---- src/vnet/sr/sr.h | 30 +--- src/vnet/sr/sr_doc.md | 132 ++------------ src/vnet/sr/sr_localsid.c | 18 +- src/vnet/sr/sr_localsid.md | 58 ++++++ src/vnet/sr/sr_policy.md | 56 ++++++ src/vnet/sr/sr_policy_rewrite.c | 28 ++- src/vnet/sr/sr_steering.c | 59 +++--- src/vnet/sr/sr_steering.md | 11 ++ 21 files changed, 672 insertions(+), 715 deletions(-) create mode 100644 src/examples/srv6-sample-localsid/node.c create mode 100755 src/examples/srv6-sample-localsid/srv6_localsid_sample.c create mode 100644 src/examples/srv6-sample-localsid/srv6_localsid_sample.h create mode 100644 src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md delete mode 100644 src/plugins/sample_srv6_localsid.am delete mode 100644 src/plugins/srv6-localsid/node.c delete mode 100755 src/plugins/srv6-localsid/srv6_localsid_sample.c delete mode 100644 src/plugins/srv6-localsid/srv6_localsid_sample.h delete mode 100644 src/plugins/srv6-localsid/srv6_sample_localsid_doc.md create mode 100644 src/vnet/sr/sr_localsid.md create mode 100644 src/vnet/sr/sr_policy.md create mode 100644 src/vnet/sr/sr_steering.md (limited to 'doxygen/user_doc.md') diff --git a/doxygen/Makefile b/doxygen/Makefile index 973ca07f..9caa4079 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -53,7 +53,8 @@ DOXY_SRC_DIRECTORIES = \ $(DOXY_SRC)/vlibsocket \ $(DOXY_SRC)/vnet \ $(DOXY_SRC)/vpp \ - $(DOXY_SRC)/vpp-api + $(DOXY_SRC)/vpp-api \ + $(DOXY_SRC)/examples # Input directories and files DOXY_INPUT ?= \ @@ -72,9 +73,8 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ $(DOXY_SRC)/vlib/vlib/buffer.c \ - $(DOXY_SRC)/vlib/example \ $(DOXY_SRC)/vpp-api/lua \ - plugins/sample-plugin + $(DOXY_SRC)/examples/sample-plugin # Generate a regexp for filenames to exclude DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | sed -e 's/ /|/g'))) diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 40303439..29df6156 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -4,13 +4,14 @@ User Documentation {#user_doc} Several modules provide operational, dataplane-user focused documentation. - [GUI guided user demo](https://wiki.fd.io/view/VPP_Sandbox/vpp-userdemo) -- @subpage qos_doc -- @subpage ipsec_gre_doc -- @subpage dpdk_crypto_ipsec_doc -- @subpage map_doc -- @subpage lldp_doc +- @subpage bfd_doc - @subpage ioam_plugin_doc +- @subpage ipsec_gre_doc - @subpage lb_plugin_doc +- @subpage lldp_doc +- @subpage map_doc +- @subpage dpdk_crypto_ipsec_doc - @subpage flowperpkt_plugin_doc +- @subpage qos_doc - @subpage span_doc -- @subpage bfd_doc +- @subpage srv6_doc diff --git a/src/configure.ac b/src/configure.ac index 0ca1af11..581b0968 100644 --- a/src/configure.ac +++ b/src/configure.ac @@ -154,7 +154,6 @@ PLUGIN_ENABLED(lb) PLUGIN_ENABLED(memif) PLUGIN_ENABLED(sixrd) PLUGIN_ENABLED(snat) -PLUGIN_DISABLED(srv6sample) ############################################################################### # Dependency checks diff --git a/src/examples/srv6-sample-localsid/node.c b/src/examples/srv6-sample-localsid/node.c new file mode 100644 index 00000000..7bae9cd7 --- /dev/null +++ b/src/examples/srv6-sample-localsid/node.c @@ -0,0 +1,200 @@ +/* + * Copyright (c) 2015 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include + +typedef struct { + u32 localsid_index; +} srv6_localsid_sample_trace_t; + +/* packet trace format function */ +static u8 * format_srv6_localsid_sample_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + srv6_localsid_sample_trace_t * t = va_arg (*args, srv6_localsid_sample_trace_t *); + s = format (s, "SRv6-sample-localsid: localsid_index %d\n", + t->localsid_index); + return s; +} + +vlib_node_registration_t srv6_localsid_sample_node; + +#define foreach_srv6_localsid_counter \ +_(PROCESSED, "srv6-sample-localsid processed packets") \ +_(NO_SRH, "(Error) No SRH.") + +typedef enum { +#define _(sym,str) SRV6_LOCALSID_COUNTER_##sym, + foreach_srv6_localsid_counter +#undef _ + SRV6_LOCALSID_N_COUNTERS, +} srv6_localsid_sample_counters; + +static char * srv6_localsid_counter_strings[] = { +#define _(sym,string) string, + foreach_srv6_localsid_counter +#undef _ +}; + +typedef enum { + SRV6_SAMPLE_LOCALSID_NEXT_ERROR, + SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP, + SRV6_SAMPLE_LOCALSID_N_NEXT, +} srv6_localsid_sample_next_t; + +/** + * @brief Function doing End processing. + */ +//Fixme: support OAM (hop-by-hop header) here! +static_always_inline void +end_srh_processing (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, + ip6_sr_header_t * sr0, + u32 * next0) +{ + ip6_address_t *new_dst0; + + if(PREDICT_TRUE(ip0->protocol == IP_PROTOCOL_IPV6_ROUTE)) + { + if(PREDICT_TRUE(sr0->type == ROUTING_HEADER_TYPE_SR)) + { + if(PREDICT_TRUE(sr0->segments_left != 0)) + { + sr0->segments_left -= 1; + new_dst0 = (ip6_address_t *)(sr0->segments); + new_dst0 += sr0->segments_left; + ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; + ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; + } + else + { + *next0 = SRV6_SAMPLE_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SRV6_LOCALSID_COUNTER_NO_SRH]; + } + } + else + { + /* Error. Routing header of type != SR */ + *next0 = SRV6_SAMPLE_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SRV6_LOCALSID_COUNTER_NO_SRH]; + } + } +} + +/* + * @brief SRv6 Sample Localsid graph node + * WARNING: YOU MUST DO THE DUAL LOOP + */ +static uword +srv6_localsid_sample_fn (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * frame) +{ + u32 n_left_from, * from, * to_next; + u32 next_index; + u32 pkts_swapped = 0; + + ip6_sr_main_t * sm = &sr_main; + + from = vlib_frame_vector_args (frame); + n_left_from = frame->n_vectors; + next_index = node->cached_next_index; + u32 cpu_index = os_get_cpu_number (); + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, + to_next, n_left_to_next); + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t * b0; + ip6_header_t * ip0 = 0; + ip6_sr_header_t * sr0; + u32 next0 = SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP; + ip6_sr_localsid_t *ls0; + srv6_localsid_sample_per_sid_memory_t *ls0_mem; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + ip0 = vlib_buffer_get_current (b0); + sr0 = (ip6_sr_header_t *)(ip0+1); + + /* Lookup the SR End behavior based on IP DA (adj) */ + ls0 = pool_elt_at_index (sm->localsids, vnet_buffer(b0)->ip.adj_index[VLIB_TX]); + ls0_mem = ls0->plugin_mem; + + /* SRH processing */ + end_srh_processing (node, b0, ip0, sr0, &next0); + + /* ==================================================================== */ + /* INSERT CODE HERE */ + /* Example starts here */ + //In this example we are changing the next VRF table by the one in CLI + vnet_buffer(b0)->sw_if_index[VLIB_TX] = ls0_mem->fib_table; + /* Example finishes here */ + /* ==================================================================== */ + + if (PREDICT_FALSE(b0->flags & VLIB_BUFFER_IS_TRACED)) + { + srv6_localsid_sample_trace_t *tr = vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->localsid_index = ls0 - sm->localsids; + } + + /* This increments the SRv6 per LocalSID counters.*/ + vlib_increment_combined_counter + (((next0 == SRV6_SAMPLE_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : &(sm->sr_ls_valid_counters)), + cpu_index, + ls0 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b0)); + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + + pkts_swapped ++; + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + + } + + return frame->n_vectors; +} + +VLIB_REGISTER_NODE (srv6_localsid_sample_node) = { + .function = srv6_localsid_sample_fn, + .name = "srv6-localsid-sample", + .vector_size = sizeof (u32), + .format_trace = format_srv6_localsid_sample_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SRV6_LOCALSID_N_COUNTERS, + .error_strings = srv6_localsid_counter_strings, + .n_next_nodes = SRV6_SAMPLE_LOCALSID_N_NEXT, + .next_nodes = { + [SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP] = "ip6-lookup", + [SRV6_SAMPLE_LOCALSID_NEXT_ERROR] = "error-drop", + }, +}; diff --git a/src/examples/srv6-sample-localsid/srv6_localsid_sample.c b/src/examples/srv6-sample-localsid/srv6_localsid_sample.c new file mode 100755 index 00000000..ec16547e --- /dev/null +++ b/src/examples/srv6-sample-localsid/srv6_localsid_sample.c @@ -0,0 +1,179 @@ +/* + * Copyright (c) 2015 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/* + *------------------------------------------------------------------ + * srv6_localsid_sample.c - Simple SRv6 LocalSID + *------------------------------------------------------------------ + */ + +#include +#include +#include + +#include +#include +#include + +unsigned char srv6_localsid_name[32] = "Sample-SRv6-LocalSID-plugin"; +unsigned char keyword_str[32] = "new_srv6_localsid"; +unsigned char def_str[64] = "This is a definition of a sample new_srv6_localsid"; +unsigned char params_str[32] = ""; + +/*****************************************/ +/* SRv6 LocalSID instantiation and removal functions */ +static int +srv6_localsid_creation_fn (ip6_sr_localsid_t *localsid) +{ + /* + * Do you want to do anything fancy upon localsid instantiation? + * You can do it here + * (If return != 0 the localsid creation will be cancelled.) + */ + /* As an example Im going to do a +1 to the fib table inserted by the user */ + srv6_localsid_sample_per_sid_memory_t *ls_mem = localsid->plugin_mem; + ls_mem->fib_table += 1; + return 0; +} + +static int +srv6_localsid_removal_fn (ip6_sr_localsid_t *localsid) +{ + /* Do you want to do anything fancy upon localsid removal? + * You can do it here + * (If return != 0 the localsid removal will be cancelled.) + */ + /* + * BTW if you stored something in localsid->plugin_mem you should clean it now + */ + + //In this example we are only cleaning the memory allocated per localsid + clib_mem_free(localsid->plugin_mem); + return 0; +} + +/**********************************/ +/* SRv6 LocalSID format functions */ +/* + * Prints nicely the parameters of a localsid + * Example: print "Table 5" + */ +u8 * +format_srv6_localsid_sample (u8 * s, va_list * args) +{ + srv6_localsid_sample_per_sid_memory_t *ls_mem = va_arg (*args, void *); + return (format (s, "Table: %u", ls_mem->fib_table)); +} + +/* + * Process the parameters of a localsid + * Example: process from: + * sr localsid address cafe::1 behavior new_srv6_localsid 5 + * everything from behavior on... so in this case 'new_srv6_localsid 5' + * Notice that it MUST match the keyword_str and params_str defined above. + */ +uword +unformat_srv6_localsid_sample (unformat_input_t * input, va_list * args) +{ + void **plugin_mem = va_arg (*args, void **); + srv6_localsid_sample_per_sid_memory_t *ls_mem; + u32 table_id; + if (unformat (input, "new_srv6_localsid %u", &table_id)) + { + /* Allocate a portion of memory */ + ls_mem = clib_mem_alloc_aligned_at_offset ( + sizeof(srv6_localsid_sample_per_sid_memory_t), 0, 0, 1); + + /* Set to zero the memory */ + memset (ls_mem, 0, sizeof(srv6_localsid_sample_per_sid_memory_t)); + + /* Our brand-new car is ready */ + ls_mem->fib_table = table_id; + + /* Dont forget to add it to the localsid */ + *plugin_mem = ls_mem; + return 1; + } + return 0; +} + +/*************************/ +/* SRv6 LocalSID FIB DPO */ +static u8 * +format_srv6_localsid_sample_dpo (u8 * s, va_list * args) +{ + index_t index = va_arg (*args, index_t); + CLIB_UNUSED (u32 indent) = va_arg (*args, u32); + + return (format (s, "SR: localsid_sample_index:[%u]", index)); +} + +void +srv6_localsid_sample_dpo_lock (dpo_id_t * dpo) +{ +} + +void +srv6_localsid_sample_dpo_unlock (dpo_id_t * dpo) +{ +} + +const static dpo_vft_t srv6_localsid_sample_vft = { + .dv_lock = srv6_localsid_sample_dpo_lock, + .dv_unlock = srv6_localsid_sample_dpo_unlock, + .dv_format = format_srv6_localsid_sample_dpo, +}; + +const static char *const srv6_localsid_sample_ip6_nodes[] = { + "srv6-localsid-sample", + NULL, +}; + +const static char *const *const srv6_localsid_sample_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = srv6_localsid_sample_ip6_nodes, +}; + +/**********************/ +static clib_error_t * srv6_localsid_sample_init (vlib_main_t * vm) +{ + srv6_localsid_sample_main_t * sm = &srv6_localsid_sample_main; + int rv = 0; + /* Create DPO */ + sm->srv6_localsid_sample_dpo_type = dpo_register_new_type ( + &srv6_localsid_sample_vft, srv6_localsid_sample_nodes); + + /* Register SRv6 LocalSID */ + rv = sr_localsid_register_function (vm, + srv6_localsid_name, + keyword_str, + def_str, + params_str, + &sm->srv6_localsid_sample_dpo_type, + format_srv6_localsid_sample, + unformat_srv6_localsid_sample, + srv6_localsid_creation_fn, + srv6_localsid_removal_fn); + if (rv < 0) + clib_error_return (0, "SRv6 LocalSID function could not be registered."); + else + sm->srv6_localsid_behavior_id = rv; + + return 0; +} + +VLIB_INIT_FUNCTION (srv6_localsid_sample_init); + +VLIB_PLUGIN_REGISTER () = { + .version = "1.0", +}; diff --git a/src/examples/srv6-sample-localsid/srv6_localsid_sample.h b/src/examples/srv6-sample-localsid/srv6_localsid_sample.h new file mode 100644 index 00000000..474b5de2 --- /dev/null +++ b/src/examples/srv6-sample-localsid/srv6_localsid_sample.h @@ -0,0 +1,61 @@ +/* + * Copyright (c) 2015 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef __included_srv6_localsid_sample_h__ +#define __included_srv6_localsid_sample_h__ + +#include +#include +#include +#include + +#include +#include +#include + +typedef struct { + /* API message ID base */ + u16 msg_id_base; + + /* convenience */ + vlib_main_t * vlib_main; + vnet_main_t * vnet_main; + + /* DPO type */ + dpo_type_t srv6_localsid_sample_dpo_type; + + /* SRv6 LocalSID behavior number */ + u32 srv6_localsid_behavior_id; + +} srv6_localsid_sample_main_t; + +/* + * This is the memory that will be stored per each localsid + * the user instantiates + */ +typedef struct { + u32 fib_table; /* Stupid index used as an example.. */ +} srv6_localsid_sample_per_sid_memory_t ; + +srv6_localsid_sample_main_t srv6_localsid_sample_main; + +format_function_t format_srv6_localsid_sample; +unformat_function_t unformat_srv6_localsid_sample; + +void srv6_localsid_sample_dpo_lock (dpo_id_t * dpo); +void srv6_localsid_sample_dpo_unlock (dpo_id_t * dpo); + +extern vlib_node_registration_t srv6_localsid_sample_node; + +#endif /* __included_sample_h__ */ diff --git a/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md b/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md new file mode 100644 index 00000000..78e91ab3 --- /dev/null +++ b/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md @@ -0,0 +1,30 @@ +# Sample SRv6 LocalSID documentation {#srv6_plugin_doc} + +## Introduction + +This plugin is an example of how an user can create a new SRv6 LocalSID behavior by using VPP plugins with the appropiate API calls to the existing SR code. + +This **example** plugin registers a new localsid behavior, with cli keyword 'new_srv6_localsid' which only takes one parameter, a fib-table. Upon recival of a packet, this plugin will enforce the next IP6 lookup in the specific fib-table specified by the user. (Indeed it will do the lookup in the fib_table n+1 (since for the shake of the example we increment the fib-table.) + +Notice that the plugin only 'defines' a new SRv6 LocalSID behavior, but the existing SR code in VNET is the one actually instantiating new LocalSIDs. Notice that there are callback functions such that when you create or remove a LocalSID you can actually setup specific parameters through the functions in this plugin. + +## Variables to watch for + +* srv6_localsid_name: This variable is the name (used as a unique key) identifying this SR LocalSID plugin. +* keyword_str: This is the CLI keyword to be used for the plugin. In this example 'new_srv6_localsid'. (i.e. sr localsid address cafe::1 behavior new_srv6_localsid ) +* def_str: This is a definition of this SR behavior. This is printed when you do 'show sr localsid behaviors'. +* params_str: This is a definition of the parameters of this localsid. This is printed when you do 'show sr localsid behaviors'. + +## Functions to watch for + +* srv6_localsid_creation_fn: This function will be called every time a new SR LocalSID is instantiated with the behavior defined in this plugin. +* srv6_localsid_removal_fn: This function will be called every time a new SR LocalSID is removed with the behavior defined in this plugin. This function tends to be used for freeing up all the memory created in the previous function. +* format_srv6_localsid_sample: This function prints nicely the parameters of every SR LocalSID using this behavior. +* unformat_srv6_localsid_sample: This function parses the CLI command when initialising a new SR LocalSID using this behavior. It parses all the parameters and ensures that the parameters are correct. +* format_srv6_localsid_sample_dpo: This function formats the 'show ip6 fib' message for the SR LocalSIDs created with this plugin behavior. + +## Graph node + +The current graph node uses the function 'end_srh_processing' to do the Segment Routing Endpoint behavior. Notice that it does not allow the cleanup of a Segment Routing header (as per the SRv6 behavior specs). +This function is identical to the one found in /src/vnet/sr/sr_localsid.c +In case that by some other reason you want to do decapsulation, or SRH clean_up you can use the functions 'end_decaps_srh_processing' or 'end_psp_srh_processing' respectively. diff --git a/src/plugins/Makefile.am b/src/plugins/Makefile.am index 6713a9e2..623892e7 100644 --- a/src/plugins/Makefile.am +++ b/src/plugins/Makefile.am @@ -69,10 +69,6 @@ if ENABLE_SNAT_PLUGIN include snat.am endif -if ENABLE_SRV6SAMPLE_PLUGIN -include sample_srv6_localsid.am -endif - include ../suffix-rules.mk # Remove *.la files diff --git a/src/plugins/sample_srv6_localsid.am b/src/plugins/sample_srv6_localsid.am deleted file mode 100644 index a820ab25..00000000 --- a/src/plugins/sample_srv6_localsid.am +++ /dev/null @@ -1,23 +0,0 @@ - -# Copyright (c) -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at: -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -vppplugins_LTLIBRARIES += sample_srv6_localsid_plugin.la - -sample_srv6_localsid_plugin_la_SOURCES = \ - srv6-localsid/node.c \ - srv6-localsid/srv6_localsid_sample.c - -noinst_HEADERS += srv6-localsid/srv6_localsid_sample.h - -# vi:syntax=automake diff --git a/src/plugins/srv6-localsid/node.c b/src/plugins/srv6-localsid/node.c deleted file mode 100644 index 7bae9cd7..00000000 --- a/src/plugins/srv6-localsid/node.c +++ /dev/null @@ -1,200 +0,0 @@ -/* - * Copyright (c) 2015 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#include -#include -#include -#include -#include - -typedef struct { - u32 localsid_index; -} srv6_localsid_sample_trace_t; - -/* packet trace format function */ -static u8 * format_srv6_localsid_sample_trace (u8 * s, va_list * args) -{ - CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); - CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); - srv6_localsid_sample_trace_t * t = va_arg (*args, srv6_localsid_sample_trace_t *); - s = format (s, "SRv6-sample-localsid: localsid_index %d\n", - t->localsid_index); - return s; -} - -vlib_node_registration_t srv6_localsid_sample_node; - -#define foreach_srv6_localsid_counter \ -_(PROCESSED, "srv6-sample-localsid processed packets") \ -_(NO_SRH, "(Error) No SRH.") - -typedef enum { -#define _(sym,str) SRV6_LOCALSID_COUNTER_##sym, - foreach_srv6_localsid_counter -#undef _ - SRV6_LOCALSID_N_COUNTERS, -} srv6_localsid_sample_counters; - -static char * srv6_localsid_counter_strings[] = { -#define _(sym,string) string, - foreach_srv6_localsid_counter -#undef _ -}; - -typedef enum { - SRV6_SAMPLE_LOCALSID_NEXT_ERROR, - SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP, - SRV6_SAMPLE_LOCALSID_N_NEXT, -} srv6_localsid_sample_next_t; - -/** - * @brief Function doing End processing. - */ -//Fixme: support OAM (hop-by-hop header) here! -static_always_inline void -end_srh_processing (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, - ip6_sr_header_t * sr0, - u32 * next0) -{ - ip6_address_t *new_dst0; - - if(PREDICT_TRUE(ip0->protocol == IP_PROTOCOL_IPV6_ROUTE)) - { - if(PREDICT_TRUE(sr0->type == ROUTING_HEADER_TYPE_SR)) - { - if(PREDICT_TRUE(sr0->segments_left != 0)) - { - sr0->segments_left -= 1; - new_dst0 = (ip6_address_t *)(sr0->segments); - new_dst0 += sr0->segments_left; - ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; - ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; - } - else - { - *next0 = SRV6_SAMPLE_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SRV6_LOCALSID_COUNTER_NO_SRH]; - } - } - else - { - /* Error. Routing header of type != SR */ - *next0 = SRV6_SAMPLE_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SRV6_LOCALSID_COUNTER_NO_SRH]; - } - } -} - -/* - * @brief SRv6 Sample Localsid graph node - * WARNING: YOU MUST DO THE DUAL LOOP - */ -static uword -srv6_localsid_sample_fn (vlib_main_t * vm, vlib_node_runtime_t * node, vlib_frame_t * frame) -{ - u32 n_left_from, * from, * to_next; - u32 next_index; - u32 pkts_swapped = 0; - - ip6_sr_main_t * sm = &sr_main; - - from = vlib_frame_vector_args (frame); - n_left_from = frame->n_vectors; - next_index = node->cached_next_index; - u32 cpu_index = os_get_cpu_number (); - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, - to_next, n_left_to_next); - - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t * b0; - ip6_header_t * ip0 = 0; - ip6_sr_header_t * sr0; - u32 next0 = SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP; - ip6_sr_localsid_t *ls0; - srv6_localsid_sample_per_sid_memory_t *ls0_mem; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - ip0 = vlib_buffer_get_current (b0); - sr0 = (ip6_sr_header_t *)(ip0+1); - - /* Lookup the SR End behavior based on IP DA (adj) */ - ls0 = pool_elt_at_index (sm->localsids, vnet_buffer(b0)->ip.adj_index[VLIB_TX]); - ls0_mem = ls0->plugin_mem; - - /* SRH processing */ - end_srh_processing (node, b0, ip0, sr0, &next0); - - /* ==================================================================== */ - /* INSERT CODE HERE */ - /* Example starts here */ - //In this example we are changing the next VRF table by the one in CLI - vnet_buffer(b0)->sw_if_index[VLIB_TX] = ls0_mem->fib_table; - /* Example finishes here */ - /* ==================================================================== */ - - if (PREDICT_FALSE(b0->flags & VLIB_BUFFER_IS_TRACED)) - { - srv6_localsid_sample_trace_t *tr = vlib_add_trace (vm, node, b0, sizeof (*tr)); - tr->localsid_index = ls0 - sm->localsids; - } - - /* This increments the SRv6 per LocalSID counters.*/ - vlib_increment_combined_counter - (((next0 == SRV6_SAMPLE_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : &(sm->sr_ls_valid_counters)), - cpu_index, - ls0 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b0)); - - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - - pkts_swapped ++; - } - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - - } - - return frame->n_vectors; -} - -VLIB_REGISTER_NODE (srv6_localsid_sample_node) = { - .function = srv6_localsid_sample_fn, - .name = "srv6-localsid-sample", - .vector_size = sizeof (u32), - .format_trace = format_srv6_localsid_sample_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SRV6_LOCALSID_N_COUNTERS, - .error_strings = srv6_localsid_counter_strings, - .n_next_nodes = SRV6_SAMPLE_LOCALSID_N_NEXT, - .next_nodes = { - [SRV6_SAMPLE_LOCALSID_NEXT_IP6LOOKUP] = "ip6-lookup", - [SRV6_SAMPLE_LOCALSID_NEXT_ERROR] = "error-drop", - }, -}; diff --git a/src/plugins/srv6-localsid/srv6_localsid_sample.c b/src/plugins/srv6-localsid/srv6_localsid_sample.c deleted file mode 100755 index ec16547e..00000000 --- a/src/plugins/srv6-localsid/srv6_localsid_sample.c +++ /dev/null @@ -1,179 +0,0 @@ -/* - * Copyright (c) 2015 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -/* - *------------------------------------------------------------------ - * srv6_localsid_sample.c - Simple SRv6 LocalSID - *------------------------------------------------------------------ - */ - -#include -#include -#include - -#include -#include -#include - -unsigned char srv6_localsid_name[32] = "Sample-SRv6-LocalSID-plugin"; -unsigned char keyword_str[32] = "new_srv6_localsid"; -unsigned char def_str[64] = "This is a definition of a sample new_srv6_localsid"; -unsigned char params_str[32] = ""; - -/*****************************************/ -/* SRv6 LocalSID instantiation and removal functions */ -static int -srv6_localsid_creation_fn (ip6_sr_localsid_t *localsid) -{ - /* - * Do you want to do anything fancy upon localsid instantiation? - * You can do it here - * (If return != 0 the localsid creation will be cancelled.) - */ - /* As an example Im going to do a +1 to the fib table inserted by the user */ - srv6_localsid_sample_per_sid_memory_t *ls_mem = localsid->plugin_mem; - ls_mem->fib_table += 1; - return 0; -} - -static int -srv6_localsid_removal_fn (ip6_sr_localsid_t *localsid) -{ - /* Do you want to do anything fancy upon localsid removal? - * You can do it here - * (If return != 0 the localsid removal will be cancelled.) - */ - /* - * BTW if you stored something in localsid->plugin_mem you should clean it now - */ - - //In this example we are only cleaning the memory allocated per localsid - clib_mem_free(localsid->plugin_mem); - return 0; -} - -/**********************************/ -/* SRv6 LocalSID format functions */ -/* - * Prints nicely the parameters of a localsid - * Example: print "Table 5" - */ -u8 * -format_srv6_localsid_sample (u8 * s, va_list * args) -{ - srv6_localsid_sample_per_sid_memory_t *ls_mem = va_arg (*args, void *); - return (format (s, "Table: %u", ls_mem->fib_table)); -} - -/* - * Process the parameters of a localsid - * Example: process from: - * sr localsid address cafe::1 behavior new_srv6_localsid 5 - * everything from behavior on... so in this case 'new_srv6_localsid 5' - * Notice that it MUST match the keyword_str and params_str defined above. - */ -uword -unformat_srv6_localsid_sample (unformat_input_t * input, va_list * args) -{ - void **plugin_mem = va_arg (*args, void **); - srv6_localsid_sample_per_sid_memory_t *ls_mem; - u32 table_id; - if (unformat (input, "new_srv6_localsid %u", &table_id)) - { - /* Allocate a portion of memory */ - ls_mem = clib_mem_alloc_aligned_at_offset ( - sizeof(srv6_localsid_sample_per_sid_memory_t), 0, 0, 1); - - /* Set to zero the memory */ - memset (ls_mem, 0, sizeof(srv6_localsid_sample_per_sid_memory_t)); - - /* Our brand-new car is ready */ - ls_mem->fib_table = table_id; - - /* Dont forget to add it to the localsid */ - *plugin_mem = ls_mem; - return 1; - } - return 0; -} - -/*************************/ -/* SRv6 LocalSID FIB DPO */ -static u8 * -format_srv6_localsid_sample_dpo (u8 * s, va_list * args) -{ - index_t index = va_arg (*args, index_t); - CLIB_UNUSED (u32 indent) = va_arg (*args, u32); - - return (format (s, "SR: localsid_sample_index:[%u]", index)); -} - -void -srv6_localsid_sample_dpo_lock (dpo_id_t * dpo) -{ -} - -void -srv6_localsid_sample_dpo_unlock (dpo_id_t * dpo) -{ -} - -const static dpo_vft_t srv6_localsid_sample_vft = { - .dv_lock = srv6_localsid_sample_dpo_lock, - .dv_unlock = srv6_localsid_sample_dpo_unlock, - .dv_format = format_srv6_localsid_sample_dpo, -}; - -const static char *const srv6_localsid_sample_ip6_nodes[] = { - "srv6-localsid-sample", - NULL, -}; - -const static char *const *const srv6_localsid_sample_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = srv6_localsid_sample_ip6_nodes, -}; - -/**********************/ -static clib_error_t * srv6_localsid_sample_init (vlib_main_t * vm) -{ - srv6_localsid_sample_main_t * sm = &srv6_localsid_sample_main; - int rv = 0; - /* Create DPO */ - sm->srv6_localsid_sample_dpo_type = dpo_register_new_type ( - &srv6_localsid_sample_vft, srv6_localsid_sample_nodes); - - /* Register SRv6 LocalSID */ - rv = sr_localsid_register_function (vm, - srv6_localsid_name, - keyword_str, - def_str, - params_str, - &sm->srv6_localsid_sample_dpo_type, - format_srv6_localsid_sample, - unformat_srv6_localsid_sample, - srv6_localsid_creation_fn, - srv6_localsid_removal_fn); - if (rv < 0) - clib_error_return (0, "SRv6 LocalSID function could not be registered."); - else - sm->srv6_localsid_behavior_id = rv; - - return 0; -} - -VLIB_INIT_FUNCTION (srv6_localsid_sample_init); - -VLIB_PLUGIN_REGISTER () = { - .version = "1.0", -}; diff --git a/src/plugins/srv6-localsid/srv6_localsid_sample.h b/src/plugins/srv6-localsid/srv6_localsid_sample.h deleted file mode 100644 index 474b5de2..00000000 --- a/src/plugins/srv6-localsid/srv6_localsid_sample.h +++ /dev/null @@ -1,61 +0,0 @@ -/* - * Copyright (c) 2015 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#ifndef __included_srv6_localsid_sample_h__ -#define __included_srv6_localsid_sample_h__ - -#include -#include -#include -#include - -#include -#include -#include - -typedef struct { - /* API message ID base */ - u16 msg_id_base; - - /* convenience */ - vlib_main_t * vlib_main; - vnet_main_t * vnet_main; - - /* DPO type */ - dpo_type_t srv6_localsid_sample_dpo_type; - - /* SRv6 LocalSID behavior number */ - u32 srv6_localsid_behavior_id; - -} srv6_localsid_sample_main_t; - -/* - * This is the memory that will be stored per each localsid - * the user instantiates - */ -typedef struct { - u32 fib_table; /* Stupid index used as an example.. */ -} srv6_localsid_sample_per_sid_memory_t ; - -srv6_localsid_sample_main_t srv6_localsid_sample_main; - -format_function_t format_srv6_localsid_sample; -unformat_function_t unformat_srv6_localsid_sample; - -void srv6_localsid_sample_dpo_lock (dpo_id_t * dpo); -void srv6_localsid_sample_dpo_unlock (dpo_id_t * dpo); - -extern vlib_node_registration_t srv6_localsid_sample_node; - -#endif /* __included_sample_h__ */ diff --git a/src/plugins/srv6-localsid/srv6_sample_localsid_doc.md b/src/plugins/srv6-localsid/srv6_sample_localsid_doc.md deleted file mode 100644 index b60ab5da..00000000 --- a/src/plugins/srv6-localsid/srv6_sample_localsid_doc.md +++ /dev/null @@ -1,38 +0,0 @@ -# SRv6 Sample LocalSID documentation {#srv6_plugin_doc} - -## Disclaimer - -This is a memo intended to contain documentation for the sample SRv6 LocalSID behavior plugin -Everything that is not directly obvious should come here. -For any feedback on content that should be explained please mailto:pcamaril@cisco.com - -This plugin refers to Segment Routing. Please read the SR documentation first. - -## Introduction - -This plugin is an example of how an user can create a new SRv6 LocalSID behavior by using VPP plugins with the appropiate API calls to the existing SR code. - -This **example** plugin registers a new localsid behavior, with cli keyword 'new_srv6_localsid' which only takes one parameter, a fib-table. Upon recival of a packet, this plugin will enforce the next IP6 lookup in the specific fib-table specified by the user. (Indeed it will do the lookup in the fib_table n+1 (since for the shake of the example we increment the fib-table.) - -Notice that the plugin only 'defines' a new SRv6 LocalSID behavior, but the existing SR code in VNET is the one actually instantiating new LocalSIDs. Notice that there are callback functions such that when you create or remove a LocalSID you can actually setup specific parameters through the functions in this plugin. - -## Variables to watch for - -* srv6_localsid_name: This variable is the name (used as a unique key) identifying this SR LocalSID plugin. -* keyword_str: This is the CLI keyword to be used for the plugin. In this example 'new_srv6_localsid'. (i.e. sr localsid address cafe::1 behavior new_srv6_localsid ) -* def_str: This is a definition of this SR behavior. This is printed when you do 'show sr localsid behaviors'. -* params_str: This is a definition of the parameters of this localsid. This is printed when you do 'show sr localsid behaviors'. - -## Functions to watch for - -* srv6_localsid_creation_fn: This function will be called every time a new SR LocalSID is instantiated with the behavior defined in this plugin. -* srv6_localsid_removal_fn: This function will be called every time a new SR LocalSID is removed with the behavior defined in this plugin. This function tends to be used for freeing up all the memory created in the previous function. -* format_srv6_localsid_sample: This function prints nicely the parameters of every SR LocalSID using this behavior. -* unformat_srv6_localsid_sample: This function parses the CLI command when initialising a new SR LocalSID using this behavior. It parses all the parameters and ensures that the parameters are correct. -* format_srv6_localsid_sample_dpo: This function formats the 'show ip6 fib' message for the SR LocalSIDs created with this plugin behavior. - -## Graph node - -The current graph node uses the function 'end_srh_processing' to do the Segment Routing Endpoint behavior. Notice that it does not allow the cleanup of a Segment Routing header (as per the SRv6 behavior specs). -This function is identical to the one found in /src/vnet/sr/sr_localsid.c -In case that by some other reason you want to do decapsulation, or SRH clean_up you can use the functions 'end_decaps_srh_processing' or 'end_psp_srh_processing' respectively. diff --git a/src/vnet/sr/sr.h b/src/vnet/sr/sr.h index 5efeb4dd..b832c0fc 100755 --- a/src/vnet/sr/sr.h +++ b/src/vnet/sr/sr.h @@ -173,6 +173,7 @@ typedef struct } l2; }; u8 traffic_type; /**< Traffic type (IPv4, IPv6, L2) */ + u8 padding[3]; } sr_steering_key_t; typedef struct @@ -186,46 +187,29 @@ typedef struct */ typedef struct { - /* ip6-lookup next index for imposition FIB entries */ - u32 ip6_lookup_sr_next_index; - - /* ip6-replicate next index for multicast tunnel */ - u32 ip6_lookup_sr_spray_index; - - /* IP4-lookup -> SR rewrite next index */ - u32 ip4_lookup_sr_policy_rewrite_encaps_index; - u32 ip4_lookup_sr_policy_rewrite_insert_index; - - /* IP6-lookup -> SR rewrite next index */ - u32 ip6_lookup_sr_policy_rewrite_encaps_index; - u32 ip6_lookup_sr_policy_rewrite_insert_index; - /* L2-input -> SR rewrite next index */ u32 l2_sr_policy_rewrite_index; - /* IP6-lookup -> SR LocalSID (SR End processing) index */ - u32 ip6_lookup_sr_localsid_index; - /* SR SID lists */ ip6_sr_sl_t *sid_lists; /* SR policies */ ip6_sr_policy_t *sr_policies; - /* Find an SR policy by its BindingSID */ - ip6_address_t *sr_policy_index_by_key; + /* Hash table mapping BindingSID to SR policy */ + mhash_t sr_policies_index_hash; /* Pool of SR localsid instances */ ip6_sr_localsid_t *localsids; - /* Find a SR localsid instance based on its functionID */ - ip6_address_t *localsids_index_by_key; + /* Hash table mapping LOC:FUNC to SR LocalSID instance */ + mhash_t sr_localsids_index_hash; /* Pool of SR steer policies instances */ ip6_sr_steering_policy_t *steer_policies; - /* Find a steer policy based on its classifier */ - sr_steering_key_t *steer_policies_index_by_key; + /* Hash table mapping steering rules to SR steer instance */ + mhash_t sr_steer_policies_hash; /* L2 steering ifaces - sr_policies */ u32 *sw_iface_sr_policies; diff --git a/src/vnet/sr/sr_doc.md b/src/vnet/sr/sr_doc.md index a7220630..fd92bdf2 100644 --- a/src/vnet/sr/sr_doc.md +++ b/src/vnet/sr/sr_doc.md @@ -1,6 +1,6 @@ -# SRv6: Segment Routing for IPv6 {#sr_doc} +# SRv6: Segment Routing for IPv6 {#srv6_doc} -This is a memo intended to contain documentation of the VPP SRv6 implementation +This is a memo intended to contain documentation of the VPP SRv6 implementation. Everything that is not directly obvious should come here. For any feedback on content that should be explained please mailto:pcamaril@cisco.com @@ -27,135 +27,29 @@ Segment routing can operate with either an MPLS or an IPv6 data plane. All the c * Local SID: is a SID associated with a processing function on the local node, which may go from advancing to the next SID in the SRH, to complex user-defined behaviors. When a FIB lookup, either in the main FIB or in a specific VRF, returns a match on a local SID, the associated function is performed. * BindingSID: a BindingSID is a SID (only one) associated one-one with an SR Policy. If a packet arrives with an IPv6 DA corresponding to a BindingSID, then the SR policy will be applied to such packet. -## Creating an SR LocalSID +## SRv6 Features in VPP -A local SID is associated to a Segment Routing behavior -or function- on the current node. +The SRv6 Network Programming (*draft-filsfils-spring-srv6-network-programming*) defines the SRv6 architecture. -The most basic behavior is called END. It simply activates the next SID in the current packet, by decrementing the Segments Left value and updating the IPv6 DA. +VPP supports the following SRv6 LocalSID functions: End, End.X, End.DX6, End.DT6, End.DX4, End.DT4, End.DX2, End.B6, End.B6.Encaps. -A local END SID is instantiated using the following CLI: +For further information and how to configure each specific function: @subpage srv6_localsid_doc - sr localsid (del) address XX::YY behavior end -This creates a new entry in the main FIB for IPv6 address XX::YY. All packets whose IPv6 DA matches this FIB entry are redirected to the sr-localsid node, where they are processed as described above. +The Segment Routing Policy (*draft-filsfils-spring-segment-routing-policy*) defines SR Policies. -Other examples of local SIDs are the following: +VPP supports SRv6 Policies with T.Insert and T.Encaps behaviors. - sr localsid (del) address XX::YY behavior end (psp) - sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a (psp) - sr localsid (del) address XX::YY behavior end.dx6 GE0/1/0 2001::a - sr localsid (del) address XX::YY behavior end.dx4 GE0/1/0 10.0.0.1 - sr localsid (del) address XX::YY behavior end.dx2 GigabitE0/11/0 - sr localsid (del) address XX::YY behavior end.dt6 5 - sr localsid (del) address XX::YY behavior end.dt6 5 +For further information on how to create SR Policies: @subpage srv6_policy_doc -Note that all of these behaviors match the specifications in **TODO REF NET PGM**. Please refer to this document for a detailed description of each behavior. +For further information on how to steer traffic into SR Policies: @subpage srv6_steering_doc -Help on the available local SID behaviors and their usage can be obtained with: - - help sr localsid +## SRv6 LocalSID development framework -Alternatively they can be obtained using. - - show sr localsids behavior - -The difference in between those two commands is that the first one will only display the SR LocalSID behaviors that are built-in VPP, while the latter will display those behaviors plus the ones added with the SR LocalSID Development Framework. - - -VPP keeps a 'My LocalSID Table' where it stores all the SR local SIDs instantiated as well as their parameters. Every time a new local SID is instantiated, a new entry is added to this table. In addition, counters for correctly and incorrectly processed traffic are maintained for each local SID. The counters store both the number of packets and bytes. - -The contents of the 'My LocalSID Table' is shown with: - - vpp# show sr localsid - SRv6 - My LocalSID Table: - ========================= - Address: c3::1 - Behavior: DX6 (Endpoint with decapsulation and IPv6 cross-connect) - Iface: GigabitEthernet0/5/0 - Next hop: b:c3::b - Good traffic: [51277 packets : 5332808 bytes] - Bad traffic: [0 packets : 0 bytes] - -------------------- - -The traffic counters can be reset with: - - vpp# clear sr localsid counters - -## Creating a SR Policy - -An SR Policy is defined by a Binding SID and a weighted set of Segment Lists. - -A new SR policy is created with a first SID list using: - - sr policy add bsid 2001::1 next A1:: next B1:: next C1:: (weight 5) (fib-table 3) - -* The weight parameter is only used if more than one SID list is associated with the policy. -* The fib-table parameter specifies in which table (VRF) the Binding SID is to be installed. - -An SR policy is deleted with: - - sr policy del bsid 2001::1 - sr policy del index 1 - -The existing SR policies are listed with: - - show sr policies - -### Adding/Removing SID Lists from an SR policy - -An additional SID list is associated with an existing SR policy with: - - sr policy mod bsid 2001::1 add sl next A2:: next B2:: next C2:: (weight 3) - sr policy mod index 3 add sl next A2:: next B2:: next C2:: (weight 3) - -Conversely, a SID list can be removed from an SR policy with: - - sr policy mod bsid 2001::1 del sl index 1 - sr policy mod index 3 del sl index 1 - -Note that this cannot be used to remove the last SID list of a policy. - -The weight of a SID list can also be modified with: - - sr policy mod bsid 2001::1 mod sl index 1 weight 4 - sr policy mod index 3 mod sl index 1 weight 4 - -### SR Policies: Spray policies - -Spray policies are a specific type of SR policies where the packet is replicated on all the SID lists, rather than load-balanced among them. - -SID list weights are ignored with this type of policies. - -A Spray policy is instantiated by appending the keyword **spray** to a regular SR policy command, as in: - - sr policy add bsid 2001::1 next A1:: next B1:: next C1:: spray - -Spray policies are used for removing multicast state from a network core domain, and instead send a linear unicast copy to every access node. The last SID in each list accesses the multicast tree within the access node. - -### Encapsulation SR policies - -In case the user decides to create an SR policy an IPv6 Source Address must be specified for the encapsulated traffic. In order to do so the user might use the following command: - - set sr encaps source addr XXXX::YYYY - -## Steering packets into a SR Policy - -To steer packets in Transit into an SR policy (T.Insert, T.Encaps and T.Encaps.L2 behaviors), the user needs to create an 'sr steering policy'. - - sr steer l3 2001::/64 via sr policy index 1 - sr steer l3 2001::/64 via sr policy bsid cafe::1 - sr steer l3 2001::/64 via sr policy bsid cafe::1 fib-table 3 - sr steer l3 10.0.0.0/16 via sr policy bsid cafe::1 - sr steer l2 TenGE0/1/0 via sr policy bsid cafe::1 - -Disclaimer: The T.Encaps.L2 will steer L2 frames into an SR Policy. Notice that creating an SR steering policy for L2 frames will actually automatically *puts the interface into promiscous mode*. - -## SR LocalSID development framework - -One of the * 'key' * concepts about SRv6 is regarding network programmability. This is why an SRv6 LocalSID is associated with an specific function. +One of the *'key'* concepts about SRv6 is network programmability. This is why an SRv6 LocalSID is associated with an specific function. However, the trully way to enable network programmability is allowing any developer **easily** create his own SRv6 LocalSID function. That is the reason why we have added some API calls such that any developer can code his own SRv6 LocalSID behaviors as plugins an add them to the running SRv6 code. The principle is that the developer only codes the behavior -the graph node-. However all the FIB handling, SR LocalSID instantiation and so on are done by the VPP SRv6 code. -For more information please refer to the documentation *SRv6 Sample SR LocalSID plugin*. +For more information please refer to: @subpage srv6_plugin_doc diff --git a/src/vnet/sr/sr_localsid.c b/src/vnet/sr/sr_localsid.c index 610bb7ff..2e3d56de 100755 --- a/src/vnet/sr/sr_localsid.c +++ b/src/vnet/sr/sr_localsid.c @@ -72,18 +72,16 @@ sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, int rv; ip6_sr_localsid_t *ls = 0; - ip6_address_t *key_copy; dpo_id_t dpo = DPO_INVALID; /* Search for the item */ - p = hash_get_mem (sm->localsids_index_by_key, localsid_addr); + p = mhash_get (&sm->sr_localsids_index_hash, localsid_addr); if (p) { if (is_del) { - hash_pair_t *hp; /* Retrieve localsid */ ls = pool_elt_at_index (sm->localsids, p[0]); /* Delete FIB entry */ @@ -116,10 +114,7 @@ sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, /* Delete localsid registry */ pool_put (sm->localsids, ls); - hp = hash_get_pair (sm->localsids_index_by_key, localsid_addr); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->localsids_index_by_key, localsid_addr); - vec_free (key_copy); + mhash_unset (&sm->sr_localsids_index_hash, localsid_addr, NULL); return 1; } else /* create with function already existing; complain */ @@ -232,9 +227,8 @@ sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, } /* Set hash key for searching localsid by address */ - key_copy = vec_new (ip6_address_t, 1); - clib_memcpy (key_copy, localsid_addr, sizeof (ip6_address_t)); - hash_set_mem (sm->localsids_index_by_key, key_copy, ls - sm->localsids); + mhash_set (&sm->sr_localsids_index_hash, localsid_addr, ls - sm->localsids, + NULL); fib_table_entry_special_dpo_add (fib_index, &pfx, FIB_SOURCE_SR, FIB_ENTRY_FLAG_EXCLUSIVE, &dpo); @@ -1476,8 +1470,8 @@ sr_localsids_init (vlib_main_t * vm) { /* Init memory for function keys */ ip6_sr_main_t *sm = &sr_main; - sm->localsids_index_by_key = - hash_create_mem (0, sizeof (ip6_address_t), sizeof (uword)); + mhash_init (&sm->sr_localsids_index_hash, sizeof (uword), + sizeof (ip6_address_t)); /* Init SR behaviors DPO type */ sr_localsid_dpo_type = dpo_register_new_type (&sr_loc_vft, sr_loc_nodes); /* Init SR behaviors DPO type */ diff --git a/src/vnet/sr/sr_localsid.md b/src/vnet/sr/sr_localsid.md new file mode 100644 index 00000000..340af4a3 --- /dev/null +++ b/src/vnet/sr/sr_localsid.md @@ -0,0 +1,58 @@ +# SR LocalSIDs {#srv6_localsid_doc} + +A local SID is associated to a Segment Routing behavior -or function- on the current node. + +The most basic behavior is called END. It simply activates the next SID in the current packet, by decrementing the Segments Left value and updating the IPv6 DA. + +A local END SID is instantiated using the following CLI: + + sr localsid (del) address XX::YY behavior end + +This creates a new entry in the main FIB for IPv6 address XX::YY. All packets whose IPv6 DA matches this FIB entry are redirected to the sr-localsid node, where they are processed as described above. + +Other examples of local SIDs are the following: + + sr localsid (del) address XX::YY behavior end + sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a + sr localsid (del) address XX::YY behavior end.dx6 GE0/1/0 2001::a + sr localsid (del) address XX::YY behavior end.dx4 GE0/1/0 10.0.0.1 + sr localsid (del) address XX::YY behavior end.dx2 GigabitE0/11/0 + sr localsid (del) address XX::YY behavior end.dt6 5 + sr localsid (del) address XX::YY behavior end.dt6 5 + +Note that all of these behaviors match the definitions of the SRv6 architecture (*draft-filsfils-spring-srv6-network-programming*). Please refer to this document for a detailed description of each behavior. + +Note also that you can configure the PSP flavor of the End and End.X behaviors by typing: + + sr localsid (del) address XX::YY behavior end psp + sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a psp + +Help on the available local SID behaviors and their usage can be obtained with: + + help sr localsid + +Alternatively they can be obtained using. + + show sr localsids behavior + +The difference in between those two commands is that the first one will only display the SR LocalSID behaviors that are built-in VPP, while the latter will display those behaviors plus the ones added with the SR LocalSID Development Framework. + + +VPP keeps a 'My LocalSID Table' where it stores all the SR local SIDs instantiated as well as their parameters. Every time a new local SID is instantiated, a new entry is added to this table. In addition, counters for correctly and incorrectly processed traffic are maintained for each local SID. The counters store both the number of packets and bytes. + +The contents of the 'My LocalSID Table' is shown with: + + vpp# show sr localsid + SRv6 - My LocalSID Table: + ========================= + Address: c3::1 + Behavior: DX6 (Endpoint with decapsulation and IPv6 cross-connect) + Iface: GigabitEthernet0/5/0 + Next hop: b:c3::b + Good traffic: [51277 packets : 5332808 bytes] + Bad traffic: [0 packets : 0 bytes] + -------------------- + +The traffic counters can be reset with: + + vpp# clear sr localsid counters diff --git a/src/vnet/sr/sr_policy.md b/src/vnet/sr/sr_policy.md new file mode 100644 index 00000000..521b8461 --- /dev/null +++ b/src/vnet/sr/sr_policy.md @@ -0,0 +1,56 @@ +# Creating a SR Policy {#srv6_policy_doc} + +An SR Policy is defined by a Binding SID and a weighted set of Segment Lists. + +A new SR policy is created with a first SID list using: + + sr policy add bsid 2001::1 next A1:: next B1:: next C1:: (weight 5) (fib-table 3) + +* The weight parameter is only used if more than one SID list is associated with the policy. +* The fib-table parameter specifies in which table (VRF) the Binding SID is to be installed. + +An SR policy is deleted with: + + sr policy del bsid 2001::1 + sr policy del index 1 + +The existing SR policies are listed with: + + show sr policies + +## Adding/Removing SID Lists from an SR policy + +An additional SID list is associated with an existing SR policy with: + + sr policy mod bsid 2001::1 add sl next A2:: next B2:: next C2:: (weight 3) + sr policy mod index 3 add sl next A2:: next B2:: next C2:: (weight 3) + +Conversely, a SID list can be removed from an SR policy with: + + sr policy mod bsid 2001::1 del sl index 1 + sr policy mod index 3 del sl index 1 + +Note that this cannot be used to remove the last SID list of a policy. + +The weight of a SID list can also be modified with: + + sr policy mod bsid 2001::1 mod sl index 1 weight 4 + sr policy mod index 3 mod sl index 1 weight 4 + +## SR Policies: Spray policies + +Spray policies are a specific type of SR policies where the packet is replicated on all the SID lists, rather than load-balanced among them. + +SID list weights are ignored with this type of policies. + +A Spray policy is instantiated by appending the keyword **spray** to a regular SR policy command, as in: + + sr policy add bsid 2001::1 next A1:: next B1:: next C1:: spray + +Spray policies are used for removing multicast state from a network core domain, and instead send a linear unicast copy to every access node. The last SID in each list accesses the multicast tree within the access node. + +## Encapsulation SR policies + +In case the user decides to create an SR policy an IPv6 Source Address must be specified for the encapsulated traffic. In order to do so the user might use the following command: + + set sr encaps source addr XXXX::YYYY diff --git a/src/vnet/sr/sr_policy_rewrite.c b/src/vnet/sr/sr_policy_rewrite.c index dfdd7950..de6c1616 100755 --- a/src/vnet/sr/sr_policy_rewrite.c +++ b/src/vnet/sr/sr_policy_rewrite.c @@ -349,6 +349,7 @@ update_lb (ip6_sr_policy_t * sr_policy) ip6_sr_sl_t *segment_list; ip6_sr_main_t *sm = &sr_main; load_balance_path_t path; + path.path_index = FIB_NODE_INDEX_INVALID; load_balance_path_t *ip4_path_vector = 0; load_balance_path_t *ip6_path_vector = 0; load_balance_path_t *b_path_vector = 0; @@ -447,6 +448,7 @@ update_replicate (ip6_sr_policy_t * sr_policy) ip6_sr_sl_t *segment_list; ip6_sr_main_t *sm = &sr_main; load_balance_path_t path; + path.path_index = FIB_NODE_INDEX_INVALID; load_balance_path_t *b_path_vector = 0; load_balance_path_t *ip6_path_vector = 0; load_balance_path_t *ip4_path_vector = 0; @@ -548,11 +550,10 @@ sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, { ip6_sr_main_t *sm = &sr_main; ip6_sr_policy_t *sr_policy = 0; - ip6_address_t *key_copy; uword *p; /* Search for existing keys (BSID) */ - p = hash_get_mem (sm->sr_policy_index_by_key, bsid); + p = mhash_get (&sm->sr_policies_index_hash, bsid); if (p) { /* Add SR policy that already exists; complain */ @@ -592,10 +593,8 @@ sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, sr_policy->is_encap = is_encap; /* Copy the key */ - key_copy = vec_new (ip6_address_t, 1); - clib_memcpy (key_copy, bsid, sizeof (ip6_address_t)); - hash_set_mem (sm->sr_policy_index_by_key, key_copy, - sr_policy - sm->sr_policies); + mhash_set (&sm->sr_policies_index_hash, bsid, sr_policy - sm->sr_policies, + NULL); /* Create a segment list and add the index to the SR policy */ create_sl (sr_policy, segments, weight, is_encap); @@ -635,14 +634,12 @@ sr_policy_del (ip6_address_t * bsid, u32 index) ip6_sr_main_t *sm = &sr_main; ip6_sr_policy_t *sr_policy = 0; ip6_sr_sl_t *segment_list; - ip6_address_t *key_copy; u32 *sl_index; uword *p; - hash_pair_t *hp; if (bsid) { - p = hash_get_mem (sm->sr_policy_index_by_key, bsid); + p = mhash_get (&sm->sr_policies_index_hash, bsid); if (p) sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); else @@ -692,14 +689,11 @@ sr_policy_del (ip6_address_t * bsid, u32 index) } /* Remove SR policy entry */ - hp = hash_get_pair (sm->sr_policy_index_by_key, &sr_policy->bsid); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->sr_policy_index_by_key, &sr_policy->bsid); - vec_free (key_copy); + mhash_unset (&sm->sr_policies_index_hash, &sr_policy->bsid, NULL); pool_put (sm->sr_policies, sr_policy); /* If FIB empty unlock it */ - if (!pool_elts (sm->sr_policies)) + if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) { fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); @@ -741,7 +735,7 @@ sr_policy_mod (ip6_address_t * bsid, u32 index, u32 fib_table, if (bsid) { - p = hash_get_mem (sm->sr_policy_index_by_key, bsid); + p = mhash_get (&sm->sr_policies_index_hash, bsid); if (p) sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); else @@ -3208,8 +3202,8 @@ sr_policy_rewrite_init (vlib_main_t * vm) ip6_sr_main_t *sm = &sr_main; /* Init memory for sr policy keys (bsid <-> ip6_address_t) */ - sm->sr_policy_index_by_key = hash_create_mem (0, sizeof (ip6_address_t), - sizeof (uword)); + mhash_init (&sm->sr_policies_index_hash, sizeof (uword), + sizeof (ip6_address_t)); /* Init SR VPO DPOs type */ sr_pr_encaps_dpo_type = diff --git a/src/vnet/sr/sr_steering.c b/src/vnet/sr/sr_steering.c index 86d6f27c..be1bb7ad 100755 --- a/src/vnet/sr/sr_steering.c +++ b/src/vnet/sr/sr_steering.c @@ -62,14 +62,14 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, u32 sw_if_index, u8 traffic_type) { ip6_sr_main_t *sm = &sr_main; - sr_steering_key_t key, *key_copy; + sr_steering_key_t key; ip6_sr_steering_policy_t *steer_pl; fib_prefix_t pfx = { 0 }; ip6_sr_policy_t *sr_policy = 0; uword *p = 0; - hash_pair_t *hp; + memset (&key, 0, sizeof (sr_steering_key_t)); /* Compute the steer policy key */ if (prefix) @@ -78,6 +78,8 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, key.l3.prefix.as_u64[1] = prefix->as_u64[1]; key.l3.mask_width = mask_width; key.l3.fib_table = (table_id != (u32) ~ 0 ? table_id : 0); + if (traffic_type != SR_STEER_IPV4 && traffic_type != SR_STEER_IPV6) + return -1; } else { @@ -92,12 +94,14 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, vnet_get_sw_interface (sm->vnet_main, sw_if_index); if (sw->type != VNET_SW_INTERFACE_TYPE_HARDWARE) return -3; + if (traffic_type != SR_STEER_L2) + return -1; } key.traffic_type = traffic_type; /* Search for the item */ - p = hash_get_mem (sm->steer_policies_index_by_key, &key); + p = mhash_get (&sm->sr_steer_policies_hash, &key); if (p) { @@ -152,10 +156,17 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, /* Delete SR steering policy entry */ pool_put (sm->steer_policies, steer_pl); - hp = hash_get_pair (sm->steer_policies_index_by_key, &key); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->steer_policies_index_by_key, &key); - vec_free (key_copy); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + + /* If no more SR policies or steering policies */ + if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) + { + fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); + fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); + sm->fib_table_ip6 = (u32) ~ 0; + sm->fib_table_ip4 = (u32) ~ 0; + } + return 1; } else /* It means user requested to update an existing SR steering policy */ @@ -163,7 +174,7 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, /* Retrieve SR steering policy */ if (bsid) { - p = hash_get_mem (sm->sr_policy_index_by_key, bsid); + p = mhash_get (&sm->sr_policies_index_hash, bsid); if (p) sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); else @@ -192,7 +203,6 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, /* Create a new one */ goto update_fib; - } else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) { @@ -224,7 +234,7 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, /* Retrieve SR policy */ if (bsid) { - p = hash_get_mem (sm->sr_policy_index_by_key, bsid); + p = mhash_get (&sm->sr_policies_index_hash, bsid); if (p) sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); else @@ -255,19 +265,14 @@ sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, { /* Incorrect API usage. Should never get here */ pool_put (sm->steer_policies, steer_pl); - hp = hash_get_pair (sm->steer_policies_index_by_key, &key); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->steer_policies_index_by_key, &key); - vec_free (key_copy); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); return -1; } steer_pl->sr_policy = sr_policy - sm->sr_policies; /* Create and store key */ - key_copy = vec_new (sr_steering_key_t, 1); - clib_memcpy (key_copy, &key, sizeof (sr_steering_key_t)); - hash_set_mem (sm->steer_policies_index_by_key, - key_copy, steer_pl - sm->steer_policies); + mhash_set (&sm->sr_steer_policies_hash, &key, steer_pl - sm->steer_policies, + NULL); if (traffic_type == SR_STEER_L2) { @@ -345,18 +350,12 @@ update_fib: cleanup_error_encap: pool_put (sm->steer_policies, steer_pl); - hp = hash_get_pair (sm->steer_policies_index_by_key, &key); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->steer_policies_index_by_key, &key); - vec_free (key_copy); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); return -5; cleanup_error_redirection: pool_put (sm->steer_policies, steer_pl); - hp = hash_get_pair (sm->steer_policies_index_by_key, &key); - key_copy = (void *) (hp->key); - hash_unset_mem (sm->steer_policies_index_by_key, &key); - vec_free (key_copy); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); return -3; } @@ -379,6 +378,8 @@ sr_steer_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, u8 sr_policy_set = 0; + memset (&prefix, 0, sizeof (ip46_address_t)); + int rv; while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) { @@ -450,7 +451,7 @@ sr_steer_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, "Unable to do SW redirect. Incorrect interface."); case -4: return clib_error_return (0, - "The requested SR policy could not be deleted. Review the BSID/index."); + "The requested SR steering policy could not be deleted."); case -5: return clib_error_return (0, "The SR policy is not an encapsulation one."); @@ -541,8 +542,8 @@ sr_steering_init (vlib_main_t * vm) ip6_sr_main_t *sm = &sr_main; /* Init memory for function keys */ - sm->steer_policies_index_by_key = - hash_create_mem (0, sizeof (sr_steering_key_t), sizeof (uword)); + mhash_init (&sm->sr_steer_policies_hash, sizeof (uword), + sizeof (sr_steering_key_t)); sm->sw_iface_sr_policies = 0; diff --git a/src/vnet/sr/sr_steering.md b/src/vnet/sr/sr_steering.md new file mode 100644 index 00000000..cf446f81 --- /dev/null +++ b/src/vnet/sr/sr_steering.md @@ -0,0 +1,11 @@ +# Steering packets into a SR Policy {#srv6_steering_doc} + +To steer packets in Transit into an SR policy (T.Insert, T.Encaps and T.Encaps.L2 behaviors), the user needs to create an 'sr steering policy'. + + sr steer l3 2001::/64 via sr policy index 1 + sr steer l3 2001::/64 via sr policy bsid cafe::1 + sr steer l3 2001::/64 via sr policy bsid cafe::1 fib-table 3 + sr steer l3 10.0.0.0/16 via sr policy bsid cafe::1 + sr steer l2 TenGE0/1/0 via sr policy bsid cafe::1 + +Disclaimer: The T.Encaps.L2 will steer L2 frames into an SR Policy. Notice that creating an SR steering policy for L2 frames will actually automatically *put the interface into promiscous mode*. -- cgit 1.2.3-korg From 5d73eecd63018db69b10bf56adeec9cc5cf92790 Mon Sep 17 00:00:00 2001 From: Pablo Camarillo Date: Mon, 24 Apr 2017 17:51:56 +0200 Subject: First commit SR MPLS Change-Id: I961685a2a0e4c314049444c64eb6ccf877c278dd Signed-off-by: Pablo Camarillo --- MAINTAINERS | 6 +- doxygen/user_doc.md | 1 + .../srv6-sample-localsid/srv6_localsid_sample.h | 4 +- .../srv6_sample_localsid_doc.md | 2 +- src/plugins/ioam/ip6/ioam_cache.h | 2 +- .../ioam/ip6/ioam_cache_tunnel_select_node.c | 2 +- src/plugins/ioam/udp-ping/udp_ping_node.c | 2 +- src/scripts/vnet/sr/left-linux-ping.sh | 3 - src/scripts/vnet/sr/leftpeer.conf | 27 - src/scripts/vnet/sr/mcast | 58 - src/scripts/vnet/sr/right-linux-ping.sh | 4 - src/scripts/vnet/sr/rightpeer.conf | 22 - src/scripts/vnet/sr/sr_mpls | 11 + src/scripts/vnet/sr/srlocal.sh | 4 - src/vnet.am | 32 +- src/vnet/dpo/mpls_label_dpo.c | 2 +- src/vnet/fib/fib_entry.h | 8 +- src/vnet/sr/dir.dox | 25 - src/vnet/sr/ietf_draft_05.txt | 1564 ---------- src/vnet/sr/sr.api | 168 - src/vnet/sr/sr.c | 57 - src/vnet/sr/sr.h | 323 -- src/vnet/sr/sr_api.c | 244 -- src/vnet/sr/sr_doc.md | 55 - src/vnet/sr/sr_localsid.c | 1492 --------- src/vnet/sr/sr_localsid.md | 58 - src/vnet/sr/sr_packet.h | 159 - src/vnet/sr/sr_policy.md | 56 - src/vnet/sr/sr_policy_rewrite.c | 3227 -------------------- src/vnet/sr/sr_steering.c | 573 ---- src/vnet/sr/sr_steering.md | 11 - src/vnet/srmpls/dir.dox | 22 + src/vnet/srmpls/sr.h | 152 + src/vnet/srmpls/sr_doc.md | 87 + src/vnet/srmpls/sr_mpls_policy.c | 569 ++++ src/vnet/srmpls/sr_mpls_steering.c | 453 +++ src/vnet/srv6/dir.dox | 25 + src/vnet/srv6/ietf_draft_05.txt | 1564 ++++++++++ src/vnet/srv6/sr.api | 168 + src/vnet/srv6/sr.c | 57 + src/vnet/srv6/sr.h | 325 ++ src/vnet/srv6/sr_api.c | 244 ++ src/vnet/srv6/sr_doc.md | 55 + src/vnet/srv6/sr_localsid.c | 1492 +++++++++ src/vnet/srv6/sr_localsid.md | 58 + src/vnet/srv6/sr_packet.h | 159 + src/vnet/srv6/sr_policy.md | 56 + src/vnet/srv6/sr_policy_rewrite.c | 3227 ++++++++++++++++++++ src/vnet/srv6/sr_steering.c | 573 ++++ src/vnet/srv6/sr_steering.md | 11 + src/vnet/vnet_all_api_h.h | 2 +- src/vpp/api/api.c | 2 +- src/vpp/api/custom_dump.c | 2 +- src/vpp/api/vpe.api | 2 +- 54 files changed, 9349 insertions(+), 8158 deletions(-) delete mode 100755 src/scripts/vnet/sr/left-linux-ping.sh delete mode 100644 src/scripts/vnet/sr/leftpeer.conf delete mode 100644 src/scripts/vnet/sr/mcast delete mode 100755 src/scripts/vnet/sr/right-linux-ping.sh delete mode 100644 src/scripts/vnet/sr/rightpeer.conf create mode 100644 src/scripts/vnet/sr/sr_mpls delete mode 100755 src/scripts/vnet/sr/srlocal.sh delete mode 100755 src/vnet/sr/dir.dox delete mode 100755 src/vnet/sr/ietf_draft_05.txt delete mode 100644 src/vnet/sr/sr.api delete mode 100755 src/vnet/sr/sr.c delete mode 100755 src/vnet/sr/sr.h delete mode 100644 src/vnet/sr/sr_api.c delete mode 100644 src/vnet/sr/sr_doc.md delete mode 100755 src/vnet/sr/sr_localsid.c delete mode 100644 src/vnet/sr/sr_localsid.md delete mode 100755 src/vnet/sr/sr_packet.h delete mode 100644 src/vnet/sr/sr_policy.md delete mode 100755 src/vnet/sr/sr_policy_rewrite.c delete mode 100755 src/vnet/sr/sr_steering.c delete mode 100644 src/vnet/sr/sr_steering.md create mode 100755 src/vnet/srmpls/dir.dox create mode 100755 src/vnet/srmpls/sr.h create mode 100644 src/vnet/srmpls/sr_doc.md create mode 100755 src/vnet/srmpls/sr_mpls_policy.c create mode 100755 src/vnet/srmpls/sr_mpls_steering.c create mode 100755 src/vnet/srv6/dir.dox create mode 100755 src/vnet/srv6/ietf_draft_05.txt create mode 100644 src/vnet/srv6/sr.api create mode 100755 src/vnet/srv6/sr.c create mode 100755 src/vnet/srv6/sr.h create mode 100644 src/vnet/srv6/sr_api.c create mode 100644 src/vnet/srv6/sr_doc.md create mode 100755 src/vnet/srv6/sr_localsid.c create mode 100644 src/vnet/srv6/sr_localsid.md create mode 100755 src/vnet/srv6/sr_packet.h create mode 100644 src/vnet/srv6/sr_policy.md create mode 100755 src/vnet/srv6/sr_policy_rewrite.c create mode 100755 src/vnet/srv6/sr_steering.c create mode 100644 src/vnet/srv6/sr_steering.md (limited to 'doxygen/user_doc.md') diff --git a/MAINTAINERS b/MAINTAINERS index bdc33abe..2f198319 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -76,9 +76,11 @@ VNET IPv4 and IPv6 LPM M: Dave Barach F: src/vnet/ip/ -VNET IPv6 Segment Routing +VNET Segment Routing (IPv6 and MPLS) M: Pablo Camarillo -F: src/vnet/sr/ +F: src/vnet/srv6/ +F: src/vnet/srmpls/ +F: src/examples/srv6-sample-localsid/ VNET IPSec M: Sergio Gonzalez Monroy diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 29df6156..d052c53b 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -15,3 +15,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage qos_doc - @subpage span_doc - @subpage srv6_doc +- @subpage srmpls_doc diff --git a/src/examples/srv6-sample-localsid/srv6_localsid_sample.h b/src/examples/srv6-sample-localsid/srv6_localsid_sample.h index 474b5de2..ef74ea3e 100644 --- a/src/examples/srv6-sample-localsid/srv6_localsid_sample.h +++ b/src/examples/srv6-sample-localsid/srv6_localsid_sample.h @@ -17,8 +17,8 @@ #include #include -#include -#include +#include +#include #include #include diff --git a/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md b/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md index 78e91ab3..cd717db8 100644 --- a/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md +++ b/src/examples/srv6-sample-localsid/srv6_sample_localsid_doc.md @@ -26,5 +26,5 @@ Notice that the plugin only 'defines' a new SRv6 LocalSID behavior, but the exis ## Graph node The current graph node uses the function 'end_srh_processing' to do the Segment Routing Endpoint behavior. Notice that it does not allow the cleanup of a Segment Routing header (as per the SRv6 behavior specs). -This function is identical to the one found in /src/vnet/sr/sr_localsid.c +This function is identical to the one found in /src/vnet/srv6/sr_localsid.c In case that by some other reason you want to do decapsulation, or SRH clean_up you can use the functions 'end_decaps_srh_processing' or 'end_psp_srh_processing' respectively. diff --git a/src/plugins/ioam/ip6/ioam_cache.h b/src/plugins/ioam/ip6/ioam_cache.h index 3f69fa72..e668ad7f 100644 --- a/src/plugins/ioam/ip6/ioam_cache.h +++ b/src/plugins/ioam/ip6/ioam_cache.h @@ -20,7 +20,7 @@ #include #include #include -#include +#include #include #include diff --git a/src/plugins/ioam/ip6/ioam_cache_tunnel_select_node.c b/src/plugins/ioam/ip6/ioam_cache_tunnel_select_node.c index 0cf742c9..ca06607d 100644 --- a/src/plugins/ioam/ip6/ioam_cache_tunnel_select_node.c +++ b/src/plugins/ioam/ip6/ioam_cache_tunnel_select_node.c @@ -43,7 +43,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/plugins/ioam/udp-ping/udp_ping_node.c b/src/plugins/ioam/udp-ping/udp_ping_node.c index 84759b0f..e1a57955 100644 --- a/src/plugins/ioam/udp-ping/udp_ping_node.c +++ b/src/plugins/ioam/udp-ping/udp_ping_node.c @@ -25,7 +25,7 @@ #include #include #include -#include +#include typedef enum { diff --git a/src/scripts/vnet/sr/left-linux-ping.sh b/src/scripts/vnet/sr/left-linux-ping.sh deleted file mode 100755 index 55b83506..00000000 --- a/src/scripts/vnet/sr/left-linux-ping.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/bash -ifconfig eth2 inet6 add db02::1/64 -route -A inet6 add db04::1/128 gw db02::2 diff --git a/src/scripts/vnet/sr/leftpeer.conf b/src/scripts/vnet/sr/leftpeer.conf deleted file mode 100644 index 9591d968..00000000 --- a/src/scripts/vnet/sr/leftpeer.conf +++ /dev/null @@ -1,27 +0,0 @@ -comment { test sr segment chunk-offset on } -test sr hmac validate on - -comment { trunk to rightpeer } -set int ip address GigabitEthernet2/3/0 db03::2/64 -enable ip6 interface GigabitEthernet2/3/0 -set int state GigabitEthernet2/3/0 up - -comment { subscriber left-linux-ping } -set int ip address GigabitEthernet2/2/0 db02::2/64 -enable ip6 interface GigabitEthernet2/2/0 -set int state GigabitEthernet2/2/0 up - -sr hmac id 2 key Gozzer -sr hmac id 3 key Hoser - -sr tunnel src db01::1 dst db04::1/128 next db03::1 next db04::1 tag db02::2 clean key Gozzer InPE 1 - -comment { sr unaware service chaining to db03::5 } -comment { sr tunnel src db01::1 dst db04::1/128 next db03::1 next db03::5 next db04::1 tag db02::2 clean key Gozzer InPE 1 } - -comment { tap connect srlocal hwaddr random } -comment { set int ip6 table tap-0 1 } -comment { set int ip address tap-0 db04::99/64 } -comment { enable ip6 interface tap-0 } -comment { set int state tap-0 up } -comment { ip route add table 1 db02::0/64 lookup in table 0 } diff --git a/src/scripts/vnet/sr/mcast b/src/scripts/vnet/sr/mcast deleted file mode 100644 index 50e73efa..00000000 --- a/src/scripts/vnet/sr/mcast +++ /dev/null @@ -1,58 +0,0 @@ - -loop create -loop create -loop create -loop create - -set int state loop0 up -set int state loop1 up -set int state loop2 up -set int state loop3 up - -set int ip address loop0 2001::1/64 -set int ip address loop1 2001:1::1/64 -set int ip address loop2 2001:2::1/64 -set int ip address loop3 2001:3::1/64 - -set ip6 neighbor loop1 2001:1::2 00:00:dd:ee:cc:d1 -set ip6 neighbor loop2 2001:2::2 00:00:dd:ee:cc:d2 -set ip6 neighbor loop3 2001:3::2 00:00:dd:ee:cc:d3 - -ip route 3001::1/128 via 2001:1::2 loop1 -ip route 3001::2/128 via 2001:2::2 loop2 -ip route 3001::3/128 via 2001:3::2 loop3 - -sr tunnel name SR1 src aaaa::2:1 dst ff19::1/128 next 3001::1 clean -sr tunnel name SR2 src aaaa::2:2 dst ff19::2/128 next 3001::2 clean -sr tunnel name SR3 src aaaa::2:3 dst ff19::3/128 next 3001::3 clean - -sr policy name MCAST1 tunnel SR1 tunnel SR2 tunnel SR3 - -sr multicast-map address ff18::1 sr-policy MCAST1 - -packet-generator new { - name x - limit 1 - node ethernet-input - size 64-64 - no-recycle - data { - IP6: 1.2.3 -> 4.5.6 - ICMP: 3002::2 -> ff18::1 - ICMP echo_request - incrementing 100 - } -} -trace add pg-input 100 - -sr multicast-map del address ff18::1 sr-policy MCAST1 -sr policy del name MCAST1 tunnel SR1 tunnel SR2 tunnel SR3 - -ip route del 3001::1/128 via 2001:1::2 loop1 -ip route del 3001::2/128 via 2001:2::2 loop2 -ip route del 3001::3/128 via 2001:3::2 loop3 - -sr tunnel del name SR1 src aaaa::2:1 dst ff19::1/128 next 3001::1 clean -sr tunnel del name SR2 src aaaa::2:2 dst ff19::2/128 next 3001::2 clean -sr tunnel del name SR3 src aaaa::2:3 dst ff19::3/128 next 3001::3 clean - diff --git a/src/scripts/vnet/sr/right-linux-ping.sh b/src/scripts/vnet/sr/right-linux-ping.sh deleted file mode 100755 index 029368db..00000000 --- a/src/scripts/vnet/sr/right-linux-ping.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash - -ifconfig eth1 inet6 add db04::1/64 -route -A inet6 add db02::1/128 gw db04::2 diff --git a/src/scripts/vnet/sr/rightpeer.conf b/src/scripts/vnet/sr/rightpeer.conf deleted file mode 100644 index 6da7a7af..00000000 --- a/src/scripts/vnet/sr/rightpeer.conf +++ /dev/null @@ -1,22 +0,0 @@ -comment { trunk to leftpeer } -set int ip address GigabitEthernet2/0/0 db03::1/64 -enable ip6 interface GigabitEthernet2/0/0 -set int state GigabitEthernet2/0/0 up - -comment { subscriber right-linux-ping } -set int ip address GigabitEthernet2/2/0 db04::2/64 - -comment { next address to fake out ND on shared LAN segment } -set int ip address GigabitEthernet2/2/0 db02::13/64 - -enable ip6 interface GigabitEthernet2/2/0 -set int state GigabitEthernet2/2/0 up - -sr tunnel src db04::1 dst db02::1/128 next db03::2 next db02::1 tag db04::2 clean - -tap connect srlocal hwaddr random -set int ip6 table tap-0 1 -set int ip address tap-0 db04::99/64 -enable ip6 interface tap-0 -set int state tap-0 up -ip route add table 1 db02::0/64 lookup in table 0 diff --git a/src/scripts/vnet/sr/sr_mpls b/src/scripts/vnet/sr/sr_mpls new file mode 100644 index 00000000..4646372a --- /dev/null +++ b/src/scripts/vnet/sr/sr_mpls @@ -0,0 +1,11 @@ +set interface mpls local0 enable +sr mpls policy add bsid 20001 next 16001 next 16002 next 16003 +sr mpls steer l3 a::/112 via sr policy bsid 20001 + +loop create +set int state loop0 up + +set int ip address loop0 11.0.0.1/24 +set ip arp loop0 11.0.0.2 00:00:11:aa:bb:cc + +mpls local-label 16001 via 11.0.0.2 loop0 out-label 16001 diff --git a/src/scripts/vnet/sr/srlocal.sh b/src/scripts/vnet/sr/srlocal.sh deleted file mode 100755 index 2f568408..00000000 --- a/src/scripts/vnet/sr/srlocal.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash - -ifconfig srlocal inet6 add db04::1/64 -route -6 add db02::0/64 gw db04::99 diff --git a/src/vnet.am b/src/vnet.am index 6e35df87..121d1a9c 100644 --- a/src/vnet.am +++ b/src/vnet.am @@ -697,21 +697,31 @@ nobase_include_HEADERS += \ # ipv6 segment routing ######################################## -if WITH_LIBSSL libvnet_la_SOURCES += \ - vnet/sr/sr.c \ - vnet/sr/sr_localsid.c \ - vnet/sr/sr_policy_rewrite.c \ - vnet/sr/sr_steering.c \ - vnet/sr/sr_api.c -endif + vnet/srv6/sr.c \ + vnet/srv6/sr_localsid.c \ + vnet/srv6/sr_policy_rewrite.c \ + vnet/srv6/sr_steering.c \ + vnet/srv6/sr_api.c nobase_include_HEADERS += \ - vnet/sr/sr_packet.h \ - vnet/sr/sr.h \ - vnet/sr/sr.api.h + vnet/srv6/sr_packet.h \ + vnet/srv6/sr.h \ + vnet/srv6/sr.api.h + +API_FILES += vnet/srv6/sr.api + +######################################## +# mpls segment routing +######################################## + +libvnet_la_SOURCES += \ + vnet/srmpls/sr_mpls_policy.c \ + vnet/srmpls/sr_mpls_steering.c -API_FILES += vnet/sr/sr.api + +nobase_include_HEADERS += \ + vnet/srmpls/sr.h ######################################## # IPFIX / netflow v10 diff --git a/src/vnet/dpo/mpls_label_dpo.c b/src/vnet/dpo/mpls_label_dpo.c index 4d84b900..18479531 100644 --- a/src/vnet/dpo/mpls_label_dpo.c +++ b/src/vnet/dpo/mpls_label_dpo.c @@ -171,7 +171,7 @@ mpls_label_paint (vlib_buffer_t * b0, hdr0 = vlib_buffer_get_current(b0); - if (PREDICT_TRUE(1 == mld0->mld_n_labels)) + if (1 == mld0->mld_n_labels) { /* optimise for the common case of one label */ *hdr0 = mld0->mld_hdr[0]; diff --git a/src/vnet/fib/fib_entry.h b/src/vnet/fib/fib_entry.h index 2196079b..aa1000e0 100644 --- a/src/vnet/fib/fib_entry.h +++ b/src/vnet/fib/fib_entry.h @@ -48,6 +48,10 @@ typedef enum fib_source_t_ { * that is from confiiguration on an interface, not a 'ip route' command */ FIB_SOURCE_INTERFACE, + /** + * SRv6 and SR-MPLS + */ + FIB_SOURCE_SR, /** * A high priority source a plugin can use */ @@ -64,10 +68,6 @@ typedef enum fib_source_t_ { * LISP */ FIB_SOURCE_LISP, - /** - * SRv6 - */ - FIB_SOURCE_SR, /** * IPv[46] Mapping */ diff --git a/src/vnet/sr/dir.dox b/src/vnet/sr/dir.dox deleted file mode 100755 index 3f539a58..00000000 --- a/src/vnet/sr/dir.dox +++ /dev/null @@ -1,25 +0,0 @@ -/* - * - * Copyright (c) 2013 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -/** - @dir - @brief Segment Routing code - - An implementation of Segment Routing as per: - draft-ietf-6man-segment-routing-header-05 - - @see ietf_draft_05.txt - -*/ \ No newline at end of file diff --git a/src/vnet/sr/ietf_draft_05.txt b/src/vnet/sr/ietf_draft_05.txt deleted file mode 100755 index e9bff04f..00000000 --- a/src/vnet/sr/ietf_draft_05.txt +++ /dev/null @@ -1,1564 +0,0 @@ -Network Working Group S. Previdi, Ed. -Internet-Draft C. Filsfils -Intended status: Standards Track Cisco Systems, Inc. -Expires: August 5, 2017 B. Field - Comcast - I. Leung - Rogers Communications - J. Linkova - Google - E. Aries - Facebook - T. Kosugi - NTT - E. Vyncke - Cisco Systems, Inc. - D. Lebrun - Universite Catholique de Louvain - February 1, 2017 - - - IPv6 Segment Routing Header (SRH) - draft-ietf-6man-segment-routing-header-05 - -Abstract - - Segment Routing (SR) allows a node to steer a packet through a - controlled set of instructions, called segments, by prepending an SR - header to the packet. A segment can represent any instruction, - topological or service-based. SR allows to enforce a flow through - any path (topological, or application/service based) while - maintaining per-flow state only at the ingress node to the SR domain. - - Segment Routing can be applied to the IPv6 data plane with the - addition of a new type of Routing Extension Header. This draft - describes the Segment Routing Extension Header Type and how it is - used by SR capable nodes. - -Requirements Language - - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", - "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this - document are to be interpreted as described in RFC 2119 [RFC2119]. - -Status of This Memo - - This Internet-Draft is submitted in full conformance with the - provisions of BCP 78 and BCP 79. - - - - -Previdi, et al. Expires August 5, 2017 [Page 1] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - Internet-Drafts are working documents of the Internet Engineering - Task Force (IETF). Note that other groups may also distribute - working documents as Internet-Drafts. The list of current Internet- - Drafts is at http://datatracker.ietf.org/drafts/current/. - - Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as "work in progress." - - This Internet-Draft will expire on August 5, 2017. - -Copyright Notice - - Copyright (c) 2017 IETF Trust and the persons identified as the - document authors. All rights reserved. - - This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (http://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with respect - to this document. Code Components extracted from this document must - include Simplified BSD License text as described in Section 4.e of - the Trust Legal Provisions and are provided without warranty as - described in the Simplified BSD License. - -Table of Contents - - 1. Segment Routing Documents . . . . . . . . . . . . . . . . . . 3 - 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 - 2.1. Data Planes supporting Segment Routing . . . . . . . . . 4 - 2.2. Segment Routing (SR) Domain . . . . . . . . . . . . . . . 4 - 2.2.1. SR Domain in a Service Provider Network . . . . . . . 5 - 2.2.2. SR Domain in a Overlay Network . . . . . . . . . . . 6 - 3. Segment Routing Extension Header (SRH) . . . . . . . . . . . 7 - 3.1. SRH TLVs . . . . . . . . . . . . . . . . . . . . . . . . 9 - 3.1.1. Ingress Node TLV . . . . . . . . . . . . . . . . . . 10 - 3.1.2. Egress Node TLV . . . . . . . . . . . . . . . . . . . 11 - 3.1.3. Opaque Container TLV . . . . . . . . . . . . . . . . 11 - 3.1.4. Padding TLV . . . . . . . . . . . . . . . . . . . . . 12 - 3.1.5. HMAC TLV . . . . . . . . . . . . . . . . . . . . . . 13 - 3.2. SRH and RFC2460 behavior . . . . . . . . . . . . . . . . 14 - 4. SRH Procedures . . . . . . . . . . . . . . . . . . . . . . . 14 - 4.1. Source SR Node . . . . . . . . . . . . . . . . . . . . . 14 - 4.2. Transit Node . . . . . . . . . . . . . . . . . . . . . . 15 - 4.3. SR Segment Endpoint Node . . . . . . . . . . . . . . . . 16 - 5. Security Considerations . . . . . . . . . . . . . . . . . . . 16 - - - -Previdi, et al. Expires August 5, 2017 [Page 2] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - 5.1. Threat model . . . . . . . . . . . . . . . . . . . . . . 17 - 5.1.1. Source routing threats . . . . . . . . . . . . . . . 17 - 5.1.2. Applicability of RFC 5095 to SRH . . . . . . . . . . 17 - 5.1.3. Service stealing threat . . . . . . . . . . . . . . . 18 - 5.1.4. Topology disclosure . . . . . . . . . . . . . . . . . 18 - 5.1.5. ICMP Generation . . . . . . . . . . . . . . . . . . . 18 - 5.2. Security fields in SRH . . . . . . . . . . . . . . . . . 19 - 5.2.1. Selecting a hash algorithm . . . . . . . . . . . . . 20 - 5.2.2. Performance impact of HMAC . . . . . . . . . . . . . 21 - 5.2.3. Pre-shared key management . . . . . . . . . . . . . . 21 - 5.3. Deployment Models . . . . . . . . . . . . . . . . . . . . 22 - 5.3.1. Nodes within the SR domain . . . . . . . . . . . . . 22 - 5.3.2. Nodes outside of the SR domain . . . . . . . . . . . 22 - 5.3.3. SR path exposure . . . . . . . . . . . . . . . . . . 23 - 5.3.4. Impact of BCP-38 . . . . . . . . . . . . . . . . . . 23 - 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 - 7. Manageability Considerations . . . . . . . . . . . . . . . . 24 - 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 - 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24 - 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 25 - 10.1. Normative References . . . . . . . . . . . . . . . . . . 25 - 10.2. Informative References . . . . . . . . . . . . . . . . . 25 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 - -1. Segment Routing Documents - - Segment Routing terminology is defined in - [I-D.ietf-spring-segment-routing]. - - Segment Routing use cases are described in [RFC7855] and - [I-D.ietf-spring-ipv6-use-cases]. - - Segment Routing protocol extensions are defined in - [I-D.ietf-isis-segment-routing-extensions], and - [I-D.ietf-ospf-ospfv3-segment-routing-extensions]. - -2. Introduction - - Segment Routing (SR), defined in [I-D.ietf-spring-segment-routing], - allows a node to steer a packet through a controlled set of - instructions, called segments, by prepending an SR header to the - packet. A segment can represent any instruction, topological or - service-based. SR allows to enforce a flow through any path - (topological or service/application based) while maintaining per-flow - state only at the ingress node to the SR domain. Segments can be - derived from different components: IGP, BGP, Services, Contexts, - Locators, etc. The list of segment forming the path is called the - Segment List and is encoded in the packet header. - - - -Previdi, et al. Expires August 5, 2017 [Page 3] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - SR allows the use of strict and loose source based routing paradigms - without requiring any additional signaling protocols in the - infrastructure hence delivering an excellent scalability property. - - The source based routing model described in - [I-D.ietf-spring-segment-routing] is inherited from the ones proposed - by [RFC1940] and [RFC2460]. The source based routing model offers - the support for explicit routing capability. - -2.1. Data Planes supporting Segment Routing - - Segment Routing (SR), can be instantiated over MPLS - ([I-D.ietf-spring-segment-routing-mpls]) and IPv6. This document - defines its instantiation over the IPv6 data-plane based on the use- - cases defined in [I-D.ietf-spring-ipv6-use-cases]. - - This document defines a new type of Routing Header (originally - defined in [RFC2460]) called the Segment Routing Header (SRH) in - order to convey the Segment List in the packet header as defined in - [I-D.ietf-spring-segment-routing]. Mechanisms through which segment - are known and advertised are outside the scope of this document. - - A segment is materialized by an IPv6 address. A segment identifies a - topological instruction or a service instruction. A segment can be - either: - - o global: a global segment represents an instruction supported by - all nodes in the SR domain and it is instantiated through an IPv6 - address globally known in the SR domain. - - o local: a local segment represents an instruction supported only by - the node who originates it and it is instantiated through an IPv6 - address that is known only by the local node. - -2.2. Segment Routing (SR) Domain - - We define the concept of the Segment Routing Domain (SR Domain) as - the set of nodes participating into the source based routing model. - These nodes may be connected to the same physical infrastructure - (e.g.: a Service Provider's network) as well as nodes remotely - connected to each other (e.g.: an enterprise VPN or an overlay). - - A non-exhaustive list of examples of SR Domains is: - - o The network of an operator, service provider, content provider, - enterprise including nodes, links and Autonomous Systems. - - - - - -Previdi, et al. Expires August 5, 2017 [Page 4] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o A set of nodes connected as an overlay over one or more transit - providers. The overlay nodes exchange SR-enabled traffic with - segments belonging solely to the overlay routers (the SR domain). - None of the segments in the SR-enabled packets exchanged by the - overlay belong to the transit networks - - The source based routing model through its instantiation of the - Segment Routing Header (SRH) defined in this document equally applies - to all the above examples. - - It is assumed in this document that the SRH is added to the packet by - its source, consistently with the source routing model defined in - [RFC2460]. For example: - - o At the node originating the packet (host, server). - - o At the ingress node of an SR domain where the ingress node - receives an IPv6 packet and encapsulates it into an outer IPv6 - header followed by a Segment Routing header. - -2.2.1. SR Domain in a Service Provider Network - - The following figure illustrates an SR domain consisting of an - operator's network infrastructure. - - (-------------------------- Operator 1 -----------------------) - ( ) - ( (-----AS 1-----) (-------AS 2-------) (----AS 3-------) ) - ( ( ) ( ) ( ) ) - A1--(--(--11---13--14-)--(-21---22---23--24-)--(-31---32---34--)--)--Z1 - ( ( /|\ /|\ /| ) ( |\ /|\ /|\ /| ) ( |\ /|\ /| \ ) ) - A2--(--(/ | \/ | \/ | ) ( | \/ | \/ | \/ | ) ( | \/ | \/ | \)--)--Z2 - ( ( | /\ | /\ | ) ( | /\ | /\ | /\ | ) ( | /\ | /\ | ) ) - ( ( |/ \|/ \| ) ( |/ \|/ \|/ \| ) ( |/ \|/ \| ) ) - A3--(--(--15---17--18-)--(-25---26---27--28-)--(-35---36---38--)--)--Z3 - ( ( ) ( ) ( ) ) - ( (--------------) (------------------) (---------------) ) - ( ) - (-------------------------------------------------------------) - - Figure 1: Service Provider SR Domain - - Figure 1 describes an operator network including several ASes and - delivering connectivity between endpoints. In this scenario, Segment - Routing is used within the operator networks and across the ASes - boundaries (all being under the control of the same operator). In - this case segment routing can be used in order to address use cases - such as end-to-end traffic engineering, fast re-route, egress peer - - - -Previdi, et al. Expires August 5, 2017 [Page 5] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - engineering, data-center traffic engineering as described in - [RFC7855], [I-D.ietf-spring-ipv6-use-cases] and - [I-D.ietf-spring-resiliency-use-cases]. - - Typically, an IPv6 packet received at ingress (i.e.: from outside the - SR domain), is classified according to network operator policies and - such classification results into an outer header with an SRH applied - to the incoming packet. The SRH contains the list of segment - representing the path the packet must take inside the SR domain. - Thus, the SA of the packet is the ingress node, the DA (due to SRH - procedures described in Section 4) is set as the first segment of the - path and the last segment of the path is the egress node of the SR - domain. - - The path may include intra-AS as well as inter-AS segments. It has - to be noted that all nodes within the SR domain are under control of - the same administration. When the packet reaches the egress point of - the SR domain, the outer header and its SRH are removed so that the - destination of the packet is unaware of the SR domain the packet has - traversed. - - The outer header with the SRH is no different from any other - tunneling encapsulation mechanism and allows a network operator to - implement traffic engineering mechanisms so to efficiently steer - traffic across his infrastructure. - -2.2.2. SR Domain in a Overlay Network - - The following figure illustrates an SR domain consisting of an - overlay network over multiple operator's networks. - - (--Operator 1---) (-----Operator 2-----) (--Operator 3---) - ( ) ( ) ( ) - A1--(--11---13--14--)--(--21---22---23--24--)--(-31---32---34--)--C1 - ( /|\ /|\ /| ) ( |\ /|\ /|\ /| ) ( |\ /|\ /| \ ) - A2--(/ | \/ | \/ | ) ( | \/ | \/ | \/ | ) ( | \/ | \/ | \)--C2 - ( | /\ | /\ | ) ( | /\ | /\ | /\ | ) ( | /\ | /\ | ) - ( |/ \|/ \| ) ( |/ \|/ \|/ \| ) ( |/ \|/ \| ) - A3--(--15---17--18--)--(--25---26---27--28--)--(-35---36---38--)--C3 - ( ) ( | | | ) ( ) - (---------------) (--|----|---------|--) (---------------) - | | | - B1 B2 B3 - - Figure 2: Overlay SR Domain - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 6] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - Figure 2 describes an overlay consisting of nodes connected to three - different network operators and forming a single overlay network - where Segment routing packets are exchanged. - - The overlay consists of nodes A1, A2, A3, B1, B2, B3, C1, C2 and C3. - These nodes are connected to their respective network operator and - form an overlay network. - - Each node may originate packets with an SRH which contains, in the - segment list of the SRH or in the DA, segments identifying other - overlay nodes. This implies that packets with an SRH may traverse - operator's networks but, obviously, these SRHs cannot contain an - address/segment of the transit operators 1, 2 and 3. The SRH - originated by the overlay can only contain address/segment under the - administration of the overlay (e.g. address/segments supported by A1, - A2, A3, B1, B2, B3, C1,C2 or C3). - - In this model, the operator network nodes are transit nodes and, - according to [RFC2460], MUST NOT inspect the routing extension header - since they are not the DA of the packet. - - It is a common practice in operators networks to filter out, at - ingress, any packet whose DA is the address of an internal node and - it is also possible that an operator would filter out any packet - destined to an internal address and having an extension header in it. - - This common practice does not impact the SR-enabled traffic between - the overlay nodes as the intermediate transit networks never see a - destination address belonging to their infrastructure. These SR- - enabled overlay packets will thus never be filtered by the transit - operators. - - In all cases, transit packets (i.e.: packets whose DA is outside the - domain of the operator's network) will be forwarded accordingly - without introducing any security concern in the operator's network. - This is similar to tunneled packets. - -3. Segment Routing Extension Header (SRH) - - A new type of the Routing Header (originally defined in [RFC2460]) is - defined: the Segment Routing Header (SRH) which has a new Routing - Type, (suggested value 4) to be assigned by IANA. - - The Segment Routing Header (SRH) is defined as follows: - - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 7] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Next Header | Hdr Ext Len | Routing Type | Segments Left | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | First Segment | Flags | RESERVED | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | Segment List[0] (128 bits IPv6 address) | - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | | - ... - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | Segment List[n] (128 bits IPv6 address) | - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - // // - // Optional Type Length Value objects (variable) // - // // - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Next Header: 8-bit selector. Identifies the type of header - immediately following the SRH. - - o Hdr Ext Len: 8-bit unsigned integer, is the length of the SRH - header in 8-octet units, not including the first 8 octets. - - o Routing Type: TBD, to be assigned by IANA (suggested value: 4). - - o Segments Left. Defined in [RFC2460], it contains the index, in - the Segment List, of the next segment to inspect. Segments Left - is decremented at each segment. - - o First Segment: contains the index, in the Segment List, of the - first segment of the path which is in fact the last element of the - Segment List. - - o Flags: 8 bits of flags. Following flags are defined: - - - - -Previdi, et al. Expires August 5, 2017 [Page 8] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - 0 1 2 3 4 5 6 7 - +-+-+-+-+-+-+-+-+ - |U|P|O|A|H| U | - +-+-+-+-+-+-+-+-+ - - U: Unused and for future use. SHOULD be unset on transmission - and MUST be ignored on receipt. - - P-flag: Protected flag. Set when the packet has been rerouted - through FRR mechanism by an SR endpoint node. - - O-flag: OAM flag. When set, it indicates that this packet is - an operations and management (OAM) packet. - - A-flag: Alert flag. If present, it means important Type Length - Value (TLV) objects are present. See Section 3.1 for details - on TLVs objects. - - H-flag: HMAC flag. If set, the HMAC TLV is present and is - encoded as the last TLV of the SRH. In other words, the last - 36 octets of the SRH represent the HMAC information. See - Section 3.1.5 for details on the HMAC TLV. - - o RESERVED: SHOULD be unset on transmission and MUST be ignored on - receipt. - - o Segment List[n]: 128 bit IPv6 addresses representing the nth - segment in the Segment List. The Segment List is encoded starting - from the last segment of the path. I.e., the first element of the - segment list (Segment List [0]) contains the last segment of the - path while the last segment of the Segment List (Segment List[n]) - contains the first segment of the path. The index contained in - "Segments Left" identifies the current active segment. - - o Type Length Value (TLV) are described in Section 3.1. - -3.1. SRH TLVs - - This section defines TLVs of the Segment Routing Header. - - Type Length Value (TLV) contain optional information that may be used - by the node identified in the DA of the packet. It has to be noted - that the information carried in the TLVs is not intended to be used - by the routing layer. Typically, TLVs carry information that is - consumed by other components (e.g.: OAM) than the routing function. - - Each TLV has its own length, format and semantic. The code-point - allocated (by IANA) to each TLV defines both the format and the - - - -Previdi, et al. Expires August 5, 2017 [Page 9] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - semantic of the information carried in the TLV. Multiple TLVs may be - encoded in the same SRH. - - The "Length" field of the TLV is primarily used to skip the TLV while - inspecting the SRH in case the node doesn't support or recognize the - TLV codepoint. The "Length" defines the TLV length in octets and not - including the "Type" and "Length" fields. - - The primary scope of TLVs is to give the receiver of the packet - information related to the source routed path (e.g.: where the packet - entered in the SR domain and where it is expected to exit). - - Additional TLVs may be defined in the future. - -3.1.1. Ingress Node TLV - - The Ingress Node TLV is optional and identifies the node this packet - traversed when entered the SR domain. The Ingress Node TLV has - following format: - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Type | Length | RESERVED | Flags | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | Ingress Node (16 octets) | - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Type: to be assigned by IANA (suggested value 1). - - o Length: 18. - - o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be - ignored on receipt. - - o Flags: 8 bits. No flags are defined in this document. - - o Ingress Node: 128 bits. Defines the node where the packet is - expected to enter the SR domain. In the encapsulation case - described in Section 2.2.1, this information corresponds to the SA - of the encapsulating header. - - - - - -Previdi, et al. Expires August 5, 2017 [Page 10] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - -3.1.2. Egress Node TLV - - The Egress Node TLV is optional and identifies the node this packet - is expected to traverse when exiting the SR domain. The Egress Node - TLV has following format: - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Type | Length | RESERVED | Flags | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | Egress Node (16 octets) | - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Type: to be assigned by IANA (suggested value 2). - - o Length: 18. - - o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be - ignored on receipt. - - o Flags: 8 bits. No flags are defined in this document. - - o Egress Node: 128 bits. Defines the node where the packet is - expected to exit the SR domain. In the encapsulation case - described in Section 2.2.1, this information corresponds to the - last segment of the SRH in the encapsulating header. - -3.1.3. Opaque Container TLV - - The Opaque Container TLV is optional and has the following format: - - - - - - - - - - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 11] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Type | Length | RESERVED | Flags | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | | - | Opaque Container (16 octets) | - | | - | | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Type: to be assigned by IANA (suggested value 3). - - o Length: 18. - - o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be - ignored on receipt. - - o Flags: 8 bits. No flags are defined in this document. - - o Opaque Container: 128 bits of opaque data not relevant for the - routing layer. Typically, this information is consumed by a non- - routing component of the node receiving the packet (i.e.: the node - in the DA). - -3.1.4. Padding TLV - - The Padding TLV is optional and with the purpose of aligning the SRH - on a 8 octet boundary. The Padding TLV has the following format: - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Type | Length | Padding (variable) | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - // Padding (variable) // - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Type: to be assigned by IANA (suggested value 4). - - o Length: 1 to 7 - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 12] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o Padding: from 1 to 7 octets of padding. Padding bits have no - semantic. They SHOULD be set to 0 on transmission and MUST be - ignored on receipt. - - The following applies to the Padding TLV: - - o Padding TLV is optional and MAY only appear once in the SRH. If - present, it MUST have a length between 1 and 7 octets. - - o The Padding TLV is used in order to align the SRH total length on - the 8 octet boundary. - - o When present, the Padding TLV MUST appear as the last TLV before - the HMAC TLV (if HMAC TLV is present). - - o When present, the Padding TLV MUST have a length from 1 to 7 in - order to align the SRH total lenght on a 8-octet boundary. - - o When a router inspecting the SRH encounters the Padding TLV, it - MUST assume that no other TLV (other than the HMAC) follow the - Padding TLV. - -3.1.5. HMAC TLV - - HMAC TLV is optional and contains the HMAC information. The HMAC TLV - has the following format: - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Type | Length | RESERVED | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | HMAC Key ID (4 octets) | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | // - | HMAC (32 octets) // - | // - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - where: - - o Type: to be assigned by IANA (suggested value 5). - - o Length: 38. - - o RESERVED: 2 octets. SHOULD be unset on transmission and MUST be - ignored on receipt. - - - - -Previdi, et al. Expires August 5, 2017 [Page 13] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o HMAC Key ID: 4 octets. - - o HMAC: 32 octets. - - o HMAC and HMAC Key ID usage is described in Section 5 - - The Following applies to the HMAC TLV: - - o When present, the HMAC TLV MUST be encoded as the last TLV of the - SRH. - - o If the HMAC TLV is present, the SRH H-Flag (Figure 4) MUST be set. - - o When the H-flag is set in the SRH, the router inspecting the SRH - MUST find the HMAC TLV in the last 38 octets of the SRH. - -3.2. SRH and RFC2460 behavior - - The SRH being a new type of the Routing Header, it also has the same - properties: - - SHOULD only appear once in the packet. - - Only the router whose address is in the DA field of the packet - header MUST inspect the SRH. - - Therefore, Segment Routing in IPv6 networks implies that the segment - identifier (i.e.: the IPv6 address of the segment) is moved into the - DA of the packet. - - The DA of the packet changes at each segment termination/completion - and therefore the final DA of the packet MUST be encoded as the last - segment of the path. - -4. SRH Procedures - - In this section we describe the different procedures on the SRH. - -4.1. Source SR Node - - A Source SR Node can be any node originating an IPv6 packet with its - IPv6 and Segment Routing Headers. This include either: - - A host originating an IPv6 packet. - - An SR domain ingress router encapsulating a received IPv6 packet - into an outer IPv6 header followed by an SRH. - - - - -Previdi, et al. Expires August 5, 2017 [Page 14] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - The mechanism through which a Segment List is derived is outside of - the scope of this document. As an example, the Segment List may be - obtained through: - - Local path computation. - - Local configuration. - - Interaction with a centralized controller delivering the path. - - Any other mechanism. - - The following are the steps of the creation of the SRH: - - Next Header and Hdr Ext Len fields are set according to [RFC2460]. - - Routing Type field is set as TBD (to be allocated by IANA, - suggested value 4). - - The Segment List is built with the FIRST segment of the path - encoded in the LAST element of the Segment List. Subsequent - segments are encoded on top of the first segment. Finally, the - LAST segment of the path is encoded in the FIRST element of the - Segment List. In other words, the Segment List is encoded in the - reverse order of the path. - - The final DA of the packet is encoded as the last segment of the - path (encoded in the first element of the Segment List). - - The DA of the packet is set with the value of the first segment - (found in the last element of the segment list). - - The Segments Left field is set to n-1 where n is the number of - elements in the Segment List. - - The First Segment field is set to n-1 where n is the number of - elements in the Segment List. - - The packet is sent out towards the first segment (i.e.: - represented in the packet DA). - - HMAC TLV may be set according to Section 5. - -4.2. Transit Node - - According to [RFC2460], the only node who is allowed to inspect the - Routing Extension Header (and therefore the SRH), is the node - corresponding to the DA of the packet. Any other transit node MUST - - - -Previdi, et al. Expires August 5, 2017 [Page 15] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - NOT inspect the underneath routing header and MUST forward the packet - towards the DA and according to the IPv6 routing table. - - In the example case described in Section 2.2.2, when SR capable nodes - are connected through an overlay spanning multiple third-party - infrastructure, it is safe to send SRH packets (i.e.: packet having a - Segment Routing Header) between each other overlay/SR-capable nodes - as long as the segment list does not include any of the transit - provider nodes. In addition, as a generic security measure, any - service provider will block any packet destined to one of its - internal routers, especially if these packets have an extended header - in it. - -4.3. SR Segment Endpoint Node - - The SR segment endpoint node is the node whose address is in the DA. - The segment endpoint node inspects the SRH and does: - - 1. IF DA = myself (segment endpoint) - 2. IF Segments Left > 0 THEN - decrement Segments Left - update DA with Segment List[Segments Left] - 3. ELSE continue IPv6 processing of the packet - End of processing. - 4. Forward the packet out - -5. Security Considerations - - This section analyzes the security threat model, the security issues - and proposed solutions related to the new Segment Routing Header. - - The Segment Routing Header (SRH) is simply another type of the - routing header as described in RFC 2460 [RFC2460] and is: - - o Added by an SR edge router when entering the segment routing - domain or by the originating host itself. The source host can - even be outside the SR domain; - - o inspected and acted upon when reaching the destination address of - the IP header per RFC 2460 [RFC2460]. - - Per RFC2460 [RFC2460], routers on the path that simply forward an - IPv6 packet (i.e. the IPv6 destination address is none of theirs) - will never inspect and process the content of the SRH. Routers whose - one interface IPv6 address equals the destination address field of - the IPv6 packet MUST parse the SRH and, if supported and if the local - configuration allows it, MUST act accordingly to the SRH content. - - - - -Previdi, et al. Expires August 5, 2017 [Page 16] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - According to RFC2460 [RFC2460], the default behavior of a non SR- - capable router upon receipt of an IPv6 packet with SRH destined to an - address of its, is to: - - o ignore the SRH completely if the Segment Left field is 0 and - proceed to process the next header in the IPv6 packet; - - o discard the IPv6 packet if Segment Left field is greater than 0, - it MAY send a Parameter Problem ICMP message back to the Source - Address. - -5.1. Threat model - -5.1.1. Source routing threats - - Using an SRH is similar to source routing, therefore it has some - well-known security issues as described in RFC4942 [RFC4942] section - 2.1.1 and RFC5095 [RFC5095]: - - o amplification attacks: where a packet could be forged in such a - way to cause looping among a set of SR-enabled routers causing - unnecessary traffic, hence a Denial of Service (DoS) against - bandwidth; - - o reflection attack: where a hacker could force an intermediate node - to appear as the immediate attacker, hence hiding the real - attacker from naive forensic; - - o bypass attack: where an intermediate node could be used as a - stepping stone (for example in a De-Militarized Zone) to attack - another host (for example in the datacenter or any back-end - server). - -5.1.2. Applicability of RFC 5095 to SRH - - First of all, the reader must remember this specific part of section - 1 of RFC5095 [RFC5095], "A side effect is that this also eliminates - benign RH0 use-cases; however, such applications may be facilitated - by future Routing Header specifications.". In short, it is not - forbidden to create new secure type of Routing Header; for example, - RFC 6554 (RPL) [RFC6554] also creates a new Routing Header type for a - specific application confined in a single network. - - In the segment routing architecture described in - [I-D.ietf-spring-segment-routing] there are basically two kinds of - nodes (routers and hosts): - - - - - -Previdi, et al. Expires August 5, 2017 [Page 17] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o nodes within the SR domain, which is within one single - administrative domain, i.e., where all nodes are trusted anyway - else the damage caused by those nodes could be worse than - amplification attacks: traffic interception, man-in-the-middle - attacks, more server DoS by dropping packets, and so on. - - o nodes outside of the SR domain, which is outside of the - administrative segment routing domain hence they cannot be trusted - because there is no physical security for those nodes, i.e., they - can be replaced by hostile nodes or can be coerced in wrong - behaviors. - - The main use case for SR consists of the single administrative domain - where only trusted nodes with SR enabled and configured participate - in SR: this is the same model as in RFC6554 [RFC6554]. All non- - trusted nodes do not participate as either SR processing is not - enabled by default or because they only process SRH from nodes within - their domain. - - Moreover, all SR nodes ignore SRH created by outsiders based on - topology information (received on a peering or internal interface) or - on presence and validity of the HMAC field. Therefore, if - intermediate nodes ONLY act on valid and authorized SRH (such as - within a single administrative domain), then there is no security - threat similar to RH-0. Hence, the RFC 5095 [RFC5095] attacks are - not applicable. - -5.1.3. Service stealing threat - - Segment routing is used for added value services, there is also a - need to prevent non-participating nodes to use those services; this - is called 'service stealing prevention'. - -5.1.4. Topology disclosure - - The SRH may also contains IPv6 addresses of some intermediate SR- - nodes in the path towards the destination, this obviously reveals - those addresses to the potentially hostile attackers if those - attackers are able to intercept packets containing SRH. On the other - hand, if the attacker can do a traceroute whose probes will be - forwarded along the SR path, then there is little learned by - intercepting the SRH itself. - -5.1.5. ICMP Generation - - Per section 4.4 of RFC2460 [RFC2460], when destination nodes (i.e. - where the destination address is one of theirs) receive a Routing - Header with unsupported Routing Type, the required behavior is: - - - -Previdi, et al. Expires August 5, 2017 [Page 18] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o If Segments Left is zero, the node must ignore the Routing header - and proceed to process the next header in the packet. - - o If Segments Left is non-zero, the node must discard the packet and - send an ICMP Parameter Problem, Code 0, message to the packet's - Source Address, pointing to the unrecognized Routing Type. - - This required behavior could be used by an attacker to force the - generation of ICMP message by any node. The attacker could send - packets with SRH (with Segment Left set to 0) destined to a node not - supporting SRH. Per RFC2460 [RFC2460], the destination node could - generate an ICMP message, causing a local CPU utilization and if the - source of the offending packet with SRH was spoofed could lead to a - reflection attack without any amplification. - - It must be noted that this is a required behavior for any unsupported - Routing Type and not limited to SRH packets. So, it is not specific - to SRH and the usual rate limiting for ICMP generation is required - anyway for any IPv6 implementation and has been implemented and - deployed for many years. - -5.2. Security fields in SRH - - This section summarizes the use of specific fields in the SRH. They - are based on a key-hashed message authentication code (HMAC). - - The security-related fields in the SRH are instantiated by the HMAC - TLV, containing: - - o HMAC Key-id, 32 bits wide; - - o HMAC, 256 bits wide (optional, exists only if HMAC Key-id is not - 0). - - The HMAC field is the output of the HMAC computation (per RFC 2104 - [RFC2104]) using a pre-shared key identified by HMAC Key-id and of - the text which consists of the concatenation of: - - o the source IPv6 address; - - o First Segment field; - - o an octet of bit flags; - - o HMAC Key-id; - - o all addresses in the Segment List. - - - - -Previdi, et al. Expires August 5, 2017 [Page 19] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - The purpose of the HMAC TLV is to verify the validity, the integrity - and the authorization of the SRH itself. If an outsider of the SR - domain does not have access to a current pre-shared secret, then it - cannot compute the right HMAC field and the first SR router on the - path processing the SRH and configured to check the validity of the - HMAC will simply reject the packet. - - The HMAC TLV is located at the end of the SRH simply because only the - router on the ingress of the SR domain needs to process it, then all - other SR nodes can ignore it (based on local policy) because they - trust the upstream router. This is to speed up forwarding operations - because SR routers which do not validate the SRH do not need to parse - the SRH until the end. - - The HMAC Key-id field allows for the simultaneous existence of - several hash algorithms (SHA-256, SHA3-256 ... or future ones) as - well as pre-shared keys. The HMAC Key-id field is opaque, i.e., it - has neither syntax nor semantic except as an index to the right - combination of pre-shared key and hash algorithm and except that a - value of 0 means that there is no HMAC field. Having an HMAC Key-id - field allows for pre-shared key roll-over when two pre-shared keys - are supported for a while when all SR nodes converged to a fresher - pre-shared key. It could also allow for interoperation among - different SR domains if allowed by local policy and assuming a - collision-free HMAC Key Id allocation. - - When a specific SRH is linked to a time-related service (such as - turbo-QoS for a 1-hour period) where the DA, Segment ID (SID) are - identical, then it is important to refresh the shared-secret - frequently as the HMAC validity period expires only when the HMAC - Key-id and its associated shared-secret expires. - -5.2.1. Selecting a hash algorithm - - The HMAC field in the HMAC TLV is 256 bit wide. Therefore, the HMAC - MUST be based on a hash function whose output is at least 256 bits. - If the output of the hash function is 256, then this output is simply - inserted in the HMAC field. If the output of the hash function is - larger than 256 bits, then the output value is truncated to 256 by - taking the least-significant 256 bits and inserting them in the HMAC - field. - - SRH implementations can support multiple hash functions but MUST - implement SHA-2 [FIPS180-4] in its SHA-256 variant. - - NOTE: SHA-1 is currently used by some early implementations used for - quick interoperations testing, the 160-bit hash value must then be - - - - -Previdi, et al. Expires August 5, 2017 [Page 20] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - right-hand padded with 96 bits set to 0. The authors understand that - this is not secure but is ok for limited tests. - -5.2.2. Performance impact of HMAC - - While adding an HMAC to each and every SR packet increases the - security, it has a performance impact. Nevertheless, it must be - noted that: - - o the HMAC field is used only when SRH is added by a device (such as - a home set-up box) which is outside of the segment routing domain. - If the SRH is added by a router in the trusted segment routing - domain, then, there is no need for an HMAC field, hence no - performance impact. - - o when present, the HMAC field MUST only be checked and validated by - the first router of the segment routing domain, this router is - named 'validating SR router'. Downstream routers may not inspect - the HMAC field. - - o this validating router can also have a cache of to improve the performance. It is not the - same use case as in IPsec where HMAC value was unique per packet, - in SRH, the HMAC value is unique per flow. - - o Last point, hash functions such as SHA-2 have been optimized for - security and performance and there are multiple implementations - with good performance. - - With the above points in mind, the performance impact of using HMAC - is minimized. - -5.2.3. Pre-shared key management - - The field HMAC Key-id allows for: - - o key roll-over: when there is a need to change the key (the hash - pre-shared secret), then multiple pre-shared keys can be used - simultaneously. The validating routing can have a table of for the currently active and future - keys. - - o different algorithms: by extending the previous table to , the validating router - can also support simultaneously several hash algorithms (see - section Section 5.2.1) - - The pre-shared secret distribution can be done: - - - -Previdi, et al. Expires August 5, 2017 [Page 21] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - o in the configuration of the validating routers, either by static - configuration or any SDN oriented approach; - - o dynamically using a trusted key distribution such as [RFC6407] - - The intent of this document is NOT to define yet-another-key- - distribution-protocol. - -5.3. Deployment Models - -5.3.1. Nodes within the SR domain - - An SR domain is defined as a set of interconnected routers where all - routers at the perimeter are configured to add and act on SRH. Some - routers inside the SR domain can also act on SRH or simply forward - IPv6 packets. - - The routers inside an SR domain can be trusted to generate SRH and to - process SRH received on interfaces that are part of the SR domain. - These nodes MUST drop all SRH packets received on an interface that - is not part of the SR domain and containing an SRH whose HMAC field - cannot be validated by local policies. This includes obviously - packet with an SRH generated by a non-cooperative SR domain. - - If the validation fails, then these packets MUST be dropped, ICMP - error messages (parameter problem) SHOULD be generated (but rate - limited) and SHOULD be logged. - -5.3.2. Nodes outside of the SR domain - - Nodes outside of the SR domain cannot be trusted for physical - security; hence, they need to request by some trusted means (outside - of the scope of this document) a complete SRH for each new connection - (i.e. new destination address). The received SRH MUST include an - HMAC TLV which is computed correctly (see Section 5.2). - - When an outside node sends a packet with an SRH and towards an SR - domain ingress node, the packet MUST contain the HMAC TLV (with a - Key-id and HMAC fields) and the the destination address MUST be an - address of an SR domain ingress node . - - The ingress SR router, i.e., the router with an interface address - equals to the destination address, MUST verify the HMAC TLV. - - If the validation is successful, then the packet is simply forwarded - as usual for an SR packet. As long as the packet travels within the - SR domain, no further HMAC check needs to be done. Subsequent - - - - -Previdi, et al. Expires August 5, 2017 [Page 22] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - routers in the SR domain MAY verify the HMAC TLV when they process - the SRH (i.e. when they are the destination). - - If the validation fails, then this packet MUST be dropped, an ICMP - error message (parameter problem) SHOULD be generated (but rate - limited) and SHOULD be logged. - -5.3.3. SR path exposure - - As the intermediate SR nodes addresses appears in the SRH, if this - SRH is visible to an outsider then he/she could reuse this knowledge - to launch an attack on the intermediate SR nodes or get some insider - knowledge on the topology. This is especially applicable when the - path between the source node and the first SR domain ingress router - is on the public Internet. - - The first remark is to state that 'security by obscurity' is never - enough; in other words, the security policy of the SR domain MUST - assume that the internal topology and addressing is known by the - attacker. A simple traceroute will also give the same information - (with even more information as all intermediate nodes between SID - will also be exposed). IPsec Encapsulating Security Payload - [RFC4303] cannot be use to protect the SRH as per RFC4303 the ESP - header must appear after any routing header (including SRH). - - To prevent a user to leverage the gained knowledge by intercepting - SRH, it it recommended to apply an infrastructure Access Control List - (iACL) at the edge of the SR domain. This iACL will drop all packets - from outside the SR-domain whose destination is any address of any - router inside the domain. This security policy should be tuned for - local operations. - -5.3.4. Impact of BCP-38 - - BCP-38 [RFC2827], also known as "Network Ingress Filtering", checks - whether the source address of packets received on an interface is - valid for this interface. The use of loose source routing such as - SRH forces packets to follow a path which differs from the expected - routing. Therefore, if BCP-38 was implemented in all routers inside - the SR domain, then SR packets could be received by an interface - which is not expected one and the packets could be dropped. - - As an SR domain is usually a subset of one administrative domain, and - as BCP-38 is only deployed at the ingress routers of this - administrative domain and as packets arriving at those ingress - routers have been normally forwarded using the normal routing - information, then there is no reason why this ingress router should - - - - -Previdi, et al. Expires August 5, 2017 [Page 23] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - drop the SRH packet based on BCP-38. Routers inside the domain - commonly do not apply BCP-38; so, this is not a problem. - -6. IANA Considerations - - This document makes the following registrations in the Internet - Protocol Version 6 (IPv6) Parameters "Routing Type" registry - maintained by IANA: - - Suggested Description Reference - Value - ---------------------------------------------------------- - 4 Segment Routing Header (SRH) This document - - In addition, this document request IANA to create and maintain a new - Registry: "Segment Routing Header Type-Value Objects". The following - code-points are requested from the registry: - - Registry: Segment Routing Header Type-Value Objects - - Suggested Description Reference - Value - ----------------------------------------------------- - 1 Ingress Node TLV This document - 2 Egress Node TLV This document - 3 Opaque Container TLV This document - 4 Padding TLV This document - 5 HMAC TLV This document - -7. Manageability Considerations - - TBD - -8. Contributors - - Dave Barach, John Leddy, John Brzozowski, Pierre Francois, Nagendra - Kumar, Mark Townsley, Christian Martin, Roberta Maglione, James - Connolly, Aloys Augustin contributed to the content of this document. - -9. Acknowledgements - - The authors would like to thank Ole Troan, Bob Hinden, Fred Baker, - Brian Carpenter, Alexandru Petrescu and Punit Kumar Jaiswal for their - comments to this document. - - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 24] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - -10. References - -10.1. Normative References - - [FIPS180-4] - National Institute of Standards and Technology, "FIPS - 180-4 Secure Hash Standard (SHS)", March 2012, - . - - [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate - Requirement Levels", BCP 14, RFC 2119, - DOI 10.17487/RFC2119, March 1997, - . - - [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 - (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, - December 1998, . - - [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", - RFC 4303, DOI 10.17487/RFC4303, December 2005, - . - - [RFC5095] Abley, J., Savola, P., and G. Neville-Neil, "Deprecation - of Type 0 Routing Headers in IPv6", RFC 5095, - DOI 10.17487/RFC5095, December 2007, - . - - [RFC6407] Weis, B., Rowles, S., and T. Hardjono, "The Group Domain - of Interpretation", RFC 6407, DOI 10.17487/RFC6407, - October 2011, . - -10.2. Informative References - - [I-D.ietf-isis-segment-routing-extensions] - Previdi, S., Filsfils, C., Bashandy, A., Gredler, H., - Litkowski, S., Decraene, B., and j. jefftant@gmail.com, - "IS-IS Extensions for Segment Routing", draft-ietf-isis- - segment-routing-extensions-09 (work in progress), October - 2016. - - [I-D.ietf-ospf-ospfv3-segment-routing-extensions] - Psenak, P., Previdi, S., Filsfils, C., Gredler, H., - Shakir, R., Henderickx, W., and J. Tantsura, "OSPFv3 - Extensions for Segment Routing", draft-ietf-ospf-ospfv3- - segment-routing-extensions-07 (work in progress), October - 2016. - - - - -Previdi, et al. Expires August 5, 2017 [Page 25] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - [I-D.ietf-spring-ipv6-use-cases] - Brzozowski, J., Leddy, J., Townsley, W., Filsfils, C., and - R. Maglione, "IPv6 SPRING Use Cases", draft-ietf-spring- - ipv6-use-cases-08 (work in progress), January 2017. - - [I-D.ietf-spring-resiliency-use-cases] - Filsfils, C., Previdi, S., Decraene, B., and R. Shakir, - "Resiliency use cases in SPRING networks", draft-ietf- - spring-resiliency-use-cases-08 (work in progress), October - 2016. - - [I-D.ietf-spring-segment-routing] - Filsfils, C., Previdi, S., Decraene, B., Litkowski, S., - and R. Shakir, "Segment Routing Architecture", draft-ietf- - spring-segment-routing-10 (work in progress), November - 2016. - - [I-D.ietf-spring-segment-routing-mpls] - Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., - Litkowski, S., Horneffer, M., Shakir, R., - jefftant@gmail.com, j., and E. Crabbe, "Segment Routing - with MPLS data plane", draft-ietf-spring-segment-routing- - mpls-06 (work in progress), January 2017. - - [RFC1940] Estrin, D., Li, T., Rekhter, Y., Varadhan, K., and D. - Zappala, "Source Demand Routing: Packet Format and - Forwarding Specification (Version 1)", RFC 1940, - DOI 10.17487/RFC1940, May 1996, - . - - [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- - Hashing for Message Authentication", RFC 2104, - DOI 10.17487/RFC2104, February 1997, - . - - [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: - Defeating Denial of Service Attacks which employ IP Source - Address Spoofing", BCP 38, RFC 2827, DOI 10.17487/RFC2827, - May 2000, . - - [RFC4942] Davies, E., Krishnan, S., and P. Savola, "IPv6 Transition/ - Co-existence Security Considerations", RFC 4942, - DOI 10.17487/RFC4942, September 2007, - . - - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 26] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - [RFC6554] Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6 - Routing Header for Source Routes with the Routing Protocol - for Low-Power and Lossy Networks (RPL)", RFC 6554, - DOI 10.17487/RFC6554, March 2012, - . - - [RFC7855] Previdi, S., Ed., Filsfils, C., Ed., Decraene, B., - Litkowski, S., Horneffer, M., and R. Shakir, "Source - Packet Routing in Networking (SPRING) Problem Statement - and Requirements", RFC 7855, DOI 10.17487/RFC7855, May - 2016, . - -Authors' Addresses - - Stefano Previdi (editor) - Cisco Systems, Inc. - Via Del Serafico, 200 - Rome 00142 - Italy - - Email: sprevidi@cisco.com - - - Clarence Filsfils - Cisco Systems, Inc. - Brussels - BE - - Email: cfilsfil@cisco.com - - - Brian Field - Comcast - 4100 East Dry Creek Road - Centennial, CO 80122 - US - - Email: Brian_Field@cable.comcast.com - - - Ida Leung - Rogers Communications - 8200 Dixie Road - Brampton, ON L6T 0C1 - CA - - Email: Ida.Leung@rci.rogers.com - - - - -Previdi, et al. Expires August 5, 2017 [Page 27] - -Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 - - - Jen Linkova - Google - 1600 Amphitheatre Parkway - Mountain View, CA 94043 - US - - Email: furry@google.com - - - Ebben Aries - Facebook - US - - Email: exa@fb.com - - - Tomoya Kosugi - NTT - 3-9-11, Midori-Cho Musashino-Shi, - Tokyo 180-8585 - JP - - Email: kosugi.tomoya@lab.ntt.co.jp - - - Eric Vyncke - Cisco Systems, Inc. - De Kleetlaann 6A - Diegem 1831 - Belgium - - Email: evyncke@cisco.com - - - David Lebrun - Universite Catholique de Louvain - Place Ste Barbe, 2 - Louvain-la-Neuve, 1348 - Belgium - - Email: david.lebrun@uclouvain.be - - - - - - - - - - -Previdi, et al. Expires August 5, 2017 [Page 28] \ No newline at end of file diff --git a/src/vnet/sr/sr.api b/src/vnet/sr/sr.api deleted file mode 100644 index 9e900741..00000000 --- a/src/vnet/sr/sr.api +++ /dev/null @@ -1,168 +0,0 @@ -/* - * Copyright (c) 2015-2016 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** \brief IPv6 SR LocalSID add/del request - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param is_del Boolean of whether its a delete instruction - @param localsid_addr IPv6 address of the localsid - @param end_psp Boolean of whether decapsulation is allowed in this function - @param behavior Type of behavior (function) for this localsid - @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. - @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. - @param fib_table FIB table in which we should install the localsid entry - @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. -*/ -autoreply define sr_localsid_add_del -{ - u32 client_index; - u32 context; - u8 is_del; - u8 localsid_addr[16]; - u8 end_psp; - u8 behavior; - u32 sw_if_index; - u32 vlan_index; - u32 fib_table; - u8 nh_addr[16]; -}; - -/** \brief IPv6 SR policy add - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param bsid is the bindingSID of the SR Policy - @param weight is the weight of the sid list. optional. - @param is_encap is the behavior of the SR policy. (0.SRH insert // 1.Encapsulation) - @param type is the type of the SR policy. (0.Default // 1.Spray) - @param fib_table is the VRF where to install the FIB entry for the BSID - @param segments is a vector of IPv6 address composing the segment list -*/ -autoreply define sr_policy_add -{ - u32 client_index; - u32 context; - u8 bsid_addr[16]; - u32 weight; - u8 is_encap; - u8 type; - u32 fib_table; - u8 n_segments; - u8 segments[0]; -}; - -/** \brief IPv6 SR policy modification - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param bsid is the bindingSID of the SR Policy - @param sr_policy_index is the index of the SR policy - @param fib_table is the VRF where to install the FIB entry for the BSID - @param operation is the operation to perform (among the top ones) - @param segments is a vector of IPv6 address composing the segment list - @param sl_index is the index of the Segment List to modify/delete - @param weight is the weight of the sid list. optional. - @param is_encap Mode. Encapsulation or SRH insertion. -*/ -autoreply define sr_policy_mod -{ - u32 client_index; - u32 context; - u8 bsid_addr[16]; - u32 sr_policy_index; - u32 fib_table; - u8 operation; - u32 sl_index; - u32 weight; - u8 n_segments; - u8 segments[0]; -}; - -/** \brief IPv6 SR policy deletion - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param bsid is the bindingSID of the SR Policy - @param index is the index of the SR policy -*/ -autoreply define sr_policy_del -{ - u32 client_index; - u32 context; - u8 bsid_addr[16]; - u32 sr_policy_index; -}; - -/** \brief IPv6 SR steering add/del - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param is_del - @param bsid is the bindingSID of the SR Policy (alt to sr_policy_index) - @param sr_policy is the index of the SR Policy (alt to bsid) - @param table_id is the VRF where to install the FIB entry for the BSID - @param prefix is the IPv4/v6 address for L3 traffic type - @param mask_width is the mask for L3 traffic type - @param sw_if_index is the incoming interface for L2 traffic - @param traffic_type describes the type of traffic -*/ -autoreply define sr_steering_add_del -{ - u32 client_index; - u32 context; - u8 is_del; - u8 bsid_addr[16]; - u32 sr_policy_index; - u32 table_id; - u8 prefix_addr[16]; - u32 mask_width; - u32 sw_if_index; - u8 traffic_type; -}; - -/** \brief Dump the list of SR LocalSIDs - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request -*/ -/**define sr_localsids_dump -{ - u32 client_index; - u32 context; -};*/ - -/** \brief Details about a single SR LocalSID - @param context - returned sender context, to match reply w/ request - @param localsid_addr IPv6 address of the localsid - @param behavior Type of behavior (function) for this localsid - @param end_psp Boolean of whether decapsulation is allowed in this function - @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. - @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. - @param fib_table FIB table in which we should install the localsid entry - @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. -*/ -/**manual_endian define sr_localsid_details -{ - u32 context; - u8 localsid_addr[16]; - u8 behavior; - u8 end_psp; - u32 sw_if_index; - u32 vlan_index; - u32 fib_table; - u8 nh_addr[16]; -};*/ - -/* - * fd.io coding-style-patch-verification: ON - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/vnet/sr/sr.c b/src/vnet/sr/sr.c deleted file mode 100755 index 34344fce..00000000 --- a/src/vnet/sr/sr.c +++ /dev/null @@ -1,57 +0,0 @@ -/* - * sr.c: ipv6 segment routing - * - * Copyright (c) 2013 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief Segment Routing initialization - * - */ - -#include -#include -#include -#include -#include -#include - -ip6_sr_main_t sr_main; - -/** - * @brief no-op lock function. - * The lifetime of the SR entry is managed by the control plane - */ -void -sr_dpo_lock (dpo_id_t * dpo) -{ -} - -/** - * @brief no-op unlock function. - * The lifetime of the SR entry is managed by the control plane - */ -void -sr_dpo_unlock (dpo_id_t * dpo) -{ -} - -/* -* fd.io coding-style-patch-verification: ON -* -* Local Variables: -* eval: (c-set-style "gnu") -* End: -*/ diff --git a/src/vnet/sr/sr.h b/src/vnet/sr/sr.h deleted file mode 100755 index b832c0fc..00000000 --- a/src/vnet/sr/sr.h +++ /dev/null @@ -1,323 +0,0 @@ -/* - * Copyright (c) 2015 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief Segment Routing data structures definitions - * - */ - -#ifndef included_vnet_sr_h -#define included_vnet_sr_h - -#include -#include -#include -#include - -#include -#include - -#define IPv6_DEFAULT_HEADER_LENGTH 40 -#define IPv6_DEFAULT_HOP_LIMIT 64 -#define IPv6_DEFAULT_MAX_MASK_WIDTH 128 - -#define SR_BEHAVIOR_END 1 -#define SR_BEHAVIOR_X 2 -#define SR_BEHAVIOR_D_FIRST 3 /* Unused. Separator in between regular and D */ -#define SR_BEHAVIOR_DX2 4 -#define SR_BEHAVIOR_DX6 5 -#define SR_BEHAVIOR_DX4 6 -#define SR_BEHAVIOR_DT6 7 -#define SR_BEHAVIOR_DT4 8 -#define SR_BEHAVIOR_LAST 9 /* Must always be the last one */ - -#define SR_STEER_L2 2 -#define SR_STEER_IPV4 4 -#define SR_STEER_IPV6 6 - -#define SR_FUNCTION_SIZE 4 -#define SR_ARGUMENT_SIZE 4 - -#define SR_SEGMENT_LIST_WEIGHT_DEFAULT 1 - -/** - * @brief SR Segment List (SID list) - */ -typedef struct -{ - ip6_address_t *segments; /**< SIDs (key) */ - - u32 weight; /**< SID list weight (wECMP / UCMP) */ - - u8 *rewrite; /**< Precomputed rewrite header */ - u8 *rewrite_bsid; /**< Precomputed rewrite header for bindingSID */ - - dpo_id_t bsid_dpo; /**< DPO for Encaps/Insert for BSID */ - dpo_id_t ip6_dpo; /**< DPO for Encaps/Insert IPv6 */ - dpo_id_t ip4_dpo; /**< DPO for Encaps IPv6 */ -} ip6_sr_sl_t; - -/* SR policy types */ -#define SR_POLICY_TYPE_DEFAULT 0 -#define SR_POLICY_TYPE_SPRAY 1 -/** - * @brief SR Policy - */ -typedef struct -{ - u32 *segments_lists; /**< SID lists indexes (vector) */ - - ip6_address_t bsid; /**< BindingSID (key) */ - - u8 type; /**< Type (default is 0) */ - /* SR Policy specific DPO */ - /* IF Type = DEFAULT Then Load Balancer DPO among SID lists */ - /* IF Type = SPRAY then Spray DPO with all SID lists */ - dpo_id_t bsid_dpo; /**< SR Policy specific DPO - BSID */ - dpo_id_t ip4_dpo; /**< SR Policy specific DPO - IPv6 */ - dpo_id_t ip6_dpo; /**< SR Policy specific DPO - IPv4 */ - - u32 fib_table; /**< FIB table */ - - u8 is_encap; /**< Mode (0 is SRH insert, 1 Encaps) */ -} ip6_sr_policy_t; - -/** - * @brief SR LocalSID - */ -typedef struct -{ - ip6_address_t localsid; /**< LocalSID IPv6 address */ - - char end_psp; /**< Combined with End.PSP? */ - - u16 behavior; /**< Behavior associated to this localsid */ - - union - { - u32 sw_if_index; /**< xconnect only */ - u32 vrf_index; /**< vrf only */ - }; - - u32 fib_table; /**< FIB table where localsid is registered */ - - u32 vlan_index; /**< VLAN tag (not an index) */ - - ip46_address_t next_hop; /**< Next_hop for xconnect usage only */ - - u32 nh_adj; /**< Next_adj for xconnect usage only */ - - void *plugin_mem; /**< Memory to be used by the plugin callback functions */ -} ip6_sr_localsid_t; - -typedef int (sr_plugin_callback_t) (ip6_sr_localsid_t * localsid); - -/** - * @brief SR LocalSID behavior registration - */ -typedef struct -{ - u16 sr_localsid_function_number; /**< SR LocalSID plugin function (>SR_BEHAVIOR_LAST) */ - - u8 *function_name; /**< Function name. (key). */ - - u8 *keyword_str; /**< Behavior keyword (i.e. End.X) */ - - u8 *def_str; /**< Behavior definition (i.e. Endpoint with cross-connect) */ - - u8 *params_str; /**< Behavior parameters (i.e. ) */ - - dpo_type_t dpo; /**< DPO type registration */ - - format_function_t *ls_format; /**< LocalSID format function */ - - unformat_function_t *ls_unformat; /**< LocalSID unformat function */ - - sr_plugin_callback_t *creation; /**< Function within plugin that will be called after localsid creation*/ - - sr_plugin_callback_t *removal; /**< Function within plugin that will be called before localsid removal */ -} sr_localsid_fn_registration_t; - -/** - * @brief Steering db key - * - * L3 is IPv4/IPv6 + mask - * L2 is sf_if_index + vlan - */ -typedef struct -{ - union - { - struct - { - ip46_address_t prefix; /**< IP address of the prefix */ - u32 mask_width; /**< Mask width of the prefix */ - u32 fib_table; /**< VRF of the prefix */ - } l3; - struct - { - u32 sw_if_index; /**< Incoming software interface */ - } l2; - }; - u8 traffic_type; /**< Traffic type (IPv4, IPv6, L2) */ - u8 padding[3]; -} sr_steering_key_t; - -typedef struct -{ - sr_steering_key_t classify; /**< Traffic classification */ - u32 sr_policy; /**< SR Policy index */ -} ip6_sr_steering_policy_t; - -/** - * @brief Segment Routing main datastructure - */ -typedef struct -{ - /* L2-input -> SR rewrite next index */ - u32 l2_sr_policy_rewrite_index; - - /* SR SID lists */ - ip6_sr_sl_t *sid_lists; - - /* SR policies */ - ip6_sr_policy_t *sr_policies; - - /* Hash table mapping BindingSID to SR policy */ - mhash_t sr_policies_index_hash; - - /* Pool of SR localsid instances */ - ip6_sr_localsid_t *localsids; - - /* Hash table mapping LOC:FUNC to SR LocalSID instance */ - mhash_t sr_localsids_index_hash; - - /* Pool of SR steer policies instances */ - ip6_sr_steering_policy_t *steer_policies; - - /* Hash table mapping steering rules to SR steer instance */ - mhash_t sr_steer_policies_hash; - - /* L2 steering ifaces - sr_policies */ - u32 *sw_iface_sr_policies; - - /* Spray DPO */ - dpo_type_t sr_pr_spray_dpo_type; - - /* Plugin functions */ - sr_localsid_fn_registration_t *plugin_functions; - - /* Find plugin function by name */ - uword *plugin_functions_by_key; - - /* Counters */ - vlib_combined_counter_main_t sr_ls_valid_counters; - vlib_combined_counter_main_t sr_ls_invalid_counters; - - /* SR Policies FIBs */ - u32 fib_table_ip6; - u32 fib_table_ip4; - - /* convenience */ - vlib_main_t *vlib_main; - vnet_main_t *vnet_main; -} ip6_sr_main_t; - -ip6_sr_main_t sr_main; - -extern vlib_node_registration_t sr_policy_rewrite_encaps_node; -extern vlib_node_registration_t sr_policy_rewrite_insert_node; -extern vlib_node_registration_t sr_localsid_node; -extern vlib_node_registration_t sr_localsid_d_node; - -void sr_dpo_lock (dpo_id_t * dpo); -void sr_dpo_unlock (dpo_id_t * dpo); - -int sr_localsid_register_function (vlib_main_t * vm, u8 * fn_name, - u8 * keyword_str, u8 * def_str, - u8 * params_str, dpo_type_t * dpo, - format_function_t * ls_format, - unformat_function_t * ls_unformat, - sr_plugin_callback_t * creation_fn, - sr_plugin_callback_t * removal_fn); - -int -sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, - u32 weight, u8 behavior, u32 fib_table, u8 is_encap); -int -sr_policy_mod (ip6_address_t * bsid, u32 index, u32 fib_table, - u8 operation, ip6_address_t * segments, u32 sl_index, - u32 weight); -int sr_policy_del (ip6_address_t * bsid, u32 index); - -int sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, - char end_psp, u8 behavior, u32 sw_if_index, - u32 vlan_index, u32 fib_table, ip46_address_t * nh_addr, - void *ls_plugin_mem); - -int -sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, - u32 table_id, ip46_address_t * prefix, u32 mask_width, - u32 sw_if_index, u8 traffic_type); - -/** - * @brief SR rewrite string computation for SRH insertion (inline) - * - * @param sl is a vector of IPv6 addresses composing the Segment List - * - * @return precomputed rewrite string for SRH insertion - */ -static inline u8 * -ip6_sr_compute_rewrite_string_insert (ip6_address_t * sl) -{ - ip6_sr_header_t *srh; - ip6_address_t *addrp, *this_address; - u32 header_length = 0; - u8 *rs = NULL; - - header_length = 0; - header_length += sizeof (ip6_sr_header_t); - header_length += (vec_len (sl) + 1) * sizeof (ip6_address_t); - - vec_validate (rs, header_length - 1); - - srh = (ip6_sr_header_t *) rs; - srh->type = ROUTING_HEADER_TYPE_SR; - srh->segments_left = vec_len (sl); - srh->first_segment = vec_len (sl); - srh->length = ((sizeof (ip6_sr_header_t) + - ((vec_len (sl) + 1) * sizeof (ip6_address_t))) / 8) - 1; - srh->flags = 0x00; - srh->reserved = 0x0000; - addrp = srh->segments + vec_len (sl); - vec_foreach (this_address, sl) - { - clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); - addrp--; - } - return rs; -} - - -#endif /* included_vnet_sr_h */ - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/vnet/sr/sr_api.c b/src/vnet/sr/sr_api.c deleted file mode 100644 index f4e1c346..00000000 --- a/src/vnet/sr/sr_api.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - *------------------------------------------------------------------ - * sr_api.c - ipv6 segment routing api - * - * Copyright (c) 2016 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - *------------------------------------------------------------------ - */ - -#include -#include -#include - -#include -#include -#include - -#include - -#define vl_typedefs /* define message structures */ -#include -#undef vl_typedefs - -#define vl_endianfun /* define message structures */ -#include -#undef vl_endianfun - -/* instantiate all the print functions we know about */ -#define vl_print(handle, ...) vlib_cli_output (handle, __VA_ARGS__) -#define vl_printfun -#include -#undef vl_printfun - -#include - -#define foreach_vpe_api_msg \ -_(SR_LOCALSID_ADD_DEL, sr_localsid_add_del) \ -_(SR_POLICY_DEL, sr_policy_del) \ -_(SR_STEERING_ADD_DEL, sr_steering_add_del) -//_(SR_LOCALSIDS, sr_localsids_dump) -//_(SR_LOCALSID_BEHAVIORS, sr_localsid_behaviors_dump) - -static void vl_api_sr_localsid_add_del_t_handler - (vl_api_sr_localsid_add_del_t * mp) -{ - vl_api_sr_localsid_add_del_reply_t *rmp; - int rv = 0; -/* - * int sr_cli_localsid (char is_del, ip6_address_t *localsid_addr, - * char end_psp, u8 behavior, u32 sw_if_index, u32 vlan_index, u32 fib_table, - * ip46_address_t *nh_addr, void *ls_plugin_mem) - */ - rv = sr_cli_localsid (mp->is_del, - (ip6_address_t *) & mp->localsid_addr, - mp->end_psp, - mp->behavior, - ntohl (mp->sw_if_index), - ntohl (mp->vlan_index), - ntohl (mp->fib_table), - (ip46_address_t *) & mp->nh_addr, NULL); - - REPLY_MACRO (VL_API_SR_LOCALSID_ADD_DEL_REPLY); -} - -static void -vl_api_sr_policy_add_t_handler (vl_api_sr_policy_add_t * mp) -{ - vl_api_sr_policy_add_reply_t *rmp; - ip6_address_t *segments = 0, *seg; - ip6_address_t *this_address = (ip6_address_t *) mp->segments; - - int i; - for (i = 0; i < mp->n_segments; i++) - { - vec_add2 (segments, seg, 1); - clib_memcpy (seg->as_u8, this_address->as_u8, sizeof (*this_address)); - this_address++; - } - -/* - * sr_policy_add (ip6_address_t *bsid, ip6_address_t *segments, - * u32 weight, u8 behavior, u32 fib_table, u8 is_encap) - */ - int rv = 0; - rv = sr_policy_add ((ip6_address_t *) & mp->bsid_addr, - segments, - ntohl (mp->weight), - mp->type, ntohl (mp->fib_table), mp->is_encap); - - REPLY_MACRO (VL_API_SR_POLICY_ADD_REPLY); -} - -static void -vl_api_sr_policy_mod_t_handler (vl_api_sr_policy_mod_t * mp) -{ - vl_api_sr_policy_mod_reply_t *rmp; - - ip6_address_t *segments = 0, *seg; - ip6_address_t *this_address = (ip6_address_t *) mp->segments; - - int i; - for (i = 0; i < mp->n_segments; i++) - { - vec_add2 (segments, seg, 1); - clib_memcpy (seg->as_u8, this_address->as_u8, sizeof (*this_address)); - this_address++; - } - - int rv = 0; -/* - * int - * sr_policy_mod(ip6_address_t *bsid, u32 index, u32 fib_table, - * u8 operation, ip6_address_t *segments, u32 sl_index, - * u32 weight, u8 is_encap) - */ - rv = sr_policy_mod ((ip6_address_t *) & mp->bsid_addr, - ntohl (mp->sr_policy_index), - ntohl (mp->fib_table), - mp->operation, - segments, ntohl (mp->sl_index), ntohl (mp->weight)); - - REPLY_MACRO (VL_API_SR_POLICY_MOD_REPLY); -} - -static void -vl_api_sr_policy_del_t_handler (vl_api_sr_policy_del_t * mp) -{ - vl_api_sr_policy_del_reply_t *rmp; - int rv = 0; -/* - * int - * sr_policy_del (ip6_address_t *bsid, u32 index) - */ - rv = sr_policy_del ((ip6_address_t *) & mp->bsid_addr, - ntohl (mp->sr_policy_index)); - - REPLY_MACRO (VL_API_SR_POLICY_DEL_REPLY); -} - -static void vl_api_sr_steering_add_del_t_handler - (vl_api_sr_steering_add_del_t * mp) -{ - vl_api_sr_steering_add_del_reply_t *rmp; - int rv = 0; -/* - * int - * sr_steering_policy(int is_del, ip6_address_t *bsid, u32 sr_policy_index, - * u32 table_id, ip46_address_t *prefix, u32 mask_width, u32 sw_if_index, - * u8 traffic_type) - */ - rv = sr_steering_policy (mp->is_del, - (ip6_address_t *) & mp->bsid_addr, - ntohl (mp->sr_policy_index), - ntohl (mp->table_id), - (ip46_address_t *) & mp->prefix_addr, - ntohl (mp->mask_width), - ntohl (mp->sw_if_index), mp->traffic_type); - - REPLY_MACRO (VL_API_SR_STEERING_ADD_DEL_REPLY); -} - -/* - * sr_api_hookup - * Add vpe's API message handlers to the table. - * vlib has alread mapped shared memory and - * added the client registration handlers. - * See .../vlib-api/vlibmemory/memclnt_vlib.c:memclnt_process() - */ -#define vl_msg_name_crc_list -#include -#undef vl_msg_name_crc_list - -static void -setup_message_id_table (api_main_t * am) -{ -#define _(id,n,crc) vl_msg_api_add_msg_name_crc (am, #n "_" #crc, id); - foreach_vl_msg_name_crc_sr; -#undef _ -} - -static clib_error_t * -sr_api_hookup (vlib_main_t * vm) -{ - api_main_t *am = &api_main; - -#define _(N,n) \ - vl_msg_api_set_handlers(VL_API_##N, #n, \ - vl_api_##n##_t_handler, \ - vl_noop_handler, \ - vl_api_##n##_t_endian, \ - vl_api_##n##_t_print, \ - sizeof(vl_api_##n##_t), 1); - foreach_vpe_api_msg; -#undef _ - - /* - * Manually register the sr policy add msg, so we trace - * enough bytes to capture a typical segment list - */ - vl_msg_api_set_handlers (VL_API_SR_POLICY_ADD, - "sr_policy_add", - vl_api_sr_policy_add_t_handler, - vl_noop_handler, - vl_api_sr_policy_add_t_endian, - vl_api_sr_policy_add_t_print, 256, 1); - - /* - * Manually register the sr policy mod msg, so we trace - * enough bytes to capture a typical segment list - */ - vl_msg_api_set_handlers (VL_API_SR_POLICY_MOD, - "sr_policy_mod", - vl_api_sr_policy_mod_t_handler, - vl_noop_handler, - vl_api_sr_policy_mod_t_endian, - vl_api_sr_policy_mod_t_print, 256, 1); - - /* - * Set up the (msg_name, crc, message-id) table - */ - setup_message_id_table (am); - - return 0; -} - -VLIB_API_INIT_FUNCTION (sr_api_hookup); - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/vnet/sr/sr_doc.md b/src/vnet/sr/sr_doc.md deleted file mode 100644 index fd92bdf2..00000000 --- a/src/vnet/sr/sr_doc.md +++ /dev/null @@ -1,55 +0,0 @@ -# SRv6: Segment Routing for IPv6 {#srv6_doc} - -This is a memo intended to contain documentation of the VPP SRv6 implementation. -Everything that is not directly obvious should come here. -For any feedback on content that should be explained please mailto:pcamaril@cisco.com - -## Segment Routing - -Segment routing is a network technology focused on addressing the limitations of existing IP and Multiprotocol Label Switching (MPLS) networks in terms of simplicity, scale, and ease of operation. It is a foundation for application engineered routing as it prepares the networks for new business models where applications can control the network behavior. - -Segment routing seeks the right balance between distributed intelligence and centralized optimization and programming. It was built for the software-defined networking (SDN) era. - -Segment routing enhances packet forwarding behavior by enabling a network to transport unicast packets through a specific forwarding path, different from the normal path that a packet usually takes (IGP shortest path or BGP best path). This capability benefits many use cases, and one can build those specific paths based on application requirements. - -Segment routing uses the source routing paradigm. A node, usually a router but also a switch, a trusted server, or a virtual forwarder running on a hypervisor, steers a packet through an ordered list of instructions, called segments. A segment can represent any instruction, topological or service-based. A segment can have a local semantic to a segment-routing node or global within a segment-routing network. Segment routing allows an operator to enforce a flow through any topological path and service chain while maintaining per-flow state only at the ingress node to the segment-routing network. Segment routing also supports equal-cost multipath (ECMP) by design. - -Segment routing can operate with either an MPLS or an IPv6 data plane. All the currently available MPLS services, such as Layer 3 VPN (L3VPN), L2VPN (Virtual Private Wire Service [VPWS], Virtual Private LAN Services [VPLS], Ethernet VPN [E-VPN], and Provider Backbone Bridging Ethernet VPN [PBB-EVPN]), can run on top of a segment-routing transport network. - -**The implementation of Segment Routing in VPP only covers the IPv6 data plane (SRv6).** - -## Segment Routing terminology - -* Segment Routing Header (SRH): IPv6 routing extension header of type 'Segment Routing'. (draft-ietf-6man-segment-routing-header-05) -* SegmentID (SID): is an IPv6 address. -* Segment List (SL) (SID List): is the sequence of SIDs that the packet will traverse. -* SR Policy: defines the SRH that will be applied to a packet. A packet steered into an SR policy may either receive the SRH by IPv6 header encapsulation (as recommended in draft-ietf-6man-rfc2460bis) or it could be inserted within an existing IPv6 header. An SR policy is uniquely identified by its Binding SID and associated with a weighted set of Segment Lists. In case several SID lists are defined, traffic steered into the policy is unevenly load-balanced among them according to their respective weights. -* Local SID: is a SID associated with a processing function on the local node, which may go from advancing to the next SID in the SRH, to complex user-defined behaviors. When a FIB lookup, either in the main FIB or in a specific VRF, returns a match on a local SID, the associated function is performed. -* BindingSID: a BindingSID is a SID (only one) associated one-one with an SR Policy. If a packet arrives with an IPv6 DA corresponding to a BindingSID, then the SR policy will be applied to such packet. - -## SRv6 Features in VPP - -The SRv6 Network Programming (*draft-filsfils-spring-srv6-network-programming*) defines the SRv6 architecture. - -VPP supports the following SRv6 LocalSID functions: End, End.X, End.DX6, End.DT6, End.DX4, End.DT4, End.DX2, End.B6, End.B6.Encaps. - -For further information and how to configure each specific function: @subpage srv6_localsid_doc - - -The Segment Routing Policy (*draft-filsfils-spring-segment-routing-policy*) defines SR Policies. - -VPP supports SRv6 Policies with T.Insert and T.Encaps behaviors. - -For further information on how to create SR Policies: @subpage srv6_policy_doc - -For further information on how to steer traffic into SR Policies: @subpage srv6_steering_doc - -## SRv6 LocalSID development framework - -One of the *'key'* concepts about SRv6 is network programmability. This is why an SRv6 LocalSID is associated with an specific function. - -However, the trully way to enable network programmability is allowing any developer **easily** create his own SRv6 LocalSID function. That is the reason why we have added some API calls such that any developer can code his own SRv6 LocalSID behaviors as plugins an add them to the running SRv6 code. - -The principle is that the developer only codes the behavior -the graph node-. However all the FIB handling, SR LocalSID instantiation and so on are done by the VPP SRv6 code. - -For more information please refer to: @subpage srv6_plugin_doc diff --git a/src/vnet/sr/sr_localsid.c b/src/vnet/sr/sr_localsid.c deleted file mode 100755 index 32fc5f82..00000000 --- a/src/vnet/sr/sr_localsid.c +++ /dev/null @@ -1,1492 +0,0 @@ -/* - * sr_localsid.c: ipv6 segment routing Endpoint behaviors - * - * Copyright (c) 2016 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief Processing of packets with a SRH - * - * CLI to define new Segment Routing End processing functions. - * Graph node to support such functions. - * - * Each function associates an SRv6 segment (IPv6 address) with an specific - * Segment Routing function. - * - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -/** - * @brief Dynamically added SR localsid DPO type - */ -static dpo_type_t sr_localsid_dpo_type; -static dpo_type_t sr_localsid_d_dpo_type; - -/** - * @brief SR localsid add/del - * - * Function to add or delete SR LocalSIDs. - * - * @param is_del Boolean of whether its a delete instruction - * @param localsid_addr IPv6 address of the localsid - * @param is_decap Boolean of whether decapsulation is allowed in this function - * @param behavior Type of behavior (function) for this localsid - * @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. - * @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. - * @param fib_table FIB table in which we should install the localsid entry - * @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. - * - * @return 0 on success, error otherwise. - */ -int -sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, - char end_psp, u8 behavior, u32 sw_if_index, u32 vlan_index, - u32 fib_table, ip46_address_t * nh_addr, void *ls_plugin_mem) -{ - ip6_sr_main_t *sm = &sr_main; - uword *p; - int rv; - - ip6_sr_localsid_t *ls = 0; - - dpo_id_t dpo = DPO_INVALID; - - /* Search for the item */ - p = mhash_get (&sm->sr_localsids_index_hash, localsid_addr); - - if (p) - { - if (is_del) - { - /* Retrieve localsid */ - ls = pool_elt_at_index (sm->localsids, p[0]); - /* Delete FIB entry */ - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - .fp_addr = { - .ip6 = *localsid_addr, - } - }; - - fib_table_entry_delete (fib_table_find (FIB_PROTOCOL_IP6, - fib_table), - &pfx, FIB_SOURCE_SR); - - /* In case it is a Xconnect iface remove the (OIF, NHOP) adj */ - if (ls->behavior == SR_BEHAVIOR_X || ls->behavior == SR_BEHAVIOR_DX6 - || ls->behavior == SR_BEHAVIOR_DX4) - adj_unlock (ls->nh_adj); - - if (ls->behavior >= SR_BEHAVIOR_LAST) - { - sr_localsid_fn_registration_t *plugin = 0; - plugin = pool_elt_at_index (sm->plugin_functions, - ls->behavior - SR_BEHAVIOR_LAST); - - /* Callback plugin removal function */ - rv = plugin->removal (ls); - } - - /* Delete localsid registry */ - pool_put (sm->localsids, ls); - mhash_unset (&sm->sr_localsids_index_hash, localsid_addr, NULL); - return 1; - } - else /* create with function already existing; complain */ - return -1; - } - else - /* delete; localsid does not exist; complain */ - if (is_del) - return -2; - - /* Check whether there exists a FIB entry with such address */ - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - }; - - pfx.fp_addr.as_u64[0] = localsid_addr->as_u64[0]; - pfx.fp_addr.as_u64[1] = localsid_addr->as_u64[1]; - - /* Lookup the FIB index associated to the table id provided */ - u32 fib_index = fib_table_find (FIB_PROTOCOL_IP6, fib_table); - if (fib_index == ~0) - return -3; - - /* Lookup the localsid in such FIB table */ - fib_node_index_t fei = fib_table_lookup_exact_match (fib_index, &pfx); - if (FIB_NODE_INDEX_INVALID != fei) - return -4; //There is an entry for such address (the localsid addr) - - /* Create a new localsid registry */ - pool_get (sm->localsids, ls); - memset (ls, 0, sizeof (*ls)); - - clib_memcpy (&ls->localsid, localsid_addr, sizeof (ip6_address_t)); - ls->end_psp = end_psp; - ls->behavior = behavior; - ls->nh_adj = (u32) ~ 0; - ls->fib_table = fib_table; - switch (behavior) - { - case SR_BEHAVIOR_END: - break; - case SR_BEHAVIOR_X: - ls->sw_if_index = sw_if_index; - clib_memcpy (&ls->next_hop.ip6, &nh_addr->ip6, sizeof (ip6_address_t)); - break; - case SR_BEHAVIOR_DX4: - ls->sw_if_index = sw_if_index; - clib_memcpy (&ls->next_hop.ip4, &nh_addr->ip4, sizeof (ip4_address_t)); - break; - case SR_BEHAVIOR_DX6: - ls->sw_if_index = sw_if_index; - clib_memcpy (&ls->next_hop.ip6, &nh_addr->ip6, sizeof (ip6_address_t)); - break; - case SR_BEHAVIOR_DT6: - ls->vrf_index = sw_if_index; - break; - case SR_BEHAVIOR_DX2: - ls->sw_if_index = sw_if_index; - ls->vlan_index = vlan_index; - break; - } - - /* Figure out the adjacency magic for Xconnect variants */ - if (ls->behavior == SR_BEHAVIOR_X || ls->behavior == SR_BEHAVIOR_DX4 - || ls->behavior == SR_BEHAVIOR_DX6) - { - adj_index_t nh_adj_index = ADJ_INDEX_INVALID; - - /* Retrieve the adjacency corresponding to the (OIF, next_hop) */ - if (ls->behavior == SR_BEHAVIOR_DX6 || ls->behavior == SR_BEHAVIOR_X) - nh_adj_index = adj_nbr_add_or_lock (FIB_PROTOCOL_IP6, VNET_LINK_IP6, - nh_addr, sw_if_index); - - else if (ls->behavior == SR_BEHAVIOR_DX4) - nh_adj_index = adj_nbr_add_or_lock (FIB_PROTOCOL_IP4, VNET_LINK_IP4, - nh_addr, sw_if_index); - - /* Check for ADJ creation error. If so panic */ - if (nh_adj_index == ADJ_INDEX_INVALID) - { - pool_put (sm->localsids, ls); - return -5; - } - - ls->nh_adj = nh_adj_index; - } - - /* Set DPO */ - if (ls->behavior == SR_BEHAVIOR_END || ls->behavior == SR_BEHAVIOR_X) - dpo_set (&dpo, sr_localsid_dpo_type, DPO_PROTO_IP6, ls - sm->localsids); - else if (ls->behavior > SR_BEHAVIOR_D_FIRST - && ls->behavior < SR_BEHAVIOR_LAST) - dpo_set (&dpo, sr_localsid_d_dpo_type, DPO_PROTO_IP6, ls - sm->localsids); - else if (ls->behavior >= SR_BEHAVIOR_LAST) - { - sr_localsid_fn_registration_t *plugin = 0; - plugin = pool_elt_at_index (sm->plugin_functions, - ls->behavior - SR_BEHAVIOR_LAST); - /* Copy the unformat memory result */ - ls->plugin_mem = ls_plugin_mem; - /* Callback plugin creation function */ - rv = plugin->creation (ls); - if (rv) - { - pool_put (sm->localsids, ls); - return -6; - } - dpo_set (&dpo, plugin->dpo, DPO_PROTO_IP6, ls - sm->localsids); - } - - /* Set hash key for searching localsid by address */ - mhash_set (&sm->sr_localsids_index_hash, localsid_addr, ls - sm->localsids, - NULL); - - fib_table_entry_special_dpo_add (fib_index, &pfx, FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, &dpo); - dpo_reset (&dpo); - - /* Set counter to zero */ - vlib_validate_combined_counter (&(sm->sr_ls_valid_counters), - ls - sm->localsids); - vlib_validate_combined_counter (&(sm->sr_ls_invalid_counters), - ls - sm->localsids); - - vlib_zero_combined_counter (&(sm->sr_ls_valid_counters), - ls - sm->localsids); - vlib_zero_combined_counter (&(sm->sr_ls_invalid_counters), - ls - sm->localsids); - - return 0; -} - -/** - * @brief SR LocalSID CLI function. - * - * @see sr_cli_localsid - */ -static clib_error_t * -sr_cli_localsid_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - vnet_main_t *vnm = vnet_get_main (); - ip6_sr_main_t *sm = &sr_main; - u32 sw_if_index = (u32) ~ 0, vlan_index = (u32) ~ 0, fib_index = 0; - int is_del = 0; - int end_psp = 0; - ip6_address_t resulting_address; - ip46_address_t next_hop; - char address_set = 0; - char behavior = 0; - void *ls_plugin_mem = 0; - - int rv; - - memset (&resulting_address, 0, sizeof (ip6_address_t)); - ip46_address_reset (&next_hop); - - while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) - { - if (unformat (input, "del")) - is_del = 1; - else if (!address_set - && unformat (input, "address %U", unformat_ip6_address, - &resulting_address)) - address_set = 1; - else if (!address_set - && unformat (input, "addr %U", unformat_ip6_address, - &resulting_address)) - address_set = 1; - else if (unformat (input, "fib-table %u", &fib_index)); - else if (vlan_index == (u32) ~ 0 - && unformat (input, "vlan %u", &vlan_index)); - else if (!behavior && unformat (input, "behavior")) - { - if (unformat (input, "end.x %U %U", - unformat_vnet_sw_interface, vnm, &sw_if_index, - unformat_ip6_address, &next_hop.ip6)) - behavior = SR_BEHAVIOR_X; - else if (unformat (input, "end.dx6 %U %U", - unformat_vnet_sw_interface, vnm, &sw_if_index, - unformat_ip6_address, &next_hop.ip6)) - behavior = SR_BEHAVIOR_DX6; - else if (unformat (input, "end.dx4 %U %U", - unformat_vnet_sw_interface, vnm, &sw_if_index, - unformat_ip4_address, &next_hop.ip4)) - behavior = SR_BEHAVIOR_DX4; - else if (unformat (input, "end.dx2 %U", - unformat_vnet_sw_interface, vnm, &sw_if_index)) - behavior = SR_BEHAVIOR_DX2; - else if (unformat (input, "end.dt6 %u", &sw_if_index)) - behavior = SR_BEHAVIOR_DT6; - else if (unformat (input, "end.dt4 %u", &sw_if_index)) - behavior = SR_BEHAVIOR_DT4; - else - { - /* Loop over all the plugin behavior format functions */ - sr_localsid_fn_registration_t *plugin = 0, **vec_plugins = 0; - sr_localsid_fn_registration_t **plugin_it = 0; - - /* Create a vector out of the plugin pool as recommended */ - /* *INDENT-OFF* */ - pool_foreach (plugin, sm->plugin_functions, - { - vec_add1 (vec_plugins, plugin); - }); - /* *INDENT-ON* */ - - vec_foreach (plugin_it, vec_plugins) - { - if (unformat - (input, "%U", (*plugin_it)->ls_unformat, &ls_plugin_mem)) - { - behavior = (*plugin_it)->sr_localsid_function_number; - break; - } - } - } - - if (!behavior) - { - if (unformat (input, "end")) - behavior = SR_BEHAVIOR_END; - else - break; - } - } - else if (!end_psp && unformat (input, "psp")) - end_psp = 1; - else - break; - } - - if (!behavior && end_psp) - behavior = SR_BEHAVIOR_END; - - if (!address_set) - return clib_error_return (0, - "Error: SRv6 LocalSID address is mandatory."); - if (!is_del && !behavior) - return clib_error_return (0, - "Error: SRv6 LocalSID behavior is mandatory."); - if (vlan_index != (u32) ~ 0) - return clib_error_return (0, - "Error: SRv6 End.DX2 with rewrite VLAN tag not supported by now."); - if (end_psp && !(behavior == SR_BEHAVIOR_END || behavior == SR_BEHAVIOR_X)) - return clib_error_return (0, - "Error: SRv6 PSP only compatible with End and End.X"); - - rv = sr_cli_localsid (is_del, &resulting_address, end_psp, behavior, - sw_if_index, vlan_index, fib_index, &next_hop, - ls_plugin_mem); - - switch (rv) - { - case 0: - break; - case 1: - return 0; - case -1: - return clib_error_return (0, - "Identical localsid already exists. Requested localsid not created."); - case -2: - return clib_error_return (0, - "The requested localsid could not be deleted. SR localsid not found"); - case -3: - return clib_error_return (0, "FIB table %u does not exist", fib_index); - case -4: - return clib_error_return (0, "There is already one FIB entry for the" - "requested localsid non segment routing related"); - case -5: - return clib_error_return (0, - "Could not create ARP/ND entry for such next_hop. Internal error."); - case -6: - return clib_error_return (0, - "Error on the plugin based localsid creation."); - default: - return clib_error_return (0, "BUG: sr localsid returns %d", rv); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (sr_localsid_command, static) = { - .path = "sr localsid", - .short_help = "sr localsid (del) address XX:XX::YY:YY" - "(fib-table 8) behavior STRING", - .long_help = - "Create SR LocalSID and binds it to a particular behavior\n" - "Arguments:\n" - "\tlocalSID IPv6_addr(128b) LocalSID IPv6 address\n" - "\t(fib-table X) Optional. VRF where to install SRv6 localsid\n" - "\tbehavior STRING Specifies the behavior\n" - "\n\tBehaviors:\n" - "\tEnd\t-> Endpoint.\n" - "\tEnd.X\t-> Endpoint with decapsulation and Layer-3 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DX2\t-> Endpoint with decapsulation and Layer-2 cross-connect.\n" - "\t\tParameters: ''\n" - "\tEnd.DX6\t-> Endpoint with decapsulation and IPv6 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DX4\t-> Endpoint with decapsulation and IPv4 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DT6\t-> Endpoint with decapsulation and specific IPv6 table lookup.\n" - "\t\tParameters: ''\n" - "\tEnd.DT4\t-> Endpoint with decapsulation and specific IPv4 table lookup.\n" - "\t\tParameters: ''\n", - .function = sr_cli_localsid_command_fn, -}; -/* *INDENT-ON* */ - -/** - * @brief CLI function to 'show' all SR LocalSIDs on console. - */ -static clib_error_t * -show_sr_localsid_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - vnet_main_t *vnm = vnet_get_main (); - ip6_sr_main_t *sm = &sr_main; - ip6_sr_localsid_t **localsid_list = 0; - ip6_sr_localsid_t *ls; - int i; - - vlib_cli_output (vm, "SRv6 - My LocalSID Table:"); - vlib_cli_output (vm, "========================="); - /* *INDENT-OFF* */ - pool_foreach (ls, sm->localsids, ({ vec_add1 (localsid_list, ls); })); - /* *INDENT-ON* */ - for (i = 0; i < vec_len (localsid_list); i++) - { - ls = localsid_list[i]; - switch (ls->behavior) - { - case SR_BEHAVIOR_END: - vlib_cli_output (vm, "\tAddress: \t%U\n\tBehavior: \tEnd", - format_ip6_address, &ls->localsid); - break; - case SR_BEHAVIOR_X: - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tX (Endpoint with Layer-3 cross-connect)" - "\n\tIface: \t%U\n\tNext hop: \t%U", - format_ip6_address, &ls->localsid, - format_vnet_sw_if_index_name, vnm, ls->sw_if_index, - format_ip6_address, &ls->next_hop.ip6); - break; - case SR_BEHAVIOR_DX4: - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tDX4 (Endpoint with decapsulation and IPv4 cross-connect)" - "\n\tIface: \t%U\n\tNext hop: \t%U", - format_ip6_address, &ls->localsid, - format_vnet_sw_if_index_name, vnm, ls->sw_if_index, - format_ip4_address, &ls->next_hop.ip4); - break; - case SR_BEHAVIOR_DX6: - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tDX6 (Endpoint with decapsulation and IPv6 cross-connect)" - "\n\tIface: \t%U\n\tNext hop: \t%U", - format_ip6_address, &ls->localsid, - format_vnet_sw_if_index_name, vnm, ls->sw_if_index, - format_ip6_address, &ls->next_hop.ip6); - break; - case SR_BEHAVIOR_DX2: - if (ls->vlan_index == (u32) ~ 0) - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tDX2 (Endpoint with decapulation and Layer-2 cross-connect)" - "\n\tIface: \t%U", format_ip6_address, - &ls->localsid, format_vnet_sw_if_index_name, vnm, - ls->sw_if_index); - else - vlib_cli_output (vm, - "Unsupported yet. (DX2 with egress VLAN rewrite)"); - break; - case SR_BEHAVIOR_DT6: - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tDT6 (Endpoint with decapsulation and specific IPv6 table lookup)" - "\n\tTable: %u", format_ip6_address, &ls->localsid, - ls->fib_table); - break; - case SR_BEHAVIOR_DT4: - vlib_cli_output (vm, - "\tAddress: \t%U\n\tBehavior: \tDT4 (Endpoint with decapsulation and specific IPv4 table lookup)" - "\n\tTable: \t%u", format_ip6_address, - &ls->localsid, ls->fib_table); - break; - default: - if (ls->behavior >= SR_BEHAVIOR_LAST) - { - sr_localsid_fn_registration_t *plugin = - pool_elt_at_index (sm->plugin_functions, - ls->behavior - SR_BEHAVIOR_LAST); - - vlib_cli_output (vm, "\tAddress: \t%U\n" - "\tBehavior: \t%s (%s)\n\t%U", - format_ip6_address, &ls->localsid, - plugin->keyword_str, plugin->def_str, - plugin->ls_format, ls->plugin_mem); - } - else - //Should never get here... - vlib_cli_output (vm, "Internal error"); - break; - } - if (ls->end_psp) - vlib_cli_output (vm, "\tPSP: \tTrue\n"); - - /* Print counters */ - vlib_counter_t valid, invalid; - vlib_get_combined_counter (&(sm->sr_ls_valid_counters), i, &valid); - vlib_get_combined_counter (&(sm->sr_ls_invalid_counters), i, &invalid); - vlib_cli_output (vm, "\tGood traffic: \t[%Ld packets : %Ld bytes]\n", - valid.packets, valid.bytes); - vlib_cli_output (vm, "\tBad traffic: \t[%Ld packets : %Ld bytes]\n", - invalid.packets, invalid.bytes); - vlib_cli_output (vm, "--------------------"); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (show_sr_localsid_command, static) = { - .path = "show sr localsids", - .short_help = "show sr localsids", - .function = show_sr_localsid_command_fn, -}; -/* *INDENT-ON* */ - -/** - * @brief Function to 'clear' ALL SR localsid counters - */ -static clib_error_t * -clear_sr_localsid_counters_command_fn (vlib_main_t * vm, - unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - ip6_sr_main_t *sm = &sr_main; - - vlib_clear_combined_counters (&(sm->sr_ls_valid_counters)); - vlib_clear_combined_counters (&(sm->sr_ls_invalid_counters)); - - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (clear_sr_localsid_counters_command, static) = { - .path = "clear sr localsid counters", - .short_help = "clear sr localsid counters", - .function = clear_sr_localsid_counters_command_fn, -}; -/* *INDENT-ON* */ - -/************************ SR LocalSID graphs node ****************************/ -/** - * @brief SR localsid node trace - */ -typedef struct -{ - u32 localsid_index; - ip6_address_t src, out_dst; - u8 sr[256]; - u8 num_segments; - u8 segments_left; - //With SRv6 header update include flags here. -} sr_localsid_trace_t; - -#define foreach_sr_localsid_error \ -_(NO_INNER_HEADER, "(SR-Error) No inner IP header") \ -_(NO_MORE_SEGMENTS, "(SR-Error) No more segments") \ -_(NO_SRH, "(SR-Error) No SR header") \ -_(NO_PSP, "(SR-Error) PSP Not available (segments left > 0)") \ -_(NOT_LS, "(SR-Error) Decaps not available (segments left > 0)") \ -_(L2, "(SR-Error) SRv6 decapsulated a L2 frame without dest") - -typedef enum -{ -#define _(sym,str) SR_LOCALSID_ERROR_##sym, - foreach_sr_localsid_error -#undef _ - SR_LOCALSID_N_ERROR, -} sr_localsid_error_t; - -static char *sr_localsid_error_strings[] = { -#define _(sym,string) string, - foreach_sr_localsid_error -#undef _ -}; - -#define foreach_sr_localsid_next \ -_(ERROR, "error-drop") \ -_(IP6_LOOKUP, "ip6-lookup") \ -_(IP4_LOOKUP, "ip4-lookup") \ -_(IP6_REWRITE, "ip6-rewrite") \ -_(IP4_REWRITE, "ip4-rewrite") \ -_(INTERFACE_OUTPUT, "interface-output") - -typedef enum -{ -#define _(s,n) SR_LOCALSID_NEXT_##s, - foreach_sr_localsid_next -#undef _ - SR_LOCALSID_N_NEXT, -} sr_localsid_next_t; - -/** - * @brief SR LocalSID graph node trace function - * - * @see sr_localsid - */ -u8 * -format_sr_localsid_trace (u8 * s, va_list * args) -{ - CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); - CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); - ip6_sr_main_t *sm = &sr_main; - sr_localsid_trace_t *t = va_arg (*args, sr_localsid_trace_t *); - - ip6_sr_localsid_t *ls = - pool_elt_at_index (sm->localsids, t->localsid_index); - - s = - format (s, "SR-LOCALSID:\n\tLocalsid: %U\n", format_ip6_address, - &ls->localsid); - switch (ls->behavior) - { - case SR_BEHAVIOR_END: - s = format (s, "\tBehavior: End\n"); - break; - case SR_BEHAVIOR_DX6: - s = format (s, "\tBehavior: Decapsulation with IPv6 L3 xconnect\n"); - break; - case SR_BEHAVIOR_DX4: - s = format (s, "\tBehavior: Decapsulation with IPv4 L3 xconnect\n"); - break; - case SR_BEHAVIOR_X: - s = format (s, "\tBehavior: IPv6 L3 xconnect\n"); - break; - case SR_BEHAVIOR_DT6: - s = format (s, "\tBehavior: Decapsulation with IPv6 Table lookup\n"); - break; - case SR_BEHAVIOR_DT4: - s = format (s, "\tBehavior: Decapsulation with IPv4 Table lookup\n"); - break; - case SR_BEHAVIOR_DX2: - s = format (s, "\tBehavior: Decapsulation with L2 xconnect\n"); - break; - default: - s = format (s, "\tBehavior: defined in plugin\n"); //TODO - break; - } - if (t->num_segments != 0xFF) - { - if (t->num_segments > 0) - { - s = format (s, "\tSegments left: %d\n", t->num_segments); - s = format (s, "\tSID list: [in ietf order]"); - int i = 0; - for (i = 0; i < t->num_segments; i++) - { - s = format (s, "\n\t-> %U", format_ip6_address, - (ip6_address_t *) & t->sr[i * - sizeof (ip6_address_t)]); - } - } - } - return s; -} - -/** - * @brief Function doing End processing. - */ -static_always_inline void -end_srh_processing (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, - ip6_sr_header_t * sr0, - ip6_sr_localsid_t * ls0, u32 * next0) -{ - ip6_address_t *new_dst0; - - if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) - { - if (PREDICT_TRUE (sr0->segments_left != 0)) - { - sr0->segments_left -= 1; - new_dst0 = (ip6_address_t *) (sr0->segments); - new_dst0 += sr0->segments_left; - ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; - ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; - - if (ls0->behavior == SR_BEHAVIOR_X) - { - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; - *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; - } - } - else - { - *next0 = SR_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SR_LOCALSID_ERROR_NO_MORE_SEGMENTS]; - } - } - else - { - /* Error. Routing header of type != SR */ - *next0 = SR_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SR_LOCALSID_ERROR_NO_SRH]; - } -} - -/* - * @brief Function doing SRH processing for D* variants - */ -//FixME. I must crosscheck that next_proto matches the localsid -static_always_inline void -end_decaps_srh_processing (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, - ip6_sr_header_t * sr0, - ip6_sr_localsid_t * ls0, u32 * next0) -{ - /* Compute the size of the IPv6 header with all Ext. headers */ - u8 next_proto; - ip6_ext_header_t *next_ext_header; - u16 total_size = 0; - - next_proto = ip0->protocol; - next_ext_header = (void *) (ip0 + 1); - total_size = sizeof (ip6_header_t); - while (ip6_ext_hdr (next_proto)) - { - total_size += ip6_ext_header_len (next_ext_header); - next_proto = next_ext_header->next_hdr; - next_ext_header = ip6_ext_next_header (next_ext_header); - } - - /* Ensure this is the last segment. Otherwise drop. */ - if (sr0 && sr0->segments_left != 0) - { - *next0 = SR_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SR_LOCALSID_ERROR_NOT_LS]; - return; - } - - switch (next_proto) - { - case IP_PROTOCOL_IPV6: - /* Encap-End IPv6. Pop outer IPv6 header. */ - if (ls0->behavior == SR_BEHAVIOR_DX6) - { - vlib_buffer_advance (b0, total_size); - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; - *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; - return; - } - else if (ls0->behavior == SR_BEHAVIOR_DT6) - { - vlib_buffer_advance (b0, total_size); - vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->fib_table; - return; - } - break; - case IP_PROTOCOL_IP_IN_IP: - /* Encap-End IPv4. Pop outer IPv6 header */ - if (ls0->behavior == SR_BEHAVIOR_DX4) - { - vlib_buffer_advance (b0, total_size); - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; - *next0 = SR_LOCALSID_NEXT_IP4_REWRITE; - return; - } - else if (ls0->behavior == SR_BEHAVIOR_DT4) - { - vlib_buffer_advance (b0, total_size); - vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->fib_table; - *next0 = SR_LOCALSID_NEXT_IP4_LOOKUP; - return; - } - break; - case IP_PROTOCOL_IP6_NONXT: - /* L2 encaps */ - if (ls0->behavior == SR_BEHAVIOR_DX2) - { - vlib_buffer_advance (b0, total_size); - vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->sw_if_index; - *next0 = SR_LOCALSID_NEXT_INTERFACE_OUTPUT; - return; - } - break; - } - *next0 = SR_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SR_LOCALSID_ERROR_NO_INNER_HEADER]; - return; -} - -/** - * @brief Function doing End processing with PSP - */ -static_always_inline void -end_psp_srh_processing (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, - ip6_ext_header_t * prev0, - ip6_sr_header_t * sr0, - ip6_sr_localsid_t * ls0, u32 * next0) -{ - u32 new_l0, sr_len; - u64 *copy_dst0, *copy_src0; - u32 copy_len_u64s0 = 0; - int i; - - if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) - { - if (PREDICT_TRUE (sr0->segments_left == 1)) - { - ip0->dst_address.as_u64[0] = sr0->segments->as_u64[0]; - ip0->dst_address.as_u64[1] = sr0->segments->as_u64[1]; - - /* Remove the SRH taking care of the rest of IPv6 ext header */ - if (prev0) - prev0->next_hdr = sr0->protocol; - else - ip0->protocol = sr0->protocol; - - sr_len = ip6_ext_header_len (sr0); - vlib_buffer_advance (b0, sr_len); - new_l0 = clib_net_to_host_u16 (ip0->payload_length) - sr_len; - ip0->payload_length = clib_host_to_net_u16 (new_l0); - copy_src0 = (u64 *) ip0; - copy_dst0 = copy_src0 + (sr0->length + 1); - /* number of 8 octet units to copy - * By default in absence of extension headers it is equal to length of ip6 header - * With extension headers it number of 8 octet units of ext headers preceding - * SR header - */ - copy_len_u64s0 = - (((u8 *) sr0 - (u8 *) ip0) - sizeof (ip6_header_t)) >> 3; - copy_dst0[4 + copy_len_u64s0] = copy_src0[4 + copy_len_u64s0]; - copy_dst0[3 + copy_len_u64s0] = copy_src0[3 + copy_len_u64s0]; - copy_dst0[2 + copy_len_u64s0] = copy_src0[2 + copy_len_u64s0]; - copy_dst0[1 + copy_len_u64s0] = copy_src0[1 + copy_len_u64s0]; - copy_dst0[0 + copy_len_u64s0] = copy_src0[0 + copy_len_u64s0]; - - for (i = copy_len_u64s0 - 1; i >= 0; i--) - { - copy_dst0[i] = copy_src0[i]; - } - - if (ls0->behavior == SR_BEHAVIOR_X) - { - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; - *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; - } - return; - } - } - /* Error. Routing header of type != SR */ - *next0 = SR_LOCALSID_NEXT_ERROR; - b0->error = node->errors[SR_LOCALSID_ERROR_NO_PSP]; -} - -/** - * @brief SR LocalSID graph node. Supports all default SR Endpoint variants - */ -static uword -sr_localsid_d_fn (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - u32 n_left_from, next_index, *from, *to_next; - ip6_sr_main_t *sm = &sr_main; - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - next_index = node->cached_next_index; - u32 thread_index = vlib_get_thread_index (); - - while (n_left_from > 0) - { - u32 n_left_to_next; - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_LOCALSID_NEXT_IP6_LOOKUP; - ip6_sr_localsid_t *ls0, *ls1, *ls2, *ls3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+4 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - ls0 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls1 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls2 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls3 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip1, prev1, sr1, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip2, prev2, sr2, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip3, prev3, sr3, IP_PROTOCOL_IPV6_ROUTE); - - end_decaps_srh_processing (node, b0, ip0, sr0, ls0, &next0); - end_decaps_srh_processing (node, b1, ip1, sr1, ls1, &next1); - end_decaps_srh_processing (node, b2, ip2, sr2, ls2, &next2); - end_decaps_srh_processing (node, b3, ip3, sr3, ls3, &next3); - - //TODO: trace. - - vlib_increment_combined_counter - (((next0 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b0)); - - vlib_increment_combined_counter - (((next1 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls1 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b1)); - - vlib_increment_combined_counter - (((next2 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls2 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b2)); - - vlib_increment_combined_counter - (((next3 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls3 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b3)); - - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0; - ip6_ext_header_t *prev0; - ip6_sr_header_t *sr0; - u32 next0 = SR_LOCALSID_NEXT_IP6_LOOKUP; - ip6_sr_localsid_t *ls0; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - ip0 = vlib_buffer_get_current (b0); - - /* Lookup the SR End behavior based on IP DA (adj) */ - ls0 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - - /* Find SRH as well as previous header */ - ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); - - /* SRH processing and End variants */ - end_decaps_srh_processing (node, b0, ip0, sr0, ls0, &next0); - - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_localsid_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - tr->num_segments = 0; - tr->localsid_index = ls0 - sm->localsids; - - if (ip0 == vlib_buffer_get_current (b0)) - { - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->out_dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->out_dst.as_u8)); - if (ip0->protocol == IP_PROTOCOL_IPV6_ROUTE - && sr0->type == ROUTING_HEADER_TYPE_SR) - { - clib_memcpy (tr->sr, sr0->segments, sr0->length * 8); - tr->num_segments = - sr0->length * 8 / sizeof (ip6_address_t); - tr->segments_left = sr0->segments_left; - } - } - else - tr->num_segments = 0xFF; - } - - /* Increase the counters */ - vlib_increment_combined_counter - (((next0 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b0)); - - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_localsid_d_node) = { - .function = sr_localsid_d_fn, - .name = "sr-localsid-d", - .vector_size = sizeof (u32), - .format_trace = format_sr_localsid_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_LOCALSID_N_ERROR, - .error_strings = sr_localsid_error_strings, - .n_next_nodes = SR_LOCALSID_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_LOCALSID_NEXT_##s] = n, - foreach_sr_localsid_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/** - * @brief SR LocalSID graph node. Supports all default SR Endpoint variants - */ -static uword -sr_localsid_fn (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - u32 n_left_from, next_index, *from, *to_next; - ip6_sr_main_t *sm = &sr_main; - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - next_index = node->cached_next_index; - u32 thread_index = vlib_get_thread_index (); - - while (n_left_from > 0) - { - u32 n_left_to_next; - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_LOCALSID_NEXT_IP6_LOOKUP; - ip6_sr_localsid_t *ls0, *ls1, *ls2, *ls3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip1, prev1, sr1, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip2, prev2, sr2, IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip3, prev3, sr3, IP_PROTOCOL_IPV6_ROUTE); - - ls0 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls1 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls2 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ls3 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - - if (ls0->end_psp) - end_psp_srh_processing (node, b0, ip0, prev0, sr0, ls0, &next0); - else - end_srh_processing (node, b0, ip0, sr0, ls0, &next0); - - if (ls1->end_psp) - end_psp_srh_processing (node, b1, ip1, prev1, sr1, ls1, &next1); - else - end_srh_processing (node, b1, ip1, sr1, ls1, &next1); - - if (ls2->end_psp) - end_psp_srh_processing (node, b2, ip2, prev2, sr2, ls2, &next2); - else - end_srh_processing (node, b2, ip2, sr2, ls2, &next2); - - if (ls3->end_psp) - end_psp_srh_processing (node, b3, ip3, prev3, sr3, ls3, &next3); - else - end_srh_processing (node, b3, ip3, sr3, ls3, &next3); - - //TODO: proper trace. - - vlib_increment_combined_counter - (((next0 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b0)); - - vlib_increment_combined_counter - (((next1 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls1 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b1)); - - vlib_increment_combined_counter - (((next2 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls2 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b2)); - - vlib_increment_combined_counter - (((next3 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls3 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b3)); - - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0; - ip6_ext_header_t *prev0; - ip6_sr_header_t *sr0; - u32 next0 = SR_LOCALSID_NEXT_IP6_LOOKUP; - ip6_sr_localsid_t *ls0; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - ip0 = vlib_buffer_get_current (b0); - ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); - - /* Lookup the SR End behavior based on IP DA (adj) */ - ls0 = - pool_elt_at_index (sm->localsids, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - - /* SRH processing */ - if (ls0->end_psp) - end_psp_srh_processing (node, b0, ip0, prev0, sr0, ls0, &next0); - else - end_srh_processing (node, b0, ip0, sr0, ls0, &next0); - - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_localsid_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - tr->num_segments = 0; - tr->localsid_index = ls0 - sm->localsids; - - if (ip0 == vlib_buffer_get_current (b0)) - { - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->out_dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->out_dst.as_u8)); - if (ip0->protocol == IP_PROTOCOL_IPV6_ROUTE - && sr0->type == ROUTING_HEADER_TYPE_SR) - { - clib_memcpy (tr->sr, sr0->segments, sr0->length * 8); - tr->num_segments = - sr0->length * 8 / sizeof (ip6_address_t); - tr->segments_left = sr0->segments_left; - } - } - else - { - tr->num_segments = 0xFF; - } - } - - vlib_increment_combined_counter - (((next0 == - SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : - &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, - 1, vlib_buffer_length_in_chain (vm, b0)); - - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_localsid_node) = { - .function = sr_localsid_fn, - .name = "sr-localsid", - .vector_size = sizeof (u32), - .format_trace = format_sr_localsid_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_LOCALSID_N_ERROR, - .error_strings = sr_localsid_error_strings, - .n_next_nodes = SR_LOCALSID_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_LOCALSID_NEXT_##s] = n, - foreach_sr_localsid_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -static u8 * -format_sr_dpo (u8 * s, va_list * args) -{ - index_t index = va_arg (*args, index_t); - CLIB_UNUSED (u32 indent) = va_arg (*args, u32); - - return (format (s, "SR: localsid_index:[%d]", index)); -} - -const static dpo_vft_t sr_loc_vft = { - .dv_lock = sr_dpo_lock, - .dv_unlock = sr_dpo_unlock, - .dv_format = format_sr_dpo, -}; - -const static char *const sr_loc_ip6_nodes[] = { - "sr-localsid", - NULL, -}; - -const static char *const *const sr_loc_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_loc_ip6_nodes, -}; - -const static char *const sr_loc_d_ip6_nodes[] = { - "sr-localsid-d", - NULL, -}; - -const static char *const *const sr_loc_d_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_loc_d_ip6_nodes, -}; - - -/*************************** SR LocalSID plugins ******************************/ -/** - * @brief SR LocalSID plugin registry - */ -int -sr_localsid_register_function (vlib_main_t * vm, u8 * fn_name, - u8 * keyword_str, u8 * def_str, - u8 * params_str, dpo_type_t * dpo, - format_function_t * ls_format, - unformat_function_t * ls_unformat, - sr_plugin_callback_t * creation_fn, - sr_plugin_callback_t * removal_fn) -{ - ip6_sr_main_t *sm = &sr_main; - uword *p; - - sr_localsid_fn_registration_t *plugin; - - /* Did this function exist? If so update it */ - p = hash_get_mem (sm->plugin_functions_by_key, fn_name); - if (p) - { - plugin = pool_elt_at_index (sm->plugin_functions, p[0]); - } - /* Else create a new one and set hash key */ - else - { - pool_get (sm->plugin_functions, plugin); - hash_set_mem (sm->plugin_functions_by_key, fn_name, - plugin - sm->plugin_functions); - } - - memset (plugin, 0, sizeof (*plugin)); - - plugin->sr_localsid_function_number = (plugin - sm->plugin_functions); - plugin->sr_localsid_function_number += SR_BEHAVIOR_LAST; - plugin->ls_format = ls_format; - plugin->ls_unformat = ls_unformat; - plugin->creation = creation_fn; - plugin->removal = removal_fn; - clib_memcpy (&plugin->dpo, dpo, sizeof (dpo_type_t)); - plugin->function_name = format (0, "%s%c", fn_name, 0); - plugin->keyword_str = format (0, "%s%c", keyword_str, 0); - plugin->def_str = format (0, "%s%c", def_str, 0); - plugin->params_str = format (0, "%s%c", params_str, 0); - - return plugin->sr_localsid_function_number; -} - -/** - * @brief CLI function to 'show' all available SR LocalSID behaviors - */ -static clib_error_t * -show_sr_localsid_behaviors_command_fn (vlib_main_t * vm, - unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - ip6_sr_main_t *sm = &sr_main; - sr_localsid_fn_registration_t *plugin; - sr_localsid_fn_registration_t **plugins_vec = 0; - int i; - - vlib_cli_output (vm, - "SR LocalSIDs behaviors:\n-----------------------\n\n"); - - /* *INDENT-OFF* */ - pool_foreach (plugin, sm->plugin_functions, - ({ vec_add1 (plugins_vec, plugin); })); - /* *INDENT-ON* */ - - /* Print static behaviors */ - vlib_cli_output (vm, "Default behaviors:\n" - "\tEnd\t-> Endpoint.\n" - "\tEnd.X\t-> Endpoint with decapsulation and Layer-3 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DX2\t-> Endpoint with decapsulation and Layer-2 cross-connect.\n" - "\t\tParameters: ''\n" - "\tEnd.DX6\t-> Endpoint with decapsulation and IPv6 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DX4\t-> Endpoint with decapsulation and IPv4 cross-connect.\n" - "\t\tParameters: ' '\n" - "\tEnd.DT6\t-> Endpoint with decapsulation and specific IPv6 table lookup.\n" - "\t\tParameters: ''\n" - "\tEnd.DT4\t-> Endpoint with decapsulation and specific IPv4 table lookup.\n" - "\t\tParameters: ''\n"); - vlib_cli_output (vm, "Plugin behaviors:\n"); - for (i = 0; i < vec_len (plugins_vec); i++) - { - plugin = plugins_vec[i]; - vlib_cli_output (vm, "\t%s\t-> %s.\n", plugin->keyword_str, - plugin->def_str); - vlib_cli_output (vm, "\t\tParameters: '%s'\n", plugin->params_str); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (show_sr_localsid_behaviors_command, static) = { - .path = "show sr localsids behaviors", - .short_help = "show sr localsids behaviors", - .function = show_sr_localsid_behaviors_command_fn, -}; -/* *INDENT-ON* */ - -/** - * @brief SR LocalSID initialization - */ -clib_error_t * -sr_localsids_init (vlib_main_t * vm) -{ - /* Init memory for function keys */ - ip6_sr_main_t *sm = &sr_main; - mhash_init (&sm->sr_localsids_index_hash, sizeof (uword), - sizeof (ip6_address_t)); - /* Init SR behaviors DPO type */ - sr_localsid_dpo_type = dpo_register_new_type (&sr_loc_vft, sr_loc_nodes); - /* Init SR behaviors DPO type */ - sr_localsid_d_dpo_type = - dpo_register_new_type (&sr_loc_vft, sr_loc_d_nodes); - /* Init memory for localsid plugins */ - sm->plugin_functions_by_key = hash_create_string (0, sizeof (uword)); - return 0; -} - -VLIB_INIT_FUNCTION (sr_localsids_init); -/* -* fd.io coding-style-patch-verification: ON -* -* Local Variables: -* eval: (c-set-style "gnu") -* End: -*/ diff --git a/src/vnet/sr/sr_localsid.md b/src/vnet/sr/sr_localsid.md deleted file mode 100644 index 340af4a3..00000000 --- a/src/vnet/sr/sr_localsid.md +++ /dev/null @@ -1,58 +0,0 @@ -# SR LocalSIDs {#srv6_localsid_doc} - -A local SID is associated to a Segment Routing behavior -or function- on the current node. - -The most basic behavior is called END. It simply activates the next SID in the current packet, by decrementing the Segments Left value and updating the IPv6 DA. - -A local END SID is instantiated using the following CLI: - - sr localsid (del) address XX::YY behavior end - -This creates a new entry in the main FIB for IPv6 address XX::YY. All packets whose IPv6 DA matches this FIB entry are redirected to the sr-localsid node, where they are processed as described above. - -Other examples of local SIDs are the following: - - sr localsid (del) address XX::YY behavior end - sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a - sr localsid (del) address XX::YY behavior end.dx6 GE0/1/0 2001::a - sr localsid (del) address XX::YY behavior end.dx4 GE0/1/0 10.0.0.1 - sr localsid (del) address XX::YY behavior end.dx2 GigabitE0/11/0 - sr localsid (del) address XX::YY behavior end.dt6 5 - sr localsid (del) address XX::YY behavior end.dt6 5 - -Note that all of these behaviors match the definitions of the SRv6 architecture (*draft-filsfils-spring-srv6-network-programming*). Please refer to this document for a detailed description of each behavior. - -Note also that you can configure the PSP flavor of the End and End.X behaviors by typing: - - sr localsid (del) address XX::YY behavior end psp - sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a psp - -Help on the available local SID behaviors and their usage can be obtained with: - - help sr localsid - -Alternatively they can be obtained using. - - show sr localsids behavior - -The difference in between those two commands is that the first one will only display the SR LocalSID behaviors that are built-in VPP, while the latter will display those behaviors plus the ones added with the SR LocalSID Development Framework. - - -VPP keeps a 'My LocalSID Table' where it stores all the SR local SIDs instantiated as well as their parameters. Every time a new local SID is instantiated, a new entry is added to this table. In addition, counters for correctly and incorrectly processed traffic are maintained for each local SID. The counters store both the number of packets and bytes. - -The contents of the 'My LocalSID Table' is shown with: - - vpp# show sr localsid - SRv6 - My LocalSID Table: - ========================= - Address: c3::1 - Behavior: DX6 (Endpoint with decapsulation and IPv6 cross-connect) - Iface: GigabitEthernet0/5/0 - Next hop: b:c3::b - Good traffic: [51277 packets : 5332808 bytes] - Bad traffic: [0 packets : 0 bytes] - -------------------- - -The traffic counters can be reset with: - - vpp# clear sr localsid counters diff --git a/src/vnet/sr/sr_packet.h b/src/vnet/sr/sr_packet.h deleted file mode 100755 index 7af4ad4d..00000000 --- a/src/vnet/sr/sr_packet.h +++ /dev/null @@ -1,159 +0,0 @@ -#ifndef included_vnet_sr_packet_h -#define included_vnet_sr_packet_h - -#include - -/* - * ipv6 segment-routing header format - * - * Copyright (c) 2013 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/* - * The Segment Routing Header (SRH) is defined as follows: - * - * 0 1 2 3 - * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | Next Header | Hdr Ext Len | Routing Type | Segments Left | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | First Segment | Flags | RESERVED | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | | - * | Segment List[0] (128 bits IPv6 address) | - * | | - * | | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | | - * | | - * ... - * | | - * | | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | | - * | Segment List[n] (128 bits IPv6 address) | - * | | - * | | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * // // - * // Optional Type Length Value objects (variable) // - * // // - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * - * where: - * - * o Next Header: 8-bit selector. Identifies the type of header - * immediately following the SRH. - * - * o Hdr Ext Len: 8-bit unsigned integer, is the length of the SRH - * header in 8-octet units, not including the first 8 octets. - * - * o Routing Type: TBD, to be assigned by IANA (suggested value: 4). - * - * o Segments Left. Defined in [RFC2460], it contains the index, in - * the Segment List, of the next segment to inspect. Segments Left - * is decremented at each segment. - * - * o First Segment: contains the index, in the Segment List, of the - * first segment of the path which is in fact the last element of the - * Segment List. - * - * o Flags: 8 bits of flags. Following flags are defined: - * - * 0 1 2 3 4 5 6 7 - * +-+-+-+-+-+-+-+-+ - * |U|P|O|A|H| U | - * +-+-+-+-+-+-+-+-+ - * - * U: Unused and for future use. SHOULD be unset on transmission - * and MUST be ignored on receipt. - * - * P-flag: Protected flag. Set when the packet has been rerouted - * through FRR mechanism by an SR endpoint node. - * - * O-flag: OAM flag. When set, it indicates that this packet is - * an operations and management (OAM) packet. - * - * A-flag: Alert flag. If present, it means important Type Length - * Value (TLV) objects are present. See Section 3.1 for details - * on TLVs objects. - * - * H-flag: HMAC flag. If set, the HMAC TLV is present and is - * encoded as the last TLV of the SRH. In other words, the last - * 36 octets of the SRH represent the HMAC information. See - * Section 3.1.5 for details on the HMAC TLV. - * - * o RESERVED: SHOULD be unset on transmission and MUST be ignored on - * receipt. - * - * o Segment List[n]: 128 bit IPv6 addresses representing the nth - * segment in the Segment List. The Segment List is encoded starting - * from the last segment of the path. I.e., the first element of the - * segment list (Segment List [0]) contains the last segment of the - * path while the last segment of the Segment List (Segment List[n]) - * contains the first segment of the path. The index contained in - * "Segments Left" identifies the current active segment. - * - * o Type Length Value (TLV) are described in Section 3.1. - * - */ - -#ifndef IPPROTO_IPV6_ROUTE -#define IPPROTO_IPV6_ROUTE 43 -#endif - -#define ROUTING_HEADER_TYPE_SR 4 - -typedef struct -{ - /* Protocol for next header. */ - u8 protocol; - /* - * Length of routing header in 8 octet units, - * not including the first 8 octets - */ - u8 length; - - /* Type of routing header; type 4 = segement routing */ - u8 type; - - /* Next segment in the segment list */ - u8 segments_left; - - /* Pointer to the first segment in the header */ - u8 first_segment; - - /* Flag bits */ -#define IP6_SR_HEADER_FLAG_PROTECTED (0x40) -#define IP6_SR_HEADER_FLAG_OAM (0x20) -#define IP6_SR_HEADER_FLAG_ALERT (0x10) -#define IP6_SR_HEADER_FLAG_HMAC (0x80) - - /* values 0x0, 0x4 - 0x7 are reserved */ - u8 flags; - u16 reserved; - - /* The segment elts */ - ip6_address_t segments[0]; -} __attribute__ ((packed)) ip6_sr_header_t; - -/* -* fd.io coding-style-patch-verification: ON -* -* Local Variables: -* eval: (c-set-style "gnu") -* End: -*/ - -#endif /* included_vnet_sr_packet_h */ diff --git a/src/vnet/sr/sr_policy.md b/src/vnet/sr/sr_policy.md deleted file mode 100644 index 521b8461..00000000 --- a/src/vnet/sr/sr_policy.md +++ /dev/null @@ -1,56 +0,0 @@ -# Creating a SR Policy {#srv6_policy_doc} - -An SR Policy is defined by a Binding SID and a weighted set of Segment Lists. - -A new SR policy is created with a first SID list using: - - sr policy add bsid 2001::1 next A1:: next B1:: next C1:: (weight 5) (fib-table 3) - -* The weight parameter is only used if more than one SID list is associated with the policy. -* The fib-table parameter specifies in which table (VRF) the Binding SID is to be installed. - -An SR policy is deleted with: - - sr policy del bsid 2001::1 - sr policy del index 1 - -The existing SR policies are listed with: - - show sr policies - -## Adding/Removing SID Lists from an SR policy - -An additional SID list is associated with an existing SR policy with: - - sr policy mod bsid 2001::1 add sl next A2:: next B2:: next C2:: (weight 3) - sr policy mod index 3 add sl next A2:: next B2:: next C2:: (weight 3) - -Conversely, a SID list can be removed from an SR policy with: - - sr policy mod bsid 2001::1 del sl index 1 - sr policy mod index 3 del sl index 1 - -Note that this cannot be used to remove the last SID list of a policy. - -The weight of a SID list can also be modified with: - - sr policy mod bsid 2001::1 mod sl index 1 weight 4 - sr policy mod index 3 mod sl index 1 weight 4 - -## SR Policies: Spray policies - -Spray policies are a specific type of SR policies where the packet is replicated on all the SID lists, rather than load-balanced among them. - -SID list weights are ignored with this type of policies. - -A Spray policy is instantiated by appending the keyword **spray** to a regular SR policy command, as in: - - sr policy add bsid 2001::1 next A1:: next B1:: next C1:: spray - -Spray policies are used for removing multicast state from a network core domain, and instead send a linear unicast copy to every access node. The last SID in each list accesses the multicast tree within the access node. - -## Encapsulation SR policies - -In case the user decides to create an SR policy an IPv6 Source Address must be specified for the encapsulated traffic. In order to do so the user might use the following command: - - set sr encaps source addr XXXX::YYYY diff --git a/src/vnet/sr/sr_policy_rewrite.c b/src/vnet/sr/sr_policy_rewrite.c deleted file mode 100755 index c4024070..00000000 --- a/src/vnet/sr/sr_policy_rewrite.c +++ /dev/null @@ -1,3227 +0,0 @@ -/* - * sr_policy_rewrite.c: ipv6 sr policy creation - * - * Copyright (c) 2016 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief SR policy creation and application - * - * Create an SR policy. - * An SR policy can be either of 'default' type or 'spray' type - * An SR policy has attached a list of SID lists. - * In case the SR policy is a default one it will load balance among them. - * An SR policy has associated a BindingSID. - * In case any packet arrives with IPv6 DA == BindingSID then the SR policy - * associated to such bindingSID will be applied to such packet. - * - * SR policies can be applied either by using IPv6 encapsulation or - * SRH insertion. Both methods can be found on this file. - * - * Traffic input usually is IPv6 packets. However it is possible to have - * IPv4 packets or L2 frames. (that are encapsulated into IPv6 with SRH) - * - * This file provides the appropiates VPP graph nodes to do any of these - * methods. - * - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -/** - * @brief SR policy rewrite trace - */ -typedef struct -{ - ip6_address_t src, dst; -} sr_policy_rewrite_trace_t; - -/* Graph arcs */ -#define foreach_sr_policy_rewrite_next \ -_(IP6_LOOKUP, "ip6-lookup") \ -_(ERROR, "error-drop") - -typedef enum -{ -#define _(s,n) SR_POLICY_REWRITE_NEXT_##s, - foreach_sr_policy_rewrite_next -#undef _ - SR_POLICY_REWRITE_N_NEXT, -} sr_policy_rewrite_next_t; - -/* SR rewrite errors */ -#define foreach_sr_policy_rewrite_error \ -_(INTERNAL_ERROR, "Segment Routing undefined error") \ -_(BSID_ZERO, "BSID with SL = 0") \ -_(COUNTER_TOTAL, "SR steered IPv6 packets") \ -_(COUNTER_ENCAP, "SR: Encaps packets") \ -_(COUNTER_INSERT, "SR: SRH inserted packets") \ -_(COUNTER_BSID, "SR: BindingSID steered packets") - -typedef enum -{ -#define _(sym,str) SR_POLICY_REWRITE_ERROR_##sym, - foreach_sr_policy_rewrite_error -#undef _ - SR_POLICY_REWRITE_N_ERROR, -} sr_policy_rewrite_error_t; - -static char *sr_policy_rewrite_error_strings[] = { -#define _(sym,string) string, - foreach_sr_policy_rewrite_error -#undef _ -}; - -/** - * @brief Dynamically added SR SL DPO type - */ -static dpo_type_t sr_pr_encaps_dpo_type; -static dpo_type_t sr_pr_insert_dpo_type; -static dpo_type_t sr_pr_bsid_encaps_dpo_type; -static dpo_type_t sr_pr_bsid_insert_dpo_type; - -/** - * @brief IPv6 SA for encapsulated packets - */ -static ip6_address_t sr_pr_encaps_src; - -/******************* SR rewrite set encaps IPv6 source addr *******************/ -/* Note: This is temporal. We don't know whether to follow this path or - take the ip address of a loopback interface or even the OIF */ - -static clib_error_t * -set_sr_src_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) - { - if (unformat - (input, "addr %U", unformat_ip6_address, &sr_pr_encaps_src)) - return 0; - else - return clib_error_return (0, "No address specified"); - } - return clib_error_return (0, "No address specified"); -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (set_sr_src_command, static) = { - .path = "set sr encaps source", - .short_help = "set sr encaps source addr ", - .function = set_sr_src_command_fn, -}; -/* *INDENT-ON* */ - -/*********************** SR rewrite string computation ************************/ -/** - * @brief SR rewrite string computation for IPv6 encapsulation (inline) - * - * @param sl is a vector of IPv6 addresses composing the Segment List - * - * @return precomputed rewrite string for encapsulation - */ -static inline u8 * -compute_rewrite_encaps (ip6_address_t * sl) -{ - ip6_header_t *iph; - ip6_sr_header_t *srh; - ip6_address_t *addrp, *this_address; - u32 header_length = 0; - u8 *rs = NULL; - - header_length = 0; - header_length += IPv6_DEFAULT_HEADER_LENGTH; - if (vec_len (sl) > 1) - { - header_length += sizeof (ip6_sr_header_t); - header_length += vec_len (sl) * sizeof (ip6_address_t); - } - - vec_validate (rs, header_length - 1); - - iph = (ip6_header_t *) rs; - iph->ip_version_traffic_class_and_flow_label = - clib_host_to_net_u32 (0 | ((6 & 0xF) << 28)); - iph->src_address.as_u64[0] = sr_pr_encaps_src.as_u64[0]; - iph->src_address.as_u64[1] = sr_pr_encaps_src.as_u64[1]; - iph->payload_length = header_length - IPv6_DEFAULT_HEADER_LENGTH; - iph->protocol = IP_PROTOCOL_IPV6; - iph->hop_limit = IPv6_DEFAULT_HOP_LIMIT; - - srh = (ip6_sr_header_t *) (iph + 1); - iph->protocol = IP_PROTOCOL_IPV6_ROUTE; - srh->protocol = IP_PROTOCOL_IPV6; - srh->type = ROUTING_HEADER_TYPE_SR; - srh->segments_left = vec_len (sl) - 1; - srh->first_segment = vec_len (sl) - 1; - srh->length = ((sizeof (ip6_sr_header_t) + - (vec_len (sl) * sizeof (ip6_address_t))) / 8) - 1; - srh->flags = 0x00; - srh->reserved = 0x00; - addrp = srh->segments + vec_len (sl) - 1; - vec_foreach (this_address, sl) - { - clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); - addrp--; - } - iph->dst_address.as_u64[0] = sl->as_u64[0]; - iph->dst_address.as_u64[1] = sl->as_u64[1]; - return rs; -} - -/** - * @brief SR rewrite string computation for SRH insertion (inline) - * - * @param sl is a vector of IPv6 addresses composing the Segment List - * - * @return precomputed rewrite string for SRH insertion - */ -static inline u8 * -compute_rewrite_insert (ip6_address_t * sl) -{ - ip6_sr_header_t *srh; - ip6_address_t *addrp, *this_address; - u32 header_length = 0; - u8 *rs = NULL; - - header_length = 0; - header_length += sizeof (ip6_sr_header_t); - header_length += (vec_len (sl) + 1) * sizeof (ip6_address_t); - - vec_validate (rs, header_length - 1); - - srh = (ip6_sr_header_t *) rs; - srh->type = ROUTING_HEADER_TYPE_SR; - srh->segments_left = vec_len (sl); - srh->first_segment = vec_len (sl); - srh->length = ((sizeof (ip6_sr_header_t) + - ((vec_len (sl) + 1) * sizeof (ip6_address_t))) / 8) - 1; - srh->flags = 0x00; - srh->reserved = 0x0000; - addrp = srh->segments + vec_len (sl); - vec_foreach (this_address, sl) - { - clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); - addrp--; - } - return rs; -} - -/** - * @brief SR rewrite string computation for SRH insertion with BSID (inline) - * - * @param sl is a vector of IPv6 addresses composing the Segment List - * - * @return precomputed rewrite string for SRH insertion with BSID - */ -static inline u8 * -compute_rewrite_bsid (ip6_address_t * sl) -{ - ip6_sr_header_t *srh; - ip6_address_t *addrp, *this_address; - u32 header_length = 0; - u8 *rs = NULL; - - header_length = 0; - header_length += sizeof (ip6_sr_header_t); - header_length += vec_len (sl) * sizeof (ip6_address_t); - - vec_validate (rs, header_length - 1); - - srh = (ip6_sr_header_t *) rs; - srh->type = ROUTING_HEADER_TYPE_SR; - srh->segments_left = vec_len (sl) - 1; - srh->first_segment = vec_len (sl) - 1; - srh->length = ((sizeof (ip6_sr_header_t) + - (vec_len (sl) * sizeof (ip6_address_t))) / 8) - 1; - srh->flags = 0x00; - srh->reserved = 0x0000; - addrp = srh->segments + vec_len (sl) - 1; - vec_foreach (this_address, sl) - { - clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); - addrp--; - } - return rs; -} - -/*************************** SR LB helper functions **************************/ -/** - * @brief Creates a Segment List and adds it to an SR policy - * - * Creates a Segment List and adds it to the SR policy. Notice that the SL are - * not necessarily unique. Hence there might be two Segment List within the - * same SR Policy with exactly the same segments and same weight. - * - * @param sr_policy is the SR policy where the SL will be added - * @param sl is a vector of IPv6 addresses composing the Segment List - * @param weight is the weight of the SegmentList (for load-balancing purposes) - * @param is_encap represents the mode (SRH insertion vs Encapsulation) - * - * @return pointer to the just created segment list - */ -static inline ip6_sr_sl_t * -create_sl (ip6_sr_policy_t * sr_policy, ip6_address_t * sl, u32 weight, - u8 is_encap) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_sr_sl_t *segment_list; - - pool_get (sm->sid_lists, segment_list); - memset (segment_list, 0, sizeof (*segment_list)); - - vec_add1 (sr_policy->segments_lists, segment_list - sm->sid_lists); - - /* Fill in segment list */ - segment_list->weight = - (weight != (u32) ~ 0 ? weight : SR_SEGMENT_LIST_WEIGHT_DEFAULT); - segment_list->segments = vec_dup (sl); - - if (is_encap) - { - segment_list->rewrite = compute_rewrite_encaps (sl); - segment_list->rewrite_bsid = segment_list->rewrite; - } - else - { - segment_list->rewrite = compute_rewrite_insert (sl); - segment_list->rewrite_bsid = compute_rewrite_bsid (sl); - } - - /* Create DPO */ - dpo_reset (&segment_list->bsid_dpo); - dpo_reset (&segment_list->ip6_dpo); - dpo_reset (&segment_list->ip4_dpo); - - if (is_encap) - { - dpo_set (&segment_list->ip6_dpo, sr_pr_encaps_dpo_type, DPO_PROTO_IP6, - segment_list - sm->sid_lists); - dpo_set (&segment_list->ip4_dpo, sr_pr_encaps_dpo_type, DPO_PROTO_IP4, - segment_list - sm->sid_lists); - dpo_set (&segment_list->bsid_dpo, sr_pr_bsid_encaps_dpo_type, - DPO_PROTO_IP6, segment_list - sm->sid_lists); - } - else - { - dpo_set (&segment_list->ip6_dpo, sr_pr_insert_dpo_type, DPO_PROTO_IP6, - segment_list - sm->sid_lists); - dpo_set (&segment_list->bsid_dpo, sr_pr_bsid_insert_dpo_type, - DPO_PROTO_IP6, segment_list - sm->sid_lists); - } - - return segment_list; -} - -/** - * @brief Updates the Load Balancer after an SR Policy change - * - * @param sr_policy is the modified SR Policy - */ -static inline void -update_lb (ip6_sr_policy_t * sr_policy) -{ - flow_hash_config_t fhc; - u32 *sl_index; - ip6_sr_sl_t *segment_list; - ip6_sr_main_t *sm = &sr_main; - load_balance_path_t path; - path.path_index = FIB_NODE_INDEX_INVALID; - load_balance_path_t *ip4_path_vector = 0; - load_balance_path_t *ip6_path_vector = 0; - load_balance_path_t *b_path_vector = 0; - - /* In case LB does not exist, create it */ - if (!dpo_id_is_valid (&sr_policy->bsid_dpo)) - { - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - .fp_addr = { - .ip6 = sr_policy->bsid, - } - }; - - /* Add FIB entry for BSID */ - fhc = fib_table_get_flow_hash_config (sr_policy->fib_table, - dpo_proto_to_fib (DPO_PROTO_IP6)); - - dpo_set (&sr_policy->bsid_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP6, - load_balance_create (0, DPO_PROTO_IP6, fhc)); - - dpo_set (&sr_policy->ip6_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP6, - load_balance_create (0, DPO_PROTO_IP6, fhc)); - - /* Update FIB entry's to point to the LB DPO in the main FIB and hidden one */ - fib_table_entry_special_dpo_update (fib_table_find (FIB_PROTOCOL_IP6, - sr_policy->fib_table), - &pfx, FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->bsid_dpo); - - fib_table_entry_special_dpo_update (sm->fib_table_ip6, - &pfx, - FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->ip6_dpo); - - if (sr_policy->is_encap) - { - dpo_set (&sr_policy->ip4_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP4, - load_balance_create (0, DPO_PROTO_IP4, fhc)); - - fib_table_entry_special_dpo_update (sm->fib_table_ip4, - &pfx, - FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->ip4_dpo); - } - - } - - /* Create the LB path vector */ - //path_vector = vec_new(load_balance_path_t, vec_len(sr_policy->segments_lists)); - vec_foreach (sl_index, sr_policy->segments_lists) - { - segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); - path.path_dpo = segment_list->bsid_dpo; - path.path_weight = segment_list->weight; - vec_add1 (b_path_vector, path); - path.path_dpo = segment_list->ip6_dpo; - vec_add1 (ip6_path_vector, path); - if (sr_policy->is_encap) - { - path.path_dpo = segment_list->ip4_dpo; - vec_add1 (ip4_path_vector, path); - } - } - - /* Update LB multipath */ - load_balance_multipath_update (&sr_policy->bsid_dpo, b_path_vector, - LOAD_BALANCE_FLAG_NONE); - load_balance_multipath_update (&sr_policy->ip6_dpo, ip6_path_vector, - LOAD_BALANCE_FLAG_NONE); - if (sr_policy->is_encap) - load_balance_multipath_update (&sr_policy->ip4_dpo, ip4_path_vector, - LOAD_BALANCE_FLAG_NONE); - - /* Cleanup */ - vec_free (b_path_vector); - vec_free (ip6_path_vector); - vec_free (ip4_path_vector); - -} - -/** - * @brief Updates the Replicate DPO after an SR Policy change - * - * @param sr_policy is the modified SR Policy (type spray) - */ -static inline void -update_replicate (ip6_sr_policy_t * sr_policy) -{ - u32 *sl_index; - ip6_sr_sl_t *segment_list; - ip6_sr_main_t *sm = &sr_main; - load_balance_path_t path; - path.path_index = FIB_NODE_INDEX_INVALID; - load_balance_path_t *b_path_vector = 0; - load_balance_path_t *ip6_path_vector = 0; - load_balance_path_t *ip4_path_vector = 0; - - /* In case LB does not exist, create it */ - if (!dpo_id_is_valid (&sr_policy->bsid_dpo)) - { - dpo_set (&sr_policy->bsid_dpo, DPO_REPLICATE, - DPO_PROTO_IP6, replicate_create (0, DPO_PROTO_IP6)); - - dpo_set (&sr_policy->ip6_dpo, DPO_REPLICATE, - DPO_PROTO_IP6, replicate_create (0, DPO_PROTO_IP6)); - - /* Update FIB entry's DPO to point to SR without LB */ - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - .fp_addr = { - .ip6 = sr_policy->bsid, - } - }; - fib_table_entry_special_dpo_update (fib_table_find (FIB_PROTOCOL_IP6, - sr_policy->fib_table), - &pfx, FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->bsid_dpo); - - fib_table_entry_special_dpo_update (sm->fib_table_ip6, - &pfx, - FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->ip6_dpo); - - if (sr_policy->is_encap) - { - dpo_set (&sr_policy->ip4_dpo, DPO_REPLICATE, DPO_PROTO_IP4, - replicate_create (0, DPO_PROTO_IP4)); - - fib_table_entry_special_dpo_update (sm->fib_table_ip4, - &pfx, - FIB_SOURCE_SR, - FIB_ENTRY_FLAG_EXCLUSIVE, - &sr_policy->ip4_dpo); - } - - } - - /* Create the replicate path vector */ - path.path_weight = 1; - vec_foreach (sl_index, sr_policy->segments_lists) - { - segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); - path.path_dpo = segment_list->bsid_dpo; - vec_add1 (b_path_vector, path); - path.path_dpo = segment_list->ip6_dpo; - vec_add1 (ip6_path_vector, path); - if (sr_policy->is_encap) - { - path.path_dpo = segment_list->ip4_dpo; - vec_add1 (ip4_path_vector, path); - } - } - - /* Update replicate multipath */ - replicate_multipath_update (&sr_policy->bsid_dpo, b_path_vector); - replicate_multipath_update (&sr_policy->ip6_dpo, ip6_path_vector); - if (sr_policy->is_encap) - replicate_multipath_update (&sr_policy->ip4_dpo, ip4_path_vector); -} - -/******************************* SR rewrite API *******************************/ -/* Three functions for handling sr policies: - * -> sr_policy_add - * -> sr_policy_del - * -> sr_policy_mod - * All of them are API. CLI function on sr_policy_command_fn */ - -/** - * @brief Create a new SR policy - * - * @param bsid is the bindingSID of the SR Policy - * @param segments is a vector of IPv6 address composing the segment list - * @param weight is the weight of the sid list. optional. - * @param behavior is the behavior of the SR policy. (default//spray) - * @param fib_table is the VRF where to install the FIB entry for the BSID - * @param is_encap (bool) whether SR policy should behave as Encap/SRH Insertion - * - * @return 0 if correct, else error - */ -int -sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, - u32 weight, u8 behavior, u32 fib_table, u8 is_encap) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_sr_policy_t *sr_policy = 0; - uword *p; - - /* Search for existing keys (BSID) */ - p = mhash_get (&sm->sr_policies_index_hash, bsid); - if (p) - { - /* Add SR policy that already exists; complain */ - return -12; - } - - /* Search collision in FIB entries */ - /* Explanation: It might be possible that some other entity has already - * created a route for the BSID. This in theory is impossible, but in - * practise we could see it. Assert it and scream if needed */ - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - .fp_addr = { - .ip6 = *bsid, - } - }; - - /* Lookup the FIB index associated to the table selected */ - u32 fib_index = fib_table_find (FIB_PROTOCOL_IP6, - (fib_table != (u32) ~ 0 ? fib_table : 0)); - if (fib_index == ~0) - return -13; - - /* Lookup whether there exists an entry for the BSID */ - fib_node_index_t fei = fib_table_lookup_exact_match (fib_index, &pfx); - if (FIB_NODE_INDEX_INVALID != fei) - return -12; //There is an entry for such lookup - - /* Add an SR policy object */ - pool_get (sm->sr_policies, sr_policy); - memset (sr_policy, 0, sizeof (*sr_policy)); - clib_memcpy (&sr_policy->bsid, bsid, sizeof (ip6_address_t)); - sr_policy->type = behavior; - sr_policy->fib_table = (fib_table != (u32) ~ 0 ? fib_table : 0); //Is default FIB 0 ? - sr_policy->is_encap = is_encap; - - /* Copy the key */ - mhash_set (&sm->sr_policies_index_hash, bsid, sr_policy - sm->sr_policies, - NULL); - - /* Create a segment list and add the index to the SR policy */ - create_sl (sr_policy, segments, weight, is_encap); - - /* If FIB doesnt exist, create them */ - if (sm->fib_table_ip6 == (u32) ~ 0) - { - sm->fib_table_ip6 = fib_table_create_and_lock (FIB_PROTOCOL_IP6, - "SRv6 steering of IP6 prefixes through BSIDs"); - sm->fib_table_ip4 = fib_table_create_and_lock (FIB_PROTOCOL_IP6, - "SRv6 steering of IP4 prefixes through BSIDs"); - } - - /* Create IPv6 FIB for the BindingSID attached to the DPO of the only SL */ - if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) - update_lb (sr_policy); - else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) - update_replicate (sr_policy); - return 0; -} - -/** - * @brief Delete a SR policy - * - * @param bsid is the bindingSID of the SR Policy - * @param index is the index of the SR policy - * - * @return 0 if correct, else error - */ -int -sr_policy_del (ip6_address_t * bsid, u32 index) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_sr_policy_t *sr_policy = 0; - ip6_sr_sl_t *segment_list; - u32 *sl_index; - uword *p; - - if (bsid) - { - p = mhash_get (&sm->sr_policies_index_hash, bsid); - if (p) - sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); - else - return -1; - } - else - { - sr_policy = pool_elt_at_index (sm->sr_policies, index); - if (!sr_policy) - return -1; - } - - /* Remove BindingSID FIB entry */ - fib_prefix_t pfx = { - .fp_proto = FIB_PROTOCOL_IP6, - .fp_len = 128, - .fp_addr = { - .ip6 = sr_policy->bsid, - } - , - }; - - fib_table_entry_special_remove (fib_table_find (FIB_PROTOCOL_IP6, - sr_policy->fib_table), - &pfx, FIB_SOURCE_SR); - - fib_table_entry_special_remove (sm->fib_table_ip6, &pfx, FIB_SOURCE_SR); - - if (sr_policy->is_encap) - fib_table_entry_special_remove (sm->fib_table_ip4, &pfx, FIB_SOURCE_SR); - - if (dpo_id_is_valid (&sr_policy->bsid_dpo)) - { - dpo_reset (&sr_policy->bsid_dpo); - dpo_reset (&sr_policy->ip4_dpo); - dpo_reset (&sr_policy->ip6_dpo); - } - - /* Clean SID Lists */ - vec_foreach (sl_index, sr_policy->segments_lists) - { - segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); - vec_free (segment_list->segments); - vec_free (segment_list->rewrite); - vec_free (segment_list->rewrite_bsid); - pool_put_index (sm->sid_lists, *sl_index); - } - - /* Remove SR policy entry */ - mhash_unset (&sm->sr_policies_index_hash, &sr_policy->bsid, NULL); - pool_put (sm->sr_policies, sr_policy); - - /* If FIB empty unlock it */ - if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) - { - fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); - fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); - sm->fib_table_ip6 = (u32) ~ 0; - sm->fib_table_ip4 = (u32) ~ 0; - } - - return 0; -} - -/** - * @brief Modify an existing SR policy - * - * The possible modifications are adding a new Segment List, modifying an - * existing Segment List (modify the weight only) and delete a given - * Segment List from the SR Policy. - * - * @param bsid is the bindingSID of the SR Policy - * @param index is the index of the SR policy - * @param fib_table is the VRF where to install the FIB entry for the BSID - * @param operation is the operation to perform (among the top ones) - * @param segments is a vector of IPv6 address composing the segment list - * @param sl_index is the index of the Segment List to modify/delete - * @param weight is the weight of the sid list. optional. - * @param is_encap Mode. Encapsulation or SRH insertion. - * - * @return 0 if correct, else error - */ -int -sr_policy_mod (ip6_address_t * bsid, u32 index, u32 fib_table, - u8 operation, ip6_address_t * segments, u32 sl_index, - u32 weight) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_sr_policy_t *sr_policy = 0; - ip6_sr_sl_t *segment_list; - u32 *sl_index_iterate; - uword *p; - - if (bsid) - { - p = mhash_get (&sm->sr_policies_index_hash, bsid); - if (p) - sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); - else - return -1; - } - else - { - sr_policy = pool_elt_at_index (sm->sr_policies, index); - if (!sr_policy) - return -1; - } - - if (operation == 1) /* Add SR List to an existing SR policy */ - { - /* Create the new SL */ - segment_list = - create_sl (sr_policy, segments, weight, sr_policy->is_encap); - - /* Create a new LB DPO */ - if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) - update_lb (sr_policy); - else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) - update_replicate (sr_policy); - } - else if (operation == 2) /* Delete SR List from an existing SR policy */ - { - /* Check that currently there are more than one SID list */ - if (vec_len (sr_policy->segments_lists) == 1) - return -21; - - /* Check that the SR list does exist and is assigned to the sr policy */ - vec_foreach (sl_index_iterate, sr_policy->segments_lists) - if (*sl_index_iterate == sl_index) - break; - - if (*sl_index_iterate != sl_index) - return -22; - - /* Remove the lucky SR list that is being kicked out */ - segment_list = pool_elt_at_index (sm->sid_lists, sl_index); - vec_free (segment_list->segments); - vec_free (segment_list->rewrite); - vec_free (segment_list->rewrite_bsid); - pool_put_index (sm->sid_lists, sl_index); - vec_del1 (sr_policy->segments_lists, - sl_index_iterate - sr_policy->segments_lists); - - /* Create a new LB DPO */ - if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) - update_lb (sr_policy); - else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) - update_replicate (sr_policy); - } - else if (operation == 3) /* Modify the weight of an existing SR List */ - { - /* Find the corresponding SL */ - vec_foreach (sl_index_iterate, sr_policy->segments_lists) - if (*sl_index_iterate == sl_index) - break; - - if (*sl_index_iterate != sl_index) - return -32; - - /* Change the weight */ - segment_list = pool_elt_at_index (sm->sid_lists, sl_index); - segment_list->weight = weight; - - /* Update LB */ - if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) - update_lb (sr_policy); - } - else /* Incorrect op. */ - return -1; - - return 0; -} - -/** - * @brief CLI for 'sr policies' command family - */ -static clib_error_t * -sr_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - int rv = -1; - char is_del = 0, is_add = 0, is_mod = 0; - char policy_set = 0; - ip6_address_t bsid, next_address; - u32 sr_policy_index = (u32) ~ 0, sl_index = (u32) ~ 0; - u32 weight = (u32) ~ 0, fib_table = (u32) ~ 0; - ip6_address_t *segments = 0, *this_seg; - u8 operation = 0; - char is_encap = 1; - char is_spray = 0; - - while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) - { - if (!is_add && !is_mod && !is_del && unformat (input, "add")) - is_add = 1; - else if (!is_add && !is_mod && !is_del && unformat (input, "del")) - is_del = 1; - else if (!is_add && !is_mod && !is_del && unformat (input, "mod")) - is_mod = 1; - else if (!policy_set - && unformat (input, "bsid %U", unformat_ip6_address, &bsid)) - policy_set = 1; - else if (!is_add && !policy_set - && unformat (input, "index %d", &sr_policy_index)) - policy_set = 1; - else if (unformat (input, "weight %d", &weight)); - else - if (unformat (input, "next %U", unformat_ip6_address, &next_address)) - { - vec_add2 (segments, this_seg, 1); - clib_memcpy (this_seg->as_u8, next_address.as_u8, - sizeof (*this_seg)); - } - else if (unformat (input, "add sl")) - operation = 1; - else if (unformat (input, "del sl index %d", &sl_index)) - operation = 2; - else if (unformat (input, "mod sl index %d", &sl_index)) - operation = 3; - else if (fib_table == (u32) ~ 0 - && unformat (input, "fib-table %d", &fib_table)); - else if (unformat (input, "encap")) - is_encap = 1; - else if (unformat (input, "insert")) - is_encap = 0; - else if (unformat (input, "spray")) - is_spray = 1; - else - break; - } - - if (!is_add && !is_mod && !is_del) - return clib_error_return (0, "Incorrect CLI"); - - if (!policy_set) - return clib_error_return (0, "No SR policy BSID or index specified"); - - if (is_add) - { - if (vec_len (segments) == 0) - return clib_error_return (0, "No Segment List specified"); - rv = sr_policy_add (&bsid, segments, weight, - (is_spray ? SR_POLICY_TYPE_SPRAY : - SR_POLICY_TYPE_DEFAULT), fib_table, is_encap); - } - else if (is_del) - rv = sr_policy_del ((sr_policy_index != (u32) ~ 0 ? NULL : &bsid), - sr_policy_index); - else if (is_mod) - { - if (!operation) - return clib_error_return (0, "No SL modification specified"); - if (operation != 1 && sl_index == (u32) ~ 0) - return clib_error_return (0, "No Segment List index specified"); - if (operation == 1 && vec_len (segments) == 0) - return clib_error_return (0, "No Segment List specified"); - if (operation == 3 && weight == (u32) ~ 0) - return clib_error_return (0, "No new weight for the SL specified"); - rv = sr_policy_mod ((sr_policy_index != (u32) ~ 0 ? NULL : &bsid), - sr_policy_index, fib_table, operation, segments, - sl_index, weight); - } - - switch (rv) - { - case 0: - break; - case 1: - return 0; - case -12: - return clib_error_return (0, - "There is already a FIB entry for the BindingSID address.\n" - "The SR policy could not be created."); - case -13: - return clib_error_return (0, "The specified FIB table does not exist."); - case -21: - return clib_error_return (0, - "The selected SR policy only contains ONE segment list. " - "Please remove the SR policy instead"); - case -22: - return clib_error_return (0, - "Could not delete the segment list. " - "It is not associated with that SR policy."); - case -32: - return clib_error_return (0, - "Could not modify the segment list. " - "The given SL is not associated with such SR policy."); - default: - return clib_error_return (0, "BUG: sr policy returns %d", rv); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (sr_policy_command, static) = { - .path = "sr policy", - .short_help = "sr policy [add||del||mod] [bsid 2001::1||index 5] " - "next A:: next B:: next C:: (weight 1) (fib-table 2) (encap|insert)", - .long_help = - "Manipulation of SR policies.\n" - "A Segment Routing policy may contain several SID lists. Each SID list has\n" - "an associated weight (default 1), which will result in wECMP (uECMP).\n" - "Segment Routing policies might be of type encapsulation or srh insertion\n" - "Each SR policy will be associated with a unique BindingSID.\n" - "A BindingSID is a locally allocated SegmentID. For every packet that arrives\n" - "with IPv6_DA:BSID such traffic will be steered into the SR policy.\n" - "The add command will create a SR policy with its first segment list (sl)\n" - "The mod command allows you to add, remove, or modify the existing segment lists\n" - "within an SR policy.\n" - "The del command allows you to delete a SR policy along with all its associated\n" - "SID lists.\n", - .function = sr_policy_command_fn, -}; -/* *INDENT-ON* */ - -/** - * @brief CLI to display onscreen all the SR policies - */ -static clib_error_t * -show_sr_policies_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - ip6_sr_main_t *sm = &sr_main; - u32 *sl_index; - ip6_sr_sl_t *segment_list = 0; - ip6_sr_policy_t *sr_policy = 0; - ip6_sr_policy_t **vec_policies = 0; - ip6_address_t *addr; - u8 *s; - int i = 0; - - vlib_cli_output (vm, "SR policies:"); - - /* *INDENT-OFF* */ - pool_foreach (sr_policy, sm->sr_policies, - {vec_add1 (vec_policies, sr_policy); } ); - /* *INDENT-ON* */ - - vec_foreach_index (i, vec_policies) - { - sr_policy = vec_policies[i]; - vlib_cli_output (vm, "[%u].-\tBSID: %U", - (u32) (sr_policy - sm->sr_policies), - format_ip6_address, &sr_policy->bsid); - vlib_cli_output (vm, "\tBehavior: %s", - (sr_policy->is_encap ? "Encapsulation" : - "SRH insertion")); - vlib_cli_output (vm, "\tType: %s", - (sr_policy->type == - SR_POLICY_TYPE_DEFAULT ? "Default" : "Spray")); - vlib_cli_output (vm, "\tFIB table: %u", - (sr_policy->fib_table != - (u32) ~ 0 ? sr_policy->fib_table : 0)); - vlib_cli_output (vm, "\tSegment Lists:"); - vec_foreach (sl_index, sr_policy->segments_lists) - { - s = NULL; - s = format (s, "\t[%u].- ", *sl_index); - segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); - s = format (s, "< "); - vec_foreach (addr, segment_list->segments) - { - s = format (s, "%U, ", format_ip6_address, addr); - } - s = format (s, "\b\b > "); - s = format (s, "weight: %u", segment_list->weight); - vlib_cli_output (vm, " %s", s); - } - vlib_cli_output (vm, "-----------"); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (show_sr_policies_command, static) = { - .path = "show sr policies", - .short_help = "show sr policies", - .function = show_sr_policies_command_fn, -}; -/* *INDENT-ON* */ - -/*************************** SR rewrite graph node ****************************/ -/** - * @brief Trace for the SR Policy Rewrite graph node - */ -static u8 * -format_sr_policy_rewrite_trace (u8 * s, va_list * args) -{ - //TODO - CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); - CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); - sr_policy_rewrite_trace_t *t = va_arg (*args, sr_policy_rewrite_trace_t *); - - s = format - (s, "SR-policy-rewrite: src %U dst %U", - format_ip6_address, &t->src, format_ip6_address, &t->dst); - - return s; -} - -/** - * @brief IPv6 encapsulation processing as per RFC2473 - */ -static_always_inline void -encaps_processing_v6 (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, ip6_header_t * ip0_encap) -{ - u32 new_l0; - - ip0_encap->hop_limit -= 1; - new_l0 = - ip0->payload_length + sizeof (ip6_header_t) + - clib_net_to_host_u16 (ip0_encap->payload_length); - ip0->payload_length = clib_host_to_net_u16 (new_l0); - ip0->ip_version_traffic_class_and_flow_label = - ip0_encap->ip_version_traffic_class_and_flow_label; -} - -/** - * @brief Graph node for applying a SR policy into an IPv6 packet. Encapsulation - */ -static uword -sr_policy_rewrite_encaps (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int encap_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - ip1_encap = vlib_buffer_get_current (b1); - ip2_encap = vlib_buffer_get_current (b2); - ip3_encap = vlib_buffer_get_current (b3); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), - sl1->rewrite, vec_len (sl1->rewrite)); - clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), - sl2->rewrite, vec_len (sl2->rewrite)); - clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), - sl3->rewrite, vec_len (sl3->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - encaps_processing_v6 (node, b0, ip0, ip0_encap); - encaps_processing_v6 (node, b1, ip1, ip1_encap); - encaps_processing_v6 (node, b2, ip2, ip2_encap); - encaps_processing_v6 (node, b3, ip3, ip3_encap); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - encap_pkts += 4; - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0, *ip0_encap = 0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - b0 = vlib_get_buffer (vm, bi0); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - - encaps_processing_v6 (node, b0, ip0, ip0_encap); - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - encap_pkts++; - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - encap_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_node) = { - .function = sr_policy_rewrite_encaps, - .name = "sr-pl-rewrite-encaps", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/** - * @brief IPv4 encapsulation processing as per RFC2473 - */ -static_always_inline void -encaps_processing_v4 (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, ip4_header_t * ip0_encap) -{ - u32 new_l0; - ip6_sr_header_t *sr0; - - u32 checksum0; - - /* Inner IPv4: Decrement TTL & update checksum */ - ip0_encap->ttl -= 1; - checksum0 = ip0_encap->checksum + clib_host_to_net_u16 (0x0100); - checksum0 += checksum0 >= 0xffff; - ip0_encap->checksum = checksum0; - - /* Outer IPv6: Update length, FL, proto */ - new_l0 = ip0->payload_length + clib_net_to_host_u16 (ip0_encap->length); - ip0->payload_length = clib_host_to_net_u16 (new_l0); - ip0->ip_version_traffic_class_and_flow_label = - clib_host_to_net_u32 (0 | ((6 & 0xF) << 28) | - ((ip0_encap->tos & 0xFF) << 20)); - sr0 = (void *) (ip0 + 1); - sr0->protocol = IP_PROTOCOL_IP_IN_IP; -} - -/** - * @brief Graph node for applying a SR policy into an IPv4 packet. Encapsulation - */ -static uword -sr_policy_rewrite_encaps_v4 (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int encap_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip4_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - ip1_encap = vlib_buffer_get_current (b1); - ip2_encap = vlib_buffer_get_current (b2); - ip3_encap = vlib_buffer_get_current (b3); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), - sl1->rewrite, vec_len (sl1->rewrite)); - clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), - sl2->rewrite, vec_len (sl2->rewrite)); - clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), - sl3->rewrite, vec_len (sl3->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - encaps_processing_v4 (node, b0, ip0, ip0_encap); - encaps_processing_v4 (node, b1, ip1, ip1_encap); - encaps_processing_v4 (node, b2, ip2, ip2_encap); - encaps_processing_v4 (node, b3, ip3, ip3_encap); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - encap_pkts += 4; - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0; - ip4_header_t *ip0_encap = 0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - b0 = vlib_get_buffer (vm, bi0); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - - encaps_processing_v4 (node, b0, ip0, ip0_encap); - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - encap_pkts++; - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - encap_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_v4_node) = { - .function = sr_policy_rewrite_encaps_v4, - .name = "sr-pl-rewrite-encaps-v4", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -always_inline u32 -ip_flow_hash (void *data) -{ - ip4_header_t *iph = (ip4_header_t *) data; - - if ((iph->ip_version_and_header_length & 0xF0) == 0x40) - return ip4_compute_flow_hash (iph, IP_FLOW_HASH_DEFAULT); - else - return ip6_compute_flow_hash ((ip6_header_t *) iph, IP_FLOW_HASH_DEFAULT); -} - -always_inline u64 -mac_to_u64 (u8 * m) -{ - return (*((u64 *) m) & 0xffffffffffff); -} - -always_inline u32 -l2_flow_hash (vlib_buffer_t * b0) -{ - ethernet_header_t *eh; - u64 a, b, c; - uword is_ip, eh_size; - u16 eh_type; - - eh = vlib_buffer_get_current (b0); - eh_type = clib_net_to_host_u16 (eh->type); - eh_size = ethernet_buffer_header_size (b0); - - is_ip = (eh_type == ETHERNET_TYPE_IP4 || eh_type == ETHERNET_TYPE_IP6); - - /* since we have 2 cache lines, use them */ - if (is_ip) - a = ip_flow_hash ((u8 *) vlib_buffer_get_current (b0) + eh_size); - else - a = eh->type; - - b = mac_to_u64 ((u8 *) eh->dst_address); - c = mac_to_u64 ((u8 *) eh->src_address); - hash_mix64 (a, b, c); - - return (u32) c; -} - -/** - * @brief Graph node for applying a SR policy into a L2 frame - */ -static uword -sr_policy_rewrite_encaps_l2 (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int encap_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ethernet_header_t *en0, *en1, *en2, *en3; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - ip6_sr_policy_t *sp0, *sp1, *sp2, *sp3; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sp0 = pool_elt_at_index (sm->sr_policies, - sm->sw_iface_sr_policies[vnet_buffer - (b0)->sw_if_index - [VLIB_RX]]); - - sp1 = pool_elt_at_index (sm->sr_policies, - sm->sw_iface_sr_policies[vnet_buffer - (b1)->sw_if_index - [VLIB_RX]]); - - sp2 = pool_elt_at_index (sm->sr_policies, - sm->sw_iface_sr_policies[vnet_buffer - (b2)->sw_if_index - [VLIB_RX]]); - - sp3 = pool_elt_at_index (sm->sr_policies, - sm->sw_iface_sr_policies[vnet_buffer - (b3)->sw_if_index - [VLIB_RX]]); - - if (vec_len (sp0->segments_lists) == 1) - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = sp0->segments_lists[0]; - else - { - vnet_buffer (b0)->ip.flow_hash = l2_flow_hash (b0); - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = - sp0->segments_lists[(vnet_buffer (b0)->ip.flow_hash & - (vec_len (sp0->segments_lists) - 1))]; - } - - if (vec_len (sp1->segments_lists) == 1) - vnet_buffer (b1)->ip.adj_index[VLIB_TX] = sp1->segments_lists[1]; - else - { - vnet_buffer (b1)->ip.flow_hash = l2_flow_hash (b1); - vnet_buffer (b1)->ip.adj_index[VLIB_TX] = - sp1->segments_lists[(vnet_buffer (b1)->ip.flow_hash & - (vec_len (sp1->segments_lists) - 1))]; - } - - if (vec_len (sp2->segments_lists) == 1) - vnet_buffer (b2)->ip.adj_index[VLIB_TX] = sp2->segments_lists[2]; - else - { - vnet_buffer (b2)->ip.flow_hash = l2_flow_hash (b2); - vnet_buffer (b2)->ip.adj_index[VLIB_TX] = - sp2->segments_lists[(vnet_buffer (b2)->ip.flow_hash & - (vec_len (sp2->segments_lists) - 1))]; - } - - if (vec_len (sp3->segments_lists) == 1) - vnet_buffer (b3)->ip.adj_index[VLIB_TX] = sp3->segments_lists[3]; - else - { - vnet_buffer (b3)->ip.flow_hash = l2_flow_hash (b3); - vnet_buffer (b3)->ip.adj_index[VLIB_TX] = - sp3->segments_lists[(vnet_buffer (b3)->ip.flow_hash & - (vec_len (sp3->segments_lists) - 1))]; - } - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite)); - - en0 = vlib_buffer_get_current (b0); - en1 = vlib_buffer_get_current (b1); - en2 = vlib_buffer_get_current (b2); - en3 = vlib_buffer_get_current (b3); - - clib_memcpy (((u8 *) en0) - vec_len (sl0->rewrite), sl0->rewrite, - vec_len (sl0->rewrite)); - clib_memcpy (((u8 *) en1) - vec_len (sl1->rewrite), sl1->rewrite, - vec_len (sl1->rewrite)); - clib_memcpy (((u8 *) en2) - vec_len (sl2->rewrite), sl2->rewrite, - vec_len (sl2->rewrite)); - clib_memcpy (((u8 *) en3) - vec_len (sl3->rewrite), sl3->rewrite, - vec_len (sl3->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - ip0->payload_length = - clib_host_to_net_u16 (b0->current_length - sizeof (ip6_header_t)); - ip1->payload_length = - clib_host_to_net_u16 (b1->current_length - sizeof (ip6_header_t)); - ip2->payload_length = - clib_host_to_net_u16 (b2->current_length - sizeof (ip6_header_t)); - ip3->payload_length = - clib_host_to_net_u16 (b3->current_length - sizeof (ip6_header_t)); - - sr0 = (void *) (ip0 + 1); - sr1 = (void *) (ip1 + 1); - sr2 = (void *) (ip2 + 1); - sr3 = (void *) (ip3 + 1); - - sr0->protocol = sr1->protocol = sr2->protocol = sr3->protocol = - IP_PROTOCOL_IP6_NONXT; - - /* Which Traffic class and flow label do I set ? */ - //ip0->ip_version_traffic_class_and_flow_label = clib_host_to_net_u32(0|((6&0xF)<<28)|((ip0_encap->tos&0xFF)<<20)); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - encap_pkts += 4; - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0; - ip6_sr_header_t *sr0; - ethernet_header_t *en0; - ip6_sr_policy_t *sp0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - b0 = vlib_get_buffer (vm, bi0); - - /* Find the SR policy */ - sp0 = pool_elt_at_index (sm->sr_policies, - sm->sw_iface_sr_policies[vnet_buffer - (b0)->sw_if_index - [VLIB_RX]]); - - /* In case there is more than one SL, LB among them */ - if (vec_len (sp0->segments_lists) == 1) - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = sp0->segments_lists[0]; - else - { - vnet_buffer (b0)->ip.flow_hash = l2_flow_hash (b0); - vnet_buffer (b0)->ip.adj_index[VLIB_TX] = - sp0->segments_lists[(vnet_buffer (b0)->ip.flow_hash & - (vec_len (sp0->segments_lists) - 1))]; - } - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - - en0 = vlib_buffer_get_current (b0); - - clib_memcpy (((u8 *) en0) - vec_len (sl0->rewrite), sl0->rewrite, - vec_len (sl0->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - - ip0->payload_length = - clib_host_to_net_u16 (b0->current_length - sizeof (ip6_header_t)); - - sr0 = (void *) (ip0 + 1); - sr0->protocol = IP_PROTOCOL_IP6_NONXT; - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - encap_pkts++; - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - encap_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_l2_node) = { - .function = sr_policy_rewrite_encaps_l2, - .name = "sr-pl-rewrite-encaps-l2", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/** - * @brief Graph node for applying a SR policy into a packet. SRH insertion. - */ -static uword -sr_policy_rewrite_insert (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int insert_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - u16 new_l0, new_l1, new_l2, new_l3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr0 = - (ip6_sr_header_t *) (((void *) (ip0 + 1)) + - ip6_ext_header_len (ip0 + 1)); - else - sr0 = (ip6_sr_header_t *) (ip0 + 1); - - if (ip1->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr1 = - (ip6_sr_header_t *) (((void *) (ip1 + 1)) + - ip6_ext_header_len (ip1 + 1)); - else - sr1 = (ip6_sr_header_t *) (ip1 + 1); - - if (ip2->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr2 = - (ip6_sr_header_t *) (((void *) (ip2 + 1)) + - ip6_ext_header_len (ip2 + 1)); - else - sr2 = (ip6_sr_header_t *) (ip2 + 1); - - if (ip3->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr3 = - (ip6_sr_header_t *) (((void *) (ip3 + 1)) + - ip6_ext_header_len (ip3 + 1)); - else - sr3 = (ip6_sr_header_t *) (ip3 + 1); - - clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite), (u8 *) ip0, - (void *) sr0 - (void *) ip0); - clib_memcpy ((u8 *) ip1 - vec_len (sl1->rewrite), (u8 *) ip1, - (void *) sr1 - (void *) ip1); - clib_memcpy ((u8 *) ip2 - vec_len (sl2->rewrite), (u8 *) ip2, - (void *) sr2 - (void *) ip2); - clib_memcpy ((u8 *) ip3 - vec_len (sl3->rewrite), (u8 *) ip3, - (void *) sr3 - (void *) ip3); - - clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite)), sl0->rewrite, - vec_len (sl0->rewrite)); - clib_memcpy (((u8 *) sr1 - vec_len (sl1->rewrite)), sl1->rewrite, - vec_len (sl1->rewrite)); - clib_memcpy (((u8 *) sr2 - vec_len (sl2->rewrite)), sl2->rewrite, - vec_len (sl2->rewrite)); - clib_memcpy (((u8 *) sr3 - vec_len (sl3->rewrite)), sl3->rewrite, - vec_len (sl3->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); - - ip0 = ((void *) ip0) - vec_len (sl0->rewrite); - ip1 = ((void *) ip1) - vec_len (sl1->rewrite); - ip2 = ((void *) ip2) - vec_len (sl2->rewrite); - ip3 = ((void *) ip3) - vec_len (sl3->rewrite); - - ip0->hop_limit -= 1; - ip1->hop_limit -= 1; - ip2->hop_limit -= 1; - ip3->hop_limit -= 1; - - new_l0 = - clib_net_to_host_u16 (ip0->payload_length) + - vec_len (sl0->rewrite); - new_l1 = - clib_net_to_host_u16 (ip1->payload_length) + - vec_len (sl1->rewrite); - new_l2 = - clib_net_to_host_u16 (ip2->payload_length) + - vec_len (sl2->rewrite); - new_l3 = - clib_net_to_host_u16 (ip3->payload_length) + - vec_len (sl3->rewrite); - - ip0->payload_length = clib_host_to_net_u16 (new_l0); - ip1->payload_length = clib_host_to_net_u16 (new_l1); - ip2->payload_length = clib_host_to_net_u16 (new_l2); - ip3->payload_length = clib_host_to_net_u16 (new_l3); - - sr0 = ((void *) sr0) - vec_len (sl0->rewrite); - sr1 = ((void *) sr1) - vec_len (sl1->rewrite); - sr2 = ((void *) sr2) - vec_len (sl2->rewrite); - sr3 = ((void *) sr3) - vec_len (sl3->rewrite); - - sr0->segments->as_u64[0] = ip0->dst_address.as_u64[0]; - sr0->segments->as_u64[1] = ip0->dst_address.as_u64[1]; - sr1->segments->as_u64[0] = ip1->dst_address.as_u64[0]; - sr1->segments->as_u64[1] = ip1->dst_address.as_u64[1]; - sr2->segments->as_u64[0] = ip2->dst_address.as_u64[0]; - sr2->segments->as_u64[1] = ip2->dst_address.as_u64[1]; - sr3->segments->as_u64[0] = ip3->dst_address.as_u64[0]; - sr3->segments->as_u64[1] = ip3->dst_address.as_u64[1]; - - ip0->dst_address.as_u64[0] = - (sr0->segments + sr0->segments_left)->as_u64[0]; - ip0->dst_address.as_u64[1] = - (sr0->segments + sr0->segments_left)->as_u64[1]; - ip1->dst_address.as_u64[0] = - (sr1->segments + sr1->segments_left)->as_u64[0]; - ip1->dst_address.as_u64[1] = - (sr1->segments + sr1->segments_left)->as_u64[1]; - ip2->dst_address.as_u64[0] = - (sr2->segments + sr2->segments_left)->as_u64[0]; - ip2->dst_address.as_u64[1] = - (sr2->segments + sr2->segments_left)->as_u64[1]; - ip3->dst_address.as_u64[0] = - (sr3->segments + sr3->segments_left)->as_u64[0]; - ip3->dst_address.as_u64[1] = - (sr3->segments + sr3->segments_left)->as_u64[1]; - - ip6_ext_header_t *ip_ext; - if (ip0 + 1 == (void *) sr0) - { - sr0->protocol = ip0->protocol; - ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip0 + 1); - sr0->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip1 + 1 == (void *) sr1) - { - sr1->protocol = ip1->protocol; - ip1->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip2 + 1); - sr2->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip2 + 1 == (void *) sr2) - { - sr2->protocol = ip2->protocol; - ip2->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip2 + 1); - sr2->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip3 + 1 == (void *) sr3) - { - sr3->protocol = ip3->protocol; - ip3->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip3 + 1); - sr3->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - insert_pkts += 4; - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0; - ip6_sr_header_t *sr0 = 0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - u16 new_l0 = 0; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - - if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr0 = - (ip6_sr_header_t *) (((void *) (ip0 + 1)) + - ip6_ext_header_len (ip0 + 1)); - else - sr0 = (ip6_sr_header_t *) (ip0 + 1); - - clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite), (u8 *) ip0, - (void *) sr0 - (void *) ip0); - clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite)), sl0->rewrite, - vec_len (sl0->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - - ip0 = ((void *) ip0) - vec_len (sl0->rewrite); - ip0->hop_limit -= 1; - new_l0 = - clib_net_to_host_u16 (ip0->payload_length) + - vec_len (sl0->rewrite); - ip0->payload_length = clib_host_to_net_u16 (new_l0); - - sr0 = ((void *) sr0) - vec_len (sl0->rewrite); - sr0->segments->as_u64[0] = ip0->dst_address.as_u64[0]; - sr0->segments->as_u64[1] = ip0->dst_address.as_u64[1]; - - ip0->dst_address.as_u64[0] = - (sr0->segments + sr0->segments_left)->as_u64[0]; - ip0->dst_address.as_u64[1] = - (sr0->segments + sr0->segments_left)->as_u64[1]; - - if (ip0 + 1 == (void *) sr0) - { - sr0->protocol = ip0->protocol; - ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip6_ext_header_t *ip_ext = (void *) (ip0 + 1); - sr0->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - insert_pkts++; - - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - insert_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_insert_node) = { - .function = sr_policy_rewrite_insert, - .name = "sr-pl-rewrite-insert", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/** - * @brief Graph node for applying a SR policy into a packet. BSID - SRH insertion. - */ -static uword -sr_policy_rewrite_b_insert (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int insert_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - u16 new_l0, new_l1, new_l2, new_l3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite_bsid)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite_bsid)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite_bsid)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite_bsid)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr0 = - (ip6_sr_header_t *) (((void *) (ip0 + 1)) + - ip6_ext_header_len (ip0 + 1)); - else - sr0 = (ip6_sr_header_t *) (ip0 + 1); - - if (ip1->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr1 = - (ip6_sr_header_t *) (((void *) (ip1 + 1)) + - ip6_ext_header_len (ip1 + 1)); - else - sr1 = (ip6_sr_header_t *) (ip1 + 1); - - if (ip2->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr2 = - (ip6_sr_header_t *) (((void *) (ip2 + 1)) + - ip6_ext_header_len (ip2 + 1)); - else - sr2 = (ip6_sr_header_t *) (ip2 + 1); - - if (ip3->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr3 = - (ip6_sr_header_t *) (((void *) (ip3 + 1)) + - ip6_ext_header_len (ip3 + 1)); - else - sr3 = (ip6_sr_header_t *) (ip3 + 1); - - clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite_bsid), (u8 *) ip0, - (void *) sr0 - (void *) ip0); - clib_memcpy ((u8 *) ip1 - vec_len (sl1->rewrite_bsid), (u8 *) ip1, - (void *) sr1 - (void *) ip1); - clib_memcpy ((u8 *) ip2 - vec_len (sl2->rewrite_bsid), (u8 *) ip2, - (void *) sr2 - (void *) ip2); - clib_memcpy ((u8 *) ip3 - vec_len (sl3->rewrite_bsid), (u8 *) ip3, - (void *) sr3 - (void *) ip3); - - clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite_bsid)), - sl0->rewrite_bsid, vec_len (sl0->rewrite_bsid)); - clib_memcpy (((u8 *) sr1 - vec_len (sl1->rewrite_bsid)), - sl1->rewrite_bsid, vec_len (sl1->rewrite_bsid)); - clib_memcpy (((u8 *) sr2 - vec_len (sl2->rewrite_bsid)), - sl2->rewrite_bsid, vec_len (sl2->rewrite_bsid)); - clib_memcpy (((u8 *) sr3 - vec_len (sl3->rewrite_bsid)), - sl3->rewrite_bsid, vec_len (sl3->rewrite_bsid)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite_bsid)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite_bsid)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite_bsid)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite_bsid)); - - ip0 = ((void *) ip0) - vec_len (sl0->rewrite_bsid); - ip1 = ((void *) ip1) - vec_len (sl1->rewrite_bsid); - ip2 = ((void *) ip2) - vec_len (sl2->rewrite_bsid); - ip3 = ((void *) ip3) - vec_len (sl3->rewrite_bsid); - - ip0->hop_limit -= 1; - ip1->hop_limit -= 1; - ip2->hop_limit -= 1; - ip3->hop_limit -= 1; - - new_l0 = - clib_net_to_host_u16 (ip0->payload_length) + - vec_len (sl0->rewrite_bsid); - new_l1 = - clib_net_to_host_u16 (ip1->payload_length) + - vec_len (sl1->rewrite_bsid); - new_l2 = - clib_net_to_host_u16 (ip2->payload_length) + - vec_len (sl2->rewrite_bsid); - new_l3 = - clib_net_to_host_u16 (ip3->payload_length) + - vec_len (sl3->rewrite_bsid); - - ip0->payload_length = clib_host_to_net_u16 (new_l0); - ip1->payload_length = clib_host_to_net_u16 (new_l1); - ip2->payload_length = clib_host_to_net_u16 (new_l2); - ip3->payload_length = clib_host_to_net_u16 (new_l3); - - sr0 = ((void *) sr0) - vec_len (sl0->rewrite_bsid); - sr1 = ((void *) sr1) - vec_len (sl1->rewrite_bsid); - sr2 = ((void *) sr2) - vec_len (sl2->rewrite_bsid); - sr3 = ((void *) sr3) - vec_len (sl3->rewrite_bsid); - - ip0->dst_address.as_u64[0] = - (sr0->segments + sr0->segments_left)->as_u64[0]; - ip0->dst_address.as_u64[1] = - (sr0->segments + sr0->segments_left)->as_u64[1]; - ip1->dst_address.as_u64[0] = - (sr1->segments + sr1->segments_left)->as_u64[0]; - ip1->dst_address.as_u64[1] = - (sr1->segments + sr1->segments_left)->as_u64[1]; - ip2->dst_address.as_u64[0] = - (sr2->segments + sr2->segments_left)->as_u64[0]; - ip2->dst_address.as_u64[1] = - (sr2->segments + sr2->segments_left)->as_u64[1]; - ip3->dst_address.as_u64[0] = - (sr3->segments + sr3->segments_left)->as_u64[0]; - ip3->dst_address.as_u64[1] = - (sr3->segments + sr3->segments_left)->as_u64[1]; - - ip6_ext_header_t *ip_ext; - if (ip0 + 1 == (void *) sr0) - { - sr0->protocol = ip0->protocol; - ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip0 + 1); - sr0->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip1 + 1 == (void *) sr1) - { - sr1->protocol = ip1->protocol; - ip1->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip2 + 1); - sr2->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip2 + 1 == (void *) sr2) - { - sr2->protocol = ip2->protocol; - ip2->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip2 + 1); - sr2->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (ip3 + 1 == (void *) sr3) - { - sr3->protocol = ip3->protocol; - ip3->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip_ext = (void *) (ip3 + 1); - sr3->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - insert_pkts += 4; - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0; - ip6_sr_header_t *sr0 = 0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - u16 new_l0 = 0; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite_bsid)); - - ip0 = vlib_buffer_get_current (b0); - - if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) - sr0 = - (ip6_sr_header_t *) (((void *) (ip0 + 1)) + - ip6_ext_header_len (ip0 + 1)); - else - sr0 = (ip6_sr_header_t *) (ip0 + 1); - - clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite_bsid), (u8 *) ip0, - (void *) sr0 - (void *) ip0); - clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite_bsid)), - sl0->rewrite_bsid, vec_len (sl0->rewrite_bsid)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite_bsid)); - - ip0 = ((void *) ip0) - vec_len (sl0->rewrite_bsid); - ip0->hop_limit -= 1; - new_l0 = - clib_net_to_host_u16 (ip0->payload_length) + - vec_len (sl0->rewrite_bsid); - ip0->payload_length = clib_host_to_net_u16 (new_l0); - - sr0 = ((void *) sr0) - vec_len (sl0->rewrite_bsid); - - ip0->dst_address.as_u64[0] = - (sr0->segments + sr0->segments_left)->as_u64[0]; - ip0->dst_address.as_u64[1] = - (sr0->segments + sr0->segments_left)->as_u64[1]; - - if (ip0 + 1 == (void *) sr0) - { - sr0->protocol = ip0->protocol; - ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; - } - else - { - ip6_ext_header_t *ip_ext = (void *) (ip0 + 1); - sr0->protocol = ip_ext->next_hdr; - ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; - } - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - insert_pkts++; - - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - insert_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_b_insert_node) = { - .function = sr_policy_rewrite_b_insert, - .name = "sr-pl-rewrite-b-insert", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/** - * @brief Function BSID encapsulation - */ -static_always_inline void -end_bsid_encaps_srh_processing (vlib_node_runtime_t * node, - vlib_buffer_t * b0, - ip6_header_t * ip0, - ip6_sr_header_t * sr0, u32 * next0) -{ - ip6_address_t *new_dst0; - - if (PREDICT_FALSE (!sr0)) - goto error_bsid_encaps; - - if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) - { - if (PREDICT_TRUE (sr0->segments_left != 0)) - { - sr0->segments_left -= 1; - new_dst0 = (ip6_address_t *) (sr0->segments); - new_dst0 += sr0->segments_left; - ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; - ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; - return; - } - } - -error_bsid_encaps: - *next0 = SR_POLICY_REWRITE_NEXT_ERROR; - b0->error = node->errors[SR_POLICY_REWRITE_ERROR_BSID_ZERO]; -} - -/** - * @brief Graph node for applying a SR policy BSID - Encapsulation - */ -static uword -sr_policy_rewrite_b_encaps (vlib_main_t * vm, vlib_node_runtime_t * node, - vlib_frame_t * from_frame) -{ - ip6_sr_main_t *sm = &sr_main; - u32 n_left_from, next_index, *from, *to_next; - - from = vlib_frame_vector_args (from_frame); - n_left_from = from_frame->n_vectors; - - next_index = node->cached_next_index; - - int encap_pkts = 0, bsid_pkts = 0; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - /* Quad - Loop */ - while (n_left_from >= 8 && n_left_to_next >= 4) - { - u32 bi0, bi1, bi2, bi3; - vlib_buffer_t *b0, *b1, *b2, *b3; - u32 next0, next1, next2, next3; - next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - ip6_header_t *ip0, *ip1, *ip2, *ip3; - ip6_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; - ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; - ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; - ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p4, *p5, *p6, *p7; - - p4 = vlib_get_buffer (vm, from[4]); - p5 = vlib_get_buffer (vm, from[5]); - p6 = vlib_get_buffer (vm, from[6]); - p7 = vlib_get_buffer (vm, from[7]); - - /* Prefetch the buffer header and packet for the N+2 loop iteration */ - vlib_prefetch_buffer_header (p4, LOAD); - vlib_prefetch_buffer_header (p5, LOAD); - vlib_prefetch_buffer_header (p6, LOAD); - vlib_prefetch_buffer_header (p7, LOAD); - - CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - to_next[2] = bi2 = from[2]; - to_next[3] = bi3 = from[3]; - from += 4; - to_next += 4; - n_left_from -= 4; - n_left_to_next -= 4; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - b2 = vlib_get_buffer (vm, bi2); - b3 = vlib_get_buffer (vm, bi3); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - sl1 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b1)->ip.adj_index[VLIB_TX]); - sl2 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b2)->ip.adj_index[VLIB_TX]); - sl3 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b3)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl1->rewrite)); - ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl2->rewrite)); - ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl3->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - ip1_encap = vlib_buffer_get_current (b1); - ip2_encap = vlib_buffer_get_current (b2); - ip3_encap = vlib_buffer_get_current (b3); - - ip6_ext_header_find_t (ip0_encap, prev0, sr0, - IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip1_encap, prev1, sr1, - IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip2_encap, prev2, sr2, - IP_PROTOCOL_IPV6_ROUTE); - ip6_ext_header_find_t (ip3_encap, prev3, sr3, - IP_PROTOCOL_IPV6_ROUTE); - - end_bsid_encaps_srh_processing (node, b0, ip0_encap, sr0, &next0); - end_bsid_encaps_srh_processing (node, b1, ip1_encap, sr1, &next1); - end_bsid_encaps_srh_processing (node, b2, ip2_encap, sr2, &next2); - end_bsid_encaps_srh_processing (node, b3, ip3_encap, sr3, &next3); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), - sl1->rewrite, vec_len (sl1->rewrite)); - clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), - sl2->rewrite, vec_len (sl2->rewrite)); - clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), - sl3->rewrite, vec_len (sl3->rewrite)); - - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); - vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); - vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - ip1 = vlib_buffer_get_current (b1); - ip2 = vlib_buffer_get_current (b2); - ip3 = vlib_buffer_get_current (b3); - - encaps_processing_v6 (node, b0, ip0, ip0_encap); - encaps_processing_v6 (node, b1, ip1, ip1_encap); - encaps_processing_v6 (node, b2, ip2, ip2_encap); - encaps_processing_v6 (node, b3, ip3, ip3_encap); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b1, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b2, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b3, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - } - - encap_pkts += 4; - vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, - n_left_to_next, bi0, bi1, bi2, bi3, - next0, next1, next2, next3); - } - - /* Single loop for potentially the last three packets */ - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - ip6_header_t *ip0 = 0, *ip0_encap = 0; - ip6_ext_header_t *prev0; - ip6_sr_header_t *sr0; - ip6_sr_sl_t *sl0; - u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; - - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - b0 = vlib_get_buffer (vm, bi0); - - sl0 = - pool_elt_at_index (sm->sid_lists, - vnet_buffer (b0)->ip.adj_index[VLIB_TX]); - ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= - vec_len (sl0->rewrite)); - - ip0_encap = vlib_buffer_get_current (b0); - ip6_ext_header_find_t (ip0_encap, prev0, sr0, - IP_PROTOCOL_IPV6_ROUTE); - end_bsid_encaps_srh_processing (node, b0, ip0_encap, sr0, &next0); - - clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), - sl0->rewrite, vec_len (sl0->rewrite)); - vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); - - ip0 = vlib_buffer_get_current (b0); - - encaps_processing_v6 (node, b0, ip0, ip0_encap); - - if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && - PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) - { - sr_policy_rewrite_trace_t *tr = - vlib_add_trace (vm, node, b0, sizeof (*tr)); - clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, - sizeof (tr->src.as_u8)); - clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, - sizeof (tr->dst.as_u8)); - } - - encap_pkts++; - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, - n_left_to_next, bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - - /* Update counters */ - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, - encap_pkts); - vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, - SR_POLICY_REWRITE_ERROR_COUNTER_BSID, - bsid_pkts); - - return from_frame->n_vectors; -} - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (sr_policy_rewrite_b_encaps_node) = { - .function = sr_policy_rewrite_b_encaps, - .name = "sr-pl-rewrite-b-encaps", - .vector_size = sizeof (u32), - .format_trace = format_sr_policy_rewrite_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - .n_errors = SR_POLICY_REWRITE_N_ERROR, - .error_strings = sr_policy_rewrite_error_strings, - .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, - .next_nodes = { -#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, - foreach_sr_policy_rewrite_next -#undef _ - }, -}; -/* *INDENT-ON* */ - -/*************************** SR Segment Lists DPOs ****************************/ -static u8 * -format_sr_segment_list_dpo (u8 * s, va_list * args) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_address_t *addr; - ip6_sr_sl_t *sl; - - index_t index = va_arg (*args, index_t); - CLIB_UNUSED (u32 indent) = va_arg (*args, u32); - s = format (s, "SR: Segment List index:[%d]", index); - s = format (s, "\n\tSegments:"); - - sl = pool_elt_at_index (sm->sid_lists, index); - - s = format (s, "< "); - vec_foreach (addr, sl->segments) - { - s = format (s, "%U, ", format_ip6_address, addr); - } - s = format (s, "\b\b > - "); - s = format (s, "Weight: %u", sl->weight); - - return s; -} - -const static dpo_vft_t sr_policy_rewrite_vft = { - .dv_lock = sr_dpo_lock, - .dv_unlock = sr_dpo_unlock, - .dv_format = format_sr_segment_list_dpo, -}; - -const static char *const sr_pr_encaps_ip6_nodes[] = { - "sr-pl-rewrite-encaps", - NULL, -}; - -const static char *const sr_pr_encaps_ip4_nodes[] = { - "sr-pl-rewrite-encaps-v4", - NULL, -}; - -const static char *const *const sr_pr_encaps_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_pr_encaps_ip6_nodes, - [DPO_PROTO_IP4] = sr_pr_encaps_ip4_nodes, -}; - -const static char *const sr_pr_insert_ip6_nodes[] = { - "sr-pl-rewrite-insert", - NULL, -}; - -const static char *const *const sr_pr_insert_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_pr_insert_ip6_nodes, -}; - -const static char *const sr_pr_bsid_insert_ip6_nodes[] = { - "sr-pl-rewrite-b-insert", - NULL, -}; - -const static char *const *const sr_pr_bsid_insert_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_pr_bsid_insert_ip6_nodes, -}; - -const static char *const sr_pr_bsid_encaps_ip6_nodes[] = { - "sr-pl-rewrite-b-encaps", - NULL, -}; - -const static char *const *const sr_pr_bsid_encaps_nodes[DPO_PROTO_NUM] = { - [DPO_PROTO_IP6] = sr_pr_bsid_encaps_ip6_nodes, -}; - -/********************* SR Policy Rewrite initialization ***********************/ -/** - * @brief SR Policy Rewrite initialization - */ -clib_error_t * -sr_policy_rewrite_init (vlib_main_t * vm) -{ - ip6_sr_main_t *sm = &sr_main; - - /* Init memory for sr policy keys (bsid <-> ip6_address_t) */ - mhash_init (&sm->sr_policies_index_hash, sizeof (uword), - sizeof (ip6_address_t)); - - /* Init SR VPO DPOs type */ - sr_pr_encaps_dpo_type = - dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_encaps_nodes); - - sr_pr_insert_dpo_type = - dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_insert_nodes); - - sr_pr_bsid_encaps_dpo_type = - dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_bsid_encaps_nodes); - - sr_pr_bsid_insert_dpo_type = - dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_bsid_insert_nodes); - - /* Register the L2 encaps node used in HW redirect */ - sm->l2_sr_policy_rewrite_index = sr_policy_rewrite_encaps_node.index; - - sm->fib_table_ip6 = (u32) ~ 0; - sm->fib_table_ip4 = (u32) ~ 0; - - return 0; -} - -VLIB_INIT_FUNCTION (sr_policy_rewrite_init); - - -/* -* fd.io coding-style-patch-verification: ON -* -* Local Variables: -* eval: (c-set-style "gnu") -* End: -*/ diff --git a/src/vnet/sr/sr_steering.c b/src/vnet/sr/sr_steering.c deleted file mode 100755 index 04646198..00000000 --- a/src/vnet/sr/sr_steering.c +++ /dev/null @@ -1,573 +0,0 @@ -/* - * sr_steering.c: ipv6 segment routing steering into SR policy - * - * Copyright (c) 2016 Cisco and/or its affiliates. - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief Packet steering into SR Policies - * - * This file is in charge of handling the FIB appropiatly to steer packets - * through SR Policies as defined in 'sr_policy_rewrite.c'. Notice that here - * we are only doing steering. SR policy application is done in - * sr_policy_rewrite.c - * - * Supports: - * - Steering of IPv6 traffic Destination Address based - * - Steering of IPv4 traffic Destination Address based - * - Steering of L2 frames, interface based (sw interface) - */ - -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -/** - * @brief Steer traffic L2 and L3 traffic through a given SR policy - * - * @param is_del - * @param bsid is the bindingSID of the SR Policy (alt to sr_policy_index) - * @param sr_policy is the index of the SR Policy (alt to bsid) - * @param table_id is the VRF where to install the FIB entry for the BSID - * @param prefix is the IPv4/v6 address for L3 traffic type - * @param mask_width is the mask for L3 traffic type - * @param sw_if_index is the incoming interface for L2 traffic - * @param traffic_type describes the type of traffic - * - * @return 0 if correct, else error - */ -int -sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, - u32 table_id, ip46_address_t * prefix, u32 mask_width, - u32 sw_if_index, u8 traffic_type) -{ - ip6_sr_main_t *sm = &sr_main; - sr_steering_key_t key; - ip6_sr_steering_policy_t *steer_pl; - fib_prefix_t pfx = { 0 }; - - ip6_sr_policy_t *sr_policy = 0; - uword *p = 0; - - memset (&key, 0, sizeof (sr_steering_key_t)); - - /* Compute the steer policy key */ - if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) - { - key.l3.prefix.as_u64[0] = prefix->as_u64[0]; - key.l3.prefix.as_u64[1] = prefix->as_u64[1]; - key.l3.mask_width = mask_width; - key.l3.fib_table = (table_id != (u32) ~ 0 ? table_id : 0); - } - else if (traffic_type == SR_STEER_L2) - { - key.l2.sw_if_index = sw_if_index; - - /* Sanitise the SW_IF_INDEX */ - if (pool_is_free_index (sm->vnet_main->interface_main.sw_interfaces, - sw_if_index)) - return -3; - - vnet_sw_interface_t *sw = - vnet_get_sw_interface (sm->vnet_main, sw_if_index); - if (sw->type != VNET_SW_INTERFACE_TYPE_HARDWARE) - return -3; - } - else - return -1; - - key.traffic_type = traffic_type; - - /* Search for the item */ - p = mhash_get (&sm->sr_steer_policies_hash, &key); - - if (p) - { - /* Retrieve Steer Policy function */ - steer_pl = pool_elt_at_index (sm->steer_policies, p[0]); - - if (is_del) - { - if (steer_pl->classify.traffic_type == SR_STEER_IPV6) - { - /* Remove FIB entry */ - pfx.fp_proto = FIB_PROTOCOL_IP6; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; - - fib_table_entry_delete (fib_table_find - (FIB_PROTOCOL_IP6, - steer_pl->classify.l3.fib_table), - &pfx, FIB_SOURCE_SR); - } - else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) - { - /* Remove FIB entry */ - pfx.fp_proto = FIB_PROTOCOL_IP4; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; - - fib_table_entry_delete (fib_table_find - (FIB_PROTOCOL_IP4, - steer_pl->classify.l3.fib_table), &pfx, - FIB_SOURCE_SR); - } - else if (steer_pl->classify.traffic_type == SR_STEER_L2) - { - /* Remove HW redirection */ - vnet_feature_enable_disable ("device-input", - "sr-policy-rewrite-encaps-l2", - sw_if_index, 0, 0, 0); - sm->sw_iface_sr_policies[sw_if_index] = ~(u32) 0; - - /* Remove promiscous mode from interface */ - vnet_main_t *vnm = vnet_get_main (); - ethernet_main_t *em = ðernet_main; - ethernet_interface_t *eif = - ethernet_get_interface (em, sw_if_index); - - if (!eif) - goto cleanup_error_redirection; - - ethernet_set_flags (vnm, sw_if_index, 0); - } - - /* Delete SR steering policy entry */ - pool_put (sm->steer_policies, steer_pl); - mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); - - /* If no more SR policies or steering policies */ - if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) - { - fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); - fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); - sm->fib_table_ip6 = (u32) ~ 0; - sm->fib_table_ip4 = (u32) ~ 0; - } - - return 1; - } - else /* It means user requested to update an existing SR steering policy */ - { - /* Retrieve SR steering policy */ - if (bsid) - { - p = mhash_get (&sm->sr_policies_index_hash, bsid); - if (p) - sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); - else - return -2; - } - else - sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); - - if (!sr_policy) - return -2; - - steer_pl->sr_policy = sr_policy - sm->sr_policies; - - /* Remove old FIB/hw redirection and create a new one */ - if (steer_pl->classify.traffic_type == SR_STEER_IPV6) - { - /* Remove FIB entry */ - pfx.fp_proto = FIB_PROTOCOL_IP6; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; - - fib_table_entry_delete (fib_table_find - (FIB_PROTOCOL_IP6, - steer_pl->classify.l3.fib_table), - &pfx, FIB_SOURCE_SR); - - /* Create a new one */ - goto update_fib; - } - else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) - { - /* Remove FIB entry */ - pfx.fp_proto = FIB_PROTOCOL_IP4; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; - - fib_table_entry_delete (fib_table_find - (FIB_PROTOCOL_IP4, - steer_pl->classify.l3.fib_table), - &pfx, FIB_SOURCE_SR); - - /* Create a new one */ - goto update_fib; - } - else if (steer_pl->classify.traffic_type == SR_STEER_L2) - { - /* Update L2-HW redirection */ - goto update_fib; - } - } - } - else - /* delete; steering policy does not exist; complain */ - if (is_del) - return -4; - - /* Retrieve SR policy */ - if (bsid) - { - p = mhash_get (&sm->sr_policies_index_hash, bsid); - if (p) - sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); - else - return -2; - } - else - sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); - - /* Create a new steering policy */ - pool_get (sm->steer_policies, steer_pl); - memset (steer_pl, 0, sizeof (*steer_pl)); - - if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) - { - clib_memcpy (&steer_pl->classify.l3.prefix, prefix, - sizeof (ip46_address_t)); - steer_pl->classify.l3.mask_width = mask_width; - steer_pl->classify.l3.fib_table = - (table_id != (u32) ~ 0 ? table_id : 0); - steer_pl->classify.traffic_type = traffic_type; - } - else if (traffic_type == SR_STEER_L2) - { - steer_pl->classify.l2.sw_if_index = sw_if_index; - steer_pl->classify.traffic_type = traffic_type; - } - else - { - /* Incorrect API usage. Should never get here */ - pool_put (sm->steer_policies, steer_pl); - mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); - return -1; - } - steer_pl->sr_policy = sr_policy - sm->sr_policies; - - /* Create and store key */ - mhash_set (&sm->sr_steer_policies_hash, &key, steer_pl - sm->steer_policies, - NULL); - - if (traffic_type == SR_STEER_L2) - { - if (!sr_policy->is_encap) - goto cleanup_error_encap; - - if (vnet_feature_enable_disable - ("device-input", "sr-pl-rewrite-encaps-l2", sw_if_index, 1, 0, 0)) - goto cleanup_error_redirection; - - /* Set promiscous mode on interface */ - vnet_main_t *vnm = vnet_get_main (); - ethernet_main_t *em = ðernet_main; - ethernet_interface_t *eif = ethernet_get_interface (em, sw_if_index); - - if (!eif) - goto cleanup_error_redirection; - - ethernet_set_flags (vnm, sw_if_index, - ETHERNET_INTERFACE_FLAG_ACCEPT_ALL); - } - else if (traffic_type == SR_STEER_IPV4) - if (!sr_policy->is_encap) - goto cleanup_error_encap; - -update_fib: - /* FIB API calls - Recursive route through the BindingSID */ - if (traffic_type == SR_STEER_IPV6) - { - pfx.fp_proto = FIB_PROTOCOL_IP6; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; - - fib_table_entry_path_add (fib_table_find (FIB_PROTOCOL_IP6, - (table_id != - (u32) ~ 0 ? - table_id : 0)), - &pfx, FIB_SOURCE_SR, - FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, - FIB_PROTOCOL_IP6, - (ip46_address_t *) & sr_policy->bsid, ~0, - sm->fib_table_ip6, 1, NULL, - FIB_ROUTE_PATH_FLAG_NONE); - } - else if (traffic_type == SR_STEER_IPV4) - { - pfx.fp_proto = FIB_PROTOCOL_IP4; - pfx.fp_len = steer_pl->classify.l3.mask_width; - pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; - - fib_table_entry_path_add (fib_table_find (FIB_PROTOCOL_IP4, - (table_id != - (u32) ~ 0 ? - table_id : 0)), - &pfx, FIB_SOURCE_SR, - FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, - FIB_PROTOCOL_IP6, - (ip46_address_t *) & sr_policy->bsid, ~0, - sm->fib_table_ip4, 1, NULL, - FIB_ROUTE_PATH_FLAG_NONE); - } - else if (traffic_type == SR_STEER_L2) - { - if (sw_if_index < vec_len (sm->sw_iface_sr_policies)) - sm->sw_iface_sr_policies[sw_if_index] = steer_pl->sr_policy; - else - { - vec_resize (sm->sw_iface_sr_policies, - (pool_len (sm->vnet_main->interface_main.sw_interfaces) - - vec_len (sm->sw_iface_sr_policies))); - sm->sw_iface_sr_policies[sw_if_index] = steer_pl->sr_policy; - } - } - - return 0; - -cleanup_error_encap: - pool_put (sm->steer_policies, steer_pl); - mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); - return -5; - -cleanup_error_redirection: - pool_put (sm->steer_policies, steer_pl); - mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); - return -3; -} - -static clib_error_t * -sr_steer_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - vnet_main_t *vnm = vnet_get_main (); - - int is_del = 0; - - ip46_address_t prefix; - u32 dst_mask_width = 0; - u32 sw_if_index = (u32) ~ 0; - u8 traffic_type = 0; - u32 fib_table = (u32) ~ 0; - - ip6_address_t bsid; - u32 sr_policy_index = (u32) ~ 0; - - u8 sr_policy_set = 0; - - memset (&prefix, 0, sizeof (ip46_address_t)); - - int rv; - while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) - { - if (unformat (input, "del")) - is_del = 1; - else if (!traffic_type - && unformat (input, "l3 %U/%d", unformat_ip6_address, - &prefix.ip6, &dst_mask_width)) - traffic_type = SR_STEER_IPV6; - else if (!traffic_type - && unformat (input, "l3 %U/%d", unformat_ip4_address, - &prefix.ip4, &dst_mask_width)) - traffic_type = SR_STEER_IPV4; - else if (!traffic_type - && unformat (input, "l2 %U", unformat_vnet_sw_interface, vnm, - &sw_if_index)) - traffic_type = SR_STEER_L2; - else if (!sr_policy_set - && unformat (input, "via sr policy index %d", - &sr_policy_index)) - sr_policy_set = 1; - else if (!sr_policy_set - && unformat (input, "via sr policy bsid %U", - unformat_ip6_address, &bsid)) - sr_policy_set = 1; - else if (fib_table == (u32) ~ 0 - && unformat (input, "fib-table %d", &fib_table)); - else - break; - } - - if (!traffic_type) - return clib_error_return (0, "No L2/L3 traffic specified"); - if (!sr_policy_set) - return clib_error_return (0, "No SR policy specified"); - - /* Make sure that the prefixes are clean */ - if (traffic_type == SR_STEER_IPV4) - { - u32 mask = - (dst_mask_width ? (0xFFFFFFFFu >> (32 - dst_mask_width)) : 0); - prefix.ip4.as_u32 &= mask; - } - else if (traffic_type == SR_STEER_IPV6) - { - ip6_address_t mask; - ip6_address_mask_from_width (&mask, dst_mask_width); - ip6_address_mask (&prefix.ip6, &mask); - } - - rv = - sr_steering_policy (is_del, (sr_policy_index == ~(u32) 0 ? &bsid : NULL), - sr_policy_index, fib_table, &prefix, dst_mask_width, - sw_if_index, traffic_type); - - switch (rv) - { - case 0: - break; - case 1: - return 0; - case -1: - return clib_error_return (0, "Incorrect API usage."); - case -2: - return clib_error_return (0, - "The requested SR policy could not be located. Review the BSID/index."); - case -3: - return clib_error_return (0, - "Unable to do SW redirect. Incorrect interface."); - case -4: - return clib_error_return (0, - "The requested SR steering policy could not be deleted."); - case -5: - return clib_error_return (0, - "The SR policy is not an encapsulation one."); - default: - return clib_error_return (0, "BUG: sr steer policy returns %d", rv); - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (sr_steer_policy_command, static) = { - .path = "sr steer", - .short_help = "sr steer (del) [l3 |l2 ]" - "via sr policy [index |bsid ]" - "(fib-table )", - .long_help = - "\tSteer a L2 or L3 traffic through an existing SR policy.\n" - "\tExamples:\n" - "\t\tsr steer l3 2001::/64 via sr_policy index 5\n" - "\t\tsr steer l3 2001::/64 via sr_policy bsid 2010::9999:1\n" - "\t\tsr steer l2 GigabitEthernet0/5/0 via sr_policy index 5\n" - "\t\tsr steer del l3 2001::/64 via sr_policy index 5\n", - .function = sr_steer_policy_command_fn, -}; -/* *INDENT-ON* */ - -static clib_error_t * -show_sr_steering_policies_command_fn (vlib_main_t * vm, - unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - ip6_sr_main_t *sm = &sr_main; - ip6_sr_steering_policy_t **steer_policies = 0; - ip6_sr_steering_policy_t *steer_pl; - - vnet_main_t *vnm = vnet_get_main (); - - ip6_sr_policy_t *pl = 0; - int i; - - vlib_cli_output (vm, "SR steering policies:"); - /* *INDENT-OFF* */ - pool_foreach (steer_pl, sm->steer_policies, ({vec_add1(steer_policies, steer_pl);})); - /* *INDENT-ON* */ - vlib_cli_output (vm, "Traffic\t\tSR policy BSID"); - for (i = 0; i < vec_len (steer_policies); i++) - { - steer_pl = steer_policies[i]; - pl = pool_elt_at_index (sm->sr_policies, steer_pl->sr_policy); - if (steer_pl->classify.traffic_type == SR_STEER_L2) - { - vlib_cli_output (vm, "L2 %U\t%U", - format_vnet_sw_if_index_name, vnm, - steer_pl->classify.l2.sw_if_index, - format_ip6_address, &pl->bsid); - } - else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) - { - vlib_cli_output (vm, "L3 %U/%d\t%U", - format_ip4_address, - &steer_pl->classify.l3.prefix.ip4, - steer_pl->classify.l3.mask_width, - format_ip6_address, &pl->bsid); - } - else if (steer_pl->classify.traffic_type == SR_STEER_IPV6) - { - vlib_cli_output (vm, "L3 %U/%d\t%U", - format_ip6_address, - &steer_pl->classify.l3.prefix.ip6, - steer_pl->classify.l3.mask_width, - format_ip6_address, &pl->bsid); - } - } - return 0; -} - -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (show_sr_steering_policies_command, static) = { - .path = "show sr steering policies", - .short_help = "show sr steering policies", - .function = show_sr_steering_policies_command_fn, -}; -/* *INDENT-ON* */ - -clib_error_t * -sr_steering_init (vlib_main_t * vm) -{ - ip6_sr_main_t *sm = &sr_main; - - /* Init memory for function keys */ - mhash_init (&sm->sr_steer_policies_hash, sizeof (uword), - sizeof (sr_steering_key_t)); - - sm->sw_iface_sr_policies = 0; - - sm->vnet_main = vnet_get_main (); - - return 0; -} - -/* *INDENT-OFF* */ -VLIB_INIT_FUNCTION (sr_steering_init); -/* *INDENT-ON* */ - -/* *INDENT-OFF* */ -VNET_FEATURE_INIT (sr_pl_rewrite_encaps_l2, static) = -{ - .arc_name = "device-input", - .node_name = "sr-pl-rewrite-encaps-l2", - .runs_before = VNET_FEATURES ("ethernet-input"), -}; -/* *INDENT-ON* */ - -/* -* fd.io coding-style-patch-verification: ON -* -* Local Variables: -* eval: (c-set-style "gnu") -* End: -*/ diff --git a/src/vnet/sr/sr_steering.md b/src/vnet/sr/sr_steering.md deleted file mode 100644 index cf446f81..00000000 --- a/src/vnet/sr/sr_steering.md +++ /dev/null @@ -1,11 +0,0 @@ -# Steering packets into a SR Policy {#srv6_steering_doc} - -To steer packets in Transit into an SR policy (T.Insert, T.Encaps and T.Encaps.L2 behaviors), the user needs to create an 'sr steering policy'. - - sr steer l3 2001::/64 via sr policy index 1 - sr steer l3 2001::/64 via sr policy bsid cafe::1 - sr steer l3 2001::/64 via sr policy bsid cafe::1 fib-table 3 - sr steer l3 10.0.0.0/16 via sr policy bsid cafe::1 - sr steer l2 TenGE0/1/0 via sr policy bsid cafe::1 - -Disclaimer: The T.Encaps.L2 will steer L2 frames into an SR Policy. Notice that creating an SR steering policy for L2 frames will actually automatically *put the interface into promiscous mode*. diff --git a/src/vnet/srmpls/dir.dox b/src/vnet/srmpls/dir.dox new file mode 100755 index 00000000..76ec1d6a --- /dev/null +++ b/src/vnet/srmpls/dir.dox @@ -0,0 +1,22 @@ +/* + * + * Copyright (c) 2013 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** + @dir + @brief Segment Routing MPLS code + + An implementation of Segment Routing for the MPLS dataplane. + +*/ \ No newline at end of file diff --git a/src/vnet/srmpls/sr.h b/src/vnet/srmpls/sr.h new file mode 100755 index 00000000..0e106697 --- /dev/null +++ b/src/vnet/srmpls/sr.h @@ -0,0 +1,152 @@ +/* + * Copyright (c) 2015 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Segment Routing MPLS data structures definitions + * + */ + +#ifndef included_vnet_srmpls_h +#define included_vnet_srmpls_h + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/* SR policy types */ +#define SR_POLICY_TYPE_DEFAULT 0 +#define SR_POLICY_TYPE_SPRAY 1 + +#define SR_SEGMENT_LIST_WEIGHT_DEFAULT 1 + +#define SR_STEER_IPV4 4 +#define SR_STEER_IPV6 6 + +/** + * @brief SR Segment List (SID list) + */ +typedef struct +{ + /** + * SIDs (key) + */ + mpls_label_t *segments; + + /** + * SID list weight (wECMP / UCMP) + */ + u32 weight; + +} mpls_sr_sl_t; + +typedef struct +{ + u32 *segments_lists; /**< Pool of SID lists indexes */ + + mpls_label_t bsid; /**< BindingSID (key) */ + + u8 type; /**< Type (default is 0) */ + /* SR Policy specific DPO */ + /* IF Type = DEFAULT Then Load Balancer DPO among SID lists */ + /* IF Type = SPRAY then Spray DPO with all SID lists */ + +} mpls_sr_policy_t; + +/** + * @brief Steering db key + * + * L3 is IPv4/IPv6 + mask + */ +typedef struct +{ + ip46_address_t prefix; /**< IP address of the prefix */ + u32 mask_width; /**< Mask width of the prefix */ + u32 fib_table; /**< VRF of the prefix */ + u8 traffic_type; /**< Traffic type (IPv4, IPv6, L2) */ + u8 padding[3]; +} sr_mpls_steering_key_t; + +typedef struct +{ + sr_mpls_steering_key_t classify; /**< Traffic classification */ + u32 sr_policy; /**< SR Policy index */ +} mpls_sr_steering_policy_t; + +/** + * @brief Segment Routing main datastructure + */ +typedef struct +{ + /** + * SR SID lists + */ + mpls_sr_sl_t *sid_lists; + + /** + * SR MPLS policies + */ + mpls_sr_policy_t *sr_policies; + + /** + * Hash table mapping BindingSID to SR MPLS policy + */ + uword *sr_policies_index_hash; + + /** + * Pool of SR steer policies instances + */ + mpls_sr_steering_policy_t *steer_policies; + + /** + * MHash table mapping steering rules to SR steer instance + */ + mhash_t sr_steer_policies_hash; + + /** + * convenience + */ + vlib_main_t *vlib_main; + vnet_main_t *vnet_main; +} mpls_sr_main_t; + +extern mpls_sr_main_t sr_mpls_main; + +extern int +sr_mpls_policy_add (mpls_label_t bsid, mpls_label_t * segments, + u8 behavior, u32 weight); + +extern int +sr_mpls_policy_mod (mpls_label_t bsid, u32 index, u8 operation, + mpls_label_t * segments, u32 sl_index, u32 weight); + +extern int sr_mpls_policy_del (mpls_label_t bsid, u32 index); + +#endif /* included_vnet_sr_mpls_h */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/vnet/srmpls/sr_doc.md b/src/vnet/srmpls/sr_doc.md new file mode 100644 index 00000000..d60592bb --- /dev/null +++ b/src/vnet/srmpls/sr_doc.md @@ -0,0 +1,87 @@ +# SR-MPLS: Segment Routing for MPLS {#srmpls_doc} + +This is a memo intended to contain documentation of the VPP SR-MPLS implementation. +Everything that is not directly obvious should come here. +For any feedback on content that should be explained please mailto:pcamaril@cisco.com + +## Segment Routing + +Segment routing is a network technology focused on addressing the limitations of existing IP and Multiprotocol Label Switching (MPLS) networks in terms of simplicity, scale, and ease of operation. It is a foundation for application engineered routing as it prepares the networks for new business models where applications can control the network behavior. + +Segment routing seeks the right balance between distributed intelligence and centralized optimization and programming. It was built for the software-defined networking (SDN) era. + +Segment routing enhances packet forwarding behavior by enabling a network to transport unicast packets through a specific forwarding path, different from the normal path that a packet usually takes (IGP shortest path or BGP best path). This capability benefits many use cases, and one can build those specific paths based on application requirements. + +Segment routing uses the source routing paradigm. A node, usually a router but also a switch, a trusted server, or a virtual forwarder running on a hypervisor, steers a packet through an ordered list of instructions, called segments. A segment can represent any instruction, topological or service-based. A segment can have a local semantic to a segment-routing node or global within a segment-routing network. Segment routing allows an operator to enforce a flow through any topological path and service chain while maintaining per-flow state only at the ingress node to the segment-routing network. Segment routing also supports equal-cost multipath (ECMP) by design. + +Segment routing can operate with either an MPLS or an IPv6 data plane. All the currently available MPLS services, such as Layer 3 VPN (L3VPN), L2VPN (Virtual Private Wire Service [VPWS], Virtual Private LAN Services [VPLS], Ethernet VPN [E-VPN], and Provider Backbone Bridging Ethernet VPN [PBB-EVPN]), can run on top of a segment-routing transport network. + +**The implementation of Segment Routing in VPP covers both the IPv6 data plane (SRv6) as well as the MPLS data plane (SR-MPLS). This page contains the SR-MPLS documentation.** + +## Segment Routing terminology + +* SegmentID (SID): is an MPLS label. +* Segment List (SL) (SID List): is the sequence of SIDs that the packet will traverse. +* SR Policy: is a set of candidate paths (SID list+weight). An SR policy is uniquely identified by its Binding SID and associated with a weighted set of Segment Lists. In case several SID lists are defined, traffic steered into the policy is unevenly load-balanced among them according to their respective weights. +* BindingSID: a BindingSID is a SID (only one) associated one-one with an SR Policy. If a packet arrives with MPLS label corresponding to a BindingSID, then the SR policy will be applied to such packet. (BindingSID is popped first.) + +## SR-MPLS features in VPP + +The SR-MPLS implementation is focused on the SR policies, as well on its steering. Others SR-MPLS features, such as for example AdjSIDs, can be achieved using the regular VPP MPLS implementation. + +The Segment Routing Policy (*draft-filsfils-spring-segment-routing-policy*) defines SR Policies. + +## Creating a SR Policy + +An SR Policy is defined by a Binding SID and a weighted set of Segment Lists. + +A new SR policy is created with a first SID list using: + + sr mpls policy add bsid 40001 next 16001 next 16002 next 16003 (weight 5) + +* The weight parameter is only used if more than one SID list is associated with the policy. + +An SR policy is deleted with: + + sr mpls policy del bsid 40001 + +The existing SR policies are listed with: + + show sr mpls policies + +### Adding/Removing SID Lists from an SR policy + +An additional SID list is associated with an existing SR policy with: + + sr mpls policy mod bsid 40001 add sl next 16001 next 16002 next 16003 (weight 3) + +Conversely, a SID list can be removed from an SR policy with: + + sr mpls policy mod bsid 4001 del sl index 1 + +Note that this CLI cannot be used to remove the last SID list of a policy. Instead the SR policy delete CLI must be used. + +The weight of a SID list can also be modified with: + + sr mpls policy mod bsid 40001 mod sl index 1 weight 4 + sr mpls policy mod index 1 mod sl index 1 weight 4 + +### SR Policies: Spray policies + +Spray policies are a specific type of SR policies where the packet is replicated on all the SID lists, rather than load-balanced among them. + +SID list weights are ignored with this type of policies. + +A Spray policy is instantiated by appending the keyword **spray** to a regular SR-MPLS policy command, as in: + + sr mpls policy add bsid 40002 next 16001 next 16002 next 16003 spray + +Spray policies are used for removing multicast state from a network core domain, and instead send a linear unicast copy to every access node. The last SID in each list accesses the multicast tree within the access node. + +## Steering packets into a SR Policy + +To steer packets in Transit into an SR policy, the user needs to create an 'sr steering policy'. + + sr mpls steer l3 2001::/64 via sr policy bsid 40001 + sr mpls steer l3 2001::/64 via sr policy bsid 40001 fib-table 3 + sr mpls steer l3 10.0.0.0/16 via sr policy bsid 40001 diff --git a/src/vnet/srmpls/sr_mpls_policy.c b/src/vnet/srmpls/sr_mpls_policy.c new file mode 100755 index 00000000..5ebbc60d --- /dev/null +++ b/src/vnet/srmpls/sr_mpls_policy.c @@ -0,0 +1,569 @@ +/* + * sr_mpls_policy.c: SR-MPLS policies + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief SR MPLS policy creation and application + * + * Create an SR policy. + * An SR policy can be either of 'default' type or 'spray' type + * An SR policy has attached a list of SID lists. + * In case the SR policy is a default one it will load balance among them. + * An SR policy has associated a BindingSID. + * In case any packet arrives with MPLS_label == BindingSID then the SR policy + * associated to such bindingSID will be applied to such packet. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +mpls_sr_main_t sr_mpls_main; + +/*************************** SR LB helper functions **************************/ +/** + * @brief Creates a Segment List and adds it to an SR policy + * + * Creates a Segment List and adds it to the SR policy. Notice that the SL are + * not necessarily unique. Hence there might be two Segment List within the + * same SR Policy with exactly the same segments and same weight. + * + * @param sr_policy is the SR policy where the SL will be added + * @param sl is a vector of IPv6 addresses composing the Segment List + * @param weight is the weight of the SegmentList (for load-balancing purposes) + * @param is_encap represents the mode (SRH insertion vs Encapsulation) + * + * @return pointer to the just created segment list + */ +static inline mpls_sr_sl_t * +create_sl (mpls_sr_policy_t * sr_policy, mpls_label_t * sl, u32 weight) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_sl_t *segment_list; + + pool_get (sm->sid_lists, segment_list); + memset (segment_list, 0, sizeof (*segment_list)); + + vec_add1 (sr_policy->segments_lists, segment_list - sm->sid_lists); + + /* Fill in segment list */ + segment_list->weight = + (weight != (u32) ~ 0 ? weight : SR_SEGMENT_LIST_WEIGHT_DEFAULT); + segment_list->segments = vec_dup (sl); + + fib_route_path_t path = { + .frp_proto = FIB_PROTOCOL_MPLS, + .frp_sw_if_index = ~0, + .frp_fib_index = 0, + .frp_weight = segment_list->weight, + .frp_flags = FIB_ROUTE_PATH_FLAG_NONE, + .frp_label_stack = NULL, + .frp_local_label = sl[0], + }; + + vec_add (path.frp_label_stack, sl + 1, vec_len (sl) - 1); + + fib_route_path_t *paths = NULL; + vec_add1 (paths, path); + + mpls_eos_bit_t eos; + FOR_EACH_MPLS_EOS_BIT (eos) + { + /* *INDENT-OFF* */ + fib_prefix_t pfx = { + .fp_len = 21, + .fp_proto = FIB_PROTOCOL_MPLS, + .fp_label = sr_policy->bsid, + .fp_eos = eos, + .fp_payload_proto = DPO_PROTO_MPLS, + }; + /* *INDENT-ON* */ + + fib_table_entry_path_add2 (0, + &pfx, + FIB_SOURCE_SR, + (sr_policy->type == SR_POLICY_TYPE_DEFAULT ? + FIB_ENTRY_FLAG_NONE : + FIB_ENTRY_FLAG_MULTICAST), paths); + } + + vec_free (paths); + + return segment_list; +} + +/******************************* SR rewrite API *******************************/ +/* Three functions for handling sr policies: + * -> sr_mpls_policy_add + * -> sr_mpls_policy_del + * -> sr_mpls_policy_mod + * All of them are API. CLI function on sr_policy_command_fn */ + +/** + * @brief Create a new SR policy + * + * @param bsid is the bindingSID of the SR Policy + * @param segments is a vector of MPLS labels composing the segment list + * @param behavior is the behavior of the SR policy. (default//spray) + * @param fib_table is the VRF where to install the FIB entry for the BSID + * @param weight is the weight of this specific SID list + * + * @return 0 if correct, else error + */ +int +sr_mpls_policy_add (mpls_label_t bsid, mpls_label_t * segments, + u8 behavior, u32 weight) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_policy_t *sr_policy = 0; + uword *p; + + /* Search for existing keys (BSID) */ + p = hash_get (sm->sr_policies_index_hash, bsid); + if (p) + { + /* Add SR policy that already exists; complain */ + return -12; + } + + /* Add an SR policy object */ + pool_get (sm->sr_policies, sr_policy); + memset (sr_policy, 0, sizeof (*sr_policy)); + sr_policy->bsid = bsid; + sr_policy->type = behavior; + + /* Copy the key */ + hash_set (sm->sr_policies_index_hash, bsid, sr_policy - sm->sr_policies); + + /* Create a segment list and add the index to the SR policy */ + create_sl (sr_policy, segments, weight); + + return 0; +} + +/** + * @brief Delete a SR policy + * + * @param bsid is the bindingSID of the SR Policy + * @param index is the index of the SR policy + * + * @return 0 if correct, else error + */ +int +sr_mpls_policy_del (mpls_label_t bsid, u32 index) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_policy_t *sr_policy = 0; + mpls_sr_sl_t *segment_list; + mpls_eos_bit_t eos; + u32 *sl_index; + uword *p; + + if (bsid) + { + p = hash_get (sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -1; + } + else + { + sr_policy = pool_elt_at_index (sm->sr_policies, index); + if (!sr_policy) + return -1; + } + + /* Clean SID Lists */ + vec_foreach (sl_index, sr_policy->segments_lists) + { + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + + fib_route_path_t path = { + .frp_proto = FIB_PROTOCOL_MPLS, + .frp_sw_if_index = ~0, + .frp_fib_index = 0, + .frp_weight = segment_list->weight, + .frp_flags = FIB_ROUTE_PATH_FLAG_NONE, + .frp_local_label = segment_list->segments[0], + }; + + fib_route_path_t *paths = NULL; + vec_add1 (paths, path); + + /* remove each of the MPLS routes */ + FOR_EACH_MPLS_EOS_BIT (eos) + { + /* *INDENT-OFF* */ + fib_prefix_t pfx = { + .fp_len = 21, + .fp_proto = FIB_PROTOCOL_MPLS, + .fp_label = sr_policy->bsid, + .fp_eos = eos, + .fp_payload_proto = DPO_PROTO_MPLS, + }; + /* *INDENT-ON* */ + + fib_table_entry_path_remove2 (0, &pfx, FIB_SOURCE_SR, paths); + } + vec_free (paths); + vec_free (segment_list->segments); + pool_put_index (sm->sid_lists, *sl_index); + } + + /* Remove SR policy entry */ + hash_unset (sm->sr_policies_index_hash, sr_policy->bsid); + pool_put (sm->sr_policies, sr_policy); + + return 0; +} + +/** + * @brief Modify an existing SR policy + * + * The possible modifications are adding a new Segment List, modifying an + * existing Segment List (modify the weight only) and delete a given + * Segment List from the SR Policy. + * + * @param bsid is the bindingSID of the SR Policy + * @param index is the index of the SR policy + * @param fib_table is the VRF where to install the FIB entry for the BSID + * @param operation is the operation to perform (among the top ones) + * @param segments is a vector of IPv6 address composing the segment list + * @param sl_index is the index of the Segment List to modify/delete + * @param weight is the weight of the sid list. optional. + * + * @return 0 if correct, else error + */ +int +sr_mpls_policy_mod (mpls_label_t bsid, u32 index, u8 operation, + mpls_label_t * segments, u32 sl_index, u32 weight) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_policy_t *sr_policy = 0; + mpls_sr_sl_t *segment_list; + u32 *sl_index_iterate; + uword *p; + + if (bsid) + { + p = hash_get (sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -1; + } + else + { + sr_policy = pool_elt_at_index (sm->sr_policies, index); + if (!sr_policy) + return -1; + } + + if (operation == 1) /* Add SR List to an existing SR policy */ + { + /* Create the new SL */ + segment_list = create_sl (sr_policy, segments, weight); + + } + else if (operation == 2) /* Delete SR List from an existing SR policy */ + { + /* Check that currently there are more than one SID list */ + if (vec_len (sr_policy->segments_lists) == 1) + return -21; + + /* Check that the SR list does exist and is assigned to the sr policy */ + vec_foreach (sl_index_iterate, sr_policy->segments_lists) + if (*sl_index_iterate == sl_index) + break; + + if (*sl_index_iterate != sl_index) + return -22; + + /* Remove the lucky SR list that is being kicked out */ + segment_list = pool_elt_at_index (sm->sid_lists, sl_index); + + mpls_eos_bit_t eos; + fib_route_path_t path = { + .frp_proto = FIB_PROTOCOL_MPLS, + .frp_sw_if_index = ~0, + .frp_fib_index = 0, + .frp_weight = segment_list->weight, + .frp_flags = FIB_ROUTE_PATH_FLAG_NONE, + .frp_local_label = segment_list->segments[0], + }; + + fib_route_path_t *paths = NULL; + vec_add1 (paths, path); + + FOR_EACH_MPLS_EOS_BIT (eos) + { + /* *INDENT-OFF* */ + fib_prefix_t pfx = { + .fp_len = 21, + .fp_proto = FIB_PROTOCOL_MPLS, + .fp_label = sr_policy->bsid, + .fp_eos = eos, + .fp_payload_proto = DPO_PROTO_MPLS, + }; + /* *INDENT-ON* */ + + fib_table_entry_path_remove2 (0, &pfx, FIB_SOURCE_SR, paths); + } + + vec_free (paths); + vec_free (segment_list->segments); + pool_put_index (sm->sid_lists, sl_index); + vec_del1 (sr_policy->segments_lists, + sl_index_iterate - sr_policy->segments_lists); + } + else if (operation == 3) /* Modify the weight of an existing SR List */ + { + /* Find the corresponding SL */ + vec_foreach (sl_index_iterate, sr_policy->segments_lists) + if (*sl_index_iterate == sl_index) + break; + + if (*sl_index_iterate != sl_index) + return -32; + + /* Change the weight */ + segment_list = pool_elt_at_index (sm->sid_lists, sl_index); + segment_list->weight = weight; + + /* Update LB */ + //FIXME + } + return 0; +} + +/** + * @brief CLI for 'sr mpls policies' command family + */ +static clib_error_t * +sr_mpls_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + int rv = -1; + char is_del = 0, is_add = 0, is_mod = 0; + char policy_set = 0; + mpls_label_t bsid, next_label; + u32 sr_policy_index = (u32) ~ 0, sl_index = (u32) ~ 0; + u32 weight = (u32) ~ 0; + mpls_label_t *segments = 0; + u8 operation = 0; + u8 is_spray = 0; + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (!is_add && !is_mod && !is_del && unformat (input, "add")) + is_add = 1; + else if (!is_add && !is_mod && !is_del && unformat (input, "del")) + is_del = 1; + else if (!is_add && !is_mod && !is_del && unformat (input, "mod")) + is_mod = 1; + else if (!policy_set + && unformat (input, "bsid %U", unformat_mpls_unicast_label, + &bsid)) + policy_set = 1; + else if (!is_add && !policy_set + && unformat (input, "index %d", &sr_policy_index)) + policy_set = 1; + else if (unformat (input, "weight %d", &weight)); + else + if (unformat + (input, "next %U", unformat_mpls_unicast_label, &next_label)) + { + vec_add (segments, &next_label, 1); + } + else if (unformat (input, "add sl")) + operation = 1; + else if (unformat (input, "del sl index %d", &sl_index)) + operation = 2; + else if (unformat (input, "mod sl index %d", &sl_index)) + operation = 3; + else if (unformat (input, "spray")) + is_spray = 1; + else + break; + } + + if (!is_add && !is_mod && !is_del) + return clib_error_return (0, "Incorrect CLI"); + + if (!policy_set) + return clib_error_return (0, "No SR policy BSID or index specified"); + + if (is_add) + { + if (vec_len (segments) == 0) + return clib_error_return (0, "No Segment List specified"); + + rv = sr_mpls_policy_add (bsid, segments, + (is_spray ? SR_POLICY_TYPE_SPRAY : + SR_POLICY_TYPE_DEFAULT), weight); + } + else if (is_del) + rv = + sr_mpls_policy_del ((sr_policy_index != (u32) ~ 0 ? (u32) ~ 0 : bsid), + sr_policy_index); + else if (is_mod) + { + if (!operation) + return clib_error_return (0, "No SL modification specified"); + if (operation != 1 && sl_index == (u32) ~ 0) + return clib_error_return (0, "No Segment List index specified"); + if (operation == 1 && vec_len (segments) == 0) + return clib_error_return (0, "No Segment List specified"); + if (operation == 3 && weight == (u32) ~ 0) + return clib_error_return (0, "No new weight for the SL specified"); + rv = + sr_mpls_policy_mod ((sr_policy_index != (u32) ~ 0 ? (u32) ~ 0 : bsid), + sr_policy_index, operation, segments, + sl_index, weight); + } + + switch (rv) + { + case 0: + break; + case 1: + return 0; + case -12: + return clib_error_return (0, + "There is already a FIB entry for the BindingSID address.\n" + "The SR policy could not be created."); + case -21: + return clib_error_return (0, + "The selected SR policy only contains ONE segment list. " + "Please remove the SR policy instead"); + case -22: + return clib_error_return (0, + "Could not delete the segment list. " + "It is not associated with that SR policy."); + case -32: + return clib_error_return (0, + "Could not modify the segment list. " + "The given SL is not associated with such SR policy."); + default: + return clib_error_return (0, "BUG: sr policy returns %d", rv); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (sr_mpls_policy_command, static) = { + .path = "sr mpls policy", + .short_help = "sr mpls policy [add||del||mod] bsid 2999 " + "next 10 next 20 next 30 (weight 1) (spray)", + .long_help = "TBD.\n", + .function = sr_mpls_policy_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief CLI to display onscreen all the SR MPLS policies + */ +static clib_error_t * +show_sr_mpls_policies_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_sl_t *segment_list = 0; + mpls_sr_policy_t *sr_policy = 0; + mpls_sr_policy_t **vec_policies = 0; + mpls_label_t *label; + u32 *sl_index; + u8 *s; + int i = 0; + + vlib_cli_output (vm, "SR MPLS policies:"); + + /* *INDENT-OFF* */ + pool_foreach (sr_policy, sm->sr_policies, {vec_add1 (vec_policies, sr_policy); } ); + /* *INDENT-ON* */ + + vec_foreach_index (i, vec_policies) + { + sr_policy = vec_policies[i]; + vlib_cli_output (vm, "[%u].-\tBSID: %U", + (u32) (sr_policy - sm->sr_policies), + format_mpls_unicast_label, sr_policy->bsid); + vlib_cli_output (vm, "\tType: %s", + (sr_policy->type == + SR_POLICY_TYPE_DEFAULT ? "Default" : "Spray")); + vlib_cli_output (vm, "\tSegment Lists:"); + vec_foreach (sl_index, sr_policy->segments_lists) + { + s = NULL; + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + s = format (s, "\t[%u].- ", *sl_index); + s = format (s, "< "); + vec_foreach (label, segment_list->segments) + { + s = format (s, "%U, ", format_mpls_unicast_label, *label); + } + s = format (s, "\b\b > "); + vlib_cli_output (vm, " %s", s); + } + vlib_cli_output (vm, "-----------"); + } + vec_free (vec_policies); + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_mpls_policies_command, static) = { + .path = "show sr mpls policies", + .short_help = "show sr mpls policies", + .function = show_sr_mpls_policies_command_fn, +}; +/* *INDENT-ON* */ + +/********************* SR MPLS Policy initialization ***********************/ +/** + * @brief SR MPLS Policy initialization + */ +clib_error_t * +sr_mpls_policy_rewrite_init (vlib_main_t * vm) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + + /* Init memory for sr policy keys (bsid <-> ip6_address_t) */ + sm->sr_policies_index_hash = hash_create (0, sizeof (mpls_label_t)); + + return 0; +} + +VLIB_INIT_FUNCTION (sr_mpls_policy_rewrite_init); + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srmpls/sr_mpls_steering.c b/src/vnet/srmpls/sr_mpls_steering.c new file mode 100755 index 00000000..37707049 --- /dev/null +++ b/src/vnet/srmpls/sr_mpls_steering.c @@ -0,0 +1,453 @@ +/* + * sr_steering.c: ipv6 segment routing steering into SR policy + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Packet steering into SR-MPLS Policies + * + * This file is in charge of handling the FIB appropiatly to steer packets + * through SR Policies as defined in 'sr_mpls_policy.c'. Notice that here + * we are only doing steering. SR policy application is done in + * sr_policy_rewrite.c + * + * Supports: + * - Steering of IPv6 traffic Destination Address based + * - Steering of IPv4 traffic Destination Address based + */ + +#include +#include +#include +#include +#include +#include + +#include +#include + +/** + * @brief Steer traffic L3 traffic through a given SR-MPLS policy + * + * @param is_del + * @param bsid is the bindingSID of the SR Policy (alt to sr_policy_index) + * @param sr_policy is the index of the SR Policy (alt to bsid) + * @param table_id is the VRF where to install the FIB entry for the BSID + * @param prefix is the IPv4/v6 address for L3 traffic type + * @param mask_width is the mask for L3 traffic type + * @param traffic_type describes the type of traffic + * + * @return 0 if correct, else error + */ +int +sr_mpls_steering_policy (int is_del, mpls_label_t bsid, u32 sr_policy_index, + u32 table_id, ip46_address_t * prefix, + u32 mask_width, u8 traffic_type) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + sr_mpls_steering_key_t key; + mpls_sr_steering_policy_t *steer_pl; + fib_prefix_t pfx = { 0 }; + + mpls_sr_policy_t *sr_policy = 0; + uword *p = 0; + + memset (&key, 0, sizeof (sr_mpls_steering_key_t)); + + /* Compute the steer policy key */ + if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) + { + key.prefix.as_u64[0] = prefix->as_u64[0]; + key.prefix.as_u64[1] = prefix->as_u64[1]; + key.mask_width = mask_width; + key.fib_table = (table_id != (u32) ~ 0 ? table_id : 0); + } + else + return -1; + + key.traffic_type = traffic_type; + + /* Search for the item */ + p = mhash_get (&sm->sr_steer_policies_hash, &key); + + if (p) + { + /* Retrieve Steer Policy function */ + steer_pl = pool_elt_at_index (sm->steer_policies, p[0]); + + if (is_del) + { + if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.prefix.ip6; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_MPLS, + steer_pl->classify.fib_table), &pfx, + FIB_SOURCE_SR); + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.prefix.ip4; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_MPLS, + steer_pl->classify.fib_table), &pfx, + FIB_SOURCE_SR); + } + + /* Delete SR steering policy entry */ + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + + return 1; + } + else /* It means user requested to update an existing SR steering policy */ + { + /* Retrieve SR steering policy */ + if (bsid) //TODO FIXME + { + p = hash_get (sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -2; + } + else + sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); + + if (!sr_policy) + return -2; + + steer_pl->sr_policy = sr_policy - sm->sr_policies; + + /* Remove old FIB/hw redirection and create a new one */ + if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.prefix.ip6; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP6, + steer_pl->classify.fib_table), &pfx, + FIB_SOURCE_SR); + + /* Create a new one */ + goto update_fib; + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.prefix.ip4; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP4, + steer_pl->classify.fib_table), &pfx, + FIB_SOURCE_SR); + + /* Create a new one */ + goto update_fib; + } + } + } + else + /* delete; steering policy does not exist; complain */ + if (is_del) + return -4; + + /* Retrieve SR policy */ + if (bsid) //FIX + { + p = hash_get (sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -2; + } + else + sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); + + /* Create a new steering policy */ + pool_get (sm->steer_policies, steer_pl); + memset (steer_pl, 0, sizeof (*steer_pl)); + + if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) + { + clib_memcpy (&steer_pl->classify.prefix, prefix, + sizeof (ip46_address_t)); + steer_pl->classify.mask_width = mask_width; + steer_pl->classify.fib_table = (table_id != (u32) ~ 0 ? table_id : 0); + steer_pl->classify.traffic_type = traffic_type; + } + else + { + /* Incorrect API usage. Should never get here */ + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + return -1; + } + steer_pl->sr_policy = sr_policy - sm->sr_policies; + + /* Create and store key */ + mhash_set (&sm->sr_steer_policies_hash, &key, steer_pl - sm->steer_policies, + NULL); + +update_fib:; + + fib_route_path_t path = { + .frp_proto = FIB_PROTOCOL_MPLS, + .frp_local_label = sr_policy->bsid, + .frp_eos = MPLS_EOS, + .frp_sw_if_index = ~0, + .frp_fib_index = 0, + .frp_weight = 1, + .frp_flags = FIB_ROUTE_PATH_FLAG_NONE, + .frp_label_stack = NULL + }; + + fib_route_path_t *paths = NULL; + + /* FIB API calls - Recursive route through the BindingSID */ + if (traffic_type == SR_STEER_IPV6) + { + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.prefix.ip6; + path.frp_fib_index = 0; + + vec_add1 (paths, path); + + fib_table_entry_path_add2 (fib_table_find + (FIB_PROTOCOL_IP6, + (table_id != (u32) ~ 0 ? table_id : 0)), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, paths); + + vec_free (paths); + } + else if (traffic_type == SR_STEER_IPV4) + { + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.prefix.ip4; + path.frp_fib_index = 0; + + vec_add1 (paths, path); + + fib_table_entry_path_add2 (fib_table_find + (FIB_PROTOCOL_IP4, + (table_id != (u32) ~ 0 ? table_id : 0)), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, paths); + + vec_free (paths); + } + + return 0; +} + +static clib_error_t * +sr_mpls_steer_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + int is_del = 0; + + ip46_address_t prefix; + u32 dst_mask_width = 0; + u8 traffic_type = 0; + u32 fib_table = (u32) ~ 0; + + mpls_label_t bsid; + u32 sr_policy_index = (u32) ~ 0; + + u8 sr_policy_set = 0; + + memset (&prefix, 0, sizeof (ip46_address_t)); + + int rv; + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "del")) + is_del = 1; + else if (!traffic_type + && unformat (input, "l3 %U/%d", unformat_ip6_address, + &prefix.ip6, &dst_mask_width)) + traffic_type = SR_STEER_IPV6; + else if (!traffic_type + && unformat (input, "l3 %U/%d", unformat_ip4_address, + &prefix.ip4, &dst_mask_width)) + traffic_type = SR_STEER_IPV4; + else if (!sr_policy_set + && unformat (input, "via sr policy index %d", + &sr_policy_index)) + sr_policy_set = 1; + else if (!sr_policy_set + && unformat (input, "via sr policy bsid %U", + unformat_mpls_unicast_label, &bsid)) + sr_policy_set = 1; + else if (fib_table == (u32) ~ 0 + && unformat (input, "fib-table %d", &fib_table)); + else + break; + } + + if (!traffic_type) + return clib_error_return (0, "No L3 traffic specified"); + if (!sr_policy_set) + return clib_error_return (0, "No SR policy specified"); + + /* Make sure that the prefixes are clean */ + if (traffic_type == SR_STEER_IPV4) + { + u32 mask = + (dst_mask_width ? (0xFFFFFFFFu >> (32 - dst_mask_width)) : 0); + prefix.ip4.as_u32 &= mask; + } + else if (traffic_type == SR_STEER_IPV6) + { + ip6_address_t mask; + ip6_address_mask_from_width (&mask, dst_mask_width); + ip6_address_mask (&prefix.ip6, &mask); + } + + rv = + sr_mpls_steering_policy (is_del, bsid, + sr_policy_index, fib_table, &prefix, + dst_mask_width, traffic_type); + + switch (rv) + { + case 0: + break; + case 1: + return 0; + case -1: + return clib_error_return (0, "Incorrect API usage."); + case -2: + return clib_error_return (0, + "The requested SR policy could not be located. Review the BSID/index."); + case -3: + return clib_error_return (0, + "Unable to do SW redirect. Incorrect interface."); + case -4: + return clib_error_return (0, + "The requested SR steering policy could not be deleted."); + case -5: + return clib_error_return (0, + "The SR policy is not an encapsulation one."); + default: + return clib_error_return (0, "BUG: sr steer policy returns %d", rv); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (sr_mpls_steer_policy_command, static) = { + .path = "sr mpls steer", + .short_help = "sr mpls steer (del) l3 " + "via sr policy bsid (fib-table )", + .long_help = + "\tSteer L3 traffic through an existing SR policy.\n" + "\tExamples:\n" + "\t\tsr steer l3 2001::/64 via sr_policy index 5\n" + "\t\tsr steer l3 2001::/64 via sr_policy bsid 29999\n" + "\t\tsr steer del l3 2001::/64 via sr_policy index 5\n", + .function = sr_mpls_steer_policy_command_fn, +}; +/* *INDENT-ON* */ + +static clib_error_t * +show_sr_mpls_steering_policies_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + mpls_sr_steering_policy_t **steer_policies = 0; + mpls_sr_steering_policy_t *steer_pl; + + mpls_sr_policy_t *pl = 0; + int i; + + vlib_cli_output (vm, "SR MPLS steering policies:"); + /* *INDENT-OFF* */ + pool_foreach (steer_pl, sm->steer_policies, ({vec_add1(steer_policies, steer_pl);})); + /* *INDENT-ON* */ + vlib_cli_output (vm, "Traffic\t\tSR policy BSID"); + for (i = 0; i < vec_len (steer_policies); i++) + { + steer_pl = steer_policies[i]; + pl = pool_elt_at_index (sm->sr_policies, steer_pl->sr_policy); + if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + vlib_cli_output (vm, "L3 %U/%d\t%U", + format_ip4_address, + &steer_pl->classify.prefix.ip4, + steer_pl->classify.mask_width, + format_mpls_unicast_label, pl->bsid); + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + vlib_cli_output (vm, "L3 %U/%d\t%U", + format_ip6_address, + &steer_pl->classify.prefix.ip6, + steer_pl->classify.mask_width, + format_mpls_unicast_label, pl->bsid); + } + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_mpls_steering_policies_command, static) = { + .path = "show sr mpls steering policies", + .short_help = "show sr mpls steering policies", + .function = show_sr_mpls_steering_policies_command_fn, +}; +/* *INDENT-ON* */ + +clib_error_t * +sr_mpls_steering_init (vlib_main_t * vm) +{ + mpls_sr_main_t *sm = &sr_mpls_main; + + /* Init memory for function keys */ + mhash_init (&sm->sr_steer_policies_hash, sizeof (uword), + sizeof (sr_mpls_steering_key_t)); + + return 0; +} + +/* *INDENT-OFF* */ +VLIB_INIT_FUNCTION (sr_mpls_steering_init); +/* *INDENT-ON* */ + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srv6/dir.dox b/src/vnet/srv6/dir.dox new file mode 100755 index 00000000..3f539a58 --- /dev/null +++ b/src/vnet/srv6/dir.dox @@ -0,0 +1,25 @@ +/* + * + * Copyright (c) 2013 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** + @dir + @brief Segment Routing code + + An implementation of Segment Routing as per: + draft-ietf-6man-segment-routing-header-05 + + @see ietf_draft_05.txt + +*/ \ No newline at end of file diff --git a/src/vnet/srv6/ietf_draft_05.txt b/src/vnet/srv6/ietf_draft_05.txt new file mode 100755 index 00000000..e9bff04f --- /dev/null +++ b/src/vnet/srv6/ietf_draft_05.txt @@ -0,0 +1,1564 @@ +Network Working Group S. Previdi, Ed. +Internet-Draft C. Filsfils +Intended status: Standards Track Cisco Systems, Inc. +Expires: August 5, 2017 B. Field + Comcast + I. Leung + Rogers Communications + J. Linkova + Google + E. Aries + Facebook + T. Kosugi + NTT + E. Vyncke + Cisco Systems, Inc. + D. Lebrun + Universite Catholique de Louvain + February 1, 2017 + + + IPv6 Segment Routing Header (SRH) + draft-ietf-6man-segment-routing-header-05 + +Abstract + + Segment Routing (SR) allows a node to steer a packet through a + controlled set of instructions, called segments, by prepending an SR + header to the packet. A segment can represent any instruction, + topological or service-based. SR allows to enforce a flow through + any path (topological, or application/service based) while + maintaining per-flow state only at the ingress node to the SR domain. + + Segment Routing can be applied to the IPv6 data plane with the + addition of a new type of Routing Extension Header. This draft + describes the Segment Routing Extension Header Type and how it is + used by SR capable nodes. + +Requirements Language + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + +Status of This Memo + + This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79. + + + + +Previdi, et al. Expires August 5, 2017 [Page 1] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + Internet-Drafts are working documents of the Internet Engineering + Task Force (IETF). Note that other groups may also distribute + working documents as Internet-Drafts. The list of current Internet- + Drafts is at http://datatracker.ietf.org/drafts/current/. + + Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress." + + This Internet-Draft will expire on August 5, 2017. + +Copyright Notice + + Copyright (c) 2017 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Segment Routing Documents . . . . . . . . . . . . . . . . . . 3 + 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2.1. Data Planes supporting Segment Routing . . . . . . . . . 4 + 2.2. Segment Routing (SR) Domain . . . . . . . . . . . . . . . 4 + 2.2.1. SR Domain in a Service Provider Network . . . . . . . 5 + 2.2.2. SR Domain in a Overlay Network . . . . . . . . . . . 6 + 3. Segment Routing Extension Header (SRH) . . . . . . . . . . . 7 + 3.1. SRH TLVs . . . . . . . . . . . . . . . . . . . . . . . . 9 + 3.1.1. Ingress Node TLV . . . . . . . . . . . . . . . . . . 10 + 3.1.2. Egress Node TLV . . . . . . . . . . . . . . . . . . . 11 + 3.1.3. Opaque Container TLV . . . . . . . . . . . . . . . . 11 + 3.1.4. Padding TLV . . . . . . . . . . . . . . . . . . . . . 12 + 3.1.5. HMAC TLV . . . . . . . . . . . . . . . . . . . . . . 13 + 3.2. SRH and RFC2460 behavior . . . . . . . . . . . . . . . . 14 + 4. SRH Procedures . . . . . . . . . . . . . . . . . . . . . . . 14 + 4.1. Source SR Node . . . . . . . . . . . . . . . . . . . . . 14 + 4.2. Transit Node . . . . . . . . . . . . . . . . . . . . . . 15 + 4.3. SR Segment Endpoint Node . . . . . . . . . . . . . . . . 16 + 5. Security Considerations . . . . . . . . . . . . . . . . . . . 16 + + + +Previdi, et al. Expires August 5, 2017 [Page 2] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + 5.1. Threat model . . . . . . . . . . . . . . . . . . . . . . 17 + 5.1.1. Source routing threats . . . . . . . . . . . . . . . 17 + 5.1.2. Applicability of RFC 5095 to SRH . . . . . . . . . . 17 + 5.1.3. Service stealing threat . . . . . . . . . . . . . . . 18 + 5.1.4. Topology disclosure . . . . . . . . . . . . . . . . . 18 + 5.1.5. ICMP Generation . . . . . . . . . . . . . . . . . . . 18 + 5.2. Security fields in SRH . . . . . . . . . . . . . . . . . 19 + 5.2.1. Selecting a hash algorithm . . . . . . . . . . . . . 20 + 5.2.2. Performance impact of HMAC . . . . . . . . . . . . . 21 + 5.2.3. Pre-shared key management . . . . . . . . . . . . . . 21 + 5.3. Deployment Models . . . . . . . . . . . . . . . . . . . . 22 + 5.3.1. Nodes within the SR domain . . . . . . . . . . . . . 22 + 5.3.2. Nodes outside of the SR domain . . . . . . . . . . . 22 + 5.3.3. SR path exposure . . . . . . . . . . . . . . . . . . 23 + 5.3.4. Impact of BCP-38 . . . . . . . . . . . . . . . . . . 23 + 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 + 7. Manageability Considerations . . . . . . . . . . . . . . . . 24 + 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 + 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24 + 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 25 + 10.1. Normative References . . . . . . . . . . . . . . . . . . 25 + 10.2. Informative References . . . . . . . . . . . . . . . . . 25 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 + +1. Segment Routing Documents + + Segment Routing terminology is defined in + [I-D.ietf-spring-segment-routing]. + + Segment Routing use cases are described in [RFC7855] and + [I-D.ietf-spring-ipv6-use-cases]. + + Segment Routing protocol extensions are defined in + [I-D.ietf-isis-segment-routing-extensions], and + [I-D.ietf-ospf-ospfv3-segment-routing-extensions]. + +2. Introduction + + Segment Routing (SR), defined in [I-D.ietf-spring-segment-routing], + allows a node to steer a packet through a controlled set of + instructions, called segments, by prepending an SR header to the + packet. A segment can represent any instruction, topological or + service-based. SR allows to enforce a flow through any path + (topological or service/application based) while maintaining per-flow + state only at the ingress node to the SR domain. Segments can be + derived from different components: IGP, BGP, Services, Contexts, + Locators, etc. The list of segment forming the path is called the + Segment List and is encoded in the packet header. + + + +Previdi, et al. Expires August 5, 2017 [Page 3] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + SR allows the use of strict and loose source based routing paradigms + without requiring any additional signaling protocols in the + infrastructure hence delivering an excellent scalability property. + + The source based routing model described in + [I-D.ietf-spring-segment-routing] is inherited from the ones proposed + by [RFC1940] and [RFC2460]. The source based routing model offers + the support for explicit routing capability. + +2.1. Data Planes supporting Segment Routing + + Segment Routing (SR), can be instantiated over MPLS + ([I-D.ietf-spring-segment-routing-mpls]) and IPv6. This document + defines its instantiation over the IPv6 data-plane based on the use- + cases defined in [I-D.ietf-spring-ipv6-use-cases]. + + This document defines a new type of Routing Header (originally + defined in [RFC2460]) called the Segment Routing Header (SRH) in + order to convey the Segment List in the packet header as defined in + [I-D.ietf-spring-segment-routing]. Mechanisms through which segment + are known and advertised are outside the scope of this document. + + A segment is materialized by an IPv6 address. A segment identifies a + topological instruction or a service instruction. A segment can be + either: + + o global: a global segment represents an instruction supported by + all nodes in the SR domain and it is instantiated through an IPv6 + address globally known in the SR domain. + + o local: a local segment represents an instruction supported only by + the node who originates it and it is instantiated through an IPv6 + address that is known only by the local node. + +2.2. Segment Routing (SR) Domain + + We define the concept of the Segment Routing Domain (SR Domain) as + the set of nodes participating into the source based routing model. + These nodes may be connected to the same physical infrastructure + (e.g.: a Service Provider's network) as well as nodes remotely + connected to each other (e.g.: an enterprise VPN or an overlay). + + A non-exhaustive list of examples of SR Domains is: + + o The network of an operator, service provider, content provider, + enterprise including nodes, links and Autonomous Systems. + + + + + +Previdi, et al. Expires August 5, 2017 [Page 4] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o A set of nodes connected as an overlay over one or more transit + providers. The overlay nodes exchange SR-enabled traffic with + segments belonging solely to the overlay routers (the SR domain). + None of the segments in the SR-enabled packets exchanged by the + overlay belong to the transit networks + + The source based routing model through its instantiation of the + Segment Routing Header (SRH) defined in this document equally applies + to all the above examples. + + It is assumed in this document that the SRH is added to the packet by + its source, consistently with the source routing model defined in + [RFC2460]. For example: + + o At the node originating the packet (host, server). + + o At the ingress node of an SR domain where the ingress node + receives an IPv6 packet and encapsulates it into an outer IPv6 + header followed by a Segment Routing header. + +2.2.1. SR Domain in a Service Provider Network + + The following figure illustrates an SR domain consisting of an + operator's network infrastructure. + + (-------------------------- Operator 1 -----------------------) + ( ) + ( (-----AS 1-----) (-------AS 2-------) (----AS 3-------) ) + ( ( ) ( ) ( ) ) + A1--(--(--11---13--14-)--(-21---22---23--24-)--(-31---32---34--)--)--Z1 + ( ( /|\ /|\ /| ) ( |\ /|\ /|\ /| ) ( |\ /|\ /| \ ) ) + A2--(--(/ | \/ | \/ | ) ( | \/ | \/ | \/ | ) ( | \/ | \/ | \)--)--Z2 + ( ( | /\ | /\ | ) ( | /\ | /\ | /\ | ) ( | /\ | /\ | ) ) + ( ( |/ \|/ \| ) ( |/ \|/ \|/ \| ) ( |/ \|/ \| ) ) + A3--(--(--15---17--18-)--(-25---26---27--28-)--(-35---36---38--)--)--Z3 + ( ( ) ( ) ( ) ) + ( (--------------) (------------------) (---------------) ) + ( ) + (-------------------------------------------------------------) + + Figure 1: Service Provider SR Domain + + Figure 1 describes an operator network including several ASes and + delivering connectivity between endpoints. In this scenario, Segment + Routing is used within the operator networks and across the ASes + boundaries (all being under the control of the same operator). In + this case segment routing can be used in order to address use cases + such as end-to-end traffic engineering, fast re-route, egress peer + + + +Previdi, et al. Expires August 5, 2017 [Page 5] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + engineering, data-center traffic engineering as described in + [RFC7855], [I-D.ietf-spring-ipv6-use-cases] and + [I-D.ietf-spring-resiliency-use-cases]. + + Typically, an IPv6 packet received at ingress (i.e.: from outside the + SR domain), is classified according to network operator policies and + such classification results into an outer header with an SRH applied + to the incoming packet. The SRH contains the list of segment + representing the path the packet must take inside the SR domain. + Thus, the SA of the packet is the ingress node, the DA (due to SRH + procedures described in Section 4) is set as the first segment of the + path and the last segment of the path is the egress node of the SR + domain. + + The path may include intra-AS as well as inter-AS segments. It has + to be noted that all nodes within the SR domain are under control of + the same administration. When the packet reaches the egress point of + the SR domain, the outer header and its SRH are removed so that the + destination of the packet is unaware of the SR domain the packet has + traversed. + + The outer header with the SRH is no different from any other + tunneling encapsulation mechanism and allows a network operator to + implement traffic engineering mechanisms so to efficiently steer + traffic across his infrastructure. + +2.2.2. SR Domain in a Overlay Network + + The following figure illustrates an SR domain consisting of an + overlay network over multiple operator's networks. + + (--Operator 1---) (-----Operator 2-----) (--Operator 3---) + ( ) ( ) ( ) + A1--(--11---13--14--)--(--21---22---23--24--)--(-31---32---34--)--C1 + ( /|\ /|\ /| ) ( |\ /|\ /|\ /| ) ( |\ /|\ /| \ ) + A2--(/ | \/ | \/ | ) ( | \/ | \/ | \/ | ) ( | \/ | \/ | \)--C2 + ( | /\ | /\ | ) ( | /\ | /\ | /\ | ) ( | /\ | /\ | ) + ( |/ \|/ \| ) ( |/ \|/ \|/ \| ) ( |/ \|/ \| ) + A3--(--15---17--18--)--(--25---26---27--28--)--(-35---36---38--)--C3 + ( ) ( | | | ) ( ) + (---------------) (--|----|---------|--) (---------------) + | | | + B1 B2 B3 + + Figure 2: Overlay SR Domain + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 6] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + Figure 2 describes an overlay consisting of nodes connected to three + different network operators and forming a single overlay network + where Segment routing packets are exchanged. + + The overlay consists of nodes A1, A2, A3, B1, B2, B3, C1, C2 and C3. + These nodes are connected to their respective network operator and + form an overlay network. + + Each node may originate packets with an SRH which contains, in the + segment list of the SRH or in the DA, segments identifying other + overlay nodes. This implies that packets with an SRH may traverse + operator's networks but, obviously, these SRHs cannot contain an + address/segment of the transit operators 1, 2 and 3. The SRH + originated by the overlay can only contain address/segment under the + administration of the overlay (e.g. address/segments supported by A1, + A2, A3, B1, B2, B3, C1,C2 or C3). + + In this model, the operator network nodes are transit nodes and, + according to [RFC2460], MUST NOT inspect the routing extension header + since they are not the DA of the packet. + + It is a common practice in operators networks to filter out, at + ingress, any packet whose DA is the address of an internal node and + it is also possible that an operator would filter out any packet + destined to an internal address and having an extension header in it. + + This common practice does not impact the SR-enabled traffic between + the overlay nodes as the intermediate transit networks never see a + destination address belonging to their infrastructure. These SR- + enabled overlay packets will thus never be filtered by the transit + operators. + + In all cases, transit packets (i.e.: packets whose DA is outside the + domain of the operator's network) will be forwarded accordingly + without introducing any security concern in the operator's network. + This is similar to tunneled packets. + +3. Segment Routing Extension Header (SRH) + + A new type of the Routing Header (originally defined in [RFC2460]) is + defined: the Segment Routing Header (SRH) which has a new Routing + Type, (suggested value 4) to be assigned by IANA. + + The Segment Routing Header (SRH) is defined as follows: + + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 7] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Next Header | Hdr Ext Len | Routing Type | Segments Left | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | First Segment | Flags | RESERVED | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Segment List[0] (128 bits IPv6 address) | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | | + ... + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Segment List[n] (128 bits IPv6 address) | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + // // + // Optional Type Length Value objects (variable) // + // // + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Next Header: 8-bit selector. Identifies the type of header + immediately following the SRH. + + o Hdr Ext Len: 8-bit unsigned integer, is the length of the SRH + header in 8-octet units, not including the first 8 octets. + + o Routing Type: TBD, to be assigned by IANA (suggested value: 4). + + o Segments Left. Defined in [RFC2460], it contains the index, in + the Segment List, of the next segment to inspect. Segments Left + is decremented at each segment. + + o First Segment: contains the index, in the Segment List, of the + first segment of the path which is in fact the last element of the + Segment List. + + o Flags: 8 bits of flags. Following flags are defined: + + + + +Previdi, et al. Expires August 5, 2017 [Page 8] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + 0 1 2 3 4 5 6 7 + +-+-+-+-+-+-+-+-+ + |U|P|O|A|H| U | + +-+-+-+-+-+-+-+-+ + + U: Unused and for future use. SHOULD be unset on transmission + and MUST be ignored on receipt. + + P-flag: Protected flag. Set when the packet has been rerouted + through FRR mechanism by an SR endpoint node. + + O-flag: OAM flag. When set, it indicates that this packet is + an operations and management (OAM) packet. + + A-flag: Alert flag. If present, it means important Type Length + Value (TLV) objects are present. See Section 3.1 for details + on TLVs objects. + + H-flag: HMAC flag. If set, the HMAC TLV is present and is + encoded as the last TLV of the SRH. In other words, the last + 36 octets of the SRH represent the HMAC information. See + Section 3.1.5 for details on the HMAC TLV. + + o RESERVED: SHOULD be unset on transmission and MUST be ignored on + receipt. + + o Segment List[n]: 128 bit IPv6 addresses representing the nth + segment in the Segment List. The Segment List is encoded starting + from the last segment of the path. I.e., the first element of the + segment list (Segment List [0]) contains the last segment of the + path while the last segment of the Segment List (Segment List[n]) + contains the first segment of the path. The index contained in + "Segments Left" identifies the current active segment. + + o Type Length Value (TLV) are described in Section 3.1. + +3.1. SRH TLVs + + This section defines TLVs of the Segment Routing Header. + + Type Length Value (TLV) contain optional information that may be used + by the node identified in the DA of the packet. It has to be noted + that the information carried in the TLVs is not intended to be used + by the routing layer. Typically, TLVs carry information that is + consumed by other components (e.g.: OAM) than the routing function. + + Each TLV has its own length, format and semantic. The code-point + allocated (by IANA) to each TLV defines both the format and the + + + +Previdi, et al. Expires August 5, 2017 [Page 9] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + semantic of the information carried in the TLV. Multiple TLVs may be + encoded in the same SRH. + + The "Length" field of the TLV is primarily used to skip the TLV while + inspecting the SRH in case the node doesn't support or recognize the + TLV codepoint. The "Length" defines the TLV length in octets and not + including the "Type" and "Length" fields. + + The primary scope of TLVs is to give the receiver of the packet + information related to the source routed path (e.g.: where the packet + entered in the SR domain and where it is expected to exit). + + Additional TLVs may be defined in the future. + +3.1.1. Ingress Node TLV + + The Ingress Node TLV is optional and identifies the node this packet + traversed when entered the SR domain. The Ingress Node TLV has + following format: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Length | RESERVED | Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Ingress Node (16 octets) | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Type: to be assigned by IANA (suggested value 1). + + o Length: 18. + + o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be + ignored on receipt. + + o Flags: 8 bits. No flags are defined in this document. + + o Ingress Node: 128 bits. Defines the node where the packet is + expected to enter the SR domain. In the encapsulation case + described in Section 2.2.1, this information corresponds to the SA + of the encapsulating header. + + + + + +Previdi, et al. Expires August 5, 2017 [Page 10] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + +3.1.2. Egress Node TLV + + The Egress Node TLV is optional and identifies the node this packet + is expected to traverse when exiting the SR domain. The Egress Node + TLV has following format: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Length | RESERVED | Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Egress Node (16 octets) | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Type: to be assigned by IANA (suggested value 2). + + o Length: 18. + + o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be + ignored on receipt. + + o Flags: 8 bits. No flags are defined in this document. + + o Egress Node: 128 bits. Defines the node where the packet is + expected to exit the SR domain. In the encapsulation case + described in Section 2.2.1, this information corresponds to the + last segment of the SRH in the encapsulating header. + +3.1.3. Opaque Container TLV + + The Opaque Container TLV is optional and has the following format: + + + + + + + + + + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 11] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Length | RESERVED | Flags | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Opaque Container (16 octets) | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Type: to be assigned by IANA (suggested value 3). + + o Length: 18. + + o RESERVED: 8 bits. SHOULD be unset on transmission and MUST be + ignored on receipt. + + o Flags: 8 bits. No flags are defined in this document. + + o Opaque Container: 128 bits of opaque data not relevant for the + routing layer. Typically, this information is consumed by a non- + routing component of the node receiving the packet (i.e.: the node + in the DA). + +3.1.4. Padding TLV + + The Padding TLV is optional and with the purpose of aligning the SRH + on a 8 octet boundary. The Padding TLV has the following format: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Length | Padding (variable) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + // Padding (variable) // + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Type: to be assigned by IANA (suggested value 4). + + o Length: 1 to 7 + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 12] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o Padding: from 1 to 7 octets of padding. Padding bits have no + semantic. They SHOULD be set to 0 on transmission and MUST be + ignored on receipt. + + The following applies to the Padding TLV: + + o Padding TLV is optional and MAY only appear once in the SRH. If + present, it MUST have a length between 1 and 7 octets. + + o The Padding TLV is used in order to align the SRH total length on + the 8 octet boundary. + + o When present, the Padding TLV MUST appear as the last TLV before + the HMAC TLV (if HMAC TLV is present). + + o When present, the Padding TLV MUST have a length from 1 to 7 in + order to align the SRH total lenght on a 8-octet boundary. + + o When a router inspecting the SRH encounters the Padding TLV, it + MUST assume that no other TLV (other than the HMAC) follow the + Padding TLV. + +3.1.5. HMAC TLV + + HMAC TLV is optional and contains the HMAC information. The HMAC TLV + has the following format: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Length | RESERVED | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | HMAC Key ID (4 octets) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | // + | HMAC (32 octets) // + | // + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + where: + + o Type: to be assigned by IANA (suggested value 5). + + o Length: 38. + + o RESERVED: 2 octets. SHOULD be unset on transmission and MUST be + ignored on receipt. + + + + +Previdi, et al. Expires August 5, 2017 [Page 13] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o HMAC Key ID: 4 octets. + + o HMAC: 32 octets. + + o HMAC and HMAC Key ID usage is described in Section 5 + + The Following applies to the HMAC TLV: + + o When present, the HMAC TLV MUST be encoded as the last TLV of the + SRH. + + o If the HMAC TLV is present, the SRH H-Flag (Figure 4) MUST be set. + + o When the H-flag is set in the SRH, the router inspecting the SRH + MUST find the HMAC TLV in the last 38 octets of the SRH. + +3.2. SRH and RFC2460 behavior + + The SRH being a new type of the Routing Header, it also has the same + properties: + + SHOULD only appear once in the packet. + + Only the router whose address is in the DA field of the packet + header MUST inspect the SRH. + + Therefore, Segment Routing in IPv6 networks implies that the segment + identifier (i.e.: the IPv6 address of the segment) is moved into the + DA of the packet. + + The DA of the packet changes at each segment termination/completion + and therefore the final DA of the packet MUST be encoded as the last + segment of the path. + +4. SRH Procedures + + In this section we describe the different procedures on the SRH. + +4.1. Source SR Node + + A Source SR Node can be any node originating an IPv6 packet with its + IPv6 and Segment Routing Headers. This include either: + + A host originating an IPv6 packet. + + An SR domain ingress router encapsulating a received IPv6 packet + into an outer IPv6 header followed by an SRH. + + + + +Previdi, et al. Expires August 5, 2017 [Page 14] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + The mechanism through which a Segment List is derived is outside of + the scope of this document. As an example, the Segment List may be + obtained through: + + Local path computation. + + Local configuration. + + Interaction with a centralized controller delivering the path. + + Any other mechanism. + + The following are the steps of the creation of the SRH: + + Next Header and Hdr Ext Len fields are set according to [RFC2460]. + + Routing Type field is set as TBD (to be allocated by IANA, + suggested value 4). + + The Segment List is built with the FIRST segment of the path + encoded in the LAST element of the Segment List. Subsequent + segments are encoded on top of the first segment. Finally, the + LAST segment of the path is encoded in the FIRST element of the + Segment List. In other words, the Segment List is encoded in the + reverse order of the path. + + The final DA of the packet is encoded as the last segment of the + path (encoded in the first element of the Segment List). + + The DA of the packet is set with the value of the first segment + (found in the last element of the segment list). + + The Segments Left field is set to n-1 where n is the number of + elements in the Segment List. + + The First Segment field is set to n-1 where n is the number of + elements in the Segment List. + + The packet is sent out towards the first segment (i.e.: + represented in the packet DA). + + HMAC TLV may be set according to Section 5. + +4.2. Transit Node + + According to [RFC2460], the only node who is allowed to inspect the + Routing Extension Header (and therefore the SRH), is the node + corresponding to the DA of the packet. Any other transit node MUST + + + +Previdi, et al. Expires August 5, 2017 [Page 15] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + NOT inspect the underneath routing header and MUST forward the packet + towards the DA and according to the IPv6 routing table. + + In the example case described in Section 2.2.2, when SR capable nodes + are connected through an overlay spanning multiple third-party + infrastructure, it is safe to send SRH packets (i.e.: packet having a + Segment Routing Header) between each other overlay/SR-capable nodes + as long as the segment list does not include any of the transit + provider nodes. In addition, as a generic security measure, any + service provider will block any packet destined to one of its + internal routers, especially if these packets have an extended header + in it. + +4.3. SR Segment Endpoint Node + + The SR segment endpoint node is the node whose address is in the DA. + The segment endpoint node inspects the SRH and does: + + 1. IF DA = myself (segment endpoint) + 2. IF Segments Left > 0 THEN + decrement Segments Left + update DA with Segment List[Segments Left] + 3. ELSE continue IPv6 processing of the packet + End of processing. + 4. Forward the packet out + +5. Security Considerations + + This section analyzes the security threat model, the security issues + and proposed solutions related to the new Segment Routing Header. + + The Segment Routing Header (SRH) is simply another type of the + routing header as described in RFC 2460 [RFC2460] and is: + + o Added by an SR edge router when entering the segment routing + domain or by the originating host itself. The source host can + even be outside the SR domain; + + o inspected and acted upon when reaching the destination address of + the IP header per RFC 2460 [RFC2460]. + + Per RFC2460 [RFC2460], routers on the path that simply forward an + IPv6 packet (i.e. the IPv6 destination address is none of theirs) + will never inspect and process the content of the SRH. Routers whose + one interface IPv6 address equals the destination address field of + the IPv6 packet MUST parse the SRH and, if supported and if the local + configuration allows it, MUST act accordingly to the SRH content. + + + + +Previdi, et al. Expires August 5, 2017 [Page 16] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + According to RFC2460 [RFC2460], the default behavior of a non SR- + capable router upon receipt of an IPv6 packet with SRH destined to an + address of its, is to: + + o ignore the SRH completely if the Segment Left field is 0 and + proceed to process the next header in the IPv6 packet; + + o discard the IPv6 packet if Segment Left field is greater than 0, + it MAY send a Parameter Problem ICMP message back to the Source + Address. + +5.1. Threat model + +5.1.1. Source routing threats + + Using an SRH is similar to source routing, therefore it has some + well-known security issues as described in RFC4942 [RFC4942] section + 2.1.1 and RFC5095 [RFC5095]: + + o amplification attacks: where a packet could be forged in such a + way to cause looping among a set of SR-enabled routers causing + unnecessary traffic, hence a Denial of Service (DoS) against + bandwidth; + + o reflection attack: where a hacker could force an intermediate node + to appear as the immediate attacker, hence hiding the real + attacker from naive forensic; + + o bypass attack: where an intermediate node could be used as a + stepping stone (for example in a De-Militarized Zone) to attack + another host (for example in the datacenter or any back-end + server). + +5.1.2. Applicability of RFC 5095 to SRH + + First of all, the reader must remember this specific part of section + 1 of RFC5095 [RFC5095], "A side effect is that this also eliminates + benign RH0 use-cases; however, such applications may be facilitated + by future Routing Header specifications.". In short, it is not + forbidden to create new secure type of Routing Header; for example, + RFC 6554 (RPL) [RFC6554] also creates a new Routing Header type for a + specific application confined in a single network. + + In the segment routing architecture described in + [I-D.ietf-spring-segment-routing] there are basically two kinds of + nodes (routers and hosts): + + + + + +Previdi, et al. Expires August 5, 2017 [Page 17] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o nodes within the SR domain, which is within one single + administrative domain, i.e., where all nodes are trusted anyway + else the damage caused by those nodes could be worse than + amplification attacks: traffic interception, man-in-the-middle + attacks, more server DoS by dropping packets, and so on. + + o nodes outside of the SR domain, which is outside of the + administrative segment routing domain hence they cannot be trusted + because there is no physical security for those nodes, i.e., they + can be replaced by hostile nodes or can be coerced in wrong + behaviors. + + The main use case for SR consists of the single administrative domain + where only trusted nodes with SR enabled and configured participate + in SR: this is the same model as in RFC6554 [RFC6554]. All non- + trusted nodes do not participate as either SR processing is not + enabled by default or because they only process SRH from nodes within + their domain. + + Moreover, all SR nodes ignore SRH created by outsiders based on + topology information (received on a peering or internal interface) or + on presence and validity of the HMAC field. Therefore, if + intermediate nodes ONLY act on valid and authorized SRH (such as + within a single administrative domain), then there is no security + threat similar to RH-0. Hence, the RFC 5095 [RFC5095] attacks are + not applicable. + +5.1.3. Service stealing threat + + Segment routing is used for added value services, there is also a + need to prevent non-participating nodes to use those services; this + is called 'service stealing prevention'. + +5.1.4. Topology disclosure + + The SRH may also contains IPv6 addresses of some intermediate SR- + nodes in the path towards the destination, this obviously reveals + those addresses to the potentially hostile attackers if those + attackers are able to intercept packets containing SRH. On the other + hand, if the attacker can do a traceroute whose probes will be + forwarded along the SR path, then there is little learned by + intercepting the SRH itself. + +5.1.5. ICMP Generation + + Per section 4.4 of RFC2460 [RFC2460], when destination nodes (i.e. + where the destination address is one of theirs) receive a Routing + Header with unsupported Routing Type, the required behavior is: + + + +Previdi, et al. Expires August 5, 2017 [Page 18] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o If Segments Left is zero, the node must ignore the Routing header + and proceed to process the next header in the packet. + + o If Segments Left is non-zero, the node must discard the packet and + send an ICMP Parameter Problem, Code 0, message to the packet's + Source Address, pointing to the unrecognized Routing Type. + + This required behavior could be used by an attacker to force the + generation of ICMP message by any node. The attacker could send + packets with SRH (with Segment Left set to 0) destined to a node not + supporting SRH. Per RFC2460 [RFC2460], the destination node could + generate an ICMP message, causing a local CPU utilization and if the + source of the offending packet with SRH was spoofed could lead to a + reflection attack without any amplification. + + It must be noted that this is a required behavior for any unsupported + Routing Type and not limited to SRH packets. So, it is not specific + to SRH and the usual rate limiting for ICMP generation is required + anyway for any IPv6 implementation and has been implemented and + deployed for many years. + +5.2. Security fields in SRH + + This section summarizes the use of specific fields in the SRH. They + are based on a key-hashed message authentication code (HMAC). + + The security-related fields in the SRH are instantiated by the HMAC + TLV, containing: + + o HMAC Key-id, 32 bits wide; + + o HMAC, 256 bits wide (optional, exists only if HMAC Key-id is not + 0). + + The HMAC field is the output of the HMAC computation (per RFC 2104 + [RFC2104]) using a pre-shared key identified by HMAC Key-id and of + the text which consists of the concatenation of: + + o the source IPv6 address; + + o First Segment field; + + o an octet of bit flags; + + o HMAC Key-id; + + o all addresses in the Segment List. + + + + +Previdi, et al. Expires August 5, 2017 [Page 19] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + The purpose of the HMAC TLV is to verify the validity, the integrity + and the authorization of the SRH itself. If an outsider of the SR + domain does not have access to a current pre-shared secret, then it + cannot compute the right HMAC field and the first SR router on the + path processing the SRH and configured to check the validity of the + HMAC will simply reject the packet. + + The HMAC TLV is located at the end of the SRH simply because only the + router on the ingress of the SR domain needs to process it, then all + other SR nodes can ignore it (based on local policy) because they + trust the upstream router. This is to speed up forwarding operations + because SR routers which do not validate the SRH do not need to parse + the SRH until the end. + + The HMAC Key-id field allows for the simultaneous existence of + several hash algorithms (SHA-256, SHA3-256 ... or future ones) as + well as pre-shared keys. The HMAC Key-id field is opaque, i.e., it + has neither syntax nor semantic except as an index to the right + combination of pre-shared key and hash algorithm and except that a + value of 0 means that there is no HMAC field. Having an HMAC Key-id + field allows for pre-shared key roll-over when two pre-shared keys + are supported for a while when all SR nodes converged to a fresher + pre-shared key. It could also allow for interoperation among + different SR domains if allowed by local policy and assuming a + collision-free HMAC Key Id allocation. + + When a specific SRH is linked to a time-related service (such as + turbo-QoS for a 1-hour period) where the DA, Segment ID (SID) are + identical, then it is important to refresh the shared-secret + frequently as the HMAC validity period expires only when the HMAC + Key-id and its associated shared-secret expires. + +5.2.1. Selecting a hash algorithm + + The HMAC field in the HMAC TLV is 256 bit wide. Therefore, the HMAC + MUST be based on a hash function whose output is at least 256 bits. + If the output of the hash function is 256, then this output is simply + inserted in the HMAC field. If the output of the hash function is + larger than 256 bits, then the output value is truncated to 256 by + taking the least-significant 256 bits and inserting them in the HMAC + field. + + SRH implementations can support multiple hash functions but MUST + implement SHA-2 [FIPS180-4] in its SHA-256 variant. + + NOTE: SHA-1 is currently used by some early implementations used for + quick interoperations testing, the 160-bit hash value must then be + + + + +Previdi, et al. Expires August 5, 2017 [Page 20] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + right-hand padded with 96 bits set to 0. The authors understand that + this is not secure but is ok for limited tests. + +5.2.2. Performance impact of HMAC + + While adding an HMAC to each and every SR packet increases the + security, it has a performance impact. Nevertheless, it must be + noted that: + + o the HMAC field is used only when SRH is added by a device (such as + a home set-up box) which is outside of the segment routing domain. + If the SRH is added by a router in the trusted segment routing + domain, then, there is no need for an HMAC field, hence no + performance impact. + + o when present, the HMAC field MUST only be checked and validated by + the first router of the segment routing domain, this router is + named 'validating SR router'. Downstream routers may not inspect + the HMAC field. + + o this validating router can also have a cache of to improve the performance. It is not the + same use case as in IPsec where HMAC value was unique per packet, + in SRH, the HMAC value is unique per flow. + + o Last point, hash functions such as SHA-2 have been optimized for + security and performance and there are multiple implementations + with good performance. + + With the above points in mind, the performance impact of using HMAC + is minimized. + +5.2.3. Pre-shared key management + + The field HMAC Key-id allows for: + + o key roll-over: when there is a need to change the key (the hash + pre-shared secret), then multiple pre-shared keys can be used + simultaneously. The validating routing can have a table of for the currently active and future + keys. + + o different algorithms: by extending the previous table to , the validating router + can also support simultaneously several hash algorithms (see + section Section 5.2.1) + + The pre-shared secret distribution can be done: + + + +Previdi, et al. Expires August 5, 2017 [Page 21] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + o in the configuration of the validating routers, either by static + configuration or any SDN oriented approach; + + o dynamically using a trusted key distribution such as [RFC6407] + + The intent of this document is NOT to define yet-another-key- + distribution-protocol. + +5.3. Deployment Models + +5.3.1. Nodes within the SR domain + + An SR domain is defined as a set of interconnected routers where all + routers at the perimeter are configured to add and act on SRH. Some + routers inside the SR domain can also act on SRH or simply forward + IPv6 packets. + + The routers inside an SR domain can be trusted to generate SRH and to + process SRH received on interfaces that are part of the SR domain. + These nodes MUST drop all SRH packets received on an interface that + is not part of the SR domain and containing an SRH whose HMAC field + cannot be validated by local policies. This includes obviously + packet with an SRH generated by a non-cooperative SR domain. + + If the validation fails, then these packets MUST be dropped, ICMP + error messages (parameter problem) SHOULD be generated (but rate + limited) and SHOULD be logged. + +5.3.2. Nodes outside of the SR domain + + Nodes outside of the SR domain cannot be trusted for physical + security; hence, they need to request by some trusted means (outside + of the scope of this document) a complete SRH for each new connection + (i.e. new destination address). The received SRH MUST include an + HMAC TLV which is computed correctly (see Section 5.2). + + When an outside node sends a packet with an SRH and towards an SR + domain ingress node, the packet MUST contain the HMAC TLV (with a + Key-id and HMAC fields) and the the destination address MUST be an + address of an SR domain ingress node . + + The ingress SR router, i.e., the router with an interface address + equals to the destination address, MUST verify the HMAC TLV. + + If the validation is successful, then the packet is simply forwarded + as usual for an SR packet. As long as the packet travels within the + SR domain, no further HMAC check needs to be done. Subsequent + + + + +Previdi, et al. Expires August 5, 2017 [Page 22] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + routers in the SR domain MAY verify the HMAC TLV when they process + the SRH (i.e. when they are the destination). + + If the validation fails, then this packet MUST be dropped, an ICMP + error message (parameter problem) SHOULD be generated (but rate + limited) and SHOULD be logged. + +5.3.3. SR path exposure + + As the intermediate SR nodes addresses appears in the SRH, if this + SRH is visible to an outsider then he/she could reuse this knowledge + to launch an attack on the intermediate SR nodes or get some insider + knowledge on the topology. This is especially applicable when the + path between the source node and the first SR domain ingress router + is on the public Internet. + + The first remark is to state that 'security by obscurity' is never + enough; in other words, the security policy of the SR domain MUST + assume that the internal topology and addressing is known by the + attacker. A simple traceroute will also give the same information + (with even more information as all intermediate nodes between SID + will also be exposed). IPsec Encapsulating Security Payload + [RFC4303] cannot be use to protect the SRH as per RFC4303 the ESP + header must appear after any routing header (including SRH). + + To prevent a user to leverage the gained knowledge by intercepting + SRH, it it recommended to apply an infrastructure Access Control List + (iACL) at the edge of the SR domain. This iACL will drop all packets + from outside the SR-domain whose destination is any address of any + router inside the domain. This security policy should be tuned for + local operations. + +5.3.4. Impact of BCP-38 + + BCP-38 [RFC2827], also known as "Network Ingress Filtering", checks + whether the source address of packets received on an interface is + valid for this interface. The use of loose source routing such as + SRH forces packets to follow a path which differs from the expected + routing. Therefore, if BCP-38 was implemented in all routers inside + the SR domain, then SR packets could be received by an interface + which is not expected one and the packets could be dropped. + + As an SR domain is usually a subset of one administrative domain, and + as BCP-38 is only deployed at the ingress routers of this + administrative domain and as packets arriving at those ingress + routers have been normally forwarded using the normal routing + information, then there is no reason why this ingress router should + + + + +Previdi, et al. Expires August 5, 2017 [Page 23] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + drop the SRH packet based on BCP-38. Routers inside the domain + commonly do not apply BCP-38; so, this is not a problem. + +6. IANA Considerations + + This document makes the following registrations in the Internet + Protocol Version 6 (IPv6) Parameters "Routing Type" registry + maintained by IANA: + + Suggested Description Reference + Value + ---------------------------------------------------------- + 4 Segment Routing Header (SRH) This document + + In addition, this document request IANA to create and maintain a new + Registry: "Segment Routing Header Type-Value Objects". The following + code-points are requested from the registry: + + Registry: Segment Routing Header Type-Value Objects + + Suggested Description Reference + Value + ----------------------------------------------------- + 1 Ingress Node TLV This document + 2 Egress Node TLV This document + 3 Opaque Container TLV This document + 4 Padding TLV This document + 5 HMAC TLV This document + +7. Manageability Considerations + + TBD + +8. Contributors + + Dave Barach, John Leddy, John Brzozowski, Pierre Francois, Nagendra + Kumar, Mark Townsley, Christian Martin, Roberta Maglione, James + Connolly, Aloys Augustin contributed to the content of this document. + +9. Acknowledgements + + The authors would like to thank Ole Troan, Bob Hinden, Fred Baker, + Brian Carpenter, Alexandru Petrescu and Punit Kumar Jaiswal for their + comments to this document. + + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 24] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + +10. References + +10.1. Normative References + + [FIPS180-4] + National Institute of Standards and Technology, "FIPS + 180-4 Secure Hash Standard (SHS)", March 2012, + . + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . + + [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 + (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, + December 1998, . + + [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", + RFC 4303, DOI 10.17487/RFC4303, December 2005, + . + + [RFC5095] Abley, J., Savola, P., and G. Neville-Neil, "Deprecation + of Type 0 Routing Headers in IPv6", RFC 5095, + DOI 10.17487/RFC5095, December 2007, + . + + [RFC6407] Weis, B., Rowles, S., and T. Hardjono, "The Group Domain + of Interpretation", RFC 6407, DOI 10.17487/RFC6407, + October 2011, . + +10.2. Informative References + + [I-D.ietf-isis-segment-routing-extensions] + Previdi, S., Filsfils, C., Bashandy, A., Gredler, H., + Litkowski, S., Decraene, B., and j. jefftant@gmail.com, + "IS-IS Extensions for Segment Routing", draft-ietf-isis- + segment-routing-extensions-09 (work in progress), October + 2016. + + [I-D.ietf-ospf-ospfv3-segment-routing-extensions] + Psenak, P., Previdi, S., Filsfils, C., Gredler, H., + Shakir, R., Henderickx, W., and J. Tantsura, "OSPFv3 + Extensions for Segment Routing", draft-ietf-ospf-ospfv3- + segment-routing-extensions-07 (work in progress), October + 2016. + + + + +Previdi, et al. Expires August 5, 2017 [Page 25] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + [I-D.ietf-spring-ipv6-use-cases] + Brzozowski, J., Leddy, J., Townsley, W., Filsfils, C., and + R. Maglione, "IPv6 SPRING Use Cases", draft-ietf-spring- + ipv6-use-cases-08 (work in progress), January 2017. + + [I-D.ietf-spring-resiliency-use-cases] + Filsfils, C., Previdi, S., Decraene, B., and R. Shakir, + "Resiliency use cases in SPRING networks", draft-ietf- + spring-resiliency-use-cases-08 (work in progress), October + 2016. + + [I-D.ietf-spring-segment-routing] + Filsfils, C., Previdi, S., Decraene, B., Litkowski, S., + and R. Shakir, "Segment Routing Architecture", draft-ietf- + spring-segment-routing-10 (work in progress), November + 2016. + + [I-D.ietf-spring-segment-routing-mpls] + Filsfils, C., Previdi, S., Bashandy, A., Decraene, B., + Litkowski, S., Horneffer, M., Shakir, R., + jefftant@gmail.com, j., and E. Crabbe, "Segment Routing + with MPLS data plane", draft-ietf-spring-segment-routing- + mpls-06 (work in progress), January 2017. + + [RFC1940] Estrin, D., Li, T., Rekhter, Y., Varadhan, K., and D. + Zappala, "Source Demand Routing: Packet Format and + Forwarding Specification (Version 1)", RFC 1940, + DOI 10.17487/RFC1940, May 1996, + . + + [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- + Hashing for Message Authentication", RFC 2104, + DOI 10.17487/RFC2104, February 1997, + . + + [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: + Defeating Denial of Service Attacks which employ IP Source + Address Spoofing", BCP 38, RFC 2827, DOI 10.17487/RFC2827, + May 2000, . + + [RFC4942] Davies, E., Krishnan, S., and P. Savola, "IPv6 Transition/ + Co-existence Security Considerations", RFC 4942, + DOI 10.17487/RFC4942, September 2007, + . + + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 26] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + [RFC6554] Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6 + Routing Header for Source Routes with the Routing Protocol + for Low-Power and Lossy Networks (RPL)", RFC 6554, + DOI 10.17487/RFC6554, March 2012, + . + + [RFC7855] Previdi, S., Ed., Filsfils, C., Ed., Decraene, B., + Litkowski, S., Horneffer, M., and R. Shakir, "Source + Packet Routing in Networking (SPRING) Problem Statement + and Requirements", RFC 7855, DOI 10.17487/RFC7855, May + 2016, . + +Authors' Addresses + + Stefano Previdi (editor) + Cisco Systems, Inc. + Via Del Serafico, 200 + Rome 00142 + Italy + + Email: sprevidi@cisco.com + + + Clarence Filsfils + Cisco Systems, Inc. + Brussels + BE + + Email: cfilsfil@cisco.com + + + Brian Field + Comcast + 4100 East Dry Creek Road + Centennial, CO 80122 + US + + Email: Brian_Field@cable.comcast.com + + + Ida Leung + Rogers Communications + 8200 Dixie Road + Brampton, ON L6T 0C1 + CA + + Email: Ida.Leung@rci.rogers.com + + + + +Previdi, et al. Expires August 5, 2017 [Page 27] + +Internet-Draft IPv6 Segment Routing Header (SRH) February 2017 + + + Jen Linkova + Google + 1600 Amphitheatre Parkway + Mountain View, CA 94043 + US + + Email: furry@google.com + + + Ebben Aries + Facebook + US + + Email: exa@fb.com + + + Tomoya Kosugi + NTT + 3-9-11, Midori-Cho Musashino-Shi, + Tokyo 180-8585 + JP + + Email: kosugi.tomoya@lab.ntt.co.jp + + + Eric Vyncke + Cisco Systems, Inc. + De Kleetlaann 6A + Diegem 1831 + Belgium + + Email: evyncke@cisco.com + + + David Lebrun + Universite Catholique de Louvain + Place Ste Barbe, 2 + Louvain-la-Neuve, 1348 + Belgium + + Email: david.lebrun@uclouvain.be + + + + + + + + + + +Previdi, et al. Expires August 5, 2017 [Page 28] \ No newline at end of file diff --git a/src/vnet/srv6/sr.api b/src/vnet/srv6/sr.api new file mode 100644 index 00000000..9e900741 --- /dev/null +++ b/src/vnet/srv6/sr.api @@ -0,0 +1,168 @@ +/* + * Copyright (c) 2015-2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** \brief IPv6 SR LocalSID add/del request + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param is_del Boolean of whether its a delete instruction + @param localsid_addr IPv6 address of the localsid + @param end_psp Boolean of whether decapsulation is allowed in this function + @param behavior Type of behavior (function) for this localsid + @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. + @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. + @param fib_table FIB table in which we should install the localsid entry + @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. +*/ +autoreply define sr_localsid_add_del +{ + u32 client_index; + u32 context; + u8 is_del; + u8 localsid_addr[16]; + u8 end_psp; + u8 behavior; + u32 sw_if_index; + u32 vlan_index; + u32 fib_table; + u8 nh_addr[16]; +}; + +/** \brief IPv6 SR policy add + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param bsid is the bindingSID of the SR Policy + @param weight is the weight of the sid list. optional. + @param is_encap is the behavior of the SR policy. (0.SRH insert // 1.Encapsulation) + @param type is the type of the SR policy. (0.Default // 1.Spray) + @param fib_table is the VRF where to install the FIB entry for the BSID + @param segments is a vector of IPv6 address composing the segment list +*/ +autoreply define sr_policy_add +{ + u32 client_index; + u32 context; + u8 bsid_addr[16]; + u32 weight; + u8 is_encap; + u8 type; + u32 fib_table; + u8 n_segments; + u8 segments[0]; +}; + +/** \brief IPv6 SR policy modification + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param bsid is the bindingSID of the SR Policy + @param sr_policy_index is the index of the SR policy + @param fib_table is the VRF where to install the FIB entry for the BSID + @param operation is the operation to perform (among the top ones) + @param segments is a vector of IPv6 address composing the segment list + @param sl_index is the index of the Segment List to modify/delete + @param weight is the weight of the sid list. optional. + @param is_encap Mode. Encapsulation or SRH insertion. +*/ +autoreply define sr_policy_mod +{ + u32 client_index; + u32 context; + u8 bsid_addr[16]; + u32 sr_policy_index; + u32 fib_table; + u8 operation; + u32 sl_index; + u32 weight; + u8 n_segments; + u8 segments[0]; +}; + +/** \brief IPv6 SR policy deletion + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param bsid is the bindingSID of the SR Policy + @param index is the index of the SR policy +*/ +autoreply define sr_policy_del +{ + u32 client_index; + u32 context; + u8 bsid_addr[16]; + u32 sr_policy_index; +}; + +/** \brief IPv6 SR steering add/del + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param is_del + @param bsid is the bindingSID of the SR Policy (alt to sr_policy_index) + @param sr_policy is the index of the SR Policy (alt to bsid) + @param table_id is the VRF where to install the FIB entry for the BSID + @param prefix is the IPv4/v6 address for L3 traffic type + @param mask_width is the mask for L3 traffic type + @param sw_if_index is the incoming interface for L2 traffic + @param traffic_type describes the type of traffic +*/ +autoreply define sr_steering_add_del +{ + u32 client_index; + u32 context; + u8 is_del; + u8 bsid_addr[16]; + u32 sr_policy_index; + u32 table_id; + u8 prefix_addr[16]; + u32 mask_width; + u32 sw_if_index; + u8 traffic_type; +}; + +/** \brief Dump the list of SR LocalSIDs + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request +*/ +/**define sr_localsids_dump +{ + u32 client_index; + u32 context; +};*/ + +/** \brief Details about a single SR LocalSID + @param context - returned sender context, to match reply w/ request + @param localsid_addr IPv6 address of the localsid + @param behavior Type of behavior (function) for this localsid + @param end_psp Boolean of whether decapsulation is allowed in this function + @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. + @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. + @param fib_table FIB table in which we should install the localsid entry + @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. +*/ +/**manual_endian define sr_localsid_details +{ + u32 context; + u8 localsid_addr[16]; + u8 behavior; + u8 end_psp; + u32 sw_if_index; + u32 vlan_index; + u32 fib_table; + u8 nh_addr[16]; +};*/ + +/* + * fd.io coding-style-patch-verification: ON + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/vnet/srv6/sr.c b/src/vnet/srv6/sr.c new file mode 100755 index 00000000..eb4f09e7 --- /dev/null +++ b/src/vnet/srv6/sr.c @@ -0,0 +1,57 @@ +/* + * sr.c: ipv6 segment routing + * + * Copyright (c) 2013 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Segment Routing initialization + * + */ + +#include +#include +#include +#include +#include +#include + +ip6_sr_main_t sr_main; + +/** + * @brief no-op lock function. + * The lifetime of the SR entry is managed by the control plane + */ +void +sr_dpo_lock (dpo_id_t * dpo) +{ +} + +/** + * @brief no-op unlock function. + * The lifetime of the SR entry is managed by the control plane + */ +void +sr_dpo_unlock (dpo_id_t * dpo) +{ +} + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srv6/sr.h b/src/vnet/srv6/sr.h new file mode 100755 index 00000000..2014a23e --- /dev/null +++ b/src/vnet/srv6/sr.h @@ -0,0 +1,325 @@ +/* + * Copyright (c) 2015 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Segment Routing data structures definitions + * + */ + +#ifndef included_vnet_srv6_h +#define included_vnet_srv6_h + +#include +#include +#include +#include + +#include +#include + +#define IPv6_DEFAULT_HEADER_LENGTH 40 +#define IPv6_DEFAULT_HOP_LIMIT 64 +#define IPv6_DEFAULT_MAX_MASK_WIDTH 128 + +#define SR_BEHAVIOR_END 1 +#define SR_BEHAVIOR_X 2 +#define SR_BEHAVIOR_D_FIRST 3 /* Unused. Separator in between regular and D */ +#define SR_BEHAVIOR_DX2 4 +#define SR_BEHAVIOR_DX6 5 +#define SR_BEHAVIOR_DX4 6 +#define SR_BEHAVIOR_DT6 7 +#define SR_BEHAVIOR_DT4 8 +#define SR_BEHAVIOR_LAST 9 /* Must always be the last one */ + +#define SR_STEER_L2 2 +#define SR_STEER_IPV4 4 +#define SR_STEER_IPV6 6 + +#define SR_FUNCTION_SIZE 4 +#define SR_ARGUMENT_SIZE 4 + +#define SR_SEGMENT_LIST_WEIGHT_DEFAULT 1 + +/** + * @brief SR Segment List (SID list) + */ +typedef struct +{ + ip6_address_t *segments; /**< SIDs (key) */ + + u32 weight; /**< SID list weight (wECMP / UCMP) */ + + u8 *rewrite; /**< Precomputed rewrite header */ + u8 *rewrite_bsid; /**< Precomputed rewrite header for bindingSID */ + + dpo_id_t bsid_dpo; /**< DPO for Encaps/Insert for BSID */ + dpo_id_t ip6_dpo; /**< DPO for Encaps/Insert IPv6 */ + dpo_id_t ip4_dpo; /**< DPO for Encaps IPv6 */ +} ip6_sr_sl_t; + +/* SR policy types */ +#define SR_POLICY_TYPE_DEFAULT 0 +#define SR_POLICY_TYPE_SPRAY 1 +/** + * @brief SR Policy + */ +typedef struct +{ + u32 *segments_lists; /**< SID lists indexes (vector) */ + + ip6_address_t bsid; /**< BindingSID (key) */ + + u8 type; /**< Type (default is 0) */ + /* SR Policy specific DPO */ + /* IF Type = DEFAULT Then Load Balancer DPO among SID lists */ + /* IF Type = SPRAY then Spray DPO with all SID lists */ + dpo_id_t bsid_dpo; /**< SR Policy specific DPO - BSID */ + dpo_id_t ip4_dpo; /**< SR Policy specific DPO - IPv6 */ + dpo_id_t ip6_dpo; /**< SR Policy specific DPO - IPv4 */ + + u32 fib_table; /**< FIB table */ + + u8 is_encap; /**< Mode (0 is SRH insert, 1 Encaps) */ +} ip6_sr_policy_t; + +/** + * @brief SR LocalSID + */ +typedef struct +{ + ip6_address_t localsid; /**< LocalSID IPv6 address */ + + char end_psp; /**< Combined with End.PSP? */ + + u16 behavior; /**< Behavior associated to this localsid */ + + union + { + u32 sw_if_index; /**< xconnect only */ + u32 vrf_index; /**< vrf only */ + }; + + u32 fib_table; /**< FIB table where localsid is registered */ + + u32 vlan_index; /**< VLAN tag (not an index) */ + + ip46_address_t next_hop; /**< Next_hop for xconnect usage only */ + + u32 nh_adj; /**< Next_adj for xconnect usage only */ + + void *plugin_mem; /**< Memory to be used by the plugin callback functions */ +} ip6_sr_localsid_t; + +typedef int (sr_plugin_callback_t) (ip6_sr_localsid_t * localsid); + +/** + * @brief SR LocalSID behavior registration + */ +typedef struct +{ + u16 sr_localsid_function_number; /**< SR LocalSID plugin function (>SR_BEHAVIOR_LAST) */ + + u8 *function_name; /**< Function name. (key). */ + + u8 *keyword_str; /**< Behavior keyword (i.e. End.X) */ + + u8 *def_str; /**< Behavior definition (i.e. Endpoint with cross-connect) */ + + u8 *params_str; /**< Behavior parameters (i.e. ) */ + + dpo_type_t dpo; /**< DPO type registration */ + + format_function_t *ls_format; /**< LocalSID format function */ + + unformat_function_t *ls_unformat; /**< LocalSID unformat function */ + + sr_plugin_callback_t *creation; /**< Function within plugin that will be called after localsid creation*/ + + sr_plugin_callback_t *removal; /**< Function within plugin that will be called before localsid removal */ +} sr_localsid_fn_registration_t; + +/** + * @brief Steering db key + * + * L3 is IPv4/IPv6 + mask + * L2 is sf_if_index + vlan + */ +typedef struct +{ + union + { + struct + { + ip46_address_t prefix; /**< IP address of the prefix */ + u32 mask_width; /**< Mask width of the prefix */ + u32 fib_table; /**< VRF of the prefix */ + } l3; + struct + { + u32 sw_if_index; /**< Incoming software interface */ + } l2; + }; + u8 traffic_type; /**< Traffic type (IPv4, IPv6, L2) */ + u8 padding[3]; +} sr_steering_key_t; + +typedef struct +{ + sr_steering_key_t classify; /**< Traffic classification */ + u32 sr_policy; /**< SR Policy index */ +} ip6_sr_steering_policy_t; + +/** + * @brief Segment Routing main datastructure + */ +typedef struct +{ + /* L2-input -> SR rewrite next index */ + u32 l2_sr_policy_rewrite_index; + + /* SR SID lists */ + ip6_sr_sl_t *sid_lists; + + /* SRv6 policies */ + ip6_sr_policy_t *sr_policies; + + /* Hash table mapping BindingSID to SRv6 policy */ + mhash_t sr_policies_index_hash; + + /* Pool of SR localsid instances */ + ip6_sr_localsid_t *localsids; + + /* Hash table mapping LOC:FUNC to SR LocalSID instance */ + mhash_t sr_localsids_index_hash; + + /* Pool of SR steer policies instances */ + ip6_sr_steering_policy_t *steer_policies; + + /* Hash table mapping steering rules to SR steer instance */ + mhash_t sr_steer_policies_hash; + + /* L2 steering ifaces - sr_policies */ + u32 *sw_iface_sr_policies; + + /* Spray DPO */ + dpo_type_t sr_pr_spray_dpo_type; + + /* Plugin functions */ + sr_localsid_fn_registration_t *plugin_functions; + + /* Find plugin function by name */ + uword *plugin_functions_by_key; + + /* Counters */ + vlib_combined_counter_main_t sr_ls_valid_counters; + vlib_combined_counter_main_t sr_ls_invalid_counters; + + /* SR Policies FIBs */ + u32 fib_table_ip6; + u32 fib_table_ip4; + + /* convenience */ + vlib_main_t *vlib_main; + vnet_main_t *vnet_main; +} ip6_sr_main_t; + +extern ip6_sr_main_t sr_main; + +extern vlib_node_registration_t sr_policy_rewrite_encaps_node; +extern vlib_node_registration_t sr_policy_rewrite_insert_node; +extern vlib_node_registration_t sr_localsid_node; +extern vlib_node_registration_t sr_localsid_d_node; + +extern void sr_dpo_lock (dpo_id_t * dpo); +extern void sr_dpo_unlock (dpo_id_t * dpo); + +extern int +sr_localsid_register_function (vlib_main_t * vm, u8 * fn_name, + u8 * keyword_str, u8 * def_str, + u8 * params_str, dpo_type_t * dpo, + format_function_t * ls_format, + unformat_function_t * ls_unformat, + sr_plugin_callback_t * creation_fn, + sr_plugin_callback_t * removal_fn); + +extern int +sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, + u32 weight, u8 behavior, u32 fib_table, u8 is_encap); +extern int +sr_policy_mod (ip6_address_t * bsid, u32 index, u32 fib_table, + u8 operation, ip6_address_t * segments, u32 sl_index, + u32 weight); +extern int sr_policy_del (ip6_address_t * bsid, u32 index); + +extern int +sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, + char end_psp, u8 behavior, u32 sw_if_index, + u32 vlan_index, u32 fib_table, ip46_address_t * nh_addr, + void *ls_plugin_mem); + +extern int +sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, + u32 table_id, ip46_address_t * prefix, u32 mask_width, + u32 sw_if_index, u8 traffic_type); + +/** + * @brief SR rewrite string computation for SRH insertion (inline) + * + * @param sl is a vector of IPv6 addresses composing the Segment List + * + * @return precomputed rewrite string for SRH insertion + */ +static inline u8 * +ip6_sr_compute_rewrite_string_insert (ip6_address_t * sl) +{ + ip6_sr_header_t *srh; + ip6_address_t *addrp, *this_address; + u32 header_length = 0; + u8 *rs = NULL; + + header_length = 0; + header_length += sizeof (ip6_sr_header_t); + header_length += (vec_len (sl) + 1) * sizeof (ip6_address_t); + + vec_validate (rs, header_length - 1); + + srh = (ip6_sr_header_t *) rs; + srh->type = ROUTING_HEADER_TYPE_SR; + srh->segments_left = vec_len (sl); + srh->first_segment = vec_len (sl); + srh->length = ((sizeof (ip6_sr_header_t) + + ((vec_len (sl) + 1) * sizeof (ip6_address_t))) / 8) - 1; + srh->flags = 0x00; + srh->reserved = 0x0000; + addrp = srh->segments + vec_len (sl); + vec_foreach (this_address, sl) + { + clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); + addrp--; + } + return rs; +} + + +#endif /* included_vnet_sr_h */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/vnet/srv6/sr_api.c b/src/vnet/srv6/sr_api.c new file mode 100644 index 00000000..925b50a1 --- /dev/null +++ b/src/vnet/srv6/sr_api.c @@ -0,0 +1,244 @@ +/* + *------------------------------------------------------------------ + * sr_api.c - ipv6 segment routing api + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *------------------------------------------------------------------ + */ + +#include +#include +#include + +#include +#include +#include + +#include + +#define vl_typedefs /* define message structures */ +#include +#undef vl_typedefs + +#define vl_endianfun /* define message structures */ +#include +#undef vl_endianfun + +/* instantiate all the print functions we know about */ +#define vl_print(handle, ...) vlib_cli_output (handle, __VA_ARGS__) +#define vl_printfun +#include +#undef vl_printfun + +#include + +#define foreach_vpe_api_msg \ +_(SR_LOCALSID_ADD_DEL, sr_localsid_add_del) \ +_(SR_POLICY_DEL, sr_policy_del) \ +_(SR_STEERING_ADD_DEL, sr_steering_add_del) +//_(SR_LOCALSIDS, sr_localsids_dump) +//_(SR_LOCALSID_BEHAVIORS, sr_localsid_behaviors_dump) + +static void vl_api_sr_localsid_add_del_t_handler + (vl_api_sr_localsid_add_del_t * mp) +{ + vl_api_sr_localsid_add_del_reply_t *rmp; + int rv = 0; +/* + * int sr_cli_localsid (char is_del, ip6_address_t *localsid_addr, + * char end_psp, u8 behavior, u32 sw_if_index, u32 vlan_index, u32 fib_table, + * ip46_address_t *nh_addr, void *ls_plugin_mem) + */ + rv = sr_cli_localsid (mp->is_del, + (ip6_address_t *) & mp->localsid_addr, + mp->end_psp, + mp->behavior, + ntohl (mp->sw_if_index), + ntohl (mp->vlan_index), + ntohl (mp->fib_table), + (ip46_address_t *) & mp->nh_addr, NULL); + + REPLY_MACRO (VL_API_SR_LOCALSID_ADD_DEL_REPLY); +} + +static void +vl_api_sr_policy_add_t_handler (vl_api_sr_policy_add_t * mp) +{ + vl_api_sr_policy_add_reply_t *rmp; + ip6_address_t *segments = 0, *seg; + ip6_address_t *this_address = (ip6_address_t *) mp->segments; + + int i; + for (i = 0; i < mp->n_segments; i++) + { + vec_add2 (segments, seg, 1); + clib_memcpy (seg->as_u8, this_address->as_u8, sizeof (*this_address)); + this_address++; + } + +/* + * sr_policy_add (ip6_address_t *bsid, ip6_address_t *segments, + * u32 weight, u8 behavior, u32 fib_table, u8 is_encap) + */ + int rv = 0; + rv = sr_policy_add ((ip6_address_t *) & mp->bsid_addr, + segments, + ntohl (mp->weight), + mp->type, ntohl (mp->fib_table), mp->is_encap); + + REPLY_MACRO (VL_API_SR_POLICY_ADD_REPLY); +} + +static void +vl_api_sr_policy_mod_t_handler (vl_api_sr_policy_mod_t * mp) +{ + vl_api_sr_policy_mod_reply_t *rmp; + + ip6_address_t *segments = 0, *seg; + ip6_address_t *this_address = (ip6_address_t *) mp->segments; + + int i; + for (i = 0; i < mp->n_segments; i++) + { + vec_add2 (segments, seg, 1); + clib_memcpy (seg->as_u8, this_address->as_u8, sizeof (*this_address)); + this_address++; + } + + int rv = 0; +/* + * int + * sr_policy_mod(ip6_address_t *bsid, u32 index, u32 fib_table, + * u8 operation, ip6_address_t *segments, u32 sl_index, + * u32 weight, u8 is_encap) + */ + rv = sr_policy_mod ((ip6_address_t *) & mp->bsid_addr, + ntohl (mp->sr_policy_index), + ntohl (mp->fib_table), + mp->operation, + segments, ntohl (mp->sl_index), ntohl (mp->weight)); + + REPLY_MACRO (VL_API_SR_POLICY_MOD_REPLY); +} + +static void +vl_api_sr_policy_del_t_handler (vl_api_sr_policy_del_t * mp) +{ + vl_api_sr_policy_del_reply_t *rmp; + int rv = 0; +/* + * int + * sr_policy_del (ip6_address_t *bsid, u32 index) + */ + rv = sr_policy_del ((ip6_address_t *) & mp->bsid_addr, + ntohl (mp->sr_policy_index)); + + REPLY_MACRO (VL_API_SR_POLICY_DEL_REPLY); +} + +static void vl_api_sr_steering_add_del_t_handler + (vl_api_sr_steering_add_del_t * mp) +{ + vl_api_sr_steering_add_del_reply_t *rmp; + int rv = 0; +/* + * int + * sr_steering_policy(int is_del, ip6_address_t *bsid, u32 sr_policy_index, + * u32 table_id, ip46_address_t *prefix, u32 mask_width, u32 sw_if_index, + * u8 traffic_type) + */ + rv = sr_steering_policy (mp->is_del, + (ip6_address_t *) & mp->bsid_addr, + ntohl (mp->sr_policy_index), + ntohl (mp->table_id), + (ip46_address_t *) & mp->prefix_addr, + ntohl (mp->mask_width), + ntohl (mp->sw_if_index), mp->traffic_type); + + REPLY_MACRO (VL_API_SR_STEERING_ADD_DEL_REPLY); +} + +/* + * sr_api_hookup + * Add vpe's API message handlers to the table. + * vlib has alread mapped shared memory and + * added the client registration handlers. + * See .../vlib-api/vlibmemory/memclnt_vlib.c:memclnt_process() + */ +#define vl_msg_name_crc_list +#include +#undef vl_msg_name_crc_list + +static void +setup_message_id_table (api_main_t * am) +{ +#define _(id,n,crc) vl_msg_api_add_msg_name_crc (am, #n "_" #crc, id); + foreach_vl_msg_name_crc_sr; +#undef _ +} + +static clib_error_t * +sr_api_hookup (vlib_main_t * vm) +{ + api_main_t *am = &api_main; + +#define _(N,n) \ + vl_msg_api_set_handlers(VL_API_##N, #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_vpe_api_msg; +#undef _ + + /* + * Manually register the sr policy add msg, so we trace + * enough bytes to capture a typical segment list + */ + vl_msg_api_set_handlers (VL_API_SR_POLICY_ADD, + "sr_policy_add", + vl_api_sr_policy_add_t_handler, + vl_noop_handler, + vl_api_sr_policy_add_t_endian, + vl_api_sr_policy_add_t_print, 256, 1); + + /* + * Manually register the sr policy mod msg, so we trace + * enough bytes to capture a typical segment list + */ + vl_msg_api_set_handlers (VL_API_SR_POLICY_MOD, + "sr_policy_mod", + vl_api_sr_policy_mod_t_handler, + vl_noop_handler, + vl_api_sr_policy_mod_t_endian, + vl_api_sr_policy_mod_t_print, 256, 1); + + /* + * Set up the (msg_name, crc, message-id) table + */ + setup_message_id_table (am); + + return 0; +} + +VLIB_API_INIT_FUNCTION (sr_api_hookup); + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/vnet/srv6/sr_doc.md b/src/vnet/srv6/sr_doc.md new file mode 100644 index 00000000..5cdfc906 --- /dev/null +++ b/src/vnet/srv6/sr_doc.md @@ -0,0 +1,55 @@ +# SRv6: Segment Routing for IPv6 {#srv6_doc} + +This is a memo intended to contain documentation of the VPP SRv6 implementation. +Everything that is not directly obvious should come here. +For any feedback on content that should be explained please mailto:pcamaril@cisco.com + +## Segment Routing + +Segment routing is a network technology focused on addressing the limitations of existing IP and Multiprotocol Label Switching (MPLS) networks in terms of simplicity, scale, and ease of operation. It is a foundation for application engineered routing as it prepares the networks for new business models where applications can control the network behavior. + +Segment routing seeks the right balance between distributed intelligence and centralized optimization and programming. It was built for the software-defined networking (SDN) era. + +Segment routing enhances packet forwarding behavior by enabling a network to transport unicast packets through a specific forwarding path, different from the normal path that a packet usually takes (IGP shortest path or BGP best path). This capability benefits many use cases, and one can build those specific paths based on application requirements. + +Segment routing uses the source routing paradigm. A node, usually a router but also a switch, a trusted server, or a virtual forwarder running on a hypervisor, steers a packet through an ordered list of instructions, called segments. A segment can represent any instruction, topological or service-based. A segment can have a local semantic to a segment-routing node or global within a segment-routing network. Segment routing allows an operator to enforce a flow through any topological path and service chain while maintaining per-flow state only at the ingress node to the segment-routing network. Segment routing also supports equal-cost multipath (ECMP) by design. + +Segment routing can operate with either an MPLS or an IPv6 data plane. All the currently available MPLS services, such as Layer 3 VPN (L3VPN), L2VPN (Virtual Private Wire Service [VPWS], Virtual Private LAN Services [VPLS], Ethernet VPN [E-VPN], and Provider Backbone Bridging Ethernet VPN [PBB-EVPN]), can run on top of a segment-routing transport network. + +**The implementation of Segment Routing in VPP covers both the IPv6 data plane (SRv6) as well as the MPLS data plane (SR-MPLS). This page contains the SRv6 documentation.** + +## Segment Routing terminology + +* Segment Routing Header (SRH): IPv6 routing extension header of type 'Segment Routing'. (draft-ietf-6man-segment-routing-header-05) +* SegmentID (SID): is an IPv6 address. +* Segment List (SL) (SID List): is the sequence of SIDs that the packet will traverse. +* SR Policy: defines the SRH that will be applied to a packet. A packet steered into an SR policy may either receive the SRH by IPv6 header encapsulation (as recommended in draft-ietf-6man-rfc2460bis) or it could be inserted within an existing IPv6 header. An SR policy is uniquely identified by its Binding SID and associated with a weighted set of Segment Lists. In case several SID lists are defined, traffic steered into the policy is unevenly load-balanced among them according to their respective weights. +* Local SID: is a SID associated with a processing function on the local node, which may go from advancing to the next SID in the SRH, to complex user-defined behaviors. When a FIB lookup, either in the main FIB or in a specific VRF, returns a match on a local SID, the associated function is performed. +* BindingSID: a BindingSID is a SID (only one) associated one-one with an SR Policy. If a packet arrives with an IPv6 DA corresponding to a BindingSID, then the SR policy will be applied to such packet. + +## SRv6 Features in VPP + +The SRv6 Network Programming (*draft-filsfils-spring-srv6-network-programming*) defines the SRv6 architecture. + +VPP supports the following SRv6 LocalSID functions: End, End.X, End.DX6, End.DT6, End.DX4, End.DT4, End.DX2, End.B6, End.B6.Encaps. + +For further information and how to configure each specific function: @subpage srv6_localsid_doc + + +The Segment Routing Policy (*draft-filsfils-spring-segment-routing-policy*) defines SR Policies. + +VPP supports SRv6 Policies with T.Insert and T.Encaps behaviors. + +For further information on how to create SR Policies: @subpage srv6_policy_doc + +For further information on how to steer traffic into SR Policies: @subpage srv6_steering_doc + +## SRv6 LocalSID development framework + +One of the *'key'* concepts about SRv6 is network programmability. This is why an SRv6 LocalSID is associated with an specific function. + +However, the trully way to enable network programmability is allowing any developer **easily** create his own SRv6 LocalSID function. That is the reason why we have added some API calls such that any developer can code his own SRv6 LocalSID behaviors as plugins an add them to the running SRv6 code. + +The principle is that the developer only codes the behavior -the graph node-. However all the FIB handling, SR LocalSID instantiation and so on are done by the VPP SRv6 code. + +For more information please refer to: @subpage srv6_plugin_doc diff --git a/src/vnet/srv6/sr_localsid.c b/src/vnet/srv6/sr_localsid.c new file mode 100755 index 00000000..bdc66386 --- /dev/null +++ b/src/vnet/srv6/sr_localsid.c @@ -0,0 +1,1492 @@ +/* + * sr_localsid.c: ipv6 segment routing Endpoint behaviors + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Processing of packets with a SRH + * + * CLI to define new Segment Routing End processing functions. + * Graph node to support such functions. + * + * Each function associates an SRv6 segment (IPv6 address) with an specific + * Segment Routing function. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/** + * @brief Dynamically added SR localsid DPO type + */ +static dpo_type_t sr_localsid_dpo_type; +static dpo_type_t sr_localsid_d_dpo_type; + +/** + * @brief SR localsid add/del + * + * Function to add or delete SR LocalSIDs. + * + * @param is_del Boolean of whether its a delete instruction + * @param localsid_addr IPv6 address of the localsid + * @param is_decap Boolean of whether decapsulation is allowed in this function + * @param behavior Type of behavior (function) for this localsid + * @param sw_if_index Only for L2/L3 xconnect. OIF. In VRF variant the fib_table. + * @param vlan_index Only for L2 xconnect. Outgoing VLAN tag. + * @param fib_table FIB table in which we should install the localsid entry + * @param nh_addr Next Hop IPv4/IPv6 address. Only for L2/L3 xconnect. + * + * @return 0 on success, error otherwise. + */ +int +sr_cli_localsid (char is_del, ip6_address_t * localsid_addr, + char end_psp, u8 behavior, u32 sw_if_index, u32 vlan_index, + u32 fib_table, ip46_address_t * nh_addr, void *ls_plugin_mem) +{ + ip6_sr_main_t *sm = &sr_main; + uword *p; + int rv; + + ip6_sr_localsid_t *ls = 0; + + dpo_id_t dpo = DPO_INVALID; + + /* Search for the item */ + p = mhash_get (&sm->sr_localsids_index_hash, localsid_addr); + + if (p) + { + if (is_del) + { + /* Retrieve localsid */ + ls = pool_elt_at_index (sm->localsids, p[0]); + /* Delete FIB entry */ + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + .fp_addr = { + .ip6 = *localsid_addr, + } + }; + + fib_table_entry_delete (fib_table_find (FIB_PROTOCOL_IP6, + fib_table), + &pfx, FIB_SOURCE_SR); + + /* In case it is a Xconnect iface remove the (OIF, NHOP) adj */ + if (ls->behavior == SR_BEHAVIOR_X || ls->behavior == SR_BEHAVIOR_DX6 + || ls->behavior == SR_BEHAVIOR_DX4) + adj_unlock (ls->nh_adj); + + if (ls->behavior >= SR_BEHAVIOR_LAST) + { + sr_localsid_fn_registration_t *plugin = 0; + plugin = pool_elt_at_index (sm->plugin_functions, + ls->behavior - SR_BEHAVIOR_LAST); + + /* Callback plugin removal function */ + rv = plugin->removal (ls); + } + + /* Delete localsid registry */ + pool_put (sm->localsids, ls); + mhash_unset (&sm->sr_localsids_index_hash, localsid_addr, NULL); + return 1; + } + else /* create with function already existing; complain */ + return -1; + } + else + /* delete; localsid does not exist; complain */ + if (is_del) + return -2; + + /* Check whether there exists a FIB entry with such address */ + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + }; + + pfx.fp_addr.as_u64[0] = localsid_addr->as_u64[0]; + pfx.fp_addr.as_u64[1] = localsid_addr->as_u64[1]; + + /* Lookup the FIB index associated to the table id provided */ + u32 fib_index = fib_table_find (FIB_PROTOCOL_IP6, fib_table); + if (fib_index == ~0) + return -3; + + /* Lookup the localsid in such FIB table */ + fib_node_index_t fei = fib_table_lookup_exact_match (fib_index, &pfx); + if (FIB_NODE_INDEX_INVALID != fei) + return -4; //There is an entry for such address (the localsid addr) + + /* Create a new localsid registry */ + pool_get (sm->localsids, ls); + memset (ls, 0, sizeof (*ls)); + + clib_memcpy (&ls->localsid, localsid_addr, sizeof (ip6_address_t)); + ls->end_psp = end_psp; + ls->behavior = behavior; + ls->nh_adj = (u32) ~ 0; + ls->fib_table = fib_table; + switch (behavior) + { + case SR_BEHAVIOR_END: + break; + case SR_BEHAVIOR_X: + ls->sw_if_index = sw_if_index; + clib_memcpy (&ls->next_hop.ip6, &nh_addr->ip6, sizeof (ip6_address_t)); + break; + case SR_BEHAVIOR_DX4: + ls->sw_if_index = sw_if_index; + clib_memcpy (&ls->next_hop.ip4, &nh_addr->ip4, sizeof (ip4_address_t)); + break; + case SR_BEHAVIOR_DX6: + ls->sw_if_index = sw_if_index; + clib_memcpy (&ls->next_hop.ip6, &nh_addr->ip6, sizeof (ip6_address_t)); + break; + case SR_BEHAVIOR_DT6: + ls->vrf_index = sw_if_index; + break; + case SR_BEHAVIOR_DX2: + ls->sw_if_index = sw_if_index; + ls->vlan_index = vlan_index; + break; + } + + /* Figure out the adjacency magic for Xconnect variants */ + if (ls->behavior == SR_BEHAVIOR_X || ls->behavior == SR_BEHAVIOR_DX4 + || ls->behavior == SR_BEHAVIOR_DX6) + { + adj_index_t nh_adj_index = ADJ_INDEX_INVALID; + + /* Retrieve the adjacency corresponding to the (OIF, next_hop) */ + if (ls->behavior == SR_BEHAVIOR_DX6 || ls->behavior == SR_BEHAVIOR_X) + nh_adj_index = adj_nbr_add_or_lock (FIB_PROTOCOL_IP6, VNET_LINK_IP6, + nh_addr, sw_if_index); + + else if (ls->behavior == SR_BEHAVIOR_DX4) + nh_adj_index = adj_nbr_add_or_lock (FIB_PROTOCOL_IP4, VNET_LINK_IP4, + nh_addr, sw_if_index); + + /* Check for ADJ creation error. If so panic */ + if (nh_adj_index == ADJ_INDEX_INVALID) + { + pool_put (sm->localsids, ls); + return -5; + } + + ls->nh_adj = nh_adj_index; + } + + /* Set DPO */ + if (ls->behavior == SR_BEHAVIOR_END || ls->behavior == SR_BEHAVIOR_X) + dpo_set (&dpo, sr_localsid_dpo_type, DPO_PROTO_IP6, ls - sm->localsids); + else if (ls->behavior > SR_BEHAVIOR_D_FIRST + && ls->behavior < SR_BEHAVIOR_LAST) + dpo_set (&dpo, sr_localsid_d_dpo_type, DPO_PROTO_IP6, ls - sm->localsids); + else if (ls->behavior >= SR_BEHAVIOR_LAST) + { + sr_localsid_fn_registration_t *plugin = 0; + plugin = pool_elt_at_index (sm->plugin_functions, + ls->behavior - SR_BEHAVIOR_LAST); + /* Copy the unformat memory result */ + ls->plugin_mem = ls_plugin_mem; + /* Callback plugin creation function */ + rv = plugin->creation (ls); + if (rv) + { + pool_put (sm->localsids, ls); + return -6; + } + dpo_set (&dpo, plugin->dpo, DPO_PROTO_IP6, ls - sm->localsids); + } + + /* Set hash key for searching localsid by address */ + mhash_set (&sm->sr_localsids_index_hash, localsid_addr, ls - sm->localsids, + NULL); + + fib_table_entry_special_dpo_add (fib_index, &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, &dpo); + dpo_reset (&dpo); + + /* Set counter to zero */ + vlib_validate_combined_counter (&(sm->sr_ls_valid_counters), + ls - sm->localsids); + vlib_validate_combined_counter (&(sm->sr_ls_invalid_counters), + ls - sm->localsids); + + vlib_zero_combined_counter (&(sm->sr_ls_valid_counters), + ls - sm->localsids); + vlib_zero_combined_counter (&(sm->sr_ls_invalid_counters), + ls - sm->localsids); + + return 0; +} + +/** + * @brief SR LocalSID CLI function. + * + * @see sr_cli_localsid + */ +static clib_error_t * +sr_cli_localsid_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + vnet_main_t *vnm = vnet_get_main (); + ip6_sr_main_t *sm = &sr_main; + u32 sw_if_index = (u32) ~ 0, vlan_index = (u32) ~ 0, fib_index = 0; + int is_del = 0; + int end_psp = 0; + ip6_address_t resulting_address; + ip46_address_t next_hop; + char address_set = 0; + char behavior = 0; + void *ls_plugin_mem = 0; + + int rv; + + memset (&resulting_address, 0, sizeof (ip6_address_t)); + ip46_address_reset (&next_hop); + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "del")) + is_del = 1; + else if (!address_set + && unformat (input, "address %U", unformat_ip6_address, + &resulting_address)) + address_set = 1; + else if (!address_set + && unformat (input, "addr %U", unformat_ip6_address, + &resulting_address)) + address_set = 1; + else if (unformat (input, "fib-table %u", &fib_index)); + else if (vlan_index == (u32) ~ 0 + && unformat (input, "vlan %u", &vlan_index)); + else if (!behavior && unformat (input, "behavior")) + { + if (unformat (input, "end.x %U %U", + unformat_vnet_sw_interface, vnm, &sw_if_index, + unformat_ip6_address, &next_hop.ip6)) + behavior = SR_BEHAVIOR_X; + else if (unformat (input, "end.dx6 %U %U", + unformat_vnet_sw_interface, vnm, &sw_if_index, + unformat_ip6_address, &next_hop.ip6)) + behavior = SR_BEHAVIOR_DX6; + else if (unformat (input, "end.dx4 %U %U", + unformat_vnet_sw_interface, vnm, &sw_if_index, + unformat_ip4_address, &next_hop.ip4)) + behavior = SR_BEHAVIOR_DX4; + else if (unformat (input, "end.dx2 %U", + unformat_vnet_sw_interface, vnm, &sw_if_index)) + behavior = SR_BEHAVIOR_DX2; + else if (unformat (input, "end.dt6 %u", &sw_if_index)) + behavior = SR_BEHAVIOR_DT6; + else if (unformat (input, "end.dt4 %u", &sw_if_index)) + behavior = SR_BEHAVIOR_DT4; + else + { + /* Loop over all the plugin behavior format functions */ + sr_localsid_fn_registration_t *plugin = 0, **vec_plugins = 0; + sr_localsid_fn_registration_t **plugin_it = 0; + + /* Create a vector out of the plugin pool as recommended */ + /* *INDENT-OFF* */ + pool_foreach (plugin, sm->plugin_functions, + { + vec_add1 (vec_plugins, plugin); + }); + /* *INDENT-ON* */ + + vec_foreach (plugin_it, vec_plugins) + { + if (unformat + (input, "%U", (*plugin_it)->ls_unformat, &ls_plugin_mem)) + { + behavior = (*plugin_it)->sr_localsid_function_number; + break; + } + } + } + + if (!behavior) + { + if (unformat (input, "end")) + behavior = SR_BEHAVIOR_END; + else + break; + } + } + else if (!end_psp && unformat (input, "psp")) + end_psp = 1; + else + break; + } + + if (!behavior && end_psp) + behavior = SR_BEHAVIOR_END; + + if (!address_set) + return clib_error_return (0, + "Error: SRv6 LocalSID address is mandatory."); + if (!is_del && !behavior) + return clib_error_return (0, + "Error: SRv6 LocalSID behavior is mandatory."); + if (vlan_index != (u32) ~ 0) + return clib_error_return (0, + "Error: SRv6 End.DX2 with rewrite VLAN tag not supported by now."); + if (end_psp && !(behavior == SR_BEHAVIOR_END || behavior == SR_BEHAVIOR_X)) + return clib_error_return (0, + "Error: SRv6 PSP only compatible with End and End.X"); + + rv = sr_cli_localsid (is_del, &resulting_address, end_psp, behavior, + sw_if_index, vlan_index, fib_index, &next_hop, + ls_plugin_mem); + + switch (rv) + { + case 0: + break; + case 1: + return 0; + case -1: + return clib_error_return (0, + "Identical localsid already exists. Requested localsid not created."); + case -2: + return clib_error_return (0, + "The requested localsid could not be deleted. SR localsid not found"); + case -3: + return clib_error_return (0, "FIB table %u does not exist", fib_index); + case -4: + return clib_error_return (0, "There is already one FIB entry for the" + "requested localsid non segment routing related"); + case -5: + return clib_error_return (0, + "Could not create ARP/ND entry for such next_hop. Internal error."); + case -6: + return clib_error_return (0, + "Error on the plugin based localsid creation."); + default: + return clib_error_return (0, "BUG: sr localsid returns %d", rv); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (sr_localsid_command, static) = { + .path = "sr localsid", + .short_help = "sr localsid (del) address XX:XX::YY:YY" + "(fib-table 8) behavior STRING", + .long_help = + "Create SR LocalSID and binds it to a particular behavior\n" + "Arguments:\n" + "\tlocalSID IPv6_addr(128b) LocalSID IPv6 address\n" + "\t(fib-table X) Optional. VRF where to install SRv6 localsid\n" + "\tbehavior STRING Specifies the behavior\n" + "\n\tBehaviors:\n" + "\tEnd\t-> Endpoint.\n" + "\tEnd.X\t-> Endpoint with decapsulation and Layer-3 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DX2\t-> Endpoint with decapsulation and Layer-2 cross-connect.\n" + "\t\tParameters: ''\n" + "\tEnd.DX6\t-> Endpoint with decapsulation and IPv6 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DX4\t-> Endpoint with decapsulation and IPv4 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DT6\t-> Endpoint with decapsulation and specific IPv6 table lookup.\n" + "\t\tParameters: ''\n" + "\tEnd.DT4\t-> Endpoint with decapsulation and specific IPv4 table lookup.\n" + "\t\tParameters: ''\n", + .function = sr_cli_localsid_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief CLI function to 'show' all SR LocalSIDs on console. + */ +static clib_error_t * +show_sr_localsid_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + vnet_main_t *vnm = vnet_get_main (); + ip6_sr_main_t *sm = &sr_main; + ip6_sr_localsid_t **localsid_list = 0; + ip6_sr_localsid_t *ls; + int i; + + vlib_cli_output (vm, "SRv6 - My LocalSID Table:"); + vlib_cli_output (vm, "========================="); + /* *INDENT-OFF* */ + pool_foreach (ls, sm->localsids, ({ vec_add1 (localsid_list, ls); })); + /* *INDENT-ON* */ + for (i = 0; i < vec_len (localsid_list); i++) + { + ls = localsid_list[i]; + switch (ls->behavior) + { + case SR_BEHAVIOR_END: + vlib_cli_output (vm, "\tAddress: \t%U\n\tBehavior: \tEnd", + format_ip6_address, &ls->localsid); + break; + case SR_BEHAVIOR_X: + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tX (Endpoint with Layer-3 cross-connect)" + "\n\tIface: \t%U\n\tNext hop: \t%U", + format_ip6_address, &ls->localsid, + format_vnet_sw_if_index_name, vnm, ls->sw_if_index, + format_ip6_address, &ls->next_hop.ip6); + break; + case SR_BEHAVIOR_DX4: + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tDX4 (Endpoint with decapsulation and IPv4 cross-connect)" + "\n\tIface: \t%U\n\tNext hop: \t%U", + format_ip6_address, &ls->localsid, + format_vnet_sw_if_index_name, vnm, ls->sw_if_index, + format_ip4_address, &ls->next_hop.ip4); + break; + case SR_BEHAVIOR_DX6: + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tDX6 (Endpoint with decapsulation and IPv6 cross-connect)" + "\n\tIface: \t%U\n\tNext hop: \t%U", + format_ip6_address, &ls->localsid, + format_vnet_sw_if_index_name, vnm, ls->sw_if_index, + format_ip6_address, &ls->next_hop.ip6); + break; + case SR_BEHAVIOR_DX2: + if (ls->vlan_index == (u32) ~ 0) + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tDX2 (Endpoint with decapulation and Layer-2 cross-connect)" + "\n\tIface: \t%U", format_ip6_address, + &ls->localsid, format_vnet_sw_if_index_name, vnm, + ls->sw_if_index); + else + vlib_cli_output (vm, + "Unsupported yet. (DX2 with egress VLAN rewrite)"); + break; + case SR_BEHAVIOR_DT6: + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tDT6 (Endpoint with decapsulation and specific IPv6 table lookup)" + "\n\tTable: %u", format_ip6_address, &ls->localsid, + ls->fib_table); + break; + case SR_BEHAVIOR_DT4: + vlib_cli_output (vm, + "\tAddress: \t%U\n\tBehavior: \tDT4 (Endpoint with decapsulation and specific IPv4 table lookup)" + "\n\tTable: \t%u", format_ip6_address, + &ls->localsid, ls->fib_table); + break; + default: + if (ls->behavior >= SR_BEHAVIOR_LAST) + { + sr_localsid_fn_registration_t *plugin = + pool_elt_at_index (sm->plugin_functions, + ls->behavior - SR_BEHAVIOR_LAST); + + vlib_cli_output (vm, "\tAddress: \t%U\n" + "\tBehavior: \t%s (%s)\n\t%U", + format_ip6_address, &ls->localsid, + plugin->keyword_str, plugin->def_str, + plugin->ls_format, ls->plugin_mem); + } + else + //Should never get here... + vlib_cli_output (vm, "Internal error"); + break; + } + if (ls->end_psp) + vlib_cli_output (vm, "\tPSP: \tTrue\n"); + + /* Print counters */ + vlib_counter_t valid, invalid; + vlib_get_combined_counter (&(sm->sr_ls_valid_counters), i, &valid); + vlib_get_combined_counter (&(sm->sr_ls_invalid_counters), i, &invalid); + vlib_cli_output (vm, "\tGood traffic: \t[%Ld packets : %Ld bytes]\n", + valid.packets, valid.bytes); + vlib_cli_output (vm, "\tBad traffic: \t[%Ld packets : %Ld bytes]\n", + invalid.packets, invalid.bytes); + vlib_cli_output (vm, "--------------------"); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_localsid_command, static) = { + .path = "show sr localsids", + .short_help = "show sr localsids", + .function = show_sr_localsid_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief Function to 'clear' ALL SR localsid counters + */ +static clib_error_t * +clear_sr_localsid_counters_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + ip6_sr_main_t *sm = &sr_main; + + vlib_clear_combined_counters (&(sm->sr_ls_valid_counters)); + vlib_clear_combined_counters (&(sm->sr_ls_invalid_counters)); + + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (clear_sr_localsid_counters_command, static) = { + .path = "clear sr localsid counters", + .short_help = "clear sr localsid counters", + .function = clear_sr_localsid_counters_command_fn, +}; +/* *INDENT-ON* */ + +/************************ SR LocalSID graphs node ****************************/ +/** + * @brief SR localsid node trace + */ +typedef struct +{ + u32 localsid_index; + ip6_address_t src, out_dst; + u8 sr[256]; + u8 num_segments; + u8 segments_left; + //With SRv6 header update include flags here. +} sr_localsid_trace_t; + +#define foreach_sr_localsid_error \ +_(NO_INNER_HEADER, "(SR-Error) No inner IP header") \ +_(NO_MORE_SEGMENTS, "(SR-Error) No more segments") \ +_(NO_SRH, "(SR-Error) No SR header") \ +_(NO_PSP, "(SR-Error) PSP Not available (segments left > 0)") \ +_(NOT_LS, "(SR-Error) Decaps not available (segments left > 0)") \ +_(L2, "(SR-Error) SRv6 decapsulated a L2 frame without dest") + +typedef enum +{ +#define _(sym,str) SR_LOCALSID_ERROR_##sym, + foreach_sr_localsid_error +#undef _ + SR_LOCALSID_N_ERROR, +} sr_localsid_error_t; + +static char *sr_localsid_error_strings[] = { +#define _(sym,string) string, + foreach_sr_localsid_error +#undef _ +}; + +#define foreach_sr_localsid_next \ +_(ERROR, "error-drop") \ +_(IP6_LOOKUP, "ip6-lookup") \ +_(IP4_LOOKUP, "ip4-lookup") \ +_(IP6_REWRITE, "ip6-rewrite") \ +_(IP4_REWRITE, "ip4-rewrite") \ +_(INTERFACE_OUTPUT, "interface-output") + +typedef enum +{ +#define _(s,n) SR_LOCALSID_NEXT_##s, + foreach_sr_localsid_next +#undef _ + SR_LOCALSID_N_NEXT, +} sr_localsid_next_t; + +/** + * @brief SR LocalSID graph node trace function + * + * @see sr_localsid + */ +u8 * +format_sr_localsid_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + ip6_sr_main_t *sm = &sr_main; + sr_localsid_trace_t *t = va_arg (*args, sr_localsid_trace_t *); + + ip6_sr_localsid_t *ls = + pool_elt_at_index (sm->localsids, t->localsid_index); + + s = + format (s, "SR-LOCALSID:\n\tLocalsid: %U\n", format_ip6_address, + &ls->localsid); + switch (ls->behavior) + { + case SR_BEHAVIOR_END: + s = format (s, "\tBehavior: End\n"); + break; + case SR_BEHAVIOR_DX6: + s = format (s, "\tBehavior: Decapsulation with IPv6 L3 xconnect\n"); + break; + case SR_BEHAVIOR_DX4: + s = format (s, "\tBehavior: Decapsulation with IPv4 L3 xconnect\n"); + break; + case SR_BEHAVIOR_X: + s = format (s, "\tBehavior: IPv6 L3 xconnect\n"); + break; + case SR_BEHAVIOR_DT6: + s = format (s, "\tBehavior: Decapsulation with IPv6 Table lookup\n"); + break; + case SR_BEHAVIOR_DT4: + s = format (s, "\tBehavior: Decapsulation with IPv4 Table lookup\n"); + break; + case SR_BEHAVIOR_DX2: + s = format (s, "\tBehavior: Decapsulation with L2 xconnect\n"); + break; + default: + s = format (s, "\tBehavior: defined in plugin\n"); //TODO + break; + } + if (t->num_segments != 0xFF) + { + if (t->num_segments > 0) + { + s = format (s, "\tSegments left: %d\n", t->num_segments); + s = format (s, "\tSID list: [in ietf order]"); + int i = 0; + for (i = 0; i < t->num_segments; i++) + { + s = format (s, "\n\t-> %U", format_ip6_address, + (ip6_address_t *) & t->sr[i * + sizeof (ip6_address_t)]); + } + } + } + return s; +} + +/** + * @brief Function doing End processing. + */ +static_always_inline void +end_srh_processing (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, + ip6_sr_header_t * sr0, + ip6_sr_localsid_t * ls0, u32 * next0) +{ + ip6_address_t *new_dst0; + + if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) + { + if (PREDICT_TRUE (sr0->segments_left != 0)) + { + sr0->segments_left -= 1; + new_dst0 = (ip6_address_t *) (sr0->segments); + new_dst0 += sr0->segments_left; + ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; + ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; + + if (ls0->behavior == SR_BEHAVIOR_X) + { + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; + *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; + } + } + else + { + *next0 = SR_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SR_LOCALSID_ERROR_NO_MORE_SEGMENTS]; + } + } + else + { + /* Error. Routing header of type != SR */ + *next0 = SR_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SR_LOCALSID_ERROR_NO_SRH]; + } +} + +/* + * @brief Function doing SRH processing for D* variants + */ +//FixME. I must crosscheck that next_proto matches the localsid +static_always_inline void +end_decaps_srh_processing (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, + ip6_sr_header_t * sr0, + ip6_sr_localsid_t * ls0, u32 * next0) +{ + /* Compute the size of the IPv6 header with all Ext. headers */ + u8 next_proto; + ip6_ext_header_t *next_ext_header; + u16 total_size = 0; + + next_proto = ip0->protocol; + next_ext_header = (void *) (ip0 + 1); + total_size = sizeof (ip6_header_t); + while (ip6_ext_hdr (next_proto)) + { + total_size += ip6_ext_header_len (next_ext_header); + next_proto = next_ext_header->next_hdr; + next_ext_header = ip6_ext_next_header (next_ext_header); + } + + /* Ensure this is the last segment. Otherwise drop. */ + if (sr0 && sr0->segments_left != 0) + { + *next0 = SR_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SR_LOCALSID_ERROR_NOT_LS]; + return; + } + + switch (next_proto) + { + case IP_PROTOCOL_IPV6: + /* Encap-End IPv6. Pop outer IPv6 header. */ + if (ls0->behavior == SR_BEHAVIOR_DX6) + { + vlib_buffer_advance (b0, total_size); + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; + *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; + return; + } + else if (ls0->behavior == SR_BEHAVIOR_DT6) + { + vlib_buffer_advance (b0, total_size); + vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->fib_table; + return; + } + break; + case IP_PROTOCOL_IP_IN_IP: + /* Encap-End IPv4. Pop outer IPv6 header */ + if (ls0->behavior == SR_BEHAVIOR_DX4) + { + vlib_buffer_advance (b0, total_size); + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; + *next0 = SR_LOCALSID_NEXT_IP4_REWRITE; + return; + } + else if (ls0->behavior == SR_BEHAVIOR_DT4) + { + vlib_buffer_advance (b0, total_size); + vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->fib_table; + *next0 = SR_LOCALSID_NEXT_IP4_LOOKUP; + return; + } + break; + case IP_PROTOCOL_IP6_NONXT: + /* L2 encaps */ + if (ls0->behavior == SR_BEHAVIOR_DX2) + { + vlib_buffer_advance (b0, total_size); + vnet_buffer (b0)->sw_if_index[VLIB_TX] = ls0->sw_if_index; + *next0 = SR_LOCALSID_NEXT_INTERFACE_OUTPUT; + return; + } + break; + } + *next0 = SR_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SR_LOCALSID_ERROR_NO_INNER_HEADER]; + return; +} + +/** + * @brief Function doing End processing with PSP + */ +static_always_inline void +end_psp_srh_processing (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, + ip6_ext_header_t * prev0, + ip6_sr_header_t * sr0, + ip6_sr_localsid_t * ls0, u32 * next0) +{ + u32 new_l0, sr_len; + u64 *copy_dst0, *copy_src0; + u32 copy_len_u64s0 = 0; + int i; + + if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) + { + if (PREDICT_TRUE (sr0->segments_left == 1)) + { + ip0->dst_address.as_u64[0] = sr0->segments->as_u64[0]; + ip0->dst_address.as_u64[1] = sr0->segments->as_u64[1]; + + /* Remove the SRH taking care of the rest of IPv6 ext header */ + if (prev0) + prev0->next_hdr = sr0->protocol; + else + ip0->protocol = sr0->protocol; + + sr_len = ip6_ext_header_len (sr0); + vlib_buffer_advance (b0, sr_len); + new_l0 = clib_net_to_host_u16 (ip0->payload_length) - sr_len; + ip0->payload_length = clib_host_to_net_u16 (new_l0); + copy_src0 = (u64 *) ip0; + copy_dst0 = copy_src0 + (sr0->length + 1); + /* number of 8 octet units to copy + * By default in absence of extension headers it is equal to length of ip6 header + * With extension headers it number of 8 octet units of ext headers preceding + * SR header + */ + copy_len_u64s0 = + (((u8 *) sr0 - (u8 *) ip0) - sizeof (ip6_header_t)) >> 3; + copy_dst0[4 + copy_len_u64s0] = copy_src0[4 + copy_len_u64s0]; + copy_dst0[3 + copy_len_u64s0] = copy_src0[3 + copy_len_u64s0]; + copy_dst0[2 + copy_len_u64s0] = copy_src0[2 + copy_len_u64s0]; + copy_dst0[1 + copy_len_u64s0] = copy_src0[1 + copy_len_u64s0]; + copy_dst0[0 + copy_len_u64s0] = copy_src0[0 + copy_len_u64s0]; + + for (i = copy_len_u64s0 - 1; i >= 0; i--) + { + copy_dst0[i] = copy_src0[i]; + } + + if (ls0->behavior == SR_BEHAVIOR_X) + { + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = ls0->nh_adj; + *next0 = SR_LOCALSID_NEXT_IP6_REWRITE; + } + return; + } + } + /* Error. Routing header of type != SR */ + *next0 = SR_LOCALSID_NEXT_ERROR; + b0->error = node->errors[SR_LOCALSID_ERROR_NO_PSP]; +} + +/** + * @brief SR LocalSID graph node. Supports all default SR Endpoint variants + */ +static uword +sr_localsid_d_fn (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, next_index, *from, *to_next; + ip6_sr_main_t *sm = &sr_main; + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + next_index = node->cached_next_index; + u32 thread_index = vlib_get_thread_index (); + + while (n_left_from > 0) + { + u32 n_left_to_next; + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_LOCALSID_NEXT_IP6_LOOKUP; + ip6_sr_localsid_t *ls0, *ls1, *ls2, *ls3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+4 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + ls0 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls1 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls2 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls3 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip1, prev1, sr1, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip2, prev2, sr2, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip3, prev3, sr3, IP_PROTOCOL_IPV6_ROUTE); + + end_decaps_srh_processing (node, b0, ip0, sr0, ls0, &next0); + end_decaps_srh_processing (node, b1, ip1, sr1, ls1, &next1); + end_decaps_srh_processing (node, b2, ip2, sr2, ls2, &next2); + end_decaps_srh_processing (node, b3, ip3, sr3, ls3, &next3); + + //TODO: trace. + + vlib_increment_combined_counter + (((next0 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b0)); + + vlib_increment_combined_counter + (((next1 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls1 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b1)); + + vlib_increment_combined_counter + (((next2 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls2 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b2)); + + vlib_increment_combined_counter + (((next3 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls3 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b3)); + + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0; + ip6_ext_header_t *prev0; + ip6_sr_header_t *sr0; + u32 next0 = SR_LOCALSID_NEXT_IP6_LOOKUP; + ip6_sr_localsid_t *ls0; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + ip0 = vlib_buffer_get_current (b0); + + /* Lookup the SR End behavior based on IP DA (adj) */ + ls0 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + + /* Find SRH as well as previous header */ + ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); + + /* SRH processing and End variants */ + end_decaps_srh_processing (node, b0, ip0, sr0, ls0, &next0); + + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_localsid_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->num_segments = 0; + tr->localsid_index = ls0 - sm->localsids; + + if (ip0 == vlib_buffer_get_current (b0)) + { + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->out_dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->out_dst.as_u8)); + if (ip0->protocol == IP_PROTOCOL_IPV6_ROUTE + && sr0->type == ROUTING_HEADER_TYPE_SR) + { + clib_memcpy (tr->sr, sr0->segments, sr0->length * 8); + tr->num_segments = + sr0->length * 8 / sizeof (ip6_address_t); + tr->segments_left = sr0->segments_left; + } + } + else + tr->num_segments = 0xFF; + } + + /* Increase the counters */ + vlib_increment_combined_counter + (((next0 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b0)); + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_localsid_d_node) = { + .function = sr_localsid_d_fn, + .name = "sr-localsid-d", + .vector_size = sizeof (u32), + .format_trace = format_sr_localsid_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_LOCALSID_N_ERROR, + .error_strings = sr_localsid_error_strings, + .n_next_nodes = SR_LOCALSID_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_LOCALSID_NEXT_##s] = n, + foreach_sr_localsid_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/** + * @brief SR LocalSID graph node. Supports all default SR Endpoint variants + */ +static uword +sr_localsid_fn (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + u32 n_left_from, next_index, *from, *to_next; + ip6_sr_main_t *sm = &sr_main; + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + next_index = node->cached_next_index; + u32 thread_index = vlib_get_thread_index (); + + while (n_left_from > 0) + { + u32 n_left_to_next; + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_LOCALSID_NEXT_IP6_LOOKUP; + ip6_sr_localsid_t *ls0, *ls1, *ls2, *ls3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip1, prev1, sr1, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip2, prev2, sr2, IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip3, prev3, sr3, IP_PROTOCOL_IPV6_ROUTE); + + ls0 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls1 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls2 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ls3 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + + if (ls0->end_psp) + end_psp_srh_processing (node, b0, ip0, prev0, sr0, ls0, &next0); + else + end_srh_processing (node, b0, ip0, sr0, ls0, &next0); + + if (ls1->end_psp) + end_psp_srh_processing (node, b1, ip1, prev1, sr1, ls1, &next1); + else + end_srh_processing (node, b1, ip1, sr1, ls1, &next1); + + if (ls2->end_psp) + end_psp_srh_processing (node, b2, ip2, prev2, sr2, ls2, &next2); + else + end_srh_processing (node, b2, ip2, sr2, ls2, &next2); + + if (ls3->end_psp) + end_psp_srh_processing (node, b3, ip3, prev3, sr3, ls3, &next3); + else + end_srh_processing (node, b3, ip3, sr3, ls3, &next3); + + //TODO: proper trace. + + vlib_increment_combined_counter + (((next0 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b0)); + + vlib_increment_combined_counter + (((next1 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls1 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b1)); + + vlib_increment_combined_counter + (((next2 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls2 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b2)); + + vlib_increment_combined_counter + (((next3 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls3 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b3)); + + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0; + ip6_ext_header_t *prev0; + ip6_sr_header_t *sr0; + u32 next0 = SR_LOCALSID_NEXT_IP6_LOOKUP; + ip6_sr_localsid_t *ls0; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + ip0 = vlib_buffer_get_current (b0); + ip6_ext_header_find_t (ip0, prev0, sr0, IP_PROTOCOL_IPV6_ROUTE); + + /* Lookup the SR End behavior based on IP DA (adj) */ + ls0 = + pool_elt_at_index (sm->localsids, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + + /* SRH processing */ + if (ls0->end_psp) + end_psp_srh_processing (node, b0, ip0, prev0, sr0, ls0, &next0); + else + end_srh_processing (node, b0, ip0, sr0, ls0, &next0); + + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_localsid_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + tr->num_segments = 0; + tr->localsid_index = ls0 - sm->localsids; + + if (ip0 == vlib_buffer_get_current (b0)) + { + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->out_dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->out_dst.as_u8)); + if (ip0->protocol == IP_PROTOCOL_IPV6_ROUTE + && sr0->type == ROUTING_HEADER_TYPE_SR) + { + clib_memcpy (tr->sr, sr0->segments, sr0->length * 8); + tr->num_segments = + sr0->length * 8 / sizeof (ip6_address_t); + tr->segments_left = sr0->segments_left; + } + } + else + { + tr->num_segments = 0xFF; + } + } + + vlib_increment_combined_counter + (((next0 == + SR_LOCALSID_NEXT_ERROR) ? &(sm->sr_ls_invalid_counters) : + &(sm->sr_ls_valid_counters)), thread_index, ls0 - sm->localsids, + 1, vlib_buffer_length_in_chain (vm, b0)); + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_localsid_node) = { + .function = sr_localsid_fn, + .name = "sr-localsid", + .vector_size = sizeof (u32), + .format_trace = format_sr_localsid_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_LOCALSID_N_ERROR, + .error_strings = sr_localsid_error_strings, + .n_next_nodes = SR_LOCALSID_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_LOCALSID_NEXT_##s] = n, + foreach_sr_localsid_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +static u8 * +format_sr_dpo (u8 * s, va_list * args) +{ + index_t index = va_arg (*args, index_t); + CLIB_UNUSED (u32 indent) = va_arg (*args, u32); + + return (format (s, "SR: localsid_index:[%d]", index)); +} + +const static dpo_vft_t sr_loc_vft = { + .dv_lock = sr_dpo_lock, + .dv_unlock = sr_dpo_unlock, + .dv_format = format_sr_dpo, +}; + +const static char *const sr_loc_ip6_nodes[] = { + "sr-localsid", + NULL, +}; + +const static char *const *const sr_loc_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_loc_ip6_nodes, +}; + +const static char *const sr_loc_d_ip6_nodes[] = { + "sr-localsid-d", + NULL, +}; + +const static char *const *const sr_loc_d_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_loc_d_ip6_nodes, +}; + + +/*************************** SR LocalSID plugins ******************************/ +/** + * @brief SR LocalSID plugin registry + */ +int +sr_localsid_register_function (vlib_main_t * vm, u8 * fn_name, + u8 * keyword_str, u8 * def_str, + u8 * params_str, dpo_type_t * dpo, + format_function_t * ls_format, + unformat_function_t * ls_unformat, + sr_plugin_callback_t * creation_fn, + sr_plugin_callback_t * removal_fn) +{ + ip6_sr_main_t *sm = &sr_main; + uword *p; + + sr_localsid_fn_registration_t *plugin; + + /* Did this function exist? If so update it */ + p = hash_get_mem (sm->plugin_functions_by_key, fn_name); + if (p) + { + plugin = pool_elt_at_index (sm->plugin_functions, p[0]); + } + /* Else create a new one and set hash key */ + else + { + pool_get (sm->plugin_functions, plugin); + hash_set_mem (sm->plugin_functions_by_key, fn_name, + plugin - sm->plugin_functions); + } + + memset (plugin, 0, sizeof (*plugin)); + + plugin->sr_localsid_function_number = (plugin - sm->plugin_functions); + plugin->sr_localsid_function_number += SR_BEHAVIOR_LAST; + plugin->ls_format = ls_format; + plugin->ls_unformat = ls_unformat; + plugin->creation = creation_fn; + plugin->removal = removal_fn; + clib_memcpy (&plugin->dpo, dpo, sizeof (dpo_type_t)); + plugin->function_name = format (0, "%s%c", fn_name, 0); + plugin->keyword_str = format (0, "%s%c", keyword_str, 0); + plugin->def_str = format (0, "%s%c", def_str, 0); + plugin->params_str = format (0, "%s%c", params_str, 0); + + return plugin->sr_localsid_function_number; +} + +/** + * @brief CLI function to 'show' all available SR LocalSID behaviors + */ +static clib_error_t * +show_sr_localsid_behaviors_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + ip6_sr_main_t *sm = &sr_main; + sr_localsid_fn_registration_t *plugin; + sr_localsid_fn_registration_t **plugins_vec = 0; + int i; + + vlib_cli_output (vm, + "SR LocalSIDs behaviors:\n-----------------------\n\n"); + + /* *INDENT-OFF* */ + pool_foreach (plugin, sm->plugin_functions, + ({ vec_add1 (plugins_vec, plugin); })); + /* *INDENT-ON* */ + + /* Print static behaviors */ + vlib_cli_output (vm, "Default behaviors:\n" + "\tEnd\t-> Endpoint.\n" + "\tEnd.X\t-> Endpoint with decapsulation and Layer-3 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DX2\t-> Endpoint with decapsulation and Layer-2 cross-connect.\n" + "\t\tParameters: ''\n" + "\tEnd.DX6\t-> Endpoint with decapsulation and IPv6 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DX4\t-> Endpoint with decapsulation and IPv4 cross-connect.\n" + "\t\tParameters: ' '\n" + "\tEnd.DT6\t-> Endpoint with decapsulation and specific IPv6 table lookup.\n" + "\t\tParameters: ''\n" + "\tEnd.DT4\t-> Endpoint with decapsulation and specific IPv4 table lookup.\n" + "\t\tParameters: ''\n"); + vlib_cli_output (vm, "Plugin behaviors:\n"); + for (i = 0; i < vec_len (plugins_vec); i++) + { + plugin = plugins_vec[i]; + vlib_cli_output (vm, "\t%s\t-> %s.\n", plugin->keyword_str, + plugin->def_str); + vlib_cli_output (vm, "\t\tParameters: '%s'\n", plugin->params_str); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_localsid_behaviors_command, static) = { + .path = "show sr localsids behaviors", + .short_help = "show sr localsids behaviors", + .function = show_sr_localsid_behaviors_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief SR LocalSID initialization + */ +clib_error_t * +sr_localsids_init (vlib_main_t * vm) +{ + /* Init memory for function keys */ + ip6_sr_main_t *sm = &sr_main; + mhash_init (&sm->sr_localsids_index_hash, sizeof (uword), + sizeof (ip6_address_t)); + /* Init SR behaviors DPO type */ + sr_localsid_dpo_type = dpo_register_new_type (&sr_loc_vft, sr_loc_nodes); + /* Init SR behaviors DPO type */ + sr_localsid_d_dpo_type = + dpo_register_new_type (&sr_loc_vft, sr_loc_d_nodes); + /* Init memory for localsid plugins */ + sm->plugin_functions_by_key = hash_create_string (0, sizeof (uword)); + return 0; +} + +VLIB_INIT_FUNCTION (sr_localsids_init); +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srv6/sr_localsid.md b/src/vnet/srv6/sr_localsid.md new file mode 100644 index 00000000..340af4a3 --- /dev/null +++ b/src/vnet/srv6/sr_localsid.md @@ -0,0 +1,58 @@ +# SR LocalSIDs {#srv6_localsid_doc} + +A local SID is associated to a Segment Routing behavior -or function- on the current node. + +The most basic behavior is called END. It simply activates the next SID in the current packet, by decrementing the Segments Left value and updating the IPv6 DA. + +A local END SID is instantiated using the following CLI: + + sr localsid (del) address XX::YY behavior end + +This creates a new entry in the main FIB for IPv6 address XX::YY. All packets whose IPv6 DA matches this FIB entry are redirected to the sr-localsid node, where they are processed as described above. + +Other examples of local SIDs are the following: + + sr localsid (del) address XX::YY behavior end + sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a + sr localsid (del) address XX::YY behavior end.dx6 GE0/1/0 2001::a + sr localsid (del) address XX::YY behavior end.dx4 GE0/1/0 10.0.0.1 + sr localsid (del) address XX::YY behavior end.dx2 GigabitE0/11/0 + sr localsid (del) address XX::YY behavior end.dt6 5 + sr localsid (del) address XX::YY behavior end.dt6 5 + +Note that all of these behaviors match the definitions of the SRv6 architecture (*draft-filsfils-spring-srv6-network-programming*). Please refer to this document for a detailed description of each behavior. + +Note also that you can configure the PSP flavor of the End and End.X behaviors by typing: + + sr localsid (del) address XX::YY behavior end psp + sr localsid (del) address XX::YY behavior end.x GE0/1/0 2001::a psp + +Help on the available local SID behaviors and their usage can be obtained with: + + help sr localsid + +Alternatively they can be obtained using. + + show sr localsids behavior + +The difference in between those two commands is that the first one will only display the SR LocalSID behaviors that are built-in VPP, while the latter will display those behaviors plus the ones added with the SR LocalSID Development Framework. + + +VPP keeps a 'My LocalSID Table' where it stores all the SR local SIDs instantiated as well as their parameters. Every time a new local SID is instantiated, a new entry is added to this table. In addition, counters for correctly and incorrectly processed traffic are maintained for each local SID. The counters store both the number of packets and bytes. + +The contents of the 'My LocalSID Table' is shown with: + + vpp# show sr localsid + SRv6 - My LocalSID Table: + ========================= + Address: c3::1 + Behavior: DX6 (Endpoint with decapsulation and IPv6 cross-connect) + Iface: GigabitEthernet0/5/0 + Next hop: b:c3::b + Good traffic: [51277 packets : 5332808 bytes] + Bad traffic: [0 packets : 0 bytes] + -------------------- + +The traffic counters can be reset with: + + vpp# clear sr localsid counters diff --git a/src/vnet/srv6/sr_packet.h b/src/vnet/srv6/sr_packet.h new file mode 100755 index 00000000..7af4ad4d --- /dev/null +++ b/src/vnet/srv6/sr_packet.h @@ -0,0 +1,159 @@ +#ifndef included_vnet_sr_packet_h +#define included_vnet_sr_packet_h + +#include + +/* + * ipv6 segment-routing header format + * + * Copyright (c) 2013 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * The Segment Routing Header (SRH) is defined as follows: + * + * 0 1 2 3 + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | Next Header | Hdr Ext Len | Routing Type | Segments Left | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | First Segment | Flags | RESERVED | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | | + * | Segment List[0] (128 bits IPv6 address) | + * | | + * | | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | | + * | | + * ... + * | | + * | | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | | + * | Segment List[n] (128 bits IPv6 address) | + * | | + * | | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * // // + * // Optional Type Length Value objects (variable) // + * // // + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * + * where: + * + * o Next Header: 8-bit selector. Identifies the type of header + * immediately following the SRH. + * + * o Hdr Ext Len: 8-bit unsigned integer, is the length of the SRH + * header in 8-octet units, not including the first 8 octets. + * + * o Routing Type: TBD, to be assigned by IANA (suggested value: 4). + * + * o Segments Left. Defined in [RFC2460], it contains the index, in + * the Segment List, of the next segment to inspect. Segments Left + * is decremented at each segment. + * + * o First Segment: contains the index, in the Segment List, of the + * first segment of the path which is in fact the last element of the + * Segment List. + * + * o Flags: 8 bits of flags. Following flags are defined: + * + * 0 1 2 3 4 5 6 7 + * +-+-+-+-+-+-+-+-+ + * |U|P|O|A|H| U | + * +-+-+-+-+-+-+-+-+ + * + * U: Unused and for future use. SHOULD be unset on transmission + * and MUST be ignored on receipt. + * + * P-flag: Protected flag. Set when the packet has been rerouted + * through FRR mechanism by an SR endpoint node. + * + * O-flag: OAM flag. When set, it indicates that this packet is + * an operations and management (OAM) packet. + * + * A-flag: Alert flag. If present, it means important Type Length + * Value (TLV) objects are present. See Section 3.1 for details + * on TLVs objects. + * + * H-flag: HMAC flag. If set, the HMAC TLV is present and is + * encoded as the last TLV of the SRH. In other words, the last + * 36 octets of the SRH represent the HMAC information. See + * Section 3.1.5 for details on the HMAC TLV. + * + * o RESERVED: SHOULD be unset on transmission and MUST be ignored on + * receipt. + * + * o Segment List[n]: 128 bit IPv6 addresses representing the nth + * segment in the Segment List. The Segment List is encoded starting + * from the last segment of the path. I.e., the first element of the + * segment list (Segment List [0]) contains the last segment of the + * path while the last segment of the Segment List (Segment List[n]) + * contains the first segment of the path. The index contained in + * "Segments Left" identifies the current active segment. + * + * o Type Length Value (TLV) are described in Section 3.1. + * + */ + +#ifndef IPPROTO_IPV6_ROUTE +#define IPPROTO_IPV6_ROUTE 43 +#endif + +#define ROUTING_HEADER_TYPE_SR 4 + +typedef struct +{ + /* Protocol for next header. */ + u8 protocol; + /* + * Length of routing header in 8 octet units, + * not including the first 8 octets + */ + u8 length; + + /* Type of routing header; type 4 = segement routing */ + u8 type; + + /* Next segment in the segment list */ + u8 segments_left; + + /* Pointer to the first segment in the header */ + u8 first_segment; + + /* Flag bits */ +#define IP6_SR_HEADER_FLAG_PROTECTED (0x40) +#define IP6_SR_HEADER_FLAG_OAM (0x20) +#define IP6_SR_HEADER_FLAG_ALERT (0x10) +#define IP6_SR_HEADER_FLAG_HMAC (0x80) + + /* values 0x0, 0x4 - 0x7 are reserved */ + u8 flags; + u16 reserved; + + /* The segment elts */ + ip6_address_t segments[0]; +} __attribute__ ((packed)) ip6_sr_header_t; + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ + +#endif /* included_vnet_sr_packet_h */ diff --git a/src/vnet/srv6/sr_policy.md b/src/vnet/srv6/sr_policy.md new file mode 100644 index 00000000..521b8461 --- /dev/null +++ b/src/vnet/srv6/sr_policy.md @@ -0,0 +1,56 @@ +# Creating a SR Policy {#srv6_policy_doc} + +An SR Policy is defined by a Binding SID and a weighted set of Segment Lists. + +A new SR policy is created with a first SID list using: + + sr policy add bsid 2001::1 next A1:: next B1:: next C1:: (weight 5) (fib-table 3) + +* The weight parameter is only used if more than one SID list is associated with the policy. +* The fib-table parameter specifies in which table (VRF) the Binding SID is to be installed. + +An SR policy is deleted with: + + sr policy del bsid 2001::1 + sr policy del index 1 + +The existing SR policies are listed with: + + show sr policies + +## Adding/Removing SID Lists from an SR policy + +An additional SID list is associated with an existing SR policy with: + + sr policy mod bsid 2001::1 add sl next A2:: next B2:: next C2:: (weight 3) + sr policy mod index 3 add sl next A2:: next B2:: next C2:: (weight 3) + +Conversely, a SID list can be removed from an SR policy with: + + sr policy mod bsid 2001::1 del sl index 1 + sr policy mod index 3 del sl index 1 + +Note that this cannot be used to remove the last SID list of a policy. + +The weight of a SID list can also be modified with: + + sr policy mod bsid 2001::1 mod sl index 1 weight 4 + sr policy mod index 3 mod sl index 1 weight 4 + +## SR Policies: Spray policies + +Spray policies are a specific type of SR policies where the packet is replicated on all the SID lists, rather than load-balanced among them. + +SID list weights are ignored with this type of policies. + +A Spray policy is instantiated by appending the keyword **spray** to a regular SR policy command, as in: + + sr policy add bsid 2001::1 next A1:: next B1:: next C1:: spray + +Spray policies are used for removing multicast state from a network core domain, and instead send a linear unicast copy to every access node. The last SID in each list accesses the multicast tree within the access node. + +## Encapsulation SR policies + +In case the user decides to create an SR policy an IPv6 Source Address must be specified for the encapsulated traffic. In order to do so the user might use the following command: + + set sr encaps source addr XXXX::YYYY diff --git a/src/vnet/srv6/sr_policy_rewrite.c b/src/vnet/srv6/sr_policy_rewrite.c new file mode 100755 index 00000000..7a37a66b --- /dev/null +++ b/src/vnet/srv6/sr_policy_rewrite.c @@ -0,0 +1,3227 @@ +/* + * sr_policy_rewrite.c: ipv6 sr policy creation + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief SR policy creation and application + * + * Create an SR policy. + * An SR policy can be either of 'default' type or 'spray' type + * An SR policy has attached a list of SID lists. + * In case the SR policy is a default one it will load balance among them. + * An SR policy has associated a BindingSID. + * In case any packet arrives with IPv6 DA == BindingSID then the SR policy + * associated to such bindingSID will be applied to such packet. + * + * SR policies can be applied either by using IPv6 encapsulation or + * SRH insertion. Both methods can be found on this file. + * + * Traffic input usually is IPv6 packets. However it is possible to have + * IPv4 packets or L2 frames. (that are encapsulated into IPv6 with SRH) + * + * This file provides the appropiates VPP graph nodes to do any of these + * methods. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/** + * @brief SR policy rewrite trace + */ +typedef struct +{ + ip6_address_t src, dst; +} sr_policy_rewrite_trace_t; + +/* Graph arcs */ +#define foreach_sr_policy_rewrite_next \ +_(IP6_LOOKUP, "ip6-lookup") \ +_(ERROR, "error-drop") + +typedef enum +{ +#define _(s,n) SR_POLICY_REWRITE_NEXT_##s, + foreach_sr_policy_rewrite_next +#undef _ + SR_POLICY_REWRITE_N_NEXT, +} sr_policy_rewrite_next_t; + +/* SR rewrite errors */ +#define foreach_sr_policy_rewrite_error \ +_(INTERNAL_ERROR, "Segment Routing undefined error") \ +_(BSID_ZERO, "BSID with SL = 0") \ +_(COUNTER_TOTAL, "SR steered IPv6 packets") \ +_(COUNTER_ENCAP, "SR: Encaps packets") \ +_(COUNTER_INSERT, "SR: SRH inserted packets") \ +_(COUNTER_BSID, "SR: BindingSID steered packets") + +typedef enum +{ +#define _(sym,str) SR_POLICY_REWRITE_ERROR_##sym, + foreach_sr_policy_rewrite_error +#undef _ + SR_POLICY_REWRITE_N_ERROR, +} sr_policy_rewrite_error_t; + +static char *sr_policy_rewrite_error_strings[] = { +#define _(sym,string) string, + foreach_sr_policy_rewrite_error +#undef _ +}; + +/** + * @brief Dynamically added SR SL DPO type + */ +static dpo_type_t sr_pr_encaps_dpo_type; +static dpo_type_t sr_pr_insert_dpo_type; +static dpo_type_t sr_pr_bsid_encaps_dpo_type; +static dpo_type_t sr_pr_bsid_insert_dpo_type; + +/** + * @brief IPv6 SA for encapsulated packets + */ +static ip6_address_t sr_pr_encaps_src; + +/******************* SR rewrite set encaps IPv6 source addr *******************/ +/* Note: This is temporal. We don't know whether to follow this path or + take the ip address of a loopback interface or even the OIF */ + +static clib_error_t * +set_sr_src_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat + (input, "addr %U", unformat_ip6_address, &sr_pr_encaps_src)) + return 0; + else + return clib_error_return (0, "No address specified"); + } + return clib_error_return (0, "No address specified"); +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (set_sr_src_command, static) = { + .path = "set sr encaps source", + .short_help = "set sr encaps source addr ", + .function = set_sr_src_command_fn, +}; +/* *INDENT-ON* */ + +/*********************** SR rewrite string computation ************************/ +/** + * @brief SR rewrite string computation for IPv6 encapsulation (inline) + * + * @param sl is a vector of IPv6 addresses composing the Segment List + * + * @return precomputed rewrite string for encapsulation + */ +static inline u8 * +compute_rewrite_encaps (ip6_address_t * sl) +{ + ip6_header_t *iph; + ip6_sr_header_t *srh; + ip6_address_t *addrp, *this_address; + u32 header_length = 0; + u8 *rs = NULL; + + header_length = 0; + header_length += IPv6_DEFAULT_HEADER_LENGTH; + if (vec_len (sl) > 1) + { + header_length += sizeof (ip6_sr_header_t); + header_length += vec_len (sl) * sizeof (ip6_address_t); + } + + vec_validate (rs, header_length - 1); + + iph = (ip6_header_t *) rs; + iph->ip_version_traffic_class_and_flow_label = + clib_host_to_net_u32 (0 | ((6 & 0xF) << 28)); + iph->src_address.as_u64[0] = sr_pr_encaps_src.as_u64[0]; + iph->src_address.as_u64[1] = sr_pr_encaps_src.as_u64[1]; + iph->payload_length = header_length - IPv6_DEFAULT_HEADER_LENGTH; + iph->protocol = IP_PROTOCOL_IPV6; + iph->hop_limit = IPv6_DEFAULT_HOP_LIMIT; + + srh = (ip6_sr_header_t *) (iph + 1); + iph->protocol = IP_PROTOCOL_IPV6_ROUTE; + srh->protocol = IP_PROTOCOL_IPV6; + srh->type = ROUTING_HEADER_TYPE_SR; + srh->segments_left = vec_len (sl) - 1; + srh->first_segment = vec_len (sl) - 1; + srh->length = ((sizeof (ip6_sr_header_t) + + (vec_len (sl) * sizeof (ip6_address_t))) / 8) - 1; + srh->flags = 0x00; + srh->reserved = 0x00; + addrp = srh->segments + vec_len (sl) - 1; + vec_foreach (this_address, sl) + { + clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); + addrp--; + } + iph->dst_address.as_u64[0] = sl->as_u64[0]; + iph->dst_address.as_u64[1] = sl->as_u64[1]; + return rs; +} + +/** + * @brief SR rewrite string computation for SRH insertion (inline) + * + * @param sl is a vector of IPv6 addresses composing the Segment List + * + * @return precomputed rewrite string for SRH insertion + */ +static inline u8 * +compute_rewrite_insert (ip6_address_t * sl) +{ + ip6_sr_header_t *srh; + ip6_address_t *addrp, *this_address; + u32 header_length = 0; + u8 *rs = NULL; + + header_length = 0; + header_length += sizeof (ip6_sr_header_t); + header_length += (vec_len (sl) + 1) * sizeof (ip6_address_t); + + vec_validate (rs, header_length - 1); + + srh = (ip6_sr_header_t *) rs; + srh->type = ROUTING_HEADER_TYPE_SR; + srh->segments_left = vec_len (sl); + srh->first_segment = vec_len (sl); + srh->length = ((sizeof (ip6_sr_header_t) + + ((vec_len (sl) + 1) * sizeof (ip6_address_t))) / 8) - 1; + srh->flags = 0x00; + srh->reserved = 0x0000; + addrp = srh->segments + vec_len (sl); + vec_foreach (this_address, sl) + { + clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); + addrp--; + } + return rs; +} + +/** + * @brief SR rewrite string computation for SRH insertion with BSID (inline) + * + * @param sl is a vector of IPv6 addresses composing the Segment List + * + * @return precomputed rewrite string for SRH insertion with BSID + */ +static inline u8 * +compute_rewrite_bsid (ip6_address_t * sl) +{ + ip6_sr_header_t *srh; + ip6_address_t *addrp, *this_address; + u32 header_length = 0; + u8 *rs = NULL; + + header_length = 0; + header_length += sizeof (ip6_sr_header_t); + header_length += vec_len (sl) * sizeof (ip6_address_t); + + vec_validate (rs, header_length - 1); + + srh = (ip6_sr_header_t *) rs; + srh->type = ROUTING_HEADER_TYPE_SR; + srh->segments_left = vec_len (sl) - 1; + srh->first_segment = vec_len (sl) - 1; + srh->length = ((sizeof (ip6_sr_header_t) + + (vec_len (sl) * sizeof (ip6_address_t))) / 8) - 1; + srh->flags = 0x00; + srh->reserved = 0x0000; + addrp = srh->segments + vec_len (sl) - 1; + vec_foreach (this_address, sl) + { + clib_memcpy (addrp->as_u8, this_address->as_u8, sizeof (ip6_address_t)); + addrp--; + } + return rs; +} + +/*************************** SR LB helper functions **************************/ +/** + * @brief Creates a Segment List and adds it to an SR policy + * + * Creates a Segment List and adds it to the SR policy. Notice that the SL are + * not necessarily unique. Hence there might be two Segment List within the + * same SR Policy with exactly the same segments and same weight. + * + * @param sr_policy is the SR policy where the SL will be added + * @param sl is a vector of IPv6 addresses composing the Segment List + * @param weight is the weight of the SegmentList (for load-balancing purposes) + * @param is_encap represents the mode (SRH insertion vs Encapsulation) + * + * @return pointer to the just created segment list + */ +static inline ip6_sr_sl_t * +create_sl (ip6_sr_policy_t * sr_policy, ip6_address_t * sl, u32 weight, + u8 is_encap) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_sr_sl_t *segment_list; + + pool_get (sm->sid_lists, segment_list); + memset (segment_list, 0, sizeof (*segment_list)); + + vec_add1 (sr_policy->segments_lists, segment_list - sm->sid_lists); + + /* Fill in segment list */ + segment_list->weight = + (weight != (u32) ~ 0 ? weight : SR_SEGMENT_LIST_WEIGHT_DEFAULT); + segment_list->segments = vec_dup (sl); + + if (is_encap) + { + segment_list->rewrite = compute_rewrite_encaps (sl); + segment_list->rewrite_bsid = segment_list->rewrite; + } + else + { + segment_list->rewrite = compute_rewrite_insert (sl); + segment_list->rewrite_bsid = compute_rewrite_bsid (sl); + } + + /* Create DPO */ + dpo_reset (&segment_list->bsid_dpo); + dpo_reset (&segment_list->ip6_dpo); + dpo_reset (&segment_list->ip4_dpo); + + if (is_encap) + { + dpo_set (&segment_list->ip6_dpo, sr_pr_encaps_dpo_type, DPO_PROTO_IP6, + segment_list - sm->sid_lists); + dpo_set (&segment_list->ip4_dpo, sr_pr_encaps_dpo_type, DPO_PROTO_IP4, + segment_list - sm->sid_lists); + dpo_set (&segment_list->bsid_dpo, sr_pr_bsid_encaps_dpo_type, + DPO_PROTO_IP6, segment_list - sm->sid_lists); + } + else + { + dpo_set (&segment_list->ip6_dpo, sr_pr_insert_dpo_type, DPO_PROTO_IP6, + segment_list - sm->sid_lists); + dpo_set (&segment_list->bsid_dpo, sr_pr_bsid_insert_dpo_type, + DPO_PROTO_IP6, segment_list - sm->sid_lists); + } + + return segment_list; +} + +/** + * @brief Updates the Load Balancer after an SR Policy change + * + * @param sr_policy is the modified SR Policy + */ +static inline void +update_lb (ip6_sr_policy_t * sr_policy) +{ + flow_hash_config_t fhc; + u32 *sl_index; + ip6_sr_sl_t *segment_list; + ip6_sr_main_t *sm = &sr_main; + load_balance_path_t path; + path.path_index = FIB_NODE_INDEX_INVALID; + load_balance_path_t *ip4_path_vector = 0; + load_balance_path_t *ip6_path_vector = 0; + load_balance_path_t *b_path_vector = 0; + + /* In case LB does not exist, create it */ + if (!dpo_id_is_valid (&sr_policy->bsid_dpo)) + { + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + .fp_addr = { + .ip6 = sr_policy->bsid, + } + }; + + /* Add FIB entry for BSID */ + fhc = fib_table_get_flow_hash_config (sr_policy->fib_table, + dpo_proto_to_fib (DPO_PROTO_IP6)); + + dpo_set (&sr_policy->bsid_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP6, + load_balance_create (0, DPO_PROTO_IP6, fhc)); + + dpo_set (&sr_policy->ip6_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP6, + load_balance_create (0, DPO_PROTO_IP6, fhc)); + + /* Update FIB entry's to point to the LB DPO in the main FIB and hidden one */ + fib_table_entry_special_dpo_update (fib_table_find (FIB_PROTOCOL_IP6, + sr_policy->fib_table), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->bsid_dpo); + + fib_table_entry_special_dpo_update (sm->fib_table_ip6, + &pfx, + FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->ip6_dpo); + + if (sr_policy->is_encap) + { + dpo_set (&sr_policy->ip4_dpo, DPO_LOAD_BALANCE, DPO_PROTO_IP4, + load_balance_create (0, DPO_PROTO_IP4, fhc)); + + fib_table_entry_special_dpo_update (sm->fib_table_ip4, + &pfx, + FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->ip4_dpo); + } + + } + + /* Create the LB path vector */ + //path_vector = vec_new(load_balance_path_t, vec_len(sr_policy->segments_lists)); + vec_foreach (sl_index, sr_policy->segments_lists) + { + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + path.path_dpo = segment_list->bsid_dpo; + path.path_weight = segment_list->weight; + vec_add1 (b_path_vector, path); + path.path_dpo = segment_list->ip6_dpo; + vec_add1 (ip6_path_vector, path); + if (sr_policy->is_encap) + { + path.path_dpo = segment_list->ip4_dpo; + vec_add1 (ip4_path_vector, path); + } + } + + /* Update LB multipath */ + load_balance_multipath_update (&sr_policy->bsid_dpo, b_path_vector, + LOAD_BALANCE_FLAG_NONE); + load_balance_multipath_update (&sr_policy->ip6_dpo, ip6_path_vector, + LOAD_BALANCE_FLAG_NONE); + if (sr_policy->is_encap) + load_balance_multipath_update (&sr_policy->ip4_dpo, ip4_path_vector, + LOAD_BALANCE_FLAG_NONE); + + /* Cleanup */ + vec_free (b_path_vector); + vec_free (ip6_path_vector); + vec_free (ip4_path_vector); + +} + +/** + * @brief Updates the Replicate DPO after an SR Policy change + * + * @param sr_policy is the modified SR Policy (type spray) + */ +static inline void +update_replicate (ip6_sr_policy_t * sr_policy) +{ + u32 *sl_index; + ip6_sr_sl_t *segment_list; + ip6_sr_main_t *sm = &sr_main; + load_balance_path_t path; + path.path_index = FIB_NODE_INDEX_INVALID; + load_balance_path_t *b_path_vector = 0; + load_balance_path_t *ip6_path_vector = 0; + load_balance_path_t *ip4_path_vector = 0; + + /* In case LB does not exist, create it */ + if (!dpo_id_is_valid (&sr_policy->bsid_dpo)) + { + dpo_set (&sr_policy->bsid_dpo, DPO_REPLICATE, + DPO_PROTO_IP6, replicate_create (0, DPO_PROTO_IP6)); + + dpo_set (&sr_policy->ip6_dpo, DPO_REPLICATE, + DPO_PROTO_IP6, replicate_create (0, DPO_PROTO_IP6)); + + /* Update FIB entry's DPO to point to SR without LB */ + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + .fp_addr = { + .ip6 = sr_policy->bsid, + } + }; + fib_table_entry_special_dpo_update (fib_table_find (FIB_PROTOCOL_IP6, + sr_policy->fib_table), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->bsid_dpo); + + fib_table_entry_special_dpo_update (sm->fib_table_ip6, + &pfx, + FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->ip6_dpo); + + if (sr_policy->is_encap) + { + dpo_set (&sr_policy->ip4_dpo, DPO_REPLICATE, DPO_PROTO_IP4, + replicate_create (0, DPO_PROTO_IP4)); + + fib_table_entry_special_dpo_update (sm->fib_table_ip4, + &pfx, + FIB_SOURCE_SR, + FIB_ENTRY_FLAG_EXCLUSIVE, + &sr_policy->ip4_dpo); + } + + } + + /* Create the replicate path vector */ + path.path_weight = 1; + vec_foreach (sl_index, sr_policy->segments_lists) + { + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + path.path_dpo = segment_list->bsid_dpo; + vec_add1 (b_path_vector, path); + path.path_dpo = segment_list->ip6_dpo; + vec_add1 (ip6_path_vector, path); + if (sr_policy->is_encap) + { + path.path_dpo = segment_list->ip4_dpo; + vec_add1 (ip4_path_vector, path); + } + } + + /* Update replicate multipath */ + replicate_multipath_update (&sr_policy->bsid_dpo, b_path_vector); + replicate_multipath_update (&sr_policy->ip6_dpo, ip6_path_vector); + if (sr_policy->is_encap) + replicate_multipath_update (&sr_policy->ip4_dpo, ip4_path_vector); +} + +/******************************* SR rewrite API *******************************/ +/* Three functions for handling sr policies: + * -> sr_policy_add + * -> sr_policy_del + * -> sr_policy_mod + * All of them are API. CLI function on sr_policy_command_fn */ + +/** + * @brief Create a new SR policy + * + * @param bsid is the bindingSID of the SR Policy + * @param segments is a vector of IPv6 address composing the segment list + * @param weight is the weight of the sid list. optional. + * @param behavior is the behavior of the SR policy. (default//spray) + * @param fib_table is the VRF where to install the FIB entry for the BSID + * @param is_encap (bool) whether SR policy should behave as Encap/SRH Insertion + * + * @return 0 if correct, else error + */ +int +sr_policy_add (ip6_address_t * bsid, ip6_address_t * segments, + u32 weight, u8 behavior, u32 fib_table, u8 is_encap) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_sr_policy_t *sr_policy = 0; + uword *p; + + /* Search for existing keys (BSID) */ + p = mhash_get (&sm->sr_policies_index_hash, bsid); + if (p) + { + /* Add SR policy that already exists; complain */ + return -12; + } + + /* Search collision in FIB entries */ + /* Explanation: It might be possible that some other entity has already + * created a route for the BSID. This in theory is impossible, but in + * practise we could see it. Assert it and scream if needed */ + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + .fp_addr = { + .ip6 = *bsid, + } + }; + + /* Lookup the FIB index associated to the table selected */ + u32 fib_index = fib_table_find (FIB_PROTOCOL_IP6, + (fib_table != (u32) ~ 0 ? fib_table : 0)); + if (fib_index == ~0) + return -13; + + /* Lookup whether there exists an entry for the BSID */ + fib_node_index_t fei = fib_table_lookup_exact_match (fib_index, &pfx); + if (FIB_NODE_INDEX_INVALID != fei) + return -12; //There is an entry for such lookup + + /* Add an SR policy object */ + pool_get (sm->sr_policies, sr_policy); + memset (sr_policy, 0, sizeof (*sr_policy)); + clib_memcpy (&sr_policy->bsid, bsid, sizeof (ip6_address_t)); + sr_policy->type = behavior; + sr_policy->fib_table = (fib_table != (u32) ~ 0 ? fib_table : 0); //Is default FIB 0 ? + sr_policy->is_encap = is_encap; + + /* Copy the key */ + mhash_set (&sm->sr_policies_index_hash, bsid, sr_policy - sm->sr_policies, + NULL); + + /* Create a segment list and add the index to the SR policy */ + create_sl (sr_policy, segments, weight, is_encap); + + /* If FIB doesnt exist, create them */ + if (sm->fib_table_ip6 == (u32) ~ 0) + { + sm->fib_table_ip6 = fib_table_create_and_lock (FIB_PROTOCOL_IP6, + "SRv6 steering of IP6 prefixes through BSIDs"); + sm->fib_table_ip4 = fib_table_create_and_lock (FIB_PROTOCOL_IP6, + "SRv6 steering of IP4 prefixes through BSIDs"); + } + + /* Create IPv6 FIB for the BindingSID attached to the DPO of the only SL */ + if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) + update_lb (sr_policy); + else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) + update_replicate (sr_policy); + return 0; +} + +/** + * @brief Delete a SR policy + * + * @param bsid is the bindingSID of the SR Policy + * @param index is the index of the SR policy + * + * @return 0 if correct, else error + */ +int +sr_policy_del (ip6_address_t * bsid, u32 index) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_sr_policy_t *sr_policy = 0; + ip6_sr_sl_t *segment_list; + u32 *sl_index; + uword *p; + + if (bsid) + { + p = mhash_get (&sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -1; + } + else + { + sr_policy = pool_elt_at_index (sm->sr_policies, index); + if (!sr_policy) + return -1; + } + + /* Remove BindingSID FIB entry */ + fib_prefix_t pfx = { + .fp_proto = FIB_PROTOCOL_IP6, + .fp_len = 128, + .fp_addr = { + .ip6 = sr_policy->bsid, + } + , + }; + + fib_table_entry_special_remove (fib_table_find (FIB_PROTOCOL_IP6, + sr_policy->fib_table), + &pfx, FIB_SOURCE_SR); + + fib_table_entry_special_remove (sm->fib_table_ip6, &pfx, FIB_SOURCE_SR); + + if (sr_policy->is_encap) + fib_table_entry_special_remove (sm->fib_table_ip4, &pfx, FIB_SOURCE_SR); + + if (dpo_id_is_valid (&sr_policy->bsid_dpo)) + { + dpo_reset (&sr_policy->bsid_dpo); + dpo_reset (&sr_policy->ip4_dpo); + dpo_reset (&sr_policy->ip6_dpo); + } + + /* Clean SID Lists */ + vec_foreach (sl_index, sr_policy->segments_lists) + { + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + vec_free (segment_list->segments); + vec_free (segment_list->rewrite); + vec_free (segment_list->rewrite_bsid); + pool_put_index (sm->sid_lists, *sl_index); + } + + /* Remove SR policy entry */ + mhash_unset (&sm->sr_policies_index_hash, &sr_policy->bsid, NULL); + pool_put (sm->sr_policies, sr_policy); + + /* If FIB empty unlock it */ + if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) + { + fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); + fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); + sm->fib_table_ip6 = (u32) ~ 0; + sm->fib_table_ip4 = (u32) ~ 0; + } + + return 0; +} + +/** + * @brief Modify an existing SR policy + * + * The possible modifications are adding a new Segment List, modifying an + * existing Segment List (modify the weight only) and delete a given + * Segment List from the SR Policy. + * + * @param bsid is the bindingSID of the SR Policy + * @param index is the index of the SR policy + * @param fib_table is the VRF where to install the FIB entry for the BSID + * @param operation is the operation to perform (among the top ones) + * @param segments is a vector of IPv6 address composing the segment list + * @param sl_index is the index of the Segment List to modify/delete + * @param weight is the weight of the sid list. optional. + * @param is_encap Mode. Encapsulation or SRH insertion. + * + * @return 0 if correct, else error + */ +int +sr_policy_mod (ip6_address_t * bsid, u32 index, u32 fib_table, + u8 operation, ip6_address_t * segments, u32 sl_index, + u32 weight) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_sr_policy_t *sr_policy = 0; + ip6_sr_sl_t *segment_list; + u32 *sl_index_iterate; + uword *p; + + if (bsid) + { + p = mhash_get (&sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -1; + } + else + { + sr_policy = pool_elt_at_index (sm->sr_policies, index); + if (!sr_policy) + return -1; + } + + if (operation == 1) /* Add SR List to an existing SR policy */ + { + /* Create the new SL */ + segment_list = + create_sl (sr_policy, segments, weight, sr_policy->is_encap); + + /* Create a new LB DPO */ + if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) + update_lb (sr_policy); + else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) + update_replicate (sr_policy); + } + else if (operation == 2) /* Delete SR List from an existing SR policy */ + { + /* Check that currently there are more than one SID list */ + if (vec_len (sr_policy->segments_lists) == 1) + return -21; + + /* Check that the SR list does exist and is assigned to the sr policy */ + vec_foreach (sl_index_iterate, sr_policy->segments_lists) + if (*sl_index_iterate == sl_index) + break; + + if (*sl_index_iterate != sl_index) + return -22; + + /* Remove the lucky SR list that is being kicked out */ + segment_list = pool_elt_at_index (sm->sid_lists, sl_index); + vec_free (segment_list->segments); + vec_free (segment_list->rewrite); + vec_free (segment_list->rewrite_bsid); + pool_put_index (sm->sid_lists, sl_index); + vec_del1 (sr_policy->segments_lists, + sl_index_iterate - sr_policy->segments_lists); + + /* Create a new LB DPO */ + if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) + update_lb (sr_policy); + else if (sr_policy->type == SR_POLICY_TYPE_SPRAY) + update_replicate (sr_policy); + } + else if (operation == 3) /* Modify the weight of an existing SR List */ + { + /* Find the corresponding SL */ + vec_foreach (sl_index_iterate, sr_policy->segments_lists) + if (*sl_index_iterate == sl_index) + break; + + if (*sl_index_iterate != sl_index) + return -32; + + /* Change the weight */ + segment_list = pool_elt_at_index (sm->sid_lists, sl_index); + segment_list->weight = weight; + + /* Update LB */ + if (sr_policy->type == SR_POLICY_TYPE_DEFAULT) + update_lb (sr_policy); + } + else /* Incorrect op. */ + return -1; + + return 0; +} + +/** + * @brief CLI for 'sr policies' command family + */ +static clib_error_t * +sr_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + int rv = -1; + char is_del = 0, is_add = 0, is_mod = 0; + char policy_set = 0; + ip6_address_t bsid, next_address; + u32 sr_policy_index = (u32) ~ 0, sl_index = (u32) ~ 0; + u32 weight = (u32) ~ 0, fib_table = (u32) ~ 0; + ip6_address_t *segments = 0, *this_seg; + u8 operation = 0; + char is_encap = 1; + char is_spray = 0; + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (!is_add && !is_mod && !is_del && unformat (input, "add")) + is_add = 1; + else if (!is_add && !is_mod && !is_del && unformat (input, "del")) + is_del = 1; + else if (!is_add && !is_mod && !is_del && unformat (input, "mod")) + is_mod = 1; + else if (!policy_set + && unformat (input, "bsid %U", unformat_ip6_address, &bsid)) + policy_set = 1; + else if (!is_add && !policy_set + && unformat (input, "index %d", &sr_policy_index)) + policy_set = 1; + else if (unformat (input, "weight %d", &weight)); + else + if (unformat (input, "next %U", unformat_ip6_address, &next_address)) + { + vec_add2 (segments, this_seg, 1); + clib_memcpy (this_seg->as_u8, next_address.as_u8, + sizeof (*this_seg)); + } + else if (unformat (input, "add sl")) + operation = 1; + else if (unformat (input, "del sl index %d", &sl_index)) + operation = 2; + else if (unformat (input, "mod sl index %d", &sl_index)) + operation = 3; + else if (fib_table == (u32) ~ 0 + && unformat (input, "fib-table %d", &fib_table)); + else if (unformat (input, "encap")) + is_encap = 1; + else if (unformat (input, "insert")) + is_encap = 0; + else if (unformat (input, "spray")) + is_spray = 1; + else + break; + } + + if (!is_add && !is_mod && !is_del) + return clib_error_return (0, "Incorrect CLI"); + + if (!policy_set) + return clib_error_return (0, "No SR policy BSID or index specified"); + + if (is_add) + { + if (vec_len (segments) == 0) + return clib_error_return (0, "No Segment List specified"); + rv = sr_policy_add (&bsid, segments, weight, + (is_spray ? SR_POLICY_TYPE_SPRAY : + SR_POLICY_TYPE_DEFAULT), fib_table, is_encap); + } + else if (is_del) + rv = sr_policy_del ((sr_policy_index != (u32) ~ 0 ? NULL : &bsid), + sr_policy_index); + else if (is_mod) + { + if (!operation) + return clib_error_return (0, "No SL modification specified"); + if (operation != 1 && sl_index == (u32) ~ 0) + return clib_error_return (0, "No Segment List index specified"); + if (operation == 1 && vec_len (segments) == 0) + return clib_error_return (0, "No Segment List specified"); + if (operation == 3 && weight == (u32) ~ 0) + return clib_error_return (0, "No new weight for the SL specified"); + rv = sr_policy_mod ((sr_policy_index != (u32) ~ 0 ? NULL : &bsid), + sr_policy_index, fib_table, operation, segments, + sl_index, weight); + } + + switch (rv) + { + case 0: + break; + case 1: + return 0; + case -12: + return clib_error_return (0, + "There is already a FIB entry for the BindingSID address.\n" + "The SR policy could not be created."); + case -13: + return clib_error_return (0, "The specified FIB table does not exist."); + case -21: + return clib_error_return (0, + "The selected SR policy only contains ONE segment list. " + "Please remove the SR policy instead"); + case -22: + return clib_error_return (0, + "Could not delete the segment list. " + "It is not associated with that SR policy."); + case -32: + return clib_error_return (0, + "Could not modify the segment list. " + "The given SL is not associated with such SR policy."); + default: + return clib_error_return (0, "BUG: sr policy returns %d", rv); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (sr_policy_command, static) = { + .path = "sr policy", + .short_help = "sr policy [add||del||mod] [bsid 2001::1||index 5] " + "next A:: next B:: next C:: (weight 1) (fib-table 2) (encap|insert)", + .long_help = + "Manipulation of SR policies.\n" + "A Segment Routing policy may contain several SID lists. Each SID list has\n" + "an associated weight (default 1), which will result in wECMP (uECMP).\n" + "Segment Routing policies might be of type encapsulation or srh insertion\n" + "Each SR policy will be associated with a unique BindingSID.\n" + "A BindingSID is a locally allocated SegmentID. For every packet that arrives\n" + "with IPv6_DA:BSID such traffic will be steered into the SR policy.\n" + "The add command will create a SR policy with its first segment list (sl)\n" + "The mod command allows you to add, remove, or modify the existing segment lists\n" + "within an SR policy.\n" + "The del command allows you to delete a SR policy along with all its associated\n" + "SID lists.\n", + .function = sr_policy_command_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief CLI to display onscreen all the SR policies + */ +static clib_error_t * +show_sr_policies_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + ip6_sr_main_t *sm = &sr_main; + u32 *sl_index; + ip6_sr_sl_t *segment_list = 0; + ip6_sr_policy_t *sr_policy = 0; + ip6_sr_policy_t **vec_policies = 0; + ip6_address_t *addr; + u8 *s; + int i = 0; + + vlib_cli_output (vm, "SR policies:"); + + /* *INDENT-OFF* */ + pool_foreach (sr_policy, sm->sr_policies, + {vec_add1 (vec_policies, sr_policy); } ); + /* *INDENT-ON* */ + + vec_foreach_index (i, vec_policies) + { + sr_policy = vec_policies[i]; + vlib_cli_output (vm, "[%u].-\tBSID: %U", + (u32) (sr_policy - sm->sr_policies), + format_ip6_address, &sr_policy->bsid); + vlib_cli_output (vm, "\tBehavior: %s", + (sr_policy->is_encap ? "Encapsulation" : + "SRH insertion")); + vlib_cli_output (vm, "\tType: %s", + (sr_policy->type == + SR_POLICY_TYPE_DEFAULT ? "Default" : "Spray")); + vlib_cli_output (vm, "\tFIB table: %u", + (sr_policy->fib_table != + (u32) ~ 0 ? sr_policy->fib_table : 0)); + vlib_cli_output (vm, "\tSegment Lists:"); + vec_foreach (sl_index, sr_policy->segments_lists) + { + s = NULL; + s = format (s, "\t[%u].- ", *sl_index); + segment_list = pool_elt_at_index (sm->sid_lists, *sl_index); + s = format (s, "< "); + vec_foreach (addr, segment_list->segments) + { + s = format (s, "%U, ", format_ip6_address, addr); + } + s = format (s, "\b\b > "); + s = format (s, "weight: %u", segment_list->weight); + vlib_cli_output (vm, " %s", s); + } + vlib_cli_output (vm, "-----------"); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_policies_command, static) = { + .path = "show sr policies", + .short_help = "show sr policies", + .function = show_sr_policies_command_fn, +}; +/* *INDENT-ON* */ + +/*************************** SR rewrite graph node ****************************/ +/** + * @brief Trace for the SR Policy Rewrite graph node + */ +static u8 * +format_sr_policy_rewrite_trace (u8 * s, va_list * args) +{ + //TODO + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + sr_policy_rewrite_trace_t *t = va_arg (*args, sr_policy_rewrite_trace_t *); + + s = format + (s, "SR-policy-rewrite: src %U dst %U", + format_ip6_address, &t->src, format_ip6_address, &t->dst); + + return s; +} + +/** + * @brief IPv6 encapsulation processing as per RFC2473 + */ +static_always_inline void +encaps_processing_v6 (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, ip6_header_t * ip0_encap) +{ + u32 new_l0; + + ip0_encap->hop_limit -= 1; + new_l0 = + ip0->payload_length + sizeof (ip6_header_t) + + clib_net_to_host_u16 (ip0_encap->payload_length); + ip0->payload_length = clib_host_to_net_u16 (new_l0); + ip0->ip_version_traffic_class_and_flow_label = + ip0_encap->ip_version_traffic_class_and_flow_label; +} + +/** + * @brief Graph node for applying a SR policy into an IPv6 packet. Encapsulation + */ +static uword +sr_policy_rewrite_encaps (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int encap_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + ip1_encap = vlib_buffer_get_current (b1); + ip2_encap = vlib_buffer_get_current (b2); + ip3_encap = vlib_buffer_get_current (b3); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), + sl1->rewrite, vec_len (sl1->rewrite)); + clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), + sl2->rewrite, vec_len (sl2->rewrite)); + clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), + sl3->rewrite, vec_len (sl3->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + encaps_processing_v6 (node, b0, ip0, ip0_encap); + encaps_processing_v6 (node, b1, ip1, ip1_encap); + encaps_processing_v6 (node, b2, ip2, ip2_encap); + encaps_processing_v6 (node, b3, ip3, ip3_encap); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + encap_pkts += 4; + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0, *ip0_encap = 0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + b0 = vlib_get_buffer (vm, bi0); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + + encaps_processing_v6 (node, b0, ip0, ip0_encap); + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + encap_pkts++; + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + encap_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_node) = { + .function = sr_policy_rewrite_encaps, + .name = "sr-pl-rewrite-encaps", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/** + * @brief IPv4 encapsulation processing as per RFC2473 + */ +static_always_inline void +encaps_processing_v4 (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, ip4_header_t * ip0_encap) +{ + u32 new_l0; + ip6_sr_header_t *sr0; + + u32 checksum0; + + /* Inner IPv4: Decrement TTL & update checksum */ + ip0_encap->ttl -= 1; + checksum0 = ip0_encap->checksum + clib_host_to_net_u16 (0x0100); + checksum0 += checksum0 >= 0xffff; + ip0_encap->checksum = checksum0; + + /* Outer IPv6: Update length, FL, proto */ + new_l0 = ip0->payload_length + clib_net_to_host_u16 (ip0_encap->length); + ip0->payload_length = clib_host_to_net_u16 (new_l0); + ip0->ip_version_traffic_class_and_flow_label = + clib_host_to_net_u32 (0 | ((6 & 0xF) << 28) | + ((ip0_encap->tos & 0xFF) << 20)); + sr0 = (void *) (ip0 + 1); + sr0->protocol = IP_PROTOCOL_IP_IN_IP; +} + +/** + * @brief Graph node for applying a SR policy into an IPv4 packet. Encapsulation + */ +static uword +sr_policy_rewrite_encaps_v4 (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int encap_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip4_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + ip1_encap = vlib_buffer_get_current (b1); + ip2_encap = vlib_buffer_get_current (b2); + ip3_encap = vlib_buffer_get_current (b3); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), + sl1->rewrite, vec_len (sl1->rewrite)); + clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), + sl2->rewrite, vec_len (sl2->rewrite)); + clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), + sl3->rewrite, vec_len (sl3->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + encaps_processing_v4 (node, b0, ip0, ip0_encap); + encaps_processing_v4 (node, b1, ip1, ip1_encap); + encaps_processing_v4 (node, b2, ip2, ip2_encap); + encaps_processing_v4 (node, b3, ip3, ip3_encap); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + encap_pkts += 4; + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0; + ip4_header_t *ip0_encap = 0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + b0 = vlib_get_buffer (vm, bi0); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + + encaps_processing_v4 (node, b0, ip0, ip0_encap); + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + encap_pkts++; + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + encap_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_v4_node) = { + .function = sr_policy_rewrite_encaps_v4, + .name = "sr-pl-rewrite-encaps-v4", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +always_inline u32 +ip_flow_hash (void *data) +{ + ip4_header_t *iph = (ip4_header_t *) data; + + if ((iph->ip_version_and_header_length & 0xF0) == 0x40) + return ip4_compute_flow_hash (iph, IP_FLOW_HASH_DEFAULT); + else + return ip6_compute_flow_hash ((ip6_header_t *) iph, IP_FLOW_HASH_DEFAULT); +} + +always_inline u64 +mac_to_u64 (u8 * m) +{ + return (*((u64 *) m) & 0xffffffffffff); +} + +always_inline u32 +l2_flow_hash (vlib_buffer_t * b0) +{ + ethernet_header_t *eh; + u64 a, b, c; + uword is_ip, eh_size; + u16 eh_type; + + eh = vlib_buffer_get_current (b0); + eh_type = clib_net_to_host_u16 (eh->type); + eh_size = ethernet_buffer_header_size (b0); + + is_ip = (eh_type == ETHERNET_TYPE_IP4 || eh_type == ETHERNET_TYPE_IP6); + + /* since we have 2 cache lines, use them */ + if (is_ip) + a = ip_flow_hash ((u8 *) vlib_buffer_get_current (b0) + eh_size); + else + a = eh->type; + + b = mac_to_u64 ((u8 *) eh->dst_address); + c = mac_to_u64 ((u8 *) eh->src_address); + hash_mix64 (a, b, c); + + return (u32) c; +} + +/** + * @brief Graph node for applying a SR policy into a L2 frame + */ +static uword +sr_policy_rewrite_encaps_l2 (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int encap_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ethernet_header_t *en0, *en1, *en2, *en3; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + ip6_sr_policy_t *sp0, *sp1, *sp2, *sp3; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sp0 = pool_elt_at_index (sm->sr_policies, + sm->sw_iface_sr_policies[vnet_buffer + (b0)->sw_if_index + [VLIB_RX]]); + + sp1 = pool_elt_at_index (sm->sr_policies, + sm->sw_iface_sr_policies[vnet_buffer + (b1)->sw_if_index + [VLIB_RX]]); + + sp2 = pool_elt_at_index (sm->sr_policies, + sm->sw_iface_sr_policies[vnet_buffer + (b2)->sw_if_index + [VLIB_RX]]); + + sp3 = pool_elt_at_index (sm->sr_policies, + sm->sw_iface_sr_policies[vnet_buffer + (b3)->sw_if_index + [VLIB_RX]]); + + if (vec_len (sp0->segments_lists) == 1) + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = sp0->segments_lists[0]; + else + { + vnet_buffer (b0)->ip.flow_hash = l2_flow_hash (b0); + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = + sp0->segments_lists[(vnet_buffer (b0)->ip.flow_hash & + (vec_len (sp0->segments_lists) - 1))]; + } + + if (vec_len (sp1->segments_lists) == 1) + vnet_buffer (b1)->ip.adj_index[VLIB_TX] = sp1->segments_lists[1]; + else + { + vnet_buffer (b1)->ip.flow_hash = l2_flow_hash (b1); + vnet_buffer (b1)->ip.adj_index[VLIB_TX] = + sp1->segments_lists[(vnet_buffer (b1)->ip.flow_hash & + (vec_len (sp1->segments_lists) - 1))]; + } + + if (vec_len (sp2->segments_lists) == 1) + vnet_buffer (b2)->ip.adj_index[VLIB_TX] = sp2->segments_lists[2]; + else + { + vnet_buffer (b2)->ip.flow_hash = l2_flow_hash (b2); + vnet_buffer (b2)->ip.adj_index[VLIB_TX] = + sp2->segments_lists[(vnet_buffer (b2)->ip.flow_hash & + (vec_len (sp2->segments_lists) - 1))]; + } + + if (vec_len (sp3->segments_lists) == 1) + vnet_buffer (b3)->ip.adj_index[VLIB_TX] = sp3->segments_lists[3]; + else + { + vnet_buffer (b3)->ip.flow_hash = l2_flow_hash (b3); + vnet_buffer (b3)->ip.adj_index[VLIB_TX] = + sp3->segments_lists[(vnet_buffer (b3)->ip.flow_hash & + (vec_len (sp3->segments_lists) - 1))]; + } + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite)); + + en0 = vlib_buffer_get_current (b0); + en1 = vlib_buffer_get_current (b1); + en2 = vlib_buffer_get_current (b2); + en3 = vlib_buffer_get_current (b3); + + clib_memcpy (((u8 *) en0) - vec_len (sl0->rewrite), sl0->rewrite, + vec_len (sl0->rewrite)); + clib_memcpy (((u8 *) en1) - vec_len (sl1->rewrite), sl1->rewrite, + vec_len (sl1->rewrite)); + clib_memcpy (((u8 *) en2) - vec_len (sl2->rewrite), sl2->rewrite, + vec_len (sl2->rewrite)); + clib_memcpy (((u8 *) en3) - vec_len (sl3->rewrite), sl3->rewrite, + vec_len (sl3->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + ip0->payload_length = + clib_host_to_net_u16 (b0->current_length - sizeof (ip6_header_t)); + ip1->payload_length = + clib_host_to_net_u16 (b1->current_length - sizeof (ip6_header_t)); + ip2->payload_length = + clib_host_to_net_u16 (b2->current_length - sizeof (ip6_header_t)); + ip3->payload_length = + clib_host_to_net_u16 (b3->current_length - sizeof (ip6_header_t)); + + sr0 = (void *) (ip0 + 1); + sr1 = (void *) (ip1 + 1); + sr2 = (void *) (ip2 + 1); + sr3 = (void *) (ip3 + 1); + + sr0->protocol = sr1->protocol = sr2->protocol = sr3->protocol = + IP_PROTOCOL_IP6_NONXT; + + /* Which Traffic class and flow label do I set ? */ + //ip0->ip_version_traffic_class_and_flow_label = clib_host_to_net_u32(0|((6&0xF)<<28)|((ip0_encap->tos&0xFF)<<20)); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + encap_pkts += 4; + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0; + ip6_sr_header_t *sr0; + ethernet_header_t *en0; + ip6_sr_policy_t *sp0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + b0 = vlib_get_buffer (vm, bi0); + + /* Find the SR policy */ + sp0 = pool_elt_at_index (sm->sr_policies, + sm->sw_iface_sr_policies[vnet_buffer + (b0)->sw_if_index + [VLIB_RX]]); + + /* In case there is more than one SL, LB among them */ + if (vec_len (sp0->segments_lists) == 1) + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = sp0->segments_lists[0]; + else + { + vnet_buffer (b0)->ip.flow_hash = l2_flow_hash (b0); + vnet_buffer (b0)->ip.adj_index[VLIB_TX] = + sp0->segments_lists[(vnet_buffer (b0)->ip.flow_hash & + (vec_len (sp0->segments_lists) - 1))]; + } + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + + en0 = vlib_buffer_get_current (b0); + + clib_memcpy (((u8 *) en0) - vec_len (sl0->rewrite), sl0->rewrite, + vec_len (sl0->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + + ip0->payload_length = + clib_host_to_net_u16 (b0->current_length - sizeof (ip6_header_t)); + + sr0 = (void *) (ip0 + 1); + sr0->protocol = IP_PROTOCOL_IP6_NONXT; + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + encap_pkts++; + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + encap_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_encaps_l2_node) = { + .function = sr_policy_rewrite_encaps_l2, + .name = "sr-pl-rewrite-encaps-l2", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/** + * @brief Graph node for applying a SR policy into a packet. SRH insertion. + */ +static uword +sr_policy_rewrite_insert (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int insert_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + u16 new_l0, new_l1, new_l2, new_l3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr0 = + (ip6_sr_header_t *) (((void *) (ip0 + 1)) + + ip6_ext_header_len (ip0 + 1)); + else + sr0 = (ip6_sr_header_t *) (ip0 + 1); + + if (ip1->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr1 = + (ip6_sr_header_t *) (((void *) (ip1 + 1)) + + ip6_ext_header_len (ip1 + 1)); + else + sr1 = (ip6_sr_header_t *) (ip1 + 1); + + if (ip2->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr2 = + (ip6_sr_header_t *) (((void *) (ip2 + 1)) + + ip6_ext_header_len (ip2 + 1)); + else + sr2 = (ip6_sr_header_t *) (ip2 + 1); + + if (ip3->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr3 = + (ip6_sr_header_t *) (((void *) (ip3 + 1)) + + ip6_ext_header_len (ip3 + 1)); + else + sr3 = (ip6_sr_header_t *) (ip3 + 1); + + clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite), (u8 *) ip0, + (void *) sr0 - (void *) ip0); + clib_memcpy ((u8 *) ip1 - vec_len (sl1->rewrite), (u8 *) ip1, + (void *) sr1 - (void *) ip1); + clib_memcpy ((u8 *) ip2 - vec_len (sl2->rewrite), (u8 *) ip2, + (void *) sr2 - (void *) ip2); + clib_memcpy ((u8 *) ip3 - vec_len (sl3->rewrite), (u8 *) ip3, + (void *) sr3 - (void *) ip3); + + clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite)), sl0->rewrite, + vec_len (sl0->rewrite)); + clib_memcpy (((u8 *) sr1 - vec_len (sl1->rewrite)), sl1->rewrite, + vec_len (sl1->rewrite)); + clib_memcpy (((u8 *) sr2 - vec_len (sl2->rewrite)), sl2->rewrite, + vec_len (sl2->rewrite)); + clib_memcpy (((u8 *) sr3 - vec_len (sl3->rewrite)), sl3->rewrite, + vec_len (sl3->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); + + ip0 = ((void *) ip0) - vec_len (sl0->rewrite); + ip1 = ((void *) ip1) - vec_len (sl1->rewrite); + ip2 = ((void *) ip2) - vec_len (sl2->rewrite); + ip3 = ((void *) ip3) - vec_len (sl3->rewrite); + + ip0->hop_limit -= 1; + ip1->hop_limit -= 1; + ip2->hop_limit -= 1; + ip3->hop_limit -= 1; + + new_l0 = + clib_net_to_host_u16 (ip0->payload_length) + + vec_len (sl0->rewrite); + new_l1 = + clib_net_to_host_u16 (ip1->payload_length) + + vec_len (sl1->rewrite); + new_l2 = + clib_net_to_host_u16 (ip2->payload_length) + + vec_len (sl2->rewrite); + new_l3 = + clib_net_to_host_u16 (ip3->payload_length) + + vec_len (sl3->rewrite); + + ip0->payload_length = clib_host_to_net_u16 (new_l0); + ip1->payload_length = clib_host_to_net_u16 (new_l1); + ip2->payload_length = clib_host_to_net_u16 (new_l2); + ip3->payload_length = clib_host_to_net_u16 (new_l3); + + sr0 = ((void *) sr0) - vec_len (sl0->rewrite); + sr1 = ((void *) sr1) - vec_len (sl1->rewrite); + sr2 = ((void *) sr2) - vec_len (sl2->rewrite); + sr3 = ((void *) sr3) - vec_len (sl3->rewrite); + + sr0->segments->as_u64[0] = ip0->dst_address.as_u64[0]; + sr0->segments->as_u64[1] = ip0->dst_address.as_u64[1]; + sr1->segments->as_u64[0] = ip1->dst_address.as_u64[0]; + sr1->segments->as_u64[1] = ip1->dst_address.as_u64[1]; + sr2->segments->as_u64[0] = ip2->dst_address.as_u64[0]; + sr2->segments->as_u64[1] = ip2->dst_address.as_u64[1]; + sr3->segments->as_u64[0] = ip3->dst_address.as_u64[0]; + sr3->segments->as_u64[1] = ip3->dst_address.as_u64[1]; + + ip0->dst_address.as_u64[0] = + (sr0->segments + sr0->segments_left)->as_u64[0]; + ip0->dst_address.as_u64[1] = + (sr0->segments + sr0->segments_left)->as_u64[1]; + ip1->dst_address.as_u64[0] = + (sr1->segments + sr1->segments_left)->as_u64[0]; + ip1->dst_address.as_u64[1] = + (sr1->segments + sr1->segments_left)->as_u64[1]; + ip2->dst_address.as_u64[0] = + (sr2->segments + sr2->segments_left)->as_u64[0]; + ip2->dst_address.as_u64[1] = + (sr2->segments + sr2->segments_left)->as_u64[1]; + ip3->dst_address.as_u64[0] = + (sr3->segments + sr3->segments_left)->as_u64[0]; + ip3->dst_address.as_u64[1] = + (sr3->segments + sr3->segments_left)->as_u64[1]; + + ip6_ext_header_t *ip_ext; + if (ip0 + 1 == (void *) sr0) + { + sr0->protocol = ip0->protocol; + ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip0 + 1); + sr0->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip1 + 1 == (void *) sr1) + { + sr1->protocol = ip1->protocol; + ip1->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip2 + 1); + sr2->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip2 + 1 == (void *) sr2) + { + sr2->protocol = ip2->protocol; + ip2->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip2 + 1); + sr2->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip3 + 1 == (void *) sr3) + { + sr3->protocol = ip3->protocol; + ip3->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip3 + 1); + sr3->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + insert_pkts += 4; + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0; + ip6_sr_header_t *sr0 = 0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + u16 new_l0 = 0; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + + if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr0 = + (ip6_sr_header_t *) (((void *) (ip0 + 1)) + + ip6_ext_header_len (ip0 + 1)); + else + sr0 = (ip6_sr_header_t *) (ip0 + 1); + + clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite), (u8 *) ip0, + (void *) sr0 - (void *) ip0); + clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite)), sl0->rewrite, + vec_len (sl0->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + + ip0 = ((void *) ip0) - vec_len (sl0->rewrite); + ip0->hop_limit -= 1; + new_l0 = + clib_net_to_host_u16 (ip0->payload_length) + + vec_len (sl0->rewrite); + ip0->payload_length = clib_host_to_net_u16 (new_l0); + + sr0 = ((void *) sr0) - vec_len (sl0->rewrite); + sr0->segments->as_u64[0] = ip0->dst_address.as_u64[0]; + sr0->segments->as_u64[1] = ip0->dst_address.as_u64[1]; + + ip0->dst_address.as_u64[0] = + (sr0->segments + sr0->segments_left)->as_u64[0]; + ip0->dst_address.as_u64[1] = + (sr0->segments + sr0->segments_left)->as_u64[1]; + + if (ip0 + 1 == (void *) sr0) + { + sr0->protocol = ip0->protocol; + ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip6_ext_header_t *ip_ext = (void *) (ip0 + 1); + sr0->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + insert_pkts++; + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + insert_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_insert_node) = { + .function = sr_policy_rewrite_insert, + .name = "sr-pl-rewrite-insert", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/** + * @brief Graph node for applying a SR policy into a packet. BSID - SRH insertion. + */ +static uword +sr_policy_rewrite_b_insert (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int insert_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + u16 new_l0, new_l1, new_l2, new_l3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite_bsid)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite_bsid)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite_bsid)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite_bsid)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr0 = + (ip6_sr_header_t *) (((void *) (ip0 + 1)) + + ip6_ext_header_len (ip0 + 1)); + else + sr0 = (ip6_sr_header_t *) (ip0 + 1); + + if (ip1->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr1 = + (ip6_sr_header_t *) (((void *) (ip1 + 1)) + + ip6_ext_header_len (ip1 + 1)); + else + sr1 = (ip6_sr_header_t *) (ip1 + 1); + + if (ip2->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr2 = + (ip6_sr_header_t *) (((void *) (ip2 + 1)) + + ip6_ext_header_len (ip2 + 1)); + else + sr2 = (ip6_sr_header_t *) (ip2 + 1); + + if (ip3->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr3 = + (ip6_sr_header_t *) (((void *) (ip3 + 1)) + + ip6_ext_header_len (ip3 + 1)); + else + sr3 = (ip6_sr_header_t *) (ip3 + 1); + + clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite_bsid), (u8 *) ip0, + (void *) sr0 - (void *) ip0); + clib_memcpy ((u8 *) ip1 - vec_len (sl1->rewrite_bsid), (u8 *) ip1, + (void *) sr1 - (void *) ip1); + clib_memcpy ((u8 *) ip2 - vec_len (sl2->rewrite_bsid), (u8 *) ip2, + (void *) sr2 - (void *) ip2); + clib_memcpy ((u8 *) ip3 - vec_len (sl3->rewrite_bsid), (u8 *) ip3, + (void *) sr3 - (void *) ip3); + + clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite_bsid)), + sl0->rewrite_bsid, vec_len (sl0->rewrite_bsid)); + clib_memcpy (((u8 *) sr1 - vec_len (sl1->rewrite_bsid)), + sl1->rewrite_bsid, vec_len (sl1->rewrite_bsid)); + clib_memcpy (((u8 *) sr2 - vec_len (sl2->rewrite_bsid)), + sl2->rewrite_bsid, vec_len (sl2->rewrite_bsid)); + clib_memcpy (((u8 *) sr3 - vec_len (sl3->rewrite_bsid)), + sl3->rewrite_bsid, vec_len (sl3->rewrite_bsid)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite_bsid)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite_bsid)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite_bsid)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite_bsid)); + + ip0 = ((void *) ip0) - vec_len (sl0->rewrite_bsid); + ip1 = ((void *) ip1) - vec_len (sl1->rewrite_bsid); + ip2 = ((void *) ip2) - vec_len (sl2->rewrite_bsid); + ip3 = ((void *) ip3) - vec_len (sl3->rewrite_bsid); + + ip0->hop_limit -= 1; + ip1->hop_limit -= 1; + ip2->hop_limit -= 1; + ip3->hop_limit -= 1; + + new_l0 = + clib_net_to_host_u16 (ip0->payload_length) + + vec_len (sl0->rewrite_bsid); + new_l1 = + clib_net_to_host_u16 (ip1->payload_length) + + vec_len (sl1->rewrite_bsid); + new_l2 = + clib_net_to_host_u16 (ip2->payload_length) + + vec_len (sl2->rewrite_bsid); + new_l3 = + clib_net_to_host_u16 (ip3->payload_length) + + vec_len (sl3->rewrite_bsid); + + ip0->payload_length = clib_host_to_net_u16 (new_l0); + ip1->payload_length = clib_host_to_net_u16 (new_l1); + ip2->payload_length = clib_host_to_net_u16 (new_l2); + ip3->payload_length = clib_host_to_net_u16 (new_l3); + + sr0 = ((void *) sr0) - vec_len (sl0->rewrite_bsid); + sr1 = ((void *) sr1) - vec_len (sl1->rewrite_bsid); + sr2 = ((void *) sr2) - vec_len (sl2->rewrite_bsid); + sr3 = ((void *) sr3) - vec_len (sl3->rewrite_bsid); + + ip0->dst_address.as_u64[0] = + (sr0->segments + sr0->segments_left)->as_u64[0]; + ip0->dst_address.as_u64[1] = + (sr0->segments + sr0->segments_left)->as_u64[1]; + ip1->dst_address.as_u64[0] = + (sr1->segments + sr1->segments_left)->as_u64[0]; + ip1->dst_address.as_u64[1] = + (sr1->segments + sr1->segments_left)->as_u64[1]; + ip2->dst_address.as_u64[0] = + (sr2->segments + sr2->segments_left)->as_u64[0]; + ip2->dst_address.as_u64[1] = + (sr2->segments + sr2->segments_left)->as_u64[1]; + ip3->dst_address.as_u64[0] = + (sr3->segments + sr3->segments_left)->as_u64[0]; + ip3->dst_address.as_u64[1] = + (sr3->segments + sr3->segments_left)->as_u64[1]; + + ip6_ext_header_t *ip_ext; + if (ip0 + 1 == (void *) sr0) + { + sr0->protocol = ip0->protocol; + ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip0 + 1); + sr0->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip1 + 1 == (void *) sr1) + { + sr1->protocol = ip1->protocol; + ip1->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip2 + 1); + sr2->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip2 + 1 == (void *) sr2) + { + sr2->protocol = ip2->protocol; + ip2->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip2 + 1); + sr2->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (ip3 + 1 == (void *) sr3) + { + sr3->protocol = ip3->protocol; + ip3->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip_ext = (void *) (ip3 + 1); + sr3->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + insert_pkts += 4; + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0; + ip6_sr_header_t *sr0 = 0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + u16 new_l0 = 0; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite_bsid)); + + ip0 = vlib_buffer_get_current (b0); + + if (ip0->protocol == IP_PROTOCOL_IP6_HOP_BY_HOP_OPTIONS) + sr0 = + (ip6_sr_header_t *) (((void *) (ip0 + 1)) + + ip6_ext_header_len (ip0 + 1)); + else + sr0 = (ip6_sr_header_t *) (ip0 + 1); + + clib_memcpy ((u8 *) ip0 - vec_len (sl0->rewrite_bsid), (u8 *) ip0, + (void *) sr0 - (void *) ip0); + clib_memcpy (((u8 *) sr0 - vec_len (sl0->rewrite_bsid)), + sl0->rewrite_bsid, vec_len (sl0->rewrite_bsid)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite_bsid)); + + ip0 = ((void *) ip0) - vec_len (sl0->rewrite_bsid); + ip0->hop_limit -= 1; + new_l0 = + clib_net_to_host_u16 (ip0->payload_length) + + vec_len (sl0->rewrite_bsid); + ip0->payload_length = clib_host_to_net_u16 (new_l0); + + sr0 = ((void *) sr0) - vec_len (sl0->rewrite_bsid); + + ip0->dst_address.as_u64[0] = + (sr0->segments + sr0->segments_left)->as_u64[0]; + ip0->dst_address.as_u64[1] = + (sr0->segments + sr0->segments_left)->as_u64[1]; + + if (ip0 + 1 == (void *) sr0) + { + sr0->protocol = ip0->protocol; + ip0->protocol = IP_PROTOCOL_IPV6_ROUTE; + } + else + { + ip6_ext_header_t *ip_ext = (void *) (ip0 + 1); + sr0->protocol = ip_ext->next_hdr; + ip_ext->next_hdr = IP_PROTOCOL_IPV6_ROUTE; + } + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + insert_pkts++; + + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + insert_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_insert_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_b_insert_node) = { + .function = sr_policy_rewrite_b_insert, + .name = "sr-pl-rewrite-b-insert", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/** + * @brief Function BSID encapsulation + */ +static_always_inline void +end_bsid_encaps_srh_processing (vlib_node_runtime_t * node, + vlib_buffer_t * b0, + ip6_header_t * ip0, + ip6_sr_header_t * sr0, u32 * next0) +{ + ip6_address_t *new_dst0; + + if (PREDICT_FALSE (!sr0)) + goto error_bsid_encaps; + + if (PREDICT_TRUE (sr0->type == ROUTING_HEADER_TYPE_SR)) + { + if (PREDICT_TRUE (sr0->segments_left != 0)) + { + sr0->segments_left -= 1; + new_dst0 = (ip6_address_t *) (sr0->segments); + new_dst0 += sr0->segments_left; + ip0->dst_address.as_u64[0] = new_dst0->as_u64[0]; + ip0->dst_address.as_u64[1] = new_dst0->as_u64[1]; + return; + } + } + +error_bsid_encaps: + *next0 = SR_POLICY_REWRITE_NEXT_ERROR; + b0->error = node->errors[SR_POLICY_REWRITE_ERROR_BSID_ZERO]; +} + +/** + * @brief Graph node for applying a SR policy BSID - Encapsulation + */ +static uword +sr_policy_rewrite_b_encaps (vlib_main_t * vm, vlib_node_runtime_t * node, + vlib_frame_t * from_frame) +{ + ip6_sr_main_t *sm = &sr_main; + u32 n_left_from, next_index, *from, *to_next; + + from = vlib_frame_vector_args (from_frame); + n_left_from = from_frame->n_vectors; + + next_index = node->cached_next_index; + + int encap_pkts = 0, bsid_pkts = 0; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + /* Quad - Loop */ + while (n_left_from >= 8 && n_left_to_next >= 4) + { + u32 bi0, bi1, bi2, bi3; + vlib_buffer_t *b0, *b1, *b2, *b3; + u32 next0, next1, next2, next3; + next0 = next1 = next2 = next3 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + ip6_header_t *ip0, *ip1, *ip2, *ip3; + ip6_header_t *ip0_encap, *ip1_encap, *ip2_encap, *ip3_encap; + ip6_sr_header_t *sr0, *sr1, *sr2, *sr3; + ip6_ext_header_t *prev0, *prev1, *prev2, *prev3; + ip6_sr_sl_t *sl0, *sl1, *sl2, *sl3; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p4, *p5, *p6, *p7; + + p4 = vlib_get_buffer (vm, from[4]); + p5 = vlib_get_buffer (vm, from[5]); + p6 = vlib_get_buffer (vm, from[6]); + p7 = vlib_get_buffer (vm, from[7]); + + /* Prefetch the buffer header and packet for the N+2 loop iteration */ + vlib_prefetch_buffer_header (p4, LOAD); + vlib_prefetch_buffer_header (p5, LOAD); + vlib_prefetch_buffer_header (p6, LOAD); + vlib_prefetch_buffer_header (p7, LOAD); + + CLIB_PREFETCH (p4->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p5->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p6->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p7->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + to_next[2] = bi2 = from[2]; + to_next[3] = bi3 = from[3]; + from += 4; + to_next += 4; + n_left_from -= 4; + n_left_to_next -= 4; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + b2 = vlib_get_buffer (vm, bi2); + b3 = vlib_get_buffer (vm, bi3); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + sl1 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b1)->ip.adj_index[VLIB_TX]); + sl2 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b2)->ip.adj_index[VLIB_TX]); + sl3 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b3)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + ASSERT (b1->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl1->rewrite)); + ASSERT (b2->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl2->rewrite)); + ASSERT (b3->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl3->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + ip1_encap = vlib_buffer_get_current (b1); + ip2_encap = vlib_buffer_get_current (b2); + ip3_encap = vlib_buffer_get_current (b3); + + ip6_ext_header_find_t (ip0_encap, prev0, sr0, + IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip1_encap, prev1, sr1, + IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip2_encap, prev2, sr2, + IP_PROTOCOL_IPV6_ROUTE); + ip6_ext_header_find_t (ip3_encap, prev3, sr3, + IP_PROTOCOL_IPV6_ROUTE); + + end_bsid_encaps_srh_processing (node, b0, ip0_encap, sr0, &next0); + end_bsid_encaps_srh_processing (node, b1, ip1_encap, sr1, &next1); + end_bsid_encaps_srh_processing (node, b2, ip2_encap, sr2, &next2); + end_bsid_encaps_srh_processing (node, b3, ip3_encap, sr3, &next3); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + clib_memcpy (((u8 *) ip1_encap) - vec_len (sl1->rewrite), + sl1->rewrite, vec_len (sl1->rewrite)); + clib_memcpy (((u8 *) ip2_encap) - vec_len (sl2->rewrite), + sl2->rewrite, vec_len (sl2->rewrite)); + clib_memcpy (((u8 *) ip3_encap) - vec_len (sl3->rewrite), + sl3->rewrite, vec_len (sl3->rewrite)); + + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + vlib_buffer_advance (b1, -(word) vec_len (sl1->rewrite)); + vlib_buffer_advance (b2, -(word) vec_len (sl2->rewrite)); + vlib_buffer_advance (b3, -(word) vec_len (sl3->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + ip1 = vlib_buffer_get_current (b1); + ip2 = vlib_buffer_get_current (b2); + ip3 = vlib_buffer_get_current (b3); + + encaps_processing_v6 (node, b0, ip0, ip0_encap); + encaps_processing_v6 (node, b1, ip1, ip1_encap); + encaps_processing_v6 (node, b2, ip2, ip2_encap); + encaps_processing_v6 (node, b3, ip3, ip3_encap); + + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) + { + if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b1->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b1, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip1->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip1->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b2->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b2, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip2->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip2->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + if (PREDICT_FALSE (b3->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b3, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip3->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip3->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + } + + encap_pkts += 4; + vlib_validate_buffer_enqueue_x4 (vm, node, next_index, to_next, + n_left_to_next, bi0, bi1, bi2, bi3, + next0, next1, next2, next3); + } + + /* Single loop for potentially the last three packets */ + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + ip6_header_t *ip0 = 0, *ip0_encap = 0; + ip6_ext_header_t *prev0; + ip6_sr_header_t *sr0; + ip6_sr_sl_t *sl0; + u32 next0 = SR_POLICY_REWRITE_NEXT_IP6_LOOKUP; + + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + b0 = vlib_get_buffer (vm, bi0); + + sl0 = + pool_elt_at_index (sm->sid_lists, + vnet_buffer (b0)->ip.adj_index[VLIB_TX]); + ASSERT (b0->current_data + VLIB_BUFFER_PRE_DATA_SIZE >= + vec_len (sl0->rewrite)); + + ip0_encap = vlib_buffer_get_current (b0); + ip6_ext_header_find_t (ip0_encap, prev0, sr0, + IP_PROTOCOL_IPV6_ROUTE); + end_bsid_encaps_srh_processing (node, b0, ip0_encap, sr0, &next0); + + clib_memcpy (((u8 *) ip0_encap) - vec_len (sl0->rewrite), + sl0->rewrite, vec_len (sl0->rewrite)); + vlib_buffer_advance (b0, -(word) vec_len (sl0->rewrite)); + + ip0 = vlib_buffer_get_current (b0); + + encaps_processing_v6 (node, b0, ip0, ip0_encap); + + if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE) && + PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED)) + { + sr_policy_rewrite_trace_t *tr = + vlib_add_trace (vm, node, b0, sizeof (*tr)); + clib_memcpy (tr->src.as_u8, ip0->src_address.as_u8, + sizeof (tr->src.as_u8)); + clib_memcpy (tr->dst.as_u8, ip0->dst_address.as_u8, + sizeof (tr->dst.as_u8)); + } + + encap_pkts++; + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next, + n_left_to_next, bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + + /* Update counters */ + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_TOTAL, + encap_pkts); + vlib_node_increment_counter (vm, sr_policy_rewrite_encaps_node.index, + SR_POLICY_REWRITE_ERROR_COUNTER_BSID, + bsid_pkts); + + return from_frame->n_vectors; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (sr_policy_rewrite_b_encaps_node) = { + .function = sr_policy_rewrite_b_encaps, + .name = "sr-pl-rewrite-b-encaps", + .vector_size = sizeof (u32), + .format_trace = format_sr_policy_rewrite_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = SR_POLICY_REWRITE_N_ERROR, + .error_strings = sr_policy_rewrite_error_strings, + .n_next_nodes = SR_POLICY_REWRITE_N_NEXT, + .next_nodes = { +#define _(s,n) [SR_POLICY_REWRITE_NEXT_##s] = n, + foreach_sr_policy_rewrite_next +#undef _ + }, +}; +/* *INDENT-ON* */ + +/*************************** SR Segment Lists DPOs ****************************/ +static u8 * +format_sr_segment_list_dpo (u8 * s, va_list * args) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_address_t *addr; + ip6_sr_sl_t *sl; + + index_t index = va_arg (*args, index_t); + CLIB_UNUSED (u32 indent) = va_arg (*args, u32); + s = format (s, "SR: Segment List index:[%d]", index); + s = format (s, "\n\tSegments:"); + + sl = pool_elt_at_index (sm->sid_lists, index); + + s = format (s, "< "); + vec_foreach (addr, sl->segments) + { + s = format (s, "%U, ", format_ip6_address, addr); + } + s = format (s, "\b\b > - "); + s = format (s, "Weight: %u", sl->weight); + + return s; +} + +const static dpo_vft_t sr_policy_rewrite_vft = { + .dv_lock = sr_dpo_lock, + .dv_unlock = sr_dpo_unlock, + .dv_format = format_sr_segment_list_dpo, +}; + +const static char *const sr_pr_encaps_ip6_nodes[] = { + "sr-pl-rewrite-encaps", + NULL, +}; + +const static char *const sr_pr_encaps_ip4_nodes[] = { + "sr-pl-rewrite-encaps-v4", + NULL, +}; + +const static char *const *const sr_pr_encaps_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_pr_encaps_ip6_nodes, + [DPO_PROTO_IP4] = sr_pr_encaps_ip4_nodes, +}; + +const static char *const sr_pr_insert_ip6_nodes[] = { + "sr-pl-rewrite-insert", + NULL, +}; + +const static char *const *const sr_pr_insert_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_pr_insert_ip6_nodes, +}; + +const static char *const sr_pr_bsid_insert_ip6_nodes[] = { + "sr-pl-rewrite-b-insert", + NULL, +}; + +const static char *const *const sr_pr_bsid_insert_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_pr_bsid_insert_ip6_nodes, +}; + +const static char *const sr_pr_bsid_encaps_ip6_nodes[] = { + "sr-pl-rewrite-b-encaps", + NULL, +}; + +const static char *const *const sr_pr_bsid_encaps_nodes[DPO_PROTO_NUM] = { + [DPO_PROTO_IP6] = sr_pr_bsid_encaps_ip6_nodes, +}; + +/********************* SR Policy Rewrite initialization ***********************/ +/** + * @brief SR Policy Rewrite initialization + */ +clib_error_t * +sr_policy_rewrite_init (vlib_main_t * vm) +{ + ip6_sr_main_t *sm = &sr_main; + + /* Init memory for sr policy keys (bsid <-> ip6_address_t) */ + mhash_init (&sm->sr_policies_index_hash, sizeof (uword), + sizeof (ip6_address_t)); + + /* Init SR VPO DPOs type */ + sr_pr_encaps_dpo_type = + dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_encaps_nodes); + + sr_pr_insert_dpo_type = + dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_insert_nodes); + + sr_pr_bsid_encaps_dpo_type = + dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_bsid_encaps_nodes); + + sr_pr_bsid_insert_dpo_type = + dpo_register_new_type (&sr_policy_rewrite_vft, sr_pr_bsid_insert_nodes); + + /* Register the L2 encaps node used in HW redirect */ + sm->l2_sr_policy_rewrite_index = sr_policy_rewrite_encaps_node.index; + + sm->fib_table_ip6 = (u32) ~ 0; + sm->fib_table_ip4 = (u32) ~ 0; + + return 0; +} + +VLIB_INIT_FUNCTION (sr_policy_rewrite_init); + + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srv6/sr_steering.c b/src/vnet/srv6/sr_steering.c new file mode 100755 index 00000000..a7903751 --- /dev/null +++ b/src/vnet/srv6/sr_steering.c @@ -0,0 +1,573 @@ +/* + * sr_steering.c: ipv6 segment routing steering into SR policy + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Packet steering into SR Policies + * + * This file is in charge of handling the FIB appropiatly to steer packets + * through SR Policies as defined in 'sr_policy_rewrite.c'. Notice that here + * we are only doing steering. SR policy application is done in + * sr_policy_rewrite.c + * + * Supports: + * - Steering of IPv6 traffic Destination Address based + * - Steering of IPv4 traffic Destination Address based + * - Steering of L2 frames, interface based (sw interface) + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/** + * @brief Steer traffic L2 and L3 traffic through a given SR policy + * + * @param is_del + * @param bsid is the bindingSID of the SR Policy (alt to sr_policy_index) + * @param sr_policy is the index of the SR Policy (alt to bsid) + * @param table_id is the VRF where to install the FIB entry for the BSID + * @param prefix is the IPv4/v6 address for L3 traffic type + * @param mask_width is the mask for L3 traffic type + * @param sw_if_index is the incoming interface for L2 traffic + * @param traffic_type describes the type of traffic + * + * @return 0 if correct, else error + */ +int +sr_steering_policy (int is_del, ip6_address_t * bsid, u32 sr_policy_index, + u32 table_id, ip46_address_t * prefix, u32 mask_width, + u32 sw_if_index, u8 traffic_type) +{ + ip6_sr_main_t *sm = &sr_main; + sr_steering_key_t key; + ip6_sr_steering_policy_t *steer_pl; + fib_prefix_t pfx = { 0 }; + + ip6_sr_policy_t *sr_policy = 0; + uword *p = 0; + + memset (&key, 0, sizeof (sr_steering_key_t)); + + /* Compute the steer policy key */ + if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) + { + key.l3.prefix.as_u64[0] = prefix->as_u64[0]; + key.l3.prefix.as_u64[1] = prefix->as_u64[1]; + key.l3.mask_width = mask_width; + key.l3.fib_table = (table_id != (u32) ~ 0 ? table_id : 0); + } + else if (traffic_type == SR_STEER_L2) + { + key.l2.sw_if_index = sw_if_index; + + /* Sanitise the SW_IF_INDEX */ + if (pool_is_free_index (sm->vnet_main->interface_main.sw_interfaces, + sw_if_index)) + return -3; + + vnet_sw_interface_t *sw = + vnet_get_sw_interface (sm->vnet_main, sw_if_index); + if (sw->type != VNET_SW_INTERFACE_TYPE_HARDWARE) + return -3; + } + else + return -1; + + key.traffic_type = traffic_type; + + /* Search for the item */ + p = mhash_get (&sm->sr_steer_policies_hash, &key); + + if (p) + { + /* Retrieve Steer Policy function */ + steer_pl = pool_elt_at_index (sm->steer_policies, p[0]); + + if (is_del) + { + if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP6, + steer_pl->classify.l3.fib_table), + &pfx, FIB_SOURCE_SR); + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP4, + steer_pl->classify.l3.fib_table), &pfx, + FIB_SOURCE_SR); + } + else if (steer_pl->classify.traffic_type == SR_STEER_L2) + { + /* Remove HW redirection */ + vnet_feature_enable_disable ("device-input", + "sr-policy-rewrite-encaps-l2", + sw_if_index, 0, 0, 0); + sm->sw_iface_sr_policies[sw_if_index] = ~(u32) 0; + + /* Remove promiscous mode from interface */ + vnet_main_t *vnm = vnet_get_main (); + ethernet_main_t *em = ðernet_main; + ethernet_interface_t *eif = + ethernet_get_interface (em, sw_if_index); + + if (!eif) + goto cleanup_error_redirection; + + ethernet_set_flags (vnm, sw_if_index, 0); + } + + /* Delete SR steering policy entry */ + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + + /* If no more SR policies or steering policies */ + if (!pool_elts (sm->sr_policies) && !pool_elts (sm->steer_policies)) + { + fib_table_unlock (sm->fib_table_ip6, FIB_PROTOCOL_IP6); + fib_table_unlock (sm->fib_table_ip4, FIB_PROTOCOL_IP6); + sm->fib_table_ip6 = (u32) ~ 0; + sm->fib_table_ip4 = (u32) ~ 0; + } + + return 1; + } + else /* It means user requested to update an existing SR steering policy */ + { + /* Retrieve SR steering policy */ + if (bsid) + { + p = mhash_get (&sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -2; + } + else + sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); + + if (!sr_policy) + return -2; + + steer_pl->sr_policy = sr_policy - sm->sr_policies; + + /* Remove old FIB/hw redirection and create a new one */ + if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP6, + steer_pl->classify.l3.fib_table), + &pfx, FIB_SOURCE_SR); + + /* Create a new one */ + goto update_fib; + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + /* Remove FIB entry */ + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; + + fib_table_entry_delete (fib_table_find + (FIB_PROTOCOL_IP4, + steer_pl->classify.l3.fib_table), + &pfx, FIB_SOURCE_SR); + + /* Create a new one */ + goto update_fib; + } + else if (steer_pl->classify.traffic_type == SR_STEER_L2) + { + /* Update L2-HW redirection */ + goto update_fib; + } + } + } + else + /* delete; steering policy does not exist; complain */ + if (is_del) + return -4; + + /* Retrieve SR policy */ + if (bsid) + { + p = mhash_get (&sm->sr_policies_index_hash, bsid); + if (p) + sr_policy = pool_elt_at_index (sm->sr_policies, p[0]); + else + return -2; + } + else + sr_policy = pool_elt_at_index (sm->sr_policies, sr_policy_index); + + /* Create a new steering policy */ + pool_get (sm->steer_policies, steer_pl); + memset (steer_pl, 0, sizeof (*steer_pl)); + + if (traffic_type == SR_STEER_IPV4 || traffic_type == SR_STEER_IPV6) + { + clib_memcpy (&steer_pl->classify.l3.prefix, prefix, + sizeof (ip46_address_t)); + steer_pl->classify.l3.mask_width = mask_width; + steer_pl->classify.l3.fib_table = + (table_id != (u32) ~ 0 ? table_id : 0); + steer_pl->classify.traffic_type = traffic_type; + } + else if (traffic_type == SR_STEER_L2) + { + steer_pl->classify.l2.sw_if_index = sw_if_index; + steer_pl->classify.traffic_type = traffic_type; + } + else + { + /* Incorrect API usage. Should never get here */ + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + return -1; + } + steer_pl->sr_policy = sr_policy - sm->sr_policies; + + /* Create and store key */ + mhash_set (&sm->sr_steer_policies_hash, &key, steer_pl - sm->steer_policies, + NULL); + + if (traffic_type == SR_STEER_L2) + { + if (!sr_policy->is_encap) + goto cleanup_error_encap; + + if (vnet_feature_enable_disable + ("device-input", "sr-pl-rewrite-encaps-l2", sw_if_index, 1, 0, 0)) + goto cleanup_error_redirection; + + /* Set promiscous mode on interface */ + vnet_main_t *vnm = vnet_get_main (); + ethernet_main_t *em = ðernet_main; + ethernet_interface_t *eif = ethernet_get_interface (em, sw_if_index); + + if (!eif) + goto cleanup_error_redirection; + + ethernet_set_flags (vnm, sw_if_index, + ETHERNET_INTERFACE_FLAG_ACCEPT_ALL); + } + else if (traffic_type == SR_STEER_IPV4) + if (!sr_policy->is_encap) + goto cleanup_error_encap; + +update_fib: + /* FIB API calls - Recursive route through the BindingSID */ + if (traffic_type == SR_STEER_IPV6) + { + pfx.fp_proto = FIB_PROTOCOL_IP6; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip6 = steer_pl->classify.l3.prefix.ip6; + + fib_table_entry_path_add (fib_table_find (FIB_PROTOCOL_IP6, + (table_id != + (u32) ~ 0 ? + table_id : 0)), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, + FIB_PROTOCOL_IP6, + (ip46_address_t *) & sr_policy->bsid, ~0, + sm->fib_table_ip6, 1, NULL, + FIB_ROUTE_PATH_FLAG_NONE); + } + else if (traffic_type == SR_STEER_IPV4) + { + pfx.fp_proto = FIB_PROTOCOL_IP4; + pfx.fp_len = steer_pl->classify.l3.mask_width; + pfx.fp_addr.ip4 = steer_pl->classify.l3.prefix.ip4; + + fib_table_entry_path_add (fib_table_find (FIB_PROTOCOL_IP4, + (table_id != + (u32) ~ 0 ? + table_id : 0)), + &pfx, FIB_SOURCE_SR, + FIB_ENTRY_FLAG_LOOSE_URPF_EXEMPT, + FIB_PROTOCOL_IP6, + (ip46_address_t *) & sr_policy->bsid, ~0, + sm->fib_table_ip4, 1, NULL, + FIB_ROUTE_PATH_FLAG_NONE); + } + else if (traffic_type == SR_STEER_L2) + { + if (sw_if_index < vec_len (sm->sw_iface_sr_policies)) + sm->sw_iface_sr_policies[sw_if_index] = steer_pl->sr_policy; + else + { + vec_resize (sm->sw_iface_sr_policies, + (pool_len (sm->vnet_main->interface_main.sw_interfaces) + - vec_len (sm->sw_iface_sr_policies))); + sm->sw_iface_sr_policies[sw_if_index] = steer_pl->sr_policy; + } + } + + return 0; + +cleanup_error_encap: + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + return -5; + +cleanup_error_redirection: + pool_put (sm->steer_policies, steer_pl); + mhash_unset (&sm->sr_steer_policies_hash, &key, NULL); + return -3; +} + +static clib_error_t * +sr_steer_policy_command_fn (vlib_main_t * vm, unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + vnet_main_t *vnm = vnet_get_main (); + + int is_del = 0; + + ip46_address_t prefix; + u32 dst_mask_width = 0; + u32 sw_if_index = (u32) ~ 0; + u8 traffic_type = 0; + u32 fib_table = (u32) ~ 0; + + ip6_address_t bsid; + u32 sr_policy_index = (u32) ~ 0; + + u8 sr_policy_set = 0; + + memset (&prefix, 0, sizeof (ip46_address_t)); + + int rv; + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "del")) + is_del = 1; + else if (!traffic_type + && unformat (input, "l3 %U/%d", unformat_ip6_address, + &prefix.ip6, &dst_mask_width)) + traffic_type = SR_STEER_IPV6; + else if (!traffic_type + && unformat (input, "l3 %U/%d", unformat_ip4_address, + &prefix.ip4, &dst_mask_width)) + traffic_type = SR_STEER_IPV4; + else if (!traffic_type + && unformat (input, "l2 %U", unformat_vnet_sw_interface, vnm, + &sw_if_index)) + traffic_type = SR_STEER_L2; + else if (!sr_policy_set + && unformat (input, "via sr policy index %d", + &sr_policy_index)) + sr_policy_set = 1; + else if (!sr_policy_set + && unformat (input, "via sr policy bsid %U", + unformat_ip6_address, &bsid)) + sr_policy_set = 1; + else if (fib_table == (u32) ~ 0 + && unformat (input, "fib-table %d", &fib_table)); + else + break; + } + + if (!traffic_type) + return clib_error_return (0, "No L2/L3 traffic specified"); + if (!sr_policy_set) + return clib_error_return (0, "No SR policy specified"); + + /* Make sure that the prefixes are clean */ + if (traffic_type == SR_STEER_IPV4) + { + u32 mask = + (dst_mask_width ? (0xFFFFFFFFu >> (32 - dst_mask_width)) : 0); + prefix.ip4.as_u32 &= mask; + } + else if (traffic_type == SR_STEER_IPV6) + { + ip6_address_t mask; + ip6_address_mask_from_width (&mask, dst_mask_width); + ip6_address_mask (&prefix.ip6, &mask); + } + + rv = + sr_steering_policy (is_del, (sr_policy_index == ~(u32) 0 ? &bsid : NULL), + sr_policy_index, fib_table, &prefix, dst_mask_width, + sw_if_index, traffic_type); + + switch (rv) + { + case 0: + break; + case 1: + return 0; + case -1: + return clib_error_return (0, "Incorrect API usage."); + case -2: + return clib_error_return (0, + "The requested SR policy could not be located. Review the BSID/index."); + case -3: + return clib_error_return (0, + "Unable to do SW redirect. Incorrect interface."); + case -4: + return clib_error_return (0, + "The requested SR steering policy could not be deleted."); + case -5: + return clib_error_return (0, + "The SR policy is not an encapsulation one."); + default: + return clib_error_return (0, "BUG: sr steer policy returns %d", rv); + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (sr_steer_policy_command, static) = { + .path = "sr steer", + .short_help = "sr steer (del) [l3 |l2 ]" + "via sr policy [index |bsid ]" + "(fib-table )", + .long_help = + "\tSteer a L2 or L3 traffic through an existing SR policy.\n" + "\tExamples:\n" + "\t\tsr steer l3 2001::/64 via sr_policy index 5\n" + "\t\tsr steer l3 2001::/64 via sr_policy bsid 2010::9999:1\n" + "\t\tsr steer l2 GigabitEthernet0/5/0 via sr_policy index 5\n" + "\t\tsr steer del l3 2001::/64 via sr_policy index 5\n", + .function = sr_steer_policy_command_fn, +}; +/* *INDENT-ON* */ + +static clib_error_t * +show_sr_steering_policies_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + ip6_sr_main_t *sm = &sr_main; + ip6_sr_steering_policy_t **steer_policies = 0; + ip6_sr_steering_policy_t *steer_pl; + + vnet_main_t *vnm = vnet_get_main (); + + ip6_sr_policy_t *pl = 0; + int i; + + vlib_cli_output (vm, "SR steering policies:"); + /* *INDENT-OFF* */ + pool_foreach (steer_pl, sm->steer_policies, ({vec_add1(steer_policies, steer_pl);})); + /* *INDENT-ON* */ + vlib_cli_output (vm, "Traffic\t\tSR policy BSID"); + for (i = 0; i < vec_len (steer_policies); i++) + { + steer_pl = steer_policies[i]; + pl = pool_elt_at_index (sm->sr_policies, steer_pl->sr_policy); + if (steer_pl->classify.traffic_type == SR_STEER_L2) + { + vlib_cli_output (vm, "L2 %U\t%U", + format_vnet_sw_if_index_name, vnm, + steer_pl->classify.l2.sw_if_index, + format_ip6_address, &pl->bsid); + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV4) + { + vlib_cli_output (vm, "L3 %U/%d\t%U", + format_ip4_address, + &steer_pl->classify.l3.prefix.ip4, + steer_pl->classify.l3.mask_width, + format_ip6_address, &pl->bsid); + } + else if (steer_pl->classify.traffic_type == SR_STEER_IPV6) + { + vlib_cli_output (vm, "L3 %U/%d\t%U", + format_ip6_address, + &steer_pl->classify.l3.prefix.ip6, + steer_pl->classify.l3.mask_width, + format_ip6_address, &pl->bsid); + } + } + return 0; +} + +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (show_sr_steering_policies_command, static) = { + .path = "show sr steering policies", + .short_help = "show sr steering policies", + .function = show_sr_steering_policies_command_fn, +}; +/* *INDENT-ON* */ + +clib_error_t * +sr_steering_init (vlib_main_t * vm) +{ + ip6_sr_main_t *sm = &sr_main; + + /* Init memory for function keys */ + mhash_init (&sm->sr_steer_policies_hash, sizeof (uword), + sizeof (sr_steering_key_t)); + + sm->sw_iface_sr_policies = 0; + + sm->vnet_main = vnet_get_main (); + + return 0; +} + +/* *INDENT-OFF* */ +VLIB_INIT_FUNCTION (sr_steering_init); +/* *INDENT-ON* */ + +/* *INDENT-OFF* */ +VNET_FEATURE_INIT (sr_pl_rewrite_encaps_l2, static) = +{ + .arc_name = "device-input", + .node_name = "sr-pl-rewrite-encaps-l2", + .runs_before = VNET_FEATURES ("ethernet-input"), +}; +/* *INDENT-ON* */ + +/* +* fd.io coding-style-patch-verification: ON +* +* Local Variables: +* eval: (c-set-style "gnu") +* End: +*/ diff --git a/src/vnet/srv6/sr_steering.md b/src/vnet/srv6/sr_steering.md new file mode 100644 index 00000000..cf446f81 --- /dev/null +++ b/src/vnet/srv6/sr_steering.md @@ -0,0 +1,11 @@ +# Steering packets into a SR Policy {#srv6_steering_doc} + +To steer packets in Transit into an SR policy (T.Insert, T.Encaps and T.Encaps.L2 behaviors), the user needs to create an 'sr steering policy'. + + sr steer l3 2001::/64 via sr policy index 1 + sr steer l3 2001::/64 via sr policy bsid cafe::1 + sr steer l3 2001::/64 via sr policy bsid cafe::1 fib-table 3 + sr steer l3 10.0.0.0/16 via sr policy bsid cafe::1 + sr steer l2 TenGE0/1/0 via sr policy bsid cafe::1 + +Disclaimer: The T.Encaps.L2 will steer L2 frames into an SR Policy. Notice that creating an SR steering policy for L2 frames will actually automatically *put the interface into promiscous mode*. diff --git a/src/vnet/vnet_all_api_h.h b/src/vnet/vnet_all_api_h.h index 9d3abae5..566e22ec 100644 --- a/src/vnet/vnet_all_api_h.h +++ b/src/vnet/vnet_all_api_h.h @@ -50,7 +50,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/vpp/api/api.c b/src/vpp/api/api.c index baf45d5c..16d51225 100644 --- a/src/vpp/api/api.c +++ b/src/vpp/api/api.c @@ -53,7 +53,7 @@ #include #include #if WITH_LIBSSL > 0 -#include +#include #endif #include #include diff --git a/src/vpp/api/custom_dump.c b/src/vpp/api/custom_dump.c index 000fe0d4..107e83f3 100644 --- a/src/vpp/api/custom_dump.c +++ b/src/vpp/api/custom_dump.c @@ -24,7 +24,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/vpp/api/vpe.api b/src/vpp/api/vpe.api index 7c07c822..99ae4784 100644 --- a/src/vpp/api/vpe.api +++ b/src/vpp/api/vpe.api @@ -40,7 +40,7 @@ * LISP-GPE APIs: see .../src/vnet/lisp-gpe/{lisp_gpe.api, lisp_gpe_api.c} * SESSION APIs: .../vnet/session/{session.api session_api.c} * MPLS APIs: see .../src/vnet/mpls/{mpls.api, mpls_api.c} - * SR APIs: see .../src/vnet/sr/{sr.api, sr_api.c} + * SR APIs: see .../src/vnet/srv6/{sr.api, sr_api.c} * CLASSIFY APIs: see ... /src/vnet/classify/{classify.api, classify_api.c} * FLOW APIs: see ... /src/vnet/flow/{flow.api, flow_api.c} * DHCP APIs: see ... /src/vnet/dhcp/{dhcpk.api, dhcp_api.c} -- cgit 1.2.3-korg From 5c749734b14c2d3be8689b0c5b72ae8d1ddec099 Mon Sep 17 00:00:00 2001 From: Ole Troan Date: Mon, 13 Mar 2017 13:39:52 +0100 Subject: Flowprobe: Stateful flows and IPv6, L4 recording Change-Id: I67839281623721bf42f0a918a53356143d9dc78a Signed-off-by: Ole Troan Signed-off-by: Pavel Kotucek Signed-off-by: Ole Troan --- MAINTAINERS | 8 +- doxygen/user_doc.md | 2 +- src/configure.ac | 2 +- src/plugins/Makefile.am | 4 +- src/plugins/flowperpkt.am | 38 - src/plugins/flowperpkt/flowperpkt.api | 40 - src/plugins/flowperpkt/flowperpkt.c | 621 ------------- src/plugins/flowperpkt/flowperpkt.h | 90 -- src/plugins/flowperpkt/flowperpkt_all_api_h.h | 18 - src/plugins/flowperpkt/flowperpkt_msg_enum.h | 31 - src/plugins/flowperpkt/flowperpkt_plugin_doc.md | 13 - src/plugins/flowperpkt/flowperpkt_test.c | 200 ---- src/plugins/flowperpkt/l2_node.c | 561 ------------ src/plugins/flowperpkt/node.c | 574 ------------ src/plugins/flowprobe.am | 37 + src/plugins/flowprobe/flowprobe.api | 40 + src/plugins/flowprobe/flowprobe.c | 1122 +++++++++++++++++++++++ src/plugins/flowprobe/flowprobe.h | 167 ++++ src/plugins/flowprobe/flowprobe_all_api_h.h | 18 + src/plugins/flowprobe/flowprobe_msg_enum.h | 31 + src/plugins/flowprobe/flowprobe_plugin_doc.md | 13 + src/plugins/flowprobe/flowprobe_test.c | 263 ++++++ src/plugins/flowprobe/node.c | 998 ++++++++++++++++++++ src/plugins/ioam/analyse/ioam_summary_export.c | 3 +- src/plugins/ioam/udp-ping/udp_ping_export.c | 3 +- src/plugins/snat/snat_ipfix_logging.c | 6 +- src/vnet/flow/flow_api.c | 2 +- src/vnet/flow/flow_report.c | 21 +- src/vnet/flow/flow_report.h | 3 +- src/vnet/flow/flow_report_classify.c | 2 +- src/vnet/ip/ip6_packet.h | 6 + src/vpp-api/java/jvpp/gen/jvpp_gen.py | 2 +- test/test_flowperpkt.py | 173 ---- test/test_flowprobe.py | 967 +++++++++++++++++++ 34 files changed, 3695 insertions(+), 2384 deletions(-) delete mode 100644 src/plugins/flowperpkt.am delete mode 100644 src/plugins/flowperpkt/flowperpkt.api delete mode 100644 src/plugins/flowperpkt/flowperpkt.c delete mode 100644 src/plugins/flowperpkt/flowperpkt.h delete mode 100644 src/plugins/flowperpkt/flowperpkt_all_api_h.h delete mode 100644 src/plugins/flowperpkt/flowperpkt_msg_enum.h delete mode 100644 src/plugins/flowperpkt/flowperpkt_plugin_doc.md delete mode 100644 src/plugins/flowperpkt/flowperpkt_test.c delete mode 100644 src/plugins/flowperpkt/l2_node.c delete mode 100644 src/plugins/flowperpkt/node.c create mode 100644 src/plugins/flowprobe.am create mode 100644 src/plugins/flowprobe/flowprobe.api create mode 100644 src/plugins/flowprobe/flowprobe.c create mode 100644 src/plugins/flowprobe/flowprobe.h create mode 100644 src/plugins/flowprobe/flowprobe_all_api_h.h create mode 100644 src/plugins/flowprobe/flowprobe_msg_enum.h create mode 100644 src/plugins/flowprobe/flowprobe_plugin_doc.md create mode 100644 src/plugins/flowprobe/flowprobe_test.c create mode 100644 src/plugins/flowprobe/node.c delete mode 100644 test/test_flowperpkt.py create mode 100644 test/test_flowprobe.py (limited to 'doxygen/user_doc.md') diff --git a/MAINTAINERS b/MAINTAINERS index 2f198319..0f5dfc04 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -112,10 +112,10 @@ VNET VXLAN M: John Lo F: src/vnet/vxlan/ -Plugin - flowperpkt -M: Dave Barach -F: src/plugins/flowperpkt/ -F: src/plugins/flowperpkt.am +Plugin - flowprobe +M: Ole Troan +F: src/plugins/flowprobe/ +F: src/plugins/flowprobe.am Plugin - SIXRD M: Ole Troan diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index d052c53b..becc2e0a 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -11,7 +11,7 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage lldp_doc - @subpage map_doc - @subpage dpdk_crypto_ipsec_doc -- @subpage flowperpkt_plugin_doc +- @subpage flowprobe_plugin_doc - @subpage qos_doc - @subpage span_doc - @subpage srv6_doc diff --git a/src/configure.ac b/src/configure.ac index 2a907b2f..173b3153 100644 --- a/src/configure.ac +++ b/src/configure.ac @@ -146,7 +146,7 @@ AC_SUBST(AR_FLAGS) # Please keep alphabetical order PLUGIN_ENABLED(acl) PLUGIN_ENABLED(dpdk) -PLUGIN_ENABLED(flowperpkt) +PLUGIN_ENABLED(flowprobe) PLUGIN_ENABLED(gtpu) PLUGIN_ENABLED(ila) PLUGIN_ENABLED(ioam) diff --git a/src/plugins/Makefile.am b/src/plugins/Makefile.am index 388c9ad2..f26d0fd2 100644 --- a/src/plugins/Makefile.am +++ b/src/plugins/Makefile.am @@ -37,8 +37,8 @@ if ENABLE_DPDK_PLUGIN include dpdk.am endif -if ENABLE_FLOWPERPKT_PLUGIN -include flowperpkt.am +if ENABLE_FLOWPROBE_PLUGIN +include flowprobe.am endif diff --git a/src/plugins/flowperpkt.am b/src/plugins/flowperpkt.am deleted file mode 100644 index a400603a..00000000 --- a/src/plugins/flowperpkt.am +++ /dev/null @@ -1,38 +0,0 @@ - -# Copyright (c) -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at: -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -vppplugins_LTLIBRARIES += flowperpkt_plugin.la -vppapitestplugins_LTLIBRARIES += flowperpkt_test_plugin.la - -flowperpkt_plugin_la_SOURCES = flowperpkt/flowperpkt.c \ - flowperpkt/l2_node.c \ - flowperpkt/node.c \ - flowperpkt/flowperpkt_plugin.api.h - -BUILT_SOURCES += \ - flowperpkt/flowperpkt.api.h \ - flowperpkt/flowperpkt.api.json - -noinst_HEADERS += \ - flowperpkt/flowperpkt_all_api_h.h \ - flowperpkt/flowperpkt_msg_enum.h \ - flowperpkt/flowperpkt.api.h - -flowperpkt_test_plugin_la_SOURCES = \ - flowperpkt/flowperpkt_test.c \ - flowperpkt/flowperpkt_plugin.api.h - -API_FILES += flowperpkt/flowperpkt.api - -# vi:syntax=automake diff --git a/src/plugins/flowperpkt/flowperpkt.api b/src/plugins/flowperpkt/flowperpkt.api deleted file mode 100644 index 3ff92dca..00000000 --- a/src/plugins/flowperpkt/flowperpkt.api +++ /dev/null @@ -1,40 +0,0 @@ -/* Define a simple enable-disable binary API to control the feature */ - -/** \file - This file defines the vpp control-plane API messages - used to control the flowperpkt plugin -*/ - -/** \brief Enable / disable per-packet IPFIX recording on an interface - @param client_index - opaque cookie to identify the sender - @param context - sender context, to match reply w/ request - @param is_add - add address if non-zero, else delete - @param is_ipv6 - if non-zero the address is ipv6, else ipv4 - @param sw_if_index - index of the interface -*/ -autoreply manual_print define flowperpkt_tx_interface_add_del -{ - /* Client identifier, set from api_main.my_client_index */ - u32 client_index; - - /* Arbitrary context, so client can match reply to request */ - u32 context; - - /* Enable / disable the feature */ - u8 is_add; - u8 which; /* 0 = ipv4, 1 = l2, 2 = ipv6 */ - - /* Interface handle */ - u32 sw_if_index; -}; - -autoreply define flowperpkt_params -{ - u32 client_index; - u32 context; - u8 record_l2; - u8 record_l3; - u8 record_l4; - u32 active_timer; /* ~0 is off, 0 is default */ - u32 passive_timer; /* ~0 is off, 0 is default */ -}; diff --git a/src/plugins/flowperpkt/flowperpkt.c b/src/plugins/flowperpkt/flowperpkt.c deleted file mode 100644 index 3e5fc8b0..00000000 --- a/src/plugins/flowperpkt/flowperpkt.c +++ /dev/null @@ -1,621 +0,0 @@ -/* - * flowperpkt.c - per-packet data capture flow report plugin - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -/** - * @file - * @brief Per-packet IPFIX flow record generator plugin - * - * This file implements vpp plugin registration mechanics, - * debug CLI, and binary API handling. - */ - -#include -#include -#include -#include - -#include -#include -#include - -/* define message IDs */ -#include - -/* define message structures */ -#define vl_typedefs -#include -#undef vl_typedefs - -/* define generated endian-swappers */ -#define vl_endianfun -#include -#undef vl_endianfun - -/* instantiate all the print functions we know about */ -#define vl_print(handle, ...) vlib_cli_output (handle, __VA_ARGS__) -#define vl_printfun -#include -#undef vl_printfun - -flowperpkt_main_t flowperpkt_main; - -/* Get the API version number */ -#define vl_api_version(n,v) static u32 api_version=(v); -#include -#undef vl_api_version - -#define REPLY_MSG_ID_BASE fm->msg_id_base -#include - -/* Define the per-interface configurable features */ -/* *INDENT-OFF* */ -VNET_FEATURE_INIT (flow_perpacket_ipv4, static) = -{ - .arc_name = "ip4-output", - .node_name = "flowperpkt-ipv4", - .runs_before = VNET_FEATURES ("interface-output"), -}; - -VNET_FEATURE_INIT (flow_perpacket_l2, static) = -{ - .arc_name = "interface-output", - .node_name = "flowperpkt-l2", - .runs_before = VNET_FEATURES ("interface-tx"), -}; -/* *INDENT-ON* */ - -/* Macro to finish up custom dump fns */ -#define FINISH \ - vec_add1 (s, 0); \ - vl_print (handle, (char *)s); \ - vec_free (s); \ - return handle; - -/** - * @brief Create an IPFIX template packet rewrite string - * @param frm flow_report_main_t * - * @param fr flow_report_t * - * @param collector_address ip4_address_t * the IPFIX collector address - * @param src_address ip4_address_t * the source address we should use - * @param collector_port u16 the collector port we should use, host byte order - * @returns u8 * vector containing the indicated IPFIX template packet - */ -static inline u8 * -flowperpkt_template_rewrite_inline (flow_report_main_t * frm, - flow_report_t * fr, - ip4_address_t * collector_address, - ip4_address_t * src_address, - u16 collector_port, int variant) -{ - ip4_header_t *ip; - udp_header_t *udp; - ipfix_message_header_t *h; - ipfix_set_header_t *s; - ipfix_template_header_t *t; - ipfix_field_specifier_t *f; - ipfix_field_specifier_t *first_field; - u8 *rewrite = 0; - ip4_ipfix_template_packet_t *tp; - u32 field_count = 0; - flow_report_stream_t *stream; - flowperpkt_main_t *fm = &flowperpkt_main; - - stream = &frm->streams[fr->stream_index]; - - if (variant == FLOW_VARIANT_IPV4) - { - /* - * ip4 Supported Fields: - * - * ingressInterface, TLV type 10, u32 - * egressInterface, TLV type 14, u32 - * sourceIpv4Address, TLV type 8, u32 - * destinationIPv4Address, TLV type 12, u32 - * ipClassOfService, TLV type 5, u8 - * flowStartNanoseconds, TLV type 156, dateTimeNanoseconds (f64) - * Implementation: f64 nanoseconds since VPP started - * warning: wireshark doesn't really understand this TLV - * dataLinkFrameSize, TLV type 312, u16 - * warning: wireshark doesn't understand this TLV at all - */ - - /* Currently 7 fields */ - field_count += 7; - - /* allocate rewrite space */ - vec_validate_aligned - (rewrite, - sizeof (ip4_ipfix_template_packet_t) - + field_count * sizeof (ipfix_field_specifier_t) - 1, - CLIB_CACHE_LINE_BYTES); - } - else if (variant == FLOW_VARIANT_L2) - { - /* - * L2 Supported Fields: - * - * ingressInterface, TLV type 10, u32 - * egressInterface, TLV type 14, u32 - * sourceMacAddress, TLV type 56, u8[6] we hope - * destinationMacAddress, TLV type 57, u8[6] we hope - * ethernetType, TLV type 256, u16 - * flowStartNanoseconds, TLV type 156, dateTimeNanoseconds (f64) - * Implementation: f64 nanoseconds since VPP started - * warning: wireshark doesn't really understand this TLV - * dataLinkFrameSize, TLV type 312, u16 - * warning: wireshark doesn't understand this TLV at all - */ - - /* Currently 7 fields */ - field_count += 7; - - /* allocate rewrite space */ - vec_validate_aligned - (rewrite, - sizeof (ip4_ipfix_template_packet_t) - + field_count * sizeof (ipfix_field_specifier_t) - 1, - CLIB_CACHE_LINE_BYTES); - } - - tp = (ip4_ipfix_template_packet_t *) rewrite; - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - h = (ipfix_message_header_t *) (udp + 1); - s = (ipfix_set_header_t *) (h + 1); - t = (ipfix_template_header_t *) (s + 1); - first_field = f = (ipfix_field_specifier_t *) (t + 1); - - ip->ip_version_and_header_length = 0x45; - ip->ttl = 254; - ip->protocol = IP_PROTOCOL_UDP; - ip->src_address.as_u32 = src_address->as_u32; - ip->dst_address.as_u32 = collector_address->as_u32; - udp->src_port = clib_host_to_net_u16 (stream->src_port); - udp->dst_port = clib_host_to_net_u16 (collector_port); - udp->length = clib_host_to_net_u16 (vec_len (rewrite) - sizeof (*ip)); - - /* FIXUP: message header export_time */ - /* FIXUP: message header sequence_number */ - h->domain_id = clib_host_to_net_u32 (stream->domain_id); - - /* Add TLVs to the template */ - if (variant == FLOW_VARIANT_IPV4) - { - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , ingressInterface, - 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , egressInterface, - 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , sourceIPv4Address, - 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , destinationIPv4Address, 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , ipClassOfService, - 1); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , flowStartNanoseconds, - 8); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , dataLinkFrameSize, - 2); - f++; - } - else if (variant == FLOW_VARIANT_L2) - { - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , ingressInterface, - 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , egressInterface, - 4); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , sourceMacAddress, - 6); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , destinationMacAddress, 6); - f++; - f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , ethernetType, - 2); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , flowStartNanoseconds, - 8); - f++; - f->e_id_length = - ipfix_e_id_length (0 /* enterprise */ , dataLinkFrameSize, - 2); - f++; - } - - /* Extend in the obvious way, right here... */ - - /* Back to the template packet... */ - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - - ASSERT (f - first_field); - /* Field count in this template */ - t->id_count = ipfix_id_count (fr->template_id, f - first_field); - - if (variant == FLOW_VARIANT_IPV4) - fm->ipv4_report_id = fr->template_id; - else if (variant == FLOW_VARIANT_L2) - fm->l2_report_id = fr->template_id; - - /* set length in octets */ - s->set_id_length = - ipfix_set_id_length (2 /* set_id */ , (u8 *) f - (u8 *) s); - - /* message length in octets */ - h->version_length = version_length ((u8 *) f - (u8 *) h); - - ip->length = clib_host_to_net_u16 ((u8 *) f - (u8 *) ip); - ip->checksum = ip4_header_checksum (ip); - - return rewrite; -} - -u8 * -flowperpkt_template_rewrite_ipv4 (flow_report_main_t * frm, - flow_report_t * fr, - ip4_address_t * collector_address, - ip4_address_t * src_address, - u16 collector_port) -{ - return flowperpkt_template_rewrite_inline - (frm, fr, collector_address, src_address, collector_port, - FLOW_VARIANT_IPV4); -} - -u8 * -flowperpkt_template_rewrite_l2 (flow_report_main_t * frm, - flow_report_t * fr, - ip4_address_t * collector_address, - ip4_address_t * src_address, - u16 collector_port) -{ - return flowperpkt_template_rewrite_inline - (frm, fr, collector_address, src_address, collector_port, - FLOW_VARIANT_L2); -} - - -/** - * @brief Flush accumulated data - * @param frm flow_report_main_t * - * @param fr flow_report_t * - * @param f vlib_frame_t * - * - * Notes: - * This function must simply return the incoming frame, or no template packets - * will be sent. - */ -vlib_frame_t * -flowperpkt_data_callback_ipv4 (flow_report_main_t * frm, - flow_report_t * fr, - vlib_frame_t * f, u32 * to_next, - u32 node_index) -{ - flowperpkt_flush_callback_ipv4 (); - return f; -} - -vlib_frame_t * -flowperpkt_data_callback_l2 (flow_report_main_t * frm, - flow_report_t * fr, - vlib_frame_t * f, u32 * to_next, u32 node_index) -{ - flowperpkt_flush_callback_l2 (); - return f; -} - -/** - * @brief configure / deconfigure the IPFIX flow-per-packet - * @param fm flowperpkt_main_t * fm - * @param sw_if_index u32 the desired interface - * @param is_add int 1 to enable the feature, 0 to disable it - * @returns 0 if successful, non-zero otherwise - */ - -static int flowperpkt_tx_interface_add_del_feature - (flowperpkt_main_t * fm, u32 sw_if_index, int which, int is_add) -{ - flow_report_main_t *frm = &flow_report_main; - vnet_flow_report_add_del_args_t _a, *a = &_a; - int rv; - - if (which == FLOW_VARIANT_IPV4 && !fm->ipv4_report_created) - { - memset (a, 0, sizeof (*a)); - a->rewrite_callback = flowperpkt_template_rewrite_ipv4; - a->flow_data_callback = flowperpkt_data_callback_ipv4; - a->is_add = 1; - a->domain_id = 1; /*$$$$ config parameter */ - a->src_port = 4739; /*$$$$ config parameter */ - fm->ipv4_report_created = 1; - - rv = vnet_flow_report_add_del (frm, a); - if (rv) - { - clib_warning ("vnet_flow_report_add_del returned %d", rv); - return -1; - } - } - else if (which == FLOW_VARIANT_L2 && !fm->l2_report_created) - { - memset (a, 0, sizeof (*a)); - a->rewrite_callback = flowperpkt_template_rewrite_l2; - a->flow_data_callback = flowperpkt_data_callback_l2; - a->is_add = 1; - a->domain_id = 1; /*$$$$ config parameter */ - a->src_port = 4739; /*$$$$ config parameter */ - fm->l2_report_created = 1; - - rv = vnet_flow_report_add_del (frm, a); - if (rv) - { - clib_warning ("vnet_flow_report_add_del returned %d", rv); - return -1; - } - } - - if (which == FLOW_VARIANT_IPV4) - vnet_feature_enable_disable ("ip4-output", "flowperpkt-ipv4", - sw_if_index, is_add, 0, 0); - else if (which == FLOW_VARIANT_L2) - vnet_feature_enable_disable ("interface-output", "flowperpkt-l2", - sw_if_index, is_add, 0, 0); - - return 0; -} - -/** - * @brief API message handler - * @param mp vl_api_flowperpkt_tx_interface_add_del_t * mp the api message - */ -void vl_api_flowperpkt_tx_interface_add_del_t_handler - (vl_api_flowperpkt_tx_interface_add_del_t * mp) -{ - flowperpkt_main_t *fm = &flowperpkt_main; - vl_api_flowperpkt_tx_interface_add_del_reply_t *rmp; - u32 sw_if_index = ntohl (mp->sw_if_index); - int rv = 0; - - VALIDATE_SW_IF_INDEX (mp); - - if (mp->which != FLOW_VARIANT_IPV4 && mp->which != FLOW_VARIANT_L2) - { - rv = VNET_API_ERROR_UNIMPLEMENTED; - goto out; - } - - rv = flowperpkt_tx_interface_add_del_feature (fm, sw_if_index, mp->which, - mp->is_add); -out: - BAD_SW_IF_INDEX_LABEL; - - REPLY_MACRO (VL_API_FLOWPERPKT_TX_INTERFACE_ADD_DEL_REPLY); -} - -/** - * @brief API message custom-dump function - * @param mp vl_api_flowperpkt_tx_interface_add_del_t * mp the api message - * @param handle void * print function handle - * @returns u8 * output string - */ -static void *vl_api_flowperpkt_tx_interface_add_del_t_print - (vl_api_flowperpkt_tx_interface_add_del_t * mp, void *handle) -{ - u8 *s; - - s = format (0, "SCRIPT: flowperpkt_tx_interface_add_del "); - s = format (s, "sw_if_index %d is_add %d which %d ", - clib_host_to_net_u32 (mp->sw_if_index), - (int) mp->is_add, (int) mp->which); - FINISH; -} - -/* List of message types that this plugin understands */ -#define foreach_flowperpkt_plugin_api_msg \ -_(FLOWPERPKT_TX_INTERFACE_ADD_DEL, flowperpkt_tx_interface_add_del) - -/* *INDENT-OFF* */ -VLIB_PLUGIN_REGISTER () = { - .version = VPP_BUILD_VER, - .description = "Flow per Packet", -}; -/* *INDENT-ON* */ - -static clib_error_t * -flowperpkt_tx_interface_add_del_feature_command_fn (vlib_main_t * vm, - unformat_input_t * input, - vlib_cli_command_t * cmd) -{ - flowperpkt_main_t *fm = &flowperpkt_main; - u32 sw_if_index = ~0; - int is_add = 1; - u8 which = FLOW_VARIANT_IPV4; - - int rv; - - while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) - { - if (unformat (input, "disable")) - is_add = 0; - else if (unformat (input, "%U", unformat_vnet_sw_interface, - fm->vnet_main, &sw_if_index)); - else if (unformat (input, "l2")) - which = FLOW_VARIANT_L2; - else - break; - } - - if (sw_if_index == ~0) - return clib_error_return (0, "Please specify an interface..."); - - rv = - flowperpkt_tx_interface_add_del_feature (fm, sw_if_index, which, is_add); - switch (rv) - { - case 0: - break; - - case VNET_API_ERROR_INVALID_SW_IF_INDEX: - return clib_error_return - (0, "Invalid interface, only works on physical ports"); - break; - - case VNET_API_ERROR_UNIMPLEMENTED: - return clib_error_return (0, "ip6 not supported"); - break; - - default: - return clib_error_return (0, "flowperpkt_enable_disable returned %d", - rv); - } - return 0; -} - -/*? - * 'flowperpkt feature add-del' commands to enable/disable - * per-packet IPFIX flow record generation on an interface - * - * @cliexpar - * @parblock - * To enable per-packet IPFIX flow-record generation on an interface: - * @cliexcmd{flowperpkt feature add-del GigabitEthernet2/0/0} - * - * To disable per-packet IPFIX flow-record generation on an interface: - * @cliexcmd{flowperpkt feature add-del GigabitEthernet2/0/0 disable} - * @cliexend - * @endparblock -?*/ -/* *INDENT-OFF* */ -VLIB_CLI_COMMAND (flowperpkt_enable_disable_command, static) = { - .path = "flowperpkt feature add-del", - .short_help = - "flowperpkt feature add-del [disable]", - .function = flowperpkt_tx_interface_add_del_feature_command_fn, -}; -/* *INDENT-ON* */ - -/** - * @brief Set up the API message handling tables - * @param vm vlib_main_t * vlib main data structure pointer - * @returns 0 to indicate all is well - */ -static clib_error_t * -flowperpkt_plugin_api_hookup (vlib_main_t * vm) -{ - flowperpkt_main_t *fm = &flowperpkt_main; -#define _(N,n) \ - vl_msg_api_set_handlers((VL_API_##N + fm->msg_id_base), \ - #n, \ - vl_api_##n##_t_handler, \ - vl_noop_handler, \ - vl_api_##n##_t_endian, \ - vl_api_##n##_t_print, \ - sizeof(vl_api_##n##_t), 1); - foreach_flowperpkt_plugin_api_msg; -#undef _ - - return 0; -} - -#define vl_msg_name_crc_list -#include -#undef vl_msg_name_crc_list - -static void -setup_message_id_table (flowperpkt_main_t * fm, api_main_t * am) -{ -#define _(id,n,crc) \ - vl_msg_api_add_msg_name_crc (am, #n "_" #crc, id + fm->msg_id_base); - foreach_vl_msg_name_crc_flowperpkt; -#undef _ -} - -/** - * @brief Set up the API message handling tables - * @param vm vlib_main_t * vlib main data structure pointer - * @returns 0 to indicate all is well, or a clib_error_t - */ -static clib_error_t * -flowperpkt_init (vlib_main_t * vm) -{ - flowperpkt_main_t *fm = &flowperpkt_main; - vlib_thread_main_t *tm = &vlib_thread_main; - clib_error_t *error = 0; - u32 num_threads; - u8 *name; - - fm->vnet_main = vnet_get_main (); - - /* Construct the API name */ - name = format (0, "flowperpkt_%08x%c", api_version, 0); - - /* Ask for a correctly-sized block of API message decode slots */ - fm->msg_id_base = vl_msg_api_get_msg_ids - ((char *) name, VL_MSG_FIRST_AVAILABLE); - - /* Hook up message handlers */ - error = flowperpkt_plugin_api_hookup (vm); - - /* Add our API messages to the global name_crc hash table */ - setup_message_id_table (fm, &api_main); - - vec_free (name); - - /* Decide how many worker threads we have */ - num_threads = 1 /* main thread */ + tm->n_threads; - - /* Allocate per worker thread vectors */ - vec_validate (fm->ipv4_buffers_per_worker, num_threads - 1); - vec_validate (fm->l2_buffers_per_worker, num_threads - 1); - vec_validate (fm->ipv4_frames_per_worker, num_threads - 1); - vec_validate (fm->l2_frames_per_worker, num_threads - 1); - vec_validate (fm->ipv4_next_record_offset_per_worker, num_threads - 1); - vec_validate (fm->l2_next_record_offset_per_worker, num_threads - 1); - - /* Set up time reference pair */ - fm->vlib_time_0 = vlib_time_now (vm); - fm->nanosecond_time_0 = unix_time_now_nsec (); - - return error; -} - -VLIB_INIT_FUNCTION (flowperpkt_init); - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/plugins/flowperpkt/flowperpkt.h b/src/plugins/flowperpkt/flowperpkt.h deleted file mode 100644 index 20f6939d..00000000 --- a/src/plugins/flowperpkt/flowperpkt.h +++ /dev/null @@ -1,90 +0,0 @@ -/* - * flowperpkt.h - skeleton vpp engine plug-in header file - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#ifndef __included_flowperpkt_h__ -#define __included_flowperpkt_h__ - -#include -#include -#include - -#include -#include -#include -#include - -/** - * @file - * @brief flow-per-packet plugin header file - */ -typedef struct -{ - /** API message ID base */ - u16 msg_id_base; - - /** Have the reports [templates] been created? */ - int ipv4_report_created; - int l2_report_created; - - /** stream/template IDs */ - u16 ipv4_report_id; - u16 l2_report_id; - - /** ipfix buffers under construction, per-worker thread */ - vlib_buffer_t **ipv4_buffers_per_worker; - vlib_buffer_t **l2_buffers_per_worker; - - /** frames containing ipfix buffers, per-worker thread */ - vlib_frame_t **ipv4_frames_per_worker; - vlib_frame_t **l2_frames_per_worker; - - /** next record offset, per worker thread */ - u16 *ipv4_next_record_offset_per_worker; - u16 *l2_next_record_offset_per_worker; - - /** Time reference pair */ - u64 nanosecond_time_0; - f64 vlib_time_0; - - /** convenience vlib_main_t pointer */ - vlib_main_t *vlib_main; - /** convenience vnet_main_t pointer */ - vnet_main_t *vnet_main; -} flowperpkt_main_t; - -typedef enum -{ - FLOW_VARIANT_IPV4, - FLOW_VARIANT_L2, - FLOW_N_VARIANTS, -} flowperpkt_variant_t; - -extern flowperpkt_main_t flowperpkt_main; - -extern vlib_node_registration_t flowperpkt_ipv4_node; - -void flowperpkt_flush_callback_ipv4 (void); -void flowperpkt_flush_callback_l2 (void); - -#endif /* __included_flowperpkt_h__ */ - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/plugins/flowperpkt/flowperpkt_all_api_h.h b/src/plugins/flowperpkt/flowperpkt_all_api_h.h deleted file mode 100644 index 329c375a..00000000 --- a/src/plugins/flowperpkt/flowperpkt_all_api_h.h +++ /dev/null @@ -1,18 +0,0 @@ -/* - * flowperpkt_all_api_h.h - plug-in api #include file - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -/* Include the generated file, see BUILT_SOURCES in Makefile.am */ -#include diff --git a/src/plugins/flowperpkt/flowperpkt_msg_enum.h b/src/plugins/flowperpkt/flowperpkt_msg_enum.h deleted file mode 100644 index 3177e77a..00000000 --- a/src/plugins/flowperpkt/flowperpkt_msg_enum.h +++ /dev/null @@ -1,31 +0,0 @@ -/* - * flowperpkt_msg_enum.h - vpp engine plug-in message enumeration - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#ifndef included_flowperpkt_msg_enum_h -#define included_flowperpkt_msg_enum_h - -#include - -#define vl_msg_id(n,h) n, -typedef enum -{ -#include - /* We'll want to know how many messages IDs we need... */ - VL_MSG_FIRST_AVAILABLE, -} vl_msg_id_t; -#undef vl_msg_id - -#endif /* included_flowperpkt_msg_enum_h */ diff --git a/src/plugins/flowperpkt/flowperpkt_plugin_doc.md b/src/plugins/flowperpkt/flowperpkt_plugin_doc.md deleted file mode 100644 index ed76c45c..00000000 --- a/src/plugins/flowperpkt/flowperpkt_plugin_doc.md +++ /dev/null @@ -1,13 +0,0 @@ -Per-packet IPFIX flow record plugin {#flowperpkt_plugin_doc} -=================================== - -## Introduction - -This plugin generates one ipfix record entry per packet transmitted -on interfaces which have the feature enabled - -## Sample configuration - -set ipfix exporter collector 192.168.6.2 src 192.168.6.1 template-interval 20 port 4739 path-mtu 1500 - -flowperpkt feature add-del GigabitEthernet2/3/0 diff --git a/src/plugins/flowperpkt/flowperpkt_test.c b/src/plugins/flowperpkt/flowperpkt_test.c deleted file mode 100644 index 972a3b07..00000000 --- a/src/plugins/flowperpkt/flowperpkt_test.c +++ /dev/null @@ -1,200 +0,0 @@ -/* - * flowperpkt.c - skeleton vpp-api-test plug-in - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#include -#include -#include -#include -#include - -#define __plugin_msg_base flowperpkt_test_main.msg_id_base -#include - -/** - * @file vpp_api_test plugin - */ - -uword unformat_sw_if_index (unformat_input_t * input, va_list * args); - -/* Declare message IDs */ -#include - -/* define message structures */ -#define vl_typedefs -#include -#undef vl_typedefs - -/* declare message handlers for each api */ - -#define vl_endianfun /* define message structures */ -#include -#undef vl_endianfun - -/* instantiate all the print functions we know about */ -#define vl_print(handle, ...) -#define vl_printfun -#include -#undef vl_printfun - -/* Get the API version number. */ -#define vl_api_version(n,v) static u32 api_version=(v); -#include -#undef vl_api_version - -typedef struct -{ - /** API message ID base */ - u16 msg_id_base; - /** vat_main_t pointer */ - vat_main_t *vat_main; -} flowperpkt_test_main_t; - -flowperpkt_test_main_t flowperpkt_test_main; - -#define foreach_standard_reply_retval_handler \ -_(flowperpkt_tx_interface_add_del_reply) - -#define _(n) \ - static void vl_api_##n##_t_handler \ - (vl_api_##n##_t * mp) \ - { \ - vat_main_t * vam = flowperpkt_test_main.vat_main; \ - i32 retval = ntohl(mp->retval); \ - if (vam->async_mode) { \ - vam->async_errors += (retval < 0); \ - } else { \ - vam->retval = retval; \ - vam->result_ready = 1; \ - } \ - } -foreach_standard_reply_retval_handler; -#undef _ - -/* - * Table of message reply handlers, must include boilerplate handlers - * we just generated - */ -#define foreach_vpe_api_reply_msg \ -_(FLOWPERPKT_TX_INTERFACE_ADD_DEL_REPLY, \ - flowperpkt_tx_interface_add_del_reply) - -static int -api_flowperpkt_tx_interface_add_del (vat_main_t * vam) -{ - unformat_input_t *i = vam->input; - int enable_disable = 1; - u8 which = 0; /* ipv4 by default */ - u32 sw_if_index = ~0; - vl_api_flowperpkt_tx_interface_add_del_t *mp; - int ret; - - /* Parse args required to build the message */ - while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT) - { - if (unformat (i, "%U", unformat_sw_if_index, vam, &sw_if_index)) - ; - else if (unformat (i, "sw_if_index %d", &sw_if_index)) - ; - else if (unformat (i, "disable")) - enable_disable = 0; - else if (unformat (i, "l2")) - which = 1; - else - break; - } - - if (sw_if_index == ~0) - { - errmsg ("missing interface name / explicit sw_if_index number \n"); - return -99; - } - - /* Construct the API message */ - M (FLOWPERPKT_TX_INTERFACE_ADD_DEL, mp); - mp->sw_if_index = ntohl (sw_if_index); - mp->is_add = enable_disable; - mp->which = which; - - /* send it... */ - S (mp); - - /* Wait for a reply... */ - W (ret); - return ret; -} - -/* - * List of messages that the api test plugin sends, - * and that the data plane plugin processes - */ -#define foreach_vpe_api_msg \ -_(flowperpkt_tx_interface_add_del, " [disable]") - -static void -flowperpkt_vat_api_hookup (vat_main_t * vam) -{ - flowperpkt_test_main_t *sm = &flowperpkt_test_main; - /* Hook up handlers for replies from the data plane plug-in */ -#define _(N,n) \ - vl_msg_api_set_handlers((VL_API_##N + sm->msg_id_base), \ - #n, \ - vl_api_##n##_t_handler, \ - vl_noop_handler, \ - vl_api_##n##_t_endian, \ - vl_api_##n##_t_print, \ - sizeof(vl_api_##n##_t), 1); - foreach_vpe_api_reply_msg; -#undef _ - - /* API messages we can send */ -#define _(n,h) hash_set_mem (vam->function_by_name, #n, api_##n); - foreach_vpe_api_msg; -#undef _ - - /* Help strings */ -#define _(n,h) hash_set_mem (vam->help_by_name, #n, h); - foreach_vpe_api_msg; -#undef _ -} - -clib_error_t * -vat_plugin_register (vat_main_t * vam) -{ - flowperpkt_test_main_t *sm = &flowperpkt_test_main; - u8 *name; - - sm->vat_main = vam; - - /* Ask the vpp engine for the first assigned message-id */ - name = format (0, "flowperpkt_%08x%c", api_version, 0); - sm->msg_id_base = vl_client_get_first_plugin_msg_id ((char *) name); - - /* Don't attempt to hook up API messages if the data plane plugin is AWOL */ - if (sm->msg_id_base != (u16) ~ 0) - flowperpkt_vat_api_hookup (vam); - - vec_free (name); - - return 0; -} - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/plugins/flowperpkt/l2_node.c b/src/plugins/flowperpkt/l2_node.c deleted file mode 100644 index db80e990..00000000 --- a/src/plugins/flowperpkt/l2_node.c +++ /dev/null @@ -1,561 +0,0 @@ -/* - * l2_node.c - l2 ipfix-per-packet graph node - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#include -#include -#include -#include -#include - -/** - * @file l2 flow record generator graph node - */ - -typedef struct -{ - /** interface handle */ - u32 rx_sw_if_index; - u32 tx_sw_if_index; - /** src and dst L2 addresses */ - u8 src_mac[6]; - u8 dst_mac[6]; - /** Ethertype */ - u16 ethertype; - /** packet timestamp */ - u64 timestamp; - /** size of the buffer */ - u16 buffer_size; -} flowperpkt_l2_trace_t; - -/* packet trace format function */ -static u8 * -format_flowperpkt_l2_trace (u8 * s, va_list * args) -{ - CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); - CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); - flowperpkt_l2_trace_t *t = va_arg (*args, flowperpkt_l2_trace_t *); - - s = format (s, - "FLOWPERPKT-L2: rx_sw_if_index %d, tx_sw_if_index %d, src %U dst %U ethertype %0x2, timestamp %lld, size %d", - t->rx_sw_if_index, t->tx_sw_if_index, - format_ethernet_address, &t->src_mac, - format_ethernet_address, &t->dst_mac, - t->ethertype, t->timestamp, t->buffer_size); - return s; -} - -vlib_node_registration_t flowperpkt_l2_node; - -/* No counters at the moment */ -#define foreach_flowperpkt_l2_error - -typedef enum -{ -#define _(sym,str) FLOWPERPKT_ERROR_##sym, - foreach_flowperpkt_l2_error -#undef _ - FLOWPERPKT_N_ERROR, -} flowperpkt_l2_error_t; - -static char *flowperpkt_l2_error_strings[] = { -#define _(sym,string) string, - foreach_flowperpkt_l2_error -#undef _ -}; - -typedef enum -{ - FLOWPERPKT_L2_NEXT_DROP, - FLOWPERPKT_L2_NEXT_IP4_LOOKUP, - FLOWPERPKT_L2_N_NEXT, -} flowperpkt_l2_next_t; - -/** - * @brief add an entry to the flow record under construction - * @param vm vlib_main_t * current worker thread main structure pointer - * @param fm flowperpkt_main_t * flow-per-packet main structure pointer - * @param sw_if_index u32 interface handle - * @param tos u8 ToS bits from the packet - * @param timestamp u64 timestamp, nanoseconds since 1/1/70 - * @param length u16 ip length of the packet - * @param do_flush int 1 = flush all cached records, 0 = construct a record - */ - -static inline void -add_to_flow_record_l2 (vlib_main_t * vm, - vlib_node_runtime_t * node, - flowperpkt_main_t * fm, - u32 rx_sw_if_index, u32 tx_sw_if_index, - u8 * src_mac, u8 * dst_mac, - u16 ethertype, u64 timestamp, u16 length, int do_flush) -{ - u32 my_thread_index = vm->thread_index; - flow_report_main_t *frm = &flow_report_main; - ip4_header_t *ip; - udp_header_t *udp; - ip4_ipfix_template_packet_t *tp; - ipfix_message_header_t *h; - ipfix_set_header_t *s; - vlib_frame_t *f; - vlib_buffer_t *b0; - u16 offset; - u32 bi0; - vlib_buffer_free_list_t *fl; - - /* Find or allocate a buffer */ - b0 = fm->l2_buffers_per_worker[my_thread_index]; - - /* Need to allocate a buffer? */ - if (PREDICT_FALSE (b0 == 0)) - { - /* Nothing to flush */ - if (do_flush) - return; - - /* $$$$ drop counter? */ - if (vlib_buffer_alloc (vm, &bi0, 1) != 1) - return; - - /* Initialize the buffer */ - b0 = fm->l2_buffers_per_worker[my_thread_index] = - vlib_get_buffer (vm, bi0); - fl = - vlib_buffer_get_free_list (vm, VLIB_BUFFER_DEFAULT_FREE_LIST_INDEX); - vlib_buffer_init_for_free_list (b0, fl); - VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b0); - offset = 0; - } - else - { - /* use the current buffer */ - bi0 = vlib_get_buffer_index (vm, b0); - offset = fm->l2_next_record_offset_per_worker[my_thread_index]; - } - - /* Find or allocate a frame */ - f = fm->l2_frames_per_worker[my_thread_index]; - if (PREDICT_FALSE (f == 0)) - { - u32 *to_next; - f = vlib_get_frame_to_node (vm, ip4_lookup_node.index); - fm->l2_frames_per_worker[my_thread_index] = f; - - /* Enqueue the buffer */ - to_next = vlib_frame_vector_args (f); - to_next[0] = bi0; - f->n_vectors = 1; - } - - /* Fresh packet, construct header */ - if (PREDICT_FALSE (offset == 0)) - { - flow_report_stream_t *stream; - - stream = &frm->streams[0]; - - b0->current_data = 0; - b0->current_length = sizeof (*ip) + sizeof (*udp) + sizeof (*h) + - sizeof (*s); - b0->flags |= (VLIB_BUFFER_TOTAL_LENGTH_VALID | VLIB_BUFFER_FLOW_REPORT); - vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0; - vnet_buffer (b0)->sw_if_index[VLIB_TX] = frm->fib_index; - - tp = vlib_buffer_get_current (b0); - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - h = (ipfix_message_header_t *) (udp + 1); - s = (ipfix_set_header_t *) (h + 1); - - ip->ip_version_and_header_length = 0x45; - ip->ttl = 254; - ip->protocol = IP_PROTOCOL_UDP; - ip->flags_and_fragment_offset = 0; - ip->src_address.as_u32 = frm->src_address.as_u32; - ip->dst_address.as_u32 = frm->ipfix_collector.as_u32; - udp->src_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); - udp->dst_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); - udp->checksum = 0; - - /* FIXUP: message header export_time */ - h->export_time = (u32) - (((f64) frm->unix_time_0) + - (vlib_time_now (frm->vlib_main) - frm->vlib_time_0)); - h->export_time = clib_host_to_net_u32 (h->export_time); - h->domain_id = clib_host_to_net_u32 (stream->domain_id); - - /* FIXUP: message header sequence_number */ - h->sequence_number = stream->sequence_number++; - h->sequence_number = clib_host_to_net_u32 (h->sequence_number); - - offset = (u32) (((u8 *) (s + 1)) - (u8 *) tp); - } - - /* Add data, unless we're flushing stale data */ - if (PREDICT_TRUE (do_flush == 0)) - { - - /* Add data */ - /* Ingress interface */ - { - u32 ingress_interface = clib_host_to_net_u32 (rx_sw_if_index); - clib_memcpy (b0->data + offset, &ingress_interface, - sizeof (ingress_interface)); - offset += sizeof (ingress_interface); - } - /* Egress interface */ - { - u32 egress_interface = clib_host_to_net_u32 (tx_sw_if_index); - clib_memcpy (b0->data + offset, &egress_interface, - sizeof (egress_interface)); - offset += sizeof (egress_interface); - } - /* src mac address */ - { - clib_memcpy (b0->data + offset, src_mac, 6); - offset += 6; - } - /* dst mac address */ - { - clib_memcpy (b0->data + offset, dst_mac, 6); - offset += 6; - } - - /* ethertype */ - b0->data[offset++] = ethertype >> 8; - b0->data[offset++] = ethertype & 0xFF; - - /* Timestamp */ - clib_memcpy (b0->data + offset, ×tamp, sizeof (f64)); - offset += sizeof (f64); - - /* pkt size */ - { - u16 pkt_size = clib_host_to_net_u16 (length); - clib_memcpy (b0->data + offset, &pkt_size, sizeof (pkt_size)); - offset += sizeof (pkt_size); - } - - b0->current_length += - /* 2*sw_if_index + 2*mac + ethertype + timestamp + length = 32 */ - 2 * sizeof (u32) + 12 + sizeof (u16) + sizeof (f64) + sizeof (u16); - - } - /* Time to flush the buffer? */ - if (PREDICT_FALSE - (do_flush || (offset + 2 * sizeof (u32) + 12 + sizeof (u16) + - +sizeof (f64) + sizeof (u16)) > frm->path_mtu)) - { - tp = vlib_buffer_get_current (b0); - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - h = (ipfix_message_header_t *) (udp + 1); - s = (ipfix_set_header_t *) (h + 1); - - s->set_id_length = ipfix_set_id_length (fm->l2_report_id, - b0->current_length - - (sizeof (*ip) + sizeof (*udp) + - sizeof (*h))); - h->version_length = version_length (b0->current_length - - (sizeof (*ip) + sizeof (*udp))); - - ip->length = clib_host_to_net_u16 (b0->current_length); - - ip->checksum = ip4_header_checksum (ip); - udp->length = clib_host_to_net_u16 (b0->current_length - sizeof (*ip)); - - if (frm->udp_checksum) - { - /* RFC 7011 section 10.3.2. */ - udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip); - if (udp->checksum == 0) - udp->checksum = 0xffff; - } - - ASSERT (ip->checksum == ip4_header_checksum (ip)); - - if (PREDICT_FALSE (vlib_get_trace_count (vm, node) > 0)) - { - vlib_trace_buffer (vm, node, FLOWPERPKT_L2_NEXT_IP4_LOOKUP, b0, - 0 /* follow chain */ ); - flowperpkt_l2_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - memset (t, 0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - t->buffer_size = b0->current_length; - } - - vlib_put_frame_to_node (vm, ip4_lookup_node.index, - fm->l2_frames_per_worker[my_thread_index]); - fm->l2_frames_per_worker[my_thread_index] = 0; - fm->l2_buffers_per_worker[my_thread_index] = 0; - offset = 0; - } - - fm->l2_next_record_offset_per_worker[my_thread_index] = offset; -} - -void -flowperpkt_flush_callback_l2 (void) -{ - vlib_main_t *vm = vlib_get_main (); - flowperpkt_main_t *fm = &flowperpkt_main; - vlib_node_runtime_t *node; - node = vlib_node_get_runtime (vm, flowperpkt_l2_node.index); - - add_to_flow_record_l2 (vm, node, fm, 0 /* rx_sw_if_index */ , - 0 /* tx_sw_if_index */ , - 0 /* src mac */ , - 0 /* dst mac */ , - 0 /* ethertype */ , - 0ULL /* timestamp */ , - 0 /* length */ , - 1 /* do_flush */ ); -} - - -static uword -flowperpkt_l2_node_fn (vlib_main_t * vm, - vlib_node_runtime_t * node, vlib_frame_t * frame) -{ - u32 n_left_from, *from, *to_next; - flowperpkt_l2_next_t next_index; - flowperpkt_main_t *fm = &flowperpkt_main; - u64 now; - - now = (u64) ((vlib_time_now (vm) - fm->vlib_time_0) * 1e9); - now += fm->nanosecond_time_0; - - from = vlib_frame_vector_args (frame); - n_left_from = frame->n_vectors; - next_index = node->cached_next_index; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - while (n_left_from >= 4 && n_left_to_next >= 2) - { - u32 next0 = FLOWPERPKT_L2_NEXT_DROP; - u32 next1 = FLOWPERPKT_L2_NEXT_DROP; - ethernet_header_t *eh0, *eh1; - u16 len0, len1; - u32 bi0, bi1; - vlib_buffer_t *b0, *b1; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p2, *p3; - - p2 = vlib_get_buffer (vm, from[2]); - p3 = vlib_get_buffer (vm, from[3]); - - vlib_prefetch_buffer_header (p2, LOAD); - vlib_prefetch_buffer_header (p3, LOAD); - - CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - /* speculatively enqueue b0 and b1 to the current next frame */ - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - from += 2; - to_next += 2; - n_left_from -= 2; - n_left_to_next -= 2; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - - vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], - &next0, b0); - vnet_feature_next (vnet_buffer (b1)->sw_if_index[VLIB_TX], - &next1, b1); - - eh0 = vlib_buffer_get_current (b0); - len0 = vlib_buffer_length_in_chain (vm, b0); - - if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_l2 (vm, node, fm, - vnet_buffer (b0)->sw_if_index[VLIB_RX], - vnet_buffer (b0)->sw_if_index[VLIB_TX], - eh0->src_address, - eh0->dst_address, - eh0->type, now, len0, 0 /* flush */ ); - - eh1 = vlib_buffer_get_current (b0); - len1 = vlib_buffer_length_in_chain (vm, b0); - - if (PREDICT_TRUE ((b1->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_l2 (vm, node, fm, - vnet_buffer (b1)->sw_if_index[VLIB_RX], - vnet_buffer (b1)->sw_if_index[VLIB_TX], - eh1->src_address, - eh1->dst_address, - eh1->type, now, len1, 0 /* flush */ ); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (b0->flags & VLIB_BUFFER_IS_TRACED) - { - flowperpkt_l2_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - clib_memcpy (t->src_mac, eh0->src_address, 6); - clib_memcpy (t->dst_mac, eh0->dst_address, 6); - t->ethertype = clib_net_to_host_u16 (eh0->type); - t->timestamp = now; - t->buffer_size = len0; - } - if (b1->flags & VLIB_BUFFER_IS_TRACED) - { - flowperpkt_l2_trace_t *t = - vlib_add_trace (vm, node, b1, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b1)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b1)->sw_if_index[VLIB_TX]; - clib_memcpy (t->src_mac, eh1->src_address, 6); - clib_memcpy (t->dst_mac, eh1->dst_address, 6); - t->ethertype = clib_net_to_host_u16 (eh1->type); - t->timestamp = now; - t->buffer_size = len1; - } - } - - /* verify speculative enqueues, maybe switch current next frame */ - vlib_validate_buffer_enqueue_x2 (vm, node, next_index, - to_next, n_left_to_next, - bi0, bi1, next0, next1); - } - - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - u32 next0 = FLOWPERPKT_L2_NEXT_DROP; - ethernet_header_t *eh0; - u16 len0; - - /* speculatively enqueue b0 to the current next frame */ - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - - vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], - &next0, b0); - - eh0 = vlib_buffer_get_current (b0); - len0 = vlib_buffer_length_in_chain (vm, b0); - - if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_l2 (vm, node, fm, - vnet_buffer (b0)->sw_if_index[VLIB_RX], - vnet_buffer (b0)->sw_if_index[VLIB_TX], - eh0->src_address, - eh0->dst_address, - eh0->type, now, len0, 0 /* flush */ ); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE) - && (b0->flags & VLIB_BUFFER_IS_TRACED))) - { - flowperpkt_l2_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - clib_memcpy (t->src_mac, eh0->src_address, 6); - clib_memcpy (t->dst_mac, eh0->dst_address, 6); - t->ethertype = clib_net_to_host_u16 (eh0->type); - t->timestamp = now; - t->buffer_size = len0; - } - - /* verify speculative enqueue, maybe switch current next frame */ - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, - to_next, n_left_to_next, - bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - return frame->n_vectors; -} - -/** - * @brief IPFIX l2 flow-per-packet graph node - * @node flowperpkt-l2 - * - * This is the IPFIX flow-record-per-packet node. - * - * @param vm vlib_main_t corresponding to the current thread. - * @param node vlib_node_runtime_t data for this node. - * @param frame vlib_frame_t whose contents should be dispatched. - * - * @par Graph mechanics: buffer metadata, next index usage - * - * Uses: - * - vnet_buffer(b)->ip.save_rewrite_length - * - tells the node the length of the rewrite which was applied in - * ip4/6_rewrite_inline, allows the code to find the IP header without - * having to parse L2 headers, or make stupid assumptions about their - * length. - * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT - * - Used to suppress flow record generation for flow record packets. - * - * Sets: - * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT - * - To suppress flow record generation for flow record packets - * - * Next Index: - * - Next configured output feature on the interface, usually - * "interface-output." Generated flow records head for ip4-lookup - */ - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (flowperpkt_l2_node) = { - .function = flowperpkt_l2_node_fn, - .name = "flowperpkt-l2", - .vector_size = sizeof (u32), - .format_trace = format_flowperpkt_l2_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - - .n_errors = ARRAY_LEN(flowperpkt_l2_error_strings), - .error_strings = flowperpkt_l2_error_strings, - - .n_next_nodes = FLOWPERPKT_L2_N_NEXT, - - /* edit / add dispositions here */ - .next_nodes = { - [FLOWPERPKT_L2_NEXT_DROP] = "error-drop", - [FLOWPERPKT_L2_NEXT_IP4_LOOKUP] = "ip4-lookup", - }, -}; -/* *INDENT-ON* */ - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/plugins/flowperpkt/node.c b/src/plugins/flowperpkt/node.c deleted file mode 100644 index 9bac4166..00000000 --- a/src/plugins/flowperpkt/node.c +++ /dev/null @@ -1,574 +0,0 @@ -/* - * node.c - ipv4 ipfix-per-packet graph node - * - * Copyright (c) - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at: - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -#include -#include -#include -#include -#include - -/** - * @file ipv4 flow record generator graph node - */ - -typedef struct -{ - /** interface handle */ - u32 rx_sw_if_index; - u32 tx_sw_if_index; - u32 src_address; - u32 dst_address; - /** ToS bits */ - u8 tos; - /** packet timestamp */ - u64 timestamp; - /** size of the buffer */ - u16 buffer_size; -} flowperpkt_ipv4_trace_t; - -/* packet trace format function */ -static u8 * -format_flowperpkt_ipv4_trace (u8 * s, va_list * args) -{ - CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); - CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); - flowperpkt_ipv4_trace_t *t = va_arg (*args, flowperpkt_ipv4_trace_t *); - - s = format (s, - "FLOWPERPKT-V4: rx_sw_if_index %d, tx_sw_if_index %d, src %U dst %U tos %0x2, timestamp %lld, size %d", - t->rx_sw_if_index, t->tx_sw_if_index, - format_ip4_address, &t->src_address, - format_ip4_address, &t->dst_address, - t->tos, t->timestamp, t->buffer_size); - return s; -} - -vlib_node_registration_t flowperpkt_ipv4_node; - -/* No counters at the moment */ -#define foreach_flowperpkt_ipv4_error - -typedef enum -{ -#define _(sym,str) FLOWPERPKT_ERROR_##sym, - foreach_flowperpkt_ipv4_error -#undef _ - FLOWPERPKT_N_ERROR, -} flowperpkt_ipv4_error_t; - -static char *flowperpkt_ipv4_error_strings[] = { -#define _(sym,string) string, - foreach_flowperpkt_ipv4_error -#undef _ -}; - -typedef enum -{ - FLOWPERPKT_IPV4_NEXT_DROP, - FLOWPERPKT_IPV4_NEXT_LOOKUP, - FLOWPERPKT_IPV4_N_NEXT, -} flowperpkt_ipv4_next_t; - -/** - * @brief add an entry to the flow record under construction - * @param vm vlib_main_t * current worker thread main structure pointer - * @param fm flowperpkt_main_t * flow-per-packet main structure pointer - * @param sw_if_index u32 interface handle - * @param tos u8 ToS bits from the packet - * @param timestamp u64 timestamp, nanoseconds since 1/1/70 - * @param length u16 ip length of the packet - * @param do_flush int 1 = flush all cached records, 0 = construct a record - */ - -static inline void -add_to_flow_record_ipv4 (vlib_main_t * vm, - vlib_node_runtime_t * node, - flowperpkt_main_t * fm, - u32 rx_sw_if_index, u32 tx_sw_if_index, - u32 src_address, u32 dst_address, - u8 tos, u64 timestamp, u16 length, int do_flush) -{ - u32 my_thread_index = vm->thread_index; - flow_report_main_t *frm = &flow_report_main; - ip4_header_t *ip; - udp_header_t *udp; - ip4_ipfix_template_packet_t *tp; - ipfix_message_header_t *h; - ipfix_set_header_t *s; - vlib_frame_t *f; - vlib_buffer_t *b0; - u16 offset; - u32 bi0; - vlib_buffer_free_list_t *fl; - - /* Find or allocate a buffer */ - b0 = fm->ipv4_buffers_per_worker[my_thread_index]; - - /* Need to allocate a buffer? */ - if (PREDICT_FALSE (b0 == 0)) - { - /* Nothing to flush */ - if (do_flush) - return; - - /* $$$$ drop counter? */ - if (vlib_buffer_alloc (vm, &bi0, 1) != 1) - return; - - /* Initialize the buffer */ - b0 = fm->ipv4_buffers_per_worker[my_thread_index] = - vlib_get_buffer (vm, bi0); - fl = - vlib_buffer_get_free_list (vm, VLIB_BUFFER_DEFAULT_FREE_LIST_INDEX); - vlib_buffer_init_for_free_list (b0, fl); - VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b0); - offset = 0; - } - else - { - /* use the current buffer */ - bi0 = vlib_get_buffer_index (vm, b0); - offset = fm->ipv4_next_record_offset_per_worker[my_thread_index]; - } - - /* Find or allocate a frame */ - f = fm->ipv4_frames_per_worker[my_thread_index]; - if (PREDICT_FALSE (f == 0)) - { - u32 *to_next; - f = vlib_get_frame_to_node (vm, ip4_lookup_node.index); - fm->ipv4_frames_per_worker[my_thread_index] = f; - - /* Enqueue the buffer */ - to_next = vlib_frame_vector_args (f); - to_next[0] = bi0; - f->n_vectors = 1; - } - - /* Fresh packet, construct header */ - if (PREDICT_FALSE (offset == 0)) - { - flow_report_stream_t *stream; - - stream = &frm->streams[0]; - - b0->current_data = 0; - b0->current_length = sizeof (*ip) + sizeof (*udp) + sizeof (*h) + - sizeof (*s); - b0->flags |= (VLIB_BUFFER_TOTAL_LENGTH_VALID | VLIB_BUFFER_FLOW_REPORT); - vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0; - vnet_buffer (b0)->sw_if_index[VLIB_TX] = frm->fib_index; - - tp = vlib_buffer_get_current (b0); - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - h = (ipfix_message_header_t *) (udp + 1); - s = (ipfix_set_header_t *) (h + 1); - - ip->ip_version_and_header_length = 0x45; - ip->ttl = 254; - ip->protocol = IP_PROTOCOL_UDP; - ip->flags_and_fragment_offset = 0; - ip->src_address.as_u32 = frm->src_address.as_u32; - ip->dst_address.as_u32 = frm->ipfix_collector.as_u32; - udp->src_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); - udp->dst_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); - udp->checksum = 0; - - /* FIXUP: message header export_time */ - h->export_time = (u32) - (((f64) frm->unix_time_0) + - (vlib_time_now (frm->vlib_main) - frm->vlib_time_0)); - h->export_time = clib_host_to_net_u32 (h->export_time); - h->domain_id = clib_host_to_net_u32 (stream->domain_id); - - /* FIXUP: message header sequence_number */ - h->sequence_number = stream->sequence_number++; - h->sequence_number = clib_host_to_net_u32 (h->sequence_number); - - offset = (u32) (((u8 *) (s + 1)) - (u8 *) tp); - } - - /* Add data, unless we're flushing stale data */ - if (PREDICT_TRUE (do_flush == 0)) - { - - /* Add data */ - /* Ingress interface */ - { - u32 ingress_interface = clib_host_to_net_u32 (rx_sw_if_index); - clib_memcpy (b0->data + offset, &ingress_interface, - sizeof (ingress_interface)); - offset += sizeof (ingress_interface); - } - /* Egress interface */ - { - u32 egress_interface = clib_host_to_net_u32 (tx_sw_if_index); - clib_memcpy (b0->data + offset, &egress_interface, - sizeof (egress_interface)); - offset += sizeof (egress_interface); - } - /* ip4 src address */ - { - clib_memcpy (b0->data + offset, &src_address, sizeof (src_address)); - offset += sizeof (src_address); - } - /* ip4 dst address */ - { - clib_memcpy (b0->data + offset, &dst_address, sizeof (dst_address)); - offset += sizeof (dst_address); - } - - /* ToS */ - b0->data[offset++] = tos; - - /* Timestamp */ - clib_memcpy (b0->data + offset, ×tamp, sizeof (f64)); - offset += sizeof (f64); - - /* pkt size */ - { - u16 pkt_size = clib_host_to_net_u16 (length); - clib_memcpy (b0->data + offset, &pkt_size, sizeof (pkt_size)); - offset += sizeof (pkt_size); - } - - b0->current_length += - /* sw_if_index + tos + timestamp + length = 15 */ - 4 * sizeof (u32) + sizeof (u8) + sizeof (f64) + sizeof (u16); - - } - /* Time to flush the buffer? */ - if (PREDICT_FALSE - (do_flush || (offset + 4 * sizeof (u32) + sizeof (u8) - + sizeof (f64) + sizeof (u16)) > frm->path_mtu)) - { - tp = vlib_buffer_get_current (b0); - ip = (ip4_header_t *) & tp->ip4; - udp = (udp_header_t *) (ip + 1); - h = (ipfix_message_header_t *) (udp + 1); - s = (ipfix_set_header_t *) (h + 1); - - s->set_id_length = ipfix_set_id_length (fm->ipv4_report_id, - b0->current_length - - (sizeof (*ip) + sizeof (*udp) + - sizeof (*h))); - h->version_length = version_length (b0->current_length - - (sizeof (*ip) + sizeof (*udp))); - - ip->length = clib_host_to_net_u16 (b0->current_length); - - ip->checksum = ip4_header_checksum (ip); - udp->length = clib_host_to_net_u16 (b0->current_length - sizeof (*ip)); - - if (frm->udp_checksum) - { - /* RFC 7011 section 10.3.2. */ - udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip); - if (udp->checksum == 0) - udp->checksum = 0xffff; - } - - ASSERT (ip->checksum == ip4_header_checksum (ip)); - - if (PREDICT_FALSE (vlib_get_trace_count (vm, node) > 0)) - { - vlib_trace_buffer (vm, node, FLOWPERPKT_IPV4_NEXT_LOOKUP, b0, - 0 /* follow chain */ ); - flowperpkt_ipv4_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - t->src_address = 0; - t->dst_address = 0; - t->tos = 0; - t->timestamp = 0; - t->buffer_size = b0->current_length; - } - - vlib_put_frame_to_node (vm, ip4_lookup_node.index, - fm->ipv4_frames_per_worker[my_thread_index]); - fm->ipv4_frames_per_worker[my_thread_index] = 0; - fm->ipv4_buffers_per_worker[my_thread_index] = 0; - offset = 0; - } - - fm->ipv4_next_record_offset_per_worker[my_thread_index] = offset; -} - -void -flowperpkt_flush_callback_ipv4 (void) -{ - vlib_main_t *vm = vlib_get_main (); - flowperpkt_main_t *fm = &flowperpkt_main; - vlib_node_runtime_t *node; - node = vlib_node_get_runtime (vm, flowperpkt_ipv4_node.index); - - add_to_flow_record_ipv4 (vm, node, fm, 0 /* rx_sw_if_index */ , - 0 /* tx_sw_if_index */ , - 0 /* src_address */ , - 0 /* dst_address */ , - 0 /* ToS */ , - 0ULL /* timestamp */ , - 0 /* length */ , - 1 /* do_flush */ ); -} - - -static uword -flowperpkt_ipv4_node_fn (vlib_main_t * vm, - vlib_node_runtime_t * node, vlib_frame_t * frame) -{ - u32 n_left_from, *from, *to_next; - flowperpkt_ipv4_next_t next_index; - flowperpkt_main_t *fm = &flowperpkt_main; - u64 now; - - now = (u64) ((vlib_time_now (vm) - fm->vlib_time_0) * 1e9); - now += fm->nanosecond_time_0; - - from = vlib_frame_vector_args (frame); - n_left_from = frame->n_vectors; - next_index = node->cached_next_index; - - while (n_left_from > 0) - { - u32 n_left_to_next; - - vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); - - while (n_left_from >= 4 && n_left_to_next >= 2) - { - u32 next0 = FLOWPERPKT_IPV4_NEXT_DROP; - u32 next1 = FLOWPERPKT_IPV4_NEXT_DROP; - ip4_header_t *ip0, *ip1; - u16 len0, len1; - u32 bi0, bi1; - vlib_buffer_t *b0, *b1; - - /* Prefetch next iteration. */ - { - vlib_buffer_t *p2, *p3; - - p2 = vlib_get_buffer (vm, from[2]); - p3 = vlib_get_buffer (vm, from[3]); - - vlib_prefetch_buffer_header (p2, LOAD); - vlib_prefetch_buffer_header (p3, LOAD); - - CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE); - CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE); - } - - /* speculatively enqueue b0 and b1 to the current next frame */ - to_next[0] = bi0 = from[0]; - to_next[1] = bi1 = from[1]; - from += 2; - to_next += 2; - n_left_from -= 2; - n_left_to_next -= 2; - - b0 = vlib_get_buffer (vm, bi0); - b1 = vlib_get_buffer (vm, bi1); - - vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], - &next0, b0); - vnet_feature_next (vnet_buffer (b1)->sw_if_index[VLIB_TX], - &next1, b1); - - ip0 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b0) + - vnet_buffer (b0)->ip.save_rewrite_length); - - len0 = vlib_buffer_length_in_chain (vm, b0); - - if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_ipv4 (vm, node, fm, - vnet_buffer (b0)->sw_if_index[VLIB_RX], - vnet_buffer (b0)->sw_if_index[VLIB_TX], - ip0->src_address.as_u32, - ip0->dst_address.as_u32, - ip0->tos, now, len0, 0 /* flush */ ); - - ip1 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b1) + - vnet_buffer (b1)->ip.save_rewrite_length); - len1 = vlib_buffer_length_in_chain (vm, b1); - - if (PREDICT_TRUE ((b1->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_ipv4 (vm, node, fm, - vnet_buffer (b1)->sw_if_index[VLIB_RX], - vnet_buffer (b1)->sw_if_index[VLIB_TX], - ip1->src_address.as_u32, - ip1->dst_address.as_u32, - ip1->tos, now, len1, 0 /* flush */ ); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE))) - { - if (b0->flags & VLIB_BUFFER_IS_TRACED) - { - flowperpkt_ipv4_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - t->src_address = ip0->src_address.as_u32; - t->dst_address = ip0->dst_address.as_u32; - t->tos = ip0->tos; - t->timestamp = now; - t->buffer_size = len0; - } - if (b1->flags & VLIB_BUFFER_IS_TRACED) - { - flowperpkt_ipv4_trace_t *t = - vlib_add_trace (vm, node, b1, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b1)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b1)->sw_if_index[VLIB_TX]; - t->src_address = ip1->src_address.as_u32; - t->dst_address = ip1->dst_address.as_u32; - t->tos = ip1->tos; - t->timestamp = now; - t->buffer_size = len1; - } - } - - /* verify speculative enqueues, maybe switch current next frame */ - vlib_validate_buffer_enqueue_x2 (vm, node, next_index, - to_next, n_left_to_next, - bi0, bi1, next0, next1); - } - - while (n_left_from > 0 && n_left_to_next > 0) - { - u32 bi0; - vlib_buffer_t *b0; - u32 next0 = FLOWPERPKT_IPV4_NEXT_DROP; - ip4_header_t *ip0; - u16 len0; - - /* speculatively enqueue b0 to the current next frame */ - bi0 = from[0]; - to_next[0] = bi0; - from += 1; - to_next += 1; - n_left_from -= 1; - n_left_to_next -= 1; - - b0 = vlib_get_buffer (vm, bi0); - - vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], - &next0, b0); - - ip0 = (ip4_header_t *) ((u8 *) vlib_buffer_get_current (b0) + - vnet_buffer (b0)->ip.save_rewrite_length); - /* - * egressInterface, TLV type 14, u32 - * ipClassOfService, TLV type 5, u8 - * flowStartNanoseconds, TLV type 156, dateTimeNanoseconds (f64) - * Implementation: f64 nanoseconds since VPP started - * dataLinkFrameSize, TLV type 312, u16 - */ - len0 = vlib_buffer_length_in_chain (vm, b0); - - if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) - add_to_flow_record_ipv4 (vm, node, fm, - vnet_buffer (b0)->sw_if_index[VLIB_RX], - vnet_buffer (b0)->sw_if_index[VLIB_TX], - ip0->src_address.as_u32, - ip0->dst_address.as_u32, - ip0->tos, now, len0, 0 /* flush */ ); - - if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE) - && (b0->flags & VLIB_BUFFER_IS_TRACED))) - { - flowperpkt_ipv4_trace_t *t = - vlib_add_trace (vm, node, b0, sizeof (*t)); - t->rx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_RX]; - t->tx_sw_if_index = vnet_buffer (b0)->sw_if_index[VLIB_TX]; - t->src_address = ip0->src_address.as_u32; - t->dst_address = ip0->dst_address.as_u32; - t->tos = ip0->tos; - t->timestamp = now; - t->buffer_size = len0; - } - - /* verify speculative enqueue, maybe switch current next frame */ - vlib_validate_buffer_enqueue_x1 (vm, node, next_index, - to_next, n_left_to_next, - bi0, next0); - } - - vlib_put_next_frame (vm, node, next_index, n_left_to_next); - } - return frame->n_vectors; -} - -/** - * @brief IPFIX ipv4 flow-per-packet graph node - * @node flowperpkt-ipv4 - * - * This is the IPFIX flow-record-per-packet node. - * - * @param vm vlib_main_t corresponding to the current thread. - * @param node vlib_node_runtime_t data for this node. - * @param frame vlib_frame_t whose contents should be dispatched. - * - * @par Graph mechanics: buffer metadata, next index usage - * - * Uses: - * - vnet_buffer(b)->ip.save_rewrite_length - * - tells the node the length of the rewrite which was applied in - * ip4/6_rewrite_inline, allows the code to find the IP header without - * having to parse L2 headers, or make stupid assumptions about their - * length. - * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT - * - Used to suppress flow record generation for flow record packets. - * - * Sets: - * - vnet_buffer(b)->flags & VLIB_BUFFER_FLOW_REPORT - * - To suppress flow record generation for flow record packets - * - * Next Index: - * - Next configured output feature on the interface, usually - * "interface-output." Generated flow records head for ip4-lookup - */ - -/* *INDENT-OFF* */ -VLIB_REGISTER_NODE (flowperpkt_ipv4_node) = { - .function = flowperpkt_ipv4_node_fn, - .name = "flowperpkt-ipv4", - .vector_size = sizeof (u32), - .format_trace = format_flowperpkt_ipv4_trace, - .type = VLIB_NODE_TYPE_INTERNAL, - - .n_errors = ARRAY_LEN(flowperpkt_ipv4_error_strings), - .error_strings = flowperpkt_ipv4_error_strings, - - .n_next_nodes = FLOWPERPKT_IPV4_N_NEXT, - - /* edit / add dispositions here */ - .next_nodes = { - [FLOWPERPKT_IPV4_NEXT_DROP] = "error-drop", - /* Used only to trace ipfix data packets */ - [FLOWPERPKT_IPV4_NEXT_LOOKUP] = "ip4-lookup", - }, -}; -/* *INDENT-ON* */ - -/* - * fd.io coding-style-patch-verification: ON - * - * Local Variables: - * eval: (c-set-style "gnu") - * End: - */ diff --git a/src/plugins/flowprobe.am b/src/plugins/flowprobe.am new file mode 100644 index 00000000..c56e246d --- /dev/null +++ b/src/plugins/flowprobe.am @@ -0,0 +1,37 @@ + +# Copyright (c) +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +vppplugins_LTLIBRARIES += flowprobe_plugin.la +vppapitestplugins_LTLIBRARIES += flowprobe_test_plugin.la + +flowprobe_plugin_la_SOURCES = flowprobe/flowprobe.c \ + flowprobe/node.c \ + flowprobe/flowprobe_plugin.api.h + +BUILT_SOURCES += \ + flowprobe/flowprobe.api.h \ + flowprobe/flowprobe.api.json + +noinst_HEADERS += \ + flowprobe/flowprobe_all_api_h.h \ + flowprobe/flowprobe_msg_enum.h \ + flowprobe/flowprobe.api.h + +flowprobe_test_plugin_la_SOURCES = \ + flowprobe/flowprobe_test.c \ + flowprobe/flowprobe_plugin.api.h + +API_FILES += flowprobe/flowprobe.api + +# vi:syntax=automake diff --git a/src/plugins/flowprobe/flowprobe.api b/src/plugins/flowprobe/flowprobe.api new file mode 100644 index 00000000..3f8c583b --- /dev/null +++ b/src/plugins/flowprobe/flowprobe.api @@ -0,0 +1,40 @@ +/* Define a simple enable-disable binary API to control the feature */ + +/** \file + This file defines the vpp control-plane API messages + used to control the flowprobe plugin +*/ + +/** \brief Enable / disable per-packet IPFIX recording on an interface + @param client_index - opaque cookie to identify the sender + @param context - sender context, to match reply w/ request + @param is_add - add address if non-zero, else delete + @param is_ipv6 - if non-zero the address is ipv6, else ipv4 + @param sw_if_index - index of the interface +*/ +autoreply manual_print define flowprobe_tx_interface_add_del +{ + /* Client identifier, set from api_main.my_client_index */ + u32 client_index; + + /* Arbitrary context, so client can match reply to request */ + u32 context; + + /* Enable / disable the feature */ + u8 is_add; + u8 which; /* 0 = ipv4, 1 = l2, 2 = ipv6 */ + + /* Interface handle */ + u32 sw_if_index; +}; + +autoreply define flowprobe_params +{ + u32 client_index; + u32 context; + u8 record_l2; + u8 record_l3; + u8 record_l4; + u32 active_timer; /* ~0 is off, 0 is default */ + u32 passive_timer; /* ~0 is off, 0 is default */ +}; diff --git a/src/plugins/flowprobe/flowprobe.c b/src/plugins/flowprobe/flowprobe.c new file mode 100644 index 00000000..8975f89c --- /dev/null +++ b/src/plugins/flowprobe/flowprobe.c @@ -0,0 +1,1122 @@ +/* + * flowprobe.c - ipfix probe plugin + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @file + * @brief Per-packet IPFIX flow record generator plugin + * + * This file implements vpp plugin registration mechanics, + * debug CLI, and binary API handling. + */ + +#include +#include +#include +#include + +#include +#include +#include + +/* define message IDs */ +#include + +/* define message structures */ +#define vl_typedefs +#include +#undef vl_typedefs + +/* define generated endian-swappers */ +#define vl_endianfun +#include +#undef vl_endianfun + +/* instantiate all the print functions we know about */ +#define vl_print(handle, ...) vlib_cli_output (handle, __VA_ARGS__) +#define vl_printfun +#include +#undef vl_printfun + +flowprobe_main_t flowprobe_main; +vlib_node_registration_t flowprobe_walker_node; +static vlib_node_registration_t flowprobe_timer_node; +uword flowprobe_walker_process (vlib_main_t * vm, vlib_node_runtime_t * rt, + vlib_frame_t * f); + +/* Get the API version number */ +#define vl_api_version(n,v) static u32 api_version=(v); +#include +#undef vl_api_version + +#define REPLY_MSG_ID_BASE fm->msg_id_base +#include + +/* Define the per-interface configurable features */ +/* *INDENT-OFF* */ +VNET_FEATURE_INIT (flow_perpacket_ip4, static) = +{ + .arc_name = "ip4-output", + .node_name = "flowprobe-ip4", + .runs_before = VNET_FEATURES ("interface-output"), +}; + +VNET_FEATURE_INIT (flow_perpacket_ip6, static) = +{ + .arc_name = "ip6-output", + .node_name = "flowprobe-ip6", + .runs_before = VNET_FEATURES ("interface-output"), +}; + +VNET_FEATURE_INIT (flow_perpacket_l2, static) = +{ + .arc_name = "interface-output", + .node_name = "flowprobe-l2", + .runs_before = VNET_FEATURES ("interface-tx"), +}; +/* *INDENT-ON* */ + +/* Macro to finish up custom dump fns */ +#define FINISH \ + vec_add1 (s, 0); \ + vl_print (handle, (char *)s); \ + vec_free (s); \ + return handle; + +static inline ipfix_field_specifier_t * +flowprobe_template_ip4_fields (ipfix_field_specifier_t * f) +{ +#define flowprobe_template_ip4_field_count() 4 + /* sourceIpv4Address, TLV type 8, u32 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + sourceIPv4Address, 4); + f++; + /* destinationIPv4Address, TLV type 12, u32 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + destinationIPv4Address, 4); + f++; + /* protocolIdentifier, TLV type 4, u8 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + protocolIdentifier, 1); + f++; + /* octetDeltaCount, TLV type 1, u64 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + octetDeltaCount, 8); + f++; + return f; +} + +static inline ipfix_field_specifier_t * +flowprobe_template_ip6_fields (ipfix_field_specifier_t * f) +{ +#define flowprobe_template_ip6_field_count() 4 + /* sourceIpv6Address, TLV type 27, 16 octets */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + sourceIPv6Address, 16); + f++; + /* destinationIPv6Address, TLV type 28, 16 octets */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + destinationIPv6Address, 16); + f++; + /* protocolIdentifier, TLV type 4, u8 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + protocolIdentifier, 1); + f++; + /* octetDeltaCount, TLV type 1, u64 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + octetDeltaCount, 8); + f++; + return f; +} + +static inline ipfix_field_specifier_t * +flowprobe_template_l2_fields (ipfix_field_specifier_t * f) +{ +#define flowprobe_template_l2_field_count() 3 + /* sourceMacAddress, TLV type 56, u8[6] we hope */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + sourceMacAddress, 6); + f++; + /* destinationMacAddress, TLV type 80, u8[6] we hope */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + destinationMacAddress, 6); + f++; + /* ethernetType, TLV type 256, u16 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + ethernetType, 2); + f++; + return f; +} + +static inline ipfix_field_specifier_t * +flowprobe_template_common_fields (ipfix_field_specifier_t * f) +{ +#define flowprobe_template_common_field_count() 3 + /* ingressInterface, TLV type 10, u32 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + ingressInterface, 4); + f++; + + /* egressInterface, TLV type 14, u32 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + egressInterface, 4); + f++; + + /* packetDeltaCount, TLV type 2, u64 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + packetDeltaCount, 8); + f++; + + return f; +} + +static inline ipfix_field_specifier_t * +flowprobe_template_l4_fields (ipfix_field_specifier_t * f) +{ +#define flowprobe_template_l4_field_count() 2 + /* sourceTransportPort, TLV type 7, u16 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + sourceTransportPort, 2); + f++; + /* destinationTransportPort, TLV type 11, u16 */ + f->e_id_length = ipfix_e_id_length (0 /* enterprise */ , + destinationTransportPort, 2); + f++; + return f; +} + +/** + * @brief Create an IPFIX template packet rewrite string + * @param frm flow_report_main_t * + * @param fr flow_report_t * + * @param collector_address ip4_address_t * the IPFIX collector address + * @param src_address ip4_address_t * the source address we should use + * @param collector_port u16 the collector port we should use, host byte order + * @returns u8 * vector containing the indicated IPFIX template packet + */ +static inline u8 * +flowprobe_template_rewrite_inline (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port, + flowprobe_variant_t which) +{ + ip4_header_t *ip; + udp_header_t *udp; + ipfix_message_header_t *h; + ipfix_set_header_t *s; + ipfix_template_header_t *t; + ipfix_field_specifier_t *f; + ipfix_field_specifier_t *first_field; + u8 *rewrite = 0; + ip4_ipfix_template_packet_t *tp; + u32 field_count = 0; + flow_report_stream_t *stream; + flowprobe_main_t *fm = &flowprobe_main; + flowprobe_record_t flags = fr->opaque.as_uword; + bool collect_ip4 = false, collect_ip6 = false; + + stream = &frm->streams[fr->stream_index]; + + if (flags & FLOW_RECORD_L3) + { + collect_ip4 = which == FLOW_VARIANT_L2_IP4 || which == FLOW_VARIANT_IP4; + collect_ip6 = which == FLOW_VARIANT_L2_IP6 || which == FLOW_VARIANT_IP6; + if (which == FLOW_VARIANT_L2_IP4) + flags |= FLOW_RECORD_L2_IP4; + if (which == FLOW_VARIANT_L2_IP6) + flags |= FLOW_RECORD_L2_IP6; + } + + field_count += flowprobe_template_common_field_count (); + if (flags & FLOW_RECORD_L2) + field_count += flowprobe_template_l2_field_count (); + if (collect_ip4) + field_count += flowprobe_template_ip4_field_count (); + if (collect_ip6) + field_count += flowprobe_template_ip6_field_count (); + if (flags & FLOW_RECORD_L4) + field_count += flowprobe_template_l4_field_count (); + + /* allocate rewrite space */ + vec_validate_aligned + (rewrite, sizeof (ip4_ipfix_template_packet_t) + + field_count * sizeof (ipfix_field_specifier_t) - 1, + CLIB_CACHE_LINE_BYTES); + + tp = (ip4_ipfix_template_packet_t *) rewrite; + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + h = (ipfix_message_header_t *) (udp + 1); + s = (ipfix_set_header_t *) (h + 1); + t = (ipfix_template_header_t *) (s + 1); + first_field = f = (ipfix_field_specifier_t *) (t + 1); + + ip->ip_version_and_header_length = 0x45; + ip->ttl = 254; + ip->protocol = IP_PROTOCOL_UDP; + ip->src_address.as_u32 = src_address->as_u32; + ip->dst_address.as_u32 = collector_address->as_u32; + udp->src_port = clib_host_to_net_u16 (stream->src_port); + udp->dst_port = clib_host_to_net_u16 (collector_port); + udp->length = clib_host_to_net_u16 (vec_len (rewrite) - sizeof (*ip)); + + /* FIXUP: message header export_time */ + /* FIXUP: message header sequence_number */ + h->domain_id = clib_host_to_net_u32 (stream->domain_id); + + /* Add TLVs to the template */ + f = flowprobe_template_common_fields (f); + + if (flags & FLOW_RECORD_L2) + f = flowprobe_template_l2_fields (f); + if (collect_ip4) + f = flowprobe_template_ip4_fields (f); + if (collect_ip6) + f = flowprobe_template_ip6_fields (f); + if (flags & FLOW_RECORD_L4) + f = flowprobe_template_l4_fields (f); + + /* Back to the template packet... */ + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + + ASSERT (f - first_field); + /* Field count in this template */ + t->id_count = ipfix_id_count (fr->template_id, f - first_field); + + fm->template_size[flags] = (u8 *) f - (u8 *) s; + + /* set length in octets */ + s->set_id_length = + ipfix_set_id_length (2 /* set_id */ , (u8 *) f - (u8 *) s); + + /* message length in octets */ + h->version_length = version_length ((u8 *) f - (u8 *) h); + + ip->length = clib_host_to_net_u16 ((u8 *) f - (u8 *) ip); + ip->checksum = ip4_header_checksum (ip); + + return rewrite; +} + +static u8 * +flowprobe_template_rewrite_ip6 (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port) +{ + return flowprobe_template_rewrite_inline + (frm, fr, collector_address, src_address, collector_port, + FLOW_VARIANT_IP6); +} + +static u8 * +flowprobe_template_rewrite_ip4 (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port) +{ + return flowprobe_template_rewrite_inline + (frm, fr, collector_address, src_address, collector_port, + FLOW_VARIANT_IP4); +} + +static u8 * +flowprobe_template_rewrite_l2 (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port) +{ + return flowprobe_template_rewrite_inline + (frm, fr, collector_address, src_address, collector_port, + FLOW_VARIANT_L2); +} + +static u8 * +flowprobe_template_rewrite_l2_ip4 (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port) +{ + return flowprobe_template_rewrite_inline + (frm, fr, collector_address, src_address, collector_port, + FLOW_VARIANT_L2_IP4); +} + +static u8 * +flowprobe_template_rewrite_l2_ip6 (flow_report_main_t * frm, + flow_report_t * fr, + ip4_address_t * collector_address, + ip4_address_t * src_address, + u16 collector_port) +{ + return flowprobe_template_rewrite_inline + (frm, fr, collector_address, src_address, collector_port, + FLOW_VARIANT_L2_IP6); +} + +/** + * @brief Flush accumulated data + * @param frm flow_report_main_t * + * @param fr flow_report_t * + * @param f vlib_frame_t * + * + * Notes: + * This function must simply return the incoming frame, or no template packets + * will be sent. + */ +vlib_frame_t * +flowprobe_data_callback_ip4 (flow_report_main_t * frm, + flow_report_t * fr, + vlib_frame_t * f, u32 * to_next, u32 node_index) +{ + flowprobe_flush_callback_ip4 (); + return f; +} + +vlib_frame_t * +flowprobe_data_callback_ip6 (flow_report_main_t * frm, + flow_report_t * fr, + vlib_frame_t * f, u32 * to_next, u32 node_index) +{ + flowprobe_flush_callback_ip6 (); + return f; +} + +vlib_frame_t * +flowprobe_data_callback_l2 (flow_report_main_t * frm, + flow_report_t * fr, + vlib_frame_t * f, u32 * to_next, u32 node_index) +{ + flowprobe_flush_callback_l2 (); + return f; +} + +static int +flowprobe_template_add_del (u32 domain_id, u16 src_port, + flowprobe_record_t flags, + vnet_flow_data_callback_t * flow_data_callback, + vnet_flow_rewrite_callback_t * rewrite_callback, + bool is_add, u16 * template_id) +{ + flow_report_main_t *frm = &flow_report_main; + vnet_flow_report_add_del_args_t a = { + .rewrite_callback = rewrite_callback, + .flow_data_callback = flow_data_callback, + .is_add = is_add, + .domain_id = domain_id, + .src_port = src_port, + .opaque.as_uword = flags, + }; + return vnet_flow_report_add_del (frm, &a, template_id); +} + +static void +flowprobe_expired_timer_callback (u32 * expired_timers) +{ + vlib_main_t *vm = vlib_get_main (); + flowprobe_main_t *fm = &flowprobe_main; + u32 my_cpu_number = vm->thread_index; + int i; + u32 poolindex; + + for (i = 0; i < vec_len (expired_timers); i++) + { + poolindex = expired_timers[i] & 0x7FFFFFFF; + vec_add1 (fm->expired_passive_per_worker[my_cpu_number], poolindex); + } +} + +static clib_error_t * +flowprobe_create_state_tables (u32 active_timer) +{ + flowprobe_main_t *fm = &flowprobe_main; + vlib_thread_main_t *tm = &vlib_thread_main; + vlib_main_t *vm = vlib_get_main (); + clib_error_t *error = 0; + u32 num_threads; + int i; + + /* Decide how many worker threads we have */ + num_threads = 1 /* main thread */ + tm->n_threads; + + /* Hash table per worker */ + fm->ht_log2len = FLOWPROBE_LOG2_HASHSIZE; + + /* Init per worker flow state and timer wheels */ + if (active_timer) + { + vec_validate (fm->timers_per_worker, num_threads - 1); + vec_validate (fm->expired_passive_per_worker, num_threads - 1); + vec_validate (fm->hash_per_worker, num_threads - 1); + vec_validate (fm->pool_per_worker, num_threads - 1); + + for (i = 0; i < num_threads; i++) + { + int j; + pool_alloc (fm->pool_per_worker[i], 1 << fm->ht_log2len); + vec_resize (fm->hash_per_worker[i], 1 << fm->ht_log2len); + for (j = 0; j < (1 << fm->ht_log2len); j++) + fm->hash_per_worker[i][j] = ~0; + fm->timers_per_worker[i] = + clib_mem_alloc (sizeof (TWT (tw_timer_wheel))); + tw_timer_wheel_init_2t_1w_2048sl (fm->timers_per_worker[i], + flowprobe_expired_timer_callback, + 1.0, 1024); + } + fm->disabled = true; + } + else + { + f64 now = vlib_time_now (vm); + vec_validate (fm->stateless_entry, num_threads - 1); + for (i = 0; i < num_threads; i++) + fm->stateless_entry[i].last_exported = now; + fm->disabled = false; + } + fm->initialized = true; + return error; +} + +static int +validate_feature_on_interface (flowprobe_main_t * fm, u32 sw_if_index, + u8 which) +{ + vec_validate_init_empty (fm->flow_per_interface, sw_if_index, ~0); + + if (fm->flow_per_interface[sw_if_index] == (u8) ~ 0) + return -1; + else if (fm->flow_per_interface[sw_if_index] != which) + return 0; + else + return 1; +} + +/** + * @brief configure / deconfigure the IPFIX flow-per-packet + * @param fm flowprobe_main_t * fm + * @param sw_if_index u32 the desired interface + * @param is_add int 1 to enable the feature, 0 to disable it + * @returns 0 if successful, non-zero otherwise + */ + +static int +flowprobe_tx_interface_add_del_feature (flowprobe_main_t * fm, + u32 sw_if_index, u8 which, int is_add) +{ + vlib_main_t *vm = vlib_get_main (); + int rv = 0; + u16 template_id = 0; + flowprobe_record_t flags = fm->record; + + fm->flow_per_interface[sw_if_index] = (is_add) ? which : (u8) ~ 0; + fm->template_per_flow[which] += (is_add) ? 1 : -1; + if (is_add && fm->template_per_flow[which] > 1) + template_id = fm->template_reports[flags]; + + if ((is_add && fm->template_per_flow[which] == 1) || + (!is_add && fm->template_per_flow[which] == 0)) + { + if (which == FLOW_VARIANT_L2) + { + if (fm->record & FLOW_RECORD_L2) + { + rv = flowprobe_template_add_del (1, UDP_DST_PORT_ipfix, flags, + flowprobe_data_callback_l2, + flowprobe_template_rewrite_l2, + is_add, &template_id); + } + if (fm->record & FLOW_RECORD_L3 || fm->record & FLOW_RECORD_L4) + { + rv = flowprobe_template_add_del (1, UDP_DST_PORT_ipfix, flags, + flowprobe_data_callback_l2, + flowprobe_template_rewrite_l2_ip4, + is_add, &template_id); + fm->template_reports[flags | FLOW_RECORD_L2_IP4] = + (is_add) ? template_id : 0; + rv = + flowprobe_template_add_del (1, UDP_DST_PORT_ipfix, flags, + flowprobe_data_callback_l2, + flowprobe_template_rewrite_l2_ip6, + is_add, &template_id); + fm->template_reports[flags | FLOW_RECORD_L2_IP6] = + (is_add) ? template_id : 0; + + /* Special case L2 */ + fm->context[FLOW_VARIANT_L2_IP4].flags = + flags | FLOW_RECORD_L2_IP4; + fm->context[FLOW_VARIANT_L2_IP6].flags = + flags | FLOW_RECORD_L2_IP6; + + fm->template_reports[flags] = template_id; + } + } + else if (which == FLOW_VARIANT_IP4) + rv = flowprobe_template_add_del (1, UDP_DST_PORT_ipfix, flags, + flowprobe_data_callback_ip4, + flowprobe_template_rewrite_ip4, + is_add, &template_id); + else if (which == FLOW_VARIANT_IP6) + rv = flowprobe_template_add_del (1, UDP_DST_PORT_ipfix, flags, + flowprobe_data_callback_ip6, + flowprobe_template_rewrite_ip6, + is_add, &template_id); + } + if (rv && rv != VNET_API_ERROR_VALUE_EXIST) + { + clib_warning ("vnet_flow_report_add_del returned %d", rv); + return -1; + } + + if (which != (u8) ~ 0) + { + fm->context[which].flags = fm->record; + fm->template_reports[flags] = (is_add) ? template_id : 0; + } + + if (which == FLOW_VARIANT_IP4) + vnet_feature_enable_disable ("ip4-output", "flowprobe-ip4", + sw_if_index, is_add, 0, 0); + else if (which == FLOW_VARIANT_IP6) + vnet_feature_enable_disable ("ip6-output", "flowprobe-ip6", + sw_if_index, is_add, 0, 0); + else if (which == FLOW_VARIANT_L2) + vnet_feature_enable_disable ("interface-output", "flowprobe-l2", + sw_if_index, is_add, 0, 0); + + /* Stateful flow collection */ + if (is_add && !fm->initialized) + { + flowprobe_create_state_tables (fm->active_timer); + if (fm->active_timer) + vlib_process_signal_event (vm, flowprobe_timer_node.index, 1, 0); + } + + return 0; +} + +/** + * @brief API message handler + * @param mp vl_api_flowprobe_tx_interface_add_del_t * mp the api message + */ +void vl_api_flowprobe_tx_interface_add_del_t_handler + (vl_api_flowprobe_tx_interface_add_del_t * mp) +{ + flowprobe_main_t *fm = &flowprobe_main; + vl_api_flowprobe_tx_interface_add_del_reply_t *rmp; + u32 sw_if_index = ntohl (mp->sw_if_index); + int rv = 0; + + VALIDATE_SW_IF_INDEX (mp); + + if (mp->which != FLOW_VARIANT_IP4 && mp->which != FLOW_VARIANT_L2 + && mp->which != FLOW_VARIANT_IP6) + { + rv = VNET_API_ERROR_UNIMPLEMENTED; + goto out; + } + + if (fm->record == 0) + { + clib_warning ("Please specify flowprobe params record first..."); + rv = VNET_API_ERROR_CANNOT_ENABLE_DISABLE_FEATURE; + goto out; + } + + rv = validate_feature_on_interface (fm, sw_if_index, mp->which); + if ((rv == 1 && mp->is_add == 1) || rv == 0) + { + rv = VNET_API_ERROR_CANNOT_ENABLE_DISABLE_FEATURE; + goto out; + } + + rv = flowprobe_tx_interface_add_del_feature + (fm, sw_if_index, mp->which, mp->is_add); + +out: + BAD_SW_IF_INDEX_LABEL; + + REPLY_MACRO (VL_API_FLOWPROBE_TX_INTERFACE_ADD_DEL_REPLY); +} + +/** + * @brief API message custom-dump function + * @param mp vl_api_flowprobe_tx_interface_add_del_t * mp the api message + * @param handle void * print function handle + * @returns u8 * output string + */ +static void *vl_api_flowprobe_tx_interface_add_del_t_print + (vl_api_flowprobe_tx_interface_add_del_t * mp, void *handle) +{ + u8 *s; + + s = format (0, "SCRIPT: flowprobe_tx_interface_add_del "); + s = format (s, "sw_if_index %d is_add %d which %d ", + clib_host_to_net_u32 (mp->sw_if_index), + (int) mp->is_add, (int) mp->which); + FINISH; +} + +#define vec_neg_search(v,E) \ +({ \ + word _v(i) = 0; \ + while (_v(i) < vec_len(v) && v[_v(i)] == E) \ + { \ + _v(i)++; \ + } \ + if (_v(i) == vec_len(v)) \ + _v(i) = ~0; \ + _v(i); \ +}) + +static int +flowprobe_params (flowprobe_main_t * fm, u8 record_l2, + u8 record_l3, u8 record_l4, + u32 active_timer, u32 passive_timer) +{ + flowprobe_record_t flags = 0; + + if (vec_neg_search (fm->flow_per_interface, (u8) ~ 0) != ~0) + return ~0; + + if (record_l2) + flags |= FLOW_RECORD_L2; + if (record_l3) + flags |= FLOW_RECORD_L3; + if (record_l4) + flags |= FLOW_RECORD_L4; + + fm->record = flags; + + /* + * Timers: ~0 is default, 0 is off + */ + fm->active_timer = + (active_timer == (u32) ~ 0 ? FLOWPROBE_TIMER_ACTIVE : active_timer); + fm->passive_timer = + (passive_timer == (u32) ~ 0 ? FLOWPROBE_TIMER_PASSIVE : passive_timer); + + return 0; +} + +void +vl_api_flowprobe_params_t_handler (vl_api_flowprobe_params_t * mp) +{ + flowprobe_main_t *fm = &flowprobe_main; + vl_api_flowprobe_params_reply_t *rmp; + int rv = 0; + + rv = flowprobe_params + (fm, mp->record_l2, mp->record_l3, mp->record_l4, + clib_net_to_host_u32 (mp->active_timer), + clib_net_to_host_u32 (mp->passive_timer)); + + REPLY_MACRO (VL_API_FLOWPROBE_PARAMS_REPLY); +} + +/* List of message types that this plugin understands */ +#define foreach_flowprobe_plugin_api_msg \ +_(FLOWPROBE_TX_INTERFACE_ADD_DEL, flowprobe_tx_interface_add_del) \ +_(FLOWPROBE_PARAMS, flowprobe_params) + +/* *INDENT-OFF* */ +VLIB_PLUGIN_REGISTER () = { + .version = VPP_BUILD_VER, + .description = "Flow per Packet", +}; +/* *INDENT-ON* */ + +u8 * +format_flowprobe_entry (u8 * s, va_list * args) +{ + flowprobe_entry_t *e = va_arg (*args, flowprobe_entry_t *); + s = format (s, " %d/%d", e->key.rx_sw_if_index, e->key.tx_sw_if_index); + + s = format (s, " %U %U", format_ethernet_address, &e->key.src_mac, + format_ethernet_address, &e->key.dst_mac); + s = format (s, " %U -> %U", + format_ip46_address, &e->key.src_address, IP46_TYPE_ANY, + format_ip46_address, &e->key.dst_address, IP46_TYPE_ANY); + s = format (s, " %d", e->key.protocol); + s = format (s, " %d %d\n", clib_net_to_host_u16 (e->key.src_port), + clib_net_to_host_u16 (e->key.dst_port)); + + return s; +} + +static clib_error_t * +flowprobe_show_table_fn (vlib_main_t * vm, + unformat_input_t * input, vlib_cli_command_t * cm) +{ + flowprobe_main_t *fm = &flowprobe_main; + int i; + flowprobe_entry_t *e; + + vlib_cli_output (vm, "Dumping IPFIX table"); + + for (i = 0; i < vec_len (fm->pool_per_worker); i++) + { + /* *INDENT-OFF* */ + pool_foreach (e, fm->pool_per_worker[i], ( + { + vlib_cli_output (vm, "%U", + format_flowprobe_entry, + e); + })); + /* *INDENT-ON* */ + + } + return 0; +} + +static clib_error_t * +flowprobe_show_stats_fn (vlib_main_t * vm, + unformat_input_t * input, vlib_cli_command_t * cm) +{ + flowprobe_main_t *fm = &flowprobe_main; + int i; + + vlib_cli_output (vm, "IPFIX table statistics"); + vlib_cli_output (vm, "Flow entry size: %d\n", sizeof (flowprobe_entry_t)); + vlib_cli_output (vm, "Flow pool size per thread: %d\n", + 0x1 << FLOWPROBE_LOG2_HASHSIZE); + + for (i = 0; i < vec_len (fm->pool_per_worker); i++) + vlib_cli_output (vm, "Pool utilisation thread %d is %d%%\n", i, + (100 * pool_elts (fm->pool_per_worker[i])) / + (0x1 << FLOWPROBE_LOG2_HASHSIZE)); + return 0; +} + +static clib_error_t * +flowprobe_tx_interface_add_del_feature_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + flowprobe_main_t *fm = &flowprobe_main; + u32 sw_if_index = ~0; + int is_add = 1; + u8 which = FLOW_VARIANT_IP4; + int rv; + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "disable")) + is_add = 0; + else if (unformat (input, "%U", unformat_vnet_sw_interface, + fm->vnet_main, &sw_if_index)); + else if (unformat (input, "ip4")) + which = FLOW_VARIANT_IP4; + else if (unformat (input, "ip6")) + which = FLOW_VARIANT_IP6; + else if (unformat (input, "l2")) + which = FLOW_VARIANT_L2; + else + break; + } + + if (fm->record == 0) + return clib_error_return (0, + "Please specify flowprobe params record first..."); + + if (sw_if_index == ~0) + return clib_error_return (0, "Please specify an interface..."); + + rv = validate_feature_on_interface (fm, sw_if_index, which); + if (rv == 1) + { + if (is_add) + return clib_error_return (0, + "Datapath is already enabled for given interface..."); + } + else if (rv == 0) + return clib_error_return (0, + "Interface has enable different datapath ..."); + + rv = + flowprobe_tx_interface_add_del_feature (fm, sw_if_index, which, is_add); + switch (rv) + { + case 0: + break; + + case VNET_API_ERROR_INVALID_SW_IF_INDEX: + return clib_error_return + (0, "Invalid interface, only works on physical ports"); + break; + + case VNET_API_ERROR_UNIMPLEMENTED: + return clib_error_return (0, "ip6 not supported"); + break; + + default: + return clib_error_return (0, "flowprobe_enable_disable returned %d", + rv); + } + return 0; +} + +static clib_error_t * +flowprobe_params_command_fn (vlib_main_t * vm, + unformat_input_t * input, + vlib_cli_command_t * cmd) +{ + flowprobe_main_t *fm = &flowprobe_main; + bool record_l2 = false, record_l3 = false, record_l4 = false; + u32 active_timer = ~0; + u32 passive_timer = ~0; + + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "active %d", &active_timer)) + ; + else if (unformat (input, "passive %d", &passive_timer)) + ; + else if (unformat (input, "record")) + while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) + { + if (unformat (input, "l2")) + record_l2 = true; + else if (unformat (input, "l3")) + record_l3 = true; + else if (unformat (input, "l4")) + record_l4 = true; + else + break; + } + else + break; + } + + if (passive_timer > 0 && active_timer > passive_timer) + return clib_error_return (0, + "Passive timer has to be greater than active one..."); + + if (flowprobe_params (fm, record_l2, record_l3, record_l4, + active_timer, passive_timer)) + return clib_error_return (0, + "Couldn't change flowperpacket params when feature is enabled on some interface ..."); + return 0; +} + +/*? + * 'flowprobe feature add-del' commands to enable/disable + * per-packet IPFIX flow record generation on an interface + * + * @cliexpar + * @parblock + * To enable per-packet IPFIX flow-record generation on an interface: + * @cliexcmd{flowprobe feature add-del GigabitEthernet2/0/0} + * + * To disable per-packet IPFIX flow-record generation on an interface: + * @cliexcmd{flowprobe feature add-del GigabitEthernet2/0/0 disable} + * @cliexend + * @endparblock +?*/ +/* *INDENT-OFF* */ +VLIB_CLI_COMMAND (flowprobe_enable_disable_command, static) = { + .path = "flowprobe feature add-del", + .short_help = + "flowprobe feature add-del disable", + .function = flowprobe_tx_interface_add_del_feature_command_fn, +}; +VLIB_CLI_COMMAND (flowprobe_params_command, static) = { + .path = "flowprobe params", + .short_help = + "flowprobe params record <[l2] [l3] [l4]> [active passive ]", + .function = flowprobe_params_command_fn, +}; +VLIB_CLI_COMMAND (flowprobe_show_table_command, static) = { + .path = "show flowprobe table", + .short_help = "show flowprobe table", + .function = flowprobe_show_table_fn, +}; +VLIB_CLI_COMMAND (flowprobe_show_stats_command, static) = { + .path = "show flowprobe statistics", + .short_help = "show flowprobe statistics", + .function = flowprobe_show_stats_fn, +}; +/* *INDENT-ON* */ + +/** + * @brief Set up the API message handling tables + * @param vm vlib_main_t * vlib main data structure pointer + * @returns 0 to indicate all is well + */ +static clib_error_t * +flowprobe_plugin_api_hookup (vlib_main_t * vm) +{ + flowprobe_main_t *fm = &flowprobe_main; +#define _(N,n) \ + vl_msg_api_set_handlers((VL_API_##N + fm->msg_id_base), \ + #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_flowprobe_plugin_api_msg; +#undef _ + + return 0; +} + +#define vl_msg_name_crc_list +#include +#undef vl_msg_name_crc_list + +static void +setup_message_id_table (flowprobe_main_t * fm, api_main_t * am) +{ +#define _(id,n,crc) \ + vl_msg_api_add_msg_name_crc (am, #n "_" #crc, id + fm->msg_id_base); + foreach_vl_msg_name_crc_flowprobe; +#undef _ +} + +/* + * Main-core process, sending an interrupt to the per worker input + * process that spins the per worker timer wheel. + */ +static uword +timer_process (vlib_main_t * vm, vlib_node_runtime_t * rt, vlib_frame_t * f) +{ + uword *event_data = 0; + vlib_main_t **worker_vms = 0, *worker_vm; + flowprobe_main_t *fm = &flowprobe_main; + + /* Wait for Godot... */ + vlib_process_wait_for_event_or_clock (vm, 1e9); + uword event_type = vlib_process_get_events (vm, &event_data); + if (event_type != 1) + clib_warning ("bogus kickoff event received, %d", event_type); + vec_reset_length (event_data); + + int i; + if (vec_len (vlib_mains) == 0) + vec_add1 (worker_vms, vm); + else + { + for (i = 0; i < vec_len (vlib_mains); i++) + { + worker_vm = vlib_mains[i]; + if (worker_vm) + vec_add1 (worker_vms, worker_vm); + } + } + f64 sleep_duration = 0.1; + + while (1) + { + /* Send an interrupt to each timer input node */ + sleep_duration = 0.1; + for (i = 0; i < vec_len (worker_vms); i++) + { + worker_vm = worker_vms[i]; + if (worker_vm) + { + vlib_node_set_interrupt_pending (worker_vm, + flowprobe_walker_node.index); + sleep_duration = + (fm->expired_passive_per_worker[i] > 0) ? 1e-4 : 0.1; + } + } + vlib_process_suspend (vm, sleep_duration); + } + return 0; /* or not */ +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (flowprobe_timer_node,static) = { + .function = timer_process, + .name = "flowprobe-timer-process", + .type = VLIB_NODE_TYPE_PROCESS, +}; +/* *INDENT-ON* */ + +/** + * @brief Set up the API message handling tables + * @param vm vlib_main_t * vlib main data structure pointer + * @returns 0 to indicate all is well, or a clib_error_t + */ +static clib_error_t * +flowprobe_init (vlib_main_t * vm) +{ + flowprobe_main_t *fm = &flowprobe_main; + vlib_thread_main_t *tm = &vlib_thread_main; + clib_error_t *error = 0; + u8 *name; + u32 num_threads; + int i; + + fm->vnet_main = vnet_get_main (); + + /* Construct the API name */ + name = format (0, "flowprobe_%08x%c", api_version, 0); + + /* Ask for a correctly-sized block of API message decode slots */ + fm->msg_id_base = vl_msg_api_get_msg_ids + ((char *) name, VL_MSG_FIRST_AVAILABLE); + + /* Hook up message handlers */ + error = flowprobe_plugin_api_hookup (vm); + + /* Add our API messages to the global name_crc hash table */ + setup_message_id_table (fm, &api_main); + + vec_free (name); + + /* Set up time reference pair */ + fm->vlib_time_0 = vlib_time_now (vm); + fm->nanosecond_time_0 = unix_time_now_nsec (); + + memset (fm->template_reports, 0, sizeof (fm->template_reports)); + memset (fm->template_size, 0, sizeof (fm->template_size)); + memset (fm->template_per_flow, 0, sizeof (fm->template_per_flow)); + + /* Decide how many worker threads we have */ + num_threads = 1 /* main thread */ + tm->n_threads; + + /* Allocate per worker thread vectors per flavour */ + for (i = 0; i < FLOW_N_VARIANTS; i++) + { + vec_validate (fm->context[i].buffers_per_worker, num_threads - 1); + vec_validate (fm->context[i].frames_per_worker, num_threads - 1); + vec_validate (fm->context[i].next_record_offset_per_worker, + num_threads - 1); + } + + fm->active_timer = FLOWPROBE_TIMER_ACTIVE; + fm->passive_timer = FLOWPROBE_TIMER_PASSIVE; + + return error; +} + +VLIB_INIT_FUNCTION (flowprobe_init); + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/plugins/flowprobe/flowprobe.h b/src/plugins/flowprobe/flowprobe.h new file mode 100644 index 00000000..196c92a7 --- /dev/null +++ b/src/plugins/flowprobe/flowprobe.h @@ -0,0 +1,167 @@ +/* + * flowprobe.h - ipfix probe plug-in header file + * + * Copyright (c) 2016 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef __included_flowprobe_h__ +#define __included_flowprobe_h__ + +#include +#include +#include + +#include +#include +#include +#include +#include + +/* Default timers in seconds */ +#define FLOWPROBE_TIMER_ACTIVE (15) +#define FLOWPROBE_TIMER_PASSIVE 120 // XXXX: FOR TESTING (30*60) +#define FLOWPROBE_LOG2_HASHSIZE (18) + +typedef enum +{ + FLOW_RECORD_L2 = 1 << 0, + FLOW_RECORD_L3 = 1 << 1, + FLOW_RECORD_L4 = 1 << 2, + FLOW_RECORD_L2_IP4 = 1 << 3, + FLOW_RECORD_L2_IP6 = 1 << 4, + FLOW_N_RECORDS = 1 << 5, +} flowprobe_record_t; + +/* *INDENT-OFF* */ +typedef enum __attribute__ ((__packed__)) +{ + FLOW_VARIANT_IP4, + FLOW_VARIANT_IP6, + FLOW_VARIANT_L2, + FLOW_VARIANT_L2_IP4, + FLOW_VARIANT_L2_IP6, + FLOW_N_VARIANTS, +} flowprobe_variant_t; +/* *INDENT-ON* */ + +STATIC_ASSERT (sizeof (flowprobe_variant_t) == 1, + "flowprobe_variant_t is expected to be 1 byte, " + "revisit padding in flowprobe_key_t"); + +#define FLOW_MAXIMUM_EXPORT_ENTRIES (1024) + +typedef struct +{ + /* what to collect per variant */ + flowprobe_record_t flags; + /** ipfix buffers under construction, per-worker thread */ + vlib_buffer_t **buffers_per_worker; + /** frames containing ipfix buffers, per-worker thread */ + vlib_frame_t **frames_per_worker; + /** next record offset, per worker thread */ + u16 *next_record_offset_per_worker; +} flowprobe_protocol_context_t; + +#define FLOWPROBE_KEY_IN_U32 22 +/* *INDENT-OFF* */ +typedef CLIB_PACKED (union +{ + struct { + u32 rx_sw_if_index; + u32 tx_sw_if_index; + u8 src_mac[6]; + u8 dst_mac[6]; + u16 ethertype; + ip46_address_t src_address; + ip46_address_t dst_address; + u8 protocol; + u16 src_port; + u16 dst_port; + flowprobe_variant_t which; + }; + u32 as_u32[FLOWPROBE_KEY_IN_U32]; +}) flowprobe_key_t; +/* *INDENT-ON* */ + +STATIC_ASSERT (sizeof (flowprobe_key_t) == FLOWPROBE_KEY_IN_U32 * + sizeof (u32), "flowprobe_key_t padding is wrong"); + +typedef struct +{ + flowprobe_key_t key; + u64 packetcount; + u64 octetcount; + f64 last_updated; + f64 last_exported; + u32 passive_timer_handle; +} flowprobe_entry_t; + +/** + * @file + * @brief flow-per-packet plugin header file + */ +typedef struct +{ + /** API message ID base */ + u16 msg_id_base; + + flowprobe_protocol_context_t context[FLOW_N_VARIANTS]; + u16 template_reports[FLOW_N_RECORDS]; + u16 template_size[FLOW_N_RECORDS]; + + /** Time reference pair */ + u64 nanosecond_time_0; + f64 vlib_time_0; + + /** Per CPU flow-state */ + u8 ht_log2len; /* Hash table size is 2^log2len */ + u32 **hash_per_worker; + flowprobe_entry_t **pool_per_worker; + /* *INDENT-OFF* */ + TWT (tw_timer_wheel) ** timers_per_worker; + /* *INDENT-ON* */ + u32 **expired_passive_per_worker; + + flowprobe_record_t record; + u32 active_timer; + u32 passive_timer; + flowprobe_entry_t *stateless_entry; + + bool initialized; + bool disabled; + + u16 template_per_flow[FLOW_N_VARIANTS]; + u8 *flow_per_interface; + + /** convenience vlib_main_t pointer */ + vlib_main_t *vlib_main; + /** convenience vnet_main_t pointer */ + vnet_main_t *vnet_main; +} flowprobe_main_t; + +extern flowprobe_main_t flowprobe_main; + +void flowprobe_flush_callback_ip4 (void); +void flowprobe_flush_callback_ip6 (void); +void flowprobe_flush_callback_l2 (void); +u8 *format_flowprobe_entry (u8 * s, va_list * args); + +#endif + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/plugins/flowprobe/flowprobe_all_api_h.h b/src/plugins/flowprobe/flowprobe_all_api_h.h new file mode 100644 index 00000000..1f30eccc --- /dev/null +++ b/src/plugins/flowprobe/flowprobe_all_api_h.h @@ -0,0 +1,18 @@ +/* + * flowprobe_all_api_h.h - plug-in api #include file + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/* Include the generated file, see BUILT_SOURCES in Makefile.am */ +#include diff --git a/src/plugins/flowprobe/flowprobe_msg_enum.h b/src/plugins/flowprobe/flowprobe_msg_enum.h new file mode 100644 index 00000000..bc0b21c9 --- /dev/null +++ b/src/plugins/flowprobe/flowprobe_msg_enum.h @@ -0,0 +1,31 @@ +/* + * flowprobe_msg_enum.h - vpp engine plug-in message enumeration + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef included_flowprobe_msg_enum_h +#define included_flowprobe_msg_enum_h + +#include + +#define vl_msg_id(n,h) n, +typedef enum +{ +#include + /* We'll want to know how many messages IDs we need... */ + VL_MSG_FIRST_AVAILABLE, +} vl_msg_id_t; +#undef vl_msg_id + +#endif /* included_flowprobe_msg_enum_h */ diff --git a/src/plugins/flowprobe/flowprobe_plugin_doc.md b/src/plugins/flowprobe/flowprobe_plugin_doc.md new file mode 100644 index 00000000..4c9b2342 --- /dev/null +++ b/src/plugins/flowprobe/flowprobe_plugin_doc.md @@ -0,0 +1,13 @@ +IPFIX flow record plugin {#flowprobe_plugin_doc} +======================== + +## Introduction + +This plugin generates ipfix flow records on interfaces which have the feature enabled + +## Sample configuration + +set ipfix exporter collector 192.168.6.2 src 192.168.6.1 template-interval 20 port 4739 path-mtu 1500 + +flowprobe params record l3 active 20 passive 120 +flowprobe feature add-del GigabitEthernet2/3/0 l2 \ No newline at end of file diff --git a/src/plugins/flowprobe/flowprobe_test.c b/src/plugins/flowprobe/flowprobe_test.c new file mode 100644 index 00000000..91793f55 --- /dev/null +++ b/src/plugins/flowprobe/flowprobe_test.c @@ -0,0 +1,263 @@ +/* + * flowprobe.c - skeleton vpp-api-test plug-in + * + * Copyright (c) + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include +#include + +#define __plugin_msg_base flowprobe_test_main.msg_id_base +#include + +/** + * @file vpp_api_test plugin + */ + +uword unformat_sw_if_index (unformat_input_t * input, va_list * args); + +/* Declare message IDs */ +#include + +/* define message structures */ +#define vl_typedefs +#include +#undef vl_typedefs + +/* declare message handlers for each api */ + +#define vl_endianfun /* define message structures */ +#include +#undef vl_endianfun + +/* instantiate all the print functions we know about */ +#define vl_print(handle, ...) +#define vl_printfun +#include +#undef vl_printfun + +/* Get the API version number. */ +#define vl_api_version(n,v) static u32 api_version=(v); +#include +#undef vl_api_version + +typedef struct +{ + /** API message ID base */ + u16 msg_id_base; + /** vat_main_t pointer */ + vat_main_t *vat_main; +} flowprobe_test_main_t; + +flowprobe_test_main_t flowprobe_test_main; + +#define foreach_standard_reply_retval_handler \ +_(flowprobe_tx_interface_add_del_reply) \ +_(flowprobe_params_reply) + +#define _(n) \ + static void vl_api_##n##_t_handler \ + (vl_api_##n##_t * mp) \ + { \ + vat_main_t * vam = flowprobe_test_main.vat_main; \ + i32 retval = ntohl(mp->retval); \ + if (vam->async_mode) { \ + vam->async_errors += (retval < 0); \ + } else { \ + vam->retval = retval; \ + vam->result_ready = 1; \ + } \ + } +foreach_standard_reply_retval_handler; +#undef _ + +/* + * Table of message reply handlers, must include boilerplate handlers + * we just generated + */ +#define foreach_vpe_api_reply_msg \ +_(FLOWPROBE_TX_INTERFACE_ADD_DEL_REPLY, \ + flowprobe_tx_interface_add_del_reply) \ +_(FLOWPROBE_PARAMS_REPLY, flowprobe_params_reply) + +static int +api_flowprobe_tx_interface_add_del (vat_main_t * vam) +{ + unformat_input_t *i = vam->input; + int enable_disable = 1; + u8 which = FLOW_VARIANT_IP4; + u32 sw_if_index = ~0; + vl_api_flowprobe_tx_interface_add_del_t *mp; + int ret; + + /* Parse args required to build the message */ + while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT) + { + if (unformat (i, "%U", unformat_sw_if_index, vam, &sw_if_index)) + ; + else if (unformat (i, "sw_if_index %d", &sw_if_index)) + ; + else if (unformat (i, "disable")) + enable_disable = 0; + else if (unformat (i, "ip4")) + which = FLOW_VARIANT_IP4; + else if (unformat (i, "ip6")) + which = FLOW_VARIANT_IP6; + else if (unformat (i, "l2")) + which = FLOW_VARIANT_L2; + else + break; + } + + if (sw_if_index == ~0) + { + errmsg ("missing interface name / explicit sw_if_index number \n"); + return -99; + } + + /* Construct the API message */ + M (FLOWPROBE_TX_INTERFACE_ADD_DEL, mp); + mp->sw_if_index = ntohl (sw_if_index); + mp->is_add = enable_disable; + mp->which = which; + + /* send it... */ + S (mp); + + /* Wait for a reply... */ + W (ret); + return ret; +} + +static int +api_flowprobe_params (vat_main_t * vam) +{ + unformat_input_t *i = vam->input; + u8 record_l2 = 0, record_l3 = 0, record_l4 = 0; + u32 active_timer = ~0; + u32 passive_timer = ~0; + vl_api_flowprobe_params_t *mp; + int ret; + + while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT) + { + if (unformat (i, "active %d", &active_timer)) + ; + else if (unformat (i, "passive %d", &passive_timer)) + ; + else if (unformat (i, "record")) + while (unformat_check_input (i) != UNFORMAT_END_OF_INPUT) + { + if (unformat (i, "l2")) + record_l2 = 1; + else if (unformat (i, "l3")) + record_l3 = 1; + else if (unformat (i, "l4")) + record_l4 = 1; + else + break; + } + else + break; + } + + if (passive_timer > 0 && active_timer > passive_timer) + { + errmsg ("Passive timer has to be greater than active one...\n"); + return -99; + } + + /* Construct the API message */ + M (FLOWPROBE_PARAMS, mp); + mp->record_l2 = record_l2; + mp->record_l3 = record_l3; + mp->record_l4 = record_l4; + mp->active_timer = ntohl (active_timer); + mp->passive_timer = ntohl (passive_timer); + + /* send it... */ + S (mp); + + /* Wait for a reply... */ + W (ret); + + return ret; +} + +/* + * List of messages that the api test plugin sends, + * and that the data plane plugin processes + */ +#define foreach_vpe_api_msg \ +_(flowprobe_tx_interface_add_del, " [disable]") \ +_(flowprobe_params, "record <[l2] [l3] [l4]> [active passive ]") + +static void +flowprobe_vat_api_hookup (vat_main_t * vam) +{ + flowprobe_test_main_t *sm = &flowprobe_test_main; + /* Hook up handlers for replies from the data plane plug-in */ +#define _(N,n) \ + vl_msg_api_set_handlers((VL_API_##N + sm->msg_id_base), \ + #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_vpe_api_reply_msg; +#undef _ + + /* API messages we can send */ +#define _(n,h) hash_set_mem (vam->function_by_name, #n, api_##n); + foreach_vpe_api_msg; +#undef _ + + /* Help strings */ +#define _(n,h) hash_set_mem (vam->help_by_name, #n, h); + foreach_vpe_api_msg; +#undef _ +} + +clib_error_t * +vat_plugin_register (vat_main_t * vam) +{ + flowprobe_test_main_t *sm = &flowprobe_test_main; + u8 *name; + + sm->vat_main = vam; + + /* Ask the vpp engine for the first assigned message-id */ + name = format (0, "flowprobe_%08x%c", api_version, 0); + sm->msg_id_base = vl_client_get_first_plugin_msg_id ((char *) name); + + /* Don't attempt to hook up API messages if the data plane plugin is AWOL */ + if (sm->msg_id_base != (u16) ~ 0) + flowprobe_vat_api_hookup (vam); + + vec_free (name); + + return 0; +} + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/plugins/flowprobe/node.c b/src/plugins/flowprobe/node.c new file mode 100644 index 00000000..6a539db9 --- /dev/null +++ b/src/plugins/flowprobe/node.c @@ -0,0 +1,998 @@ +/* + * node.c - ipfix probe graph node + * + * Copyright (c) 2017 Cisco and/or its affiliates. + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include +#include +#include +#include +#include +#include + +static void flowprobe_export_entry (vlib_main_t * vm, flowprobe_entry_t * e); + +/** + * @file flow record generator graph node + */ + +typedef struct +{ + /** interface handle */ + u32 rx_sw_if_index; + u32 tx_sw_if_index; + /** packet timestamp */ + u64 timestamp; + /** size of the buffer */ + u16 buffer_size; + + /** L2 information */ + u8 src_mac[6]; + u8 dst_mac[6]; + /** Ethertype */ + u16 ethertype; + + /** L3 information */ + ip46_address_t src_address; + ip46_address_t dst_address; + u8 protocol; + u8 tos; + + /** L4 information */ + u16 src_port; + u16 dst_port; + + flowprobe_variant_t which; +} flowprobe_trace_t; + +static char *flowprobe_variant_strings[] = { + [FLOW_VARIANT_IP4] = "IP4", + [FLOW_VARIANT_IP6] = "IP6", + [FLOW_VARIANT_L2] = "L2", + [FLOW_VARIANT_L2_IP4] = "L2-IP4", + [FLOW_VARIANT_L2_IP6] = "L2-IP6", +}; + +/* packet trace format function */ +static u8 * +format_flowprobe_trace (u8 * s, va_list * args) +{ + CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *); + CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *); + flowprobe_trace_t *t = va_arg (*args, flowprobe_trace_t *); + uword indent = format_get_indent (s); + + s = format (s, + "FLOWPROBE[%s]: rx_sw_if_index %d, tx_sw_if_index %d, " + "timestamp %lld, size %d", flowprobe_variant_strings[t->which], + t->rx_sw_if_index, t->tx_sw_if_index, + t->timestamp, t->buffer_size); + + if (t->which == FLOW_VARIANT_L2) + s = format (s, "\n%U -> %U", format_white_space, indent, + format_ethernet_address, &t->src_mac, + format_ethernet_address, &t->dst_mac); + + if (t->protocol > 0 + && (t->which == FLOW_VARIANT_L2_IP4 || t->which == FLOW_VARIANT_IP4 + || t->which == FLOW_VARIANT_L2_IP6 || t->which == FLOW_VARIANT_IP6)) + s = + format (s, "\n%U%U: %U -> %U", format_white_space, indent, + format_ip_protocol, t->protocol, format_ip46_address, + &t->src_address, IP46_TYPE_ANY, format_ip46_address, + &t->dst_address, IP46_TYPE_ANY); + return s; +} + +vlib_node_registration_t flowprobe_ip4_node; +vlib_node_registration_t flowprobe_ip6_node; +vlib_node_registration_t flowprobe_l2_node; + +/* No counters at the moment */ +#define foreach_flowprobe_error \ +_(COLLISION, "Hash table collisions") \ +_(BUFFER, "Buffer allocation error") \ +_(EXPORTED_PACKETS, "Exported packets") \ +_(INPATH, "Exported packets in path") + +typedef enum +{ +#define _(sym,str) FLOWPROBE_ERROR_##sym, + foreach_flowprobe_error +#undef _ + FLOWPROBE_N_ERROR, +} flowprobe_error_t; + +static char *flowprobe_error_strings[] = { +#define _(sym,string) string, + foreach_flowprobe_error +#undef _ +}; + +typedef enum +{ + FLOWPROBE_NEXT_DROP, + FLOWPROBE_NEXT_IP4_LOOKUP, + FLOWPROBE_N_NEXT, +} flowprobe_next_t; + +#define FLOWPROBE_NEXT_NODES { \ + [FLOWPROBE_NEXT_DROP] = "error-drop", \ + [FLOWPROBE_NEXT_IP4_LOOKUP] = "ip4-lookup", \ +} + +static inline flowprobe_variant_t +flowprobe_get_variant (flowprobe_variant_t which, + flowprobe_record_t flags, u16 ethertype) +{ + if (which == FLOW_VARIANT_L2 + && (flags & FLOW_RECORD_L3 || flags & FLOW_RECORD_L4)) + return ethertype == ETHERNET_TYPE_IP6 ? FLOW_VARIANT_L2_IP6 : ethertype == + ETHERNET_TYPE_IP4 ? FLOW_VARIANT_L2_IP4 : FLOW_VARIANT_L2; + return which; +} + +static inline u32 +flowprobe_common_add (vlib_buffer_t * to_b, flowprobe_entry_t * e, u16 offset) +{ + u16 start = offset; + + /* Ingress interface */ + u32 rx_if = clib_host_to_net_u32 (e->key.rx_sw_if_index); + clib_memcpy (to_b->data + offset, &rx_if, sizeof (rx_if)); + offset += sizeof (rx_if); + + /* Egress interface */ + u32 tx_if = clib_host_to_net_u32 (e->key.tx_sw_if_index); + clib_memcpy (to_b->data + offset, &tx_if, sizeof (tx_if)); + offset += sizeof (tx_if); + + /* packet delta count */ + u64 packetdelta = clib_host_to_net_u64 (e->packetcount); + clib_memcpy (to_b->data + offset, &packetdelta, sizeof (u64)); + offset += sizeof (u64); + + return offset - start; +} + +static inline u32 +flowprobe_l2_add (vlib_buffer_t * to_b, flowprobe_entry_t * e, u16 offset) +{ + u16 start = offset; + + /* src mac address */ + clib_memcpy (to_b->data + offset, &e->key.src_mac, 6); + offset += 6; + + /* dst mac address */ + clib_memcpy (to_b->data + offset, &e->key.dst_mac, 6); + offset += 6; + + /* ethertype */ + clib_memcpy (to_b->data + offset, &e->key.ethertype, 2); + offset += 2; + + return offset - start; +} + +static inline u32 +flowprobe_l3_ip6_add (vlib_buffer_t * to_b, flowprobe_entry_t * e, u16 offset) +{ + u16 start = offset; + + /* ip6 src address */ + clib_memcpy (to_b->data + offset, &e->key.src_address, + sizeof (ip6_address_t)); + offset += sizeof (ip6_address_t); + + /* ip6 dst address */ + clib_memcpy (to_b->data + offset, &e->key.dst_address, + sizeof (ip6_address_t)); + offset += sizeof (ip6_address_t); + + /* Protocol */ + to_b->data[offset++] = e->key.protocol; + + /* octetDeltaCount */ + u64 octetdelta = clib_host_to_net_u64 (e->octetcount); + clib_memcpy (to_b->data + offset, &octetdelta, sizeof (u64)); + offset += sizeof (u64); + + return offset - start; +} + +static inline u32 +flowprobe_l3_ip4_add (vlib_buffer_t * to_b, flowprobe_entry_t * e, u16 offset) +{ + u16 start = offset; + + /* ip4 src address */ + clib_memcpy (to_b->data + offset, &e->key.src_address.ip4, + sizeof (ip4_address_t)); + offset += sizeof (ip4_address_t); + + /* ip4 dst address */ + clib_memcpy (to_b->data + offset, &e->key.dst_address.ip4, + sizeof (ip4_address_t)); + offset += sizeof (ip4_address_t); + + /* Protocol */ + to_b->data[offset++] = e->key.protocol; + + /* octetDeltaCount */ + u64 octetdelta = clib_host_to_net_u64 (e->octetcount); + clib_memcpy (to_b->data + offset, &octetdelta, sizeof (u64)); + offset += sizeof (u64); + + return offset - start; +} + +static inline u32 +flowprobe_l4_add (vlib_buffer_t * to_b, flowprobe_entry_t * e, u16 offset) +{ + u16 start = offset; + + /* src port */ + clib_memcpy (to_b->data + offset, &e->key.src_port, 2); + offset += 2; + + /* dst port */ + clib_memcpy (to_b->data + offset, &e->key.dst_port, 2); + offset += 2; + + return offset - start; +} + +static inline u32 +flowprobe_hash (flowprobe_key_t * k) +{ + flowprobe_main_t *fm = &flowprobe_main; + int i; + u32 h = 0; + for (i = 0; i < sizeof (k->as_u32) / sizeof (u32); i++) + h = crc_u32 (k->as_u32[i], h); + return h >> (32 - fm->ht_log2len); +} + +flowprobe_entry_t * +flowprobe_lookup (u32 my_cpu_number, flowprobe_key_t * k, u32 * poolindex, + bool * collision) +{ + flowprobe_main_t *fm = &flowprobe_main; + flowprobe_entry_t *e; + u32 h; + + h = (fm->active_timer) ? flowprobe_hash (k) : 0; + + /* Lookup in the flow state pool */ + *poolindex = fm->hash_per_worker[my_cpu_number][h]; + if (*poolindex != ~0) + { + e = pool_elt_at_index (fm->pool_per_worker[my_cpu_number], *poolindex); + if (e) + { + /* Verify key or report collision */ + if (memcmp (k, &e->key, sizeof (flowprobe_key_t))) + *collision = true; + return e; + } + } + + return 0; +} + +flowprobe_entry_t * +flowprobe_create (u32 my_cpu_number, flowprobe_key_t * k, u32 * poolindex) +{ + flowprobe_main_t *fm = &flowprobe_main; + u32 h; + + flowprobe_entry_t *e; + + /* Get my index */ + h = (fm->active_timer) ? flowprobe_hash (k) : 0; + + pool_get (fm->pool_per_worker[my_cpu_number], e); + *poolindex = e - fm->pool_per_worker[my_cpu_number]; + fm->hash_per_worker[my_cpu_number][h] = *poolindex; + + e->key = *k; + + if (fm->passive_timer > 0) + { + e->passive_timer_handle = tw_timer_start_2t_1w_2048sl + (fm->timers_per_worker[my_cpu_number], *poolindex, 0, + fm->passive_timer); + } + return e; +} + +static inline void +add_to_flow_record_state (vlib_main_t * vm, vlib_node_runtime_t * node, + flowprobe_main_t * fm, vlib_buffer_t * b, + u64 timestamp, u16 length, + flowprobe_variant_t which, flowprobe_trace_t * t) +{ + if (fm->disabled) + return; + + u32 my_cpu_number = vm->thread_index; + u16 octets = 0; + + flowprobe_record_t flags = fm->context[which].flags; + bool collect_ip4 = false, collect_ip6 = false; + ASSERT (b); + ethernet_header_t *eth = vlib_buffer_get_current (b); + u16 ethertype = clib_net_to_host_u16 (eth->type); + /* *INDENT-OFF* */ + flowprobe_key_t k = { {0} }; + /* *INDENT-ON* */ + ip4_header_t *ip4 = 0; + ip6_header_t *ip6 = 0; + udp_header_t *udp = 0; + + if (flags & FLOW_RECORD_L3 || flags & FLOW_RECORD_L4) + { + collect_ip4 = which == FLOW_VARIANT_L2_IP4 || which == FLOW_VARIANT_IP4; + collect_ip6 = which == FLOW_VARIANT_L2_IP6 || which == FLOW_VARIANT_IP6; + } + + k.rx_sw_if_index = vnet_buffer (b)->sw_if_index[VLIB_RX]; + k.tx_sw_if_index = vnet_buffer (b)->sw_if_index[VLIB_TX]; + + k.which = which; + + if (flags & FLOW_RECORD_L2) + { + clib_memcpy (k.src_mac, eth->src_address, 6); + clib_memcpy (k.dst_mac, eth->dst_address, 6); + k.ethertype = ethertype; + } + if (collect_ip6 && ethertype == ETHERNET_TYPE_IP6) + { + ip6 = (ip6_header_t *) (eth + 1); + udp = (udp_header_t *) (ip6 + 1); + if (flags & FLOW_RECORD_L3) + { + k.src_address.as_u64[0] = ip6->src_address.as_u64[0]; + k.src_address.as_u64[1] = ip6->src_address.as_u64[1]; + k.dst_address.as_u64[0] = ip6->dst_address.as_u64[0]; + k.dst_address.as_u64[1] = ip6->dst_address.as_u64[1]; + } + k.protocol = ip6->protocol; + octets = clib_net_to_host_u16 (ip6->payload_length) + + sizeof (ip6_header_t); + } + if (collect_ip4 && ethertype == ETHERNET_TYPE_IP4) + { + ip4 = (ip4_header_t *) (eth + 1); + udp = (udp_header_t *) (ip4 + 1); + if (flags & FLOW_RECORD_L3) + { + k.src_address.ip4.as_u32 = ip4->src_address.as_u32; + k.dst_address.ip4.as_u32 = ip4->dst_address.as_u32; + } + k.protocol = ip4->protocol; + octets = clib_net_to_host_u16 (ip4->length); + } + if ((flags & FLOW_RECORD_L4) && udp && + (k.protocol == IP_PROTOCOL_TCP || k.protocol == IP_PROTOCOL_UDP)) + { + k.src_port = udp->src_port; + k.dst_port = udp->dst_port; + } + + if (t) + { + t->rx_sw_if_index = k.rx_sw_if_index; + t->tx_sw_if_index = k.tx_sw_if_index; + clib_memcpy (t->src_mac, k.src_mac, 6); + clib_memcpy (t->dst_mac, k.dst_mac, 6); + t->ethertype = k.ethertype; + t->src_address.ip4.as_u32 = k.src_address.ip4.as_u32; + t->dst_address.ip4.as_u32 = k.dst_address.ip4.as_u32; + t->protocol = k.protocol; + t->src_port = k.src_port; + t->dst_port = k.dst_port; + t->which = k.which; + } + + flowprobe_entry_t *e = 0; + f64 now = vlib_time_now (vm); + if (fm->active_timer > 0) + { + u32 poolindex = ~0; + bool collision = false; + + e = flowprobe_lookup (my_cpu_number, &k, &poolindex, &collision); + if (collision) + { + /* Flush data and clean up entry for reuse. */ + if (e->packetcount) + flowprobe_export_entry (vm, e); + e->key = k; + vlib_node_increment_counter (vm, node->node_index, + FLOWPROBE_ERROR_COLLISION, 1); + } + if (!e) /* Create new entry */ + { + e = flowprobe_create (my_cpu_number, &k, &poolindex); + e->last_exported = now; + } + } + else + { + e = &fm->stateless_entry[my_cpu_number]; + e->key = k; + } + + if (e) + { + /* Updating entry */ + e->packetcount++; + e->octetcount += octets; + e->last_updated = now; + + if (fm->active_timer == 0 + || (now > e->last_exported + fm->active_timer)) + flowprobe_export_entry (vm, e); + } +} + +static u16 +flowprobe_get_headersize (void) +{ + return sizeof (ip4_header_t) + sizeof (udp_header_t) + + sizeof (ipfix_message_header_t) + sizeof (ipfix_set_header_t); +} + +static void +flowprobe_export_send (vlib_main_t * vm, vlib_buffer_t * b0, + flowprobe_variant_t which) +{ + flowprobe_main_t *fm = &flowprobe_main; + flow_report_main_t *frm = &flow_report_main; + vlib_frame_t *f; + ip4_ipfix_template_packet_t *tp; + ipfix_set_header_t *s; + ipfix_message_header_t *h; + ip4_header_t *ip; + udp_header_t *udp; + flowprobe_record_t flags = fm->context[which].flags; + u32 my_cpu_number = vm->thread_index; + + /* Fill in header */ + flow_report_stream_t *stream; + + /* Nothing to send */ + if (fm->context[which].next_record_offset_per_worker[my_cpu_number] <= + flowprobe_get_headersize ()) + return; + + u32 i, index = vec_len (frm->streams); + for (i = 0; i < index; i++) + if (frm->streams[i].domain_id == 1) + { + index = i; + break; + } + if (i == vec_len (frm->streams)) + { + vec_validate (frm->streams, index); + frm->streams[index].domain_id = 1; + } + stream = &frm->streams[index]; + + tp = vlib_buffer_get_current (b0); + ip = (ip4_header_t *) & tp->ip4; + udp = (udp_header_t *) (ip + 1); + h = (ipfix_message_header_t *) (udp + 1); + s = (ipfix_set_header_t *) (h + 1); + + ip->ip_version_and_header_length = 0x45; + ip->ttl = 254; + ip->protocol = IP_PROTOCOL_UDP; + ip->flags_and_fragment_offset = 0; + ip->src_address.as_u32 = frm->src_address.as_u32; + ip->dst_address.as_u32 = frm->ipfix_collector.as_u32; + udp->src_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); + udp->dst_port = clib_host_to_net_u16 (UDP_DST_PORT_ipfix); + udp->checksum = 0; + + /* FIXUP: message header export_time */ + h->export_time = (u32) + (((f64) frm->unix_time_0) + + (vlib_time_now (frm->vlib_main) - frm->vlib_time_0)); + h->export_time = clib_host_to_net_u32 (h->export_time); + h->domain_id = clib_host_to_net_u32 (stream->domain_id); + + /* FIXUP: message header sequence_number */ + h->sequence_number = stream->sequence_number++; + h->sequence_number = clib_host_to_net_u32 (h->sequence_number); + + s->set_id_length = ipfix_set_id_length (fm->template_reports[flags], + b0->current_length - + (sizeof (*ip) + sizeof (*udp) + + sizeof (*h))); + h->version_length = version_length (b0->current_length - + (sizeof (*ip) + sizeof (*udp))); + + ip->length = clib_host_to_net_u16 (b0->current_length); + + ip->checksum = ip4_header_checksum (ip); + udp->length = clib_host_to_net_u16 (b0->current_length - sizeof (*ip)); + + if (frm->udp_checksum) + { + /* RFC 7011 section 10.3.2. */ + udp->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ip); + if (udp->checksum == 0) + udp->checksum = 0xffff; + } + + ASSERT (ip->checksum == ip4_header_checksum (ip)); + + /* Find or allocate a frame */ + f = fm->context[which].frames_per_worker[my_cpu_number]; + if (PREDICT_FALSE (f == 0)) + { + u32 *to_next; + f = vlib_get_frame_to_node (vm, ip4_lookup_node.index); + fm->context[which].frames_per_worker[my_cpu_number] = f; + u32 bi0 = vlib_get_buffer_index (vm, b0); + + /* Enqueue the buffer */ + to_next = vlib_frame_vector_args (f); + to_next[0] = bi0; + f->n_vectors = 1; + } + + vlib_put_frame_to_node (vm, ip4_lookup_node.index, f); + vlib_node_increment_counter (vm, flowprobe_l2_node.index, + FLOWPROBE_ERROR_EXPORTED_PACKETS, 1); + + fm->context[which].frames_per_worker[my_cpu_number] = 0; + fm->context[which].buffers_per_worker[my_cpu_number] = 0; + fm->context[which].next_record_offset_per_worker[my_cpu_number] = + flowprobe_get_headersize (); +} + +static vlib_buffer_t * +flowprobe_get_buffer (vlib_main_t * vm, flowprobe_variant_t which) +{ + flowprobe_main_t *fm = &flowprobe_main; + flow_report_main_t *frm = &flow_report_main; + vlib_buffer_t *b0; + u32 bi0; + vlib_buffer_free_list_t *fl; + u32 my_cpu_number = vm->thread_index; + + /* Find or allocate a buffer */ + b0 = fm->context[which].buffers_per_worker[my_cpu_number]; + + /* Need to allocate a buffer? */ + if (PREDICT_FALSE (b0 == 0)) + { + if (vlib_buffer_alloc (vm, &bi0, 1) != 1) + { + vlib_node_increment_counter (vm, flowprobe_l2_node.index, + FLOWPROBE_ERROR_BUFFER, 1); + return 0; + } + + /* Initialize the buffer */ + b0 = fm->context[which].buffers_per_worker[my_cpu_number] = + vlib_get_buffer (vm, bi0); + fl = + vlib_buffer_get_free_list (vm, VLIB_BUFFER_DEFAULT_FREE_LIST_INDEX); + vlib_buffer_init_for_free_list (b0, fl); + VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b0); + + b0->current_data = 0; + b0->current_length = flowprobe_get_headersize (); + b0->flags |= (VLIB_BUFFER_TOTAL_LENGTH_VALID | VLIB_BUFFER_FLOW_REPORT); + vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0; + vnet_buffer (b0)->sw_if_index[VLIB_TX] = frm->fib_index; + fm->context[which].next_record_offset_per_worker[my_cpu_number] = + b0->current_length; + } + + return b0; +} + +static void +flowprobe_export_entry (vlib_main_t * vm, flowprobe_entry_t * e) +{ + u32 my_cpu_number = vm->thread_index; + flowprobe_main_t *fm = &flowprobe_main; + flow_report_main_t *frm = &flow_report_main; + vlib_buffer_t *b0; + bool collect_ip4 = false, collect_ip6 = false; + flowprobe_variant_t which = e->key.which; + flowprobe_record_t flags = fm->context[which].flags; + u16 offset = + fm->context[which].next_record_offset_per_worker[my_cpu_number]; + + if (offset < flowprobe_get_headersize ()) + offset = flowprobe_get_headersize (); + + b0 = flowprobe_get_buffer (vm, which); + /* No available buffer, what to do... */ + if (b0 == 0) + return; + + if (flags & FLOW_RECORD_L3) + { + collect_ip4 = which == FLOW_VARIANT_L2_IP4 || which == FLOW_VARIANT_IP4; + collect_ip6 = which == FLOW_VARIANT_L2_IP6 || which == FLOW_VARIANT_IP6; + } + + offset += flowprobe_common_add (b0, e, offset); + + if (flags & FLOW_RECORD_L2) + offset += flowprobe_l2_add (b0, e, offset); + if (collect_ip6) + offset += flowprobe_l3_ip6_add (b0, e, offset); + if (collect_ip4) + offset += flowprobe_l3_ip4_add (b0, e, offset); + if (flags & FLOW_RECORD_L4) + offset += flowprobe_l4_add (b0, e, offset); + + /* Reset per flow-export counters */ + e->packetcount = 0; + e->octetcount = 0; + e->last_exported = vlib_time_now (vm); + + b0->current_length = offset; + + fm->context[which].next_record_offset_per_worker[my_cpu_number] = offset; + /* Time to flush the buffer? */ + if (offset + fm->template_size[flags] > frm->path_mtu) + flowprobe_export_send (vm, b0, which); +} + +uword +flowprobe_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, vlib_frame_t * frame, + flowprobe_variant_t which) +{ + u32 n_left_from, *from, *to_next; + flowprobe_next_t next_index; + flowprobe_main_t *fm = &flowprobe_main; + u64 now; + + now = (u64) ((vlib_time_now (vm) - fm->vlib_time_0) * 1e9); + now += fm->nanosecond_time_0; + + from = vlib_frame_vector_args (frame); + n_left_from = frame->n_vectors; + next_index = node->cached_next_index; + + while (n_left_from > 0) + { + u32 n_left_to_next; + + vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next); + + while (n_left_from >= 4 && n_left_to_next >= 2) + { + u32 next0 = FLOWPROBE_NEXT_DROP; + u32 next1 = FLOWPROBE_NEXT_DROP; + u16 len0, len1; + u32 bi0, bi1; + vlib_buffer_t *b0, *b1; + + /* Prefetch next iteration. */ + { + vlib_buffer_t *p2, *p3; + + p2 = vlib_get_buffer (vm, from[2]); + p3 = vlib_get_buffer (vm, from[3]); + + vlib_prefetch_buffer_header (p2, LOAD); + vlib_prefetch_buffer_header (p3, LOAD); + + CLIB_PREFETCH (p2->data, CLIB_CACHE_LINE_BYTES, STORE); + CLIB_PREFETCH (p3->data, CLIB_CACHE_LINE_BYTES, STORE); + } + + /* speculatively enqueue b0 and b1 to the current next frame */ + to_next[0] = bi0 = from[0]; + to_next[1] = bi1 = from[1]; + from += 2; + to_next += 2; + n_left_from -= 2; + n_left_to_next -= 2; + + b0 = vlib_get_buffer (vm, bi0); + b1 = vlib_get_buffer (vm, bi1); + + vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], + &next0, b0); + vnet_feature_next (vnet_buffer (b1)->sw_if_index[VLIB_TX], + &next1, b1); + + len0 = vlib_buffer_length_in_chain (vm, b0); + ethernet_header_t *eh0 = vlib_buffer_get_current (b0); + u16 ethertype0 = clib_net_to_host_u16 (eh0->type); + + if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + add_to_flow_record_state (vm, node, fm, b0, now, len0, + flowprobe_get_variant + (which, fm->context[which].flags, + ethertype0), 0); + + len1 = vlib_buffer_length_in_chain (vm, b1); + ethernet_header_t *eh1 = vlib_buffer_get_current (b1); + u16 ethertype1 = clib_net_to_host_u16 (eh1->type); + + if (PREDICT_TRUE ((b1->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + add_to_flow_record_state (vm, node, fm, b1, now, len1, + flowprobe_get_variant + (which, fm->context[which].flags, + ethertype1), 0); + + /* verify speculative enqueues, maybe switch current next frame */ + vlib_validate_buffer_enqueue_x2 (vm, node, next_index, + to_next, n_left_to_next, + bi0, bi1, next0, next1); + } + + while (n_left_from > 0 && n_left_to_next > 0) + { + u32 bi0; + vlib_buffer_t *b0; + u32 next0 = FLOWPROBE_NEXT_DROP; + u16 len0; + + /* speculatively enqueue b0 to the current next frame */ + bi0 = from[0]; + to_next[0] = bi0; + from += 1; + to_next += 1; + n_left_from -= 1; + n_left_to_next -= 1; + + b0 = vlib_get_buffer (vm, bi0); + + vnet_feature_next (vnet_buffer (b0)->sw_if_index[VLIB_TX], + &next0, b0); + + len0 = vlib_buffer_length_in_chain (vm, b0); + ethernet_header_t *eh0 = vlib_buffer_get_current (b0); + u16 ethertype0 = clib_net_to_host_u16 (eh0->type); + + if (PREDICT_TRUE ((b0->flags & VLIB_BUFFER_FLOW_REPORT) == 0)) + { + flowprobe_trace_t *t = 0; + if (PREDICT_FALSE ((node->flags & VLIB_NODE_FLAG_TRACE) + && (b0->flags & VLIB_BUFFER_IS_TRACED))) + t = vlib_add_trace (vm, node, b0, sizeof (*t)); + + add_to_flow_record_state (vm, node, fm, b0, now, len0, + flowprobe_get_variant + (which, fm->context[which].flags, + ethertype0), t); + } + + /* verify speculative enqueue, maybe switch current next frame */ + vlib_validate_buffer_enqueue_x1 (vm, node, next_index, + to_next, n_left_to_next, + bi0, next0); + } + + vlib_put_next_frame (vm, node, next_index, n_left_to_next); + } + return frame->n_vectors; +} + +static uword +flowprobe_ip4_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, vlib_frame_t * frame) +{ + return flowprobe_node_fn (vm, node, frame, FLOW_VARIANT_IP4); +} + +static uword +flowprobe_ip6_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, vlib_frame_t * frame) +{ + return flowprobe_node_fn (vm, node, frame, FLOW_VARIANT_IP6); +} + +static uword +flowprobe_l2_node_fn (vlib_main_t * vm, + vlib_node_runtime_t * node, vlib_frame_t * frame) +{ + return flowprobe_node_fn (vm, node, frame, FLOW_VARIANT_L2); +} + +static inline void +flush_record (flowprobe_variant_t which) +{ + vlib_main_t *vm = vlib_get_main (); + vlib_buffer_t *b = flowprobe_get_buffer (vm, which); + if (b) + flowprobe_export_send (vm, b, which); +} + +void +flowprobe_flush_callback_ip4 (void) +{ + flush_record (FLOW_VARIANT_IP4); +} + +void +flowprobe_flush_callback_ip6 (void) +{ + flush_record (FLOW_VARIANT_IP6); +} + +void +flowprobe_flush_callback_l2 (void) +{ + flush_record (FLOW_VARIANT_L2); + flush_record (FLOW_VARIANT_L2_IP4); + flush_record (FLOW_VARIANT_L2_IP6); +} + + +static void +flowprobe_delete_by_index (u32 my_cpu_number, u32 poolindex) +{ + flowprobe_main_t *fm = &flowprobe_main; + flowprobe_entry_t *e; + u32 h; + + e = pool_elt_at_index (fm->pool_per_worker[my_cpu_number], poolindex); + + /* Get my index */ + h = flowprobe_hash (&e->key); + + /* Reset hash */ + fm->hash_per_worker[my_cpu_number][h] = ~0; + + pool_put_index (fm->pool_per_worker[my_cpu_number], poolindex); +} + + +/* Per worker process processing the active/passive expired entries */ +static uword +flowprobe_walker_process (vlib_main_t * vm, + vlib_node_runtime_t * rt, vlib_frame_t * f) +{ + flowprobe_main_t *fm = &flowprobe_main; + flow_report_main_t *frm = &flow_report_main; + flowprobe_entry_t *e; + + /* + * $$$$ Remove this check from here and track FRM status and disable + * this process if required. + */ + if (frm->ipfix_collector.as_u32 == 0 || frm->src_address.as_u32 == 0) + { + fm->disabled = true; + return 0; + } + fm->disabled = false; + + u32 cpu_index = os_get_thread_index (); + u32 *to_be_removed = 0, *i; + + /* + * Tick the timer when required and process the vector of expired + * timers + */ + f64 start_time = vlib_time_now (vm); + u32 count = 0; + + tw_timer_expire_timers_2t_1w_2048sl (fm->timers_per_worker[cpu_index], + start_time); + + vec_foreach (i, fm->expired_passive_per_worker[cpu_index]) + { + u32 exported = 0; + f64 now = vlib_time_now (vm); + if (now > start_time + 100e-6 + || exported > FLOW_MAXIMUM_EXPORT_ENTRIES - 1) + break; + + if (pool_is_free_index (fm->pool_per_worker[cpu_index], *i)) + { + clib_warning ("Element is %d is freed already\n", *i); + continue; + } + else + e = pool_elt_at_index (fm->pool_per_worker[cpu_index], *i); + + /* Check last update timestamp. If it is longer than passive time nuke + * entry. Otherwise restart timer with what's left + * Premature passive timer by more than 10% + */ + if ((now - e->last_updated) < (fm->passive_timer * 0.9)) + { + f64 delta = fm->passive_timer - (now - e->last_updated); + e->passive_timer_handle = tw_timer_start_2t_1w_2048sl + (fm->timers_per_worker[cpu_index], *i, 0, delta); + } + else /* Nuke entry */ + { + vec_add1 (to_be_removed, *i); + } + /* If anything to report send it to the exporter */ + if (e->packetcount && now > e->last_exported + fm->active_timer) + { + exported++; + flowprobe_export_entry (vm, e); + } + count++; + } + if (count) + vec_delete (fm->expired_passive_per_worker[cpu_index], count, 0); + + vec_foreach (i, to_be_removed) flowprobe_delete_by_index (cpu_index, *i); + vec_free (to_be_removed); + + return 0; +} + +/* *INDENT-OFF* */ +VLIB_REGISTER_NODE (flowprobe_ip4_node) = { + .function = flowprobe_ip4_node_fn, + .name = "flowprobe-ip4", + .vector_size = sizeof (u32), + .format_trace = format_flowprobe_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = ARRAY_LEN(flowprobe_error_strings), + .error_strings = flowprobe_error_strings, + .n_next_nodes = FLOWPROBE_N_NEXT, + .next_nodes = FLOWPROBE_NEXT_NODES, +}; +VLIB_REGISTER_NODE (flowprobe_ip6_node) = { + .function = flowprobe_ip6_node_fn, + .name = "flowprobe-ip6", + .vector_size = sizeof (u32), + .format_trace = format_flowprobe_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = ARRAY_LEN(flowprobe_error_strings), + .error_strings = flowprobe_error_strings, + .n_next_nodes = FLOWPROBE_N_NEXT, + .next_nodes = FLOWPROBE_NEXT_NODES, +}; +VLIB_REGISTER_NODE (flowprobe_l2_node) = { + .function = flowprobe_l2_node_fn, + .name = "flowprobe-l2", + .vector_size = sizeof (u32), + .format_trace = format_flowprobe_trace, + .type = VLIB_NODE_TYPE_INTERNAL, + .n_errors = ARRAY_LEN(flowprobe_error_strings), + .error_strings = flowprobe_error_strings, + .n_next_nodes = FLOWPROBE_N_NEXT, + .next_nodes = FLOWPROBE_NEXT_NODES, +}; +VLIB_REGISTER_NODE (flowprobe_walker_node) = { + .function = flowprobe_walker_process, + .name = "flowprobe-walker", + .type = VLIB_NODE_TYPE_INPUT, + .state = VLIB_NODE_STATE_INTERRUPT, +}; +/* *INDENT-ON* */ + +/* + * fd.io coding-style-patch-verification: ON + * + * Local Variables: + * eval: (c-set-style "gnu") + * End: + */ diff --git a/src/plugins/ioam/analyse/ioam_summary_export.c b/src/plugins/ioam/analyse/ioam_summary_export.c index 9a2667f6..af2d39ab 100644 --- a/src/plugins/ioam/analyse/ioam_summary_export.c +++ b/src/plugins/ioam/analyse/ioam_summary_export.c @@ -398,6 +398,7 @@ ioam_flow_create (u8 del) int rv; u32 domain_id = 0; flow_report_main_t *frm = &flow_report_main; + u16 template_id; memset (&args, 0, sizeof (args)); args.rewrite_callback = ioam_template_rewrite; @@ -405,7 +406,7 @@ ioam_flow_create (u8 del) del ? (args.is_add = 0) : (args.is_add = 1); args.domain_id = domain_id; - rv = vnet_flow_report_add_del (frm, &args); + rv = vnet_flow_report_add_del (frm, &args, &template_id); switch (rv) { diff --git a/src/plugins/ioam/udp-ping/udp_ping_export.c b/src/plugins/ioam/udp-ping/udp_ping_export.c index ce64b606..91cbb4bd 100644 --- a/src/plugins/ioam/udp-ping/udp_ping_export.c +++ b/src/plugins/ioam/udp-ping/udp_ping_export.c @@ -226,6 +226,7 @@ udp_ping_flow_create (u8 del) int rv; u32 domain_id = 0; flow_report_main_t *frm = &flow_report_main; + u16 template_id; memset (&args, 0, sizeof (args)); args.rewrite_callback = udp_ping_template_rewrite; @@ -234,7 +235,7 @@ udp_ping_flow_create (u8 del) args.domain_id = domain_id; args.src_port = UDP_DST_PORT_ipfix; - rv = vnet_flow_report_add_del (frm, &args); + rv = vnet_flow_report_add_del (frm, &args, &template_id); switch (rv) { diff --git a/src/plugins/snat/snat_ipfix_logging.c b/src/plugins/snat/snat_ipfix_logging.c index 1a111863..c68dc540 100644 --- a/src/plugins/snat/snat_ipfix_logging.c +++ b/src/plugins/snat/snat_ipfix_logging.c @@ -784,7 +784,7 @@ snat_ipfix_logging_enable_disable (int enable, u32 domain_id, u16 src_port) a.rewrite_callback = snat_template_rewrite_max_entries_per_usr; a.flow_data_callback = snat_data_callback_max_entries_per_usr; - rv = vnet_flow_report_add_del (frm, &a); + rv = vnet_flow_report_add_del (frm, &a, NULL); if (rv) { clib_warning ("vnet_flow_report_add_del returned %d", rv); @@ -796,7 +796,7 @@ snat_ipfix_logging_enable_disable (int enable, u32 domain_id, u16 src_port) a.rewrite_callback = snat_template_rewrite_nat44_session; a.flow_data_callback = snat_data_callback_nat44_session; - rv = vnet_flow_report_add_del (frm, &a); + rv = vnet_flow_report_add_del (frm, &a, NULL); if (rv) { clib_warning ("vnet_flow_report_add_del returned %d", rv); @@ -806,7 +806,7 @@ snat_ipfix_logging_enable_disable (int enable, u32 domain_id, u16 src_port) a.rewrite_callback = snat_template_rewrite_addr_exhausted; a.flow_data_callback = snat_data_callback_addr_exhausted; - rv = vnet_flow_report_add_del (frm, &a); + rv = vnet_flow_report_add_del (frm, &a, NULL); if (rv) { clib_warning ("vnet_flow_report_add_del returned %d", rv); diff --git a/src/vnet/flow/flow_api.c b/src/vnet/flow/flow_api.c index b975dda1..52a608ca 100644 --- a/src/vnet/flow/flow_api.c +++ b/src/vnet/flow/flow_api.c @@ -296,7 +296,7 @@ static void args.domain_id = fcm->domain_id; args.src_port = fcm->src_port; - rv = vnet_flow_report_add_del (frm, &args); + rv = vnet_flow_report_add_del (frm, &args, NULL); /* If deleting, or add failed */ if (is_add == 0 || (rv && is_add)) diff --git a/src/vnet/flow/flow_report.c b/src/vnet/flow/flow_report.c index c78a78a9..ccc84235 100644 --- a/src/vnet/flow/flow_report.c +++ b/src/vnet/flow/flow_report.c @@ -87,7 +87,6 @@ int send_template_packet (flow_report_main_t *frm, if (frm->ipfix_collector.as_u32 == 0 || frm->src_address.as_u32 == 0) { - clib_warning ("no collector: disabling flow collector process"); vlib_node_set_state (frm->vlib_main, flow_report_process_node.index, VLIB_NODE_STATE_DISABLED); return -1; @@ -238,7 +237,8 @@ VLIB_REGISTER_NODE (flow_report_process_node) = { }; int vnet_flow_report_add_del (flow_report_main_t *frm, - vnet_flow_report_add_del_args_t *a) + vnet_flow_report_add_del_args_t *a, + u16 *template_id) { int i; int found_index = ~0; @@ -260,6 +260,8 @@ int vnet_flow_report_add_del (flow_report_main_t *frm, && fr->flow_data_callback == a->flow_data_callback) { found_index = i; + if (template_id) + *template_id = fr->template_id; break; } } @@ -304,7 +306,10 @@ int vnet_flow_report_add_del (flow_report_main_t *frm, fr->opaque = a->opaque; fr->rewrite_callback = a->rewrite_callback; fr->flow_data_callback = a->flow_data_callback; - + + if (template_id) + *template_id = fr->template_id; + return 0; } @@ -415,10 +420,7 @@ set_ipfix_exporter_command_fn (vlib_main_t * vm, break; } - if (collector.as_u32 == 0) - return clib_error_return (0, "collector address required"); - - if (src.as_u32 == 0) + if (collector.as_u32 != 0 && src.as_u32 == 0) return clib_error_return (0, "src address required"); if (path_mtu > 1450 /* vpp does not support fragmentation */) @@ -441,7 +443,8 @@ set_ipfix_exporter_command_fn (vlib_main_t * vm, frm->template_interval = template_interval; frm->udp_checksum = udp_checksum; - vlib_cli_output (vm, "Collector %U, src address %U, " + if (collector.as_u32) + vlib_cli_output (vm, "Collector %U, src address %U, " "fib index %d, path MTU %u, " "template resend interval %us, " "udp checksum %s", @@ -449,6 +452,8 @@ set_ipfix_exporter_command_fn (vlib_main_t * vm, format_ip4_address, &frm->src_address, fib_index, path_mtu, template_interval, udp_checksum ? "enabled" : "disabled"); + else + vlib_cli_output (vm, "IPFIX Collector is disabled"); /* Turn on the flow reporting process */ vlib_process_signal_event (vm, flow_report_process_node.index, diff --git a/src/vnet/flow/flow_report.h b/src/vnet/flow/flow_report.h index e8ed3818..01859ce5 100644 --- a/src/vnet/flow/flow_report.h +++ b/src/vnet/flow/flow_report.h @@ -130,7 +130,8 @@ typedef struct { } vnet_flow_report_add_del_args_t; int vnet_flow_report_add_del (flow_report_main_t *frm, - vnet_flow_report_add_del_args_t *a); + vnet_flow_report_add_del_args_t *a, + u16 *template_id); clib_error_t * flow_report_add_del_error_to_clib_error (int error); diff --git a/src/vnet/flow/flow_report_classify.c b/src/vnet/flow/flow_report_classify.c index 27f03ddc..d4c30492 100644 --- a/src/vnet/flow/flow_report_classify.c +++ b/src/vnet/flow/flow_report_classify.c @@ -458,7 +458,7 @@ ipfix_classify_table_add_del_command_fn (vlib_main_t * vm, args.domain_id = fcm->domain_id; args.src_port = fcm->src_port; - rv = vnet_flow_report_add_del (frm, &args); + rv = vnet_flow_report_add_del (frm, &args, NULL); error = flow_report_add_del_error_to_clib_error(rv); diff --git a/src/vnet/ip/ip6_packet.h b/src/vnet/ip/ip6_packet.h index cdd7eed5..c0c745e2 100644 --- a/src/vnet/ip/ip6_packet.h +++ b/src/vnet/ip/ip6_packet.h @@ -341,6 +341,12 @@ typedef struct ip6_address_t src_address, dst_address; } ip6_header_t; +always_inline u8 +ip6_traffic_class (ip6_header_t * i) +{ + return (i->ip_version_traffic_class_and_flow_label & 0x0FF00000) >> 20; +} + always_inline void * ip6_next_header (ip6_header_t * i) { diff --git a/src/vpp-api/java/jvpp/gen/jvpp_gen.py b/src/vpp-api/java/jvpp/gen/jvpp_gen.py index 7932741d..6648a4f7 100755 --- a/src/vpp-api/java/jvpp/gen/jvpp_gen.py +++ b/src/vpp-api/java/jvpp/gen/jvpp_gen.py @@ -30,7 +30,7 @@ from jvppgen import jvpp_impl_gen from jvppgen import jvpp_c_gen from jvppgen import util -blacklist = [ "memclnt.api", "flowperpkt.api" ] +blacklist = [ "memclnt.api", "flowprobe.api" ] # Invocation: # ~/Projects/vpp/vpp-api/jvpp/gen$ mkdir -p java/io/fd/vpp/jvpp && cd java/io/fd/vpp/jvpp diff --git a/test/test_flowperpkt.py b/test/test_flowperpkt.py deleted file mode 100644 index b13d0c63..00000000 --- a/test/test_flowperpkt.py +++ /dev/null @@ -1,173 +0,0 @@ -#!/usr/bin/env python - -import unittest - -from framework import VppTestCase, VppTestRunner - -from scapy.packet import Raw -from scapy.layers.l2 import Ether -from scapy.layers.inet import IP, UDP - - -class TestFlowperpkt(VppTestCase): - """ Flow-per-packet plugin: test both L2 and IP4 reporting """ - - def setUp(self): - """ - Set up - - **Config:** - - create three PG interfaces - """ - super(TestFlowperpkt, self).setUp() - - self.create_pg_interfaces(range(3)) - - self.pg_if_packet_sizes = [150] - - self.interfaces = list(self.pg_interfaces) - - for intf in self.interfaces: - intf.admin_up() - intf.config_ip4() - intf.resolve_arp() - - def create_stream(self, src_if, dst_if, packet_sizes): - """Create a packet stream to tickle the plugin - - :param VppInterface src_if: Source interface for packet stream - :param VppInterface src_if: Dst interface for packet stream - :param list packet_sizes: Sizes to test - """ - pkts = [] - for size in packet_sizes: - info = self.create_packet_info(src_if, dst_if) - payload = self.info_to_payload(info) - p = (Ether(src=src_if.remote_mac, dst=src_if.local_mac) / - IP(src=src_if.remote_ip4, dst=dst_if.remote_ip4) / - UDP(sport=1234, dport=4321) / - Raw(payload)) - info.data = p.copy() - self.extend_packet(p, size) - pkts.append(p) - return pkts - - @staticmethod - def compare_with_mask(payload, masked_expected_data): - if len(payload) * 2 != len(masked_expected_data): - return False - - # iterate over pairs: raw byte from payload and ASCII code for that - # byte from masked payload (or XX if masked) - for i in range(len(payload)): - p = payload[i] - m = masked_expected_data[2 * i:2 * i + 2] - if m != "XX": - if "%02x" % ord(p) != m: - return False - return True - - def verify_ipfix(self, collector_if): - """Check the ipfix capture""" - found_data_packet = False - found_template_packet = False - found_l2_data_packet = False - found_l2_template_packet = False - - # Scapy, of course, understands ipfix not at all... - # These data vetted by manual inspection in wireshark - # X'ed out fields are timestamps, which will absolutely - # fail to compare. - - data_udp_string = "1283128300370000000a002fXXXXXXXX000000000000000101"\ - "00001f0000000100000002ac100102ac10020200XXXXXXXXXXXXXXXX0092" - - template_udp_string = "12831283003c0000000a0034XXXXXXXX00000002000000"\ - "010002002401000007000a0004000e000400080004000c000400050001009c00"\ - "0801380002" - - l2_data_udp_string = "12831283003c0000000a0034XXXXXXXX000000010000000"\ - "1010100240000000100000002%s02020000ff020008XXXXXXXXXXX"\ - "XXXXX0092" % self.pg1.local_mac.translate(None, ":") - - l2_template_udp_string = "12831283003c0000000a0034XXXXXXXX00000002000"\ - "000010002002401010007000a0004000e0004003800060050000601000002009"\ - "c000801380002" - - self.logger.info("Look for ipfix packets on %s sw_if_index %d " - % (collector_if.name, collector_if.sw_if_index)) - # expecting 4 packets on collector interface based on traffic on other - # interfaces - capture = collector_if.get_capture(4) - - for p in capture: - ip = p[IP] - udp = p[UDP] - self.logger.info("src %s dst %s" % (ip.src, ip.dst)) - self.logger.info(" udp src_port %s dst_port %s" - % (udp.sport, udp.dport)) - - payload = str(udp) - - if self.compare_with_mask(payload, data_udp_string): - self.logger.info("found ip4 data packet") - found_data_packet = True - elif self.compare_with_mask(payload, template_udp_string): - self.logger.info("found ip4 template packet") - found_template_packet = True - elif self.compare_with_mask(payload, l2_data_udp_string): - self.logger.info("found l2 data packet") - found_l2_data_packet = True - elif self.compare_with_mask(payload, l2_template_udp_string): - self.logger.info("found l2 template packet") - found_l2_template_packet = True - else: - unmasked_payload = "".join(["%02x" % ord(c) for c in payload]) - self.logger.error("unknown pkt '%s'" % unmasked_payload) - - self.assertTrue(found_data_packet, "Data packet not found") - self.assertTrue(found_template_packet, "Template packet not found") - self.assertTrue(found_l2_data_packet, "L2 data packet not found") - self.assertTrue(found_l2_template_packet, - "L2 template packet not found") - - def test_L3_fpp(self): - """ Flow per packet L3 test """ - - # Configure an ipfix report on the [nonexistent] collector - # 172.16.3.2, as if it was connected to the pg2 interface - # Install a FIB entry, so the exporter's work won't turn into - # an ARP request - - self.pg_enable_capture(self.pg_interfaces) - self.pg2.configure_ipv4_neighbors() - self.vapi.set_ipfix_exporter(collector_address=self.pg2.remote_ip4n, - src_address=self.pg2.local_ip4n, - path_mtu=1450, - template_interval=1) - - # Export flow records for all pkts transmitted on pg1 - self.vapi.cli("flowperpkt feature add-del pg1") - self.vapi.cli("flowperpkt feature add-del pg1 l2") - - # Arrange to minimally trace generated ipfix packets - self.vapi.cli("trace add flowperpkt-ipv4 10") - self.vapi.cli("trace add flowperpkt-l2 10") - - # Create a stream from pg0 -> pg1, which causes - # an ipfix packet to be transmitted on pg2 - - pkts = self.create_stream(self.pg0, self.pg1, - self.pg_if_packet_sizes) - self.pg0.add_stream(pkts) - self.pg_start() - - # Flush the ipfix collector, so we don't need any - # asinine time.sleep(5) action - self.vapi.cli("ipfix flush") - - # Make sure the 4 pkts we expect actually showed up - self.verify_ipfix(self.pg2) - -if __name__ == '__main__': - unittest.main(testRunner=VppTestRunner) diff --git a/test/test_flowprobe.py b/test/test_flowprobe.py new file mode 100644 index 00000000..560b44cc --- /dev/null +++ b/test/test_flowprobe.py @@ -0,0 +1,967 @@ +#!/usr/bin/env python +import random +import socket +import unittest +import time + +from scapy.packet import Raw +from scapy.layers.l2 import Ether +from scapy.layers.inet import IP, UDP +from scapy.layers.inet6 import IPv6 + +from framework import VppTestCase, VppTestRunner +from vpp_object import VppObject +from vpp_pg_interface import CaptureTimeoutError +from util import ppp +from ipfix import IPFIX, Set, Template, Data, IPFIXDecoder + + +class VppCFLOW(VppObject): + """CFLOW object for IPFIX exporter and Flowprobe feature""" + + def __init__(self, test, intf='pg2', active=0, passive=0, timeout=300, + mtu=512, datapath='l2', layer='l2 l3 l4'): + self._test = test + self._intf = intf + self._active = active + if passive == 0 or passive < active: + self._passive = active+5 + else: + self._passive = passive + self._datapath = datapath # l2 ip4 ip6 + self._collect = layer # l2 l3 l4 + self._timeout = timeout + self._mtu = mtu + self._configured = False + + def add_vpp_config(self): + self.enable_exporter() + self._test.vapi.ppcli("flowprobe params record %s active %s " + "passive %s" % (self._collect, self._active, + self._passive)) + self.enable_flowprobe_feature() + self._test.vapi.cli("ipfix flush") + self._configured = True + + def remove_vpp_config(self): + self.disable_exporter() + self.disable_flowprobe_feature() + self._test.vapi.cli("ipfix flush") + self._configured = False + + def enable_exporter(self): + self._test.vapi.set_ipfix_exporter( + collector_address=self._test.pg0.remote_ip4n, + src_address=self._test.pg0.local_ip4n, + path_mtu=self._mtu, + template_interval=self._timeout) + + def enable_flowprobe_feature(self): + self._test.vapi.ppcli("flowprobe feature add-del %s %s" % + (self._intf, self._datapath)) + + def disable_exporter(self): + self._test.vapi.cli("set ipfix exporter collector 0.0.0.0") + + def disable_flowprobe_feature(self): + self._test.vapi.cli("flowprobe feature add-del %s %s disable" % + (self._intf, self._datapath)) + + def object_id(self): + return "ipfix-collector-%s" % (self._src, self.dst) + + def query_vpp_config(self): + return self._configured + + def verify_templates(self, decoder=None, timeout=1, count=3): + templates = [] + p = self._test.wait_for_cflow_packet(self._test.collector, 2, timeout) + self._test.assertTrue(p.haslayer(IPFIX)) + if decoder is not None and p.haslayer(Template): + templates.append(p[Template].templateID) + decoder.add_template(p.getlayer(Template)) + if count > 1: + p = self._test.wait_for_cflow_packet(self._test.collector, 2) + self._test.assertTrue(p.haslayer(IPFIX)) + if decoder is not None and p.haslayer(Template): + templates.append(p[Template].templateID) + decoder.add_template(p.getlayer(Template)) + if count > 2: + p = self._test.wait_for_cflow_packet(self._test.collector, 2) + self._test.assertTrue(p.haslayer(IPFIX)) + if decoder is not None and p.haslayer(Template): + templates.append(p[Template].templateID) + decoder.add_template(p.getlayer(Template)) + return templates + + +class MethodHolder(VppTestCase): + """ Flow-per-packet plugin: test L2, IP4, IP6 reporting """ + + # Test variables + debug_print = False + max_number_of_packets = 16 + pkts = [] + + @classmethod + def setUpClass(cls): + """ + Perform standard class setup (defined by class method setUpClass in + class VppTestCase) before running the test case, set test case related + variables and configure VPP. + """ + super(MethodHolder, cls).setUpClass() + try: + # Create pg interfaces + cls.create_pg_interfaces(range(7)) + + # Packet sizes + cls.pg_if_packet_sizes = [64, 512, 1518, 9018] + + # Create BD with MAC learning and unknown unicast flooding disabled + # and put interfaces to this BD + cls.vapi.bridge_domain_add_del(bd_id=1, uu_flood=1, learn=1) + cls.vapi.sw_interface_set_l2_bridge(cls.pg1._sw_if_index, bd_id=1) + cls.vapi.sw_interface_set_l2_bridge(cls.pg2._sw_if_index, bd_id=1) + + # Set up all interfaces + for i in cls.pg_interfaces: + i.admin_up() + + cls.pg0.config_ip4() + cls.pg0.configure_ipv4_neighbors() + cls.collector = cls.pg0 + + cls.pg1.config_ip4() + cls.pg1.resolve_arp() + cls.pg2.config_ip4() + cls.pg2.resolve_arp() + cls.pg3.config_ip4() + cls.pg3.resolve_arp() + cls.pg4.config_ip4() + cls.pg4.resolve_arp() + + cls.pg5.config_ip6() + cls.pg5.resolve_ndp() + cls.pg5.disable_ipv6_ra() + cls.pg6.config_ip6() + cls.pg6.resolve_ndp() + cls.pg6.disable_ipv6_ra() + except Exception: + super(MethodHolder, cls).tearDownClass() + raise + + def create_stream(self, src_if=None, dst_if=None, packets=None, + size=None, ip_ver='v4'): + """Create a packet stream to tickle the plugin + + :param VppInterface src_if: Source interface for packet stream + :param VppInterface src_if: Dst interface for packet stream + """ + if src_if is None: + src_if = self.pg1 + if dst_if is None: + dst_if = self.pg2 + self.pkts = [] + if packets is None: + packets = random.randint(1, self.max_number_of_packets) + pkt_size = size + for p in range(0, packets): + if size is None: + pkt_size = random.choice(self.pg_if_packet_sizes) + info = self.create_packet_info(src_if, dst_if) + payload = self.info_to_payload(info) + p = Ether(src=src_if.remote_mac, dst=src_if.local_mac) + if ip_ver == 'v4': + p /= IP(src=src_if.remote_ip4, dst=dst_if.remote_ip4) + else: + p /= IPv6(src=src_if.remote_ip6, dst=dst_if.remote_ip6) + p /= (UDP(sport=1234, dport=4321) / Raw(payload)) + info.data = p.copy() + self.extend_packet(p, pkt_size) + self.pkts.append(p) + + def verify_cflow_data(self, decoder, capture, cflow): + octets = 0 + packets = 0 + for p in capture: + octets += p[IP].len + packets += 1 + if cflow.haslayer(Data): + data = decoder.decode_data_set(cflow.getlayer(Set)) + for record in data: + self.assertEqual(int(record[1].encode('hex'), 16), octets) + self.assertEqual(int(record[2].encode('hex'), 16), packets) + + def verify_cflow_data_detail(self, decoder, capture, cflow, + data_set={1: 'octets', 2: 'packets'}, + ip_ver='v4'): + if self.debug_print: + print capture[0].show() + if cflow.haslayer(Data): + data = decoder.decode_data_set(cflow.getlayer(Set)) + if self.debug_print: + print data + if ip_ver == 'v4': + ip_layer = capture[0][IP] + else: + ip_layer = capture[0][IPv6] + if data_set is not None: + for record in data: + # skip flow if in/out gress interface is 0 + if int(record[10].encode('hex'), 16) == 0: + continue + if int(record[14].encode('hex'), 16) == 0: + continue + + for field in data_set: + if field not in record.keys(): + continue + value = data_set[field] + if value == 'octets': + value = ip_layer.len + if ip_ver == 'v6': + value += 40 # ??? is this correct + elif value == 'packets': + value = 1 + elif value == 'src_ip': + if ip_ver == 'v4': + ip = socket.inet_pton(socket.AF_INET, + ip_layer.src) + else: + ip = socket.inet_pton(socket.AF_INET6, + ip_layer.src) + value = int(ip.encode('hex'), 16) + elif value == 'dst_ip': + if ip_ver == 'v4': + ip = socket.inet_pton(socket.AF_INET, + ip_layer.dst) + else: + ip = socket.inet_pton(socket.AF_INET6, + ip_layer.dst) + value = int(ip.encode('hex'), 16) + elif value == 'sport': + value = int(capture[0][UDP].sport) + elif value == 'dport': + value = int(capture[0][UDP].dport) + self.assertEqual(int(record[field].encode('hex'), 16), + value) + + def verify_cflow_data_notimer(self, decoder, capture, cflows): + idx = 0 + for cflow in cflows: + if cflow.haslayer(Data): + data = decoder.decode_data_set(cflow.getlayer(Set)) + else: + raise Exception("No CFLOW data") + + for rec in data: + p = capture[idx] + idx += 1 + self.assertEqual(p[IP].len, int(rec[1].encode('hex'), 16)) + self.assertEqual(1, int(rec[2].encode('hex'), 16)) + self.assertEqual(len(capture), idx) + + def wait_for_cflow_packet(self, collector_intf, set_id=2, timeout=1, + expected=True): + """ wait for CFLOW packet and verify its correctness + + :param timeout: how long to wait + + :returns: tuple (packet, time spent waiting for packet) + """ + self.logger.info("IPFIX: Waiting for CFLOW packet") + deadline = time.time() + timeout + counter = 0 + # self.logger.debug(self.vapi.ppcli("show flow table")) + while True: + counter += 1 + # sanity check + self.assert_in_range(counter, 0, 100, "number of packets ignored") + time_left = deadline - time.time() + try: + if time_left < 0 and expected: + # self.logger.debug(self.vapi.ppcli("show flow table")) + raise CaptureTimeoutError( + "Packet did not arrive within timeout") + p = collector_intf.wait_for_packet(timeout=time_left) + except CaptureTimeoutError: + if expected: + # self.logger.debug(self.vapi.ppcli("show flow table")) + raise CaptureTimeoutError( + "Packet did not arrive within timeout") + else: + return + if not expected: + raise CaptureTimeoutError("Packet arrived even not expected") + self.assertEqual(p[Set].setID, set_id) + # self.logger.debug(self.vapi.ppcli("show flow table")) + self.logger.debug(ppp("IPFIX: Got packet:", p)) + break + return p + + def send_packets(self, src_if=None, dst_if=None): + self.sleep(3) + if src_if is None: + src_if = self.pg1 + if dst_if is None: + dst_if = self.pg2 + self.pg_enable_capture([dst_if]) + src_if.add_stream(self.pkts) + self.pg_start() + return dst_if.get_capture(len(self.pkts)) + + +class TestFFP_Timers(MethodHolder): + """Template verification, timer tests""" + + def test_0001(self): + """ receive template data packets""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + # template packet should arrive immediately + ipfix.verify_templates(timeout=3) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + def test_0002(self): + """ timer=10s, less than template timeout""" + self.logger.info("FFP_TEST_START_0002") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=20) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream() + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[1], 39) + self.verify_cflow_data(ipfix_decoder, capture, cflow) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0002") + + def test_0003(self): + """ timer=30s, greater than template timeout""" + self.logger.info("FFP_TEST_START_0003") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, timeout=20, active=25) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + ipfix.verify_templates(timeout=3) + + self.create_stream() + capture = self.send_packets() + + # next set of template packet should arrive after 20 seconds + # template packet should arrive within 20 s + templates = ipfix.verify_templates(ipfix_decoder, timeout=25) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[1], 55) + self.verify_cflow_data(ipfix_decoder, capture, cflow) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0003") + + def test_0004(self): + """ sent packet after first cflow packet arrived""" + self.logger.info("FFP_TEST_START_0004") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=20) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream() + self.send_packets() + + # make sure the one packet we expect actually showed up + self.wait_for_cflow_packet(self.collector, templates[1], 39) + + self.pg_enable_capture([self.pg2]) + + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.verify_cflow_data(ipfix_decoder, capture, cflow) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0004") + + +class TestFFP_DatapathL2(MethodHolder): + """collect information on Ethernet datapath""" + + def test_0000(self): + """ verify template on L2 datapath""" + self.logger.info("FFP_TEST_START_0000") + self.pg_enable_capture(self.pg_interfaces) + + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + # template packet should arrive immediately + ipfix.verify_templates(timeout=3, count=3) + self.collector.get_capture(3) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0000") + + def test_0001(self): + """ L2 data on L2 datapath""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10, layer='l2') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(packets=1) + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 256: 8}) + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + def test_0002(self): + """ L3 data on L2 datapath""" + self.logger.info("FFP_TEST_START_0002") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10, layer='l3') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=2) + + self.create_stream(packets=1) + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 4: 17, + 8: 'src_ip', 12: 'dst_ip'}) + + # expected one template and one cflow packet + self.collector.get_capture(3) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0002") + + def test_0003(self): + """ L4 data on L2 datapath""" + self.logger.info("FFP_TEST_START_0003") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10, layer='l4') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=2) + + self.create_stream(packets=1) + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 7: 'sport', 11: 'dport'}) + + # expected one template and one cflow packet + self.collector.get_capture(3) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0003") + + +class TestFFP_DatapathIP4(MethodHolder): + """collect information on IP4 datapath""" + + def test_0000(self): + """ verify templates on IP4 datapath""" + self.logger.info("FFP_TEST_START_0000") + + self.pg_enable_capture(self.pg_interfaces) + + ipfix = VppCFLOW(test=self, active=10, datapath='ip4') + ipfix.add_vpp_config() + + # template packet should arrive immediately + ipfix.verify_templates(timeout=3, count=1) + self.collector.get_capture(1) + + ipfix.remove_vpp_config() + + self.logger.info("FFP_TEST_FINISH_0000") + + def test_0001(self): + """ L2 data on IP4 datapath""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg4', active=10, + layer='l2', datapath='ip4') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg3, dst_if=self.pg4, packets=1) + capture = self.send_packets(src_if=self.pg3, dst_if=self.pg4) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 256: 8}) + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + def test_0002(self): + """ L3 data on IP4 datapath""" + self.logger.info("FFP_TEST_START_0002") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg4', active=10, + layer='l3', datapath='ip4') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg3, dst_if=self.pg4, packets=1) + capture = self.send_packets(src_if=self.pg3, dst_if=self.pg4) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {1: 'octets', 2: 'packets', + 8: 'src_ip', 12: 'dst_ip'}) + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0002") + + def test_0003(self): + """ L4 data on IP4 datapath""" + self.logger.info("FFP_TEST_START_0003") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg4', active=10, + layer='l4', datapath='ip4') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg3, dst_if=self.pg4, packets=1) + capture = self.send_packets(src_if=self.pg3, dst_if=self.pg4) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 7: 'sport', 11: 'dport'}) + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0003") + + +class TestFFP_DatapathIP6(MethodHolder): + """collect information on IP6 datapath""" + + def test_0000(self): + """ verify templates on IP6 datapath""" + self.logger.info("FFP_TEST_START_0000") + self.pg_enable_capture(self.pg_interfaces) + + ipfix = VppCFLOW(test=self, active=10, datapath='ip6') + ipfix.add_vpp_config() + + # template packet should arrive immediately + ipfix.verify_templates(timeout=3, count=1) + self.collector.get_capture(1) + + ipfix.remove_vpp_config() + + self.logger.info("FFP_TEST_FINISH_0000") + + def test_0001(self): + """ L2 data on IP6 datapath""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg6', active=20, + layer='l2', datapath='ip6') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg5, dst_if=self.pg6, packets=1, + ip_ver='IPv6') + capture = self.send_packets(src_if=self.pg5, dst_if=self.pg6) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 256: 56710}, + ip_ver='v6') + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + def test_0002(self): + """ L3 data on IP6 datapath""" + self.logger.info("FFP_TEST_START_0002") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg6', active=10, + layer='l3', datapath='ip6') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg5, dst_if=self.pg6, packets=1, + ip_ver='IPv6') + capture = self.send_packets(src_if=self.pg5, dst_if=self.pg6) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', + 27: 'src_ip', 28: 'dst_ip'}, + ip_ver='v6') + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0002") + + def test_0003(self): + """ L4 data on IP6 datapath""" + self.logger.info("FFP_TEST_START_0003") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, intf='pg6', active=10, + layer='l4', datapath='ip6') + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3, count=1) + + self.create_stream(src_if=self.pg5, dst_if=self.pg6, packets=1, + ip_ver='IPv6') + capture = self.send_packets(src_if=self.pg5, dst_if=self.pg6) + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[0], 39) + self.verify_cflow_data_detail(ipfix_decoder, capture, cflow, + {2: 'packets', 7: 'sport', 11: 'dport'}, + ip_ver='v6') + + # expected two templates and one cflow packet + self.collector.get_capture(2) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0003") + + +class TestFFP_NoTimers(MethodHolder): + """No timers""" + + def test_0001(self): + """ no timers, one CFLOW packet, 9 Flows inside""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=0) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream(packets=9) + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflow = self.wait_for_cflow_packet(self.collector, templates[1], 10) + self.verify_cflow_data_notimer(ipfix_decoder, capture, [cflow]) + self.wait_for_cflow_packet(self.collector, templates[1], 10, + expected=False) + self.collector.get_capture(4) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + def test_0002(self): + """ no timers, two CFLOW packets (mtu=256), 3 Flows in each""" + self.logger.info("FFP_TEST_START_0002") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=0, mtu=256) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream(packets=6) + capture = self.send_packets() + + # make sure the one packet we expect actually showed up + cflows = [] + cflows.append(self.wait_for_cflow_packet(self.collector, + templates[1], 10)) + cflows.append(self.wait_for_cflow_packet(self.collector, + templates[1], 10)) + self.verify_cflow_data_notimer(ipfix_decoder, capture, cflows) + self.collector.get_capture(5) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0002") + + +class TestFFP_DisableIPFIX(MethodHolder): + """Disable IPFIX""" + + def test_0001(self): + """ disable IPFIX after first packets""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=30) + + self.create_stream() + self.send_packets() + + # make sure the one packet we expect actually showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + # disble IPFIX + ipfix.disable_exporter() + self.pg_enable_capture([self.collector]) + + self.send_packets() + + # make sure no one packet arrived in 1 minute + self.wait_for_cflow_packet(self.collector, templates[1], 30, + expected=False) + self.collector.get_capture(0) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + +class TestFFP_ReenableIPFIX(MethodHolder): + """Re-enable IPFIX""" + + def test_0001(self): + """ disable IPFIX after first packets and re-enable after few packets + """ + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream() + self.send_packets() + + # make sure the one packet we expect actually showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + # disble IPFIX + ipfix.disable_exporter() + self.pg_enable_capture([self.collector]) + + self.send_packets() + + # make sure no one packet arrived in active timer span + self.wait_for_cflow_packet(self.collector, templates[1], 30, + expected=False) + self.collector.get_capture(0) + + # enable IPFIX + ipfix.enable_exporter() + self.vapi.cli("ipfix flush") + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.send_packets() + + # make sure the next packets (templates and data) we expect actually + # showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + +class TestFFP_DisableFFP(MethodHolder): + """Disable Flowprobe feature""" + + def test_0001(self): + """ disable flowprobe feature after first packets""" + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream() + self.send_packets() + + # make sure the one packet we expect actually showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + # disble IPFIX + ipfix.disable_flowprobe_feature() + self.pg_enable_capture([self.collector]) + + self.send_packets() + + # make sure no one packet arrived in active timer span + self.wait_for_cflow_packet(self.collector, templates[1], 30, + expected=False) + self.collector.get_capture(0) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + +class TestFFP_ReenableFFP(MethodHolder): + """Re-enable Flowprobe feature""" + + def test_0001(self): + """ disable flowprobe feature after first packets and re-enable + after few packets """ + self.logger.info("FFP_TEST_START_0001") + self.pg_enable_capture(self.pg_interfaces) + self.pkts = [] + + ipfix = VppCFLOW(test=self, active=10) + ipfix.add_vpp_config() + + ipfix_decoder = IPFIXDecoder() + # template packet should arrive immediately + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.create_stream() + self.send_packets() + + # make sure the one packet we expect actually showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + # disble FPP feature + ipfix.disable_flowprobe_feature() + self.pg_enable_capture([self.collector]) + + self.send_packets() + + # make sure no one packet arrived in active timer span + self.wait_for_cflow_packet(self.collector, templates[1], 30, + expected=False) + self.collector.get_capture(0) + + # enable FPP feature + ipfix.enable_flowprobe_feature() + self.vapi.cli("ipfix flush") + templates = ipfix.verify_templates(ipfix_decoder, timeout=3) + + self.send_packets() + + # make sure the next packets (templates and data) we expect actually + # showed up + self.wait_for_cflow_packet(self.collector, templates[1], 30) + self.collector.get_capture(4) + + ipfix.remove_vpp_config() + self.logger.info("FFP_TEST_FINISH_0001") + + +if __name__ == '__main__': + unittest.main(testRunner=VppTestRunner) -- cgit 1.2.3-korg From 583dc8d3e23a780c85ebe48ea59f0338aad4df17 Mon Sep 17 00:00:00 2001 From: Ray Kinsella Date: Thu, 8 Jun 2017 15:54:19 +0100 Subject: Sample plugin: Add sample plugin documentation Added some user documentation to sample plugin. Change-Id: I518910f80499307e8fcac8dcef7baaeab5ea8e35 Signed-off-by: Ray Kinsella --- README.md | 2 +- doxygen/Makefile | 3 +- doxygen/user_doc.md | 1 + src/examples/sample-plugin/sample/sample.c | 30 ++++++++--- src/examples/sample-plugin/sample_plugin_doc.md | 66 +++++++++++++++++++++++++ 5 files changed, 92 insertions(+), 10 deletions(-) create mode 100644 src/examples/sample-plugin/sample_plugin_doc.md (limited to 'doxygen/user_doc.md') diff --git a/README.md b/README.md index 7f429d12..596494b9 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Directory name | Description @ref src/vppinfra | VPP core library test | Unit tests @ref src/vpp/api | Not-yet-relocated API bindings - +@ref src/examples | VPP example code ## Getting started diff --git a/doxygen/Makefile b/doxygen/Makefile index abaca185..b6ba5887 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -74,8 +74,7 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ $(DOXY_SRC)/vlib/buffer.c \ - $(DOXY_SRC)/vpp-api/lua \ - $(DOXY_SRC)/examples/sample-plugin + $(DOXY_SRC)/vpp-api/lua # Generate a regexp for filenames to exclude DOXY_EXCLUDE_REGEXP = ($(subst .,\.,$(shell echo '$(strip $(DOXY_EXCLUDE))' | sed -e 's/ /|/g'))) diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index becc2e0a..c0a05bd9 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -16,3 +16,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage span_doc - @subpage srv6_doc - @subpage srmpls_doc +- @subpage sample_plugin_doc diff --git a/src/examples/sample-plugin/sample/sample.c b/src/examples/sample-plugin/sample/sample.c index 2f8ac4c9..3929ac23 100644 --- a/src/examples/sample-plugin/sample/sample.c +++ b/src/examples/sample-plugin/sample/sample.c @@ -12,10 +12,9 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -/* - *------------------------------------------------------------------ - * sample.c - simple MAC-swap API / debug CLI handling - *------------------------------------------------------------------ +/** + * @file + * @brief Sample Plugin, plugin API / trace / CLI handling. */ #include @@ -65,7 +64,11 @@ VLIB_PLUGIN_REGISTER () = { }; /* *INDENT-ON* */ -/* Action function shared between message handler and debug CLI */ +/** + * @brief Enable/disable the macswap plugin. + * + * Action function shared between message handler and debug CLI. + */ int sample_macswap_enable_disable (sample_main_t * sm, u32 sw_if_index, int enable_disable) @@ -135,6 +138,9 @@ macswap_enable_disable_command_fn (vlib_main_t * vm, return 0; } +/** + * @brief CLI command to enable/disable the sample macswap plugin. + */ VLIB_CLI_COMMAND (sr_content_command, static) = { .path = "sample macswap", .short_help = @@ -142,7 +148,9 @@ VLIB_CLI_COMMAND (sr_content_command, static) = { .function = macswap_enable_disable_command_fn, }; -/* API message handler */ +/** + * @brief Plugin API message handler. + */ static void vl_api_sample_macswap_enable_disable_t_handler (vl_api_sample_macswap_enable_disable_t * mp) { @@ -156,7 +164,9 @@ static void vl_api_sample_macswap_enable_disable_t_handler REPLY_MACRO(VL_API_SAMPLE_MACSWAP_ENABLE_DISABLE_REPLY); } -/* Set up the API message handling tables */ +/** + * @brief Set up the API message handling tables. + */ static clib_error_t * sample_plugin_api_hookup (vlib_main_t *vm) { @@ -188,6 +198,9 @@ setup_message_id_table (sample_main_t * sm, api_main_t *am) #undef _ } +/** + * @brief Initialize the sample plugin. + */ static clib_error_t * sample_init (vlib_main_t * vm) { sample_main_t * sm = &sample_main; @@ -214,6 +227,9 @@ static clib_error_t * sample_init (vlib_main_t * vm) VLIB_INIT_FUNCTION (sample_init); +/** + * @brief Hook the sample plugin into the VPP graph hierarchy. + */ VNET_FEATURE_INIT (sample, static) = { .arc_name = "device-input", diff --git a/src/examples/sample-plugin/sample_plugin_doc.md b/src/examples/sample-plugin/sample_plugin_doc.md new file mode 100644 index 00000000..9348094c --- /dev/null +++ b/src/examples/sample-plugin/sample_plugin_doc.md @@ -0,0 +1,66 @@ +# Sample plugin for VPP {#sample_plugin_doc} + +## Overview + +This is the VPP sample plugin demonstrates how to create a new plugin that integrates +with VPP. The sample code implements a trival macswap algorithim that demonstrates plugin +runtime integration with the VPP graph hierachy, api and cli. + +For deeper dive information see the annotations in the sample code itself. See [sample.c](@ref sample.c) + +## How to build and run the sample plugin. + +Now (re)build VPP. + + $ make wipe + +Define environmental variable 'VPP_WITH_SAMPLE_PLUGIN=yes' with a process scope + + $ VPP_WITH_SAMPLE_PLUGIN=yes make build + +or a session scope, and build VPP. + + $ export VPP_WITH_SAMPLE_PLUGIN=yes + & make build + +Now run VPP and make sure the plugin is loaded. + + $ make run + ... + load_one_plugin:184: Loaded plugin: memif_plugin.so (Packet Memory Interface (experimetal)) + load_one_plugin:184: Loaded plugin: sample_plugin.so (Sample of VPP Plugin) + load_one_plugin:184: Loaded plugin: snat_plugin.so (Network Address Translation) + ... + DBGvpp# + +## How to create a new plugin + +To create a new plugin based on the sample plugin, copy and rename the sample plugin directory and automake config. + + cp -r src/examples/sample-plugin/sample src/plugins/newplugin + cp src/examples/sample-plugin/sample.am src/plugins/newplugin.am + +Add the following entry to the plugins section of `src/configure.ac`. + + PLUGIN_ENABLED(newplugin) + +Add the following entry to the plugins section of `src/plugins/Makefile.am` + + if ENABLE_NEWPLUGIN + include newplugin.am + endif + +Now (re)build VPP. + + $ make wipe + $ make build + +## Configuration + +To enable the sample plugin + + sample macswap + +To disable the sample plugin + + sample macswap disable -- cgit 1.2.3-korg From 29059b49debb0148ecc38901da720fcd89b34690 Mon Sep 17 00:00:00 2001 From: Matus Fabian Date: Thu, 22 Jun 2017 03:03:53 -0700 Subject: NAT64: documentation added CLI command documentation added NAT64 user documentation page Change-Id: I3df400013800fe16351e02db7762ee3f92b195ff Signed-off-by: Matus Fabian --- doxygen/user_doc.md | 1 + src/plugins/snat/nat64_cli.c | 138 ++++++++++++++++++++++++++++++++++++++++++ src/plugins/snat/nat64_doc.md | 73 ++++++++++++++++++++++ 3 files changed, 212 insertions(+) create mode 100644 src/plugins/snat/nat64_doc.md (limited to 'doxygen/user_doc.md') diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index c0a05bd9..952a72fc 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -17,3 +17,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage srv6_doc - @subpage srmpls_doc - @subpage sample_plugin_doc +- @subpage nat64_doc diff --git a/src/plugins/snat/nat64_cli.c b/src/plugins/snat/nat64_cli.c index 25345cd8..32f671d5 100644 --- a/src/plugins/snat/nat64_cli.c +++ b/src/plugins/snat/nat64_cli.c @@ -713,6 +713,18 @@ nat64_show_prefix_command_fn (vlib_main_t * vm, unformat_input_t * input, /* *INDENT-OFF* */ +/*? + * @cliexpar + * @cliexstart{nat64 add pool address} + * Add/delete NAT64 pool address. + * To add single NAT64 pool address use: + * vpp# nat64 add pool address 10.1.1.10 + * To add NAT64 pool address range use: + * vpp# nat64 add pool address 10.1.1.2 - 10.1.1.5 + * To add NAT64 pool address for specific tenant use: + * vpp# nat64 add pool address 10.1.1.100 tenant-vrf 100 + * @cliexend +?*/ VLIB_CLI_COMMAND (nat64_add_pool_address_command, static) = { .path = "nat64 add pool address", .short_help = "nat64 add pool address [- ] " @@ -720,24 +732,64 @@ VLIB_CLI_COMMAND (nat64_add_pool_address_command, static) = { .function = nat64_add_del_pool_addr_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 pool} + * Show NAT64 pool. + * vpp# show nat64 pool + * NAT64 pool: + * 10.1.1.3 tenant VRF: 0 + * 10.1.1.10 tenant VRF: 10 + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_pool_command, static) = { .path = "show nat64 pool", .short_help = "show nat64 pool", .function = nat64_show_pool_command_fn, }; +/*? + * @cliexpar + * @cliexstart{set interface nat64} + * Enable/disable NAT64 feature on the interface. + * To enable NAT64 feature with local (IPv6) network interface + * GigabitEthernet0/8/0 and external (IPv4) network interface + * GigabitEthernet0/a/0 use: + * vpp# set interface nat64 in GigabitEthernet0/8/0 out GigabitEthernet0/a/0 + * @cliexend +?*/ VLIB_CLI_COMMAND (set_interface_nat64_command, static) = { .path = "set interface nat64", .short_help = "set interface nat64 in|out [del]", .function = nat64_interface_feature_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 interfaces} + * Show interfaces with NAT64 feature. + * To show interfaces with NAT64 feature use: + * vpp# show nat64 interfaces + * NAT64 interfaces: + * GigabitEthernet0/8/0 in + * GigabitEthernet0/a/0 out + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_interfaces_command, static) = { .path = "show nat64 interfaces", .short_help = "show nat64 interfaces", .function = nat64_show_interfaces_command_fn, }; +/*? + * @cliexpar + * @cliexstart{nat64 add static bib} + * Add/delete NAT64 static BIB entry. + * To create NAT64 satatic BIB entry use: + * vpp# nat64 add static bib 2001:db8:c000:221:: 1234 10.1.1.3 5678 tcp + * vpp# nat64 add static bib 2001:db8:c000:221:: 1234 10.1.1.3 5678 udp vrf 10 + * @cliexend +?*/ VLIB_CLI_COMMAND (nat64_add_del_static_bib_command, static) = { .path = "nat64 add static bib", .short_help = "nat64 add static bib " @@ -745,12 +797,42 @@ VLIB_CLI_COMMAND (nat64_add_del_static_bib_command, static) = { .function = nat64_add_del_static_bib_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 bib} + * Show NAT64 BIB entries. + * To show NAT64 TCP BIB entries use: + * vpp# show nat64 bib tcp + * NAT64 tcp BIB: + * fd01:1::2 6303 10.0.0.3 62303 tcp vrf 0 dynamic 1 sessions + * 2001:db8:c000:221:: 1234 10.1.1.3 5678 tcp vrf 0 static 2 sessions + * To show NAT64 UDP BIB entries use: + * vpp# show nat64 bib udp + * NAT64 udp BIB: + * fd01:1::2 6304 10.0.0.3 10546 udp vrf 0 dynamic 10 sessions + * 2001:db8:c000:221:: 1234 10.1.1.3 5678 udp vrf 10 static 0 sessions + * To show NAT64 ICMP BIB entries use: + * vpp# show nat64 bib icmp + * NAT64 icmp BIB: + * fd01:1::2 6305 10.0.0.3 63209 icmp vrf 10 dynamic 1 sessions + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_bib_command, static) = { .path = "show nat64 bib", .short_help = "show nat64 bib tcp|udp|icmp", .function = nat64_show_bib_command_fn, }; +/*? + * @cliexpar + * @cliexstart{set nat64 timeouts} + * Set NAT64 session timeouts (in seconds). + * To set NAT64 session timeoutes use use: + * vpp# set nat64 timeouts udp 200 icmp 30 tcp-trans 250 tcp-est 7450 + * To reset NAT64 session timeoutes to default values use: + * vpp# set nat64 timeouts reset + * @cliexend +?*/ VLIB_CLI_COMMAND (set_nat64_timeouts_command, static) = { .path = "set nat64 timeouts", .short_help = "set nat64 timeouts udp icmp tcp-trans " @@ -758,18 +840,63 @@ VLIB_CLI_COMMAND (set_nat64_timeouts_command, static) = { .function = nat64_set_timeouts_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 tiemouts} + * Show NAT64 session timeouts: + * vpp# show nat64 tiemouts + * NAT64 session timeouts: + * UDP 300sec + * ICMP 60sec + * TCP transitory 240sec + * TCP established 7440sec + * TCP incoming SYN 6sec + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_timeouts_command, static) = { .path = "show nat64 tiemouts", .short_help = "show nat64 tiemouts", .function = nat64_show_timeouts_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 session table} + * Show NAT64 session table. + * To show NAT64 TCP session table use: + * vpp# show nat64 session table tcp + * NAT64 tcp session table: + * fd01:1::2 6303 64:ff9b::ac10:202 20 10.0.0.3 62303 172.16.2.2 20 tcp vrf 0 + * fd01:3::2 6303 64:ff9b::ac10:202 20 10.0.10.3 21300 172.16.2.2 20 tcp vrf 10 + * To show NAT64 UDP session table use: + * #vpp show nat64 session table udp + * NAT64 udp session table: + * fd01:1::2 6304 64:ff9b::ac10:202 20 10.0.0.3 10546 172.16.2.2 20 udp vrf 0 + * fd01:3::2 6304 64:ff9b::ac10:202 20 10.0.10.3 58627 172.16.2.2 20 udp vrf 10 + * fd01:1::2 1235 64:ff9b::a00:3 4023 10.0.0.3 24488 10.0.0.3 4023 udp vrf 0 + * fd01:1::3 23 64:ff9b::a00:3 24488 10.0.0.3 4023 10.0.0.3 24488 udp vrf 0 + * To show NAT64 ICMP session table use: + * #vpp show nat64 session table icmp + * NAT64 icmp session table: + * fd01:1::2 64:ff9b::ac10:202 6305 10.0.0.3 172.16.2.2 63209 icmp vrf 0 + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_st_command, static) = { .path = "show nat64 session table", .short_help = "show nat64 session table tcp|udp|icmp", .function = nat64_show_st_command_fn, }; +/*? + * @cliexpar + * @cliexstart{nat64 add prefix} + * Set NAT64 prefix for generating IPv6 representations of IPv4 addresses. + * To set NAT64 global prefix use: + * vpp# nat64 add prefix 2001:db8::/32 + * To set NAT64 prefix for specific tenant use: + * vpp# nat64 add prefix 2001:db8:122:300::/56 tenant-vrf 10 + * @cliexend +?*/ VLIB_CLI_COMMAND (nat64_add_del_prefix_command, static) = { .path = "nat64 add prefix", .short_help = "nat64 add prefix / [tenant-vrf ] " @@ -777,6 +904,17 @@ VLIB_CLI_COMMAND (nat64_add_del_prefix_command, static) = { .function = nat64_add_del_prefix_command_fn, }; +/*? + * @cliexpar + * @cliexstart{show nat64 prefix} + * Show NAT64 prefix. + * To show NAT64 prefix use: + * vpp# show nat64 prefix + * NAT64 prefix: + * 2001:db8::/32 tenant-vrf 0 + * 2001:db8:122:300::/56 tenant-vrf 10 + * @cliexend +?*/ VLIB_CLI_COMMAND (show_nat64_prefix_command, static) = { .path = "show nat64 prefix", .short_help = "show nat64 prefix", diff --git a/src/plugins/snat/nat64_doc.md b/src/plugins/snat/nat64_doc.md new file mode 100644 index 00000000..f94467da --- /dev/null +++ b/src/plugins/snat/nat64_doc.md @@ -0,0 +1,73 @@ +# Stateful NAT64: Network Address and Protocol Translation from IPv6 Clients to IPv4 Servers {#nat64_doc} + +## Introduction + +Stateful NAT64 in VPP allows IPv6-only clients to contact IPv4 servers using unicast UDP, TCP, or ICMP based on RFC 6146. + +## Configuration + +### Enable/disable NAT64 feature on the interface + +> set interface nat64 in|out [del] + +in: inside/local/IPv6 network +out: outside/external/IPv4 network +intfc: interface name + +### Add/delete NAT64 pool address + +One or more public IPv4 addresses assigned to a NAT64 are shared among several IPv6-only clients. + +> nat64 add pool address [- ] [tenant-vrf ] [del] + +ip4-range-start: First IPv4 address of the range +ip4-range-end: Last IPv4 address of the range (optional, not used for single address) +tenant-vrf-id: VRF id of the tenant associated with the pool address (optional, if not set pool address is global) + +### Add/delete static BIB entry + +Stateful NAT64 also supports IPv4-initiated communications to a subset of the IPv6 hosts through staticaly configured bindings. + +> nat64 add static bib tcp|udp|icmp [vfr ] [del] + +ip6-addr: inside IPv6 address of the host +in-port: inside port or ICMPv6 identifier +ip4-addr: outside IPv4 address of the host +out-port: outside port or ICMPv4 identifier +table-id: VRF id of the tenant associated with the BIB entry (optional, default use global VRF) + +### Set NAT64 session timeouts + +Session is deleted when timer expires. If all sessions corresponding to a dynamically create BIB entry are deleted, then the BIB entry is also deleted. When packets are flowing sessiom timer is refreshed to keep the session alive. + +> set nat64 timeouts udp icmp tcp-trans tcp-est tcp-incoming-syn | reset + +udp: UDP session timeout value (default 300sec) +icmp: ICMP session timeout value (default 60sec) +tcp-trans: transitory TCP session timeout value (default 240sec) +tcp-est: established TCP session timeout value (default 7440sec) +tcp-incoming-syn: incoming SYN TCP session timeout value (default 6sec) +reset: reset timers to default values + +### Set NAT64 prefix + +Stateful NAT64 support the algorithm for generating IPv6 representations of IPv4 addresses defined in RFC 6052. If no prefix is configured, Well-Known Prefix (64:ff9b::/96) is used. + +> nat64 add prefix / [tenant-vrf ] [del] + +ip6-prefix: IPv6 prefix +plen: prefix length (valid values: 32, 40, 48, 56, 64, or 96) +tenant-vrf: VRF id of the tenant associated with the prefix + +### Show commands + +> show nat64 pool +> show nat64 interfaces +> show nat64 bib tcp|udp|icmp +> show nat64 session table tcp|udp|icmp +> show nat64 tiemouts +> show nat64 prefix + +## Notes + +Multi thread is not supported yet (CLI/API commands are disabled when VPP runs with multiple threads). -- cgit 1.2.3-korg From 905c14af2b1464840cea201daed005cb30513683 Mon Sep 17 00:00:00 2001 From: Dave Barach Date: Mon, 25 Sep 2017 08:47:59 -0400 Subject: Add binary API documentation Change-Id: Id1a5da12b13d87bacfa81094f471b95db40c39be Signed-off-by: Dave Barach --- doxygen/user_doc.md | 1 + src/vlibapi/api_common.h | 192 +++++++++++++++--------- src/vlibapi/api_doc.md | 349 +++++++++++++++++++++++++++++++++++++++++++ src/vlibmemory/memory_vlib.c | 4 +- 4 files changed, 475 insertions(+), 71 deletions(-) create mode 100644 src/vlibapi/api_doc.md (limited to 'doxygen/user_doc.md') diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 952a72fc..65b09f98 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -18,3 +18,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage srmpls_doc - @subpage sample_plugin_doc - @subpage nat64_doc +- @subpage api_doc diff --git a/src/vlibapi/api_common.h b/src/vlibapi/api_common.h index dc6761bc..770cf47d 100644 --- a/src/vlibapi/api_common.h +++ b/src/vlibapi/api_common.h @@ -20,71 +20,74 @@ #ifndef included_api_common_h #define included_api_common_h +/** \file API common definitions + * See api_doc.md for more info + */ + #include #include #include +/** API registration types + */ typedef enum { REGISTRATION_TYPE_FREE = 0, - REGISTRATION_TYPE_SHMEM, - REGISTRATION_TYPE_SOCKET_LISTEN, - REGISTRATION_TYPE_SOCKET_SERVER, - REGISTRATION_TYPE_SOCKET_CLIENT, + REGISTRATION_TYPE_SHMEM, /**< Shared memory connection */ + REGISTRATION_TYPE_SOCKET_LISTEN, /**< Socket listener */ + REGISTRATION_TYPE_SOCKET_SERVER, /**< Socket server */ + REGISTRATION_TYPE_SOCKET_CLIENT, /**< Socket client */ } vl_registration_type_t; +/** An API client registration, only in vpp/vlib */ + typedef struct vl_api_registration_ { - vl_registration_type_t registration_type; + vl_registration_type_t registration_type; /**< type */ - /* Index in VLIB's brain (not shared memory). */ + /** Index in VLIB's brain (not shared memory). */ u32 vl_api_registration_pool_index; - u8 *name; + u8 *name; /**< Client name */ - /* - * The following groups of data could be unioned, but my fingers are - * going to be sore enough. - */ - - /* shared memory only */ + /** shared memory only: pointer to client input queue */ unix_shared_memory_queue_t *vl_input_queue; /* socket server and client */ - u32 clib_file_index; - i8 *unprocessed_input; - u32 unprocessed_msg_length; - u8 *output_vector; + u32 clib_file_index; /**< Socket only: file index */ + i8 *unprocessed_input; /**< Socket only: pending input */ + u32 unprocessed_msg_length; /**< Socket only: unprocssed length */ + u8 *output_vector; /**< Socket only: output vecto */ /* socket client only */ - u32 server_handle; - u32 server_index; - + u32 server_handle; /**< Socket client only: server handle */ + u32 server_index; /**< Socket client only: server index */ } vl_api_registration_t; -/* Trace configuration for a single message */ +/** Trace configuration for a single message */ typedef struct { - int size; - int trace_enable; - int replay_enable; + int size; /**< for sanity checking */ + int trace_enable; /**< trace this message */ + int replay_enable; /**< This message can be replayed */ } trace_cfg_t; -/* - * API recording +/** + * API trace state */ typedef struct { - u8 endian; - u8 enabled; - u8 wrapped; + u8 endian; /**< trace endianness */ + u8 enabled; /**< trace is enabled */ + u8 wrapped; /**< trace has wrapped */ u8 pad; - u32 nitems; - u32 curindex; - u8 **traces; + u32 nitems; /**< Number of trace records */ + u32 curindex; /**< Current index in circular buffer */ + u8 **traces; /**< Trace ring */ } vl_api_trace_t; +/** Trace RX / TX enum */ typedef enum { VL_API_TRACE_TX, @@ -94,35 +97,38 @@ typedef enum #define VL_API_LITTLE_ENDIAN 0x00 #define VL_API_BIG_ENDIAN 0x01 +/** Message range (belonging to a plugin) */ typedef struct { - u8 *name; - u16 first_msg_id; - u16 last_msg_id; + u8 *name; /**< name of the plugin */ + u16 first_msg_id; /**< first assigned message ID */ + u16 last_msg_id; /**< last assigned message ID */ } vl_api_msg_range_t; +/** Message configuration definition */ typedef struct { - int id; - char *name; - u32 crc; - void *handler; - void *cleanup; - void *endian; - void *print; - int size; - int traced; - int replay; - int message_bounce; - int is_mp_safe; + int id; /**< the message ID */ + char *name; /**< the message name */ + u32 crc; /**< message definition CRC */ + void *handler; /**< the message handler */ + void *cleanup; /**< non-default message cleanup handler */ + void *endian; /**< message endian function */ + void *print; /**< message print function */ + int size; /**< message size */ + int traced; /**< is this message to be traced? */ + int replay; /**< is this message to be replayed? */ + int message_bounce; /**< do not free message after processing */ + int is_mp_safe; /**< worker thread barrier required? */ } vl_msg_api_msg_config_t; +/** Message header structure */ typedef struct msgbuf_ { - unix_shared_memory_queue_t *q; - u32 data_len; - u32 gc_mark_timestamp; - u8 data[0]; + unix_shared_memory_queue_t *q; /**< message allocated in this shmem ring */ + u32 data_len; /**< message length not including header */ + u32 gc_mark_timestamp; /**< message garbage collector mark TS */ + u8 data[0]; /**< actual message begins here */ } msgbuf_t; /* api_shared.c prototypes */ @@ -171,101 +177,147 @@ typedef struct _vl_msg_api_init_function_list_elt vl_msg_api_init_function_t *f; } _vl_msg_api_function_list_elt_t; +/** API main structure, used by both vpp and binary API clients */ typedef struct { + /** Message handler vector */ void (**msg_handlers) (void *); + /** Plaform-dependent (aka hardware) message handler vector */ int (**pd_msg_handlers) (void *, int); + + /** non-default message cleanup handler vector */ void (**msg_cleanup_handlers) (void *); + + /** Message endian handler vector */ void (**msg_endian_handlers) (void *); + + /** Message print function vector */ void (**msg_print_handlers) (void *, void *); + + /** Message name vector */ const char **msg_names; + + /** Don't automatically free message buffer vetor */ u8 *message_bounce; + + /** Message is mp safe vector */ u8 *is_mp_safe; + + /** Allocator ring vectors (in shared memory) */ struct ring_alloc_ *arings; + + /** Number of times that the ring allocator failed */ u32 ring_misses; + + /** Number of garbage-collected message buffers */ u32 garbage_collects; + + /** Number of missing clients / failed message sends */ u32 missing_clients; + + /** Received message trace configuration */ vl_api_trace_t *rx_trace; + + /** Sent message trace configuration */ vl_api_trace_t *tx_trace; + + /** Print every received message */ int msg_print_flag; + + /** Current trace configuration */ trace_cfg_t *api_trace_cfg; + + /** Current process PID */ int our_pid; + + /** Binary api segment descriptor */ svm_region_t *vlib_rp; + + /** Vector of all mapped shared-VM segments */ svm_region_t **mapped_shmem_regions; + + /** Binary API shared-memory segment header pointer */ struct vl_shmem_hdr_ *shmem_hdr; + + /** vlib/vpp only: vector of client registrations */ vl_api_registration_t **vl_clients; + /** vlib/vpp only: serialized (message, name, crc) table */ u8 *serialized_message_table_in_shmem; - /* For plugin msg allocator */ + /** First available message ID, for theplugin msg allocator */ u16 first_available_msg_id; - /* message range by name hash */ + /** Message range by name hash */ uword *msg_range_by_name; - /* vector of message ranges */ + /** vector of message ranges */ vl_api_msg_range_t *msg_ranges; - /* uid for the api shared memory region */ + /** uid for the api shared memory region */ int api_uid; - /* gid for the api shared memory region */ + + /** gid for the api shared memory region */ int api_gid; - /* base virtual address for global VM region */ + /** base virtual address for global VM region */ u64 global_baseva; - /* size of the global VM region */ + /** size of the global VM region */ u64 global_size; - /* size of the API region */ + /** size of the API region */ u64 api_size; - /* size of the global VM private mheap */ + /** size of the global VM private mheap */ u64 global_pvt_heap_size; - /* size of the api private mheap */ + /** size of the api private mheap */ u64 api_pvt_heap_size; - /* Client-only data structures */ + /** Peer input queue pointer */ unix_shared_memory_queue_t *vl_input_queue; - /* + /** * All VLIB-side message handlers use my_client_index to identify * the queue / client. This works in sim replay. */ int my_client_index; - /* + /** * This is the (shared VM) address of the registration, * don't use it to id the connection since it can't possibly * work in simulator replay. */ vl_api_registration_t *my_registration; + /** (Historical) signal-based queue non-empty signal, to be removed */ i32 vlib_signal; - /* vlib input queue length */ + /** vpp/vlib input queue length */ u32 vlib_input_queue_length; - /* client side message index hash table */ + /** client message index hash table */ uword *msg_index_by_name_and_crc; + /** Shared VM binary API region name */ const char *region_name; + + /** Chroot path to the shared memory API files */ const char *root_path; - /* Replay in progress? */ + /** Replay in progress? */ int replay_in_progress; - /* Dump (msg-name, crc) snapshot here at startup */ + /** Dump (msg-name, crc) snapshot here at startup */ u8 *save_msg_table_filename; - /* List of API client reaper functions */ + /** List of API client reaper functions */ _vl_msg_api_function_list_elt_t *reaper_function_registrations; } api_main_t; extern api_main_t api_main; - #endif /* included_api_common_h */ /* diff --git a/src/vlibapi/api_doc.md b/src/vlibapi/api_doc.md new file mode 100644 index 00000000..e620ee12 --- /dev/null +++ b/src/vlibapi/api_doc.md @@ -0,0 +1,349 @@ +# Binary API support {#api_doc} + +VPP provides a binary API scheme to allow a wide variety of client codes to +program data-plane tables. As of this writing, there are hundreds of binary +APIs. + +Messages are defined in `*.api` files. Today, there are about 50 api files, +with more arriving as folks add programmable features. The API file compiler +sources reside in @ref src/tools/vppapigen . + +Here's a typical request/response message definition, from +@ref src/vnet/interface.api : + +``` + autoreply define sw_interface_set_flags + { + u32 client_index; + u32 context; + u32 sw_if_index; + /* 1 = up, 0 = down */ + u8 admin_up_down; + }; +``` + +To a first approximation, the API compiler renders this definition as +follows: + +``` + /****** Message ID / handler enum ******/ + #ifdef vl_msg_id + vl_msg_id(VL_API_SW_INTERFACE_SET_FLAGS, vl_api_sw_interface_set_flags_t_handler) + vl_msg_id(VL_API_SW_INTERFACE_SET_FLAGS_REPLY, vl_api_sw_interface_set_flags_reply_t_handler) + #endif + + /****** Message names ******/ + #ifdef vl_msg_name + vl_msg_name(vl_api_sw_interface_set_flags_t, 1) + vl_msg_name(vl_api_sw_interface_set_flags_reply_t, 1) + #endif + + /****** Message name, crc list ******/ + #ifdef vl_msg_name_crc_list + #define foreach_vl_msg_name_crc_interface \ + _(VL_API_SW_INTERFACE_SET_FLAGS, sw_interface_set_flags, f890584a) \ + _(VL_API_SW_INTERFACE_SET_FLAGS_REPLY, sw_interface_set_flags_reply, dfbf3afa) \ + #endif + + /****** Typedefs *****/ + #ifdef vl_typedefs + typedef VL_API_PACKED(struct _vl_api_sw_interface_set_flags { + u16 _vl_msg_id; + u32 client_index; + u32 context; + u32 sw_if_index; + u8 admin_up_down; + }) vl_api_sw_interface_set_flags_t; + + typedef VL_API_PACKED(struct _vl_api_sw_interface_set_flags_reply { + u16 _vl_msg_id; + u32 context; + i32 retval; + }) vl_api_sw_interface_set_flags_reply_t; +``` + +To change the admin state of an interface, a binary api client sends a +@ref vl_api_sw_interface_set_flags_t to vpp, which will respond with a +@ref vl_api_sw_interface_set_flags_reply_t message. + +Multiple layers of software, transport types, and shared libraries +implement a variety of features: + +* API message allocation, tracing, pretty-printing, and replay. +* Message transport via global shared memory, pairwise/private shared + memory, and sockets. +* Barrier synchronization of worker threads across thread-unsafe + message handlers. + +Correctly-coded message handlers know nothing about the transport used to +deliver messages to/from vpp. It's reasonably straighforward to use multiple +API message transport types simultaneously. + +For historical reasons, binary api messages are (putatively) sent in network +byte order. As of this writing, we're seriously considering whether that +choice makes sense. + + +## Message Allocation + +Since binary API messages are always processed in order, we allocate messages +using a ring allocator whenever possible. This scheme is extremely fast when +compared with a traditional memory allocator, and doesn't cause heap +fragmentation. See +@ref src/vlibmemory/memory_shared.c @ref vl_msg_api_alloc_internal() . + +Regardless of transport, binary api messages always follow a @ref msgbuf_t +header: + +``` + typedef struct msgbuf_ + { + unix_shared_memory_queue_t *q; + u32 data_len; + u32 gc_mark_timestamp; + u8 data[0]; + } msgbuf_t; +``` + +This structure makes it easy to trace messages without having to +decode them - simply save data_len bytes - and allows +@ref vl_msg_api_free() to rapidly dispose of message buffers: + +``` + void + vl_msg_api_free (void *a) + { + msgbuf_t *rv; + api_main_t *am = &api_main; + + rv = (msgbuf_t *) (((u8 *) a) - offsetof (msgbuf_t, data)); + + /* + * Here's the beauty of the scheme. Only one proc/thread has + * control of a given message buffer. To free a buffer, we just + * clear the queue field, and leave. No locks, no hits, no errors... + */ + if (rv->q) + { + rv->q = 0; + rv->gc_mark_timestamp = 0; + return; + } + + } +``` + +## Message Tracing and Replay + +It's extremely important that vpp can capture and replay sizeable binary API +traces. System-level issues involving hundreds of thousands of API +transactions can be re-run in a second or less. Partial replay allows one to +binary-search for the point where the wheels fall off. One can add scaffolding +to the data plane, to trigger when complex conditions obtain. + +With binary API trace, print, and replay, system-level bug reports of the form +"after 300,000 API transactions, the vpp data-plane stopped forwarding +traffic, FIX IT!" can be solved offline. + +More often than not, one discovers that a control-plane client +misprograms the data plane after a long time or under complex +circumstances. Without direct evidence, "it's a data-plane problem!" + +See @ref src/vlibmemory/memory_vlib.c @ref vl_msg_api_process_file() , +and @ref src/vlibapi/api_shared.c . See also the debug CLI command "api trace" + +## Client connection details + +Establishing a binary API connection to vpp from a C-language client +is easy: + +``` + int + connect_to_vpe (char *client_name, int client_message_queue_length) + { + vat_main_t *vam = &vat_main; + api_main_t *am = &api_main; + + if (vl_client_connect_to_vlib ("/vpe-api", client_name, + client_message_queue_length) < 0) + return -1; + + /* Memorize vpp's binary API message input queue address */ + vam->vl_input_queue = am->shmem_hdr->vl_input_queue; + /* And our client index */ + vam->my_client_index = am->my_client_index; + return 0; + } +``` + +32 is a typical value for client_message_queue_length. Vpp cannot +block when it needs to send an API message to a binary API client, and +the vpp-side binary API message handlers are very fast. When sending +asynchronous messages, make sure to scrape the binary API rx ring with +some enthusiasm. + +### binary API message RX pthread + +Calling @ref vl_client_connect_to_vlib spins up a binary API message RX +pthread: + +``` + static void * + rx_thread_fn (void *arg) + { + unix_shared_memory_queue_t *q; + memory_client_main_t *mm = &memory_client_main; + api_main_t *am = &api_main; + + q = am->vl_input_queue; + + /* So we can make the rx thread terminate cleanly */ + if (setjmp (mm->rx_thread_jmpbuf) == 0) + { + mm->rx_thread_jmpbuf_valid = 1; + while (1) + { + vl_msg_api_queue_handler (q); + } + } + pthread_exit (0); + } +``` + +To handle the binary API message queue yourself, use +@ref vl_client_connect_to_vlib_no_rx_pthread. + +In turn, vl_msg_api_queue_handler(...) uses mutex/condvar signalling +to wake up, process vpp -> client traffic, then sleep. Vpp supplies a +condvar broadcast when the vpp -> client API message queue transitions +from empty to nonempty. + +Vpp checks its own binary API input queue at a very high rate. Vpp +invokes message handlers in "process" context [aka cooperative +multitasking thread context] at a variable rate, depending on +data-plane packet processing requirements. + +## Client disconnection details + +To disconnect from vpp, call @ref vl_client_disconnect_from_vlib +. Please arrange to call this function if the client application +terminates abnormally. Vpp makes every effort to hold a decent funeral +for dead clients, but vpp can't guarantee to free leaked memory in the +shared binary API segment. + +## Sending binary API messages to vpp + +The point of the exercise is to send binary API messages to vpp, and +to receive replies from vpp. Many vpp binary APIs comprise a client +request message, and a simple status reply. For example, to +set the admin status of an interface, one codes: + +``` + vl_api_sw_interface_set_flags_t *mp; + + mp = vl_msg_api_alloc (sizeof (*mp)); + memset (mp, 0, sizeof (*mp)); + mp->_vl_msg_id = clib_host_to_net_u16 (VL_API_SW_INTERFACE_SET_FLAGS); + mp->client_index = api_main.my_client_index; + mp->sw_if_index = clib_host_to_net_u32 (); + vl_msg_api_send (api_main.shmem_hdr->vl_input_queue, (u8 *)mp); +``` + +Key points: + +* Use @ref vl_msg_api_alloc to allocate message buffers + +* Allocated message buffers are not initialized, and must be presumed + to contain trash. + +* Don't forget to set the _vl_msg_id field! + +* As of this writing, binary API message IDs and data are sent in + network byte order + +* The client-library global data structure @ref api_main keeps track + of sufficient pointers and handles used to communicate with vpp + +## Receiving binary API messages from vpp + +Unless you've made other arrangements (see @ref +vl_client_connect_to_vlib_no_rx_pthread), *messages are received on a +separate rx pthread*. Synchronization with the client application main +thread is the responsibility of the application! + +Set up message handlers about as follows: + +``` + #define vl_typedefs /* define message structures */ + #include + #undef vl_typedefs + + /* declare message handlers for each api */ + + #define vl_endianfun /* define message structures */ + #include + #undef vl_endianfun + + /* instantiate all the print functions we know about */ + #define vl_print(handle, ...) + #define vl_printfun + #include + #undef vl_printfun + + /* Define a list of all message that the client handles */ + #define foreach_vpe_api_reply_msg \ + _(SW_INTERFACE_SET_FLAGS_REPLY, sw_interface_set_flags_reply) + + static clib_error_t * + my_api_hookup (vlib_main_t * vm) + { + api_main_t *am = &api_main; + + #define _(N,n) \ + vl_msg_api_set_handlers(VL_API_##N, #n, \ + vl_api_##n##_t_handler, \ + vl_noop_handler, \ + vl_api_##n##_t_endian, \ + vl_api_##n##_t_print, \ + sizeof(vl_api_##n##_t), 1); + foreach_vpe_api_msg; + #undef _ + + return 0; + } +``` + +The key API used to establish message handlers is @ref +vl_msg_api_set_handlers , which sets values in multiple parallel +vectors in the @ref api_main_t structure. As of this writing: not all +vector element values can be set through the API. You'll see sporadic +API message registrations followed by minor adjustments of this form: + +``` + /* + * Thread-safe API messages + */ + am->is_mp_safe[VL_API_IP_ADD_DEL_ROUTE] = 1; + am->is_mp_safe[VL_API_GET_NODE_GRAPH] = 1; +``` + + + + + + + + + + + + + + + + + + + + + diff --git a/src/vlibmemory/memory_vlib.c b/src/vlibmemory/memory_vlib.c index 401f388a..b6b87529 100644 --- a/src/vlibmemory/memory_vlib.c +++ b/src/vlibmemory/memory_vlib.c @@ -2348,7 +2348,9 @@ cleanup: ?*/ /*? - * Display a serialized API message decode table + * Display a serialized API message decode table, compare a saved + * decode table with the current image, to establish API differences. + * ?*/ /* *INDENT-OFF* */ VLIB_CLI_COMMAND (dump_api_table_file, static) = -- cgit 1.2.3-korg From d84f2ef54a457918f21bb0ee82392274b62611fe Mon Sep 17 00:00:00 2001 From: Chris Luke Date: Wed, 27 Sep 2017 15:09:48 -0400 Subject: General documentation updates - We now have several developer-focused docs, so create an index page for them. - Rework several docs to fit into the index structure. - Experiment with code highlighting; tweak the CSS slightly to make it slightly nicer to look at. Change-Id: I4185a18f84fa0764745ca7a3148276064a3155c6 Signed-off-by: Chris Luke (cherry picked from commit 64ebb5ff1338140d94c7f9ee72138fe84d89de2e) --- README.md | 15 +- doxygen/Makefile | 4 +- doxygen/assets/doxy-vpp.css | 6 + doxygen/dev_doc.md | 12 + doxygen/doxygen.cfg | 3 +- doxygen/test_framework_doc.md | 1 + doxygen/user_doc.md | 2 - extras/libmemif/README.md | 72 ------ extras/libmemif/docs/BuildInstructions.md | 54 ---- extras/libmemif/docs/GettingStarted.md | 215 ---------------- extras/libmemif/docs/buildinstructions_doc.md | 55 ++++ extras/libmemif/docs/gettingstarted_doc.md | 219 ++++++++++++++++ extras/libmemif/examples/ExampleSetup.md | 206 --------------- extras/libmemif/examples/README.md | 16 -- extras/libmemif/examples/example_setup_doc.md | 207 +++++++++++++++ extras/libmemif/examples/examples_doc.md | 18 ++ extras/libmemif/libmemif_doc.md | 74 ++++++ src/plugins/acl/acl-plugin.md | 347 ------------------------- src/plugins/acl/acl_hash_lookup_doc.md | 241 ++++++++++++++++++ src/plugins/acl/acl_multicore_doc.md | 349 ++++++++++++++++++++++++++ src/plugins/acl/hash_lookup.md | 241 ------------------ src/vlibapi/api_doc.md | 77 +++--- 22 files changed, 1234 insertions(+), 1200 deletions(-) create mode 100644 doxygen/dev_doc.md delete mode 100644 extras/libmemif/README.md delete mode 100644 extras/libmemif/docs/BuildInstructions.md delete mode 100644 extras/libmemif/docs/GettingStarted.md create mode 100644 extras/libmemif/docs/buildinstructions_doc.md create mode 100644 extras/libmemif/docs/gettingstarted_doc.md delete mode 100644 extras/libmemif/examples/ExampleSetup.md delete mode 100644 extras/libmemif/examples/README.md create mode 100644 extras/libmemif/examples/example_setup_doc.md create mode 100644 extras/libmemif/examples/examples_doc.md create mode 100644 extras/libmemif/libmemif_doc.md delete mode 100644 src/plugins/acl/acl-plugin.md create mode 100644 src/plugins/acl/acl_hash_lookup_doc.md create mode 100644 src/plugins/acl/acl_multicore_doc.md delete mode 100644 src/plugins/acl/hash_lookup.md (limited to 'doxygen/user_doc.md') diff --git a/README.md b/README.md index 596494b9..090dd35f 100644 --- a/README.md +++ b/README.md @@ -30,10 +30,11 @@ Directory name | Description build-root | Build output directory doxygen | Documentation generator configuration dpdk | DPDK patches and build infrastructure -@ref src | VPP source code +@ref extras/libmemif | Client library for memif +@ref src/examples | VPP example code @ref src/plugins | VPP bundled plugins directory @ref src/svm | Shared virtual memory allocation library - src/tests | Unit tests + src/tests | Standalone tests (not part of test harness) src/vat | VPP API test program @ref src/vlib | VPP application library @ref src/vlibapi | VPP API library @@ -43,9 +44,8 @@ Directory name | Description @ref src/vpp | VPP application @ref src/vpp-api | VPP application API bindings @ref src/vppinfra | VPP core library - test | Unit tests @ref src/vpp/api | Not-yet-relocated API bindings -@ref src/examples | VPP example code + test | Unit tests and Python test harness ## Getting started @@ -95,12 +95,13 @@ for this can be found [on the Setting up Vagrant wiki page] ## More information Several modules provide documentation, see @subpage user_doc for more -information. +end-user-oriented information. Also see @subpage dev_doc for developer notes. Visit the [VPP wiki](https://wiki.fd.io/view/VPP) for details on more -advanced building strategies and development notes. +advanced building strategies and other development notes. ## Test Framework -There is PyDoc generated documentation available for the VPP test framework. See @subpage test_framework_doc for details. +There is PyDoc generated documentation available for the VPP test framework. +See @ref test_framework_doc for details. diff --git a/doxygen/Makefile b/doxygen/Makefile index b6ba5887..5870530f 100644 --- a/doxygen/Makefile +++ b/doxygen/Makefile @@ -62,7 +62,8 @@ DOXY_INPUT ?= \ $(wildcard $(WS_ROOT)/*.md) \ $(wildcard $(DOXY_DIR)/*.md) \ $(DOXY_SRC_DIRECTORIES) \ - $(DOXY_SRC)/plugins + $(DOXY_SRC)/plugins \ + extras # Strip leading workspace path from input names DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) @@ -73,7 +74,6 @@ DOXY_INPUT := $(subst $(WS_ROOT)/,,$(DOXY_INPUT)) # there's a DPDK equivalent that conflicts. # These must be left-anchored paths for the regexp below to work. DOXY_EXCLUDE ?= \ - $(DOXY_SRC)/vlib/buffer.c \ $(DOXY_SRC)/vpp-api/lua # Generate a regexp for filenames to exclude diff --git a/doxygen/assets/doxy-vpp.css b/doxygen/assets/doxy-vpp.css index 3b74eb6f..108600ac 100644 --- a/doxygen/assets/doxy-vpp.css +++ b/doxygen/assets/doxy-vpp.css @@ -45,3 +45,9 @@ pre { a.el { font-family: Consolas, Courier, monospace; } + +div.fragment { + padding: 2px; + margin-left: 8px; +} + diff --git a/doxygen/dev_doc.md b/doxygen/dev_doc.md new file mode 100644 index 00000000..d1d22f76 --- /dev/null +++ b/doxygen/dev_doc.md @@ -0,0 +1,12 @@ +Developer Documentation {#dev_doc} +======================= + +Programming notes for developers. + +- @subpage test_framework_doc +- @subpage sample_plugin_doc +- @subpage api_doc +- @subpage vapi_doc +- @subpage acl_hash_lookup +- @subpage acl_multicore +- @subpage libmemif_doc diff --git a/doxygen/doxygen.cfg b/doxygen/doxygen.cfg index 7675b693..f2d39112 100644 --- a/doxygen/doxygen.cfg +++ b/doxygen/doxygen.cfg @@ -882,7 +882,8 @@ EXAMPLE_RECURSIVE = NO # that contain images that are to be included in the documentation (see the # \image command). -IMAGE_PATH = $(ROOT)/doxygen/assets +IMAGE_PATH = $(ROOT)/doxygen/assets \ + $(ROOT)/extras/libmemif/docs # The INPUT_FILTER tag can be used to specify a program that doxygen should # invoke to filter for each input file. Doxygen will invoke the filter program diff --git a/doxygen/test_framework_doc.md b/doxygen/test_framework_doc.md index b2873c5f..755a29e4 100644 --- a/doxygen/test_framework_doc.md +++ b/doxygen/test_framework_doc.md @@ -3,5 +3,6 @@ Test Framework Documentation {#test_framework_doc} PyDoc generated documentation for the "make test" framework is available for the following releases +- [Test framework documentation for VPP 17.10](https://docs.fd.io/vpp/17.10/vpp_make_test/html) - [Test framework documentation for VPP 17.04](https://docs.fd.io/vpp/17.04/vpp_make_test/html) - [Test framework documentation for VPP 17.01](https://docs.fd.io/vpp/17.01/vpp_make_test/html) diff --git a/doxygen/user_doc.md b/doxygen/user_doc.md index 65b09f98..f7d7936b 100644 --- a/doxygen/user_doc.md +++ b/doxygen/user_doc.md @@ -16,6 +16,4 @@ Several modules provide operational, dataplane-user focused documentation. - @subpage span_doc - @subpage srv6_doc - @subpage srmpls_doc -- @subpage sample_plugin_doc - @subpage nat64_doc -- @subpage api_doc diff --git a/extras/libmemif/README.md b/extras/libmemif/README.md deleted file mode 100644 index 0cccfb37..00000000 --- a/extras/libmemif/README.md +++ /dev/null @@ -1,72 +0,0 @@ -Shared Memory Packet Interface (memif) Library -============================================== -## Introduction - -Shared memory packet interface (memif) provides high performance packet transmit and receive between user application and Vector Packet Processing (VPP) or multiple user applications. Using libmemif, user application can create shared memory interface in master or slave mode and connect to VPP or another application using libmemif. Once the connection is established, user application can receive or transmit packets using libmemif API. - -![Architecture](docs/architecture.png) - -## Features - -- [x] Slave mode - - [x] Connect to VPP over memif - - [x] ICMP responder example app -- [x] Transmit/receive packets -- [x] Interrupt mode support -- [x] File descriptor event polling in libmemif (optional) - - [x] Simplify file descriptor event polling (one handler for control and interrupt channel) -- [x] Multiple connections -- [x] Multiple queues - - [x] Multi-thread support -- [x] Master mode - - [ ] Multiple regions (TODO) -- [ ] Performance testing (TODO) - -## Quickstart - -This setup will run libmemif ICMP responder example app in container. Install [docker](https://docs.docker.com/engine/installation) engine. -Useful link: [Docker documentation](https://docs.docker.com/get-started). - -Pull image: -``` -# docker pull ligato/libmemif-sample-service -``` - -Now you should be able to see ligato/libmemif-sample-service image on your local machine (IMAGE ID in this README may be outdated): -``` -# docker images -REPOSITORY TAG IMAGE ID CREATED SIZE -ligato/libmemif-sample-service latest 32ecc2f9d013 About a minute ago 468MB -... -``` - -Run container: -``` -# docker run -it --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service -``` -Example application will start in debug mode. Output should look like this: -``` -ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll -MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder -ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll -LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) -============================== -libmemif version: 1.0 (debug) -memif version: 256 -commands: - help - prints this help - exit - exit app - conn - create memif (slave-mode) - del - delete memif - show - show connection details - ip-set - set interface ip address - rx-mode - set queue rx mode -``` - -Continue with [Example setup](examples/ExampleSetup.md) which contains instructions on how to set up conenction between icmpr-epoll example app and VPP-memif. - -#### Next steps - -- [Build instructions](docs/BuildInstructions.md) Instructions on how to build/install libmemif. -- [Examples](examples/README.md) More example apps presenting different features. -- [Getting started](docs/GettingStarted.md) Introduction to libmemif API. Explaining library usage in custom app. diff --git a/extras/libmemif/docs/BuildInstructions.md b/extras/libmemif/docs/BuildInstructions.md deleted file mode 100644 index 9bb59b94..00000000 --- a/extras/libmemif/docs/BuildInstructions.md +++ /dev/null @@ -1,54 +0,0 @@ -## Build Instructions - -Install dependencies -``` -# sudo apt-get install -y git autoconf pkg_config libtool check -``` - -Clone repository to your local machine. -``` -# git clone https://github.com/JakubGrajciar/libmemif.git -``` - -From root directory execute: -For debug build: -``` -# ./bootstrap -# ./configure -# make -# make install -``` - -For release build: -``` -# ./bootstrap -# ./configure -# make release -# make install -``` -Verify installation: -``` -# ./.libs/icmpr-epoll -``` -> Make sure to run the binary file from ./.libs. File ./icmp\_responder in libmemif root directory is script that links the library, so it only verifies successful build. Default install path is /usr/lib. -Use _help_ command to display build information and commands: -``` -ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll -MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder -ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll -LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) -============================== -libmemif version: 1.0 (debug) -memif version: 256 -commands: - help - prints this help - exit - exit app - conn - create memif (slave-mode) - del - delete memif - show - show connection details - ip-set - set interface ip address - rx-mode - set queue rx mode -``` -#### Examples - -Once the library is build/installed, refer to [Examples](../examples/README.md) and [Getting started](GettingStarted.md) for additional information on basic use cases and API usage. diff --git a/extras/libmemif/docs/GettingStarted.md b/extras/libmemif/docs/GettingStarted.md deleted file mode 100644 index 3cc0ceb0..00000000 --- a/extras/libmemif/docs/GettingStarted.md +++ /dev/null @@ -1,215 +0,0 @@ -## Getting started - -#### Concept (Connecting to VPP) - -For detailed information on api calls and structures please refer to [libmemif.h](../src/libmemif.h) - -1. Initialize memif - - Declare callback function handling file descriptor event polling. -```C -int -control_fd_update (int fd, uint8_t events) -{ -... -} -``` - - Call memif initialization function. memif\_init -```C -err = memif_init (control_fd_update, APP_NAME); -``` - -> If event occurres on any file descriptor returned by this callback, call memif\_control\_fd\_handler function. -```C -memif_err = memif_control_fd_handler (evt.data.fd, events); -``` -> If callback function parameter for memif\_init function is set to NULL, libmemif will handle file descriptor event polling. - Api call memif\_poll\_event will call epoll\_pwait with user defined timeout to poll event on file descriptors opened by libmemif. -```C -/* main loop */ - while (1) - { - if (memif_poll_event (-1) < 0) - { - DBG ("poll_event error!"); - } - } -``` - -> Memif initialization function will initialize internal structures and create timer file descriptor, which will be used for sending periodic connection requests. Timer is disarmed if no memif interface is created. - -2. Creating interface - - Declare memif connction handle. -```C -memif_conn_handle_t c; -``` -> example app uses struct that contains connection handle, rx/tx buffers and other connection specific information. - - - Specify connection arguments. -```C -memif_conn_args_t args; -memset (&args, 0, sizeof (args)); -args.is_master = is_master; -args.log2_ring_size = 10; -args.buffer_size = 2048; -args.num_s2m_rings = 2; -args.num_m2s_rings = 2; -strncpy ((char *) args.interface_name, IF_NAME, strlen (IF_NAME)); -strncpy ((char *) args.instance_name, APP_NAME, strlen (APP_NAME)); -args.mode = 0; -args.interface_id = 0; -``` - - Declare callback functions called on connected/disconnected/interrupted status changed. -```C -int -on_connect (memif_conn_handle_t conn, void *private_ctx) -{ -... -} - -int -on_disconnect (memif_conn_handle_t conn, void *private_ctx) -{ - INFO ("memif connected!"); - return 0; -} -``` - - Call memif interface create function. memif\_create -```C -err = memif_create (&c->conn, - &args, on_connect, on_disconnect, on_interrupt, &ctx[index]); -``` -> If connection is in slave mode, arms timer file descriptor. -> If on interrupt callback is set to NULL, user will not be notified about interrupt. Use memif\_get\_queue\_efd call to get interrupt file descriptor for specific queue. -```C -int fd = -1; -err = memif_get_queue_efd (c->conn, data->qid, &fd); -``` - -3. Connection establishment - - User application will poll events on all file descriptors returned in memif\_control\_fd\_update\_t callback. - - On event call memif\_control\_fd\_handler. - - Everything else regarding connection establishment will be done internally. - - Once connection has been established, a callback will inform the user about connection status change. - -4. Interrupt packet receive - - If event is polled on interrupt file descriptor, libmemif will call memif\_interrupt\_t callback specified for every connection instance. -```C -int -on_interrupt (memif_conn_handle_t conn, void *private_ctx, uint16_t qid) -{ -... -} -``` - -6. Memif buffers - - Packet data are stored in memif\_buffer\_t. Pointer _data_ points to shared memory buffer, and unsigned integer *data\_len* contains packet data length. -```C -typedef struct -{ - uint16_t desc_index; - uint32_t buffer_len; - uint32_t data_len; - void *data; -} memif_buffer_t; -``` - -5. Packet receive - - Api call memif\_rx\_burst will set all required fields in memif buffers provided by user application. -```C -err = memif_rx_burst (c->conn, qid, c->rx_bufs, MAX_MEMIF_BUFS, &rx); -``` - - User application can then process packets. - - Api call memif\_buffer\_free will make supplied memif buffers ready for next receive and mark shared memory buffers as free. -```C -err = memif_buffer_free (c->conn, qid, c->rx_bufs, rx, &fb); -``` - -6. Packet transmit - - Api call memif\_buffer\_alloc will set all required fields in memif buffers provided by user application. -```C -err = memif_buffer_alloc (c->conn, qid, c->tx_bufs, n, &r); -``` - - User application can populate shared memory buffers with packets. - - Api call memif\_tx\_burst will inform peer interface (master memif on VPP) that there are packets ready to receive and mark memif buffers as free. -```C -err = memif_tx_burst (c->conn, qid, c->tx_bufs, c->tx_buf_num, &r); -``` - -7. Helper functions - - Memif details - - Api call memif\_get\_details will return details about connection. -```C -err = memif_get_details (c->conn, &md, buf, buflen); -``` - - Memif error messages - - Every api call returns error code (integer value) mapped to error string. - - Call memif\_strerror will return error message assigned to specific error code. -```C -if (err != MEMIF_ERR_SUCCESS) - INFO ("memif_get_details: %s", memif_strerror (err)); -``` - - Not all syscall errors are translated to memif error codes. If error code 1 (MEMIF\_ERR\_SYSCALL) is returned then libmemif needs to be compiled with -DMEMIF_DBG flag to print error message. Use _make -B_ to rebuild libmemif in debug mode. - -#### Example app (libmemif fd event polling): - -- [ICMP Responder](../examples/icmp_responder/main.c) -> Optional argument: transmit queue id. -``` -icmpr 1 -``` -> Set transmit queue id to 1. Default is 0. -> Application will create memif interface in slave mode and try to connect to VPP. Exit using Ctrl+C. Application will handle SIGINT signal, free allocated memory and exit with EXIT_SUCCESS. - -#### Example app: - -- [ICMP Responder custom fd event polling](../examples/icmp_responder-epoll/main.c) - -#### Example app (multi-thread queue polling) - -- [ICMP Responder multi-thread](../examples/icmp_responder-mt/main.c) - -> Simple example of libmemif multi-thread usage. Connection establishment is handled by main thread. There are two rx queues in this example. One in polling mode and second in interrupt mode. - -VPP config: -``` -# create memif id 0 master -# set int state memif0 up -# set int ip address memif0 192.168.1.1/24 -# ping 192.168.1.2 -``` -For multiple rings (queues) support run VPP with worker threads: -example startup.conf: -``` -unix { - interactive - nodaemon - full-coredump -} - -cpu { - workers 2 -} -``` -VPP config: -``` -# create memif id 0 master -# set int state memif0 up -# set int ip address memif0 192.168.1.1/24 -# ping 192.168.1.2 -``` -> Master mode queue number is limited by worker threads. Slave mode interface needs to specify number of queues. -``` -# create memif id 0 slave rx-queues 2 tx-queues 2 -``` -> Example applications use VPP default socket file for memif: /run/vpp/memif.sock -> For master mode, socket directory must exist prior to memif\_create call. - -#### Unit tests - -Unit tests use [Check](https://libcheck.github.io/check/index.html) framework. This framework must be installed in order to build *unit\_test* binary. -Ubuntu/Debian: -``` -sudo apt-get install check -``` -[More platforms](https://libcheck.github.io/check/web/install.html) - diff --git a/extras/libmemif/docs/buildinstructions_doc.md b/extras/libmemif/docs/buildinstructions_doc.md new file mode 100644 index 00000000..838e835a --- /dev/null +++ b/extras/libmemif/docs/buildinstructions_doc.md @@ -0,0 +1,55 @@ +## Build Instructions {#libmemif_build_doc} + +Install dependencies +``` +# sudo apt-get install -y git autoconf pkg_config libtool check +``` + +Clone repository to your local machine. +``` +# git clone https://github.com/JakubGrajciar/libmemif.git +``` + +From root directory execute: +For debug build: +``` +# ./bootstrap +# ./configure +# make +# make install +``` + +For release build: +``` +# ./bootstrap +# ./configure +# make release +# make install +``` +Verify installation: +``` +# ./.libs/icmpr-epoll +``` +> Make sure to run the binary file from ./.libs. File ./icmp\_responder in libmemif root directory is script that links the library, so it only verifies successful build. Default install path is /usr/lib. +Use _help_ command to display build information and commands: +``` +ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll +MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder +ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll +LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) +============================== +libmemif version: 1.0 (debug) +memif version: 256 +commands: + help - prints this help + exit - exit app + conn - create memif (slave-mode) + del - delete memif + show - show connection details + ip-set - set interface ip address + rx-mode - set queue rx mode +``` + +#### Examples + +Once the library is built/installed, refer to @ref libmemif_examples_doc and @ref libmemif_gettingstarted_doc for additional information on basic use cases and API usage. diff --git a/extras/libmemif/docs/gettingstarted_doc.md b/extras/libmemif/docs/gettingstarted_doc.md new file mode 100644 index 00000000..e3ae6e56 --- /dev/null +++ b/extras/libmemif/docs/gettingstarted_doc.md @@ -0,0 +1,219 @@ +## Getting started {#libmemif_gettingstarted_doc} + +#### Concept (Connecting to VPP) + +For detailed information on api calls and structures please refer to @ref libmemif.h. + +1. Initialize memif + - Declare callback function handling file descriptor event polling. +```C +int +control_fd_update (int fd, uint8_t events) +{ +... +} +``` + - Call memif initialization function. memif\_init +```C +err = memif_init (control_fd_update, APP_NAME); +``` + +> If event occurres on any file descriptor returned by this callback, call memif\_control\_fd\_handler function. +```C +memif_err = memif_control_fd_handler (evt.data.fd, events); +``` +> If callback function parameter for memif\_init function is set to NULL, libmemif will handle file descriptor event polling. + Api call memif\_poll\_event will call epoll\_pwait with user defined timeout to poll event on file descriptors opened by libmemif. +```C +/* main loop */ + while (1) + { + if (memif_poll_event (-1) < 0) + { + DBG ("poll_event error!"); + } + } +``` + +> Memif initialization function will initialize internal structures and create timer file descriptor, which will be used for sending periodic connection requests. Timer is disarmed if no memif interface is created. + +2. Creating interface + - Declare memif connction handle. +```C +memif_conn_handle_t c; +``` +> example app uses struct that contains connection handle, rx/tx buffers and other connection specific information. + + - Specify connection arguments. +```C +memif_conn_args_t args; +memset (&args, 0, sizeof (args)); +args.is_master = is_master; +args.log2_ring_size = 10; +args.buffer_size = 2048; +args.num_s2m_rings = 2; +args.num_m2s_rings = 2; +strncpy ((char *) args.interface_name, IF_NAME, strlen (IF_NAME)); +strncpy ((char *) args.instance_name, APP_NAME, strlen (APP_NAME)); +args.mode = 0; +args.interface_id = 0; +``` + - Declare callback functions called on connected/disconnected/interrupted status changed. +```C +int +on_connect (memif_conn_handle_t conn, void *private_ctx) +{ +... +} + +int +on_disconnect (memif_conn_handle_t conn, void *private_ctx) +{ + INFO ("memif connected!"); + return 0; +} +``` + - Call memif interface create function. memif\_create +```C +err = memif_create (&c->conn, + &args, on_connect, on_disconnect, on_interrupt, &ctx[index]); +``` +> If connection is in slave mode, arms timer file descriptor. +> If on interrupt callback is set to NULL, user will not be notified about interrupt. Use memif\_get\_queue\_efd call to get interrupt file descriptor for specific queue. +```C +int fd = -1; +err = memif_get_queue_efd (c->conn, data->qid, &fd); +``` + +3. Connection establishment + - User application will poll events on all file descriptors returned in memif\_control\_fd\_update\_t callback. + - On event call memif\_control\_fd\_handler. + - Everything else regarding connection establishment will be done internally. + - Once connection has been established, a callback will inform the user about connection status change. + +4. Interrupt packet receive + - If event is polled on interrupt file descriptor, libmemif will call memif\_interrupt\_t callback specified for every connection instance. +```C +int +on_interrupt (memif_conn_handle_t conn, void *private_ctx, uint16_t qid) +{ +... +} +``` + +6. Memif buffers + - Packet data are stored in memif\_buffer\_t. Pointer _data_ points to shared memory buffer, and unsigned integer *data\_len* contains packet data length. +```C +typedef struct +{ + uint16_t desc_index; + uint32_t buffer_len; + uint32_t data_len; + void *data; +} memif_buffer_t; +``` + +5. Packet receive + - Api call memif\_rx\_burst will set all required fields in memif buffers provided by user application. +```C +err = memif_rx_burst (c->conn, qid, c->rx_bufs, MAX_MEMIF_BUFS, &rx); +``` + - User application can then process packets. + - Api call memif\_buffer\_free will make supplied memif buffers ready for next receive and mark shared memory buffers as free. +```C +err = memif_buffer_free (c->conn, qid, c->rx_bufs, rx, &fb); +``` + +6. Packet transmit + - Api call memif\_buffer\_alloc will set all required fields in memif buffers provided by user application. +```C +err = memif_buffer_alloc (c->conn, qid, c->tx_bufs, n, &r); +``` + - User application can populate shared memory buffers with packets. + - Api call memif\_tx\_burst will inform peer interface (master memif on VPP) that there are packets ready to receive and mark memif buffers as free. +```C +err = memif_tx_burst (c->conn, qid, c->tx_bufs, c->tx_buf_num, &r); +``` + +7. Helper functions + - Memif details + - Api call memif\_get\_details will return details about connection. +```C +err = memif_get_details (c->conn, &md, buf, buflen); +``` + - Memif error messages + - Every api call returns error code (integer value) mapped to error string. + - Call memif\_strerror will return error message assigned to specific error code. +```C +if (err != MEMIF_ERR_SUCCESS) + INFO ("memif_get_details: %s", memif_strerror (err)); +``` + - Not all syscall errors are translated to memif error codes. If error code 1 (MEMIF\_ERR\_SYSCALL) is returned then libmemif needs to be compiled with -DMEMIF_DBG flag to print error message. Use _make -B_ to rebuild libmemif in debug mode. + +#### Example app (libmemif fd event polling): + +- @ref extras/libmemif/examples/icmp_responder + +> Optional argument: transmit queue id. +``` +icmpr 1 +``` +> Set transmit queue id to 1. Default is 0. +> Application will create memif interface in slave mode and try to connect to VPP. Exit using Ctrl+C. Application will handle SIGINT signal, free allocated memory and exit with EXIT_SUCCESS. + +#### Example app: + +ICMP Responder custom fd event polling. + +- @ref extras/libmemif/examples/icmp_responder-epoll + +#### Example app (multi-thread queue polling) + +ICMP Responder multi-thread. +- @ref extras/libmemif/examples/icmp_responder-mt + +> Simple example of libmemif multi-thread usage. Connection establishment is handled by main thread. There are two rx queues in this example. One in polling mode and second in interrupt mode. + +VPP config: +``` +# create memif id 0 master +# set int state memif0 up +# set int ip address memif0 192.168.1.1/24 +# ping 192.168.1.2 +``` +For multiple rings (queues) support run VPP with worker threads: +example startup.conf: +``` +unix { + interactive + nodaemon + full-coredump +} + +cpu { + workers 2 +} +``` +VPP config: +``` +# create memif id 0 master +# set int state memif0 up +# set int ip address memif0 192.168.1.1/24 +# ping 192.168.1.2 +``` +> Master mode queue number is limited by worker threads. Slave mode interface needs to specify number of queues. +``` +# create memif id 0 slave rx-queues 2 tx-queues 2 +``` +> Example applications use VPP default socket file for memif: /run/vpp/memif.sock +> For master mode, socket directory must exist prior to memif\_create call. + +#### Unit tests + +Unit tests use [Check](https://libcheck.github.io/check/index.html) framework. This framework must be installed in order to build *unit\_test* binary. +Ubuntu/Debian: +``` +sudo apt-get install check +``` +[More platforms](https://libcheck.github.io/check/web/install.html) + diff --git a/extras/libmemif/examples/ExampleSetup.md b/extras/libmemif/examples/ExampleSetup.md deleted file mode 100644 index a02506dd..00000000 --- a/extras/libmemif/examples/ExampleSetup.md +++ /dev/null @@ -1,206 +0,0 @@ -## Example setup - -#### VPP-memif master icmp_responder slave - -> Libmemif example app(s) use memif default socket file: /run/vpp/memif.sock. - -Run VPP and icmpr-epoll example (default example when running in container). -> Other examples work similar to icmpr-epoll. Brief explanation can be found in [Examples readme](README.md) file. - -VPP-side config: -``` -DBGvpp# create memif id 0 master -DBGvpp# set int state memif0/0 up -DBGvpp# set int ip address memif0/0 192.168.1.1/24 -``` -icmpr-epoll: -``` -conn 0 0 -``` -Memif in slave mode will try to connect every 2 seconds. If connection establishment is successfull, a message will show. -``` -INFO: memif connected! -``` -> Error messages like "unmatched interface id" are printed only in debug mode. - -Check connected status. -Use show command in icmpr-epoll: -``` -show -MEMIF DETAILS -============================== -interface index: 0 - interface ip: 192.168.1.2 - interface name: memif_connection - app name: ICMP_Responder - remote interface name: memif0/0 - remote app name: VPP 17.10-rc0~132-g62f9cdd - id: 0 - secret: - role: slave - mode: ethernet - socket filename: /run/vpp/memif.sock - rx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - tx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - link: up -interface index: 1 - no connection - -``` -Use sh memif command in VPP: -``` -DBGvpp# sh memif -interface memif0/0 - remote-name "ICMP_Responder" - remote-interface "memif_connection" - id 0 mode ethernet file /run/vpp/memif.sock - flags admin-up connected - listener-fd 12 conn-fd 13 - num-s2m-rings 1 num-m2s-rings 1 buffer-size 0 - master-to-slave ring 0: - region 0 offset 32896 ring-size 1024 int-fd 16 - head 0 tail 0 flags 0x0000 interrupts 0 - master-to-slave ring 0: - region 0 offset 0 ring-size 1024 int-fd 15 - head 0 tail 0 flags 0x0001 interrupts 0 -``` - -Send ping from VPP to icmpr-epoll: -``` -DBGvpp# ping 192.168.1.2 -64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1888 ms -64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.1985 ms -64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1813 ms -64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1929 ms - -Statistics: 5 sent, 4 received, 20% packet loss -``` -#### multiple queues VPP-memif slave icmp_responder master - -Run icmpr-epoll as in previous example setup. -Run VPP with startup conf, enabling 2 worker threads. -Example startup.conf: -``` -unix { - interactive - nodaemon - full-coredump -} - -cpu { - workers 2 -} -``` -VPP-side config: -``` -DBGvpp# create memif id 0 slave rx-queues 2 tx-queues 2 -DBGvpp# set int state memif0/0 up -DBGvpp# set int ip address memif0/0 192.168.1.1/24 -``` -icmpr-epoll: -``` -conn 0 1 -``` -When connection is established a message will print: -``` -INFO: memif connected! -``` -> Error messages like "unmatched interface id" are printed only in debug mode. - -Check connected status. -Use show command in icmpr-epoll: -``` -show -MEMIF DETAILS -============================== -interface index: 0 - interface ip: 192.168.1.2 - interface name: memif_connection - app name: ICMP_Responder - remote interface name: memif0/0 - remote app name: VPP 17.10-rc0~132-g62f9cdd - id: 0 - secret: - role: master - mode: ethernet - socket filename: /run/vpp/memif.sock - rx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - queue id: 1 - ring size: 1024 - buffer size: 2048 - tx queues: - queue id: 0 - ring size: 1024 - buffer size: 2048 - queue id: 1 - ring size: 1024 - buffer size: 2048 - link: up -interface index: 1 - no connection - -``` -Use sh memif command in VPP: -``` -DBGvpp# sh memif -interface memif0/0 - remote-name "ICMP_Responder" - remote-interface "memif_connection" - id 0 mode ethernet file /run/vpp/memif.sock - flags admin-up slave connected - listener-fd -1 conn-fd 12 - num-s2m-rings 2 num-m2s-rings 2 buffer-size 2048 - slave-to-master ring 0: - region 0 offset 0 ring-size 1024 int-fd 14 - head 0 tail 0 flags 0x0000 interrupts 0 - slave-to-master ring 1: - region 0 offset 32896 ring-size 1024 int-fd 15 - head 0 tail 0 flags 0x0000 interrupts 0 - slave-to-master ring 0: - region 0 offset 65792 ring-size 1024 int-fd 16 - head 0 tail 0 flags 0x0001 interrupts 0 - slave-to-master ring 1: - region 0 offset 98688 ring-size 1024 int-fd 17 - head 0 tail 0 flags 0x0001 interrupts 0 - -``` -Send ping from VPP to icmpr-epoll: -``` -DBGvpp# ping 192.168.1.2 -64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1439 ms -64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.2184 ms -64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1458 ms -64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1687 ms - -Statistics: 5 sent, 4 received, 20% packet loss -``` - -#### icmp_responder master icmp_responder slave - -> Example apps can only repond to ping. This setup creates connection between two applications using libmemif. Traffic functionality is the same as when connection to VPP. App can receive ARP/ICMP request and transmit response, but can not send ARP/ICMP request. - -Run two instances of icmpr-epoll example. -> If not running in container, make sure folder /run/vpp/ exists before creating memif master. -Instance 1 will be in master mode, instance 2 in slave mode. -instance 1: -``` -conn 0 1 -``` -instance 2: -``` -conn 0 0 -``` -In 2 seconds, both instances should print connected! message: -``` -INFO: memif connected! -``` -Check peer interface names using show command. diff --git a/extras/libmemif/examples/README.md b/extras/libmemif/examples/README.md deleted file mode 100644 index 1375d27a..00000000 --- a/extras/libmemif/examples/README.md +++ /dev/null @@ -1,16 +0,0 @@ -## Examples - -After build, root folder will contain scripts linking binary examples with library (same name as example apps). These scripts can be executed to run example apps without installing the library. Example apps binaries can be found in _libs_ filder. To run binaries directly, make sure that libmemif library is installed. - -#### Run in container -ligato/libmemif-sample-service image contains built and installed libmemf. To run different examples, override docker CMD to start container in bash: -``` -# docker run -it --entrypoint=/bin/bash -i --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service -``` -Current WORKDIR is set to root repository directory. Example apps can be run from this directory (a script linking binary with library), or browse to ./.libs folder and execute binary directly. - -Example app | Description -------------|------------ -[icmpr](../examples/icmp_responder/main.c) | Simplest implementaion. Event polling is handled by libmemif. Single memif conenction in slave mode is created (id 0). Use Ctrl + C to exit app. Memif receive mode: interrupt. -[icmpr-epoll](../examples/icmp_responder-epoll/main.c) (run in container by default) | Supports multiple connections and master mode. User can create/delete connections, set ip addresses, print connection information. [Example setup](ExampleSetup.md) contains instructions on basic connection use cases setups. Memif receive mode: interrupt. App provides functionality to disable interrupts for specified queue/s for testing purposes. Polling mode is not implemented in this example. -[icmpr-mt](../examples/icmp_responder-mt/main.c) | Multi-thread example, very similar to icmpr-epoll. Packets are handled in threads assigned to specific queues. Slave mode only. Memif receive mode: polling (memif_rx_poll function), interrupt (memif_rx_interrupt function). Receive modes differ per queue. diff --git a/extras/libmemif/examples/example_setup_doc.md b/extras/libmemif/examples/example_setup_doc.md new file mode 100644 index 00000000..89a0b10e --- /dev/null +++ b/extras/libmemif/examples/example_setup_doc.md @@ -0,0 +1,207 @@ +## Example setup {#libmemif_example_setup_doc} + +#### VPP-memif master icmp_responder slave + +> Libmemif example app(s) use memif default socket file: `/run/vpp/memif.sock`. + +Run VPP and icmpr-epoll example (default example when running in container). + +> Other examples work similar to icmpr-epoll. Brief explanation can be found in @ref libmemif_examples_doc . + +VPP-side config: +``` +DBGvpp# create memif id 0 master +DBGvpp# set int state memif0/0 up +DBGvpp# set int ip address memif0/0 192.168.1.1/24 +``` +icmpr-epoll: +``` +conn 0 0 +``` +Memif in slave mode will try to connect every 2 seconds. If connection establishment is successfull, a message will show. +``` +INFO: memif connected! +``` +> Error messages like "unmatched interface id" are printed only in debug mode. + +Check connected status. +Use show command in icmpr-epoll: +``` +show +MEMIF DETAILS +============================== +interface index: 0 + interface ip: 192.168.1.2 + interface name: memif_connection + app name: ICMP_Responder + remote interface name: memif0/0 + remote app name: VPP 17.10-rc0~132-g62f9cdd + id: 0 + secret: + role: slave + mode: ethernet + socket filename: /run/vpp/memif.sock + rx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + tx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + link: up +interface index: 1 + no connection + +``` +Use sh memif command in VPP: +``` +DBGvpp# sh memif +interface memif0/0 + remote-name "ICMP_Responder" + remote-interface "memif_connection" + id 0 mode ethernet file /run/vpp/memif.sock + flags admin-up connected + listener-fd 12 conn-fd 13 + num-s2m-rings 1 num-m2s-rings 1 buffer-size 0 + master-to-slave ring 0: + region 0 offset 32896 ring-size 1024 int-fd 16 + head 0 tail 0 flags 0x0000 interrupts 0 + master-to-slave ring 0: + region 0 offset 0 ring-size 1024 int-fd 15 + head 0 tail 0 flags 0x0001 interrupts 0 +``` + +Send ping from VPP to icmpr-epoll: +``` +DBGvpp# ping 192.168.1.2 +64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1888 ms +64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.1985 ms +64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1813 ms +64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1929 ms + +Statistics: 5 sent, 4 received, 20% packet loss +``` +#### multiple queues VPP-memif slave icmp_responder master + +Run icmpr-epoll as in previous example setup. +Run VPP with startup conf, enabling 2 worker threads. +Example startup.conf: +``` +unix { + interactive + nodaemon + full-coredump +} + +cpu { + workers 2 +} +``` +VPP-side config: +``` +DBGvpp# create memif id 0 slave rx-queues 2 tx-queues 2 +DBGvpp# set int state memif0/0 up +DBGvpp# set int ip address memif0/0 192.168.1.1/24 +``` +icmpr-epoll: +``` +conn 0 1 +``` +When connection is established a message will print: +``` +INFO: memif connected! +``` +> Error messages like "unmatched interface id" are printed only in debug mode. + +Check connected status. +Use show command in icmpr-epoll: +``` +show +MEMIF DETAILS +============================== +interface index: 0 + interface ip: 192.168.1.2 + interface name: memif_connection + app name: ICMP_Responder + remote interface name: memif0/0 + remote app name: VPP 17.10-rc0~132-g62f9cdd + id: 0 + secret: + role: master + mode: ethernet + socket filename: /run/vpp/memif.sock + rx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + queue id: 1 + ring size: 1024 + buffer size: 2048 + tx queues: + queue id: 0 + ring size: 1024 + buffer size: 2048 + queue id: 1 + ring size: 1024 + buffer size: 2048 + link: up +interface index: 1 + no connection + +``` +Use sh memif command in VPP: +``` +DBGvpp# sh memif +interface memif0/0 + remote-name "ICMP_Responder" + remote-interface "memif_connection" + id 0 mode ethernet file /run/vpp/memif.sock + flags admin-up slave connected + listener-fd -1 conn-fd 12 + num-s2m-rings 2 num-m2s-rings 2 buffer-size 2048 + slave-to-master ring 0: + region 0 offset 0 ring-size 1024 int-fd 14 + head 0 tail 0 flags 0x0000 interrupts 0 + slave-to-master ring 1: + region 0 offset 32896 ring-size 1024 int-fd 15 + head 0 tail 0 flags 0x0000 interrupts 0 + slave-to-master ring 0: + region 0 offset 65792 ring-size 1024 int-fd 16 + head 0 tail 0 flags 0x0001 interrupts 0 + slave-to-master ring 1: + region 0 offset 98688 ring-size 1024 int-fd 17 + head 0 tail 0 flags 0x0001 interrupts 0 + +``` +Send ping from VPP to icmpr-epoll: +``` +DBGvpp# ping 192.168.1.2 +64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=.1439 ms +64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=.2184 ms +64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=.1458 ms +64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=.1687 ms + +Statistics: 5 sent, 4 received, 20% packet loss +``` + +#### icmp_responder master icmp_responder slave + +> Example apps can only repond to ping. This setup creates connection between two applications using libmemif. Traffic functionality is the same as when connection to VPP. App can receive ARP/ICMP request and transmit response, but can not send ARP/ICMP request. + +Run two instances of icmpr-epoll example. +> If not running in container, make sure folder /run/vpp/ exists before creating memif master. +Instance 1 will be in master mode, instance 2 in slave mode. +instance 1: +``` +conn 0 1 +``` +instance 2: +``` +conn 0 0 +``` +In 2 seconds, both instances should print connected! message: +``` +INFO: memif connected! +``` +Check peer interface names using show command. diff --git a/extras/libmemif/examples/examples_doc.md b/extras/libmemif/examples/examples_doc.md new file mode 100644 index 00000000..11937c57 --- /dev/null +++ b/extras/libmemif/examples/examples_doc.md @@ -0,0 +1,18 @@ +## Examples {#libmemif_examples_doc} + +After build, root folder will contain scripts linking binary examples with library (same name as example apps). These scripts can be executed to run example apps without installing the library. Example apps binaries can be found in _libs_ filder. To run binaries directly, make sure that libmemif library is installed. + +#### Run in container + +`ligato/libmemif-sample-service` image contains built and installed libmemf. To run different examples, override docker CMD to start container in bash: + +``` +# docker run -it --entrypoint=/bin/bash -i --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service +``` +Current WORKDIR is set to root repository directory. Example apps can be run from this directory (a script linking binary with library), or browse to `./.libs` folder and execute binary directly. + +Example app | Description +------------|------------ +@ref extras/libmemif/examples/icmp_responder | Simplest implementaion. Event polling is handled by libmemif. Single memif conenction in slave mode is created (id 0). Use Ctrl + C to exit app. Memif receive mode: interrupt. +@ref extras/libmemif/examples/icmp_responder-epoll (run in container by default) | Supports multiple connections and master mode. User can create/delete connections, set ip addresses, print connection information. @ref libmemif_example_setup_doc contains instructions on basic connection use cases setups. Memif receive mode: interrupt. App provides functionality to disable interrupts for specified queue/s for testing purposes. Polling mode is not implemented in this example. +@ref extras/libmemif/examples/icmp_responder-mt) | Multi-thread example, very similar to icmpr-epoll. Packets are handled in threads assigned to specific queues. Slave mode only. Memif receive mode: polling (memif_rx_poll function), interrupt (memif_rx_interrupt function). Receive modes differ per queue. diff --git a/extras/libmemif/libmemif_doc.md b/extras/libmemif/libmemif_doc.md new file mode 100644 index 00000000..fb4a18af --- /dev/null +++ b/extras/libmemif/libmemif_doc.md @@ -0,0 +1,74 @@ +Shared Memory Packet Interface (memif) Library {#libmemif_doc} +============================================== + +## Introduction + +Shared memory packet interface (memif) provides high performance packet transmit and receive between user application and Vector Packet Processing (VPP) or multiple user applications. Using libmemif, user application can create shared memory interface in master or slave mode and connect to VPP or another application using libmemif. Once the connection is established, user application can receive or transmit packets using libmemif API. + +![Architecture](docs/architecture.png) + +## Features + +- [x] Slave mode + - [x] Connect to VPP over memif + - [x] ICMP responder example app +- [x] Transmit/receive packets +- [x] Interrupt mode support +- [x] File descriptor event polling in libmemif (optional) + - [x] Simplify file descriptor event polling (one handler for control and interrupt channel) +- [x] Multiple connections +- [x] Multiple queues + - [x] Multi-thread support +- [x] Master mode + - [ ] Multiple regions (TODO) +- [ ] Performance testing (TODO) + +## Quickstart + +This setup will run libmemif ICMP responder example app in container. Install [docker](https://docs.docker.com/engine/installation) engine. +Useful link: [Docker documentation](https://docs.docker.com/get-started). + +Pull image: +``` +# docker pull ligato/libmemif-sample-service +``` + +Now you should be able to see ligato/libmemif-sample-service image on your local machine (IMAGE ID in this README may be outdated): +``` +# docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +ligato/libmemif-sample-service latest 32ecc2f9d013 About a minute ago 468MB +... +``` + +Run container: +``` +# docker run -it --rm --name icmp-responder --hostname icmp-responder --privileged -v "/run/vpp/:/run/vpp/" ligato/libmemif-sample-service +``` +Example application will start in debug mode. Output should look like this: +``` +ICMP_Responder:add_epoll_fd:204: fd 0 added to epoll +MEMIF_DEBUG:src/main.c:memif_init:383: app name: ICMP_Responder +ICMP_Responder:add_epoll_fd:204: fd 4 added to epoll +LIBMEMIF EXAMPLE APP: ICMP_Responder (debug) +============================== +libmemif version: 1.0 (debug) +memif version: 256 +commands: + help - prints this help + exit - exit app + conn - create memif (slave-mode) + del - delete memif + show - show connection details + ip-set - set interface ip address + rx-mode - set queue rx mode +``` + +Continue with @ref libmemif_example_setup which contains instructions on how to set up conenction between icmpr-epoll example app and VPP-memif. + +#### Next steps + +- @subpage libmemif_build_doc +- @subpage libmemif_examples_doc +- @subpage libmemif_example_setup_doc +- @subpage libmemif_gettingstarted_doc diff --git a/src/plugins/acl/acl-plugin.md b/src/plugins/acl/acl-plugin.md deleted file mode 100644 index 1b44bca9..00000000 --- a/src/plugins/acl/acl-plugin.md +++ /dev/null @@ -1,347 +0,0 @@ -Multicore support for ACL plugin -================================ - -This captures some considerations and design decisions that I have made, -both for my own memory later on ("what the hell was I thinking?!?"), -and for anyone interested to criticize/improve/hack on this code. - -One of the factors taken into account while making these decisions, -was the relative emphasis on the multi-thread vs. single-thread -use cases: the latter is the vastly more prevalent. But, -one can not optimize the single-thread performance without -having a functioning code for multi-thread. - -stateless ACLs -============== - -The stateless trivially parallelizes, and the only potential for the -race between the different threads is during the reconfiguration, -at the time of replacing the old ACL being checked, with -the new ACL. - -In case an acl_add_replace is being used to replace the rules -within the existing entry, a reallocation of am->acls[X].rules -vector will happen and potentially a change in count. - -acl_match_5tuple() has the following code: - - a = am->acls + acl_index; - for (i = 0; i < a->count; i++) - { - r = a->rules + i; - . . . - -Ideally we should be immune from a->rules changing, -but the problem arises if the count changes in flight, -and the new ruleset is smaller - then we will attempt -to "match" against the free memory. - -This can(?) be solved by replacing the for() with while(), -so the comparison happens at each iteration. - -full_acl_match_5tuple(), which iterates over the list -of ACLs, is a bit less immune, since it takes the pointer -to the vector to iterate and keeps a local copy of -that pointer. - -This race can be solved by checking the -current pointer to the vector with the source pointer, -and seeing if there is an (unlikely) change, and if -there is, return the "deny" action, or, better, -restart the check. - -Since the check reloads the ACL list on a per-packet basis, -there is only a window of opportunity of one packet to -"match" packet against an incorrect rule set. -The workers also do not change anything, only read. -Therefore, it looks like building special structures -to ensure that it does not happen at all might be not -worth it. - -At least not until we have a unit-test able to -reliably catch this condition and test that -the measures applied are effective. Adding the code -which is not possible to exercise is worse than -not adding any code at all. - -So, I opt for "do-nothing" here for the moment. - -reflexive ACLs: single-thread -============================= - -Before we talk multi-thread, is worth revisiting the -design of the reflexive ACLs in the plugin, and -the history of their evolution. - -The very first version of the ACL plugin, shipped in -1701, mostly did the job using the existing components -and gluing them together. Because it needed to work -in bridged forwarding path only, using L2 classifier -as an insertion point appeared natural, also L2 classifier, -being a table with sessions, seemed like a good place -to hold the sessions. - -So, the original design had two conceptual nodes: -one, pointed by the next_miss from the L2 classifier table, -was checking the actual ACL, and inserting session into -the L2 classifier table, and the other one, pointed -to by the next_match within the specific session rule, -was checking the existing session. The timing out -of the existing connections was done in the datapath, -by periodically calling the aging function. - -This decision to use the existing components, -with its attrativeness, did bring a few limitations as well: - -* L2 classifier is a simple mask-and-value match, with -a fixed mask across the table. So, sanely supporting IPv6 -packets with extension headers in that framework was impossible. - -* There is no way to get a backpressure from L2 classifier -depending on memory usage. When it runs out of memory, -it simply crashes the box. When it runs out of memory ? -We don't really know. Depends on how it allocates it. - -* Since we need to match the *reflected* traffic, -we had to create *two* full session entries -in two different directions, which is quite wasteful memory-wise. - -* (showstopper): the L2 classifier runs only in -the bridged data path, so supporting routed data path -would require creating something else entirely different, -which would mean much more headaches support-wise going forward. - -Because of that, I have moved to a different model of -creating a session-5-tuple from the packet data - once, -and then doing all the matching just on that 5-tuple. - -This has allowed to add support for skipping IPv6 extension headers. - -Also, this new version started to store the sessions in a dedicated -bihash-per-interface, with the session key data being -aligned for the ingress packets, and being mirrored for the -egress packets. This allows of significant savings in memory, -because now we need to keep only one copy of the session table per -interface instead of two, and also to only have ONE node for all the lookups, -(L2/L3 path, in/out, IPv4/IPv6) - significantly reducing the code complexity. - -Unfortunately, bihash still has the "lack of backpressure" problem, -in a sense that if you try to insert too many entries and run out -of memory in the heap you supplied, you get a crash. - -To somewhat workaround against that, there is a "maximum tested number of sessions" -value, which tracks the currently inserted sessions in the bihash, -and if this number is being approached, a more aggressive cleanup -can happen. If this number is reached, two behaviors are possible: - -* attempt to do the stateless ACL matching and permit the packet - if it succeeds - -* deny the packet - -Currently I have opted for a second one, since it allows for -a better defined behavior, and if you have to permit -the traffic in both directions, why using stateful anyway ? - -In order to be able to do the cleanup, we need to discriminate between -the session types, with each session type having its own idle timeout. -In order to do that, we keep three lists, defined in enum acl_timeout_e: -ACL_TIMEOUT_UDP_IDLE, ACL_TIMEOUT_TCP_IDLE, ACL_TIMEOUT_TCP_TRANSIENT. - -The first one is hopefully obvious - it is just all UDP connections. -They have an idle timeout of 600 seconds. - -The second and third is a bit more subtle. TCP is a complicated protocol, -and we need to tread the fine line between doing too little and doing -too much, and triggering the potential compatibility issues because of -being a "middlebox". - -I decided to split the TCP connections into two classes: -established, and everything else. "Established", means we have seen -the SYN and ACK from both sides (with PUSH obviously masked out). -This is the "active" state of any TCP connection and we would like -to ensure we do not screw it up. So, the connections in this state -have the default idle timer of 24 hours. - -All the rest of the connections have the idle timeout of 2 minutes, -(inspired by an old value of MSL) and based on the observation -that the states this class represent are usually very short lived. - -Once we have these three baskets of connections, it is trivial to -imagine a simple cleanup mechanism to deal with this: take a -TCP transient connection that has been hanging around. - -It is debatable whether we want to do discrimination between the -different TCP transient connections. Assuming we do FIFO (and -the lists allow us to do just that), it means a given connection -on the head of the list has been hanging around for longest. -Thus, if we are short on resources, we might just go ahead and -reuse it within the datapath. - -This is where we are slowly approaching the question -"Why in the world have not you used timer wheel or such ?" - -The answer is simple: within the above constraints, it does -not buy me much. - -Also, timer wheel creates a leaky abstraction with a difficult -to manage corner case. Which corner case ? - -We have a set of objects (sessions) with an event that may -or may not happen (idle timeout timer firing), and a -necessity to reset the idle timeout when there is -activity on the session. - -In the worst case, where we had a 10000 of one-packet -UDP sessions just created 10 minutes ago, we would need -to deal with a spike of 10000 expired timers. - -Of course, if we have the active traffic on all -of these 10000 connections, then we will not have -to deal with that ? Right, but we will still have to deal -with canceling and requeueing the timers. - -In the best possible case, requeueing a timer is -going to be something along the lines of a linked-list -removal and reinsertion. - -However, keep in mind we already need to classify the -connections for reuse, so therefore we already have -the linked lists! - -And if we just check these linked lists periodically in -a FIFO fashion, we can get away with a very simple per-packet operation: -writing back the timestamp of "now" into the connection structure. - -Then rather than requeueing the list on a per-packet or per-frame -basis, we can defer this action until the time this session -appears on the head of the FIFO list, and the cleaning -routine makes the decision about whether to discard -the session (because the interval since last activity is bigger -than the idle timeout), or to requeue the session back to -the end of the list (because the last activity was less -than idle timeout ago). - -So, rather than using the timers, we can simply reuse our classification -FIFOs, with the following heuristic: do not look at the session that was -enqueued at time X until X+session_timeout. If we enqueue the sessions -in the order of their initial activity, then we can simply use enqueue -timestamp of the head session as a decision criterion for when we need -to get back at looking at it for the timeout purposes. - -Since the number of FIFOs is small, we get a slightly worse check -performance than with timers, but still O(1). - -We seemingly do quite a few "useless" operations of requeueing the items -back to the tail of the list - but, these are the operations we do not -have to do in the active data path, so overall it is a win. - -(Diversion: I believe this problem is congruent to poll vs. epoll or -events vs. threads, some reading on this subject: -http://web.archive.org/web/20120225022154/http://sheddingbikes.com/posts/1280829388.html) - -We can also can run a TCP-like scheme for adaptively changing -the wait period in the routine that deals with the connection timeouts: -we can attempt to check the connections a couple of times per second -(same as we would advance the timer wheel), and then if we have requeued -close to a max-per-quantum number of connections, we can half the waiting -interval, and if we did not requeue any, we can slowly increment the waiting -interval - which at a steady state should stabilize similar to what the TCP rate -does. - -reflexive ACLs: multi-thread -============================= - -The single-threaded implementation in 1704 used a separate "cleaner" process -to deal with the timing out of the connections. -It is all good and great when you know that there is only a single core -to run everything on, but the existence of the lists proves to be -a massive difficulty when it comes to operating from multiple threads. - -Initial study shows that with a few assumptions (e.g. that the cleaner running in main thread -and the worker have a demarcation point in time where either one or the other one touches -the session in the list) it might be possible to make it work, but the resulting -trickiness of doing it neatly with all the corner cases is quite large. - -So, for the multi-threaded scenario, we need to move the connection -aging back to the same CPU as its creation. - -Luckily we can do this with the help of the interrupts. - -So, the design is as follows: the aging thread (acl_fa_session_cleaner_process) -periodically fires the interrupts to the workers interrupt nodes (acl_fa_worker_session_cleaner_process_node.index), -using vlib_node_set_interrupt_pending(), and -the interrupt node acl_fa_worker_conn_cleaner_process() calls acl_fa_check_idle_sessions() -which does the actual job of advancing the lists. And within the actual datapath the only thing we will be -doing is putting the items onto FIFO, and updating the last active time on the existing connection. - -The one "delicate" part is that the worker for one leg of the connection might be different from -the worker of another leg of the connection - but, even if the "owner" tries to free the connection, -nothing terrible can happen - worst case the element of the pool (which is nominally free for a short period) -will get the timestamp updated - same thing about the TCP flags seen. - -A slightly trickier issue arises when the packet initially seen by one worker (thus owned by that worker), -and the return packet processed by another worker, and as a result changes the -the class of the connection (e.g. becomes TCP_ESTABLISHED from TCP_TRANSIENT or vice versa). -If the class changes from one with the shorter idle time to the one with the longer idle time, -then unless we are in the starvation mode where the transient connections are recycled, -we can simply do nothing and let the normal requeue mechanism kick in. If the class changes from the longer idle -timer to the shorter idle timer, then we risk keeping the connection around for longer than needed, which -will affect the resource usage. - -One solution to that is to have NxN ring buffers (where N is the number of workers), such that the non-owner -can signal to the owner the connection# that needs to be requeued out of order. - -A simpler solution though, is to ensure that each FIFO's period is equal to that of a shortest timer. -This way the resource starvation problem is taken care of, at an expense of some additional work. - -This all looks sufficiently nice and simple until a skeleton falls out of the closet: -sometimes we want to clean the connections en masse before they expire. - -There few potential scenarios: -1) removal of an ACL from the interface -2) removal of an interface -3) manual action of an operator (in the future). - -In order to tackle this, we need to modify the logic which decides whether to requeue the -connection on the end of the list, or to delete it due to idle timeout: - -We define a point in time, and have each worker thread fast-forward through its FIFO, -in the process looking for sessions that satisfy the criteria, and either keeping them or requeueing them. - -To keep the ease of appearance to the outside world, we still process this as an event -within the connection cleaner thread, but this event handler does as follows: -1) it creates the bitmap of the sw_if_index values requested to be cleared -2) for each worker, it waits to ensure there is no cleanup operation in progress (and if there is one, -it waits), and then makes a copy of the bitmap, sets the per-worker flag of a cleanup operation, and sends an interrupt. -3) wait until all cleanup operations have completed. - -Within the worker interrupt node, we check if the "cleanup in progress" is set, -and if it is, we check the "fast forward time" value. If unset, we initialize it to value now, and compare the -requested bitmap of sw_if_index values (pending_clear_sw_if_index_bitmap) with the bitmap of sw_if_index that this worker deals with. - -(we set the bit in the bitmap every time we enqueue the packet onto a FIFO - serviced_sw_if_index_bitmap in acl_fa_conn_list_add_session). - -If the result of this AND operation is zero - then we can clear the flag of cleanup in progress and return. -Else we kick off the quantum of cleanup, and make sure we get another interrupt ASAP if that cleanup operation returns non-zero, -meaning there is more work to do. -When that operation returns zero, everything has been processed, we can clear the "cleanup-in-progress" flag, and -zeroize the bitmap of sw_if_index-es requested to be cleaned. - -The interrupt node signals its wish to receive an interrupt ASAP by setting interrupt_is_needed -flag within the per-worker structure. The main thread, while waiting for the -cleanup operation to complete, checks if there is a request for interrupt, -and if there is - it sends one. - -This approach gives us a way to mass-clean the connections which is reusing the code of the regular idle -connection cleanup. - -One potential inefficiency is the bitmap values set by the session insertion -in the data path - there is nothing to clear them. - -So, if one rearranges the interface placement with the workers, then the cleanups will cause some unnecessary work. -For now, we consider it an acceptable limitation. It can be resolved by having another per-worker bitmap, which, when set, -would trigger the cleanup of the bits in the serviced_sw_if_index_bitmap). - -=== the end === - diff --git a/src/plugins/acl/acl_hash_lookup_doc.md b/src/plugins/acl/acl_hash_lookup_doc.md new file mode 100644 index 00000000..cb93df04 --- /dev/null +++ b/src/plugins/acl/acl_hash_lookup_doc.md @@ -0,0 +1,241 @@ +ACL plugin constant-time lookup design {#acl_hash_lookup} +====================================== + +The initial implementation of ACL plugin performs a trivial for() cycle, +going through the assigned ACLs on a per-packet basis. This is not very +efficient, even if for very short ACLs due to its simplicity it can beat +more advanced methods. + +However, to cover the case of longer ACLs with acceptable performance, +we need to have a better way of matching. This write-up proposes +a mechanism to make a lookup from O(M) where M is number of entries +to O(N) where N is number of different mask combinations. + +Preparation of ACL(s) +--------------------- + +The ACL plugin will maintain a global list of "mask types", i.e. the specific +configurations of "do not care" bits within the ACEs. +Upon the creation of a new ACL, a pass will be made through all the +ACEs, to assign and possibly allocate the "mask type number". + +Each ACL has a structure *hash_acl_info_t* representing the "hash-based" +parts of information related to that ACL, primarily the array of +*hash_ace_info_t* structures - each of the members of that array +corresponding to one of the rules (ACEs) in the original ACL, +for this they have a pair of *(acl_index, ace_index)* to keep track, +predominantly for the debugging. + +Why do we need a whole separate structure, and are not adding new fields +to the existing rile structure ? First, encapsulation, to minimize +the pollution of the main ACL code with the hash-based lookup artifacts. + +Second, one rule may correspond to more than one "hash-based" ACE. +In fact, most of the rules do correspond to two of those. Why ? + +Consider that the current ACL lookup logic is that if a packet +is not the initial fragment, and there is an L4 entry acting on the packet, +the comparison will be made only on the L4 protocol field value rather +than on the protocol and port values. This beaviour is governed by +*l4_match_nonfirst_fragment* flag in the *acl_main*, and was needed to +maintain the compatibility with the existing software switch implementation. + +While for the sequential check in *single_acl_match_5tuple()* +it is very easy to implement by just breaking out at the right moment, +in case of hash-based matching this cost us two checks: +one on full 5-tuple and the flag *pkt.is_nonfirst_fragment* being zero, +the second on 3-tuple and the flag *pkt.is_nonfirst_fragment* being one, +with the second check triggered by the *acl_main.l4_match_nonfirst_fragment* +setting being the default 1. This dictates the necessity of having a "match" +field in a given *hash_ace_info_t* element, which would reflect the value +we are supposed to match after applying the mask. + +There can be other circumstances when it might be beneficial to expand +the given rule in the original ACL into multiple - for example, as an +optimization within the port range handling for small port ranges +(this is not done as of the time of writing). + +Assigning ACLs to an interface +------------------------------ + +Once the ACL list is assigned to an interface, or, rather, a new ACL +is added to the list of the existing ACLs applied to the interface, +we need to update the bihash accelerating the lookup. + +All the entries for the lookups are stored within a single *48_8* bihash, +which captures the 5-tuple from the packet as well as the miscellaneous +per-packet information flags, e.g. *l4_valid*, *is_non_first_fragment*, +and so on. To facilitate the use of the single bihash by all the interfaces, +the *is_ip6*, *is_input*, *sw_if_index* are part of the key, +as well as *mask_type_index* - the latter being necessary because +there can be entries with the same value but different masks, e.g.: +`permit ::/0, permit::/128`. + +At the moment of an ACL being applied to an interface, we need to +walk the list of *hash_ace_info_t* entries corresponding to that ACL, +and update the bihash with the keys corresponding to the match +values in these entries. + +The value of the hash match contains the index into a per-*sw_if_index* vector +of *applied_ace_hash_entry_t* elements, as well as a couple of flags: +*shadowed* (optimization: if this flag on a matched entry is zero, means +we can stop the lookup early and declare a match - see below), +and *need_portrange_check* - meaning that what matched was a superset +of the actual match, and we need to perform an extra check. + +Also, upon insertion, we must keep in mind there can be +multiple *applied_ace_hash_entry_t* for the same key and must keep +a list of those. This is necessary to incrementally apply/unapply +the ACLs as part of the ACL vector: say, two ACLs have +"permit 2001:db8::1/128 any" - we should be able to retain the entry +for the second ACL even if we have deleted the first one. +Also, in case there are two entries with the same key but +different port ranges, say 0..42 and 142..65535 - we need +to be able to sequentially match on those if we decide not +to expand them into individual port-specific entries. + +Per-packet lookup +----------------- + +The simple single-packet lookup is defined in +*multi_acl_match_get_applied_ace_index*, which returns the index +of the applied hash ACE if there was a match, or ~0 if there wasn't. + +The future optimized per-packet lookup may be batched in three phases: + +1. Prepare the keys in the per-worker vector by doing logical AND of + original 5-tuple record with the elements of the mask vector. +2. Lookup the keys in the bihash in a batch manner, collecting the + result with lowest u64 (acl index within vector, ACE index) from + the hash lookup value, and performing the list walk if necessary + (for portranges) +3. Take the action from the ACL record as defined by (ACL#, ACE#) from the + resulting lookup winner, or, if no match found, then perform default deny. + +Shadowed/independent/redundant ACEs +------------------------------------ + +During the phase of combining multiple ACLs into one rulebase, when they +are applied to interface, we also can perform several optimizations. + +If a given ACE is a strict subset of another ACE located up in the linear +search order, we can ignore this ACE completely - because by definition +it will never match. We will call such an ACE *redundant*. Here is an example: + +``` +permit 2001:db8:1::/48 2001:db8:2::/48 (B) +deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) +``` + +A bit more formally, we can define this relationship of an ACE A to ACE B as: + +``` +redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA) + && contains(dstB, dstA) && is_after(A, B)) +``` + +Here as "contains" we define an operation operating on the sets defined by +the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition) +respectively, and returning true if all the elements represented by +the second argument are represented by the first argument. The "is_after" +is true if A is located below B in the ruleset. + +If a given ACE does not intersect at all with any other ACE +in front of it, we can mark it as such. + +Then during the sequence of the lookups the successful hit on this ACE means +we do not need to look up other mask combinations - thus potentially +significantly speeding up the match process. Here is an example, +assuming we have the following ACL: + +``` +permit 2001:db8:1::/48 2001:db8:2::/48 (B) +deny 2001:db8:3::/48 2001:db8:2:1::/64 (A) +``` + +In this case if we match the second entry, we do not need to check whether +we have matched the first one - the source addresses are completely +different. We call such an ACE *independent* from another. + +We can define this as + +``` +independent(aceA, aceB) := (!intersect(protoA, protoB) || + !intersect(srcA, srcB) || + !intersect(dstA, dstB)) +``` + +where intersect is defined as operation returning true if there are +elements belonging to the sets of both arguments. + +If the entry A is neither redundant nor independent from B, and is below +B in the ruleset, we call such an entry *shadowed* by B, here is an example: + +``` +deny tcp 2001:db8:1::/48 2001:db8:2::/48 (B) +permit 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) +``` + +This means the earlier rule "carves out" a subset of A, thus leaving +a "shadow". (Evidently, the action needs to be different for the shadow +to have an effect, but for for the terminology sake we do not care). + +The more formal definition: + +``` +shadowed(aceA, aceB) := !redundante(aceA, aceB) && + !independent(aceA, aceB) && + is_after(aceA, aceB) +``` + +Using this terminology, any ruleset can be represented as +a DAG (Directed Acyclic Graph), with the bottom being the implicit +"deny any", pointing to the set of rules shadowing it or the ones +it is redundant for. + +These rules may in turn be shadowing each other. There is no cycles in +this graph because of the natural order of the rules - the rule located +closer to the end of the ruleset can never shadow or make redundant a rule +higher up. + +The optimization that enables can allow for is to skip matching certain +masks on a per-lookup basis - if a given rule has matched, +the only adjustments that can happen is the match with one of +the shadowing rules. + +Also, another avenue for the optimization can be starting the lookup process +with the mask type that maximizes the chances of the independent ACE match, +thus resulting in an ACE lookup being a single hash table hit. + + +Plumbing +-------- + +All the new routines are located in a separate file, +so we can cleanly experiment with a different approach if this +does not fit all of the use cases. + +The constant-time lookup within the data path has the API with +the same signature as: + +``` +u8 +multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2, + int is_ip6, int is_input, u32 * acl_match_p, + u32 * rule_match_p, u32 * trace_bitmap) +``` + +There should be a new upper-level function with the same signature, which +will make a decision whether to use a linear lookup, or to use the +constant-time lookup implemented by this work, or to add some other +optimizations (e.g. by keeping the cache of the last N lookups). + +The calls to the routine doing preparatory work should happen +in `acl_add_list()` after creating the linear-lookup structures, +and the routine doing the preparatory work populating the hashtable +should be called from `acl_interface_add_del_inout_acl()` or its callees. + +The initial implementation will be geared towards looking up a single +match at a time, with the subsequent optimizations possible to make +the lookup for more than one packet. + diff --git a/src/plugins/acl/acl_multicore_doc.md b/src/plugins/acl/acl_multicore_doc.md new file mode 100644 index 00000000..b2cf7b9c --- /dev/null +++ b/src/plugins/acl/acl_multicore_doc.md @@ -0,0 +1,349 @@ +Multicore support for ACL plugin {#acl_multicore} +================================ + +This captures some considerations and design decisions that I have made, +both for my own memory later on ("what the hell was I thinking?!?"), +and for anyone interested to criticize/improve/hack on this code. + +One of the factors taken into account while making these decisions, +was the relative emphasis on the multi-thread vs. single-thread +use cases: the latter is the vastly more prevalent. But, +one can not optimize the single-thread performance without +having a functioning code for multi-thread. + +stateless ACLs +============== + +The stateless trivially parallelizes, and the only potential for the +race between the different threads is during the reconfiguration, +at the time of replacing the old ACL being checked, with +the new ACL. + +In case an acl_add_replace is being used to replace the rules +within the existing entry, a reallocation of `am->acls[X].rules` +vector will happen and potentially a change in count. + +acl_match_5tuple() has the following code: + +```{.c} + a = am->acls + acl_index; + for (i = 0; i < a->count; i++) + { + r = a->rules + i; + . . . +``` + +Ideally we should be immune from a->rules changing, +but the problem arises if the count changes in flight, +and the new ruleset is smaller - then we will attempt +to "match" against the free memory. + +This can(?) be solved by replacing the for() with while(), +so the comparison happens at each iteration. + +full_acl_match_5tuple(), which iterates over the list +of ACLs, is a bit less immune, since it takes the pointer +to the vector to iterate and keeps a local copy of +that pointer. + +This race can be solved by checking the +current pointer to the vector with the source pointer, +and seeing if there is an (unlikely) change, and if +there is, return the "deny" action, or, better, +restart the check. + +Since the check reloads the ACL list on a per-packet basis, +there is only a window of opportunity of one packet to +"match" packet against an incorrect rule set. +The workers also do not change anything, only read. +Therefore, it looks like building special structures +to ensure that it does not happen at all might be not +worth it. + +At least not until we have a unit-test able to +reliably catch this condition and test that +the measures applied are effective. Adding the code +which is not possible to exercise is worse than +not adding any code at all. + +So, I opt for "do-nothing" here for the moment. + +reflexive ACLs: single-thread +============================= + +Before we talk multi-thread, is worth revisiting the +design of the reflexive ACLs in the plugin, and +the history of their evolution. + +The very first version of the ACL plugin, shipped in +1701, mostly did the job using the existing components +and gluing them together. Because it needed to work +in bridged forwarding path only, using L2 classifier +as an insertion point appeared natural, also L2 classifier, +being a table with sessions, seemed like a good place +to hold the sessions. + +So, the original design had two conceptual nodes: +one, pointed by the next_miss from the L2 classifier table, +was checking the actual ACL, and inserting session into +the L2 classifier table, and the other one, pointed +to by the next_match within the specific session rule, +was checking the existing session. The timing out +of the existing connections was done in the datapath, +by periodically calling the aging function. + +This decision to use the existing components, +with its attrativeness, did bring a few limitations as well: + +* L2 classifier is a simple mask-and-value match, with +a fixed mask across the table. So, sanely supporting IPv6 +packets with extension headers in that framework was impossible. + +* There is no way to get a backpressure from L2 classifier +depending on memory usage. When it runs out of memory, +it simply crashes the box. When it runs out of memory ? +We don't really know. Depends on how it allocates it. + +* Since we need to match the *reflected* traffic, +we had to create *two* full session entries +in two different directions, which is quite wasteful memory-wise. + +* (showstopper): the L2 classifier runs only in +the bridged data path, so supporting routed data path +would require creating something else entirely different, +which would mean much more headaches support-wise going forward. + +Because of that, I have moved to a different model of +creating a session-5-tuple from the packet data - once, +and then doing all the matching just on that 5-tuple. + +This has allowed to add support for skipping IPv6 extension headers. + +Also, this new version started to store the sessions in a dedicated +bihash-per-interface, with the session key data being +aligned for the ingress packets, and being mirrored for the +egress packets. This allows of significant savings in memory, +because now we need to keep only one copy of the session table per +interface instead of two, and also to only have ONE node for all the lookups, +(L2/L3 path, in/out, IPv4/IPv6) - significantly reducing the code complexity. + +Unfortunately, bihash still has the "lack of backpressure" problem, +in a sense that if you try to insert too many entries and run out +of memory in the heap you supplied, you get a crash. + +To somewhat workaround against that, there is a "maximum tested number of sessions" +value, which tracks the currently inserted sessions in the bihash, +and if this number is being approached, a more aggressive cleanup +can happen. If this number is reached, two behaviors are possible: + +* attempt to do the stateless ACL matching and permit the packet + if it succeeds + +* deny the packet + +Currently I have opted for a second one, since it allows for +a better defined behavior, and if you have to permit +the traffic in both directions, why using stateful anyway ? + +In order to be able to do the cleanup, we need to discriminate between +the session types, with each session type having its own idle timeout. +In order to do that, we keep three lists, defined in enum acl_timeout_e: +ACL_TIMEOUT_UDP_IDLE, ACL_TIMEOUT_TCP_IDLE, ACL_TIMEOUT_TCP_TRANSIENT. + +The first one is hopefully obvious - it is just all UDP connections. +They have an idle timeout of 600 seconds. + +The second and third is a bit more subtle. TCP is a complicated protocol, +and we need to tread the fine line between doing too little and doing +too much, and triggering the potential compatibility issues because of +being a "middlebox". + +I decided to split the TCP connections into two classes: +established, and everything else. "Established", means we have seen +the SYN and ACK from both sides (with PUSH obviously masked out). +This is the "active" state of any TCP connection and we would like +to ensure we do not screw it up. So, the connections in this state +have the default idle timer of 24 hours. + +All the rest of the connections have the idle timeout of 2 minutes, +(inspired by an old value of MSL) and based on the observation +that the states this class represent are usually very short lived. + +Once we have these three baskets of connections, it is trivial to +imagine a simple cleanup mechanism to deal with this: take a +TCP transient connection that has been hanging around. + +It is debatable whether we want to do discrimination between the +different TCP transient connections. Assuming we do FIFO (and +the lists allow us to do just that), it means a given connection +on the head of the list has been hanging around for longest. +Thus, if we are short on resources, we might just go ahead and +reuse it within the datapath. + +This is where we are slowly approaching the question +"Why in the world have not you used timer wheel or such ?" + +The answer is simple: within the above constraints, it does +not buy me much. + +Also, timer wheel creates a leaky abstraction with a difficult +to manage corner case. Which corner case ? + +We have a set of objects (sessions) with an event that may +or may not happen (idle timeout timer firing), and a +necessity to reset the idle timeout when there is +activity on the session. + +In the worst case, where we had a 10000 of one-packet +UDP sessions just created 10 minutes ago, we would need +to deal with a spike of 10000 expired timers. + +Of course, if we have the active traffic on all +of these 10000 connections, then we will not have +to deal with that ? Right, but we will still have to deal +with canceling and requeueing the timers. + +In the best possible case, requeueing a timer is +going to be something along the lines of a linked-list +removal and reinsertion. + +However, keep in mind we already need to classify the +connections for reuse, so therefore we already have +the linked lists! + +And if we just check these linked lists periodically in +a FIFO fashion, we can get away with a very simple per-packet operation: +writing back the timestamp of "now" into the connection structure. + +Then rather than requeueing the list on a per-packet or per-frame +basis, we can defer this action until the time this session +appears on the head of the FIFO list, and the cleaning +routine makes the decision about whether to discard +the session (because the interval since last activity is bigger +than the idle timeout), or to requeue the session back to +the end of the list (because the last activity was less +than idle timeout ago). + +So, rather than using the timers, we can simply reuse our classification +FIFOs, with the following heuristic: do not look at the session that was +enqueued at time X until X+session_timeout. If we enqueue the sessions +in the order of their initial activity, then we can simply use enqueue +timestamp of the head session as a decision criterion for when we need +to get back at looking at it for the timeout purposes. + +Since the number of FIFOs is small, we get a slightly worse check +performance than with timers, but still O(1). + +We seemingly do quite a few "useless" operations of requeueing the items +back to the tail of the list - but, these are the operations we do not +have to do in the active data path, so overall it is a win. + +(Diversion: I believe this problem is congruent to poll vs. epoll or +events vs. threads, some reading on this subject: +http://web.archive.org/web/20120225022154/http://sheddingbikes.com/posts/1280829388.html) + +We can also can run a TCP-like scheme for adaptively changing +the wait period in the routine that deals with the connection timeouts: +we can attempt to check the connections a couple of times per second +(same as we would advance the timer wheel), and then if we have requeued +close to a max-per-quantum number of connections, we can half the waiting +interval, and if we did not requeue any, we can slowly increment the waiting +interval - which at a steady state should stabilize similar to what the TCP rate +does. + +reflexive ACLs: multi-thread +============================= + +The single-threaded implementation in 1704 used a separate "cleaner" process +to deal with the timing out of the connections. +It is all good and great when you know that there is only a single core +to run everything on, but the existence of the lists proves to be +a massive difficulty when it comes to operating from multiple threads. + +Initial study shows that with a few assumptions (e.g. that the cleaner running in main thread +and the worker have a demarcation point in time where either one or the other one touches +the session in the list) it might be possible to make it work, but the resulting +trickiness of doing it neatly with all the corner cases is quite large. + +So, for the multi-threaded scenario, we need to move the connection +aging back to the same CPU as its creation. + +Luckily we can do this with the help of the interrupts. + +So, the design is as follows: the aging thread (acl_fa_session_cleaner_process) +periodically fires the interrupts to the workers interrupt nodes (acl_fa_worker_session_cleaner_process_node.index), +using vlib_node_set_interrupt_pending(), and +the interrupt node acl_fa_worker_conn_cleaner_process() calls acl_fa_check_idle_sessions() +which does the actual job of advancing the lists. And within the actual datapath the only thing we will be +doing is putting the items onto FIFO, and updating the last active time on the existing connection. + +The one "delicate" part is that the worker for one leg of the connection might be different from +the worker of another leg of the connection - but, even if the "owner" tries to free the connection, +nothing terrible can happen - worst case the element of the pool (which is nominally free for a short period) +will get the timestamp updated - same thing about the TCP flags seen. + +A slightly trickier issue arises when the packet initially seen by one worker (thus owned by that worker), +and the return packet processed by another worker, and as a result changes the +the class of the connection (e.g. becomes TCP_ESTABLISHED from TCP_TRANSIENT or vice versa). +If the class changes from one with the shorter idle time to the one with the longer idle time, +then unless we are in the starvation mode where the transient connections are recycled, +we can simply do nothing and let the normal requeue mechanism kick in. If the class changes from the longer idle +timer to the shorter idle timer, then we risk keeping the connection around for longer than needed, which +will affect the resource usage. + +One solution to that is to have NxN ring buffers (where N is the number of workers), such that the non-owner +can signal to the owner the connection# that needs to be requeued out of order. + +A simpler solution though, is to ensure that each FIFO's period is equal to that of a shortest timer. +This way the resource starvation problem is taken care of, at an expense of some additional work. + +This all looks sufficiently nice and simple until a skeleton falls out of the closet: +sometimes we want to clean the connections en masse before they expire. + +There few potential scenarios: +1) removal of an ACL from the interface +2) removal of an interface +3) manual action of an operator (in the future). + +In order to tackle this, we need to modify the logic which decides whether to requeue the +connection on the end of the list, or to delete it due to idle timeout: + +We define a point in time, and have each worker thread fast-forward through its FIFO, +in the process looking for sessions that satisfy the criteria, and either keeping them or requeueing them. + +To keep the ease of appearance to the outside world, we still process this as an event +within the connection cleaner thread, but this event handler does as follows: +1) it creates the bitmap of the sw_if_index values requested to be cleared +2) for each worker, it waits to ensure there is no cleanup operation in progress (and if there is one, +it waits), and then makes a copy of the bitmap, sets the per-worker flag of a cleanup operation, and sends an interrupt. +3) wait until all cleanup operations have completed. + +Within the worker interrupt node, we check if the "cleanup in progress" is set, +and if it is, we check the "fast forward time" value. If unset, we initialize it to value now, and compare the +requested bitmap of sw_if_index values (pending_clear_sw_if_index_bitmap) with the bitmap of sw_if_index that this worker deals with. + +(we set the bit in the bitmap every time we enqueue the packet onto a FIFO - serviced_sw_if_index_bitmap in acl_fa_conn_list_add_session). + +If the result of this AND operation is zero - then we can clear the flag of cleanup in progress and return. +Else we kick off the quantum of cleanup, and make sure we get another interrupt ASAP if that cleanup operation returns non-zero, +meaning there is more work to do. +When that operation returns zero, everything has been processed, we can clear the "cleanup-in-progress" flag, and +zeroize the bitmap of sw_if_index-es requested to be cleaned. + +The interrupt node signals its wish to receive an interrupt ASAP by setting interrupt_is_needed +flag within the per-worker structure. The main thread, while waiting for the +cleanup operation to complete, checks if there is a request for interrupt, +and if there is - it sends one. + +This approach gives us a way to mass-clean the connections which is reusing the code of the regular idle +connection cleanup. + +One potential inefficiency is the bitmap values set by the session insertion +in the data path - there is nothing to clear them. + +So, if one rearranges the interface placement with the workers, then the cleanups will cause some unnecessary work. +For now, we consider it an acceptable limitation. It can be resolved by having another per-worker bitmap, which, when set, +would trigger the cleanup of the bits in the serviced_sw_if_index_bitmap). + +=== the end === + diff --git a/src/plugins/acl/hash_lookup.md b/src/plugins/acl/hash_lookup.md deleted file mode 100644 index 95524643..00000000 --- a/src/plugins/acl/hash_lookup.md +++ /dev/null @@ -1,241 +0,0 @@ -ACL plugin constant-time lookup design -====================================== - -The initial implementation of ACL plugin performs a trivial for() cycle, -going through the assigned ACLs on a per-packet basis. This is not very -efficient, even if for very short ACLs due to its simplicity it can beat -more advanced methods. - -However, to cover the case of longer ACLs with acceptable performance, -we need to have a better way of matching. This write-up proposes -a mechanism to make a lookup from O(M) where M is number of entries -to O(N) where N is number of different mask combinations. - -Preparation of ACL(s) ---------------------- - -The ACL plugin will maintain a global list of "mask types", i.e. the specific -configurations of "do not care" bits within the ACEs. -Upon the creation of a new ACL, a pass will be made through all the -ACEs, to assign and possibly allocate the "mask type number". - -Each ACL has a structure *hash_acl_info_t* representing the "hash-based" -parts of information related to that ACL, primarily the array of -*hash_ace_info_t* structures - each of the members of that array -corresponding to one of the rules (ACEs) in the original ACL, -for this they have a pair of *(acl_index, ace_index)* to keep track, -predominantly for the debugging. - -Why do we need a whole separate structure, and are not adding new fields -to the existing rile structure ? First, encapsulation, to minimize -the pollution of the main ACL code with the hash-based lookup artifacts. - -Second, one rule may correspond to more than one "hash-based" ACE. -In fact, most of the rules do correspond to two of those. Why ? - -Consider that the current ACL lookup logic is that if a packet -is not the initial fragment, and there is an L4 entry acting on the packet, -the comparison will be made only on the L4 protocol field value rather -than on the protocol and port values. This beaviour is governed by -*l4_match_nonfirst_fragment* flag in the *acl_main*, and was needed to -maintain the compatibility with the existing software switch implementation. - -While for the sequential check in *single_acl_match_5tuple()* -it is very easy to implement by just breaking out at the right moment, -in case of hash-based matching this cost us two checks: -one on full 5-tuple and the flag *pkt.is_nonfirst_fragment* being zero, -the second on 3-tuple and the flag *pkt.is_nonfirst_fragment* being one, -with the second check triggered by the *acl_main.l4_match_nonfirst_fragment* -setting being the default 1. This dictates the necessity of having a "match" -field in a given *hash_ace_info_t* element, which would reflect the value -we are supposed to match after applying the mask. - -There can be other circumstances when it might be beneficial to expand -the given rule in the original ACL into multiple - for example, as an -optimization within the port range handling for small port ranges -(this is not done as of the time of writing). - -Assigning ACLs to an interface ------------------------------- - -Once the ACL list is assigned to an interface, or, rather, a new ACL -is added to the list of the existing ACLs applied to the interface, -we need to update the bihash accelerating the lookup. - -All the entries for the lookups are stored within a single *48_8* bihash, -which captures the 5-tuple from the packet as well as the miscellaneous -per-packet information flags, e.g. *l4_valid*, *is_non_first_fragment*, -and so on. To facilitate the use of the single bihash by all the interfaces, -the *is_ip6*, *is_input*, *sw_if_index* are part of the key, -as well as *mask_type_index* - the latter being necessary because -there can be entries with the same value but different masks, e.g.: -`permit ::/0, permit::/128`. - -At the moment of an ACL being applied to an interface, we need to -walk the list of *hash_ace_info_t* entries corresponding to that ACL, -and update the bihash with the keys corresponding to the match -values in these entries. - -The value of the hash match contains the index into a per-*sw_if_index* vector -of *applied_ace_hash_entry_t* elements, as well as a couple of flags: -*shadowed* (optimization: if this flag on a matched entry is zero, means -we can stop the lookup early and declare a match - see below), -and *need_portrange_check* - meaning that what matched was a superset -of the actual match, and we need to perform an extra check. - -Also, upon insertion, we must keep in mind there can be -multiple *applied_ace_hash_entry_t* for the same key and must keep -a list of those. This is necessary to incrementally apply/unapply -the ACLs as part of the ACL vector: say, two ACLs have -"permit 2001:db8::1/128 any" - we should be able to retain the entry -for the second ACL even if we have deleted the first one. -Also, in case there are two entries with the same key but -different port ranges, say 0..42 and 142..65535 - we need -to be able to sequentially match on those if we decide not -to expand them into individual port-specific entries. - -Per-packet lookup ------------------ - -The simple single-packet lookup is defined in -*multi_acl_match_get_applied_ace_index*, which returns the index -of the applied hash ACE if there was a match, or ~0 if there wasn't. - -The future optimized per-packet lookup may be batched in three phases: - -1. Prepare the keys in the per-worker vector by doing logical AND of - original 5-tuple record with the elements of the mask vector. -2. Lookup the keys in the bihash in a batch manner, collecting the - result with lowest u64 (acl index within vector, ACE index) from - the hash lookup value, and performing the list walk if necessary - (for portranges) -3. Take the action from the ACL record as defined by (ACL#, ACE#) from the - resulting lookup winner, or, if no match found, then perform default deny. - -Shadowed/independent/redundant ACEs ------------------------------------- - -During the phase of combining multiple ACLs into one rulebase, when they -are applied to interface, we also can perform several optimizations. - -If a given ACE is a strict subset of another ACE located up in the linear -search order, we can ignore this ACE completely - because by definition -it will never match. We will call such an ACE *redundant*. Here is an example: - -``` -permit 2001:db8:1::/48 2001:db8:2::/48 (B) -deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) -``` - -A bit more formally, we can define this relationship of an ACE A to ACE B as: - -``` -redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA) - && contains(dstB, dstA) && is_after(A, B)) -``` - -Here as "contains" we define an operation operating on the sets defined by -the protocol, (srcIP, srcPortDefinition) and (dstIP, dstPortDefinition) -respectively, and returning true if all the elements represented by -the second argument are represented by the first argument. The "is_after" -is true if A is located below B in the ruleset. - -If a given ACE does not intersect at all with any other ACE -in front of it, we can mark it as such. - -Then during the sequence of the lookups the successful hit on this ACE means -we do not need to look up other mask combinations - thus potentially -significantly speeding up the match process. Here is an example, -assuming we have the following ACL: - -``` -permit 2001:db8:1::/48 2001:db8:2::/48 (B) -deny 2001:db8:3::/48 2001:db8:2:1::/64 (A) -``` - -In this case if we match the second entry, we do not need to check whether -we have matched the first one - the source addresses are completely -different. We call such an ACE *independent* from another. - -We can define this as - -``` -independent(aceA, aceB) := (!intersect(protoA, protoB) || - !intersect(srcA, srcB) || - !intersect(dstA, dstB)) -``` - -where intersect is defined as operation returning true if there are -elements belonging to the sets of both arguments. - -If the entry A is neither redundant nor independent from B, and is below -B in the ruleset, we call such an entry *shadowed* by B, here is an example: - -``` -deny tcp 2001:db8:1::/48 2001:db8:2::/48 (B) -permit 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A) -``` - -This means the earlier rule "carves out" a subset of A, thus leaving -a "shadow". (Evidently, the action needs to be different for the shadow -to have an effect, but for for the terminology sake we do not care). - -The more formal definition: - -``` -shadowed(aceA, aceB) := !redundante(aceA, aceB) && - !independent(aceA, aceB) && - is_after(aceA, aceB) -``` - -Using this terminology, any ruleset can be represented as -a DAG (Directed Acyclic Graph), with the bottom being the implicit -"deny any", pointing to the set of rules shadowing it or the ones -it is redundant for. - -These rules may in turn be shadowing each other. There is no cycles in -this graph because of the natural order of the rules - the rule located -closer to the end of the ruleset can never shadow or make redundant a rule -higher up. - -The optimization that enables can allow for is to skip matching certain -masks on a per-lookup basis - if a given rule has matched, -the only adjustments that can happen is the match with one of -the shadowing rules. - -Also, another avenue for the optimization can be starting the lookup process -with the mask type that maximizes the chances of the independent ACE match, -thus resulting in an ACE lookup being a single hash table hit. - - -Plumbing --------- - -All the new routines are located in a separate file, -so we can cleanly experiment with a different approach if this -does not fit all of the use cases. - -The constant-time lookup within the data path has the API with -the same signature as: - -``` -u8 -multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2, - int is_ip6, int is_input, u32 * acl_match_p, - u32 * rule_match_p, u32 * trace_bitmap) -``` - -There should be a new upper-level function with the same signature, which -will make a decision whether to use a linear lookup, or to use the -constant-time lookup implemented by this work, or to add some other -optimizations (e.g. by keeping the cache of the last N lookups). - -The calls to the routine doing preparatory work should happen -in `acl_add_list()` after creating the linear-lookup structures, -and the routine doing the preparatory work populating the hashtable -should be called from `acl_interface_add_del_inout_acl()` or its callees. - -The initial implementation will be geared towards looking up a single -match at a time, with the subsequent optimizations possible to make -the lookup for more than one packet. - diff --git a/src/vlibapi/api_doc.md b/src/vlibapi/api_doc.md index e620ee12..2e7ae09a 100644 --- a/src/vlibapi/api_doc.md +++ b/src/vlibapi/api_doc.md @@ -6,12 +6,12 @@ APIs. Messages are defined in `*.api` files. Today, there are about 50 api files, with more arriving as folks add programmable features. The API file compiler -sources reside in @ref src/tools/vppapigen . +sources reside in @ref src/tools/vppapigen. -Here's a typical request/response message definition, from -@ref src/vnet/interface.api : +From @ref src/vnet/interface.api, here's a typical request/response message +definition: -``` +```{.c} autoreply define sw_interface_set_flags { u32 client_index; @@ -22,10 +22,10 @@ Here's a typical request/response message definition, from }; ``` -To a first approximation, the API compiler renders this definition as -follows: +To a first approximation, the API compiler renders this definition into +`build-root/.../vpp/include/vnet/interface.api.h` as follows: -``` +```{.c} /****** Message ID / handler enum ******/ #ifdef vl_msg_id vl_msg_id(VL_API_SW_INTERFACE_SET_FLAGS, vl_api_sw_interface_set_flags_t_handler) @@ -60,10 +60,13 @@ follows: u32 context; i32 retval; }) vl_api_sw_interface_set_flags_reply_t; + + ... + #endif /* vl_typedefs */ ``` To change the admin state of an interface, a binary api client sends a -@ref vl_api_sw_interface_set_flags_t to vpp, which will respond with a +@ref vl_api_sw_interface_set_flags_t to VPP, which will respond with a @ref vl_api_sw_interface_set_flags_reply_t message. Multiple layers of software, transport types, and shared libraries @@ -76,7 +79,7 @@ implement a variety of features: message handlers. Correctly-coded message handlers know nothing about the transport used to -deliver messages to/from vpp. It's reasonably straighforward to use multiple +deliver messages to/from VPP. It's reasonably straighforward to use multiple API message transport types simultaneously. For historical reasons, binary api messages are (putatively) sent in network @@ -90,12 +93,12 @@ Since binary API messages are always processed in order, we allocate messages using a ring allocator whenever possible. This scheme is extremely fast when compared with a traditional memory allocator, and doesn't cause heap fragmentation. See -@ref src/vlibmemory/memory_shared.c @ref vl_msg_api_alloc_internal() . +@ref src/vlibmemory/memory_shared.c @ref vl_msg_api_alloc_internal(). Regardless of transport, binary api messages always follow a @ref msgbuf_t header: -``` +```{.c} typedef struct msgbuf_ { unix_shared_memory_queue_t *q; @@ -109,7 +112,7 @@ This structure makes it easy to trace messages without having to decode them - simply save data_len bytes - and allows @ref vl_msg_api_free() to rapidly dispose of message buffers: -``` +```{.c} void vl_msg_api_free (void *a) { @@ -130,34 +133,34 @@ decode them - simply save data_len bytes - and allows return; } - } + } ``` ## Message Tracing and Replay -It's extremely important that vpp can capture and replay sizeable binary API +It's extremely important that VPP can capture and replay sizeable binary API traces. System-level issues involving hundreds of thousands of API transactions can be re-run in a second or less. Partial replay allows one to binary-search for the point where the wheels fall off. One can add scaffolding to the data plane, to trigger when complex conditions obtain. With binary API trace, print, and replay, system-level bug reports of the form -"after 300,000 API transactions, the vpp data-plane stopped forwarding +"after 300,000 API transactions, the VPP data-plane stopped forwarding traffic, FIX IT!" can be solved offline. More often than not, one discovers that a control-plane client misprograms the data plane after a long time or under complex circumstances. Without direct evidence, "it's a data-plane problem!" -See @ref src/vlibmemory/memory_vlib.c @ref vl_msg_api_process_file() , -and @ref src/vlibapi/api_shared.c . See also the debug CLI command "api trace" +See @ref src/vlibmemory/memory_vlib.c @ref vl_msg_api_process_file(), +and @ref src/vlibapi/api_shared.c. See also the debug CLI command "api trace" ## Client connection details -Establishing a binary API connection to vpp from a C-language client +Establishing a binary API connection to VPP from a C-language client is easy: -``` +```{.c} int connect_to_vpe (char *client_name, int client_message_queue_length) { @@ -176,9 +179,9 @@ is easy: } ``` -32 is a typical value for client_message_queue_length. Vpp cannot +32 is a typical value for client_message_queue_length. VPP cannot block when it needs to send an API message to a binary API client, and -the vpp-side binary API message handlers are very fast. When sending +the VPP-side binary API message handlers are very fast. When sending asynchronous messages, make sure to scrape the binary API rx ring with some enthusiasm. @@ -187,7 +190,7 @@ some enthusiasm. Calling @ref vl_client_connect_to_vlib spins up a binary API message RX pthread: -``` +```{.c} static void * rx_thread_fn (void *arg) { @@ -214,31 +217,31 @@ To handle the binary API message queue yourself, use @ref vl_client_connect_to_vlib_no_rx_pthread. In turn, vl_msg_api_queue_handler(...) uses mutex/condvar signalling -to wake up, process vpp -> client traffic, then sleep. Vpp supplies a -condvar broadcast when the vpp -> client API message queue transitions +to wake up, process VPP -> client traffic, then sleep. VPP supplies a +condvar broadcast when the VPP -> client API message queue transitions from empty to nonempty. -Vpp checks its own binary API input queue at a very high rate. Vpp +VPP checks its own binary API input queue at a very high rate. VPP invokes message handlers in "process" context [aka cooperative multitasking thread context] at a variable rate, depending on data-plane packet processing requirements. ## Client disconnection details -To disconnect from vpp, call @ref vl_client_disconnect_from_vlib -. Please arrange to call this function if the client application -terminates abnormally. Vpp makes every effort to hold a decent funeral -for dead clients, but vpp can't guarantee to free leaked memory in the +To disconnect from VPP, call @ref vl_client_disconnect_from_vlib. +Please arrange to call this function if the client application +terminates abnormally. VPP makes every effort to hold a decent funeral +for dead clients, but VPP can't guarantee to free leaked memory in the shared binary API segment. -## Sending binary API messages to vpp +## Sending binary API messages to VPP -The point of the exercise is to send binary API messages to vpp, and -to receive replies from vpp. Many vpp binary APIs comprise a client +The point of the exercise is to send binary API messages to VPP, and +to receive replies from VPP. Many VPP binary APIs comprise a client request message, and a simple status reply. For example, to set the admin status of an interface, one codes: -``` +```{.c} vl_api_sw_interface_set_flags_t *mp; mp = vl_msg_api_alloc (sizeof (*mp)); @@ -262,9 +265,9 @@ Key points: network byte order * The client-library global data structure @ref api_main keeps track - of sufficient pointers and handles used to communicate with vpp + of sufficient pointers and handles used to communicate with VPP -## Receiving binary API messages from vpp +## Receiving binary API messages from VPP Unless you've made other arrangements (see @ref vl_client_connect_to_vlib_no_rx_pthread), *messages are received on a @@ -273,7 +276,7 @@ thread is the responsibility of the application! Set up message handlers about as follows: -``` +```{.c} #define vl_typedefs /* define message structures */ #include #undef vl_typedefs @@ -319,7 +322,7 @@ vectors in the @ref api_main_t structure. As of this writing: not all vector element values can be set through the API. You'll see sporadic API message registrations followed by minor adjustments of this form: -``` +```{.c} /* * Thread-safe API messages */ -- cgit 1.2.3-korg