Network Working Group                                    S. Previdi, Ed.
Internet-Draft                                               C. Filsfils
Intended status: Standards Track                     Cisco Systems, Inc.
Expires: August 5, 2017                                         B. Field
                                                                 Comcast
                                                                I. Leung
                                                   Rogers Communications
                                                              J. Linkova
                                                                  Google
                                                                E. Aries
                                                                Facebook
                                                               T. Kosugi
                                                                     NTT
                                                               E. Vyncke
                                                     Cisco Systems, Inc.
                                                               D. Lebrun
                                        Universite Catholique de Louvain
                                                        February 1, 2017


                   IPv6 Segment Routing Header (SRH)
               draft-ietf-6man-segment-routing-header-05

Abstract

   Segment Routing (SR) allows a node to steer a packet through a controlled set of instructions, called segments, by prepending an SR header to the packet. A segment can represent any instruction, topological or service-based. SR allows a flow to be steered through any path (topological, or application/service based) while maintaining per-flow state only at the ingress node to the SR domain.

   Segment Routing can be applied to the IPv6 data plane with the addition of a new type of Routing Extension Header. This draft describes the Segment Routing Extension Header Type and how it is used by SR-capable nodes.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Previdi, et al.          Expires August 5, 2017                 [Page 1]

Internet-Draft     IPv6 Segment Routing Header (SRH)       February 2017

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
   This Internet-Draft will expire on August 5, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Segment Routing Documents
   2.  Introduction
     2.1.  Data Planes supporting Segment Routing
     2.2.  Segment Routing (SR) Domain
       2.2.1.  SR Domain in a Service Provider Network
       2.2.2.  SR Domain in an Overlay Network
   3.  Segment Routing Extension Header (SRH)
     3.1.  SRH TLVs
       3.1.1.  Ingress Node TLV
       3.1.2.  Egress Node TLV
       3.1.3.  Opaque Container TLV
       3.1.4.  Padding TLV
       3.1.5.  HMAC TLV
     3.2.  SRH and RFC2460 behavior
   4.  SRH Procedures
     4.1.  Source SR Node
     4.2.  Transit Node
     4.3.  SR Segment Endpoint Node
   5.  Security Considerations
     5.1.  Threat model
       5.1.1.  Source routing threats
       5.1.2.  Applicability of RFC 5095 to SRH
       5.1.3.  Service stealing threat
       5.1.4.  Topology disclosure
       5.1.5.  ICMP Generation
     5.2.  Security fields in SRH
       5.2.1.  Selecting a hash algorithm
       5.2.2.  Performance impact of HMAC
       5.2.3.  Pre-shared key management
     5.3.  Deployment Models
       5.3.1.  Nodes within the SR domain
       5.3.2.  Nodes outside of the SR domain
       5.3.3.  SR path exposure
       5.3.4.  Impact of BCP-38
   6.  IANA Considerations
   7.  Manageability Considerations
   8.  Contributors
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Segment Routing Documents

   Segment Routing terminology is defined in [I-D.ietf-spring-segment-routing].

   Segment Routing use cases are described in [RFC7855] and [I-D.ietf-spring-ipv6-use-cases].

   Segment Routing protocol extensions are defined in [I-D.ietf-isis-segment-routing-extensions] and [I-D.ietf-ospf-ospfv3-segment-routing-extensions].

2.  Introduction

   Segment Routing (SR), defined in [I-D.ietf-spring-segment-routing], allows a node to steer a packet through a controlled set of instructions, called segments, by prepending an SR header to the packet. A segment can represent any instruction, topological or service-based. SR allows a flow to be steered through any path (topological or service/application based) while maintaining per-flow state only at the ingress node to the SR domain.

   Segments can be derived from different components: IGP, BGP, Services, Contexts, Locators, etc. The list of segments forming the path is called the Segment List and is encoded in the packet header.

   SR allows the use of strict and loose source-based routing paradigms without requiring any additional signaling protocols in the infrastructure, hence delivering excellent scalability.

   The source-based routing model described in [I-D.ietf-spring-segment-routing] is inherited from the ones proposed by [RFC1940] and [RFC2460]. The source-based routing model supports explicit routing.

2.1.  Data Planes supporting Segment Routing

   Segment Routing (SR) can be instantiated over MPLS ([I-D.ietf-spring-segment-routing-mpls]) and IPv6. This document defines its instantiation over the IPv6 data plane based on the use cases defined in [I-D.ietf-spring-ipv6-use-cases].

   This document defines a new type of Routing Header (originally defined in [RFC2460]) called the Segment Routing Header (SRH) in order to convey the Segment List in the packet header as defined in [I-D.ietf-spring-segment-routing]. Mechanisms through which segments are known and advertised are outside the scope of this document.

   A segment is materialized by an IPv6 address. A segment identifies a topological instruction or a service instruction.
   A segment can be either:

   o  global: a global segment represents an instruction supported by all nodes in the SR domain, and it is instantiated through an IPv6 address globally known in the SR domain.

   o  local: a local segment represents an instruction supported only by the node that originates it, and it is instantiated through an IPv6 address that is known only by the local node.

2.2.  Segment Routing (SR) Domain

   We define the concept of the Segment Routing Domain (SR Domain) as the set of nodes participating in the source-based routing model. These nodes may be connected to the same physical infrastructure (e.g., a Service Provider's network) or may be remotely connected to each other (e.g., an enterprise VPN or an overlay).

   A non-exhaustive list of examples of SR Domains is:

   o  The network of an operator, service provider, content provider, or enterprise, including nodes, links, and Autonomous Systems.

   o  A set of nodes connected as an overlay over one or more transit providers. The overlay nodes exchange SR-enabled traffic with segments belonging solely to the overlay routers (the SR domain). None of the segments in the SR-enabled packets exchanged by the overlay belong to the transit networks.

   The source-based routing model, through its instantiation of the Segment Routing Header (SRH) defined in this document, equally applies to all the above examples.

   It is assumed in this document that the SRH is added to the packet by its source, consistent with the source routing model defined in [RFC2460]. For example:

   o  At the node originating the packet (host, server).

   o  At the ingress node of an SR domain, where the ingress node receives an IPv6 packet and encapsulates it into an outer IPv6 header followed by a Segment Routing header.

2.2.1.  SR Domain in a Service Provider Network

   The following figure illustrates an SR domain consisting of an operator's network infrastructure.

   [Figure: the Operator 1 network, made of three interconnected ASes: AS 1 (nodes 11, 13, 14, 15, 17, 18), AS 2 (nodes 21 to 28) and AS 3 (nodes 31, 32, 34, 35, 36, 38). External endpoints A1, A2 and A3 attach to AS 1; Z1, Z2 and Z3 attach to AS 3.]

                  Figure 1: Service Provider SR Domain

   Figure 1 describes an operator network including several ASes and delivering connectivity between endpoints. In this scenario, Segment Routing is used within the operator networks and across the AS boundaries (all being under the control of the same operator). In this case, Segment Routing can be used to address use cases such as end-to-end traffic engineering, fast re-route, egress peer engineering, and data-center traffic engineering, as described in [RFC7855], [I-D.ietf-spring-ipv6-use-cases], and [I-D.ietf-spring-resiliency-use-cases].

   Typically, an IPv6 packet received at ingress (i.e., from outside the SR domain) is classified according to network operator policies, and this classification results in an outer header with an SRH being applied to the incoming packet. The SRH contains the list of segments representing the path the packet must take inside the SR domain.
   Thus, the SA of the packet is the ingress node, the DA (due to the SRH procedures described in Section 4) is set to the first segment of the path, and the last segment of the path is the egress node of the SR domain.

   The path may include intra-AS as well as inter-AS segments. Note that all nodes within the SR domain are under the control of the same administration.

   When the packet reaches the egress point of the SR domain, the outer header and its SRH are removed so that the destination of the packet is unaware of the SR domain the packet has traversed.

   The outer header with the SRH is no different from any other tunneling encapsulation mechanism and allows a network operator to implement traffic engineering mechanisms so as to efficiently steer traffic across its infrastructure.

2.2.2.  SR Domain in an Overlay Network

   The following figure illustrates an SR domain consisting of an overlay network over multiple operators' networks.

   [Figure: three operator networks in sequence: Operator 1 (nodes 11, 13, 14, 15, 17, 18), Operator 2 (nodes 21 to 28) and Operator 3 (nodes 31, 32, 34, 35, 36, 38). Overlay nodes A1, A2 and A3 attach to Operator 1; B1, B2 and B3 attach to Operator 2; C1, C2 and C3 attach to Operator 3.]

                      Figure 2: Overlay SR Domain

   Figure 2 describes an overlay consisting of nodes connected to three different network operators and forming a single overlay network where Segment Routing packets are exchanged.

   The overlay consists of nodes A1, A2, A3, B1, B2, B3, C1, C2, and C3. These nodes are connected to their respective network operator and form an overlay network.
   Each node may originate packets with an SRH which contains, in the segment list of the SRH or in the DA, segments identifying other overlay nodes. This implies that packets with an SRH may traverse operators' networks, but these SRHs cannot contain an address/segment of transit operators 1, 2, and 3. The SRH originated by the overlay can only contain addresses/segments under the administration of the overlay (e.g., addresses/segments supported by A1, A2, A3, B1, B2, B3, C1, C2, or C3).

   In this model, the operator network nodes are transit nodes and, according to [RFC2460], MUST NOT inspect the routing extension header since they are not the DA of the packet. It is common practice in operators' networks to filter out, at ingress, any packet whose DA is the address of an internal node, and an operator may also filter out any packet destined to an internal address that carries an extension header.

   This common practice does not impact the SR-enabled traffic between the overlay nodes, as the intermediate transit networks never see a destination address belonging to their infrastructure. These SR-enabled overlay packets will thus never be filtered by the transit operators.

   In all cases, transit packets (i.e., packets whose DA is outside the domain of the operator's network) will be forwarded accordingly without introducing any security concern in the operator's network. This is similar to tunneled packets.

3.  Segment Routing Extension Header (SRH)

   A new type of Routing Header (originally defined in [RFC2460]) is defined: the Segment Routing Header (SRH), which has a new Routing Type (suggested value 4) to be assigned by IANA.

   The Segment Routing Header (SRH) is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Header   |  Hdr Ext Len  | Routing Type  | Segments Left |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | First Segment |     Flags     |           RESERVED            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |            Segment List[0] (128 bits IPv6 address)            |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
                                  ...
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |            Segment List[n] (128 bits IPv6 address)            |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                                                             //
   //        Optional Type Length Value objects (variable)        //
   //                                                             //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   o  Next Header: 8-bit selector. Identifies the type of header immediately following the SRH.

   o  Hdr Ext Len: 8-bit unsigned integer. Length of the SRH in 8-octet units, not including the first 8 octets.

   o  Routing Type: TBD, to be assigned by IANA (suggested value: 4).

   o  Segments Left: Defined in [RFC2460], it contains the index, in the Segment List, of the next segment to inspect. Segments Left is decremented at each segment.

   o  First Segment: contains the index, in the Segment List, of the first segment of the path, which is in fact the last element of the Segment List.

   o  Flags: 8 bits of flags. The following flags are defined:

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |U|P|O|A|H|  U  |
         +-+-+-+-+-+-+-+-+

         U: Unused and for future use. SHOULD be unset on transmission and MUST be ignored on receipt.

         P-flag: Protected flag. Set when the packet has been rerouted through an FRR mechanism by an SR endpoint node.

         O-flag: OAM flag. When set, it indicates that this packet is an operations and management (OAM) packet.

         A-flag: Alert flag. If set, it indicates that important Type Length Value (TLV) objects are present. See Section 3.1 for details on TLV objects.

         H-flag: HMAC flag. If set, the HMAC TLV is present and is encoded as the last TLV of the SRH. In other words, the last 36 octets of the SRH represent the HMAC information. See Section 3.1.5 for details on the HMAC TLV.

   o  RESERVED: SHOULD be unset on transmission and MUST be ignored on receipt.

   o  Segment List[n]: 128-bit IPv6 address representing the nth segment in the Segment List. The Segment List is encoded starting from the last segment of the path. That is, the first element of the Segment List (Segment List[0]) contains the last segment of the path, while the last element of the Segment List (Segment List[n]) contains the first segment of the path. The index contained in "Segments Left" identifies the current active segment.

   o  Type Length Value (TLV) objects are described in Section 3.1.

3.1.  SRH TLVs

   This section defines TLVs of the Segment Routing Header.

   Type Length Value (TLV) objects contain optional information that may be used by the node identified in the DA of the packet. Note that the information carried in the TLVs is not intended to be used by the routing layer. Typically, TLVs carry information that is consumed by components other than the routing function (e.g., OAM).

   Each TLV has its own length, format, and semantics. The code point allocated (by IANA) to each TLV defines both the format and the semantics of the information carried in the TLV. Multiple TLVs may be encoded in the same SRH.

   The "Length" field of the TLV is primarily used to skip the TLV while inspecting the SRH in case the node doesn't support or recognize the TLV code point.
   The "Length" field defines the TLV length in octets, not including the "Type" and "Length" fields.

   The primary purpose of TLVs is to give the receiver of the packet information related to the source-routed path (e.g., where the packet entered the SR domain and where it is expected to exit).

   Additional TLVs may be defined in the future.

3.1.1.  Ingress Node TLV

   The Ingress Node TLV is optional and identifies the node this packet traversed when it entered the SR domain. The Ingress Node TLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |   RESERVED    |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   Ingress Node (16 octets)                    |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   o  Type: to be assigned by IANA (suggested value 1).

   o  Length: 18.

   o  RESERVED: 8 bits. SHOULD be unset on transmission and MUST be ignored on receipt.

   o  Flags: 8 bits. No flags are defined in this document.

   o  Ingress Node: 128 bits. Defines the node where the packet is expected to enter the SR domain. In the encapsulation case described in Section 2.2.1, this information corresponds to the SA of the encapsulating header.

3.1.2.  Egress Node TLV

   The Egress Node TLV is optional and identifies the node this packet is expected to traverse when exiting the SR domain. The Egress Node TLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |   RESERVED    |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Egress Node (16 octets)                    |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   o  Type: to be assigned by IANA (suggested value 2).
   o  Length: 18.

   o  RESERVED: 8 bits. SHOULD be unset on transmission and MUST be ignored on receipt.

   o  Flags: 8 bits. No flags are defined in this document.

   o  Egress Node: 128 bits. Defines the node where the packet is expected to exit the SR domain. In the encapsulation case described in Section 2.2.1, this information corresponds to the last segment of the SRH in the encapsulating header.

3.1.3.  Opaque Container TLV

   The Opaque Container TLV is optional and has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |   RESERVED    |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 Opaque Container (16 octets)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   o  Type: to be assigned by IANA (suggested value 3).

   o  Length: 18.

   o  RESERVED: 8 bits. SHOULD be unset on transmission and MUST be ignored on receipt.

   o  Flags: 8 bits. No flags are defined in this document.

   o  Opaque Container: 128 bits of opaque data not relevant for the routing layer. Typically, this information is consumed by a non-routing component of the node receiving the packet (i.e., the node in the DA).

3.1.4.  Padding TLV

   The Padding TLV is optional and is used to align the SRH on an 8-octet boundary. The Padding TLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |      Padding (variable)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                     Padding (variable)                      //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   o  Type: to be assigned by IANA (suggested value 4).

   o  Length: 1 to 7.

   o  Padding: from 1 to 7 octets of padding. Padding bits have no semantics. They SHOULD be set to 0 on transmission and MUST be ignored on receipt.

   The following applies to the Padding TLV:

   o  The Padding TLV is optional and MAY only appear once in the SRH. If present, it MUST have a length between 1 and 7 octets.

   o  The Padding TLV is used in order to align the SRH total length on an 8-octet boundary.

   o  When present, the Padding TLV MUST appear as the last TLV before the HMAC TLV (if the HMAC TLV is present).

   o  When present, the Padding TLV MUST have a length from 1 to 7 in order to align the SRH total length on an 8-octet boundary.

   o  When a router inspecting the SRH encounters the Padding TLV, it MUST assume that no
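   The field and TLV rules above can be illustrated with a short C sketch showing how a receiver might locate the active segment and walk the TLVs of an SRH, and how a sender might size the Padding TLV. It is not part of this specification and is only a sketch under stated assumptions: the helper names (srh_total_len, srh_tlv_offset, srh_fits, srh_active_segment, srh_walk_tlvs, srh_padding_len, build_example_srh) are hypothetical; the Type values 1 and 4 are only the suggested, not yet IANA-assigned, code points; and the sketch assumes the TLVs start immediately after Segment List[First Segment] and that every TLV, including the HMAC TLV, can be skipped using its Length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRH_FIXED_LEN   8  /* Next Header through RESERVED */
#define SRH_SEGMENT_LEN 16 /* one 128-bit IPv6 address per segment */
#define TLV_INGRESS     1  /* assumed: suggested Ingress Node TLV type */
#define TLV_PADDING     4  /* assumed: suggested Padding TLV type */

/* Total SRH length in octets: Hdr Ext Len excludes the first 8 octets. */
static size_t srh_total_len(const uint8_t *srh)
{
    return ((size_t)srh[1] + 1) * 8;
}

/* Offset of the first TLV: fixed part plus First Segment + 1 segments. */
static size_t srh_tlv_offset(const uint8_t *srh)
{
    return SRH_FIXED_LEN + ((size_t)srh[4] + 1) * SRH_SEGMENT_LEN;
}

/* Non-zero if the buffer holds the whole SRH and its Segment List. */
static int srh_fits(const uint8_t *srh, size_t buf_len)
{
    return buf_len >= SRH_FIXED_LEN && srh_total_len(srh) <= buf_len &&
           srh_tlv_offset(srh) <= srh_total_len(srh);
}

/* Active segment = Segment List[Segments Left]; NULL if malformed. */
static const uint8_t *srh_active_segment(const uint8_t *srh, size_t buf_len)
{
    if (!srh_fits(srh, buf_len) || srh[3] > srh[4])
        return NULL;
    return srh + SRH_FIXED_LEN + (size_t)srh[3] * SRH_SEGMENT_LEN;
}

/* Count TLVs, skipping each by its Length (which excludes Type and
 * Length). Returns -1 if a TLV runs past the end of the SRH. */
static int srh_walk_tlvs(const uint8_t *srh, size_t buf_len)
{
    size_t off, end;
    int count = 0;

    if (!srh_fits(srh, buf_len))
        return -1;
    off = srh_tlv_offset(srh);
    end = srh_total_len(srh);
    while (off < end) {
        if (end - off < 2 || end - off - 2 < srh[off + 1])
            return -1;
        off += 2 + (size_t)srh[off + 1];
        count++;
    }
    return count;
}

/* Padding TLV Length (1..7) aligning `unpadded` octets on 8 octets:
 * 0 if already aligned, -1 if no single Padding TLV (3 to 9 octets
 * on the wire) can close the gap. */
static int srh_padding_len(size_t unpadded)
{
    size_t gap = (8 - unpadded % 8) % 8;

    if (gap == 0)
        return 0;
    if (gap < 3)
        gap += 8;
    return gap - 2 <= 7 ? (int)(gap - 2) : -1;
}

/* Hypothetical example: segments ::1, ::2, ::3, an Ingress Node TLV
 * and a Padding TLV written into buf (>= 80 octets). Returns length. */
static size_t build_example_srh(uint8_t *buf)
{
    size_t off;
    int pad;

    memset(buf, 0, 80);
    buf[0] = 41;                     /* Next Header: IPv6 */
    buf[2] = 4;                      /* Routing Type: suggested value */
    buf[3] = 2;                      /* Segments Left */
    buf[4] = 2;                      /* First Segment: indices 0..2 */
    for (int i = 0; i < 3; i++)      /* Segment List[i] = ::(i + 1) */
        buf[SRH_FIXED_LEN + i * SRH_SEGMENT_LEN + 15] = (uint8_t)(i + 1);

    off = srh_tlv_offset(buf);       /* 8 + 3 * 16 = 56 */
    buf[off] = TLV_INGRESS;
    buf[off + 1] = 18;               /* RESERVED + Flags + 16-octet node */
    off += 2 + 18;                   /* 76: not 8-octet aligned */

    pad = srh_padding_len(off);      /* 4-octet gap -> Length 2 */
    assert(pad > 0);
    buf[off] = TLV_PADDING;
    buf[off + 1] = (uint8_t)pad;     /* padding octets stay 0 */
    off += 2 + (size_t)pad;          /* 80 */

    buf[1] = (uint8_t)(off / 8 - 1); /* Hdr Ext Len = 9 */
    return off;
}
```

   Under these assumptions, build_example_srh() writes an 80-octet SRH (Hdr Ext Len 9) carrying three segments, an Ingress Node TLV and a 4-octet Padding TLV; srh_walk_tlvs() then counts two TLVs, and srh_active_segment() returns Segment List[2] (::3), the first segment of the path. srh_padding_len() returns -1 for a 2-octet gap, because a single Padding TLV with Length 1 to 7 occupies 3 to 9 octets.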
/*
 *------------------------------------------------------------------
 * Copyright (c) 2017 Cisco and/or its affiliates.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *------------------------------------------------------------------
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <sys/ioctl.h>
#include <linux/virtio_net.h>
#include <linux/vhost.h>
#include <sys/eventfd.h>

#include <vlib/vlib.h>
#include <vlib/pci/pci.h>
#include <vlib/unix/unix.h>
#include <vnet/ethernet/ethernet.h>
#include <vnet/ip/ip4_packet.h>
#include <vnet/ip/ip6_packet.h>
#include <vnet/devices/virtio/virtio.h>
#include <vnet/devices/virtio/pci.h>

virtio_main_t virtio_main;

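/* Issue an ioctl on fd; on failure, store the error in the caller's
 * `err` variable and jump to the caller's `error` label. */
#define _IOCTL(fd,a,...) \
  do { \
    if (ioctl (fd, a, __VA_ARGS__) < 0) \
      { \
        err = clib_error_return_unix (0, "ioctl(" #a ")"); \
        goto error; \
      } \
  } while (0)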

static clib_error_t *
call_read_ready (clib_file_t * uf)
{
  virtio_main_t *nm = &virtio_main;
  vnet_main_t *vnm = vnet_get_main ();
  u16 qid = uf->private_data & 0xFFFF;
  virtio_if_t *vif =
    vec_elt_at_index (nm->interfaces, uf->private_data >> 16);
  u64 b;

  CLIB_UNUSED (ssize_t size) = read (uf->file_descriptor, &b, sizeof (b));
  if ((qid & 1) == 0)
    vnet_device_input_set_interrupt_pending (vnm, vif->hw_if_index, qid);

  return 0;
}


clib_error_t *
virtio_vring_init (vlib_main_t * vm, virtio_if_t * vif, u16 idx, u16 sz)
{
  clib_error_t *err = 0;
  virtio_vring_t *vring;
  struct vhost_vring_state state = { 0 };
  struct vhost_vring_addr addr = { 0 };
  struct vhost_vring_file file = { 0 };
  clib_file_t t = { 0 };
  int i;

  /* A size of 0 selects the default ring size. */
  if (sz == 0)
    sz = 256;

  if (!is_pow2 (sz))
    return clib_error_return (0, "ring size must be power of 2");

  if (sz > 32768)
    return clib_error_return (0, "ring size must be 32768 or lower");

  if (idx % 2)
    {
      vlib_thread_main_t *thm = vlib_get_thread_main ();
      vec_validate_aligned (vif->txq_vrings, TX_QUEUE_ACCESS (idx),
			    CLIB_CACHE_LINE_BYTES);
      vring = vec_elt_at_index (vif->txq_vrings, TX_QUEUE_ACCESS (idx));
      if (thm->n_vlib_mains > 1)
	clib_spinlock_init (&vring->lockp);
    }
  else
    {
      vec_validate_aligned (vif->rxq_vrings, RX_QUEUE_ACCESS (idx),
			    CLIB_CACHE_LINE_BYTES);
      vring = vec_elt_at_index (vif->rxq_vrings, RX_QUEUE_ACCESS (idx));
    }
  i = sizeof (struct vring_desc) * sz;
  i = round_pow2 (i, CLIB_CACHE_LINE_BYTES);
  vring->desc = clib_mem_alloc_aligned (i, CLIB_CACHE_LINE_BYTES);
  clib_memset (vring->desc, 0, i);

  i = sizeof (struct vring_avail) + sz * sizeof (vring->avail->ring[0]);
  i = round_pow2 (i, CLIB_CACHE_LINE_BYTES);
  vring->avail = clib_mem_alloc_aligned (i, CLIB_CACHE_LINE_BYTES);
  clib_memset (vring->avail, 0, i);
  // tell the kernel that we don't need interrupts for this ring
  vring->avail->flags = VIRTIO_RING_FLAG_MASK_INT;

  i = sizeof (struct vring_used) + sz * sizeof (struct vring_used_elem);
  i = round_pow2 (i, CLIB_CACHE_LINE_BYTES);
  vring->used = clib_mem_alloc_aligned (i, CLIB_CACHE_LINE_BYTES);
  clib_memset (vring->used, 0, i);

  vring->queue_id = idx;
  ASSERT (vring->buffers == 0);
  vec_validate_aligned (vring->buffers, sz, CLIB_CACHE_LINE_BYTES);

  vring->size = sz;
  vring->call_fd = eventfd (0, EFD_NONBLOCK | EFD_CLOEXEC);
  vring->kick_fd = eventfd (0, EFD_CLOEXEC);

  t.read_function = call_read_ready;
  t.file_descriptor = vring->call_fd;
  t.private_data = vif->dev_instance << 16 | idx;
  t.description = format (0, "%U vring %u", format_virtio_device_name,
			  vif->dev_instance, idx);
  vring->call_file_index = clib_file_add (&file_main, &t);

  state.index = idx;
  state.num = sz;
  _IOCTL (vif->fd, VHOST_SET_VRING_NUM, &state);

  addr.index = idx;
  addr.flags = 0;
  addr.desc_user_addr = pointer_to_uword (vring->desc);
  addr.avail_user_addr = pointer_to_uword (vring->avail);
  addr.used_user_addr = pointer_to_uword (vring->used);
  _IOCTL (vif->fd, VHOST_SET_VRING_ADDR, &addr);

  file.index = idx;
  file.fd = vring->kick_fd;
  _IOCTL (vif->fd, VHOST_SET_VRING_KICK, &file);
  file.fd = vring->call_fd;
  _IOCTL (vif->fd, VHOST_SET_VRING_CALL, &file);
  file.fd = vif->tap_fd;
  _IOCTL (vif->fd, VHOST_NET_SET_BACKEND, &file);

error:
  return err;
}

inline void
virtio_free_rx_buffers (vlib_main_t * vm, virtio_vring_t * vring)
{
  u16 used = vring->desc_in_use;
  u16 last = vring->last_used_idx;
  u16 mask = vring->size - 1;

  while (used)
    {
      vlib_buffer_free (vm, &vring->buffers[last & mask], 1);
      last++;
      used--;
    }
}

clib_error_t *
virtio_vring_free_rx (vlib_main_t * vm, virtio_if_t * vif, u32 idx)
{
  virtio_vring_t *vring =
    vec_elt_at_index (vif->rxq_vrings, RX_QUEUE_ACCESS (idx));

  clib_file_del_by_index (&file_main, vring->call_file_index);
  close (vring->kick_fd);
  close (vring->call_fd);
  if (vring->used)
    {
      virtio_free_rx_buffers (vm, vring);
      clib_mem_free (vring->used);
    }
  if (vring->desc)
    clib_mem_free (vring->desc);
  if (vring->avail)
    clib_mem_free (vring->avail);
  vec_free (vring->buffers);
  return 0;
}

inline void
virtio_free_used_desc (vlib_main_t * vm, virtio_vring_t * vring)
{
  u16 used = vring->desc_in_use;
  u16 sz = vring->size;
  u16 mask = sz - 1;
  u16 last = vring->last_used_idx;
  u16 n_left = vring->used->idx - last;

  if (n_left == 0)
    return;

  while (n_left)
    {
      struct vring_used_elem *e = &vring->used->ring[last & mask];
      u16 slot = e->id;

      vlib_buffer_free (vm, &vring->buffers[slot], 1);
      used--;
      last++;
      n_left--;
    }
  vring->desc_in_use = used;
  vring->last_used_idx = last;
}

clib_error_t *
virtio_vring_free_tx (vlib_main_t * vm, virtio_if_t * vif, u32 idx)
{
  virtio_vring_t *vring =
    vec_elt_at_index (vif->txq_vrings, TX_QUEUE_ACCESS (idx));

  clib_file_del_by_index (&file_main, vring->call_file_index);
  close (vring->kick_fd);
  close (vring->call_