# Contiv/VPP Network Operation

This document describes the network operation of the Contiv/VPP k8s network plugin. It
elaborates on the operation and configuration options of the Contiv IPAM, as well as
on how VPP gets programmed by the Contiv/VPP control plane.

The following picture shows a 2-node k8s deployment of Contiv/VPP, with a VXLAN tunnel
established between the nodes to forward inter-node POD traffic. The IPAM options
are depicted on Node 1, whereas the VPP programming is depicted on Node 2.

![Contiv/VPP Architecture](/_images/contiv-networking.png "contiv-networking.png")

## Contiv/VPP IPAM (IP Address Management)

IPAM in Contiv/VPP is based on the concept of **Node ID**. The Node ID is a number
that uniquely identifies a node in the k8s cluster. The first node is assigned
the ID of 1, the second node 2, etc. If a node leaves the cluster, its 
ID is released back to the pool and will be re-used by the next node.

The Node ID is used to calculate per-node IP subnets for PODs
and other internal subnets that need to be unique on each node. Apart from the Node ID,
the input for IPAM calculations is a set of config knobs, which can be specified
in the `IPAMConfig` section of the [Contiv/VPP deployment YAML](../../../k8s/contiv-vpp.yaml)
(an example snippet using the default values is shown after this list):

- **PodSubnetCIDR** (default `10.1.0.0/16`): each POD gets an IP address assigned 
from this range. The size of this range (default `/16`) dictates the upper limit of 
the POD count for the entire k8s cluster (default 65536 PODs).

- **PodNetworkPrefixLen** (default `24`): prefix length of the per-node dedicated POD subnet. 
From the allocatable range defined in `PodSubnetCIDR`, this value dictates the 
allocation for each node. With the default value (`24`), each node 
gets a `/24` slice of the `PodSubnetCIDR`; the Node ID selects which slice. 
With `PodSubnetCIDR = 10.1.0.0/16`, `PodNetworkPrefixLen = 24` and `NodeID = 5`,
the resulting POD subnet for the node would be `10.1.5.0/24`.

- **PodIfIPCIDR** (default `10.2.1.0/24`): VPP-internal addresses used to put the VPP interfaces 
facing towards the PODs into L3 mode. This IP range is reused 
on each node and is therefore never addressable outside of the node itself.
The only requirement is that this subnet must not collide with any other IPAM subnet.

- **VPPHostSubnetCIDR** (default `172.30.0.0/16`): used for addressing 
the interconnect of VPP with the Linux network stack within the same node. 
Since this subnet needs to be unique on each node, the Node ID is used to determine 
the actual subnet used on the node, in combination with `VPPHostNetworkPrefixLen` 
(analogously to `PodSubnetCIDR` and `PodNetworkPrefixLen`).

- **VPPHostNetworkPrefixLen** (default `24`): used to calculate the subnet 
for addressing the interconnect of VPP with the Linux network stack within the same node.
With `VPPHostSubnetCIDR = 172.30.0.0/16`, `VPPHostNetworkPrefixLen = 24` and
`NodeID = 5`, the resulting subnet for the node would be `172.30.5.0/24`.

- **NodeInterconnectCIDR** (default `192.168.16.0/24`): range for the addresses 
assigned to the data plane interfaces managed by VPP. Unless DHCP is used 
(`NodeInterconnectDHCP = True`), the Contiv/VPP control plane automatically assigns
an IP address from this range to the DPDK-managed Ethernet interface bound to VPP 
on each node. The actual IP address is calculated from the Node ID (e.g., with 
`NodeInterconnectCIDR = 192.168.16.0/24` and `NodeID = 5`, the resulting IP
address assigned to the Ethernet interface on VPP will be `192.168.16.5`).

- **NodeInterconnectDHCP** (default `False`): when set to `True`, the IP addresses of the
VPP-managed data plane interfaces are assigned by DHCP instead of by the Contiv/VPP
control plane from `NodeInterconnectCIDR`. A DHCP server must be running in the network
to which the data plane interface is connected. If `NodeInterconnectDHCP = True`,
`NodeInterconnectCIDR` is ignored.

- **VxlanCIDR** (default `192.168.30.0/24`): in order to provide inter-node
POD-to-POD connectivity via any underlay network (not necessarily an L2 network),
Contiv/VPP sets up a VXLAN tunnel overlay between each pair of nodes within the cluster. Each node needs a unique IP address for its VXLAN BVI interface. This IP address
is automatically calculated from the Node ID (e.g., with `VxlanCIDR = 192.168.30.0/24`
and `NodeID = 5`, the resulting IP address assigned to the VXLAN BVI interface will be `192.168.30.5`).
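
Putting these options together, below is a minimal sketch of what the `IPAMConfig` section
could look like with the default values listed above, annotated with the values derived for
`NodeID = 5`. The exact key names, casing and nesting are defined by `contiv-vpp.yaml`, so
treat this snippet as illustrative only:

```yaml
IPAMConfig:
  PodSubnetCIDR: "10.1.0.0/16"
  PodNetworkPrefixLen: 24                   # NodeID = 5 -> POD subnet 10.1.5.0/24
  PodIfIPCIDR: "10.2.1.0/24"                # reused on every node, never routed outside the node
  VPPHostSubnetCIDR: "172.30.0.0/16"
  VPPHostNetworkPrefixLen: 24               # NodeID = 5 -> VPP-to-host subnet 172.30.5.0/24
  NodeInterconnectCIDR: "192.168.16.0/24"   # NodeID = 5 -> 192.168.16.5 (ignored if DHCP is used)
  NodeInterconnectDHCP: False
  VxlanCIDR: "192.168.30.0/24"              # NodeID = 5 -> VXLAN BVI IP 192.168.30.5
```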

## VPP Programming
This section describes how the Contiv/VPP control plane programs VPP, based on the
events it receives from k8s. Reading it is not strictly necessary for understanding
basic Contiv/VPP operation, but it is very useful for debugging purposes.

Contiv/VPP currently uses a single VRF to forward the traffic between PODs on a node,
PODs on different nodes, the host network stack, and the DPDK-managed dataplane interface. The forwarding
between each of them is purely L3-based, even for communication
between two PODs on the same node.

#### DPDK-Managed Data Interface
In order to allow inter-node communication between PODs on different
nodes and between PODs and the outside world, Contiv/VPP uses data-plane interfaces
bound to VPP using DPDK. Each node should have one "main" VPP interface,
which is unbound from the host network stack and bound to VPP.
The Contiv/VPP control plane automatically configures the interface either
via DHCP, or with a statically assigned address (see the `NodeInterconnectCIDR` and
`NodeInterconnectDHCP` YAML settings).

#### PODs on the Same Node
PODs are connected to VPP using virtio-based TAP interfaces created by VPP,
with the POD-end of the interface placed into the POD container network namespace.
Each POD is assigned an IP address from the `PodSubnetCIDR`. The allocated IP
is configured with the prefix length `/32`. Additionally, a static route pointing 
towards VPP is configured in the POD network namespace. 
The prefix length `/32` means that all IP traffic will be forwarded via the
default route, i.e. towards VPP. To get rid of unnecessary broadcasts between the POD and VPP,
a static ARP entry is configured for the gateway IP in the POD namespace, as well
as for the POD IP on VPP. Both ends of the TAP interface have a static (non-default) 
MAC address applied.

#### PODs with hostNetwork=true
PODs with the `hostNetwork=true` attribute are not placed into a separate network namespace; instead, they use the main host Linux network namespace and are therefore not directly connected to VPP. They rely on the interconnection between VPP and the host Linux network stack,
which is described in the next paragraph. Note that when these PODs access a service IP, their traffic is NATed in Linux (by iptables rules programmed by kube-proxy),
as opposed to on VPP, which is the case for PODs connected to VPP directly.
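
For reference, below is a minimal sketch of a pod manifest that opts into the host network.
The only part relevant to Contiv/VPP is the standard Kubernetes `hostNetwork: true` field;
the pod name and image are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-net-pod            # hypothetical example name
spec:
  hostNetwork: true             # share the node's Linux network namespace,
                                # so the pod is not wired to VPP via a TAP interface
  containers:
  - name: app
    image: busybox              # placeholder image, for illustration only
    command: ["sleep", "3600"]
```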

#### Linux Host Network Stack
In order to interconnect the Linux host network stack with VPP (to allow access
to the cluster resources from the host itself, as well as for the PODs with `hostNetwork=true`),
VPP creates a TAP interface between VPP and the main network namespace. The TAP interface is configured with IP addresses from the `VPPHostSubnetCIDR` range, with `.1` in the last octet on the VPP side and `.2` on the host side. The name of the host-side interface is `vpp1`. The host has static routes pointing towards VPP configured for:
- The whole `PodSubnetCIDR`, to route traffic targeting PODs towards VPP.
- The `ServiceCIDR` (default `10.96.0.0/12`), to route towards VPP any service-IP-targeted traffic that has not been translated by kube-proxy for some reason.
- In addition, the host has a static ARP entry configured for the IP of the VPP end of the TAP interface, to get rid of unnecessary broadcasts between the main network namespace and VPP.

#### VXLANs to Other Nodes
In order to provide inter-node POD-to-POD connectivity via any underlay network 
(not necessarily an L2 network), Contiv/VPP sets up a VXLAN tunnel overlay between 
each pair of nodes within the cluster (full mesh). 

All VXLAN tunnels are terminated in one bridge domain on each VPP. The bridge domain
has learning and flooding disabled; its L2 FIB contains a static entry for each VXLAN tunnel. The bridge domain also has a BVI interface, which
interconnects the bridge domain with the main VRF (L3 forwarding). This interface needs
a unique IP address, which is assigned from the `VxlanCIDR` as described above.

The main VRF contains several static routes that point to the BVI IP addresses of the other nodes.
For each remote node, there is a route to its POD subnet and its VPPHost subnet, as well as a route
to its management IP address. For each of these routes, the next hop IP is the
BVI interface IP of the remote node, reachable via the BVI interface of the local node.

The VXLAN tunnels and the static routes pointing to them are added/deleted on each VPP
whenever a node is added to or deleted from the k8s cluster.


#### More Info
Please refer to the [Packet Flow Dev Guide](../dev-guide/PACKET_FLOW.html) for a more 
detailed description of the paths traversed by request and response packets 
inside a Contiv/VPP Kubernetes cluster under different situations.