| author | Scott Keeler <skeeler@cisco.com> | 2018-10-01 14:50:57 -0400 |
|---|---|---|
| committer | Dave Barach <openvpp@barachs.net> | 2018-10-05 13:47:42 +0000 |
| commit | 25c4d396eae99e23c4ebe7155fde7700dd1130b9 (patch) | |
| tree | 2cd3661e26e37cf3e04327559479bc6ce0c9a752 /docs | |
| parent | 2d24cd027275905f308f75bf45d0f9d163f2235b (diff) | |
docs: add contiv vpp
Change-Id: I92227fc4968fc6a478beb7f38707b91e9f0635ec
Signed-off-by: Scott Keeler <skeeler@cisco.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/_images/contiv-arch.png | bin | 0 -> 48416 bytes | |||
-rw-r--r-- | docs/_images/contiv-networking.png | bin | 0 -> 111996 bytes | |||
-rw-r--r-- | docs/usecases/contiv/BUG_REPORTS.md | 333 | ||||
-rw-r--r-- | docs/usecases/contiv/CORE_FILES.md | 73 | ||||
-rw-r--r-- | docs/usecases/contiv/CUSTOM_MGMT_NETWORK.md | 26 | ||||
-rw-r--r-- | docs/usecases/contiv/K8s_Overview.md | 109 | ||||
-rw-r--r-- | docs/usecases/contiv/MANUAL_INSTALL.md | 472 | ||||
-rw-r--r-- | docs/usecases/contiv/MULTI_NIC_SETUP.md | 21 | ||||
-rw-r--r-- | docs/usecases/contiv/NETWORKING.md | 137 | ||||
-rw-r--r-- | docs/usecases/contiv/Prometheus.md | 159 | ||||
-rw-r--r-- | docs/usecases/contiv/SECURITY.md | 104 | ||||
-rw-r--r-- | docs/usecases/contiv/SINGLE_NIC_SETUP.md | 111 | ||||
-rw-r--r-- | docs/usecases/contiv/VMWARE_FUSION_HOST.md | 52 | ||||
-rw-r--r-- | docs/usecases/contiv/VPPTRACE.md | 95 | ||||
-rw-r--r-- | docs/usecases/contiv/VPP_CONFIG.md | 153 | ||||
-rw-r--r-- | docs/usecases/contiv/VPP_PACKET_TRACING_K8S.md | 510 | ||||
-rw-r--r-- | docs/usecases/contiv/Vagrant.md | 250 | ||||
-rw-r--r-- | docs/usecases/contiv/index.rst | 26 | ||||
-rw-r--r-- | docs/usecases/index.rst | 1 |
19 files changed, 2632 insertions, 0 deletions
diff --git a/docs/_images/contiv-arch.png b/docs/_images/contiv-arch.png
Binary files differ
new file mode 100644
index 00000000000..4128a00ba50
--- /dev/null
+++ b/docs/_images/contiv-arch.png
diff --git a/docs/_images/contiv-networking.png b/docs/_images/contiv-networking.png
Binary files differ
new file mode 100644
index 00000000000..fb37f34772b
--- /dev/null
+++ b/docs/_images/contiv-networking.png
diff --git a/docs/usecases/contiv/BUG_REPORTS.md b/docs/usecases/contiv/BUG_REPORTS.md
new file mode 100644
index 00000000000..23c9a7c393c
--- /dev/null
+++ b/docs/usecases/contiv/BUG_REPORTS.md
@@ -0,0 +1,333 @@
+# Debugging and Reporting Bugs in Contiv-VPP
+
+## Bug Report Structure
+
+- [Deployment description](#describe-deployment):
+Briefly describes the deployment: where the issue was spotted,
+the number of k8s nodes, and whether DHCP/STN/TAP is used.
+
+- [Logs](#collecting-the-logs):
+Attach the corresponding logs, at least from the vswitch pods.
+
+- [VPP config](#inspect-vpp-config):
+Attach the output of the show commands.
+
+- [Basic Collection Example](#basic-example)
+
+### Describe Deployment
+Since contiv-vpp can be used with different configurations, it is helpful
+to attach the config that was applied. Either attach the `values.yaml` passed to the helm chart,
+or attach the [corresponding part](https://github.com/contiv/vpp/blob/42b3bfbe8735508667b1e7f1928109a65dfd5261/k8s/contiv-vpp.yaml#L24-L38) from the deployment yaml file.
+ +``` + contiv.yaml: |- + TCPstackDisabled: true + UseTAPInterfaces: true + TAPInterfaceVersion: 2 + NatExternalTraffic: true + MTUSize: 1500 + IPAMConfig: + PodSubnetCIDR: 10.1.0.0/16 + PodNetworkPrefixLen: 24 + PodIfIPCIDR: 10.2.1.0/24 + VPPHostSubnetCIDR: 172.30.0.0/16 + VPPHostNetworkPrefixLen: 24 + NodeInterconnectCIDR: 192.168.16.0/24 + VxlanCIDR: 192.168.30.0/24 + NodeInterconnectDHCP: False +``` + +Information that might be helpful: + - Whether node IPs are statically assigned, or if DHCP is used + - STN is enabled + - Version of TAP interfaces used + - Output of `kubectl get pods -o wide --all-namespaces` + + +### Collecting the Logs + +The most essential thing that needs to be done when debugging and **reporting an issue** +in Contiv-VPP is **collecting the logs from the contiv-vpp vswitch containers**. + +#### a) Collecting Vswitch Logs Using kubectl +In order to collect the logs from individual vswitches in the cluster, connect to the master node +and then find the POD names of the individual vswitch containers: + +``` +$ kubectl get pods --all-namespaces | grep vswitch +kube-system contiv-vswitch-lqxfp 2/2 Running 0 1h +kube-system contiv-vswitch-q6kwt 2/2 Running 0 1h +``` + +Then run the following command, with *pod name* replaced by the actual POD name: +``` +$ kubectl logs <pod name> -n kube-system -c contiv-vswitch +``` + +Redirect the output to a file to save the logs, for example: + +``` +kubectl logs contiv-vswitch-lqxfp -n kube-system -c contiv-vswitch > logs-master.txt +``` + +#### b) Collecting Vswitch Logs Using Docker +If option a) does not work, then you can still collect the same logs using the plain docker +command. 
For that, you need to connect to each individual node in the k8s cluster and find
+the container ID of the vswitch container:
+
+```
+$ docker ps | grep contivvpp/vswitch
+b682b5837e52 contivvpp/vswitch "/usr/bin/supervisor…" 2 hours ago Up 2 hours k8s_contiv-vswitch_contiv-vswitch-q6kwt_kube-system_d09b6210-2903-11e8-b6c9-08002723b076_0
+```
+
+Now use the ID from the first column to dump the logs into the `logs-master.txt` file:
+```
+$ docker logs b682b5837e52 > logs-master.txt
+```
+
+#### Reviewing the Vswitch Logs
+
+In order to debug an issue, it is good to start by grepping the logs for the `level=error` string, for example:
+```
+$ cat logs-master.txt | grep level=error
+```
+
+VPP or the contiv-agent may also crash. To check whether some process crashed, grep for the string `exit`, for example:
+```
+$ cat logs-master.txt | grep exit
+2018-03-20 06:03:45,948 INFO exited: vpp (terminated by SIGABRT (core dumped); not expected)
+2018-03-20 06:03:48,948 WARN received SIGTERM indicating exit request
+```
+
+#### Collecting the STN Daemon Logs
+In STN (Steal The NIC) deployment scenarios, you often need to collect and review the logs
+from the STN daemon. This needs to be done on each node:
+```
+$ docker logs contiv-stn > logs-stn-master.txt
+```
+
+#### Collecting Logs in Case of Crash Loop
+If the vswitch is crashing in a loop (indicated by an increasing number in the `RESTARTS`
+column of the `kubectl get pods --all-namespaces` output), `kubectl logs` or `docker logs` would
+only give us the logs of the latest incarnation of the vswitch. That might not reveal the root cause
+of the very first crash, so in order to debug that, we need to disable the k8s health check probes so that they do not
+restart the vswitch after the very first crash.
This can be done by commenting-out the `readinessProbe` +and `livenessProbe` in the contiv-vpp deployment YAML: + +```diff +diff --git a/k8s/contiv-vpp.yaml b/k8s/contiv-vpp.yaml +index 3676047..ffa4473 100644 +--- a/k8s/contiv-vpp.yaml ++++ b/k8s/contiv-vpp.yaml +@@ -224,18 +224,18 @@ spec: + ports: + # readiness + liveness probe + - containerPort: 9999 +- readinessProbe: +- httpGet: +- path: /readiness +- port: 9999 +- periodSeconds: 1 +- initialDelaySeconds: 15 +- livenessProbe: +- httpGet: +- path: /liveness +- port: 9999 +- periodSeconds: 1 +- initialDelaySeconds: 60 ++ # readinessProbe: ++ # httpGet: ++ # path: /readiness ++ # port: 9999 ++ # periodSeconds: 1 ++ # initialDelaySeconds: 15 ++ # livenessProbe: ++ # httpGet: ++ # path: /liveness ++ # port: 9999 ++ # periodSeconds: 1 ++ # initialDelaySeconds: 60 + env: + - name: MICROSERVICE_LABEL + valueFrom: +``` + +If VPP is the crashing process, please follow the [CORE_FILES](CORE_FILES.html) guide and provide the coredump file. + + +### Inspect VPP Config +Inspect the following areas: +- Configured interfaces (issues related basic node/pod connectivity issues): +``` +vpp# sh int addr +GigabitEthernet0/9/0 (up): + 192.168.16.1/24 +local0 (dn): +loop0 (up): + l2 bridge bd_id 1 bvi shg 0 + 192.168.30.1/24 +tapcli-0 (up): + 172.30.1.1/24 +``` + +- IP forwarding table: +``` +vpp# sh ip fib +ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] locks:[src:(nil):2, src:adjacency:3, src:default-route:1, ] +0.0.0.0/0 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:0 to:[7:552]] + [0] [@0]: dpo-drop ip4 +0.0.0.0/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:2 buckets:1 uRPF:1 to:[0:0]] + [0] [@0]: dpo-drop ip4 + +... +... 
+ +255.255.255.255/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:5 buckets:1 uRPF:4 to:[0:0]] + [0] [@0]: dpo-drop ip4 +``` +- ARP Table: +``` +vpp# sh ip arp + Time IP4 Flags Ethernet Interface + 728.6616 192.168.16.2 D 08:00:27:9c:0e:9f GigabitEthernet0/8/0 + 542.7045 192.168.30.2 S 1a:2b:3c:4d:5e:02 loop0 + 1.4241 172.30.1.2 D 86:41:d5:92:fd:24 tapcli-0 + 15.2485 10.1.1.2 SN 00:00:00:00:00:02 tapcli-1 + 739.2339 10.1.1.3 SN 00:00:00:00:00:02 tapcli-2 + 739.4119 10.1.1.4 SN 00:00:00:00:00:02 tapcli-3 +``` +- NAT configuration (issues related to services): +``` +DBGvpp# sh nat44 addresses +NAT44 pool addresses: +192.168.16.10 + tenant VRF independent + 0 busy udp ports + 0 busy tcp ports + 0 busy icmp ports +NAT44 twice-nat pool addresses: +``` +``` +vpp# sh nat44 static mappings +NAT44 static mappings: + tcp local 192.168.42.1:6443 external 10.96.0.1:443 vrf 0 out2in-only + tcp local 192.168.42.1:12379 external 192.168.42.2:32379 vrf 0 out2in-only + tcp local 192.168.42.1:12379 external 192.168.16.2:32379 vrf 0 out2in-only + tcp local 192.168.42.1:12379 external 192.168.42.1:32379 vrf 0 out2in-only + tcp local 192.168.42.1:12379 external 192.168.16.1:32379 vrf 0 out2in-only + tcp local 192.168.42.1:12379 external 10.109.143.39:12379 vrf 0 out2in-only + udp local 10.1.2.2:53 external 10.96.0.10:53 vrf 0 out2in-only + tcp local 10.1.2.2:53 external 10.96.0.10:53 vrf 0 out2in-only +``` +``` +vpp# sh nat44 interfaces +NAT44 interfaces: + loop0 in out + GigabitEthernet0/9/0 out + tapcli-0 in out +``` +``` +vpp# sh nat44 sessions +NAT44 sessions: + 192.168.20.2: 0 dynamic translations, 3 static translations + 10.1.1.3: 0 dynamic translations, 0 static translations + 10.1.1.4: 0 dynamic translations, 0 static translations + 10.1.1.2: 0 dynamic translations, 6 static translations + 10.1.2.18: 0 dynamic translations, 2 static translations +``` +- ACL config (issues related to policies): +``` +vpp# sh acl-plugin acl +``` +- "Steal the NIC (STN)" config 
(issues related to host connectivity when STN is active):
+```
+vpp# sh stn rules
+- rule_index: 0
+  address: 10.1.10.47
+  iface: tapcli-0 (2)
+  next_node: tapcli-0-output (410)
+```
+- Errors:
+```
+vpp# sh errors
+```
+- Vxlan tunnels:
+```
+vpp# sh vxlan tunnels
+```
+- Hardware interface information:
+```
+vpp# sh hardware-interfaces
+```
+
+### Basic Example
+
+[contiv-vpp-bug-report.sh][1] is an example of a script that may be a useful starting point for gathering the above information using kubectl.
+
+Limitations:
+- The script does not include STN daemon logs, nor does it handle the special
+  case of a crash loop.
+
+Prerequisites:
+- The user specified in the script must have passwordless access to all nodes
+  in the cluster; on each node in the cluster the user must have passwordless
+  access to sudo.
+
+#### Setting up Prerequisites
+To enable logging into a node without a password, copy your public key to the
+node:
+```
+ssh-copy-id <user-id>@<node-name-or-ip-address>
+```
+
+To enable running sudo without a password for a given user, enter:
+```
+$ sudo visudo
+```
+
+Append the following entry to run all commands without a password for a given
+user:
+```
+<userid> ALL=(ALL) NOPASSWD:ALL
+```
+
+You can also add user `<user-id>` to group `sudo` and edit the `sudo`
+entry as follows:
+
+```
+# Allow members of group sudo to execute any command
+%sudo ALL=(ALL:ALL) NOPASSWD:ALL
+```
+
+Add user `<user-id>` to group `<group-id>` as follows:
+```
+sudo adduser <user-id> <group-id>
+```
+or as follows:
+```
+usermod -a -G <group-id> <user-id>
+```
+#### Working with the Contiv-VPP Vagrant Test Bed
+The script can be used to collect data from the [Contiv-VPP test bed created with Vagrant][2].
+To collect debug information from this Contiv-VPP test bed, do the +following steps: +* In the directory where you created your vagrant test bed, do: +``` + vagrant ssh-config > vagrant-ssh.conf +``` +* To collect the debug information do: +``` + ./contiv-vpp-bug-report.sh -u vagrant -m k8s-master -f <path-to-your-vagrant-ssh-config-file>/vagrant-ssh.conf +``` + +[1]: https://github.com/contiv/vpp/tree/master/scripts/contiv-vpp-bug-report.sh +[2]: https://github.com/contiv/vpp/blob/master/vagrant/README.md diff --git a/docs/usecases/contiv/CORE_FILES.md b/docs/usecases/contiv/CORE_FILES.md new file mode 100644 index 00000000000..5d269cd1504 --- /dev/null +++ b/docs/usecases/contiv/CORE_FILES.md @@ -0,0 +1,73 @@ +# Capturing VPP core dumps +In order to debug a crash of VPP, it is required to provide a coredump file, which allows backtracing of the VPP issue. The following items are the requirements for capturing a coredump: + +#### 1. Disable k8s Probes to Prevent k8s from Restarting the POD with a Crashed VPP +As described in [BUG_REPORTS.md](BUG_REPORTS.html#collecting-the-logs-in-case-of-crash-loop). + +#### 2. Modify VPP Startup config file +In `/etc/vpp/contiv-vswitch.conf`, add the following lines into the `unix` section: + +``` +unix { + ... + coredump-size unlimited + full-coredump +} +``` + +#### 3. Turn on Coredumps in the Vswitch Container +After re-deploying Contiv-VPP networking, enter bash shell in the vswitch +container (use actual name of the vswitch POD - `contiv-vswitch-7whk7` in this case): +``` +kubectl exec -it contiv-vswitch-7whk7 -n kube-system -c contiv-vswitch bash +``` + +Enable coredumps: +``` +mkdir -p /tmp/dumps +sysctl -w debug.exception-trace=1 +sysctl -w kernel.core_pattern="/tmp/dumps/%e-%t" +ulimit -c unlimited +echo 2 > /proc/sys/fs/suid_dumpable +``` + +#### 4. Let VPP Crash +Now repeat the steps that lead to the VPP crash. 
You can also force VPP to crash at its current point of execution (e.g., if it is
+stuck) by using the SIGQUIT signal:
+```
+kill -3 `pidof vpp`
+```
+
+#### 5. Locate and Inspect the Core File
+The core file should appear in `/tmp/dumps` in the container:
+```
+cd /tmp/dumps
+ls
+vpp_main-1524124440
+```
+
+After installing gdb, you can try to obtain a backtrace:
+```
+apt-get update && apt-get install gdb
+gdb vpp vpp_main-1524124440
+(gdb) bt
+```
+
+#### 6. Copy the Core File Out of the Container
+Finally, copy the core file out of the container. First, while still inside the container,
+pack the core file into an archive:
+
+```
+cd /tmp/dumps
+tar cvzf vppdump.tar.gz vpp_main-1524124440
+```
+
+Now, on the host, determine the docker ID of the container, and then copy the archive from the container to the host:
+```
+docker ps | grep vswitch_contiv
+d7aceb2e4876 c43a70ac3d01 "/usr/bin/supervisor…" 25 minutes ago Up 25 minutes k8s_contiv-vswitch_contiv-vswitch-zqzn6_kube-system_9923952f-43a6-11e8-be84-080027de08ea_0
+
+docker cp d7aceb2e4876:/tmp/dumps/vppdump.tar.gz .
+```
+
+Now you are ready to file a bug in [jira.fd.io](https://jira.fd.io/) and attach the core file.
\ No newline at end of file
diff --git a/docs/usecases/contiv/CUSTOM_MGMT_NETWORK.md b/docs/usecases/contiv/CUSTOM_MGMT_NETWORK.md
new file mode 100644
index 00000000000..bf2937f2016
--- /dev/null
+++ b/docs/usecases/contiv/CUSTOM_MGMT_NETWORK.md
@@ -0,0 +1,26 @@
+### Setting Up a Custom Management Network on Multi-Homed Nodes
+
+If the interface you use for Kubernetes management traffic (for example, the
+IP address used for `kubeadm join`) is not the one that contains the default
+route out of the host, then you need to specify the management node IP address in
+the Kubelet config file
+(`/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`). Add the following line to it:
+```
+Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false --node-ip=<node-management-ip-address>"
+```
+#### Example
+Consider a 2-node deployment where each node is connected to 2 networks,
+`10.0.2.0/24` and `192.168.56.0/24`, and the default route on each node points
+to the interface connected to the `10.0.2.0/24` subnet. We want to use the subnet
+`192.168.56.0/24` for Kubernetes management traffic. Assume the addresses of the
+nodes connected to `192.168.56.0/24` are `192.168.56.105` and `192.168.56.106`.
+
+On the `192.168.56.105` node you add the following line to `10-kubeadm.conf`:
+```
+Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false --node-ip=192.168.56.105"
+```
+On the `192.168.56.106` node you add the following line to `10-kubeadm.conf`:
+```
+Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false --node-ip=192.168.56.106"
+```
+
diff --git a/docs/usecases/contiv/K8s_Overview.md b/docs/usecases/contiv/K8s_Overview.md
new file mode 100644
index 00000000000..f9cf9c5a9ba
--- /dev/null
+++ b/docs/usecases/contiv/K8s_Overview.md
@@ -0,0 +1,109 @@
+# Contiv/VPP Kubernetes Network Plugin
+
+
+## Overview
+
+Kubernetes is a container orchestration system that efficiently manages Docker containers.
The Docker containers and container platforms provide many advantages over traditional
+virtualization. Container isolation is done on the kernel level, which eliminates the need
+for a guest virtual operating system, and therefore makes containers much more efficient,
+faster, and lightweight. The containers in Contiv/VPP are referred to as PODs.
+
+Contiv/VPP is a Kubernetes network plugin that uses [FD.io VPP](https://fd.io/)
+to provide network connectivity between PODs in a k8s cluster (k8s is an abbreviated reference for Kubernetes).
+It deploys itself as a set of system PODs in the `kube-system` namespace,
+some of them (`contiv-ksr`, `contiv-etcd`) on the master node, and some
+of them (`contiv-cni`, `contiv-vswitch`, `contiv-stn`) on each node in the cluster.
+
+Contiv/VPP is fully integrated with k8s via its components,
+and it automatically reprograms itself upon each change in the cluster
+via the k8s API.
+
+The main component of the solution is [VPP](https://fd.io/technology/#vpp), which
+runs within the `contiv-vswitch` POD on each node in the cluster. VPP provides
+POD-to-POD connectivity across the nodes in the cluster, as well as host-to-POD
+and outside-to-POD connectivity. The solution leverages
+VPP's fast data processing that runs completely in userspace, and uses
+[DPDK](https://dpdk.org/) for fast access to the network IO layer.
+
+Kubernetes services and policies are also a part of the VPP configuration,
+which means they are fully supported on VPP, without the need to forward
+packets into the Linux network stack (Kube Proxy), which makes them very
+efficient and scalable.
+
+
+## Architecture
+
+Contiv/VPP consists of several components, each packed and shipped as
+a Docker container.
Two of them deploy on Kubernetes master node only: + + - [Contiv KSR](#contiv-ksr) + - [Contiv ETCD](#contiv-etcd) + +The rest of them deploy on all nodes within the k8s cluster (including the master node): + +- [Contiv vSwitch](#contiv-vswitch) +- [Contiv CNI](#contiv-cni) +- [Contiv STN](#contiv-stn-daemon) + + +The following section briefly describes the individual Contiv components, which are displayed +as orange boxes on the picture below: + +![Contiv/VPP Architecture](../../_images/contiv-arch.png) + + +### Contiv KSR +Contiv KSR (Kubernetes State Reflector) is an agent that subscribes to k8s control plane, watches k8s resources and +propagates all relevant cluster-related information into the Contiv ETCD data store. +Other Contiv components do not access the k8s API directly, they subscribe to +Contiv ETCD instead. For more information on KSR, read the +[KSR Readme](https://github.com/contiv/vpp/blob/master/cmd/contiv-ksr/README.md). + + +### Contiv ETCD +Contiv/VPP uses its own instance of the ETCD database for storage of k8s cluster-related data +reflected by KSR, which are then accessed by Contiv vSwitch Agents running on +individual nodes. Apart from the data reflected by KSR, ETCD also stores persisted VPP +configuration of individual vswitches (mainly used to restore the operation after restarts), +as well as some more internal metadata. + + +### Contiv vSwitch +vSwitch is the main networking component that provides the connectivity to PODs. +It deploys on each node in the cluster, and consists of two main components packed +into a single Docker container: VPP and Contiv VPP Agent. + +**VPP** is the data plane software that provides the connectivity between PODs, host Linux +network stack, and data-plane NIC interface controlled by VPP: + - PODs are connected to VPP using TAP interfaces wired between VPP, and each POD network namespace. 
+ - The host network stack is connected to VPP using another TAP interface connected
+ to the main (default) network namespace.
+ - The data-plane NIC is controlled directly by VPP using DPDK. Note that this means
+ this interface is not visible to the host Linux network stack, and the node either needs another
+ management interface for k8s control plane communication, or the
+ [STN (Steal The NIC)](SINGLE_NIC_SETUP.html) deployment must be applied.
+
+**Contiv VPP Agent** is the control plane part of the vSwitch container. It is responsible
+for configuring the VPP according to the information gained from ETCD, and requests
+from Contiv STN. It is based on the [Ligato VPP Agent](https://github.com/ligato/vpp-agent) code with extensions that are related to k8s.
+
+For communication with VPP, it uses VPP binary API messages sent via shared memory using
+[GoVPP](https://wiki.fd.io/view/GoVPP).
+For connection with Contiv CNI, the agent acts as a GRPC server serving CNI requests
+forwarded from the Contiv CNI binary.
+
+### Contiv CNI
+Contiv CNI (Container Network Interface) is a simple binary that implements the
+[Container Network Interface](https://github.com/containernetworking/cni)
+API and is executed by Kubelet upon POD creation and deletion. The CNI binary
+just packs the request into a GRPC request and forwards it to the Contiv VPP Agent
+running on the same node, which then processes it (wires/unwires the container)
+and replies with a response, which is then forwarded back to Kubelet.
+
+
+### Contiv STN Daemon
+This section discusses how the Contiv [STN (Steal The NIC)](SINGLE_NIC_SETUP.html) daemon works. As already mentioned, the default setup of Contiv/VPP requires two network interfaces
+per node: one controlled by VPP for data facing the PODs, and one controlled by the host
+network stack for k8s control plane communication.
If your k8s nodes
+do not provide two network interfaces, Contiv/VPP can work in a single-NIC setup,
+where the interface is "stolen" from the host network stack just before VPP starts,
+and its original IP address is configured both on VPP and on the host-VPP
+interconnect TAP interface.
+For more information on the STN setup, read the [Single NIC Setup README](./SINGLE_NIC_SETUP.html).
diff --git a/docs/usecases/contiv/MANUAL_INSTALL.md b/docs/usecases/contiv/MANUAL_INSTALL.md
new file mode 100644
index 00000000000..672a7b3fc8b
--- /dev/null
+++ b/docs/usecases/contiv/MANUAL_INSTALL.md
@@ -0,0 +1,472 @@
+# Manual Installation
+This document describes how to clone the Contiv repository and then use [kubeadm][1] to manually install Kubernetes
+with Contiv-VPP networking on one or more bare metal or VM hosts.
+
+## Clone the Contiv Repository
+To clone the Contiv repository, enter the following command:
+```
+git clone https://github.com/contiv/vpp.git <repository-name>
+```
+**Note:** Replace *<repository-name>* with the name you want assigned to your cloned contiv repository.
+
+The cloned repository has important folders that contain content that is referenced in this Contiv documentation; those folders are noted below:
+```
+vpp-contiv2$ ls
+build build-root doxygen gmod LICENSE Makefile RELEASE.md src
+build-data docs extras INFO.yaml MAINTAINERS README.md sphinx_venv test
+```
+## Preparing Your Hosts
+
+### Host-specific Configurations
+- **VMware VMs**: the vmxnet3 driver is required on each interface that will
+  be used by VPP. Please see [here][13] for instructions on how to install the
+  vmxnet3 driver on VMware Fusion.
+ +### Setting up Network Adapter(s) +#### Setting up DPDK +DPDK setup must be completed **on each node** as follows: + +- Load the PCI UIO driver: + ``` + $ sudo modprobe uio_pci_generic + ``` + +- Verify that the PCI UIO driver has loaded successfully: + ``` + $ lsmod | grep uio + uio_pci_generic 16384 0 + uio 20480 1 uio_pci_generic + ``` + + Please note that this driver needs to be loaded upon each server bootup, + so you may want to add `uio_pci_generic` into the `/etc/modules` file, + or a file in the `/etc/modules-load.d/` directory. For example, the + `/etc/modules` file could look as follows: + ``` + # /etc/modules: kernel modules to load at boot time. + # + # This file contains the names of kernel modules that should be loaded + # at boot time, one per line. Lines beginning with "#" are ignored. + uio_pci_generic + ``` +#### Determining Network Adapter PCI Addresses +You need the PCI address of the network interface that VPP will use for the multi-node pod interconnect. On Debian-based +distributions, you can use `lshw`(*): + +``` +$ sudo lshw -class network -businfo +Bus info Device Class Description +==================================================== +pci@0000:00:03.0 ens3 network Virtio network device +pci@0000:00:04.0 ens4 network Virtio network device +``` +**Note:** On CentOS/RedHat/Fedora distributions, `lshw` may not be available by default, install it by issuing the following command: + ``` + yum -y install lshw + ``` + +#### Configuring vswitch to Use Network Adapters +Finally, you need to set up the vswitch to use the network adapters: + +- [Setup on a node with a single NIC][14] +- [Setup a node with multiple NICs][15] + +### Using a Node Setup Script +You can perform the above steps using the [node setup script][17]. + +## Installing Kubernetes with Contiv-VPP CNI plugin +After the nodes you will be using in your K8s cluster are prepared, you can +install the cluster using [kubeadm][1]. 
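Before moving on to the kubeadm steps: the PCI-address lookup described in "Determining Network Adapter PCI Addresses" above can be scripted. A sketch that filters a captured sample of the `lshw` output with `awk` (the sample and device names are illustrative; on a real node, pipe `sudo lshw -class network -businfo` directly into the `awk` command):

```shell
# Captured sample of 'sudo lshw -class network -businfo' output (illustrative)
lshw_out='Bus info          Device  Class    Description
====================================================
pci@0000:00:03.0  ens3    network  Virtio network device
pci@0000:00:04.0  ens4    network  Virtio network device'

# Keep only the pci@ rows and strip the prefix, leaving bare PCI addresses
addrs=$(printf '%s\n' "$lshw_out" | awk '$1 ~ /^pci@/ { sub(/^pci@/, "", $1); print $1 }')
printf '%s\n' "$addrs"
```

The resulting addresses (e.g. `0000:00:03.0`) are what the vswitch configuration steps below expect.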
+ +### (1/4) Installing Kubeadm on Your Hosts +For first-time installation, see [Installing kubeadm][6]. To update an +existing installation, you should do a `apt-get update && apt-get upgrade` +or `yum update` to get the latest version of kubeadm. + +On each host with multiple NICs where the NIC that will be used for Kubernetes +management traffic is not the one pointed to by the default route out of the +host, a [custom management network][12] for Kubernetes must be configured. + +#### Using Kubernetes 1.10 and Above +In K8s 1.10, support for huge pages in a pod has been introduced. For now, this +feature must be either disabled or memory limit must be defined for vswitch container. + +To disable huge pages, perform the following +steps as root: +* Using your favorite editor, disable huge pages in the kubelet configuration + file (`/etc/systemd/system/kubelet.service.d/10-kubeadm.conf` or `/etc/default/kubelet` for version 1.11+): +``` + Environment="KUBELET_EXTRA_ARGS=--feature-gates HugePages=false" +``` +* Restart the kubelet daemon: +``` + systemctl daemon-reload + systemctl restart kubelet +``` + +To define memory limit, append the following snippet to vswitch container in deployment yaml file: +``` + resources: + limits: + hugepages-2Mi: 1024Mi + memory: 1024Mi + +``` +or set `contiv.vswitch.defineMemoryLimits` to `true` in [helm values](https://github.com/contiv/vpp/blob/master/k8s/contiv-vpp/README.md). + +### (2/4) Initializing Your Master +Before initializing the master, you may want to [remove][8] any +previously installed K8s components. Then, proceed with master initialization +as described in the [kubeadm manual][3]. Execute the following command as +root: +``` +kubeadm init --token-ttl 0 --pod-network-cidr=10.1.0.0/16 +``` +**Note:** `kubeadm init` will autodetect the network interface to advertise +the master on as the interface with the default gateway. If you want to use a +different interface (i.e. 
a custom management network setup), specify the +`--apiserver-advertise-address=<ip-address>` argument to kubeadm init. For +example: +``` +kubeadm init --token-ttl 0 --pod-network-cidr=10.1.0.0/16 --apiserver-advertise-address=192.168.56.106 +``` +**Note:** The CIDR specified with the flag `--pod-network-cidr` is used by +kube-proxy, and it **must include** the `PodSubnetCIDR` from the `IPAMConfig` +section in the Contiv-vpp config map in Contiv-vpp's deployment file +[contiv-vpp.yaml](https://github.com/contiv/vpp/blob/master/k8s/contiv-vpp/values.yaml). Pods in the host network namespace +are a special case; they share their respective interfaces and IP addresses with +the host. For proxying to work properly it is therefore required for services +with backends running on the host to also **include the node management IP** +within the `--pod-network-cidr` subnet. For example, with the default +`PodSubnetCIDR=10.1.0.0/16` and `PodIfIPCIDR=10.2.1.0/24`, the subnet +`10.3.0.0/16` could be allocated for the management network and +`--pod-network-cidr` could be defined as `10.0.0.0/8`, so as to include IP +addresses of all pods in all network namespaces: +``` +kubeadm init --token-ttl 0 --pod-network-cidr=10.0.0.0/8 --apiserver-advertise-address=10.3.1.1 +``` + +If Kubernetes was initialized successfully, it prints out this message: +``` +Your Kubernetes master has initialized successfully! 
+``` + +After successful initialization, don't forget to set up your .kube directory +as a regular user (as instructed by `kubeadm`): +```bash +mkdir -p $HOME/.kube +sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config +sudo chown $(id -u):$(id -g) $HOME/.kube/config +``` + +### (3/4) Installing the Contiv-VPP Pod Network +If you have already used the Contiv-VPP plugin before, you may need to pull +the most recent Docker images on each node: +``` +bash <(curl -s https://raw.githubusercontent.com/contiv/vpp/master/k8s/pull-images.sh) +``` + +Install the Contiv-VPP network for your cluster as follows: + +- If you do not use the STN feature, install Contiv-vpp as follows: + ``` + kubectl apply -f https://raw.githubusercontent.com/contiv/vpp/master/k8s/contiv-vpp.yaml + ``` + +- If you use the STN feature, download the `contiv-vpp.yaml` file: + ``` + wget https://raw.githubusercontent.com/contiv/vpp/master/k8s/contiv-vpp.yaml + ``` + Then edit the STN configuration as described [here][16]. Finally, create + the Contiv-vpp deployment from the edited file: + ``` + kubectl apply -f ./contiv-vpp.yaml + ``` + +Beware contiv-etcd data is persisted in `/var/etcd` by default. It has to be cleaned up manually after `kubeadm reset`. +Otherwise outdated data will be loaded by a subsequent deployment. + +You can also generate random subfolder, alternatively: + +``` +curl --silent https://raw.githubusercontent.com/contiv/vpp/master/k8s/contiv-vpp.yaml | sed "s/\/var\/etcd\/contiv-data/\/var\/etcd\/contiv-data\/$RANDOM/g" | kubectl apply -f - +``` + +#### Deployment Verification +After some time, all contiv containers should enter the running state: +``` +root@cvpp:/home/jan# kubectl get pods -n kube-system -o wide | grep contiv +NAME READY STATUS RESTARTS AGE IP NODE +... 
+contiv-etcd-gwc84 1/1 Running 0 14h 192.168.56.106 cvpp
+contiv-ksr-5c2vk 1/1 Running 2 14h 192.168.56.106 cvpp
+contiv-vswitch-l59nv 2/2 Running 0 14h 192.168.56.106 cvpp
+```
+In particular, make sure that the Contiv-VPP pod IP addresses are the same as
+the IP address specified in the `--apiserver-advertise-address=<ip-address>`
+argument to kubeadm init.
+
+Verify that the VPP successfully grabbed the network interface specified in
+the VPP startup config (`GigabitEthernet0/4/0` in our case):
+```
+$ sudo vppctl
+vpp# sh inter
+ Name Idx State Counter Count
+GigabitEthernet0/4/0 1 up rx packets 1294
+ rx bytes 153850
+ tx packets 512
+ tx bytes 21896
+ drops 962
+ ip4 1032
+host-40df9b44c3d42f4 3 up rx packets 126601
+ rx bytes 44628849
+ tx packets 132155
+ tx bytes 27205450
+ drops 24
+ ip4 126585
+ ip6 16
+host-vppv2 2 up rx packets 132162
+ rx bytes 27205824
+ tx packets 126658
+ tx bytes 44634963
+ drops 15
+ ip4 132147
+ ip6 14
+local0 0 down
+```
+
+You should also see the interface to kube-dns (`host-40df9b44c3d42f4`) and to the
+node's IP stack (`host-vppv2`).
+
+#### Master Isolation (Optional)
+By default, your cluster will not schedule pods on the master for security
+reasons. If you want to be able to schedule pods on the master (e.g., for a
+single-machine Kubernetes cluster for development), then run:
+
+```
+kubectl taint nodes --all node-role.kubernetes.io/master-
+```
+More details about installing the pod network can be found in the
+[kubeadm manual][4].
+
+### (4/4) Joining Your Nodes
+To add a new node to your cluster, run, as root, the command that was output
+by `kubeadm init`. For example:
+```
+kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>
+```
+More details can be found in the [kubeadm manual][5].
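If you no longer have the `kubeadm init` output, the `sha256:<hash>` value for the join command can be recomputed from the cluster CA certificate (`/etc/kubernetes/pki/ca.crt` on the master): it is the SHA-256 digest of the CA's DER-encoded public key. A sketch, using a throwaway self-signed certificate in place of the real CA file so it can run anywhere:

```shell
# A throwaway self-signed cert stands in for /etc/kubernetes/pki/ca.crt here.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=kubernetes" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null

# Hash the DER-encoded public key, as expected by --discovery-token-ca-cert-hash.
hash=$(openssl x509 -pubkey -noout -in "$tmp/ca.crt" \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | sha256sum | cut -d' ' -f1)
echo "sha256:$hash"
```

On a real master, point the `openssl x509 -in` argument at `/etc/kubernetes/pki/ca.crt` instead of the throwaway certificate.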
+
+#### Deployment Verification
+After some time, all contiv containers should enter the running state:
+```
+root@cvpp:/home/jan# kubectl get pods -n kube-system -o wide | grep contiv
+NAME                           READY     STATUS    RESTARTS   AGE       IP               NODE
+contiv-etcd-gwc84              1/1       Running   0          14h       192.168.56.106   cvpp
+contiv-ksr-5c2vk               1/1       Running   2          14h       192.168.56.106   cvpp
+contiv-vswitch-h6759           2/2       Running   0          14h       192.168.56.105   cvpp-slave2
+contiv-vswitch-l59nv           2/2       Running   0          14h       192.168.56.106   cvpp
+etcd-cvpp                      1/1       Running   0          14h       192.168.56.106   cvpp
+kube-apiserver-cvpp            1/1       Running   0          14h       192.168.56.106   cvpp
+kube-controller-manager-cvpp   1/1       Running   0          14h       192.168.56.106   cvpp
+kube-dns-545bc4bfd4-fr6j9      3/3       Running   0          14h       10.1.134.2       cvpp
+kube-proxy-q8sv2               1/1       Running   0          14h       192.168.56.106   cvpp
+kube-proxy-s8kv9               1/1       Running   0          14h       192.168.56.105   cvpp-slave2
+kube-scheduler-cvpp            1/1       Running   0          14h       192.168.56.106   cvpp
+```
+In particular, verify that a vswitch pod and a kube-proxy pod are running on
+each joined node, as shown above.
+
+On each joined node, verify that the VPP successfully grabbed the network
+interface specified in the VPP startup config (`GigabitEthernet0/4/0` in
+our case):
+```
+$ sudo vppctl
+vpp# sh inter
+              Name               Idx       State          Counter          Count
+GigabitEthernet0/4/0              1         up
+...
+```
+From the vpp CLI on a joined node you can also ping kube-dns to verify
+node-to-node connectivity.
For example:
+```
+vpp# ping 10.1.134.2
+64 bytes from 10.1.134.2: icmp_seq=1 ttl=64 time=.1557 ms
+64 bytes from 10.1.134.2: icmp_seq=2 ttl=64 time=.1339 ms
+64 bytes from 10.1.134.2: icmp_seq=3 ttl=64 time=.1295 ms
+64 bytes from 10.1.134.2: icmp_seq=4 ttl=64 time=.1714 ms
+64 bytes from 10.1.134.2: icmp_seq=5 ttl=64 time=.1317 ms
+
+Statistics: 5 sent, 5 received, 0% packet loss
+```
+### Deploying Example Applications
+#### Simple Deployment
+You can go ahead and create a simple deployment:
+```
+$ kubectl run nginx --image=nginx --replicas=2
+```
+
+Use `kubectl describe pod` to get the IP address of a pod, e.g.:
+```
+$ kubectl describe pod nginx | grep IP
+```
+You should see two IP addresses, for example:
+```
+IP:		10.1.1.3
+IP:		10.1.1.4
+```
+
+You can check the pods' connectivity in one of the following ways:
+* Connect to the VPP debug CLI and ping any pod:
+```
+  sudo vppctl
+  vpp# ping 10.1.1.3
+```
+* Start busybox and ping any pod:
+```
+  kubectl run busybox --rm -ti --image=busybox /bin/sh
+  If you don't see a command prompt, try pressing enter.
+  / #
+  / # ping 10.1.1.3
+
+```
+* You should be able to ping any pod from the host:
+```
+  ping 10.1.1.3
+```
+
+#### Deploying Pods on Different Nodes
+To enable pod deployment on the master, untaint the master first:
+```
+kubectl taint nodes --all node-role.kubernetes.io/master-
+```
+
+In order to verify inter-node pod connectivity, we need to tell Kubernetes
+to deploy one pod on the master node and one pod on the worker. For this,
+we can use node selectors.
+
+In your deployment YAMLs, add the `nodeSelector` sections that refer to
+preferred node hostnames, e.g.:
+```
+  nodeSelector:
+    kubernetes.io/hostname: vm5
+```
+
+Complete pod definition examples:
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx1
+spec:
+  nodeSelector:
+    kubernetes.io/hostname: vm5
+  containers:
+    - name: nginx
+      image: nginx
+```
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx2
+spec:
+  nodeSelector:
+    kubernetes.io/hostname: vm6
+  containers:
+    - name: nginx
+      image: nginx
+```
+
+After deploying the pods, verify they were deployed on different hosts:
+```
+$ kubectl get pods -o wide
+NAME      READY     STATUS    RESTARTS   AGE       IP           NODE
+nginx1    1/1       Running   0          13m       10.1.36.2    vm5
+nginx2    1/1       Running   0          13m       10.1.219.3   vm6
+```
+
+Now you can verify the connectivity to both nginx PODs from a busybox POD:
+```
+kubectl run busybox --rm -it --image=busybox /bin/sh
+
+/ # wget 10.1.36.2
+Connecting to 10.1.36.2 (10.1.36.2:80)
+index.html           100% |********************************************************************|   612   0:00:00 ETA
+
+/ # rm index.html
+
+/ # wget 10.1.219.3
+Connecting to 10.1.219.3 (10.1.219.3:80)
+index.html           100% |********************************************************************|   612   0:00:00 ETA
+```
+
+### Uninstalling Contiv-VPP
+To uninstall the network plugin itself, use `kubectl`:
+```
+kubectl delete -f https://raw.githubusercontent.com/contiv/vpp/master/k8s/contiv-vpp.yaml
+```
+
+### Tearing down Kubernetes
+* First, drain the node and make sure that the node is empty before
+shutting it down:
+```
+  kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
+  kubectl delete node <node name>
+```
+* Next, on the node being removed, reset all kubeadm installed state:
+```
+  rm -rf $HOME/.kube
+  sudo su
+  kubeadm reset
+```
+
+* If
you added environment variable definitions into
+  `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf` (e.g., as part of
+  setting up a [custom management network][10]), remove those definitions now.
+
+### Troubleshooting
+Some of the issues that can occur during the installation are:
+
+- Forgetting to create and initialize the `.kube` directory in your home
+  directory (as instructed by `kubeadm init --token-ttl 0`). This can manifest
+  itself as the following error:
+  ```
+  W1017 09:25:43.403159    2233 factory_object_mapping.go:423] Failed to download OpenAPI (Get https://192.168.209.128:6443/swagger-2.0.0.pb-v1: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")), falling back to swagger
+  Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
+  ```
+- A previous installation lingering on the file system:
+  `kubeadm init --token-ttl 0` fails to initialize kubelet with one or more
+  of the following error messages:
+  ```
+  ...
+  [kubelet-check] It seems like the kubelet isn't running or healthy.
+  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
+  ...
+  ```
+
+If you run into any of the above issues, try to clean up and reinstall as root:
+```
+sudo su
+rm -rf $HOME/.kube
+kubeadm reset
+kubeadm init --token-ttl 0
+rm -rf /var/etcd/contiv-data
+rm -rf /var/bolt/bolt.db
+```
+
+[1]: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
+[3]: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#initializing-your-master
+[4]: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network
+[5]: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#joining-your-nodes
+[6]: https://kubernetes.io/docs/setup/independent/install-kubeadm/
+[8]: #tearing-down-kubernetes
+[10]: https://github.com/contiv/vpp/blob/master/docs/CUSTOM_MGMT_NETWORK.md#setting-up-a-custom-management-network-on-multi-homed-nodes
+[11]: ../vagrant/README.md
+[12]: https://github.com/contiv/vpp/tree/master/docs/CUSTOM_MGMT_NETWORK.md
+[13]: https://github.com/contiv/vpp/tree/master/docs/VMWARE_FUSION_HOST.md
+[14]: https://github.com/contiv/vpp/tree/master/docs/SINGLE_NIC_SETUP.md
+[15]: https://github.com/contiv/vpp/tree/master/docs/MULTI_NIC_SETUP.md
+[16]: https://github.com/contiv/vpp/tree/master/docs/SINGLE_NIC_SETUP.md#configuring-stn-in-contiv-vpp-k8s-deployment-files
+[17]: https://github.com/contiv/vpp/tree/master/k8s/README.md#setup-node-sh
diff --git a/docs/usecases/contiv/MULTI_NIC_SETUP.md b/docs/usecases/contiv/MULTI_NIC_SETUP.md
new file mode 100644
index 00000000000..5cbb031d290
--- /dev/null
+++ b/docs/usecases/contiv/MULTI_NIC_SETUP.md
+### Setting Up a Node with Multiple NICs
+
+* First, configure hardware interfaces in the VPP startup config, as
+described [here](https://github.com/contiv/vpp/blob/master/docs/VPP_CONFIG.md#multi-nic-configuration).
+
+* For each Linux-owned interface that should be used by VPP, you need to
+  provide individual configuration in the Node Configuration
+  for the node in the `contiv-vpp.yaml`.
For example, if both `ens3` and
+  `ens4` are known to Linux, then put the following stanza into the node's
+  NodeConfig:
+```
+...
+    NodeConfig:
+    - NodeName: "ubuntu-1"
+      StealInterface: "ens3"
+      StealInterface: "ens4"
+...
+```
+  If only `ens3` is known to Linux, you only put a line for `ens3` into the
+  above NodeConfig.
+
diff --git a/docs/usecases/contiv/NETWORKING.md b/docs/usecases/contiv/NETWORKING.md
new file mode 100644
index 00000000000..0b6d08127fb
--- /dev/null
+++ b/docs/usecases/contiv/NETWORKING.md
+# Contiv/VPP Network Operation
+
+This document describes the network operation of the Contiv/VPP k8s network plugin. It
+elaborates on the operation and config options of the Contiv IPAM, as well as
+details on how VPP gets programmed by the Contiv/VPP control plane.
+
+The following picture shows a 2-node k8s deployment of Contiv/VPP, with a VXLAN tunnel
+established between the nodes to forward inter-node POD traffic. The IPAM options
+are depicted on Node 1, whereas the VPP programming is depicted on Node 2.
+
+![Contiv/VPP Architecture](/_images/contiv-networking.png "contiv-networking.png")
+
+## Contiv/VPP IPAM (IP Address Management)
+
+IPAM in Contiv/VPP is based on the concept of **Node ID**. The Node ID is a number
+that uniquely identifies a node in the k8s cluster. The first node is assigned
+the ID of 1, the second node 2, etc. If a node leaves the cluster, its
+ID is released back to the pool and will be re-used by the next node.
+
+The Node ID is used to calculate per-node IP subnets for PODs
+and other internal subnets that need to be unique on each node. Apart from the Node ID,
+the input for IPAM calculations is a set of config knobs, which can be specified
+in the `IPAMConfig` section of the [Contiv/VPP deployment YAML](../../../k8s/contiv-vpp.yaml):
+
+- **PodSubnetCIDR** (default `10.1.0.0/16`): each pod gets an IP address assigned
+from this range.
The size of this range (default `/16`) dictates the upper limit of
+the POD count for the entire k8s cluster (default 65536 PODs).
+
+- **PodNetworkPrefixLen** (default `24`): per-node dedicated podSubnet range.
+From the allocatable range defined in `PodSubnetCIDR`, this value dictates the
+allocation for each node. With the default value (`24`), each node
+has a `/24` slice of the `PodSubnetCIDR`, and the Node ID selects which slice
+belongs to a given node.
+In case of `PodSubnetCIDR = 10.1.0.0/16`, `PodNetworkPrefixLen = 24` and `NodeID = 5`,
+the resulting POD subnet for the node would be `10.1.5.0/24`.
+
+- **PodIfIPCIDR** (default `10.2.1.0/24`): VPP-internal addresses used to put the VPP interfaces
+facing towards the PODs into L3 mode. This IP range is reused
+on each node; therefore, it is never externally addressable outside of the node itself.
+The only requirement is that this subnet does not collide with any other IPAM subnet.
+
+- **VPPHostSubnetCIDR** (default `172.30.0.0/16`): used for addressing
+the interconnect of VPP with the Linux network stack, within the same node.
+Since this subnet needs to be unique on each node, the Node ID is used to determine
+the actual subnet used on the node, in combination with `VPPHostNetworkPrefixLen`, `PodSubnetCIDR` and `PodNetworkPrefixLen`.
+
+- **VPPHostNetworkPrefixLen** (default `24`): used to calculate the subnet
+for addressing the interconnect of VPP with the Linux network stack, within the same node.
+With `VPPHostSubnetCIDR = 172.30.0.0/16`, `VPPHostNetworkPrefixLen = 24` and
+`NodeID = 5`, the resulting subnet for the node would be `172.30.5.0/24`.
+
+- **NodeInterconnectCIDR** (default `192.168.16.0/24`): range for the addresses
+assigned to the data plane interfaces managed by VPP. Unless DHCP is used
+(`NodeInterconnectDHCP = True`), the Contiv/VPP control plane automatically assigns
+an IP address from this range to the DPDK-managed ethernet interface bound to VPP
+on each node.
The actual IP address will be calculated from the Node ID (e.g., with
+`NodeInterconnectCIDR = 192.168.16.0/24` and `NodeID = 5`, the resulting IP
+address assigned to the ethernet interface on VPP will be `192.168.16.5`).
+
+- **NodeInterconnectDHCP** (default `False`): when set to `True`, the IP addresses for
+the VPP-managed data plane interfaces are assigned by DHCP, instead of by the Contiv/VPP
+control plane from `NodeInterconnectCIDR`. A DHCP server must be running in the network
+where the data plane interface is connected. When `NodeInterconnectDHCP = True`,
+`NodeInterconnectCIDR` is ignored.
+
+- **VxlanCIDR** (default `192.168.30.0/24`): in order to provide inter-node
+POD to POD connectivity via any underlay network (not necessarily an L2 network),
+Contiv/VPP sets up a VXLAN tunnel overlay between each pair of nodes within the cluster.
+Each node needs a unique IP address for its VXLAN BVI interface. This IP address
+is automatically calculated from the Node ID (e.g., with `VxlanCIDR = 192.168.30.0/24`
+and `NodeID = 5`, the resulting IP address assigned to the VXLAN BVI interface will be `192.168.30.5`).
+
+## VPP Programming
+This section describes how the Contiv/VPP control plane programs VPP, based on the
+events it receives from k8s. This section is not necessary for understanding
+basic Contiv/VPP operation, but is very useful for debugging purposes.
+
+Contiv/VPP currently uses a single VRF to forward the traffic between PODs on a node,
+PODs on different nodes, the host network stack, and the DPDK-managed dataplane interface. The forwarding
+between each of them is purely L3-based, even for cases of communication
+between 2 PODs within the same node.
+
+#### DPDK-Managed Data Interface
+In order to allow inter-node communication between PODs on different
+nodes and between PODs and the outside world, Contiv/VPP uses data-plane interfaces
+bound to VPP using DPDK.
Each node should have one "main" VPP interface,
+which is unbound from the host network stack and bound to VPP.
+The Contiv/VPP control plane automatically configures the interface either
+via DHCP, or with a statically assigned address (see the `NodeInterconnectCIDR` and
+`NodeInterconnectDHCP` yaml settings).
+
+#### PODs on the Same Node
+PODs are connected to VPP using virtio-based TAP interfaces created by VPP,
+with the POD-end of the interface placed into the POD container network namespace.
+Each POD is assigned an IP address from the `PodSubnetCIDR`. The allocated IP
+is configured with the prefix length `/32`. Additionally, a static route pointing
+towards the VPP is configured in the POD network namespace.
+The prefix length `/32` means that all IP traffic will be forwarded via the
+default route, i.e., to VPP. To avoid unnecessary broadcasts between the POD and VPP,
+a static ARP entry is configured for the gateway IP in the POD namespace, as well
+as for the POD IP on VPP. Both ends of the TAP interface have a static (non-default)
+MAC address applied.
+
+#### PODs with hostNetwork=true
+PODs with a `hostNetwork=true` attribute are not placed into a separate network namespace; they instead use the main host Linux network namespace. Therefore, they are not directly connected to the VPP. They rely on the interconnection between the VPP and the host Linux network stack,
+which is described in the next paragraph. Note that when these PODs access some service IP, their network communication will be NATed in Linux (by iptables rules programmed by kube-proxy),
+as opposed to VPP, which is the case for the PODs connected to VPP directly.
+
+#### Linux Host Network Stack
+In order to interconnect the Linux host network stack with VPP (to allow access
+to the cluster resources from the host itself, as well as for the PODs with `hostNetwork=true`),
+VPP creates a TAP interface between VPP and the main network namespace.
The TAP interface is configured with IP addresses from the `VPPHostSubnetCIDR` range, with `.1` in the last octet on the VPP side, and `.2` on the host side. The name of the host interface is `vpp1`. The host has static routes pointing to VPP configured with:
+- A route to the whole `PodSubnetCIDR`, to route traffic targeting PODs towards VPP.
+- A route to `ServiceCIDR` (default `10.96.0.0/12`), to route service-IP-targeted traffic that has not been translated by kube-proxy for some reason towards VPP.
+- The host also has a static ARP entry configured for the IP of the VPP-end TAP interface, to avoid unnecessary broadcasts between the main network namespace and VPP.
+
+#### VXLANs to Other Nodes
+In order to provide inter-node POD to POD connectivity via any underlay network
+(not necessarily an L2 network), Contiv/VPP sets up a VXLAN tunnel overlay between
+each pair of nodes within the cluster (full mesh).
+
+All VXLAN tunnels are terminated in one bridge domain on each VPP. The bridge domain
+has learning and flooding disabled; the l2fib of the bridge domain contains a static entry for each VXLAN tunnel. Each bridge domain has a BVI interface, which
+interconnects the bridge domain with the main VRF (L3 forwarding). This interface needs
+a unique IP address, which is assigned from the `VxlanCIDR` as described above.
+
+The main VRF contains several static routes that point to the BVI IP addresses of other nodes.
+For each remote node, these are routes to its POD subnet and VPP-host subnet, as well as a route
+to its management IP address. For each of these routes, the next hop IP is the
+BVI interface IP of the remote node, reachable via the BVI interface of the local node.
+
+The VXLAN tunnels and the static routes pointing to them are added/deleted on each VPP
+whenever a node is added/deleted in the k8s cluster.
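The per-node address derivations described above are simple functions of the Node ID. A minimal sketch, assuming the default IPAM values (`PodSubnetCIDR = 10.1.0.0/16`, `VPPHostSubnetCIDR = 172.30.0.0/16`, `VxlanCIDR = 192.168.30.0/24`):

```shell
# Derive the per-node subnets/addresses from the Node ID, assuming the
# default IPAM configuration described in this document.
NODE_ID=5
echo "POD subnet:      10.1.${NODE_ID}.0/24"    # from PodSubnetCIDR + PodNetworkPrefixLen
echo "VPP-host subnet: 172.30.${NODE_ID}.0/24"  # from VPPHostSubnetCIDR + VPPHostNetworkPrefixLen
echo "VXLAN BVI IP:    192.168.30.${NODE_ID}"   # from VxlanCIDR
```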
+
+
+#### More Info
+Please refer to the [Packet Flow Dev Guide](../dev-guide/PACKET_FLOW.html) for a more
+detailed description of the paths traversed by request and response packets
+inside a Contiv/VPP Kubernetes cluster under different situations.
\ No newline at end of file
diff --git a/docs/usecases/contiv/Prometheus.md b/docs/usecases/contiv/Prometheus.md
new file mode 100644
index 00000000000..ba61be3c739
--- /dev/null
+++ b/docs/usecases/contiv/Prometheus.md
+# Prometheus Statistics
+
+Each contiv-agent exposes statistics in Prometheus format at port `9999` by default.
+Exposed data is split into two groups:
+- `/stats` provides statistics for VPP interfaces managed by contiv-agent.
+  Prometheus data is a set of counters with labels. For each interface,
+  the following counters are exposed:
+  * *inPackets*
+  * *outPackets*
+  * *inBytes*
+  * *outBytes*
+  * *ipv4Packets*
+  * *ipv6Packets*
+  * *outErrorPackets*
+  * *dropPackets*
+  * *inMissPackets*
+  * *inNobufPackets*
+  * *puntPackets*
+
+  Labels let you add additional information to a counter. The *interfaceName* and *node*
+  labels are specified for all counters. If an interface is associated with a particular
+  pod, then the *podName* and *podNamespace* labels are also specified for its counters;
+  otherwise, a placeholder value (`--`) is used (for example, for node interconnect
+  interfaces).
+- `/metrics` provides general go runtime statistics.
+
+To access the Prometheus stats of a node, you can use `curl localhost:9999/stats` from the node.
The output of the contiv-agent running at the k8s master node looks similar to the following:
+
+```
+$ curl localhost:9999/stats
+# HELP dropPackets Number of dropped packets for interface
+# TYPE dropPackets gauge
+dropPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0
+dropPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 52
+dropPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 9
+dropPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 12
+dropPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+# HELP inBytes Number of received bytes for interface
+# TYPE inBytes gauge
+inBytes{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0
+inBytes{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 24716
+inBytes{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 726
+inBytes{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 6113
+inBytes{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+# HELP inErrorPackets Number of received packets with error for interface
+# TYPE inErrorPackets gauge
+inErrorPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0
+inErrorPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 0
+inErrorPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0
+inErrorPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 0
+inErrorPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+# HELP inMissPackets Number of missed packets for interface
+# TYPE inMissPackets gauge
+inMissPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +inMissPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 0 +inMissPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0 +inMissPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 0 +inMissPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP inNobufPackets Number of received packets ??? for interface +# TYPE inNobufPackets gauge +inNobufPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +inNobufPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 0 +inNobufPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0 +inNobufPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 0 +inNobufPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP inPackets Number of received packets for interface +# TYPE inPackets gauge +inPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +inPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 97 +inPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 9 +inPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 60 +inPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP ipv4Packets Number of ipv4 packets for interface +# TYPE ipv4Packets gauge +ipv4Packets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +ipv4Packets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 68 
+ipv4Packets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0 +ipv4Packets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 52 +ipv4Packets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP ipv6Packets Number of ipv6 packets for interface +# TYPE ipv6Packets gauge +ipv6Packets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +ipv6Packets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 26 +ipv6Packets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 9 +ipv6Packets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 8 +ipv6Packets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP outBytes Number of transmitted bytes for interface +# TYPE outBytes gauge +outBytes{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +outBytes{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 5203 +outBytes{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0 +outBytes{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 17504 +outBytes{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0 +# HELP outErrorPackets Number of transmitted packets with error for interface +# TYPE outErrorPackets gauge +outErrorPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0 +outErrorPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 0 +outErrorPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0 
+outErrorPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 0
+outErrorPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+# HELP outPackets Number of transmitted packets for interface
+# TYPE outPackets gauge
+outPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0
+outPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 49
+outPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0
+outPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 45
+outPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+# HELP puntPackets Number of punt packets for interface
+# TYPE puntPackets gauge
+puntPackets{interfaceName="GigabitEthernet0/9/0",node="dev",podName="--",podNamespace="--"} 0
+puntPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 0
+puntPackets{interfaceName="tap0e6439a7a934336",node="dev",podName="web-667bdcb4d8-pxkfs",podNamespace="default"} 0
+puntPackets{interfaceName="tap5338a3285ad6bd7",node="dev",podName="kube-dns-6f4fd4bdf-rsz9b",podNamespace="kube-system"} 0
+puntPackets{interfaceName="vxlanBVI",node="dev",podName="--",podNamespace="--"} 0
+
+```
+
+
+In order to browse the stats in the Prometheus web UI, Prometheus must be started locally, as described in
+the [Prometheus Getting Started Guide](https://prometheus.io/docs/prometheus/latest/getting_started/).
+
+If you start Prometheus on a node, the following sample config can be used:
+```yaml
+global:
+  scrape_interval: 15s
+
+scrape_configs:
+  - job_name: 'contiv_stats'
+    metrics_path: '/stats'
+    static_configs:
+      - targets: ['localhost:9999']
+  - job_name: 'contiv_agent'
+    # metrics_path defaults to '/metrics'
+    static_configs:
+      - targets: ['localhost:9999']
+```
+
+Once Prometheus is started with the specified config, you should be able to access its web UI at
+`localhost:9090`.
+```
+tester@dev:~/Downloads/prometheus-2.1.0.linux-amd64$ ./prometheus --config.file=config.yml
+```
+
+If security features are enabled for the HTTP endpoint, then the config must be adjusted:
+```yaml
+  - job_name: 'contiv_secured'
+
+    scheme: https
+    basic_auth:
+      username: user
+      password: pass
+    metrics_path: /stats
+    tls_config:
+      insecure_skip_verify: true
+      # CA certificate to validate API server certificate with.
+      #[ ca_file: <filename> ]
+    static_configs:
+      - targets: ['localhost:9191']
+```
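For quick checks without a running Prometheus server, the plain-text `/stats` output can also be filtered directly with standard tools. A small sketch follows; the sample line mimics the output shown above, and in practice you would pipe `curl -s localhost:9999/stats` into the filter instead:

```shell
# Extract the value of a single counter from the Prometheus text format.
echo 'inPackets{interfaceName="tap-vpp2",node="dev",podName="--",podNamespace="--"} 97' \
  | awk '$1 ~ /^inPackets.interfaceName="tap-vpp2"/ {print $2}'
# prints: 97
```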
\ No newline at end of file
diff --git a/docs/usecases/contiv/SECURITY.md b/docs/usecases/contiv/SECURITY.md
new file mode 100644
index 00000000000..40c5250e311
--- /dev/null
+++ b/docs/usecases/contiv/SECURITY.md
+# Security
+
+There are two types of security that are utilized in Contiv, and are discussed in this section: [HTTP](#http-security) and [ETCD](#etcd-security).
+
+## HTTP Security
+
+By default, access to the endpoints (liveness, readiness probe, prometheus stats, ...) served by Contiv-vswitch and
+Contiv-ksr is open to anybody. Contiv-vswitch exposes endpoints using port `9999` and contiv-ksr uses `9191`.
+
+To secure access to the endpoints, an SSL/TLS server certificate and basic auth (username and password) can be configured.
+
+In Contiv-VPP, this can be done using the Helm charts in the [k8s/contiv-vpp folder](https://github.com/contiv/vpp/tree/master/k8s/contiv-vpp).
+
+To generate the server certificate, the approach described in [ETCD Security](#etcd-security) can be leveraged.
+
+## ETCD Security
+
+By default, access to the Contiv-VPP ETCD is open to anybody. ETCD gets deployed
+on the master node, on port `12379`, and is exposed using the NodePort service
+on port `32379`, on each node.
+
+To secure access to ETCD, we recommend using SSL/TLS certificates to authenticate
+both the client and server side, and to encrypt the communication. In Contiv-VPP, this can be done using the Helm charts in the [k8s/contiv-vpp folder](https://github.com/contiv/vpp/tree/master/k8s/contiv-vpp).
+
+The prerequisite for that is the generation of SSL certificates.
+
+
+### Generate Self-Signed Certificates
+In order to secure ETCD, we need to create our own certificate authority,
+and then generate the private keys and certificates for both the ETCD server and ETCD clients.
+
+This guide uses CloudFlare's [cfssl](https://github.com/cloudflare/cfssl) tools to do this job.
+It follows the steps described in this [CoreOS guide](https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md). + +Perform the following steps to generate private keys and certificates: + +##### 1. Install cfssl +``` +mkdir ~/bin +curl -s -L -o ~/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 +curl -s -L -o ~/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 +chmod +x ~/bin/{cfssl,cfssljson} +export PATH=$PATH:~/bin +``` + +##### 2. Initialize a Certificate Authority +``` +echo '{"CN":"CA","key":{"algo":"rsa","size":2048}}' | cfssl gencert -initca - | cfssljson -bare ca - +echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","server auth","client auth"]}}}' > ca-config.json +``` + +##### 3. Generate Server Key + Certificate +Replace the IP address `10.0.2.15` below with the IP address of your master node: +``` +export ADDRESS=127.0.0.1,10.0.2.15 +export NAME=server +echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=ca-config.json -ca=ca.pem -ca-key=ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $NAME +``` + +##### 4. Generate Client Key + Certificate +``` +export ADDRESS= +export NAME=client +echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=ca-config.json -ca=ca.pem -ca-key=ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $NAME +``` + +The above commands produce the following files that will be needed in order to secure ETCD: + - `ca.pem`: certificate of the certificate authority + - `server.pem`: certificate of the ETCD server + - `server-key.pem`: private key of the ETCD server + - `client.pem`: certificate for the ETCD clients + - `client-key.pem`: private key for the ETCD clients + + +### Distribute Certificates and Generate Contiv-VPP Deployment Yaml +There are two options for distributing the certificates to all nodes in a k8s cluster. 
+You can either distribute the certificates
+[manually](#distribute-certificates-manually), or embed the certificates into the deployment yaml file and
+distribute them as [k8s secrets](https://kubernetes.io/docs/concepts/configuration/secret/).
+
+##### Distribute Certificates Manually
+In this case, you need to copy the `ca.pem`, `client.pem` and `client-key.pem` files
+into a specific folder (`/var/contiv/etcd-secrets` by default) on each worker node.
+On the master node, you also need to add the `server.pem` and `server-key.pem` into that location.
+
+Then you can generate the Contiv-VPP deployment YAML as follows:
+```
+cd k8s
+helm template --name my-release contiv-vpp --set etcd.secureTransport=True > contiv-vpp.yaml
+```
+Then you can go ahead and deploy Contiv-VPP using this yaml file.
+
+##### Embed the certificates into the deployment yaml and use a k8s secret to distribute them {: #Embed-certificates }
+In this case, you need to copy all 5 generated files into the folder with the helm definitions
+(`k8s/contiv-vpp`) and generate the Contiv-VPP deployment YAML as follows:
+```
+cd k8s
+helm template --name my-release contiv-vpp --set etcd.secureTransport=True --set etcd.secrets.mountFromHost=False > contiv-vpp.yaml
+```
+Then just deploy Contiv-VPP using this yaml file.
+
+Please note that the path of the mount folder with the certificates, as well as the certificate
+file names, can be customized using the config parameters of the Contiv-VPP chart,
+as described in [this README](https://github.com/contiv/vpp/blob/master/k8s/contiv-vpp/README.md).
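In the secrets-based variant, the certificates end up in an ordinary Kubernetes Secret. For orientation only, a hypothetical sketch of what such a Secret looks like; the actual name, namespace, and keys are generated by the helm chart and may differ:

```yaml
# Hypothetical sketch only -- the real manifest is produced by the helm chart.
apiVersion: v1
kind: Secret
metadata:
  name: contiv-etcd-secrets    # illustrative name
  namespace: kube-system
type: Opaque
data:
  ca.pem: <base64-encoded ca.pem>
  client.pem: <base64-encoded client.pem>
  client-key.pem: <base64-encoded client-key.pem>
```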
\ No newline at end of file diff --git a/docs/usecases/contiv/SINGLE_NIC_SETUP.md b/docs/usecases/contiv/SINGLE_NIC_SETUP.md new file mode 100644 index 00000000000..875a7323fff --- /dev/null +++ b/docs/usecases/contiv/SINGLE_NIC_SETUP.md @@ -0,0 +1,111 @@ +### Setting up a Node with a Single NIC + +#### Installing the STN Daemon +The STN (Steal the NIC) daemon must be installed on every node in the cluster that has only +one NIC. The STN daemon installation(*) should be performed before deployment +of the Contiv-VPP plugin. + +\* Docker daemon must be present when installing STN. Also, Docker must be configured to allow shared mount. +On CentOS, this may not be the case by default. You can enable it by following the instructions at +[https://docs.portworx.com/knowledgebase/shared-mount-propagation.html](https://docs.portworx.com/knowledgebase/shared-mount-propagation.html). + + +Run as root (not using sudo): +``` +bash <(curl -s https://raw.githubusercontent.com/contiv/vpp/master/k8s/stn-install.sh) +``` +The install script should output the following: +``` +Installing Contiv STN daemon. +Starting contiv-stn Docker container: +550334308f85f05b2690f5cfb5dd945bd9c501ab9d074231f15c14d7098ef212 +``` + +Check that the STN daemon is running: +``` +docker ps -a +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +550334308f85 contivvpp/stn "/stn" 33 seconds ago Up 33 seconds contiv-stn +``` + +Check that the STN daemon is operational: +``` +docker logs contiv-stn +``` +The expected logs would look like the following excerpt: +``` +2018/02/23 10:08:34 Starting the STN GRPC server at port 50051 +``` + +For more details, please read the Go documentation for [contiv-stn](https://github.com/contiv/vpp/blob/master/cmd/contiv-stn/doc.go) +and [contiv-init](https://github.com/contiv/vpp/blob/master/cmd/contiv-init/doc.go). 
+ +#### Creating a VPP Interface Configuration +Create the VPP configuration for the hardware interface as described +[here](https://github.com/contiv/vpp/blob/master/docs/VPP_CONFIG.md#single-nic-configuration). + +#### Configuring STN in Contiv-VPP K8s Deployment Files +The STN feature is disabled by default. It needs to be enabled either globally, +or individually for every node in the cluster. + +##### Global Configuration: +Global configuration is used in homogeneous environments where all nodes in +a given cluster have the same hardware configuration, for example only a single +Network Adapter. To enable the STN feature globally, put the `StealFirstNIC: True` +stanza into the [`contiv-vpp.yaml`][1] deployment file, for example: +``` +data: + contiv.yaml: |- + TCPstackDisabled: true + ... + StealFirstNIC: True + ... + IPAMConfig: +``` + +Setting `StealFirstNIC` to `True` will tell the STN Daemon on every node in the +cluster to steal the first NIC from the kernel and assign it to VPP. Note that +the Network Adapters on different nodes do not need to be of the same type. You +still need to create the respective vswitch configurations on every node in the +cluster, as shown [above](#creating-a-vpp-interface-configuration). + +##### Individual Configuration: +Individual configuration is used in heterogeneous environments where each node +in a given cluster may be configured differently. To enable the STN feature +for a specific node in the cluster, put the following stanza into its Node +Configuration in the [`contiv-vpp.yaml`][1] deployment file, for example: +``` +... + NodeConfig: + - NodeName: "ubuntu-1" + StealInterface: "enp0s8" + - NodeName: "ubuntu-2" + StealInterface: "enp0s8" +... +``` +Note that you still have to create the vswitch configuration on the node as +shown [here](#creating-a-vpp-interface-configuration). 
+
+
+
+#### Uninstalling the STN Daemon
+
+Run as root (not using sudo):
+```
+bash <(curl -s https://raw.githubusercontent.com/contiv/vpp/master/k8s/stn-install.sh) --uninstall
+```
+The script should output the following:
+```
+Uninstalling Contiv STN daemon.
+Stopping contiv-stn Docker container:
+contiv-stn
+contiv-stn
+contiv-stn
+```
+Make sure that the STN daemon has been uninstalled:
+```
+docker ps -q -f name=contiv-stn
+```
+No containers should be listed.
+
+[1]: ../k8s/contiv-vpp.yaml
diff --git a/docs/usecases/contiv/VMWARE_FUSION_HOST.md b/docs/usecases/contiv/VMWARE_FUSION_HOST.md
new file mode 100644
index 00000000000..d4e251c0fcd
--- /dev/null
+++ b/docs/usecases/contiv/VMWARE_FUSION_HOST.md
@@ -0,0 +1,52 @@
+### Preparing a VMware Fusion Host
+The *vmxnet3 driver* is required on a GigE Network Adapter used by VPP. On VMware
+Fusion, the default Network Adapter driver is an *Intel 82545EM (e1000)*, and there
+is no GUI to change it to *vmxnet3*. The change must be done manually in the VM's
+configuration file as follows:
+
+- Bring up the VM library window: **Window -> Virtual Machine Library**
+- Right click on the VM where you want to change the driver:
+  <*VM-Name*> **-> Show in Finder**. This pops up a new Finder window with a line
+  for each VM that Fusion knows about.
+- Right click on the VM where you want to change the driver:
+  <*VM-Name*> **-> Show package contents**. This brings up a window with the
+  contents of the package.
+- Open the file <*VM-Name*> **.vmx** with your favorite text editor.
+- For each Network Adapter that you want used by VPP, look for the
+  Network Adapter's driver configuration. For example, for the VM's first
+  Network Adapter look for:
+  ```
+  ethernet0.virtualDev = "e1000"
+  ```
+  Replace `e1000` with `vmxnet3`:
+  ```
+  ethernet0.virtualDev = "vmxnet3"
+  ```
+and restart the VM.
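The per-adapter edit can also be scripted with `sed`. A minimal sketch follows; the scratch file it creates is purely illustrative (in practice, point `VMX` at the real `<VM-Name>.vmx` and keep the `.bak` copy that `sed` leaves behind in case you need to revert):

```shell
# Demo on a scratch file standing in for <VM-Name>.vmx.
VMX="$(mktemp)"
printf 'ethernet0.virtualDev = "e1000"\n' > "$VMX"
# Swap the e1000 driver for vmxnet3 on the first adapter, keeping a backup.
sed -i.bak 's/ethernet0.virtualDev = "e1000"/ethernet0.virtualDev = "vmxnet3"/' "$VMX"
grep virtualDev "$VMX"   # ethernet0.virtualDev = "vmxnet3"
```

Repeat the substitution for `ethernet1`, `ethernet2`, etc., for every adapter that VPP should own, then restart the VM as described above.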
+ +If you replaced the driver on your VM's primary Network Adapter, you will +have to change the primary network interface configuration in Linux. + +First, get the new primary network interface name: +``` +sudo lshw -class network -businfo + +Bus info Device Class Description +======================================================== +pci@0000:03:00.0 ens160 network VMXNET3 Ethernet Controller +``` +Replace the existing primary network interface name in `/etc/network/interfaces` +with the above device name (ens160): +``` +# This file describes the network interfaces available on your system, +# and how to activate them. For more information, see interfaces(5). + +source /etc/network/interfaces.d/* + +# The loopback network interface +auto lo +iface lo inet loopback + +# The primary network interface +auto ens160 +iface ens160 inet dhcp
+```
\ No newline at end of file diff --git a/docs/usecases/contiv/VPPTRACE.md b/docs/usecases/contiv/VPPTRACE.md new file mode 100644 index 00000000000..2b397427cb3 --- /dev/null +++ b/docs/usecases/contiv/VPPTRACE.md @@ -0,0 +1,95 @@ +## Using vpptrace.sh for VPP Packet Tracing + +VPP allows tracing of incoming packets using CLI commands `trace add` and `show trace` +as explained [here](VPP_PACKET_TRACING_K8S.html), but it is a rather cumbersome process. + +The buffer for captured packets is limited in size, and once it gets full the tracing stops. The user has to manually clear the buffer content, and then repeat the trace command to resume the packet capture, losing information about all packets received in the meantime. + +Packet filtering exposed via the CLI command `trace filter` is also quite limited in what it can do. Currently there is just one available filter, which allows you to keep only packets that include a certain node in the trace or exclude a certain node in the trace. +It is not possible to filter the traffic by its content (e.g., by the source/destination IP address, protocol, etc.). + +Last but not least, it is not possible to trace packets on a selected interface +like `tcpdump`, which allows tracing via the option `-i`. VPP is only able to capture packets +on the *RX side* of selected *devices* (e.g., dpdk, virtio, af-packet). This means +that interfaces based on the same device cannot be traced for incoming packets +individually, but only all at the same time. In Contiv/VPP all pods are connected +with VPP via the same kind of the TAP interface, meaning that it is not possible to +capture packets incoming only from one selected pod. + +Contiv/VPP ships with a simple bash script [vpptrace.sh](https://github.com/contiv/vpp/blob/master/scripts/vpptrace.sh), +which helps alleviate the aforementioned VPP limitations. 
The script automatically
+re-initializes buffers and traces whenever the buffer is close to getting full, in order to
+avoid packet loss as much as possible. It also allows you to filter packets
+by the content of the trace. There are two modes of filtering:
+ - *substring mode* (default): packet trace must contain a given sub-string in order to
+   be included in the output
+ - *regex mode*: packet trace must match a given regex in order to be printed
+
+The script is still limited, in that capture runs only on the RX side of all interfaces that are built on top of selected devices. Using filtering, however, it is possible to limit
+*traffic by interface* simply by using the interface name as a substring to match against.
+
+#### Usage
+
+Run the script with option `-h` to get the usage printed:
+```
+Usage: ./vpptrace.sh [-i <VPP-IF-TYPE>]... [-a <VPP-ADDRESS>] [-r] [-f <REGEXP> / <SUBSTRING>]
+   -i <VPP-IF-TYPE> : VPP interface *type* to run the packet capture on (e.g., dpdk-input, virtio-input, etc.)
+                      - available aliases:
+                        - af-packet-input: afpacket, af-packet, veth
+                        - virtio-input: tap (version determined from the VPP runtime config), tap2, tapv2
+                        - tapcli-rx: tap (version determined from the VPP config), tap1, tapv1
+                        - dpdk-input: dpdk, gbe, phys*
+                      - multiple interfaces can be watched at the same time - the option can be repeated with
+                        different values
+                      - default = dpdk + tap
+   -a <VPP-ADDRESS> : IP address or hostname of the VPP to capture packets from
+                      - not supported if VPP listens on a UNIX domain socket
+                      - default = 127.0.0.1
+   -r               : apply filter string (passed with -f) as a regular expression
+                      - by default the filter is NOT treated as regexp
+   -f               : filter string that packet must contain (without -r) or match as regexp (with -r) to be printed
+                      - default is no filtering
+```
+
+`VPP-IF-TYPE` is a repeated option used to select the set of devices (e.g., virtio, dpdk, etc.)
+to capture the incoming traffic.
The script provides multiple aliases, which
+are much easier to remember than the device names. For `dpdk-input` one can enter
+just `dpdk`, or anything starting with `phys`, etc. For TAPs, the script is even
+smart enough to find out the TAP version used, which allows you to enter just `tap`
+as the device name.
+
+If `VPP-IF-TYPE` is not specified, then the default behaviour is to capture from both
+`dpdk` (traffic entering the node from outside) and `tap` (preferred interface type
+for pod-VPP and host-VPP interconnection, receiving node-initiated traffic).
+
+vpptrace.sh can capture packets even from a VPP on a different host, provided that
+VPP-CLI listens on a port, and not on a UNIX domain socket (for security reasons IPC
+is the default communication link, see `/etc/vpp/contiv-vswitch.conf`). Enter the destination
+node IP address via the option `-a` (localhost is the default).
+
+The capture can be filtered via the `-f` option. The output will include only packets
+whose traces contain the given sub-string, or match the given regexp.
+
+Option `-r` enables the regex mode for filtering.
+
+#### Examples
+
+- Capture all packets entering VPP via the `tapcli-1` interface **AND** all packets
+  leaving VPP via `tapcli-1` that were sent from a pod, or the host on the *same node*
+  (sent from tap, not Gbe):
+```
+$ vpptrace.sh -i tap -f "tapcli-1"
+```
+
+- Capture all packets with source or destination IP address 10.1.1.3:
+```
+$ vpptrace.sh -i tap -i dpdk -f "10.1.1.3"
+
+Or just:
+$ vpptrace.sh "10.1.1.3"
+```
+
+- Capture all SYN-ACKs received from outside:
+```
+$ vpptrace.sh -i dpdk -f "SYN-ACK"
+```
\ No newline at end of file
diff --git a/docs/usecases/contiv/VPP_CONFIG.md b/docs/usecases/contiv/VPP_CONFIG.md
new file mode 100644
index 00000000000..0d0559372cb
--- /dev/null
+++ b/docs/usecases/contiv/VPP_CONFIG.md
@@ -0,0 +1,153 @@
+## Creating VPP Startup Configuration
+This document describes how to create the VPP startup configuration
+file located at `/etc/vpp/contiv-vswitch.conf`.
+
+### Hardware Interface Configuration
+#### Single-NIC Configuration
+You need to configure hardware interfaces for use by VPP. First, find out the PCI address of the host's network interface. On
+Debian-based distributions, you can use `lshw`:
+
+```
+sudo lshw -class network -businfo
+Bus info          Device      Class      Description
+========================================================
+pci@0000:00:03.0  ens3        network    Virtio network device
+```
+
+In our case, it would be the `ens3` interface with the PCI address
+`0000:00:03.0`.
+
+Now, add or modify the VPP startup config file (`/etc/vpp/contiv-vswitch.conf`)
+to contain the proper PCI address:
+```
+unix {
+  nodaemon
+  cli-listen /run/vpp/cli.sock
+  cli-no-pager
+  coredump-size unlimited
+  full-coredump
+  poll-sleep-usec 100
+}
+nat {
+  endpoint-dependent
+}
+dpdk {
+  dev 0000:00:03.0
+}
+api-trace {
+  on
+  nitems 500
+}
+```
+#### Multi-NIC Configuration
+Similar to the single-NIC configuration, use the *lshw* command to find the PCI
+addresses of all the NICs in the system, for example:
+
+```
+$ sudo lshw -class network -businfo
+Bus info          Device      Class      Description
+====================================================
+pci@0000:00:03.0  ens3        network    Virtio network device
+pci@0000:00:04.0  ens4        network    Virtio network device
+```
+
+In the example above, `ens3` would be the primary interface and `ens4` would
+be the interface that would be used by VPP. The PCI address of the `ens4`
+interface would be `0000:00:04.0`.
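If `lshw` is not installed, the same interface-to-bus-address mapping can be read from sysfs. A minimal sketch, assuming a Linux host (virtual interfaces such as `lo` have no backing device and are skipped; for physical NICs the printed value is the PCI address that the dpdk `dev` stanza expects):

```shell
# List each kernel network interface together with its bus address.
# Interfaces without a backing device (lo, veth, ...) are skipped.
for dev in /sys/class/net/*; do
    pci=$(readlink "$dev/device" 2>/dev/null) || continue
    printf '%-12s %s\n' "$(basename "$dev")" "$(basename "$pci")"
done
```

On a node with the two Virtio NICs above, this would print `ens3 0000:00:03.0` and `ens4 0000:00:04.0`.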
+ +Make sure the selected interface is *shut down*, otherwise VPP +will not grab it: +``` +sudo ip link set ens4 down +``` + +Now, add or modify the VPP startup config file in `/etc/vpp/contiv-vswitch.conf` +to contain the proper PCI address: +``` +unix { + nodaemon + cli-listen /run/vpp/cli.sock + cli-no-pager + coredump-size unlimited + full-coredump + poll-sleep-usec 100 +} +nat { + endpoint-dependent +} +dpdk { + dev 0000:00:04.0 +} +api-trace { + on + nitems 500 +} +``` +If assigning multiple NICs to VPP you will need to include each NIC's PCI address +in the dpdk stanza in `/etc/vpp/contiv-vswitch.conf`. + +##### Assigning all NICs to VPP +On a multi-NIC node, it is also possible to assign all NICs from the kernel for +use by VPP. First, you need to install the STN daemon, as described [here][1], +since you will want the NICs to revert to the kernel if VPP crashes. + +You also need to configure the NICs in the VPP startup config file +in `/etc/vpp/contiv-vswitch.conf`. For example, to use both the primary and +secondary NIC, in a two-NIC node, your VPP startup config file would look +something like this: + +``` +unix { + nodaemon + cli-listen /run/vpp/cli.sock + cli-no-pager + coredump-size unlimited + full-coredump + poll-sleep-usec 100 +} +nat { + endpoint-dependent +} +dpdk { + dev 0000:00:03.0 + dev 0000:00:04.0 +} +api-trace { + on + nitems 500 +} +``` + +#### Installing `lshw` on CentOS/RedHat/Fedora +Note: On CentOS/RedHat/Fedora distributions, `lshw` may not be available +by default, install it by +``` +sudo yum -y install lshw +``` + +### Power-saving Mode +In regular operation, VPP takes 100% of one CPU core at all times (poll loop). +If high performance and low latency is not required you can "slow-down" +the poll-loop and drastically reduce CPU utilization by adding the following +stanza to the `unix` section of the VPP startup config file: +``` +unix { + ... + poll-sleep-usec 100 + ... 
+}
+```
+The power-saving mode is especially useful in VM-based development environments
+running on laptops or less powerful servers.
+
+### VPP API Trace
+To troubleshoot VPP configuration issues in production environments, it is
+strongly recommended to configure VPP API trace. This is done by adding the
+following stanza to the VPP startup config file:
+```
+api-trace {
+  on
+  nitems 500
+}
+```
+You can set the size of the trace buffer with the `nitems` attribute.
diff --git a/docs/usecases/contiv/VPP_PACKET_TRACING_K8S.md b/docs/usecases/contiv/VPP_PACKET_TRACING_K8S.md
new file mode 100644
index 00000000000..f7b6f4c19e8
--- /dev/null
+++ b/docs/usecases/contiv/VPP_PACKET_TRACING_K8S.md
@@ -0,0 +1,510 @@
+## How to do VPP Packet Tracing in Kubernetes
+
+This document describes the steps to do *manual* packet tracing (capture) using
+VPP in Kubernetes. Contiv/VPP also ships with a simple bash script
+[vpptrace.sh](https://github.com/contiv/vpp/blob/master/scripts/vpptrace.sh),
+which can *continuously* trace and
+*filter* packets incoming through a given set of interface types.
+Documentation for vpptrace.sh is available [here](https://github.com/contiv/vpp/blob/master/docs/VPPTRACE.md).
+ + +More information about VPP packet tracing is in: + +* <https://wiki.fd.io/view/VPP/Command-line_Interface_(CLI)_Guide#packet_tracer> +* <https://wiki.fd.io/view/VPP/How_To_Use_The_Packet_Generator_and_Packet_Tracer> +* <https://wiki.fd.io/view/VPP/Tutorial_Routing_and_Switching> + +#### SSH into the Node +Perform the following commands to SSH into the node: + +``` +cd vpp/vagrant/vagrant-scripts/ +vagrant ssh k8s-worker1 +``` + +#### Check the VPP Graph Nodes (Input and Output Queues) + +The following content shows what is running on VPP, via the `show run` command + +``` +vagrant@k8s-worker1:~$ sudo vppctl + _______ _ _ _____ ___ + __/ __/ _ \ (_)__ | | / / _ \/ _ \ + _/ _// // / / / _ \ | |/ / ___/ ___/ + /_/ /____(_)_/\___/ |___/_/ /_/ + +vpp# show run +Time 1026791.9, average vectors/node 1.12, last 128 main loops 0.00 per node 0.00 + vector rates in 1.6459e-4, out 1.5485e-4, drop 1.3635e-5, punt 0.0000e0 + Name State Calls Vectors Suspends Clocks Vectors/Call +GigabitEthernet0/8/0-output active 56 69 0 1.34e3 1.23 +GigabitEthernet0/8/0-tx active 54 67 0 8.09e5 1.24 +acl-plugin-fa-cleaner-process event wait 0 0 1 2.84e4 0.00 +admin-up-down-process event wait 0 0 1 4.59e3 0.00 +api-rx-from-ring any wait 0 0 3316292 1.24e5 0.00 +arp-input active 3 3 0 2.53e5 1.00 +bfd-process event wait 0 0 1 5.94e3 0.00 +cdp-process any wait 0 0 145916 1.36e4 0.00 +dhcp-client-process any wait 0 0 10268 3.65e4 0.00 +dns-resolver-process any wait 0 0 1027 5.86e4 0.00 +dpdk-input polling 8211032318951 93 0 1.48e13 0.00 +dpdk-ipsec-process done 1 0 0 2.10e5 0.00 +dpdk-process any wait 0 0 342233 9.86e6 0.00 +error-drop active 12 14 0 6.67e3 1.17 +ethernet-input active 60 74 0 5.81e3 1.23 +fib-walk any wait 0 0 513322 1.59e4 0.00 +flow-report-process any wait 0 0 1 1.45e3 0.00 +flowprobe-timer-process any wait 0 0 1 6.34e3 0.00 +ikev2-manager-process any wait 0 0 1026484 1.18e4 0.00 +interface-output active 2 2 0 3.23e3 1.00 +ioam-export-process any wait 0 0 1 1.98e3 0.00 
+ip-route-resolver-process any wait 0 0 10268 3.02e4 0.00 +ip4-arp active 1 1 0 1.49e4 1.00 +ip4-input active 223 248 0 3.39e3 1.11 +ip4-load-balance active 106 132 0 5.34e3 1.25 +ip4-local active 86 92 0 2.46e3 1.07 +ip4-local-end-of-arc active 86 92 0 1.00e3 1.07 +ip4-lookup active 223 248 0 3.31e3 1.11 +ip4-rewrite active 190 222 0 1.92e3 1.17 +ip4-udp-lookup active 86 92 0 3.76e3 1.07 +ip6-drop active 6 7 0 2.29e3 1.17 +ip6-icmp-neighbor-discovery-ev any wait 0 0 1026484 1.13e4 0.00 +ip6-input active 6 7 0 3.33e3 1.17 +l2-flood active 2 2 0 4.42e3 1.00 +l2-fwd active 138 157 0 2.13e3 1.14 +l2-input active 140 159 0 2.41e3 1.14 +l2-learn active 86 92 0 3.64e4 1.07 +l2-output active 54 67 0 3.05e3 1.24 +l2fib-mac-age-scanner-process event wait 0 0 85 5.01e4 0.00 +lisp-retry-service any wait 0 0 513322 1.62e4 0.00 +lldp-process event wait 0 0 1 5.02e4 0.00 +loop0-output active 54 67 0 1.66e3 1.24 +loop0-tx active 54 0 0 2.49e3 0.00 +memif-process event wait 0 0 1 1.70e4 0.00 +nat-det-expire-walk done 1 0 0 3.79e3 0.00 +nat44-classify active 171 183 0 2.49e3 1.07 +nat44-hairpinning active 86 92 0 1.80e3 1.07 +nat44-in2out active 171 183 0 4.45e3 1.07 +nat44-in2out-slowpath active 171 183 0 3.98e3 1.07 +nat44-out2in active 52 65 0 1.28e4 1.25 +nat64-expire-walk any wait 0 0 102677 5.95e4 0.00 +nat64-expire-worker-walk interrupt wa 102676 0 0 7.39e3 0.00 +send-garp-na-process event wait 0 0 1 1.28e3 0.00 +startup-config-process done 1 0 1 4.19e3 0.00 +tapcli-0-output active 1 1 0 6.97e3 1.00 +tapcli-0-tx active 1 1 0 7.32e4 1.00 +tapcli-1-output active 57 63 0 1.66e3 1.11 +tapcli-1-tx active 57 63 0 1.35e5 1.11 +tapcli-2-output active 28 28 0 3.26e3 1.00 +tapcli-2-tx active 28 28 0 4.06e5 1.00 +tapcli-rx interrupt wa 62 76 0 6.58e4 1.23 +udp-ping-process any wait 0 0 1 1.79e4 0.00 +unix-cli-127.0.0.1:43282 active 2 0 455 1.26e15 0.00 +unix-epoll-input polling 8010763239 0 0 8.17e2 0.00 +vhost-user-process any wait 0 0 1 1.96e3 0.00 +vhost-user-send-interrupt-proc any 
wait 0 0 1 3.85e3 0.00 +vpe-link-state-process event wait 0 0 8 9.79e4 0.00 +vpe-oam-process any wait 0 0 503263 1.21e4 0.00 +vxlan-gpe-ioam-export-process any wait 0 0 1 2.91e3 0.00 +vxlan4-encap active 54 67 0 3.55e3 1.24 +vxlan4-input active 86 92 0 3.79e3 1.07 +wildcard-ip4-arp-publisher-pro event wait 0 0 1 6.44e3 0.00 +``` + +`tapcli-rx` above is the node-level input queue for incoming packets into all the pods on the node. There is one `tapcli-rx` input queue for every node. + +The following are the input and output queueus for each pod and the node: + +``` +tapcli-0-output +tapcli-0-tx +tapcli-1-output +tapcli-1-tx +tapcli-2-output +tapcli-2-tx +``` + +Each pod and node has two queues, one for rx (`tapcli-X-output`), and one for tx (`tapcli-X-tx`). The above output is with two `nginx` pods in kubernetes. + +#### Clear Existing VPP Packet Trace +Enter the following command: +``` +vpp# clear trace +``` + +#### How to Turn on VPP Packet Tracing +Enter the following commands: + +``` +vpp# trace add <input or output queue name> <number of packets to capture> + +vpp# trace add dpdk-input 1000 + +vpp# trace add tapcli-rx 1000 +``` + +#### Send Traffic to the Pods + +Open another terminal, SSH into the master node, refer the documentation in `vpp/vagrant/README.md` and send traffic to the two `nginx` pods using `wget`. + +``` +cd vpp/vagrant/vagrant-scripts/ +vagrant ssh k8s-master + +vagrant@k8s-master:~$ kubectl get pods -o wide +NAME READY STATUS RESTARTS AGE IP NODE +nginx-8586cf59-768qw 1/1 Running 0 11d 10.1.2.3 k8s-worker1 +nginx-8586cf59-d27h2 1/1 Running 0 11d 10.1.2.2 k8s-worker1 + +vagrant@k8s-master:~$ wget 10.1.2.2 +--2018-02-08 16:46:01-- http://10.1.2.2/ +Connecting to 10.1.2.2:80... connected. +HTTP request sent, awaiting response... 
200 OK +Length: 612 [text/html] +Saving to: ‘index.html’ +index.html 100%[=========================================================>] 612 --.-KB/s in 0.004s +2018-02-08 16:46:01 (162 KB/s) - ‘index.html’ saved [612/612] + +vagrant@k8s-master:~$ wget 10.1.2.3 +--2018-02-08 16:46:02-- http://10.1.2.3/ +Connecting to 10.1.2.3:80... connected. +HTTP request sent, awaiting response... 200 OK +Length: 612 [text/html] +Saving to: ‘index.html.1’ +index.html.1 100%[=========================================================>] 612 --.-KB/s in 0.004s +2018-02-08 16:46:02 (143 KB/s) - ‘index.html.1’ saved [612/612] +``` + +#### Check the Packets Captured by VPP + +Back in the first terminal, check the packets captured by VPP. + +``` +vpp# show trace +... +... +Packet 33 + +21:34:51:476110: tapcli-rx + tapcli-2 +21:34:51:476115: ethernet-input + IP4: 00:00:00:00:00:02 -> 02:fe:72:95:66:c7 +21:34:51:476117: ip4-input + TCP: 10.1.2.3 -> 172.30.1.2 + tos 0x00, ttl 64, length 52, checksum 0x6fb4 + fragment id 0x11ec, flags DONT_FRAGMENT + TCP: 80 -> 58430 + seq. 0x5db741c8 ack 0x709defa7 + flags 0x11 FIN ACK, tcp header: 32 bytes + window 235, checksum 0x55c3 +21:34:51:476118: nat44-out2in + NAT44_OUT2IN: sw_if_index 6, next index 1, session index -1 +21:34:51:476120: ip4-lookup + fib 0 dpo-idx 23 flow hash: 0x00000000 + TCP: 10.1.2.3 -> 172.30.1.2 + tos 0x00, ttl 64, length 52, checksum 0x6fb4 + fragment id 0x11ec, flags DONT_FRAGMENT + TCP: 80 -> 58430 + seq. 0x5db741c8 ack 0x709defa7 + flags 0x11 FIN ACK, tcp header: 32 bytes + window 235, checksum 0x55c3 +21:34:51:476121: ip4-load-balance + fib 0 dpo-idx 23 flow hash: 0x00000000 + TCP: 10.1.2.3 -> 172.30.1.2 + tos 0x00, ttl 64, length 52, checksum 0x6fb4 + fragment id 0x11ec, flags DONT_FRAGMENT + TCP: 80 -> 58430 + seq. 
0x5db741c8 ack 0x709defa7 + flags 0x11 FIN ACK, tcp header: 32 bytes + window 235, checksum 0x55c3 +21:34:51:476122: ip4-rewrite + tx_sw_if_index 3 dpo-idx 5 : ipv4 via 192.168.30.1 loop0: 1a2b3c4d5e011a2b3c4d5e020800 flow hash: 0x00000000 + 00000000: 1a2b3c4d5e011a2b3c4d5e0208004500003411ec40003f0670b40a010203ac1e + 00000020: 01020050e43e5db741c8709defa7801100eb55c300000101080a0f4b +21:34:51:476123: loop0-output + loop0 + IP4: 1a:2b:3c:4d:5e:02 -> 1a:2b:3c:4d:5e:01 + TCP: 10.1.2.3 -> 172.30.1.2 + tos 0x00, ttl 63, length 52, checksum 0x70b4 + fragment id 0x11ec, flags DONT_FRAGMENT + TCP: 80 -> 58430 + seq. 0x5db741c8 ack 0x709defa7 + flags 0x11 FIN ACK, tcp header: 32 bytes + window 235, checksum 0x55c3 +21:34:51:476124: l2-input + l2-input: sw_if_index 3 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02 +21:34:51:476125: l2-fwd + l2-fwd: sw_if_index 3 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02 bd_index 1 +21:34:51:476125: l2-output + l2-output: sw_if_index 4 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02 data 08 00 45 00 00 34 11 ec 40 00 3f 06 +21:34:51:476126: vxlan4-encap + VXLAN encap to vxlan_tunnel0 vni 10 +21:34:51:476126: ip4-load-balance + fib 4 dpo-idx 22 flow hash: 0x00000103 + UDP: 192.168.16.2 -> 192.168.16.1 + tos 0x00, ttl 254, length 102, checksum 0x1b33 + fragment id 0x0000 + UDP: 24320 -> 4789 + length 82, checksum 0x0000 +21:34:51:476127: ip4-rewrite + tx_sw_if_index 1 dpo-idx 4 : ipv4 via 192.168.16.1 GigabitEthernet0/8/0: 080027b2610908002733fb6f0800 flow hash: 0x00000103 + 00000000: 080027b2610908002733fb6f08004500006600000000fd111c33c0a81002c0a8 + 00000020: 10015f0012b5005200000800000000000a001a2b3c4d5e011a2b3c4d +21:34:51:476127: GigabitEthernet0/8/0-output + GigabitEthernet0/8/0 + IP4: 08:00:27:33:fb:6f -> 08:00:27:b2:61:09 + UDP: 192.168.16.2 -> 192.168.16.1 + tos 0x00, ttl 253, length 102, checksum 0x1c33 + fragment id 0x0000 + UDP: 24320 -> 4789 + length 82, checksum 0x0000 +21:34:51:476128: GigabitEthernet0/8/0-tx + GigabitEthernet0/8/0 tx 
queue 0 + buffer 0xfa7f: current data -50, length 116, free-list 0, clone-count 0, totlen-nifb 0, trace 0x20 + l2-hdr-offset 0 l3-hdr-offset 14 + PKT MBUF: port 255, nb_segs 1, pkt_len 116 + buf_len 2176, data_len 116, ol_flags 0x0, data_off 78, phys_addr 0x569ea040 + packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0 + IP4: 08:00:27:33:fb:6f -> 08:00:27:b2:61:09 + UDP: 192.168.16.2 -> 192.168.16.1 + tos 0x00, ttl 253, length 102, checksum 0x1c33 + fragment id 0x0000 + UDP: 24320 -> 4789 + length 82, checksum 0x0000 +``` + +In the above captured packet, we can see: + +* Input queue name `tapcli-rx` +* Pod's IP address `10.1.2.3` +* IP address of the master node `172.30.1.2`, which sent the `wget` traffic to the two pods +* HTTP port `80`, destination port and TCP protocol (`TCP: 80 -> 58430`) +* NAT queue name `nat44-out2in` +* VXLAN VNI ID `VXLAN encap to vxlan_tunnel0 vni 10` +* VXLAN UDP port `4789` +* IP address of `GigabitEthernet0/8/0` interface (`192.168.16.2`) +* Packet on the outgoing queue `GigabitEthernet0/8/0-tx` + +#### Find IP Addresses of GigabitEthernet and the Tap Interfaces +Enter the following commands to find the IP addresses and Tap interfaces: + +``` +vpp# show int address +GigabitEthernet0/8/0 (up): + L3 192.168.16.2/24 +local0 (dn): +loop0 (up): + L2 bridge bd-id 1 idx 1 shg 0 bvi + L3 192.168.30.2/24 +tapcli-0 (up): + L3 172.30.2.1/24 +tapcli-1 (up): + L3 10.2.1.2/32 +tapcli-2 (up): + L3 10.2.1.3/32 +vxlan_tunnel0 (up): + L2 bridge bd-id 1 idx 1 shg 0 +``` + +#### Other Useful VPP CLIs + +Enter the following commands to see additional information about VPP: + +``` +vpp# show int + Name Idx State Counter Count +GigabitEthernet0/8/0 1 up rx packets 138 + rx bytes 18681 + tx packets 100 + tx bytes 29658 + drops 1 + ip4 137 + tx-error 2 +local0 0 down drops 1 +loop0 3 up rx packets 137 + rx bytes 9853 + tx packets 200 + tx bytes 49380 + drops 1 + ip4 136 +tapcli-0 2 up rx packets 8 + rx bytes 600 + tx packets 1 + tx bytes 42 + 
drops 9 + ip6 7 +tapcli-1 5 up rx packets 56 + rx bytes 13746 + tx packets 78 + tx bytes 6733 + drops 1 + ip4 56 +tapcli-2 6 up rx packets 42 + rx bytes 10860 + tx packets 58 + tx bytes 4996 + drops 1 + ip4 42 +vxlan_tunnel0 4 up rx packets 137 + rx bytes 11771 + tx packets 100 + tx bytes 28290 + +vpp# show hardware + Name Idx Link Hardware +GigabitEthernet0/8/0 1 up GigabitEthernet0/8/0 + Ethernet address 08:00:27:33:fb:6f + Intel 82540EM (e1000) + carrier up full duplex speed 1000 mtu 9216 + rx queues 1, rx desc 1024, tx queues 1, tx desc 1024 + cpu socket 0 + + tx frames ok 100 + tx bytes ok 29658 + rx frames ok 138 + rx bytes ok 19233 + extended stats: + rx good packets 138 + tx good packets 100 + rx good bytes 19233 + tx good bytes 29658 +local0 0 down local0 + local +loop0 3 up loop0 + Ethernet address 1a:2b:3c:4d:5e:02 +tapcli-0 2 up tapcli-0 + Ethernet address 02:fe:95:07:df:9c +tapcli-1 5 up tapcli-1 + Ethernet address 02:fe:3f:5f:0f:9a +tapcli-2 6 up tapcli-2 + Ethernet address 02:fe:72:95:66:c7 +vxlan_tunnel0 4 up vxlan_tunnel0 + VXLAN + +vpp# show bridge-domain + BD-ID Index BSN Age(min) Learning U-Forwrd UU-Flood Flooding ARP-Term BVI-Intf + 1 1 1 off on on on on off loop0 + +vpp# show bridge-domain 1 detail + BD-ID Index BSN Age(min) Learning U-Forwrd UU-Flood Flooding ARP-Term BVI-Intf + 1 1 1 off on on on on off loop0 + + Interface If-idx ISN SHG BVI TxFlood VLAN-Tag-Rewrite + loop0 3 3 0 * * none + vxlan_tunnel0 4 1 0 - * none + +vpp# show l2fib verbose + Mac-Address BD-Idx If-Idx BSN-ISN Age(min) static filter bvi Interface-Name + 1a:2b:3c:4d:5e:02 1 3 0/0 - * - * loop0 + 1a:2b:3c:4d:5e:01 1 4 1/1 - - - - vxlan_tunnel0 +L2FIB total/learned entries: 2/1 Last scan time: 0.0000e0sec Learn limit: 4194304 + +vpp# show ip fib +ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] locks:[src:(nil):2, src:adjacency:3, src:default-route:1, ] +0.0.0.0/0 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:21 to:[0:0]] + 
[0] [@5]: ipv4 via 172.30.2.2 tapcli-0: def35b93961902fe9507df9c0800 +0.0.0.0/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:2 buckets:1 uRPF:1 to:[0:0]] + [0] [@0]: dpo-drop ip4 +10.1.1.0/24 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:24 buckets:1 uRPF:29 to:[0:0]] + [0] [@10]: dpo-load-balance: [proto:ip4 index:23 buckets:1 uRPF:28 to:[0:0] via:[98:23234]] + [0] [@5]: ipv4 via 192.168.30.1 loop0: 1a2b3c4d5e011a2b3c4d5e020800 +10.1.2.2/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:27 buckets:1 uRPF:12 to:[78:5641]] + [0] [@5]: ipv4 via 10.1.2.2 tapcli-1: 00000000000202fe3f5f0f9a0800 +10.1.2.3/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:32 to:[58:4184]] + [0] [@5]: ipv4 via 10.1.2.3 tapcli-2: 00000000000202fe729566c70800 +10.2.1.2/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:26 buckets:1 uRPF:31 to:[0:0]] + [0] [@2]: dpo-receive: 10.2.1.2 on tapcli-1 +10.2.1.3/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:28 buckets:1 uRPF:33 to:[0:0]] + [0] [@2]: dpo-receive: 10.2.1.3 on tapcli-2 +172.30.1.0/24 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:25 buckets:1 uRPF:29 to:[98:23234]] + [0] [@10]: dpo-load-balance: [proto:ip4 index:23 buckets:1 uRPF:28 to:[0:0] via:[98:23234]] + [0] [@5]: ipv4 via 192.168.30.1 loop0: 1a2b3c4d5e011a2b3c4d5e020800 +172.30.2.0/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:14 buckets:1 uRPF:15 to:[0:0]] + [0] [@0]: dpo-drop ip4 +172.30.2.0/24 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:13 buckets:1 uRPF:14 to:[0:0]] + [0] [@4]: ipv4-glean: tapcli-0 +172.30.2.1/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:16 buckets:1 uRPF:19 to:[0:0]] + [0] [@2]: dpo-receive: 172.30.2.1 on tapcli-0 +172.30.2.2/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:18 to:[0:0]] + [0] [@5]: ipv4 via 172.30.2.2 tapcli-0: 
def35b93961902fe9507df9c0800 +172.30.2.255/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:15 buckets:1 uRPF:17 to:[0:0]] + [0] [@0]: dpo-drop ip4 +192.168.16.0/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:10 buckets:1 uRPF:9 to:[0:0]] + [0] [@0]: dpo-drop ip4 +192.168.16.1/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:22 buckets:1 uRPF:34 to:[0:0] via:[100:28290]] + [0] [@5]: ipv4 via 192.168.16.1 GigabitEthernet0/8/0: 080027b2610908002733fb6f0800 +192.168.16.0/24 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:9 buckets:1 uRPF:30 to:[0:0]] + [0] [@4]: ipv4-glean: GigabitEthernet0/8/0 +192.168.16.2/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:13 to:[137:16703]] + [0] [@2]: dpo-receive: 192.168.16.2 on GigabitEthernet0/8/0 +192.168.16.255/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:11 buckets:1 uRPF:11 to:[0:0]] + [0] [@0]: dpo-drop ip4 +192.168.30.0/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:19 buckets:1 uRPF:23 to:[0:0]] + [0] [@0]: dpo-drop ip4 +192.168.30.1/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:23 buckets:1 uRPF:28 to:[0:0] via:[98:23234]] + [0] [@5]: ipv4 via 192.168.30.1 loop0: 1a2b3c4d5e011a2b3c4d5e020800 +192.168.30.0/24 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:18 buckets:1 uRPF:22 to:[0:0]] + [0] [@4]: ipv4-glean: loop0 +192.168.30.2/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:21 buckets:1 uRPF:27 to:[0:0]] + [0] [@2]: dpo-receive: 192.168.30.2 on loop0 +192.168.30.255/32 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:20 buckets:1 uRPF:25 to:[0:0]] + [0] [@0]: dpo-drop ip4 +224.0.0.0/4 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:4 buckets:1 uRPF:3 to:[0:0]] + [0] [@0]: dpo-drop ip4 +240.0.0.0/4 + unicast-ip4-chain + [@0]: dpo-load-balance: [proto:ip4 index:3 buckets:1 uRPF:2 to:[0:0]] + [0] [@0]: 
dpo-drop ip4
+255.255.255.255/32
+  unicast-ip4-chain
+  [@0]: dpo-load-balance: [proto:ip4 index:5 buckets:1 uRPF:4 to:[0:0]]
+    [0] [@0]: dpo-drop ip4
+```
diff --git a/docs/usecases/contiv/Vagrant.md b/docs/usecases/contiv/Vagrant.md
new file mode 100644
index 00000000000..a9040a6c1a1
--- /dev/null
+++ b/docs/usecases/contiv/Vagrant.md
@@ -0,0 +1,250 @@
+## Contiv-VPP Vagrant Installation
+
+### Prerequisites
+The following items are prerequisites for the Vagrant installation:
+- Vagrant 2.0.1 or later
+- Hypervisors:
+  - VirtualBox 5.2.8 or later
+  - VMware Fusion 10.1.0 or later, or VMware Workstation 14
+  - For VMware Fusion, you will need the [Vagrant VMware Fusion plugin](https://www.vagrantup.com/vmware/index.html)
+- Laptop or server with at least 4 CPU cores and 16 GB of RAM
+
+### Creating / Shutting Down / Destroying the Cluster
+This folder contains the Vagrantfile that is used to create a single-node or multi-node
+Kubernetes cluster using Contiv-VPP as a network plugin.
+
+The folder is organized into two subfolders:
+
+ - (config) - contains the files that share cluster information, which are used
+   during the provisioning stage (master IP address, certificates, hash-keys).
+   **CAUTION:** Editing is not recommended!
+ - (vagrant) - contains scripts that are used for creating, destroying, rebooting
+   and shutting down the VMs that host the K8s cluster.
+
+To create and run a K8s cluster with the *contiv-vpp CNI* plugin, run the
+`vagrant-start` script, located in the [vagrant folder](https://github.com/contiv/vpp/tree/master/vagrant). The `vagrant-start`
+script prompts the user to select the number of worker nodes for the Kubernetes cluster.
+Zero (0) worker nodes means that a single-node cluster (with one Kubernetes master node) will be deployed.
+
+Next, the user is prompted to select either the *production environment* or the *development environment*. 
+Instructions on how to build the development *contiv/vpp-vswitch* image can be found below in the
+[development environment](#building-and-deploying-the-dev-contiv-vswitch-image) section.
+
+The last option asks the user to select either *Without StealTheNIC* or *With StealTheNIC*.
+When *With StealTheNIC* is selected, the plugin "steals" interfaces owned by Linux and uses their configuration in VPP.
+
+For the production environment, enter the following commands:
+```
+| => ./vagrant-start
+Please provide the number of workers for the Kubernetes cluster (0-50) or enter [Q/q] to exit: 1
+
+Please choose Kubernetes environment:
+1) Production
+2) Development
+3) Quit
+--> 1
+You chose Production environment
+
+Please choose deployment scenario:
+1) Without StealTheNIC
+2) With StealTheNIC
+3) Quit
+--> 1
+You chose deployment without StealTheNIC
+
+Creating a production environment, without STN and 1 worker node(s)
+```
+
+For the development environment, enter the following commands:
+```
+| => ./vagrant-start
+Please provide the number of workers for the Kubernetes cluster (0-50) or enter [Q/q] to exit: 1
+
+Please choose Kubernetes environment:
+1) Production
+2) Development
+3) Quit
+--> 2
+You chose Development environment
+
+Please choose deployment scenario:
+1) Without StealTheNIC
+2) With StealTheNIC
+3) Quit
+--> 1
+You chose deployment without StealTheNIC
+
+Creating a development environment, without STN and 1 worker node(s)
+```
+
+To destroy and clean up the cluster, run the *vagrant-cleanup* script, located
+[inside the vagrant folder](https://github.com/contiv/vpp/tree/master/vagrant):
+```
+cd vagrant/
+./vagrant-cleanup
+```
+
+To shut down the cluster, run the *vagrant-shutdown* script, located [inside the vagrant folder](https://github.com/contiv/vpp/tree/master/vagrant):
+```
+cd vagrant/
+./vagrant-shutdown
+```
+
+- To reboot the cluster, run the *vagrant-reload* script, located [inside the vagrant 
folder](https://github.com/contiv/vpp/tree/master/vagrant):
+```
+cd vagrant/
+./vagrant-reload
+```
+
+- From a suspended state, or after a reboot of the host machine, the cluster
+can be brought up by running the *vagrant-up* script.
+
+
+### Building and Deploying the dev-contiv-vswitch Image
+If you chose the development environment, follow these instructions to build a
+modified *contivvpp/vswitch* image:
+
+- Make sure your code changes have been saved. From the k8s-master node,
+  build the new *contivvpp/vswitch* image (run as sudo):
+
+```
+vagrant ssh k8s-master
+cd /vagrant/config
+sudo ./save-dev-image
+```
+
+- The newly built *contivvpp/vswitch* image is now tagged as *latest*. Verify the
+build with `sudo docker images`; the *contivvpp/vswitch* image should show as created a few
+seconds ago. The new image must be made available to all nodes in the K8s
+cluster, so load the Docker image into each running worker node (run as sudo):
+
+```
+vagrant ssh k8s-worker1
+cd /vagrant/config
+sudo ./load-dev-image
+```
+
+- Verify with `sudo docker images`; the old *contivvpp/vswitch* image should now be tagged as
+`<none>` and the latest tagged *contivvpp/vswitch* should show as created a
+few seconds ago.
+
+### Exploring the Cluster
+Once the cluster is up, perform the following steps:
+- Log into the master:
+```
+cd vagrant
+
+vagrant ssh k8s-master
+
+Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-21-generic x86_64)
+
+ * Documentation:  https://help.ubuntu.com/
+vagrant@k8s-master:~$
+```
+- Verify the Kubernetes/Contiv-VPP installation. 
First, verify the nodes
+in the cluster:
+
+```
+vagrant@k8s-master:~$ kubectl get nodes -o wide
+
+NAME          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION      CONTAINER-RUNTIME
+k8s-master    Ready     master    22m       v1.9.2    <none>        Ubuntu 16.04 LTS   4.4.0-21-generic    docker://17.12.0-ce
+k8s-worker1   Ready     <none>    15m       v1.9.2    <none>        Ubuntu 16.04 LTS   4.4.0-21-generic    docker://17.12.0-ce
+```
+
+- Next, verify that all pods are running correctly:
+
+```
+vagrant@k8s-master:~$ kubectl get pods -n kube-system -o wide
+
+NAME                                 READY     STATUS             RESTARTS   AGE       IP             NODE
+contiv-etcd-2ngdc                    1/1       Running            0          17m       192.169.1.10   k8s-master
+contiv-ksr-x7gsq                     1/1       Running            3          17m       192.169.1.10   k8s-master
+contiv-vswitch-9bql6                 2/2       Running            0          17m       192.169.1.10   k8s-master
+contiv-vswitch-hpt2x                 2/2       Running            0          10m       192.169.1.11   k8s-worker1
+etcd-k8s-master                      1/1       Running            0          16m       192.169.1.10   k8s-master
+kube-apiserver-k8s-master            1/1       Running            0          16m       192.169.1.10   k8s-master
+kube-controller-manager-k8s-master   1/1       Running            0          15m       192.169.1.10   k8s-master
+kube-dns-6f4fd4bdf-62rv4             2/3       CrashLoopBackOff   14         17m       10.1.1.2       k8s-master
+kube-proxy-bvr74                     1/1       Running            0          10m       192.169.1.11   k8s-worker1
+kube-proxy-v4fzq                     1/1       Running            0          17m       192.169.1.10   k8s-master
+kube-scheduler-k8s-master            1/1       Running            0          16m       192.169.1.10   k8s-master
+```
+
+- If you want your pods to be scheduled on both the master and the workers,
+you have to untaint the master node:
+```
+vagrant@k8s-master:~$ kubectl taint nodes --all node-role.kubernetes.io/master-
+```
+
+- Check VPP and its interfaces:
+```
+vagrant@k8s-master:~$ sudo vppctl
+    _______    _        _   _____  ___
+ __/ __/ _ \  (_)__    | | / / _ \/ _ \
+ _/ _// // / / / _ \   | |/ / ___/ ___/
+ /_/ /____(_)_/\___/   |___/_/  /_/
+
+vpp# sh interface
+              Name               Idx       State          Counter          Count
+GigabitEthernet0/8/0              1         up       rx packets                    14
+                                                     rx bytes                    3906
+                                                     tx packets                    18
+                                                     tx bytes                    2128
+                                                     drops                          3
+                                                     ip4                           13
+...
+
+```
+- Make sure that `GigabitEthernet0/8/0` is listed and that its status is `up`. 
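+
+The `up` check can also be scripted instead of inspected by eye. Below is a hypothetical shell helper (`check_if_up` is not part of Contiv-VPP or VPP; it is a sketch) that parses `show interface` output, assuming the `Name Idx State ...` column layout shown above:
+
+```shell
+# Hypothetical helper (not provided by Contiv-VPP): succeeds only if the
+# named interface appears in `show interface` output with state "up".
+# Assumes the "Name  Idx  State  ..." column layout shown above.
+check_if_up() {
+  awk -v ifname="$1" '$1 == ifname && $3 == "up" { found = 1 }
+                      END { exit found ? 0 : 1 }'
+}
+```
+
+Example use from the k8s-master node:
+`sudo vppctl show interface | check_if_up GigabitEthernet0/8/0 && echo "interface is up"`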
+ +- Next, create an example deployment of nginx pods: +``` +vagrant@k8s-master:~$ kubectl run nginx --image=nginx --replicas=2 +deployment "nginx" created +``` +- Check the status of the deployment: + +``` +vagrant@k8s-master:~$ kubectl get deploy -o wide + +NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR +nginx 2 2 2 2 2h nginx nginx run=nginx +``` + +- Verify that the pods in the deployment are up and running: +``` +vagrant@k8s-master:~$ kubectl get pods -o wide + +NAME READY STATUS RESTARTS AGE IP NODE +nginx-8586cf59-6kx2m 1/1 Running 1 1h 10.1.2.3 k8s-worker1 +nginx-8586cf59-j5vf9 1/1 Running 1 1h 10.1.2.2 k8s-worker1 +``` + +- Issue an HTTP GET request to a pod in the deployment: + +``` +vagrant@k8s-master:~$ wget 10.1.2.2 + +--2018-01-19 12:34:08-- http://10.1.2.2/ +Connecting to 10.1.2.2:80... connected. +HTTP request sent, awaiting response... 200 OK +Length: 612 [text/html] +Saving to: ‘index.html.1’ + +index.html.1 100%[=========================================>] 612 --.-KB/s in 0s + +2018-01-19 12:34:08 (1.78 MB/s) - ‘index.html.1’ saved [612/612] +``` + +#### How to SSH into k8s Worker Node +To SSH into k8s Worker Node, perform the following steps: + +``` +cd vagrant + +vagrant status + +vagrant ssh k8s-worker1 +``` diff --git a/docs/usecases/contiv/index.rst b/docs/usecases/contiv/index.rst new file mode 100644 index 00000000000..bc52e6142ca --- /dev/null +++ b/docs/usecases/contiv/index.rst @@ -0,0 +1,26 @@ +.. _contiv:
+
+##########
+Contiv/VPP
+##########
+
+This section provides the following information about Contiv/VPP:
+
+.. toctree::
+ :maxdepth: 2
+
+ K8s_Overview
+ SECURITY
+ Vagrant
+ MANUAL_INSTALL
+ VPP_CONFIG
+ VMWARE_FUSION_HOST
+ NETWORKING
+ SINGLE_NIC_SETUP
+ MULTI_NIC_SETUP
+ CUSTOM_MGMT_NETWORK
+ Prometheus
+ VPP_PACKET_TRACING_K8S
+ VPPTRACE
+ CORE_FILES
+ BUG_REPORTS
diff --git a/docs/usecases/index.rst b/docs/usecases/index.rst index 3507c2adf9d..28070453692 100644 --- a/docs/usecases/index.rst +++ b/docs/usecases/index.rst @@ -9,6 +9,7 @@ extensive list, but should give a sampling of the many features contained in FD. .. toctree:: + contiv/index containers vhost/index.rst homegateway |