# Debugging and Reporting Bugs in Contiv-VPP

## Bug Report Structure

- [Deployment description](#describe-deployment):
Briefly describe the deployment where the issue was spotted:
the number of k8s nodes, and whether DHCP, STN, and TAP interfaces are used.

- [Logs](#collecting-the-logs):
Attach the corresponding logs, at least from the vswitch pods.

- [VPP config](#inspect-vpp-config):
Attach the output of the VPP show commands.

- [Basic Collection Example](#basic-example)

### Describe Deployment
Since Contiv-VPP can be used with different configurations, it is helpful
to attach the config that was applied. Either attach the `values.yaml` passed to the helm chart,
or attach the [corresponding part](https://github.com/contiv/vpp/blob/42b3bfbe8735508667b1e7f1928109a65dfd5261/k8s/contiv-vpp.yaml#L24-L38) of the deployment yaml file.

```
  contiv.yaml: |-
    TCPstackDisabled: true
    UseTAPInterfaces: true
    TAPInterfaceVersion: 2
    NatExternalTraffic: true
    MTUSize: 1500
    IPAMConfig:
      PodSubnetCIDR: 10.1.0.0/16
      PodNetworkPrefixLen: 24
      PodIfIPCIDR: 10.2.1.0/24
      VPPHostSubnetCIDR: 172.30.0.0/16
      VPPHostNetworkPrefixLen: 24
      NodeInterconnectCIDR: 192.168.16.0/24
      VxlanCIDR: 192.168.30.0/24
      NodeInterconnectDHCP: False
```
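
If the file that was applied is no longer at hand, the deployed configuration can usually be pulled back out of the cluster. A minimal sketch, assuming the config is stored in a ConfigMap in the `kube-system` namespace (the exact ConfigMap name may vary between releases, so list them first):

```
# List ConfigMaps and find the contiv one (the name may vary between releases)
$ kubectl get configmaps -n kube-system | grep contiv

# Dump the deployed config, assuming the ConfigMap is named contiv-agent-cfg
$ kubectl get configmap contiv-agent-cfg -n kube-system -o yaml
```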

Information that might be helpful:
 - Whether node IPs are statically assigned or DHCP is used
 - Whether STN is enabled
 - The version of TAP interfaces used
 - Output of `kubectl get pods -o wide --all-namespaces`
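
Most of this can be gathered with a couple of kubectl commands; a minimal sketch:

```
# Node names, internal/external IP addresses and k8s versions
$ kubectl get nodes -o wide

# Pod placement and status across the whole cluster
$ kubectl get pods -o wide --all-namespaces
```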
 

### Collecting the Logs

The most essential thing that needs to be done when debugging and **reporting an issue**
in Contiv-VPP is **collecting the logs from the contiv-vpp vswitch containers**.

#### a) Collecting Vswitch Logs Using kubectl
In order to collect the logs from individual vswitches in the cluster, connect to the master node
and then find the POD names of the individual vswitch containers:

```
$ kubectl get pods --all-namespaces | grep vswitch
kube-system   contiv-vswitch-lqxfp               2/2       Running   0          1h
kube-system   contiv-vswitch-q6kwt               2/2       Running   0          1h
```

Then run the following command, replacing *pod name* with the actual pod name:
```
$ kubectl logs <pod name> -n kube-system -c contiv-vswitch
```

Redirect the output to a file to save the logs, for example:

```
kubectl logs contiv-vswitch-lqxfp -n kube-system -c contiv-vswitch > logs-master.txt
```
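
To save the logs from all vswitches in the cluster in one go, the two steps above can be combined into a small loop. A sketch, where the `logs-<pod>.txt` file naming is just for illustration:

```
$ for pod in $(kubectl get pods -n kube-system -o name | grep vswitch); do
    kubectl logs "$pod" -n kube-system -c contiv-vswitch > "logs-${pod#pod/}.txt"
  done
```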

#### b) Collecting Vswitch Logs Using Docker
If option a) does not work, then you can still collect the same logs using the plain docker
command. For that, you need to connect to each individual node in the k8s cluster, and find the container ID of the vswitch container:

```
$ docker ps | grep contivvpp/vswitch
b682b5837e52        contivvpp/vswitch                                        "/usr/bin/supervisor…"   2 hours ago         Up 2 hours                              k8s_contiv-vswitch_contiv-vswitch-q6kwt_kube-system_d09b6210-2903-11e8-b6c9-08002723b076_0
```

Now use the ID from the first column to dump the logs into the `logs-master.txt` file:
```
$ docker logs b682b5837e52 > logs-master.txt
```
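
Both steps can also be combined into a single command. A sketch using a `docker ps` filter, assuming a single vswitch container runs on the node:

```
$ docker logs $(docker ps -q --filter ancestor=contivvpp/vswitch) > logs-master.txt
```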

#### Reviewing the Vswitch Logs

When debugging an issue, a good first step is to grep the logs for the `level=error` string, for example:
```
$ cat logs-master.txt | grep level=error
```

VPP or the contiv-agent may also crash. To check whether a process has crashed, grep the logs for the string `exit`, for example:
```
$ cat logs-master.txt | grep exit
2018-03-20 06:03:45,948 INFO exited: vpp (terminated by SIGABRT (core dumped); not expected)
2018-03-20 06:03:48,948 WARN received SIGTERM indicating exit request
```
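
Both checks can be done in a single pass over the log file; a minimal sketch:

```
$ grep -E 'level=error|exited|SIGABRT' logs-master.txt
```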

#### Collecting the STN Daemon Logs
In STN (Steal The NIC) deployment scenarios, you often need to collect and review the logs
from the STN daemon. This needs to be done on each node:
```
$ docker logs contiv-stn > logs-stn-master.txt
```
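
With ssh access to the nodes, the STN logs can be collected from all of them in one loop. A sketch, where the node names are placeholders for your actual node names:

```
$ for node in k8s-master k8s-worker1 k8s-worker2; do
    ssh <user-id>@"$node" docker logs contiv-stn > "logs-stn-$node.txt"
  done
```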

#### Collecting Logs in Case of Crash Loop
If the vswitch is crashing in a loop (indicated by an increasing number in the `RESTARTS`
column of the `kubectl get pods --all-namespaces` output), `kubectl logs` or `docker logs` would
only give us the logs of the latest incarnation of the vswitch. That may not reveal the root cause
of the very first crash, so in order to debug it, we need to disable the k8s health check probes so
that the vswitch is not restarted after the very first crash. This can be done by commenting out
the `readinessProbe` and `livenessProbe` in the contiv-vpp deployment YAML:

```diff
diff --git a/k8s/contiv-vpp.yaml b/k8s/contiv-vpp.yaml
index 3676047..ffa4473 100644
--- a/k8s/contiv-vpp.yaml
+++ b/k8s/contiv-vpp.yaml
@@ -224,18 +224,18 @@ spec:
           ports:
             # readiness + liveness probe
             - containerPort: 9999
-          readinessProbe:
-            httpGet:
-              path: /readiness
-              port: 9999
-            periodSeconds: 1
-            initialDelaySeconds: 15
-          livenessProbe:
-            httpGet:
-              path: /liveness
-              port: 9999
-            periodSeconds: 1
-            initialDelaySeconds: 60
+ #         readinessProbe:
+ #           httpGet:
+ #             path: /readiness
+ #             port: 9999
+ #           periodSeconds: 1
+ #           initialDelaySeconds: 15
+ #         livenessProbe:
+ #           httpGet:
+ #             path: /liveness
+ #             port: 9999
+ #           periodSeconds: 1
+ #           initialDelaySeconds: 60
           env:
             - name: MICROSERVICE_LABEL
               valueFrom:
```
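
After commenting out the probes, re-apply the deployment so that the change takes effect, and then collect the logs of the very first crash as described above. A sketch:

```
$ kubectl apply -f k8s/contiv-vpp.yaml
```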

If VPP is the crashing process, please follow the [CORE_FILES](CORE_FILES.html) guide and provide the coredump file.


### Inspect VPP Config
Inspect the following areas (a sketch after this list shows how to run these commands non-interactively):
- Configured interfaces (issues related to basic node/pod connectivity):
```
vpp# sh int addr
GigabitEthernet0/9/0 (up):
  192.168.16.1/24
local0 (dn):
loop0 (up):
  l2 bridge bd_id 1 bvi shg 0
  192.168.30.1/24
tapcli-0 (up):
  172.30.1.1/24
```

- IP forwarding table:
```
vpp# sh ip fib
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] locks:[src:(nil):2, src:adjacency:3, src:default-route:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:0 to:[7:552]]
    [0] [@0]: dpo-drop ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:2 buckets:1 uRPF:1 to:[0:0]]
    [0] [@0]: dpo-drop ip4

... 
...

255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:5 buckets:1 uRPF:4 to:[0:0]]
    [0] [@0]: dpo-drop ip4
```
- ARP Table:
```
vpp# sh ip arp
    Time           IP4       Flags      Ethernet              Interface       
    728.6616  192.168.16.2     D    08:00:27:9c:0e:9f GigabitEthernet0/8/0
    542.7045  192.168.30.2     S    1a:2b:3c:4d:5e:02 loop0
      1.4241   172.30.1.2      D    86:41:d5:92:fd:24 tapcli-0
     15.2485    10.1.1.2      SN    00:00:00:00:00:02 tapcli-1
    739.2339    10.1.1.3      SN    00:00:00:00:00:02 tapcli-2
    739.4119    10.1.1.4      SN    00:00:00:00:00:02 tapcli-3
```
- NAT configuration (issues related to services):
```
DBGvpp# sh nat44 addresses
NAT44 pool addresses:
192.168.16.10
  tenant VRF independent
  0 busy udp ports
  0 busy tcp ports
  0 busy icmp ports
NAT44 twice-nat pool addresses:
```
```
vpp# sh nat44 static mappings 
NAT44 static mappings:
 tcp local 192.168.42.1:6443 external 10.96.0.1:443 vrf 0  out2in-only
 tcp local 192.168.42.1:12379 external 192.168.42.2:32379 vrf 0  out2in-only
 tcp local 192.168.42.1:12379 external 192.168.16.2:32379 vrf 0  out2in-only
 tcp local 192.168.42.1:12379 external 192.168.42.1:32379 vrf 0  out2in-only
 tcp local 192.168.42.1:12379 external 192.168.16.1:32379 vrf 0  out2in-only
 tcp local 192.168.42.1:12379 external 10.109.143.39:12379 vrf 0  out2in-only
 udp local 10.1.2.2:53 external 10.96.0.10:53 vrf 0  out2in-only
 tcp local 10.1.2.2:53 external 10.96.0.10:53 vrf 0  out2in-only
```
```
vpp# sh nat44 interfaces
NAT44 interfaces:
 loop0 in out
 GigabitEthernet0/9/0 out
 tapcli-0 in out
```
```
vpp# sh nat44 sessions
NAT44 sessions:
  192.168.20.2: 0 dynamic translations, 3 static translations
  10.1.1.3: 0 dynamic translations, 0 static translations
  10.1.1.4: 0 dynamic translations, 0 static translations
  10.1.1.2: 0 dynamic translations, 6 static translations
  10.1.2.18: 0 dynamic translations, 2 static translations
```
- ACL config (issues related to policies):
```
vpp# sh acl-plugin acl
```
- "Steal the NIC (STN)" config (issues related to host connectivity when STN is active):
```
vpp# sh stn rules 
- rule_index: 0
  address: 10.1.10.47
  iface: tapcli-0 (2)
  next_node: tapcli-0-output (410)
```
- Errors:
```
vpp# sh errors
```
- Vxlan tunnels:
```
vpp# sh vxlan tunnels
```
- Hardware interface information:
```
vpp# sh hardware-interfaces
```
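
Each of the `vpp#` commands above is entered in the VPP CLI inside the vswitch. They can also be run from the host in one shot. A sketch using `kubectl exec`, assuming `vppctl` is available inside the vswitch container (use the pod name found in the logs section above):

```
$ kubectl exec -n kube-system contiv-vswitch-lqxfp -c contiv-vswitch -- vppctl sh int addr
```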

### Basic Example

[contiv-vpp-bug-report.sh][1] is an example of a script that may be a useful starting point for gathering the above information using kubectl.

Limitations:
- The script does not collect the STN daemon logs, nor does it handle the special
  case of a crash loop.
  
Prerequisites:
- The user specified in the script must have passwordless ssh access to all nodes
  in the cluster, and passwordless sudo access on each node.
  
#### Setting up Prerequisites
To enable logging into a node without a password, copy your public key to each
node:
```
ssh-copy-id <user-id>@<node-name-or-ip-address>
```

To enable running sudo without a password for a given user, enter:
```
$ sudo visudo
```

Append the following entry to allow the given user to run all commands without
a password:
```
<userid> ALL=(ALL) NOPASSWD:ALL
```

You can also add user `<user-id>` to group `sudo` and edit the `sudo`
entry as follows:

```
# Allow members of group sudo to execute any command
%sudo	ALL=(ALL:ALL) NOPASSWD:ALL
```

Add user `<user-id>` to group `<group-id>` as follows:
```
sudo adduser <user-id> <group-id>
```
or as follows:
```
usermod -a -G <group-id> <user-id>
```
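
Once set up, both prerequisites can be verified non-interactively from the machine that will run the script. A minimal sketch; it should print `ok` twice without prompting for a password:

```
$ ssh -o BatchMode=yes <user-id>@<node-name-or-ip-address> 'echo ok; sudo -n echo ok'
```
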
#### Working with the Contiv-VPP Vagrant Test Bed 
The script can be used to collect data from the [Contiv-VPP test bed created with Vagrant][2].
To collect debug information from this Contiv-VPP test bed, perform the
following steps:
* In the directory where you created your vagrant test bed, run:
```
  vagrant ssh-config > vagrant-ssh.conf
```
* To collect the debug information, run:
```
  ./contiv-vpp-bug-report.sh -u vagrant -m k8s-master -f <path-to-your-vagrant-ssh-config-file>/vagrant-ssh.conf
```

[1]: https://github.com/contiv/vpp/tree/master/scripts/contiv-vpp-bug-report.sh
[2]: https://github.com/contiv/vpp/blob/master/vagrant/README.md