.. _reportingbugs:

.. toctree::

Reporting Bugs
==============

Although every situation is different, this section describes how to
collect data which will help make efficient use of everyone's time
when dealing with vpp bugs.

Before you press the Jira button to create a bug report - or email
vpp-dev@lists.fd.io - please ask yourself whether there's enough
information for someone else to understand and to reproduce the issue
given a reasonable amount of effort. **Unicast emails to maintainers,
committers, and the project PTL are strongly discouraged.**

A good strategy for clear-cut bugs: file a detailed Jira ticket, and
then send a short description of the issue to vpp-dev@lists.fd.io,
perhaps copied from the Jira ticket description. It's fine to send email to
vpp-dev@lists.fd.io to ask a few questions **before** filing Jira tickets.

Data to include in bug reports
==============================

Image version and operating environment
---------------------------------------

Please make sure to include the vpp image version and command-line arguments.

.. code-block:: console

    $ sudo bash
    # vppctl show version verbose cmdline
    Version:                  v18.07-rc0~509-gb9124828
    Compiled by:              vppuser
    Compile host:             vppbuild
    Compile date:             Fri Jul 13 09:05:37 EDT 2018
    Compile location:         /scratch/vpp-showversion
    Compiler:                 GCC 7.3.0
    Current PID:              5211
    Command line arguments:
      /scratch/vpp-showversion/build-root/install-vpp_debug-native/vpp/bin/vpp
      unix
      interactive

With respect to the operating environment: if the misbehavior involves
a specific VM / container / bare-metal environment, please describe
the environment in detail:

* Linux Distro (e.g. Ubuntu 18.04.2 LTS, CentOS-7, etc.)
* NIC type(s) (ixgbe, i40e, enic, etc.), vhost-user, tuntap
* NUMA configuration if applicable

Please note the CPU architecture (x86_64, aarch64), and hardware platform.

When practicable, please report issues against released software, or
unmodified master/latest software.

"Show" command output
---------------------

Every situation is different. If the issue involves a sequence of
debug CLI commands, please enable CLI command logging, and send the
sequence involved. Note that the debug CLI is a developer's tool -
**no warranty express or implied** - and that we may choose not to fix
debug CLI bugs.

Please include "show error" [error counter] output. It's often helpful
to "clear error", send a bit of traffic, then "show error",
particularly when running vpp on noisy networks.

Please include ip4 / ip6 / mpls FIB contents ("show ip fib", "show ip6
fib", "show mpls fib", "show mpls tunnel").

Please include "show hardware", "show interface", and "show interface
address" output.

Here is a consolidated set of commands that are generally useful
before/after sending traffic.  Before sending traffic:

.. code-block:: console

    vppctl clear hardware
    vppctl clear interface
    vppctl clear error
    vppctl clear run

Send some traffic and then issue the following commands.

.. code-block:: console

    vppctl show version verbose
    vppctl show hardware
    vppctl show interface address
    vppctl show interface
    vppctl show run
    vppctl show error

Here are some protocol-specific show commands that may also make
sense.  Include only the commands for features which have been configured.

.. code-block:: console

     vppctl show l2fib
     vppctl show bridge-domain

     vppctl show ip fib
     vppctl show ip arp

     vppctl show ip6 fib
     vppctl show ip6 neighbors

     vppctl show mpls fib
     vppctl show mpls tunnel
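
The commands above can be gathered into a single file for attachment
to a Jira ticket. What follows is a minimal sketch, not an official
tool: the function name "collect_vpp_state" is hypothetical, and it
assumes that vppctl is on the PATH and that the caller has permission
to use it.

.. code-block:: shell

    # Sketch: run a fixed list of "show" commands and append each
    # command's output, under a header line, to one report file.
    # collect_vpp_state is a hypothetical helper, not part of vpp.
    collect_vpp_state() {
        out="$1"
        : > "$out"                      # start with an empty report
        for cmd in \
            "show version verbose" \
            "show hardware" \
            "show interface" \
            "show interface address" \
            "show run" \
            "show error"
        do
            printf '===== vppctl %s =====\n' "$cmd" >> "$out"
            # $cmd is deliberately unquoted so that it splits into words.
            vppctl $cmd >> "$out" 2>&1 || printf '(command failed)\n' >> "$out"
        done
    }

After sending traffic, a call such as "collect_vpp_state
/tmp/vpp-state.txt" writes the report. Extend the list with the
protocol-specific commands for whichever features are configured.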

Network Topology
----------------

Please include a crisp description of the network topology, including
L2 / IP / MPLS / segment-routing addressing details. If you expect
folks to reproduce and debug issues, this is a must.

At or above a certain level of topological complexity, it becomes
problematic to reproduce the original setup.

Packet Tracer Output
--------------------

If you capture packet tracer output which seems relevant, please include it.

.. code-block:: console

    vppctl trace add dpdk-input 100  # or similar

Send traffic, then display the captured trace:

.. code-block:: console

    vppctl show trace
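
The trace sequence can be wrapped in a small helper so that the
capture is saved to a file. This is a sketch under assumptions: the
name "capture_trace" and its argument order are hypothetical, and the
input node (dpdk-input in the example above) depends on the interface
type in use.

.. code-block:: shell

    # Sketch: capture a packet trace around a burst of traffic.
    # capture_trace is a hypothetical helper. Arguments: input node,
    # packet count, seconds to wait, output file.
    capture_trace() {
        node="$1"; count="$2"; wait_secs="$3"; out="$4"
        vppctl clear trace                  # discard any earlier trace
        vppctl trace add "$node" "$count"
        sleep "$wait_secs"                  # send traffic during this window
        vppctl show trace > "$out"
    }

For example, "capture_trace dpdk-input 100 10 /tmp/vpp-trace.txt"
allows ten seconds to send traffic before the trace is saved.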

Capturing post-mortem data
==========================

It should go without saying, but anyhow: **please put post-mortem data
in obvious, accessible places.** Time wasted trying to acquire
accounts, credentials, and IP addresses simply delays problem
resolution.

Please remember to add post-mortem data location information to Jira
tickets.

Syslog Output
-------------

The vpp signal handler typically writes a certain amount of data in
/var/log/syslog before exiting. Make sure to check for evidence, e.g.
via "grep /usr/bin/vpp /var/log/syslog" or similar.
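
To keep that evidence with the rest of the post-mortem data, the
matching lines can be saved to a file. A minimal sketch follows; the
helper name "extract_vpp_syslog" is hypothetical, and the default log
path and pattern are simply the ones quoted above, which may differ
between distros and installations.

.. code-block:: shell

    # Sketch: save syslog lines which mention the vpp binary.
    # extract_vpp_syslog is a hypothetical helper. Arguments: output
    # file, then optionally the log file and the string to search for.
    extract_vpp_syslog() {
        out="$1"
        log="${2:-/var/log/syslog}"
        pattern="${3:-/usr/bin/vpp}"
        # -F treats the pattern as a fixed string rather than a regex.
        grep -F -- "$pattern" "$log" > "$out"
    }

The function returns grep's exit status, so a non-zero result means
that no matching lines were found.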

Binary API Trace
----------------

If the issue involves a sequence of control-plane API messages - even
a very long sequence - please enable control-plane API
tracing. Control-plane API post-mortem traces end up in
/tmp/api_post_mortem.<pid>.

Please remember to put post-mortem binary api traces in accessible
places.

These API traces are especially helpful in cases where the vpp engine
is throwing traffic on the floor, e.g. for want of a default route or
similar.

Make sure to leave the default stanza "... api-trace { on } ... " in
the vpp startup configuration file /etc/vpp/startup.conf, or to
include it in the command line arguments passed by orchestration
software.

Core Files
----------

Production systems, as well as long-running pre-production soak-test
systems, **must** arrange to collect core images. There are various
ways to configure core image capture, including e.g. the Ubuntu
"corekeeper" package. In a pinch, the following very basic sequence
will capture usable vpp core files in /tmp/dumps.

.. code-block:: console

    # mkdir -p /tmp/dumps
    # sysctl -w debug.exception-trace=1
    # sysctl -w kernel.core_pattern="/tmp/dumps/%e-%t"
    # ulimit -c unlimited
    # echo 2 > /proc/sys/fs/suid_dumpable

If you start VPP from systemd, you also need to edit
/lib/systemd/system/vpp.service and uncomment the "LimitCORE=infinity"
line before restarting VPP.
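
Before relying on these settings, it can help to read back the active
core pattern and confirm where core files will go. The sketch below
is illustrative only: "describe_core_pattern" is a hypothetical
helper, and the three cases it distinguishes follow the core(5)
manual page.

.. code-block:: shell

    # Sketch: explain where the kernel will deliver a core file for a
    # given core_pattern value. describe_core_pattern is hypothetical.
    describe_core_pattern() {
        case "$1" in
            # A leading '|' pipes the core to a program; ${1#?} drops it.
            '|'*) echo "piped to: ${1#?}" ;;
            /*)   echo "written under: $(dirname "$1")" ;;
            *)    echo "written relative to the working directory of the crashing process" ;;
        esac
    }

A typical call is 'describe_core_pattern "$(cat
/proc/sys/kernel/core_pattern)"'. With the pattern set above, it
reports /tmp/dumps as the destination directory.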

Vpp core files often appear enormous. Gzip typically compresses them
to manageable sizes. A multi-GByte corefile often compresses to 10-20
MBytes.
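
Compressing after the fact can be done with a short loop. This is a
sketch which assumes the cores were written to /tmp/dumps as in the
sequence above; the helper name "compress_cores" is hypothetical.

.. code-block:: shell

    # Sketch: gzip every uncompressed file in a core dump directory.
    # compress_cores is a hypothetical helper; the directory defaults
    # to the /tmp/dumps location used above.
    compress_cores() {
        dir="${1:-/tmp/dumps}"
        for f in "$dir"/*; do
            [ -f "$f" ] || continue             # also handles an empty directory
            case "$f" in *.gz) continue ;; esac # already compressed
            gzip -9 "$f"                        # replaces $f with $f.gz
        done
    }

Run it only once vpp has finished writing the core file, since gzip
replaces each input file with its compressed version.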

Please remember to put compressed core files in accessible places.

Make sure to leave the default stanza "... unix { ... full-coredump
... } ... " in the vpp startup configuration file
/etc/vpp/startup.conf, or to include it in the command line arguments
passed by orchestration software.
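
A quick check that the startup configuration still mentions both this
stanza and the api-trace stanza described earlier can catch mistakes
before a crash occurs. The sketch below is deliberately rough: the
helper name "check_startup_conf" is hypothetical, it assumes that "#"
starts a comment, and it only looks for the keywords, so it does not
prove that api-trace is actually set to "on".

.. code-block:: shell

    # Sketch: report whether the api-trace and full-coredump keywords
    # appear outside comments in the vpp startup configuration.
    # check_startup_conf is a hypothetical helper.
    check_startup_conf() {
        conf="${1:-/etc/vpp/startup.conf}"
        status=0
        for stanza in api-trace full-coredump; do
            # Assumption: '#' starts a comment; strip comments first.
            if sed 's/#.*//' "$conf" | grep -q -- "$stanza"; then
                echo "$stanza: present"
            else
                echo "$stanza: MISSING"
                status=1
            fi
        done
        return "$status"
    }

The function returns non-zero when either keyword is missing, which
makes it usable from a deployment script.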

Core files from private images require special handling. If it's
necessary to go that route, copy the **exact** Debian packages (or
RPMs) which correspond to the core file to the same public place as
the core file. This is a no-excuses-allowed, hard-and-fast requirement.

In particular:

.. code-block:: console

  libvppinfra_<version>_<arch>.deb # vppinfra library
  libvppinfra-dev_<version>_<arch>.deb # vppinfra library development pkg
  vpp_<version>_<arch>.deb         # the vpp executable
  vpp-dbg_<version>_<arch>.deb     # debug symbols
  vpp-dev_<version>_<arch>.deb     # vpp development pkg
  vpp-lib_<version>_<arch>.deb     # shared libraries
  vpp-plugin-core_<version>_<arch>.deb # core plugins
  vpp-plugin-dpdk_<version>_<arch>.deb # dpdk plugin

For reference, please include git commit-ID, branch, and git repo
information [for repos other than gerrit.fd.io] in the Jira ticket.

Note that git commit-ids are crypto sums of the head [latest]
**merged** patch. They say **nothing whatsoever** about local
workspace modifications, branching, or the git repo in question.

Even given a byte-for-byte identical source tree, it's easy to build
dramatically different binary artifacts. All it takes is a different
toolchain version.
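
Since the commit-ID alone leaves these questions open, it helps to
record the rest of the build tree's state next to it. The sketch
below uses only standard git subcommands; the helper name
"describe_build_tree" is hypothetical, and it assumes the remote of
interest is named "origin".

.. code-block:: shell

    # Sketch: summarize the state of a git workspace for a bug report.
    # describe_build_tree is a hypothetical helper; the argument is
    # the path to the workspace (default: current directory).
    describe_build_tree() {
        tree="${1:-.}"
        echo "commit: $(git -C "$tree" rev-parse HEAD)"
        echo "branch: $(git -C "$tree" rev-parse --abbrev-ref HEAD)"
        # Assumption: the remote is called "origin".
        echo "remote: $(git -C "$tree" remote get-url origin 2>/dev/null || echo none)"
        # Any porcelain output means modified, staged, or untracked files.
        if [ -n "$(git -C "$tree" status --porcelain)" ]; then
            echo "local modifications: yes"
        else
            echo "local modifications: no"
        fi
    }

The compiler version is already reported by "show version verbose",
so it is not repeated here.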


Compressed Core Files
---------------------

Depending on operational requirements, it's possible to compress
corefiles as they are generated. Please note that it takes several
seconds' worth of wall-clock time to compress a vpp core file on the
fly, during which all packet processing activities are suspended.

To create compressed core files on the fly, create the following
script, e.g. in /usr/local/bin/compressed_corefiles, owned by root,
executable:

.. code-block:: console

  #!/bin/sh
  exec /bin/gzip -f - >"/tmp/dumps/core-$1.$2.gz"

Adjust the kernel core file pattern as shown:

.. code-block:: console

  sysctl -w kernel.core_pattern="|/usr/local/bin/compressed_corefiles %e %t"

Core File Summary
-----------------

Bottom line: please follow core file handling instructions to the
letter. It's not complicated. Simply copy the exact Debian packages or
RPMs which correspond to core files to accessible locations.

If we go through the setup process only to discover that the image and
core files don't match, it will simply delay resolution of the issue,
to say nothing of irritating the person who just wasted their time.