author: John DeNisco <jdenisco@cisco.com> 2018-07-26 12:45:10 -0400
committer: Dave Barach <openvpp@barachs.net> 2018-07-26 18:34:47 +0000
commit: 06dcd45ff81e06bc8cf40ed487c0b2652d346a5a
tree: 71403f9d422c4e532b2871a66ab909bd6066b10b /docs/troubleshooting
parent: 1d65279ffecd0f540288187b94cb1a6b84a7a0c6
Initial commit of Sphinx docs
Change-Id: I9fca8fb98502dffc2555f9de7f507b6f006e0e77
Signed-off-by: John DeNisco <jdenisco@cisco.com>
Diffstat (limited to 'docs/troubleshooting')
-rw-r--r-- docs/troubleshooting/cpuusage.rst 112
-rw-r--r-- docs/troubleshooting/index.rst 13
-rw-r--r-- docs/troubleshooting/reportingissues/index.rst 8
-rw-r--r-- docs/troubleshooting/reportingissues/reportingissues.rst 216
4 files changed, 349 insertions, 0 deletions
diff --git a/docs/troubleshooting/cpuusage.rst b/docs/troubleshooting/cpuusage.rst
new file mode 100644
index 00000000000..b9b8942a3dd
--- /dev/null
+++ b/docs/troubleshooting/cpuusage.rst
@@ -0,0 +1,112 @@
+.. _cpuusage:
+
+**************
+CPU Load/Usage
+**************
+
+There are various commands and tools that can help users see FD.io VPP CPU and memory usage at runtime.
+
+Linux top/htop
+==============
+
+The Linux top and htop tools give a quick view of FD.io VPP CPU and memory usage, but they show only
+preallocated memory and total CPU usage. They are useful for seeing which cores VPP is running on.
+
+This is an example of a VPP instance running on cores 8 and 9. To get this output, type **top** and then
+press **1** once the tool starts.
+
+.. code-block:: console
+
+ $ top
+
+ top - 11:04:04 up 35 days, 3:16, 5 users, load average: 2.33, 2.23, 2.16
+ Tasks: 435 total, 2 running, 432 sleeping, 1 stopped, 0 zombie
+ %Cpu0 : 1.0 us, 0.7 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
+ %Cpu1 : 2.0 us, 0.3 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu2 : 0.7 us, 1.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu3 : 1.7 us, 0.7 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu4 : 2.0 us, 0.7 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu5 : 3.0 us, 0.3 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu6 : 2.3 us, 0.7 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu7 : 2.6 us, 0.3 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu8 : 96.0 us, 0.3 sy, 0.0 ni, 3.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu9 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ %Cpu10 : 1.0 us, 0.3 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+ ....
+
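The per-core view above can also be checked with a script. The following is a minimal sketch, not part of VPP: it assumes the per-core line layout shown in the sample output, and the function name `busy_cores` and the 90% threshold are illustrative choices.

```python
import re

# Matches per-core lines from top's "1" view, e.g. "%Cpu8 : 96.0 us, 0.3 sy, ..."
CPU_LINE = re.compile(r"%Cpu(\d+)\s*:\s*([\d.]+)\s+us")


def busy_cores(top_output, min_user_pct=90.0):
    """Return the core numbers whose user-space CPU share is at least min_user_pct."""
    cores = []
    for line in top_output.splitlines():
        match = CPU_LINE.search(line)
        if match and float(match.group(2)) >= min_user_pct:
            cores.append(int(match.group(1)))
    return cores


# Three lines copied from the sample output above.
sample = """\
%Cpu7 : 2.6 us, 0.3 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 96.0 us, 0.3 sy, 0.0 ni, 3.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
"""
print(busy_cores(sample))
```

A core that stays near 100% user time is a likely VPP polling core, but other busy processes would match as well, so treat the result as a hint only.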
+VPP Memory Usage
+================
+
+For details on VPP memory usage, use the **show memory** command.
+
+This example shows VPP memory usage on 2 cores.
+
+.. code-block:: console
+
+ # vppctl show memory verbose
+ Thread 0 vpp_main
+ 22043 objects, 17878k of 20826k used, 2426k free, 2396k reclaimed, 346k overhead, 1048572k capacity
+ alloc. from small object cache: 22875 hits 39973 attempts (57.23%) replacements 5143
+ alloc. from free-list: 44732 attempts, 26017 hits (58.16%), 528461 considered (per-attempt 11.81)
+ alloc. from vector-expand: 3430
+ allocs: 52324 2027.84 clocks/call
+ frees: 30280 594.38 clocks/call
+ Thread 1 vpp_wk_0
+ 22043 objects, 17878k of 20826k used, 2427k free, 2396k reclaimed, 346k overhead, 1048572k capacity
+ alloc. from small object cache: 22881 hits 39984 attempts (57.23%) replacements 5148
+ alloc. from free-list: 44736 attempts, 26021 hits (58.17%), 528465 considered (per-attempt 11.81)
+ alloc. from vector-expand: 3430
+ allocs: 52335 2027.54 clocks/call
+ frees: 30291 594.36 clocks/call
+
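To compare heap usage across threads, the summary line of each thread can be parsed. This is a sketch under the assumption that the output has the layout shown above (other VPP releases may print a different format); `heap_usage` is a hypothetical helper name.

```python
import re

THREAD = re.compile(r"^Thread (\d+) (\S+)")
# e.g. "17878k of 20826k used, ..., 1048572k capacity"
USAGE = re.compile(r"(\d+)k of (\d+)k used.*?(\d+)k capacity")


def heap_usage(show_memory_output):
    """Map thread name -> (used_kB, capacity_kB, percent of capacity in use)."""
    result = {}
    thread = None
    for line in show_memory_output.splitlines():
        line = line.strip()
        match = THREAD.match(line)
        if match:
            thread = match.group(2)
            continue
        match = USAGE.search(line)
        if match and thread is not None:
            used, capacity = int(match.group(1)), int(match.group(3))
            result[thread] = (used, capacity, round(100.0 * used / capacity, 2))
    return result


# Summary lines copied from the sample output above.
sample = """\
Thread 0 vpp_main
 22043 objects, 17878k of 20826k used, 2426k free, 2396k reclaimed, 346k overhead, 1048572k capacity
Thread 1 vpp_wk_0
 22043 objects, 17878k of 20826k used, 2427k free, 2396k reclaimed, 346k overhead, 1048572k capacity
"""
for name, (used, capacity, pct) in heap_usage(sample).items():
    print(name, used, capacity, pct)
```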
+VPP CPU Load
+============
+
+To find the VPP CPU load, or how busy VPP is, use the **show runtime** command.
+
+With at least one interface in polling mode, the VPP CPU utilization is always 100%.
+
+A good indicator of CPU load is **"average vectors/node"**. A higher number means VPP
+is busier but also more efficient. The maximum value is 255 (unless you change VLIB_FRAME_SIZE in the code).
+It indicates how many packets are processed per batch.
+
+If VPP is lightly loaded, it will likely poll so fast that it gets only one or a few
+packets from the rx queue. This is the case shown below on Thread 1. As load goes up, VPP
+has more work to do, so it polls less frequently, which results in more
+packets waiting in the rx queue. More packets per poll result in more efficient execution of the
+code, so the number of clock cycles per packet goes down. When "average vectors/node" approaches
+255, you will likely start observing rx queue tail drops.
+
+.. code-block:: console
+
+ # vppctl show run
+ Thread 0 vpp_main (lcore 8)
+ Time 6152.9, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
+ vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
+ Name State Calls Vectors Suspends Clocks Vectors/Call
+ acl-plugin-fa-cleaner-process event wait 0 0 1 3.66e4 0.00
+ admin-up-down-process event wait 0 0 1 2.54e3 0.00
+ ....
+ ---------------
+ Thread 1 vpp_wk_0 (lcore 9)
+ Time 6152.9, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
+ vector rates in 1.3073e2, out 1.3073e2, drop 6.5009e-4, punt 0.0000e0
+ Name State Calls Vectors Suspends Clocks Vectors/Call
+ TenGigabitEthernet86/0/0-outpu active 804395 804395 0 6.17e2 1.00
+ TenGigabitEthernet86/0/0-tx active 804395 804395 0 7.29e2 1.00
+ arp-input active 2 2 0 3.82e4 1.00
+ dpdk-input polling 24239296364 804398 0 1.59e7 0.00
+ error-drop active 4 4 0 4.65e3 1.00
+ ethernet-input active 2 2 0 1.08e4 1.00
+ interface-output active 1 1 0 3.78e3 1.00
+ ip4-glean active 1 1 0 6.98e4 1.00
+ ip4-icmp-echo-request active 804394 804394 0 5.02e2 1.00
+ ip4-icmp-input active 804394 804394 0 4.63e2 1.00
+ ip4-input-no-checksum active 804394 804394 0 8.51e2 1.00
+ ip4-load-balance active 804394 804394 0 5.46e2 1.00
+ ip4-local active 804394 804394 0 5.79e2 1.00
+ ip4-lookup active 804394 804394 0 5.71e2 1.00
+ ip4-rewrite active 804393 804393 0 5.69e2 1.00
+ ip6-input active 2 2 0 5.72e3 1.00
+ ip6-not-enabled active 2 2 0 1.56e4 1.00
+ unix-epoll-input polling 835722 0 0 3.03e-3 0.00
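The "average vectors/node" value can be pulled out of this output per thread and compared against the limit mentioned in the text. The sketch below assumes the header layout shown above; the helper names and the 90% threshold are illustrative, and 255 is simply the maximum quoted in this page.

```python
import re

THREAD = re.compile(r"Thread (\d+) (\S+) \(lcore (\d+)\)")
AVERAGE = re.compile(r"average vectors/node ([\d.]+)")


def average_vectors(show_runtime_output):
    """Map thread name -> its 'average vectors/node' value."""
    averages = {}
    thread = None
    for line in show_runtime_output.splitlines():
        match = THREAD.search(line)
        if match:
            thread = match.group(2)
            continue
        match = AVERAGE.search(line)
        if match and thread is not None:
            averages[thread] = float(match.group(1))
    return averages


def near_saturation(averages, frame_limit=255, threshold=0.9):
    """Return threads whose average is within `threshold` of the frame limit."""
    return [name for name, avg in averages.items() if avg >= threshold * frame_limit]


# Header lines copied from the sample output above.
sample = """\
Thread 0 vpp_main (lcore 8)
Time 6152.9, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
Thread 1 vpp_wk_0 (lcore 9)
Time 6152.9, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
"""
averages = average_vectors(sample)
print(averages, near_saturation(averages))
```

In the sample, neither thread is close to the limit, which matches the lightly loaded case described in the text.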
diff --git a/docs/troubleshooting/index.rst b/docs/troubleshooting/index.rst
new file mode 100644
index 00000000000..655eb1192e6
--- /dev/null
+++ b/docs/troubleshooting/index.rst
@@ -0,0 +1,13 @@
+.. _troubleshooting:
+
+###############
+Troubleshooting
+###############
+
+This chapter describes some of the many techniques used to troubleshoot and diagnose
+problems with FD.io VPP implementations.
+
+.. toctree::
+
+ reportingissues/index.rst
+ cpuusage
diff --git a/docs/troubleshooting/reportingissues/index.rst b/docs/troubleshooting/reportingissues/index.rst
new file mode 100644
index 00000000000..4d954ac8746
--- /dev/null
+++ b/docs/troubleshooting/reportingissues/index.rst
@@ -0,0 +1,8 @@
+.. _reportingissues:
+
+How to Report an Issue
+======================
+
+.. toctree::
+
+ reportingissues
diff --git a/docs/troubleshooting/reportingissues/reportingissues.rst b/docs/troubleshooting/reportingissues/reportingissues.rst
new file mode 100644
index 00000000000..7437d8ae0cc
--- /dev/null
+++ b/docs/troubleshooting/reportingissues/reportingissues.rst
@@ -0,0 +1,216 @@
+.. _reportingbugs:
+
+.. toctree::
+
+Reporting Bugs
+==============
+
+Although every situation is different, this page describes how to
+collect data which will help make efficient use of everyone's time
+when dealing with vpp bugs.
+
+Before you press the Jira button to create a bug report - or email
+vpp-dev@lists.fd.io - please ask yourself whether there's enough
+information for someone else to understand and possibly to reproduce
+the issue given a reasonable amount of effort. **Unicast emails to
+maintainers, committers, and the project PTL are strongly discouraged.**
+
+A good strategy for clear-cut bugs: file a detailed Jira ticket, and
+then send a short description of the issue to vpp-dev@lists.fd.io,
+perhaps from the Jira ticket description. It's fine to send email to
+vpp-dev@lists.fd.io to ask a few questions **before** filing Jira tickets.
+
+Data to include in bug reports
+==============================
+
+Image version and operating environment
+---------------------------------------
+
+Please make sure to include the vpp image version and command-line arguments.
+
+.. code-block:: console
+
+ $ sudo bash
+ # vppctl show version verbose cmdline
+ Version: v18.07-rc0~509-gb9124828
+ Compiled by: vppuser
+ Compile host: vppbuild
+ Compile date: Fri Jul 13 09:05:37 EDT 2018
+ Compile location: /scratch/vpp-showversion
+ Compiler: GCC 7.3.0
+ Current PID: 5211
+ Command line arguments:
+ /scratch/vpp-showversion/build-root/install-vpp_debug-native/vpp/bin/vpp
+ unix
+ interactive
+
+With respect to the operating environment: if the misbehavior involves a
+specific VM / container / bare-metal environment, please
+describe the environment in detail:
+
+* Linux Distro (e.g. Ubuntu 14.04.3 LTS, CentOS-7, etc.)
+* NIC type(s) (ixgbe, i40e, enic, etc.), vhost-user, tuntap
+* NUMA configuration if applicable
+
+Please note the CPU architecture (x86_64, aarch64), and hardware platform.
+
+When practicable, please report issues against released software, or
+unmodified master/latest software.
+
+"Show" command output
+---------------------
+
+Every situation is different. If the issue involves a sequence of debug CLI commands, please enable CLI command logging, and send the sequence involved. Note that the debug CLI is a developer's tool - **no warranty express or implied** - and that we may choose not to fix debug CLI bugs.
+
+Please include "show error" [error counter] output. It's often helpful to "clear error", send a bit of traffic, then "show error", particularly when running vpp on a noisy network.
+
+Please include ip4 / ip6 / mpls FIB contents ("show ip fib", "show ip6 fib", "show mpls fib", "show mpls tunnel").
+
+Please include "show hardware", "show interface", and "show interface address" output.
+
+Here is a consolidated set of commands that are generally useful before/after sending traffic. Before sending traffic, clear the counters:
+
+.. code-block:: console
+
+ vppctl clear hardware
+ vppctl clear interface
+ vppctl clear error
+ vppctl clear run
+
+Send some traffic and then issue the following commands.
+
+.. code-block:: console
+
+ vppctl show version verbose
+ vppctl show hardware
+ vppctl show interface address
+ vppctl show interface
+ vppctl show run
+ vppctl show error
+
+Here are some protocol specific show commands that may also make
+sense. Only include those features which have been configured.
+
+.. code-block:: console
+
+ vppctl show l2fib
+ vppctl show bridge-domain
+
+ vppctl show ip fib
+ vppctl show ip arp
+
+ vppctl show ip6 fib
+ vppctl show ip6 neighbors
+
+ vppctl show mpls fib
+ vppctl show mpls tunnel
+
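Collecting the output of these commands into one report can be scripted. The following is a rough sketch that shells out to `vppctl`; the `SHOW_COMMANDS` list, the `collect` helper, and the report layout are assumptions made for illustration, and the command list should be trimmed to the features actually configured.

```python
import subprocess

# Debug CLI commands named in the text above; extend with the
# protocol-specific ones (l2fib, ip fib, mpls fib, ...) as needed.
SHOW_COMMANDS = [
    "show version verbose",
    "show hardware",
    "show interface",
    "show interface address",
    "show run",
    "show error",
]


def run_vppctl(command):
    """Run one debug CLI command through vppctl and return its text output."""
    completed = subprocess.run(
        ["vppctl"] + command.split(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return completed.stdout


def collect(commands=SHOW_COMMANDS, runner=run_vppctl):
    """Return one report string: each command as a header followed by its output."""
    sections = []
    for command in commands:
        sections.append("==== vppctl {} ====\n{}".format(command, runner(command)))
    return "\n".join(sections)

# On a host where vppctl is available (typically as root):
#     print(collect())
```

The `runner` parameter exists only so that the report formatting can be exercised without a running VPP instance.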
+Network Topology
+----------------
+
+Please include a crisp description of the network topology, including
+L2 / IP / MPLS / segment-routing addressing details. If you expect
+folks to reproduce and debug issues, this is a must.
+
+At or above a certain level of topological complexity, it becomes
+problematic to reproduce the original setup.
+
+Packet Tracer Output
+--------------------
+
+If you capture packet tracer output which seems relevant, please include it.
+
+.. code-block:: console
+
+ vppctl trace add dpdk-input 100 # or similar
+
+Then send traffic and display the trace:
+
+.. code-block:: console
+
+ vppctl show trace
+
+Capturing post-mortem data
+==========================
+
+It should go without saying, but anyhow: **please put post-mortem data
+in obvious, accessible places.** Time wasted trying to acquire
+accounts, credentials, and IP addresses simply delays problem
+resolution.
+
+Please remember to add post-mortem data location information to Jira
+tickets.
+
+Syslog Output
+-------------
+
+The vpp signal handler typically writes a certain amount of data in
+/var/log/syslog before exiting. Make sure to check for evidence, e.g.
+via "grep /usr/bin/vpp /var/log/syslog" or similar.
+
+Binary API Trace
+----------------
+
+If the issue involves a sequence of control-plane API messages - even
+a very long sequence - please enable control-plane API
+tracing. Control-plane API post-mortem traces end up in
+/tmp/api_post_mortem.<pid>.
+
+Please remember to put post-mortem binary api traces in accessible
+places.
+
+These API traces are especially helpful in cases where the vpp engine
+is throwing traffic on the floor, e.g. for want of a default route or
+similar.
+
+Make sure to leave the default stanza "... api-trace { on } ... " in
+the vpp startup configuration file /etc/vpp/startup.conf, or to
+include it in the command line arguments passed by orchestration
+software.
+
+Core Files
+----------
+
+Production systems, as well as long-running pre-production soak-test
+systems, **must** arrange to collect core images. There are various
+ways to configure core image capture, including e.g. the Ubuntu
+"corekeeper" package. In a pinch, the following very basic sequence
+will capture usable vpp core files in /tmp/dumps.
+
+.. code-block:: console
+
+ # mkdir -p /tmp/dumps
+ # sysctl -w debug.exception-trace=1
+ # sysctl -w kernel.core_pattern="/tmp/dumps/%e-%t"
+ # ulimit -c unlimited
+ # echo 2 > /proc/sys/fs/suid_dumpable
+
+VPP core files often appear enormous. Gzip typically compresses them
+to manageable sizes. A multi-GB core file often compresses to 10-20
+MB.
+
+Please remember to put compressed core files in accessible places.
+
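Compressing a core file before uploading it can be done with the `gzip` command line tool or, as sketched here, with the Python standard library. The helper name `compress_core` and the example path are hypothetical; the original core file is left in place.

```python
import gzip
import shutil


def compress_core(core_path):
    """Write core_path + '.gz' next to the core file and return the new path."""
    gz_path = core_path + ".gz"
    with open(core_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Copy in chunks so a multi-GB core file never has to fit in memory.
        shutil.copyfileobj(src, dst)
    return gz_path

# Example (hypothetical file name following the core_pattern above):
#     compress_core("/tmp/dumps/vpp_main-1532630087")
```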
+Make sure to leave the default stanza "... unix { ... full-coredump
+... } ... " in the vpp startup configuration file
+/etc/vpp/startup.conf, or to include it in the command line arguments
+passed by orchestration software.
+
+Core files from private, modified images are discouraged. If it's
+necessary to go that route, please copy the **exact** Debian
+packages (or RPMs) corresponding to the core file to the same public
+place as the core file. In particular:
+
+* vpp_<version>_<arch>.deb # the vpp executable
+* vpp-dbg_<version>_<arch>.deb # debug symbols
+* vpp-dev_<version>_<arch>.deb # development package
+* vpp-lib_<version>_<arch>.deb # shared libraries
+* vpp-plugins_<version>_<arch>.deb # plugins
+
+Please include the full commit-ID in the Jira ticket.
+
+If we go through the setup process only to discover that the image and
+core files don't match, it will simply delay resolution of the
+issue. And it will annoy the heck out of the engineer who just wasted
+their time. Exact means **exact**, not "oh, gee, I added a few lines
+of debug scaffolding since then..."