Diffstat (limited to 'docs/troubleshooting')
-rw-r--r-- | docs/troubleshooting/cpuusage.rst | 112
-rw-r--r-- | docs/troubleshooting/index.rst | 15
-rw-r--r-- | docs/troubleshooting/mem.rst | 87
-rw-r--r-- | docs/troubleshooting/reportingissues/index.rst | 8
-rw-r--r-- | docs/troubleshooting/reportingissues/reportingissues.rst | 284
-rw-r--r-- | docs/troubleshooting/sanitizer.rst | 45
6 files changed, 0 insertions, 551 deletions
diff --git a/docs/troubleshooting/cpuusage.rst b/docs/troubleshooting/cpuusage.rst
deleted file mode 100644
index b9b8942a3dd..00000000000
--- a/docs/troubleshooting/cpuusage.rst
+++ /dev/null
@@ -1,112 +0,0 @@
-.. _cpuusage:
-
-**************
-CPU Load/Usage
-**************
-
-There are various commands and tools that can help users see FD.io VPP CPU and memory usage at runtime.
-
-Linux top/htop
-==============
-
-The Linux top and htop are decent tools to look at FD.io VPP cpu and memory usage, but they will only show
-preallocated memory and total CPU usage. These commands can be useful to show which cores VPP is running on.
-
-This is an example of VPP instance that is running on cores 8 and 9. For this output type **top** and then
-type **1** when the tool starts.
-
-.. code-block:: console
-
-    $ top
-
-    top - 11:04:04 up 35 days, 3:16, 5 users, load average: 2.33, 2.23, 2.16
-    Tasks: 435 total, 2 running, 432 sleeping, 1 stopped, 0 zombie
-    %Cpu0 : 1.0 us, 0.7 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
-    %Cpu1 : 2.0 us, 0.3 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu2 : 0.7 us, 1.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu3 : 1.7 us, 0.7 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu4 : 2.0 us, 0.7 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu5 : 3.0 us, 0.3 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu6 : 2.3 us, 0.7 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu7 : 2.6 us, 0.3 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu8 : 96.0 us, 0.3 sy, 0.0 ni, 3.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu9 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    %Cpu10 : 1.0 us, 0.3 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
-    ....
-
-VPP Memory Usage
-================
-
-For details on VPP memory usage you can use the **show memory** command
-
-This is the example VPP memory usage on 2 cores.
-
-.. code-block:: console
-
-    # vppctl show memory verbose
-    Thread 0 vpp_main
-    22043 objects, 17878k of 20826k used, 2426k free, 2396k reclaimed, 346k overhead, 1048572k capacity
-    alloc. from small object cache: 22875 hits 39973 attempts (57.23%) replacements 5143
-    alloc. from free-list: 44732 attempts, 26017 hits (58.16%), 528461 considered (per-attempt 11.81)
-    alloc. from vector-expand: 3430
-    allocs: 52324 2027.84 clocks/call
-    frees: 30280 594.38 clocks/call
-    Thread 1 vpp_wk_0
-    22043 objects, 17878k of 20826k used, 2427k free, 2396k reclaimed, 346k overhead, 1048572k capacity
-    alloc. from small object cache: 22881 hits 39984 attempts (57.23%) replacements 5148
-    alloc. from free-list: 44736 attempts, 26021 hits (58.17%), 528465 considered (per-attempt 11.81)
-    alloc. from vector-expand: 3430
-    allocs: 52335 2027.54 clocks/call
-    frees: 30291 594.36 clocks/call
-
-VPP CPU Load
-============
-
-To find the VPP CPU load or how busy VPP is use the **show runtime** command.
-
-With at least one interface in polling mode, the VPP CPU utilization is always 100%.
-
-A good indicator of CPU load is **"average vectors/node"**. A bigger number means VPP
-is more busy but also more efficient. The Maximum value is 255 (unless you change VLIB_FRAME_SIZE in code).
-It basically means how many packets are processed in batch.
-
-If VPP is not loaded it will likely poll so fast that it will just get one or few
-packets from the rx queue. This is the case shown below on Thread 1. As load goes up vpp
-will have more work to do, so it will poll less frequently, and that will result in more
-packets waiting in rx queue. More packets will result in more efficient execution of the
-code so number of clock cycles / packet will go down. When "average vectors/node" goes up
-close to 255, you will likely start observing rx queue tail drops.
-
-.. code-block:: console
-
-    # vppctl show run
-    Thread 0 vpp_main (lcore 8)
-    Time 6152.9, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
-    vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
-    Name State Calls Vectors Suspends Clocks Vectors/Call
-    acl-plugin-fa-cleaner-process event wait 0 0 1 3.66e4 0.00
-    admin-up-down-process event wait 0 0 1 2.54e3 0.00
-    ....
-    ---------------
-    Thread 1 vpp_wk_0 (lcore 9)
-    Time 6152.9, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
-    vector rates in 1.3073e2, out 1.3073e2, drop 6.5009e-4, punt 0.0000e0
-    Name State Calls Vectors Suspends Clocks Vectors/Call
-    TenGigabitEthernet86/0/0-outpu active 804395 804395 0 6.17e2 1.00
-    TenGigabitEthernet86/0/0-tx active 804395 804395 0 7.29e2 1.00
-    arp-input active 2 2 0 3.82e4 1.00
-    dpdk-input polling 24239296364 804398 0 1.59e7 0.00
-    error-drop active 4 4 0 4.65e3 1.00
-    ethernet-input active 2 2 0 1.08e4 1.00
-    interface-output active 1 1 0 3.78e3 1.00
-    ip4-glean active 1 1 0 6.98e4 1.00
-    ip4-icmp-echo-request active 804394 804394 0 5.02e2 1.00
-    ip4-icmp-input active 804394 804394 0 4.63e2 1.00
-    ip4-input-no-checksum active 804394 804394 0 8.51e2 1.00
-    ip4-load-balance active 804394 804394 0 5.46e2 1.00
-    ip4-local active 804394 804394 0 5.79e2 1.00
-    ip4-lookup active 804394 804394 0 5.71e2 1.00
-    ip4-rewrite active 804393 804393 0 5.69e2 1.00
-    ip6-input active 2 2 0 5.72e3 1.00
-    ip6-not-enabled active 2 2 0 1.56e4 1.00
-    unix-epoll-input polling 835722 0 0 3.03e-3 0.00
diff --git a/docs/troubleshooting/index.rst b/docs/troubleshooting/index.rst
deleted file mode 100644
index 5dee98a8029..00000000000
--- a/docs/troubleshooting/index.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-.. _troubleshooting:
-
-###############
-Troubleshooting
-###############
-
-This chapter describes some of the many techniques used to troubleshoot and diagnose
-problem with FD.io VPP implementations.
-
-.. toctree::
-
-    reportingissues/index.rst
-    cpuusage
-    sanitizer
-    mem
diff --git a/docs/troubleshooting/mem.rst b/docs/troubleshooting/mem.rst
deleted file mode 100644
index 207b2777c50..00000000000
--- a/docs/troubleshooting/mem.rst
+++ /dev/null
@@ -1,87 +0,0 @@
-.. _memleak:
-
-*****************
-Memory leaks
-*****************
-
-Memory traces
-=============
-
-VPP supports memory traces to help debug (suspected) memory leaks. Each
-allocation/deallocation is instrumented so that the number of allocations and
-current global allocated size is maintained for each unique allocation stack
-trace.
-
-Looking at a memory trace can help diagnose where memory is (over-)used, and
-comparing memory traces at different point in time can help diagnose if and
-where memory leaks happen.
-
-To enable memory traces on main-heap:
-
-.. code-block:: console
-
-    $ vppctl memory-trace on main-heap
-
-To dump memory traces for analysis:
-
-.. code-block:: console
-
-    $ vppctl show memory-trace on main-heap
-    Thread 0 vpp_main
-    base 0x7fffb6422000, size 1g, locked, unmap-on-destroy, name 'main heap'
-    page stats: page-size 4K, total 262144, mapped 30343, not-mapped 231801
-    numa 0: 30343 pages, 118.53m bytes
-    total: 1023.99M, used: 115.49M, free: 908.50M, trimmable: 908.48M
-    free chunks 451 free fastbin blks 0
-    max total allocated 1023.99M
-
-    Bytes Count Sample Traceback
-    31457440 1 0x7fffbb31ad00 clib_mem_alloc_aligned_at_offset + 0x80
-            clib_mem_alloc_aligned + 0x26
-            alloc_aligned_8_8 + 0xe1
-            clib_bihash_instantiate_8_8 + 0x76
-            clib_bihash_init2_8_8 + 0x2ec
-            clib_bihash_init_8_8 + 0x6a
-            l2fib_table_init + 0x54
-            set_int_l2_mode + 0x89
-            int_l3 + 0xb4
-            vlib_cli_dispatch_sub_commands + 0xeee
-            vlib_cli_dispatch_sub_commands + 0xc62
-            vlib_cli_dispatch_sub_commands + 0xc62
-    266768 5222 0x7fffbd79f978 clib_mem_alloc_aligned_at_offset + 0x80
-            vec_resize_allocate_memory + 0xa8
-            _vec_resize_inline + 0x240
-            unix_cli_file_add + 0x83d
-            unix_cli_listen_read_ready + 0x10b
-            linux_epoll_input_inline + 0x943
-            linux_epoll_input + 0x39
-            dispatch_node + 0x336
-            vlib_main_or_worker_loop + 0xbf1
-            vlib_main_loop + 0x1a
-            vlib_main + 0xae7
-            thread0 + 0x3e
-    ....
-
-libc memory traces
-==================
-
-Internal VPP memory allocations rely on VPP main-heap, however when using
-external libraries, esp. in plugins (eg. OpenSSL library used by the IKEv2
-plugin), those external libraries usually manages memory using the standard
-libc malloc()/free()/... calls. This, in turn, makes use of the default
-libc heap.
-
-VPP has no knowledge of this heap and tools such as memory traces cannot be
-used.
-
-In order to enable the use of standard VPP debugging tools, this library
-replaces standard libc memory management calls with version using VPP
-main-heap.
-
-To use it, you need to use the `LD_PRELOAD` mechanism, eg.
-
-.. code-block:: console
-
-    ~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libvppmem_preload.so /usr/bin/vpp -c /etc/vpp/startup.conf
-
-You can then use tools such as memory traces as usual.
diff --git a/docs/troubleshooting/reportingissues/index.rst b/docs/troubleshooting/reportingissues/index.rst
deleted file mode 100644
index 4d954ac8746..00000000000
--- a/docs/troubleshooting/reportingissues/index.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-.. _reportingissues:
-
-How to Report an Issue
-======================
-
-.. toctree::
-
-    reportingissues
diff --git a/docs/troubleshooting/reportingissues/reportingissues.rst b/docs/troubleshooting/reportingissues/reportingissues.rst
deleted file mode 100644
index 3ccd494d092..00000000000
--- a/docs/troubleshooting/reportingissues/reportingissues.rst
+++ /dev/null
@@ -1,284 +0,0 @@
-.. _reportingbugs:
-
-.. toctree::
-
-Reporting Bugs
-==============
-
-Although every situation is different, this section describes how to
-collect data which will help make efficient use of everyone's time
-when dealing with vpp bugs.
-
-Before you press the Jira button to create a bug report - or email
-vpp-dev@lists.fd.io - please ask yourself whether there's enough
-information for someone else to understand and to reproduce the issue
-given a reasonable amount of effort. **Unicast emails to maintainers,
-committers, and the project PTL are strongly discouraged.**
-
-A good strategy for clear-cut bugs: file a detailed Jira ticket, and
-then send a short description of the issue to vpp-dev@lists.fd.io,
-perhaps from the Jira ticket description. It's fine to send email to
-vpp-dev@lists.fd.io to ask a few questions **before** filing Jira tickets.
-
-Data to include in bug reports
-==============================
-
-Image version and operating environment
----------------------------------------
-
-Please make sure to include the vpp image version and command-line arguments.
-
-.. code-block:: console
-
-    $ sudo bash
-    # vppctl show version verbose cmdline
-    Version: v18.07-rc0~509-gb9124828
-    Compiled by: vppuser
-    Compile host: vppbuild
-    Compile date: Fri Jul 13 09:05:37 EDT 2018
-    Compile location: /scratch/vpp-showversion
-    Compiler: GCC 7.3.0
-    Current PID: 5211
-    Command line arguments:
-      /scratch/vpp-showversion/build-root/install-vpp_debug-native/vpp/bin/vpp
-      unix
-      interactive
-
-With respect to the operating environment: if misbehavior involving a
-specific VM / container / bare-metal environment is involved, please
-describe the environment in detail:
-
-* Linux Distro (e.g. Ubuntu 18.04.2 LTS, CentOS-7, etc.)
-* NIC type(s) (ixgbe, i40e, enic, etc. etc.), vhost-user, tuntap
-* NUMA configuration if applicable
-
-Please note the CPU architecture (x86_86, aarch64), and hardware platform.
-
-When practicable, please report issues against released software, or
-unmodified master/latest software.
-
-"Show" command output
----------------------
-
-Every situation is different. If the issue involves a sequence of
-debug CLI command, please enable CLI command logging, and send the
-sequence involved. Note that the debug CLI is a developer's tool -
-**no warranty express or implied** - and that we may choose not to fix
-debug CLI bugs.
-
-Please include "show error" [error counter] output. It's often helpful
-to "clear error", send a bit of traffic, then "show error"
-particularly when running vpp on noisy networks.
-
-Please include ip4 / ip6 / mpls FIB contents ("show ip fib", "show ip6
-fib", "show mpls fib", "show mpls tunnel").
-
-Please include "show hardware", "show interface", and "show interface
-address" output
-
-Here is a consolidated set of commands that are generally useful
-before/after sending traffic. Before sending traffic:
-
-.. code-block:: console
-
-    vppctl clear hardware
-    vppctl clear interface
-    vppctl clear error
-    vppctl clear run
-
-Send some traffic and then issue the following commands.
-
-.. code-block:: console
-
-    vppctl show version verbose
-    vppctl show hardware
-    vppctl show interface address
-    vppctl show interface
-    vppctl show run
-    vppctl show error
-
-Here are some protocol specific show commands that may also make
-sense. Only include those features which have been configured.
-
-.. code-block:: console
-
-    vppctl show l2fib
-    vppctl show bridge-domain
-
-    vppctl show ip fib
-    vppctl show ip neighbors
-
-    vppctl show ip6 fib
-    vppctl show ip6 neighbors
-
-    vppctl show mpls fib
-    vppctl show mpls tunnel
-
-Network Topology
-----------------
-
-Please include a crisp description of the network topology, including
-L2 / IP / MPLS / segment-routing addressing details. If you expect
-folks to reproduce and debug issues, this is a must.
-
-At or above a certain level of topological complexity, it becomes
-problematic to reproduce the original setup.
-
-Packet Tracer Output
---------------------
-
-If you capture packet tracer output which seems relevant, please include it.
-
-.. code-block:: console
-
-    vppctl trace add dpdk-input 100 # or similar
-
-send-traffic
-
-.. code-block:: console
-
-    vppctl show trace
-
-Capturing post-mortem data
-==========================
-
-It should go without saying, but anyhow: **please put post-mortem data
-in obvious, accessible places.** Time wasted trying to acquire
-accounts, credentials, and IP addresses simply delays problem
-resolution.
-
-Please remember to add post-mortem data location information to Jira
-tickets.
-
-Syslog Output
--------------
-
-The vpp signal handler typically writes a certain amount of data in
-/var/log/syslog before exiting. Make sure to check for evidence, e.g
-via "grep /usr/bin/vpp /var/log/syslog" or similar.
-
-Binary API Trace
-----------------
-
-If the issue involves a sequence of control-plane API messages - even
-a very long sequence - please enable control-plane API
-tracing. Control-plane API post-mortem traces end up in
-/tmp/api_post_mortem.<pid>.
-
-Please remember to put post-mortem binary api traces in accessible
-places.
-
-These API traces are especially helpful in cases where the vpp engine
-is throwing traffic on the floor, e.g. for want of a default route or
-similar.
-
-Make sure to leave the default stanza "... api-trace { on } ... " in
-the vpp startup configuration file /etc/vpp/startup.conf, or to
-include it in the command line arguments passed by orchestration
-software.
-
-Core Files
-----------
-
-Production systems, as well as long-running pre-production soak-test
-systems, **must** arrange to collect core images. There are various
-ways to configure core image capture, including e.g. the Ubuntu
-"corekeeper" package. In a pinch, the following very basic sequence
-will capture usable vpp core files in /tmp/dumps.
-
-.. code-block:: console
-
-    # mkdir -p /tmp/dumps
-    # sysctl -w debug.exception-trace=1
-    # sysctl -w kernel.core_pattern="/tmp/dumps/%e-%t"
-    # ulimit -c unlimited
-    # echo 2 > /proc/sys/fs/suid_dumpable
-
-If you start VPP from systemd, you also need to edit
-/lib/systemd/system/vpp.service and uncomment the "LimitCORE=infinity"
-line before restarting VPP.
-
-Vpp core files often appear enormous, but they are invariably
-sparse. Gzip compresses them to manageable sizes. A multi-GByte
-corefile often compresses to 10-20 Mbytes.
-
-When decompressing a vpp core file, we suggest using "dd" as shown to
-create a sparse, uncompressed core file:
-
-.. code-block:: console
-
-    $ zcat vpp_core.gz | dd conv=sparse of=vpp_core
-
-Please remember to put compressed core files in accessible places.
-
-Make sure to leave the default stanza "... unix { ... full-coredump
-... } ... " in the vpp startup configuration file
-/etc/vpp/startup.conf, or to include it in the command line arguments
-passed by orchestration software.
-
-Core files from Private Images
-==============================
-
-Core files from private images require special handling. If it's
-necessary to go that route, copy the **exact** Debian packages (or
-RPMs) which correspond to the core file to the same public place as
-the core file. A no-excuses-allowed, hard-and-fast requirement.
-
-In particular:
-
-.. code-block:: console
-
-    libvppinfra_<version>_<arch>.deb # vppinfra library
-    libvppinfra-dev_<version>_<arch>.deb # vppinfra library development pkg
-    vpp_<version>_<arch>.deb # the vpp executable
-    vpp-dbg_<version>_<arch>.deb # debug symbols
-    vpp-dev_<version>_<arch>.deb # vpp development pkg
-    vpp-lib_<version>_<arch>.deb # shared libraries
-    vpp-plugin-core_<version>_<arch>.deb # core plugins
-    vpp-plugin-dpdk_<version>_<arch>.deb # dpdk plugin
-
-For reference, please include git commit-ID, branch, and git repo
-information [for repos other than gerrit.fd.io] in the Jira ticket.
-
-Note that git commit-ids are crypto sums of the head [latest]
-**merged** patch. They say **nothing whatsoever** about local
-workspace modifications, branching, or the git repo in question.
-
-Even given a byte-for-byte identical source tree, it's easy to build
-dramatically different binary artifacts. All it takes is a different
-toolchain version.
-
-
-On-the-fly Core File Compression
---------------------------------
-
-Depending on operational requirements, it's possible to compress
-corefiles as they are generated. Please note that it takes several
-seconds' worth of wall-clock time to compress a vpp core file on the
-fly, during which all packet processing activities are suspended.
-
-To create compressed core files on the fly, create the following
-script, e.g. in /usr/local/bin/compressed_corefiles, owned by root,
-executable:
-
-.. code-block:: console
-
-    #!/bin/sh
-    exec /bin/gzip -f - >"/tmp/dumps/core-$1.$2.gz"
-
-Adjust the kernel core file pattern as shown:
-
-.. code-block:: console
-
-    sysctl -w kernel.core_pattern="|/usr/local/bin/compressed_corefiles %e %t"
-
-Core File Summary
------------------
-
-Bottom line: please follow core file handling instructions to the
-letter. It's not complicated. Simply copy the exact Debian packages or
-RPMs which correspond to core files to accessible locations.
-
-If we go through the setup process only to discover that the image and
-core files don't match, it will simply delay resolution of the issue;
-to say nothing of irritating the person who just wasted their time.
diff --git a/docs/troubleshooting/sanitizer.rst b/docs/troubleshooting/sanitizer.rst
deleted file mode 100644
index 217f5e57182..00000000000
--- a/docs/troubleshooting/sanitizer.rst
+++ /dev/null
@@ -1,45 +0,0 @@
-.. _sanitizer:
-
-*****************
-Google Sanitizers
-*****************
-
-VPP is instrumented to support `Google Sanitizers <https://github.com/google/sanitizers>`_.
-As of today, only `AddressSanitizer <https://github.com/google/sanitizers/wiki/AddressSanitizer>`_
-is supported, both for GCC and clang.
-
-AddressSanitizer
-================
-
-`AddressSanitizer <https://github.com/google/sanitizers/wiki/AddressSanitizer>`_ (aka ASan) is a memory
-error detector for C/C++. Think Valgrind but much faster.
-
-In order to use it, VPP must be recompiled with ASan support. It is implemented as a cmake
-build option, so all VPP targets should be supported. For example:
-
-.. code-block:: console
-
-    # build a debug image with ASan support:
-    $ make rebuild VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
-    ....
-
-    # build a release image with ASan support:
-    $ make rebuild-release VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
-    ....
-
-    # build packages in debug mode with ASan support:
-    $ make pkg-deb-debug VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
-    ....
-
-    # run GBP plugin tests in debug mode with ASan
-    $ make test-debug TEST=test_gbp VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
-    ....
-
-Once VPP has been built with ASan support you can use it as usual including
-under gdb:
-
-.. code-block:: console
-
-    $ gdb --args $PWD/build-root/install-vpp_debug-native/vpp/bin/vpp "unix { interactive }"
-    ....
-
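
Note on the removed cpuusage.rst: its "average vectors/node" guidance can be turned into a quick per-thread check. The sketch below is illustrative only and was not part of any deleted file. The function name `vectors_per_thread` and the default threshold of 200 are assumptions made for this example, and the parser only understands text shaped like the `show run` sample quoted in that file.

```shell
#!/bin/sh
# Hypothetical helper (not part of VPP): read `vppctl show run` text on stdin
# and print one line per thread: "<thread-name> <average vectors/node> <status>".
# $1 is an assumed warning threshold; the removed text only says that values
# close to 255 tend to coincide with rx queue tail drops.
vectors_per_thread() {
    awk -v limit="${1:-200}" '
        # Header line such as: Thread 1 vpp_wk_0 (lcore 9)
        $1 == "Thread" && $2 ~ /^[0-9]+$/ { thread = $3 }
        # Summary line such as: Time 6152.9, average vectors/node 1.00, last ...
        index($0, "average vectors/node") {
            for (i = 1; i < NF; i++) {
                if ($i == "vectors/node") {
                    v = $(i + 1)
                    sub(/,$/, "", v)   # drop the trailing comma
                    status = (v + 0 >= limit + 0) ? "near-saturation" : "ok"
                    print thread, v, status
                    break
                }
            }
        }'
}
```

Assuming `vppctl` is on the path, a typical call would be `vppctl show run | vectors_per_thread 200`. The threshold is passed as an argument so it can be tuned per deployment rather than hard-coded.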