Diffstat (limited to 'docs/gettingstarted/troubleshooting')
-rw-r--r-- | docs/gettingstarted/troubleshooting/cpuusage.rst | 112 | ||||
-rw-r--r-- | docs/gettingstarted/troubleshooting/index.rst | 14 | ||||
-rw-r--r-- | docs/gettingstarted/troubleshooting/mem.rst | 87 | ||||
-rw-r--r-- | docs/gettingstarted/troubleshooting/sanitizer.rst | 45 |
4 files changed, 258 insertions, 0 deletions
diff --git a/docs/gettingstarted/troubleshooting/cpuusage.rst b/docs/gettingstarted/troubleshooting/cpuusage.rst
new file mode 100644
index 00000000000..9b4514e128e
--- /dev/null
+++ b/docs/gettingstarted/troubleshooting/cpuusage.rst
@@ -0,0 +1,112 @@
+.. _cpuusage:
+
+**************
+CPU Load/Usage
+**************
+
+There are various commands and tools that can help users see FD.io VPP CPU and memory usage at runtime.
+
+Linux top/htop
+==============
+
+The Linux top and htop tools are useful for looking at FD.io VPP CPU and memory usage, but they only show
+preallocated memory and total CPU usage. They are also useful for showing which cores VPP is running on.
+
+The following example shows a VPP instance running on cores 8 and 9. To get this output, run **top** and then
+press **1** once the tool has started, so that each CPU is listed separately.
+
+.. code-block:: console
+
+    $ top
+
+    top - 11:04:04 up 35 days,  3:16,  5 users,  load average: 2.33, 2.23, 2.16
+    Tasks: 435 total,   2 running, 432 sleeping,   1 stopped,   0 zombie
+    %Cpu0  :  1.0 us,  0.7 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
+    %Cpu1  :  2.0 us,  0.3 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu2  :  0.7 us,  1.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu3  :  1.7 us,  0.7 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu4  :  2.0 us,  0.7 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu5  :  3.0 us,  0.3 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu6  :  2.3 us,  0.7 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu7  :  2.6 us,  0.3 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu8  : 96.0 us,  0.3 sy,  0.0 ni,  3.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    %Cpu10 :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+    ....
+
+VPP Memory Usage
+================
+
+For details on VPP memory usage, use the **show memory** command.
+
+The following example shows the memory usage of a VPP instance running on 2 cores.
+
+.. code-block:: console
+
+    # vppctl show memory verbose
+    Thread 0 vpp_main
+    22043 objects, 17878k of 20826k used, 2426k free, 2396k reclaimed, 346k overhead, 1048572k capacity
+        alloc. from small object cache: 22875 hits 39973 attempts (57.23%) replacements 5143
+        alloc. from free-list: 44732 attempts, 26017 hits (58.16%), 528461 considered (per-attempt 11.81)
+        alloc. from vector-expand: 3430
+        allocs: 52324 2027.84 clocks/call
+        frees: 30280 594.38 clocks/call
+    Thread 1 vpp_wk_0
+    22043 objects, 17878k of 20826k used, 2427k free, 2396k reclaimed, 346k overhead, 1048572k capacity
+        alloc. from small object cache: 22881 hits 39984 attempts (57.23%) replacements 5148
+        alloc. from free-list: 44736 attempts, 26021 hits (58.17%), 528465 considered (per-attempt 11.81)
+        alloc. from vector-expand: 3430
+        allocs: 52335 2027.54 clocks/call
+        frees: 30291 594.36 clocks/call
+
+VPP CPU Load
+============
+
+To find the VPP CPU load, that is, how busy VPP is, use the **show runtime** command.
+
+With at least one interface in polling mode, top and htop report VPP CPU utilization of (close to) 100%, regardless of how busy VPP actually is.
+
+A good indicator of CPU load is **"average vectors/node"**. A bigger number means VPP
+is busier but also more efficient. The maximum value is 255 (unless you change VLIB_FRAME_SIZE in the code).
+It is essentially the number of packets processed per batch.
+
+If VPP is lightly loaded, it will likely poll so fast that it gets only one or a few
+packets from the rx queue at a time. This is the case for Thread 1 below. As the load goes up, VPP
+has more work to do, so it polls less frequently and more packets wait in the
+rx queue. Processing more packets per batch makes the code run more efficiently, so the number of
+clock cycles per packet goes down. When "average vectors/node" gets
+close to 255, you will likely start to observe rx queue tail drops.
+
+.. code-block:: console
+
+    # vppctl show run
+    Thread 0 vpp_main (lcore 8)
+    Time 6152.9, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
+      vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
+    Name                            State            Calls   Vectors Suspends   Clocks Vectors/Call
+    acl-plugin-fa-cleaner-process   event wait           0         0        1   3.66e4         0.00
+    admin-up-down-process           event wait           0         0        1   2.54e3         0.00
+    ....
+    ---------------
+    Thread 1 vpp_wk_0 (lcore 9)
+    Time 6152.9, average vectors/node 1.00, last 128 main loops 0.00 per node 0.00
+      vector rates in 1.3073e2, out 1.3073e2, drop 6.5009e-4, punt 0.0000e0
+    Name                            State            Calls   Vectors Suspends   Clocks Vectors/Call
+    TenGigabitEthernet86/0/0-outpu  active          804395    804395        0   6.17e2         1.00
+    TenGigabitEthernet86/0/0-tx     active          804395    804395        0   7.29e2         1.00
+    arp-input                       active               2         2        0   3.82e4         1.00
+    dpdk-input                      polling    24239296364    804398        0   1.59e7         0.00
+    error-drop                      active               4         4        0   4.65e3         1.00
+    ethernet-input                  active               2         2        0   1.08e4         1.00
+    interface-output                active               1         1        0   3.78e3         1.00
+    ip4-glean                       active               1         1        0   6.98e4         1.00
+    ip4-icmp-echo-request           active          804394    804394        0   5.02e2         1.00
+    ip4-icmp-input                  active          804394    804394        0   4.63e2         1.00
+    ip4-input-no-checksum           active          804394    804394        0   8.51e2         1.00
+    ip4-load-balance                active          804394    804394        0   5.46e2         1.00
+    ip4-local                       active          804394    804394        0   5.79e2         1.00
+    ip4-lookup                      active          804394    804394        0   5.71e2         1.00
+    ip4-rewrite                     active          804393    804393        0   5.69e2         1.00
+    ip6-input                       active               2         2        0   5.72e3         1.00
+    ip6-not-enabled                 active               2         2        0   1.56e4         1.00
+    unix-epoll-input                polling         835722         0        0  3.03e-3         0.00
diff --git a/docs/gettingstarted/troubleshooting/index.rst b/docs/gettingstarted/troubleshooting/index.rst
new file mode 100644
index 00000000000..d70c19042c8
--- /dev/null
+++ b/docs/gettingstarted/troubleshooting/index.rst
@@ -0,0 +1,14 @@
+.. _troubleshooting:
+
+###############
+Troubleshooting
+###############
+
+This chapter describes some of the many techniques used to troubleshoot and diagnose
+problems with FD.io VPP implementations.
+
+.. toctree::
+
+    cpuusage
+    sanitizer
+    mem
diff --git a/docs/gettingstarted/troubleshooting/mem.rst b/docs/gettingstarted/troubleshooting/mem.rst
new file mode 100644
index 00000000000..630b0af02f3
--- /dev/null
+++ b/docs/gettingstarted/troubleshooting/mem.rst
@@ -0,0 +1,87 @@
+.. _memleak:
+
+*****************
+Memory leaks
+*****************
+
+Memory traces
+=============
+
+VPP supports memory traces to help debug (suspected) memory leaks. Each
+allocation/deallocation is instrumented so that the number of allocations and
+the current total allocated size are maintained for each unique allocation stack
+trace.
+
+Looking at a memory trace can help diagnose where memory is (over-)used, and
+comparing memory traces at different points in time can help diagnose whether and
+where memory leaks happen.
+
+To enable memory traces on the main heap:
+
+.. code-block:: console
+
+    $ vppctl memory-trace on main-heap
+
+To dump memory traces for analysis:
+
+.. code-block:: console
+
+    $ vppctl show memory-trace on main-heap
+    Thread 0 vpp_main
+      base 0x7fffb6422000, size 1g, locked, unmap-on-destroy, name 'main heap'
+        page stats: page-size 4K, total 262144, mapped 30343, not-mapped 231801
+          numa 0: 30343 pages, 118.53m bytes
+        total: 1023.99M, used: 115.49M, free: 908.50M, trimmable: 908.48M
+          free chunks 451 free fastbin blks 0
+          max total allocated 1023.99M
+
+         Bytes    Count  Sample          Traceback
+      31457440        1  0x7fffbb31ad00  clib_mem_alloc_aligned_at_offset + 0x80
+                                         clib_mem_alloc_aligned + 0x26
+                                         alloc_aligned_8_8 + 0xe1
+                                         clib_bihash_instantiate_8_8 + 0x76
+                                         clib_bihash_init2_8_8 + 0x2ec
+                                         clib_bihash_init_8_8 + 0x6a
+                                         l2fib_table_init + 0x54
+                                         set_int_l2_mode + 0x89
+                                         int_l3 + 0xb4
+                                         vlib_cli_dispatch_sub_commands + 0xeee
+                                         vlib_cli_dispatch_sub_commands + 0xc62
+                                         vlib_cli_dispatch_sub_commands + 0xc62
+        266768     5222  0x7fffbd79f978  clib_mem_alloc_aligned_at_offset + 0x80
+                                         vec_resize_allocate_memory + 0xa8
+                                         _vec_resize_inline + 0x240
+                                         unix_cli_file_add + 0x83d
+                                         unix_cli_listen_read_ready + 0x10b
+                                         linux_epoll_input_inline + 0x943
+                                         linux_epoll_input + 0x39
+                                         dispatch_node + 0x336
+                                         vlib_main_or_worker_loop + 0xbf1
+                                         vlib_main_loop + 0x1a
+                                         vlib_main + 0xae7
+                                         thread0 + 0x3e
+    ....
+
+libc memory traces
+==================
+
+Internal VPP memory allocations use the VPP main heap. However, external
+libraries, especially those used by plugins (e.g. the OpenSSL library used by the IKEv2
+plugin), usually manage memory with the standard
+libc malloc()/free()/... calls, which in turn use the default
+libc heap.
+
+VPP has no knowledge of this heap, so tools such as memory traces cannot be
+used on these allocations.
+
+To make the standard VPP debugging tools usable for them, the ``libvppmem_preload.so``
+library replaces the standard libc memory management calls with versions that use the VPP
+main heap.
+
+To use it, load it with the ``LD_PRELOAD`` mechanism, e.g.:
+
+.. code-block:: console
+
+    ~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libvppmem_preload.so /usr/bin/vpp -c /etc/vpp/startup.conf
+
+You can then use tools such as memory traces as usual.
diff --git a/docs/gettingstarted/troubleshooting/sanitizer.rst b/docs/gettingstarted/troubleshooting/sanitizer.rst
new file mode 100644
index 00000000000..217f5e57182
--- /dev/null
+++ b/docs/gettingstarted/troubleshooting/sanitizer.rst
@@ -0,0 +1,45 @@
+.. _sanitizer:
+
+*****************
+Google Sanitizers
+*****************
+
+VPP is instrumented to support `Google Sanitizers <https://github.com/google/sanitizers>`_.
+As of today, only `AddressSanitizer <https://github.com/google/sanitizers/wiki/AddressSanitizer>`_
+is supported, both for GCC and clang.
+
+AddressSanitizer
+================
+
+`AddressSanitizer <https://github.com/google/sanitizers/wiki/AddressSanitizer>`_ (aka ASan) is a memory
+error detector for C/C++. Think Valgrind, but much faster.
+
+To use it, VPP must be rebuilt with ASan support. This is implemented as a CMake
+build option, so all VPP build targets should support it. For example:
+
+.. code-block:: console
+
+    # build a debug image with ASan support:
+    $ make rebuild VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
+    ....
+
+    # build a release image with ASan support:
+    $ make rebuild-release VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
+    ....
+
+    # build packages in debug mode with ASan support:
+    $ make pkg-deb-debug VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
+    ....
+
+    # run GBP plugin tests in debug mode with ASan support:
+    $ make test-debug TEST=test_gbp VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON
+    ....
+
+Once VPP has been built with ASan support, you can use it as usual, including
+under gdb:
+
+.. code-block:: console
+
+    $ gdb --args $PWD/build-root/install-vpp_debug-native/vpp/bin/vpp "unix { interactive }"
+    ....
+
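+When ASan-instrumented VPP runs unattended, for example during the ``make test-debug``
+run above, it can be easier to triage a saved log than to read every report in full.
+By default ASan writes each report to stderr, and each report contains a line of the
+form ``SUMMARY: AddressSanitizer: <bug-type> ...``. The function below is a minimal
+sketch that counts reports by bug type, assuming that standard format.
+``asan_summary`` is a hypothetical helper, not part of VPP or of the ASan runtime, and
+grouping LeakSanitizer summaries (which start with a byte count) under ``leak`` is an
+arbitrary choice made here.
+
+.. code-block:: bash
+
+    # asan_summary: hypothetical helper, not part of VPP or ASan.
+    # Counts AddressSanitizer reports by bug type in the given log files
+    # (or in standard input when no file is given), assuming each report
+    # contains a line "SUMMARY: AddressSanitizer: <bug-type> ...".
+    asan_summary() {
+        awk '
+            /SUMMARY: AddressSanitizer: / {
+                for (i = 1; i < NF; i++) {
+                    if ($i == "AddressSanitizer:") {
+                        type = $(i + 1)
+                        # LeakSanitizer summaries start with a byte count,
+                        # e.g. "24 byte(s) leaked in 1 allocation(s).".
+                        if (type ~ /^[0-9]+$/)
+                            type = "leak"
+                        count[type]++
+                        break
+                    }
+                }
+            }
+            END { for (t in count) print t, count[t] }
+        ' "$@" | LC_ALL=C sort
+    }
+
+For example, ``asan_summary vpp-stderr.log`` (an illustrative file name) prints one
+``<bug-type> <count>`` line per bug type, sorted by type. With no file argument it
+reads standard input, so it can also sit at the end of a pipe.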