path: root/src/vlib/threads.c
Age | Commit message | Author | Files | Lines
2020-07-16 | misc: add callback hooks and refactor pmc | Tom Seidenberg | 1 | -0/+16
Callbacks for monitoring and performance measurement:
- Add new callback list type, with context
- Add callbacks for API, CLI, and barrier sync
- Modify node dispatch callback to pass plugin-specific context
- Modify perfmon plugin to keep PMC samples local to the plugin
- Include process nodes in dispatch callback
- Pass dispatch function return value to callback
Type: refactor
Signed-off-by: Tom Seidenberg <tseidenb@cisco.com>
Change-Id: I28b06c58490611e08d76ff5b01b2347ba2109b22
2020-06-16 | vlib: address sanitizer support for stack switch, enable clang | Damjan Marion | 1 | -0/+3
Type: improvement
Change-Id: I81df4b61d1f0b8c1df77c1ee9bebcb491e155b69
Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-27 | vlib: deprecate i2c and cj | Dave Barach | 1 | -4/+0
i2c follows its only use case - the original 82599 driver - into extras/deprecated. cj is/was an emergency debug tool unused in several years. Move to extras/deprecated/vlib
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Ib55b65373f62630db295c562974bd8f2456c3107
2020-04-21 | vlib: queue_hi_thresh fix to avoid deadlock | Elias Rudberg | 1 | -2/+4
Adapt queue_hi_thresh value using num_threads to avoid risk of deadlock between threads which could happen for example when different NAT threads try to handoff work to each other at the same time when their frame queues are congested. This change ensures that each thread can reserve a queue entry without causing problems even in the most extreme case when all threads attempt to add to the same queue simultaneously when the queue is nearly full.
Type: fix
Signed-off-by: Elias Rudberg <elias.rudberg@bahnhof.net>
Change-Id: I9e02f753bd00833d8dd500d181b0d4f9a454d703
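The threshold adjustment described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: the helper names `frame_queue_hi_thresh` and `frame_queue_is_congested` are hypothetical, and the rule "queue size minus thread count" is one plausible reading of "adapt queue_hi_thresh value using num_threads".

```c
#include <assert.h>

/* Hypothetical helper (not the VPP function): pick a congestion threshold
 * that leaves one free slot per thread, so every thread that passed the
 * "not congested" check can still reserve an entry. */
static int
frame_queue_hi_thresh (int nelts, int num_threads)
{
  assert (nelts > 0 && num_threads > 0);
  int thresh = nelts - num_threads;
  /* A non-positive result means the queue is too small for this many
   * threads; report that to the caller as -1. */
  return thresh > 0 ? thresh : -1;
}

/* Hypothetical check: a producer treats the queue as congested once the
 * number of pending entries reaches the threshold. */
static int
frame_queue_is_congested (int pending, int hi_thresh)
{
  return pending >= hi_thresh;
}
```

With a 64-entry queue and 4 threads, the sketch marks the queue congested at 60 pending entries, which leaves room for all 4 threads to enqueue at once.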
2020-03-21 | vlib: allow using core 0 as a last resort | Vladimir Isaev | 1 | -15/+12
Right now even the following configuration fails:
    cpu { corelist-workers 0 }
I think we should allow the user to use CPU 0 if it is explicitly configured, or as the last automatically assigned core.
Type: improvement
Signed-off-by: Vladimir Isaev <visaev@netgate.com>
Change-Id: I041c30349bace558c673475e9a0cffccae5821f7
2020-03-21 | vlib: complain if workers are configured twice | Vladimir Isaev | 1 | -1/+10
Right now the following configuration leads to a crash:
    cpu { corelist-workers 2 workers 2 }
because the thread count will be set to 2, but we have only one core in the coremask.
Type: fix
Signed-off-by: Vladimir Isaev <visaev@netgate.com>
Change-Id: Ia93b892733971e7c8ddfceaaec5f4eb8bf9063ac
2020-03-12 | vppinfra: refactor clib_timebase_t | Dave Barach | 1 | -1/+8
Add a clib_time_t * argument to clib_timebase_init(...), to encourage client code to share the vlib_main_t's clib_time_t object.
Display the current day / date in GMT via the "show time" debug CLI.
Fix the test framework so it processes the new "show time" output format.
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I5e52d57eb164b7cdb6355362d520df6928491711
2020-02-17 | vlib: fix code of getting numa node with specific cpu_id | Lijian.Zhang | 1 | -3/+18
Use the sysfs files below to check which numa node a specific cpu_id belongs to:
    /sys/devices/system/node/online
    /sys/devices/system/node/node0/cpulist
    /sys/devices/system/node/node1/cpulist
Type: fix
Change-Id: I124b80b1fd4a20dd7bd76f0ae27d5ab23a3a8ff1
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
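A lookup along the lines of this commit might look like the sketch below. It is only an illustration: `cpulist_contains` and `numa_node_of_cpu` are hypothetical names, and the sketch assumes the caller already knows the highest node number instead of parsing /sys/devices/system/node/online as the commit message suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if cpu_id appears in a sysfs-style cpulist such as "0-3,8-11". */
static int
cpulist_contains (const char *list, int cpu_id)
{
  assert (list != NULL);
  const char *p = list;
  while (*p && *p != '\n')
    {
      char *end;
      long lo = strtol (p, &end, 10);
      if (end == p)
        return 0;		/* malformed input */
      long hi = lo;
      p = end;
      if (*p == '-')
        {
          hi = strtol (p + 1, &end, 10);
          if (end == p + 1)
            return 0;		/* "N-" with no upper bound */
          p = end;
        }
      if (cpu_id >= lo && cpu_id <= hi)
        return 1;
      if (*p == ',')
        p++;
    }
  return 0;
}

/* Scan node0..max_node and return the first node whose cpulist contains
 * cpu_id, or -1 if none does. max_node is an assumption of this sketch. */
static int
numa_node_of_cpu (int cpu_id, int max_node)
{
  char path[128], buf[4096];
  for (int node = 0; node <= max_node; node++)
    {
      snprintf (path, sizeof (path),
                "/sys/devices/system/node/node%d/cpulist", node);
      FILE *f = fopen (path, "r");
      if (!f)
        continue;		/* node absent or offline */
      int found = fgets (buf, sizeof (buf), f) != NULL
        && cpulist_contains (buf, cpu_id);
      fclose (f);
      if (found)
        return node;
    }
  return -1;
}
```

Keeping the string parser separate from the file access makes the range logic easy to check without a NUMA machine.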
2020-02-11 | vppinfra: remove the historical mheap memory allocator | Dave Barach | 1 | -14/+2
The mheap allocator has been turned off for several releases. This commit removes the cmake config parameter, parallel support for dlmalloc and mheap, and the mheap allocator itself.
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I104f88a1f06e47e90e5f7fb3e11cd1ca66467903
2020-02-11 | vlib: Skip core 0 if workers are configured. | Jon Loeliger | 1 | -0/+15
According to the description in the startup.conf, the assignment of worker threads starts with the lcore following the main_lcore. A non-zero skip_cores will correctly achieve this assignment. However, prior to this patch when workers are assigned, the code picks up and assigns core 0 even though it shouldn't. This patch determines if a non-zero number of workers are desired and if so, marks CPU unavailable for a worker assignment.
Type: fix
Change-Id: I1fdf73a6f218dcbf146fda2efc90c553f7cd6d20
Signed-off-by: Jon Loeliger <jdl@netgate.com>
2020-02-10 | vppinfra: use vm memory allocator for numa mapping | Florin Coras | 1 | -5/+3
Type: refactor
Signed-off-by: Florin Coras <fcoras@cisco.com>
Change-Id: I13b239cd572ae6dfaec07019d3d9b7c0ed3edcfa
2020-02-10 | vlib: properly configure main thread's core id | Jon Loeliger | 1 | -0/+2
Prior to this patch, the main thread's core id was always reported as 0. This patch makes a call to vlib_get_thread_core_socket() on the main thread so that its core id is set properly.
Type: fix
Fixes: 5d64c7868f67749a6c99eb4ee5998b518ab6c71c
Change-Id: I019cb95eec031da25197e48d956038c4bd6b5040
Signed-off-by: Jon Loeliger <jdl@netgate.com>
2020-02-05 | vppinfra: numa vector placement support | Dave Barach | 1 | -6/+29
Type: feature
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I7e7d95a089dd849c1f01ecea84529d8dbf239f21
2020-01-30 | vppinfra: improve clocks_per_second convergence | Dave Barach | 1 | -3/+7
Apply exponential smoothing to the clock rate update calculation in clib_time_verify_frequency(), with a half-life of 1 minute and a sampling frequency of 16 seconds. Within 5 minutes or so, the calculation converges.
With each rate recalculation: reset total_cpu_time based on the kernel timebase delta since vpp started, and the new clock rate.
Improve the "show clock [verbose]" debug CLI command.
BFD echo + echo fail tests marked off until the BFD code can be reworked a bit.
Type: fix
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I24e88a78819b12867736c875067b386ef6115c5c
2019-10-02 | vlib: improve summary vector-rate statistics | Dave Barach | 1 | -1/+1
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I4b77879b0a84fdec3c1518a972cf003d5135222d
Signed-off-by: Ole Troan <ot@cisco.com>
2019-09-30 | vlib: fix old nodes vector overflow in refork | Benoît Ganne | 1 | -5/+2
Type: fix
Change-Id: I81c4cf0ce87288bb2d3c7b9f31e9419290d588b4
Signed-off-by: Benoît Ganne <bganne@cisco.com>
2019-07-23 | vlib: address vlib_error_t scaling issue | Dave Barach | 1 | -0/+2
Encoding the vpp node index into the vlib_error_t as a 10-bit quantity limits us to 1K graph nodes. Unfortunately, a few nodes need 6 bit per-node error codes. Only a very few nodes have so many counters.
It turns out that there are about 2K total error counters in the system, which is (approximately) the maximum error heap index.
The current (index,code) encoding limits the number of interfaces to around 250, since each interface has two associated graph nodes and we have about 500 "normal, interior" graph nodes.
This patch adds an error-index to node-index map, so we can store error heap indices directly in the vlib_buffer_t.
Type: refactor
Change-Id: I28101cad3d8750819e27b8785fc0cf71ff54f79a
Signed-off-by: Dave Barach <dave@barachs.net>
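To make the scaling limit concrete, here is a small sketch of a 10-bit node index plus 6-bit code packed into 16 bits, next to the side-table lookup the patch describes. The function names and the choice of which field occupies the high bits are assumptions made for illustration; they are not taken from the VPP sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_BITS 10		/* at most 1024 graph nodes */
#define CODE_BITS 6		/* at most 64 error codes per node */

/* Old-style scheme as described in the message: (node index, code) pair.
 * Placing the node index in the high bits is an assumption of this sketch. */
static uint16_t
pack_error (uint32_t node_index, uint32_t code)
{
  assert (node_index < (1u << NODE_BITS));
  assert (code < (1u << CODE_BITS));
  return (uint16_t) ((node_index << CODE_BITS) | code);
}

static uint32_t
error_node (uint16_t e)
{
  return (uint32_t) e >> CODE_BITS;
}

static uint32_t
error_code (uint16_t e)
{
  return (uint32_t) e & ((1u << CODE_BITS) - 1);
}

/* New-style scheme: store the error heap index itself and recover the
 * owning node through a map. Returns UINT32_MAX for an unknown index. */
static uint32_t
node_for_error (const uint32_t * node_by_error, size_t n_errors,
		uint32_t error_index)
{
  return error_index < n_errors ? node_by_error[error_index] : UINT32_MAX;
}
```

In the packed form every node pays for 64 code slots whether it uses them or not; with the map, the 16-bit field only has to cover the roughly 2K counters that actually exist.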
2019-07-16 | api: enable binary API event logging in vat | Dave Barach | 1 | -26/+4
Cleaned up a few instances of side-bet elog_string hash table usage. Elog_string handles that problem itself.
Add cli commands to vat to initialize, enable/disable, and save an event log. Event logging at the same time in both vpp and vat yields a pair of event logs which can be merged by the "test_elog" tool.
Type: refactor
Change-Id: I8d6a72206f2309c967ea1630077fba31aef47f93
Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-24 | vlib: packet tracer support for pkt thread handoffs | Dave Barach | 1 | -0/+6
Type: feature
Change-Id: Ia3d9a47679202c2a47cd3746b50e86c6b8627ef6
Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-08 | trace frame-queue on trigger out of memory | dongjuan | 1 | -2/+1
FRAME_QUEUE_NELTS is 64 in thread.c
Change-Id: Ie7e5962afe05dfc7f38e3d597dabc74dcc2dab8d
Signed-off-by: dongjuan <dong.juan1@zte.com.cn>
2019-06-04 | sort worker-thread init functions in advance | Dave Barach | 1 | -5/+30
Otherwise, all N worker threads try to sort the list at the same time: a good way to have a bad day.
This approach performs *far* better than maintaining order by adding a spin-lock. By direct measurement w/ elog + g2: 11 threads execute the per-thread init function list in 22us, vs. 50ms with a CLIB_PAUSE() enabled spin-lock.
Change-Id: I1745f2a213c0561260139a60114dcb981e0c64e5
Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-22 | stats: support multiple workers for error counters | Ole Troan | 1 | -13/+14
The current code only allowed access to the main thread error counters. That is not so useful for a multi worker instance. Now return a vector indexed by thread of counter_t values.
Type: fix
Change-Id: Ie322c8889c0c8175e1116e71de04a2cf453b9ed7
Signed-off-by: Ole Troan <ot@cisco.com>
2019-05-16 | init / exit function ordering | Dave Barach | 1 | -1/+1
The vlib init function subsystem now supports a mix of procedural and formally-specified ordering constraints. We should eliminate procedural knowledge wherever possible.
The following schemes are *roughly* equivalent:

    static clib_error_t *init_runs_first (vlib_main_t *vm)
    {
      clib_error_t *error;
      ... do some stuff...
      if ((error = vlib_call_init_function (init_runs_next)))
        return error;
      ...
    }
    VLIB_INIT_FUNCTION (init_runs_first);

and

    static clib_error_t *init_runs_first (vlib_main_t *vm)
    {
      ... do some stuff...
    }
    VLIB_INIT_FUNCTION (init_runs_first) =
    {
      .runs_before = VLIB_INITS("init_runs_next"),
    };

The first form will [most likely] call "init_runs_next" on the spot. The second form means that "init_runs_first" runs before "init_runs_next," possibly much earlier in the sequence.
Please DO NOT construct sets of init functions where A before B actually means A *right before* B. It's not necessary - simply combine A and B - and it leads to hugely annoying debugging exercises when trying to switch from ad-hoc procedural ordering constraints to formal ordering constraints.
Change-Id: I5e4353503bf43b4acb11a45fb33c79a5ade8426c
Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-14 | Preallocate mhash key_tmps vector | Dave Barach | 1 | -15/+10
Fix os_get_nthreads() so that it starts returning the correct answer as early as possible.
Change-Id: Id5292262f2c3f521b07ffbe6a9f6748dcc4dcb7d
Signed-off-by: Dave Barach <dave@barachs.net>
2019-04-24 | Clean up multi-thread barrier-sync hold-down timer | Dave Barach | 1 | -10/+30
Main thread: don't bother with the barrier sync hold-down timer if none of the worker threads are busy.
Worker threads: avoid epoll_pwait (10ms timeout) when the control-plane has been active in the last half-second.
Change-Id: I82008d09968c65e2a4af0ebb7887389992e60603
Signed-off-by: Dave Barach <dave@barachs.net>
2019-03-10 | Perf tune get_frame_size_info | Dave Barach | 1 | -2/+4
It turns out that for scalar sizes 0..24, frames are always the same size. That range includes all current use-cases - and then some - so get rid of the hash table. Old code preserved under #ifdef VLIB_SUPPORTS_ARBITRARY_SCALAR_SIZES.
Change-Id: Ic005c7143c9639f77d1a0fadd2fc0e90dccb68c1
Signed-off-by: Dave Barach <dbarach@cisco.com>
2019-02-27 | Fix crash in barrier sync when vlib_worker_threads is zero | Damjan Marion | 1 | -1/+2
Change-Id: I6819dd9dbfc15c17740bdb98b51bdd639ef8c4d2
Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-02-20 | calculate per-thread time offset | Dave Barach | 1 | -0/+47
The main thread squirrels away vlib_time_now (&vlib_global_main), worker threads use it to calculate an offset in f64 seconds from their own vlib_time_now(vm) value. We use that offset until the next barrier sync.
Thanks to Damjan for the suggestion.
Change-Id: If56cdfe68e5ad8ac3b0d0fc885dc3ba556cd1215
Signed-off-by: Dave Barach <dave@barachs.net>
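The offset calculation described above can be sketched as below. The struct and function names are hypothetical stand-ins for illustration only; the actual field names and call sites in VPP are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker state: seconds to add to the worker's own clock
 * so that it agrees with the main thread's clock. */
typedef struct
{
  double time_offset;
} worker_clock_t;

/* Called at barrier sync: main_now is the value the main thread saved,
 * worker_now is the worker's own reading taken at the same moment. */
static void
worker_sync_offset (worker_clock_t * w, double main_now, double worker_now)
{
  assert (w != NULL);
  w->time_offset = main_now - worker_now;
}

/* Between barrier syncs the worker reuses the stored offset. */
static double
worker_time_now (const worker_clock_t * w, double worker_now)
{
  return worker_now + w->time_offset;
}
```

For example, if the main thread reads 100.0 s while a worker reads 97.5 s at the barrier, the worker adds 2.5 s to each of its later readings until the next sync.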
2019-01-30 | buffers: major cleanup and improvements | Damjan Marion | 1 | -22/+0
This patch introduces following changes:
- deprecated free lists which are not used and not compatible with external buffer managers (i.e. DPDK)
- introduces native support for per-numa buffer pools
- significantly improves performance of buffer alloc and free
Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7
Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-28 | Less intrusive way to capture numa_node and cpu_id changes | Damjan Marion | 1 | -2/+0
Change-Id: I3a33fb81f31ed473811e9e7a6197b81135913865
Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27 | Fix issue with cpu_id and numa_node captured too early | Damjan Marion | 1 | -2/+1
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45
Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27 | perfmon: collect data on selected thread(s) | Dave Barach | 1 | -1/+43
Add missing pre-input node runtime fork and refork code. unix-epoll-input runs on all threads; each instance needs its own runtime stats.
Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d
Signed-off-by: Dave Barach <dave@barachs.net>
2019-01-20 | Store numa-node and cpu-index in vlib_main_t | Damjan Marion | 1 | -0/+3
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4
Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-11-25 | vlib:init free list buffers vec | Eyal Bari | 1 | -0/+2
Fixes a problem seen when dhcp proxy uses the vlib_buffer_chain_linearize function, which tries to vlib_buffer_alloc zero buffers. This succeeds if the buffers vector is initialized, but otherwise crashes when trying to update the _vec_len in vlib_buffer_alloc_from_free_list.
Solved by initializing the free_list buffers vec.
Change-Id: I1186d7aac05e54864d72f3f144e1bef8064f8efa
Signed-off-by: Eyal Bari <ebari@cisco.com>
2018-11-14 | Remove c-11 memcpy checks from perf-critical code | Dave Barach | 1 | -19/+19
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1
Signed-off-by: Dave Barach <dave@barachs.net>
2018-11-01 | Move RPC calls off the binary API input queue | Dave Barach | 1 | -0/+3
Change-Id: I2476e3e916a42b41d1e66bfc1ec4f8c4264c1720
Signed-off-by: Dave Barach <dbarach@cisco.com>
2018-10-25 | Revert "Keep RPC traffic off the shared-memory API queue" | Florin Coras | 1 | -3/+0
This reverts commit 71615399e194847d7833b744caedab9b841733e5.
There seems to be an issue with ARPs when running with multiple workers.
Change-Id: Iaa68081512362945a9caf24dcb8d70fc7c5b75df
Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-10-24 | Keep RPC traffic off the shared-memory API queue | Dave Barach | 1 | -0/+3
Change-Id: Ib5c346641463768cf33eaf8cb5fab5b63171398d
Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23 | c11 safe string handling support | Dave Barach | 1 | -10/+10
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-19 | vppinfra: add atomic macros for __sync builtins | Sirshak Das | 1 | -21/+5
This is the first part of the addition of atomic macros, with only macros for __sync builtins.
- Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally:
- clib_atomic_release macro added and used in the absence of any memory barrier.
- clib_atomic_bool_cmp_and_swap added
Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
2018-10-01 | API / CLI event-log tracing | Dave Barach | 1 | -4/+6
Add an "elog trace [api][cli][barrier]" debug CLI command.
Removed the barrier elog test command.
Remove unused reliable multicast code.
Change-Id: Ib3ecde901b7c49fe92b313d0087cd7e776adcdce
Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-01 | thread: Add show threads api | Mohsin Kazmi | 1 | -5/+27
Change-Id: I3124238ab4d43bcef5590bad33a4ff0b5d8b7d15
Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
2018-09-30 | vlib: barrier sync elog tracing improvements | Dave Barach | 1 | -111/+23
Change-Id: I2d3b8d5a7192ff68bee443a99346ecb807b2d833
Signed-off-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Dave Barach <dave@barachs.net>
2018-09-07 | Improve NTP / kernel time change event handling | Dave Barach | 1 | -2/+14
clib_time_verify_frequency(...) rejects clock frequency changes greater than 1%.
vlib_worker_thread_barrier_sync_int (...) continuously checks that the barrier hold-down timer is not unreasonably far in the future.
Change-Id: I00ecb4c20e44de5d6a9c1499fa933e3dd834d11a
Signed-off-by: Dave Barach <dbarach@cisco.com>
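The 1% rule mentioned above could be expressed as in this sketch. `clock_rate_change_ok` is a hypothetical helper, and measuring the change relative to the previous rate is an assumption; the log does not say how clib_time_verify_frequency() computes the percentage.

```c
#include <assert.h>

/* Hypothetical check: accept a newly measured clock rate only if it is
 * within 1% of the previous rate (both in clocks per second). */
static int
clock_rate_change_ok (double old_rate, double new_rate)
{
  assert (old_rate > 0.0);
  double delta = new_rate - old_rate;
  if (delta < 0.0)
    delta = -delta;		/* absolute difference, without libm */
  return delta <= 0.01 * old_rate;
}
```

A caller would keep using the old rate whenever the check fails, which is how a sudden NTP or kernel time step can be kept from skewing the estimate.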
2018-08-01 | Improve cpu { coremask-% } configure option | Yi He | 1 | -20/+4
Accept any sized hexadecimal bitmask specification to support platforms with hundreds of cores.
Change-Id: Ib881db0cf60f78bdeffa13acfc2fc7fe7e128cc4
Signed-off-by: Yi He <yi.he@arm.com>
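One way to accept a hexadecimal mask wider than 64 bits is to walk the digits from the least significant end and set bits in a byte array, as in the sketch below. This is not the parser VPP uses; `parse_coremask`, `core_is_set`, and the `MAX_CORES` limit are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CORES 1024		/* arbitrary limit for this sketch */

/* Parse a hex mask of any length (optional "0x" prefix) into a bitmap.
 * Returns the number of cores set, or -1 on a bad digit, an empty mask,
 * or a mask wider than MAX_CORES bits. */
static int
parse_coremask (const char *s, uint8_t bitmap[MAX_CORES / 8])
{
  assert (s != NULL && bitmap != NULL);
  memset (bitmap, 0, MAX_CORES / 8);
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
    s += 2;
  size_t len = strlen (s);
  if (len == 0 || len * 4 > MAX_CORES)
    return -1;
  int count = 0;
  for (size_t i = 0; i < len; i++)
    {
      char c = s[len - 1 - i];	/* least significant digit first */
      int v;
      if (c >= '0' && c <= '9')
	v = c - '0';
      else if (c >= 'a' && c <= 'f')
	v = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F')
	v = c - 'A' + 10;
      else
	return -1;
      for (int b = 0; b < 4; b++)
	if (v & (1 << b))
	  {
	    size_t core = i * 4 + (size_t) b;
	    bitmap[core / 8] |= (uint8_t) (1u << (core % 8));
	    count++;
	  }
    }
  return count;
}

/* Test whether a given core number is present in the bitmap. */
static int
core_is_set (const uint8_t * bitmap, size_t core)
{
  return (bitmap[core / 8] >> (core % 8)) & 1;
}
```

Because the digits are processed one at a time, the mask never has to fit in a machine word, which is the property the commit message asks for.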
2018-07-20 | Add congestion drop in interface handoff | Damjan Marion | 1 | -1/+14
This prevents deadlock in case when worker A sends to B and worker B sends to A
Change-Id: Id9436960f932c58325fe4f5ef8ec67b50031aeda
Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-18 | Add config option to use dlmalloc instead of mheap | Dave Barach | 1 | -11/+21
Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk
src/vppinfra/dlmalloc.[ch] are slightly modified versions of the well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no inherent size limit.
Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e
Signed-off-by: Dave Barach <dave@barachs.net>
2018-07-11 | threads: fix issue with setting main-core to 0 | Damjan Marion | 1 | -7/+14
Change-Id: Id3ccfcfa2a88cf7aa106f1202af7cd677de32575
Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-04 | Do not override skip-cores if set in startup.conf | Damjan Marion | 1 | -4/+2
Change-Id: I97d75f58a46275fdad5add27dc59c731cc3ac64d
Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-03 | Set main thread affinity before main heap is allocated | Damjan Marion | 1 | -7/+7
Change-Id: I524909570fc1736f51fd437d6d30566c461139bd
Signed-off-by: Damjan Marion <damarion@cisco.com>