path: root/src/vlib/threads.c
Age | Commit message | Author | Files | Lines
2019-10-02 | vlib: improve summary vector-rate statistics | Dave Barach | 1 | -1/+1
Type: refactor Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: I4b77879b0a84fdec3c1518a972cf003d5135222d Signed-off-by: Ole Troan <ot@cisco.com>
2019-09-30 | vlib: fix old nodes vector overflow in refork | Benoît Ganne | 1 | -5/+2
Type: fix Change-Id: I81c4cf0ce87288bb2d3c7b9f31e9419290d588b4 Signed-off-by: Benoît Ganne <bganne@cisco.com>
2019-07-23 | vlib: address vlib_error_t scaling issue | Dave Barach | 1 | -0/+2
Encoding the vpp node index into the vlib_error_t as a 10-bit quantity limits us to 1K graph nodes. Unfortunately, a few nodes need 6-bit per-node error codes. Only a very few nodes have so many counters. It turns out that there are about 2K total error counters in the system, which is (approximately) the maximum error heap index. The current (index,code) encoding limits the number of interfaces to around 250, since each interface has two associated graph nodes and we have about 500 "normal, interior" graph nodes. This patch adds an error-index to node-index map, so we can store error heap indices directly in the vlib_buffer_t. Type: refactor Change-Id: I28101cad3d8750819e27b8785fc0cf71ff54f79a Signed-off-by: Dave Barach <dave@barachs.net>
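The error-index to node-index map described in this commit can be pictured with a small sketch. Everything below is hypothetical (the `error_map_t` type and both function names are illustrative and are not the vlib API); it only assumes that each node reserves a contiguous block of error heap indices and that the map records the owning node for every index.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: these names are illustrative, not the vlib API. */
typedef struct {
    uint32_t *node_index_by_error; /* error heap index -> owning node index */
    uint32_t n_errors;             /* number of error heap indices in use */
    uint32_t capacity;             /* allocated slots */
} error_map_t;

/* Reserve n_codes consecutive error heap indices for node_index.
   Returns the first index of the block, or UINT32_MAX if allocation fails. */
static uint32_t error_map_register(error_map_t *m, uint32_t node_index,
                                   uint32_t n_codes)
{
    uint32_t base = m->n_errors;
    uint32_t needed = base + n_codes;

    if (needed > m->capacity) {
        uint32_t cap = m->capacity ? m->capacity : 64;
        while (cap < needed)
            cap *= 2;
        uint32_t *p = realloc(m->node_index_by_error, cap * sizeof *p);
        if (p == NULL)
            return UINT32_MAX;
        m->node_index_by_error = p;
        m->capacity = cap;
    }
    for (uint32_t i = 0; i < n_codes; i++)
        m->node_index_by_error[base + i] = node_index;
    m->n_errors = needed;
    return base;
}

/* Recover the node that owns a given error heap index. */
static uint32_t error_map_node(const error_map_t *m, uint32_t error_index)
{
    assert(error_index < m->n_errors);
    return m->node_index_by_error[error_index];
}
```

With a map like this, a buffer only needs to carry the error heap index; the node is looked up on demand, so neither the node count nor the per-node code count is capped by a packed bit field.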
2019-07-16 | api: enable binary API event logging in vat | Dave Barach | 1 | -26/+4
Cleaned up a few instances of side-bet elog_string hash table usage. Elog_string handles that problem itself. Add cli commands to vat to initialize, enable/disable, and save an event log. Event logging at the same time in both vpp and vat yields a pair of event logs which can be merged by the "test_elog" tool. Type: refactor Change-Id: I8d6a72206f2309c967ea1630077fba31aef47f93 Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-24 | vlib: packet tracer support for pkt thread handoffs | Dave Barach | 1 | -0/+6
Type: feature Change-Id: Ia3d9a47679202c2a47cd3746b50e86c6b8627ef6 Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-08 | trace frame-queue on trigger out of memory | dongjuan | 1 | -2/+1
FRAME_QUEUE_NELTS is 64 in thread.c Change-Id: Ie7e5962afe05dfc7f38e3d597dabc74dcc2dab8d Signed-off-by: dongjuan <dong.juan1@zte.com.cn>
2019-06-04 | sort worker-thread init functions in advance | Dave Barach | 1 | -5/+30
Otherwise, all N worker threads try to sort the list at the same time: a good way to have a bad day. This approach performs *far* better than maintaining order by adding a spin-lock. By direct measurement w/ elog + g2: 11 threads execute the per-thread init function list in 22us, vs. 50ms with a CLIB_PAUSE() enabled spin-lock. Change-Id: I1745f2a213c0561260139a60114dcb981e0c64e5 Signed-off-by: Dave Barach <dave@barachs.net>
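A minimal sketch of the "sort once, then read-only" idea follows. It assumes a simple integer sort key, whereas the real code orders functions by its own constraints; all names (`init_entry_t`, `init_list_sort`, `init_list_run`, and the two sample init functions) are hypothetical. The point is that the main thread sorts before any worker exists, and workers only ever see a `const` list, so no lock is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*init_fn_t)(int thread_index);

typedef struct {
    int order;     /* hypothetical sort key: lower values run first */
    init_fn_t fn;
} init_entry_t;

static int init_entry_cmp(const void *a, const void *b)
{
    const init_entry_t *x = a, *y = b;
    return (x->order > y->order) - (x->order < y->order);
}

/* Main thread only: sort once, before any worker thread is started. */
static void init_list_sort(init_entry_t *list, size_t n)
{
    qsort(list, n, sizeof list[0], init_entry_cmp);
}

/* Worker threads: the list is const here, so no lock is needed. */
static void init_list_run(const init_entry_t *list, size_t n, int thread_index)
{
    for (size_t i = 0; i < n; i++) {
        /* The list must already be sorted when a worker walks it. */
        assert(i == 0 || list[i - 1].order <= list[i].order);
        list[i].fn(thread_index);
    }
}

/* Two sample init functions that record the order in which they ran.
   The log is for single-threaded demonstration only; it is not thread safe. */
static int call_log[16];
static int call_log_len;

static void init_buffers(int thread_index)
{
    (void)thread_index;
    call_log[call_log_len++] = 1;
}

static void init_stats(int thread_index)
{
    (void)thread_index;
    call_log[call_log_len++] = 2;
}
```

A caller would build the list, call `init_list_sort` once on the main thread, start the workers, and have each worker call `init_list_run` with its own thread index.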
2019-05-22 | stats: support multiple workers for error counters | Ole Troan | 1 | -13/+14
The current code only allowed access to the main thread error counters. That is not so useful for a multi-worker instance. Now return a vector indexed by thread of counter_t values. Type: fix Change-Id: Ie322c8889c0c8175e1116e71de04a2cf453b9ed7 Signed-off-by: Ole Troan <ot@cisco.com>
2019-05-16 | init / exit function ordering | Dave Barach | 1 | -1/+1
The vlib init function subsystem now supports a mix of procedural and formally-specified ordering constraints. We should eliminate procedural knowledge wherever possible. The following schemes are *roughly* equivalent: static clib_error_t *init_runs_first (vlib_main_t *vm) { clib_error_t *error; ... do some stuff... if ((error = vlib_call_init_function (init_runs_next))) return error; ... } VLIB_INIT_FUNCTION (init_runs_first); and static clib_error_t *init_runs_first (vlib_main_t *vm) { ... do some stuff... } VLIB_INIT_FUNCTION (init_runs_first) = { .runs_before = VLIB_INITS("init_runs_next"), }; The first form will [most likely] call "init_runs_next" on the spot. The second form means that "init_runs_first" runs before "init_runs_next," possibly much earlier in the sequence. Please DO NOT construct sets of init functions where A before B actually means A *right before* B. It's not necessary - simply combine A and B - and it leads to hugely annoying debugging exercises when trying to switch from ad-hoc procedural ordering constraints to formal ordering constraints. Change-Id: I5e4353503bf43b4acb11a45fb33c79a5ade8426c Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-14 | Preallocate mhash key_tmps vector | Dave Barach | 1 | -15/+10
Fix os_get_nthreads() so that it starts returning the correct answer as early as possible. Change-Id: Id5292262f2c3f521b07ffbe6a9f6748dcc4dcb7d Signed-off-by: Dave Barach <dave@barachs.net>
2019-04-24 | Clean up multi-thread barrier-sync hold-down timer | Dave Barach | 1 | -10/+30
Main thread: don't bother with the barrier sync hold-down timer if none of the worker threads are busy. Worker threads: avoid epoll_pwait (10ms timeout) when the control-plane has been active in the last half-second. Change-Id: I82008d09968c65e2a4af0ebb7887389992e60603 Signed-off-by: Dave Barach <dave@barachs.net>
2019-03-10 | Perf tune get_frame_size_info | Dave Barach | 1 | -2/+4
It turns out that for scalar sizes 0..24, frames are always the same size. That range includes all current use-cases - and then some - so get rid of the hash table. Old code preserved under #ifdef VLIB_SUPPORTS_ARBITRARY_SCALAR_SIZES. Change-Id: Ic005c7143c9639f77d1a0fadd2fc0e90dccb68c1 Signed-off-by: Dave Barach <dbarach@cisco.com>
2019-02-27 | Fix crash in barrier sync when vlib_worker_threads is zero | Damjan Marion | 1 | -1/+2
Change-Id: I6819dd9dbfc15c17740bdb98b51bdd639ef8c4d2 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-02-20 | calculate per-thread time offset | Dave Barach | 1 | -0/+47
The main thread squirrels away vlib_time_now (&vlib_global_main), worker threads use it to calculate an offset in f64 seconds from their own vlib_time_now(vm) value. We use that offset until the next barrier sync. Thanks to Damjan for the suggestion. Change-Id: If56cdfe68e5ad8ac3b0d0fc885dc3ba556cd1215 Signed-off-by: Dave Barach <dave@barachs.net>
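The offset arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the vlib implementation: the names are hypothetical, clock readings are passed in as plain `double` seconds instead of calling `vlib_time_now`, and the sign convention (offset = main time minus worker time) is an assumption. It also assumes the published value is read while the workers are held at the barrier, so no extra synchronization is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker state: seconds to add to the worker's own clock. */
typedef struct {
    double time_offset;
} worker_clock_t;

/* Value the main thread stores at barrier sync (assumed to be read by the
   workers only while they are parked at the barrier). */
static double main_time_at_barrier;

/* Main thread: record its current time for the workers to pick up. */
static void main_publish_time(double main_now)
{
    main_time_at_barrier = main_now;
}

/* Worker: compute the offset between the main clock and its own clock. */
static void worker_update_offset(worker_clock_t *w, double worker_now)
{
    assert(w != NULL);
    w->time_offset = main_time_at_barrier - worker_now;
}

/* Worker: main-aligned time, valid until the next barrier sync. */
static double worker_time_now(const worker_clock_t *w, double worker_local_now)
{
    return worker_local_now + w->time_offset;
}
```

For example, if the main thread publishes 100.0 s while a worker's own clock reads 97.5 s, the worker stores an offset of 2.5 s and reports 101.0 s when its local clock later reads 98.5 s.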
2019-01-30 | buffers: major cleanup and improvements | Damjan Marion | 1 | -22/+0
This patch introduces the following changes: - deprecates free lists, which are not used and not compatible with external buffer managers (i.e. DPDK) - introduces native support for per-numa buffer pools - significantly improves performance of buffer alloc and free Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-28 | Less intrusive way to capture numa_node and cpu_id changes | Damjan Marion | 1 | -2/+0
Change-Id: I3a33fb81f31ed473811e9e7a6197b81135913865 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27 | Fix issue with cpu_id and numa_code captured too early | Damjan Marion | 1 | -2/+1
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27 | perfmon: collect data on selected thread(s) | Dave Barach | 1 | -1/+43
Add missing pre-input node runtime fork and refork code. unix-epoll-input runs on all threads; each instance needs its own runtime stats. Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d Signed-off-by: Dave Barach <dave@barachs.net>
2019-01-20 | Store numa-node and cpu-index in vlib_main_t | Damjan Marion | 1 | -0/+3
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-11-25 | vlib:init free list buffers vec | Eyal Bari | 1 | -0/+2
Fixes a problem seen when the dhcp proxy uses the vlib_buffer_chain_linearize function, which tries to vlib_buffer_alloc zero buffers. That succeeds if the buffers vector is initialized, but otherwise crashes when trying to update the _vec_len in vlib_buffer_alloc_from_free_list. Solved by initializing the free_list buffers vec. Change-Id: I1186d7aac05e54864d72f3f144e1bef8064f8efa Signed-off-by: Eyal Bari <ebari@cisco.com>
2018-11-14 | Remove c-11 memcpy checks from perf-critical code | Dave Barach | 1 | -19/+19
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1 Signed-off-by: Dave Barach <dave@barachs.net>
2018-11-01 | Move RPC calls off the binary API input queue | Dave Barach | 1 | -0/+3
Change-Id: I2476e3e916a42b41d1e66bfc1ec4f8c4264c1720 Signed-off-by: Dave Barach <dbarach@cisco.com>
2018-10-25 | Revert "Keep RPC traffic off the shared-memory API queue" | Florin Coras | 1 | -3/+0
This reverts commit 71615399e194847d7833b744caedab9b841733e5. There seems to be an issue with ARPs when running with multiple workers. Change-Id: Iaa68081512362945a9caf24dcb8d70fc7c5b75df Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-10-24 | Keep RPC traffic off the shared-memory API queue | Dave Barach | 1 | -0/+3
Change-Id: Ib5c346641463768cf33eaf8cb5fab5b63171398d Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23 | c11 safe string handling support | Dave Barach | 1 | -10/+10
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-19 | vppinfra: add atomic macros for __sync builtins | Sirshak Das | 1 | -21/+5
This is the first part of the addition of atomic macros, covering only macros for __sync builtins. - Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/) Additionally - clib_atomic_release macro added and used in the absence of any memory barrier. - clib_atomic_bool_cmp_and_swap added Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b Original-patch-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com>
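To illustrate what thin macros over the GCC/Clang `__sync` builtins can look like, here is a hedged sketch. The `demo_` names are hypothetical stand-ins and are not claimed to match the macros in vppinfra; only the `__sync_*` builtins themselves are real compiler features. The try-lock shows a compare-and-swap paired with a release store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrappers; the demo_ prefix marks them as illustrative only. */
#define demo_atomic_fetch_add(ptr, val) __sync_fetch_and_add((ptr), (val))
#define demo_atomic_bool_cmp_and_swap(ptr, expected, desired) \
    __sync_bool_compare_and_swap((ptr), (expected), (desired))
#define demo_atomic_release(ptr) __sync_lock_release((ptr))

/* Try to take the lock: succeeds only if it atomically changes 0 -> 1. */
static int demo_spin_trylock(volatile int *lock)
{
    assert(lock != NULL);
    return demo_atomic_bool_cmp_and_swap(lock, 0, 1);
}

/* Drop the lock: __sync_lock_release stores 0 with release semantics. */
static void demo_spin_unlock(volatile int *lock)
{
    assert(lock != NULL);
    demo_atomic_release(lock);
}
```

Wrapping the builtins in macros keeps call sites independent of the compiler interface, which makes a later move to a different set of builtins a header-only change.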
2018-10-01 | API / CLI event-log tracing | Dave Barach | 1 | -4/+6
Add an "elog trace [api][cli][barrier]" debug CLI command. Removed the barrier elog test command. Remove unused reliable multicast code. Change-Id: Ib3ecde901b7c49fe92b313d0087cd7e776adcdce Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-01 | thread: Add show threads api | Mohsin Kazmi | 1 | -5/+27
Change-Id: I3124238ab4d43bcef5590bad33a4ff0b5d8b7d15 Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
2018-09-30 | vlib: barrier sync elog tracing improvements | Dave Barach | 1 | -111/+23
Change-Id: I2d3b8d5a7192ff68bee443a99346ecb807b2d833 Signed-off-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Dave Barach <dave@barachs.net>
2018-09-07 | Improve NTP / kernel time change event handling | Dave Barach | 1 | -2/+14
clib_time_verify_frequency(...) rejects clock frequency changes greater than 1%. vlib_worker_thread_barrier_sync_int (...) continuously checks that the barrier hold-down timer is not unreasonably far in the future. Change-Id: I00ecb4c20e44de5d6a9c1499fa933e3dd834d11a Signed-off-by: Dave Barach <dbarach@cisco.com>
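The 1% rejection rule can be sketched as a small predicate. This is an assumption-laden illustration rather than the body of `clib_time_verify_frequency`: the function names and the choice to measure the change relative to the current frequency are hypothetical.

```c
#include <assert.h>

/* Hypothetical check: accept a new clock frequency estimate only if it is
   within 1% of the current one. */
static int frequency_change_ok(double current_hz, double candidate_hz)
{
    assert(current_hz > 0.0);
    double delta = candidate_hz - current_hz;
    if (delta < 0.0)
        delta = -delta;
    return delta <= 0.01 * current_hz;
}

/* Return the frequency to use next: the candidate if plausible, otherwise
   keep the current value. */
static double frequency_apply(double current_hz, double candidate_hz)
{
    return frequency_change_ok(current_hz, candidate_hz) ? candidate_hz
                                                         : current_hz;
}
```

Rejecting large jumps means a step change in kernel time (for example after an NTP correction) does not get misread as a change in the CPU clock rate.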
2018-08-01 | Improve cpu { coremask-% } configure option | Yi He | 1 | -20/+4
Accept any sized hexadecimal bitmask specification to support platforms with hundreds of cores. Change-Id: Ib881db0cf60f78bdeffa13acfc2fc7fe7e128cc4 Signed-off-by: Yi He <yi.he@arm.com>
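Parsing a hexadecimal mask of arbitrary length can be sketched like this. The code is a standalone illustration, not the vlib parser: `coremask_parse` and `coremask_test` are hypothetical names, and the representation (an array of 64-bit words, least-significant word first) is an assumption. Walking the digits from the right avoids any fixed-width integer conversion, so the mask can cover hundreds of cores.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a hex mask (optional 0x prefix) into a heap-allocated array of
   64-bit words, least-significant word first. Returns NULL on bad input
   or allocation failure; the caller frees the result. */
static uint64_t *coremask_parse(const char *hex, size_t *n_words)
{
    assert(hex != NULL && n_words != NULL);
    if (hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X'))
        hex += 2;

    size_t n_digits = strlen(hex);
    if (n_digits == 0)
        return NULL;

    size_t words = (n_digits + 15) / 16; /* 16 hex digits per 64-bit word */
    uint64_t *mask = calloc(words, sizeof *mask);
    if (mask == NULL)
        return NULL;

    for (size_t i = 0; i < n_digits; i++) {
        char c = hex[n_digits - 1 - i]; /* start at the least-significant digit */
        uint64_t v;
        if (c >= '0' && c <= '9')
            v = (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            v = (uint64_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            v = (uint64_t)(c - 'A' + 10);
        else {
            free(mask);
            return NULL;
        }
        mask[i / 16] |= v << (4 * (i % 16));
    }
    *n_words = words;
    return mask;
}

/* Return 1 if the given core is set in the mask, 0 otherwise. */
static int coremask_test(const uint64_t *mask, size_t n_words, size_t core)
{
    if (core / 64 >= n_words)
        return 0;
    return (int)((mask[core / 64] >> (core % 64)) & 1u);
}
```

A 17-digit mask such as a leading `1` followed by fifteen zeros and a `3` selects cores 0, 1, and 64, which would not fit in a single 64-bit integer.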
2018-07-20 | Add congestion drop in interface handoff | Damjan Marion | 1 | -1/+14
This prevents deadlock in case when worker A sends to B and worker B sends to A Change-Id: Id9436960f932c58325fe4f5ef8ec67b50031aeda Signed-off-by: Damjan Marion <damarion@cisco.com>
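The deadlock-avoidance idea is that a sender never waits on a full queue; it drops and counts instead. The sketch below shows that policy on a simple ring. It is deliberately single-threaded and hypothetical (the type, the names, and the 64-slot size are illustrative, and a real inter-thread queue would need atomic head/tail updates); it only demonstrates the drop-instead-of-wait decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HANDOFF_QUEUE_NELTS 64u /* hypothetical size; must be a power of two */

typedef struct {
    uint32_t slots[HANDOFF_QUEUE_NELTS];
    uint32_t head;               /* next slot to dequeue */
    uint32_t tail;               /* next slot to enqueue */
    uint64_t n_congestion_drops; /* packets dropped because the ring was full */
} handoff_queue_t;

/* Enqueue a buffer index, or drop it if the destination ring is full.
   Returns 1 if enqueued, 0 if dropped. The sender never blocks, so two
   workers handing off to each other cannot wait on each other forever. */
static int handoff_enqueue_or_drop(handoff_queue_t *q, uint32_t buffer_index)
{
    assert(q != NULL);
    if (q->tail - q->head >= HANDOFF_QUEUE_NELTS) {
        q->n_congestion_drops++;
        return 0;
    }
    q->slots[q->tail % HANDOFF_QUEUE_NELTS] = buffer_index;
    q->tail++;
    return 1;
}

/* Dequeue one buffer index. Returns 1 on success, 0 if the ring is empty. */
static int handoff_dequeue(handoff_queue_t *q, uint32_t *buffer_index)
{
    assert(q != NULL && buffer_index != NULL);
    if (q->head == q->tail)
        return 0;
    *buffer_index = q->slots[q->head % HANDOFF_QUEUE_NELTS];
    q->head++;
    return 1;
}
```

The trade-off is explicit: under sustained congestion packets are lost and counted, which is preferable to two workers each spinning on the other's full queue.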
2018-07-18 | Add config option to use dlmalloc instead of mheap | Dave Barach | 1 | -11/+21
Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk src/vppinfra/dlmalloc.[ch] are slightly modified versions of the well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no inherent size limit. Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e Signed-off-by: Dave Barach <dave@barachs.net>
2018-07-11 | threads: fix issue with setting main-core to 0 | Damjan Marion | 1 | -7/+14
Change-Id: Id3ccfcfa2a88cf7aa106f1202af7cd677de32575 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-04 | Do not override skip-cores if set in startup.conf | Damjan Marion | 1 | -4/+2
Change-Id: I97d75f58a46275fdad5add27dc59c731cc3ac64d Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-03 | Set main thread affinity before main heap is allocated | Damjan Marion | 1 | -7/+7
Change-Id: I524909570fc1736f51fd437d6d30566c461139bd Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-06-13 | Stat segment / client: show run" works now | Dave Barach | 1 | -0/+20
Seems to have minimal-to-zero performance consequences. Data appears accurate: results match the debug CLI output. Checked at low rates, 27 MPPS sprayed across two worker threads. Change-Id: I09ede5150b88a91547feeee448a2854997613004 Signed-off-by: Dave Barach <dave@barachs.net>
2018-06-03 | dpdk: improve buffer alloc performance | Damjan Marion | 1 | -1/+0
This is ~50% improvement in buffer alloc performance. For a 256 buffer allocation, it was ~10 clocks/buffer, now is < 5 clocks. Change-Id: I97590e240a79a42bcab5eb26587fc2d11e6eb163 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-03-27 | vlib: make cloned structures cacheline aligned | Damjan Marion | 1 | -13/+23
This addresses a crash with gcc-7 observed when -O3 is used. Change-Id: I10e87da8e5037ad480eba7fb0aaa9a657d3bf48d Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-03-14 | vlib: internal buffer manager rework | Damjan Marion | 1 | -8/+4
- buffer_main is no longer part of vlib_main_t - pool of free lists is still part of vlib_main_t - mheap is not used anymore for buffer allocation - simple bitmap based buffer alloc scheme is introduced Change-Id: I3e1e6d00e2c8122293ed0a741245eb841315a1ff Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-12-15 | VPP-1102: fix dangling references in RPC handling | Dave Barach | 1 | -0/+3
Queue RPC calls and send them from the main dispatch loop. As things stood, if the vpp main input queue filled, worker threads could enter a barrier-sync spin-wait in the middle of processing a frame. If thread 0 decided to recreate worker thread data structures, the worker thread(s) could easily crash. Legislate the problem out of existence by enqueueing RPC messages only from the main dispatch loop. At that point, doing a barrier-sync wait is perfectly OK. Change-Id: I18da3e44bb1f29a63fe5f30cf11de732ecfd5bf7 Signed-off-by: Dave Barach <dave@barachs.net>
2017-09-26 | Add thread-safe event signaller, use RPC where required | Dave Barach | 1 | -1/+23
Update ping code to use the new function Change-Id: Ieb753b23f8402cbe5667c22747896784c8ece937 Signed-off-by: Florin Coras <fcoras@cisco.com> Signed-off-by: Dave Barach <dave@barachs.net>
2017-09-15 | vppinfra: add clib_mem_vm_ext_alloc function | Damjan Marion | 1 | -3/+3
Change-Id: Iff33694fc42cc3bcc73cf1372339053a6365039c Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-09-11 | Recombine diags and minimum barrier open time changes (VPP-968) | Colin Tregenza Dancer | 1 | -10/+303
Support logging to both syslog and elog. Also include DaveB's is_mp_safe fix, which had been lost. Change-Id: If82f7969e2f43c63c3fed5b1a0c7434c90c1f380 Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
2017-09-05 | Refork worker thread data structures in parallel (VPP-970) | Colin Tregenza Dancer | 1 | -136/+197
Change the rebuilding of worker thread clone datastructures to run in parallel on the workers, instead of serially on main. Change-Id: Ib76bcfbef1e51f2399972090f4057be7aaa84e08 Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
2017-07-26 | VPP-905: Wrong define used in function start_workers. | JingLiuZTE | 1 | -1/+2
Change-Id: I6a5faebb63e9360cebfcfb1bc3f3c0eb6b15e937 Signed-off-by: JingLiuZTE <liu.jing5@zte.com.cn>
2017-06-09 | Implement sack based tcp loss recovery (RFC 6675) | Florin Coras | 1 | -1/+1
- refactor existing congestion control code (RFC 6582/5681). Handling of ack feedback now consists of: ack parsing, cc event detection, event handling, congestion control update - extend sack scoreboard to support sack based retransmissions - basic implementation of Eifel detection algorithm (RFC 3522) for detecting spurious retransmissions - actually initialize the per-thread frame freelist hash tables - increase worker stack size to 2mb - fix session queue node out-of-buffer handling - ensure that the local buffer cache vec_len matches reality - avoid 2x spurious event requeues when short of buffers - count out-of-buffer events - make the builtin server thread-safe - fix bihash template threading issue: need to paint -1 across uninitialized working_copy_length vector elements (via rebase from master) Change-Id: I646cb9f1add9a67d08f4a87badbcb117980ebfc4 Signed-off-by: Florin Coras <fcoras@cisco.com> Signed-off-by: Dave Barach <dbarach@cisco.com>
2017-05-10 | completely deprecate os_get_cpu_number, replace new occurrences | Damjan Marion | 1 | -10/+2
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-05-09 | Fix remaining 32-bit compile issues | Damjan Marion | 1 | -1/+1
Change-Id: I9664214652229b663c3e3ba7406b4ede96bfb123 Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-04-06 | Use thread local storage for thread index | Damjan Marion | 1 | -47/+22
This patch deprecates stack-based thread identification. It also removes the requirement that thread stacks are adjacent. Finally, possibly annoying for some folks, it renames all occurrences of cpu_index and cpu_number to thread index. Using the word "cpu" is misleading here, as a thread can be migrated to a different CPU, and it is also not related to the linux cpu index. Change-Id: I68cdaf661e701d2336fc953dcb9978d10a70f7c1 Signed-off-by: Damjan Marion <damarion@cisco.com>
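Keeping the thread index in thread-local storage can be sketched with C11 `_Thread_local`. The names below are hypothetical and the real code may use a different storage specifier or accessor; the sketch only shows that each thread reads its own copy of the variable, so the lookup depends neither on the stack address nor on the CPU the thread happens to run on.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_THREAD_INDEX UINT32_MAX /* hypothetical sentinel */

/* One copy per thread: every thread sees its own value of this variable. */
static _Thread_local uint32_t demo_thread_index = DEMO_INVALID_THREAD_INDEX;

/* Called once by each thread at startup to record its index. */
static void demo_thread_index_set(uint32_t index)
{
    assert(demo_thread_index == DEMO_INVALID_THREAD_INDEX);
    demo_thread_index = index;
}

/* Cheap lookup from anywhere on the calling thread. */
static uint32_t demo_thread_index_get(void)
{
    return demo_thread_index;
}
```

Because the value is tied to the thread rather than to a stack region, thread stacks no longer have to be laid out adjacently for the lookup to work.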