vpp - Vector Packet Processing

Age	Commit message (Collapse)	Author	Files	Lines
2019-10-02	vlib: improve summary vector-rate statistics	Dave Barach	1	-1/+1
	Type: refactor Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: I4b77879b0a84fdec3c1518a972cf003d5135222d Signed-off-by: Ole Troan <ot@cisco.com>
2019-09-30	vlib: fix old nodes vector overflow in refork	Benoît Ganne	1	-5/+2
	Type: fix Change-Id: I81c4cf0ce87288bb2d3c7b9f31e9419290d588b4 Signed-off-by: Benoît Ganne <bganne@cisco.com>
2019-07-23	vlib: address vlib_error_t scaling issue	Dave Barach	1	-0/+2
	Encoding the vpp node index into the vlib_error_t as a 10-bit quantity limits us to 1K graph nodes. Unfortunately, a few nodes need 6 bit per-node error codes. Only a very few nodes have so many counters. It turns out that there are about 2K total error counters in the system, which is (approximately) the maximum error heap index. The current (index,code) encoding limits the number of interfaces to around 250, since each interface has two associated graph nodes and we have about 500 "normal, interior" graph node This patch adds an error-index to node-index map, so we can store error heap indices directly in the vlib_buffer_t. Type: refactor Change-Id: I28101cad3d8750819e27b8785fc0cf71ff54f79a Signed-off-by: Dave Barach <dave@barachs.net>
2019-07-16	api: enable binary API event logging in vat	Dave Barach	1	-26/+4
	Cleaned up a few instances of side-bet elog_string hash table usage. Elog_string handles that problem itself. Add cli commands to vat to initialize, enable/disable, and save an event log. Event logging at the same time in both vpp and vat yields a pair of event logs which can be merged by the "test_elog" tool. Type: refactor Change-Id: I8d6a72206f2309c967ea1630077fba31aef47f93 Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-24	vlib: packet tracer support for pkt thread handoffs	Dave Barach	1	-0/+6
	Type: feature Change-Id: Ia3d9a47679202c2a47cd3746b50e86c6b8627ef6 Signed-off-by: Dave Barach <dave@barachs.net>
2019-06-08	trace frame-queue on trigger out of memory	dongjuan	1	-2/+1
	FRAME_QUEUE_NELTS is 64 in thread.c Change-Id: Ie7e5962afe05dfc7f38e3d597dabc74dcc2dab8d Signed-off-by: dongjuan <dong.juan1@zte.com.cn>
2019-06-04	sort worker-thread init functions in advance	Dave Barach	1	-5/+30
	Otherwise, all N worker threads try to sort the list at the same time: a good way to have a bad day. This approach performs far better than maintaing order by adding a spin-lock. By direct measurement w/ elog + g2: 11 threads execute the per-thread init function list in 22us, vs. 50ms with a CLIB_PAUSE() enabled spin-lock. Change-Id: I1745f2a213c0561260139a60114dcb981e0c64e5 Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-22	stats: support multiple works for error counters	Ole Troan	1	-13/+14
	The current code only allowed access to the main thread error counters. That is not so useful for a multi worker instance. No return a vector indexed by thread of counter_t values. Type: fix Change-Id: Ie322c8889c0c8175e1116e71de04a2cf453b9ed7 Signed-off-by: Ole Troan <ot@cisco.com>
2019-05-16	init / exit function ordering	Dave Barach	1	-1/+1
	The vlib init function subsystem now supports a mix of procedural and formally-specified ordering constraints. We should eliminate procedural knowledge wherever possible. The following schemes are roughly equivalent: static clib_error_t init_runs_first (vlib_main_t vm) { clib_error_t error; ... do some stuff... if ((error = vlib_call_init_function (init_runs_next))) return error; ... } VLIB_INIT_FUNCTION (init_runs_first); and static clib_error_t init_runs_first (vlib_main_t vm) { ... do some stuff... } VLIB_INIT_FUNCTION (init_runs_first) = { .runs_before = VLIB_INITS("init_runs_next"), }; The first form will [most likely] call "init_runs_next" on the spot. The second form means that "init_runs_first" runs before "init_runs_next," possibly much earlier in the sequence. Please DO NOT construct sets of init functions where A before B actually means A right before* B. It's not necessary - simply combine A and B - and it leads to hugely annoying debugging exercises when trying to switch from ad-hoc procedural ordering constraints to formal ordering constraints. Change-Id: I5e4353503bf43b4acb11a45fb33c79a5ade8426c Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-14	Preallocate mhash key_tmps vector	Dave Barach	1	-15/+10
	Fix os_get_nthreads() so that it starts returning the correct answer as early as possible. Change-Id: Id5292262f2c3f521b07ffbe6a9f6748dcc4dcb7d Signed-off-by: Dave Barach <dave@barachs.net>
2019-04-24	Clean up multi-thread barrier-sync hold-down timer	Dave Barach	1	-10/+30
	Main thread: don't bother with the barrier sync hold-down timer if none of the worker threads are busy. Worker threads: avoid epoll_pwait (10ms timeout) when the control-plane has been active in the last half-second. Change-Id: I82008d09968c65e2a4af0ebb7887389992e60603 Signed-off-by: Dave Barach <dave@barachs.net>
2019-03-10	Perf tune get_frame_size_info	Dave Barach	1	-2/+4
	It turns out that for scalar sizes 0..24, frames are always the same size. That range includes all current use-cases - and then some - so get rid of the hash table. Old code preserved under #ifdef VLIB_SUPPORTS_ARBITRARY_SCALAR_SIZES. Change-Id: Ic005c7143c9639f77d1a0fadd2fc0e90dccb68c1 Signed-off-by: Dave Barach <dbarach@cisco.com>
2019-02-27	Fix crash in barrier sync when vlib_worker_threads is zero	Damjan Marion	1	-1/+2
	Change-Id: I6819dd9dbfc15c17740bdb98b51bdd639ef8c4d2 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-02-20	calculate per-thread time offset	Dave Barach	1	-0/+47
	The main thread squirrels away vlib_time_now (&vlib_global_main), worker threads use it to calculate an offset in f64 seconds from their own vlib_time_now(vm) value. We use that offset until the next barrier sync. Thanks to Damjan for the suggestion. Change-Id: If56cdfe68e5ad8ac3b0d0fc885dc3ba556cd1215 Signed-off-by: Dave Barach <dave@barachs.net>
2019-01-30	buffers: major cleanup and improvements	Damjan Marion	1	-22/+0
	This patch introduces following changes: - deprecated free lists which are not used and not compatible with external buffer managers (i.e. DPDK) - introduces native support for per-numa buffer pools - significantly improves performance of buffer alloc and free Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-28	Less intrusive way to capture numa_node and cpu_id changes	Damjan Marion	1	-2/+0
	Change-Id: I3a33fb81f31ed473811e9e7a6197b81135913865 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27	Fix issue with cpu_id and numa_code captured too early	Damjan Marion	1	-2/+1
	Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27	perfmon: collect data on selected thread(s)	Dave Barach	1	-1/+43
	Add missing pre-input node runtime fork and refork code. unix-epoll-input runs on all threads; each instance needs its own runtime stats. Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d Signed-off-by: Dave Barach <dave@barachs.net>
2019-01-20	Store numa-noda and cpu-index in vlib_main_t	Damjan Marion	1	-0/+3
	Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-11-25	vlib:init free list buffers vec	Eyal Bari	1	-0/+2
	fixes a problem seen when dhcp proxy uses vlib_buffer_chain_linearize function which tries to vlib_buffer_alloc zero buffers - which succeeds if the buffers vector is initialized but otherwise crashes when trying to update the _vec_len in vlib_buffer_alloc_from_free_list solved by initializing the free_list buffers vec Change-Id: I1186d7aac05e54864d72f3f144e1bef8064f8efa Signed-off-by: Eyal Bari <ebari@cisco.com>
2018-11-14	Remove c-11 memcpy checks from perf-critical code	Dave Barach	1	-19/+19
	Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1 Signed-off-by: Dave Barach <dave@barachs.net>
2018-11-01	Move RPC calls off the binary API input queue	Dave Barach	1	-0/+3
	Change-Id: I2476e3e916a42b41d1e66bfc1ec4f8c4264c1720 Signed-off-by: Dave Barach <dbarach@cisco.com>
2018-10-25	Revert "Keep RPC traffic off the shared-memory API queue"	Florin Coras	1	-3/+0
	This reverts commit 71615399e194847d7833b744caedab9b841733e5. There seems to be an issue with ARPs when running with multiple workers. Change-Id: Iaa68081512362945a9caf24dcb8d70fc7c5b75df Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-10-24	Keep RPC traffic off the shared-memory API queue	Dave Barach	1	-0/+3
	Change-Id: Ib5c346641463768cf33eaf8cb5fab5b63171398d Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23	c11 safe string handling support	Dave Barach	1	-10/+10
	Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-19	vppinfra: add atomic macros for __sync builtins	Sirshak Das	1	-21/+5
	This is first part of addition of atomic macros with only macros for __sync builtins. - Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/) Additionally - clib_atomic_release macro added and used in the absence of any memory barrier. - clib_atomic_bool_cmp_and_swap added Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b Original-patch-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com>
2018-10-01	API / CLI event-log tracing	Dave Barach	1	-4/+6
	Add an "elog trace [api][cli][barrier]" debug CLI command. Removed the barrier elog test command. Remove unused reliable multicast code. Change-Id: Ib3ecde901b7c49fe92b313d0087cd7e776adcdce Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-01	thread: Add show threads api	Mohsin Kazmi	1	-5/+27
	Change-Id: I3124238ab4d43bcef5590bad33a4ff0b5d8b7d15 Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
2018-09-30	vlib: barrier sync elog tracing improvements	Dave Barach	1	-111/+23
	Change-Id: I2d3b8d5a7192ff68bee443a99346ecb807b2d833 Signed-off-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Dave Barach <dave@barachs.net>
2018-09-07	Improve NTP / kernel time change event handling	Dave Barach	1	-2/+14
	clib_time_verify_frequency(...) rejects clock frequency changes greater than 1%. vlib_worker_thread_barrier_sync_int (...) continuously checks that the barrier hold-down timer is not unreasonably far in the future. Change-Id: I00ecb4c20e44de5d6a9c1499fa933e3dd834d11a Signed-off-by: Dave Barach <dbarach@cisco.com>
2018-08-01	Improve cpu { coremask-% } configure option	Yi He	1	-20/+4
	Accept any sized hexadecimal bitmask specification to support platforms with hundreds of cores. Change-Id: Ib881db0cf60f78bdeffa13acfc2fc7fe7e128cc4 Signed-off-by: Yi He <yi.he@arm.com>
2018-07-20	Add congestion drop in interface handoff	Damjan Marion	1	-1/+14
	This prevents deadlock in case when worker A sends to B and worker B sends to A Change-Id: Id9436960f932c58325fe4f5ef8ec67b50031aeda Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-18	Add config option to use dlmalloc instead of mheap	Dave Barach	1	-11/+21
	Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk src/vppinfra/dlmalloc.[ch] are slightly modified versions of the well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no inherent size limit. Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e Signed-off-by: Dave Barach <dave@barachs.net>
2018-07-11	threads: fix issue with setting main-core to 0	Damjan Marion	1	-7/+14
	Change-Id: Id3ccfcfa2a88cf7aa106f1202af7cd677de32575 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-04	Do not override skip-cores if set in startup.conf	Damjan Marion	1	-4/+2
	Change-Id: I97d75f58a46275fdad5add27dc59c731cc3ac64d Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-03	Set main thread affinity before main heap is allocated	Damjan Marion	1	-7/+7
	Change-Id: I524909570fc1736f51fd437d6d30566c461139bd Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-06-13	Stat segment / client: show run" works now	Dave Barach	1	-0/+20
	Seems to have minimal-to-zero performance consequences. Data appears accurate: result match the debug CLI output. Checked at low rates, 27 MPPS sprayed across two worker threads. Change-Id: I09ede5150b88a91547feeee448a2854997613004 Signed-off-by: Dave Barach <dave@barachs.net>
2018-06-03	dpdk: improve buffer alloc perfomance	Damjan Marion	1	-1/+0
	This is ~50% improvement in buffer alloc performance. For a 256 buffer allocation, it was ~10 clocks/buffer, now is < 5 clocks. Change-Id: I97590e240a79a42bcab5eb26587fc2d11e6eb163 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-03-27	vlib: make cloned structures cacheline aligned	Damjan Marion	1	-13/+23
	This address crash with gcc-7 observed when -o3 is used. Change-Id: I10e87da8e5037ad480eba7fb0aaa9a657d3bf48d Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-03-14	vlib: internal buffer manager rework	Damjan Marion	1	-8/+4
	- buffer_main is no longer part of vlib_main_t - pool of free lists is still part of vlib_main_t - mheap is not used anymore for buffer allocation - simple bitmap bassed buffer alloc scheme is introduced Change-Id: I3e1e6d00e2c8122293ed0a741245eb841315a1ff Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-12-15	VPP-1102: fix dangling references in RPC handling	Dave Barach	1	-0/+3
	Queue RPC calls and send them from the main dispatch loop. As things stood, if the vpp main input queue filled, worker threads could enter a barrier-sync spin-wait in the middle of processing a frame. If thread 0 decided to recreate worker thread data structures, the worker thread(s) could easily crash. Legislate the problem out of existence by enqueueing RPC messages only from the main dispatch loop. At that point, doing a barrier-sync wait is perfectly OK. Change-Id: I18da3e44bb1f29a63fe5f30cf11de732ecfd5bf7 Signed-off-by: Dave Barach <dave@barachs.net>
2017-09-26	Add thread-safe event signaller, use RPC where required	Dave Barach	1	-1/+23
	Update ping code to use the new function Change-Id: Ieb753b23f8402cbe5667c22747896784c8ece937 Signed-off-by: Florin Coras <fcoras@cisco.com> Signed-off-by: Dave Barach <dave@barachs.net>
2017-09-15	vppinfra: add clib_mem_vm_ext_alloc function	Damjan Marion	1	-3/+3
	Change-Id: Iff33694fc42cc3bcc73cf1372339053a6365039c Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-09-11	Recombine diags and minimum barrier open time changes (VPP-968)	Colin Tregenza Dancer	1	-10/+303
	Support logging to both syslog and elog Also include DaveB is_mp_safe fix, which had been lost Change-Id: If82f7969e2f43c63c3fed5b1a0c7434c90c1f380 Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
2017-09-05	Refork worker thread data structures in parallel (VPP-970)	Colin Tregenza Dancer	1	-136/+197
	Change the rebuilding of worker thread clone datastructures to run in parallel on the workers, instead of serially on main. Change-Id: Ib76bcfbef1e51f2399972090f4057be7aaa84e08 Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
2017-07-26	VPP-905: Wrong define used in function start_workers.	JingLiuZTE	1	-1/+2
	Change-Id: I6a5faebb63e9360cebfcfb1bc3f3c0eb6b15e937 Signed-off-by: JingLiuZTE <liu.jing5@zte.com.cn>
2017-06-09	Implement sack based tcp loss recovery (RFC 6675)	Florin Coras	1	-1/+1
	- refactor existing congestion control code (RFC 6582/5681). Handling of ack feedback now consists of: ack parsing, cc event detection, event handling, congestion control update - extend sack scoreboard to support sack based retransmissions - basic implementation of Eifel detection algorithm (RFC 3522) for detecting spurious retransmissions - actually initialize the per-thread frame freelist hash tables - increase worker stack size to 2mb - fix session queue node out-of-buffer handling - ensure that the local buffer cache vec_len matches reality - avoid 2x spurious event requeues when short of buffers - count out-of-buffer events - make the builtin server thread-safe - fix bihash template threading issue: need to paint -1 across uninitialized working_copy_length vector elements (via rebase from master) Change-Id: I646cb9f1add9a67d08f4a87badbcb117980ebfc4 Signed-off-by: Florin Coras <fcoras@cisco.com> Signed-off-by: Dave Barach <dbarach@cisco.com>
2017-05-10	completelly deprecate os_get_cpu_number, replace new occurences	Damjan Marion	1	-10/+2
	Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-05-09	Fix remaining 32-bit compile issues	Damjan Marion	1	-1/+1
	Change-Id: I9664214652229b663c3e3ba7406b4ede96bfb123 Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-04-06	Use thread local storage for thread index	Damjan Marion	1	-47/+22
	This patch deprecates stack-based thread identification, Also removes requirement that thread stacks are adjacent. Finally, possibly annoying for some folks, it renames all occurences of cpu_index and cpu_number with thread index. Using word "cpu" is misleading here as thread can be migrated ti different CPU, and also it is not related to linux cpu index. Change-Id: I68cdaf661e701d2336fc953dcb9978d10a70f7c1 Signed-off-by: Damjan Marion <damarion@cisco.com>