path: root/src/vppinfra/lock.h

2020-10-19  vppinfra: minor refactor in lock.h  (Dave Barach, 1 file, -2/+4)

For whatever reason, "typedef struct { ... } *foo_t" gives cgo a horrible
case of indigestion. A minor refactor makes the pain go away.

Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I856b2abf9b16348d2f6145178e683e722914c756
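
A minimal sketch of the shape of this refactor, using a hypothetical
my_lock type rather than the actual lock.h diff: the anonymous struct
that was typedef'd straight to a pointer gets a struct tag, and the
pointer typedef is split out separately, a form cgo digests without
trouble.

    /* Before (the form that reportedly upsets cgo):
     *
     *   typedef struct { unsigned lock; } *my_lock_t;
     *
     * After: tag the struct, then typedef the pointer separately. */
    typedef struct my_lock_s
    {
      unsigned lock;
    } my_lock_s_t;

    typedef my_lock_s_t *my_lock_t;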

2020-09-18  virtio: fix txq locking  (Benoît Ganne, 1 file, -0/+17)

Initialize the txq lock only if some txqs are shared, and check whether
another worker is already operating on the txq before processing GRO
timeouts in the input node.

Type: fix
Change-Id: I89dab6c0e6eb6a7aa621fa1548b0a2c76e6c7581
Signed-off-by: Benoît Ganne <bganne@cisco.com>
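
A hedged sketch of the locking scheme the fix describes; the txq_t
layout and all names here are hypothetical stand-ins, not the actual
virtio code. The lock matters only when queues are shared between
workers, and a try-acquire lets the input node skip GRO timeout
processing when another worker already holds the queue.

    #include <stdbool.h>

    typedef struct
    {
      unsigned lock;  /* 0 = free, 1 = held */
      bool shared;    /* true when more workers than tx queues */
    } txq_t;

    static void
    txq_lock_init (txq_t *q, int n_workers, int n_queues)
    {
      q->shared = n_workers > n_queues; /* lock only matters if shared */
      q->lock = 0;
    }

    /* Returns true if the caller may process the queue; false means
     * another worker is already operating on it. */
    static bool
    txq_try_acquire (txq_t *q)
    {
      unsigned expected = 0;
      if (!q->shared)
        return true;
      return __atomic_compare_exchange_n (&q->lock, &expected, 1, false,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    }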

2019-09-27  vppinfra: implement CLIB_PAUSE () for aarch64 platforms  (jaszha03, 1 file, -0/+2)

Define CLIB_PAUSE () to generate the "yield" instruction. No significant
performance changes were observed for clib_spinlock_t and clib_rwlock_t.

Type: feature
Change-Id: I59eb996e61c7a16007517e57e6996567302c1657
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
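
A sketch of what such a pause/yield hint looks like in a spin-wait
loop; MY_PAUSE is a hypothetical stand-in for CLIB_PAUSE (), and the
x86 branch is shown only for contrast.

    #if defined (__aarch64__)
    #define MY_PAUSE() __asm__ volatile ("yield")
    #elif defined (__x86_64__) || defined (__i386__)
    #define MY_PAUSE() __builtin_ia32_pause ()
    #else
    #define MY_PAUSE() do { } while (0)
    #endif

    /* Typical use: hint the core while busy-waiting on a lock word. */
    static inline void
    spin_until_zero (volatile unsigned *p)
    {
      while (__atomic_load_n (p, __ATOMIC_RELAXED))
        MY_PAUSE ();
    }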

2019-08-01  vppinfra: refactor clib_rwlock_t to use single condition variable  (jaszha03, 1 file, -24/+19)

The previous implementation of clib_rwlock_t used two spinlocks: one
writer lock, and one to guard the counter for the number of readers.
This implementation uses a single shared counter rw_cnt with the
following properties:

  - if a writer holds the rwlock, rw_cnt = -1
  - if the rwlock is free, rw_cnt = 0
  - otherwise, rw_cnt > 0 and equals the number of readers

rw_cnt is never less than -1.

Benchmarking: the results below are the cycle counts from test_rwlock.c,
configured so that for 10000 iterations, 6 reader and 6 writer threads
on separate cores are spawned, and each writer thread increments a
global counter 10000 times in each iteration. For Taishan, 4 reader and
4 writer threads are spawned in each test.

  x86 Xeon      old rwlock: 12.473e8, 11.655e8, 13.201e8, 11.347e8, 13.182e8
  x86 Xeon      new rwlock:  5.881e8,  5.796e8,  6.536e8,  5.540e8,  5.890e8
  Aarch64 ThX2* old rwlock:  9.263e7,  8.933e7,  9.074e7,  8.979e7,  9.378e7
  Aarch64 ThX2* new rwlock:  7.221e7,  8.107e7,  7.515e7,  7.672e7,  7.386e7
  A72           old rwlock:  3.268e6,  3.200e6,  3.086e6,  3.176e6,  3.170e6
  A72           new rwlock:  1.261e6,  1.288e6,  1.251e6,  1.229e6,  1.234e6

  *ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"

Type: refactor
Change-Id: I7c347d3037b36205ab532cbcb52a374c846eb275
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
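
A minimal sketch of the single-counter scheme described above, built on
GCC/Clang atomic builtins; my_rwlock_t is a hypothetical layout, not
the actual clib_rwlock_t.

    typedef struct
    {
      volatile int rw_cnt; /* -1 writer, 0 free, >0 reader count */
    } my_rwlock_t;

    static inline void
    my_rwlock_reader_lock (my_rwlock_t *rw)
    {
      int cnt;
      do
        {
          /* wait until no writer holds the lock */
          while ((cnt = __atomic_load_n (&rw->rw_cnt, __ATOMIC_RELAXED)) < 0)
            ;
          /* then try to bump the reader count: cnt -> cnt + 1 */
        }
      while (!__atomic_compare_exchange_n (&rw->rw_cnt, &cnt, cnt + 1, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
    }

    static inline void
    my_rwlock_writer_lock (my_rwlock_t *rw)
    {
      int expected = 0;
      /* a writer needs the lock completely free: 0 -> -1 */
      while (!__atomic_compare_exchange_n (&rw->rw_cnt, &expected, -1, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        expected = 0; /* failed CAS overwrites 'expected'; reset it */
    }

    static inline void
    my_rwlock_unlock (my_rwlock_t *rw)
    {
      if (__atomic_load_n (&rw->rw_cnt, __ATOMIC_RELAXED) < 0)
        __atomic_store_n (&rw->rw_cnt, 0, __ATOMIC_RELEASE);   /* writer */
      else
        __atomic_sub_fetch (&rw->rw_cnt, 1, __ATOMIC_RELEASE); /* reader */
    }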

2019-08-01  vppinfra: refactor clib_spinlock_t to use compare and swap  (jaszha03, 1 file, -2/+10)

Tested performance of a CAS implementation (using
__atomic_compare_exchange) against a TAS implementation (using
__atomic_exchange) with test_spinlock.c and found some performance
improvement. Generated assembly for the CAS and TAS implementations
shows that TAS always executes with a load-store dependency, while CAS
places a branch between the load and the store, so that a store is
attempted only when the lock is observed free.

Benchmarking: the results below are the cycle counts from
test_spinlock.c, configured so that for 10000 iterations, 12 threads on
separate cores are spawned, each of which increments a global counter
10000 times in each iteration. For A72, 8 threads are spawned in each
test.

  x86 Xeon      TAS:  7.333e8,  7.605e8,  7.535e8,  7.485e8,  7.321e8
  x86 Xeon      CAS:  5.842e8,  5.433e8,  5.389e8,  5.983e8,  5.552e8
  Aarch64 ThX2* TAS:  9.852e7, 10.209e7,  9.190e7,  9.600e7,  9.224e7
  Aarch64 ThX2* CAS:  7.640e7,  7.486e7,  7.425e7,  7.269e7,  7.534e7
  A72           TAS:  7.289e6,  6.963e6,  7.208e6,  6.976e6,  7.200e6
  A72           CAS:  1.695e6,  1.608e6,  1.600e6,  1.634e6,  1.746e6

  *ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"

Type: refactor
Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
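
The two acquire styles compared above, sketched with hypothetical my_*
names; the comments note the load/store behavior that explains the
cycle counts.

    /* TAS: every spin is an unconditional read-modify-write, so the
     * cache line is written on each attempt. */
    static inline void
    my_spinlock_lock_tas (volatile unsigned *lock)
    {
      while (__atomic_exchange_n (lock, 1, __ATOMIC_ACQUIRE))
        ;
    }

    /* CAS: a store is attempted only when the lock is observed free,
     * so contended spinning is load-only until the holder releases. */
    static inline void
    my_spinlock_lock_cas (volatile unsigned *lock)
    {
      unsigned expected = 0;
      while (!__atomic_compare_exchange_n (lock, &expected, 1, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        expected = 0; /* failed CAS overwrites 'expected'; reset it */
    }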

2019-07-31  vppinfra: refactor test_and_set spinlocks to use clib_spinlock_t  (jaszha03, 1 file, -0/+3)

Spinlock performance improved when implemented with compare_and_exchange
instead of test_and_set, so all instances of test_and_set locks were
refactored to use clib_spinlock_t where possible. Some locks, e.g. in
ssvm, synchronize between processes rather than threads, so they cannot
directly use clib_spinlock_t.

Type: refactor
Change-Id: Ia16b5d4cd49209b2b57b8df6c94615c28b11bb60
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
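
Typical shape of a call site after this refactor, assuming a VPP build
tree for vppinfra/lock.h; my_shared_state_t and its functions are
hypothetical examples, while clib_spinlock_init/lock/unlock are the
vppinfra entry points the commit moves call sites to.

    #include <vppinfra/lock.h>

    typedef struct
    {
      clib_spinlock_t lockp; /* pointer-style lock, allocated by init */
      u64 counter;
    } my_shared_state_t;

    static void
    my_init (my_shared_state_t *s)
    {
      clib_spinlock_init (&s->lockp);
    }

    static void
    my_increment (my_shared_state_t *s)
    {
      clib_spinlock_lock (&s->lockp);
      s->counter++;
      clib_spinlock_unlock (&s->lockp);
    }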

2019-07-30  vppinfra: refactor use of CLIB_MEMORY_BARRIER ()  (jaszha03, 1 file, -13/+6)

All instances of test_and_set locks used the following sequence to
release the locks:

    CLIB_MEMORY_BARRIER ();
    p->lock = 0; /* p is a generic struct with a TAS lock */

Use clib_atomic_release to generate more efficient assembly code.

Type: refactor
Change-Id: Idca3a38b1cf43578108bdd1afe83b6ebc17a4c68
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
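
A sketch of the before/after release sequence on a plain lock word; the
my_* names are hypothetical, and the CLIB_MEMORY_BARRIER () expansion
shown is approximate.

    /* Before: full barrier, then a plain store. */
    static inline void
    my_unlock_old (volatile unsigned *lock)
    {
      __sync_synchronize (); /* roughly what CLIB_MEMORY_BARRIER () does */
      *lock = 0;
    }

    /* After: one store with release semantics, letting the compiler
     * emit e.g. stlr on aarch64 instead of a dmb followed by str. */
    static inline void
    my_unlock_new (volatile unsigned *lock)
    {
      __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
    }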

2018-10-23  c11 safe string handling support  (Dave Barach, 1 file, -2/+2)

Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>

2018-10-19  vppinfra: add atomic macros for __sync builtins  (Sirshak Das, 1 file, -5/+5)

This is the first part of the addition of atomic macros, covering only
macros for the __sync builtins.

- Based on an earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)

Additionally:
- clib_atomic_release macro added and used in the absence of any memory
  barrier.
- clib_atomic_bool_cmp_and_swap added.

Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
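
Illustrative definitions in the style this series introduces: thin
macros over compiler builtins so call sites stop invoking __sync_*
directly. The my_* names and exact expansions are assumptions, not the
actual vppinfra macro list.

    #define my_atomic_fetch_add(a, b) __sync_fetch_and_add (a, b)
    #define my_atomic_bool_cmp_and_swap(a, old, new) \
      __sync_bool_compare_and_swap (a, old, new)
    /* release store used where no full memory barrier is needed */
    #define my_atomic_release(a) __atomic_store_n (a, 0, __ATOMIC_RELEASE)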

2018-02-05  session: segment manager refactor  (Florin Coras, 1 file, -2/+19)

- use valloc as a 'central' segment baseva manager
- use per segment manager segment pools and use rwlocks to guard them
- add session test that exercises segment creation
- embed segment manager properties into application since they're shared
- fix rw locks

Change-Id: I761164c147275d9e8a926f1eda395e090d231f9a
Signed-off-by: Florin Coras <fcoras@cisco.com>

2018-01-25  vppinfra: add readers-writer lock  (Florin Coras, 1 file, -14/+98)

Change-Id: I606fd89c410369cbd9ce9dcaaaa9dc58796e7c0e
Signed-off-by: Florin Coras <fcoras@cisco.com>
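
Typical usage of the lock this commit adds, assuming a VPP build tree;
the lookup/update helpers and map_lock are hypothetical, while the
clib_rwlock_* calls are the vppinfra API.

    #include <vppinfra/lock.h>

    extern u32 do_lookup (u32 key);             /* hypothetical helpers */
    extern void do_update (u32 key, u32 value);

    clib_rwlock_t map_lock;

    void
    setup (void)
    {
      clib_rwlock_init (&map_lock);
    }

    u32
    lookup (u32 key)
    {
      u32 rv;
      clib_rwlock_reader_lock (&map_lock); /* many readers may hold this */
      rv = do_lookup (key);
      clib_rwlock_reader_unlock (&map_lock);
      return rv;
    }

    void
    update (u32 key, u32 value)
    {
      clib_rwlock_writer_lock (&map_lock); /* excludes readers and writers */
      do_update (key, value);
      clib_rwlock_writer_unlock (&map_lock);
    }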

2017-10-15  Fix minor issues in clib_spinlock_unlock()  (Dave Barach, 1 file, -1/+3)

Change-Id: I20ce799c9dd57332c06003b466ee7c36169bce98
Signed-off-by: Dave Barach <dave@barachs.net>

2017-05-10  Completely deprecate os_get_cpu_number, replace new occurrences  (Damjan Marion, 1 file, -3/+3)

Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d
Signed-off-by: Damjan Marion <damarion@cisco.com>

2017-03-30  vppinfra: add spinlock inline functions  (Damjan Marion, 1 file, -0/+97)

Change-Id: I86089e9bb604adfc260a111685001be1c897ce53
Signed-off-by: Damjan Marion <damarion@cisco.com>