vpp - Vector Packet Processing

Age	Commit message (Collapse)	Author	Files	Lines
2019-08-01	vppinfra: refactor clib_spinlock_t to use compare and swap	jaszha03	1	-0/+3
	Tested performance of a CAS implementation (using __atomic_compare_exchange) against a TAS implementation (using __atomic_exchange) using test_spinlock.c and found some performance improvement. Generated assembly for CAS and TAS implementations show that TAS always executes with a load-store dependency, but CAS moves a branch condition between the load and store so that only a load occurs when the lock is free. Benchmarking: The results below are the cycle counts from test_spinlock.c, configured so that for 10000 iterations, 12 threads on separate cores are spawned, each of which increments a global counter 10000 times in each iteration. For A72, 8 threads are spawned in each test. x86 Xeon TAS: 7.333e8, 7.605e8, 7.535e8, 7.485e8, 7.321e8 x86 Xeon CAS: 5.842e8, 5.433e8, 5.389e8, 5.983e8, 5.552e8 Aarch64 ThX2* TAS: 9.852e7, 10.209e7, 9.190e7, 9.600e7, 9.224e7 Aarch64 ThX2* CAS: 7.640e7, 7.486e7, 7.425e7, 7.269e7, 7.534e7 A72 TAS: 7.289e6, 6.963e6, 7.208e6, 6.976e6, 7.200e6 A72 CAS: 1.695e6, 1.608e6, 1.600e6, 1.634e6, 1.746e6 *ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse" Type: refactor Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d Signed-off-by: Jason Zhang <jason.zhang2@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
2019-06-05	Switch atomic release API from __sync to __atomic builtin.	Sirshak Das	1	-1/+1
	__sync_lock_release switched to __atomic_store for code consitency, although both generate same instructions with current compilers. Change-Id: I37d320509e43a4c2b8a49af6346dc4a43ca2f535 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
2019-06-05	Switch atomic test and set API from __sync to __atomic builtin	Sirshak Das	1	-1/+1
	__sync_test_and_set uses full memory barriers for AArch64, __atomic_exchange(ACQUIRE) would use load acquire. Change-Id: Ifdf2481db3b9dde6c5842d75671402862adb6d81 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
2019-04-16	svm_fifo rework to avoid contention on cursize	Sirshak Das	1	-0/+3
	Problems Addressed: - Contention of cursize by producer and consumer. - Reduce the no of modulo operations. Changes: - Synchronization between producer and consumer changed from cursize to head and tail indexes Implications: reduces the usable size of fifo by 1. - Using weaker memory ordering C++11 atomics to access head and tail based on producer and consumer role. - Head and tail indexes are unsigned 32 bit integers. Additions and subtraction on them are implicit 32 bit Modulo operation. - Adding weaker memory ordering variants of max_enq, max_deq, is_empty and is_full Using them appropriately in all places. Perfomance improvement (iperf3 via Hoststack): iperf3 Server: Marvell ThunderX2(AArch64) - iperf3 Client: Skylake(x86) ~6%(256 rxd/txd) - ~11%(2048 rxd/txd) Change-Id: I1d484e000e437430fdd5a819657d1c6b62443018 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2019-03-22	svm/atomics: add clib_atomic_swap_rel_n	Florin Coras	1	-0/+1
	Change-Id: Iea2c173000570043beafef58ca923463ce76d872 Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-11-28	Use acquire/release ordering when accessing svm_fifo shared variable cursize	Sirshak Das	1	-0/+4
	Improves TCP iperf3 performance by ~3% on AArch64. Change-Id: I1e51bd8403ba45ec6af4c2f96b95e884c1ae0d67 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
2018-11-05	Enable atomic swap and store macro with acquire and release ordering	Sirshak Das	1	-2/+3
	Add atomic swap and store macro with acquire and release ordering respectively. Variable in question is interupt_pending variable which is used as guard variable by input nodes to process the device queue. Atomic Swap is used with Acquire ordering as writes or reads following this in program order should not be reordered before the swap. Atomic Store is used with Release ordering, as post store the node is added to pending list. Change-Id: I1be49e91a15c58d0bf21ff5ba1bd37d5d7d12f7a Original-patch-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
2018-10-19	vppinfra: add atomic macros for __sync builtins	Sirshak Das	1	-0/+45
	This is first part of addition of atomic macros with only macros for __sync builtins. - Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/) Additionally - clib_atomic_release macro added and used in the absence of any memory barrier. - clib_atomic_bool_cmp_and_swap added Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b Original-patch-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com>