diff options
author | jaszha03 <jason.zhang2@arm.com> | 2019-06-26 17:06:59 -0500 |
---|---|---|
committer | Dave Barach <openvpp@barachs.net> | 2019-08-01 11:54:17 +0000 |
commit | fb1ccc7c369d879a20567df609d04cfdcec5f8c1 (patch) | |
tree | c2f289132da3c8e0a71c186df990270bf9c26043 /src/vppinfra/memcpy_sse3.h | |
parent | 635acbf4b76038023b0b5a9bdae0dc87e13fd308 (diff) |
vppinfra: refactor clib_spinlock_t to use compare and swap
Tested performance of a CAS implementation (using __atomic_compare_exchange)
against a TAS implementation (using __atomic_exchange) using test_spinlock.c
and found some performance improvement.
Generated assembly for CAS and TAS implementations show that TAS always
executes with a load-store dependency, but CAS moves a branch condition
between the load and store so that only a load occurs when the lock is free.
Benchmarking:
The results below are the cycle counts from test_spinlock.c, configured so
that for 10000 iterations, 12 threads on separate cores are spawned, each of
which increments a global counter 10000 times in each iteration. For
A72, 8 threads are spawned in each test.
x86 Xeon TAS: 7.333e8, 7.605e8, 7.535e8, 7.485e8, 7.321e8
x86 Xeon CAS: 5.842e8, 5.433e8, 5.389e8, 5.983e8, 5.552e8
Aarch64 ThX2* TAS: 9.852e7, 10.209e7, 9.190e7, 9.600e7, 9.224e7
Aarch64 ThX2* CAS: 7.640e7, 7.486e7, 7.425e7, 7.269e7, 7.534e7
A72 TAS: 7.289e6, 6.963e6, 7.208e6, 6.976e6, 7.200e6
A72 CAS: 1.695e6, 1.608e6, 1.600e6, 1.634e6, 1.746e6
*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"
Type: refactor
Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Diffstat (limited to 'src/vppinfra/memcpy_sse3.h')
0 files changed, 0 insertions, 0 deletions