Initialize the txq lock only if some txqs are shared, and check whether
another worker is already operating on the txq before processing GRO
timeouts in the input node.
Type: fix
Change-Id: I89dab6c0e6eb6a7aa621fa1548b0a2c76e6c7581
Signed-off-by: Benoît Ganne <bganne@cisco.com>
(cherry picked from commit b6b484d01adb8ab2ef5a50d5a3d6f3f097df2e0c)
|
|
Define CLIB_PAUSE () to generate the "yield" instruction. No significant
performance changes were observed for clib_spinlock_t and clib_rwlock_t.
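
For illustration, a pause hint of this kind can be sketched as below. This is
a minimal sketch assuming GCC or Clang; SKETCH_PAUSE and
sketch_spin_until_set are hypothetical names and are not the actual
CLIB_PAUSE definition.

```c
/* Hypothetical stand-in for a CPU "pause" hint; not the actual VPP macro. */
#if defined(__x86_64__) || defined(__i386__)
#define SKETCH_PAUSE() __builtin_ia32_pause ()
#elif defined(__aarch64__)
#define SKETCH_PAUSE() __asm__ __volatile__ ("yield" ::: "memory")
#else
#define SKETCH_PAUSE() do { } while (0)
#endif

/* Spin until *flag becomes nonzero, issuing the hint on each failed poll.
   Returns the number of polls that saw the flag still clear. */
static unsigned long
sketch_spin_until_set (const volatile int *flag)
{
  unsigned long spins = 0;
  while (__atomic_load_n (flag, __ATOMIC_ACQUIRE) == 0)
    {
      SKETCH_PAUSE ();
      spins++;
    }
  return spins;
}
```

The hint goes inside the polling loop, so it only runs while the waiter is
actually spinning.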
Type: feature
Change-Id: I59eb996e61c7a16007517e57e6996567302c1657
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
The previous implementation of clib_rwlock_t used two spinlocks: one
writer lock, and one to guard the counter for the number of readers.
This implementation uses a single variable, rw_cnt, which
has the following properties:
- if a writer holds the rwlock, rw_cnt = -1
- if the rwlock is free, rw_cnt = 0
- otherwise, rw_cnt > 0 and rw_cnt = number of readers
- rw_cnt is never less than -1
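
The rw_cnt states listed above can be sketched with the GCC/Clang __atomic
builtins as follows. This is an illustrative sketch only: the sketch_* names
are hypothetical and the code is not the actual clib_rwlock_t implementation.

```c
#include <stdint.h>

/* Hypothetical single-counter reader/writer lock. */
typedef struct
{
  int32_t rw_cnt; /* -1: writer holds it, 0: free, >0: number of readers */
} sketch_rwlock_t;

static void
sketch_reader_lock (sketch_rwlock_t *l)
{
  int32_t cnt;
  do
    {
      /* Wait while a writer holds the lock (rw_cnt == -1). */
      while ((cnt = __atomic_load_n (&l->rw_cnt, __ATOMIC_RELAXED)) < 0)
        ;
      /* Then try to move from cnt readers to cnt + 1 readers. */
    }
  while (!__atomic_compare_exchange_n (&l->rw_cnt, &cnt, cnt + 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

static void
sketch_reader_unlock (sketch_rwlock_t *l)
{
  __atomic_fetch_sub (&l->rw_cnt, 1, __ATOMIC_RELEASE);
}

static void
sketch_writer_lock (sketch_rwlock_t *l)
{
  int32_t expected = 0;
  /* Only the transition 0 -> -1 grants write ownership. */
  while (!__atomic_compare_exchange_n (&l->rw_cnt, &expected, -1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    expected = 0; /* a failed CAS overwrites expected with the current value */
}

static void
sketch_writer_unlock (sketch_rwlock_t *l)
{
  __atomic_store_n (&l->rw_cnt, 0, __ATOMIC_RELEASE);
}
```

In this sketch readers only ever increment a non-negative count and a writer
only ever swaps 0 for -1, so rw_cnt cannot drop below -1.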
Benchmarking:
The results below are cycle counts from test_rwlock.c. The test runs
10000 iterations; each iteration spawns 6 reader and 6 writer threads on
separate cores, and each writer thread increments a global counter
10000 times. For Taishan, 4 reader and 4 writer threads are spawned
in each test.
x86 Xeon old rwlock: 12.473e8, 11.655e8, 13.201e8, 11.347e8, 13.182e8
x86 Xeon new rwlock: 5.881e8, 5.796e8, 6.536e8, 5.540e8, 5.890e8
Aarch64 ThX2* old rwlock: 9.263e7, 8.933e7, 9.074e7, 8.979e7, 9.378e7
Aarch64 ThX2* new rwlock: 7.221e7, 8.107e7, 7.515e7, 7.672e7, 7.386e7
A72 old rwlock: 3.268e6, 3.200e6, 3.086e6, 3.176e6, 3.170e6
A72 new rwlock: 1.261e6, 1.288e6, 1.251e6, 1.229e6, 1.234e6
*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"
Type: refactor
Change-Id: I7c347d3037b36205ab532cbcb52a374c846eb275
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Tested the performance of a CAS implementation (using __atomic_compare_exchange)
against a TAS implementation (using __atomic_exchange) with test_spinlock.c;
CAS used fewer cycles on every platform measured.
The generated assembly for the CAS and TAS implementations shows that TAS
always executes with a load-store dependency, whereas CAS places a branch
between the load and the store, so that only a load occurs when the lock is
already held.
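
The two acquisition styles compared here can be sketched with the GCC/Clang
__atomic builtins as follows. The sketch_* names and the lock layout are
hypothetical; this is not the clib_spinlock_t code itself.

```c
/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct
{
  int lock;
} sketch_spinlock_t;

/* TAS style: the exchange writes 1 on every attempt, whether or not the
   lock was free. Returns 1 if the lock was acquired, 0 otherwise. */
static int
sketch_trylock_tas (sketch_spinlock_t *p)
{
  return __atomic_exchange_n (&p->lock, 1, __ATOMIC_ACQUIRE) == 0;
}

/* CAS style: 1 is written only if the current value compares equal to 0.
   Returns 1 if the lock was acquired, 0 otherwise. */
static int
sketch_trylock_cas (sketch_spinlock_t *p)
{
  int expected = 0;
  return __atomic_compare_exchange_n (&p->lock, &expected, 1, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void
sketch_unlock (sketch_spinlock_t *p)
{
  __atomic_store_n (&p->lock, 0, __ATOMIC_RELEASE);
}
```

A spinning lock would call either trylock variant in a loop until it
returns 1.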
Benchmarking:
The results below are cycle counts from test_spinlock.c. The test runs
10000 iterations; each iteration spawns 12 threads on separate cores, and
each thread increments a global counter 10000 times. For
A72, 8 threads are spawned in each test.
x86 Xeon TAS: 7.333e8, 7.605e8, 7.535e8, 7.485e8, 7.321e8
x86 Xeon CAS: 5.842e8, 5.433e8, 5.389e8, 5.983e8, 5.552e8
Aarch64 ThX2* TAS: 9.852e7, 10.209e7, 9.190e7, 9.600e7, 9.224e7
Aarch64 ThX2* CAS: 7.640e7, 7.486e7, 7.425e7, 7.269e7, 7.534e7
A72 TAS: 7.289e6, 6.963e6, 7.208e6, 6.976e6, 7.200e6
A72 CAS: 1.695e6, 1.608e6, 1.600e6, 1.634e6, 1.746e6
*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"
Type: refactor
Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Spinlock performance improved when implemented with compare_and_exchange
instead of test_and_set. All instances of test_and_set locks were refactored
to use clib_spinlock_t where possible. Some locks, e.g. in ssvm, synchronize
between processes rather than threads, so they cannot directly use
clib_spinlock_t.
Type: refactor
Change-Id: Ia16b5d4cd49209b2b57b8df6c94615c28b11bb60
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
All instances of test_and_set locks used the following sequence
to release the locks:
    CLIB_MEMORY_BARRIER ();
    p->lock = 0; // p is a generic struct with a TAS lock
Use clib_atomic_release to generate more efficient assembly code.
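
As a rough sketch of the difference, the two release styles can be written
with compiler builtins as below. It assumes the barrier is a full barrier
such as __sync_synchronize () and that the release macro behaves like a
store with release ordering; both are assumptions, and the sketch_* names
are hypothetical.

```c
/* Hypothetical struct with a test-and-set style lock word. */
typedef struct
{
  volatile int lock;
  int data;
} sketch_obj_t;

/* Older pattern: full memory barrier followed by a plain store. */
static void
sketch_unlock_barrier (sketch_obj_t *p)
{
  __sync_synchronize ();
  p->lock = 0;
}

/* Release pattern: a single store with release ordering. */
static void
sketch_unlock_release (sketch_obj_t *p)
{
  __atomic_store_n (&p->lock, 0, __ATOMIC_RELEASE);
}
```

Both functions leave the lock word at 0. The second asks only for release
ordering, which lets the compiler pick a cheaper instruction sequence where
the target has one.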
Type: refactor
Change-Id: Idca3a38b1cf43578108bdd1afe83b6ebc17a4c68
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This is the first part of the addition of atomic macros; it contains only
macros for the __sync builtins.
- Based on an earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally:
- clib_atomic_release macro added and used in the absence
  of any memory barrier.
- clib_atomic_bool_cmp_and_swap added
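
Thin wrappers over the __sync builtins could look like the sketch below. The
sketch_* macro names are hypothetical and were chosen so as not to suggest
that these are the exact definitions added by the patch.

```c
/* Hypothetical wrappers over the legacy GCC __sync builtins. */
#define sketch_atomic_fetch_add(a, b) __sync_fetch_and_add (a, b)
#define sketch_atomic_bool_cmp_and_swap(addr, old_val, new_val) \
  __sync_bool_compare_and_swap (addr, old_val, new_val)
#define sketch_atomic_test_and_set(a) __sync_lock_test_and_set (a, 1)
#define sketch_atomic_release(a) __sync_lock_release (a)
```

With wrappers like these, call sites name the operation rather than the
builtin, so the underlying builtin family can be changed in one place.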
Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
|
|
- use valloc as a 'central' segment baseva manager
- use per segment manager segment pools and use rwlocks to guard them
- add session test that exercises segment creation
- embed segment manager properties into application since they're shared
- fix rw locks
Change-Id: I761164c147275d9e8a926f1eda395e090d231f9a
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: I606fd89c410369cbd9ce9dcaaaa9dc58796e7c0e
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: I20ce799c9dd57332c06003b466ee7c36169bce98
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I86089e9bb604adfc260a111685001be1c897ce53
Signed-off-by: Damjan Marion <damarion@cisco.com>
|