|
Previous implementation of clib_rwlock_t used two spinlocks: one
writer lock, and one to guard the counter for the number of readers.
This implementation uses a single condition variable rw_cnt which
has the following properties:
if a writer has the rwlock, rw_cnt = -1
if the rwlock is free, rw_cnt = 0
otherwise, rw_cnt > 0 and rw_cnt = number of readers
rw_cnt will never be less than -1
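The rw_cnt states listed above can be sketched with the GCC/Clang __atomic builtins. This is a minimal illustration only, not the actual clib_rwlock_t code: the sketch_* names and the struct layout are hypothetical, and the memory orderings are an assumption.

```c
#include <stdint.h>

/* Hypothetical type for illustration; not the real clib_rwlock_t layout. */
typedef struct
{
  int32_t rw_cnt; /* -1: writer holds lock, 0: free, >0: number of readers */
} sketch_rwlock_t;

static inline void
sketch_reader_lock (sketch_rwlock_t *l)
{
  int32_t cnt;
  do
    {
      /* Spin while a writer holds the lock (rw_cnt == -1). */
      while ((cnt = __atomic_load_n (&l->rw_cnt, __ATOMIC_RELAXED)) < 0)
        ;
      /* Try to register one more reader: cnt -> cnt + 1. */
    }
  while (!__atomic_compare_exchange_n (&l->rw_cnt, &cnt, cnt + 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

static inline void
sketch_reader_unlock (sketch_rwlock_t *l)
{
  __atomic_fetch_sub (&l->rw_cnt, 1, __ATOMIC_RELEASE);
}

static inline void
sketch_writer_lock (sketch_rwlock_t *l)
{
  int32_t expected = 0;
  /* A writer may enter only when the lock is free: 0 -> -1. */
  while (!__atomic_compare_exchange_n (&l->rw_cnt, &expected, -1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    expected = 0; /* a failed CAS overwrote expected with the observed value */
}

static inline void
sketch_writer_unlock (sketch_rwlock_t *l)
{
  __atomic_store_n (&l->rw_cnt, 0, __ATOMIC_RELEASE);
}
```

Because readers only ever increment a non-negative count and a writer only ever moves 0 to -1, rw_cnt cannot drop below -1 as long as every unlock matches a lock.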
Benchmarking:
The results below are the cycle counts from test_rwlock.c, configured so
that for 10000 iterations, 6 reader and 6 writer threads on separate cores
are spawned such that each writer thread increments a global counter
10000 times in each iteration. For Taishan, 4 reader and 4 writer
threads are spawned in each test.
x86 Xeon old rwlock: 12.473e8, 11.655e8, 13.201e8, 11.347e8, 13.182e8
x86 Xeon new rwlock: 5.881e8, 5.796e8, 6.536e8, 5.540e8, 5.890e8
Aarch64 ThX2* old rwlock: 9.263e7, 8.933e7, 9.074e7, 8.979e7, 9.378e7
Aarch64 ThX2* new rwlock: 7.221e7, 8.107e7, 7.515e7, 7.672e7, 7.386e7
A72 old rwlock: 3.268e6, 3.200e6, 3.086e6, 3.176e6, 3.170e6
A72 new rwlock: 1.261e6, 1.288e6, 1.251e6, 1.229e6, 1.234e6
*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"
Type: refactor
Change-Id: I7c347d3037b36205ab532cbcb52a374c846eb275
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
"timer.[ch]" used a signal handler to deliver timer callbacks. Without
indulging in a set of sigprocmask(...) system calls, it would be
unsafe to use the mechanism.
Rather than wait for another developer to accidentally open this
particular can of worms, best to remove the code. It's nothing more
than an attractive nuisance at this point.
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Ia3e7b00a389c302b466605dff0c1bf3566b8dbbd
|
|
Type: fix
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Ie37ff66faba79e3b8f46c7a704137f9ef2acc773
|
|
Tested performance of a CAS implementation (using __atomic_compare_exchange)
against a TAS implementation (using __atomic_exchange) using test_spinlock.c
and found some performance improvement.
Generated assembly for the CAS and TAS implementations shows that TAS always
executes with a load-store dependency, but CAS places a branch condition
between the load and the store, so that only a load occurs when the lock is
already held.
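For reference, the two acquire strategies being compared look roughly like this when written directly with the __atomic builtins. This is a sketch under assumptions: the sketch_* names and the 0 = free, 1 = held encoding are hypothetical, and the real clib_spinlock_t code may differ.

```c
#include <stdint.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct
{
  uint32_t lock;
} sketch_spinlock_t;

/* TAS: every attempt is an unconditional atomic exchange (load + store). */
static inline void
sketch_lock_tas (sketch_spinlock_t *p)
{
  while (__atomic_exchange_n (&p->lock, 1, __ATOMIC_ACQUIRE))
    ;
}

/* CAS: the store is conditional on the loaded value matching 'expected'. */
static inline void
sketch_lock_cas (sketch_spinlock_t *p)
{
  uint32_t expected = 0;
  while (!__atomic_compare_exchange_n (&p->lock, &expected, 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    expected = 0; /* a failed CAS overwrote expected with the observed value */
}

static inline void
sketch_unlock (sketch_spinlock_t *p)
{
  __atomic_store_n (&p->lock, 0, __ATOMIC_RELEASE);
}
```

Both variants give the same mutual exclusion; the difference measured here is only in the instruction sequence each one generates while spinning.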
Benchmarking:
The results below are the cycle counts from test_spinlock.c, configured so
that for 10000 iterations, 12 threads on separate cores are spawned, each of
which increments a global counter 10000 times in each iteration. For
A72, 8 threads are spawned in each test.
x86 Xeon TAS: 7.333e8, 7.605e8, 7.535e8, 7.485e8, 7.321e8
x86 Xeon CAS: 5.842e8, 5.433e8, 5.389e8, 5.983e8, 5.552e8
Aarch64 ThX2* TAS: 9.852e7, 10.209e7, 9.190e7, 9.600e7, 9.224e7
Aarch64 ThX2* CAS: 7.640e7, 7.486e7, 7.425e7, 7.269e7, 7.534e7
A72 TAS: 7.289e6, 6.963e6, 7.208e6, 6.976e6, 7.200e6
A72 CAS: 1.695e6, 1.608e6, 1.600e6, 1.634e6, 1.746e6
*ThunderX2 used additional gcc options "-march=armv8.1-a+crc+crypto+lse"
Type: refactor
Change-Id: Ic5cd97991804f6b012707fad1a5d1a6edb96cd3d
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Spawns a uniform number of writer and reader threads across a number of
cores where each writer thread increments a global variable a specified
number of times, and the reader threads continually poll the global's
value until the writers complete.
Type: test
Change-Id: I979c3734c6d03139d0802bff1846875d226f6fbb
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Spinlock performance improved when implemented with compare_and_exchange
instead of test_and_set. All instances of test_and_set locks were refactored
to use clib_spinlock_t when possible. Some locks e.g. ssvm synchronize
between processes rather than threads, so they cannot directly use
clib_spinlock_t.
Type: refactor
Change-Id: Ia16b5d4cd49209b2b57b8df6c94615c28b11bb60
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Spawns a uniform number of threads across a number of cores where each
thread increments a global variable a specified number of times.
Type: test
Change-Id: I12b3a37708a199c297d022348d99dbb0e8349a9f
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
All instances of test_and_set locks used the following sequence
to release the locks:
CLIB_MEMORY_BARRIER ();
p->lock = 0; // p is a generic struct with a TAS lock
Use clib_atomic_release to generate more efficient assembly code.
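A rough side-by-side of the two release sequences, written with compiler builtins. It assumes that CLIB_MEMORY_BARRIER () behaves like a full barrier (__sync_synchronize) and that clib_atomic_release behaves like a store of 0 with release ordering; both mappings are assumptions here, and the sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical struct with a TAS lock word, standing in for 'p' above. */
typedef struct
{
  uint32_t lock;
} sketch_tas_lock_t;

/* Old sequence: full memory barrier followed by a plain store. */
static inline void
sketch_unlock_barrier (sketch_tas_lock_t *p)
{
  __sync_synchronize (); /* assumed equivalent of CLIB_MEMORY_BARRIER () */
  p->lock = 0;
}

/* New sequence: one store with release semantics. */
static inline void
sketch_unlock_release (sketch_tas_lock_t *p)
{
  /* assumed equivalent of clib_atomic_release (&p->lock) */
  __atomic_store_n (&p->lock, 0, __ATOMIC_RELEASE);
}
```

A release store only has to order the earlier accesses of the critical section before the unlock, which is what lets the compiler pick a cheaper instruction than a full barrier on weakly ordered CPUs.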
Type: refactor
Change-Id: Idca3a38b1cf43578108bdd1afe83b6ebc17a4c68
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Modified test-and-set spin locks to call CLIB_PAUSE () when spinning
for code consistency. Decreases the memory bandwidth consumed.
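A minimal sketch of a test-and-set loop with a pause hint. sketch_pause () is a hypothetical stand-in for CLIB_PAUSE (); the per-architecture instructions chosen below are an assumption, not a copy of the vppinfra macro.

```c
#include <stdint.h>

/* Hypothetical stand-in for CLIB_PAUSE (): tells the CPU we are spin-waiting. */
static inline void
sketch_pause (void)
{
#if defined(__x86_64__) || defined(__i386__)
  __builtin_ia32_pause ();
#elif defined(__aarch64__)
  __asm__ volatile ("yield");
#endif
  /* On other architectures this sketch falls back to a no-op. */
}

typedef struct
{
  uint32_t lock; /* 0 = free, 1 = held */
} sketch_tas_lock_t;

static inline void
sketch_tas_lock (sketch_tas_lock_t *p)
{
  /* Back off briefly after each failed attempt instead of retrying at once. */
  while (__atomic_exchange_n (&p->lock, 1, __ATOMIC_ACQUIRE))
    sketch_pause ();
}

static inline void
sketch_tas_unlock (sketch_tas_lock_t *p)
{
  __atomic_store_n (&p->lock, 0, __ATOMIC_RELEASE);
}
```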
Type: fix
Change-Id: I1cca4f87f44f23f257c7a35466cd2e7767072f51
Signed-off-by: Jason Zhang <jason.zhang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
|
|
Type: feature
Change-Id: I5e030b23943c012d8191ff657165055d33ec87a2
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Type: fix
Ticket: VPP-1649
Change-Id: Ief77ec8d5f06bfcc63af6454c4cd9979cf0ab49d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Type: feature
Change-Id: Ic720d56a6f8901efde2a58519bc9aa553205a9a6
Signed-off-by: Gary Boon <gboon@cisco.com>
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
The OOM check must consider the end of alloced arena and
not the start when checking for overflow.
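The idea can be illustrated with a toy bump allocator. This is not the allocator touched by this patch; sketch_arena_t and sketch_arena_alloc are hypothetical names used only to show why the end of the requested region is what must be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: bytes [base, base + size), with 'used' <= 'size'. */
typedef struct
{
  uint8_t *base;
  size_t size;
  size_t used;
} sketch_arena_t;

/* Returns NULL when the request does not fit (out of memory). */
static inline void *
sketch_arena_alloc (sketch_arena_t *a, size_t n)
{
  /* Checking only that the start (a->used) is inside the arena is not
     enough: the END of the new region, a->used + n, must fit as well.
     Written as a subtraction so that a huge n cannot wrap around. */
  if (n > a->size - a->used)
    return NULL;

  void *p = a->base + a->used;
  a->used += n;
  return p;
}
```

With a 16-byte arena and 10 bytes already in use, a second 10-byte request starts inside the arena but ends at byte 20, so it has to be rejected.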
Type: fix
Change-Id: Ie83e653d0894199d2fa433a604a0fe0cee142338
Signed-off-by: Andreas Schultz <andreas.schultz@travelping.com>
|
|
Type: refactor
Change-Id: I24159e0a848f552b4e27acfb5fe6f2cd91b50a19
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
The elog string hashtable uses strlen() to determine string length for
hashing, so strings must be NULL-terminated for both inserts and lookups.
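A small illustration of the constraint. The hash below is FNV-1a, chosen only as a stand-in (it is not the elog hash), and the sketch_* helpers are hypothetical. The point is that any hash that sizes its input with strlen () needs a terminated key on both the insert and the lookup path.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in string hash (FNV-1a) that sizes its input with strlen (). */
static uint32_t
sketch_str_hash (const char *s)
{
  uint32_t h = 2166136261u;
  size_t n = strlen (s); /* reads until the terminating 0 byte */
  for (size_t i = 0; i < n; i++)
    {
      h ^= (uint8_t) s[i];
      h *= 16777619u;
    }
  return h;
}

/* Copy a length-delimited buffer into a terminated heap string.
   The caller frees the result. */
static char *
sketch_to_cstring (const char *buf, size_t len)
{
  char *s = malloc (len + 1);
  if (!s)
    return NULL;
  memcpy (s, buf, len);
  s[len] = 0;
  return s;
}
```

Hashing the unterminated buffer directly would read past its end, and an insert and a lookup of the same logical string could then hash differently.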
Type: fix
Fixes: 9c8ca8dd3197e40dfcb8bcecd95c10eeb56239ed
Change-Id: I0680d39a9b89411055fd6adc89c9f253adfae32c
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Type: feature
Change-Id: I21511c1abea703da67f1a491e73342496275c498
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
If is_add=2, fail w/ return value -2 if the key exists instead of
overwriting the (key,value) pair.
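The is_add semantics can be modelled on a toy fixed-size table. This is only a sketch of the calling convention described above, not bihash code: the sketch_* names, the linear scan, and the -1 return for "not found" or "table full" are assumptions.

```c
#include <stdint.h>

#define SKETCH_SLOTS 8 /* hypothetical toy capacity */

typedef struct
{
  uint64_t key, value;
  int in_use;
} sketch_kv_t;

typedef struct
{
  sketch_kv_t slots[SKETCH_SLOTS];
} sketch_table_t;

/* is_add = 0: delete, 1: add or overwrite, 2: add, but fail with -2 if the
   key already exists. Returns 0 on success, -1 if a delete finds no key or
   the table is full (the -1 cases are an assumption of this sketch). */
static int
sketch_add_del (sketch_table_t *t, uint64_t key, uint64_t value, int is_add)
{
  sketch_kv_t *free_slot = 0;

  for (int i = 0; i < SKETCH_SLOTS; i++)
    {
      sketch_kv_t *s = &t->slots[i];
      if (s->in_use && s->key == key)
        {
          if (is_add == 0)
            {
              s->in_use = 0;
              return 0;
            }
          if (is_add == 2)
            return -2; /* key exists: leave the stored pair untouched */
          s->value = value; /* is_add == 1: overwrite */
          return 0;
        }
      if (!s->in_use && !free_slot)
        free_slot = s;
    }

  if (is_add == 0 || !free_slot)
    return -1;

  free_slot->key = key;
  free_slot->value = value;
  free_slot->in_use = 1;
  return 0;
}
```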
Type: feature
Change-Id: I00a3c194a381c68090369c31d6c6f9870cfe0a62
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Reduces the vpp image virtual size by multiple gigabytes
Add a "show bihash" command which displays configured and current
virtual space in use by bihash tables.
Modify the .py test framework to call "show bihash" on test tear-down
Type: refactor
Change-Id: Ifc1b7e2c43d29bbef645f6802fa29ff8ef09940c
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
New coverity toolset, new set of squawks to fix
Ticket: VPP-1649
Type: fix
Change-Id: I2a7e4c42b101c6c79c01b150b2523ce3d5d62354
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Makes life easier for binary API language bindings
Type: fix
Change-Id: Ib459274fda05153d01cbb7bc328a8407e3ee5027
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Add u64x2_scatter/u32x4_scatter in vector_neon.h. u64x2_scatter/u32x4_scatter
scatters data from SIMD register to scattered memory locations.
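A scalar model of what a four-lane scatter does, for readers without the Neon header at hand. The struct stands in for a SIMD register; the sketch_* names and the argument order are assumptions, and the real functions in vector_neon.h operate on Neon vector types instead.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 4 x u32 SIMD register. */
typedef struct
{
  uint32_t lane[4];
} sketch_u32x4;

/* Store each lane of r to its own, possibly unrelated, memory location. */
static inline void
sketch_u32x4_scatter (sketch_u32x4 r, uint32_t *p0, uint32_t *p1,
                      uint32_t *p2, uint32_t *p3)
{
  *p0 = r.lane[0];
  *p1 = r.lane[1];
  *p2 = r.lane[2];
  *p3 = r.lane[3];
}
```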
Type: feature
Change-Id: I298d5478c7ba6935ab7402a6d467c7ee00f17e9f
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
When only the fast wheel is in use, the next expiring timer has
to be within the fast_slot_bitmap.
With multiple wheels, the next expiring timer could be in the
slow wheel. The timers on the slow wheel are only moved into the
fast wheel when the fast wheel index reaches TW_SLOTS_PER_RING.
When calculating the next expiring timer we therefore need to
consider the timers on the slow wheel as well.
When there are no more timers before reaching TW_SLOTS_PER_RING, instead
of scanning the slow wheel, return the number of ticks until
TW_SLOTS_PER_RING is reached.
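The fallback rule can be sketched on a single 64-slot bitmap. This is a simplified model, not the timer wheel template code: SKETCH_SLOTS_PER_RING, the 64-bit bitmap, and the choice to include the current slot in the scan are all assumptions.

```c
#include <stdint.h>

#define SKETCH_SLOTS_PER_RING 64 /* hypothetical; stands in for TW_SLOTS_PER_RING */

/* Ticks until the next occupied fast-wheel slot at or after fast_index.
   If none is left before the ring wraps, return the ticks until the wrap,
   which is when slow-wheel timers would be moved into the fast wheel.
   Assumes fast_index < SKETCH_SLOTS_PER_RING. */
static uint32_t
sketch_ticks_to_next_expiry (uint64_t fast_slot_bitmap, uint32_t fast_index)
{
  for (uint32_t i = fast_index; i < SKETCH_SLOTS_PER_RING; i++)
    if (fast_slot_bitmap & (1ULL << i))
      return i - fast_index;

  /* Nothing left on the fast wheel: do not scan the slow wheel, just
     report the distance to the point where it gets cascaded. */
  return SKETCH_SLOTS_PER_RING - fast_index;
}
```

The caller is expected to ask again after the returned number of ticks, at which point any slow-wheel timers due in the next revolution are visible in the fast wheel.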
Type: fix
Change-Id: I847031f8efc015c888d082f0b0c1bd500aa65704
Signed-off-by: Andreas Schultz <andreas.schultz@travelping.com>
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Add u64x2_gather/u32x4_gather in vector_neon.h. u64x2_gather/u32x4_gather
gathers data from scattered memory locations to a SIMD register.
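A scalar model of a two-lane gather, for illustration only. The struct stands in for a SIMD register; the sketch_* names and the pointer-per-lane argument order are assumptions, and the real functions in vector_neon.h return Neon vector types.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 2 x u64 SIMD register. */
typedef struct
{
  uint64_t lane[2];
} sketch_u64x2;

/* Load one value from each, possibly unrelated, address into one register. */
static inline sketch_u64x2
sketch_u64x2_gather (const uint64_t *p0, const uint64_t *p1)
{
  sketch_u64x2 r = { { *p0, *p1 } };
  return r;
}
```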
Type: feature
Change-Id: I1dd27e38af28b9bed85143014c86197ee549fede
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
Type: fix
Make sure tnil color is black and that the right node colors are
updated.
Change-Id: Ibd9d7ea9438df4dab977202955957824723a865d
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Type: feature
Add support for insert/search/del with custom compare function.
Change-Id: Ibb740afc224d8adc29d3e1b51b46cdd738d1bd93
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
This fixes two leaks in registering errors in the stats segment.
- The error name created by vlib_register_errors() was not freed.
- Duplicate error names (when an interface was re-added) were added to the vector.
This fix also adds memory usage statistics for the statistics segment
as /mem/statseg/{used, total}
Change-Id: Ife98d5fc5baef5bdae426a5a1eef428af2b9ab8a
Type: fix
Signed-off-by: Ole Troan <ot@cisco.com>
|
|
Type: feature
Change-Id: I53e1f05b2b048925fca3b2f6b0499ff9c3e6ee12
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: feature
Change-Id: Ie020fd7e2618284a63efbeb9895068f27c0fb9ab
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Add a string hash to make sure that strings in the string table are
unique. This optimization has been coded piecemeal in multiple places;
we should have made the underlying function do the work years ago.
Type: fix
Change-Id: I5010fd4926b9b80ce3a168748f6de64e333ef498
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
__sync_lock_release switched to __atomic_store for code consistency,
although both generate same instructions with current compilers.
Change-Id: I37d320509e43a4c2b8a49af6346dc4a43ca2f535
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
__sync_test_and_set uses full memory barriers for AArch64,
__atomic_exchange(ACQUIRE) would use load acquire.
Change-Id: Ifdf2481db3b9dde6c5842d75671402862adb6d81
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
elog is supposed to be thread safe. Yet elog_string() is not. To fulfill
that promise, let's make elog_string() thread safe to avoid surprises.
Change-Id: Iab82faa7cb6719777a66c3ff14775e59a6a68a20
Signed-off-by: Steven Luong <sluong@cisco.com>
|
|
Change-Id: If59fea9bb733f6a134d9cb5a9f61dfabf3441820
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Iddeb3a1b0e20706e72ec8f74dabc60b342f003ba
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
leak-check { <any-debug-cli-command-and-args> }
Hint: "set term history off" or you'll have to sort through a bunch of
bogus leaks related to the debug cli history mechanism.
Cleaned up a set of reported leaks in the "show interface" command. At
some point, we thought about making a per-thread vlib_mains vector,
but we never did that. Several interface-related CLI's maintained
local static cache vectors. Not a bad idea, but not useful as things
shook out. Removed the static vectors.
Change-Id: I756bf2721a0d91993ecfded34c79da406f30a548
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Fix os_get_nthreads() so that it starts returning the correct answer
as early as possible.
Change-Id: Id5292262f2c3f521b07ffbe6a9f6748dcc4dcb7d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Ibc69195244bcc1e8e82c488fcd50b1eb2fcf81c0
Signed-off-by: Igor Mikhailov (imichail) <imichail@cisco.com>
|
|
2019/ 5/ 9 21:18:01:710 notice -> 2019/05/09 21:18:01:710 notice
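The before and after strings above correspond to switching from space-padded to zero-padded fields. A minimal sketch with standard snprintf; sketch_format_timestamp is a hypothetical helper, not the function changed by this patch.

```c
#include <stdio.h>

/* Format a timestamp with zero-padded month and day ("%02d"), where a
   space-padded "%2d" would produce "2019/ 5/ 9". Returns what snprintf
   returns. */
static int
sketch_format_timestamp (char *buf, size_t n, int year, int month, int day,
                         int hour, int min, int sec, int msec)
{
  return snprintf (buf, n, "%04d/%02d/%02d %02d:%02d:%02d:%03d",
                   year, month, day, hour, min, sec, msec);
}
```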
Change-Id: I8ed932a0d6f5e6ce29ab6428f0f01221d0a6ae34
Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
|
|
1) Freeing up working_copy_lengths vector
2) Passing verbosity level to fmt_fn
Change-Id: I5e3f541e2f8cc0150105cc35835366f84937bb2e
Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com>
|
|
Remove unused code from the vppinfra build
Add a bihash test case
Change-Id: Ia930309efa28620dd1c0d69aaec432e2d8bd962c
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Example / unit-test in .../src/plugins/unittest/bihash_test.c
Change-Id: I23fd0ba742d65291667a755965aee1a3d3477ca2
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
The Neon version of memcpy_le performs better than memmove on aarch64
Change-Id: I44b487bb0795a6e70dd1e55bdde4a077773ec859
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <sirdas@arm.com>
|
|
name can be (and will be) smaller than section_name_start. Both name and
section_name_start are NULL-terminated so we can use strcmp.
Change-Id: Iec414810109f9267de20c25c0fe601df35daf89e
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Change-Id: Ia60e4092c45c192002de064c362a9265bc9baeec
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Change-Id: Ie96706b4d8bcb32d2d5f065bc765f95f4e9369e7
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: If23a04623a7138c9f6c98ee9ecfa587396618a60
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
The macro for x86 platform is wrong, and the patch fixes it.
Change-Id: I4f0ed254b177d6863fbe9113308c6a8b4478e0b2
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
|
|
Change-Id: I6934beaf5c08bae2d4f0bd3a6bb811810407c1f9
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
A simple memcmp won't work when comparing pointer-keys, such as those
used by the bihash_vec8_8.h template.
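To illustrate the problem: when the key field holds a pointer, memcmp over the key bytes compares addresses, not the data they point to. The sketch below uses a C string as the pointed-to key for simplicity; the real template stores a pointer to a vppinfra vector, and the sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key/value pair whose key is a pointer to the real key data. */
typedef struct
{
  const char *key;
  uint64_t value;
} sketch_vec_kv_t;

/* Byte-wise compare of the key field: this only compares the two addresses. */
static int
sketch_key_equal_memcmp (const sketch_vec_kv_t *a, const sketch_vec_kv_t *b)
{
  return memcmp (&a->key, &b->key, sizeof (a->key)) == 0;
}

/* Compare what the keys point to. */
static int
sketch_key_equal_content (const sketch_vec_kv_t *a, const sketch_vec_kv_t *b)
{
  return strcmp (a->key, b->key) == 0;
}
```

Two entries built from separate copies of the same key data are equal by content but not by memcmp, which is why pointer-key templates need their own compare function.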
Change-Id: I77e59f3fd7f7740ef42908ace90ed4843e1c9ac7
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I26c704ec27b8f5431faef08156778f53ea454269
Signed-off-by: Damjan Marion <damarion@cisco.com>
|