Age | Commit message | Author | Files | Lines |
|
In a case where one pounds on a single kvp in a KVP_AT_BUCKET_LEVEL
table, the code would sporadically return a transitional value (junk)
from a half-deleted kvp. At most 64 bits of the kvp will be written
atomically, so using memset(...) to smear 0xFF's across a kvp to free
it left a lot to be desired.
Performance impact: very mildly positive. Thanks to FC for running a
multi-thread host stack perf/scale test.
Added an ASSERT to catch attempts to add a (key,value) pair which
contains the magic "free kvp" value.
Type: fix
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I6a1aa8a2c30bc70bec4b696ce7b17c2839927065
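For illustration, a minimal sketch of the race and the fix, assuming a
hypothetical 16-byte kvp layout and a hypothetical FREE_KVP_KEY marker
(not the actual bihash structures):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte kvp: a 64-bit key and a 64-bit value. */
typedef struct
{
  uint64_t key;
  uint64_t value;
} kvp_t;

#define FREE_KVP_KEY 0xFFFFFFFFFFFFFFFFULL /* magic "free kvp" marker */

/* Buggy: memset writes in unspecified chunks, so a concurrent
 * lock-free reader can observe a half-deleted, junk kvp. */
static void
kvp_free_racy (kvp_t *kvp)
{
  memset (kvp, 0xFF, sizeof (*kvp));
}

/* Better: publish the "free" marker with a single atomic 64-bit
 * store of the key. A reader checks the key first, so it sees either
 * the live key or the free marker, never a torn intermediate state. */
static void
kvp_free_atomic (kvp_t *kvp)
{
  __atomic_store_n (&kvp->key, FREE_KVP_KEY, __ATOMIC_RELEASE);
}
```

The ASSERT mentioned above guards the other direction: no caller may
add a (key,value) pair whose key equals the magic free marker, since
readers could not distinguish it from a freed slot.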
|
|
bihash keys are less than 64 bytes; do not overflow.
Type: fix
Change-Id: Ic55407eb9ccca38058f7e62b363ec05c8445fbcb
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Type: improvement
Change-Id: Ifb0fa114414aa2fdc244f964612ca3ac3e29b5e1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Template instances can allocate BIHASH_KVP_PER_PAGE data records
adjacent to the bucket, to remove a dependent read / prefetch.
Template instances can ask for immediate memory allocation, to avoid
several branches in the lookup path.
Clean up the l2 fib and gbp plugin code: use clib_bihash_get_bucket(...).
Use hugepages for bihash allocation arenas.
Type: improvement
Signed-off-by: Dave Barach <dave@barachs.net>
Signed-off-by: Damjan Marion <damarion@cisco.com>
Change-Id: I92fc11bc58e48d84e2d61f44580916dd1c56361c
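For context, a sketch of how a bihash template instance is typically
driven, using the bihash_8_8 variant; the table name, nbuckets, and
memory_size here are arbitrary example values:

```c
#include <vppinfra/bihash_8_8.h>

static void
bihash_example (void)
{
  clib_bihash_8_8_t h;
  clib_bihash_kv_8_8_t kv, result;

  /* nbuckets and memory_size are example values */
  clib_bihash_init_8_8 (&h, "example table", 1024 /* nbuckets */,
			1ULL << 20 /* memory_size */);

  kv.key = 42;
  kv.value = 100;
  clib_bihash_add_del_8_8 (&h, &kv, 1 /* is_add */);

  if (clib_bihash_search_8_8 (&h, &kv, &result) == 0)
    {
      /* found: result.value == 100 */
    }

  clib_bihash_free_8_8 (&h);
}
```

The per-variant knobs this patch describes (kvp storage at bucket
level, immediate arena allocation) are compile-time template settings,
not runtime API changes, so callers like the above are unaffected.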
|
|
Example / unit-test in .../src/plugins/unittest/bihash_test.c
Change-Id: I23fd0ba742d65291667a755965aee1a3d3477ca2
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Add a bucket-level lock bit. Use a spinlock only when actually
allocating, freeing, or splitting a bucket. Should improve
multi-thread add/del performance.
Change-Id: I3e40e2a8371685457f340d6584dea14e3207f2b0
Signed-off-by: Dave Barach <dave@barachs.net>
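A minimal sketch of the lock-bit idea, assuming a hypothetical
one-word bucket with the top bit stolen for the lock (not the actual
bihash bucket layout):

```c
#include <stdint.h>

#define BUCKET_LOCK_BIT (1ULL << 63) /* hypothetical lock bit */

typedef struct
{
  /* hypothetical bucket word: page offset/size plus a lock bit */
  volatile uint64_t as_u64;
} bucket_t;

/* Spin until we atomically set the lock bit. Only the slow paths
 * that allocate, free, or split a bucket take this lock; searches
 * and in-place updates stay lock-free. */
static void
bucket_lock (bucket_t *b)
{
  uint64_t old;
  do
    {
      old = b->as_u64;
      while (old & BUCKET_LOCK_BIT) /* wait for holder to release */
	old = b->as_u64;
    }
  while (!__atomic_compare_exchange_n (&b->as_u64, &old,
				       old | BUCKET_LOCK_BIT,
				       0 /* strong */, __ATOMIC_ACQUIRE,
				       __ATOMIC_RELAXED));
}

static void
bucket_unlock (bucket_t *b)
{
  __atomic_and_fetch (&b->as_u64, ~BUCKET_LOCK_BIT, __ATOMIC_RELEASE);
}
```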
|
|
Looks like the CPU doesn't like overlapping loads.
In some cases the new code shows a 3-4 clock improvement.
Change-Id: Ia1b49976ad95140c573f892fdc0a32eebbfa06c8
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
bihash_48_8 case:
Scalar code: 6 clocks
SSE4.2 code: 3 clocks
AVX2 code: 2.27 clocks
AVX512 code: 1.5 clocks
Change-Id: I40700175835a1e7321276e47eadbf9771d3c5a68
Signed-off-by: Damjan Marion <damarion@cisco.com>
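A hedged sketch of the SSE4.x-style compare for a 48-byte key (an
illustration of the technique, not the exact merged code): XOR the
three 16-byte lanes of the two keys, OR the differences together, and
test the result for all-zeros.

```c
#include <smmintrin.h> /* SSE4.1: _mm_testz_si128 */
#include <string.h>

/* Return 1 if two 48-byte keys are equal, using three 16-byte
 * vector loads instead of a byte-wise loop. */
static int
key_compare_48_sse (const void *a, const void *b)
{
  __m128i x0 = _mm_xor_si128 (_mm_loadu_si128 ((const __m128i *) a),
			      _mm_loadu_si128 ((const __m128i *) b));
  __m128i x1 = _mm_xor_si128 (_mm_loadu_si128 ((const __m128i *) a + 1),
			      _mm_loadu_si128 ((const __m128i *) b + 1));
  __m128i x2 = _mm_xor_si128 (_mm_loadu_si128 ((const __m128i *) a + 2),
			      _mm_loadu_si128 ((const __m128i *) b + 2));
  __m128i d = _mm_or_si128 (x0, _mm_or_si128 (x1, x2));
  /* an all-zero difference vector means the keys match */
  return _mm_testz_si128 (d, d);
}

/* Scalar fallback for other architectures. */
static int
key_compare_48_scalar (const void *a, const void *b)
{
  return memcmp (a, b, 48) == 0;
}
```

The AVX2 and AVX512 variants follow the same shape with wider lanes,
which is where the clock counts above come from.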
|
|
Simply call pool_init_fixed(...) before using the pool. Note that
fixed, preallocated pools live in individually mmap'ed address
segments, except for the free-element bitmap. A large fixed pool can
exceed 4 GB.
Fix tcp buffer allocator leak, remove broken assert
Change-Id: I4421082e12a77c41c6e20f7747f3150dcd01fc26
Signed-off-by: Dave Barach <dave@barachs.net>
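A minimal usage sketch, assuming vppinfra's pool macros; my_elt_t and
the pool size are hypothetical:

```c
#include <vppinfra/pool.h>

typedef struct
{
  u64 session_id; /* hypothetical element type */
} my_elt_t;

static void
fixed_pool_example (void)
{
  my_elt_t *pool = 0;
  my_elt_t *e;

  /* Preallocate a fixed-size pool of 1M elements (example size);
   * the element memory lives in its own mmap'ed segment. */
  pool_init_fixed (pool, 1 << 20);

  pool_get (pool, e); /* allocate an element */
  e->session_id = 1;

  pool_put (pool, e); /* free it */
}
```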
|
|
According to Maciek, the easiest way to leverage the csit "performance
trend" job is to actually merge the patch once verified. Manual
testing indicates that the patch improves l2 path performance. Other
use-cases are TBD. It's possible that we'll need to back out the patch
depending on what happens.
Change-Id: Ic0a0363de35ef9be953ad7709c57c3936b73fd5a
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
crc_u32 was not defined for processors other than x86_64 with SSE4.2.
Calls to "crc_u32" are removed and replaced by either a call to
clib_crc32c or a call to clib_xxhash, as the result is not used
as a check value but as a hash.
Change-Id: I3af4d68e2e5ebd0c9b0a6090f848d043cb0f20a2
Signed-off-by: Christophe Fontaine <christophe.fontaine@enea.com>
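The resulting pattern, sketched under the assumption that
clib_crc32c_uses_intrinsics gates hardware CRC support as in
vppinfra's crc32.h; example_key_hash is a hypothetical wrapper, not
the merged code:

```c
#include <vppinfra/crc32.h>
#include <vppinfra/xxhash.h>

/* Hash a 64-bit key. The result is used only as a hash, never as a
 * check value, so any well-mixed function is acceptable. */
static inline u64
example_key_hash (u64 key)
{
#ifdef clib_crc32c_uses_intrinsics
  /* hardware crc32c where SSE4.2 (or the ARMv8 CRC extension)
   * is available */
  return clib_crc32c ((u8 *) &key, sizeof (key));
#else
  /* portable fallback */
  return clib_xxhash (key);
#endif
}
```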
|
|
32-bit code can still use crc32c instructions, but it operates
on 32-bit registers.
Change-Id: I9bb6b0b59635d6ea6a753584676ebcf59c8f6584
Signed-off-by: Damjan Marion <damarion@cisco.com>
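A sketch of what a 32-bit-register-only crc32c loop looks like with
the SSE4.2 intrinsics (an illustration, not the merged code):

```c
#include <nmmintrin.h> /* SSE4.2 crc32 intrinsics */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* crc32c over a buffer using only the 32-bit form of the
 * instruction, since 32-bit code cannot use _mm_crc32_u64. */
static uint32_t
crc32c_u32_only (const uint8_t *data, size_t len)
{
  uint32_t crc = 0;
  while (len >= 4)
    {
      uint32_t word;
      memcpy (&word, data, 4); /* avoid unaligned dereference */
      crc = _mm_crc32_u32 (crc, word);
      data += 4;
      len -= 4;
    }
  while (len--)
    crc = _mm_crc32_u8 (crc, *data++);
  return crc;
}
```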
|
|
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23
Signed-off-by: Damjan Marion <damarion@cisco.com>
|