|
Using a bitfield struct for the 5tuple proved to be fragile from
a performance standpoint: zeroing the entire
structure and then setting its separate pieces
increases memory latency. So, move to using
a flags byte.
Also, use direct object copies rather than memcpy.
Change-Id: Iad8faf9de050ff1256e40c950dee212cbd3e5267
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
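
A minimal sketch of the idea, not the plugin's actual code: the struct,
field, and flag names below are hypothetical, since the commit message
does not list them. It shows one plain flags byte written with a single
store, and a struct assignment in place of memcpy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real names and values are not given above. */
enum
{
  EX_PKT_FLAG_IS_INPUT = 1 << 0,
  EX_PKT_FLAG_L4_VALID = 1 << 1,
};

typedef struct
{
  uint16_t src_port;
  uint16_t dst_port;
  uint8_t proto;
  uint8_t flags;		/* one plain byte instead of several 1-bit bitfields */
  uint16_t pad;
} ex_l4_info_t;

/* Compose the flags in a local variable, then store them with one byte write. */
static inline void
ex_set_flags (ex_l4_info_t * l4, int is_input, int l4_valid)
{
  uint8_t f = 0;
  if (is_input)
    f |= EX_PKT_FLAG_IS_INPUT;
  if (l4_valid)
    f |= EX_PKT_FLAG_L4_VALID;
  l4->flags = f;
}

/* Direct object copy: size and alignment are known to the compiler. */
static inline void
ex_copy (ex_l4_info_t * dst, const ex_l4_info_t * src)
{
  *dst = *src;
}
```

Whether this is faster than a bitfield depends on the compiler and target;
the sketch only illustrates the shape of the change.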
|
|
This is the first part of the addition of atomic macros, covering only the macros for
the __sync builtins.
- Based on an earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally:
- clib_atomic_release macro added and used in the absence
of any memory barrier.
- clib_atomic_bool_cmp_and_swap added
Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
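
A sketch of what thin wrappers over the __sync builtins can look like.
The ex_* macro names are hypothetical and are not the definitions from
this patch; only the __sync_* builtins themselves are real GCC/Clang
builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrappers over the legacy GCC/Clang __sync builtins. */
#define ex_atomic_fetch_add(p, v) __sync_fetch_and_add ((p), (v))
#define ex_atomic_bool_cmp_and_swap(p, old, new_) \
  __sync_bool_compare_and_swap ((p), (old), (new_))
#define ex_atomic_test_and_set(p) __sync_lock_test_and_set ((p), 1)
/* Stores 0 with release semantics, so no separate barrier is needed. */
#define ex_atomic_release(p) __sync_lock_release (p)

typedef struct
{
  volatile uint32_t lock;
  uint64_t counter;
} ex_shared_t;

/* Spin until the previous lock value was 0, i.e. we acquired it. */
static inline void
ex_lock (ex_shared_t * s)
{
  while (ex_atomic_test_and_set (&s->lock))
    ;
}

static inline void
ex_unlock (ex_shared_t * s)
{
  ex_atomic_release (&s->lock);
}
```

Using a release store for the unlock, rather than a plain store of 0, keeps
the writes made inside the critical section ordered before the lock is
seen as free.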
|
|
The stateful ICMP/ICMPv6 handling was broken.
Fix it and introduce test cases to catch such breakage in the future.
Change-Id: Ie602e72d6ac613d64ab0bf6693b6d75afb1a9552
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
This commit adds explicit signaling from a non-owning thread to the owning thread
to restart the session timer as necessary.
Consequently, we can now sweep the session lists at their respective timeouts,
rather than sweeping all the lists at the pace of the shortest timeout value.
The only extra care needed is to wake up earlier if requeueing a session
to a different list requires it.
Change-Id: Ifc8c500f6988748f4cd3dc184dd7824321aaaaca
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
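
The wakeup logic described above can be sketched roughly as follows. This
is an illustration under assumptions: the list structure, the time units,
and the function names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NEVER UINT64_MAX

/* Hypothetical per-list state: each list has its own idle timeout. */
typedef struct
{
  uint64_t timeout;		/* idle timeout for sessions on this list */
  uint64_t head_enqueue_time;	/* when the oldest session was enqueued */
  int is_empty;
} ex_session_list_t;

/* Next wakeup is the earliest expiry among the non-empty lists. */
static uint64_t
ex_next_wakeup (const ex_session_list_t * lists, int n_lists)
{
  uint64_t next = EX_NEVER;
  for (int i = 0; i < n_lists; i++)
    {
      if (lists[i].is_empty)
	continue;
      uint64_t expiry = lists[i].head_enqueue_time + lists[i].timeout;
      if (expiry < next)
	next = expiry;
    }
  return next;
}

/* A requeue needs to signal the owner only if it expires before the
   wakeup that is already scheduled. */
static int
ex_requeue_needs_signal (uint64_t scheduled_wakeup, uint64_t now,
			 uint64_t new_list_timeout)
{
  return now + new_list_timeout < scheduled_wakeup;
}
```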
|
|
Change-Id: I085615fde1f966490f30ed5d32017b8b088cfd59
Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
|
|
Just do a regular increment, since we are not contending for these counters.
This also makes the tests pass on ARM ThunderX (VPP-1395).
Change-Id: I333aaa11d4145d13d322868900bc114df85a020d
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
bihash deletion operation may in turn do underlying memory operations,
so ensure it is using the correct (private) heap.
Change-Id: Ibef7ad7f9db6fa83da02316bf7509072ce579bc0
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit f74b4d2b559b1d5697fd625d9c8e0f76ba5a4463)
|
|
Change-Id: I63c36644c9d93f2c3ec6606ca0205b407499de4e
Signed-off-by: Eyal Bari <ebari@cisco.com>
|
|
Add a new kv_16_8 field into 5tuple union, rename
the existing kv into kv_40_8 for clarity, and
add the compile-time alignment constraints.
Change-Id: I9bfca91f34850a5c89cba590fbfe9b865e63ef94
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
contiguous with L4 data
Using ip46_address_t was convenient from an operational point of view but created
some difficulties in dealing with IPv4 addresses - the three extra u32s of padding
are costly, and the "holes" mean we cannot use the smaller key-value
data structures for the lookup.
This commit changes the 5tuple layout for the IPv4 case, such that
the src/dst addresses directly precede the L4 information.
That will allow treating the same data within a 40x8 key-value
structure as a 16x8 key-value structure starting at a 24-byte offset.
Change-Id: Ifea8d266ca0b9c931d44440bf6dc62446c1a83ec
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
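
A sketch of the overlay, assuming a 40x8 key-value is five u64 key words
plus one u64 value (48 bytes) and a 16x8 key-value is two u64 key words
plus one u64 value (24 bytes). The ex_* type and field names are
hypothetical, and the packing of the IPv4 addresses into one word is only
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  uint64_t key[5];
  uint64_t value;
} ex_kv_40_8_t;			/* 48 bytes */

typedef struct
{
  uint64_t key[2];
  uint64_t value;
} ex_kv_16_8_t;			/* 24 bytes */

typedef union
{
  ex_kv_40_8_t kv_40_8;
  struct
  {
    uint64_t pad[3];		/* 24 bytes unused in the IPv4 case */
    ex_kv_16_8_t kv_16_8;	/* IPv4 addresses, then L4 info, then value */
  };
} ex_5tuple_t;

/* Compile-time checks that the two views line up as described. */
_Static_assert (sizeof (ex_kv_40_8_t) == 48, "40x8 kv must be 48 bytes");
_Static_assert (sizeof (ex_5tuple_t) == 48, "overlay must not grow the union");
_Static_assert (offsetof (ex_5tuple_t, kv_16_8) == 24,
		"16x8 view must start at byte 24");

/* Fill the IPv4 case: addresses sit directly before the L4 word. */
static inline void
ex_fill_ipv4 (ex_5tuple_t * t, uint32_t src, uint32_t dst, uint64_t l4,
	      uint64_t value)
{
  t->pad[0] = t->pad[1] = t->pad[2] = 0;
  t->kv_16_8.key[0] = ((uint64_t) dst << 32) | src;
  t->kv_16_8.key[1] = l4;
  t->kv_16_8.value = value;
}
```

With this layout the last two key words and the value of the 40x8 view are
exactly the 16x8 view, so an IPv4 lookup can use the smaller table without
copying.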
|
|
- implement a 1us purgatory for the session structures
by adding a special connection list, where all connections
about to be deleted go.
- add per-list-head timeouts updated upon the list enqueue/dequeue
for connection idle management
- add an "unused" session list with list ID#0, which should
never be used unless there is a logic error. Use this ID
to initialize the sessions.
- improve the maintainability of the session linked list
structures by using symbolic bogus index name instead of ~0
- change the ordering of session creations - first reverse, then
local - to minimize the potential for two workers competing for
the same session in the corner case of two packets
on different workers creating the same logical session
- reduce the maximum session count to keep the memory usage the same
- add extra log/debug/trace to session cleaning logic
- be more aggressive with cleaning up sessions - wind up the
interrupts from the workers to themselves if there is more
work to do
Change-Id: I3aa1c91a925a08e83793467cb15bda178c21e426
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
clib_bihash_search_40_8 for session lookups
Use the inline version rather than calling the function; this gives slightly better performance.
The straightforward diff uncovered an interesting problem: the stateful ACL IPv4 unit tests would fail
under "make test" but succeed under "make test-debug". They would also succeed under "make test"
if, before calling clib_bihash_search_inline_2_40_8, we changed the code
to store the key in a temporary variable.
Debugging revealed that the generated optimized code is not what one would expect:
the zeroing of the u64s overlaying the memcpy into ipv4 value of ip46_address_t
made the optimizer not notice the latter, and think that those fields should be
always zero in the bihash, thus generating incorrect assembly for the bihash key
comparison for the ipv4 nodes.
Changing the zeroing to be non-overlapping by zeroing only the pad fields resulted
in the optimizer generating the correct code and the tests pass.
Change-Id: Ib0f55cef2b5fe70c931d17ca4dc32a5755d160cd
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
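
A sketch of the non-overlapping zeroing described above. The union below
is a hypothetical stand-in for ip46_address_t, not its real definition,
and the example does not reproduce the miscompilation; it only shows the
shape of the fix: zero the pad words, then write the IPv4 word, so no
store overlaps another.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a combined IPv4/IPv6 address. */
typedef union
{
  uint8_t as_u8[16];
  uint32_t as_u32[4];
  uint64_t as_u64[2];
  struct
  {
    uint32_t pad[3];		/* unused for IPv4 */
    uint32_t ip4;		/* IPv4 address occupies the last word */
  };
} ex_ip46_t;

/* Zero only the pad fields, then set the IPv4 word. */
static inline void
ex_set_ip4 (ex_ip46_t * a, uint32_t v4)
{
  a->pad[0] = 0;
  a->pad[1] = 0;
  a->pad[2] = 0;
  a->ip4 = v4;
}
```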
|
|
per-packet session key
Using a separate session key has proven to be tricky for the following reasons:
- it's a lot of storage to have what looks to be nearly identical to 5tuple,
just maybe with some fields swapped
- shuffling the fields from 5tuple adds to memory pressure
- the fact that the fields do not coincide with the packet memory
means for any staged processing we need to use up a lot of memory
Thus, just add two entries into the bihash table pointing to
the same session entry, so we could match the packets from either
direction.
With this, the key layout of the L3 info (which takes up
the majority of the space in the IPv6 case) is the same as in the packet,
opening up the possibility for other optimizations.
Not having to create and store a separate session key
should also give us a small performance win in itself.
Also, add the routine to show the session bihash in a better
way than a bunch of numbers.
Alas, the memory usage in the bihash obviously doubles.
Change-Id: I8fd2ed4714ad7fc447c4fa224d209bc0b736b371
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
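
The two-entries-per-session idea can be sketched with a toy table. This is
not bihash: a small linearly scanned array stands in for it, and all ex_*
names are hypothetical. The forward key and the reversed key both map to
the same session index, so a packet from either direction matches without
building a separate session key.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
  uint32_t src, dst;
  uint16_t sport, dport;
  uint8_t proto;
} ex_key_t;

typedef struct
{
  ex_key_t key;
  uint32_t session_index;
  int used;
} ex_entry_t;

/* Toy stand-in for the session hash table. */
typedef struct
{
  ex_entry_t e[16];
} ex_table_t;

static int
ex_key_eq (const ex_key_t * a, const ex_key_t * b)
{
  return a->src == b->src && a->dst == b->dst && a->sport == b->sport
    && a->dport == b->dport && a->proto == b->proto;
}

/* Key as seen on the return path: addresses and ports swapped. */
static ex_key_t
ex_reverse (ex_key_t k)
{
  ex_key_t r = { k.dst, k.src, k.dport, k.sport, k.proto };
  return r;
}

static int
ex_add (ex_table_t * t, ex_key_t k, uint32_t session_index)
{
  for (int i = 0; i < 16; i++)
    if (!t->e[i].used)
      {
	t->e[i].key = k;
	t->e[i].session_index = session_index;
	t->e[i].used = 1;
	return 0;
      }
  return -1;			/* table full */
}

/* One session, two table entries: forward and reverse direction. */
static int
ex_add_session (ex_table_t * t, ex_key_t k, uint32_t session_index)
{
  if (ex_add (t, k, session_index) < 0)
    return -1;
  return ex_add (t, ex_reverse (k), session_index);
}

static int
ex_lookup (const ex_table_t * t, ex_key_t k, uint32_t * session_index)
{
  for (int i = 0; i < 16; i++)
    if (t->e[i].used && ex_key_eq (&t->e[i].key, &k))
      {
	*session_index = t->e[i].session_index;
	return 1;
      }
  return 0;
}
```

The cost noted in the commit message is visible here as well: every session
consumes two table slots.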
|
|
This commit splits the functions from fa_node.c
into the pure dataplane node functions (which are multiarch-compiled),
session management node functions (which are compiled only once),
and session find/add/delete functions which are split out into the inlines.
As part of the refactoring:
- get rid of BV() macros in the affected chunk of code,
rather use the explicit bihash function names.
- add the magic trailer to the new files to
ensure make checkstyle watches them.
- move the bihash_template.c include for 40_8 bihash into acl.c
Change-Id: I4d781e9ec4307ea84e92af93c09470ea2bd0c375
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|