Age | Commit message (Collapse) | Author | Files | Lines |
|
Change-Id: Id5b7429ca7cce10ce8022c9b8a223bd02f6c3b5f
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
The assignment was redundant with a one just a dozen lines above
in the case of the ACL loaded being non-empty, so its only
apparent purpose in life was make coverity unhappy...
Thus fix by deletion.
Change-Id: I573308cb9c212bdfdca2551aa381720dbbcb006e
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Using bitfield struct for 5tuple proved to be fragile from
the performance standpoint - the zeroizing of the entire
structure and then setting the separate pieces of it
triggers increased memory latency. So, move to using
flags byte.
Also, use the direct object copies rather than memcpy.
Change-Id: Iad8faf9de050ff1256e40c950dee212cbd3e5267
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Thanks to Damjan for rewriting my previous patch into
not-to-be-deprecated soon form!
Change-Id: I595a13c44ed07d4c6d60e2aef0f0bd807a76cbba
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
I was expecting "%4d" format string to exhibit the same behavior as
the one in C standard library, but rather than specifying _minimal_
width and expanding as necessary, it actually truncates the output.
Changing that to "%9d" should take care of pushing this surprising
difference in behavior into the domain of impossible.
Change-Id: Ia687137ca765bf9c1575af998ff11314010e81ad
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Abstracting out the internal format function
for L4 session key type makes the other
acl plugin format/print functions more maintainable.
Change-Id: Ica1302263a42981555462b5338d18d9a9f9c8342
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
for plumbing
This makes ACL plugin use the new feature arcs, which slightly increases performance.
Since for ethertype whitelisting we were using the L2 classifier, to retain
the functionality, make a simple node doing that, and plug it into non-ip
L2 feature arc whenever needed.
Change-Id: I3add377a6c790117dd3fd056e5615cb4c4438cf4
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
This is first part of addition of atomic macros with only macros for
__sync builtins.
- Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally
- clib_atomic_release macro added and used in the absence
of any memory barrier.
- clib_atomic_bool_cmp_and_swap added
Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
|
|
each collision in split_partition() (VPP-1458)
A pointer to hash-ready ACL rules is only set once, which might cause a crash if there are colliding entries
from more than one ACL applied.
Solution: reload the pointer based on the element being processed.
Change-Id: I7a701c2c3b4236d67293159f2a33c4f967168953
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit 84112dd4f98e5a31a8c7340a741f89e77fd03363)
|
|
Change-Id: Ie8380cb39424548bf64cb19aee59ec20e29d1e39
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
VPP-1352
If the number of rules within a given partition exceeds the limit,
the split_partition() might get called, in which we calculate
the relaxed mask, create a new partition with that mask and
attempt to reallocate some entries from the overcrowded partition.
The non-TM code was pre-expanding the vector with rules by
the number of rules in the new ACL being applied - which
caused the split_partition() to iterate over the rules
filled with zeroes. Most of the time it is benign, but
if a newly created relaxed partition is such that these
entries can be "relocated", then the code attempts to
do so, which does not end well.
Change-Id: I2dbf3ccd29ff97277b21cdb11c4424ff0915c3b7
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
The stateful ICMP/ICMPv6 handling got broken.
Fix that and introduce testcases to catch in the future.
Change-Id: Ie602e72d6ac613d64ab0bf6693b6d75afb1a9552
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
This commit adds explicit signaling from a non-owning thread to the owning thread
to restart the session timer as necessary.
Consequently, we now can sweep the session lists at their respective timeouts,
rather than sweeping all the lists at the pace of the shortest timeout value,
just taking care to wake up if the session requeue to a different list results
in needing to wake up earlier.
Change-Id: Ifc8c500f6988748f4cd3dc184dd7824321aaaaca
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: I085615fde1f966490f30ed5d32017b8b088cfd59
Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
|
|
Just do a regular increment, since we are not contending for these counters.
This also makes the tests pass on ARM ThunderX (VPP-1395).
Change-Id: I333aaa11d4145d13d322868900bc114df85a020d
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
bihash deletion operation may in turn do underlying memory operations,
so ensure it is using the correct (private) heap.
Change-Id: Ibef7ad7f9db6fa83da02316bf7509072ce579bc0
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit f74b4d2b559b1d5697fd625d9c8e0f76ba5a4463)
|
|
applied ACLs
The partition_split() did not increment the refcount when using a mask type index,
thus subsequent modifications potentially resulted in double frees and in the best case
immediate crash, in the worst case delayed crash in another place.
Introduce the lock_mask_type_index() and call it, move the mask type index
related functions closer to the top of the file.
Make the assignment of the new mask type indices
for the tuplemerge case to use the assign_mask_type_index().
Keep some debugs in case we need to investigate this further at some point.
Change-Id: Iae370f5cd92e1fe1442480db34656a8a3442dbc0
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit 1edc406da3d4f6e63de2f278360b5753f55c00df)
|
|
Change-Id: I634971f6376a7ea49de718ade9139e67eeed48e5
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
(cherry picked from commit d039281e11cfc4580fe140e72390c1c48688c722)
|
|
Change-Id: I90600d000afb02e8969f3c01bcf9e4b5c10a7d39
Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
|
|
Change-Id: Ib61f0299c17c0f021408ab0a44c5b54f55f8a8ec
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Iffd5c45ab242a919592a1f686f7f880936b68a1a
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Ibc59323e849810531dd0963e85493efad3b86857
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This significantly reduces need for
...
in multiarch code. Simply constructor macros will jost create static unused
entry if CLIB_MARCH_VARIANT is defined and that will be optimized out by
compiler.
Change-Id: I17d1c4ac0c903adcfadaa4a07de1b854c7ab14ac
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Commit 1c7bf5d41737984907e8bad1dc832eb6cb1d6288 added the poisoning
of the newly freed memory in debug builds, exposing a logic
error in mask assignment code - it passed a pointer to
within a pool to a function which might potentially expand the pool.
This resulted in a failure of the test in the debug version.
Fix that by making a local copy of the value before passing
a pointer to it.
Change-Id: I73f3670672c3d86778aad0f944d052d0480cc593
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
This change avoids the long-read-after-short-write, resulting
in a small performance improvement.
Change-Id: Ic01d1fd19182e96100ccaf441d7ab5e5898b9734
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Ifc8b3d30d66c7ade1e3584844ce3f82d42d2fb94
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: I63c36644c9d93f2c3ec6606ca0205b407499de4e
Signed-off-by: Eyal Bari <ebari@cisco.com>
|
|
Change-Id: Ieb8b53977fc8484c19780941e232ee072b667de3
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk
src/vppinfra/dlmalloc.[ch] are slightly modified versions of the
well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no
inherent size limit.
Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Fix the copypaste-triggered errors in load-from-file ACL test code.
Also, add an explicit exit(1) after clib_error() to convince coverity
that code path is terminal indeed.
Change-Id: I1deedb49144559c9183449005ada0433c24db18a
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Ie899ccbaae4df7cce4ebbba47ed6c3cce5269bdb
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
A small store into a middle of a larger structure that was subsequently
loaded for calculating the bihash key was noticeably impacting the performance.
Change-Id: If7f33e1b66e8b438ba7cc91abc0ca749850c6e45
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Slightly refactored from the initial implementation of the TupleMerge [1]
algorithm by Valerio Bruschi (valerio.bruschi@telecom-paristech.fr)
[1] James Daly, Eric Torng "TupleMerge: Building Online Packet Classifiers
by Omitting Bits", In Proc. IEEE ICCCN 2017, pp. 1-10
Also add startup parameters to turn on/off the algorithm ("use tuple merge 1/0"),
and a startup parameter to be able to tweak the split threshold
("tuple merge split threshold N"), the default value of the split threshold
is 39 as per paper, but some more tuning might be necessary to find the best
value.
This change, alongside with the optimizations which avoid extra lookups,
significantly reduces the slowdown on the ClassBench generated ACLs, which
are supposed to resemble realistic ACLs seen in use in the field.
Change-Id: I9713e4673970e9a62d4d9e9718365293375fab7b
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
- instantiate the per-use mask type entry for a given hash ACE
this prepares to adding tuplemerge where the applied ACE may
have a different mask type due to relaxing of the tuples
- store the vector of the colliding rules for linear lookups
rather than traversing the linked list.
- store the lowest rule index for a given mask type inside
the structure. This allows to skip looking up at the later
mask types if we already matched an entry that is in front
of the very first entry in the new candidate mask type,
thus saving a worthless hash table lookup.
- use a vector of mask type indices rather than bitmap,
in the sorted order (by construction) of ascending
lowest rule index - this allows to terminate the lookups
early.
- adapt the debug cli outputs accordingly to show the data
- propagate the is_ip6 into the inner calls
Change-Id: I7a67b271e66785c6eab738b632b432d5886a0a8a
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: I6a3cfcb24f5027ec0f2cd2ec21ea47a01fef331b
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Trying to accomodate fragments as first class citizens
has shown to be more trouble than it's worth. So
fallback to linear ACL search in case it is a fragment
packet. Delete the corresponding code from the hash
matching.
Change-Id: Ic9ecc7c800d575615addb33dcaa89621462e9c7b
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
in more than one C file
Including the exports.h from multiple .c files belonging to a single plugin results in an error.
Rework the approach to require the table of function pointers to be filled in by
the initialization function.
Since the inline functions are compiled in the "caller" context,
there is no knowledge about the acl_main structure used by the ACL
plugin. To help with that, the signature of inline functions is slightly
different, taking the p_acl_main pointer as the first parameter.
That pointer is filled into the .p_acl_main field of the method
table during the initialization - since the calling of non-inline variants
would have required filling the method table, this should give
minimal headaches during the use and switch between the two methods.
Change-Id: Icb70695efa23579c46c716944838766cebc8573e
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
The commit 4bc1796b346efd10f3fb19b176ff089179263a24 had incorrect
calculation of the session lists minimal timeout, resulting
in returned value of 0 which resulted in existing sessions
constantly requeued, taking up the CPU. Fix this calculation.
Change-Id: I9a789739f96a1f01522c68f91b0a02db2417837f
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
optimize stores in the (l4/pkt)
Having two pieces of code - one for now much simpler to recreate L3 info,
one for a more difficult do build L4/pkt metadata allows more
degrees of freedom for optimizations.
Also, construct the metadata in local variables first before
saving it into the memory structure, this fewer memory stores
and they are better aligned, allowing to coalesce with
subsequent reads if needed.
Change-Id: Icb35d933834b14294f875362c9b58db3feb38d99
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
file for testing
Add a command "acl_add_replace_from_file" to VAT which can load a ruleset and
add an ACL with it. There are a few options which augment the ACL being created:
"permit+reflect" or "permit" alter the default action from deny on the ACEs
created.
"append-default-permit" adds an entry in the end with the "permit+reflect"
if the default action has been changed to permit+reflect, or with a simple
permit otherwise.
This command is IPv4-only because the available datasets were IPv4-only.
Change-Id: I26b9f33ecb6b59e051d1d9cbafedbc47e8203392
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Add a new kv_16_8 field into 5tuple union, rename
the existing kv into kv_40_8 for clarity, and
add the compile-time alignment constraints.
Change-Id: I9bfca91f34850a5c89cba590fbfe9b865e63ef94
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
contiguous with L4 data
Using ip46_address_t was convenient from operational point of view but created
some difficulties dealing with IPv4 addresses - the extra 3x of u32 padding
are costly, and the "holes" mean we can not use the smaller key-value
data structures for the lookup.
This commit changes the 5tuple layout for the IPv4 case, such that
the src/dst addresses directly precede the L4 information.
That will allow to treat the same data within 40x8 key-value
structure as a 16x8 key-value structure starting with 24 byte offset.
Change-Id: Ifea8d266ca0b9c931d44440bf6dc62446c1a83ec
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
- implement a 1us purgatory for the session structures
by adding a special connection list, where all connections
about to be deleted go.
- add per-list-head timeouts updated upon the list enqueue/dequeue
for connection idle management
- add a "unused" session list with list ID#0, which should
never be used unless there is a logic error. Use this ID
to initialize the sessions.
- improve the maintainability of the session linked list
structures by using symbolic bogus index name instead of ~0
- change the ordering of session creations - first reverse, then
local. To minimize the potential for two workers competing for
the same session in the corner case of the two packets
on different workers creating the same logical session
- reduce the maximum session count to keep the memory usage the same
- add extra log/debug/trace to session cleaning logic
- be more aggressive with cleaning up sessions - wind up the
interrupts from the workers to themselves if there is more
work to do
Change-Id: I3aa1c91a925a08e83793467cb15bda178c21e426
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Ibab5e27277f618ceb2d543b9d6a1a5f191e7d1db
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
next nodes
Use the new frame-at-once functions vlib_get_buffers() and vlib_buffer_enqueue_to_next()
to calculate the buffer pointers and to dispatch the packets after the processing.
This simplifies the dataplane node processing loop.
Change-Id: I454308f847aac76a199f8dd7490c1e176414bde7
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
clib_bihash_search_40_8 for session lookups
Use inline version rather than calling the function, this gives slightly better performance.
The straighforward diff uncovered an interesting problem: the stateful ACL IPv4 unit tests would fail
for the "make test" but succeed in "make test-debug". Also, they would succeed even in "make test",
if before calling the clib_bihash_search_inline_2_40_8 we would change the code
to store the key in a temporary variable.
Debugging revealed that the generated optimized code is not what one would expect:
the zeroing of the u64s overlaying the memcpy into ipv4 value of ip46_address_t
made the optimizer not notice the latter, and think that those fields should be
always zero in the bihash, thus generating incorrect assembly for the bihash key
comparison for the ipv4 nodes.
Changing the zeroing to be non-overlapping by zeroing only the pad fields resulted
in the optimizer generating the correct code and the tests pass.
Change-Id: Ib0f55cef2b5fe70c931d17ca4dc32a5755d160cd
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
per-packet session key
Using a separate session key has proven to be tricky for the following reasons:
- it's a lot of storage to have what looks to be nearly identical to 5tuple,
just maybe with some fields swapped
- shuffling the fields from 5tuple adds to memory pressure
- the fact that the fields do not coincide with the packet memory
means for any staged processing we need to use up a lot of memory
Thus, just add two entries into the bihash table pointing to
the same session entry, so we could match the packets from either
direction.
With this we have the key layout of L3 info (which takes up
the majority of space for IPv6 case) the same as in the packet,
thus, opening up the possibility for other optimizations.
Not having to create and store a separate session key
should also give us a small performance win in itself.
Also, add the routine to show the session bihash in a better
way than a bunch of numbers.
Alas, the memory usage in the bihash obviously doubles.
Change-Id: I8fd2ed4714ad7fc447c4fa224d209bc0b736b371
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
This commit splits the functions from fa_node.c
into the pure dataplane node functions (which are multiarch-compiled),
session management node functions (which are compiled only once),
and session find/add/delete functions which are split out into the inlines.
As part of the refactoring:
- get rid of BV() macros in the affected chunk of code,
rather use the explicit bihash function names.
- add the magic trailer to the new files to
ensure make checkstyle watches them.
- move the bihash_template.c include for 40_8 bihash into acl.c
Change-Id: I4d781e9ec4307ea84e92af93c09470ea2bd0c375
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|