1 files changed, 243 insertions, 0 deletions
diff --git a/src/plugins/acl/acl_hash_lookup_doc.rst b/src/plugins/acl/acl_hash_lookup_doc.rst
new file mode 100644
index 00000000000..72842af423d
--- /dev/null
+++ b/src/plugins/acl/acl_hash_lookup_doc.rst
@@ -0,0 +1,243 @@
+ACL plugin constant-time lookup
+===============================
+
+The initial implementation of ACL plugin performs a trivial for() cycle,
+going through the assigned ACLs on a per-packet basis. This is not very
+efficient, even if for very short ACLs due to its simplicity it can beat
+more advanced methods.
+
+However, to cover the case of longer ACLs with acceptable performance,
+we need to have a better way of matching. This write-up proposes a
+mechanism to make a lookup from O(M) where M is number of entries to
+O(N) where N is number of different mask combinations.
+
+Preparation of ACL(s)
+---------------------
+
+The ACL plugin will maintain a global list of “mask types”, i.e. the
+specific configurations of “do not care” bits within the ACEs. Upon the
+creation of a new ACL, a pass will be made through all the ACEs, to
+assign and possibly allocate the “mask type number”.
+
+Each ACL has a structure *hash_acl_info_t* representing the “hash-based”
+parts of information related to that ACL, primarily the array of
+*hash_ace_info_t* structures - each of the members of that array
+corresponding to one of the rules (ACEs) in the original ACL, for this
+they have a pair of *(acl_index, ace_index)* to keep track,
+predominantly for debugging.
+
+Why do we need a whole separate structure, and are not adding new fields
+to the existing rule structure? First, encapsulation, to minimize the
+pollution of the main ACL code with the hash-based lookup artifacts.
+Second, one rule may correspond to more than one “hash-based” ACE. In
+fact, most of the rules do correspond to two of those. Why ?
+
+Consider that the current ACL lookup logic is that if a packet is not
+the initial fragment, and there is an L4 entry acting on the packet, the
+comparison will be made only on the L4 protocol field value rather than
+on the protocol and port values. This behavior is governed by
+*l4_match_nonfirst_fragment* flag in the *acl_main*, and is needed to
+maintain the compatibility with the existing software switch
+implementation.
+
+While for the sequential check in *single_acl_match_5tuple()* it is very
+easy to implement by just breaking out at the right moment, in case of
+hash-based matching this cost us two checks: one on full 5-tuple and the
+flag *pkt.is_nonfirst_fragment* being zero, the second on 3-tuple and
+the flag *pkt.is_nonfirst_fragment* being one, with the second check
+triggered by the *acl_main.l4_match_nonfirst_fragment* setting being the
+default 1. This dictates the necessity of having a “match” field in a
+given *hash_ace_info_t* element, which would reflect the value we are
+supposed to match after applying the mask.
+
+There can be other circumstances when it might be beneficial to expand
+the given rule in the original ACL into multiple - for example, as an
+optimization within the port range handling for small port ranges (this
+is not done as of the time of writing).
+
+Assigning ACLs to an interface
+------------------------------
+
+Once the ACL list is assigned to an interface, or, rather, a new ACL is
+added to the list of the existing ACLs applied to the interface, we need
+to update the bihash accelerating the lookup.
+
+All the entries for the lookups are stored within a single *48_8*
+bihash, which captures the 5-tuple from the packet as well as the
+miscellaneous per-packet information flags, e.g. *l4_valid*,
+*is_non_first_fragment*, and so on. To facilitate the use of the single
+bihash by all the interfaces, the *is_ip6*, *is_input*, *sw_if_index*
+are part of the key, as well as *mask_type_index* - the latter being
+necessary because there can be entries with the same value but different
+masks, e.g.: ``permit ::/0, permit::/128``.
+
+At the moment of an ACL being applied to an interface, we need to walk
+the list of *hash_ace_info_t* entries corresponding to that ACL, and
+update the bihash with the keys corresponding to the match values in
+these entries.
+
+The value of the hash match contains the index into a per-*sw_if_index*
+vector of *applied_ace_hash_entry_t* elements, as well as a couple of
+flags: *shadowed* (optimization: if this flag on a matched entry is
+zero, means we can stop the lookup early and declare a match - see
+below), and *need_portrange_check* - meaning that what matched was a
+superset of the actual match, and we need to perform an extra check.
+
+Also, upon insertion, we must keep in mind there can be multiple
+*applied_ace_hash_entry_t* for the same key and must keep a list of
+those. This is necessary to incrementally apply/unapply the ACLs as part
+of the ACL vector: say, two ACLs have “permit 2001:db8::1/128 any” - we
+should be able to retain the entry for the second ACL even if we have
+deleted the first one. Also, in case there are two entries with the same
+key but different port ranges, say 0..42 and 142..65535 - we need to be
+able to sequentially match on those if we decide not to expand them into
+individual port-specific entries.
+
+Per-packet lookup
+-----------------
+
+The simple single-packet lookup is defined in
+*multi_acl_match_get_applied_ace_index*, which returns the index of the
+applied hash ACE if there was a match, or ~0 if there wasn’t.
+
+The future optimized per-packet lookup may be batched in three phases:
+
+1. Prepare the keys in the per-worker vector by doing logical AND of
+   original 5-tuple record with the elements of the mask vector.
+2. Lookup the keys in the bihash in a batch manner, collecting the
+   result with lowest u64 (acl index within vector, ACE index) from the
+   hash lookup value, and performing the list walk if necessary (for
+   portranges).
+3. Take the action from the ACL record as defined by (ACL#, ACE#) from
+   the resulting lookup winner, or, if no match found, then perform
+   default deny.
+
+Shadowed/independent/redundant ACEs
+-----------------------------------
+
+During the phase of combining multiple ACLs into one rulebase, when they
+are applied to interface, we also can perform several optimizations.
+
+If a given ACE is a strict subset of another ACE located up in the
+linear search order, we can ignore this ACE completely - because by
+definition it will never match. We will call such an ACE *redundant*.
+Here is an example:
+
+::
+
+   permit 2001:db8:1::/48 2001:db8:2::/48   (B)
+   deny 2001:d8b:1:1::/64 2001:db8:2:1::/64 (A)
+
+A bit more formally, we can define this relationship of an ACE A to ACE
+B as:
+
+::
+
+   redundant(aceA, aceB) := (contains(protoB, protoA) && contains(srcB, srcA)
+                             && contains(dstB, dstA) && is_after(A, B))
+
+Here as “contains” we define an operation operating on the sets defined
+by the protocol, (srcIP, srcPortDefinition) and (dstIP,
+dstPortDefinition) respectively, and returning true if all the elements
+represented by the second argument are represented by the first
+argument. The “is_after” is true if A is located below B in the ruleset.
+
+If a given ACE does not intersect at all with any other ACE in front of
+it, we can mark it as such.
+
+Then during the sequence of the lookups the successful hit on this ACE
+means we do not need to look up other mask combinations - thus
+potentially significantly speeding up the match process. Here is an
+example, assuming we have the following ACL:
+
+::
+
+   permit 2001:db8:1::/48 2001:db8:2::/48    (B)
+   deny 2001:db8:3::/48 2001:db8:2:1::/64    (A)
+
+In this case if we match the second entry, we do not need to check
+whether we have matched the first one - the source addresses are
+completely different. We call such an ACE *independent* from another.
+
+We can define this as
+
+::
+
+   independent(aceA, aceB) := (!intersect(protoA, protoB) ||
+                               !intersect(srcA, srcB) ||
+                               !intersect(dstA, dstB))
+
+where intersect is defined as operation returning true if there are
+elements belonging to the sets of both arguments.
+
+If the entry A is neither redundant nor independent from B, and is below
+B in the ruleset, we call such an entry *shadowed* by B, here is an
+example:
+
+::
+
+   deny tcp 2001:db8:1::/48 2001:db8:2::/48         (B)
+   permit 2001:d8b:1:1::/64 2001:db8:2:1::/64    (A)
+
+This means the earlier rule “carves out” a subset of A, thus leaving a
+“shadow”. (Evidently, the action needs to be different for the shadow to
+have an effect, but for for the terminology sake we do not care).
+
+The more formal definition:
+
+::
+
+   shadowed(aceA, aceB) := !redundant(aceA, aceB) &&
+                           !independent(aceA, aceB) &&
+                           is_after(aceA, aceB)
+
+Using this terminology, any ruleset can be represented as a DAG
+(Directed Acyclic Graph), with the bottom being the implicit “deny any”,
+pointing to the set of rules shadowing it or the ones it is redundant
+for.
+
+These rules may in turn be shadowing each other. There is no cycles in
+this graph because of the natural order of the rules - the rule located
+closer to the end of the ruleset can never shadow or make redundant a
+rule higher up.
+
+The optimization that enables can allow for is to skip matching certain
+masks on a per-lookup basis - if a given rule has matched, the only
+adjustments that can happen is the match with one of the shadowing
+rules.
+
+Also, another avenue for the optimization can be starting the lookup
+process with the mask type that maximizes the chances of the independent
+ACE match, thus resulting in an ACE lookup being a single hash table
+hit.
+
+Plumbing
+--------
+
+All the new routines are located in a separate file, so we can cleanly
+experiment with a different approach if this does not fit all of the use
+cases.
+
+The constant-time lookup within the data path has the API with the same
+signature as:
+
+::
+
+   u8
+   multi_acl_match_5tuple (u32 sw_if_index, fa_5tuple_t * pkt_5tuple, int is_l2,
+                          int is_ip6, int is_input, u32 * acl_match_p,
+                          u32 * rule_match_p, u32 * trace_bitmap)
+
+There should be a new upper-level function with the same signature,
+which will make a decision whether to use a linear lookup, or to use the
+constant-time lookup implemented by this work, or to add some other
+optimizations (e.g. by keeping the cache of the last N lookups).
+
+The calls to the routine doing preparatory work should happen in
+``acl_add_list()`` after creating the linear-lookup structures, and the
+routine doing the preparatory work populating the hashtable should be
+called from ``acl_interface_add_del_inout_acl()`` or its callees.
+
+The initial implementation will be geared towards looking up a single
+match at a time, with the subsequent optimizations possible to make the
+lookup for more than one packet.