Lookup contexts aka "ACL as a service" {#acl_lookup_context}
======================================

The initial implementation of the ACL plugin tied the policy (L3-L4) ACLs
tightly to ingress/egress processing on an interface.

However, uses outside of pure traffic control have appeared, for example
ACL-based forwarding. Also, improved ACL lookup algorithms could benefit
from a more abstract representation that is not coupled to the interfaces.

This document describes a way to accommodate these use cases by generalizing
the ACL lookups into "ACL lookup contexts", which are not tied to specific
interfaces and are usable by other portions of the code via the exports.h
header file, which provides the necessary interface.


Why "lookup contexts" and not "match me an ACL#" ?
================================================

The first reason is the logical grouping of multiple ACLs.

The interface matching code currently allows for matching multiple ACLs
in a 'first-match' fashion. Some other use cases also fall into a similar
pattern: they attempt to match a sequence of ACLs, and the first matched ACL
determines what the outcome is, e.g. where to forward traffic. Thus,
a match never happens on an ACL in isolation, but always on a group of
ACLs.

The second reason is potential optimizations in matching.

A naive match on a series of ACLs, each represented as a vector of ACEs,
does not care about the API level: whether the API is "match one ACL" or
"match the set of ACLs", it is just a simple loop iterating over
the ACLs and returning the first match, be it in the ACL code or
in the user code.

However, for more involved lookup methods, providing a higher-level
interface that matches over the entire group of ACLs allows for future
improvements in the algorithms, delivered at once to all the users
of the API.

What is a "lookup context" ?
============================

An ACL lookup context is an entity that groups the set of ACL#s
together for the purposes of a first-match lookup, and may store
additional internal information needed to optimize the lookups
for that particular vector of ACLs.
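
Conceptually (this is an illustrative sketch, not the plugin's actual
internal structure), a lookup context can be thought of as:

```c
/* Illustrative sketch only - not the ACL plugin's actual definition. */
typedef struct
{
  u32 user_module_id;  /* the registered user module owning this context */
  u32 user_val1;       /* user-supplied u32 label, for debugging */
  u32 user_val2;       /* user-supplied u32 label, for debugging */
  u32 *acl_indices;    /* vector of ACL#s, matched in first-match order */
  /* ... plus internal state used to optimize lookups on this vector ... */
} acl_lookup_context_sketch_t;
```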

Using ACL contexts in your code
===============================

In order to use the ACL lookup contexts, you need to include
plugins/acl/exports.h in your code. This header pulls in
all the necessary dependencies.

Since you will most likely invoke this code from another plugin,
the non-inline function calls are implemented via function pointers.
You initialize them by calling acl_plugin_exports_init(&acl_plugin):
on success it returns 0 and fills in the acl_plugin structure
with pointers to the exported methods; otherwise it returns a clib_error_t
with more information about what went wrong.
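
A minimal initialization sketch (assuming the acl_plugin_methods_t type
name and the acl_plugin variable naming; consult exports.h for the
authoritative declarations):

```c
#include <plugins/acl/exports.h>

/* Structure holding the function pointers exported by the ACL plugin. */
static acl_plugin_methods_t acl_plugin;

static clib_error_t *
my_module_init (vlib_main_t * vm)
{
  clib_error_t *error;

  /* Resolve the ACL plugin's exported methods; returns 0 on success. */
  error = acl_plugin_exports_init (&acl_plugin);
  if (error)
    return error;  /* e.g. the ACL plugin is not loaded */

  return 0;
}
```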

When you have initialized the symbols, you also need to register yourself
as a user of the ACL lookups - this allows the plugin to track the ACL lookup
context ownership, as well as make the debug show outputs more user-friendly.

To do that, call acl_plugin.register_user_module(caller_module_string, val1_label, val2_label)
and record the returned value. This will be the first parameter that you pass to create a new
lookup context. The passed strings must be static; they are used as descriptions for the ACL
contexts themselves, as well as labels for up to two user-supplied u32 values, used to
differentiate the lookup contexts for debugging purposes.
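
Continuing the sketch above, inside the same init function (the module
name and label strings below are hypothetical):

```c
  /* The strings must be static, as noted above; they name this user
     module and describe the meaning of its two u32 debug values. */
  static u32 my_acl_user_id;

  my_acl_user_id = acl_plugin.register_user_module ("my-feature",
                                                    "sw_if_index",
                                                    "direction");
```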

Creating a new context is done by calling acl_plugin.get_lookup_context_index(user_id, val1, val2).
The first argument is your "user" ID obtained in a registration call earlier, the other two
arguments are u32s with semantics that you designate. They are used purely for debugging purposes
in the "show acl lookup context" command.

To set the vector of ACL numbers to be looked up within the context, use the function
acl_plugin.set_acl_vec_for_context(lc_index, acl_list). The first parameter specifies the context
that you have created; the second parameter is a vector of u32s, each being the index of an ACL
to be looked up within this context. The call is idempotent, i.e.
it unapplies the previously applied list of ACLs, and then applies the new list.
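
A sketch of applying a list of ACLs (acl_index_1/acl_index_2 are
hypothetical; whether the caller may free the vector afterwards should be
verified against the current implementation):

```c
  u32 *acl_list = 0;  /* a VPP vector of ACL indices */

  vec_add1 (acl_list, acl_index_1);
  vec_add1 (acl_list, acl_index_2);

  /* First match within the context wins, in the order given here. */
  acl_plugin.set_acl_vec_for_context (lc_index, acl_list);

  vec_free (acl_list);  /* assuming the context keeps its own copy */
```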

Subsequent updates to the already applied ACLs will cause re-application
on an as-needed basis. Note that applying an ACL is a potentially costly operation,
so these changes are expected to be done in the control plane, NOT in the datapath.

The matching within the context is done using two functions - acl_plugin.fill_5tuple() and
acl_plugin.match_5tuple() - and their corresponding inline versions, named acl_plugin_fill_5tuple_inline()
and acl_plugin_match_5tuple_inline(). The inline and non-inline versions have equivalent functionality,
in that the non-inline version simply calls the inline version. The two variants are provided
for debugging/maintenance reasons.
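
A hedged sketch of a datapath lookup; the exact parameter lists are
defined in exports.h, and the argument names below are illustrative:

```c
  /* The parameter lists shown here are an approximation - consult
     exports.h for the authoritative signatures. */
  fa_5tuple_opaque_t pkt_5tuple;
  u8 action = 0;
  u32 match_acl_index = ~0, match_rule_index = ~0;
  u32 trace_bitmap = 0;

  acl_plugin.fill_5tuple (lc_index, b0 /* vlib_buffer_t * */,
                          is_ip6, is_input, is_l2_path, &pkt_5tuple);
  if (acl_plugin.match_5tuple (lc_index, &pkt_5tuple, is_ip6, &action,
                               &match_acl_index, &match_rule_index,
                               &trace_bitmap))
    {
      /* Matched: 'action' and the matched ACL#/rule# are filled in. */
    }
```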

When you no longer need a particular context, you can return the allocated resources by calling
acl_plugin.put_lookup_context_index() to mark it as free. The lookup structures associated with
the vector of ACLs set for the context are cleaned up automatically. However, the ACLs themselves
are not deleted and remain available for subsequent reuse by other lookup contexts.
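
For example:

```c
  /* Return the context; the ACLs themselves remain available. */
  acl_plugin.put_lookup_context_index (lc_index);
  lc_index = ~0;  /* remember locally that the context is gone */
```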

There is one delicate detail that you might want to be aware of.
When the non-inline functions reference the inline functions,
they are compiled as part of the ACL plugin; whereas when you refer to the inline
functions from your code, they are compiled as part of your code.
This makes referring to a single acl_main structure a little trickier.

It is done by having a static p_acl_main within the .h file,
which points to the acl_main of the ACL plugin and is initialized by a static
constructor function.

This way the multiple includes and inlines will "just work" as one would expect.
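
A sketch of that pattern (illustrative, not the exact contents of
exports.h; vlib_get_plugin_symbol() is one existing way to resolve a
symbol exported by another plugin):

```c
/* Inside the shared header: every compilation unit that includes it
   gets its own copy of the pointer, initialized at load time. */
static void *p_acl_main = 0;

static void __attribute__ ((constructor))
acl_exports_resolve_acl_main (void)
{
  p_acl_main = vlib_get_plugin_symbol ("acl_plugin.so", "acl_main");
}
```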


Debug CLIs
==========

To see the state of the ACL lookup contexts, you can issue "show acl-plugin lookup user" to see
all of the users which have registered for the usage of the ACL plugin lookup contexts,
and "show acl-plugin lookup context" to show the actual contexts created. You will notice
that the latter command uses the values supplied during the module registration to
make the output more friendly.
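
```
vpp# show acl-plugin lookup user
vpp# show acl-plugin lookup context
```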

The "show acl-plugin acl" and "show acl-plugin interface" commands have also acquired the
notion of lookup context, but there it is used from the client perspective, since
with this change the interface ACL lookup itself is a user of ACL lookup contexts.