summaryrefslogtreecommitdiffstats
path: root/src/plugins/map/map_doc.rst
diff options
context:
space:
mode:
Diffstat (limited to 'src/plugins/map/map_doc.rst')
-rw-r--r--src/plugins/map/map_doc.rst99
1 files changed, 99 insertions, 0 deletions
diff --git a/src/plugins/map/map_doc.rst b/src/plugins/map/map_doc.rst
new file mode 100644
index 00000000000..663e815d545
--- /dev/null
+++ b/src/plugins/map/map_doc.rst
@@ -0,0 +1,99 @@
+MAP and Lw4o6
+=============
+
+This is a memo intended to contain documentation of the VPP MAP and
+Lw4o6 implementations. Everything that is not directly obvious should
+come here.
+
+MAP-E Virtual Reassembly
+------------------------
+
+The MAP-E implementation supports handling of IPv4 fragments as well as
+IPv4-in-IPv6 inner and outer fragments. This is called virtual
+reassembly because the fragments are not actually reassembled. Instead,
+some meta-data are kept about the first fragment and reused for
+subsequent fragments.
+
+Fragment caching and handling is not always necessary. It is performed
+when: \* An IPv4 fragment is received and the destination IPv4 address
+is shared. \* An IPv6 packet is received with an inner IPv4 fragment,
+the IPv4 source address is shared, and ‘security-check fragments’ is on.
+\* An IPv6 fragment is received.
+
+There are 3 dedicated nodes: \* ip4-map-reass \* ip6-map-ip4-reass \*
+ip6-map-ip6-reass
+
+ip4-map sends all fragments to ip4-map-reass. ip6-map sends all
+inner-fragments to ip6-map-ip4-reass. ip6-map sends all outer-fragments
+to ip6-map-ip6-reass.
+
+IPv4 (resp. IPv6) virtual reassembly makes use of a hash table in order
+to store IPv4 (resp. IPv6) reassembly structures. The hash-key is based
+on the IPv4-src:IPv4-dst:Frag-ID:Protocol tuple (resp.
+IPv6-src:IPv6-dst:Frag-ID tuple, as the protocol is IPv4-in-IPv6).
+Therefore, each packet reassembly makes use of exactly one reassembly
+structure. When such a structure is allocated, it is timestamped with
+the current time. Finally, those structures are capable of storing a
+limited number of buffer indexes.
+
+An IPv4 (resp. IPv6) reassembly structure can cache up to
+MAP_IP4_REASS_MAX_FRAGMENTS_PER_REASSEMBLY (resp.
+MAP_IP6_REASS_MAX_FRAGMENTS_PER_REASSEMBLY) buffers. Buffers are cached
+until the first fragment is received.
+
+Virtual Reassembly configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+IPv4 and IPv6 virtual reassembly support the following configuration:
+map params reassembly [ip4 \| ip6] [lifetime ] [pool-size ] [buffers ]
+[ht-ratio ]
+
+lifetime: The time in milliseconds a reassembly structure is considered
+valid. The longer, the more reliable is reassembly, but the more likely
+it is to exhaust the pool of reassembly structures. IPv4 standard
+suggests a lifetime of 15 seconds. IPv6 specifies a lifetime of 60
+seconds. Those values are not realistic for high-throughput cases.
+
+buffers: The upper limit of buffers that are allowed to be cached. It
+can be used to protect against fragmentation attacks which would aim to
+exhaust the global buffers pool.
+
+pool-size: The number of reassembly structures that can be allocated. As
+each structure can store a small fixed number of fragments, it also sets
+an upper-bound of ‘pool-size \*
+MAP_IPX_REASS_MAX_FRAGMENTS_PER_REASSEMBLY’ buffers that can be cached
+in total.
+
+ht-ratio: The amount of buckets in the hash-table is pool-size \*
+ht-ratio.
+
+Any time pool-size and ht-ratio is modified, the hash-table is destroyed
+and created again, which means all current state is lost.
+
+Additional considerations
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Reassembly at high rate is expensive in terms of buffers. There is a
+trade-off between the lifetime and number of allocated buffers. Reducing
+the lifetime helps, but at the cost of loosing state for fragments that
+are wide apart.
+
+Let: R be the packet rate at which fragments are received. F be the
+number of fragments per packet.
+
+Assuming the first fragment is always received last. We should have:
+buffers > lifetime \* R / F \* (F - 1) pool-size > lifetime \* R/F
+
+This is a worst case. Receiving the first fragment earlier helps
+reducing the number of required buffers. Also, an optimization is
+implemented (MAP_IP6_REASS_COUNT_BYTES and MAP_IP4_REASS_COUNT_BYTES)
+which counts the number of transmitted bytes and remembers the total
+number of bytes which should be transmitted based on the last fragment,
+and therefore helps reducing ‘pool-size’.
+
+But the formula shows that it is challenging to forward a significant
+amount of fragmented packets at high rates. For instance, with a
+lifetime of 1 second, 5Mpps packet rate would require buffering up to
+2.5 millions fragments.
+
+If you want to do that, be prepared to configure a lot of fragments.