docs/developer/corefeatures/fib/routes.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353

.. _routes:

Routes
^^^^^^

Basics
------

The anatomy of a route is crucial to understand:

.. code-block:: console

                1.1.1.0/24 via 10.0.0.1 eth0

A route is composed of two parts; **what** to match against and **how** to forward
the matched packets. In the above example we want to match packets
whose destination IP address is in the 1.1.1.0/24 subnet and then we
want to forward those packet to 10.0.0.1 on interface eth0. We
therefore want to match the **prefix** 1.1.1.0/24 and forward on the
**path** to 10.0.0.1, eth0.

Matching on a prefix is the particular task of the IP FIB, matching on
other packet attributes is done by other subsystems, e.g. matching on
MPLS labels in the MPLS-FIB, or matching on a tuple in ACL based
forwarding (ABF), 'matching' on all packets that arrive on an L3
interface (l3XC). Although these subsystems match on different
properties, they share the infrastructure on **how** to forward
matched packets, that is they share the **paths**. The FIB paths (or
really the path-list) thus provide services to clients, this service
is to **contribute** forwarding, this, in terms that will be made
clear in later sections, is to provide the DPO to use.

The prime function of the FIB is to *resolve* the paths for a
route. To resolve a route is to construct an object graph that fully
describes how to forward matching packets. This means that the graph
must terminate with an object (the leaf node) that describes how
to send a packet on an interface [#f1]_, i.e what encap to add to the
packet and what interface to send it to; this is the purpose of the IP
adjacency object. In Figure 3 the route is resolved as the graph is
complete from *fib_entry_t* to *ip_adjacency_t*.


Thread Model
^^^^^^^^^^^^

The FIB is not thread safe. All actions on the FIB are expected to
occur exclusively in the main thread. However, the data-structures
that FIB updates to add routes are thread safe,
w.r.t. addition/deletion and read, therefore routes can be added
without holding the worker thread barrier lock.


Tables
------

An IP FIB is a set of prefixes against which to match; it is
sub-address family (SAFI) specific (i.e. there is one for ipv4 and ipv6, unicast
and multicast). An IP Table is address family (AFI) specific (i.e. the
'table' includes the unicast and multicast FIB).

Each FIB is identified by the SAFI and instance number (the [pool]
index), each table is identified by the AFI and ID. The table's ID is
assigned by the user when the table is constructed. Table ID 0 is
reserved for the global/default table.

In most routing models a VRF is composed of an IPv4 and IPv6 table,
however, VPP has no construct to model this association, it deals only
with tables and FIBs.

A unicast FIB is comprised of two route data-bases; forwarding and non-forwarding. The
forwarding data-base contains routes against which a packet will perform a longest
prefix match (LPM) in the data-plane. The non-forwarding DB contains all the routes
with which VPP has been programmed. Some of these routes may be
unresolved, preventing their insertion into the forwarding DB.
(see section: Adjacency source FIB entries).

Model
-----

The route data is decomposed into three parts; entry, path-list and paths;

* The *fib_entry_t*, which contains the route's prefix, is the representation of that prefix's entry in the FIB table.
* The *fib_path_t* is a description of where to send the packets destined to the route's prefix. There are several types of path, including:

    * Attached next-hop: the path is described with an interface and a next-hop. The next-hop is in the same sub-net as the router's own address on that interface, hence the peer is considered to be *attached*

    * Attached: the path is described only by an interface. An
      attached path means that all addresses covered by the route's
      prefix are on the same L2 segment to which that router's
      interface is attached. This means it is possible to ARP for any
      address covered by the route's prefix. If this is not the case
      then another device in that L2 segment needs to run proxy
      ARP. An attached path is really only appropriate for a point-to-point
      (P2P) interface where ARP is not required, i.e. a GRE tunnel. On
      a p2p interface, attached and attached-nexthop paths will
      resolve via a special 'auto-adjacency'. This is an adjacency
      whose next-hop is the all zeros address and describes the only
      peer on the link.

    * Recursive: The path is described only via the next-hop and table-id.

    * De-aggregate: The path is described only via the special all
      zeros address and a table-id. This implies a subsequent lookup
      in the table should be performed.

    * There are other path types, please consult the code.

* The *fib_path_list_t* represents the list of paths from which to choose when forwarding. A path-list is a shared object, i.e. it is the parent to multiple fib_entry_t children. In order to share any object type it is necessary for a child to search for an existing object matching its requirements. For this there must be a database. The key to the path-list database is a combined description of all of the paths it contains [#f2]_.  Searching the path-list database is required with each route addition, so it is populated only with path-lists for which sharing will bring convergence benefits (see Section: :ref:`fastconvergence`).

.. figure:: /_images/fib20fig2.png

Figure 2: Route data model class diagram

Figure 2 shows an example of a route with two attached-next-hop paths. Each of these
paths will *resolve* by finding the adjacency that matches the paths attributes, which
are the same as the key for the adjacency database [#f3]_. The *forwarding information (FI)*
is the set of adjacencies that are available for load-balancing the traffic in the
data-plane. A path *contributes* an adjacency to the route's forwarding information, the
path-list contributes the full forwarding information for IP packets.

.. figure:: /_images/fib20fig3.png

Figure 3: Route object diagram

Figure 3 shows the object instances and their relationships created in order to resolve
the routes also shown. The graph nature of these relationships is evident; children
are displayed at the top of the diagram, their parents below them. Forward walks are
thus from top to bottom, back walks bottom to top. The diagram shows the objects
that are shared, the path-list and adjacency. Sharing objects is critical to fast
convergence (see section :ref:`fastconvergence`).

FIB sources
"""""""""""
There are various entities in the system that can add routes to the FIB tables.
Each of these entities is termed a *source*. When the same prefix is added by different
sources the FIB must arbitrate between them to determine which source will contribute
the forwarding information. Since each source determines the forwarding information
using different best path and loop prevention algorithms, it is not correct for the
forwarding information of multiple sources to be combined. Instead the FIB must choose
to use the forwarding information from only one source. This choice is based on a static
priority assignment [#f4]_. The FIB must maintain the information each source has added
so it can be restored should that source become the best source. VPP has two
*control-plane* sources; the API and the CLI the API has the higher priority.
Each *source* data is represented by a *fib_entry_src_t* object of which a
*fib_entry_t* maintains a sorted vector.

The following configuration:

.. code-block:: console

   $ set interface ip address GigabitEthernet0/8/0 192.168.1.1/24

results in the addition of two FIB entries; 192.168.1.0/24 which is connected and
attached, and 192.168.1.1/32 which is connected and local (a.k.a.
receive or for-us). A prefix is *connected* when it is applied to a router's interface.
Both prefixes are *interface* sourced. The interface source has a high priority, so
the accidental or nefarious addition of identical prefixes does not prevent the
router from correctly forwarding. Packets matching a connected prefix will
generate an ARP request for the packets destination address, this process is known
as a *glean*.

An *attached* prefix also results in a glean, but the router does not have its own
address in that sub-net. The following configuration will result in an attached
route, which resolves via an attached path;

.. code-block:: console

   $ ip route add table X 10.10.10.0/24 via gre0

as mentioned before, these are only appropriate for point-to-point
links.

If table X is not the table to which gre0 is bound,
then this is the case of an attached export (see the section :ref:`attachedexport`).

Adjacency source FIB entries
""""""""""""""""""""""""""""

Whenever an ARP entry is created it will source a *fib_entry_t*. In this case the
route is of the form:

.. code-block:: console

   $ ip route add table X 10.0.0.1/32 via 10.0.0.1 GigabitEthernet0/8/0

This is a host prefix with a path whose next-hop address is the same host. This route
highlights the distinction between the route's prefix - a description of the traffic
to match - and the path - a description of where to send the matched traffic.
Table X is the same table to which the interface is bound. FIB entries that are
sourced by adjacencies are termed *adj-fibs*. The priority of the adjacency source
is lower than the API source, so the following configuration:

.. code-block:: console

   $ set interface address 192.168.1.1/24 GigabitEthernet0/8/0
   $ ip arp 192.168.1.2 GigabitEthernet0/8/0 dead.dead.dead
   $ ip route add 192.168.1.2 via 10.10.10.10 GigabitEthernet1/8/0

will forward traffic for 192.168.1.2 via GigabitEthernet1/8/0. That is the route added by the control
plane is favoured over the adjacency discovered by ARP. The control plane, with its
associated authentication, is considered the authoritative source. To counter the
nefarious addition of adj-fibs, through the nefarious injection of adjacencies, the
FIB is also required to ensure that only adj-fibs whose less specific covering prefix
is attached are installed in forwarding. This requires the use of *cover tracking*,
where a route maintains a dependency relationship with the route that is its less
specific cover. When this cover changes (i.e. there is a new covering route) or the
forwarding information of the cover is updated, then the covered route is notified.
Adj-fibs that fail this cover check are not installed in the fib_table_t's forwarding
table, they are only present in the non-forwarding table.

Overlapping sub-nets are not supported, so no adj-fib has multiple paths. The control
plane is expected to remove a prefix configured for an interface before the interface
changes VRF.

Recursive Routes
""""""""""""""""

Figure 4 shows the data structures used to describe a recursive route. The
representation is almost identical to attached next-hop paths. The difference
being that the *fib_path_t* has a parent that is another *fib_entry_t*, termed the
*via-entry*

.. figure:: /_images/fib20fig4.png

Figure 4: Recursive route class diagram.

In order to forward traffic to 64.10.128.0/20 the FIB must first determine how to forward
traffic to 1.1.1.1/32. This is recursive resolution. Recursive resolution, which is
essentially a cache of the data-plane result, emulates a longest prefix match for the
*via-address" 1.1.1.1 in the *via-table* table 0 [#f5]_.

Recursive resolution (RR) will source a host-prefix entry in the via-table for the
via-address. The RR source is a low priority source. In the unlikely [#f6]_ event that the
RR source is the best source, then it must derive forwarding information from its
covering prefix.

There are two cases to consider:

* The cover is connected [#f7]_. The via-address is then an attached host and the RR source can resolve directly via the adjacency with the key {via-address, interface-of-connected-cover}
* The cover is not connected [#f8]_. The RR source can directly inherit the forwarding information from its cover.

This dependency on the covering prefix means the RR source will track its cover The
covering prefix will *change* when;

* A more specific prefix is inserted. For this reason whenever an entry is inserted into a FIB table its cover must be found so that its covered dependents can be informed.
* The existing cover is removed. The covered prefixes must form a new relationship with the next less specific.

The cover will be *updated* when the route for the covering prefix is modified. The
cover tracking mechanism will provide the RR sourced entry with a notification in the
event of a change or update of the cover, and the source can take the necessary action.

The RR sourced FIB entry becomes the parent of the *fib_path_t* and will contribute its
forwarding information to that path, so that the child's FIB entry can construct its own
forwarding information.

Figure 5 shows the object instances created to represent the recursive route and
its resolving route also shown.

.. figure:: /_images/fib20fig5.png

Figure 5: Recursive Routes object diagram

If the source adding recursive routes does not itself perform recursive resolution [#f9]_
then it is possible that the source may inadvertently programme a recursion loop.

An example of a recursion loop is the following configuration:

.. code-block:: console

   $ ip route add 5.5.5.5/32 via 6.6.6.6
   $ ip route add 6.6.6.6/32 via 7.7.7.7
   $ ip route add 7.7.7.7/32 via 5.5.5.5

This shows a loop over three levels, but any number is possible. FIB will detect
recursion loops by forward walking the graph when a *fib_entry_t* forms a child-parent
relationship with a *fib_path_list_t*. The walk checks to see if the same object instances
are encountered. When a recursion loop is formed the control plane [#f10]_ graph becomes
cyclic, thus allowing the child-parent dependencies to form. This is necessary so that
when the loop breaks, the affected children and be updated.

Output labels
"""""""""""""

A route may have associated output MPLS labels [#f11]_. These are labels that are expected
to be imposed on a packet as it is forwarded. It is important to note that an MPLS
label is per-route and per-path, therefore, even though routes share paths they do not
necessarily have the same label for that path [#f12]_. A label is therefore uniquely associated
to a *fib_entry_t* and associated with one of the *fib_path_t* to which it forwards.
MPLS labels are modelled via the generic concept of a *path-extension*. A *fib_entry_t*
therefore has a vector of zero to many *fib_path_ext_t* objects to represent the labels
with which it is configured.


Delegates
^^^^^^^^^

A common software development pattern, a delegate is a means to
extend the functionality of one object through composition of
another, these other objects are called delegates. Both
**fib_entry_t** and **ip_adjacency_t** support extension via delegates.

The FIB uses delegates to add functionality when those functions are
required by only a few objects instances rather than all of them, to
save on memory. For example, building/contributing a load-balance
object used to forward non-EOS MPLS traffic is only required for a
fib_entry_t that corresponds to a BGP peer and that peer is
advertising labeled route - there are only a few of
these. See **fib_entry_delegate.h** for a full list of delegate types.


Tracking
^^^^^^^^

A prime service FIB provides for other sub-system is the ability to
'track' the forwarding for a given next-hop. For example, a tunnel
will want to know how to forward to its destination address. It can
therefore request of the FIB to track this host-prefix and inform it
when the forwarding for that prefix changes.

FIB tracking sources a host-prefix entry in the FIB using the 'recusive
resolution (RR)' source, it exactly the same way that a recursive path
does. If the entry did not previously exist, then the RR source will
inherit (and track) forwarding from its covering prefix, therefore all
packets that match this entry are forwarded in the same way as if the
entry did not exist. The tunnel that is tracking this FIB entry will
become a child dependent. The benefit to creating the entry, is that
it now exists in the FIB node graph, so all actions that happen on its
parents, are propagated to the host-prefix entry and consequently to
the tunnel.

FIB provides a wrapper to the sourcing of the host-prefix using a
delegate attached to the entry, and the entry is RR sourced only once.
. The benefit of this approach is that each time a new client tracks
the entry it doesn't RR source it. When an entry is sourced all its
children are updated. Thus, new clients tracking an entry is
O(n^2). With the tracker as indirection, the entry is sourced only once.


.. rubric:: Footnotes:

.. [#f1] Or terminate in an object that transitions the packet out of
         the FIB domain, e.g. a drop.
.. [#f2] Optimisations
.. [#f3] Note it is valid for either interface to be bound to a different table than table 1
.. [#f4] The engaged reader can see the full priority list in vnet/vnet/fib/fib_entry.h
.. [#f5] Note it is only possible to add routes via an address (i.e. a/32 or /128) not via a shorter mask prefix. There is no use case for the latter
.. [#f6] For iBGP the via-address is the loopback address of the peer PE, for eBGP it is the adj-fib for the CE
.. [#f7] As is the case ofr eBGP
.. [#f8] As is the case for iBGP
.. [#f9] If that source is relying on FIB to perform recursive resolution, then there is no reason it should do so itself.
.. [#f10] The derived data-plane graph MUST never be cyclic
.. [#f11] Advertised, e.g. by LDP, SR or BGP
.. [#f12] The only case where the labels will be the same is BGP VPNv4 label allocation per-VRF