/*
 * Copyright (c) 2016 Cisco and/or its affiliates.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/**
 * \brief
 * A IP v4/6 independent FIB.
 *
 * The main functions provided by the FIB are as follows;
 *
 *  - source priorities
 *
 *   A route can be added to the FIB by more than entity or source. Sources
 * include, but are not limited to, API, CLI, LISP, MAP, etc (for the full list
 * see fib_entry.h). Each source provides the forwarding information (FI) that
 * is has determined as required for that route. Since each source determines the
 * FI using different best  path and loop prevention algorithms, it is not
 * correct for the FI of multiple sources to be combined. Instead the FIB must
 * choose to use the FI from only one source. This choose is based on a static
 * priority assignment. For example;
 * IF a prefix is added as a result of interface configuration:
 *    set interface address 192.168.1.1/24 GigE0
 * and then it is also added from the CLI
 *    ip route 192.168.1.1/32 via 2.2.2.2/32
 * then the 'interface' source will prevail, and the route will remain as
 * 'local'.
 * The requirement of the FIB is to always install the FI from the winning
 * source and thus to maintain the FI added by losing sources so it can be
 * installed should the winning source be withdrawn.
 *
 *  - adj-fib maintenance
 *
 *   When ARP or ND discover a neighbour on a link an adjacency forms for the
 * address of that neighbour. It is also required to insert a route in the
 * appropriate FIB table, corresponding to the VRF for the link, an entry for
 * that neighbour. This entry is often referred to as an adj-fib. Adj-fibs
 * have a dedicated source; 'ADJ'.
 * The priority of the ADJ source is lower than most. This is so the following
 * config;
 *    set interface address 192.168.1.1/32 GigE0
 *    ip arp 192.168.1.2 GigE0 dead.dead.dead
 *    ip route add 192.168.1.2 via 10.10.10.10 GigE1
 * will forward traffic for 192.168.1.2 via GigE1. That is the route added
 * by the control plane is favoured over the adjacency discovered by ARP.
 * The control plane, with its associated authentication, is considered the
 * authoritative source.
 * To counter the nefarious addition of adj-fib, through the nefarious injection
 * of adjacencies, the FIB is also required to ensure that only adj-fibs whose
 * less specific covering prefix is connected are installed in forwarding. This
 * requires the use of 'cover tracking', where a route maintains a dependency
 * relationship with the route that is its less specific cover. When this cover
 * changes (i.e. there is a new covering route) or the forwarding information
 * of the cover changes, then the covered route is notified.
 *
 * Overlapping sub-nets are not supported, so no adj-fib has multiple paths.
 * The control plane is expected to remove a prefix configured for an interface
 * before the interface changes VRF.
 * So while the following config is accepted:
 *    set interface address 192.168.1.1/32 GigE0
 *    ip arp 192.168.1.2 GigE0 dead.dead.dead
 *    set interface ip table GigE0 2
 * it does not result in the desired behaviour.
 *
 *  - attached export.
 *
 * Further to adj-fib maintenance above consider the following config:
 *    set interface address 192.168.1.1/24 GigE0
 *    ip route add table 2 192.168.1.0/24 GigE0
 * Traffic destined for 192.168.1.2 in table 2 will generate an ARP request
 * on GigE0. However, since GigE0 is in table 0, all adj-fibs will be added in
 * FIB 0. Hence all hosts in the sub-net are unreachable from table 2. To resolve
 * this, all adj-fib and local prefixes are exported (i.e. copied) from the 
 * 'export' table 0, to the 'import' table 2. There can be many import tables
 * for a single export table.
 *
 *  - recursive route resolution
 *
 *   A recursive route is of the form:
 *       1.1.1.1/32 via 10.10.10.10
 * i.e. a route for which no egress interface is provided. In order to forward
 * traffic to 1.1.1.1/32 the FIB must therefore first determine how to forward
 * traffic to 10.10.10.10/32. This is recursive resolution.
 * Recursive resolution, just like normal resolution, proceeds via a longest
 * prefix match for the 'via-address' 10.10.10.10. Note it is only possible
 * to add routes via an address (i.e. a /32 or /128) not via a shorter mask
 * prefix. There is no use case for the latter.
 * Since recursive resolution proceeds via a longest prefix match, the entry
 * in the FIB that will resolve the recursive route, termed the via-entry, may
 * change as other routes are added to the FIB. Consider the recursive
 * route shown above, and this non-recursive route:
 *       10.10.10.0/24 via 192.168.16.1 GigE0
 * The entry for 10.10.10.0/24 is thus the resolving via-entry. If this entry is
 * modified, to say;
 *       10.10.10.0/24 via 192.16.1.3 GigE0
 * Then packet for 1.1.1.1/32 must also be sent to the new next-hop.
 * Now consider the addition of;
 *       10.10.10.0/28 via 192.168.16.2 GigE0
 * The more specific /28 is a better longest prefix match and thus becomes the
 * via-entry. Removal of the /28 means the resolution will revert to the /24.
 * The tracking to the changes in recursive resolution is the requirement of
 * the FIB. When the forwarding information of the via-entry changes a back-walk
 * is used to update dependent recursive routes. When new routes are added to
 * the table the cover tracking feature provides the necessary notifications to
 * the via-entry routes.
 * The adjacency constructed for 1.1.1.1/32 will be a recursive adjacency
 * whose next adjacency will be contributed from the via-entry. Maintaining
 * the validity of this recursive adjacency is a requirement of the FIB.
 *
 *  - recursive loop avoidance
 *
 * Consider this set of routes:
 *     1.1.1.1/32 via 2.2.2.2
 *     2.2.2.2/32 via 3.3.3.3
 *     3.3.3.3/32 via 1.1.1.1
 * this is termed a recursion loop - all of the routes in t<style>.highlight .hll { background-color: #ffffcc }
.highlight .c { color: #888888 } /* Comment */
.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.highlight .k { color: #008800; font-weight: bold } /* Keyword */
.highlight .ch { color: #888888 } /* Comment.Hashbang */
.highlight .cm { color: #888888 } /* Comment.Multiline */
.highlight .cp { color: #cc0000; font-weight: bold } /* Comment.Preproc */
.highlight .cpf { color: #888888 } /* Comment.PreprocFile */
.highlight .c1 { color: #888888 } /* Comment.Single */
.highlight .cs { color: #cc0000; font-weight: bold; background-color: #fff0f0 } /* Comment.Special */
.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #aa0000 } /* Generic.Error */
.highlight .gh { color: #333333 } /* Generic.Heading */
.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.highlight .go { color: #888888 } /* Generic.Output */
.highlight .gp { color: #555555 } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #666666 } /* Generic.Subheading */
.highlight .gt { color: #aa0000 } /* Generic.Traceback */
.highlight .kc { color: #008800; font-weight: bold } /* Keyword.Constant */
.highlight .kd { color: #008800; font-weight: bold } /* Keyword.Declaration */
.highlight .kn { color: #008800; font-weight: bold } /* Keyword.Namespace */
.highlight .kp { color: #008800 } /* Keyword.Pseudo */
.highlight .kr { color: #008800; font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #888888; font-weight: bold } /* Keyword.Type */
.highlight .m { color: #0000DD; font-weight: bold } /* Literal.Number */
.highlight .s { color: #dd2200; background-color: #fff0f0 } /* Literal.String */
.highlight .na { color: #336699 } /* Name.Attribute */
.highlight .nb { color: #003388 } /* Name.Builtin */
.highlight .nc { color: #bb0066; font-weight: bold } /* Name.Class */
.highlight .no { color: #003366; font-weight: bold } /* Name.Constant */
.highlight .nd { color: #555555 } /* Name.Decorator */
.highlight .ne { color: #bb0066; font-weight: bold } /* Name.Exception */
.highlight .nf { color: #0066bb; font-weight: bold } /* Name.Function */
.highlight .nl { color: #336699; font-style: italic } /* Name.Label */
.highlight .nn { color: #bb0066; font-weight: bold } /* Name.Namespace */
.highlight .py { color: #336699; font-weight: bold } /* Name.Property */
.highlight .nt { color: #bb0066; font-weight: bold } /* Name.Tag */
.highlight .nv { color: #336699 } /* Name.Variable */
.highlight .ow { color: #008800 } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mb { color: #0000DD; font-weight: bold } /* Literal.Number.Bin */
.highlight .mf { color: #0000DD; font-weight: bold } /* Literal.Number.Float */
.highlight .mh { color: #0000DD; font-weight: bold } /* Literal.Number.Hex */
.highlight .mi { color: #0000DD; font-weight: bold } /* Literal.Number.Integer */
.highlight .mo { color: #0000DD; font-weight: bold } /* Literal.Number.Oct */
.highlight .sa { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Affix */
.highlight .sb { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Backtick */
.highlight .sc { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Char */
.highlight .dl { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Delimiter */
.highlight .sd { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Doc */
.highlight .s2 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Double */
.highlight .se { color: #0044dd; background-color: #fff0f0 } /* Literal.String.Escape */
.highlight .sh { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Heredoc */
.highlight .si { color: #3333bb; background-color: #fff0f0 } /* Literal.String.Interpol */
.highlight .sx { color: #22bb22; background-color: #f0fff0 } /* Literal.String.Other */
.highlight .sr { color: #008800; background-color: #fff0ff } /* Literal.String.Regex */
.highlight .s1 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Single */
.highlight .ss { color: #aa6600; background-color: #fff0f0 } /* Literal.String.Symbol */
.highlight .bp { color: #003388 } /* Name.Builtin.Pseudo */
.highlight .fm { color: #0066bb; font-weight: bold } /* Name.Function.Magic */
.highlight .vc { color: #336699 } /* Name.Variable.Class */
.highlight .vg { color: #dd7700 } /* Name.Variable.Global */
.highlight .vi { color: #3333bb } /* Name.Variable.Instance */
.highlight .vm { color: #336699 } /* Name.Variable.Magic */
.highlight .il { color: #0000DD; font-weight: bold } /* Literal.Number.Integer.Long */</style><div class="highlight"><pre><span></span><span class="cm">/*</span>
<span class="cm"> * Copyright (c) 2018 Cisco and/or its affiliates.</span>
<span class="cm"> * Licensed under the Apache License, Version 2.0 (the &quot;License&quot;);</span>
<span class="cm"> * you may not use this file except in compliance with the License.</span>
<span class="cm"> * You may obtain a copy of the License at:</span>
<span class="cm"> *</span>
<span class="cm"> *     http://www.apache.org/licenses/LICENSE-2.0</span>
<span class="cm"> *</span>
<span class="cm"> * Unless required by applicable law or agreed to in writing, software</span>
<span class="cm"> * distributed under the License is distributed on an &quot;AS IS&quot; BASIS,</span>
<span class="cm"> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.</span>
<span class="cm"> * See the License for the specific language governing permissions and</span>
<span class="cm"> * limitations under the License.</span>
<span class="cm"> */</span>

<span class="cm">/**</span>
<span class="cm"> * Source VRF Selection matches against a packet&#39;s source address to set</span>
<span class="cm"> * the VRF in which the subsequnet destination IP address lookup is done.</span>
<span class="cm"> * If no match for the source address is found, then the RX interface&#39;s</span>
<span class="cm"> * table/VRF is used.</span>
<span class="cm"> */</span>
<span class="cp">#ifndef __SVS_H__</span>
<span class="cp">#define __SVS_H__</span>

<span class="cp">#include</span> <span class="cpf">&lt;vnet/fib/fib_types.h&gt;</span><span class="cp"></span>

<span class="cp">#define SVS_PLUGIN_VERSION_MAJOR 1</span>
<span class="cp">#define SVS_PLUGIN_VERSION_MINOR 0</span>

<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_table_add</span> <span class="p">(</span><span class="n">fib_protocol_t</span> <span class="n">fproto</span><span class="p">,</span> <span class="n">u32</span> <span class="n">table_id</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_table_delete</span> <span class="p">(</span><span class="n">fib_protocol_t</span> <span class="n">fproto</span><span class="p">,</span> <span class="n">u32</span> <span class="n">table_id</span><span class="p">);</span>

<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_route_add</span> <span class="p">(</span><span class="n">u32</span> <span class="n">table_id</span><span class="p">,</span>
			  <span class="k">const</span> <span class="n">fib_prefix_t</span> <span class="o">*</span> <span class="n">pfx</span><span class="p">,</span> <span class="n">u32</span> <span class="n">source_table_id</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_route_delete</span> <span class="p">(</span><span class="n">u32</span> <span class="n">table_id</span><span class="p">,</span> <span class="k">const</span> <span class="n">fib_prefix_t</span> <span class="o">*</span> <span class="n">pfx</span><span class="p">);</span>

<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_enable</span> <span class="p">(</span><span class="n">fib_protocol_t</span> <span class="n">fproto</span><span class="p">,</span> <span class="n">u32</span> <span class="n">table_id</span><span class="p">,</span> <span class="n">u32</span> <span class="n">sw_if_index</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">svs_disable</span> <span class="p">(</span><span class="n">fib_protocol_t</span> <span class="n">fproto</span><span class="p">,</span> <span class="n">u32</span> <span class="n">table_id</span><span class="p">,</span> <span class="n">u32</span> <span class="n">sw_if_index</span><span class="p">);</span>

<span class="k">typedef</span> <span class="nf">walk_rc_t</span> <span class="p">(</span><span class="o">*</span><span class="n">svs_walk_fn_t</span><span class="p">)</span> <span class="p">(</span><span class="n">fib_protocol_t</span> <span class="n">fproto</span><span class="p">,</span> <span class="n">u32</span> <span class="n">table_id</span><span class="p">,</span>
				    <span class="n">u32</span> <span class="n">sw_if_index</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">);</span>

<span class="k">extern</span> <span class="kt">void</span> <span class="nf">svs_walk</span> <span class="p">(</span><span class="n">svs_walk_fn_t</span> <span class="n">fn</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">);</span>

<span class="cm">/*</span>
<span class="cm"> * fd.io coding-style-patch-verification: ON</span>
<span class="cm"> *</span>
<span class="cm"> * Local Variables:</span>
<span class="cm"> * eval: (c-set-style &quot;gnu&quot;)</span>
<span class="cm"> * End:</span>
<span class="cm"> */</span>

<span class="cp">#endif</span>
</pre></div>
</code></pre></td></tr></table>
</div> <!-- class=content -->
<div id="lfcollabprojects-footer">
  <div class="gray-diagonal">
    <div class="footer-inner">
      <p>
        &copy; 2016 <a href="https://www.fd.io/">FD.io</a> a Linux Foundation
        Collaborative Project. All Rights Reserved.
      </p>
      <p>
        Linux Foundation is a registered trademark of The Linux Foundation.
        Linux is a registered
        <a
          href="http://www.linuxfoundation.org/programs/legal/trademark"
          title="Linux Mark Institute"
          >trademark</a
        >
        of Linus Torvalds.
      </p>
      <p>
        Please see our
        <a href="http://www.linuxfoundation.org/privacy">privacy policy</a> and
        <a href="http://www.linuxfoundation.org/terms">terms of use</a>
      </p>
    </div>
  </div>
</div>
</div> <!-- id=cgit -->
</body>
</html>
IC in
 * each VRF. Note though that total PIC cutover time will depend on VRF scale
 * with either trigger.
 * Primary and backup paths in this eBGP PIC-edge scenario are calculated by
 * BGP. Each peer is configured to always advertise its best external path to
 * its iBGP peers. Backup paths therefore send traffic from the PE back into the
 * core to an alternate PE. A PE may have multiple external paths, i.e. multiple
 * directly connected CEs, it may also have multiple backup PEs, however there
 * is no correlation between the two, so unlike LFA-FRR, the redundancy model is
 * N-M; N primary paths are backed-up by M backup paths - only when all primary
 * paths fail, then the cutover is performed onto the M backup paths. Note that
 * PE2 must be suitably configured to forward traffic on its external path that
 * was received from PE1. VPP FIB does not support external-internal-BGP (eiBGP)
 * load-balancing.
 *
 * As with LFA-FRR the use of primary and backup paths is not currently
 * supported, however, the use of a recursive-multi-path-adj, and a suitably
 * constrained hashing algorithm to choose from the primary or backup path sets,
 * would again provide the necessary shared object and hence the prefix scale
 * independent cutover.
 *
 * Astute readers will recognise that both of the eBGP PIC scenarios refer only
 * to a BGP free core.
 *
 * Fast convergence implementation options come in two flavours:
 *  1) Insert switches into the data-path. The switch represents the protected
 *     resource. If the switch is 'on' the primary path is taken, otherwise
 *     the backup path is taken. Testing the switch in the data-path comes with
 *     an associated performance cost. A given packet may encounter more than
 *     one protected resource as it is forwarded. This approach minimises
 *     cutover times as packets will be forwarded on the backup path as soon
 *     as the protected resource is detected to be down and the single switch
 *     is tripped. However, it comes at a performance cost, which increases
 *     with each shared resource a packet encounters in the data-path.
 *     This approach is thus best suited to LFA-FRR where the protected routes
 *     are non-recursive (i.e. encounter few shared resources) and the
 *     expectation on cutover times is more stringent (<50msecs).
 *  2) Update shared objects. Identify objects in the data-path, that are
 *     required to be present whether or not fast convergence is required (i.e.
 *     adjacencies) that can be shared by multiple routes. Create a dependency
 *     between these objects at the protected resource. When the protected
 *     resource fails, each of the shared objects is updated in a way that all
 *     users of it see a consistent change. This approach incurs no performance
 *     penalty as the data-path structure is unchanged, however, the cutover
 *     times are longer as more work is required when the resource fails. This
 *     scheme is thus more appropriate to recursive prefixes (where the packet
 *     will encounter multiple protected resources) and to fast-convergence
 *     technologies where the cutover times are less stringent (i.e. PIC).
 *
 * Implementation:
 * ---------------
 *
 * Due to the requirements outlined above, not all routes known to FIB
 * (e.g. adj-fibs) are installed in forwarding. However, should circumstances
 * change, those routes will need to be added. This adds the requirement that
 * a FIB maintains two tables per-VRF, per-AF (where a 'table' is indexed by
 * prefix); the forwarding and non-forwarding tables.
 *
 * For DP speed in VPP we want the lookup in the forwarding table to directly 
 * result in the ADJ. So the two tables; one contains all the routes (a 
 * lookup therein yields a fib_entry_t), the other contains only the forwarding 
 * routes (a lookup therein yields an ip_adjacency_t). The latter is used by the
 * DP. 
 * This trades memory for forwarding performance. A good trade-off in VPP's
 * expected operating environments.
 *
 * Note these tables are keyed only by the prefix (and since there 2 two
 * per-VRF, implicitly by the VRF too). The key for an adjacency is the
 * tuple:{next-hop, address (and it's AF), interface, link/ether-type}.
 * consider this curious, but allowed, config;
 *
 *   set int ip addr 10.0.0.1/24 Gig0
 *   set ip arp Gig0 10.0.0.2 dead.dead.dead
 *   # a host in that sub-net is routed via a better next hop (say it avoids a
 *   # big L2 domain)
 *   ip route add 10.0.0.2 Gig1 192.168.1.1
 *   # this recursive should go via Gig1
 *   ip route add 1.1.1.1/32 via 10.0.0.2
 *   # this non-recursive should go via Gig0
 *   ip route add 2.2.2.2/32 via Gig0 10.0.0.2
 *
 * for the last route, the lookup for the path (via {Gig0, 10.0.0.2}) in the
 * prefix table would not yield the correct result. To fix this we need a
 * separate table for the adjacencies.
 *
 *  - FIB data structures;
 *
 * fib_entry_t:
 *   - a representation of a route.
 *     - has a prefix.
 *    - it maintains an array of path-lists that have been contributed by the
 *      different sources
 *    - install an adjacency in the forwarding table contributed by the best
 *      source's path-list.
 *
 * fib_path_list_t:
 *   - a list of paths
 *   - path-lists may be shared between FIB entries. The path-lists are thus
 *     kept in a DB. The key is the combined description of the paths. We share
 *     path-lists  when it will aid convergence to do so. Adding path-lists to
 *     this DB that are never shared, or are not shared by prefixes that are
 *     not subject to PIC, will increase the size of the DB unnecessarily and
 *     may lead to increased search times due to hash collisions.
 *   - the path-list contributes the appropriate adj for the entry in the 
 *     forwarding table. The adj can be 'normal', multi-path or recursive,
 *     depending on the number of paths and their types.
 *   - since path-lists are shared there is only one instance of the multi-path 
 *     adj that they [may] create. As such multi-path adjacencies do not need a
 *     separate DB.
 * The path-list with recursive paths and the recursive adjacency that it
 * contributes forms the backbone of the fast convergence architecture (as 
 * described previously). 
 *
 * fib_path_t:
 *   - a description of how to forward the traffic (i.e. via {Gig1, K}).
 *   - the path describes the intent on how to forward. This differs from how 
 *     the path resolves. I.e. it might not be resolved at all (since the
 *     interface is deleted or down).
 *   - paths have different types, most notably recursive or non-recursive.
 *   - a fib_path_t will contribute the appropriate adjacency object. It is from
 *     these contributions that the DP graph/chain for the route is built.
 *   - if the path is recursive and a recursion loop is detected, then the path
 *     will contribute the special DROP adjacency. This way, whilst the control
 *     plane graph is looped, the data-plane graph does not.
 *
 * we build a graph of these objects;
 *
 *  fib_entry_t -> fib_path_list_t -> fib_path_t -> ...
 *
 * for recursive paths:
 *
 *   fib_path_t -> fib_entry_t -> ....
 *
 * for non-recursive paths
 *
 *  fib_path_t -> ip_adjacency_t -> interface
 *
 * These objects, which constitute the 'control plane' part of the FIB are used
 * to represent the resolution of a route. As a whole this is referred to as the
 * control plane graph. There is a separate DP graph to represent the forwarding
 * of a packet. In the DP graph each object represents an action that is applied
 * to a packet as it traverses the graph. For example, a lookup of a IP address
 * in the forwarding table could result in the following graph:
 *
 *    recursive-adj --> multi-path-adj --> interface_A
 *                                     --> interface_B
 *
 * A packet traversing this FIB DP graph would thus also traverse a VPP node
 * graph of:
 *
 *    ipX_recursive --> ipX_rewrite --> interface_A_tx --> etc
 *
 * The taxonomy of objects in a FIB graph is as follows, consider;
 *
 *   A -->  
 *   B --> D
 *   C -->
 *
 * Where A,B and C are (for example) routes that resolve through D. 
 *  parent; D is the parent of A, B, and C.
 *  children: A, B, and C are children of D. 
 *  sibling: A, B and C are siblings of one another.
 *
 * All shared objects in the FIB are reference counted. Users of these objects
 * are thus expected to use the add_lock/unlock semantics (as one would
 * normally use malloc/free).
 *
 * WALKS
 *
 * It is necessary to walk/traverse the graph forwards (entry to interface) to
 * perform a collapse or build a recursive adj and backwards (interface
 * to entry) to perform updates, i.e. when interface state changes or when
 * recursive route resolution updates occur.
 * A forward walk follows simply by navigating an object's parent pointer to
 * access its parent object. For objects with multiple parents (e.g. a 
 * path-list), each parent is walked in turn.
 * To support back-walks direct dependencies are maintained between objects,
 * i.e. in the relationship, {A, B, C} --> D, then object D will maintain a list
 * of 'pointers' to its children {A, B, C}. Bare C-language pointers are not 
 * allowed, so a pointer is described in terms of an object type (i.e. entry,
 * path-list, etc) and index - this allows the object to be retrieved from the
 * appropriate pool. A list is maintained to achieve fast convergence at scale.
 * When there are millions or recursive prefixes, it is very inefficient to
 * blindly walk the tables looking for entries that were affected by a given
 * topology change. The lowest hanging fruit when optimising is to remove
 * actions that are not required, so all back-walks only traverse objects that
 * are directly affected by the change.
 *
 * PIC Core and fast-reroute rely on FIB reacting quickly to an interface
 * state change to update the multi-path-adjacencies that use this interface.
 * An example graph is shown below:
 *
 *    E_a -->
 *    E_b --> PL_2 --> P_a --> Interface_A
 *    ...          --> P_c -\
 *    E_k -->                \
 *                            Interface_K
 *                            /
 *    E_l -->                /
 *    E_m --> PL_1 --> P_d -/ 
 *    ...          --> P_f --> Interface_F
 *    E_z -->
 *
 * E  = fib_entry_t
 * PL = fib_path_list_t
 * P  = fib_path_t 
 * The subscripts are arbitrary and serve only to distinguish object instances.
 * This CP graph result in the following DP graph:
 *
 *     M-ADJ-2 --> Interface_A
 *             \
 *              -> Interface_K
 *             / 
 *     M-ADJ-1 --> Interface_F
 *
 * M-ADJ = multi-path-adjacency.
 *
 * When interface K goes down a back-walk is started over its dependants in the
 * control plane graph. This back-walk will reach PL_1 and PL_2 and result in
 * the calculation of new adjacencies that have interface K removed. The walk
 * will continue to the entry objects and thus the forwarding table is updated
 * for each prefix with the new adjacency. The DP graph then becomes:
 *
 *    ADJ-3 --> Interface_A
 *
 *    ADJ-4 --> Interface_F
 * 
 * The eBGP PIC scenarios described above relied on the update of a path-list's
 * recursive-adjacency to provide the shared point of cutover. This is shown
 * below
 *
 *    E_a -->
 *    E_b --> PL_2 --> P_a --> E_44 --> PL_a --> P_b --> Interface_A
 *    ...          --> P_c -\
 *    E_k -->                \
 *                            \
 *                           E_1 --> PL_k -> P_k --> Interface_K
 *                            /
 *    E_l -->                /
 *    E_m --> PL_1 --> P_d -/ 
 *    ...          --> P_f --> E_55 --> PL_e --> P_e --> Interface_E
 *    E_z -->
 *
 * The failure scenario is the removal of entry E_1 and thus the paths P_c and
 * P_d become unresolved. To achieve PIC the two shared recursive path-lists,
 * PL_1 and PL_2 must be updated to remove E_1 from the recursive-multi-path-
 * adjacencies that they contribute, before any entry E_a to E_z is updated.
 * This means that as the update propagates backwards (right to left) in the
 * graph it must do so breadth first not depth first. Note this approach leads
 * to convergence times that are dependent on the number of path-list and so
 * the number of combinations of egress PEs - this is desirable as this
 * scale is considerably lower than the number of prefixes.
 *
 * If we consider another section of the graph that is similar to the one
 * shown above where there is another prefix E_2 in a similar position to E_1
 * and so also has many dependent children. It is reasonable to expect that a
 * particular network failure may simultaneously render E_1 and E_2 unreachable.
 * This means that the update to withdraw E_2 is download immediately after the
 * update to withdraw E_1. It is a requirement on the FIB to not spend large
 * amounts of time in a back-walk whilst processing the update for E_1, i.e. the
 * back-walk must not reach as far as E_a and its siblings. Therefore, after the
 * back-walk has traversed one generation (breadth first) to update all the
 * path-lists it should be suspended/back-ground and further updates allowed
 * to be handled. Once the update queue is empty, the suspended walks can be
 * resumed. Note that in the case that multiple updates affect the same entry
 * (say E_1) then this will trigger multiple similar walks, these are merged,
 * so each child is updated only once.
 * In the presence of more layers of recursion PIC is still a desirable
 * feature. Consider an extension to the diagram above, where more recursive
 * routes (E_100 -> E_200) are added as children of E_a:
 *
 * E_100 -->
 * E_101 --> PL_3 --> P_j-\
 * ...                     \
 * E_199 -->               E_a -->
 *                         E_b --> PL_2 --> P_a --> E_44 --> ...etc..
 *                         ...          --> P_c -\
 *                         E_k                    \
 *                                                E_1 --> ...etc..
 *                                                 /
 *                         E_l -->                /
 *                         E_m --> PL_1 --> P_d -/ 
 *                         ...          --> P_e --> E_55 --> ...etc..
 *                         E_z -->
 *
 * To achieve PIC for the routes E_100->E_199, PL_3 needs to be updated before
 * E_b -> E_z, a breadth first traversal at each level would not achieve this.
 * Instead the walk must proceed intelligently. Children on PL_2 are sorted so
 * those Entry objects that themselves have children appear first in the list,
 * those without later. When an entry object is walked that has children, a
 * walk of its children is pushed to the front background queue. The back
 * ground queue is a priority queue. As the breadth first traversal proceeds
 * across the dependent entry object E_a to E_k, when the first entry that does
 * not have children is reached (E_b), the walk is suspended and placed at the
 * back of the queue. Following this prioritisation method shared path-list
 * updates are performed before all non-resolving entry objects.
 * The CPU/core/thread that handles the updates is the same thread that handles
 * the back-walks. Handling updates has a higher priority than making walk
 * progress, so a walk is required to be interruptable/suspendable when new
 * updates are available.
 * !!! TODO - this section describes how walks should be not how they are !!!
 *
 * In the diagram above E_100 is an IP route, however, VPP has no restrictions
 * on the type of object that can be a dependent of a FIB entry. Children of
 * a FIB entry can be (and are) GRE & VXLAN tunnels endpoints, L2VPN LSPs etc.
 * By including all object types into the graph and extending the back-walk, we
 * can thus deliver fast convergence to technologies that overlay on an IP
 * network.
 *
 * If having read all the above carefully you are still thinking;  'i don't need
 * all this %&$* i have a route only I know about and I just need to jam it in',
 * then fib_table_entry_special_add() is your only friend.
 */

#ifndef __FIB_H__
#define __FIB_H__

#include <vnet/fib/fib_table.h>
#include <vnet/fib/fib_entry.h>

#endif