path: root/src/vppinfra/bihash_template.c
Age | Commit message | Author | Files | Lines
2021-05-06 | vppinfra: fix tests | Damjan Marion | 1 | -0/+4
Type: fix Change-Id: If59a66aae658dd35dbcb4987ab00c306b3c6e2e2 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-04-18 | vppinfra: remove linux/syscall.h | Damjan Marion | 1 | -1/+1
For portability reasons it is better to have all wrapped in clib code. I.e. instead of using getcpu() we have clib_get_current_numa_node () and clib_get_current_cpu_id(). Type: refactor Change-Id: I29b52d7f29bc7f93873402c4070561f564b71c63 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-10-02 | vppinfra: Function to check if a bihash has been initialised | Neale Ranns | 1 | -0/+5
Type: improvement Signed-off-by: Neale Ranns <nranns@cisco.com> Change-Id: Ic31f7721f326ca9d78d645abcea63ce58df5bd5b
2020-09-30 | vpp: update 'show bihash' command | Damjan Marion | 1 | -11/+14
Type: improvement Change-Id: I6d00ba840d2168af0658f97c45a42d39be7cbbad Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-09-30 | vppinfra: use heap to store bihash data | Damjan Marion | 1 | -25/+165
Type: improvement Change-Id: Ifb0fa114414aa2fdc244f964612ca3ac3e29b5e1 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-08-17 | vppinfra: fix RC in bihash instantiation | Nathan Skrzypczak | 1 | -2/+2
There can be a race condition in the case a thread tries to do a bihash_search while another instantiates the bihash. Type: fix Change-Id: Ic61b590763beb409e112957c43a5a66cd10afb28 Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
2020-08-06 | vppinfra: harmonize function names | Dave Barach | 1 | -2/+2
Type: fix Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: Icce7eab4510785e15bdcf97e4d1881b0f46f6899
2020-05-27 | vppinfra: fix SIGBUS in bihash init when running unprivileged, take two | Damjan Marion | 1 | -1/+1
Looks like MAP_LOCK is not enough, so call mlock(...) instead.... Type: fix Change-Id: I1bc668a2bf3c861ca1c2d376c0fb6bfea87d4f48 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-05-24 | vppinfra: fix SIGBUS in bihash init when running unprivileged | Damjan Marion | 1 | -1/+1
Observed when VPP is running in k8s container. Type: fix Change-Id: Ibbff9c3921bd7f4f97d47cb6f10eed8ed5efe269 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-23 | nat: add/del ed_ext_ports only if the table is instantiated | Dave Barach | 1 | -0/+3
Add a suitable ASSERT in the bihash template in case this happens again. Type: fix Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: Ib370d4238f6bae2995bc30fd17fad5c41053c3d1
2020-04-23 | vppinfra: more bihash optimizations | Damjan Marion | 1 | -54/+20
* Avoid doing expensive bit extraction for most likely case where bucket .log2_page_size == 0 and .linear_search == 0, saves 3-5 cycles for lookup, data_prefetch and add operation * use bextr instruction when available (x86 BMI instruction set) Type: improvement Change-Id: I163df36a29287482c5f133be8b21d62a2f7440de Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-22 | vppinfra: improve bihash add/del performance | Damjan Marion | 1 | -19/+27
Measured improvement is from 439 to 167 clocks for add operation in 16_8 case... Type: improvement Change-Id: I975ff46ff30b983a3ec80a5cde25ccb68d7fa03b Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-21 | vppinfra: bihash improvements | Dave Barach | 1 | -10/+119
Template instances can allocate BIHASH_KVP_PER_PAGE data records tangent to the bucket, to remove a dependent read / prefetch. Template instances can ask for immediate memory allocation, to avoid several branches in the lookup path. Clean up l2 fib, gpb plugin codes: use clib_bihash_get_bucket(...) Use hugepages for bihash allocation arenas Type: improvement Signed-off-by: Dave Barach <dave@barachs.net> Signed-off-by: Damjan Marion <damarion@cisco.com> Change-Id: I92fc11bc58e48d84e2d61f44580916dd1c56361c
2020-03-27 | vppinfra: add clib_bihash_get_bucket | Damjan Marion | 1 | -6/+2
Type: improvement Change-Id: I073bb7bea2a55eabbb6c253b003966f0a821e4a3 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-02-05 | vppinfra: numa vector placement support | Dave Barach | 1 | -1/+0
Type: feature Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: I7e7d95a089dd849c1f01ecea84529d8dbf239f21
2020-01-16 | vppinfra: fixing compilation issues in 32-bit | Vijayabhaskar Katamreddy | 1 | -2/+3
Fixing compilation issues for 32-bit; also setting init flag for shm based bihash Type: fix Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com> Change-Id: Ic2072c5ba7fc77d061ca9f1b844a71f6e22e58b2
2019-12-16 | vppinfra: bihash walk cb typedef and continue/stop controls | Neale Ranns | 1 | -3/+4
Type: feature Change-Id: I28f7a658be3f3beec9ea32635b60d1d3a10d9b06 Signed-off-by: Neale Ranns <nranns@cisco.com>
2019-09-03 | vppinfra: add bihash_init2 | Dave Barach | 1 | -20/+40
Add controls to list / not list a specific bihash in clib_all_bihashes, to immediately initialize a bihash. clib_bihash_init2 is now the primary API. It takes a typical args_t structure. clib_bihash_init becomes a compatibility widget. It fabricates an args_t and calls init2... Type: refactor Ticket: VPP-1758 Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: Ib3e1304884997cf7025af20bdc67a7dda290f15b
2019-08-01 | vppinfra: make first bihash add thread-safe | Dave Barach | 1 | -7/+24
Type: fix Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: Ie37ff66faba79e3b8f46c7a704137f9ef2acc773
2019-07-19 | vppinfra: fix OOM check in bihash | Andreas Schultz | 1 | -1/+1
The OOM check must consider the end of alloced arena and not the start when checking for overflow. Type: fix Change-Id: Ie83e653d0894199d2fa433a604a0fe0cee142338 Signed-off-by: Andreas Schultz <andreas.schultz@travelping.com>
2019-07-11 | vppinfra: bihash add-but-do-not-overwrite semantics | Dave Barach | 1 | -0/+7
If is_add=2, fail w/ return value -2 if the key exists instead of overwriting the (key,value) pair. Type: feature Change-Id: I00a3c194a381c68090369c31d6c6f9870cfe0a62 Signed-off-by: Dave Barach <dave@barachs.net>
2019-07-09 | vppinfra: allocate bihash virtual space on demand | Dave Barach | 1 | -11/+63
Reduces the vpp image virtual size by multiple gigabytes Add a "show bihash" command which displays configured and current virtual space in use by bihash tables. Modify the .py test framework to call "show bihash" on test tear-down Type: refactor Change-Id: Ifc1b7e2c43d29bbef645f6802fa29ff8ef09940c Signed-off-by: Dave Barach <dave@barachs.net>
2019-05-07 | bihash: Freeing up working_copy_lengths vector | Vijayabhaskar Katamreddy | 1 | -1/+2
1) Freeing up working_copy_lengths vector 2) Passing verbosity level to fmt_fn Change-Id: I5e3f541e2f8cc0150105cc35835366f84937bb2e Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com>
2019-05-07 | Add bihash statistics hook | Dave Barach | 1 | -0/+18
Example / unit-test in .../src/plugins/unittest/bihash_test.c Change-Id: I23fd0ba742d65291667a755965aee1a3d3477ca2 Signed-off-by: Dave Barach <dave@barachs.net>
2019-04-17 | Use template-specific key compare fn when deleting records | Dave Barach | 1 | -2/+2
A simple memcmp won't work when comparing pointer-keys, such as those used by the bihash_vec8_8.h template. Change-Id: I77e59f3fd7f7740ef42908ace90ed4843e1c9ac7 Signed-off-by: Dave Barach <dave@barachs.net>
2019-03-15 | Fix bihash bucket double unlock. | Tom Seidenberg | 1 | -2/+1
Change-Id: Icc9bef32d1bb2b8f277598c50c69343c81f22cd2 Signed-off-by: Tom Seidenberg <tseidenb@cisco.com>
2018-11-14 | Remove c-11 memcpy checks from perf-critical code | Dave Barach | 1 | -11/+12
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1 Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23 | c11 safe string handling support | Dave Barach | 1 | -15/+15
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab Signed-off-by: Dave Barach <dave@barachs.net>
2018-09-20 | bihash template: avoid memory leak upon rehash | Andrew Yourtchenko | 1 | -0/+3
Call the BV (value_free) when we have performed the rehash and thus no longer need the memory that old value for the bucket refers to. Change-Id: Ibb82174fc8002aeb3e1a6c8d1f90293d73bc45d8 Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
2018-09-19 | bihash template: reinstate the check for the available memory in the arena | Andrew Yourtchenko | 1 | -1/+1
ffb14b9554afa1e58c3657e0c91dda3135008274 has changed the semantics of alloc_arena_next to become an offset off alloc_arena, but in the available memory check in BV (alloc_aligned) it still treats it as a virtual address, resulting in the check always succeeding, thus over a prolonged period bihash arena allocator potentially overwriting whatever is following the arena. Change-Id: I18882c5f340ca767a389e15cca2696a0a97ef015 Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
2018-09-11 | bihash 32/64 bit shared memory interop | Dave Barach | 1 | -33/+33
This patch makes 32/64 bit interoperable shared memory bihash tables work regardless of where they're mapped. Change-Id: If5b4a37ccdaa75410eba755c7d7195633de1b30b Signed-off-by: Dave Barach <dave@barachs.net>
2018-08-28 | 32/64 shmem bihash interoperability | Dave Barach | 1 | -18/+154
Move the binary api segment above 4gb Change-Id: I40e8aa7a97722a32397f5a538b5ff8344c50d408 Signed-off-by: Dave Barach <dave@barachs.net>
2018-08-23 | bihash: remove unused counters | Damjan Marion | 1 | -4/+0
Change-Id: I1f0aae16e4ace850d7d79b9c2c644a3e0d002636 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-08-22 | bihash: add support for reuse of expired entry when bucket is full (VPP-1272) | Matus Fabian | 1 | -2/+30
Applications such as NAT that dynamically create entries require these entries to expire after some time. Bihash user can now lazily delete expired entries. When inserting and bucket is full, expired entry is overwritten. Change-Id: I6852305df399b546159407f1729c856afde5a634 Signed-off-by: Matus Fabian <matfabia@cisco.com>
2018-08-06 | fix dangling reference in foreach_key_value_pair | Dave Barach | 1 | -0/+7
When the user deletes the last entry in a bihash bucket, the bihash infra frees the bucket's backing storage. If this happens under clib_bihash_foreach_key_value_pair - and the freed bucket happens to be the bucket being traversed - the resulting dangling reference can easily make the wheels fall off. Simple fix: if (bucket-is-now-empty) double-break. Change-Id: Idc44247a82ed5d0ba548507b4a53d4c8503ba8bb Signed-off-by: Dave Barach <dave@barachs.net>
2018-07-20 | bihash: give hint to CPU that we are spinlocking | Damjan Marion | 1 | -1/+1
Change-Id: I78c0a6da5d8fc63c1ced43589c42abc15ab12b16 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-07-20 | Fine-grained add / delete locking | Dave Barach | 1 | -119/+95
Add a bucket-level lock bit. Use a spinlock only when actually allocating, freeing, or splitting a bucket. Should improve multi-thread add/del performance. Change-Id: I3e40e2a8371685457f340d6584dea14e3207f2b0 Signed-off-by: Dave Barach <dave@barachs.net>
2018-07-18 | vppinfra: increase max bihash arena size to 512GB | Damjan Marion | 1 | -4/+4
Change-Id: Ic636297df4c03303fdcb176669f0268d80e22123 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-02-22 | bihash table size perf/scale improvements | Dave Barach | 1 | -30/+46
Directly allocate and carve cache-line-aligned chunks of virtual memory. To a first approximation, bihash wasn't using clib_mem_free(...). We eliminate mheap object header/trailers, which improves space efficiency. We also eliminate the 4gb bihash table size limit. An 8_8 bihash w/ 100 million random entries uses 3.8 Gbytes. Change-Id: Icf925fdf99bce7d6ac407ac4edd30560b8f04808 Signed-off-by: Dave Barach <dave@barachs.net>
2018-02-08 | Minimize bihash memory consumption | Dave Barach | 1 | -9/+46
Reference-count the number of entries in each bucket. If the reference count goes to zero, free the backing store. Add long-term churn-testing to test_bihash_template.c, thanks to Andrew Yourtchenko for the initial implementation. Change-Id: I4fbd9229cacfaba8027a85cbf87b74afdead6e39 Signed-off-by: Dave Barach <dave@barachs.net>
2018-01-24 | Adding a format function for bihash init routine to format the key, value, when verbose option is used | Vijayabhaskar Katamreddy | 1 | -3/+20
Change-Id: Ib63ead4525332f897b8a1d8a4cf5a0eb1da1e7f3 Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com>
2017-11-09 | lock init | JingLiuZTE | 1 | -0/+1
writer_lock must be inited before used. Change-Id: Ib258aa09b3bccc4de6edba0eb75a7eec20f1a61f Signed-off-by: JingLiuZTE <liu.jing5@zte.com.cn>
2017-09-06 | Fixes for issues raised by Coverity (VPP-972) | Chris Luke | 1 | -1/+2
Change-Id: I4b1f27b95d67d48b7a13750ff8754c344ed7afa7 Signed-off-by: Chris Luke <chrisy@flirble.org>
2017-08-31 | Fix BIHASH_KVP_CACHE_SIZE == 0 case | Dave Barach | 1 | -1/+13
Setting the bucket-level LRU cache size to zero removes the bucket-level LRU cache code. Change-Id: Idf2e63d0d508675e957366515863766f79a3479c Signed-off-by: Dave Barach <dbarach@cisco.com>
2017-07-23 | Atomic bucket lock | Dave Barach | 1 | -13/+14
Change-Id: I84908b9ad30d7555024e98b69ed37b111f31c27a Signed-off-by: Dave Barach <dbarach@cisco.com>
2017-07-19 | Add a bihash prefetchable bucket-level cache | Dave Barach | 1 | -8/+70
According to Maciek, the easiest way to leverage the csit "performance trend" job is to actually merge the patch once verified. Manual testing indicates that the patch improves l2 path performance. Other use-cases are TBD. It's possible that we'll need to back out the patch depending on what happens. Change-Id: Ic0a0363de35ef9be953ad7709c57c3936b73fd5a Signed-off-by: Dave Barach <dave@barachs.net>
2017-06-05 | More GCC-7 errors | Marco Varlese | 1 | -0/+2
The Wmaybe-uninitialized is the new error included with Wall. This patch addresses the warning and fixes it. Change-Id: I8fdf9ff2d236c46b717024a14874fbbbad8af303 Signed-off-by: Marco Varlese <marco.varlese@suse.com>
2017-06-02 | Fix mac_age process crash in multi-threaded environment | Steve Shin | 1 | -2/+1
VPP crash is observed when MAC aging is enabled with multi-threaded mode. If a thread other-than-zero expands the working_copies vector, working_copy_lengths should be initialized with vec_validate_init_empty(..., -1) to fill -1 across lower-numbered working_copy_lengths vector element. Change-Id: I60959fc6511306b33acae323df9c6898fc6c50ce Signed-off-by: Steve Shin <jonshin@cisco.com>
2017-05-18 | VPP-847: improve bihash template memory allocator performance | Dave Barach | 1 | -27/+43
Particularly in the DCLIB_VEC64=1 case, using vectors vs. raw clib_mem_alloc'ed memory causes abysmal memory allocator performance. Change-Id: I07a4dec0cd69ca357445385e2671cdf23c59b95d Signed-off-by: Dave Barach <dave@barachs.net>
2017-05-10 | completely deprecate os_get_cpu_number, replace new occurrences | Damjan Marion | 1 | -8/+8
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d Signed-off-by: Damjan Marion <damarion@cisco.com>
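
The 2019-07-11 entry above describes the `is_add` modes of bihash add/del: 0 deletes, 1 adds or overwrites, and 2 adds only when the key is absent, returning -2 otherwise. The sketch below illustrates those semantics on a toy fixed-size table. It is not the bihash implementation: `toy_table_t`, `toy_init`, `toy_add_del` and `toy_search` are hypothetical names, a linear scan stands in for real hashing, and the -1 return code for other failures is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8

// One (key, value) record of the toy table.
typedef struct {
  uint64_t key;
  uint64_t value;
  int in_use;
} toy_kv_t;

typedef struct {
  toy_kv_t slots[TOY_SLOTS];
} toy_table_t;

static void toy_init(toy_table_t *t) { memset(t, 0, sizeof *t); }

// is_add: 0 = delete, 1 = add or overwrite, 2 = add only if the key is absent.
// Returns 0 on success and -2 if is_add == 2 and the key already exists (the
// value named in the log entry). Returning -1 for any other failure is an
// assumption of this toy, not something taken from bihash.
static int toy_add_del(toy_table_t *t, uint64_t key, uint64_t value,
                       int is_add) {
  toy_kv_t *free_slot = 0;
  for (int i = 0; i < TOY_SLOTS; i++) {
    toy_kv_t *s = &t->slots[i];
    if (s->in_use && s->key == key) {
      if (is_add == 0) {
        s->in_use = 0;  // delete the matching record
        return 0;
      }
      if (is_add == 2)
        return -2;      // key exists: refuse to overwrite
      s->value = value; // is_add == 1: overwrite in place
      return 0;
    }
    if (!s->in_use && !free_slot)
      free_slot = s;    // remember the first free slot
  }
  if (is_add == 0 || !free_slot)
    return -1;          // nothing to delete, or table full
  free_slot->key = key;
  free_slot->value = value;
  free_slot->in_use = 1;
  return 0;
}

// Returns 0 and stores the value if the key is present, -1 otherwise.
static int toy_search(const toy_table_t *t, uint64_t key, uint64_t *value) {
  for (int i = 0; i < TOY_SLOTS; i++)
    if (t->slots[i].in_use && t->slots[i].key == key) {
      *value = t->slots[i].value;
      return 0;
    }
  return -1;
}
```

A caller that must not clobber an existing entry would pass 2 and treat -2 as a sign that the key was inserted earlier, leaving the stored value untouched.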
/*
 * Copyright (c) 2017 Cisco and/or its affiliates.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * @file
 * @brief IPv6 Full Reassembly.
 *
 * This file contains the source code for IPv6 full reassembly.
 */

#include <vppinfra/vec.h>
#include <vnet/vnet.h>
#include <vnet/ip/ip.h>
#include <vppinfra/bihash_48_8.h>
#include <vnet/ip/reass/ip6_full_reass.h>
#include <vnet/ip/ip6_inlines.h>

#define MSEC_PER_SEC 1000
#define IP6_FULL_REASS_TIMEOUT_DEFAULT_MS 200
/* There are only 1024 reassembly contexts per thread, so either a DDoS
 * attack or a fraction of genuine timeouts could consume them quickly,
 * exhausting the context space and making reassembly impossible. */
#define IP6_FULL_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS 50 // 50 ms default
#define IP6_FULL_REASS_MAX_REASSEMBLIES_DEFAULT 1024
#define IP6_FULL_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT 3
#define IP6_FULL_REASS_HT_LOAD_FACTOR (0.75)

typedef enum
{
  IP6_FULL_REASS_RC_OK,
  IP6_FULL_REASS_RC_INTERNAL_ERROR,
  IP6_FULL_REASS_RC_TOO_MANY_FRAGMENTS,
  IP6_FULL_REASS_RC_NO_BUF,
  IP6_FULL_REASS_RC_HANDOFF,
  IP6_FULL_REASS_RC_INVALID_FRAG_LEN,
  IP6_FULL_REASS_RC_OVERLAP,
} ip6_full_reass_rc_t;

typedef struct
{
  union
  {
    struct
    {
      ip6_address_t src;
      ip6_address_t dst;
      u32 xx_id;
      u32 frag_id;
      u8 unused[7];
      u8 proto;
    };
    u64 as_u64[6];
  };
} ip6_full_reass_key_t;

typedef union
{
  struct
  {
    u32 reass_index;
    u32 memory_owner_thread_index;
  };
  u64 as_u64;
} ip6_full_reass_val_t;

typedef union
{
  struct
  {
    ip6_full_reass_key_t k;
    ip6_full_reass_val_t v;
  };
  clib_bihash_kv_48_8_t kv;
} ip6_full_reass_kv_t;
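
/* Illustrative sketch only, kept inside a comment so that it does not affect
 * this file. The key type above overlays six 64-bit words on its named
 * fields, which lets a key be copied into a 48-byte hash key word by word.
 * The standalone code below mirrors that layout with plain C types.
 * toy_addr_t, toy_key_t and toy_key_to_words are hypothetical names, and the
 * sketch assumes that ip6_address_t occupies 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for ip6_address_t (assumed to be 16 bytes).
typedef struct {
  uint8_t as_u8[16];
} toy_addr_t;

// Same field order as the key: 16 + 16 + 4 + 4 + 7 + 1 = 48 bytes.
typedef struct {
  union {
    struct {
      toy_addr_t src;
      toy_addr_t dst;
      uint32_t xx_id;
      uint32_t frag_id;
      uint8_t unused[7];
      uint8_t proto;
    };
    uint64_t as_u64[6];
  };
} toy_key_t;

// The word view must cover the named fields exactly.
_Static_assert(sizeof(toy_key_t) == 48, "key must be 48 bytes");

// Copy the key word by word, the way a 48-byte hash key would be filled.
static void toy_key_to_words(const toy_key_t *k, uint64_t out[6]) {
  for (int i = 0; i < 6; i++)
    out[i] = k->as_u64[i];
}
```

 * Zeroing a key before filling it in keeps the unused bytes equal between
 * keys, so two keys with equal fields also compare equal word by word. */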


always_inline u32
ip6_full_reass_buffer_get_data_offset (vlib_buffer_t * b)
{
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  return vnb->ip.reass.range_first - vnb->ip.reass.fragment_first;
}

always_inline u16
ip6_full_reass_buffer_get_data_len (vlib_buffer_t * b)
{
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  return clib_min (vnb->ip.reass.range_last, vnb->ip.reass.fragment_last) -
    (vnb->ip.reass.fragment_first +
     ip6_full_reass_buffer_get_data_offset (b)) + 1;
}
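
/* Illustrative sketch only, kept inside a comment so that it does not affect
 * this file. The two helpers above work out which part of a fragment's
 * payload belongs to a range: the offset is range_first - fragment_first,
 * and the length runs from that point to min(range_last, fragment_last),
 * inclusive. The standalone code below repeats the arithmetic on a plain
 * struct. toy_reass_meta_t, toy_data_offset and toy_data_len are
 * hypothetical names, and the field descriptions are an assumption drawn
 * from how the helpers use the fields.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the ip.reass fields read by the helpers.
typedef struct {
  uint16_t fragment_first; // assumed: first octet carried by the fragment
  uint16_t fragment_last;  // assumed: last octet carried by the fragment
  uint16_t range_first;    // assumed: first octet of the range kept from it
  uint16_t range_last;     // assumed: last octet of the range kept from it
} toy_reass_meta_t;

// Octets to skip at the start of the fragment payload.
static uint32_t toy_data_offset(const toy_reass_meta_t *m) {
  return (uint32_t)m->range_first - m->fragment_first;
}

// Octets kept, counted inclusively up to min(range_last, fragment_last).
static uint16_t toy_data_len(const toy_reass_meta_t *m) {
  uint16_t last =
      m->range_last < m->fragment_last ? m->range_last : m->fragment_last;
  return (uint16_t)(last - (m->fragment_first + toy_data_offset(m)) + 1);
}
```

 * For a fragment covering octets 1000..1999 whose range starts at 1200, the
 * offset is 200 and the length is 1999 - 1200 + 1 = 800. */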

typedef struct
{
  // hash table key
  ip6_full_reass_key_t key;
  // time when last packet was received
  f64 last_heard;
  // internal id of this reassembly
  u64 id;
  // buffer index of first buffer in this reassembly context
  u32 first_bi;
  // last octet of packet, ~0 until fragment without more_fragments arrives
  u32 last_packet_octet;
  // length of data collected so far
  u32 data_len;
  // trace operation counter
  u32 trace_op_counter;
  // next index - used by custom apps (~0 if not set)
  u32 next_index;
  // error next index - used by custom apps (~0 if not set)
  u32 error_next_index;
  // minimum fragment length for this reassembly - used to estimate MTU
  u16 min_fragment_length;
  // number of fragments for this reassembly
  u32 fragments_n;
  // thread owning memory for this context (whose pool contains this ctx)
  u32 memory_owner_thread_index;
  // thread which received fragment with offset 0 and which sends out the
  // completed reassembly
  u32 sendout_thread_index;
} ip6_full_reass_t;

typedef struct
{
  ip6_full_reass_t *pool;
  u32 reass_n;
  u32 id_counter;
  // for pacing the main thread timeouts
  u32 last_id;
  clib_spinlock_t lock;
} ip6_full_reass_per_thread_t;

typedef struct
{
  // IPv6 config
  u32 timeout_ms;
  f64 timeout;
  u32 expire_walk_interval_ms;
  // maximum number of fragments in one reassembly
  u32 max_reass_len;
  // maximum number of reassemblies
  u32 max_reass_n;

  // IPv6 runtime
  clib_bihash_48_8_t hash;

  // per-thread data
  ip6_full_reass_per_thread_t *per_thread_data;

  // convenience
  vlib_main_t *vlib_main;

  u32 ip6_icmp_error_idx;
  u32 ip6_full_reass_expire_node_idx;

  /** Worker handoff */
  u32 fq_index;
  u32 fq_local_index;
  u32 fq_feature_index;
  u32 fq_custom_index;

  // reference count for enabling/disabling feature - per interface
  u32 *feature_use_refcount_per_intf;

  // whether local fragmented packets are reassembled or not
  int is_local_reass_enabled;
} ip6_full_reass_main_t;

extern ip6_full_reass_main_t ip6_full_reass_main;

#ifndef CLIB_MARCH_VARIANT
ip6_full_reass_main_t ip6_full_reass_main;
#endif /* CLIB_MARCH_VARIANT */

typedef enum
{
  IP6_FULL_REASSEMBLY_NEXT_INPUT,
  IP6_FULL_REASSEMBLY_NEXT_DROP,
  IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR,
  IP6_FULL_REASSEMBLY_NEXT_HANDOFF,
  IP6_FULL_REASSEMBLY_N_NEXT,
} ip6_full_reass_next_t;

typedef enum
{
  NORMAL,
  FEATURE,
  CUSTOM
} ip6_full_reass_node_type_t;

typedef enum
{
  RANGE_NEW,
  RANGE_DISCARD,
  RANGE_OVERLAP,
  ICMP_ERROR_RT_EXCEEDED,
  ICMP_ERROR_FL_TOO_BIG,
  ICMP_ERROR_FL_NOT_MULT_8,
  FINALIZE,
  HANDOFF,
  PASSTHROUGH,
} ip6_full_reass_trace_operation_e;

typedef struct
{
  u16 range_first;
  u16 range_last;
  u32 range_bi;
  i32 data_offset;
  u32 data_len;
  u32 first_bi;
} ip6_full_reass_range_trace_t;

typedef struct
{
  ip6_full_reass_trace_operation_e action;
  u32 reass_id;
  ip6_full_reass_range_trace_t trace_range;
  u32 op_id;
  u32 fragment_first;
  u32 fragment_last;
  u32 total_data_len;
  u32 thread_id;
  u32 thread_id_to;
  bool is_after_handoff;
  ip6_header_t ip6_header;
  ip6_frag_hdr_t ip6_frag_header;
} ip6_full_reass_trace_t;

static void
ip6_full_reass_trace_details (vlib_main_t * vm, u32 bi,
			      ip6_full_reass_range_trace_t * trace)
{
  vlib_buffer_t *b = vlib_get_buffer (vm, bi);
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  trace->range_first = vnb->ip.reass.range_first;
  trace->range_last = vnb->ip.reass.range_last;
  trace->data_offset = ip6_full_reass_buffer_get_data_offset (b);
  trace->data_len = ip6_full_reass_buffer_get_data_len (b);
  trace->range_bi = bi;
}

static u8 *
format_ip6_full_reass_range_trace (u8 * s, va_list * args)
{
  ip6_full_reass_range_trace_t *trace =
    va_arg (*args, ip6_full_reass_range_trace_t *);
  s =
    format (s, "range: [%u, %u], off %d, len %u, bi %u", trace->range_first,
	    trace->range_last, trace->data_offset, trace->data_len,
	    trace->range_bi);
  return s;
}

static u8 *
format_ip6_full_reass_trace (u8 * s, va_list * args)
{
  CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *);
  CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *);
  ip6_full_reass_trace_t *t = va_arg (*args, ip6_full_reass_trace_t *);
  u32 indent = 0;
  if (~0 != t->reass_id)
    {
      if (t->is_after_handoff)
	{
	  s =
	    format (s, "%U\n", format_ip6_header, &t->ip6_header,
		    sizeof (t->ip6_header));
	  s =
	    format (s, "  %U\n", format_ip6_frag_hdr, &t->ip6_frag_header,
		    sizeof (t->ip6_frag_header));
	  indent = 2;
	}
      s =
	format (s, "%Ureass id: %u, op id: %u, ", format_white_space, indent,
		t->reass_id, t->op_id);
      indent = format_get_indent (s);
      s = format (s, "first bi: %u, data len: %u, ip/fragment[%u, %u]",
		  t->trace_range.first_bi, t->total_data_len,
		  t->fragment_first, t->fragment_last);
    }
  switch (t->action)
    {
    case RANGE_NEW:
      s = format (s, "\n%Unew %U", format_white_space, indent,
		  format_ip6_full_reass_range_trace, &t->trace_range);
      break;
    case RANGE_DISCARD:
      s = format (s, "\n%Udiscard %U", format_white_space, indent,
		  format_ip6_full_reass_range_trace, &t->trace_range);
      break;
    case RANGE_OVERLAP:
      s = format (s, "\n%Uoverlap %U", format_white_space, indent,
		  format_ip6_full_reass_range_trace, &t->trace_range);
      break;
    case ICMP_ERROR_FL_TOO_BIG:
      s = format (s, "\n%Uicmp-error - frag_len > 65535 %U",
		  format_white_space, indent,
		  format_ip6_full_reass_range_trace, &t->trace_range);
      break;
    case ICMP_ERROR_FL_NOT_MULT_8:
      s = format (s, "\n%Uicmp-error - frag_len mod 8 != 0 %U",
		  format_white_space, indent,
		  format_ip6_full_reass_range_trace, &t->trace_range);
      break;
    case ICMP_ERROR_RT_EXCEEDED:
      s = format (s, "\n%Uicmp-error - reassembly time exceeded",
		  format_white_space, indent);
      break;
    case FINALIZE:
      s = format (s, "\n%Ufinalize reassembly", format_white_space, indent);
      break;
    case HANDOFF:
      s =
	format (s, "handoff from thread #%u to thread #%u", t->thread_id,
		t->thread_id_to);
      break;
    case PASSTHROUGH:
      s = format (s, "passthrough - not a fragment");
      break;
    }
  return s;
}

static void
ip6_full_reass_add_trace (vlib_main_t * vm, vlib_node_runtime_t * node,
			  ip6_full_reass_t * reass, u32 bi,
			  ip6_frag_hdr_t * ip6_frag_header,
			  ip6_full_reass_trace_operation_e action,
			  u32 thread_id_to)
{
  vlib_buffer_t *b = vlib_get_buffer (vm, bi);
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  bool is_after_handoff = false;
  if (pool_is_free_index
      (vm->trace_main.trace_buffer_pool, vlib_buffer_get_trace_index (b)))
    {
      // this buffer's trace is gone
      b->flags &= ~VLIB_BUFFER_IS_TRACED;
      return;
    }
  if (vlib_buffer_get_trace_thread (b) != vm->thread_index)
    {
      is_after_handoff = true;
    }
  ip6_full_reass_trace_t *t = vlib_add_trace (vm, node, b, sizeof (t[0]));
  t->is_after_handoff = is_after_handoff;
  if (t->is_after_handoff)
    {
      clib_memcpy (&t->ip6_header, vlib_buffer_get_current (b),
		   clib_min (sizeof (t->ip6_header), b->current_length));
      if (ip6_frag_header)
	{
	  clib_memcpy (&t->ip6_frag_header, ip6_frag_header,
		       sizeof (t->ip6_frag_header));
	}
      else
	{
	  clib_memset (&t->ip6_frag_header, 0, sizeof (t->ip6_frag_header));
	}
    }
  if (reass)
    {
      t->reass_id = reass->id;
      t->op_id = reass->trace_op_counter;
      t->trace_range.first_bi = reass->first_bi;
      t->total_data_len = reass->data_len;
      ++reass->trace_op_counter;
    }
  else
    {
      t->reass_id = ~0;
    }
  t->action = action;
  t->thread_id = vm->thread_index;
  t->thread_id_to = thread_id_to;
  ip6_full_reass_trace_details (vm, bi, &t->trace_range);
  t->fragment_first = vnb->ip.reass.fragment_first;
  t->fragment_last = vnb->ip.reass.fragment_last;
#if 0
  static u8 *s = NULL;
  s = format (s, "%U", format_ip6_full_reass_trace, NULL, NULL, t);
  printf ("%.*s\n", vec_len (s), s);
  fflush (stdout);
  vec_reset_length (s);
#endif
}

always_inline void
ip6_full_reass_free_ctx (ip6_full_reass_per_thread_t * rt,
			 ip6_full_reass_t * reass)
{
  pool_put (rt->pool, reass);
  --rt->reass_n;
}

always_inline void
ip6_full_reass_free (ip6_full_reass_main_t * rm,
		     ip6_full_reass_per_thread_t * rt,
		     ip6_full_reass_t * reass)
{
  clib_bihash_kv_48_8_t kv;
  kv.key[0] = reass->key.as_u64[0];
  kv.key[1] = reass->key.as_u64[1];
  kv.key[2] = reass->key.as_u64[2];
  kv.key[3] = reass->key.as_u64[3];
  kv.key[4] = reass->key.as_u64[4];
  kv.key[5] = reass->key.as_u64[5];
  clib_bihash_add_del_48_8 (&rm->hash, &kv, 0);
  ip6_full_reass_free_ctx (rt, reass);
}

/* n_left_to_next and to_next are taken as input parameters because this
 * function may be called from a graph node that manages its own local copies
 * of these variables. Ignoring those copies and enqueueing buffers through
 * separate local variables would cause either a buffer leak or corruption. */
always_inline void
ip6_full_reass_drop_all (vlib_main_t *vm, vlib_node_runtime_t *node,
			 ip6_full_reass_t *reass, u32 *n_left_to_next,
			 u32 **to_next)
{
  u32 range_bi = reass->first_bi;
  vlib_buffer_t *range_b;
  vnet_buffer_opaque_t *range_vnb;
  u32 *to_free = NULL;

  while (~0 != range_bi)
    {
      range_b = vlib_get_buffer (vm, range_bi);
      range_vnb = vnet_buffer (range_b);

      if (~0 != range_bi)
	{
	  vec_add1 (to_free, range_bi);
	}
      range_bi = range_vnb->ip.reass.next_range_bi;
    }

  /* send to error_next_index */
  if (~0 != reass->error_next_index &&
      reass->error_next_index < node->n_next_nodes)
    {
      u32 next_index;

      next_index = reass->error_next_index;
      u32 bi = ~0;

      /* record number of packets sent to custom app */
      vlib_node_increment_counter (vm, node->node_index,
				   IP6_ERROR_REASS_TO_CUSTOM_APP,
				   vec_len (to_free));

      while (vec_len (to_free) > 0)
	{
	  vlib_get_next_frame (vm, node, next_index, *to_next,
			       (*n_left_to_next));

	  while (vec_len (to_free) > 0 && (*n_left_to_next) > 0)
	    {
	      bi = vec_pop (to_free);

	      if (~0 != bi)
		{
		  vlib_buffer_t *b = vlib_get_buffer (vm, bi);
		  if (PREDICT_FALSE (b->flags & VLIB_BUFFER_IS_TRACED))
		    {
		      ip6_full_reass_add_trace (vm, node, reass, bi, NULL,
						RANGE_DISCARD, ~0);
		    }
		  *to_next[0] = bi;
		  (*to_next) += 1;
		  (*n_left_to_next) -= 1;
		}
	    }
	  vlib_put_next_frame (vm, node, next_index, (*n_left_to_next));
	}
    }
  else
    {
      vlib_buffer_free (vm, to_free, vec_len (to_free));
    }
  vec_free (to_free);
}
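
/* Illustrative sketch only, kept inside a comment so that it does not affect
 * this file. ip6_full_reass_drop_all above first walks the chain of ranges,
 * following next_range_bi until it reaches ~0, and gathers the buffer
 * indices before it either enqueues or frees them. The standalone code below
 * shows only that walk over an index-linked chain. toy_buffer_t,
 * TOY_INVALID_INDEX and toy_collect_chain are hypothetical names, and a
 * fixed-capacity output array stands in for the vector used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_INDEX (~0u) // same sentinel the code above spells as ~0

// Hypothetical stand-in for a buffer whose metadata links to the next range.
typedef struct {
  uint32_t next_range_bi;
} toy_buffer_t;

// Walk the chain starting at first_bi and store each visited index in out[].
// Returns the number of indices collected (at most out_cap).
static size_t toy_collect_chain(const toy_buffer_t *bufs, uint32_t first_bi,
                                uint32_t *out, size_t out_cap) {
  size_t n = 0;
  uint32_t bi = first_bi;
  while (bi != TOY_INVALID_INDEX && n < out_cap) {
    out[n++] = bi;
    bi = bufs[bi].next_range_bi;
  }
  return n;
}
```

 * Collecting the indices first means the chain is no longer needed once
 * individual buffers start being handed off or freed. */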

always_inline void
sanitize_reass_buffers_add_missing (vlib_main_t *vm, ip6_full_reass_t *reass,
				    u32 *bi0)
{
  u32 range_bi = reass->first_bi;
  vlib_buffer_t *range_b;
  vnet_buffer_opaque_t *range_vnb;

  while (~0 != range_bi)
    {
      range_b = vlib_get_buffer (vm, range_bi);
      range_vnb = vnet_buffer (range_b);
      u32 bi = range_bi;
      if (~0 != bi)
	{
	  if (bi == *bi0)
	    *bi0 = ~0;
	  if (range_b->flags & VLIB_BUFFER_NEXT_PRESENT)
	    {
	      u32 _bi = bi;
	      vlib_buffer_t *_b = vlib_get_buffer (vm, _bi);
	      while (_b->flags & VLIB_BUFFER_NEXT_PRESENT)
		{
		  if (_b->next_buffer != range_vnb->ip.reass.next_range_bi)
		    {
		      _bi = _b->next_buffer;
		      _b = vlib_get_buffer (vm, _bi);
		    }
		  else
		    {
		      _b->flags &= ~VLIB_BUFFER_NEXT_PRESENT;
		      break;
		    }
		}
	    }
	  range_bi = range_vnb->ip.reass.next_range_bi;
	}
    }
  if (*bi0 != ~0)
    {
      vlib_buffer_t *fb = vlib_get_buffer (vm, *bi0);
      vnet_buffer_opaque_t *fvnb = vnet_buffer (fb);
      if (~0 != reass->first_bi)
	{
	  fvnb->ip.reass.next_range_bi = reass->first_bi;
	  reass->first_bi = *bi0;
	}
      else
	{
	  reass->first_bi = *bi0;
	  fvnb->ip.reass.next_range_bi = ~0;
	}
      *bi0 = ~0;
    }
}

always_inline void
ip6_full_reass_on_timeout (vlib_main_t *vm, vlib_node_runtime_t *node,
			   ip6_full_reass_t *reass, u32 *icmp_bi,
			   u32 *n_left_to_next, u32 **to_next)
{
  if (~0 == reass->first_bi)
    {
      return;
    }
  if (~0 == reass->next_index)	// custom apps don't want icmp
    {
      vlib_buffer_t *b = vlib_get_buffer (vm, reass->first_bi);
      if (0 == vnet_buffer (b)->ip.reass.fragment_first)
	{
	  *icmp_bi = reass->first_bi;
	  if (PREDICT_FALSE (b->flags & VLIB_BUFFER_IS_TRACED))
	    {
	      ip6_full_reass_add_trace (vm, node, reass, reass->first_bi, NULL,
					ICMP_ERROR_RT_EXCEEDED, ~0);
	    }
	  // fragment with offset zero received - send icmp message back
	  if (b->flags & VLIB_BUFFER_NEXT_PRESENT)
	    {
	      // separate first buffer from chain and steer it towards icmp node
	      b->flags &= ~VLIB_BUFFER_NEXT_PRESENT;
	      reass->first_bi = b->next_buffer;
	    }
	  else
	    {
	      reass->first_bi = vnet_buffer (b)->ip.reass.next_range_bi;
	    }
	  icmp6_error_set_vnet_buffer (b, ICMP6_time_exceeded,
				       ICMP6_time_exceeded_fragment_reassembly_time_exceeded,
				       0);
	}
    }
  ip6_full_reass_drop_all (vm, node, reass, n_left_to_next, to_next);
}

always_inline ip6_full_reass_t *
ip6_full_reass_find_or_create (vlib_main_t *vm, vlib_node_runtime_t *node,
			       ip6_full_reass_main_t *rm,
			       ip6_full_reass_per_thread_t *rt,
			       ip6_full_reass_kv_t *kv, u32 *icmp_bi,
			       u8 *do_handoff, int skip_bihash,
			       u32 *n_left_to_next, u32 **to_next)
{
  ip6_full_reass_t *reass;
  f64 now;

again:

  reass = NULL;
  now = vlib_time_now (vm);

  if (!skip_bihash && !clib_bihash_search_48_8 (&rm->hash, &kv->kv, &kv->kv))
    {
      if (vm->thread_index != kv->v.memory_owner_thread_index)
	{
	  *do_handoff = 1;
	  return NULL;
	}

      reass =
	pool_elt_at_index (rm->per_thread_data
			   [kv->v.memory_owner_thread_index].pool,
			   kv->v.reass_index);

      if (now > reass->last_heard + rm->timeout)
	{
	  vlib_node_increment_counter (vm, node->node_index,
				       IP6_ERROR_REASS_TIMEOUT, 1);
	  ip6_full_reass_on_timeout (vm, node, reass, icmp_bi, n_left_to_next,
				     to_next);
	  ip6_full_reass_free (rm, rt, reass);
	  reass = NULL;
	}
    }

  if (reass)
    {
      reass->last_heard = now;
      return reass;
    }

  if (rt->reass_n >= rm->max_reass_n)
    {
      reass = NULL;
      return reass;
    }
  else
    {
      pool_get (rt->pool, reass);
      clib_memset (reass, 0, sizeof (*reass));
      reass->id = ((u64) vm->thread_index * 1000000000) + rt->id_counter;
      ++rt->id_counter;
      reass->first_bi = ~0;
      reass->last_packet_octet = ~0;
      reass->data_len = 0;
      reass->next_index = ~0;
      reass->error_next_index = ~0;
      reass->memory_owner_thread_index = vm->thread_index;
      ++rt->reass_n;
    }

  kv->v.reass_index = (reass - rt->pool);
  kv->v.memory_owner_thread_index = vm->thread_index;
  reass->last_heard = now;

  if (!skip_bihash)
    {
      reass->key.as_u64[0] = kv->kv.key[0];
      reass->key.as_u64[1] = kv->kv.key[1];
      reass->key.as_u64[2] = kv->kv.key[2];
      reass->key.as_u64[3] = kv->kv.key[3];
      reass->key.as_u64[4] = kv->kv.key[4];
      reass->key.as_u64[5] = kv->kv.key[5];

      // is_add == 2: add only if the key is absent, else return -2
      int rv = clib_bihash_add_del_48_8 (&rm->hash, &kv->kv, 2);
      if (rv)
	{
	  ip6_full_reass_free (rm, rt, reass);
	  reass = NULL;
	  // if another worker already created a context, work with that copy
	  if (-2 == rv)
	    goto again;
	}
    }
  else
    {
      reass->key.as_u64[0] = ~0;
      reass->key.as_u64[1] = ~0;
      reass->key.as_u64[2] = ~0;
      reass->key.as_u64[3] = ~0;
      reass->key.as_u64[4] = ~0;
      reass->key.as_u64[5] = ~0;
    }

  return reass;
}

always_inline ip6_full_reass_rc_t
ip6_full_reass_finalize (vlib_main_t * vm, vlib_node_runtime_t * node,
			 ip6_full_reass_main_t * rm,
			 ip6_full_reass_per_thread_t * rt,
			 ip6_full_reass_t * reass, u32 * bi0, u32 * next0,
			 u32 * error0, bool is_custom_app)
{
  *bi0 = reass->first_bi;
  *error0 = IP6_ERROR_NONE;
  ip6_frag_hdr_t *frag_hdr;
  vlib_buffer_t *last_b = NULL;
  u32 sub_chain_bi = reass->first_bi;
  u32 total_length = 0;
  u32 *vec_drop_compress = NULL;
  ip6_full_reass_rc_t rv = IP6_FULL_REASS_RC_OK;
  do
    {
      u32 tmp_bi = sub_chain_bi;
      vlib_buffer_t *tmp = vlib_get_buffer (vm, tmp_bi);
      vnet_buffer_opaque_t *vnb = vnet_buffer (tmp);
      if (!(vnb->ip.reass.range_first >= vnb->ip.reass.fragment_first) &&
	  !(vnb->ip.reass.range_last > vnb->ip.reass.fragment_first))
	{
	  rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
	  goto free_buffers_and_return;
	}

      u32 data_len = ip6_full_reass_buffer_get_data_len (tmp);
      u32 trim_front = vnet_buffer (tmp)->ip.reass.ip6_frag_hdr_offset +
	sizeof (*frag_hdr) + ip6_full_reass_buffer_get_data_offset (tmp);
      u32 trim_end =
	vlib_buffer_length_in_chain (vm, tmp) - trim_front - data_len;
      if (tmp_bi == reass->first_bi)
	{
	  /* first buffer - keep ip6 header */
	  if (0 != ip6_full_reass_buffer_get_data_offset (tmp))
	    {
	      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
	      goto free_buffers_and_return;
	    }
	  trim_front = 0;
	  trim_end = vlib_buffer_length_in_chain (vm, tmp) - data_len -
	    (vnet_buffer (tmp)->ip.reass.ip6_frag_hdr_offset +
	     sizeof (*frag_hdr));
	  if (!(vlib_buffer_length_in_chain (vm, tmp) - trim_end > 0))
	    {
	      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
	      goto free_buffers_and_return;
	    }
	}
      u32 keep_data =
	vlib_buffer_length_in_chain (vm, tmp) - trim_front - trim_end;
      while (1)
	{
	  if (trim_front)
	    {
	      if (trim_front > tmp->current_length)
		{
		  /* drop whole buffer */
		  if (!(tmp->flags & VLIB_BUFFER_NEXT_PRESENT))
		    {
		      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
		      goto free_buffers_and_return;
		    }
		  trim_front -= tmp->current_length;
		  vec_add1 (vec_drop_compress, tmp_bi);
		  tmp->flags &= ~VLIB_BUFFER_NEXT_PRESENT;
		  tmp_bi = tmp->next_buffer;
		  tmp = vlib_get_buffer (vm, tmp_bi);
		  continue;
		}
	      else
		{
		  vlib_buffer_advance (tmp, trim_front);
		  trim_front = 0;
		}
	    }
	  if (keep_data)
	    {
	      if (last_b)
		{
		  last_b->flags |= VLIB_BUFFER_NEXT_PRESENT;
		  last_b->next_buffer = tmp_bi;
		}
	      last_b = tmp;
	      if (keep_data <= tmp->current_length)
		{
		  tmp->current_length = keep_data;
		  keep_data = 0;
		}
	      else
		{
		  keep_data -= tmp->current_length;
		  if (!(tmp->flags & VLIB_BUFFER_NEXT_PRESENT))
		    {
		      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
		      goto free_buffers_and_return;
		    }
		}
	      total_length += tmp->current_length;
	    }
	  else
	    {
	      if (reass->first_bi == tmp_bi)
		{
		  rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
		  goto free_buffers_and_return;
		}
	      vec_add1 (vec_drop_compress, tmp_bi);
	    }
	  if (tmp->flags & VLIB_BUFFER_NEXT_PRESENT)
	    {
	      tmp_bi = tmp->next_buffer;
	      tmp = vlib_get_buffer (vm, tmp->next_buffer);
	    }
	  else
	    {
	      break;
	    }
	}
      sub_chain_bi =
	vnet_buffer (vlib_get_buffer (vm, sub_chain_bi))->ip.
	reass.next_range_bi;
    }
  while (~0 != sub_chain_bi);

  if (!last_b)
    {
      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
      goto free_buffers_and_return;
    }
  last_b->flags &= ~VLIB_BUFFER_NEXT_PRESENT;
  vlib_buffer_t *first_b = vlib_get_buffer (vm, reass->first_bi);
  if (total_length < first_b->current_length)
    {
      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
      goto free_buffers_and_return;
    }
  total_length -= first_b->current_length;
  first_b->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;
  first_b->total_length_not_including_first_buffer = total_length;
  // drop fragment header
  vnet_buffer_opaque_t *first_b_vnb = vnet_buffer (first_b);
  ip6_header_t *ip = vlib_buffer_get_current (first_b);
  u16 ip6_frag_hdr_offset = first_b_vnb->ip.reass.ip6_frag_hdr_offset;
  ip6_ext_hdr_chain_t hdr_chain;
  ip6_ext_header_t *prev_hdr = 0;
  int res = ip6_ext_header_walk (first_b, ip, IP_PROTOCOL_IPV6_FRAGMENTATION,
				 &hdr_chain);
  if (res < 0 ||
      (hdr_chain.eh[res].protocol != IP_PROTOCOL_IPV6_FRAGMENTATION))
    {
      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
      goto free_buffers_and_return;
    }
  frag_hdr = ip6_ext_next_header_offset (ip, hdr_chain.eh[res].offset);
  if (res > 0)
    {
      prev_hdr = ip6_ext_next_header_offset (ip, hdr_chain.eh[res - 1].offset);
      prev_hdr->next_hdr = frag_hdr->next_hdr;
    }
  else
    {
      ip->protocol = frag_hdr->next_hdr;
    }
  if (hdr_chain.eh[res].offset != ip6_frag_hdr_offset)
    {
      rv = IP6_FULL_REASS_RC_INTERNAL_ERROR;
      goto free_buffers_and_return;
    }
  memmove (frag_hdr, (u8 *) frag_hdr + sizeof (*frag_hdr),
	   first_b->current_length - ip6_frag_hdr_offset -
	   sizeof (ip6_frag_hdr_t));
  first_b->current_length -= sizeof (*frag_hdr);
  ip->payload_length =
    clib_host_to_net_u16 (total_length + first_b->current_length -
			  sizeof (*ip));
  if (!vlib_buffer_chain_linearize (vm, first_b))
    {
      rv = IP6_FULL_REASS_RC_NO_BUF;
      goto free_buffers_and_return;
    }
  first_b->flags &= ~VLIB_BUFFER_EXT_HDR_VALID;
  if (PREDICT_FALSE (first_b->flags & VLIB_BUFFER_IS_TRACED))
    {
      ip6_full_reass_add_trace (vm, node, reass, reass->first_bi, NULL,
				FINALIZE, ~0);
#if 0
      // following code does a hexdump of packet fragments to stdout ...
      do
	{
	  u32 bi = reass->first_bi;
	  u8 *s = NULL;
	  while (~0 != bi)
	    {
	      vlib_buffer_t *b = vlib_get_buffer (vm, bi);
	      s = format (s, "%u: %U\n", bi, format_hexdump,
			  vlib_buffer_get_current (b), b->current_length);
	      if (b->flags & VLIB_BUFFER_NEXT_PRESENT)
		{
		  bi = b->next_buffer;
		}
	      else
		{
		  break;
		}
	    }
	  printf ("%.*s\n", vec_len (s), s);
	  fflush (stdout);
	  vec_free (s);
	}
      while (0);
#endif
    }
  if (!is_custom_app)
    {
      *next0 = IP6_FULL_REASSEMBLY_NEXT_INPUT;
    }
  else
    {
      *next0 = reass->next_index;
    }
  vnet_buffer (first_b)->ip.reass.estimated_mtu = reass->min_fragment_length;
  /* Keep track of number of successfully reassembled packets and number of
   * fragments reassembled */
  vlib_node_increment_counter (vm, node->node_index, IP6_ERROR_REASS_SUCCESS,
			       1);

  vlib_node_increment_counter (vm, node->node_index,
			       IP6_ERROR_REASS_FRAGMENTS_REASSEMBLED,
			       reass->fragments_n);

  ip6_full_reass_free (rm, rt, reass);
  reass = NULL;
free_buffers_and_return:
  vlib_buffer_free (vm, vec_drop_compress, vec_len (vec_drop_compress));
  vec_free (vec_drop_compress);
  return rv;
}

always_inline void
ip6_full_reass_insert_range_in_chain (vlib_main_t * vm,
				      ip6_full_reass_t * reass,
				      u32 prev_range_bi, u32 new_next_bi)
{

  vlib_buffer_t *new_next_b = vlib_get_buffer (vm, new_next_bi);
  vnet_buffer_opaque_t *new_next_vnb = vnet_buffer (new_next_b);
  if (~0 != prev_range_bi)
    {
      vlib_buffer_t *prev_b = vlib_get_buffer (vm, prev_range_bi);
      vnet_buffer_opaque_t *prev_vnb = vnet_buffer (prev_b);
      new_next_vnb->ip.reass.next_range_bi = prev_vnb->ip.reass.next_range_bi;
      prev_vnb->ip.reass.next_range_bi = new_next_bi;
    }
  else
    {
      if (~0 != reass->first_bi)
	{
	  new_next_vnb->ip.reass.next_range_bi = reass->first_bi;
	}
      reass->first_bi = new_next_bi;
    }
  reass->data_len += ip6_full_reass_buffer_get_data_len (new_next_b);
}

always_inline ip6_full_reass_rc_t
ip6_full_reass_update (vlib_main_t *vm, vlib_node_runtime_t *node,
		       ip6_full_reass_main_t *rm,
		       ip6_full_reass_per_thread_t *rt,
		       ip6_full_reass_t *reass, u32 *bi0, u32 *next0,
		       u32 *error0, ip6_frag_hdr_t *frag_hdr,
		       bool is_custom_app, u32 *handoff_thread_idx,
		       int skip_bihash)
{
  int consumed = 0;
  vlib_buffer_t *fb = vlib_get_buffer (vm, *bi0);
  vnet_buffer_opaque_t *fvnb = vnet_buffer (fb);
  if (is_custom_app)
    {
      reass->next_index = fvnb->ip.reass.next_index;	// store next_index before it's overwritten
      reass->error_next_index = fvnb->ip.reass.error_next_index;	// store error_next_index before it is overwritten
    }

  fvnb->ip.reass.ip6_frag_hdr_offset =
    (u8 *) frag_hdr - (u8 *) vlib_buffer_get_current (fb);
  ip6_header_t *fip = vlib_buffer_get_current (fb);
  if (fb->current_length < sizeof (*fip) ||
      fvnb->ip.reass.ip6_frag_hdr_offset == 0 ||
      fvnb->ip.reass.ip6_frag_hdr_offset >= fb->current_length)
    {
      return IP6_FULL_REASS_RC_INTERNAL_ERROR;
    }

  u32 fragment_first = fvnb->ip.reass.fragment_first =
    ip6_frag_hdr_offset_bytes (frag_hdr);
  u32 fragment_length =
    vlib_buffer_length_in_chain (vm, fb) -
    (fvnb->ip.reass.ip6_frag_hdr_offset + sizeof (*frag_hdr));
  if (0 == fragment_length)
    {
      return IP6_FULL_REASS_RC_INVALID_FRAG_LEN;
    }
  u32 fragment_last = fvnb->ip.reass.fragment_last =
    fragment_first + fragment_length - 1;
  int more_fragments = ip6_frag_hdr_more (frag_hdr);
  u32 candidate_range_bi = reass->first_bi;
  u32 prev_range_bi = ~0;
  fvnb->ip.reass.range_first = fragment_first;
  fvnb->ip.reass.range_last = fragment_last;
  fvnb->ip.reass.next_range_bi = ~0;
  if (!more_fragments)
    {
      reass->last_packet_octet = fragment_last;
    }
  if (~0 == reass->first_bi)
    {
      // starting a new reassembly
      ip6_full_reass_insert_range_in_chain (vm, reass, prev_range_bi, *bi0);
      reass->min_fragment_length = clib_net_to_host_u16 (fip->payload_length);
      consumed = 1;
      reass->fragments_n = 1;
      goto check_if_done_maybe;
    }
  reass->min_fragment_length =
    clib_min (clib_net_to_host_u16 (fip->payload_length),
	      fvnb->ip.reass.estimated_mtu);
  while (~0 != candidate_range_bi)
    {
      vlib_buffer_t *candidate_b = vlib_get_buffer (vm, candidate_range_bi);
      vnet_buffer_opaque_t *candidate_vnb = vnet_buffer (candidate_b);
      if (fragment_first > candidate_vnb->ip.reass.range_last)
	{
	  // this fragment starts after candidate range
	  prev_range_bi = candidate_range_bi;
	  candidate_range_bi = candidate_vnb->ip.reass.next_range_bi;
	  if (candidate_vnb->ip.reass.range_last < fragment_last &&
	      ~0 == candidate_range_bi)
	    {
	      // special case - this fragment falls beyond all known ranges
	      ip6_full_reass_insert_range_in_chain (vm, reass, prev_range_bi,
						    *bi0);
	      consumed = 1;
	      break;
	    }
	  continue;
	}
      if (fragment_last < candidate_vnb->ip.reass.range_first)
	{
	  // this fragment ends before candidate range without any overlap
	  ip6_full_reass_insert_range_in_chain (vm, reass, prev_range_bi,
						*bi0);
	  consumed = 1;
	}
      else if (fragment_first == candidate_vnb->ip.reass.range_first &&
	       fragment_last == candidate_vnb->ip.reass.range_last)
	{
	  // duplicate fragment - ignore
	}
      else
	{
	  // overlapping fragment - not allowed by RFC 8200
	  if (PREDICT_FALSE (fb->flags & VLIB_BUFFER_IS_TRACED))
	    {
	      ip6_full_reass_add_trace (vm, node, reass, *bi0, frag_hdr,
					RANGE_OVERLAP, ~0);
	    }
	  return IP6_FULL_REASS_RC_OVERLAP;
	}
      break;
    }
  ++reass->fragments_n;
check_if_done_maybe:
  if (consumed)
    {
      if (PREDICT_FALSE (fb->flags & VLIB_BUFFER_IS_TRACED))
	{
	  ip6_full_reass_add_trace (vm, node, reass, *bi0, frag_hdr, RANGE_NEW,
				    ~0);
	}
    }
  else if (skip_bihash)
    {
      // if this reassembly is not in bihash, then the packet must have been
      // consumed
      return IP6_FULL_REASS_RC_INTERNAL_ERROR;
    }
  if (~0 != reass->last_packet_octet &&
      reass->data_len == reass->last_packet_octet + 1)
    {
      *handoff_thread_idx = reass->sendout_thread_index;
      int handoff =
	reass->memory_owner_thread_index != reass->sendout_thread_index;
      ip6_full_reass_rc_t rc =
	ip6_full_reass_finalize (vm, node, rm, rt, reass, bi0, next0, error0,
				 is_custom_app);
      if (IP6_FULL_REASS_RC_OK == rc && handoff)
	{
	  return IP6_FULL_REASS_RC_HANDOFF;
	}
      return rc;
    }
  else
    {
      if (skip_bihash)
	{
	  // if this reassembly is not in bihash, it should've been an atomic
	  // fragment and thus finalized
	  return IP6_FULL_REASS_RC_INTERNAL_ERROR;
	}
      if (consumed)
	{
	  *bi0 = ~0;
	  if (reass->fragments_n > rm->max_reass_len)
	    {
	      return IP6_FULL_REASS_RC_TOO_MANY_FRAGMENTS;
	    }
	}
      else
	{
	  *next0 = IP6_FULL_REASSEMBLY_NEXT_DROP;
	  *error0 = IP6_ERROR_REASS_DUPLICATE_FRAGMENT;
	}
    }
  return IP6_FULL_REASS_RC_OK;
}

always_inline bool
ip6_full_reass_verify_upper_layer_present (vlib_node_runtime_t *node,
					   vlib_buffer_t *b,
					   ip6_ext_hdr_chain_t *hc)
{
  int nh = hc->eh[hc->length - 1].protocol;
  /* Checking to see if it's a terminating header */
  if (ip6_ext_hdr (nh))
    {
      icmp6_error_set_vnet_buffer (
	b, ICMP6_parameter_problem,
	ICMP6_parameter_problem_first_fragment_has_incomplete_header_chain, 0);
      b->error = node->errors[IP6_ERROR_REASS_MISSING_UPPER];
      return false;
    }
  return true;
}

always_inline bool
ip6_full_reass_verify_fragment_multiple_8 (vlib_main_t *vm,
					   vlib_node_runtime_t *node,
					   vlib_buffer_t *b,
					   ip6_frag_hdr_t *frag_hdr)
{
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  ip6_header_t *ip = vlib_buffer_get_current (b);
  int more_fragments = ip6_frag_hdr_more (frag_hdr);
  u32 fragment_length =
    vlib_buffer_length_in_chain (vm, b) -
    (vnb->ip.reass.ip6_frag_hdr_offset + sizeof (*frag_hdr));
  if (more_fragments && 0 != fragment_length % 8)
    {
      icmp6_error_set_vnet_buffer (b, ICMP6_parameter_problem,
				   ICMP6_parameter_problem_erroneous_header_field,
				   (u8 *) & ip->payload_length - (u8 *) ip);
      b->error = node->errors[IP6_ERROR_REASS_INVALID_FRAG_SIZE];
      return false;
    }
  return true;
}

always_inline bool
ip6_full_reass_verify_packet_size_lt_64k (vlib_main_t *vm,
					  vlib_node_runtime_t *node,
					  vlib_buffer_t *b,
					  ip6_frag_hdr_t *frag_hdr)
{
  vnet_buffer_opaque_t *vnb = vnet_buffer (b);
  u32 fragment_first = ip6_frag_hdr_offset_bytes (frag_hdr);
  u32 fragment_length =
    vlib_buffer_length_in_chain (vm, b) -
    (vnb->ip.reass.ip6_frag_hdr_offset + sizeof (*frag_hdr));
  if (fragment_first + fragment_length > 65535)
    {
      ip6_header_t *ip0 = vlib_buffer_get_current (b);
      icmp6_error_set_vnet_buffer (b, ICMP6_parameter_problem,
				   ICMP6_parameter_problem_erroneous_header_field,
				   (u8 *) & frag_hdr->fragment_offset_and_more
				   - (u8 *) ip0);
      b->error = node->errors[IP6_ERROR_REASS_INVALID_FRAG_SIZE];
      return false;
    }
  return true;
}

always_inline uword
ip6_full_reassembly_inline (vlib_main_t *vm, vlib_node_runtime_t *node,
			    vlib_frame_t *frame, bool is_feature,
			    bool is_custom_app, bool is_local)
{
  u32 *from = vlib_frame_vector_args (frame);
  u32 n_left_from, n_left_to_next, *to_next, next_index;
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;
  ip6_full_reass_per_thread_t *rt = &rm->per_thread_data[vm->thread_index];
  clib_spinlock_lock (&rt->lock);

  n_left_from = frame->n_vectors;
  next_index = node->cached_next_index;
  while (n_left_from > 0)
    {
      vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);

      while (n_left_from > 0 && n_left_to_next > 0)
	{
	  u32 bi0;
	  vlib_buffer_t *b0;
	  u32 next0 = IP6_FULL_REASSEMBLY_NEXT_DROP;
	  u32 error0 = IP6_ERROR_NONE;
	  u32 icmp_bi = ~0;

	  bi0 = from[0];
	  b0 = vlib_get_buffer (vm, bi0);

	  ip6_header_t *ip0 = vlib_buffer_get_current (b0);
	  ip6_frag_hdr_t *frag_hdr = NULL;
	  ip6_ext_hdr_chain_t hdr_chain;
	  vnet_buffer_opaque_t *fvnb = vnet_buffer (b0);

	  int res = ip6_ext_header_walk (
	    b0, ip0, IP_PROTOCOL_IPV6_FRAGMENTATION, &hdr_chain);
	  if (res < 0 ||
	      hdr_chain.eh[res].protocol != IP_PROTOCOL_IPV6_FRAGMENTATION)
	    {
	      vlib_node_increment_counter (vm, node->node_index,
					   IP6_ERROR_REASS_NO_FRAG_HDR, 1);
	      // this is a mangled packet - no fragmentation header
	      next0 = is_custom_app ? fvnb->ip.reass.error_next_index :
					    IP6_FULL_REASSEMBLY_NEXT_DROP;
	      ip6_full_reass_add_trace (vm, node, NULL, bi0, NULL, PASSTHROUGH,
					~0);
	      goto skip_reass;
	    }
	  if (is_local && !rm->is_local_reass_enabled)
	    {
	      next0 = IP6_FULL_REASSEMBLY_NEXT_DROP;
	      goto skip_reass;
	    }

	  /* Keep track of received fragments */
	  vlib_node_increment_counter (vm, node->node_index,
				       IP6_ERROR_REASS_FRAGMENTS_RCVD, 1);
	  frag_hdr =
	    ip6_ext_next_header_offset (ip0, hdr_chain.eh[res].offset);
	  vnet_buffer (b0)->ip.reass.ip6_frag_hdr_offset =
	    hdr_chain.eh[res].offset;

	  if (0 == ip6_frag_hdr_offset (frag_hdr))
	    {
	      // first fragment - verify upper-layer is present
	      if (!ip6_full_reass_verify_upper_layer_present (node, b0,
							      &hdr_chain))
		{
		  next0 = is_custom_app ? fvnb->ip.reass.error_next_index :
						IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR;
		  goto skip_reass;
		}
	    }

	  if (!ip6_full_reass_verify_fragment_multiple_8 (vm, node, b0,
							  frag_hdr) ||
	      !ip6_full_reass_verify_packet_size_lt_64k (vm, node, b0,
							 frag_hdr))
	    {
	      next0 = is_custom_app ? fvnb->ip.reass.error_next_index :
					    IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR;
	      goto skip_reass;
	    }

	  int skip_bihash = 0;
	  ip6_full_reass_kv_t kv;
	  u8 do_handoff = 0;

	  if (0 == ip6_frag_hdr_offset (frag_hdr) &&
	      !ip6_frag_hdr_more (frag_hdr))
	    {
	      // this is an atomic fragment; process it separately
	      skip_bihash = 1;
	    }
	  else
	    {
	      u32 fib_index =
		(vnet_buffer (b0)->sw_if_index[VLIB_TX] == (u32) ~0) ?
			vec_elt (ip6_main.fib_index_by_sw_if_index,
			   vnet_buffer (b0)->sw_if_index[VLIB_RX]) :
			vnet_buffer (b0)->sw_if_index[VLIB_TX];
	      kv.k.as_u64[0] = ip0->src_address.as_u64[0];
	      kv.k.as_u64[1] = ip0->src_address.as_u64[1];
	      kv.k.as_u64[2] = ip0->dst_address.as_u64[0];
	      kv.k.as_u64[3] = ip0->dst_address.as_u64[1];
	      kv.k.as_u64[4] =
		((u64) fib_index) << 32 | (u64) frag_hdr->identification;
	      /* RFC 8200: The Next Header values in the Fragment headers of
	       * different fragments of the same original packet may differ.
	       * Only the value from the Offset zero fragment packet is used
	       * for reassembly.
	       *
	       * Also, the IPv6 header doesn't contain the protocol value,
	       * unlike IPv4. */
	      kv.k.as_u64[5] = 0;
	    }

	  ip6_full_reass_t *reass = ip6_full_reass_find_or_create (
	    vm, node, rm, rt, &kv, &icmp_bi, &do_handoff, skip_bihash,
	    &n_left_to_next, &to_next);

	  if (reass)
	    {
	      const u32 fragment_first = ip6_frag_hdr_offset (frag_hdr);
	      if (0 == fragment_first)
		{
		  reass->sendout_thread_index = vm->thread_index;
		}
	    }
	  if (PREDICT_FALSE (do_handoff))
	    {
	      next0 = IP6_FULL_REASSEMBLY_NEXT_HANDOFF;
	      vnet_buffer (b0)->ip.reass.owner_thread_index =
		kv.v.memory_owner_thread_index;
	    }
	  else if (reass)
	    {
	      u32 handoff_thread_idx;
	      u32 counter = ~0;
	      switch (ip6_full_reass_update (
		vm, node, rm, rt, reass, &bi0, &next0, &error0, frag_hdr,
		is_custom_app, &handoff_thread_idx, skip_bihash))
		{
		case IP6_FULL_REASS_RC_OK:
		  /* nothing to do here */
		  break;
		case IP6_FULL_REASS_RC_HANDOFF:
		  next0 = IP6_FULL_REASSEMBLY_NEXT_HANDOFF;
		  b0 = vlib_get_buffer (vm, bi0);
		  vnet_buffer (b0)->ip.reass.owner_thread_index =
		    handoff_thread_idx;
		  break;
		case IP6_FULL_REASS_RC_TOO_MANY_FRAGMENTS:
		  counter = IP6_ERROR_REASS_FRAGMENT_CHAIN_TOO_LONG;
		  break;
		case IP6_FULL_REASS_RC_NO_BUF:
		  counter = IP6_ERROR_REASS_NO_BUF;
		  break;
		case IP6_FULL_REASS_RC_INVALID_FRAG_LEN:
		  counter = IP6_ERROR_REASS_INVALID_FRAG_LEN;
		  break;
		case IP6_FULL_REASS_RC_OVERLAP:
		  counter = IP6_ERROR_REASS_OVERLAPPING_FRAGMENT;
		  break;
		case IP6_FULL_REASS_RC_INTERNAL_ERROR:
		  counter = IP6_ERROR_REASS_INTERNAL_ERROR;
		  /* Sanitization is needed only for internal errors; in the
		   * other cases the incoming packet is already dropped.
		   * Adding bi0 back to the reassembly list prevents buffer
		   * leaks on internal errors.
		   *
		   * It also doesn't make sense to send these buffers to the
		   * custom app, since the fragments hit an internal error. */
		  sanitize_reass_buffers_add_missing (vm, reass, &bi0);
		  reass->error_next_index = ~0;
		  break;
		}
	      if (~0 != counter)
		{
		  vlib_node_increment_counter (vm, node->node_index, counter,
					       1);
		  ip6_full_reass_drop_all (vm, node, reass, &n_left_to_next,
					   &to_next);
		  ip6_full_reass_free (rm, rt, reass);
		  goto next_packet;
		}
	    }
	  else
	    {
	      if (is_feature)
		{
		  next0 = IP6_FULL_REASSEMBLY_NEXT_DROP;
		}
	      else
		{
		  next0 = fvnb->ip.reass.error_next_index;
		}
	      error0 = IP6_ERROR_REASS_LIMIT_REACHED;
	    }

	  if (~0 != bi0)
	    {
	    skip_reass:
	      to_next[0] = bi0;
	      to_next += 1;
	      n_left_to_next -= 1;

	      /* bi0 might have been updated by reass_finalize, reload */
	      b0 = vlib_get_buffer (vm, bi0);
	      if (IP6_ERROR_NONE != error0)
		{
		  b0->error = node->errors[error0];
		}

	      if (next0 == IP6_FULL_REASSEMBLY_NEXT_HANDOFF)
		{
		  if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED))
		    {
		      ip6_full_reass_add_trace (
			vm, node, NULL, bi0, frag_hdr, HANDOFF,
			vnet_buffer (b0)->ip.reass.owner_thread_index);
		    }
		}
	      else if (is_feature && IP6_ERROR_NONE == error0)
		{
		  vnet_feature_next (&next0, b0);
		}

	      /* This fragment also goes to the custom app, so count it in
	       * the to-custom-app counter as well */
	      if (is_custom_app)
		{
		  vlib_node_increment_counter (
		    vm, node->node_index, IP6_ERROR_REASS_TO_CUSTOM_APP, 1);
		}

	      vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
					       n_left_to_next, bi0, next0);
	    }

	  if (~0 != icmp_bi)
	    {
	      next0 = IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR;
	      to_next[0] = icmp_bi;
	      to_next += 1;
	      n_left_to_next -= 1;
	      vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
					       n_left_to_next, icmp_bi,
					       next0);
	    }
	next_packet:
	  from += 1;
	  n_left_from -= 1;
	}

      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }

  clib_spinlock_unlock (&rt->lock);
  return frame->n_vectors;
}

VLIB_NODE_FN (ip6_full_reass_node) (vlib_main_t * vm,
				    vlib_node_runtime_t * node,
				    vlib_frame_t * frame)
{
  return ip6_full_reassembly_inline (vm, node, frame, false /* is_feature */,
				     false /* is_custom_app */,
				     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reass_node) = {
    .name = "ip6-full-reassembly",
    .vector_size = sizeof (u32),
    .format_trace = format_ip6_full_reass_trace,
    .n_errors = IP6_N_ERROR,
    .error_counters = ip6_error_counters,
    .n_next_nodes = IP6_FULL_REASSEMBLY_N_NEXT,
    .next_nodes =
        {
                [IP6_FULL_REASSEMBLY_NEXT_INPUT] = "ip6-input",
                [IP6_FULL_REASSEMBLY_NEXT_DROP] = "ip6-drop",
                [IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR] = "ip6-icmp-error",
                [IP6_FULL_REASSEMBLY_NEXT_HANDOFF] = "ip6-full-reassembly-handoff",
        },
};

VLIB_NODE_FN (ip6_local_full_reass_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return ip6_full_reassembly_inline (vm, node, frame, false /* is_feature */,
				     false /* is_custom_app */,
				     true /* is_local */);
}

VLIB_REGISTER_NODE (ip6_local_full_reass_node) = {
    .name = "ip6-local-full-reassembly",
    .vector_size = sizeof (u32),
    .format_trace = format_ip6_full_reass_trace,
    .n_errors = IP6_N_ERROR,
    .error_counters = ip6_error_counters,
    .n_next_nodes = IP6_FULL_REASSEMBLY_N_NEXT,
    .next_nodes =
        {
                [IP6_FULL_REASSEMBLY_NEXT_INPUT] = "ip6-input",
                [IP6_FULL_REASSEMBLY_NEXT_DROP] = "ip6-drop",
                [IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR] = "ip6-icmp-error",
                [IP6_FULL_REASSEMBLY_NEXT_HANDOFF] = "ip6-local-full-reassembly-handoff",
        },
};

VLIB_NODE_FN (ip6_full_reass_node_feature) (vlib_main_t * vm,
					    vlib_node_runtime_t * node,
					    vlib_frame_t * frame)
{
  return ip6_full_reassembly_inline (vm, node, frame, true /* is_feature */,
				     false /* is_custom_app */,
				     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reass_node_feature) = {
    .name = "ip6-full-reassembly-feature",
    .vector_size = sizeof (u32),
    .format_trace = format_ip6_full_reass_trace,
    .n_errors = IP6_N_ERROR,
    .error_counters = ip6_error_counters,
    .n_next_nodes = IP6_FULL_REASSEMBLY_N_NEXT,
    .next_nodes =
        {
                [IP6_FULL_REASSEMBLY_NEXT_INPUT] = "ip6-input",
                [IP6_FULL_REASSEMBLY_NEXT_DROP] = "ip6-drop",
                [IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR] = "ip6-icmp-error",
                [IP6_FULL_REASSEMBLY_NEXT_HANDOFF] = "ip6-full-reass-feature-hoff",
        },
};

VNET_FEATURE_INIT (ip6_full_reassembly_feature, static) = {
    .arc_name = "ip6-unicast",
    .node_name = "ip6-full-reassembly-feature",
    .runs_before = VNET_FEATURES ("ip6-lookup",
                                  "ipsec6-input-feature"),
    .runs_after = 0,
};

VLIB_NODE_FN (ip6_full_reass_node_custom)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return ip6_full_reassembly_inline (vm, node, frame, false /* is_feature */,
				     true /* is_custom_app */,
				     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reass_node_custom) = {
    .name = "ip6-full-reassembly-custom",
    .vector_size = sizeof (u32),
    .format_trace = format_ip6_full_reass_trace,
    .n_errors = IP6_N_ERROR,
    .error_counters = ip6_error_counters,
    .n_next_nodes = IP6_FULL_REASSEMBLY_N_NEXT,
    .next_nodes =
        {
                [IP6_FULL_REASSEMBLY_NEXT_INPUT] = "ip6-input",
                [IP6_FULL_REASSEMBLY_NEXT_DROP] = "ip6-drop",
                [IP6_FULL_REASSEMBLY_NEXT_ICMP_ERROR] = "ip6-icmp-error",
                [IP6_FULL_REASSEMBLY_NEXT_HANDOFF] = "ip6-full-reass-custom-hoff",
        },
};

#ifndef CLIB_MARCH_VARIANT
static u32
ip6_full_reass_get_nbuckets (void)
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;
  u32 nbuckets;
  u8 i;

  /* need more mem with more workers */
  nbuckets = (u32) (rm->max_reass_n * (vlib_num_workers () + 1) /
		    IP6_FULL_REASS_HT_LOAD_FACTOR);

  for (i = 0; i < 31; i++)
    if ((1 << i) >= nbuckets)
      break;
  nbuckets = 1 << i;

  return nbuckets;
}
#endif /* CLIB_MARCH_VARIANT */

typedef enum
{
  IP6_EVENT_CONFIG_CHANGED = 1,
} ip6_full_reass_event_t;

#ifndef CLIB_MARCH_VARIANT
typedef struct
{
  int failure;
  clib_bihash_48_8_t *new_hash;
} ip6_rehash_cb_ctx;

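/* bihash walk callback: copy one entry into the new table and record any
 * insert failure in the context; the walk always continues */
static int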
ip6_rehash_cb (clib_bihash_kv_48_8_t * kv, void *_ctx)
{
  ip6_rehash_cb_ctx *ctx = _ctx;
  if (clib_bihash_add_del_48_8 (ctx->new_hash, kv, 1))
    {
      ctx->failure = 1;
    }
  return (BIHASH_WALK_CONTINUE);
}

static void
ip6_full_reass_set_params (u32 timeout_ms, u32 max_reassemblies,
			   u32 max_reassembly_length,
			   u32 expire_walk_interval_ms)
{
  ip6_full_reass_main.timeout_ms = timeout_ms;
  ip6_full_reass_main.timeout = (f64) timeout_ms / (f64) MSEC_PER_SEC;
  ip6_full_reass_main.max_reass_n = max_reassemblies;
  ip6_full_reass_main.max_reass_len = max_reassembly_length;
  ip6_full_reass_main.expire_walk_interval_ms = expire_walk_interval_ms;
}

vnet_api_error_t
ip6_full_reass_set (u32 timeout_ms, u32 max_reassemblies,
		    u32 max_reassembly_length, u32 expire_walk_interval_ms)
{
  u32 old_nbuckets = ip6_full_reass_get_nbuckets ();
  ip6_full_reass_set_params (timeout_ms, max_reassemblies,
			     max_reassembly_length, expire_walk_interval_ms);
  vlib_process_signal_event (ip6_full_reass_main.vlib_main,
			     ip6_full_reass_main.ip6_full_reass_expire_node_idx,
			     IP6_EVENT_CONFIG_CHANGED, 0);
  u32 new_nbuckets = ip6_full_reass_get_nbuckets ();
  /* only grow the hash table; a smaller configuration keeps the old one */
  if (ip6_full_reass_main.max_reass_n > 0 && new_nbuckets > old_nbuckets)
    {
      clib_bihash_48_8_t new_hash;
      clib_memset (&new_hash, 0, sizeof (new_hash));
      ip6_rehash_cb_ctx ctx;
      ctx.failure = 0;
      ctx.new_hash = &new_hash;
      clib_bihash_init_48_8 (&new_hash, "ip6-full-reass", new_nbuckets,
			     new_nbuckets * 1024);
      clib_bihash_foreach_key_value_pair_48_8 (&ip6_full_reass_main.hash,
					       ip6_rehash_cb, &ctx);
      if (ctx.failure)
	{
	  clib_bihash_free_48_8 (&new_hash);
	  return -1;
	}
      else
	{
	  clib_bihash_free_48_8 (&ip6_full_reass_main.hash);
	  clib_memcpy_fast (&ip6_full_reass_main.hash, &new_hash,
			    sizeof (ip6_full_reass_main.hash));
	  clib_bihash_copied (&ip6_full_reass_main.hash, &new_hash);
	}
    }
  return 0;
}

vnet_api_error_t
ip6_full_reass_get (u32 * timeout_ms, u32 * max_reassemblies,
		    u32 * max_reassembly_length,
		    u32 * expire_walk_interval_ms)
{
  *timeout_ms = ip6_full_reass_main.timeout_ms;
  *max_reassemblies = ip6_full_reass_main.max_reass_n;
  *max_reassembly_length = ip6_full_reass_main.max_reass_len;
  *expire_walk_interval_ms = ip6_full_reass_main.expire_walk_interval_ms;
  return 0;
}

static clib_error_t *
ip6_full_reass_init_function (vlib_main_t * vm)
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;
  clib_error_t *error = 0;
  u32 nbuckets;
  vlib_node_t *node;

  rm->vlib_main = vm;

  vec_validate (rm->per_thread_data, vlib_num_workers ());
  ip6_full_reass_per_thread_t *rt;
  vec_foreach (rt, rm->per_thread_data)
  {
    clib_spinlock_init (&rt->lock);
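    /* NOTE: max_reass_n is only set by ip6_full_reass_set_params () below,
     * so this preallocation appears to use the zero-initialized value */
    pool_alloc (rt->pool, rm->max_reass_n);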
  }

  node = vlib_get_node_by_name (vm, (u8 *) "ip6-full-reassembly-expire-walk");
  ASSERT (node);
  rm->ip6_full_reass_expire_node_idx = node->index;

  ip6_full_reass_set_params (IP6_FULL_REASS_TIMEOUT_DEFAULT_MS,
			     IP6_FULL_REASS_MAX_REASSEMBLIES_DEFAULT,
			     IP6_FULL_REASS_MAX_REASSEMBLY_LENGTH_DEFAULT,
			     IP6_FULL_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS);

  nbuckets = ip6_full_reass_get_nbuckets ();
  clib_bihash_init_48_8 (&rm->hash, "ip6-full-reass", nbuckets,
			 nbuckets * 1024);

  node = vlib_get_node_by_name (vm, (u8 *) "ip6-icmp-error");
  ASSERT (node);
  rm->ip6_icmp_error_idx = node->index;

  if ((error = vlib_call_init_function (vm, ip_main_init)))
    return error;
  ip6_register_protocol (IP_PROTOCOL_IPV6_FRAGMENTATION,
			 ip6_local_full_reass_node.index);
  rm->is_local_reass_enabled = 1;

  rm->fq_index = vlib_frame_queue_main_init (ip6_full_reass_node.index, 0);
  rm->fq_local_index =
    vlib_frame_queue_main_init (ip6_local_full_reass_node.index, 0);
  rm->fq_feature_index =
    vlib_frame_queue_main_init (ip6_full_reass_node_feature.index, 0);
  rm->fq_custom_index =
    vlib_frame_queue_main_init (ip6_full_reass_node_custom.index, 0);

  rm->feature_use_refcount_per_intf = NULL;
  return error;
}

VLIB_INIT_FUNCTION (ip6_full_reass_init_function);
#endif /* CLIB_MARCH_VARIANT */

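/* Process node: periodically walk each thread's reassembly pool, free
 * timed-out contexts and hand any resulting ICMP error buffers to the
 * ip6-icmp-error node. */
static uword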
ip6_full_reass_walk_expired (vlib_main_t *vm, vlib_node_runtime_t *node,
			     CLIB_UNUSED (vlib_frame_t *f))
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;
  uword event_type, *event_data = 0;

  while (true)
    {
      vlib_process_wait_for_event_or_clock (vm,
					    (f64) rm->expire_walk_interval_ms
					    / (f64) MSEC_PER_SEC);
      event_type = vlib_process_get_events (vm, &event_data);

      switch (event_type)
	{
	case ~0:
	  /* no events => timeout */
	  /* fallthrough */
	case IP6_EVENT_CONFIG_CHANGED:
	  /* nothing to do here */
	  break;
	default:
	  clib_warning ("BUG: event type 0x%wx", event_type);
	  break;
	}
      f64 now = vlib_time_now (vm);

      ip6_full_reass_t *reass;
      int *pool_indexes_to_free = NULL;

      uword thread_index = 0;
      int index;
      const uword nthreads = vlib_num_workers () + 1;
      u32 *vec_icmp_bi = NULL;
      u32 n_left_to_next, *to_next;

      for (thread_index = 0; thread_index < nthreads; ++thread_index)
	{
	  ip6_full_reass_per_thread_t *rt =
	    &rm->per_thread_data[thread_index];
	  u32 reass_timeout_cnt = 0;
	  clib_spinlock_lock (&rt->lock);

	  vec_reset_length (pool_indexes_to_free);
	  /* Pace the number of timeouts handled per thread, to avoid barrier
	   * sync issues in real world scenarios */

	  u32 beg = rt->last_id;
	  /* to ensure we walk at least once per sec per context */
	  u32 end = beg + (IP6_FULL_REASS_MAX_REASSEMBLIES_DEFAULT *
			     IP6_FULL_REASS_EXPIRE_WALK_INTERVAL_DEFAULT_MS /
			     MSEC_PER_SEC +
			   1);
	  if (end > vec_len (rt->pool))
	    {
	      end = vec_len (rt->pool);
	      rt->last_id = 0;
	    }
	  else
	    {
	      rt->last_id = end;
	    }

	  pool_foreach_stepping_index (index, beg, end, rt->pool)
	  {
	    reass = pool_elt_at_index (rt->pool, index);
	    if (now > reass->last_heard + rm->timeout)
	      {
		vec_add1 (pool_indexes_to_free, index);
	      }
	  }

	  int *i;
	  vec_foreach (i, pool_indexes_to_free)
	  {
	    u32 icmp_bi = ~0;

	    /* reuse the outer reass pointer instead of shadowing it */
	    reass = pool_elt_at_index (rt->pool, i[0]);
	    reass_timeout_cnt += reass->fragments_n;
	    ip6_full_reass_on_timeout (vm, node, reass, &icmp_bi,
				       &n_left_to_next, &to_next);
	    if (~0 != icmp_bi)
	      vec_add1 (vec_icmp_bi, icmp_bi);

	    ip6_full_reass_free (rm, rt, reass);
	  }

	  clib_spinlock_unlock (&rt->lock);
	  if (reass_timeout_cnt)
	    vlib_node_increment_counter (vm, node->node_index,
					 IP6_ERROR_REASS_TIMEOUT,
					 reass_timeout_cnt);
	}

      while (vec_len (vec_icmp_bi) > 0)
	{
	  vlib_frame_t *f =
	    vlib_get_frame_to_node (vm, rm->ip6_icmp_error_idx);
	  u32 *to_next = vlib_frame_vector_args (f);
	  u32 n_left_to_next = VLIB_FRAME_SIZE - f->n_vectors;
	  int trace_frame = 0;
	  while (vec_len (vec_icmp_bi) > 0 && n_left_to_next > 0)
	    {
	      u32 bi = vec_pop (vec_icmp_bi);
	      vlib_buffer_t *b = vlib_get_buffer (vm, bi);
	      if (PREDICT_FALSE (b->flags & VLIB_BUFFER_IS_TRACED))
		trace_frame = 1;
	      to_next[0] = bi;
	      ++f->n_vectors;
	      to_next += 1;
	      n_left_to_next -= 1;
	    }
	  f->frame_flags |= (trace_frame * VLIB_FRAME_TRACE);
	  vlib_put_frame_to_node (vm, rm->ip6_icmp_error_idx, f);
	}

      vec_free (pool_indexes_to_free);
      vec_free (vec_icmp_bi);
      if (event_data)
	{
	  vec_set_len (event_data, 0);
	}
    }

  return 0;
}

VLIB_REGISTER_NODE (ip6_full_reass_expire_node) = {
  .function = ip6_full_reass_walk_expired,
  .format_trace = format_ip6_full_reass_trace,
  .type = VLIB_NODE_TYPE_PROCESS,
  .name = "ip6-full-reassembly-expire-walk",

  .n_errors = IP6_N_ERROR,
  .error_counters = ip6_error_counters,
};

static u8 *
format_ip6_full_reass_key (u8 * s, va_list * args)
{
  ip6_full_reass_key_t *key = va_arg (*args, ip6_full_reass_key_t *);
  s = format (s, "xx_id: %u, src: %U, dst: %U, frag_id: %u, proto: %u",
	      key->xx_id, format_ip6_address, &key->src, format_ip6_address,
	      &key->dst, clib_net_to_host_u16 (key->frag_id), key->proto);
  return s;
}

static u8 *
format_ip6_full_reass (u8 * s, va_list * args)
{
  vlib_main_t *vm = va_arg (*args, vlib_main_t *);
  ip6_full_reass_t *reass = va_arg (*args, ip6_full_reass_t *);

  s = format (s, "ID: %lu, key: %U\n  first_bi: %u, data_len: %u, "
	      "last_packet_octet: %u, trace_op_counter: %u\n",
	      reass->id, format_ip6_full_reass_key, &reass->key,
	      reass->first_bi, reass->data_len, reass->last_packet_octet,
	      reass->trace_op_counter);
  u32 bi = reass->first_bi;
  u32 counter = 0;
  while (~0 != bi)
    {
      vlib_buffer_t *b = vlib_get_buffer (vm, bi);
      vnet_buffer_opaque_t *vnb = vnet_buffer (b);
      s = format (s, "  #%03u: range: [%u, %u], bi: %u, off: %d, len: %u, "
		  "fragment[%u, %u]\n",
		  counter, vnb->ip.reass.range_first,
		  vnb->ip.reass.range_last, bi,
		  ip6_full_reass_buffer_get_data_offset (b),
		  ip6_full_reass_buffer_get_data_len (b),
		  vnb->ip.reass.fragment_first, vnb->ip.reass.fragment_last);
      /* advance the per-buffer index printed as "#NNN" above */
      ++counter;
      if (b->flags & VLIB_BUFFER_NEXT_PRESENT)
	{
	  bi = b->next_buffer;
	}
      else
	{
	  bi = ~0;
	}
    }
  return s;
}

static clib_error_t *
show_ip6_full_reass (vlib_main_t * vm, unformat_input_t * input,
		     CLIB_UNUSED (vlib_cli_command_t * cmd))
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;

  vlib_cli_output (vm, "---------------------");
  vlib_cli_output (vm, "IP6 reassembly status");
  vlib_cli_output (vm, "---------------------");
  bool details = false;
  if (unformat (input, "details"))
    {
      details = true;
    }

  u32 sum_reass_n = 0;
  u64 sum_buffers_n = 0;
  ip6_full_reass_t *reass;
  uword thread_index;
  const uword nthreads = vlib_num_workers () + 1;
  for (thread_index = 0; thread_index < nthreads; ++thread_index)
    {
      ip6_full_reass_per_thread_t *rt = &rm->per_thread_data[thread_index];
      clib_spinlock_lock (&rt->lock);
      if (details)
	{
          pool_foreach (reass, rt->pool) {
            vlib_cli_output (vm, "%U", format_ip6_full_reass, vm, reass);
          }
	}
      sum_reass_n += rt->reass_n;
      clib_spinlock_unlock (&rt->lock);
    }
  vlib_cli_output (vm, "---------------------");
  vlib_cli_output (vm, "Current IP6 reassemblies count: %lu\n",
		   (long unsigned) sum_reass_n);
  vlib_cli_output (vm,
		   "Maximum configured concurrent full IP6 reassemblies per worker-thread: %lu\n",
		   (long unsigned) rm->max_reass_n);
  vlib_cli_output (vm,
		   "Maximum configured amount of fragments "
		   "per full IP6 reassembly: %lu\n",
		   (long unsigned) rm->max_reass_len);
  vlib_cli_output (vm,
		   "Maximum configured full IP6 reassembly timeout: %lums\n",
		   (long unsigned) rm->timeout_ms);
  vlib_cli_output (vm,
		   "Maximum configured full IP6 reassembly expire walk interval: %lums\n",
		   (long unsigned) rm->expire_walk_interval_ms);
  /* NOTE: sum_buffers_n is never accumulated above, so this reports 0 */
  vlib_cli_output (vm, "Buffers in use: %lu\n",
		   (long unsigned) sum_buffers_n);
  return 0;
}

VLIB_CLI_COMMAND (show_ip6_full_reassembly_cmd, static) = {
    .path = "show ip6-full-reassembly",
    .short_help = "show ip6-full-reassembly [details]",
    .function = show_ip6_full_reass,
};

#ifndef CLIB_MARCH_VARIANT
vnet_api_error_t
ip6_full_reass_enable_disable (u32 sw_if_index, u8 enable_disable)
{
  return vnet_feature_enable_disable ("ip6-unicast",
				      "ip6-full-reassembly-feature",
				      sw_if_index, enable_disable, 0, 0);
}
#endif /* CLIB_MARCH_VARIANT */

#define foreach_ip6_full_reassembly_handoff_error                       \
_(CONGESTION_DROP, "congestion drop")


typedef enum
{
#define _(sym,str) IP6_FULL_REASSEMBLY_HANDOFF_ERROR_##sym,
  foreach_ip6_full_reassembly_handoff_error
#undef _
    IP6_FULL_REASSEMBLY_HANDOFF_N_ERROR,
} ip6_full_reassembly_handoff_error_t;

static char *ip6_full_reassembly_handoff_error_strings[] = {
#define _(sym,string) string,
  foreach_ip6_full_reassembly_handoff_error
#undef _
};

typedef struct
{
  u32 next_worker_index;
} ip6_full_reassembly_handoff_trace_t;

static u8 *
format_ip6_full_reassembly_handoff_trace (u8 * s, va_list * args)
{
  CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *);
  CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *);
  ip6_full_reassembly_handoff_trace_t *t =
    va_arg (*args, ip6_full_reassembly_handoff_trace_t *);

  /* next_worker_index is a u32, so print it unsigned */
  s = format (s, "ip6-full-reassembly-handoff: next-worker %u",
	      t->next_worker_index);

  return s;
}

always_inline uword
ip6_full_reassembly_handoff_inline (vlib_main_t *vm, vlib_node_runtime_t *node,
				    vlib_frame_t *frame,
				    ip6_full_reass_node_type_t type,
				    bool is_local)
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;

  vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b;
  u32 n_enq, n_left_from, *from;
  u16 thread_indices[VLIB_FRAME_SIZE], *ti;
  u32 fq_index;

  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  vlib_get_buffers (vm, from, bufs, n_left_from);

  b = bufs;
  ti = thread_indices;

  switch (type)
    {
    case NORMAL:
      if (is_local)
	{
	  fq_index = rm->fq_local_index;
	}
      else
	{
	  fq_index = rm->fq_index;
	}
      break;
    case FEATURE:
      fq_index = rm->fq_feature_index;
      break;
    case CUSTOM:
      fq_index = rm->fq_custom_index;
      break;
    default:
      clib_warning ("Unexpected `type' (%d)!", type);
      ASSERT (0);
    }
  while (n_left_from > 0)
    {
      ti[0] = vnet_buffer (b[0])->ip.reass.owner_thread_index;

      if (PREDICT_FALSE
	  ((node->flags & VLIB_NODE_FLAG_TRACE)
	   && (b[0]->flags & VLIB_BUFFER_IS_TRACED)))
	{
	  ip6_full_reassembly_handoff_trace_t *t =
	    vlib_add_trace (vm, node, b[0], sizeof (*t));
	  t->next_worker_index = ti[0];
	}

      n_left_from -= 1;
      ti += 1;
      b += 1;
    }
  n_enq = vlib_buffer_enqueue_to_thread (vm, node, fq_index, from,
					 thread_indices, frame->n_vectors, 1);

  if (n_enq < frame->n_vectors)
    vlib_node_increment_counter (vm, node->node_index,
				 IP6_FULL_REASSEMBLY_HANDOFF_ERROR_CONGESTION_DROP,
				 frame->n_vectors - n_enq);
  return frame->n_vectors;
}

VLIB_NODE_FN (ip6_full_reassembly_handoff_node) (vlib_main_t * vm,
						 vlib_node_runtime_t * node,
						 vlib_frame_t * frame)
{
  return ip6_full_reassembly_handoff_inline (vm, node, frame, NORMAL,
					     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reassembly_handoff_node) = {
  .name = "ip6-full-reassembly-handoff",
  .vector_size = sizeof (u32),
  .n_errors = ARRAY_LEN(ip6_full_reassembly_handoff_error_strings),
  .error_strings = ip6_full_reassembly_handoff_error_strings,
  .format_trace = format_ip6_full_reassembly_handoff_trace,

  .n_next_nodes = 1,

  .next_nodes = {
    [0] = "error-drop",
  },
};

VLIB_NODE_FN (ip6_local_full_reassembly_handoff_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return ip6_full_reassembly_handoff_inline (vm, node, frame, NORMAL,
					     true /* is_local */);
}

VLIB_REGISTER_NODE (ip6_local_full_reassembly_handoff_node) = {
  .name = "ip6-local-full-reassembly-handoff",
  .vector_size = sizeof (u32),
  .n_errors = ARRAY_LEN(ip6_full_reassembly_handoff_error_strings),
  .error_strings = ip6_full_reassembly_handoff_error_strings,
  .format_trace = format_ip6_full_reassembly_handoff_trace,

  .n_next_nodes = 1,

  .next_nodes = {
    [0] = "error-drop",
  },
};

VLIB_NODE_FN (ip6_full_reassembly_feature_handoff_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return ip6_full_reassembly_handoff_inline (vm, node, frame, FEATURE,
					     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reassembly_feature_handoff_node) = {
  .name = "ip6-full-reass-feature-hoff",
  .vector_size = sizeof (u32),
  .n_errors = ARRAY_LEN(ip6_full_reassembly_handoff_error_strings),
  .error_strings = ip6_full_reassembly_handoff_error_strings,
  .format_trace = format_ip6_full_reassembly_handoff_trace,

  .n_next_nodes = 1,

  .next_nodes = {
    [0] = "error-drop",
  },
};

VLIB_NODE_FN (ip6_full_reassembly_custom_handoff_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return ip6_full_reassembly_handoff_inline (vm, node, frame, CUSTOM,
					     false /* is_local */);
}

VLIB_REGISTER_NODE (ip6_full_reassembly_custom_handoff_node) = {
  .name = "ip6-full-reass-custom-hoff",
  .vector_size = sizeof (u32),
  .n_errors = ARRAY_LEN(ip6_full_reassembly_handoff_error_strings),
  .error_strings = ip6_full_reassembly_handoff_error_strings,
  .format_trace = format_ip6_full_reassembly_handoff_trace,

  .n_next_nodes = 1,

  .next_nodes = {
    [0] = "error-drop",
  },
};

#ifndef CLIB_MARCH_VARIANT
int
ip6_full_reass_enable_disable_with_refcnt (u32 sw_if_index, int is_enable)
{
  ip6_full_reass_main_t *rm = &ip6_full_reass_main;
  vec_validate (rm->feature_use_refcount_per_intf, sw_if_index);
  if (is_enable)
    {
      if (!rm->feature_use_refcount_per_intf[sw_if_index])
	{
	  ++rm->feature_use_refcount_per_intf[sw_if_index];
	  return vnet_feature_enable_disable ("ip6-unicast",
					      "ip6-full-reassembly-feature",
					      sw_if_index, 1, 0, 0);
	}
      ++rm->feature_use_refcount_per_intf[sw_if_index];
    }
  else
    {
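      /* guard against wrapping the refcount on an unbalanced disable */
      if (rm->feature_use_refcount_per_intf[sw_if_index])
	--rm->feature_use_refcount_per_intf[sw_if_index];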
      if (!rm->feature_use_refcount_per_intf[sw_if_index])
	return vnet_feature_enable_disable ("ip6-unicast",
					    "ip6-full-reassembly-feature",
					    sw_if_index, 0, 0, 0);
    }
  return 0;
}

void
ip6_local_full_reass_enable_disable (int enable)
{
  if (enable)
    {
      if (!ip6_full_reass_main.is_local_reass_enabled)
	{
	  ip6_full_reass_main.is_local_reass_enabled = 1;
	  ip6_register_protocol (IP_PROTOCOL_IPV6_FRAGMENTATION,
				 ip6_local_full_reass_node.index);
	}
    }
  else
    {
      if (ip6_full_reass_main.is_local_reass_enabled)
	{
	  ip6_full_reass_main.is_local_reass_enabled = 0;
	  ip6_unregister_protocol (IP_PROTOCOL_IPV6_FRAGMENTATION);
	}
    }
}

int
ip6_local_full_reass_enabled (void)
{
  return ip6_full_reass_main.is_local_reass_enabled;
}

#endif /* CLIB_MARCH_VARIANT */

/*
 * fd.io coding-style-patch-verification: ON
 *
 * Local Variables:
 * eval: (c-set-style "gnu")
 * End:
 */