aboutsummaryrefslogtreecommitdiffstats
path: root/Makefile
AgeCommit message (Collapse)AuthorFilesLines
2021-05-17build: fix the "show version" output in the centos release buildsAndrew Yourtchenko1-1/+1
A couple of releases ago I made a change that slightly changes the naming of artifacts, in order to make the version sorting better. However, that change broke the "show version" output of the VPP in the RPM - that build copies the entire source tree into a new location and builds there, supplying the version information in .version file rather than using git describe. Updating only the version script and not the .version file content resulted in the VPP release builds within the RPM installs have the "count of commits since tagging" being zero and the commit ID, rather than having XX.YY-release format. A couple of releases - 20.09 and 21.01, saw a more haphazard solution, but it is proper to fix the content of the .version file so it is all consistent. This commit fixes the contents of this file made for the RPM build, so that the versioning script does not see unexpected input, thus addressing the issue of "show version" in the release build. Change-Id: I0af68e69b1e40fc49ade759bb2f0ed9f47614217 Type: fix Fixes: 1060332e62d1216bf33c697d0a54ba35d4903eb3 Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
2021-04-30misc: experimental configure scriptDamjan Marion1-1/+1
Type: make Change-Id: Iaeb9d22eec9a7a763b63899814a44e78c8050f1f Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-04-27build: Makefile cleanupDamjan Marion1-10/+3
Type: make Change-Id: I751b0a25161c6eb8614ca19f7c77a4de82401f3d Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-04-27misc: auto-generate go bindingsVladimir Lavor1-0/+5
Type: feature Added target 'make go-api-files' creating compatible go bindings using JSON API definition and GoVPP binary API generator. Signed-off-by: Vladimir Lavor <vlavor@cisco.com> Change-Id: I5bae113b85eaf5ebda8e292d34c9826075ef19b5
2021-04-12tests: support attaching to existing vppKlement Sekera1-0/+8
Introduce a new option DEBUG=attach to run a test against existing already running vpp. A new target 'make test-start-gdb' will spawn VPP in gdb for this purpose. Customization options explained in test-help. Type: improvement Change-Id: Ia160a85b33da3b2df292d44bb95729af9dd9da96 Signed-off-by: Klement Sekera <ksekera@cisco.com>
2021-03-25tests: introduce test-checkstyle-diffKlement Sekera1-0/+5
Make test-checkstyle-diff is a new target which checks PEP8 compliance only for changed files. This makes it faster to execute and also more readable as most of the time, only changed files will fail. Type: improvement Signed-off-by: Klement Sekera <ksekera@cisco.com> Change-Id: I71baca76ab3a21a7a3790617cbfb0d48aacbd9ec
2021-03-24misc: fuse fs for the stats segmentArthur de Kerhor1-0/+28
This extra allows to mount a FUSE filesystem reflecting the state of the stats segment. Type: feature Signed-off-by: Arthur de Kerhor <arthurdekerhor@gmail.com> Change-Id: I692f9ca5a65c1123b3cf28c761455eec36049791
2021-03-16misc: fix checkstyle on fedoraRay Kinsella1-3/+0
The fedora clang-format command helpfully does not include the version suffix, and places clang-format-diff in /usr/share/clang. Have summited #1939018 in Fedora, to fix upstream. https://bugzilla.redhat.com/show_bug.cgi?id=1939018 Until then ... Type: fix Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: Ibceae0fc15e7461c24288ee04f4d28943f889c36
2021-03-05api: crchcecker ignore version < 1.0.0 and outside of src directoryOle Troan1-1/+1
- For check patchset ignore files outside of src directory - For check patchset ignore files that have version < 1.0.0 - fix Pylint warnings - Modify vppapigen_crc to include version in JSON output Type: fix Signed-off-by: Ole Troan <ot@cisco.com> Change-Id: I171cf6397e129e2438b2a494c5656236a7810f7b
2021-03-04build: add libmemif as part of build-coverity targetAndrew Yourtchenko1-0/+1
Change-Id: I81a3b5d0845724da40b483832a8eaed081e6e4ed Type: improvement Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
2021-02-12build: add missing virtualenv dependencies for debian-10Andrew Yourtchenko1-0/+1
Type: make Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com> Change-Id: I57a1f90d7fe9e1262f210d0c091bafda5d496c69
2021-02-09misc: Now that VOM is deprecated remove the build dependency on boostNeale Ranns1-2/+1
Type: make Signed-off-by: Neale Ranns <neale@graphiant.com> Change-Id: Icbbae3ab222e7d97e6c496c13ec9229e94cf5ede
2021-02-04linux-cp: Linux Interface Mirroring for Control Plane IntegrationNeale Ranns1-0/+2
Type: feature please see FEATURE.yaml for details. Signed-off-by: Neale Ranns <nranns@cisco.com> Signed-off-by: Matthew Smith <mgsmith@netgate.com> Signed-off-by: Jon Loeliger <jdl@netgate.com> Signed-off-by: Pim van Pelt <pim@ipng.nl> Change-Id: I04a45c15c0838906aa787e06660fa29f39f755fa
2021-01-21build: deprecate debian-9 supportDave Wallace1-4/+0
Type: make Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: Ib14ce5561446aefb03c3989cf82ac7c6eb1471bb
2021-01-20build: add python files to ctagsJerome Tollet1-1/+1
Type: improvement Signed-off-by: Jerome Tollet <jtollet@cisco.com> Change-Id: Ifb97b3a52d8bf4ecc09dc1e8ff94992fef309a65
2021-01-15build: add missing openssl-devel package for centos-8 vpp-ext-depsDave Wallace1-1/+1
- In a new centos-8 installation, vpp-ext-deps fails on missing ssl.h header file after 'make install-deps'. Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: I521d817dd1f1e21aff427d98b9832ea7c7b89339
2021-01-11build: Add deps for ubuntu 20.10Pim van Pelt1-0/+6
Type: make Ubuntu Groovy Gorilla (20.10) has bumped its FFI library. Move from v6 to v8. Signed-off-by: Pim van Pelt <pim@ipng.nl> Change-Id: I32bc2905ad9ed6918446020accee2a4c2ca9d4b5
2020-12-18misc: migrate from GNU indent to clang-formatDamjan Marion1-4/+9
Type: make Change-Id: I085dcd6fe826da14d456f84a23355310bdc5d1e9 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-12-16build: remove centos-7 directive from MakefileDave Wallace1-3/+0
- CentOS-7 support has been deprecated. Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: If7f1988487d0b63a596dfee9dd03af89fb159573
2020-12-15api: crchcecker ignore version < 1.0.0 and outside of src directoryOle Tr�an1-1/+1
This reverts commit 510aaa8911843206f7b9ff48b41e3c7b8c4a99fe. Reason for revert: failed in case of no api file in changeset. Change-Id: I2c6f01b25a35128df870418eef0008766bb590df Type: fix Signed-off-by: Ole Troan <ot@cisco.com>
2020-12-15api: crchcecker ignore version < 1.0.0 and outside of src directoryOle Troan1-1/+1
- For check patchset ignore files outside of src directory - For check patchset ignore files that have version < 1.0.0 - fix Pylint warnings - Modify vppapigen_crc to include version in JSON output Type: fix Signed-off-by: Ole Troan <ot@cisco.com> Change-Id: I93f7bebeeaeedc19b2b1e5e135ea1035517d7f76 Signed-off-by: Ole Troan <ot@cisco.com>
2020-12-14build: update ctags --tag-relative option used in make ctagsJerome Tollet1-1/+1
Type: fix Exhuberant ctags --tag-relative expects =[yes|no] Signed-off-by: Jerome Tollet <jtollet@cisco.com> Change-Id: Ic60b7014508d5c8c286f85f26e9eb0bdc0e90aa5
2020-12-08build: fix centos-8 'make install-deps' enable PowerTools repoDave Wallace1-1/+2
- The name of the powertools repo was changed [0] in centos-8 from 'PowerTools' to 'powertools'. Retrieve the correct name from 'dnf repolist all' instead of hard coding it. [0] https://git.centos.org/rpms/centos-repos/c/b759b17557b9577e8ea156740af0249ab1a22d70 Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: Ic1402e671eb1d70dec429bab82ad18d8251f4eef
2020-10-24build: add compile_commands.json cleanup scriptDamjan Marion1-1/+3
Type: make Change-Id: I8d6a5018bddf029e106df3cb8b8eded4fa28067d Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-10-22build: add -E to sudo invocation in top-level MakefileDamjan Marion1-1/+1
Type: make Change-Id: I447c6082bfe4aa549e6a7ebc156b657419fc608c Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-10-16misc: deprecate VOMDamjan Marion1-19/+0
Type: make Change-Id: Ifb3e52af93d24fcc2f2e6a0c408e16902a2fe553 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-10-14build: add missing dnf-plugins-core package on centos-8Dave Wallace1-1/+1
Type: fix Change-Id: I1a4d9a7a8089cbf488dcd6f09eec6b4e0d0d72fe Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
2020-10-02build: fix build for debian testingChuan Han1-1/+3
1. add libelf-dev to default deb deps 2. Also use libffi7 instead of libffi6 for debian-testing Type: fix Signed-off-by: Chuan Han <chuan.han.comm@gmail.com> Change-Id: I9f13955812877422ecb8aac3dd34c5828b9c4607
2020-09-26build: remove OS distros which are EOLDave Wallace1-11/+0
Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: If80ff6bfbd42779a663af1e7dcfff80d75f47f1e
2020-09-24build: fix missing openssl package on debian-10Dave Wallace1-0/+1
- libssl-dev missing on debian-10 breaks 'make install-ext-deps' Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: Ib6a6f120147e8ae0dcfead6fae9f0a7a3434d687
2020-09-21build: remove opensuse build infraDave Wallace1-47/+2
- VPP on opensuse has not been supported for several releases. Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: I2b5316ad5c20a843b8936f4ceb473f932a5338d9
2020-09-18build: missing deb pkg on ubuntu-20.04Dave Wallace1-1/+1
- The vpp build on the ubuntu-20.04 executor failed due to the package 'dh-python' not getting installed by 'make install-dep' Type: fix Signed-off-by: Dave Wallace <dwallacelf@gmail.com> Change-Id: Id9307ad1b4e34c413d90258c6bde2aa5afafec63
2020-09-15build: fix the the build on centos/rhel 8Yichen Wang1-0/+1
1. Remove uncessary runtime dependency; 2. Add missing build dependency; 3. Fix runtime dependency for api-python3 RPM; Type: make Change-Id: I2700f1a15112effba8d1527aca6467158f81f486 Signed-off-by: Yichen Wang <yicwang@cisco.com>
2020-09-11build: fix build for Debian 9 and Debian 10Benoît Ganne1-0/+2
Type: fix Change-Id: Ic07d0ae313b32e420ba93693cb75960a86f752a9 Signed-off-by: Benoît Ganne <bganne@cisco.com>
2020-09-02build: Fix 'make build VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON' ↵jiangxiaoming1-2/+2
error on Centos 7 Type: fix Signed-off-by: jiangxiaoming <jiangxiaoming@outlook.com> Change-Id: Ic47f5e8627923c951333c70004850b53ed4cab06
2020-08-31af_xdp: AF_XDP input pluginBenoît Ganne1-0/+1
Type: feature Change-Id: I85aa4ad6b68c1aa0e51938002dc691a4b11c545c Signed-off-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Benoît Ganne <bganne@cisco.com>
2020-07-29build: Fix 'make install-deps' errors on aarch64 CentOS 7Jieqiang Wang1-1/+6
On CentOS-7 aarch64, command of 'debuginfo-install -y glibc openssl-libs mbedtls-devel zlib' in 'make install-deps' fails because it tries to install the corresponding *debuginfo* packages from some inaccessible/unmaintained repos on aarch64, e.g., centos-sclo-rh-debuginfo. The error message shows as below. Using 'debuginfo-install --enablerepo=xxx' also fails because it will still enable all the repos including the broken repos on aarch64. Using 'debuginfo-install --disablerepo=xxx' (xxx is the broken repo) works fine but we are not centain about that if VPP user will install additional broken repos on aarch64 or not. So to fix this error, we install all the *debuginfo* packages for 'glibc openssl-libs mbedtls-devel zlib' packages using 'yum install' instead. [root@ ~]# debuginfo-install -y glibc openssl-libs mbedtls-devel zlib Loaded plugins: auto-update-debuginfo, fastestmirror, ovl enabling epel-debuginfo enabling base-debuginfo enabling centos-sclo-rh-debuginfo Loading mirror speeds from cached hostfile epel/aarch64/metalink | 8.2 kB 00:00:00 epel-debuginfo/aarch64/metalink | 8.5 kB 00:00:00 * base: mirror.aktkn.sg * centos-sclo-rh: mirror.aktkn.sg * epel: mirrors.yun-idc.com * epel-debuginfo: mirrors.yun-idc.com * extras: mirror.aktkn.sg * updates: mirror.xtom.com.hk http://debuginfo.centos.org/centos/7/sclo/aarch64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found Trying other mirror. To address this issue please refer to the below wiki article https://wiki.centos.org/yum-errors If above article doesn't help to resolve this issue please use https://bugs.centos.org/. failure: repodata/repomd.xml from centos-sclo-rh-debuginfo: [Errno 256] No more mirrors to try. http://debuginfo.centos.org/centos/7/sclo/aarch64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found Type: fix Change-Id: I017c3b20a167d8035c3ae617b9ad5ae479e52f57 Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
2020-06-15build: fix the build on centos8Yichen Wang1-1/+5
Add missing dependencies and correct the building to support CentOS8 Type: make Change-Id: Ie15b9b1174fa9b6d5ae02bace36ebc77e17d770c Signed-off-by: Yichen Wang <yicwang@cisco.com>
2020-06-04build: add libssl-dev library for ubuntu 20.04Jieqiang Wang1-0/+1
Add the libssl-dev library for ubuntu 20.04 in Makefile. Type: fix Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com> Change-Id: I4187cb041997e7457734ffdb18bdbec98a051669
2020-05-15misc: fix ubuntu 20.04 python depsDamjan Marion1-7/+7
Type: fix Change-Id: I9cdfbffd6333d090f970422bf047aaa90c1e4c65 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-05-09vppapigen: api crc checkerOle Troan1-0/+5
crcchecker is a tool for enforcement of the binary API. 1. Production APIs should never change. 2. An API can be deprecated across three release cycles. Release 1: API is marked as deprecated. option deprecated="vyy.mm"; option replaced_by="new_api_2"; Release 2: both old and new APIs are supported Release 3: the deprecated API is deleted. 3. APIs that are experimental / not released are not checked. An API message can be individually marked as in progress, by: option status="in_progress"; In the API definition. The definition of a "production API" is if the major version in the API file is > 0. extras/scripts/crcchecker.py --check-patchset # returns -1 if backwards incompatible extras/scripts/crcchecker.py --dump-manifest extras/scripts/crcchecker.py --git-revision v20.01 <files> extras/scripts/crcchecker.py -- diff <oldfile> <newfile> This patch integrates the tool in "make checkstyle-api". A future patch is required to integrate the tool in the verify job. I.e. this patch does not enable enforcement through Jenkins. Change-Id: I5f13c0976d8a12a58131b3e270f2dc9c00dc7d8c Type: feature Signed-off-by: Ole Troan <ot@cisco.com>
2020-05-08misc: add knob to generate compile_commands.jsonDamjan Marion1-0/+5
Used for lanuguage servers like clangd and ccls Type: improvement Change-Id: I68d534dfa7b8ba3459fbd919d5ffccaa1fa1171e Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-30build: rework x86 CPU variantsDamjan Marion1-7/+2
Type: improvement Change-Id: Ief243f88e654e578ef9b8060fcf535b364aececb Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-29misc: switch to clang-9Damjan Marion1-1/+1
Type: improvement Change-Id: Iebf77a63c0c19b130a3fbd26b5293304a9fed4c1 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-03-27build tests: fix 'test-wipe-papi' targetPaul Vinciguerra1-1/+1
Fix transposed terms. Type: fix Change-Id: Ibc3f5d5d9dbd81c9edf09ae5024c3ac4b1939d03 Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
2020-03-26build: use gcc-8 as default on ubuntu 18.04Damjan Marion1-0/+6
Type: improvement Change-Id: I34c9e95ad9160436cb62dec7a1a2d0ce94602ab7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-03-22vppinfra: fix typo in dlmalloc.cDave Barach1-1/+7
Fix libffi package name for Ubuntu 20.04 Type: fix Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: Idc567717494b4c40c307f20a40d5e10cd26b0a46
2020-03-18build: add snap packaging (experimental)Dave Barach1-1/+17
Type: feature Signed-off-by: Dave Barach <dave@barachs.net> Change-Id: I5a5efde5378f63d89d82d71ae009c7595aaa800c
2020-03-10build: add libssl-dev for ubuntu 16.04 and 18.04Jieqiang Wang1-0/+2
The recent changes to Makefile lead to the lack of libssl-dev dependency for ubuntu 16.04 and 18.04. Add libssl-dev to DEB_DEPENDS variable for corresponding ubuntu version. Type: fix Change-Id: I42e0e4761d5ec377de71b11cccf747c7f55ca337 Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
2020-02-07build: Makefile dep change for ubuntuEd Kern1-3/+5
Alter dep name and location for ubuntu-20 package naming Dropping 14.04 support while keeping 16.04 and 18.04 Dropping python2-dev for ubuntu-20 Type: make Change-Id: I324aa646cdb6e13d39b7a99722857e59906b0843 Signed-off-by: Ed Kern <ejk@cisco.com>
'>1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043
/*
 * Copyright (c) 2016 Cisco and/or its affiliates.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <vnet/tcp/tcp.h>
#include <vnet/lisp-cp/packets.h>
#include <math.h>

vlib_node_registration_t tcp4_output_node;
vlib_node_registration_t tcp6_output_node;

typedef enum _tcp_output_next
{
  TCP_OUTPUT_NEXT_DROP,
  TCP_OUTPUT_NEXT_IP_LOOKUP,
  TCP_OUTPUT_N_NEXT
} tcp_output_next_t;

#define foreach_tcp4_output_next              	\
  _ (DROP, "error-drop")                        \
  _ (IP_LOOKUP, "ip4-lookup")

#define foreach_tcp6_output_next              	\
  _ (DROP, "error-drop")                        \
  _ (IP_LOOKUP, "ip6-lookup")

static char *tcp_error_strings[] = {
#define tcp_error(n,s) s,
#include <vnet/tcp/tcp_error.def>
#undef tcp_error
};

typedef struct
{
  tcp_header_t tcp_header;
  tcp_connection_t tcp_connection;
} tcp_tx_trace_t;

u16 dummy_mtu = 1460;

u8 *
format_tcp_tx_trace (u8 * s, va_list * args)
{
  CLIB_UNUSED (vlib_main_t * vm) = va_arg (*args, vlib_main_t *);
  CLIB_UNUSED (vlib_node_t * node) = va_arg (*args, vlib_node_t *);
  tcp_tx_trace_t *t = va_arg (*args, tcp_tx_trace_t *);
  uword indent = format_get_indent (s);

  s = format (s, "%U\n%U%U",
	      format_tcp_header, &t->tcp_header, 128,
	      format_white_space, indent,
	      format_tcp_connection, &t->tcp_connection, 1);

  return s;
}

static u8
tcp_window_compute_scale (u32 available_space)
{
  u8 wnd_scale = 0;
  while (wnd_scale < TCP_MAX_WND_SCALE
	 && (available_space >> wnd_scale) > TCP_WND_MAX)
    wnd_scale++;
  return wnd_scale;
}

/**
 * Update max segment size we're able to process.
 *
 * The value is constrained by our interface's MTU and IP options. It is
 * also what we advertise to our peer.
 */
void
tcp_update_rcv_mss (tcp_connection_t * tc)
{
  /* TODO find our iface MTU */
  tc->mss = dummy_mtu - sizeof (tcp_header_t);
}

/**
 * TCP's initial window
 */
always_inline u32
tcp_initial_wnd_unscaled (tcp_connection_t * tc)
{
  /* RFC 6928 recommends the value lower. However at the time our connections
   * are initialized, fifos may not be allocated. Therefore, advertise the
   * smallest possible unscaled window size and update once fifos are
   * assigned to the session.
   */
  /*
     tcp_update_rcv_mss (tc);
     TCP_IW_N_SEGMENTS * tc->mss;
   */
  return TCP_MIN_RX_FIFO_SIZE;
}

/**
 * Compute initial window and scale factor. As per RFC1323, window field in
 * SYN and SYN-ACK segments is never scaled.
 */
u32
tcp_initial_window_to_advertise (tcp_connection_t * tc)
{
  u32 max_fifo;

  /* Initial wnd for SYN. Fifos are not allocated yet.
   * Use some predefined value. For SYN-ACK we still want the
   * scale to be computed in the same way */
  max_fifo = TCP_MAX_RX_FIFO_SIZE;

  tc->rcv_wscale = tcp_window_compute_scale (max_fifo);
  tc->rcv_wnd = tcp_initial_wnd_unscaled (tc);

  return clib_min (tc->rcv_wnd, TCP_WND_MAX);
}

/**
 * Compute and return window to advertise, scaled as per RFC1323
 */
u32
tcp_window_to_advertise (tcp_connection_t * tc, tcp_state_t state)
{
  if (state < TCP_STATE_ESTABLISHED)
    return tcp_initial_window_to_advertise (tc);

  tcp_update_rcv_wnd (tc);

  if (tc->rcv_wnd == 0)
    {
      tc->flags |= TCP_CONN_SENT_RCV_WND0;
    }
  else
    {
      tc->flags &= ~TCP_CONN_SENT_RCV_WND0;
    }

  return tc->rcv_wnd >> tc->rcv_wscale;
}

void
tcp_update_rcv_wnd (tcp_connection_t * tc)
{
  i32 observed_wnd;
  u32 available_space, max_fifo, wnd;

  /*
   * Figure out how much space we have available
   */
  available_space = stream_session_max_rx_enqueue (&tc->connection);
  max_fifo = stream_session_rx_fifo_size (&tc->connection);

  ASSERT (tc->rcv_opts.mss < max_fifo);
  if (available_space < tc->rcv_opts.mss && available_space < max_fifo >> 3)
    available_space = 0;

  /*
   * Use the above and what we know about what we've previously advertised
   * to compute the new window
   */
  observed_wnd = (i32) tc->rcv_wnd - (tc->rcv_nxt - tc->rcv_las);
  if (observed_wnd < 0)
    observed_wnd = 0;

  /* Bad. Thou shalt not shrink */
  if (available_space < observed_wnd)
    {
      wnd = observed_wnd;
      TCP_EVT_DBG (TCP_EVT_RCV_WND_SHRUNK, tc, observed_wnd, available_space);
    }
  else
    {
      wnd = available_space;
    }

  /* Make sure we have a multiple of rcv_wscale */
  if (wnd && tc->rcv_wscale)
    {
      wnd &= ~(1 << tc->rcv_wscale);
      if (wnd == 0)
	wnd = 1 << tc->rcv_wscale;
    }

  tc->rcv_wnd = clib_min (wnd, TCP_WND_MAX << tc->rcv_wscale);
}

/**
 * Write TCP options to segment.
 */
u32
tcp_options_write (u8 * data, tcp_options_t * opts)
{
  u32 opts_len = 0;
  u32 buf, seq_len = 4;

  if (tcp_opts_mss (opts))
    {
      *data++ = TCP_OPTION_MSS;
      *data++ = TCP_OPTION_LEN_MSS;
      buf = clib_host_to_net_u16 (opts->mss);
      clib_memcpy (data, &buf, sizeof (opts->mss));
      data += sizeof (opts->mss);
      opts_len += TCP_OPTION_LEN_MSS;
    }

  if (tcp_opts_wscale (opts))
    {
      *data++ = TCP_OPTION_WINDOW_SCALE;
      *data++ = TCP_OPTION_LEN_WINDOW_SCALE;
      *data++ = opts->wscale;
      opts_len += TCP_OPTION_LEN_WINDOW_SCALE;
    }

  if (tcp_opts_sack_permitted (opts))
    {
      *data++ = TCP_OPTION_SACK_PERMITTED;
      *data++ = TCP_OPTION_LEN_SACK_PERMITTED;
      opts_len += TCP_OPTION_LEN_SACK_PERMITTED;
    }

  if (tcp_opts_tstamp (opts))
    {
      *data++ = TCP_OPTION_TIMESTAMP;
      *data++ = TCP_OPTION_LEN_TIMESTAMP;
      buf = clib_host_to_net_u32 (opts->tsval);
      clib_memcpy (data, &buf, sizeof (opts->tsval));
      data += sizeof (opts->tsval);
      buf = clib_host_to_net_u32 (opts->tsecr);
      clib_memcpy (data, &buf, sizeof (opts->tsecr));
      data += sizeof (opts->tsecr);
      opts_len += TCP_OPTION_LEN_TIMESTAMP;
    }

  if (tcp_opts_sack (opts))
    {
      int i;
      u32 n_sack_blocks = clib_min (vec_len (opts->sacks),
				    TCP_OPTS_MAX_SACK_BLOCKS);

      if (n_sack_blocks != 0)
	{
	  *data++ = TCP_OPTION_SACK_BLOCK;
	  *data++ = 2 + n_sack_blocks * TCP_OPTION_LEN_SACK_BLOCK;
	  for (i = 0; i < n_sack_blocks; i++)
	    {
	      buf = clib_host_to_net_u32 (opts->sacks[i].start);
	      clib_memcpy (data, &buf, seq_len);
	      data += seq_len;
	      buf = clib_host_to_net_u32 (opts->sacks[i].end);
	      clib_memcpy (data, &buf, seq_len);
	      data += seq_len;
	    }
	  opts_len += 2 + n_sack_blocks * TCP_OPTION_LEN_SACK_BLOCK;
	}
    }

  /* Terminate TCP options */
  if (opts_len % 4)
    {
      *data++ = TCP_OPTION_EOL;
      opts_len += TCP_OPTION_LEN_EOL;
    }

  /* Pad with zeroes to a u32 boundary */
  while (opts_len % 4)
    {
      *data++ = TCP_OPTION_NOOP;
      opts_len += TCP_OPTION_LEN_NOOP;
    }
  return opts_len;
}

always_inline int
tcp_make_syn_options (tcp_options_t * opts, u8 wnd_scale)
{
  u8 len = 0;

  opts->flags |= TCP_OPTS_FLAG_MSS;
  opts->mss = dummy_mtu;	/*XXX discover that */
  len += TCP_OPTION_LEN_MSS;

  opts->flags |= TCP_OPTS_FLAG_WSCALE;
  opts->wscale = wnd_scale;
  len += TCP_OPTION_LEN_WINDOW_SCALE;

  opts->flags |= TCP_OPTS_FLAG_TSTAMP;
  opts->tsval = tcp_time_now ();
  opts->tsecr = 0;
  len += TCP_OPTION_LEN_TIMESTAMP;

  if (TCP_USE_SACKS)
    {
      opts->flags |= TCP_OPTS_FLAG_SACK_PERMITTED;
      len += TCP_OPTION_LEN_SACK_PERMITTED;
    }

  /* Align to needed boundary */
  len += (TCP_OPTS_ALIGN - len % TCP_OPTS_ALIGN) % TCP_OPTS_ALIGN;
  return len;
}

always_inline int
tcp_make_synack_options (tcp_connection_t * tc, tcp_options_t * opts)
{
  u8 len = 0;

  opts->flags |= TCP_OPTS_FLAG_MSS;
  opts->mss = tc->mss;
  len += TCP_OPTION_LEN_MSS;

  if (tcp_opts_wscale (&tc->rcv_opts))
    {
      opts->flags |= TCP_OPTS_FLAG_WSCALE;
      opts->wscale = tc->rcv_wscale;
      len += TCP_OPTION_LEN_WINDOW_SCALE;
    }

  if (tcp_opts_tstamp (&tc->rcv_opts))
    {
      opts->flags |= TCP_OPTS_FLAG_TSTAMP;
      opts->tsval = tcp_time_now ();
      opts->tsecr = tc->tsval_recent;
      len += TCP_OPTION_LEN_TIMESTAMP;
    }

  if (tcp_opts_sack_permitted (&tc->rcv_opts))
    {
      opts->flags |= TCP_OPTS_FLAG_SACK_PERMITTED;
      len += TCP_OPTION_LEN_SACK_PERMITTED;
    }

  /* Align to needed boundary */
  len += (TCP_OPTS_ALIGN - len % TCP_OPTS_ALIGN) % TCP_OPTS_ALIGN;
  return len;
}

always_inline int
tcp_make_established_options (tcp_connection_t * tc, tcp_options_t * opts)
{
  u8 len = 0;

  opts->flags = 0;

  if (tcp_opts_tstamp (&tc->rcv_opts))
    {
      opts->flags |= TCP_OPTS_FLAG_TSTAMP;
      opts->tsval = tcp_time_now ();
      opts->tsecr = tc->tsval_recent;
      len += TCP_OPTION_LEN_TIMESTAMP;
    }
  if (tcp_opts_sack_permitted (&tc->rcv_opts))
    {
      if (vec_len (tc->snd_sacks))
	{
	  opts->flags |= TCP_OPTS_FLAG_SACK;
	  opts->sacks = tc->snd_sacks;
	  opts->n_sack_blocks = clib_min (vec_len (tc->snd_sacks),
					  TCP_OPTS_MAX_SACK_BLOCKS);
	  len += 2 + TCP_OPTION_LEN_SACK_BLOCK * opts->n_sack_blocks;
	}
    }

  /* Align to needed boundary */
  len += (TCP_OPTS_ALIGN - len % TCP_OPTS_ALIGN) % TCP_OPTS_ALIGN;
  return len;
}

always_inline int
tcp_make_options (tcp_connection_t * tc, tcp_options_t * opts,
		  tcp_state_t state)
{
  switch (state)
    {
    case TCP_STATE_ESTABLISHED:
    case TCP_STATE_FIN_WAIT_1:
      return tcp_make_established_options (tc, opts);
    case TCP_STATE_SYN_RCVD:
      return tcp_make_synack_options (tc, opts);
    case TCP_STATE_SYN_SENT:
      return tcp_make_syn_options (opts, tc->rcv_wscale);
    default:
      clib_warning ("Not handled!");
      return 0;
    }
}

/**
 * Update snd_mss to reflect the effective segment size that we can send
 * by taking into account all TCP options, including SACKs
 */
void
tcp_update_snd_mss (tcp_connection_t * tc)
{
  /* Compute options to be used for connection. These may be reused when
   * sending data or to compute the effective mss (snd_mss) */
  tc->snd_opts_len =
    tcp_make_options (tc, &tc->snd_opts, TCP_STATE_ESTABLISHED);

  /* XXX check if MTU has been updated */
  tc->snd_mss = clib_min (tc->mss, tc->rcv_opts.mss) - tc->snd_opts_len;
  ASSERT (tc->snd_mss > 0);
}

void
tcp_init_mss (tcp_connection_t * tc)
{
  u16 default_min_mss = 536;
  tcp_update_rcv_mss (tc);

  /* TODO cache mss and consider PMTU discovery */
  tc->snd_mss = clib_min (tc->rcv_opts.mss, tc->mss);

  if (tc->snd_mss < 45)
    {
      clib_warning ("snd mss is 0");
      /* Assume that at least the min default mss works */
      tc->snd_mss = default_min_mss;
      tc->rcv_opts.mss = default_min_mss;
    }

  /* We should have enough space for 40 bytes of options */
  ASSERT (tc->snd_mss > 45);

  /* If we use timestamp option, account for it */
  if (tcp_opts_tstamp (&tc->rcv_opts))
    tc->snd_mss -= TCP_OPTION_LEN_TIMESTAMP;
}

always_inline int
tcp_alloc_tx_buffers (tcp_main_t * tm, u8 thread_index, u32 n_free_buffers)
{
  vec_validate (tm->tx_buffers[thread_index],
		vec_len (tm->tx_buffers[thread_index]) + n_free_buffers - 1);
  _vec_len (tm->tx_buffers[thread_index]) =
    vlib_buffer_alloc_from_free_list (vlib_get_main (),
				      tm->tx_buffers[thread_index],
				      n_free_buffers,
				      VLIB_BUFFER_DEFAULT_FREE_LIST_INDEX);
  /* buffer shortage, report failure */
  if (vec_len (tm->tx_buffers[thread_index]) == 0)
    {
      clib_warning ("out of buffers");
      return -1;
    }
  return 0;
}

always_inline int
tcp_get_free_buffer_index (tcp_main_t * tm, u32 * bidx)
{
  u32 *my_tx_buffers;
  u32 thread_index = vlib_get_thread_index ();
  if (PREDICT_FALSE (vec_len (tm->tx_buffers[thread_index]) == 0))
    {
      if (tcp_alloc_tx_buffers (tm, thread_index, VLIB_FRAME_SIZE))
	return -1;
    }
  my_tx_buffers = tm->tx_buffers[thread_index];
  *bidx = my_tx_buffers[_vec_len (my_tx_buffers) - 1];
  _vec_len (my_tx_buffers) -= 1;
  return 0;
}

always_inline void
tcp_return_buffer (tcp_main_t * tm)
{
  u32 *my_tx_buffers;
  u32 thread_index = vlib_get_thread_index ();
  my_tx_buffers = tm->tx_buffers[thread_index];
  _vec_len (my_tx_buffers) += 1;
}

always_inline void *
tcp_reuse_buffer (vlib_main_t * vm, vlib_buffer_t * b)
{
  if (b->flags & VLIB_BUFFER_NEXT_PRESENT)
    vlib_buffer_free_one (vm, b->next_buffer);
  b->flags = 0;
  b->current_data = 0;
  b->current_length = 0;
  b->total_length_not_including_first_buffer = 0;
  vnet_buffer (b)->tcp.flags = 0;

  /* Leave enough space for headers */
  return vlib_buffer_make_headroom (b, MAX_HDRS_LEN);
}

always_inline void *
tcp_init_buffer (vlib_main_t * vm, vlib_buffer_t * b)
{
  ASSERT ((b->flags & VLIB_BUFFER_NEXT_PRESENT) == 0);
  b->flags = VNET_BUFFER_F_LOCALLY_ORIGINATED;
  b->total_length_not_including_first_buffer = 0;
  vnet_buffer (b)->tcp.flags = 0;

  /* Leave enough space for headers */
  return vlib_buffer_make_headroom (b, MAX_HDRS_LEN);
}

/**
 * Prepare ACK
 */
void
tcp_make_ack_i (tcp_connection_t * tc, vlib_buffer_t * b, tcp_state_t state,
		u8 flags)
{
  tcp_options_t _snd_opts, *snd_opts = &_snd_opts;
  u8 tcp_opts_len, tcp_hdr_opts_len;
  tcp_header_t *th;
  u16 wnd;

  wnd = tcp_window_to_advertise (tc, state);

  /* Make and write options */
  tcp_opts_len = tcp_make_established_options (tc, snd_opts);
  tcp_hdr_opts_len = tcp_opts_len + sizeof (tcp_header_t);

  th = vlib_buffer_push_tcp (b, tc->c_lcl_port, tc->c_rmt_port, tc->snd_nxt,
			     tc->rcv_nxt, tcp_hdr_opts_len, flags, wnd);

  tcp_options_write ((u8 *) (th + 1), snd_opts);
  vnet_buffer (b)->tcp.connection_index = tc->c_c_index;
}

/**
 * Convert buffer to ACK
 */
void
tcp_make_ack (tcp_connection_t * tc, vlib_buffer_t * b)
{
  vlib_main_t *vm = vlib_get_main ();

  tcp_reuse_buffer (vm, b);
  tcp_make_ack_i (tc, b, TCP_STATE_ESTABLISHED, TCP_FLAG_ACK);
  TCP_EVT_DBG (TCP_EVT_ACK_SENT, tc);
  vnet_buffer (b)->tcp.flags = TCP_BUF_FLAG_ACK;
  tc->rcv_las = tc->rcv_nxt;
}

/**
 * Convert buffer to FIN-ACK
 */
void
tcp_make_fin (tcp_connection_t * tc, vlib_buffer_t * b)
{
  vlib_main_t *vm = vlib_get_main ();
  u8 flags = 0;

  tcp_reuse_buffer (vm, b);

  flags = TCP_FLAG_FIN | TCP_FLAG_ACK;
  tcp_make_ack_i (tc, b, TCP_STATE_ESTABLISHED, flags);

  /* Reset flags, make sure ack is sent */
  vnet_buffer (b)->tcp.flags &= ~TCP_BUF_FLAG_DUPACK;

  tc->snd_nxt += 1;
}

/**
 * Convert buffer to SYN-ACK
 */
void
tcp_make_synack (tcp_connection_t * tc, vlib_buffer_t * b)
{
  vlib_main_t *vm = vlib_get_main ();
  tcp_options_t _snd_opts, *snd_opts = &_snd_opts;
  u8 tcp_opts_len, tcp_hdr_opts_len;
  tcp_header_t *th;
  u16 initial_wnd;
  u32 time_now;

  memset (snd_opts, 0, sizeof (*snd_opts));

  tcp_reuse_buffer (vm, b);

  /* Set random initial sequence */
  time_now = tcp_time_now ();

  tc->iss = random_u32 (&time_now);
  tc->snd_una = tc->iss;
  tc->snd_nxt = tc->iss + 1;
  tc->snd_una_max = tc->snd_nxt;

  initial_wnd = tcp_initial_window_to_advertise (tc);

  /* Make and write options */
  tcp_opts_len = tcp_make_synack_options (tc, snd_opts);
  tcp_hdr_opts_len = tcp_opts_len + sizeof (tcp_header_t);

  th = vlib_buffer_push_tcp (b, tc->c_lcl_port, tc->c_rmt_port, tc->iss,
			     tc->rcv_nxt, tcp_hdr_opts_len,
			     TCP_FLAG_SYN | TCP_FLAG_ACK, initial_wnd);

  tcp_options_write ((u8 *) (th + 1), snd_opts);

  vnet_buffer (b)->tcp.connection_index = tc->c_c_index;
  vnet_buffer (b)->tcp.flags = TCP_BUF_FLAG_ACK;

  /* Init retransmit timer */
  tcp_retransmit_timer_set (tc);
  TCP_EVT_DBG (TCP_EVT_SYNACK_SENT, tc);
}

always_inline void
tcp_enqueue_to_ip_lookup (vlib_main_t * vm, vlib_buffer_t * b, u32 bi,
			  u8 is_ip4)
{
  u32 *to_next, next_index;
  vlib_frame_t *f;

  b->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;
  b->error = 0;

  /* Default FIB for now */
  vnet_buffer (b)->sw_if_index[VLIB_TX] = 0;

  /* Send to IP lookup */
  next_index = is_ip4 ? ip4_lookup_node.index : ip6_lookup_node.index;
  f = vlib_get_frame_to_node (vm, next_index);

  /* Enqueue the packet */
  to_next = vlib_frame_vector_args (f);
  to_next[0] = bi;
  f->n_vectors = 1;
  vlib_put_frame_to_node (vm, next_index, f);
}

always_inline void
tcp_enqueue_to_output_i (vlib_main_t * vm, vlib_buffer_t * b, u32 bi,
			 u8 is_ip4, u8 flush)
{
  tcp_main_t *tm = vnet_get_tcp_main ();
  u32 thread_index = vlib_get_thread_index ();
  u32 *to_next, next_index;
  vlib_frame_t *f;

  b->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;
  b->error = 0;

  /* Decide where to send the packet */
  next_index = is_ip4 ? tcp4_output_node.index : tcp6_output_node.index;

  /* Initialize the trajectory trace, if configured */
  if (VLIB_BUFFER_TRACE_TRAJECTORY > 0)
    {
      b->pre_data[0] = 1;
      b->pre_data[1] = next_index;
    }

  /* Get frame to v4/6 output node */
  f = tm->tx_frames[!is_ip4][thread_index];
  if (!f)
    {
      f = vlib_get_frame_to_node (vm, next_index);
      ASSERT (f);
      tm->tx_frames[!is_ip4][thread_index] = f;
    }
  to_next = vlib_frame_vector_args (f);
  to_next[f->n_vectors] = bi;
  f->n_vectors += 1;
  if (flush || f->n_vectors == VLIB_FRAME_SIZE)
    {
      vlib_put_frame_to_node (vm, next_index, f);
      tm->tx_frames[!is_ip4][thread_index] = 0;
    }
}

always_inline void
tcp_enqueue_to_output (vlib_main_t * vm, vlib_buffer_t * b, u32 bi, u8 is_ip4)
{
  tcp_enqueue_to_output_i (vm, b, bi, is_ip4, 0);
}

always_inline void
tcp_enqueue_to_output_now (vlib_main_t * vm, vlib_buffer_t * b, u32 bi,
			   u8 is_ip4)
{
  tcp_enqueue_to_output_i (vm, b, bi, is_ip4, 1);
}

int
tcp_make_reset_in_place (vlib_main_t * vm, vlib_buffer_t * b0,
			 tcp_state_t state, u8 thread_index, u8 is_ip4)
{
  ip4_header_t *ih4;
  ip6_header_t *ih6;
  tcp_header_t *th0;
  ip4_address_t src_ip40, dst_ip40;
  ip6_address_t src_ip60, dst_ip60;
  u16 src_port, dst_port;
  u32 tmp;
  u32 seq, ack;
  u8 flags;

  /* Find IP and TCP headers */
  th0 = tcp_buffer_hdr (b0);

  /* Save src and dst ip */
  if (is_ip4)
    {
      ih4 = vlib_buffer_get_current (b0);
      ASSERT ((ih4->ip_version_and_header_length & 0xF0) == 0x40);
      src_ip40.as_u32 = ih4->src_address.as_u32;
      dst_ip40.as_u32 = ih4->dst_address.as_u32;
    }
  else
    {
      ih6 = vlib_buffer_get_current (b0);
      ASSERT ((ih6->ip_version_traffic_class_and_flow_label & 0xF0) == 0x60);
      clib_memcpy (&src_ip60, &ih6->src_address, sizeof (ip6_address_t));
      clib_memcpy (&dst_ip60, &ih6->dst_address, sizeof (ip6_address_t));
    }

  src_port = th0->src_port;
  dst_port = th0->dst_port;

  /* Try to determine what/why we're actually resetting */
  if (state == TCP_STATE_CLOSED)
    {
      if (!tcp_syn (th0))
	return -1;

      tmp = clib_net_to_host_u32 (th0->seq_number);

      /* Got a SYN for no listener. */
      flags = TCP_FLAG_RST | TCP_FLAG_ACK;
      ack = clib_host_to_net_u32 (tmp + 1);
      seq = 0;
    }
  else
    {
      flags = TCP_FLAG_RST;
      seq = th0->ack_number;
      ack = 0;
    }

  tcp_reuse_buffer (vm, b0);
  th0 = vlib_buffer_push_tcp_net_order (b0, dst_port, src_port, seq, ack,
					sizeof (tcp_header_t), flags, 0);

  if (is_ip4)
    {
      ih4 = vlib_buffer_push_ip4 (vm, b0, &dst_ip40, &src_ip40,
				  IP_PROTOCOL_TCP, 1);
      th0->checksum = ip4_tcp_udp_compute_checksum (vm, b0, ih4);
    }
  else
    {
      int bogus = ~0;
      ih6 = vlib_buffer_push_ip6 (vm, b0, &dst_ip60, &src_ip60,
				  IP_PROTOCOL_TCP);
      th0->checksum = ip6_tcp_udp_icmp_compute_checksum (vm, b0, ih6, &bogus);
      ASSERT (!bogus);
    }

  return 0;
}

/**
 *  Send reset without reusing existing buffer
 *
 *  It extracts connection info out of original packet
 */
void
tcp_send_reset_w_pkt (tcp_connection_t * tc, vlib_buffer_t * pkt, u8 is_ip4)
{
  vlib_buffer_t *b;
  u32 bi;
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();
  u8 tcp_hdr_len, flags = 0;
  tcp_header_t *th, *pkt_th;
  u32 seq, ack;
  ip4_header_t *ih4, *pkt_ih4;
  ip6_header_t *ih6, *pkt_ih6;

  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;

  b = vlib_get_buffer (vm, bi);
  tcp_init_buffer (vm, b);

  /* Make and write options */
  tcp_hdr_len = sizeof (tcp_header_t);

  if (is_ip4)
    {
      pkt_ih4 = vlib_buffer_get_current (pkt);
      pkt_th = ip4_next_header (pkt_ih4);
    }
  else
    {
      pkt_ih6 = vlib_buffer_get_current (pkt);
      pkt_th = ip6_next_header (pkt_ih6);
    }

  if (tcp_ack (pkt_th))
    {
      flags = TCP_FLAG_RST;
      seq = pkt_th->ack_number;
      ack = (tc && tc->state >= TCP_STATE_SYN_RCVD) ? tc->rcv_nxt : 0;
    }
  else
    {
      flags = TCP_FLAG_RST | TCP_FLAG_ACK;
      seq = 0;
      ack = clib_host_to_net_u32 (vnet_buffer (pkt)->tcp.seq_end);
    }

  th = vlib_buffer_push_tcp_net_order (b, pkt_th->dst_port, pkt_th->src_port,
				       seq, ack, tcp_hdr_len, flags, 0);

  /* Swap src and dst ip */
  if (is_ip4)
    {
      ASSERT ((pkt_ih4->ip_version_and_header_length & 0xF0) == 0x40);
      ih4 = vlib_buffer_push_ip4 (vm, b, &pkt_ih4->dst_address,
				  &pkt_ih4->src_address, IP_PROTOCOL_TCP, 1);
      th->checksum = ip4_tcp_udp_compute_checksum (vm, b, ih4);
    }
  else
    {
      int bogus = ~0;
      ASSERT ((pkt_ih6->ip_version_traffic_class_and_flow_label & 0xF0) ==
	      0x60);
      ih6 = vlib_buffer_push_ip6 (vm, b, &pkt_ih6->dst_address,
				  &pkt_ih6->src_address, IP_PROTOCOL_TCP);
      th->checksum = ip6_tcp_udp_icmp_compute_checksum (vm, b, ih6, &bogus);
      ASSERT (!bogus);
    }

  tcp_enqueue_to_ip_lookup (vm, b, bi, is_ip4);
  TCP_EVT_DBG (TCP_EVT_RST_SENT, tc);
}

/**
 * Build and set reset packet for connection
 */
void
tcp_send_reset (tcp_connection_t * tc)
{
  vlib_main_t *vm = vlib_get_main ();
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_buffer_t *b;
  u32 bi;
  tcp_header_t *th;
  u16 tcp_hdr_opts_len, advertise_wnd, opts_write_len;
  u8 flags;

  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;
  b = vlib_get_buffer (vm, bi);
  tcp_init_buffer (vm, b);

  tc->snd_opts_len = tcp_make_options (tc, &tc->snd_opts, tc->state);
  tcp_hdr_opts_len = tc->snd_opts_len + sizeof (tcp_header_t);
  advertise_wnd = tcp_window_to_advertise (tc, TCP_STATE_ESTABLISHED);
  flags = TCP_FLAG_RST;
  th = vlib_buffer_push_tcp (b, tc->c_lcl_port, tc->c_rmt_port, tc->snd_nxt,
			     tc->rcv_nxt, tcp_hdr_opts_len, flags,
			     advertise_wnd);
  opts_write_len = tcp_options_write ((u8 *) (th + 1), &tc->snd_opts);
  ASSERT (opts_write_len == tc->snd_opts_len);
  vnet_buffer (b)->tcp.connection_index = tc->c_c_index;
  tcp_enqueue_to_output_now (vm, b, bi, tc->c_is_ip4);
}

void
tcp_push_ip_hdr (tcp_main_t * tm, tcp_connection_t * tc, vlib_buffer_t * b)
{
  tcp_header_t *th = vlib_buffer_get_current (b);
  vlib_main_t *vm = vlib_get_main ();
  if (tc->c_is_ip4)
    {
      ip4_header_t *ih;
      ih = vlib_buffer_push_ip4 (vm, b, &tc->c_lcl_ip4,
				 &tc->c_rmt_ip4, IP_PROTOCOL_TCP, 1);
      th->checksum = ip4_tcp_udp_compute_checksum (vm, b, ih);
    }
  else
    {
      ip6_header_t *ih;
      int bogus = ~0;

      ih = vlib_buffer_push_ip6 (vm, b, &tc->c_lcl_ip6,
				 &tc->c_rmt_ip6, IP_PROTOCOL_TCP);
      th->checksum = ip6_tcp_udp_icmp_compute_checksum (vm, b, ih, &bogus);
      ASSERT (!bogus);
    }
}

/**
 *  Send SYN
 *
 *  Builds a SYN packet for a half-open connection and sends it to ipx_lookup.
 *  The packet is not forwarded through tcpx_output to avoid doing lookups
 *  in the half_open pool.
 */
void
tcp_send_syn (tcp_connection_t * tc)
{
  vlib_buffer_t *b;
  u32 bi;
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();
  u8 tcp_hdr_opts_len, tcp_opts_len;
  tcp_header_t *th;
  u32 time_now;
  u16 initial_wnd;
  tcp_options_t snd_opts;

  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;

  b = vlib_get_buffer (vm, bi);
  tcp_init_buffer (vm, b);

  /* Set random initial sequence */
  time_now = tcp_time_now ();

  tc->iss = random_u32 (&time_now);
  tc->snd_una = tc->iss;
  tc->snd_una_max = tc->snd_nxt = tc->iss + 1;

  initial_wnd = tcp_initial_window_to_advertise (tc);

  /* Make and write options */
  memset (&snd_opts, 0, sizeof (snd_opts));
  tcp_opts_len = tcp_make_syn_options (&snd_opts, tc->rcv_wscale);
  tcp_hdr_opts_len = tcp_opts_len + sizeof (tcp_header_t);

  th = vlib_buffer_push_tcp (b, tc->c_lcl_port, tc->c_rmt_port, tc->iss,
			     tc->rcv_nxt, tcp_hdr_opts_len, TCP_FLAG_SYN,
			     initial_wnd);

  tcp_options_write ((u8 *) (th + 1), &snd_opts);

  /* Measure RTT with this */
  tc->rtt_ts = tcp_time_now ();
  tc->rtt_seq = tc->snd_nxt;

  /* Start retransmit trimer  */
  tcp_timer_set (tc, TCP_TIMER_RETRANSMIT_SYN, tc->rto * TCP_TO_TIMER_TICK);
  tc->rto_boff = 0;

  /* Set the connection establishment timer */
  tcp_timer_set (tc, TCP_TIMER_ESTABLISH, TCP_ESTABLISH_TIME);

  tcp_push_ip_hdr (tm, tc, b);
  tcp_enqueue_to_ip_lookup (vm, b, bi, tc->c_is_ip4);
  TCP_EVT_DBG (TCP_EVT_SYN_SENT, tc);
}

/**
 * Flush tx frame populated by retransmits and timer pops
 */
void
tcp_flush_frame_to_output (vlib_main_t * vm, u8 thread_index, u8 is_ip4)
{
  if (tcp_main.tx_frames[!is_ip4][thread_index])
    {
      u32 next_index;
      next_index = is_ip4 ? tcp4_output_node.index : tcp6_output_node.index;
      vlib_put_frame_to_node (vm, next_index,
			      tcp_main.tx_frames[!is_ip4][thread_index]);
      tcp_main.tx_frames[!is_ip4][thread_index] = 0;
    }
}

/**
 * Flush both v4 and v6 tx frames for thread index
 */
void
tcp_flush_frames_to_output (u8 thread_index)
{
  vlib_main_t *vm = vlib_get_main ();
  tcp_flush_frame_to_output (vm, thread_index, 1);
  tcp_flush_frame_to_output (vm, thread_index, 0);
}

/**
 *  Send FIN
 */
void
tcp_send_fin (tcp_connection_t * tc)
{
  vlib_buffer_t *b;
  u32 bi;
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();

  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;
  b = vlib_get_buffer (vm, bi);
  /* buffer will be initialized by in tcp_make_fin */
  tcp_make_fin (tc, b);
  tcp_enqueue_to_output_now (vm, b, bi, tc->c_is_ip4);
  tc->flags |= TCP_CONN_FINSNT;
  tc->flags &= ~TCP_CONN_FINPNDG;
  tcp_retransmit_timer_force_update (tc);
  TCP_EVT_DBG (TCP_EVT_FIN_SENT, tc);
}

always_inline u8
tcp_make_state_flags (tcp_connection_t * tc, tcp_state_t next_state)
{
  switch (next_state)
    {
    case TCP_STATE_ESTABLISHED:
      return TCP_FLAG_ACK;
    case TCP_STATE_SYN_RCVD:
      return TCP_FLAG_SYN | TCP_FLAG_ACK;
    case TCP_STATE_SYN_SENT:
      return TCP_FLAG_SYN;
    case TCP_STATE_LAST_ACK:
    case TCP_STATE_FIN_WAIT_1:
      if (tc->snd_nxt + 1 < tc->snd_una_max)
	return TCP_FLAG_ACK;
      else
	return TCP_FLAG_FIN;
    default:
      clib_warning ("Shouldn't be here!");
    }
  return 0;
}

/**
 * Push TCP header and update connection variables
 */
static void
tcp_push_hdr_i (tcp_connection_t * tc, vlib_buffer_t * b,
		tcp_state_t next_state, u8 compute_opts)
{
  u32 advertise_wnd, data_len;
  u8 tcp_hdr_opts_len, opts_write_len, flags;
  tcp_header_t *th;

  data_len = b->current_length + b->total_length_not_including_first_buffer;
  ASSERT (!b->total_length_not_including_first_buffer
	  || (b->flags & VLIB_BUFFER_NEXT_PRESENT));
  vnet_buffer (b)->tcp.flags = 0;

  if (compute_opts)
    tc->snd_opts_len = tcp_make_options (tc, &tc->snd_opts, tc->state);

  tcp_hdr_opts_len = tc->snd_opts_len + sizeof (tcp_header_t);
  advertise_wnd = tcp_window_to_advertise (tc, next_state);
  flags = tcp_make_state_flags (tc, next_state);

  /* Push header and options */
  th = vlib_buffer_push_tcp (b, tc->c_lcl_port, tc->c_rmt_port, tc->snd_nxt,
			     tc->rcv_nxt, tcp_hdr_opts_len, flags,
			     advertise_wnd);
  opts_write_len = tcp_options_write ((u8 *) (th + 1), &tc->snd_opts);

  ASSERT (opts_write_len == tc->snd_opts_len);
  vnet_buffer (b)->tcp.connection_index = tc->c_c_index;

  /*
   * Update connection variables
   */

  tc->snd_nxt += data_len;
  tc->rcv_las = tc->rcv_nxt;

  /* TODO this is updated in output as well ... */
  if (seq_gt (tc->snd_nxt, tc->snd_una_max))
    {
      tc->snd_una_max = tc->snd_nxt;
      tcp_validate_txf_size (tc, tc->snd_una_max - tc->snd_una);
    }

  TCP_EVT_DBG (TCP_EVT_PKTIZE, tc);
}

void
tcp_send_ack (tcp_connection_t * tc)
{
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();

  vlib_buffer_t *b;
  u32 bi;

  /* Get buffer */
  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;
  b = vlib_get_buffer (vm, bi);

  /* Fill in the ACK */
  tcp_make_ack (tc, b);
  tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);
}

/**
 * Delayed ack timer handler
 *
 * Sends delayed ACK when timer expires
 */
void
tcp_timer_delack_handler (u32 index)
{
  u32 thread_index = vlib_get_thread_index ();
  tcp_connection_t *tc;

  tc = tcp_connection_get (index, thread_index);
  tc->timers[TCP_TIMER_DELACK] = TCP_TIMER_HANDLE_INVALID;
  tcp_send_ack (tc);
}

/**
 * Build a retransmit segment
 *
 * @return the number of bytes in the segment or 0 if there's nothing to
 *         retransmit
 */
u32
tcp_prepare_retransmit_segment (tcp_connection_t * tc, u32 offset,
				u32 max_deq_bytes, vlib_buffer_t ** b)
{
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();
  int n_bytes = 0;
  u32 start, bi, available_bytes, seg_size;
  u8 *data;

  ASSERT (tc->state >= TCP_STATE_ESTABLISHED);
  ASSERT (max_deq_bytes != 0);

  /*
   * Make sure we can retransmit something
   */
  available_bytes = stream_session_tx_fifo_max_dequeue (&tc->connection);
  available_bytes -= offset;
  if (!available_bytes)
    return 0;
  max_deq_bytes = clib_min (tc->snd_mss, max_deq_bytes);
  max_deq_bytes = clib_min (available_bytes, max_deq_bytes);

  /* Start is beyond snd_congestion */
  start = tc->snd_una + offset;
  if (seq_geq (start, tc->snd_congestion))
    goto done;

  /* Don't overshoot snd_congestion */
  if (seq_gt (start + max_deq_bytes, tc->snd_congestion))
    {
      max_deq_bytes = tc->snd_congestion - start;
      if (max_deq_bytes == 0)
	goto done;
    }

  seg_size = max_deq_bytes + MAX_HDRS_LEN;

  /*
   * Prepare options
   */
  tc->snd_opts_len = tcp_make_options (tc, &tc->snd_opts, tc->state);

  /*
   * Allocate and fill in buffer(s)
   */

  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return 0;
  *b = vlib_get_buffer (vm, bi);
  data = tcp_init_buffer (vm, *b);

  /* Easy case, buffer size greater than mss */
  if (PREDICT_TRUE (seg_size <= tm->bytes_per_buffer))
    {
      n_bytes = stream_session_peek_bytes (&tc->connection, data, offset,
					   max_deq_bytes);
      ASSERT (n_bytes == max_deq_bytes);
      b[0]->current_length = n_bytes;
      tcp_push_hdr_i (tc, *b, tc->state, 0);
    }
  /* Split mss into multiple buffers */
  else
    {
      u32 chain_bi = ~0, n_bufs_per_seg;
      u32 thread_index = vlib_get_thread_index ();
      u16 n_peeked, len_to_deq, available_bufs;
      vlib_buffer_t *chain_b, *prev_b;
      int i;

      n_bufs_per_seg = ceil ((double) seg_size / tm->bytes_per_buffer);

      /* Make sure we have enough buffers */
      available_bufs = vec_len (tm->tx_buffers[thread_index]);
      if (n_bufs_per_seg > available_bufs)
	{
	  if (tcp_alloc_tx_buffers (tm, thread_index,
				    VLIB_FRAME_SIZE - available_bufs))
	    {
	      tcp_return_buffer (tm);
	      return 0;
	    }
	}

      n_bytes = stream_session_peek_bytes (&tc->connection, data, offset,
					   tm->bytes_per_buffer -
					   MAX_HDRS_LEN);
      b[0]->current_length = n_bytes;
      b[0]->flags |= VLIB_BUFFER_TOTAL_LENGTH_VALID;
      b[0]->total_length_not_including_first_buffer = 0;
      max_deq_bytes -= n_bytes;

      chain_b = *b;
      for (i = 1; i < n_bufs_per_seg; i++)
	{
	  prev_b = chain_b;
	  len_to_deq = clib_min (max_deq_bytes, tm->bytes_per_buffer);
	  tcp_get_free_buffer_index (tm, &chain_bi);
	  ASSERT (chain_bi != (u32) ~ 0);
	  chain_b = vlib_get_buffer (vm, chain_bi);
	  chain_b->current_data = 0;
	  data = vlib_buffer_get_current (chain_b);
	  n_peeked = stream_session_peek_bytes (&tc->connection, data,
						offset + n_bytes, len_to_deq);
	  ASSERT (n_peeked == len_to_deq);
	  n_bytes += n_peeked;
	  chain_b->current_length = n_peeked;
	  chain_b->flags = 0;
	  chain_b->next_buffer = 0;

	  /* update previous buffer */
	  prev_b->next_buffer = chain_bi;
	  prev_b->flags |= VLIB_BUFFER_NEXT_PRESENT;

	  max_deq_bytes -= n_peeked;
	  b[0]->total_length_not_including_first_buffer += n_peeked;
	}

      tcp_push_hdr_i (tc, *b, tc->state, 0);
    }

  ASSERT (n_bytes > 0);
  ASSERT (((*b)->current_data + (*b)->current_length) <=
	  tm->bytes_per_buffer);

  if (tcp_in_fastrecovery (tc))
    tc->snd_rxt_bytes += n_bytes;

done:
  TCP_EVT_DBG (TCP_EVT_CC_RTX, tc, offset, n_bytes);
  return n_bytes;
}

/**
 * Reset congestion control, switch cwnd to loss window and try again.
 */
static void
tcp_rtx_timeout_cc (tcp_connection_t * tc)
{
  tc->prev_ssthresh = tc->ssthresh;
  tc->prev_cwnd = tc->cwnd;

  /* Cleanly recover cc (also clears up fast retransmit) */
  if (tcp_in_fastrecovery (tc))
    tcp_cc_fastrecovery_exit (tc);

  /* Start again from the beginning */
  tc->ssthresh = clib_max (tcp_flight_size (tc) / 2, 2 * tc->snd_mss);
  tc->cwnd = tcp_loss_wnd (tc);
  tc->snd_congestion = tc->snd_una_max;

  tcp_recovery_on (tc);
}

static void
tcp_timer_retransmit_handler_i (u32 index, u8 is_syn)
{
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();
  u32 thread_index = vlib_get_thread_index ();
  tcp_connection_t *tc;
  vlib_buffer_t *b = 0;
  u32 bi, n_bytes;

  if (is_syn)
    {
      tc = tcp_half_open_connection_get (index);
      tc->timers[TCP_TIMER_RETRANSMIT_SYN] = TCP_TIMER_HANDLE_INVALID;
    }
  else
    {
      tc = tcp_connection_get (index, thread_index);
      tc->timers[TCP_TIMER_RETRANSMIT] = TCP_TIMER_HANDLE_INVALID;
    }

  if (!tcp_in_recovery (tc) && tc->rto_boff > 0
      && tc->state >= TCP_STATE_ESTABLISHED)
    {
      tc->rto_boff = 0;
      tcp_update_rto (tc);
    }

  /* Increment RTO backoff (also equal to number of retries) */
  tc->rto_boff += 1;

  /* Go back to first un-acked byte */
  tc->snd_nxt = tc->snd_una;

  if (tc->state >= TCP_STATE_ESTABLISHED)
    {
      /* Lost FIN, retransmit and return */
      if (tcp_is_lost_fin (tc))
	{
	  tcp_send_fin (tc);
	  return;
	}

      /* First retransmit timeout */
      if (tc->rto_boff == 1)
	tcp_rtx_timeout_cc (tc);

      /* Exponential backoff */
      tc->rto = clib_min (tc->rto << 1, TCP_RTO_MAX);

      TCP_EVT_DBG (TCP_EVT_CC_EVT, tc, 1);

      /* Send one segment */
      n_bytes = tcp_prepare_retransmit_segment (tc, 0, tc->snd_mss, &b);
      ASSERT (n_bytes);
      bi = vlib_get_buffer_index (vm, b);
      /* TODO be less aggressive about this */
      scoreboard_clear (&tc->sack_sb);

      if (n_bytes == 0)
	{
	  clib_warning ("could not retransmit anything");
	  clib_warning ("%U", format_tcp_connection, tc, 2);

	  /* Try again eventually */
	  tcp_retransmit_timer_set (tc);
	  ASSERT (0 || (tc->rto_boff > 1
			&& tc->snd_una == tc->snd_congestion));
	  return;
	}

      /* For first retransmit, record timestamp (Eifel detection RFC3522) */
      if (tc->rto_boff == 1)
	tc->snd_rxt_ts = tcp_time_now ();
    }
  /* Retransmit for SYN/SYNACK */
  else if (tc->state == TCP_STATE_SYN_RCVD || tc->state == TCP_STATE_SYN_SENT)
    {
      /* Half-open connection actually moved to established but we were
       * waiting for syn retransmit to pop to call cleanup from the right
       * thread. */
      if (tc->flags & TCP_CONN_HALF_OPEN_DONE)
	{
	  ASSERT (tc->state == TCP_STATE_SYN_SENT);
	  if (tcp_half_open_connection_cleanup (tc))
	    {
	      clib_warning ("could not remove half-open connection");
	      ASSERT (0);
	    }
	  return;
	}

      /* Try without increasing RTO a number of times. If this fails,
       * start growing RTO exponentially */
      if (tc->rto_boff > TCP_RTO_SYN_RETRIES)
	tc->rto = clib_min (tc->rto << 1, TCP_RTO_MAX);

      if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
	return;
      b = vlib_get_buffer (vm, bi);
      tcp_init_buffer (vm, b);
      tcp_push_hdr_i (tc, b, tc->state, 1);

      /* Account for the SYN */
      tc->snd_nxt += 1;
      tc->rtt_ts = 0;
      TCP_EVT_DBG (TCP_EVT_SYN_RXT, tc,
		   (tc->state == TCP_STATE_SYN_SENT ? 0 : 1));
    }
  else
    {
      ASSERT (tc->state == TCP_STATE_CLOSED);
      clib_warning ("connection closed ...");
      return;
    }

  if (!is_syn)
    {
      tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);

      /* Re-enable retransmit timer */
      tcp_retransmit_timer_set (tc);
    }
  else
    {
      ASSERT (tc->state == TCP_STATE_SYN_SENT);

      /* This goes straight to ipx_lookup */
      tcp_push_ip_hdr (tm, tc, b);
      tcp_enqueue_to_ip_lookup (vm, b, bi, tc->c_is_ip4);

      /* Re-enable retransmit timer */
      tcp_timer_set (tc, TCP_TIMER_RETRANSMIT_SYN,
		     tc->rto * TCP_TO_TIMER_TICK);
    }
}

void
tcp_timer_retransmit_handler (u32 index)
{
  tcp_timer_retransmit_handler_i (index, 0);
}

void
tcp_timer_retransmit_syn_handler (u32 index)
{
  tcp_timer_retransmit_handler_i (index, 1);
}

/**
 * Got 0 snd_wnd from peer, try to do something about it.
 *
 */
void
tcp_timer_persist_handler (u32 index)
{
  tcp_main_t *tm = vnet_get_tcp_main ();
  vlib_main_t *vm = vlib_get_main ();
  u32 thread_index = vlib_get_thread_index ();
  tcp_connection_t *tc;
  vlib_buffer_t *b;
  u32 bi, old_snd_nxt, max_snd_bytes, available_bytes, offset;
  int n_bytes = 0;
  u8 *data;

  tc = tcp_connection_get_if_valid (index, thread_index);

  if (!tc)
    return;

  /* Make sure timer handle is set to invalid */
  tc->timers[TCP_TIMER_PERSIST] = TCP_TIMER_HANDLE_INVALID;

  /* Problem already solved or worse */
  if (tc->state == TCP_STATE_CLOSED || tc->state > TCP_STATE_ESTABLISHED
      || tc->snd_wnd > tc->snd_mss || tcp_in_recovery (tc))
    return;

  available_bytes = stream_session_tx_fifo_max_dequeue (&tc->connection);
  offset = tc->snd_una_max - tc->snd_una;

  /* Reprogram persist if no new bytes available to send. We may have data
   * next time */
  if (!available_bytes)
    {
      tcp_persist_timer_set (tc);
      return;
    }

  if (available_bytes <= offset)
    {
      ASSERT (tcp_timer_is_active (tc, TCP_TIMER_RETRANSMIT));
      return;
    }

  /* Increment RTO backoff */
  tc->rto_boff += 1;
  tc->rto = clib_min (tc->rto << 1, TCP_RTO_MAX);

  /*
   * Try to force the first unsent segment (or buffer)
   */
  if (PREDICT_FALSE (tcp_get_free_buffer_index (tm, &bi)))
    return;
  b = vlib_get_buffer (vm, bi);
  data = tcp_init_buffer (vm, b);

  tcp_validate_txf_size (tc, offset);
  tc->snd_opts_len = tcp_make_options (tc, &tc->snd_opts, tc->state);
  max_snd_bytes = clib_min (tc->snd_mss, tm->bytes_per_buffer - MAX_HDRS_LEN);
  n_bytes = stream_session_peek_bytes (&tc->connection, data, offset,
				       max_snd_bytes);
  b->current_length = n_bytes;
  ASSERT (n_bytes != 0 && (tc->snd_nxt == tc->snd_una_max || tc->rto_boff > 1
			   || tcp_timer_is_active (tc,
						   TCP_TIMER_RETRANSMIT)));

  /* Allow updating of snd_una_max but don't update snd_nxt */
  old_snd_nxt = tc->snd_nxt;
  tcp_push_hdr_i (tc, b, tc->state, 0);
  tc->snd_nxt = old_snd_nxt;
  tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);

  /* Just sent new data, enable retransmit */
  tcp_retransmit_timer_update (tc);
}

/**
 * Retransmit first unacked segment
 */
void
tcp_retransmit_first_unacked (tcp_connection_t * tc)
{
  vlib_main_t *vm = vlib_get_main ();
  vlib_buffer_t *b;
  u32 bi, old_snd_nxt, n_bytes;

  old_snd_nxt = tc->snd_nxt;
  tc->snd_nxt = tc->snd_una;

  TCP_EVT_DBG (TCP_EVT_CC_EVT, tc, 2);
  n_bytes = tcp_prepare_retransmit_segment (tc, 0, tc->snd_mss, &b);
  if (!n_bytes)
    return;
  bi = vlib_get_buffer_index (vm, b);
  tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);

  tc->snd_nxt = old_snd_nxt;
}

/**
 * Do fast retransmit with SACKs
 */
void
tcp_fast_retransmit_sack (tcp_connection_t * tc)
{
  vlib_main_t *vm = vlib_get_main ();
  u32 n_written = 0, offset, max_bytes;
  vlib_buffer_t *b = 0;
  sack_scoreboard_hole_t *hole;
  sack_scoreboard_t *sb;
  u32 bi, old_snd_nxt;
  int snd_space;
  u8 snd_limited = 0, can_rescue = 0;

  ASSERT (tcp_in_fastrecovery (tc));
  TCP_EVT_DBG (TCP_EVT_CC_EVT, tc, 0);

  old_snd_nxt = tc->snd_nxt;
  sb = &tc->sack_sb;
  snd_space = tcp_available_snd_space (tc);

  hole = scoreboard_get_hole (sb, sb->cur_rxt_hole);
  while (hole && snd_space > 0)
    {
      hole = scoreboard_next_rxt_hole (sb, hole,
				       tcp_fastrecovery_sent_1_smss (tc),
				       &can_rescue, &snd_limited);
      if (!hole)
	{
	  if (!can_rescue || !(seq_lt (sb->rescue_rxt, tc->snd_una)
			       || seq_gt (sb->rescue_rxt,
					  tc->snd_congestion)))
	    break;

	  /* If rescue rxt undefined or less than snd_una then one segment of
	   * up to SMSS octets that MUST include the highest outstanding
	   * unSACKed sequence number SHOULD be returned, and RescueRxt set to
	   * RecoveryPoint. HighRxt MUST NOT be updated.
	   */
	  max_bytes = clib_min (tc->snd_mss,
				tc->snd_congestion - tc->snd_una);
	  max_bytes = clib_min (max_bytes, snd_space);
	  offset = tc->snd_congestion - tc->snd_una - max_bytes;
	  sb->rescue_rxt = tc->snd_congestion;
	  tc->snd_nxt = tc->snd_una + offset;
	  n_written = tcp_prepare_retransmit_segment (tc, offset, max_bytes,
						      &b);
	  ASSERT (n_written);
	  bi = vlib_get_buffer_index (vm, b);
	  tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);
	  break;
	}

      max_bytes = clib_min (hole->end - sb->high_rxt, snd_space);
      max_bytes = snd_limited ? clib_min (max_bytes, tc->snd_mss) : max_bytes;
      if (max_bytes == 0)
	break;
      offset = sb->high_rxt - tc->snd_una;
      tc->snd_nxt = sb->high_rxt;
      n_written = tcp_prepare_retransmit_segment (tc, offset, max_bytes, &b);

      /* Nothing left to retransmit */
      if (n_written == 0)
	break;

      bi = vlib_get_buffer_index (vm, b);
      sb->high_rxt += n_written;
      tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);
      ASSERT (n_written <= snd_space);
      snd_space -= n_written;
    }

  /* If window allows, send 1 SMSS of new data */
  tc->snd_nxt = old_snd_nxt;
}

/**
 * Fast retransmit without SACK info
 */
void
tcp_fast_retransmit_no_sack (tcp_connection_t * tc)
{
  vlib_main_t *vm = vlib_get_main ();
  u32 n_written = 0, offset = 0, bi, old_snd_nxt;
  int snd_space;
  vlib_buffer_t *b;

  ASSERT (tcp_in_fastrecovery (tc));
  TCP_EVT_DBG (TCP_EVT_CC_EVT, tc, 0);

  /* Start resending from first un-acked segment */
  old_snd_nxt = tc->snd_nxt;
  tc->snd_nxt = tc->snd_una;
  snd_space = tcp_available_snd_space (tc);

  while (snd_space > 0)
    {
      offset += n_written;
      n_written = tcp_prepare_retransmit_segment (tc, offset, snd_space, &b);

      /* Nothing left to retransmit */
      if (n_written == 0)
	break;

      bi = vlib_get_buffer_index (vm, b);
      tcp_enqueue_to_output (vm, b, bi, tc->c_is_ip4);
      snd_space -= n_written;
    }

  /* Restore snd_nxt. If window allows, send 1 SMSS of new data */
  tc->snd_nxt = old_snd_nxt;
}

/**
 * Do fast retransmit
 */
void
tcp_fast_retransmit (tcp_connection_t * tc)
{
  if (tcp_opts_sack_permitted (&tc->rcv_opts)
      && scoreboard_first_hole (&tc->sack_sb))
    tcp_fast_retransmit_sack (tc);
  else
    tcp_fast_retransmit_no_sack (tc);
}

always_inline u32
tcp_session_has_ooo_data (tcp_connection_t * tc)
{
  stream_session_t *s =
    stream_session_get (tc->c_s_index, tc->c_thread_index);
  return svm_fifo_has_ooo_data (s->server_rx_fifo);
}

always_inline uword
tcp46_output_inline (vlib_main_t * vm,
		     vlib_node_runtime_t * node,
		     vlib_frame_t * from_frame, int is_ip4)
{
  u32 n_left_from, next_index, *from, *to_next;
  u32 my_thread_index = vm->thread_index;

  from = vlib_frame_vector_args (from_frame);
  n_left_from = from_frame->n_vectors;
  next_index = node->cached_next_index;
  tcp_set_time_now (my_thread_index);

  while (n_left_from > 0)
    {
      u32 n_left_to_next;

      vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);

      while (n_left_from > 0 && n_left_to_next > 0)
	{
	  u32 bi0;
	  vlib_buffer_t *b0;
	  tcp_connection_t *tc0;
	  tcp_tx_trace_t *t0;
	  tcp_header_t *th0 = 0;
	  u32 error0 = TCP_ERROR_PKTS_SENT, next0 = TCP_OUTPUT_NEXT_IP_LOOKUP;

	  bi0 = from[0];
	  to_next[0] = bi0;
	  from += 1;
	  to_next += 1;
	  n_left_from -= 1;
	  n_left_to_next -= 1;

	  b0 = vlib_get_buffer (vm, bi0);
	  tc0 = tcp_connection_get (vnet_buffer (b0)->tcp.connection_index,
				    my_thread_index);
	  if (PREDICT_FALSE (tc0 == 0 || tc0->state == TCP_STATE_CLOSED))
	    {
	      error0 = TCP_ERROR_INVALID_CONNECTION;
	      next0 = TCP_OUTPUT_NEXT_DROP;
	      goto done;
	    }

	  th0 = vlib_buffer_get_current (b0);
	  TCP_EVT_DBG (TCP_EVT_OUTPUT, tc0, th0->flags, b0->current_length);

	  if (is_ip4)
	    {
	      vlib_buffer_push_ip4 (vm, b0, &tc0->c_lcl_ip4, &tc0->c_rmt_ip4,
				    IP_PROTOCOL_TCP, 1);
	      b0->flags |= VNET_BUFFER_F_OFFLOAD_TCP_CKSUM;
	      vnet_buffer (b0)->l4_hdr_offset = (u8 *) th0 - b0->data;
	      th0->checksum = 0;
	    }
	  else
	    {
	      ip6_header_t *ih0;
	      ih0 = vlib_buffer_push_ip6 (vm, b0, &tc0->c_lcl_ip6,
					  &tc0->c_rmt_ip6, IP_PROTOCOL_TCP);
	      b0->flags |= VNET_BUFFER_F_OFFLOAD_TCP_CKSUM;
	      vnet_buffer (b0)->l3_hdr_offset = (u8 *) ih0 - b0->data;
	      vnet_buffer (b0)->l4_hdr_offset = (u8 *) th0 - b0->data;
	      th0->checksum = 0;
	    }

	  /* Filter out DUPACKs if there are no OOO segments left */
	  if (PREDICT_FALSE
	      (vnet_buffer (b0)->tcp.flags & TCP_BUF_FLAG_DUPACK))
	    {
	      if (!tcp_session_has_ooo_data (tc0))
		{
		  error0 = TCP_ERROR_FILTERED_DUPACKS;
		  next0 = TCP_OUTPUT_NEXT_DROP;
		  goto done;
		}
	    }

	  /* Stop DELACK timer and fix flags */
	  tc0->flags &= ~(TCP_CONN_SNDACK);
	  tcp_timer_reset (tc0, TCP_TIMER_DELACK);

	  /* If not retransmitting
	   * 1) update snd_una_max (SYN, SYNACK, FIN)
	   * 2) If we're not tracking an ACK, start tracking */
	  if (seq_lt (tc0->snd_una_max, tc0->snd_nxt))
	    {
	      tc0->snd_una_max = tc0->snd_nxt;
	      if (tc0->rtt_ts == 0)
		{
		  tc0->rtt_ts = tcp_time_now ();
		  tc0->rtt_seq = tc0->snd_nxt;
		}
	    }

	  /* Set the retransmit timer if not set already and not
	   * doing a pure ACK */
	  if (!tcp_timer_is_active (tc0, TCP_TIMER_RETRANSMIT)
	      && tc0->snd_nxt != tc0->snd_una)
	    {
	      tcp_retransmit_timer_set (tc0);
	      tc0->rto_boff = 0;
	    }

#if 0
	  /* Make sure we haven't lost route to our peer */
	  if (PREDICT_FALSE (tc0->last_fib_check
			     < tc0->snd_opts.tsval + TCP_FIB_RECHECK_PERIOD))
	    {
	      if (PREDICT_TRUE
		  (tc0->c_rmt_fei == tcp_lookup_rmt_in_fib (tc0)))
		{
		  tc0->last_fib_check = tc0->snd_opts.tsval;
		}
	      else
		{
		  clib_warning ("lost connection to peer");
		  tcp_connection_reset (tc0);
		  goto done;
		}
	    }

	  /* Use pre-computed dpo to set next node */
	  next0 = tc0->c_rmt_dpo.dpoi_next_node;
	  vnet_buffer (b0)->ip.adj_index[VLIB_TX] = tc0->c_rmt_dpo.dpoi_index;
#endif

	  vnet_buffer (b0)->sw_if_index[VLIB_RX] = 0;
	  vnet_buffer (b0)->sw_if_index[VLIB_TX] = ~0;

	  b0->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;
	done:
	  b0->error = node->errors[error0];
	  if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED))
	    {
	      t0 = vlib_add_trace (vm, node, b0, sizeof (*t0));
	      if (th0)
		{
		  clib_memcpy (&t0->tcp_header, th0, sizeof (t0->tcp_header));
		}
	      else
		{
		  memset (&t0->tcp_header, 0, sizeof (t0->tcp_header));
		}
	      clib_memcpy (&t0->tcp_connection, tc0,
			   sizeof (t0->tcp_connection));
	    }

	  vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
					   n_left_to_next, bi0, next0);
	}

      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }

  return from_frame->n_vectors;
}

static uword
tcp4_output (vlib_main_t * vm, vlib_node_runtime_t * node,
	     vlib_frame_t * from_frame)
{
  return tcp46_output_inline (vm, node, from_frame, 1 /* is_ip4 */ );
}

static uword
tcp6_output (vlib_main_t * vm, vlib_node_runtime_t * node,
	     vlib_frame_t * from_frame)
{
  return tcp46_output_inline (vm, node, from_frame, 0 /* is_ip4 */ );
}

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (tcp4_output_node) =
{
  .function = tcp4_output,.name = "tcp4-output",
    /* Takes a vector of packets. */
    .vector_size = sizeof (u32),
    .n_errors = TCP_N_ERROR,
    .error_strings = tcp_error_strings,
    .n_next_nodes = TCP_OUTPUT_N_NEXT,
    .next_nodes = {
#define _(s,n) [TCP_OUTPUT_NEXT_##s] = n,
    foreach_tcp4_output_next
#undef _
    },
    .format_buffer = format_tcp_header,
    .format_trace = format_tcp_tx_trace,
};
/* *INDENT-ON* */

VLIB_NODE_FUNCTION_MULTIARCH (tcp4_output_node, tcp4_output);

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (tcp6_output_node) =
{
  .function = tcp6_output,
  .name = "tcp6-output",
    /* Takes a vector of packets. */
  .vector_size = sizeof (u32),
  .n_errors = TCP_N_ERROR,
  .error_strings = tcp_error_strings,
  .n_next_nodes = TCP_OUTPUT_N_NEXT,
  .next_nodes = {
#define _(s,n) [TCP_OUTPUT_NEXT_##s] = n,
    foreach_tcp6_output_next
#undef _
  },
  .format_buffer = format_tcp_header,
  .format_trace = format_tcp_tx_trace,
};
/* *INDENT-ON* */

VLIB_NODE_FUNCTION_MULTIARCH (tcp6_output_node, tcp6_output);

u32
tcp_push_header (transport_connection_t * tconn, vlib_buffer_t * b)
{
  tcp_connection_t *tc;

  tc = (tcp_connection_t *) tconn;
  tcp_push_hdr_i (tc, b, TCP_STATE_ESTABLISHED, 0);
  ASSERT (seq_leq (tc->snd_una_max, tc->snd_una + tc->snd_wnd));

  if (tc->rtt_ts == 0 && !tcp_in_cong_recovery (tc))
    {
      tc->rtt_ts = tcp_time_now ();
      tc->rtt_seq = tc->snd_nxt;
    }
  return 0;
}

typedef enum _tcp_reset_next
{
  TCP_RESET_NEXT_DROP,
  TCP_RESET_NEXT_IP_LOOKUP,
  TCP_RESET_N_NEXT
} tcp_reset_next_t;

#define foreach_tcp4_reset_next        	\
  _(DROP, "error-drop")                 \
  _(IP_LOOKUP, "ip4-lookup")

#define foreach_tcp6_reset_next        	\
  _(DROP, "error-drop")                 \
  _(IP_LOOKUP, "ip6-lookup")

static uword
tcp46_send_reset_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
			 vlib_frame_t * from_frame, u8 is_ip4)
{
  u32 n_left_from, next_index, *from, *to_next;
  u32 my_thread_index = vm->thread_index;

  from = vlib_frame_vector_args (from_frame);
  n_left_from = from_frame->n_vectors;

  next_index = node->cached_next_index;

  while (n_left_from > 0)
    {
      u32 n_left_to_next;

      vlib_get_next_frame (vm, node, next_index, to_next, n_left_to_next);

      while (n_left_from > 0 && n_left_to_next > 0)
	{
	  u32 bi0;
	  vlib_buffer_t *b0;
	  tcp_tx_trace_t *t0;
	  tcp_header_t *th0;
	  u32 error0 = TCP_ERROR_RST_SENT, next0 = TCP_RESET_NEXT_IP_LOOKUP;

	  bi0 = from[0];
	  to_next[0] = bi0;
	  from += 1;
	  to_next += 1;
	  n_left_from -= 1;
	  n_left_to_next -= 1;

	  b0 = vlib_get_buffer (vm, bi0);

	  if (tcp_make_reset_in_place (vm, b0, vnet_buffer (b0)->tcp.flags,
				       my_thread_index, is_ip4))
	    {
	      error0 = TCP_ERROR_LOOKUP_DROPS;
	      next0 = TCP_RESET_NEXT_DROP;
	      goto done;
	    }

	  /* Prepare to send to IP lookup */
	  vnet_buffer (b0)->sw_if_index[VLIB_TX] = 0;
	  next0 = TCP_RESET_NEXT_IP_LOOKUP;

	done:
	  b0->error = node->errors[error0];
	  b0->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;
	  if (PREDICT_FALSE (b0->flags & VLIB_BUFFER_IS_TRACED))
	    {
	      th0 = vlib_buffer_get_current (b0);
	      if (is_ip4)
		th0 = ip4_next_header ((ip4_header_t *) th0);
	      else
		th0 = ip6_next_header ((ip6_header_t *) th0);
	      t0 = vlib_add_trace (vm, node, b0, sizeof (*t0));
	      clib_memcpy (&t0->tcp_header, th0, sizeof (t0->tcp_header));
	    }

	  vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
					   n_left_to_next, bi0, next0);
	}
      vlib_put_next_frame (vm, node, next_index, n_left_to_next);
    }
  return from_frame->n_vectors;
}

static uword
tcp4_send_reset (vlib_main_t * vm, vlib_node_runtime_t * node,
		 vlib_frame_t * from_frame)
{
  return tcp46_send_reset_inline (vm, node, from_frame, 1);
}

static uword
tcp6_send_reset (vlib_main_t * vm, vlib_node_runtime_t * node,
		 vlib_frame_t * from_frame)
{
  return tcp46_send_reset_inline (vm, node, from_frame, 0);
}

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (tcp4_reset_node) = {
  .function = tcp4_send_reset,
  .name = "tcp4-reset",
  .vector_size = sizeof (u32),
  .n_errors = TCP_N_ERROR,
  .error_strings = tcp_error_strings,
  .n_next_nodes = TCP_RESET_N_NEXT,
  .next_nodes = {
#define _(s,n) [TCP_RESET_NEXT_##s] = n,
    foreach_tcp4_reset_next
#undef _
  },
  .format_trace = format_tcp_tx_trace,
};
/* *INDENT-ON* */

VLIB_NODE_FUNCTION_MULTIARCH (tcp4_reset_node, tcp4_send_reset);

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (tcp6_reset_node) = {
  .function = tcp6_send_reset,
  .name = "tcp6-reset",
  .vector_size = sizeof (u32),
  .n_errors = TCP_N_ERROR,
  .error_strings = tcp_error_strings,
  .n_next_nodes = TCP_RESET_N_NEXT,
  .next_nodes = {
#define _(s,n) [TCP_RESET_NEXT_##s] = n,
    foreach_tcp6_reset_next
#undef _
  },
  .format_trace = format_tcp_tx_trace,
};
/* *INDENT-ON* */

VLIB_NODE_FUNCTION_MULTIARCH (tcp6_reset_node, tcp6_send_reset);

/*
 * fd.io coding-style-patch-verification: ON
 *
 * Local Variables:
 * eval: (c-set-style "gnu")
 * End:
 */