diff options
Diffstat (limited to 'src/plugins/af_xdp')
-rw-r--r-- | src/plugins/af_xdp/af_xdp_doc.md | 129 | ||||
-rw-r--r-- | src/plugins/af_xdp/af_xdp_doc.rst | 164 |
2 files changed, 164 insertions, 129 deletions
diff --git a/src/plugins/af_xdp/af_xdp_doc.md b/src/plugins/af_xdp/af_xdp_doc.md deleted file mode 100644 index f5859dbb901..00000000000 --- a/src/plugins/af_xdp/af_xdp_doc.md +++ /dev/null @@ -1,129 +0,0 @@ -# AF_XDP Ethernet driver {#af_xdp_doc} - -This driver relies on Linux AF_XDP socket to rx/tx Ethernet packets. - -## Maturity level -Under development: it should work, but has not been thoroughly tested. - -## Features - - copy and zero-copy mode - - multiqueue - - API - - custom eBPF program - - polling, interrupt and adaptive mode - -## Known limitations - -### MTU -Because of AF_XDP restrictions, the MTU is limited to below PAGE_SIZE -(4096-bytes on most systems) minus 256-bytes, and they are additional -limitations depending upon specific Linux device drivers. -As a rule of thumb, a MTU of 3000-bytes or less should be safe. - -### Number of buffers -Furthermore, upon UMEM creation, the kernel allocates a -physically-contiguous structure, whose size is proportional to the number -of 4KB pages contained in the UMEM. That allocation might fail when -the number of buffers allocated by VPP is too high. That number can be -controlled with the `buffers { buffers-per-numa }` configuration option. -Finally, note that because of this limitation, this plugin is unlikely -to be compatible with the use of 1GB hugepages. - -### Interrupt mode -Interrupt and adaptive mode are supported but is limited by default to single -threaded (no worker) configurations because of a kernel limitation prior to -5.6. You can bypass the limitation at interface creation time by adding the -`no-syscall-lock` parameter, but you must be sure that your kernel can -support it, otherwise you will experience double-frees. -See -https://lore.kernel.org/bpf/BYAPR11MB365382C5DB1E5FCC53242609C1549@BYAPR11MB3653.namprd11.prod.outlook.com/ -for more details. - -### Mellanox -When setting the number of queues on Mellanox NIC with `ethtool -L`, you must -use twice the amount of configured queues: it looks like the Linux driver will -create separate RX queues and TX queues (but all queues can be used for both -RX and TX, the NIC will just not sent any packet on "pure" TX queues. -Confused? So I am.). For example if you set `combined 2` you will effectively -have to create 4 rx queues in AF_XDP if you want to be sure to receive all -packets. - -## Requirements -This drivers supports Linux kernel 5.4 and later. Kernels older than 5.4 are -missing unaligned buffers support. - -The Linux kernel interface must be up and have enough queues before -creating the VPP AF_XDP interface, otherwise Linux will deny creating -the AF_XDP socket. -The AF_XDP interface will claim NIC RX queue starting from 0, up to the -requested number of RX queues (only 1 by default). It means all packets -destined to NIC RX queue `[0, num_rx_queues[` will be received by the -AF_XDP interface, and only them. Depending on your configuration, there -will usually be several RX queues (typically 1 per core) and packets are -spread accross queues by RSS. In order to receive consistent traffic, -you **must** program the NIC dispatching accordingly. The simplest way -to get all the packets is to specify `num-rx-queues all` to grab all -available queues or to reconfigure the Linux kernel driver to use only -`num_rx_queues` RX queues (ie all NIC queues will be associated with -the AF_XDP socket): -``` -~# ethtool -L <iface> combined <num_rx_queues> -``` -Additionally, the VPP AF_XDP interface will use a MAC address generated at -creation time instead of the Linux kernel interface MAC. As Linux kernel -interface are not in promiscuous mode by default (see below) this will -results in a useless configuration where the VPP AF_XDP interface only -receives packets destined to the Linux kernel interface MAC just to drop -them because the destination MAC does not match VPP AF_XDP interface MAC. -If you want to use the Linux interface MAC for the VPP AF_XDP interface, -you can change it afterwards in VPP: -``` -~# vppctl set int mac address <iface> <mac> -``` -Finally, if you wish to receive all packets and not only the packets -destined to the Linux kernel interface MAC you need to set the Linux -kernel interface in promiscuous mode: -``` -~# ip link set dev <iface> promisc on -``` - -## Security considerations -When creating an AF_XDP interface, it will receive all packets arriving -to the NIC RX queue `[0, num_rx_queues[`. You need to configure the Linux -kernel NIC driver properly to ensure that only intented packets will -arrive in this queue. There is no way to filter the packets after-the-fact -using eg. netfilter or eBPF. - -## Quickstart -1. Put the Linux kernel interface up and in promiscuous mode: -``` -~# ip l set dev enp216s0f0 promisc on up -``` -2. Create the AF_XDP interface: -``` -~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues all -``` -3. Use the interface as usual, eg.: -``` -~# vppctl set int ip addr enp216s0f0/0 1.1.1.1/24 -~# vppctl set int st enp216s0f0/0 up -~# vppctl ping 1.1.1.100` -``` - -## Custom eBPF XDP program -This driver relies on libbpf and as such relies on the `xsks_map` eBPF -map. The default behavior is to use the XDP program already attached -to the interface if any, otherwise load the default one. -You can request to load a custom XDP program with the `prog` option when -creating the interface in VPP: -``` -~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues 4 prog extras/bpf/af_xdp.bpf.o -``` -In that case it will replace any previously attached program. A custom -XDP program example is provided in `extras/bpf/`. - -## Performance consideration -AF_XDP relies on the Linux kernel NIC driver to rx/tx packets. To reach -high-performance (10's MPPS), the Linux kernel NIC driver must support -zero-copy mode and its RX path must run on a dedicated core in the NUMA -where the NIC is physically connected. diff --git a/src/plugins/af_xdp/af_xdp_doc.rst b/src/plugins/af_xdp/af_xdp_doc.rst new file mode 100644 index 00000000000..de951340a2d --- /dev/null +++ b/src/plugins/af_xdp/af_xdp_doc.rst @@ -0,0 +1,164 @@ +AF_XDP device driver +==================== + +This driver relies on Linux AF_XDP socket to rx/tx Ethernet packets. + +Maturity level +-------------- + +Under development: it should work, but has not been thoroughly tested. + +Features +-------- + +- copy and zero-copy mode +- multiqueue +- API +- custom eBPF program +- polling, interrupt and adaptive mode + +Known limitations +----------------- + +MTU +~~~ + +Because of AF_XDP restrictions, the MTU is limited to below PAGE_SIZE +(4096-bytes on most systems) minus 256-bytes, and they are additional +limitations depending upon specific Linux device drivers. As a rule of +thumb, a MTU of 3000-bytes or less should be safe. + +Number of buffers +~~~~~~~~~~~~~~~~~ + +Furthermore, upon UMEM creation, the kernel allocates a +physically-contiguous structure, whose size is proportional to the +number of 4KB pages contained in the UMEM. That allocation might fail +when the number of buffers allocated by VPP is too high. That number can +be controlled with the ``buffers { buffers-per-numa }`` configuration +option. Finally, note that because of this limitation, this plugin is +unlikely to be compatible with the use of 1GB hugepages. + +Interrupt mode +~~~~~~~~~~~~~~ + +Interrupt and adaptive mode are supported but is limited by default to +single threaded (no worker) configurations because of a kernel +limitation prior to 5.6. You can bypass the limitation at interface +creation time by adding the ``no-syscall-lock`` parameter, but you must +be sure that your kernel can support it, otherwise you will experience +double-frees. See +https://lore.kernel.org/bpf/BYAPR11MB365382C5DB1E5FCC53242609C1549@BYAPR11MB3653.namprd11.prod.outlook.com/ +for more details. + +Mellanox +~~~~~~~~ + +When setting the number of queues on Mellanox NIC with ``ethtool -L``, +you must use twice the amount of configured queues: it looks like the +Linux driver will create separate RX queues and TX queues (but all +queues can be used for both RX and TX, the NIC will just not sent any +packet on “pure” TX queues. Confused? So I am.). For example if you set +``combined 2`` you will effectively have to create 4 rx queues in AF_XDP +if you want to be sure to receive all packets. + +Requirements +------------ + +This drivers supports Linux kernel 5.4 and later. Kernels older than 5.4 +are missing unaligned buffers support. + +The Linux kernel interface must be up and have enough queues before +creating the VPP AF_XDP interface, otherwise Linux will deny creating +the AF_XDP socket. The AF_XDP interface will claim NIC RX queue starting +from 0, up to the requested number of RX queues (only 1 by default). It +means all packets destined to NIC RX queue ``[0, num_rx_queues[`` will +be received by the AF_XDP interface, and only them. Depending on your +configuration, there will usually be several RX queues (typically 1 per +core) and packets are spread across queues by RSS. In order to receive +consistent traffic, you **must** program the NIC dispatching +accordingly. The simplest way to get all the packets is to specify +``num-rx-queues all`` to grab all available queues or to reconfigure the +Linux kernel driver to use only ``num_rx_queues`` RX queues (i.e. all NIC +queues will be associated with the AF_XDP socket): + +:: + + ~# ethtool -L <iface> combined <num_rx_queues> + +Additionally, the VPP AF_XDP interface will use a MAC address generated +at creation time instead of the Linux kernel interface MAC. As Linux +kernel interface are not in promiscuous mode by default (see below) this +will results in a useless configuration where the VPP AF_XDP interface +only receives packets destined to the Linux kernel interface MAC just to +drop them because the destination MAC does not match VPP AF_XDP +interface MAC. If you want to use the Linux interface MAC for the VPP +AF_XDP interface, you can change it afterwards in VPP: + +:: + + ~# vppctl set int mac address <iface> <mac> + +Finally, if you wish to receive all packets and not only the packets +destined to the Linux kernel interface MAC you need to set the Linux +kernel interface in promiscuous mode: + +:: + + ~# ip link set dev <iface> promisc on + +Security considerations +----------------------- + +When creating an AF_XDP interface, it will receive all packets arriving +to the NIC RX queue ``[0, num_rx_queues[``. You need to configure the +Linux kernel NIC driver properly to ensure that only intended packets +will arrive in this queue. There is no way to filter the packets +after-the-fact using e.g. netfilter or eBPF. + +Quickstart +---------- + +1. Put the Linux kernel interface up and in promiscuous mode: + +:: + + ~# ip l set dev enp216s0f0 promisc on up + +2. Create the AF_XDP interface: + +:: + + ~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues all + +3. Use the interface as usual, e.g.: + +:: + + ~# vppctl set int ip addr enp216s0f0/0 1.1.1.1/24 + ~# vppctl set int st enp216s0f0/0 up + ~# vppctl ping 1.1.1.100` + +Custom eBPF XDP program +----------------------- + +This driver relies on libbpf and as such relies on the ``xsks_map`` eBPF +map. The default behavior is to use the XDP program already attached to +the interface if any, otherwise load the default one. You can request to +load a custom XDP program with the ``prog`` option when creating the +interface in VPP: + +:: + + ~# vppctl create int af_xdp host-if enp216s0f0 num-rx-queues 4 prog extras/bpf/af_xdp.bpf.o + +In that case it will replace any previously attached program. A custom +XDP program example is provided in ``extras/bpf/``. + +Performance consideration +------------------------- + +AF_XDP relies on the Linux kernel NIC driver to rx/tx packets. To reach +high-performance (10’s MPPS), the Linux kernel NIC driver must support +zero-copy mode and its RX path must run on a dedicated core in the NUMA +where the NIC is physically connected. |