path: root/src/vnet/devices/virtio/vhost_user.c
Age | Commit message | Author | Files | Lines
2022-12-02 | vhost: convert vhost device driver to a plugin | Steven Luong | 1 | -2613/+0
convert vhost device driver to a plugin as described in https://jira.fd.io/browse/VPP-2065 Type: improvement Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: Ibfe2f351bcaed36a04b136d082ae414145dd37b5
2022-05-09 | vhost: memory leak upon deleting vhost-user interface | Steven Luong | 1 | -3/+1
We allocate vring_spinlock for all vrings in the vui, not just the ones being used. So when we free the vui, we have to free vring_spinlock for all vrings, not just the one being used. Type: fix Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I3951fda71ce6e11474b04302116ea9e08b404758
2022-03-01 | virtio: refactor code | Mohsin Kazmi | 1 | -8/+14
Type: refactor Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com> Change-Id: I3788cc857023fafcc8eb6d6ff4524425026a75d8
2022-01-10 | interface: refactor interface capabilities code, part 2 | Damjan Marion | 1 | -7/+8
Type: improvement Signed-off-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com> Change-Id: Ie595e69af8657b0ee18a84ac71c5d433108d9ef8
2022-01-07 | ethernet: new interface registration function | Damjan Marion | 1 | -10/+5
Prep for supporting multiple callbacks, optional args, etc. Type: improvement Change-Id: I96244c098712e8213374678623f12527b0e7f387 Signed-off-by: Damjan Marion <damarion@cisco.com>
2022-01-05 | interface: refactor interface capabilities code | Damjan Marion | 1 | -6/+4
Make it shorter to type, easier to debug, make adding callbacks in future simpler. Type: improvement Change-Id: I6cdd6375e36da23bd452a7c7273ff42789e94433 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-10-06 | docs: vnet comment nitfixes | Nathan Skrzypczak | 1 | -31/+37
Type: improvement Change-Id: Iac01d7830b53819ace8f199554be10ab89ecdb97 Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
2021-08-26 | vhost: migrate to new TX infra | Steven Luong | 1 | -44/+51
Take advantage of the new TX infra and support manual thread placement Type: improvement Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: Id8023846a2eb59125fcf2f80f4f11eb681cf14dc
2021-08-18 | vhost: interrupt mode support with mq | Steven Luong | 1 | -124/+175
vhost interrupt mode support with mq does not work when coalesce frame is not configured to 0. When packed-ring is configured, we were also setting the wrong flag for want-interrupt. No need to trigger an interrupt to RX infra if there is at least one queue in the same thread that is doing polling. Type: fix Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I8e64250f2aa6bf611a777a53e4d2c1076d83305f
2021-05-08 | vhost: low performance in interrupt mode in some cases | Steven Luong | 1 | -35/+19
When vhost and another interface, say tap, are configured for interrupt mode, vhost performance may be very low. Further analysis shows that when vhost posts an interrupt to the RX infra, there is a 10 msec delay in waking up its input routine. The delay occurs because vhost posts the interrupt from the main thread, which then has to wake up the worker thread. The fix is for vhost to move the interrupt-posting call to the corresponding input worker thread by calling vnet_hw_if_set_rx_queue_file_index() to set it up. While at it, streamline the function vhost_user_kickfd_read_ready(), since it will now be called from the worker thread. Type: fix Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I9beedcd33e1558c8335da4ee7fadc51c29ee4589
2021-04-05 | vhost: Crash upon disconnect | Steven Luong | 1 | -1/+1
Patch 24601 introduced description into template. In vhost_user_process, template is reused inside the while loop to call clib_file_add, but description is allocated only once, outside of the loop. As a result, every call to clib_file_add references the same storage for description. As long as we don't call clib_file_del, multiple fds sharing the same storage for the description is harmless. When one of the fds disconnects and frees the description, the other fds are left holding a pointer to freed memory. The crash eventually happens when another fd disconnects and frees the description that was already freed. The fix is to move the allocation of description inside the loop to avoid sharing. Type: fix Fixes: gerrit.fd.io/r/c/vpp/+/24601 Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: Ie670931acdc2c7b851982d98fd0d837284a19036
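The sharing problem in this commit message can be shown in miniature. The sketch below is plain C with hypothetical names (`file_entry_t`, `make_entries_shared`, `make_entries_owned`, `entry_disconnect`); it does not use VPP's `clib_file_t` or `clib_file_add`, and only illustrates why allocating inside the loop gives each entry storage it can safely free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-fd record; NOT the real clib_file_t. */
typedef struct
{
  int fd;
  char *description; /* freed when the entry disconnects */
} file_entry_t;

/* Small portable string duplicate (avoids relying on POSIX strdup). */
static char *
dup_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  if (p)
    memcpy (p, s, n);
  return p;
}

/* Buggy pattern: one allocation outside the loop, shared by every entry.
   Freeing it from two entries would be a double free. */
static void
make_entries_shared (file_entry_t *e, int n, const char *text)
{
  char *desc = dup_string (text);
  for (int i = 0; i < n; i++)
    {
      e[i].fd = i;
      e[i].description = desc;
    }
}

/* Fixed pattern: allocate inside the loop so each entry owns its copy. */
static void
make_entries_owned (file_entry_t *e, int n, const char *text)
{
  for (int i = 0; i < n; i++)
    {
      e[i].fd = i;
      e[i].description = dup_string (text);
    }
}

/* Disconnect path: safe only when the entry owns its description. */
static void
entry_disconnect (file_entry_t *e)
{
  assert (e != NULL);
  free (e->description);
  e->description = NULL;
}
```

With the shared variant, calling `entry_disconnect` on two entries frees the same pointer twice; with the owned variant, each disconnect releases only its own copy.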
2021-03-19 | interface: add capabilities flags | Mohsin Kazmi | 1 | -6/+10
Type: improvement This patch adds flags to represent the capabilities of modern NICs. Change-Id: I96d38d9ab7eac55974d72795cd100d8337168e1e Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
2021-01-27 | vhost: vhost interface hardware address not set | Steven Luong | 1 | -6/+7
The check args->hwaddr is always true and it always copies the mac address from args->hwaddr even though none was set. Check args->use_custom_mac instead. Type: fix Fixes: gerrit 29970 Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I0c51bf1ea79b02c4fbdc3c52e694f186bdd96600
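A check like this is always true in C when the field is an array member, because the array decays to a non-null pointer. The sketch below assumes that layout; `create_args_t` and `pick_mac` are hypothetical names for illustration, and only the field names `hwaddr` and `use_custom_mac` come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified argument block. That hwaddr is an array
   member is an assumption, consistent with the check being always true. */
typedef struct
{
  uint8_t hwaddr[6];
  bool use_custom_mac;
} create_args_t;

/* Copies the caller-supplied MAC into out and returns true only when the
   caller explicitly asked for a custom MAC. */
static bool
pick_mac (const create_args_t *args, uint8_t out[6])
{
  assert (args != NULL && out != NULL);

  /* `if (args->hwaddr)` would not work here: the array decays to a
     pointer into the struct, which is never NULL, so the branch would
     be taken even when no address was set. */
  if (!args->use_custom_mac)
    return false;

  memcpy (out, args->hwaddr, 6);
  return true;
}
```

An explicit flag keeps "no address supplied" distinguishable from "address supplied", which the array contents alone cannot express.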
2021-01-21 | interface: rx queue infra rework, part one | Damjan Marion | 1 | -33/+18
Type: improvement Change-Id: I4008cadfd5141f921afbdc09a3ebcd1dcf88eb29 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-01-08 | vhost: Add event index for interrupt notification to driver | Steven Luong | 1 | -133/+128
VPP only supports a poor man's approach for interrupt notification to the driver: a simple binary flag for "interrupt needed" or "interrupt not needed". Most drivers already support the more sophisticated event index. This patch adds the long-overdue feature and makes it configurable, off by default. Type: feature Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I68dab7dd07045cafb49af97b7f70db9b8131ae03
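For context, the two notification schemes can be compared side by side. The sketch below follows the split-ring rules in the virtio specification (the `VRING_AVAIL_F_NO_INTERRUPT` flag and the `vring_need_event` comparison). It is not taken from the VPP patch, and `flag_wants_interrupt` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value from the virtio split-ring layout. */
#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Binary-flag scheme: the driver either wants interrupts or it does not. */
static bool
flag_wants_interrupt (uint16_t avail_flags)
{
  return (avail_flags & VRING_AVAIL_F_NO_INTERRUPT) == 0;
}

/* Event-index scheme: the driver publishes the used index it wants to be
   notified about. The device notifies only if moving its used index from
   old_idx to new_idx stepped past event_idx. All arithmetic is modulo
   2^16, so the comparison stays correct across index wraparound. */
static bool
vring_need_event (uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
  return (uint16_t) (new_idx - event_idx - 1) <
	 (uint16_t) (new_idx - old_idx);
}

/* Example: the device has just advanced its used index by one entry. */
static bool
notify_after_one_buffer (uint16_t event_idx, uint16_t old_idx)
{
  uint16_t new_idx = (uint16_t) (old_idx + 1);
  assert ((uint16_t) (new_idx - old_idx) == 1);
  return vring_need_event (event_idx, new_idx, old_idx);
}
```

The event index lets a driver ask for a notification only after a specific entry has been consumed, which a single on/off flag cannot express.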
2020-12-28 | vlib: add missing file template descriptions | Paul Vinciguerra | 1 | -0/+4
Add descriptions to clib_file_t template structures so that sockets can be identified via the 'show unix file' cli command. Type: fix Change-Id: Ibf82d55aa6c7b1126bd252b76d0dc8b7076f5046 Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
2020-12-25 | virtio: Extend vhost multi-queues support beyond 8 queue pairs | Steven Luong | 1 | -44/+101
Current vhost multi-queue support has a hard limit of 8 queue pairs due to the static vring array. This limit was raised in qemu, so VPP should also support more than 8 queue pairs. Change static vring allocation to dynamic. When the interface is created, we allocate 8 queue pairs to begin with. We also keep track of how many queue pairs the interface actually uses. We reply to VHOST_USER_GET_QUEUE_NUM with 128 as our maximum number of supported queue pairs. When qemu starts initializing a queue pair beyond the first 8, we expand the vrings on demand. Type: improvement Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I4a02d987d52d1bbe601b00e71f650fe6ebfcc0d7
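The allocation strategy described here (start with 8 queue pairs, grow on demand, cap at 128) might look roughly like the sketch below. It uses plain `calloc`/`realloc` rather than VPP vectors, and every name in it (`vui_t`, `vui_init`, `vui_ensure_vring`, `vui_free`) is hypothetical; only the numbers 8 and 128 come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_QUEUE_PAIRS 8	/* allocated at interface creation */
#define MAX_QUEUE_PAIRS	    128 /* advertised maximum */

/* Placeholder vring state; the real structure is much larger. */
typedef struct
{
  int enabled;
} vring_t;

typedef struct
{
  vring_t *vrings;
  unsigned num_vrings; /* allocated vrings, two per queue pair */
} vui_t;

/* Allocate the initial 8 queue pairs. Returns 0 on success, -1 on failure. */
static int
vui_init (vui_t *vui)
{
  vui->num_vrings = 2 * INITIAL_QUEUE_PAIRS;
  vui->vrings = calloc (vui->num_vrings, sizeof (vring_t));
  return vui->vrings ? 0 : -1;
}

/* Make sure vring index qid exists, growing the array a whole queue pair
   at a time. Rejects indexes beyond the advertised maximum. */
static int
vui_ensure_vring (vui_t *vui, unsigned qid)
{
  if (qid >= 2 * MAX_QUEUE_PAIRS)
    return -1;
  if (qid >= vui->num_vrings)
    {
      unsigned new_n = (qid | 1) + 1; /* round up to a full pair */
      vring_t *p = realloc (vui->vrings, new_n * sizeof (vring_t));
      if (!p)
	return -1;
      /* Zero only the newly added vrings; existing state is preserved. */
      memset (p + vui->num_vrings, 0,
	      (new_n - vui->num_vrings) * sizeof (vring_t));
      vui->vrings = p;
      vui->num_vrings = new_n;
    }
  assert (qid < vui->num_vrings);
  return 0;
}

static void
vui_free (vui_t *vui)
{
  free (vui->vrings);
  vui->vrings = NULL;
  vui->num_vrings = 0;
}
```

Bounding the index before growing matters: without the check, a peer could request an arbitrarily large queue index and force an unbounded allocation.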
2020-12-14 | misc: move to new pool_foreach macros | Damjan Marion | 1 | -15/+12
Type: refactor Change-Id: Ie67dc579e88132ddb1ee4a34cb69f96920101772 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-12-14 | vppinfra: simpler and faster pool_foreach | Damjan Marion | 1 | -0/+1
- reduces number of instructions generated 4 times compared to old code - adds pool_foreach2 which is more friendly to clang-format Type: improvement Change-Id: I51e9c7fb09655c60d883987dadf5b2666c12b3f7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-12-11 | virtio: fix vrings overflow in vhost_user | Benoît Ganne | 1 | -1/+10
Type: fix Change-Id: I7ca955882c0e263a9ace4b14021e51488564e411 Signed-off-by: Benoît Ganne <bganne@cisco.com>
2020-10-28 | misc: Break the big IP header files to improve compile time | Neale Ranns | 1 | -2/+0
Type: refactor Signed-off-by: Neale Ranns <neale.ranns@cisco.com> Change-Id: Id1801519638a9b97175847d7ed58824fb83433d6
2020-10-08 | interface: shorten vnet_hw_if_rx_mode | Damjan Marion | 1 | -6/+6
This is part of bigger refactor. Type: refactor Change-Id: I6fc2c0a1e2d217a70952901bcf775b8485bd3c20 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-09-04 | virtio: remove kernel virtio header dependencies | Mohsin Kazmi | 1 | -19/+19
Type: refactor tap, virtio and vhost use virtio/vhost header files from the Linux kernel. Different features are supported on different kernel versions, making it difficult to use those headers in VPP. This patch replaces the kernel virtio/vhost header dependencies with local header files. Change-Id: I064a8adb5cd9753c986b6f224bb075200b3856af Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
2020-04-29 | devices: vhost: simplify string copies for GCC-10 | Benoît Ganne | 1 | -10/+5
GCC-10 increases string truncations warnings. Refactor string copies confusing it. Type: refactor Change-Id: I9720a0539059de00ab212ff2fc73055f04f5af1d Signed-off-by: Benoît Ganne <bganne@cisco.com>
2020-04-27 | virtio: support virtio 1.1 packed ring in vhost | Steven Luong | 1 | -47/+264
virtio 1.1 defines a number of new features. Packed ring is among the most notable and important ones. It combines used, available, and descriptor rings into one. This patch provides experimental support for packed ring. To avoid regression, when packed ring is configured for the interface, it is branched to a separate RX and TX driver. Non-packed ring should continue to perform as it did before. Packed ring is tested using qemu4.2 and ubuntu focal fossa (kernel 5.4.0-12) on the guest VM, which supports packed ring. To configure VPP with packed ring, just add the optional keyword "packed" when creating the vhost interface. To bring up the guest VM with packed ring, add "packed=on" in the qemu launch command. To facilitate troubleshooting, also added a "verbose" option in the show vhost desc CLI to include displaying the indirect descriptors. Known qemu reconnect issue - If VPP is restarted, guest VMs also need to be restarted. The problem is that kernel virtio-net-pci keeps track of the previous available and used indices. For virtio 1.0, these indices are in shared memory and qemu can easily copy them to pass to the backend for reconnect. For virtio 1.1, these indices are no longer in shared memory. Qemu needs a new mechanism to retrieve them and it is not currently implemented. So when the protocol reconnects, qemu does not have the correct available and used indices to pass to the backend. As a result, after the reconnect, virtio-net-pci reads the TX ring from the wrong position in the ring, not the position to which the backend is writing. A similar problem also exists on the RX side. Type: feature Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I5afc50b0bafab5a1de7a6dd10f399db3fafd144c
2019-10-07 | devices: vhost not reading packets from vring | Steven Luong | 1 | -0/+22
In a rare event, after the vhost protocol message exchange has finished and the interface has been brought up successfully, the driver MAY still change its mind about the memory regions by sending new memory maps via SET_MEM_TABLE. Upon processing SET_MEM_TABLE, VPP invalidates the old memory regions and the descriptor tables, but it does not re-compute the new descriptor tables based on the new memory maps. Since VPP does not have the descriptor tables, it does not read the packets from the vring. In the normal working case, after SET_MEM_TABLE, the driver follows up with SET_VRING_ADDRESS, from which VPP computes the descriptor tables. The fix is to stash away the descriptor table addresses from SET_VRING_ADDRESS, and to re-compute the new descriptor tables when processing SET_MEM_TABLE if the descriptor table addresses are known. Type: fix Ticket: VPP-1784 Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: I3361f14c3a0372b8d07943eb6aa4b3a3f10708f9 (cherry picked from commit 61b8ba69f7a9540ed00576504528ce439f0286f5)
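The stash-and-recompute idea can be sketched as follows. This is a simplified model, not the VPP code: the types, the single descriptor pointer, and the handler names `on_set_vring_addr` and `on_set_mem_table` are all hypothetical, and the address translation is reduced to a linear search over regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory region: a range of peer addresses mapped locally. */
typedef struct
{
  uint64_t peer_addr;
  uint64_t size;
  uint8_t *local_base;
} mem_region_t;

/* Hypothetical vring state, reduced to the descriptor table only. */
typedef struct
{
  uint64_t desc_peer_addr; /* stashed from the vring address message */
  int addr_known;	   /* set once an address has been received */
  void *desc;		   /* local pointer, NULL while invalid */
} vring_t;

/* Translate a peer address through the current regions, NULL if unmapped. */
static void *
map_addr (const mem_region_t *r, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= r[i].peer_addr && addr - r[i].peer_addr < r[i].size)
      return r[i].local_base + (addr - r[i].peer_addr);
  return NULL;
}

/* Vring address handler: remember the address, then map it. */
static void
on_set_vring_addr (vring_t *vq, const mem_region_t *r, size_t n,
		   uint64_t desc_addr)
{
  vq->desc_peer_addr = desc_addr;
  vq->addr_known = 1;
  vq->desc = map_addr (r, n, desc_addr);
}

/* Memory table handler: the old mapping is invalid; if an address was
   stashed earlier, re-compute the pointer against the new regions. */
static void
on_set_mem_table (vring_t *vq, const mem_region_t *r, size_t n)
{
  assert (vq != NULL);
  vq->desc = NULL;
  if (vq->addr_known)
    vq->desc = map_addr (r, n, vq->desc_peer_addr);
}
```

Without the stashed address, the memory table handler could only invalidate the pointer and wait for a follow-up message that, in the rare case described, never arrives.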
2019-08-19 | devices: skip checksum calculation if guest supports checksum offload | Steven Luong | 1 | -3/+7
Set VNET_HW_INTERFACE_FLAG_SUPPORTS_TX_L4_CKSUM_OFFLOAD for the interface to skip checksum calculation if guest supports checksum offload. Type: fix Signed-off-by: Steven Luong <sluong@cisco.com> Change-Id: Ie933c3462394f07580ef7f2bec1d2eb3b075bd0c
2019-07-31 | devices interface tests: vhost GSO support | Steven Luong | 1 | -8/+47
Add gso option in create vhost interface to support gso and checksum offload. Tested with the following startup options in qemu: csum=on,gso=on,guest_csum=on,guest_tso4=on,guest_tso6=on,guest_ufo=on, host_tso4=on,host_tso6=on,host_ufo=on Type: feature Change-Id: I9ba1ee33677a694c4a0dfe66e745b098995902b8 Signed-off-by: Steven Luong <sluong@cisco.com>
2019-07-23 | api: binary api cleanup | Dave Barach | 1 | -5/+9
Multiple API message handlers call vnet_get_sup_hw_interface(...) without checking the inbound sw_if_index. This can cause a pool_elt_at_index ASSERT in a debug image, and major disorder in a production image.

Given that a number of places are coded as follows, add an "api_visible_or_null" variant of vnet_get_sup_hw_interface, which returns NULL given an invalid sw_if_index, or a hidden sw interface:

    - hw = vnet_get_sup_hw_interface (vnm, sw_if_index);
    + hw = vnet_get_sup_hw_interface_api_visible_or_null (vnm, sw_if_index);
      if (hw == NULL || memif_device_class.index != hw->dev_class_index)
        return clib_error_return (0, "not a memif interface");

Rename two existing xxx_safe functions -> xxx_or_null to make it obvious what they return.

Type: fix Change-Id: I29996e8d0768fd9e0c5495bd91ff8bedcf2c5697 Signed-off-by: Dave Barach <dave@barachs.net>
2019-07-23 | devices: vhost handling VHOST_USER_SET_FEATURES | Steven Luong | 1 | -0/+1
Some combinations of new qemu (2.11) and old dpdk (16.10) may send VHOST_USER_SET_FEATURES at the end of the protocol exchange, when the vhost interface is already declared up and ready. Unfortunately, processing VHOST_USER_SET_FEATURES causes the interface to go down. Not sure if that is correct or needed. Because there are no additional messages thereafter, the hardware interface stays down. The fix is to check the interface again at the end of processing VHOST_USER_SET_FEATURES. If it is up and ready, we bring the hardware interface back up. Type: fix Change-Id: I490cd03820deacbd8b44d8f2cb38c26349dbe3b2 Signed-off-by: Steven Luong <sluong@cisco.com>
2019-05-16 | init / exit function ordering | Dave Barach | 1 | -6/+6
The vlib init function subsystem now supports a mix of procedural and formally-specified ordering constraints. We should eliminate procedural knowledge wherever possible.

The following schemes are *roughly* equivalent:

    static clib_error_t *init_runs_first (vlib_main_t *vm)
    {
      clib_error_t *error;

      ... do some stuff...

      if ((error = vlib_call_init_function (init_runs_next)))
        return error;
      ...
    }
    VLIB_INIT_FUNCTION (init_runs_first);

and

    static clib_error_t *init_runs_first (vlib_main_t *vm)
    {
      ... do some stuff...
    }
    VLIB_INIT_FUNCTION (init_runs_first) =
    {
      .runs_before = VLIB_INITS("init_runs_next"),
    };

The first form will [most likely] call "init_runs_next" on the spot. The second form means that "init_runs_first" runs before "init_runs_next," possibly much earlier in the sequence.

Please DO NOT construct sets of init functions where A before B actually means A *right before* B. It's not necessary - simply combine A and B - and it leads to hugely annoying debugging exercises when trying to switch from ad-hoc procedural ordering constraints to formal ordering constraints.

Change-Id: I5e4353503bf43b4acb11a45fb33c79a5ade8426c Signed-off-by: Dave Barach <dave@barachs.net>
2019-02-21 | vhost: VPP stalls with vhost performing control plane actions | Steven Luong | 1 | -211/+248
Symptom
-------
With NDR traffic blasting at VPP, bringing up a new VM with a vhost connection to VPP causes packet drops. I am able to recreate this problem easily using a simple setup like this.

TREX-------------- switch ---- VPP
|---------------| |-------|

Cause
-----
The packet drops are due to vhost holding onto the worker barrier lock for too long in vhost_user_socket_read(). There are quite a few system calls inside the routine. At the end of the routine, it unconditionally calls vhost_user_update_iface_state() for all message types. vhost_user_update_iface_state() also unconditionally calls vhost_user_rx_thread_placement() and vhost_user_tx_thread_placement(). vhost_user_rx_thread_placement() scraps all existing cpu/queue mappings for the interface and creates brand new cpu/queue mappings for the interface. This process is very disruptive and very expensive. In my opinion, this area of code needs a makeover.

Fixes
-----
* vhost_user_socket_read() is rewritten so that it does not hold onto the worker barrier lock for system calls, or at least minimizes the need for doing so.
* Remove the call to vhost_user_update_iface_state() as the default path at the end of vhost_user_socket_read(). Only a couple of message types really need to call vhost_user_update_iface_state(). We put the call in the handlers for those message types.
* Remove vhost_user_rx_thread_placement() and vhost_user_tx_thread_placement() from vhost_user_update_iface_state(). There is no need to repeatedly change the cpu/queue mappings.
* vhost_user_rx_thread_placement() is actually quite expensive. It should be called only once per queue for the interface. There is no need to scrap the existing cpu/queue mappings and create new cpu/queue mappings when additional queues become active/enabled.
* Change to create the cpu/queue mappings for the first RX queue when the interface is created. Don't remove the cpu/queue mapping when the interface is disconnected. Remove the cpu/queue mapping only when the interface is deleted.

The create vhost user interface CLI also has some very expensive system calls if the command is entered with the optional keyword "server".

As a bonus, this patch makes the create vhost user interface binary-api and CLI thread safe. Do the protection for the small amount of code which is thread unsafe.

Change-Id: I4a19cbf7e9cc37ea01286169882e5603e6d7eb77 Signed-off-by: Steven Luong <sluong@cisco.com>
2018-11-14 | Remove c-11 memcpy checks from perf-critical code | Dave Barach | 1 | -3/+3
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1 Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23 | c11 safe string handling support | Dave Barach | 1 | -4/+4
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-02 | vhost-user: add support for vlib_log API | Jerome Tollet | 1 | -155/+99
Change-Id: I0af68f6b41d0024aa64b93a8b18e2d179bf939b0 Signed-off-by: Jerome Tollet <jtollet@cisco.com> Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-10-02 | vhost-user: Interface state updates | Juraj Sloboda | 1 | -11/+12
Fix inconsistencies between admin and link interface states. Admin down should imply link down, i.e. link_up = admin_up && link_ready. Change-Id: I4d668d82d035b5d2ae508727f34f1722a0c3e677 Signed-off-by: Juraj Sloboda <jsloboda@cisco.com>
2018-10-02 | vhost_user: Fix setting MTU using uninitialized variable | Juraj Sloboda | 1 | -2/+1
Change-Id: I0caa5fd584e3785f237d08f3d3be23e9bfee7605 Signed-off-by: Juraj Sloboda <jsloboda@cisco.com>
2018-09-20 | vhost: show vhost may display garbage after the interface name | Steven | 1 | -2/+3
DBGvpp# show vhost-user
Virtio vhost-user interfaces
Global:
  coalesce frames 32 time 1e-3
  number of rx virtqueues in interrupt mode: 0
Interface: VirtualEthernet0/0/0�?x�D (ifindex 3)

The fix is to use format_vnet_hw_if_index_name rather than hi->name. The former formats the name with %v rather than %s.

Change-Id: If4d275e1eb249cf87b2d6b796b42f24769f9e3e3 Signed-off-by: Steven <sluong@cisco.com>
2018-09-12 | vhost: Fix VPP crash when reloading VM with mixed-type ports [VPP-1406] | Yichen Wang | 1 | -1/+11
When a VM has a mix of vhost-user and SRIOV ports, QEMU (RedHat v2.10) will not send a disconnect signal to VPP, and just gives the new memory region directly. VPP is not able to handle a new memory region mapping without a disconnect signal first, which results in a SEGV. The fix handles the VM reboot scenario without an explicit disconnect signal from QEMU. The fix is to invalidate the avail, desc, and used pointers in the txvq when the new memory regions are received, because these pointers are no longer valid with the new memory regions. In the input node, check to make sure the avail pointer is valid and punt if not. Change-Id: Ieb8b427b202f4442a58907dab1661d63a03650de Signed-off-by: Yichen Wang <yicwang@cisco.com>
2018-07-08 | vhost: Add multi-versioning support | Mohsin Kazmi | 1 | -0/+2242
It also refactors the vhost code which was in one big file vhost-user.c. Receive side code is in vhost_user_input.c and Transmit side code is in vhost_user_output.c Change-Id: I1b539b5008685889723e228265786a2a3e9f3a78 Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>