Age | Commit message (Collapse) | Author | Files | Lines |
|
With heavy traffic, tx code path may crash due to memory corruption
Thread 5 "vpp_wk_2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff3995c700 (LWP 2505)]
0x00007ffff73675e8 in vhost_user_if_input (vm=0x7fffb5f5bf9c,
vum=0x7ffff7882a40 <vhost_user_main>, vui=0x7fffb65570c4, qid=0,
node=0x7fffb6577dac, mode=VNET_HW_INTERFACE_RX_MODE_POLLING)
at /home/sluong/vpp-master/vpp/build-data/../src/vnet/devices/virtio/vhost-user.c:1610
1610 bi_current = (vum->cpus[thread_index].rx_buffers)
[vum->cpus[thread_index].rx_buffers_len];
(gdb) p vum->cpus[thread_index].rx_buffers_len
$2 = 793212607
(gdb)
Apparently, some code accidentally wrote the bad value in rx_buffers_len.
rx_buffers_len should never be greater than 1024 since that is how many buffers
we request each time.
After debugging many hours, I discovered that the memory corruption happens
in the tx code path right here on line 2176.
{
vhost_copy_t *cpy = &vum->cpus[thread_index].copy[copy_len];
copy_len++;
cpy->len = bytes_left;
cpy->len = (cpy->len > buffer_len) ? buffer_len : cpy->len;
cpy->dst = buffer_map_addr;
cpy->src = (uword) vlib_buffer_get_current (current_b0) +
current_b0->current_length - bytes_left;
(gdb) p cpy
$3 = (vhost_copy_t *) 0x7fffb554077c
(gdb) p copy_len
$4 = 1025
(gdb) p &vum->cpus[3].rx_buffers_len
$8 = (u32 *) 0x7fffb5540784
copy_len is picking up the index entry 1024 before it was incremented. copy array has only
1024 members (0 - 1023 are valid).
The assignment here in cpy surely causes memory corruption. It is only discovered later
when the memory location that it corrupted is used.
The condition for the crash is to transmit jumbo frames under heavy volume. Since ring
size is 1024, with one packet taking up one index for frame size (less 2048), it does
not cause overflow. With jumbo frames, it requires multiple indices for one packet,
it can cause the overflow under heavy traffic.
The fix is to do copy out when we have 1000 entries in the array to avoid
overflow.
Change-Id: Iefbc739b8e80470f1cf13123113f8331ffcd0eb2
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit aa5df48cb233b377b5910694e2440a16e5973864)
|
|
Deps are required for Fedora too.
This patch should be in version 17.10 because it
fixes breaking rpm builds in some circumstances.
JIRA: VPP-1015
Change-Id: I10807069742cdd6b09a0f34d9d05e9cae4146ec3
Signed-off-by: Thomas F Herbert <therbert@redhat.com>
|
|
Fixes bug introduced in commit 5349f94d.
JIRA: VPP-1014
Change-Id: Ia18f4c6f5f1124306cce790a36f6de970d186687
Signed-off-by: Thomas F Herbert <therbert@redhat.com>
|
|
Change-Id: Ie3245b96c511cc30915e70e8c881f445291a38c2
Signed-off-by: Matus Fabian <matfabia@cisco.com>
|
|
- Also change default coredump configuration from
"coredump-size unlimited" to "full-coredump"
Change-Id: Iefedc2636f2d9696b7575b34e91dd7be49f601fa
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
(cherry picked from commit 981fadf928dadac683d2f629edf738aa91510af3)
|
|
https://gerrit.fd.io/r/#/c/8551/ decoupled the global variable,
namely tm->iovecs from TX and RX. However, to support multi-threads,
we have to eliminate the use of this global variable with per thread
variable. I notice that rx_buffers must also be per thread variable.
So, we introduce per thread struct to contain rx_buffers and iovecs.
Each thread will find the per thread struct with thread_index.
Change-Id: I61abf2fdace8d722525a382ac72f0d04a173b9ce
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 4cd257667406d0500a81323ef91f5c7c8c902b25)
|
|
A bug was reported where a jumbo packet would stay in vhost
queue forever or until a large enough number of other packets
arrived in the queue too.
This is due to a bug in vhost input node buffer allocation.
The fix is to make sure that vhost always allocates at least
enough buffers for one single big packet. '40' is used to
account for 65kB frames.
Change-Id: I1d293028854165083e30cd798fab9d4140230b78
Signed-off-by: Pierre Pfister <ppfister@cisco.com>
|
|
Change-Id: I99c2c1d0d5b96f33efdb58dd3a2897a752e65349
Signed-off-by: Klement Sekera <ksekera@cisco.com>
|
|
- Fix invocation of compress_failed.sh
- Fix compress_failed to copy compressed results
files to $WORKSPACE/archives and return failure
exit code.
Failed test case data will be copied to logs.fd.io
and found in the archives/<make test data dir>-FAILED
directory in the build log link in the
vpp-verify-master-ubuntu1604 jenkins job page.
For example:
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/7353/archives/
Change-Id: Ife9a0737115e69c0a8441e3bb0133af1528d909b
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
(cherry picked from commit 25dc16715ee3fc0a600e2f58841173249bfae501)
|
|
Crash was seen on recent image with this BT on top of the stack
(gdb) bt full
(mif=0x7fffb6226568) at
/vpp/build-data/../src/plugins/memif/memif.c:297
ring = 0x0 <<<<<<<<<<
i = 0
j = 0
buffer_offset = 65792
r = 0x7fffb5e59f80
alloc = {flags = 1, name = 0x7fffb449f965 "memif region",
size = 4260096, numa_node = 0, addr = 0x7fff41dac000,
fd = 11,
log2_page_size = 12, n_pages = 1041}
err = 0x0
__FUNCTION__ = "memif_init_regions_and_queues"
The crash happened at this line.
ring = memif_get_ring (mif, MEMIF_RING_S2M, i);
ring=>head = ring->tail = 0; <=====
Please note that the crash is caused by dereferencing NULL rinng.
Put breakpoint into the function. I notice that
mif->regions[0].shm is not initialized.
(gdb) p mif->regions[0].shm
$8 = (void *) 0x0
It looks like we forgot to set shm after clib_mem_vm_ext_alloc().
Add the missing cide and the crash is fixed.
Change-Id: Ib722a6c241c77acfa8e33962106b57faa50e1ea7
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 9fefa9a697daf0e949ea7a2700ecaf2ba4d1d2cb)
|
|
Change-Id: I492bea060ba5c219ea75e19ebfdad79b1074e04b
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Identify and complain when the same IP prefix is assigned
to two different SW interfaces:
vpp# set int ip address TenGigabitEthernet6/0/0 1.2.3.4/32
vpp# set int ip address TenGigabitEthernet6/0/1 1.2.3.4/32
set interface ip address: Prefix 1.2.3.4/32 already found on
interface TenGigabitEthernet6/0/0
Change-Id: I1aee1b6a7ddd00d3109a53d8e1b6ce97bf45e372
Signed-off-by: Jon Loeliger <jdl@netgate.com>
(cherry picked from commit 35ffa3e8f6b032f6e324234d495f769049d8feea)
|
|
Change-Id: I7b175d57b85e626aab00221b6dac0498aebcbeae
Signed-off-by: Neale Ranns <nranns@cisco.com>
(cherry picked from commit d867a7cf6baffcebbf1b6e408272ec22dc55dd68)
|
|
Change-Id: I58772a83e22885a9ea8a7a981d2bcb4b31a050d2
Signed-off-by: Neale Ranns <nranns@cisco.com>
(cherry picked from commit 7b7ba572ab486d57b59c12af521175a6bcd7a52b)
|
|
Change L2 learning path so it update stale timestamp in MAC entry
only if aging is enabled on the BD for the MAC entry.
Change-Id: I7babe986ceef3c030d8ef9185076c42b405f7b0f
Signed-off-by: John Lo <loj@cisco.com>
(cherry picked from commit 5a6508d7269266b4a3ecacdd197ea3514a0c0e28)
|
|
Change-Id: I3e6276e6829dfee5a7aeae1b4ab4c3d2f2e932a4
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
fixes an issue where events were not sent if BD doesn't enable mac aging
Change-Id: Iddc53cb5c45e560633e6c5cff2731dccfc70ad5b
Signed-off-by: Eyal Bari <ebari@cisco.com>
(cherry picked from commit 24db0ec78fb651c4c585ebf30e07108240574045)
|
|
When making a call to vlib_packet_template_get_packet(), it
is possible to get back a NULL if the system runs out of buffer.
This can happen when there is buffer leaks. But don't crash
just because we run out of buffers, just punt.
Change-Id: Ie90ea41f3dda6e583d48959cbd18ff124158d7f8
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 0ff5c563d5048991dbd02a3892dccde8305a7e30)
|
|
It was observed that under heavy traffic, VPP accidentally sent traffic
with the wrong source and destination to the tun/tap interface. Traffic
appears to be sent to the wrong direction. This problem is only
seen when worker thread is configured.
When worker thread is used, TX and RX may reside in different
core. Yet both TX and RX threads are sharing the same global variable,
namely iovecs without any mutex or memory barrier protection.
This creates a race condition when heavy traffic is blasted to VPP,
like 1000 pps.
We could create a mutex or memory barrier to ensure atomic memory access.
But why bother? It is a lot cheaper to just decouple the iovecs such
that TX and RX have their own iovecs.
Change-Id: I86a5a19bd8de54d54f32e1f0845bae6a81bbf686
Signed-off-by: Steven <sluong@cisco.com>
(cherry picked from commit 4ff586d1c6fc5c40e1548cd6f221a8a7f3ad033b)
|
|
- We now have several developer-focused docs, so create an index page
for them.
- Rework several docs to fit into the index structure.
- Experiment with code highlighting; tweak the CSS slightly to make
it slightly nicer to look at.
Change-Id: I4185a18f84fa0764745ca7a3148276064a3155c6
Signed-off-by: Chris Luke <chrisy@flirble.org>
(cherry picked from commit 64ebb5ff1338140d94c7f9ee72138fe84d89de2e)
|
|
Change-Id: Icd2cc7e328719b3964dfe344caf8ed9858283661
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: I5ca5763f0dc0a73cc6f014b855426b7ac180f356
Signed-off-by: Matej Perina <mperina@cisco.com>
|
|
Change-Id: Ib675164c475edcdbe3013df7b847adf5e050c53f
Signed-off-by: Filip Tehlar <ftehlar@cisco.com>
|
|
On host interface if a VLAN tagged packet is received, linux kernel removes
the VLAN header from packet byte stream and adds metadata in tpacket2_hdr.
This patch explicitely checks for the presense of VLAN metadata and adds it
in VPP packet.
Change-Id: I0ba35c1e98dbc008ce18d032f22f2717d610c1aa
Signed-off-by: Akshaya N <akshaya@rtbrick.com>
|
|
Change-Id: I45c1227b53ba9e57b94f1bc68de939cd3ce9d619
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
|
|
UIO binding is not required for Mellanox NIC and calling vlib_pci_bind_to_uio()
should be skipped.
Change-Id: I10ea457bc3c8d4be8117dec51d5bd940ee416a44
Signed-off-by: Steve Shin <jonshin@cisco.com>
|
|
174267: Revisit this string termination issue
174816: Add check for NULL when trace is enabled
177211: Add notation that mutex is not required here
177117: Added check for log2_page_size == 0 and returns an error if so
163697,163698: Added missing sw_if_index validation
Change-Id: I5a76fcf6505c785bfb3269e353360031c6a0fd0f
Signed-off-by: Chris Luke <chrisy@flirble.org>
|
|
The replacement of [] with pool_elt_at_index and subsequent fixing it
was incorrect - it was equivalent to &[], since it returns a pointer to
the element. I've added VPP-993 previously to create a testcase,
so this commit partially fulfills that one as well.
Change-Id: I5b15e3ce48316f0429232aacf885e8f7c63d9522
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: I9f5212ee670ea91c6b35f1406c256d0687b9c6b5
Signed-off-by: Klement Sekera <ksekera@cisco.com>
|
|
- update of CSIT operational branch to be used for VPP-patch test
Change-Id: If582dc7c5e37bd3cda7ba4858e98fc504e2b7b1e
Signed-off-by: Jan Gelety <jgelety@cisco.com>
|
|
Change-Id: Ib677955448833dfeb1291490340f5ea1e417213b
Signed-off-by: Marco Varlese <marco.varlese@suse.com>
|
|
Change-Id: I2cf4c4850b9c3c093a7dce0cec89b9f710f69393
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Update ping code to use the new function
Change-Id: Ieb753b23f8402cbe5667c22747896784c8ece937
Signed-off-by: Florin Coras <fcoras@cisco.com>
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Iecf35bd9fd760856e32eb1c0c9542ffbed472379
Signed-off-by: Klement Sekera <ksekera@cisco.com>
|
|
Skip recompilation of test binaries from test/ext if these are
up-to-date. This speeds up repeated test runs.
Change-Id: I96dbfafc372398e3d858d8419219ef35c47bd0f3
Signed-off-by: Klement Sekera <ksekera@cisco.com>
|
|
Change-Id: Ie38dad209cce6d546379b4a5e449b34fbcadf171
Signed-off-by: Matej Perina <mperina@cisco.com>
|
|
Change-Id: I3873d3e411bf93cac82e73a0b8e3b22563aaf217
Signed-off-by: Matus Fabian <matfabia@cisco.com>
|
|
Change-Id: I1d3818027b8a1fcb1ec12016e3476b5c22a2d5a5
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: I7d8f807fb502d61688aa1dee25fa4edcbeb32f41
Signed-off-by: Aequitas <wang.junqi@zte.com.cn>
|
|
- introduced by e6f3b467 "Fix for ssl dependency on debian 9"
Change-Id: If41e517b2a55d2028ade6671f407831cfcf205c4
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
|
|
- Revert Vagrantfile symlink to the default
- Update README and env.sh
Change-Id: Ib1a557b897e0217b162c31118a4c265769dd1760
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
|
|
Change-Id: I4fcde6652e0c66315a453250c6e02cd32176833d
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
|
|
Change-Id: I365c31607332a944ef498369881332b515894ed7
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
vector elements
bb7f0f644 aimed to fix the coverity issue has incorrectly replaced the previous [] access
with pool_elt_at_index(), for an element of a vector, with predictably interesting result.
VPP-991 has uncovered the issue.
Change-Id: Ifd3fb70332d3fdd1c4ff8570372f394913f7b6c8
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Change-Id: Idad65cbb3765500a66f1097126076a2c5fdb4f1b
Signed-off-by: Jerome Tollet <jtollet@cisco.com>
|
|
For bonded interface in Active/Backup mode (mode 1), we need to
send a GARP/NA packet, if IP address is present, on slave link
state change to up or down to help with route convergence. The
callback from DPDK happens in a separate thread so we need to make
sure RPC call is used to signal the send_garp_na process in the
main thread. Also need to fix DPDK polling so the slave links are
not polled.
Change-Id: If5fd8ea2d28c54dd28726ac403ad366386ce9651
Signed-off-by: John Lo <loj@cisco.com>
|
|
Change-Id: Id1a5da12b13d87bacfa81094f471b95db40c39be
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I7f18f8c4ba609d96950dc1f833feb967d4a099b7
Signed-off-by: Matus Fabian <matfabia@cisco.com>
|
|
Change-Id: Icadf1c28b4ab712a210a8e037200ab29d6c53fe4
Signed-off-by: Jakub Grajciar <grajciar.jakub@gmail.com>
|
|
* Fixed package dependency
* Fixed bash unary operation error
Change-Id: I782dda8ffd807931241fa6034c110f5fedbeca8e
Signed-off-by: Marco Varlese <marco.varlese@suse.com>
|