Age | Commit message (Collapse) | Author | Files | Lines |
|
Encoding the vpp node index into the vlib_error_t as a 10-bit quantity
limits us to 1K graph nodes. Unfortunately, a few nodes need 6 bit
per-node error codes. Only a very few nodes have so many counters.
It turns out that there are about 2K total error counters in the system,
which is (approximately) the maximum error heap index.
The current (index,code) encoding limits the number of interfaces to
around 250, since each interface has two associated graph nodes and we
have about 500 "normal, interior" graph node
This patch adds an error-index to node-index map, so we can store
error heap indices directly in the vlib_buffer_t.
Type: refactor
Change-Id: I28101cad3d8750819e27b8785fc0cf71ff54f79a
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Cleaned up a few instances of side-bet elog_string hash table
usage. Elog_string handles that problem itself.
Add cli commands to vat to initialize, enable/disable, and save an
event log.
Event logging at the same time in both vpp and vat yields a pair
of event logs which can be merged by the "test_elog" tool.
Type: refactor
Change-Id: I8d6a72206f2309c967ea1630077fba31aef47f93
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Type: feature
Change-Id: Ia3d9a47679202c2a47cd3746b50e86c6b8627ef6
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
FRAME_QUEUE_NELTS is 64 in thread.c
Change-Id: Ie7e5962afe05dfc7f38e3d597dabc74dcc2dab8d
Signed-off-by: dongjuan <dong.juan1@zte.com.cn>
|
|
Otherwise, all N worker threads try to sort the list at the same time:
a good way to have a bad day.
This approach performs *far* better than maintaing order by adding a
spin-lock. By direct measurement w/ elog + g2: 11 threads execute the
per-thread init function list in 22us, vs. 50ms with a CLIB_PAUSE()
enabled spin-lock.
Change-Id: I1745f2a213c0561260139a60114dcb981e0c64e5
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
The current code only allowed access to the main thread error counters.
That is not so useful for a multi worker instance.
No return a vector indexed by thread of counter_t values.
Type: fix
Change-Id: Ie322c8889c0c8175e1116e71de04a2cf453b9ed7
Signed-off-by: Ole Troan <ot@cisco.com>
|
|
The vlib init function subsystem now supports a mix of procedural and
formally-specified ordering constraints. We should eliminate procedural
knowledge wherever possible.
The following schemes are *roughly* equivalent:
static clib_error_t *init_runs_first (vlib_main_t *vm)
{
clib_error_t *error;
... do some stuff...
if ((error = vlib_call_init_function (init_runs_next)))
return error;
...
}
VLIB_INIT_FUNCTION (init_runs_first);
and
static clib_error_t *init_runs_first (vlib_main_t *vm)
{
... do some stuff...
}
VLIB_INIT_FUNCTION (init_runs_first) =
{
.runs_before = VLIB_INITS("init_runs_next"),
};
The first form will [most likely] call "init_runs_next" on the
spot. The second form means that "init_runs_first" runs before
"init_runs_next," possibly much earlier in the sequence.
Please DO NOT construct sets of init functions where A before B
actually means A *right before* B. It's not necessary - simply combine
A and B - and it leads to hugely annoying debugging exercises when
trying to switch from ad-hoc procedural ordering constraints to formal
ordering constraints.
Change-Id: I5e4353503bf43b4acb11a45fb33c79a5ade8426c
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Fix os_get_nthreads() so that it starts returning the correct answer
as early as possible.
Change-Id: Id5292262f2c3f521b07ffbe6a9f6748dcc4dcb7d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Main thread: don't bother with the barrier sync hold-down timer if
none of the worker threads are busy.
Worker threads: avoid epoll_pwait (10ms timeout) when the
control-plane has been active in the last half-second.
Change-Id: I82008d09968c65e2a4af0ebb7887389992e60603
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
It turns out that for scalar sizes 0..24, frames are always the same
size. That range includes all current use-cases - and then some - so
get rid of the hash table. Old code preserved under #ifdef
VLIB_SUPPORTS_ARBITRARY_SCALAR_SIZES.
Change-Id: Ic005c7143c9639f77d1a0fadd2fc0e90dccb68c1
Signed-off-by: Dave Barach <dbarach@cisco.com>
|
|
Change-Id: I6819dd9dbfc15c17740bdb98b51bdd639ef8c4d2
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
The main thread squirrels away vlib_time_now (&vlib_global_main),
worker threads use it to calculate an offset in f64 seconds from their
own vlib_time_now(vm) value. We use that offset until the next barrier
sync.
Thanks to Damjan for the suggestion.
Change-Id: If56cdfe68e5ad8ac3b0d0fc885dc3ba556cd1215
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This patch introduces following changes:
- deprecated free lists which are not used and not compatible
with external buffer managers (i.e. DPDK)
- introduces native support for per-numa buffer pools
- significantly improves performance of buffer alloc and free
Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I3a33fb81f31ed473811e9e7a6197b81135913865
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Add missing pre-input node runtime fork and refork code.
unix-epoll-input runs on all threads; each instance needs its own
runtime stats.
Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
fixes a problem seen when dhcp proxy uses vlib_buffer_chain_linearize
function which tries to vlib_buffer_alloc zero buffers - which succeeds
if the buffers vector is initialized but otherwise crashes when trying to
update the _vec_len in vlib_buffer_alloc_from_free_list
solved by initializing the free_list buffers vec
Change-Id: I1186d7aac05e54864d72f3f144e1bef8064f8efa
Signed-off-by: Eyal Bari <ebari@cisco.com>
|
|
Change-Id: Id4f37f5d4a03160572954a416efa1ef9b3d79ad1
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I2476e3e916a42b41d1e66bfc1ec4f8c4264c1720
Signed-off-by: Dave Barach <dbarach@cisco.com>
|
|
This reverts commit 71615399e194847d7833b744caedab9b841733e5.
There seems to be an issue with ARPs when running with multiple workers.
Change-Id: Iaa68081512362945a9caf24dcb8d70fc7c5b75df
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: Ib5c346641463768cf33eaf8cb5fab5b63171398d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Ied34720ca5a6e6e717eea4e86003e854031b6eab
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This is first part of addition of atomic macros with only macros for
__sync builtins.
- Based on earlier patch by Damjan (https://gerrit.fd.io/r/#/c/10729/)
Additionally
- clib_atomic_release macro added and used in the absence
of any memory barrier.
- clib_atomic_bool_cmp_and_swap added
Change-Id: Ie4e48c1e184a652018d1d0d87c4be80ddd180a3b
Original-patch-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
|
|
Add an "elog trace [api][cli][barrier]" debug CLI command. Removed the
barrier elog test command. Remove unused reliable multicast code.
Change-Id: Ib3ecde901b7c49fe92b313d0087cd7e776adcdce
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I3124238ab4d43bcef5590bad33a4ff0b5d8b7d15
Signed-off-by: Mohsin Kazmi <sykazmi@cisco.com>
|
|
Change-Id: I2d3b8d5a7192ff68bee443a99346ecb807b2d833
Signed-off-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
clib_time_verify_frequency(...) rejects clock frequency changes
greater than 1%.
vlib_worker_thread_barrier_sync_int (...) continuously checks that the
barrier hold-down timer is not unreasonably far in the future.
Change-Id: I00ecb4c20e44de5d6a9c1499fa933e3dd834d11a
Signed-off-by: Dave Barach <dbarach@cisco.com>
|
|
Accept any sized hexadecimal bitmask specification to
support platforms with hundreds of cores.
Change-Id: Ib881db0cf60f78bdeffa13acfc2fc7fe7e128cc4
Signed-off-by: Yi He <yi.he@arm.com>
|
|
This prevents deadlock in case when worker A sends to B and worker B
sends to A
Change-Id: Id9436960f932c58325fe4f5ef8ec67b50031aeda
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk
src/vppinfra/dlmalloc.[ch] are slightly modified versions of the
well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no
inherent size limit.
Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Id3ccfcfa2a88cf7aa106f1202af7cd677de32575
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I97d75f58a46275fdad5add27dc59c731cc3ac64d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I524909570fc1736f51fd437d6d30566c461139bd
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Seems to have minimal-to-zero performance consequences. Data appears
accurate: result match the debug CLI output. Checked at low rates, 27
MPPS sprayed across two worker threads.
Change-Id: I09ede5150b88a91547feeee448a2854997613004
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This is ~50% improvement in buffer alloc performance.
For a 256 buffer allocation, it was ~10 clocks/buffer, now is < 5 clocks.
Change-Id: I97590e240a79a42bcab5eb26587fc2d11e6eb163
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This address crash with gcc-7 observed when -o3 is used.
Change-Id: I10e87da8e5037ad480eba7fb0aaa9a657d3bf48d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
- buffer_main is no longer part of vlib_main_t
- pool of free lists is still part of vlib_main_t
- mheap is not used anymore for buffer allocation
- simple bitmap bassed buffer alloc scheme is introduced
Change-Id: I3e1e6d00e2c8122293ed0a741245eb841315a1ff
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Queue RPC calls and send them from the main dispatch loop. As things stood,
if the vpp main input queue filled, worker threads could enter a
barrier-sync spin-wait in the middle of processing a frame. If thread
0 decided to recreate worker thread data structures, the worker thread(s)
could easily crash.
Legislate the problem out of existence by enqueueing RPC messages only
from the main dispatch loop. At that point, doing a barrier-sync wait
is perfectly OK.
Change-Id: I18da3e44bb1f29a63fe5f30cf11de732ecfd5bf7
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Update ping code to use the new function
Change-Id: Ieb753b23f8402cbe5667c22747896784c8ece937
Signed-off-by: Florin Coras <fcoras@cisco.com>
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Iff33694fc42cc3bcc73cf1372339053a6365039c
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Support logging to both syslog and elog
Also include DaveB is_mp_safe fix, which had been lost
Change-Id: If82f7969e2f43c63c3fed5b1a0c7434c90c1f380
Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
|
|
Change the rebuilding of worker thread clone datastructures
to run in parallel on the workers, instead of serially
on main.
Change-Id: Ib76bcfbef1e51f2399972090f4057be7aaa84e08
Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
|
|
Change-Id: I6a5faebb63e9360cebfcfb1bc3f3c0eb6b15e937
Signed-off-by: JingLiuZTE <liu.jing5@zte.com.cn>
|
|
- refactor existing congestion control code (RFC 6582/5681). Handling of ack
feedback now consists of: ack parsing, cc event detection, event handling,
congestion control update
- extend sack scoreboard to support sack based retransmissions
- basic implementation of Eifel detection algorithm (RFC 3522) for
detecting spurious retransmissions
- actually initialize the per-thread frame freelist hash tables
- increase worker stack size to 2mb
- fix session queue node out-of-buffer handling
- ensure that the local buffer cache vec_len matches reality
- avoid 2x spurious event requeues when short of buffers
- count out-of-buffer events
- make the builtin server thread-safe
- fix bihash template threading issue: need to paint -1 across uninitialized
working_copy_length vector elements (via rebase from master)
Change-Id: I646cb9f1add9a67d08f4a87badbcb117980ebfc4
Signed-off-by: Florin Coras <fcoras@cisco.com>
Signed-off-by: Dave Barach <dbarach@cisco.com>
|
|
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I9664214652229b663c3e3ba7406b4ede96bfb123
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This patch deprecates stack-based thread identification,
Also removes requirement that thread stacks are adjacent.
Finally, possibly annoying for some folks, it renames
all occurences of cpu_index and cpu_number with thread
index. Using word "cpu" is misleading here as thread can
be migrated ti different CPU, and also it is not related
to linux cpu index.
Change-Id: I68cdaf661e701d2336fc953dcb9978d10a70f7c1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This change adds two new debug CLI command:
- "show interface placmenet" to display which
thread (main or worker) is responsible for processing
interface rx queue
vpp# show interface placement
Thread 0 (vpp_main):
node af-packet-input:
host-vpp1 queue 0
Thread 1 (vpp_wk_0):
node af-packet-input:
host-virbr0 queue 0
Thread 2 (vpp_wk_1):
node af-packet-input:
host-vpp2 queue 0
host-lxcbr0 queue 0
- "set interface placmenet" to assign thread (main or worker)
which process specific interface rx queue
vpp# set interface placement host-vpp1 queue 0 main
Change-Id: Id4dd00cf2b05e10fae2125ac7cb4411b446c5e9c
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I9b6ed9741fae89bdefa6f601398eb63a21155069
Signed-off-by: Damjan Marion <damarion@cisco.com>
|