Age | Commit message | Author | Files | Lines |
|
Callbacks for monitoring and performance measurement:
- Add new callback list type, with context
- Add callbacks for API, CLI, and barrier sync
- Modify node dispatch callback to pass plugin-specific context
- Modify perfmon plugin to keep PMC samples local to the plugin
- Include process nodes in dispatch callback
- Pass dispatch function return value to callback
Type: refactor
Signed-off-by: Tom Seidenberg <tseidenb@cisco.com>
Change-Id: I28b06c58490611e08d76ff5b01b2347ba2109b22
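A minimal sketch of what a callback list with per-entry context could look like. All names here (`cb_entry_t`, `cb_list_add`, `cb_list_call`, `sum_cb`) are hypothetical and chosen for illustration; they are not the vlib callback API introduced by this change.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the vlib callback API. */
typedef void (*cb_fn_t) (void *ctx, int dispatch_rv);

typedef struct
{
  cb_fn_t fn; /* function to invoke */
  void *ctx;  /* plugin-specific context handed back to fn */
} cb_entry_t;

#define CB_LIST_MAX 8

typedef struct
{
  cb_entry_t entries[CB_LIST_MAX];
  size_t n_entries;
} cb_list_t;

/* Register fn with its context; returns 0 on success, -1 if the list is full. */
static int
cb_list_add (cb_list_t *l, cb_fn_t fn, void *ctx)
{
  if (l->n_entries >= CB_LIST_MAX)
    return -1;
  l->entries[l->n_entries].fn = fn;
  l->entries[l->n_entries].ctx = ctx;
  l->n_entries++;
  return 0;
}

/* Invoke every registered callback, passing the dispatch return value. */
static void
cb_list_call (const cb_list_t *l, int dispatch_rv)
{
  for (size_t i = 0; i < l->n_entries; i++)
    l->entries[i].fn (l->entries[i].ctx, dispatch_rv);
}

/* Example callback: accumulate return values into a plugin-local counter. */
static void
sum_cb (void *ctx, int dispatch_rv)
{
  *(long *) ctx += dispatch_rv;
}
```

Because each entry carries its own context pointer, a plugin can keep its samples in private storage instead of in a shared structure.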
|
|
Type: improvement
Change-Id: I81df4b61d1f0b8c1df77c1ee9bebcb491e155b69
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Calling vlib_time_now (&vlib_global_main) from a worker thread is a bad
mistake. ASSERT (vm->thread_index == __os_thread_index) will catch it.
Type: improvement
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I55af6de84e06143f8f43adc62103b77267a7a441
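The check described above can be sketched as follows. The struct and the thread-local variable are hypothetical stand-ins, assumed only for illustration; the real code compares vm->thread_index against __os_thread_index.

```c
#include <assert.h>

/* Hypothetical stand-ins; not the vlib types. */
typedef struct
{
  unsigned thread_index; /* thread that owns this instance */
  double now;            /* that thread's cached clock, in seconds */
} main_like_t;

/* Index of the calling thread, set once when the thread starts. */
static _Thread_local unsigned current_thread_index;

/* Returns 1 when the caller owns vm, 0 otherwise. */
static int
caller_owns (const main_like_t *vm)
{
  return vm->thread_index == current_thread_index;
}

/* Read the clock, trapping cross-thread use when assertions are enabled. */
static double
time_now_checked (const main_like_t *vm)
{
  assert (caller_owns (vm));
  return vm->now;
}
```

A worker that passes the main thread's instance fails the assertion immediately instead of silently reading another thread's clock.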
|
|
When configured at compile time via the cmake
VPP_BUFFER_FAULT_INJECTOR option, the buffer allocator will appear to
fail a certain fraction of the time.
By default, the allocator succeeds 80% of the time. Detailed command
line configuration options are available, but only when the image has
been compiled with the cmake option described above:
vlib { buffer-alloc-success-rate [0.0 ... 1.0]
       buffer-alloc-success-seed <nnnn> }
Modify vlib_buffer_pool_create(...) so 0 is always an invalid buffer
index.
Debug images: add checks for bad buffer index enqueues, and also
verify that f->n_vectors doesn't accidentally map one or more
instances of the frame poison pattern 0xfefefefe.
Type: improvement
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Iab939858014463d1e664682805013d334d6fcbe5
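A rough sketch of how a success-rate fault injector might decide whether an allocation is allowed to proceed. The struct, the function names, and the choice of a linear congruential generator are all assumptions made for illustration; the actual injector in the buffer allocator may work differently.

```c
#include <stdint.h>

/* Hypothetical sketch; names do not come from the VPP source. */
typedef struct
{
  double success_rate; /* fraction of allocations allowed to succeed */
  uint32_t seed;       /* PRNG state, analogous to a configured seed */
} fault_injector_t;

/* Small linear congruential generator; returns a value in [0, 1). */
static double
fi_random (fault_injector_t *fi)
{
  fi->seed = fi->seed * 1664525u + 1013904223u;
  return (double) (fi->seed >> 8) / (double) (1u << 24);
}

/* Returns 1 if the simulated allocation should proceed, 0 if it should fail. */
static int
fi_alloc_allowed (fault_injector_t *fi)
{
  return fi_random (fi) < fi->success_rate;
}
```

Seeding the generator makes a failing run repeatable, which is presumably why a seed option sits alongside the rate.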
|
|
Use exponential smoothing. Each sample has a half-life of 1
second. reported_rate(t) = reported_rate(t-1) * K + rate(t)*(1-K)
Sample every 20ms, i.e. 50 samples per second
K = exp (-1.0/20.0);
K = 0.95;
Type: feature
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I9aea5dd5fecfaefffb78245316adb4bf62eb2bd4
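The update rule above can be sketched in a few lines. The function name is hypothetical, and the constant uses the rounded value K = 0.95 given in the message.

```c
/* Rounded smoothing constant from the message: K = exp (-1.0 / 20.0) ~= 0.95 */
#define SMOOTH_K 0.95

/* One update step:
   reported_rate(t) = reported_rate(t-1) * K + rate(t) * (1 - K) */
static double
smooth_rate (double reported_prev, double rate_now)
{
  return reported_prev * SMOOTH_K + rate_now * (1.0 - SMOOTH_K);
}
```

Calling `smooth_rate` once per sampling interval with the latest measured rate yields a value that moves toward the input a little on each step, so short bursts are damped while a sustained change eventually shows through.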
|
|
Configure n-tuple classifier filters which apply to the vpp packet
tracer.
Update the documentation to reflect the new feature.
Add a test vector.
Type: feature
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Iefa911716c670fc12e4825b937b62044433fec36
|
|
Type: refactor
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I4b77879b0a84fdec3c1518a972cf003d5135222d
Signed-off-by: Ole Troan <ot@cisco.com>
|
|
Use a single vnet_pcap_t in vlib_global_main, specifically to support
unified tracing
Update sphinx docs, doxygen tags
Type: refactor
Ticket: VPP-1776
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: Id15d41a596712968c0714cef1bd2cd5bc9cbdd55
|
|
See .../src/vnet/classify/trace_classify.h for the business end
of the scheme.
It would be best to hash pkts, prefetch buckets, and do the primary
table lookups two at a time. The inline as given works, but perf
tuning will be required. "At least it works..."
Add "classify filter" debug cli, for example:
classify filter mask l3 ip4 src dst \
match l3 ip4 dst 192.168.2.10 src 192.168.1.10
Add "pcap rx | tx trace ... filter" to use the current classify filter chain
Patch includes sphinx documentation and doxygen tags.
Next step: device-driver integration
Type: feature
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I05b1358a769f61e6d32470e0c87058f640486b26
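To illustrate the mask/match idea behind the "classify filter" example, here is a simplified sketch that compares masked packet bytes against a match value. It is a plain byte loop with a hypothetical function name; the real classifier hashes the masked bytes and looks them up in a table rather than comparing linearly.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when (pkt & mask) == match over len bytes, else 0.
   Hypothetical helper; match is assumed to be pre-masked. */
static int
masked_match (const uint8_t *pkt, const uint8_t *mask,
              const uint8_t *match, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((pkt[i] & mask[i]) != match[i])
      return 0;
  return 1;
}
```

For the CLI example above, the mask would cover the ip4 source and destination address fields, and the match bytes would hold 192.168.1.10 and 192.168.2.10 at the same offsets.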
|
|
Separate debug CLI arg parsing from the underlying action
function. Fixes a number of subtle ordering dependencies, and will
allow us to add a binary API to control the feature at some point in
the future.
Type: refactor
Ticket: VPP-1762
Signed-off-by: Dave Barach <dave@barachs.net>
Change-Id: I1240fe3f61a0acf5ee9faed60d6ad3386e72e569
|
|
Provide default packet_to_capture value. Display interface name
correctly for "pcap tx/rx trace status".
Type: fix
Signed-off-by: John Lo <loj@cisco.com>
Change-Id: I7064d0dbea236a9aff68bba7fbaf2c4a73b16c6f
|
|
Change-Id: Iddeb3a1b0e20706e72ec8f74dabc60b342f003ba
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Iaa5cd89791b0dfdb56a75009c564581d10696d83
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I5cfa0f6eee67156bf87907fcf8a39f16d68a0905
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Moved code to the ethernet input node, and the interface output
path(s). Since we no longer skip ethernet-input, there's no reason
for device drivers to know anything about pcap rx tracing, etc.
Change-Id: I08d32fb1b90cbee1bd4f609837d533e047b36fa4
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
The main thread squirrels away vlib_time_now (&vlib_global_main),
worker threads use it to calculate an offset in f64 seconds from their
own vlib_time_now(vm) value. We use that offset until the next barrier
sync.
Thanks to Damjan for the suggestion.
Change-Id: If56cdfe68e5ad8ac3b0d0fc885dc3ba556cd1215
Signed-off-by: Dave Barach <dave@barachs.net>
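The offset scheme described above might look roughly like this. The struct and function names are hypothetical; the sketch only shows the arithmetic, not the barrier-sync machinery.

```c
/* Hypothetical sketch of the offset scheme; not the vlib data structures. */
typedef struct
{
  double local_now;      /* this thread's own clock reading, in seconds */
  double offset_to_main; /* main time minus local time, captured at the barrier */
} worker_clock_t;

/* At barrier sync: compute the offset from the main thread's saved time. */
static void
worker_sync_at_barrier (worker_clock_t *w, double main_now_saved)
{
  w->offset_to_main = main_now_saved - w->local_now;
}

/* Between barriers: translate the local clock into main-thread time. */
static double
worker_time_main_relative (const worker_clock_t *w)
{
  return w->local_now + w->offset_to_main;
}
```

The worker never reads the main thread's clock directly; it reuses the saved offset until the next barrier sync refreshes it.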
|
|
Change-Id: I195c8eabc0ee67880f1e85fc7594b00be6b563e3
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This patch introduces the following changes:
- deprecates free lists, which are not used and not compatible
with external buffer managers (i.e. DPDK)
- introduces native support for per-numa buffer pools
- significantly improves performance of buffer alloc and free
Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
As a FUD reduction measure, this patch implements 2-way parallel
counter collection. Synthetic stat component counter pairs run at the
same time. Running two counters (of any kind) at the same time
naturally reduces the aggregate time required by an approximate
factor-of-2, depending on whether an even or odd number of stats have
been requested.
I don't completely buy the argument that computing synthetic stats
such as instructions-per-clock will be inaccurate if component counter
values are collected sequentially. Given a uniform traffic pattern, it
must make no difference.
As the collection interval increases, the difference between serial
and parallel component counter collection will approach zero; see also
the Central Limit Theorem.
Change-Id: I36ebdcf125e8882cca8a1929ec58f17fba1ad8f1
Signed-off-by: Dave Barach <dave@barachs.net>
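The "approximate factor-of-2" remark follows from counting collection passes. A small sketch, with a hypothetical helper name:

```c
/* Number of collection passes needed when n_parallel counters run at once.
   Hypothetical helper, for illustration only. */
static unsigned
passes_needed (unsigned n_counters, unsigned n_parallel)
{
  return (n_counters + n_parallel - 1) / n_parallel;
}
```

With four counters, two-way collection needs two passes instead of four. With five it needs three instead of five, which is why the saving is only approximately a factor of two when an odd number of stats is requested.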
|
|
Change-Id: I3bb1d9f83dd08f4b93acd4a281bfec0674e39c2e
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
VPP graph dispatch trace record description:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Major Version | Minor Version |   NStrings    |   ProtoHint   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Buffer index (big endian)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ VPP graph node name ...     ...               |  NULL octet   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Metadata ...         ...               |  NULL octet   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Opaque ...           ...               |  NULL octet   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Buffer Opaque 2 ...         ...               |  NULL octet   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VPP ASCII packet trace (if NStrings > 4)      |  NULL octet   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Packet data (up to 16K)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header, and a
protocol hint.
The buffer index allows downstream consumers of these data to easily
filter/track single packets as they traverse the forwarding
graph. FWIW, the 32-bit buffer index is stored in big endian format.
As of this writing, major version = 1, minor version = 0. NStrings
will be either 4 or 5.
Here is the current set of protocol hints:
typedef enum
{
  VLIB_NODE_PROTO_HINT_NONE = 0,
  VLIB_NODE_PROTO_HINT_ETHERNET,
  VLIB_NODE_PROTO_HINT_IP4,
  VLIB_NODE_PROTO_HINT_IP6,
  VLIB_NODE_PROTO_HINT_TCP,
  VLIB_NODE_PROTO_HINT_UDP,
  VLIB_NODE_N_PROTO_HINTS,
} vlib_node_proto_hint_t;
Example: VLIB_NODE_PROTO_HINT_IP6 means that the first octet of packet
data SHOULD be 0x60, and should begin an IPv6 packet header.
Change-Id: Idf310bad80cc0e4207394c80f18db5f77c378741
Signed-off-by: Dave Barach <dave@barachs.net>
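A downstream consumer could decode the fixed part of the record along these lines. This is a sketch that assumes the four one-octet fields appear in the order drawn in the record description, followed by the 32-bit big-endian buffer index; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the fixed 8-octet record header. */
typedef struct
{
  uint8_t major_version;
  uint8_t minor_version;
  uint8_t n_strings;
  uint8_t proto_hint;
  uint32_t buffer_index; /* converted to host byte order */
} dispatch_hdr_t;

/* Parse the fixed header; returns 0 on success, -1 if the input is too short. */
static int
parse_dispatch_hdr (const uint8_t *p, size_t len, dispatch_hdr_t *h)
{
  if (len < 8)
    return -1;
  h->major_version = p[0];
  h->minor_version = p[1];
  h->n_strings = p[2];
  h->proto_hint = p[3];
  /* Assemble the big-endian buffer index one octet at a time. */
  h->buffer_index = ((uint32_t) p[4] << 24) | ((uint32_t) p[5] << 16)
                    | ((uint32_t) p[6] << 8) | (uint32_t) p[7];
  return 0;
}
```

After the header, a reader would skip `n_strings` NULL-terminated strings to reach the packet data.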
|
|
Change-Id: I56f25d653b71a25c70e6c5c1a93dd9c5158f2079
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
To facilitate dispatch trajectory tracing, vlib_buffer_t decoding, etc.
through Wireshark
Change-Id: I31356b9fa1f40cba8830aaf10a86a9fbb7546438
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I2476e3e916a42b41d1e66bfc1ec4f8c4264c1720
Signed-off-by: Dave Barach <dbarach@cisco.com>
|
|
This reverts commit 71615399e194847d7833b744caedab9b841733e5.
There seems to be an issue with ARPs when running with multiple workers.
Change-Id: Iaa68081512362945a9caf24dcb8d70fc7c5b75df
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: Ib5c346641463768cf33eaf8cb5fab5b63171398d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: Ic4c46bc733afae8bf0d8146623ed15633928de30
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Ie5a00c15ee9536cc61afab57f6cadc1aa1972f3c
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Add an "elog trace [api][cli][barrier]" debug CLI command. Remove the
barrier elog test command. Remove unused reliable multicast code.
Change-Id: Ib3ecde901b7c49fe92b313d0087cd7e776adcdce
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I1f54b994425c58776e1445c8d9fe142e7a644d3d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Configure w/ --enable-dlmalloc, see .../build-data/platforms/vpp.mk
src/vppinfra/dlmalloc.[ch] are slightly modified versions of the
well-known Doug Lea malloc. Main advantage: dlmalloc mspaces have no
inherent size limit.
Change-Id: I19b3f43f3c65bcfb82c1a265a97922d01912446e
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
This is a ~50% improvement in buffer alloc performance.
For a 256 buffer allocation, it was ~10 clocks/buffer; it is now < 5 clocks.
Change-Id: I97590e240a79a42bcab5eb26587fc2d11e6eb163
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This addresses a crash with gcc-7 observed when -O3 is used.
Change-Id: I10e87da8e5037ad480eba7fb0aaa9a657d3bf48d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
- buffer_main is no longer part of vlib_main_t
- pool of free lists is still part of vlib_main_t
- mheap is not used anymore for buffer allocation
- a simple bitmap-based buffer alloc scheme is introduced
Change-Id: I3e1e6d00e2c8122293ed0a741245eb841315a1ff
Signed-off-by: Damjan Marion <damarion@cisco.com>
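For readers unfamiliar with the idea, a bitmap allocator keeps one bit per buffer and hands out the index of a set bit. The following is a generic sketch with hypothetical names and a fixed 64-buffer pool; it is not the scheme implemented by this patch.

```c
#include <stdint.h>

#define N_BUFFERS 64

/* One bit per buffer: 1 = free, 0 = in use. Hypothetical sketch. */
typedef struct
{
  uint64_t free_bitmap;
} bitmap_pool_t;

/* Mark every buffer as free. */
static void
pool_init (bitmap_pool_t *p)
{
  p->free_bitmap = ~0ULL;
}

/* Returns the lowest free buffer index, or -1 when the pool is exhausted. */
static int
pool_alloc (bitmap_pool_t *p)
{
  for (int i = 0; i < N_BUFFERS; i++)
    if (p->free_bitmap & (1ULL << i))
      {
        p->free_bitmap &= ~(1ULL << i);
        return i;
      }
  return -1;
}

/* Return buffer i to the pool. */
static void
pool_free (bitmap_pool_t *p, int i)
{
  p->free_bitmap |= 1ULL << i;
}
```

A production allocator would likely use a count-trailing-zeros instruction instead of the linear scan, and more than one word of bitmap.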
|
|
We don't need a per-vlib_main physmem_main, so keep it separately instead
of trying to keep them in sync.
Change-Id: I0fbeecf4d9672d31af7a43c640a7d8f05dd6e46f
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Queue RPC calls and send them from the main dispatch loop. As things stood,
if the vpp main input queue filled, worker threads could enter a
barrier-sync spin-wait in the middle of processing a frame. If thread
0 decided to recreate worker thread data structures, the worker thread(s)
could easily crash.
Legislate the problem out of existence by enqueueing RPC messages only
from the main dispatch loop. At that point, doing a barrier-sync wait
is perfectly OK.
Change-Id: I18da3e44bb1f29a63fe5f30cf11de732ecfd5bf7
Signed-off-by: Dave Barach <dave@barachs.net>
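The deferral pattern described above can be sketched as a small pending queue that is filled while a frame is being processed and drained only from the top of the dispatch loop. All names are hypothetical, and the sketch invokes the deferred function directly where the real code would enqueue an RPC message to the main thread.

```c
#include <stddef.h>

#define RPC_QUEUE_MAX 16

typedef void (*rpc_fn_t) (void *arg);

typedef struct
{
  rpc_fn_t fn;
  void *arg;
} rpc_t;

/* Hypothetical per-thread list of RPCs waiting to be sent. */
typedef struct
{
  rpc_t pending[RPC_QUEUE_MAX];
  size_t n_pending;
} rpc_queue_t;

/* Called mid-frame: only record the request, never block here.
   Returns 0 on success, -1 if the pending list is full. */
static int
rpc_defer (rpc_queue_t *q, rpc_fn_t fn, void *arg)
{
  if (q->n_pending >= RPC_QUEUE_MAX)
    return -1;
  q->pending[q->n_pending].fn = fn;
  q->pending[q->n_pending].arg = arg;
  q->n_pending++;
  return 0;
}

/* Called from the top of the dispatch loop, where waiting would be safe.
   Returns the number of requests processed. */
static size_t
rpc_flush (rpc_queue_t *q)
{
  size_t n = q->n_pending;
  for (size_t i = 0; i < n; i++)
    q->pending[i].fn (q->pending[i].arg);
  q->n_pending = 0;
  return n;
}

/* Example deferred work: increment a counter. */
static void
bump (void *arg)
{
  *(int *) arg += 1;
}
```

Nothing runs at the point where `rpc_defer` is called, so a thread in the middle of a frame never has to wait on a full queue.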
|
|
It's way too easy to imagine leaving a mutex or a spin-lock held in
the /vpe-api shared-memory segment, or elsewhere. Set a volatile
variable and check it in a safe place...
Change-Id: I9d91c38cffeb921143c272162d055c9c24a6c312
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Support logging to both syslog and elog.
Also include DaveB's is_mp_safe fix, which had been lost.
Change-Id: If82f7969e2f43c63c3fed5b1a0c7434c90c1f380
Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
|
|
This patch adds support for multiple numa-aware physmem regions.
Change-Id: I5c69a6f4da33c8ee21bdb8604d52fd2886f2327e
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change the rebuilding of worker thread clone data structures
to run in parallel on the workers, instead of serially
on main.
Change-Id: Ib76bcfbef1e51f2399972090f4057be7aaa84e08
Signed-off-by: Colin Tregenza Dancer <ctd@metaswitch.com>
|
|
Off by default. Enable via cmdline "... vlib { elog-post-mortem-dump }
..."
Change-Id: I2056b9de9b37475f2bfeeb5404da838f1b42645a
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Change-Id: I82c663bc0866c6c68ba354104b0bb059387f4b9d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This patch deprecates stack-based thread identification.
It also removes the requirement that thread stacks be adjacent.
Finally, possibly annoying for some folks, it renames
all occurrences of cpu_index and cpu_number to thread
index. Using the word "cpu" is misleading here, as a thread can
be migrated to a different CPU, and it is also not related
to the Linux cpu index.
Change-Id: I68cdaf661e701d2336fc953dcb9978d10a70f7c1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I6ff7b65a400734a47bc0a7d03faf86ef1cf4f8c8
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I4aa3e7e42fb81211de1aed07dc7befee87a1e18b
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Id18d59c9442602633a6310b2001a95bce8b6b232
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23
Signed-off-by: Damjan Marion <damarion@cisco.com>
|