Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch enables statistics from the Arm PMUv3 through the perfmon
plugin.
In comparison to using the Linux "perf" tool, it allows obtaining
direct, per node level statistics (rather than per thread). By accessing
the PMU counter registers directly from userspace, we can avoid the
overhead of using a read() system call and get more accurate and fine
grained statistics about the running of individual nodes.
A demo of perfmon on Arm can be found at:
https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF
*Important Note*
Perfmon on Arm is dependent on and works only on Linux kernel versions
of v5.17+ as this is when userspace access to Arm perf counters was
included.
On most Arm systems, a maximum of 7 PMU events can be configured at once
- (6x PMU events + 1x CPU_CYCLE counter). If some perf counters are in
use elsewhere by other applications, and there are insufficient counters
remaining to open the bundle, the perf_event_open call will fail
(provided the events are grouped with the group_fd param, which perfmon
currently utilises).
See arm/events.h for a list of PMUv3 events available, although it is
implementation defined whether most events are implemented or not. Only
a small set of 7 events is required to be implemented in Armv8.0, with
some additional events required in later versions. As such, depending on
the implementation, some statistics may not be available. See Arm
Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number
space and common events" for more information.
arm/events.c:arm_init() gets information from the sysfs about what
events are implemented on a particular CPU at runtime. Arm's
implementation of the perfmon source callback .bundle_support uses this
information to disable unsupported events in a bundle, or in the case
no events are supported, disable the entire bundle.
Where a particular event in a bundle is not implemented, the statistic
for that event is shown as '-' in the 'show perfmon statistics' cli
output, by disabling the column.
There is additional code in perfmon.c to only open events which are
marked as implemented. Since we're only opening and reading events that
are implemented, some extra logic is required in cli.c to re-align
either perfmon_node_stats_t or perfmon_reading_t with the column
headings configured in each bundle, taking into account disabled
columns.
Userspace access to perf counters is disabled by default, and needs to
be enabled with 'sudo sysctl kernel/perf_user_access=1'.
There is a check built into the Arm event source init function
(arm/events.c:arm_init) to check that userspace reading of perf counters
is enabled in the /proc/sys/kernel/perf_user_access file.
If the above file does not exist, it means the kernel version is
unsupported. Users without a supported kernel will see a warning
message, and no Arm bundles will be registered to use in perfmon.
Enabling/using plugin:
- include the following in startup.conf:
- plugins { plugin perfmon_plugin.so { enable }
- 'show perfmon bundle [verbose]' - show available statistics bundles
- 'perfmon start bundle <bundle-name>' - enable and start logging
- 'perfmon stop' - stop logging
- 'show perfmon statistics' - show output
For a general guide on using and understanding Arm PMUv3 events, see
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology
Type: feature
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
|
|
Type: fix
Signed-off-by: Ivan Shvedunov <ivan4th@gmail.com>
Signed-off-by: Sergey Matov <sergey.matov@travelping.com>
Change-Id: I4ec1a68b7266f05ab7c543cd8207afb29e740743
|
|
Type: fix
Change-Id: I1e8655baaf09b455f7f0052452402a372f738d0f
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Type: improvement
Change-Id: I7489327d8b9c5f69b4ceb2159456f00f8a3612df
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: fix
reset the memory trace if mem trace is turned on
Signed-off-by: Leung Lai Yung <benkerbuild@gmail.com>
Change-Id: Ib99355b9ed42ff66c720bbea5cbbf03c65820d12
|
|
Type: improvement
Change-Id: I20b41537a249a55f01004e45392b34adaa8fd792
Signed-off-by: Mohammed Hawari <mohammed@hawari.fr>
|
|
Type: fix
pool_is_free_index() check is performed only for the first element
Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com>
Change-Id: Icadc715a9b54761ec69805a134a69a262137536d
|
|
Type: fix
Pace the main thread activity for reassembly timeouts, to avoid barrier syncs
Signed-off-by: Vijayabhaskar Katamreddy <vkatamre@cisco.com>
Change-Id: If8c62a05c7d28bfa6ac530c2cd5124834b4e8a70
|
|
Type: fix
Fixes: 56f54af
Change-Id: Id03185953eb16da3a3276d2f21d64499784bbf17
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: fix
Change-Id: Ie292ee56dd5265a56ef472554aaf086e61da7089
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Passing wrong pointer to clib_mem_vm_unmap...
Type: fix
Change-Id: I1f695d77bc45d9a6de3a4a3da1fbe6faebdad15e
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
GCC added asm inline in 8.3, so we change asm inline to asm volatile.
Type: fix
Fixes: d5045e68a782 ("vppinfra: introduce clib_perfmom")
Signed-off-by: Guillaume Solignac <gsoligna@cisco.com>
Change-Id: I9f7781ba9de66211404348ff477a17059b408a78
|
|
clang start to support parse asm inline from clang-11,
Use asm volatile instead.
Type: fix
Fixes: d5045e68a782 ("vppinfra: introduce clib_perfmom")
Signed-off-by: Tianyu Li <tianyu.li@arm.com>
Change-Id: I00e5e19856caaed94e22f8fa6cf4f918483976a4
|
|
Type: improvement
Change-Id: I37c187af80c21b8fb1ab15af112527a837e0df9e
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: I85a90774eb313020435c9bc2297c1bdf23d52efc
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
The clib interrupt structure has a couple of fields at the start of
the cacheline, and then in the next cacheline it has a bitmap, which
is then followed by an atomic bitmap. The size of the bitmaps is
based on the number of interrupts, and when the memory is allocated
the number of interrupts needed is used to size the overall block of
memory. The interrupts typically map to pool entries, so if we want
to store 512 entries then we store them in indices 0..511. This
would then take 8 6 4bit words, so each bitmap would be this size
when the struct is allocated.
It is possible to walk over the end of the allocated data with certain
sizes, one of which is 512. The reason this happens with 512 is that
the check to see when to exit the loop is returning when offset is
greater than the value needed to fit all the values. In this case
512 >> 6 = 8. If there had only been 511 entries then the size would
have been 511 >> 6 = 7, and so it would have fitted in the space.
Therefore modify the check to also check that we are not looking into
the memory beyond what we have allocated in the case where the
number of interrupt is one of the boundary values like 512.
Also add a similar check first time round the loop as it is
possible we could have ate same problem there too.
Add a new test file to verify the new code works. The old version
of the code made this test fail when run with the address
sanitizer. Without the sanitiser it tended to pass because the
following memory was typically set to 0 even though it was
uninitialised.
Type: fix
Signed-off-by: Paul Atkins <patkins@graphiant.com>
Change-Id: I2ec4afae43d296a5c30299bd7694c072ca76b9a4
|
|
Type: improvement
Change-Id: I21de21af6dea9e39df5e912e20e56d878a40659f
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: Iab3d65b6276829ad1e522e66380d1797e37579b8
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: I1bca733d2a35733275efa46ee0a23bf572365c74
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: I5ca142ec1557d5b5c3806b43553ad9d3b5ea1112
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This commit only update is_equal flavours to use
_mm512_cmpneq_epi64_mask() but clang-format decided to also reformat the
whole block.
Type: improvement
Change-Id: I530cf91e8bceaf15022556f803b27de4f8583217
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Use of _vec_len() to set vector length breaks address sanitizer.
Users should use vec_set_len(), vec_inc_len(), vec_dec_len () instead.
Type: improvement
Change-Id: I441ae948771eb21c23a61f3ff9163bdad74a2cb8
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: fix
Change-Id: Ia9f715f6074c72ff581fba1740273cfebe48c0f1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: Ic675ad4edbf27b7230fc2a77f00c90c46d6350c3
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
- support of in-place growth of vectors (if there is available space next to
existing alloc)
- drops the need for alloc_aligned_at_offset from memory allocator,
which allows easier swap to different memory allocator and reduces
malloc overhead
- rework of pool and vec macros to inline functions to improve debuggability
- fix alignment - in many cases macros were not using native alignment
of the particular datatype. Explicitly setting alignment with XXX_aligned()
versions of the macro is not needed anymore in > 99% of cases
- fix ASAN usage
- avoid use of vector of voids, this was root cause of several bugs
found in vec_* and pool_* function where sizeof() was used on voids
instead of real vector data type
- introduce minimal alignment which is currently 8 bytes, vectors will
be always aligned at least to that value (underlay allocator actually always
provide 16-byte aligned allocs)
Type: improvement
Change-Id: I20f4b081bb13bbf7bc0ace85cc4e301787f12fdf
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Ic81aafd5f378de06e5ea8cdd6a59e07ff1a7afca
Type: improvement
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Convenient for type conversion
Type: improvement
Change-Id: I35a770e06e6318fb4249b70021ed6f6c93b07162
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Use of clib_mem_is_heap_object is not reliable enough for production use
as it relies on just few bytes of memory allocator chunk header.
Type: improvement
Change-Id: I48c8adde8b6348b15477e3a015ba515eb7ee7ec2
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
bihash keys are less than 64-bytes, do not overflow.
Type: fix
Change-Id: Ic55407eb9ccca38058f7e62b363ec05c8445fbcb
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Type: improvement
This also makes the is_white_space function
public
Change-Id: Ifc1c0d4509f3ecae14f09bb5fa7a2eea33c49b09
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
|
|
Type: refactor
Change-Id: Iba0466b60354955ba73f851435a7127435b7066d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This adds two new exported functions
for the clib_bihash
* clib_bihash_add_with_overwrite_cb allowing
to pass a callback to be called on overwriting
a key with bucket lock held.
* clib_bihash_add_del_with_hash doing an add_del
with a precomputed hash.
Type: feature
Change-Id: I1590c933fa7cf21e6a8ada89b3456a60c4988244
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
|
|
More generic vector heap code coming in another patch...
Type: refactor
Change-Id: I2327128fb3aba9d5d330f46a35afec32e1e3942e
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: I3625eacf9e04542ca8778df5d46075a8654642c7
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
vec_free() does the work
Type: refactor
Change-Id: I8a97607c3b2f58d116863642b32b55525dc15d88
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: Iaa1e43c87c5725ab33ea8489bff2a7bda18b9c79
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Immediate benefit is ability to use hugepage backed memory.
Type: improvement
Change-Id: Ibcae961aa09ea92d3e931a40bedbc6346a4b2039
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: I17778e89674da0e8204713302e2293377bdabcbc
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
On the forst vector alloc values are stored into header.
Later, when vector grows values from header are used istead of provided
ones.
In the debug image code will assert if same values are not provided.
Type: improvement
Change-Id: I8fdcfa495e9c1df0f6392c90f634e8c74b73b328
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: Ibd29a717eaf12d795b3bceb31835d6fc655268b1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Rename vec_capacity to vec_mem_size as it returned the size of the
underlying memory allocation not the number of bytes that can be used
for vector elements.
Add new vec_max_elts macro that returns number of elements that can fit
into generic vector.
Type: fix
Signed-off-by: Florin Coras <fcoras@cisco.com>
Change-Id: I2e53a2bfa6e56a89af62d6ddc073ead58b8c49bb
|
|
Type: fix
Change-Id: I298d2a5067f7949002e6c010f892553f1eb9f477
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: fix
Signed-off-by: Florin Coras <fcoras@cisco.com>
Change-Id: I3425350f5e874df79716bd726900540629793beb
|
|
Deadly combination is clib_{set,long}jmp + lazy linking + tail call compiler
optimization. On the first call to clib_setjmp, dynamic linker executes loader
code which then calls clib_setjmp, so stored stack position contains dynamic
loader data. Tail call optimization simply jumps back to the calling
code when clib_longjump is called and that results in wrong return
address used from the stack.
Change-Id: Ia7d8dbd5b2c425cdd0449374aa07ab6b684a330e
Type: fix
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: I4b89c32c224caf8a3a4ac94b26ecefffd26c7038
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Faster memcpy variant when overrun of up to SIMD register size - 1 is
accepable.
Type: improvement
Change-Id: I1fb22e2989f2156caa9f34fa096e6ff9ea9ec7b6
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
not needed, '>' works...
Type: improvement
Change-Id: I9bfbac787e8dca24e9b1bceda0922740eed26346
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This fixes the maglev logic which previously
included a wrong simplication.
It moves the maglev logic to its own file,
and adds a test function in the debug cli.
Type: improvement
Change-Id: I2790ae2a26fc1c5739ff02f41d436bfcafd5b380
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
|
|
embedded vec_header_t
Type: fix
Signed-off-by: Neale Ranns <neale@graphiant.com>
Change-Id: Ie9532543c4d4439beec8ce097eafa34588ad1266
|
|
Prune perfmon bundles that exceed the number of available pmu counters.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
|