Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch enables statistics from the Arm PMUv3 through the perfmon
plugin.
In comparison to using the Linux "perf" tool, it allows obtaining
direct, per node level statistics (rather than per thread). By accessing
the PMU counter registers directly from userspace, we can avoid the
overhead of using a read() system call and get more accurate and fine
grained statistics about the running of individual nodes.
A demo of perfmon on Arm can be found at:
https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF
*Important Note*
Perfmon on Arm is dependent on and works only on Linux kernel versions
of v5.17+ as this is when userspace access to Arm perf counters was
included.
On most Arm systems, a maximum of 7 PMU events can be configured at once
- (6x PMU events + 1x CPU_CYCLE counter). If some perf counters are in
use elsewhere by other applications, and there are insufficient counters
remaining to open the bundle, the perf_event_open call will fail
(provided the events are grouped with the group_fd param, which perfmon
currently utilises).
See arm/events.h for a list of PMUv3 events available, although it is
implementation defined whether most events are implemented or not. Only
a small set of 7 events is required to be implemented in Armv8.0, with
some additional events required in later versions. As such, depending on
the implementation, some statistics may not be available. See Arm
Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number
space and common events" for more information.
arm/events.c:arm_init() gets information from the sysfs about what
events are implemented on a particular CPU at runtime. Arm's
implementation of the perfmon source callback .bundle_support uses this
information to disable unsupported events in a bundle, or in the case
no events are supported, disable the entire bundle.
Where a particular event in a bundle is not implemented, the statistic
for that event is shown as '-' in the 'show perfmon statistics' cli
output, by disabling the column.
There is additional code in perfmon.c to only open events which are
marked as implemented. Since we're only opening and reading events that
are implemented, some extra logic is required in cli.c to re-align
either perfmon_node_stats_t or perfmon_reading_t with the column
headings configured in each bundle, taking into account disabled
columns.
Userspace access to perf counters is disabled by default, and needs to
be enabled with 'sudo sysctl kernel/perf_user_access=1'.
There is a check built into the Arm event source init function
(arm/events.c:arm_init) to check that userspace reading of perf counters
is enabled in the /proc/sys/kernel/perf_user_access file.
If the above file does not exist, it means the kernel version is
unsupported. Users without a supported kernel will see a warning
message, and no Arm bundles will be registered to use in perfmon.
Enabling/using plugin:
- include the following in startup.conf:
- plugins { plugin perfmon_plugin.so { enable }
- 'show perfmon bundle [verbose]' - show available statistics bundles
- 'perfmon start bundle <bundle-name>' - enable and start logging
- 'perfmon stop' - stop logging
- 'show perfmon statistics' - show output
For a general guide on using and understanding Arm PMUv3 events, see
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology
Type: feature
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
|
|
In preparation for enabling perfmon on Arm platforms, move some Intel
/arch specific logic into the /intel directory and update the CMake to
split the common code from arch specific files.
Since the dispatch_wrapper code is very different on Arm/Intel,
each arch can provide their own implementation + conduct any additional
arch specific config e.g. on Intel, all indexes from the mmap pages are
cached. The new method intel_config_dispatch_wrapper conducts this
config and returns a pointer to the dispatch wrapper to use.
Similarly, is_bundle_supported() looks very different on Arm/Intel, so
each implementation is to provide their own arch specific checks.
Two new callbacks/function ptrs are added in PERFMON_REGISTER_SOURCE to
support this - .bundle_support and .config_dispatch_wrapper.
Type: refactor
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Change-Id: Idd121ddcfd1cc80a57c949cecd64eb2db0ac8be3
|
|
format() expects a NULL-terminated C-string as format string.
Type: fix
Change-Id: Ib428cf2debbf98850eed512907175f8ae8ba3c04
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
Type: fix
Change-Id: I43ebb2c2922f3b8b8eddf26ccdf044f31d7b7a10
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Breakdown the distribution of uops delivered to the frontend.
Collerates directly with the source of the uops.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I93a57dbe56dfa0f378527844aa4e63f45a548e55
|
|
Topdown level 1 and 2 for Intel Ice Lake (ICX). Limiting topdown support
to THREAD for the moment on Ice Lake, as NODE support is still
unreliable. Also removing Topdown Level 1 from Sapphire Rapids onwards,
as Topdown LeveL 2 also shows Level 1 on Sapphire, and it reduces the
overall number of bundles.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Iaa68b711dc8b6fb1090880b411debadb3c37f8bc
|
|
Previously Linux pseudo events were being counted as multiple fixed
events, such that a bundle with pseudo events could exceed the number of
available fixed counters. Reworked to ignore pseudo events in the
accounting for the moment.
Type: fix
Fixes: 0024e53ad
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Ic938f8266fd04d7731afbd02e261c61ef22a8522
|
|
Add a bundle to measure topdown backend bound core cycles, will indicate if any
given execution port has contention.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I37d1b38c101ac42d51c10fa4452b822d34b729c9
|
|
Renamed memory stalls to topdown backend-bound-mem, added topdown
frontend-bound-latency and frontend-bound-bandwidth.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I70f42b6b63fe2502635cad4aed4271e2bbdda5f1
|
|
Prune perfmon bundles that exceed the number of available pmu counters.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
|
|
Added a cli to show Linux perf config for a give perfmon bundle. This
makes it easier to format Linux perf commands for next level analysis.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9adafa7d441b72120390d186e3c8f884b1bc9828
|
|
Refactoring perf metric support to remove branching on bundle type in
the dispatch wrapper. This change includes caching the rdpmc index at
perfmon_start(), so that the mmap_page.index doesn't need to be looked
up each time. It also exclude the effects of mmap_page.index.
This patch prepares the path for bundles that support general, fixed and
metrics counters simulataneously.
Type: refactor
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9c5b4917bd02fea960e546e8558452c4362eabc4
|
|
Check for possible hash lookup failure to avoid NULL dereference.
Type: fix
Fixes: e15c999c30
Signed-off-by: Klement Sekera <ksekera@cisco.com>
Change-Id: Ib806b4d124be26fbccf36fe9d19af1aec63f487b
|
|
Rename the memory bandwidth bundle to memory stalls, to differentiate it
from the bundle that measures memory controller bandwidth boundedness.
Type: refactor
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I828c73b6f769046e1ab592712bdf81ceefcd7911
|
|
Fixes an number of coverity issues associated with the iio-bw feature.
Type: fix
Fixes: e15c999c3
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9ad2b336694132545d90a3483200a510226e9198
|
|
Type: fix
Signed-off-by: Xiaoming Jiang <jiangxiaoming@outlook.com>
Change-Id: I85e41d58884af71afba960d20604bb1b01876d26
|
|
Added an Intel Ice Lake specific bundles to measure pci bandwidth through the
Intel IO PMU. The "PCI" bundle measures read/writes from pci devices. The "CPU"
bundle measure read/writes from cpus to pci devices.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Id48cef5988113e8dc4690b97d22243311bfa7961
|
|
Added support for the Intel Internal IO Uncore PMU, along with the ability to
format PMU Unit specific names.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I2939f8ade5e5ed63ccf7f3ccd0279d7c72e95a6e
|
|
Check that cpumask is initialised properly to avoid possible NULL
pointer dereference.
Type: fix
Signed-off-by: Klement Sekera <ksekera@cisco.com>
Change-Id: I8df5a718104fe703d6baf3f1294b4a6d2ca01619
|
|
Added topdown level 2 support on sapphire rapids,
including ability to indentify a sapphire rapids cpu.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9f99a92fa0886b98bb5185cff32bebd5a094f329
|
|
Enable Topdown Level 1 support on Snowridge,
enabled with standard CPU events on small core.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I58ad09383de7464265ac1b69e683f253591e3b5e
|
|
Fix peusdo events, missed populating "core" events with peusdo events.
Type: fix
Fixes: bf37bf6f7
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I569fa876f1b58540adac0b095be0ff4ade664dec
|
|
Allow perfmon bundles to support more than one bundle type, either node
or thread. Only used for topdown bundle for the moment.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Iba3653a4deb39b0a8ee8ad448a7e8f954283ccd8
|
|
Topdown events are peusdo events exposed by linux,
and are only present on Intel platforms.
Change to clarifies this.
Type: fix
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I6a3dcea5f43f53dbb96475329baf5e596a24d54f
|
|
Added memory bandwidth boundedness bundle, closely related to cache-hierarchy.
This bundle works on ICX only, due to an ICX specific counter.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Id385bd5f4e645ac020774e311c623afb64b79b1e
|
|
Adding support for Linux papi TMAM on Intel Snowridge. Adds the ability to
indicate that a bundle should be thread or node bundle type based on available
cpu features (rdpmc support).
Type: feature
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Ib871b2644fdb2410fbb580e0d21c3a8e2be13aba
|
|
Revert raw column from the perfmon plugin.
Type: refactor
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: If127f57ee2022cc1c0ea5177f1655a792f195f1d
|
|
Adding perfmon node TMAM support on ICX.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I48a9a9ff6a72efc28eaf0cb11ef39fb62cebb126
|
|
Show % time spent per graph node in power level 0, 1 and 2.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I678ee812fa993af39568e9f9dfbf2396fc13ad42
|
|
Add branches, branches taken (a meteric for branchy code), and branch
misses.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: If92d4aaf9d0a6e3b99b8c19e6311cc08ca470590
|
|
Account for occasional instances with the misses rates between caches
are inconsistent.
Type: fix
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Idfb8bb7543401405cfe04291ad201c28be030cc9
|
|
Added basic support for counting cache hits and misses per node.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Ic566611fd3d4246ccaa2117d8f74a569a6862e80
|
|
Type: feature
Change-Id: I2c14f82393d11fc05c6d229f5c58603ab5c0f14d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|