Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch enables statistics from the Arm PMUv3 through the perfmon
plugin.
In comparison to using the Linux "perf" tool, it allows obtaining
direct, per node level statistics (rather than per thread). By accessing
the PMU counter registers directly from userspace, we can avoid the
overhead of using a read() system call and get more accurate and fine
grained statistics about the running of individual nodes.
A demo of perfmon on Arm can be found at:
https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF
*Important Note*
Perfmon on Arm is dependent on and works only on Linux kernel versions
of v5.17+ as this is when userspace access to Arm perf counters was
included.
On most Arm systems, a maximum of 7 PMU events can be configured at once
- (6x PMU events + 1x CPU_CYCLE counter). If some perf counters are in
use elsewhere by other applications, and there are insufficient counters
remaining to open the bundle, the perf_event_open call will fail
(provided the events are grouped with the group_fd param, which perfmon
currently utilises).
See arm/events.h for a list of PMUv3 events available, although it is
implementation defined whether most events are implemented or not. Only
a small set of 7 events is required to be implemented in Armv8.0, with
some additional events required in later versions. As such, depending on
the implementation, some statistics may not be available. See Arm
Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number
space and common events" for more information.
arm/events.c:arm_init() gets information from the sysfs about what
events are implemented on a particular CPU at runtime. Arm's
implementation of the perfmon source callback .bundle_support uses this
information to disable unsupported events in a bundle, or in the case
no events are supported, disable the entire bundle.
Where a particular event in a bundle is not implemented, the statistic
for that event is shown as '-' in the 'show perfmon statistics' cli
output, by disabling the column.
There is additional code in perfmon.c to only open events which are
marked as implemented. Since we're only opening and reading events that
are implemented, some extra logic is required in cli.c to re-align
either perfmon_node_stats_t or perfmon_reading_t with the column
headings configured in each bundle, taking into account disabled
columns.
Userspace access to perf counters is disabled by default, and needs to
be enabled with 'sudo sysctl kernel/perf_user_access=1'.
There is a check built into the Arm event source init function
(arm/events.c:arm_init) to check that userspace reading of perf counters
is enabled in the /proc/sys/kernel/perf_user_access file.
If the above file does not exist, it means the kernel version is
unsupported. Users without a supported kernel will see a warning
message, and no Arm bundles will be registered to use in perfmon.
Enabling/using plugin:
- include the following in startup.conf:
- plugins { plugin perfmon_plugin.so { enable }
- 'show perfmon bundle [verbose]' - show available statistics bundles
- 'perfmon start bundle <bundle-name>' - enable and start logging
- 'perfmon stop' - stop logging
- 'show perfmon statistics' - show output
For a general guide on using and understanding Arm PMUv3 events, see
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology
Type: feature
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
|
|
In preparation for enabling perfmon on Arm platforms, move some Intel
/arch specific logic into the /intel directory and update the CMake to
split the common code from arch specific files.
Since the dispatch_wrapper code is very different on Arm/Intel,
each arch can provide their own implementation + conduct any additional
arch specific config e.g. on Intel, all indexes from the mmap pages are
cached. The new method intel_config_dispatch_wrapper conducts this
config and returns a pointer to the dispatch wrapper to use.
Similarly, is_bundle_supported() looks very different on Arm/Intel, so
each implementation is to provide their own arch specific checks.
Two new callbacks/function ptrs are added in PERFMON_REGISTER_SOURCE to
support this - .bundle_support and .config_dispatch_wrapper.
Type: refactor
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Change-Id: Idd121ddcfd1cc80a57c949cecd64eb2db0ac8be3
|
|
Signed-off-by: Jon Loeliger <jdl@netgate.com>
Type: style
Change-Id: I955c19ddbe06ef3651c03820fcc14054c63258b9
|
|
Previously Linux pseudo events were being counted as multiple fixed
events, such that a bundle with pseudo events could exceed the number of
available fixed counters. Reworked to ignore pseudo events in the
accounting for the moment.
Type: fix
Fixes: 0024e53ad
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Ic938f8266fd04d7731afbd02e261c61ef22a8522
|
|
Move checking for duplicate bundle names after the other checks.
Type: fix
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I7fed5be758814e166eb8756b3df090130ac13bfd
|
|
Prune perfmon bundles that exceed the number of available pmu counters.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
|
|
Change the skipping bundle message to debug
Type: refactor
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I942ff72bd9c26ccad923442fdedddf22ba75e117
|
|
A bit ugly, but generates faster and less noisy code which
should be important for this particular use case.
Type: improvement
Change-Id: If2bba947dac33ffedb4236a5b3fb50fc783668e1
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Refactoring perf metric support to remove branching on bundle type in
the dispatch wrapper. This change includes caching the rdpmc index at
perfmon_start(), so that the mmap_page.index doesn't need to be looked
up each time. It also exclude the effects of mmap_page.index.
This patch prepares the path for bundles that support general, fixed and
metrics counters simulataneously.
Type: refactor
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9c5b4917bd02fea960e546e8558452c4362eabc4
|
|
Add missing array index so that actual mmapped pages are unmpapped
instead of attempting to unmap array holding those pages.
Type: fix
Signed-off-by: Klement Sekera <ksekera@cisco.com>
Change-Id: Ib8709cce1bcbfb505307c140266834b284af796c
|
|
Add a check bundle is supported before futher activation.
Enable different bundles with same name, supported on different platforms.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I73e8bbd1e07c05ebccd9146d48a234eb598a2388
|
|
Allow perfmon bundles to support more than one bundle type, either node
or thread. Only used for topdown bundle for the moment.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Iba3653a4deb39b0a8ee8ad448a7e8f954283ccd8
|
|
Adding perfmon node TMAM support on ICX.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I48a9a9ff6a72efc28eaf0cb11ef39fb62cebb126
|
|
Original set, start, stop, reset, show etc interface was somewhat cumbersome, we
can improve slightly by combining set and start.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I7b865b2c29d2ab32adbd24d7f8a580da6990bb76
|
|
Type: improvement
Change-Id: If3da7d4338470912f37ff1794620418d928fb77f
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Add perfmon plugin support to output raw counter and timestamps, both
are useful for debug.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: Ia5a73d1f05e3464c18991c2346f0ed8b7ef63099
|
|
Type: feature
Change-Id: I2c14f82393d11fc05c6d229f5c58603ab5c0f14d
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: I1303219f9f2a25d821737665903b0264edd3de32
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: refactor
Change-Id: Ie67dc579e88132ddb1ee4a34cb69f96920101772
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Callbacks for monitoring and performance measurement:
- Add new callback list type, with context
- Add callbacks for API, CLI, and barrier sync
- Modify node dispatch callback to pass plugin-specific context
- Modify perfmon plugin to keep PMC samples local to the plugin
- Include process nodes in dispatch callback
- Pass dispatch function return value to callback
Type: refactor
Signed-off-by: Tom Seidenberg <tseidenb@cisco.com>
Change-Id: I28b06c58490611e08d76ff5b01b2347ba2109b22
|
|
When perfmon_init is called at initialization time worker threads are
not created yet and vec_len(vlib_mains) returns 1.
Initialize per-worker data when the number of workers is known, when
enabling data collection instead.
Type: fix
Change-Id: I36887cc7b2a3e88d9728d3cd7262d9b1c968dd3c
Signed-off-by: Benoît Ganne <bganne@cisco.com>
|
|
- Make plugin descriptions more consistent
so the output of "show plugin" can be
used in the wiki.
Change-Id: I4c6feb11e7dcc5a4cf0848eed37f1d3b035c7dda
Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
|
|
EXAMPLE:
src/plugins/perfmon/intel_json_to_c.py \
-i skylakex_core_v1.12.json \
-o src/plugins/perfmon/perfmon_intel_skx.c \
-m 0x55,0 \
-m 0x55,1 \
-m 0x55,2 \
-m 0x55,3
Change-Id: I16ce059e231d340ecfcb6f6638e29c5b46304683
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Add missing pre-input node runtime fork and refork code.
unix-epoll-input runs on all threads; each instance needs its own
runtime stats.
Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
As a FUD reduction measure, this patch implements 2-way parallel
counter collection. Synthetic stat component counter pairs run at the
same time. Running two counters (of any kind) at the same time
naturally reduces the aggregate time required by an approximate
factor-of-2, depending on whether an even or odd number of stats have
been requested.
I don't completely buy the argument that computing synthetic stats
such as instructions-per-clock will be inaccurate if component counter
values are collected sequentially. Given uniform traffic pattern, it
must make no difference.
As the collection interval increases, the difference between serial
and parallel component counter collection will approach zero, see also
the Central Limit theorem.
Change-Id: I36ebdcf125e8882cca8a1929ec58f17fba1ad8f1
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Built a tool to chew up https://download.01.org/perfmon/mapfile.csv,
and output a table in this format:
typedef struct {
u8 model;
u8 stepping;
u8 has_stepping;
char *filename;
} file_by_model_and_stepping_t;
static const file_by_model_and_stepping_t fms_table [] =
{
/* model, stepping, stepping valid, file */
{ 0x2E, 0x0, 0, "NehalemEX_core_V2.json" },
{ 0x1E, 0x0, 0, "NehalemEP_core_V2.json" },
<snip>
{ 0x55, 0x5, 1, "cascadelakex_core_v1.00.json" },
{ 0x55, 0x6, 1, "cascadelakex_core_v1.00.json" },
{ 0x55, 0x7, 1, "cascadelakex_core_v1.00.json" },
<snip>
Change-Id: Ie0e8a7e851799e9d060b966047745039c066ec7b
Signed-off-by: Dave Barach <dave@barachs.net>
|
|
Added/tested additional cpuids from our testbed.
Change-Id: Ifd3ea9e8e8231a8901966903bf5eceb635b82482
Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
|
|
Change-Id: Ie5a00c15ee9536cc61afab57f6cadc1aa1972f3c
Signed-off-by: Dave Barach <dave@barachs.net>
|