aboutsummaryrefslogtreecommitdiffstats
path: root/src/plugins/perfmon/perfmon.c
AgeCommit message (Collapse)AuthorFilesLines
2022-07-12perfmon: enable perfmon plugin for ArmZachary Leaf1-6/+21
This patch enables statistics from the Arm PMUv3 through the perfmon plugin. In comparison to using the Linux "perf" tool, it allows obtaining direct, per node level statistics (rather than per thread). By accessing the PMU counter registers directly from userspace, we can avoid the overhead of using a read() system call and get more accurate and fine grained statistics about the running of individual nodes. A demo of perfmon on Arm can be found at: https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF *Important Note* Perfmon on Arm is dependent on and works only on Linux kernel versions of v5.17+ as this is when userspace access to Arm perf counters was included. On most Arm systems, a maximum of 7 PMU events can be configured at once - (6x PMU events + 1x CPU_CYCLE counter). If some perf counters are in use elsewhere by other applications, and there are insufficient counters remaining to open the bundle, the perf_event_open call will fail (provided the events are grouped with the group_fd param, which perfmon currently utilises). See arm/events.h for a list of PMUv3 events available, although it is implementation defined whether most events are implemented or not. Only a small set of 7 events is required to be implemented in Armv8.0, with some additional events required in later versions. As such, depending on the implementation, some statistics may not be available. See Arm Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number space and common events" for more information. arm/events.c:arm_init() gets information from the sysfs about what events are implemented on a particular CPU at runtime. Arm's implementation of the perfmon source callback .bundle_support uses this information to disable unsupported events in a bundle, or in the case no events are supported, disable the entire bundle. Where a particular event in a bundle is not implemented, the statistic for that event is shown as '-' in the 'show perfmon statistics' cli output, by disabling the column. There is additional code in perfmon.c to only open events which are marked as implemented. Since we're only opening and reading events that are implemented, some extra logic is required in cli.c to re-align either perfmon_node_stats_t or perfmon_reading_t with the column headings configured in each bundle, taking into account disabled columns. Userspace access to perf counters is disabled by default, and needs to be enabled with 'sudo sysctl kernel/perf_user_access=1'. There is a check built into the Arm event source init function (arm/events.c:arm_init) to check that userspace reading of perf counters is enabled in the /proc/sys/kernel/perf_user_access file. If the above file does not exist, it means the kernel version is unsupported. Users without a supported kernel will see a warning message, and no Arm bundles will be registered to use in perfmon. Enabling/using plugin: - include the following in startup.conf: - plugins { plugin perfmon_plugin.so { enable } - 'show perfmon bundle [verbose]' - show available statistics bundles - 'perfmon start bundle <bundle-name>' - enable and start logging - 'perfmon stop' - stop logging - 'show perfmon statistics' - show output For a general guide on using and understanding Arm PMUv3 events, see https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology Type: feature Signed-off-by: Zachary Leaf <zachary.leaf@arm.com> Tested-by: Jieqiang Wang <jieqiang.wang@arm.com> Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
2022-07-12perfmon: make less arch dependentZachary Leaf1-92/+8
In preparation for enabling perfmon on Arm platforms, move some Intel /arch specific logic into the /intel directory and update the CMake to split the common code from arch specific files. Since the dispatch_wrapper code is very different on Arm/Intel, each arch can provide their own implementation + conduct any additional arch specific config e.g. on Intel, all indexes from the mmap pages are cached. The new method intel_config_dispatch_wrapper conducts this config and returns a pointer to the dispatch wrapper to use. Similarly, is_bundle_supported() looks very different on Arm/Intel, so each implementation is to provide their own arch specific checks. Two new callbacks/function ptrs are added in PERFMON_REGISTER_SOURCE to support this - .bundle_support and .config_dispatch_wrapper. Type: refactor Signed-off-by: Zachary Leaf <zachary.leaf@arm.com> Change-Id: Idd121ddcfd1cc80a57c949cecd64eb2db0ac8be3
2022-02-06perfmon: Fix typo in debug log messagesJon Loeliger1-2/+2
Signed-off-by: Jon Loeliger <jdl@netgate.com> Type: style Change-Id: I955c19ddbe06ef3651c03820fcc14054c63258b9
2022-01-30perfmon: fix init of bundles with pseudo eventsRay Kinsella1-11/+17
Previously Linux pseudo events were being counted as multiple fixed events, such that a bundle with pseudo events could exceed the number of available fixed counters. Reworked to ignore pseudo events in the accounting for the moment. Type: fix Fixes: 0024e53ad Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: Ic938f8266fd04d7731afbd02e261c61ef22a8522
2022-01-30perfmon: check for duplicates after other checksRay Kinsella1-3/+3
Move checking for duplicate bundle names after the other checks. Type: fix Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I7fed5be758814e166eb8756b3df090130ac13bfd
2022-01-27perfmon: prune bundles by available pmu countersRay Kinsella1-7/+33
Prune perfmon bundles that exceed the number of available pmu counters. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
2022-01-12perfmon: skipping bundle messageRay Kinsella1-1/+1
Change the skipping bundle message to debug Type: refactor Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I942ff72bd9c26ccad923442fdedddf22ba75e117
2021-12-02perfmon: compile dispatch wrapper once for each number of countersDamjan Marion1-2/+2
A bit ugly, but generates faster and less noisy code which should be important for this particular use case. Type: improvement Change-Id: If2bba947dac33ffedb4236a5b3fb50fc783668e1 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-12-02perfmon: refactor perf metric supportRay Kinsella1-8/+44
Refactoring perf metric support to remove branching on bundle type in the dispatch wrapper. This change includes caching the rdpmc index at perfmon_start(), so that the mmap_page.index doesn't need to be looked up each time. It also exclude the effects of mmap_page.index. This patch prepares the path for bundles that support general, fixed and metrics counters simulataneously. Type: refactor Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I9c5b4917bd02fea960e546e8558452c4362eabc4
2021-10-26perfmon: properly unmap mmapped pagesKlement Sekera1-1/+1
Add missing array index so that actual mmapped pages are unmpapped instead of attempting to unmap array holding those pages. Type: fix Signed-off-by: Klement Sekera <ksekera@cisco.com> Change-Id: Ib8709cce1bcbfb505307c140266834b284af796c
2021-10-07perfmon: check bundle is supportedRay Kinsella1-0/+23
Add a check bundle is supported before futher activation. Enable different bundles with same name, supported on different platforms. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I73e8bbd1e07c05ebccd9146d48a234eb598a2388
2021-10-05perfmon: bundles with multiple typesRay Kinsella1-3/+3
Allow perfmon bundles to support more than one bundle type, either node or thread. Only used for topdown bundle for the moment. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: Iba3653a4deb39b0a8ee8ad448a7e8f954283ccd8
2021-04-27perfmon: top down level 1 supportmdr781-2/+12
Adding perfmon node TMAM support on ICX. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I48a9a9ff6a72efc28eaf0cb11ef39fb62cebb126
2021-04-01perfmon: combined set and start command.Ray Kinsella1-6/+11
Original set, start, stop, reset, show etc interface was somewhat cumbersome, we can improve slightly by combining set and start. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I7b865b2c29d2ab32adbd24d7f8a580da6990bb76
2021-03-26vlib: introduce vlib_get_main_by_index(), vlib_get_n_threads()Damjan Marion1-8/+8
Type: improvement Change-Id: If3da7d4338470912f37ff1794620418d928fb77f Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-03-16perfmon: add support for raw and timestampsRay Kinsella1-0/+2
Add perfmon plugin support to output raw counter and timestamps, both are useful for debug. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: Ia5a73d1f05e3464c18991c2346f0ed8b7ef63099
2020-12-18perfmon: new perfmon pluginDamjan Marion1-0/+336
Type: feature Change-Id: I2c14f82393d11fc05c6d229f5c58603ab5c0f14d Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-12-18misc: deprecate old perfmonDamjan Marion1-633/+0
Type: refactor Change-Id: I1303219f9f2a25d821737665903b0264edd3de32 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-12-14misc: move to new pool_foreach macrosDamjan Marion1-3/+3
Type: refactor Change-Id: Ie67dc579e88132ddb1ee4a34cb69f96920101772 Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-07-16misc: add callback hooks and refactor pmcTom Seidenberg1-16/+17
Callbacks for monitoring and performance measurement: - Add new callback list type, with context - Add callbacks for API, CLI, and barrier sync - Modify node dispatch callback to pass plugin-specific context - Modify perfmon plugin to keep PMC samples local to the plugin - Include process nodes in dispatch callback - Pass dispatch function return value to callback Type: refactor Signed-off-by: Tom Seidenberg <tseidenb@cisco.com> Change-Id: I28b06c58490611e08d76ff5b01b2347ba2109b22
2019-12-17perfmon: fix per-worker data initializationBenoît Ganne1-6/+0
When perfmon_init is called at initialization time worker threads are not created yet and vec_len(vlib_mains) returns 1. Initialize per-worker data when the number of workers is known, when enabling data collection instead. Type: fix Change-Id: I36887cc7b2a3e88d9728d3cd7262d9b1c968dd3c Signed-off-by: Benoît Ganne <bganne@cisco.com>
2019-05-03plugins: clean up plugin descriptionsDave Wallace1-1/+1
- Make plugin descriptions more consistent so the output of "show plugin" can be used in the wiki. Change-Id: I4c6feb11e7dcc5a4cf0848eed37f1d3b035c7dda Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
2019-03-20perfmon: python to C parser for intel CPUsDamjan Marion1-187/+97
EXAMPLE: src/plugins/perfmon/intel_json_to_c.py \ -i skylakex_core_v1.12.json \ -o src/plugins/perfmon/perfmon_intel_skx.c \ -m 0x55,0 \ -m 0x55,1 \ -m 0x55,2 \ -m 0x55,3 Change-Id: I16ce059e231d340ecfcb6f6638e29c5b46304683 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27perfmon: collect data on selected thread(s)Dave Barach1-3/+25
Add missing pre-input node runtime fork and refork code. unix-epoll-input runs on all threads; each instance needs its own runtime stats. Change-Id: I16b02e42d0c95f863161176c4bb9f9917bef809d Signed-off-by: Dave Barach <dave@barachs.net>
2019-01-24perfmon plugin: 2-way parallel stat collectionDave Barach1-21/+41
As a FUD reduction measure, this patch implements 2-way parallel counter collection. Synthetic stat component counter pairs run at the same time. Running two counters (of any kind) at the same time naturally reduces the aggregate time required by an approximate factor-of-2, depending on whether an even or odd number of stats have been requested. I don't completely buy the argument that computing synthetic stats such as instructions-per-clock will be inaccurate if component counter values are collected sequentially. Given uniform traffic pattern, it must make no difference. As the collection interval increases, the difference between serial and parallel component counter collection will approach zero, see also the Central Limit theorem. Change-Id: I36ebdcf125e8882cca8a1929ec58f17fba1ad8f1 Signed-off-by: Dave Barach <dave@barachs.net>
2018-12-17Improve perfmon json table pickerDave Barach1-22/+90
Built a tool to chew up https://download.01.org/perfmon/mapfile.csv, and output a table in this format: typedef struct { u8 model; u8 stepping; u8 has_stepping; char *filename; } file_by_model_and_stepping_t; static const file_by_model_and_stepping_t fms_table [] = { /* model, stepping, stepping valid, file */ { 0x2E, 0x0, 0, "NehalemEX_core_V2.json" }, { 0x1E, 0x0, 0, "NehalemEP_core_V2.json" }, <snip> { 0x55, 0x5, 1, "cascadelakex_core_v1.00.json" }, { 0x55, 0x6, 1, "cascadelakex_core_v1.00.json" }, { 0x55, 0x7, 1, "cascadelakex_core_v1.00.json" }, <snip> Change-Id: Ie0e8a7e851799e9d060b966047745039c066ec7b Signed-off-by: Dave Barach <dave@barachs.net>
2018-10-23perfmon.c: Register additional cpuids.Paul Vinciguerra1-4/+7
Added/tested additional cpuids from our testbed. Change-Id: Ifd3ea9e8e8231a8901966903bf5eceb635b82482 Signed-off-by: Paul Vinciguerra <pvinci@vinciconsulting.com>
2018-10-22X86_64 perf counter pluginDave Barach1-0/+615
Change-Id: Ie5a00c15ee9536cc61afab57f6cadc1aa1972f3c Signed-off-by: Dave Barach <dave@barachs.net>