path: root/src/vppinfra/cpu.h
Age | Commit message | Author | Files | Lines
2022-07-12 | perfmon: enable perfmon plugin for Arm | Zachary Leaf | 1 | -2/+2
This patch enables statistics from the Arm PMUv3 through the perfmon plugin. Compared with the Linux "perf" tool, it provides direct, per-node statistics rather than per-thread ones. By accessing the PMU counter registers directly from userspace, we avoid the overhead of a read() system call and get more accurate, finer-grained statistics about the running of individual nodes.

A demo of perfmon on Arm can be found at: https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF

*Important Note*

Perfmon on Arm works only on Linux kernel v5.17 and later, as this is when userspace access to Arm perf counters was added.

On most Arm systems, a maximum of 7 PMU events can be configured at once (6 PMU events plus 1 CPU_CYCLE counter). If some perf counters are in use by other applications and too few counters remain to open the bundle, the perf_event_open call will fail (provided the events are grouped with the group_fd param, which perfmon currently does).

See arm/events.h for a list of available PMUv3 events, although it is implementation defined whether most events are implemented. Only a small set of 7 events is required in Armv8.0, with some additional events required in later versions. As such, depending on the implementation, some statistics may not be available. See the Arm Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number space and common events" for more information.

arm/events.c:arm_init() reads from sysfs, at runtime, which events are implemented on a particular CPU. Arm's implementation of the perfmon source callback .bundle_support uses this information to disable unsupported events in a bundle or, if no events are supported, to disable the entire bundle.

Where a particular event in a bundle is not implemented, its column is disabled and the statistic is shown as '-' in the 'show perfmon statistics' cli output. There is additional code in perfmon.c to open only events that are marked as implemented. Since only implemented events are opened and read, some extra logic is required in cli.c to re-align either perfmon_node_stats_t or perfmon_reading_t with the column headings configured in each bundle, taking disabled columns into account.

Userspace access to perf counters is disabled by default and needs to be enabled with 'sudo sysctl kernel/perf_user_access=1'. The Arm event source init function (arm/events.c:arm_init) checks the /proc/sys/kernel/perf_user_access file to confirm that userspace reading of perf counters is enabled. If that file does not exist, the kernel version is unsupported. Users without a supported kernel will see a warning message, and no Arm bundles will be registered for use in perfmon.

Enabling/using plugin:
- include the following in startup.conf:
  - plugins { plugin perfmon_plugin.so { enable } }
- 'show perfmon bundle [verbose]' - show available statistics bundles
- 'perfmon start bundle <bundle-name>' - enable and start logging
- 'perfmon stop' - stop logging
- 'show perfmon statistics' - show output

For a general guide on using and understanding Arm PMUv3 events, see https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology

Type: feature
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
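The perf_user_access check described in this commit message can be illustrated with a minimal C sketch. This is not the code in arm/events.c: the function name perf_user_access_state and its return convention are assumptions made for illustration. It only shows the three outcomes the message describes: file missing (unsupported kernel), access disabled, and access enabled.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from arm/events.c.
 * Returns -1 if the file cannot be opened (kernel without the sysctl),
 *          0 if userspace counter access is disabled,
 *          1 if it is enabled (file content is 1). */
int
perf_user_access_state (const char *path)
{
  FILE *f = fopen (path, "r");
  int value = 0;

  if (f == NULL)
    return -1;

  /* Content that cannot be parsed as an integer is treated as disabled. */
  if (fscanf (f, "%d", &value) != 1)
    value = 0;

  fclose (f);
  return value == 1 ? 1 : 0;
}
```

A caller would pass "/proc/sys/kernel/perf_user_access" and, on -1, print a warning and skip registering the Arm bundles, as the message describes.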
2022-05-24 | vlib: implement aux data handoff | Mohammed Hawari | 1 | -0/+3
Type: improvement Change-Id: I20b41537a249a55f01004e45392b34adaa8fd792 Signed-off-by: Mohammed Hawari <mohammed@hawari.fr>
2022-01-27 | perfmon: prune bundles by available pmu counters | Ray Kinsella | 1 | -0/+18
Prune perfmon bundles that exceed the number of available pmu counters. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
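The pruning idea can be sketched in a few lines of C. The example_bundle_t type and prune_bundles function are hypothetical and much simpler than perfmon's real bundle structure; the sketch only shows the comparison between a bundle's event count and the number of available counters.

```c
#include <stddef.h>

/* Hypothetical, simplified bundle descriptor for illustration only. */
typedef struct
{
  const char *name;
  unsigned n_events; /* counters the bundle needs */
} example_bundle_t;

/* Compact the array in place, keeping only bundles whose event count
 * fits in n_counters. Returns the number of bundles kept. */
size_t
prune_bundles (example_bundle_t *bundles, size_t n_bundles,
	       unsigned n_counters)
{
  size_t kept = 0;

  for (size_t i = 0; i < n_bundles; i++)
    if (bundles[i].n_events <= n_counters)
      bundles[kept++] = bundles[i];

  return kept;
}
```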
2021-10-16 | perfmon: topdown lvl 2 support on sapphire rapids | Ray Kinsella | 1 | -0/+1
Added topdown level 2 support on Sapphire Rapids, including the ability to identify a Sapphire Rapids CPU. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I9f99a92fa0886b98bb5185cff32bebd5a094f329
2021-05-07 | vppinfra: add universal array mask_compare and compress funcs | Damjan Marion | 1 | -0/+1
Type: improvement Change-Id: I6d812339f626ea630ad9354632d2f9506122d379 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-05-05 | vppinfra: enforce function pointer cast in CLIB_MARCH_FN_POINTER | Damjan Marion | 1 | -2/+2
Type: improvement Change-Id: I6ba9f9467a3990f1436a60b1dbc6cb795fd18ba9 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-04-30 | build: switch release build to -O3 | Damjan Marion | 1 | -18/+11
Type: make Change-Id: Ia39c2f847e7b4907fdd3fcac75897b8b4c545a65 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-04-27 | perfmon: top down level 1 support | mdr78 | 1 | -0/+1
Adding perfmon node TMAM support on ICX. Type: improvement Signed-off-by: Ray Kinsella <mdr@ashroe.eu> Change-Id: I48a9a9ff6a72efc28eaf0cb11ef39fb62cebb126
2021-04-18 | vppinfra: remove linux/syscall.h | Damjan Marion | 1 | -15/+2
For portability reasons it is better to have everything wrapped in clib code, i.e. instead of using getcpu() we have clib_get_current_numa_node () and clib_get_current_cpu_id(). Type: refactor Change-Id: I29b52d7f29bc7f93873402c4070561f564b71c63 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-03-11 | vlib: refactor node function variants | Damjan Marion | 1 | -19/+22
It allows default variant selection from startup.conf Type: improvement Change-Id: Idff95e12dd0c105dab7c905089548b05a6e974e0 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-02-04 | vlib: "revert startup multi-arch variant configuration fix for interfaces" | Damjan Marion | 1 | -20/+0
Type: fix This reverts commit 5a48b3b9d88fa2793793e2bf3db8bf156fe2951f. Change-Id: Ifa91b18bdbbc32bb729abc09d95637d9cdf42c3b Signed-off-by: Damjan Marion <damarion@cisco.com> Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
2021-01-25 | vlib: startup multi-arch variant configuration fix for interfaces | Radu Nicolau | 1 | -0/+20
Propagate the multi-arch variant selection to interfaces. Type: fix Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Change-Id: I99c4a7896f172f0d14d2ded22a27383825529a7d
2021-01-21 | avf: use write combining store for queues tail update | Radu Nicolau | 1 | -23/+32
Performance improvement: on supported platforms, currently only Intel Tremont, use a write combining store to update the tail pointers. Also, Tremont node variant is added for all. Type: improvement Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Change-Id: Ie9606e403b7d9655184f778e3ffee3027c8c9edd
2020-08-27 | vppinfra: Multiarch support for OCTEONTX2 SoC | Nitin Saxena | 1 | -4/+16
Type: feature
- Added multiarch support for Marvell OCTEONTX2
- Corrected Marvell THUNDERX2 spelling
Signed-off-by: Nitin Saxena <nsaxena@marvell.com>
Change-Id: I42d3654aeda89759a2db9d695592ee3227b26c2b
2020-04-30 | build: rework x86 CPU variants | Damjan Marion | 1 | -3/+14
Type: improvement Change-Id: Ief243f88e654e578ef9b8060fcf535b364aececb Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-04-30 | build: support arch-specific compiling for Neoverse N1 | Lijian.Zhang | 1 | -0/+11
Enable arch-specific compiling and dynamic optimal function selection for Arm Neoverse-N1. Support for -march=armv8.2-a+crc+crypto -mtune=neoverse-n1 for Neoverse-N1 is added starting from gcc-9.2.0.

                     without change   with change   improvement
[L2 - 1x flows]      11.00 Mpps       11.46 Mpps    4%
[L2 - 10Kx flows]     6.83 Mpps        7.17 Mpps    5%
[L3 - 1x flows]      10.39 Mpps       10.78 Mpps    3.7%
[L3 - 10Kx flows]     6.67 Mpps        7.19 Mpps    7.8%

Type: feature
Change-Id: I5d24d17820b3dd6909b913202e8c31fc7d48650f
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Jieqiang Wang <Jieqiang.Wang@arm.com>
Reviewed-by: Govindarajan Mohandoss <Govindarajan.Mohandoss@arm.com>
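The improvement figures in this commit message are relative throughput gains. A small C sketch of that arithmetic follows; the function name improvement_percent is an assumption for illustration, and the formula (after - before) / before is the usual definition rather than something stated in the commit.

```c
/* Relative gain of `after` over `before`, in percent.
 * Hypothetical helper; assumes before is non-zero. */
double
improvement_percent (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Applied to the reported Mpps values, this yields roughly 4.2%, 5.0%, 3.8% and 7.8%, which agrees with the reported percentages to within about 0.2 percentage points.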
2019-05-27 | vppinfra: add pclmulqdq x86 cpuid flag | Damjan Marion | 1 | -0/+1
Change-Id: If59fea9bb733f6a134d9cb5a9f61dfabf3441820 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-04-23 | vppinfra: fix one macro issue in clib_cpu_supports_aes() | Zhiyong Yang | 1 | -1/+1
The macro for the x86 platform is wrong; this patch fixes it. Change-Id: I4f0ed254b177d6863fbe9113308c6a8b4478e0b2 Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
2019-03-30 | vppinfra: add few more cpuid flags | Damjan Marion | 1 | -0/+7
Change-Id: Ic2a61c45face18b6dd23d5509d75367d3b2441e9 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-03-20 | cpu: fix rtype for fn selected by CLIB_MARCH_FN | Florin Coras | 1 | -1/+1
Change-Id: I79f65d7bd19af035442dca48b7be6c94de20c167 Signed-off-by: Florin Coras <fcoras@cisco.com>
2019-03-12 | Use AVX2 as default even on AVX512 systems | Damjan Marion | 1 | -1/+1
Use of scalar AVX512 instructions impacts the maximum CPU frequency, as they require the CPU to request a level 1 power license. Sparse use of AVX512, as we do, therefore typically causes more harm than improvement. This patch gives AVX2 higher priority than AVX512, but there is still an option to change it on a per-node basis:
vpp# set node function ethernet-input avx512
Change-Id: I42e03510a6efc1756e22fbb70e8c76d7f74f59d7 Signed-off-by: Damjan Marion <damarion@cisco.com>
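Priority-based variant selection of the kind this commit adjusts can be sketched as follows. The example_variant_t type, the select_variant function and the convention that a negative priority means "unsupported" are assumptions for illustration, not VPP's actual definitions.

```c
#include <stddef.h>

/* Hypothetical variant descriptor for illustration only. */
typedef struct
{
  const char *name;
  int priority; /* higher wins; negative means unsupported on this CPU */
} example_variant_t;

/* Return the supported variant with the highest priority,
 * or NULL if none is supported. */
const example_variant_t *
select_variant (const example_variant_t *v, size_t n)
{
  const example_variant_t *best = NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (v[i].priority < 0)
	continue;
      if (best == NULL || v[i].priority > best->priority)
	best = &v[i];
    }

  return best;
}
```

Under this scheme, giving the AVX2 entry a larger priority value than the AVX512 entry makes AVX2 the default even on a CPU that supports both, which is the behavior the commit describes.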
2019-02-02 | Deprecate old multiarch code, phase 1 | Damjan Marion | 1 | -6/+2
It is causing compilation slowness with gcc-7, so it is being removed earlier than originally planned. For now, the macros are left in the tree so we know which nodes to convert to the new multiarch code. Change-Id: Idb14622ca61fdce1eba59723b20d98715b7971e6 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-30 | buffers: major cleanup and improvements | Damjan Marion | 1 | -0/+56
This patch introduces the following changes:
- deprecates free lists, which are not used and not compatible with external buffer managers (i.e. DPDK)
- introduces native support for per-numa buffer pools
- significantly improves performance of buffer alloc and free
Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7
Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-27 | Fix issue with cpu_id and numa_code captured too early | Damjan Marion | 1 | -1/+1
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45 Signed-off-by: Damjan Marion <damarion@cisco.com>
2019-01-20 | Store numa-node and cpu-index in vlib_main_t | Damjan Marion | 1 | -0/+17
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-11-29 | do not optimize graph node functions in debug builds | Damjan Marion | 1 | -1/+1
Change-Id: I5b4cd419d317381a06e7e6d703373959f4bbd97b Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-10-01 | Support dynamic dual/quad loop selection on aarch64 | Lijian Zhang | 1 | -0/+90
Currently, there are three variants available on aarch64: qdf24xx, thunderx2t99, and cortex-a72. -DCLIB_N_PREFETCHES is passed to the source code to select the dual or quad loop implementation. In addition, different compiler options are applied to these critical functions. gcc-7.3.0 reports an ICE (internal compiler error) with -mtune=thunderx2t99, so -mtune=thunderx2t99 is enabled only when the gcc version is greater than 7.3.0.

Cavium ThunderX2, Implementer 0x43, Part 0x0af
  -march=armv8-a+crc+crypto -mtune=thunderx2t99
Qualcomm Centriq 2400, Implementer 0x51, Part 0xc00
  -march=armv8.1-a+crc+crypto -mtune=qdf24xx
Cortex-A72, Implementer 0x41, Part 0xd08
  -march=armv8-a+crc+crypto -mtune=cortex-a72

Change-Id: Id5649c6325c1e642d0fd42535e3908793b13e02a
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
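The mapping from implementer and part numbers to a tuning target can be written as a small lookup table. The sketch below uses only the three ID pairs listed in this commit message; the example_cpu_t type and lookup_mtune function are hypothetical, and how the implementer and part values are obtained at runtime is left out because the message does not say.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; IDs and -mtune names come from the
 * commit message, the structure itself is illustrative. */
typedef struct
{
  uint32_t implementer;
  uint32_t part;
  const char *mtune;
} example_cpu_t;

static const example_cpu_t example_cpus[] = {
  { 0x43, 0x0af, "thunderx2t99" }, /* Cavium ThunderX2 */
  { 0x51, 0xc00, "qdf24xx" },	   /* Qualcomm Centriq 2400 */
  { 0x41, 0xd08, "cortex-a72" },   /* Cortex-A72 */
};

/* Return the -mtune name for a known CPU, or NULL if it is not listed. */
const char *
lookup_mtune (uint32_t implementer, uint32_t part)
{
  size_t n = sizeof (example_cpus) / sizeof (example_cpus[0]);

  for (size_t i = 0; i < n; i++)
    if (example_cpus[i].implementer == implementer &&
	example_cpus[i].part == part)
      return example_cpus[i].mtune;

  return NULL;
}
```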
2018-09-24 | svm: march svm_fifo take 2 | Florin Coras | 1 | -0/+29
Change-Id: Ifa4fceef7edbe43d444790a624957db0817064de Signed-off-by: Florin Coras <fcoras@cisco.com>
2018-05-29 | Add VLIB_NODE_FN() macro to simplify multiversioning of node functions | Damjan Marion | 1 | -2/+25
Change-Id: Ibab5e27277f618ceb2d543b9d6a1a5f191e7d1db Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-05-28 | Change optimization level from tree-vectorize to O3 | Damjan Marion | 1 | -1/+1
Change-Id: Ia1b49d7fd5f32d9a5139df5df636b46264003a63 Signed-off-by: Damjan Marion <damarion@cisco.com>
2017-12-05 | fill "show cpu" Flag list on aarch64 platforms (VPP-1065) | Gabriel Ganne | 1 | -4/+65
Use getauxval(AT_HWCAP) to get the processor capabilities. The result should be the same as running 'cat /proc/cpuinfo | grep Feature | head -n1'. All features but one (aes) have a different name. Handle aes by adding an arch prefix to it, which is skipped during printing, and by adding a custom clib_cpu_supports_aes() function. Change-Id: If9830bd5a17bac1bd1b5337dacbb0ddbb8ed6b18 Signed-off-by: Gabriel Ganne <gabriel.ganne@enea.com>
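A minimal sketch of the approach in this commit message, assuming a Linux system with glibc's <sys/auxv.h>. The EXAMPLE_HWCAP_* macros are defined locally so the sketch compiles on any architecture; their bit positions follow the aarch64 Linux HWCAP ABI. The helper names and the "aarch64_" prefix are hypothetical and stand in for whatever prefix the patch actually uses.

```c
#include <string.h>
#include <sys/auxv.h>

/* Bit positions from the aarch64 Linux HWCAP ABI (small subset),
 * defined locally instead of including <asm/hwcap.h>. */
#define EXAMPLE_HWCAP_AES   (1UL << 3)
#define EXAMPLE_HWCAP_CRC32 (1UL << 7)

/* Read the hardware capability bitmask supplied by the kernel. */
unsigned long
read_hwcap (void)
{
  return getauxval (AT_HWCAP);
}

/* Return 1 if `flag` is set in `hwcap`, 0 otherwise. */
int
hwcap_has (unsigned long hwcap, unsigned long flag)
{
  return (hwcap & flag) != 0;
}

/* Strip a hypothetical arch prefix before printing a flag name,
 * so that "aarch64_aes" is displayed as "aes". */
const char *
flag_display_name (const char *name)
{
  const char *prefix = "aarch64_"; /* assumed prefix, for illustration */
  size_t n = strlen (prefix);

  return strncmp (name, prefix, n) == 0 ? name + n : name;
}
```

On an aarch64 host, hwcap_has (read_hwcap (), EXAMPLE_HWCAP_AES) would report whether the AES instructions are available; on other architectures the AT_HWCAP bits have different meanings, so the macros above do not apply there.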
2017-11-13 | dpdk: introduce AVX512 variants of node functions | Damjan Marion | 1 | -0/+7
Change-Id: If581feca0d51d0420c971801aecdf9250c671b36 Signed-off-by: Damjan Marion <damarion@cisco.com>
2016-12-28 | Reorganize source tree to use single autotools instance | Damjan Marion | 1 | -0/+112
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23 Signed-off-by: Damjan Marion <damarion@cisco.com>