Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch enables statistics from the Arm PMUv3 through the perfmon
plugin.
In comparison to using the Linux "perf" tool, it allows obtaining
direct, per node level statistics (rather than per thread). By accessing
the PMU counter registers directly from userspace, we can avoid the
overhead of using a read() system call and get more accurate and fine
grained statistics about the running of individual nodes.
A demo of perfmon on Arm can be found at:
https://asciinema.org/a/egVNN1OF7JEKHYmfl5bpDYxfF
*Important Note*
Perfmon on Arm is dependent on and works only on Linux kernel versions
of v5.17+ as this is when userspace access to Arm perf counters was
included.
On most Arm systems, a maximum of 7 PMU events can be configured at once
- (6x PMU events + 1x CPU_CYCLE counter). If some perf counters are in
use elsewhere by other applications, and there are insufficient counters
remaining to open the bundle, the perf_event_open call will fail
(provided the events are grouped with the group_fd param, which perfmon
currently utilises).
See arm/events.h for a list of PMUv3 events available, although it is
implementation defined whether most events are implemented or not. Only
a small set of 7 events is required to be implemented in Armv8.0, with
some additional events required in later versions. As such, depending on
the implementation, some statistics may not be available. See Arm
Architecture Reference Manual for Armv8-A, D7.10.2 "The PMU event number
space and common events" for more information.
arm/events.c:arm_init() gets information from the sysfs about what
events are implemented on a particular CPU at runtime. Arm's
implementation of the perfmon source callback .bundle_support uses this
information to disable unsupported events in a bundle, or in the case
no events are supported, disable the entire bundle.
Where a particular event in a bundle is not implemented, the statistic
for that event is shown as '-' in the 'show perfmon statistics' cli
output, by disabling the column.
There is additional code in perfmon.c to only open events which are
marked as implemented. Since we're only opening and reading events that
are implemented, some extra logic is required in cli.c to re-align
either perfmon_node_stats_t or perfmon_reading_t with the column
headings configured in each bundle, taking into account disabled
columns.
Userspace access to perf counters is disabled by default, and needs to
be enabled with 'sudo sysctl kernel/perf_user_access=1'.
There is a check built into the Arm event source init function
(arm/events.c:arm_init) to check that userspace reading of perf counters
is enabled in the /proc/sys/kernel/perf_user_access file.
If the above file does not exist, it means the kernel version is
unsupported. Users without a supported kernel will see a warning
message, and no Arm bundles will be registered to use in perfmon.
Enabling/using plugin:
- include the following in startup.conf:
- plugins { plugin perfmon_plugin.so { enable }
- 'show perfmon bundle [verbose]' - show available statistics bundles
- 'perfmon start bundle <bundle-name>' - enable and start logging
- 'perfmon stop' - stop logging
- 'show perfmon statistics' - show output
For a general guide on using and understanding Arm PMUv3 events, see
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-neoverse-n1-performance-analysis-methodology
Type: feature
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
Tested-by: Jieqiang Wang <jieqiang.wang@arm.com>
Change-Id: I0620fe5b1bbe78842dfb1d0b6a060bb99e777651
|
|
Type: improvement
Change-Id: I20b41537a249a55f01004e45392b34adaa8fd792
Signed-off-by: Mohammed Hawari <mohammed@hawari.fr>
|
|
Prune perfmon bundles that exceed the number of available pmu counters.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I70fec26bb8ca915f4b980963e06c2e43dfde5a23
|
|
Added topdown level 2 support on sapphire rapids,
including ability to indentify a sapphire rapids cpu.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I9f99a92fa0886b98bb5185cff32bebd5a094f329
|
|
Type: improvement
Change-Id: I6d812339f626ea630ad9354632d2f9506122d379
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: improvement
Change-Id: I6ba9f9467a3990f1436a60b1dbc6cb795fd18ba9
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: make
Change-Id: Ia39c2f847e7b4907fdd3fcac75897b8b4c545a65
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Adding perfmon node TMAM support on ICX.
Type: improvement
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Change-Id: I48a9a9ff6a72efc28eaf0cb11ef39fb62cebb126
|
|
For portabiliy reasons it is better to have all wrapped in clib code.
I.e. instead of using getcpu() we have clib_get_current_numa_node () and
clib_get_current_cpu_id().
Type: refactor
Change-Id: I29b52d7f29bc7f93873402c4070561f564b71c63
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
It allows default variant selection from startup.conf
Type: improvement
Change-Id: Idff95e12dd0c105dab7c905089548b05a6e974e0
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Type: fix
This reverts commit 5a48b3b9d88fa2793793e2bf3db8bf156fe2951f.
Change-Id: Ifa91b18bdbbc32bb729abc09d95637d9cdf42c3b
Signed-off-by: Damjan Marion <damarion@cisco.com>
Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
|
|
Propagate the multi-arch variant selection to interfaces.
Type: fix
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Change-Id: I99c4a7896f172f0d14d2ded22a27383825529a7d
|
|
Performance improvement: on supported platforms, currently only Intel Tremont,
use a write combining store to update the tail pointers.
Also, Tremont node variant is added for all.
Type: improvement
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Change-Id: Ie9606e403b7d9655184f778e3ffee3027c8c9edd
|
|
Type: feature
- Added multiarch support for Marvell OCTEONTX2
- Corrected Marvell THUNDERX2 spelling
Signed-off-by: Nitin Saxena <nsaxena@marvell.com>
Change-Id: I42d3654aeda89759a2db9d695592ee3227b26c2b
|
|
Type: improvement
Change-Id: Ief243f88e654e578ef9b8060fcf535b364aececb
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Enable arch-specific compiling and dynamic optimal function selection
for Arm Neoverse-N1.
Support for -march=armv8.2-a+crc+crypto -mtune=neoverse-n1 for
Neoverse-N1 is added starting from gcc-9.2.0
without change with change improvement
[L2 - 1x flows] 11.00 Mpps 11.46 Mpps 4%
[L2 - 10Kx flows] 6.83 Mpps 7.17 Mpps 5%
[L3 - 1x flows] 10.39 Mpps 10.78 Mpps 3.7%
[L3 - 10Kx flows] 6.67 Mpps 7.19 Mpps 7.8%
Type: feature
Change-Id: I5d24d17820b3dd6909b913202e8c31fc7d48650f
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Jieqiang Wang <Jieqiang.Wang@arm.com>
Reviewed-by: Govindarajan Mohandoss <Govindarajan.Mohandoss@arm.com>
|
|
Change-Id: If59fea9bb733f6a134d9cb5a9f61dfabf3441820
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
The macro for x86 platform is wrong, and the patch fixes it.
Change-Id: I4f0ed254b177d6863fbe9113308c6a8b4478e0b2
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
|
|
Change-Id: Ic2a61c45face18b6dd23d5509d75367d3b2441e9
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I79f65d7bd19af035442dca48b7be6c94de20c167
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Use of scalar AVX512 instructions impacts the maximum CPU frequency
as they require CPU to request level 1 power license.
So sparse use of AVX512 like we do, typically causes more harm than
improvement.
This patch makes AVX2 higher priority than AVX512, but still there
is option to change it on per-node basis:
vpp# set node function ethernet-input avx512
Change-Id: I42e03510a6efc1756e22fbb70e8c76d7f74f59d7
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
It is causing compilation sloness with gcc-7 so removing it
before it was originally planned.
So far macros are left in the tree so we can know which nodes to
convert to new multiarch code.
Change-Id: Idb14622ca61fdce1eba59723b20d98715b7971e6
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
This patch introduces following changes:
- deprecated free lists which are not used and not compatible
with external buffer managers (i.e. DPDK)
- introduces native support for per-numa buffer pools
- significantly improves performance of buffer alloc and free
Change-Id: I4a8e723ae47056717afd6cac0efe87cb731b5be7
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I79b213b34c6071d14acf1922f89037a4a5a36c45
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: If88ccd965122b9318a39a8d71b53334cd1fd81e4
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I5b4cd419d317381a06e7e6d703373959f4bbd97b
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Currently, there are three variants available on aarch64, qdf24xx, thunderx2t99, and cortex-a72.
-DCLIB_N_PREFETCHES is passed to source code to select dual/quad implementation.
Besides, different compiler options are applied on these critical functions.
gcc-7.3.0 reports ICE(internal compiler error) with -mtune=thunderx2t99,
so -mtune=thunderx2t99 is enabled only when gcc version is greater than 7.3.0
Cavium ThunderX2, Impermenter 0x43, Part 0x0af
-march=armv8-a+crc+crypto -mtune=thunderx2t99
Qualcomm Centriq 2400, Impermenter 0x51, Part 0xc00
-march=armv8.1-a+crc+crypto -mtune=qdf24xx
Cortex-A72, Impermenter 0x41, Part 0xd08
-march=armv8-a+crc+crypto -mtune=cortex-a72
Change-Id: Id5649c6325c1e642d0fd42535e3908793b13e02a
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
Change-Id: Ifa4fceef7edbe43d444790a624957db0817064de
Signed-off-by: Florin Coras <fcoras@cisco.com>
|
|
Change-Id: Ibab5e27277f618ceb2d543b9d6a1a5f191e7d1db
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: Ia1b49d7fd5f32d9a5139df5df636b46264003a63
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
use getauxval(AT_HWCAP) to get the processor capabilities.
The result should be the same as calling
cat /proc/cpuinfo | grep Feature | head -n1
All but one (aes) features have a different name.
handle aes by adding it an arch prefix, which is skipped during print
and a clib_cpu_supports_aes() custom function.
Change-Id: If9830bd5a17bac1bd1b5337dacbb0ddbb8ed6b18
Signed-off-by: Gabriel Ganne <gabriel.ganne@enea.com>
|
|
Change-Id: If581feca0d51d0420c971801aecdf9250c671b36
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23
Signed-off-by: Damjan Marion <damarion@cisco.com>
|