author     Jay Wang <jay.wang2@arm.com>        2024-04-19 12:16:41 +0000
committer  Damjan Marion <dmarion@0xa5.net>    2024-07-09 15:35:52 +0000
commit     c44fa9355bb8a5f0315c49ed56bc44799e1fd84f (patch)
tree       acd93d291d2ff0067378a320ecec7b5aa3439733
parent     3fe610a2aae76263b44864b8239c22951c67e752 (diff)
vppinfra: fix huge page alloc error on 5.19+ kernel
Running VPP on a NUMA system with a 5.19+ kernel outputs the error messages
shown below. The 'show physmem' command confirms that VPP falls back to using
normal 4K pages instead of the preallocated 1G huge pages.

The root cause is that VPP uses move_pages()[1] to get the huge page node
information. However, this misbehaves on the 5.19+ kernel due to changes
introduced in its implementation[2]. Our proposed fix is to retry obtaining
the NUMA node info with get_mempolicy()[3] only if we see -ENOENT returned
in status from move_pages() and huge pages are used. Additionally, we use
mincore()[4] to check that the pages are allocated and in memory, to avoid
the possibility of get_mempolicy() falsely allocating a new page.

buffer        [warn  ]: numa[1] falling back to non-hugepage backed buffer pool ()

vpp# show physmem
 used-pages 2 reserved-pages 16 default-page-size 1G lookup-page-size 4K
  arena 'buffers-numa-0' pages 1 subpage-size 1G numa-node 0 shared fd 5
  arena 'buffers-numa-1' pages 1 subpage-size 4K numa-node 1 shared fd 6

[1] https://man7.org/linux/man-pages/man2/move_pages.2.html
[2] https://lore.kernel.org/linux-mm/91da2c3b-96f1-bb03-8fff-4c38f31cb9be@huawei.com/
[3] https://man7.org/linux/man-pages/man2/get_mempolicy.2.html
[4] https://man7.org/linux/man-pages/man2/mincore.2.html

Type: fix

Signed-off-by: Jay Wang <jay.wang2@arm.com>
Change-Id: Ia423745423bb080404292333ef95455a4950ce0a
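For reference, the probe-and-fallback sequence described above can be exercised
outside VPP with a minimal standalone C sketch. This is not part of the patch:
the helper name page_numa_node() and the mmap()-based check in main() are
illustrative only, and the sketch assumes a Linux system where the UAPI header
<linux/mempolicy.h> provides MPOL_F_NODE and MPOL_F_ADDR.

    /* numa_probe.c - standalone sketch of the fallback described above.
     * Not VPP code: page_numa_node() is a hypothetical helper. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/mempolicy.h> /* MPOL_F_NODE, MPOL_F_ADDR */

    /* Return the NUMA node backing the page at 'addr' (page-aligned),
     * or a negative errno-style value if it cannot be determined. */
    static int
    page_numa_node (void *addr)
    {
      void *pages[1] = { addr };
      int status[1] = { 0 };
      unsigned char incore = 0;

      /* Query mode: with nodes == NULL, move_pages() fills status[] with
       * the node number (>= 0) or a negative error code per page. */
      if (syscall (__NR_move_pages, 0, 1, pages, NULL, status, 0) < 0)
        return -errno;

      /* 5.19+ kernels may report -ENOENT for resident huge pages; confirm
       * residency with mincore() before falling back, so get_mempolicy()
       * cannot fault in a page that was never allocated. */
      if (status[0] == -ENOENT && mincore (addr, 1, &incore) == 0 && (incore & 1))
        {
          int node = 0;
          if (syscall (__NR_get_mempolicy, &node, NULL, 0, addr,
                       MPOL_F_NODE | MPOL_F_ADDR) == 0)
            return node;
        }

      return status[0];
    }

    int
    main (void)
    {
      /* Map and touch one anonymous page so it is definitely resident. */
      long psz = sysconf (_SC_PAGESIZE);
      void *p = mmap (NULL, psz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return 1;
      memset (p, 0, psz);

      int node = page_numa_node (p);
      if (node >= 0)
        printf ("page %p is on NUMA node %d\n", p, node);
      else
        printf ("node lookup failed: %s\n", strerror (-node));

      munmap (p, psz);
      return 0;
    }

The ordering mirrors the commit message: mincore() is consulted before
get_mempolicy() so that the policy lookup cannot allocate a page that was
never resident in the first place.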
-rw-r--r--  src/vppinfra/linux/mem.c  14
1 files changed, 14 insertions, 0 deletions
diff --git a/src/vppinfra/linux/mem.c b/src/vppinfra/linux/mem.c
index 734f5c4788c..21aaa55fc00 100644
--- a/src/vppinfra/linux/mem.c
+++ b/src/vppinfra/linux/mem.c
@@ -530,6 +530,7 @@ clib_mem_get_page_stats (void *start, clib_mem_page_sz_t log2_page_size,
 {
   int i, *status = 0;
   void **ptr = 0;
+  unsigned char incore;
 
   log2_page_size = clib_mem_log2_page_size_validate (log2_page_size);
 
@@ -551,6 +552,19 @@ clib_mem_get_page_stats (void *start, clib_mem_page_sz_t log2_page_size,
 
   for (i = 0; i < n_pages; i++)
     {
+      /* move_pages() returns -ENOENT in status for huge pages on 5.19+ kernel.
+       * Retry with get_mempolicy() to obtain NUMA node info only if the pages
+       * are allocated and in memory, which is checked by mincore(). */
+      if (status[i] == -ENOENT &&
+          syscall (__NR_mincore, ptr[i], 1, &incore) == 0 && (incore & 1) != 0)
+        {
+          if (syscall (__NR_get_mempolicy, &status[i], 0, 0, ptr[i],
+                       MPOL_F_NODE | MPOL_F_ADDR) != 0)
+            {
+              /* if get_mempolicy() fails, leave a negative status (node unknown) */
+              status[i] = -ENONET;
+            }
+        }
       if (status[i] >= 0 && status[i] < CLIB_MAX_NUMAS)
         {
           stats->mapped++;