Age | Commit message (Collapse) | Author | Files | Lines |
|
Add u64x2_scatter/u32x4_scatter in vector_neon.h. u64x2_scatter/u32x4_scatter
scatters data from SIMD register to scattered memory locations.
Type: feature
Change-Id: I298d5478c7ba6935ab7402a6d467c7ee00f17e9f
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
Add u64x2_gather/u32x4_gather in vector_neon.h. u64x2_gather/u32x4_gather
gathers data from scattered memory locations to a SIMD register.
Type: feature
Change-Id: I1dd27e38af28b9bed85143014c86197ee549fede
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
|
|
Neon version of memcpy_le gives better performance compared with memmove on aarch64
Change-Id: I44b487bb0795a6e70dd1e55bdde4a077773ec859
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <sirdas@arm.com>
|
|
int vaddvq_u8 (uint8x16_t __a) is not appropriate to implement xxx_is_all_zero,
as there may be overflow causing incorrect return value.
Here's an example.
u8x16 x = {0 <repeats 12 times>, 1, 255, 0, 0};
Change-Id: Ia6a10bdf8da360dec12db902d028751a1a77e9a4
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
|
|
Optimize zero byte mask NEON functions below with less intrinsics,
and get their outputs consistent with functions in vector_sse42.h
always_inline u32 u64x2_zero_byte_mask (u64x2 input)
always_inline u32 u32x4_zero_byte_mask (u32x4 input)
always_inline u32 u16x8_zero_byte_mask (u16x8 input)
always_inline u32 u8x16_zero_byte_mask (u8x16 input)
always_inline u32 i64x2_zero_byte_mask (i64x2 input)
always_inline u32 i32x4_zero_byte_mask (i32x4 input)
always_inline u32 i16x8_zero_byte_mask (i16x8 input)
always_inline u32 i8x16_zero_byte_mask (i8x16 input)
Change-Id: I7f485915baeb37fa2dd484699b8769e0136f6574
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
|
|
This patch enables the use of this function for enqueuing frames to the next graph node.
Change-Id: I4003110db59870f7106e0d13942d6ff7bc54b46d
Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com>
Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <Steve.Capper@arm.com>
|
|
This fixes the l2BD and ip4 test case failures.
Fixes VPP-1432, VPP-1428, VPP-1430
Change-Id: I48b5c961bab60cc3b39fcd6db47e098c81579480
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
|
|
This is the high version of extendto. This function accomplishes the
same task as both shuffling and extending done by SSE intrinsics.
This enables the NEON version for buffer indexes to buffer pointer
translation.
Change-Id: I52d7bbf3d76ba69c9acb0e518ff4bc6abf3bbbd4
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yi He <yi.he@arm.com>
Verified-by: Lijian Zhang <lijian.zhang@arm.com>
|
|
Using rev16 vector intrinsic to reverse byteorder in each word
independently.
Change-Id: I071c40780baffe0bda614ec5d9dd92858f574b0d
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Brian Brooks <brian.brooks@arm.com>
Reviewed-by: Yi He <yi.he@arm.com>
Verified-by: Lijian Zhang <lijian.zhang@arm.com>
|
|
This is used in vlib_get_buffers_with_offset.
Change-Id: If4ff776bc97d21a22e870300b164eeb6a5ec3638
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Brian Brooks <brian.brooks@arm.com>
Reviewed-by: Yi He <yi.he@arm.com>
Verified-by: Lijian Zhang <lijian.zhang@arm.com>
|
|
Having the NEON equivalent of u32x4_hadd for CLIB_HAVE_VEC128
Change-Id: I210f96f7ecb9b80b4753311a68e5e09ccda7e95b
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Brian Brooks <brian.brooks@arm.com>
Reviewed-by: Yi He <yi.he@arm.com>
Verified-by: Lijian Zhang <lijian.zhang@arm.com>
|
|
This adds byte_swap (variant of shuffle) and shuffle vector intrinsic
for ARM based on Neon, concuring with same signature as SSE vector
intrinsic.
Change-Id: I386fd2b1dcc83654e4ad9f90a6065d7736e4ce5c
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
|
|
Add aarch64 neon intrinsics to fix build failures similar to this:
error: implicit declaration of function ‘u64x2_load_unaligned’
Change-Id: I6178504a48242742df3f7d75abdaf108796cf73f
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
|
|
platform. VPP-1129
Change-Id: I954acb56d901e42976e71534317f38d7c4359bcf
Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
|
|
Nintin
Change-Id: I88aabd34ef385d620695ac17ec3fe2f4a5177ada
Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
|
|
Change-Id: I5045e0f3ac4698e820b69ad46b96763e404e6fe4
Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
|
|
Change-Id: Ie9f611fa6a962b0937245f5cc949571ba11c5604
Signed-off-by: Damjan Marion <damarion@cisco.com>
|
|
Enable CLIB_HAVE_VEC128 if both aarch64 and __ARM_NEON
ie. armv8 only, not armv7
Add more neon compare intrinsics wrappers.
I only add simple intrinsics wrappers. More complex ones can be added
later as they are needed, with performance tests on the corresponding
feature to back them up.
Remove wrongly added 128bits definitions defined on both armv7 and armv8
without concern for NEON instructions presence.
Notable correspondinf code activations:
* MHEAP_FLAG_SMALL_OBJECT_CACHE in mheap.c
* ip4 fib mtrie leaves access
* enable ixge plugin compilation for aarch64
(conf still disables it by default)
Change-Id: I99953823627bdff6f222d232c78aa7b655aaf77a
Signed-off-by: Gabriel Ganne <gabriel.ganne@enea.com>
|
|
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23
Signed-off-by: Damjan Marion <damarion@cisco.com>
|