aboutsummaryrefslogtreecommitdiffstats
path: root/src/vppinfra/vector_neon.h
AgeCommit message (Collapse)AuthorFilesLines
2023-03-23vppinfra: AES-CBC and AES-GCM refactor and optimizationsDamjan Marion1-0/+55
- crypto code moved to vppinfra for better testing and reuse - added 256-bit VAES support (Intel Client CPUs) - added AES_GMAC functions Change-Id: I960c8e14ca0a0126703e8f1589d86f32e2a98361 Type: improvement Signed-off-by: Damjan Marion <damarion@cisco.com>
2022-02-10vppinfra:remove uXxY_is_greaterDamjan Marion1-6/+0
not needed, '>' works... Type: improvement Change-Id: I9bfbac787e8dca24e9b1bceda0922740eed26346 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-12-02vppinfra: vector shuffle cleanupDamjan Marion1-6/+0
Type: refactor Change-Id: I8b3fc2ce30df313467274a174c5ac6adbf296153 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-11-10vppinfra: new vectorized ip checksum functions incl. csum_and_copyDamjan Marion1-0/+12
Type: improvement Change-Id: Id5810b7f4a6d6e4ce16b73c235b50db5d475ebf7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-07-13vppinfra: fix saturate add/sub NEON wrappersLijian.Zhang1-40/+60
Fix the saturate add/sub wrappers in vector_neon.h by using the correct intrinsics. Type: fix Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Tianyu Li <Tianyu.Li@arm.com> Change-Id: I38a85633948472d4bdb1c199a806633d3070013f
2020-08-31vppinfra: convert A_extend_to_B to B_from_A format of vector inlinesDamjan Marion1-2/+2
Make it shorter and same format when converting to biggor or smaller types. Type: refactor Change-Id: I443d67e18ae65d779b4d9a0dce5406f7d9f0e4ac Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-05-08vppinfra: fix u32x4_byte_swap on ArmLijian.Zhang1-1/+1
Fix the endianness conversion function u32x4_byte_swap() on Arm. Here's an example of using this function with and without the fix. This issue is seen using Mellanox NIC RDMA driver on Arm servers. The packet length cannot be parsed correctly. Testing code: u32x4 s = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab}; u32x4 ss = u32x4_byte_swap (s); Without the code change: (gdb) p /x s $1 = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab} (gdb) p /x ss $2 = {0x23456789, 0x12345678, 0x456789ab, 0x3456789a} With the code change: (gdb) p /x s $3 = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab} (gdb) p /x ss $4 = {0x78563412, 0x89674523, 0x9a785634, 0xab896745} Type: fix Change-Id: Ie5f263e94331783940e7c00397092a64e4fc4279 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Jieqiang Wang <Jieqiang.Wang@arm.com> Reviewed-by: Govindarajan Mohandoss <Govindarajan.Mohandoss@arm.com>
2020-03-16rdma: add Mellanox mlx5 Direct Verbs receive supportDamjan Marion1-0/+6
Type: feature Change-Id: I3f287ab536a482c366ad7df47e1c04e640992ebc Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-02-18vppinfra: fix debug image builds on armDamjan Marion1-11/+2
vextq_u8(...) reuqires constant value so instead of inline function we need to use macro. Type: fix Signed-off-by: Damjan Marion <dmarion@me.com> Change-Id: I9c1d878c9ec750f0ed5b5eac4dffde50e97e7357
2020-02-13vppinfra: add 128-bit and 512-bit a ^ b ^ c shortcutDamjan Marion1-0/+11
This allows us to combine 2 XOR operations into signle instruction which makes difference in crypto op: - in x86, by using ternary logic instruction - on ARM, by using EOR3 instruction (available with sha3 feature) Type: refactor Change-Id: Ibdf9001840399d2f838d491ca81b57cbd8430433 Signed-off-by: Damjan Marion <damjan.marion@gmail.com>
2020-02-13crypto-native: add AArch64 AES-GCM native implementationDamjan Marion1-0/+9
Type: feature Change-Id: I4f96b0af13b875d491704b010328a1814e1dbda1 Signed-off-by: Damjan Marion <dmarion@me.com>
2020-02-12vppinfra: add ARM NEON implementation of u8x16_word_shift_{left,right}Damjan Marion1-0/+12
Type: improvement Change-Id: I310e421513e9d3f96ad7debc72c9407e231962b8 Signed-off-by: Damjan Marion <dmarion@me.com>
2020-01-31crypto-native: add ARMv8 AES-CBC implementationDamjan Marion1-0/+6
Type: feature Change-Id: I32256061b9509880eec843db2f918879cdafbe47 Signed-off-by: Damjan Marion <dmarion@me.com>
2019-06-26vppinfra: add u64x2_scatter/u32x4_scatterLijian.Zhang1-0/+16
Add u64x2_scatter/u32x4_scatter in vector_neon.h. u64x2_scatter/u32x4_scatter scatters data from SIMD register to scattered memory locations. Type: feature Change-Id: I298d5478c7ba6935ab7402a6d467c7ee00f17e9f Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
2019-06-21vppinfra: add u64x2_gather/u32x4_gatherLijian.Zhang1-0/+18
Add u64x2_gather/u32x4_gather in vector_neon.h. u64x2_gather/u32x4_gather gathers data from scattered memory locations to a SIMD register. Type: feature Change-Id: I1dd27e38af28b9bed85143014c86197ee549fede Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
2019-05-01Enable NEON instructions in memcpy_leLijian.Zhang1-0/+8
Neon version of memcpy_le gives better performance compared with memmove on aarch64 Change-Id: I44b487bb0795a6e70dd1e55bdde4a077773ec859 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <sirdas@arm.com>
2019-03-20Re-enable aarch64 neon instruction in vlib_buffer_free_inlineLijian.Zhang1-2/+9
int vaddvq_u8 (uint8x16_t __a) is not appropriate to implement xxx_is_all_zero, as there may be overflow causing incorrect return value. Here's an example. u8x16 x = {0 <repeats 12 times>, 1, 255, 0, 0}; Change-Id: Ia6a10bdf8da360dec12db902d028751a1a77e9a4 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
2018-11-07Optimize xxx_zero_byte_mask NEON functionLijian Zhang1-44/+7
Optimize zero byte mask NEON functions below with less intrinsics, and get their outputs consistent with functions in vector_sse42.h always_inline u32 u64x2_zero_byte_mask (u64x2 input) always_inline u32 u32x4_zero_byte_mask (u32x4 input) always_inline u32 u16x8_zero_byte_mask (u16x8 input) always_inline u32 u8x16_zero_byte_mask (u8x16 input) always_inline u32 i64x2_zero_byte_mask (i64x2 input) always_inline u32 i32x4_zero_byte_mask (i32x4 input) always_inline u32 i16x8_zero_byte_mask (i16x8 input) always_inline u32 i8x16_zero_byte_mask (i8x16 input) Change-Id: I7f485915baeb37fa2dd484699b8769e0136f6574 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
2018-10-31Add and enable msb mask vector intrinsic for aarch64.Lijian Zhang1-10/+28
This patch enables the use of this function for enqueuing frames to the next graph node. Change-Id: I4003110db59870f7106e0d13942d6ff7bc54b46d Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> Reviewed-by: Steve Capper <Steve.Capper@arm.com>
2018-10-09vppinfra: Fix extendto_high aarch64 NEON api.v19.01-rc0Sirshak Das1-1/+1
This fixes the l2BD and ip4 test case failures. Fixes VPP-1432, VPP-1428, VPP-1430 Change-Id: I48b5c961bab60cc3b39fcd6db47e098c81579480 Signed-off-by: Sirshak Das <sirshak.das@arm.com>
2018-09-12Add and enable u32x4_extend_to_u64x2_high for aarch64 NEON intrinsics.Sirshak Das1-0/+6
This is the high version of extendto. This function accomplishes the same task as both shuffling and extending done by SSE intrinsics. This enables the NEON version for buffer indexes to buffer pointer translation. Change-Id: I52d7bbf3d76ba69c9acb0e518ff4bc6abf3bbbd4 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
2018-09-11Replacing vtbl NEON intrinsic with rev NEON intrinsic for byte_swap.Sirshak Das1-5/+1
Using rev16 vector intrinsic to reverse byteorder in each word independently. Change-Id: I071c40780baffe0bda614ec5d9dd92858f574b0d Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
2018-09-11Add u32x4_extend_to_u64x2 for aarch64 using NEON intrinsicsSirshak Das1-0/+6
This is used in vlib_get_buffers_with_offset. Change-Id: If4ff776bc97d21a22e870300b164eeb6a5ec3638 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
2018-09-11Add horizontal add (hadd) vector intrinsic via NEON.Sirshak Das1-0/+6
Having the NEON equivalent of u32x4_hadd for CLIB_HAVE_VEC128 Change-Id: I210f96f7ecb9b80b4753311a68e5e09ccda7e95b Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
2018-08-01Add support for shuffle vector intrinsic via Neon in ARMSirshak Das1-0/+16
This adds byte_swap (variant of shuffle) and shuffle vector intrinsic for ARM based on Neon, concuring with same signature as SSE vector intrinsic. Change-Id: I386fd2b1dcc83654e4ad9f90a6065d7736e4ce5c Signed-off-by: Sirshak Das <sirshak.das@arm.com>
2018-06-26Fix load_unaligned undefined and other possible build failuresSirshak Das1-26/+40
Add aarch64 neon intrinsics to fix build failures similar to this: error: implicit declaration of function ‘u64x2_load_unaligned’ Change-Id: I6178504a48242742df3f7d75abdaf108796cf73f Signed-off-by: Sirshak Das <sirshak.das@arm.com>
2018-02-26Added u8x16,u32x4,u64x2 variants of _zero_byte_mask(x) for ARM/NEON ↵Adrian Oanca1-0/+20
platform. VPP-1129 Change-Id: I954acb56d901e42976e71534317f38d7c4359bcf Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-24u8x16_compare_byte_mask - optimize to use 128bit registers as suggested by ↵Adrian Oanca1-24/+9
Nintin Change-Id: I88aabd34ef385d620695ac17ec3fe2f4a5177ada Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-21add 'is_all_zero(x)' for NEONAdrian Oanca1-0/+24
Change-Id: I5045e0f3ac4698e820b69ad46b96763e404e6fe4 Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-20vppinfra: autogerate vector typedefs and basic inline functionsDamjan Marion1-43/+0
Change-Id: Ie9f611fa6a962b0937245f5cc949571ba11c5604 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-02-08add CLIB_HAVE_VEC128 with NEON intrinsics (VPP-1127)Gabriel Ganne1-0/+60
Enable CLIB_HAVE_VEC128 if both aarch64 and __ARM_NEON ie. armv8 only, not armv7 Add more neon compare intrinsics wrappers. I only add simple intrinsics wrappers. More complex ones can be added later as they are needed, with performance tests on the corresponding feature to back them up. Remove wrongly added 128bits definitions defined on both armv7 and armv8 without concern for NEON instructions presence. Notable correspondinf code activations: * MHEAP_FLAG_SMALL_OBJECT_CACHE in mheap.c * ip4 fib mtrie leaves access * enable ixge plugin compilation for aarch64 (conf still disables it by default) Change-Id: I99953823627bdff6f222d232c78aa7b655aaf77a Signed-off-by: Gabriel Ganne <gabriel.ganne@enea.com>
2016-12-28Reorganize source tree to use single autotools instanceDamjan Marion1-0/+71
Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23 Signed-off-by: Damjan Marion <damarion@cisco.com>