vpp - Vector Packet Processing

Age	Commit message	Author	Files	Lines
2023-03-23	vppinfra: AES-CBC and AES-GCM refactor and optimizations	Damjan Marion	1	-0/+55
	- crypto code moved to vppinfra for better testing and reuse - added 256-bit VAES support (Intel Client CPUs) - added AES_GMAC functions Change-Id: I960c8e14ca0a0126703e8f1589d86f32e2a98361 Type: improvement Signed-off-by: Damjan Marion <damarion@cisco.com>
2022-02-10	vppinfra:remove uXxY_is_greater	Damjan Marion	1	-6/+0
	not needed, '>' works... Type: improvement Change-Id: I9bfbac787e8dca24e9b1bceda0922740eed26346 Signed-off-by: Damjan Marion <damarion@cisco.com>
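The message relies on the GCC/Clang generic vector extensions, where the built-in `>` operator already compares two vectors lane by lane. The following is a minimal sketch that assumes vppinfra-style types declared here with `vector_size`, not taken from the VPP headers. `u32x4_gt_mask` is a hypothetical helper name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for vppinfra-style vector types, built with the
   GCC/Clang vector_size extension. */
typedef uint32_t u32x4 __attribute__ ((vector_size (16)));
typedef int32_t i32x4 __attribute__ ((vector_size (16)));

/* Lane-wise a > b. Comparing two unsigned vectors yields a vector of
   signed integers of the same lane width: -1 (all bits set) where the
   comparison is true and 0 where it is false. */
static inline i32x4
u32x4_gt_mask (u32x4 a, u32x4 b)
{
  return a > b;
}
```

The comparison uses the unsigned element type, so a lane holding `0xffffffff` compares greater than a lane holding `0`.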
2021-12-02	vppinfra: vector shuffle cleanup	Damjan Marion	1	-6/+0
	Type: refactor Change-Id: I8b3fc2ce30df313467274a174c5ac6adbf296153 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-11-10	vppinfra: new vectorized ip checksum functions incl. csum_and_copy	Damjan Marion	1	-0/+12
	Type: improvement Change-Id: Id5810b7f4a6d6e4ce16b73c235b50db5d475ebf7 Signed-off-by: Damjan Marion <damarion@cisco.com>
2021-07-13	vppinfra: fix saturate add/sub NEON wrappers	Lijian.Zhang	1	-40/+60
	Fix the saturate add/sub wrappers in vector_neon.h by using the correct intrinsics. Type: fix Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Tianyu Li <Tianyu.Li@arm.com> Change-Id: I38a85633948472d4bdb1c199a806633d3070013f
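Portable scalar C can model the per-lane behaviour that these wrappers must provide. This sketch covers the saturating semantics only, not the NEON wrappers themselves. On AArch64, the corresponding intrinsic families are `vqaddq_*` and `vqsubq_*`, for example `vqaddq_u8`, `vqsubq_u8` and `vqaddq_s16`. The `*_model` names are hypothetical.

```c
#include <stdint.h>

/* Unsigned saturating add: clamp to 255 instead of wrapping. */
static inline uint8_t
u8_add_saturate_model (uint8_t a, uint8_t b)
{
  unsigned sum = (unsigned) a + b;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Unsigned saturating subtract: clamp to 0 instead of wrapping. */
static inline uint8_t
u8_sub_saturate_model (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}

/* Signed saturating add: clamp to [INT16_MIN, INT16_MAX]. */
static inline int16_t
i16_add_saturate_model (int16_t a, int16_t b)
{
  int32_t sum = (int32_t) a + b;
  if (sum > INT16_MAX)
    return INT16_MAX;
  if (sum < INT16_MIN)
    return INT16_MIN;
  return (int16_t) sum;
}
```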
2020-08-31	vppinfra: convert A_extend_to_B to B_from_A format of vector inlines	Damjan Marion	1	-2/+2
	Make it shorter and use the same format when converting to bigger or smaller types. Type: refactor Change-Id: I443d67e18ae65d779b4d9a0dce5406f7d9f0e4ac Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-05-08	vppinfra: fix u32x4_byte_swap on Arm	Lijian.Zhang	1	-1/+1
	Fix the endianness conversion function u32x4_byte_swap() on Arm. Here's an example of using this function with and without the fix. This issue was seen with the Mellanox NIC RDMA driver on Arm servers, where the packet length could not be parsed correctly. Testing code: u32x4 s = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab}; u32x4 ss = u32x4_byte_swap (s); Without the code change: (gdb) p /x s $1 = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab} (gdb) p /x ss $2 = {0x23456789, 0x12345678, 0x456789ab, 0x3456789a} With the code change: (gdb) p /x s $3 = {0x12345678, 0x23456789, 0x3456789a, 0x456789ab} (gdb) p /x ss $4 = {0x78563412, 0x89674523, 0x9a785634, 0xab896745} Type: fix Change-Id: Ie5f263e94331783940e7c00397092a64e4fc4279 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Jieqiang Wang <Jieqiang.Wang@arm.com> Reviewed-by: Govindarajan Mohandoss <Govindarajan.Mohandoss@arm.com>
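The two gdb outputs correspond to two different operations, modelled below in scalar C. The intended result reverses the bytes inside each 32-bit lane. On little-endian AArch64, this is what `vrev32q_u8` does to the underlying byte vector. The pre-fix output instead swaps whole 32-bit lanes pairwise and leaves the bytes untouched. This sketch is only for checking the expected values from the message, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } u32x4_model;

/* Intended behaviour: reverse the byte order inside each 32-bit lane. */
static u32x4_model
u32x4_byte_swap_model (u32x4_model v)
{
  for (int i = 0; i < 4; i++)
    {
      uint32_t x = v.lane[i];
      v.lane[i] = (x >> 24) | ((x >> 8) & 0x0000ff00u)
                  | ((x << 8) & 0x00ff0000u) | (x << 24);
    }
  return v;
}

/* Behaviour matching the pre-fix output: swap 32-bit lanes pairwise
   within each 64-bit half, without touching the bytes. */
static u32x4_model
u32x4_lane_pair_swap_model (u32x4_model v)
{
  u32x4_model r = { { v.lane[1], v.lane[0], v.lane[3], v.lane[2] } };
  return r;
}
```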
2020-03-16	rdma: add Mellanox mlx5 Direct Verbs receive support	Damjan Marion	1	-0/+6
	Type: feature Change-Id: I3f287ab536a482c366ad7df47e1c04e640992ebc Signed-off-by: Damjan Marion <damarion@cisco.com>
2020-02-18	vppinfra: fix debug image builds on arm	Damjan Marion	1	-11/+2
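	vextq_u8(...) requires a constant (immediate) argument, which an inline function parameter cannot guarantee in unoptimized debug builds, so a macro must be used instead of an inline function. Type: fix Signed-off-by: Damjan Marion <dmarion@me.com> Change-Id: I9c1d878c9ec750f0ed5b5eac4dffde50e97e7357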
2020-02-13	vppinfra: add 128-bit and 512-bit a ^ b ^ c shortcut	Damjan Marion	1	-0/+11
	This allows us to combine 2 XOR operations into a single instruction, which makes a difference in crypto operations: - on x86, by using the ternary logic instruction - on ARM, by using the EOR3 instruction (available with the sha3 feature) Type: refactor Change-Id: Ibdf9001840399d2f838d491ca81b57cbd8430433 Signed-off-by: Damjan Marion <damjan.marion@gmail.com>
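A three-input ternary logic instruction computes any bitwise function of three inputs from an 8-bit truth table. On x86, AVX-512 `VPTERNLOGD`/`VPTERNLOGQ` are instructions of this kind, although the message does not name the exact instruction used. For every bit position, the three input bits form an index from 0 to 7 into the immediate. For `a ^ b ^ c`, indices 1, 2, 4 and 7 are set, so the immediate is `0x96`. On ARM, `veor3q_u8` (EOR3, SHA3 extension) gives the same result. The following scalar model is a sketch with hypothetical names.

```c
#include <stdint.h>

/* Truth table of a ^ b ^ c: bits 1, 2, 4 and 7 set. */
#define XOR3_TRUTH_TABLE 0x96

/* Bitwise model of a ternary-logic instruction: for every bit position
   the input bits (a, b, c) form the index (a << 2) | (b << 1) | c into
   the 8-bit truth table imm. */
static uint64_t
ternary_logic_model (uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
  uint64_t r = 0;
  for (int bit = 0; bit < 64; bit++)
    {
      unsigned idx = (unsigned) ((((a >> bit) & 1) << 2)
                                 | (((b >> bit) & 1) << 1)
                                 | ((c >> bit) & 1));
      r |= (uint64_t) ((imm >> idx) & 1) << bit;
    }
  return r;
}
```

Other immediates select other functions. For example, `0x80` (only index 7) is `a & b & c`.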
2020-02-13	crypto-native: add AArch64 AES-GCM native implementation	Damjan Marion	1	-0/+9
	Type: feature Change-Id: I4f96b0af13b875d491704b010328a1814e1dbda1 Signed-off-by: Damjan Marion <dmarion@me.com>
2020-02-12	vppinfra: add ARM NEON implementation of u8x16_word_shift_{left,right}	Damjan Marion	1	-0/+12
	Type: improvement Change-Id: I310e421513e9d3f96ad7debc72c9407e231962b8 Signed-off-by: Damjan Marion <dmarion@me.com>
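The model below assumes that these helpers follow the SSE `_mm_slli_si128`/`_mm_srli_si128` convention. Under that convention, the whole 128-bit register shifts by a number of bytes and zeros fill the vacated bytes. Byte 0 is the least significant byte. On NEON, such a byte shift can be expressed with `vextq_u8` against a zero vector, and its lane argument must be a compile-time constant. The function names are hypothetical.

```c
#include <stdint.h>

/* Left shift by n bytes: byte i takes input byte i - n; the low n bytes
   become zero and bytes pushed past byte 15 are discarded. */
static void
u8x16_word_shift_left_model (const uint8_t in[16], unsigned n, uint8_t out[16])
{
  for (unsigned i = 0; i < 16; i++)
    out[i] = (i >= n) ? in[i - n] : 0;
}

/* Right shift by n bytes: byte i takes input byte i + n; the high n
   bytes become zero. */
static void
u8x16_word_shift_right_model (const uint8_t in[16], unsigned n, uint8_t out[16])
{
  for (unsigned i = 0; i < 16; i++)
    out[i] = (i + n < 16) ? in[i + n] : 0;
}
```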
2020-01-31	crypto-native: add ARMv8 AES-CBC implementation	Damjan Marion	1	-0/+6
	Type: feature Change-Id: I32256061b9509880eec843db2f918879cdafbe47 Signed-off-by: Damjan Marion <dmarion@me.com>
2019-06-26	vppinfra: add u64x2_scatter/u32x4_scatter	Lijian.Zhang	1	-0/+16
	Add u64x2_scatter/u32x4_scatter in vector_neon.h. u64x2_scatter/u32x4_scatter scatters data from SIMD register to scattered memory locations. Type: feature Change-Id: I298d5478c7ba6935ab7402a6d467c7ee00f17e9f Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
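A scatter writes each lane of a vector register to its own independent address. The following scalar model assumes one destination pointer per lane. The type and function names are hypothetical and are not the vppinfra signatures.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } u32x4_model;

/* Store lane i of v to the i-th destination pointer. The destinations
   need not be contiguous or in any particular order. */
static void
u32x4_scatter_model (u32x4_model v, uint32_t *p0, uint32_t *p1,
                     uint32_t *p2, uint32_t *p3)
{
  *p0 = v.lane[0];
  *p1 = v.lane[1];
  *p2 = v.lane[2];
  *p3 = v.lane[3];
}
```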
2019-06-21	vppinfra: add u64x2_gather/u32x4_gather	Lijian.Zhang	1	-0/+18
	Add u64x2_gather/u32x4_gather in vector_neon.h. u64x2_gather/u32x4_gather gathers data from scattered memory locations to a SIMD register. Type: feature Change-Id: I1dd27e38af28b9bed85143014c86197ee549fede Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
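A gather is the inverse of a scatter: it loads one value from each of several independent addresses into the lanes of a single register. The following scalar model assumes one source pointer per lane. The names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } u32x4_model;

/* Load lane i from the i-th source pointer. The same address may
   appear more than once. */
static u32x4_model
u32x4_gather_model (const uint32_t *p0, const uint32_t *p1,
                    const uint32_t *p2, const uint32_t *p3)
{
  u32x4_model r = { { *p0, *p1, *p2, *p3 } };
  return r;
}
```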
2019-05-01	Enable NEON instructions in memcpy_le	Lijian.Zhang	1	-0/+8
	Neon version of memcpy_le gives better performance compared with memmove on aarch64 Change-Id: I44b487bb0795a6e70dd1e55bdde4a077773ec859 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <sirdas@arm.com>
2019-03-20	Re-enable aarch64 neon instruction in vlib_buffer_free_inline	Lijian.Zhang	1	-2/+9
	vaddvq_u8 (uint8x16_t __a) returns a uint8_t, so it is not appropriate for implementing xxx_is_all_zero: the 8-bit sum can overflow and wrap to zero, giving an incorrect return value. For example, for u8x16 x = {0 <repeats 12 times>, 1, 255, 0, 0}, the sum 1 + 255 = 256 wraps to 0, so x would wrongly be reported as all zero. Change-Id: Ia6a10bdf8da360dec12db902d028751a1a77e9a4 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
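The overflow can be reproduced in scalar C by accumulating the 16 bytes into an 8-bit sum, which is the result width of an across-vector byte add. An OR-reduction is shown as one overflow-free alternative, because any set bit survives it. The OR-reduction is not necessarily the approach this patch used. The function names are hypothetical.

```c
#include <stdint.h>

/* Flawed: 8-bit sum of all bytes wraps modulo 256, so a non-zero
   vector can produce 0. */
static int
u8x16_is_all_zero_by_sum (const uint8_t v[16])
{
  uint8_t sum = 0;
  for (int i = 0; i < 16; i++)
    sum = (uint8_t) (sum + v[i]);
  return sum == 0;
}

/* Overflow-free: OR all bytes together; the result is 0 only if
   every byte is 0. */
static int
u8x16_is_all_zero_by_or (const uint8_t v[16])
{
  uint8_t acc = 0;
  for (int i = 0; i < 16; i++)
    acc |= v[i];
  return acc == 0;
}
```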
2018-11-07	Optimize xxx_zero_byte_mask NEON function	Lijian Zhang	1	-44/+7
	Optimize the zero byte mask NEON functions below using fewer intrinsics, and make their outputs consistent with the functions in vector_sse42.h: always_inline u32 u64x2_zero_byte_mask (u64x2 input) always_inline u32 u32x4_zero_byte_mask (u32x4 input) always_inline u32 u16x8_zero_byte_mask (u16x8 input) always_inline u32 u8x16_zero_byte_mask (u8x16 input) always_inline u32 i64x2_zero_byte_mask (i64x2 input) always_inline u32 i32x4_zero_byte_mask (i32x4 input) always_inline u32 i16x8_zero_byte_mask (i16x8 input) always_inline u32 i8x16_zero_byte_mask (i8x16 input) Change-Id: I7f485915baeb37fa2dd484699b8769e0136f6574 Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com>
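The following scalar model covers the byte-level (`u8x16`) case. It assumes the SSE4.2-style convention in which bit i of the result is set exactly when byte i of the 128-bit input is zero. The function name is hypothetical.

```c
#include <stdint.h>

/* Bit i of the result is 1 when byte i of v is zero; bits 16-31 of the
   u32 result stay clear. */
static uint32_t
u8x16_zero_byte_mask_model (const uint8_t v[16])
{
  uint32_t mask = 0;
  for (unsigned i = 0; i < 16; i++)
    if (v[i] == 0)
      mask |= 1u << i;
  return mask;
}
```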
2018-10-31	Add and enable msb mask vector intrinsic for aarch64.	Lijian Zhang	1	-10/+28
	This patch enables the use of this function for enqueuing frames to the next graph node. Change-Id: I4003110db59870f7106e0d13942d6ff7bc54b46d Signed-off-by: Lijian Zhang <Lijian.Zhang@arm.com> Reviewed-by: Sirshak Das <Sirshak.Das@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> Reviewed-by: Steve Capper <Steve.Capper@arm.com>
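An MSB mask packs the most significant bit of each byte into an integer, matching what SSE `_mm_movemask_epi8` returns. When it is applied to the 0x00/0xff bytes produced by a lane-wise comparison, it yields one bit per matching lane. For example, it can mark which entries of a batch share the same next-node index. The following scalar model uses hypothetical names, and it does not show how the enqueue code consumes the mask.

```c
#include <stdint.h>

/* Bit i of the result is the most significant bit of byte i. */
static uint16_t
u8x16_msb_mask_model (const uint8_t v[16])
{
  uint16_t mask = 0;
  for (unsigned i = 0; i < 16; i++)
    mask |= (uint16_t) ((v[i] >> 7) << i);
  return mask;
}

/* Lane-wise equality yielding 0xff for equal bytes and 0x00 otherwise,
   as a vector compare instruction does. */
static void
u8x16_eq_model (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (unsigned i = 0; i < 16; i++)
    out[i] = (a[i] == b[i]) ? 0xff : 0x00;
}
```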
2018-10-09	vppinfra: Fix extendto_high aarch64 NEON api. (tag: v19.01-rc0)	Sirshak Das	1	-1/+1
	This fixes the l2BD and ip4 test case failures. Fixes VPP-1432, VPP-1428, VPP-1430 Change-Id: I48b5c961bab60cc3b39fcd6db47e098c81579480 Signed-off-by: Sirshak Das <sirshak.das@arm.com>
2018-09-12	Add and enable u32x4_extend_to_u64x2_high for aarch64 NEON intrinsics.	Sirshak Das	1	-0/+6
	This is the high version of extendto. This function accomplishes the same task as both shuffling and extending done by SSE intrinsics. This enables the NEON version for buffer indexes to buffer pointer translation. Change-Id: I52d7bbf3d76ba69c9acb0e518ff4bc6abf3bbbd4 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
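Extending means zero-widening two 32-bit lanes into two 64-bit lanes. The low variant takes lanes 0 and 1, and the high variant takes lanes 2 and 3. On AArch64, `vmovl_u32(vget_low_u32(v))` and `vmovl_high_u32(v)` are one way to express this. The scalar model below pairs both variants with an index-to-address translation of the form `base + (index << shift)`. That formula, the shift value and all names are hypothetical illustrations. Widening before shifting prevents the shifted index from overflowing 32 bits.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } u32x4_model;
typedef struct { uint64_t lane[2]; } u64x2_model;

/* Zero-extend lanes 0 and 1 to 64 bits (the "low" half). */
static u64x2_model
u64x2_from_u32x4_lo_model (u32x4_model v)
{
  u64x2_model r = { { v.lane[0], v.lane[1] } };
  return r;
}

/* Zero-extend lanes 2 and 3 to 64 bits (the "high" half). */
static u64x2_model
u64x2_from_u32x4_hi_model (u32x4_model v)
{
  u64x2_model r = { { v.lane[2], v.lane[3] } };
  return r;
}

/* Hypothetical translation: addr = base + (index << shift), computed in
   64 bits after widening so large indices do not overflow. */
static void
indices_to_addresses_model (u32x4_model idx, uint64_t base, unsigned shift,
                            uint64_t out[4])
{
  u64x2_model lo = u64x2_from_u32x4_lo_model (idx);
  u64x2_model hi = u64x2_from_u32x4_hi_model (idx);
  out[0] = base + (lo.lane[0] << shift);
  out[1] = base + (lo.lane[1] << shift);
  out[2] = base + (hi.lane[0] << shift);
  out[3] = base + (hi.lane[1] << shift);
}
```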
2018-09-11	Replacing vtbl NEON intrinsic with rev NEON intrinsic for byte_swap.	Sirshak Das	1	-5/+1
	Use the rev16 vector intrinsic to reverse the byte order within each 16-bit halfword independently. Change-Id: I071c40780baffe0bda614ec5d9dd92858f574b0d Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
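`rev16` swaps the two bytes of every 16-bit element, which converts eight 16-bit values between big-endian and little-endian order in one operation. The following scalar model of that per-element swap is a sketch with a hypothetical name.

```c
#include <stdint.h>

/* Swap the high and low byte of each of the eight 16-bit elements. */
static void
u16x8_byte_swap_model (const uint16_t in[8], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) ((in[i] >> 8) | (in[i] << 8));
}
```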
2018-09-11	Add u32x4_extend_to_u64x2 for aarch64 using NEON intrinsics	Sirshak Das	1	-0/+6
	This is used in vlib_get_buffers_with_offset. Change-Id: If4ff776bc97d21a22e870300b164eeb6a5ec3638 Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
2018-09-11	Add horizontal add (hadd) vector intrinsic via NEON.	Sirshak Das	1	-0/+6
	Adds the NEON equivalent of u32x4_hadd for CLIB_HAVE_VEC128. Change-Id: I210f96f7ecb9b80b4753311a68e5e09ccda7e95b Signed-off-by: Sirshak Das <sirshak.das@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Brian Brooks <brian.brooks@arm.com> Reviewed-by: Yi He <yi.he@arm.com> Verified-by: Lijian Zhang <lijian.zhang@arm.com>
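This model assumes that `u32x4_hadd` follows the SSE `_mm_hadd_epi32` convention, which takes two vectors and adds adjacent lane pairs. On AArch64, `vpaddq_u32` produces the same lane layout. Lanes add modulo 2^32. The names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } u32x4_model;

/* Pairwise horizontal add: {a0+a1, a2+a3, b0+b1, b2+b3}. */
static u32x4_model
u32x4_hadd_model (u32x4_model a, u32x4_model b)
{
  u32x4_model r = { { a.lane[0] + a.lane[1], a.lane[2] + a.lane[3],
                      b.lane[0] + b.lane[1], b.lane[2] + b.lane[3] } };
  return r;
}
```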
2018-08-01	Add support for shuffle vector intrinsic via Neon in ARM	Sirshak Das	1	-0/+16
	This adds byte_swap (a variant of shuffle) and shuffle vector intrinsics for ARM based on NEON, matching the signatures of the corresponding SSE vector intrinsics. Change-Id: I386fd2b1dcc83654e4ad9f90a6065d7736e4ce5c Signed-off-by: Sirshak Das <sirshak.das@arm.com>
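A byte shuffle builds each output byte by indexing into the input with a control vector. The following model uses the AArch64 table-lookup rule of `vqtbl1q_u8`, where an index of 16 or more yields 0. SSE `_mm_shuffle_epi8` instead zeroes a byte only when bit 7 of the index is set and otherwise uses the low four bits. The two rules therefore agree for indices 0 to 15 and for 0x80 to 0xff. A 16-bit byte swap is the shuffle with indices {1, 0, 3, 2, ...}. The names are hypothetical.

```c
#include <stdint.h>

/* out[i] = v[idx[i]] for idx[i] < 16, else 0 (table-lookup semantics). */
static void
u8x16_shuffle_model (const uint8_t v[16], const uint8_t idx[16],
                     uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? v[idx[i]] : 0;
}
```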
2018-06-26	Fix load_unaligned undefined and other possible build failures	Sirshak Das	1	-26/+40
	Add aarch64 neon intrinsics to fix build failures similar to this: error: implicit declaration of function ‘u64x2_load_unaligned’ Change-Id: I6178504a48242742df3f7d75abdaf108796cf73f Signed-off-by: Sirshak Das <sirshak.das@arm.com>
2018-02-26	Added u8x16, u32x4, u64x2 variants of _zero_byte_mask(x) for ARM/NEON platform.	Adrian Oanca	1	-0/+20
	VPP-1129 Change-Id: I954acb56d901e42976e71534317f38d7c4359bcf Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-24	u8x16_compare_byte_mask - optimize to use 128-bit registers as suggested by Nintin	Adrian Oanca	1	-24/+9
	Change-Id: I88aabd34ef385d620695ac17ec3fe2f4a5177ada Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-21	add 'is_all_zero(x)' for NEON	Adrian Oanca	1	-0/+24
	Change-Id: I5045e0f3ac4698e820b69ad46b96763e404e6fe4 Signed-off-by: Adrian Oanca <adrian.oanca@enea.com>
2018-02-20	vppinfra: autogenerate vector typedefs and basic inline functions	Damjan Marion	1	-43/+0
	Change-Id: Ie9f611fa6a962b0937245f5cc949571ba11c5604 Signed-off-by: Damjan Marion <damarion@cisco.com>
2018-02-08	add CLIB_HAVE_VEC128 with NEON intrinsics (VPP-1127)	Gabriel Ganne	1	-0/+60
	Enable CLIB_HAVE_VEC128 only if both aarch64 and __ARM_NEON are defined, i.e. armv8 only, not armv7. Add more NEON compare intrinsic wrappers. Only simple intrinsic wrappers are added; more complex ones can be added later as they are needed, with performance tests on the corresponding feature to back them up. Remove wrongly added 128-bit definitions that were defined on both armv7 and armv8 regardless of whether NEON instructions are present. Notable corresponding code activations: * MHEAP_FLAG_SMALL_OBJECT_CACHE in mheap.c * ip4 fib mtrie leaves access * enable ixge plugin compilation for aarch64 (conf still disables it by default) Change-Id: I99953823627bdff6f222d232c78aa7b655aaf77a Signed-off-by: Gabriel Ganne <gabriel.ganne@enea.com>
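The gating rule described in this message can be sketched with predefined compiler macros. `__aarch64__` identifies an AArch64 target, and `__ARM_NEON` indicates Advanced SIMD support. An armv7 build with NEON defines only the second macro, so it never takes the 128-bit path. `HAVE_VEC128_SKETCH` and `have_vec128_sketch` are hypothetical names, and the real flag is `CLIB_HAVE_VEC128`.

```c
/* Hypothetical mirror of the rule: 128-bit vector support only when
   targeting AArch64 with NEON available. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#define HAVE_VEC128_SKETCH 1
#else
#define HAVE_VEC128_SKETCH 0
#endif

/* Returns 1 when the 128-bit NEON path would be enabled, else 0. */
static int
have_vec128_sketch (void)
{
  return HAVE_VEC128_SKETCH;
}
```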
2016-12-28	Reorganize source tree to use single autotools instance	Damjan Marion	1	-0/+71
	Change-Id: I7b51f88292e057c6443b12224486f2d0c9f8ae23 Signed-off-by: Damjan Marion <damarion@cisco.com>