Erasure Code: Extend ec_encode_data_avx2_gfni and ec_encode_data_update_avx2_gfni to support parity blocks k+1 through k+6 #423
Hi @OH195-C. I was looking for the question about why there is no p=4,5,6 implementation for AVX2_gfni, since you are implementing it, but I cannot find it...
Hi @pablodelara. I closed issue #419 because this commit fixes it. (I will be using the OH195-C account for all future communication.)
During implementation, some YMM and general-purpose registers were reused to relieve register pressure. As demonstrated above, AVX2+GFNI (k+1 through k+6) consistently outperforms AVX2. Crucially, it introduces no data read amplification, which keeps it consistent with the other SIMD implementations. The performance results for ec_encode_data_update_single_src_simple_warm are shown below:
Hi @pablodelara, I noticed that some CI checks are failing. I reviewed the error logs, and the failures are strange because I haven't modified any of the affected files. Could you advise on how to resolve this?
Can you rebase on top of the latest master?
1) add 4~6 vector AVX2 dot product with GFNI implementation
2) add AVX2 6vect mad with GFNI implementation
3) ensure the encoding process does not modify the input mul_array pointer
Complete the implementations of ec_encode_data_avx2_gfni and
ec_encode_data_update_avx2_gfni to support parity blocks k+1 through
k+6, consistent with the implementations for other instruction sets
(AVX512, AVX2, etc.). This avoids reading source data twice when
computing parities k+4 to k+6, preventing memory bandwidth
amplification.
Signed-off-by: cl304641 <cl304641@alibaba-inc.com>
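The read-amplification point in the commit message can be illustrated with a scalar C model. This is only a sketch under stated assumptions, not the AVX2+GFNI assembly from this PR: `gf_mul_sketch`, `encode_rows`, `compare_passes`, and the sizes `K`, `P`, `LEN` are hypothetical names chosen for illustration, the coefficients are arbitrary rather than a real erasure-code matrix, and the GF(2^8) reduction polynomial is assumed to be 0x11d. It computes six parity rows either in one sweep over the sources or as two 3-row sweeps, and counts the source bytes loaded in each case. The coefficient table is taken as `const` and indexed, never advanced, mirroring item 3 above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define K 4    /* source blocks (illustrative size) */
#define P 6    /* parity blocks, i.e. k+1 .. k+6 */
#define LEN 64 /* bytes per block (illustrative size) */

static size_t src_bytes_read; /* counts every source byte loaded */

/* Bitwise GF(2^8) multiply, reduction polynomial 0x11d (assumed). */
static uint8_t gf_mul_sketch(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* Compute parity rows [row0, row0 + rows) in one sweep over the sources.
 * coef is a row-major P x K table; it is const and only indexed. */
static void encode_rows(const uint8_t *coef, int row0, int rows,
                        uint8_t src[K][LEN], uint8_t par[P][LEN])
{
    for (size_t i = 0; i < LEN; i++) {
        uint8_t acc[P] = {0};
        for (int j = 0; j < K; j++) {
            uint8_t s = src[j][i]; /* one load, reused for every row */
            src_bytes_read++;
            for (int r = 0; r < rows; r++)
                acc[r] ^= gf_mul_sketch(coef[(row0 + r) * K + j], s);
        }
        for (int r = 0; r < rows; r++)
            par[row0 + r][i] = acc[r];
    }
}

/* Returns 1 if both strategies give identical parity; reports the number
 * of source bytes each strategy loaded. */
static int compare_passes(size_t *reads_single, size_t *reads_split)
{
    static uint8_t src[K][LEN], a[P][LEN], b[P][LEN];
    uint8_t coef[P * K];

    for (int j = 0; j < K; j++)
        for (size_t i = 0; i < LEN; i++)
            src[j][i] = (uint8_t)(31 * i + 7 * j + 1);
    for (int n = 0; n < P * K; n++)
        coef[n] = (uint8_t)(n + 2); /* arbitrary, not a real EC matrix */

    src_bytes_read = 0;
    encode_rows(coef, 0, P, src, a); /* all six rows in one sweep */
    *reads_single = src_bytes_read;

    src_bytes_read = 0;
    encode_rows(coef, 0, 3, src, b);     /* rows k+1 .. k+3 */
    encode_rows(coef, 3, P - 3, src, b); /* rows k+4 .. k+6 */
    *reads_split = src_bytes_read;

    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `compare_passes(&one, &two)` from a test harness should report identical parity for both strategies, with the split version loading each source byte twice. In this model that doubling is the memory bandwidth amplification that a single 6-row kernel avoids.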
Done. I've rebased onto the latest master.
Hi @pablodelara, could you please re-run the run_tests_linux-riscv64-v job? This looks like a self-hosted runner issue: the build process was killed by an external signal with no compilation errors, and all other 8 CI jobs passed.

