pub unsafe fn pack_avx2(bits: &[u8]) -> Vec<u64>
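Packs a slice of u8 values, each carrying one bit, into u64 words using the AVX2 movemask instruction.
Each 64-byte block of input yields one u64 word, assembled from two 32-bit masks (one per 32-byte AVX2 register).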
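The two-mask assembly can be modelled in scalar Rust, with no intrinsics involved. The sketch below is a hypothetical reference model, not the function's actual body. The names `movemask32` and `pack_scalar_model` are illustrative. It assumes that the bit taken from each byte is its most significant bit (which is what the movemask instruction reads), that the first 32 bytes of a block fill the low half of the word, and that a trailing partial block is ignored. The documentation above states none of these.

```rust
/// Scalar stand-in for a 32-byte movemask: bit `i` of the result is the
/// most significant bit of byte `i` in the chunk.
fn movemask32(chunk: &[u8]) -> u32 {
    debug_assert_eq!(chunk.len(), 32);
    chunk
        .iter()
        .enumerate()
        .fold(0u32, |mask, (i, &b)| mask | (u32::from(b >> 7) << i))
}

/// Hypothetical reference model: one u64 per full 64-byte block.
/// Assumption: the first 32 bytes form the low half of the word,
/// and any trailing partial block is dropped.
fn pack_scalar_model(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(64)
        .map(|block| {
            let lo = u64::from(movemask32(&block[..32]));
            let hi = u64::from(movemask32(&block[32..]));
            // Combine the two 32-bit masks into a single 64-bit word.
            lo | (hi << 32)
        })
        .collect()
}
```

If the input bytes hold 0 or 1 rather than a set high bit, a real kernel would need a compare or shift before the movemask step. This model does not do that.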
Safety: the caller must ensure that the CPU executing this function supports AVX2; calling it on a CPU without AVX2 is undefined behavior.
Target feature: avx2
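One common way to satisfy the safety requirement is to check for AVX2 at run time and fall back to scalar code otherwise. The following is a minimal sketch of that pattern, not the crate's implementation. `pack_avx2_sketch`, `pack_scalar`, and `pack_bits` are hypothetical names, and the sketch makes the same assumptions as the movemask instruction itself: the high bit of each byte is the packed bit, the first 32 bytes fill the low half of the word, and a trailing partial block is ignored.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_movemask_epi8};

/// Hypothetical AVX2 kernel with the same signature as `pack_avx2`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn pack_avx2_sketch(bits: &[u8]) -> Vec<u64> {
    let mut words = Vec::with_capacity(bits.len() / 64);
    for block in bits.chunks_exact(64) {
        let p = block.as_ptr();
        let word = unsafe {
            // Unaligned 32-byte loads; both stay inside the 64-byte block.
            let lo = _mm256_loadu_si256(p as *const __m256i);
            let hi = _mm256_loadu_si256(p.add(32) as *const __m256i);
            // movemask returns i32; cast through u32 to avoid sign extension.
            let lo_mask = _mm256_movemask_epi8(lo) as u32 as u64;
            let hi_mask = _mm256_movemask_epi8(hi) as u32 as u64;
            lo_mask | (hi_mask << 32)
        };
        words.push(word);
    }
    words
}

/// Scalar fallback producing the same words as the AVX2 sketch.
fn pack_scalar(bits: &[u8]) -> Vec<u64> {
    bits.chunks_exact(64)
        .map(|block| {
            block
                .iter()
                .enumerate()
                .fold(0u64, |w, (i, &b)| w | (u64::from(b >> 7) << i))
        })
        .collect()
}

/// Safe entry point: uses the AVX2 path only when the CPU reports support.
fn pack_bits(bits: &[u8]) -> Vec<u64> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at run time just above.
            return unsafe { pack_avx2_sketch(bits) };
        }
    }
    pack_scalar(bits)
}
```

Keeping the unsafe call behind a safe wrapper means callers never have to reason about CPU features themselves; the check happens once per call and the scalar path keeps the code usable on other architectures.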