crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.
To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.
Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>