crypto: arm64/sm4 - add CE implementation for CTS-CBC mode
This patch is a CE-optimized assembly implementation for CTS-CBC mode.
Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is cts(cbc-sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s: