LoongArch: vDSO: Tune chacha implementation
author Xi Ruoyao <xry111@xry111.site>
Thu, 19 Sep 2024 09:13:59 +0000 (17:13 +0800)
committer Jason A. Donenfeld <Jason@zx2c4.com>
Tue, 24 Sep 2024 12:21:05 +0000 (14:21 +0200)
commit 9805f39d423a30a7189158905ec3d71774fe98a1
tree dd812bbda8ffefcddab4d2ecaa955fd81f968534
parent 6ff2c290147a65027fb04b154a52723a6efabced

As Christophe pointed out, the performance of the chacha implementation
can be improved by scheduling the instructions the way GCC does.

The tuning does not introduce much complexity (it's basically just
reordering some instructions), and it does not hurt readability much:
the tuned code actually looks even more like a textbook-style
implementation based on 128-bit vectors.  So overall it's a good deal
to me.
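As a rough illustration of the scheduling idea (a C sketch, not the LoongArch assembly from this commit), the code below computes a ChaCha double round in two orders. It assumes that "textbook-style implementation based on 128-bit vectors" means performing each quarter-round step for all four columns (or diagonals) before moving to the next step, which gives the CPU four independent dependency chains to overlap. All function names (`quarter_round`, `vector_round`, `double_round_sequential`, `double_round_interleaved`, `chacha_rounds_agree`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits. */
static uint32_t rotl32(uint32_t x, int n)
{
	assert(n > 0 && n < 32);
	return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on words a, b, c, d (RFC 8439, section 2.1). */
static void quarter_round(uint32_t s[16], int a, int b, int c, int d)
{
	s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 16);
	s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 12);
	s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 8);
	s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 7);
}

/* Naive order: each quarter round finishes before the next one starts,
 * so consecutive operations are mostly dependent on each other. */
static void double_round_sequential(uint32_t s[16])
{
	quarter_round(s, 0, 4, 8, 12);
	quarter_round(s, 1, 5, 9, 13);
	quarter_round(s, 2, 6, 10, 14);
	quarter_round(s, 3, 7, 11, 15);
	quarter_round(s, 0, 5, 10, 15);
	quarter_round(s, 1, 6, 11, 12);
	quarter_round(s, 2, 7, 8, 13);
	quarter_round(s, 3, 4, 9, 14);
}

/* Hypothetical interleaved order: each loop applies one step to all four
 * lanes, as a 128-bit vector add (or xor + rotate) would.  diag = 0 picks
 * the column round; diag = 1 offsets rows b, c, d by 1, 2, 3 lanes to pick
 * the diagonal round. */
static void vector_round(uint32_t s[16], int diag)
{
	int b[4], c[4], d[4];

	for (int i = 0; i < 4; i++) {
		b[i] = 4 + ((i + 1 * diag) & 3);
		c[i] = 8 + ((i + 2 * diag) & 3);
		d[i] = 12 + ((i + 3 * diag) & 3);
	}
	for (int i = 0; i < 4; i++) s[i] += s[b[i]];
	for (int i = 0; i < 4; i++) s[d[i]] = rotl32(s[d[i]] ^ s[i], 16);
	for (int i = 0; i < 4; i++) s[c[i]] += s[d[i]];
	for (int i = 0; i < 4; i++) s[b[i]] = rotl32(s[b[i]] ^ s[c[i]], 12);
	for (int i = 0; i < 4; i++) s[i] += s[b[i]];
	for (int i = 0; i < 4; i++) s[d[i]] = rotl32(s[d[i]] ^ s[i], 8);
	for (int i = 0; i < 4; i++) s[c[i]] += s[d[i]];
	for (int i = 0; i < 4; i++) s[b[i]] = rotl32(s[b[i]] ^ s[c[i]], 7);
}

static void double_round_interleaved(uint32_t s[16])
{
	vector_round(s, 0);	/* column round */
	vector_round(s, 1);	/* diagonal round */
}

/* Run 10 double rounds (the ChaCha20 core) both ways and compare. */
static bool chacha_rounds_agree(void)
{
	uint32_t x[16], y[16];

	for (int i = 0; i < 16; i++)
		x[i] = y[i] = 0x9e3779b9u * (uint32_t)(i + 1);
	for (int r = 0; r < 10; r++) {
		double_round_sequential(x);
		double_round_interleaved(y);
	}
	return memcmp(x, y, sizeof(x)) == 0;
}
```

The two orders must produce identical results because the four quarter rounds within a column (or diagonal) round touch disjoint state words; only the order in which independent operations are issued changes, which is what lets a wide-issue core keep more execution units busy.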

Tested with vdso_test_getchacha and benchmarked with vdso_test_getrandom.
On an LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64,
which have a lower issue rate.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
arch/loongarch/vdso/vgetrandom-chacha.S