riscv: Improve zacas fully-ordered cmpxchg()
The current fully-ordered cmpxchgXX() implementation results in:
amocas.X.rl a5,a4,(s1)
fence rw,rw
This provides enough sync but we can actually use the following better
mapping instead:
amocas.X.aqrl a5,a4,(s1)
Suggested-by: Andrea Parri <andrea@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20241103145153.105097-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>