arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
We want to avoid pulling linux/preempt.h into cmpxchg.h, since that can
introduce a circular dependency on linux/bitops.h. linux/preempt.h is
only needed by the per-cpu cmpxchg implementation, which is better off
alongside the per-cpu xchg implementation in percpu.h, so move it there
and add the missing #include.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>