x86/tsc: Avoid synchronizing TSCs with multiple CPUs in parallel

The TSC sync algorithm is only designed to do a 1:1 sync between the
source and target CPUs.

In order to enable parallel CPU bringup, serialize the TSC sync by
using an atomic_t containing the number of the target CPU whose turn
it is.
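
The turn-taking idea can be sketched in userspace C11 as follows. This
is only an illustration, not the code in this patch: it uses
<stdatomic.h> in place of the kernel's atomic_t, and the names
tsc_sync_turn, NO_CPU, source_grant_turn(), target_has_turn() and
source_end_turn() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "no target CPU currently has the turn". */
#define NO_CPU (-1)

/* Hypothetical stand-in for the atomic_t: number of the target CPU
 * whose turn it is to sync with the source, or NO_CPU when idle. */
static atomic_int tsc_sync_turn = NO_CPU;

/* Source side: hand the turn to exactly one target CPU.
 * Returns false if another target still holds the turn. */
static bool source_grant_turn(int target_cpu)
{
    int expected = NO_CPU;

    return atomic_compare_exchange_strong(&tsc_sync_turn, &expected,
                                          target_cpu);
}

/* Target side: true only when it is this CPU's turn. A real target
 * would spin on this check before starting the 1:1 sync. */
static bool target_has_turn(int cpu)
{
    return atomic_load(&tsc_sync_turn) == cpu;
}

/* Source side: the 1:1 sync has finished, so release the turn and
 * let the next target be granted it. */
static void source_end_turn(void)
{
    atomic_store(&tsc_sync_turn, NO_CPU);
}
```

In this sketch, a second target that comes up while CPU 1 holds the
turn sees target_has_turn(2) == false and keeps waiting until the
source calls source_end_turn() and then source_grant_turn(2), so only
one source/target pair runs the sync at any time.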

In future we should look at inventing a 1:many TSC synchronization
algorithm, perhaps falling back to 1:1 if a warp is observed but
doing them all in parallel for the common case where no adjustment
is needed. Alternatively, we could avoid the sync completely for
cases like kexec, where we trust that the TSCs were already in sync.

This is perfectly sufficient for the short term though, until we get
those further optimisations.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>