x86/tsc: Avoid synchronizing TSCs with multiple CPUs in parallel

The TSC sync algorithm is only designed to do a 1:1 sync between the
source and target CPUs.

In order to enable parallel CPU bringup, serialize the sync by using an
atomic_t that holds the number of the target CPU whose turn it is.

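A minimal user-space sketch of this turn-based serialization is below. It is a hypothetical model built on C11 atomics and POSIX threads, not the kernel code. The names sync_turn, sync_done, target_cpu() and run_serialized_sync() are invented for illustration. The caller of run_serialized_sync() stands in for the source CPU, and the real 1:1 TSC measurement is replaced by a placeholder that only records the order in which targets were serviced.

```c
/* Hypothetical user-space model of turn-based TSC sync serialization. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_CPUS 64

static atomic_int sync_turn = -1; /* target CPU whose turn it is (the atomic_t) */
static atomic_int sync_done = -1; /* last target CPU that finished its sync */
static atomic_int in_sync;        /* targets currently inside the 1:1 sync */
static atomic_int overlaps;       /* times two targets were syncing at once */
static atomic_int order_len;
static int order[MAX_CPUS];       /* order in which targets were serviced */

static void *target_cpu(void *arg)
{
	int cpu = (int)(long)arg;

	/* Parallel bringup got us here; now wait for our turn to sync. */
	while (atomic_load(&sync_turn) != cpu)
		sched_yield();

	if (atomic_fetch_add(&in_sync, 1) != 0)
		atomic_fetch_add(&overlaps, 1);
	/* Placeholder for the real 1:1 TSC measurement against the source. */
	order[atomic_fetch_add(&order_len, 1)] = cpu;
	atomic_fetch_sub(&in_sync, 1);

	/* Let the source move on to the next target CPU. */
	atomic_store(&sync_done, cpu);
	return NULL;
}

/*
 * Source side: start all targets at once, then grant turns one by one.
 * Copies the service order into order_out and returns the number of
 * overlapping syncs observed (0 when serialization works).
 */
int run_serialized_sync(int ncpus, int *order_out)
{
	pthread_t tids[MAX_CPUS];

	assert(ncpus >= 1 && ncpus <= MAX_CPUS);
	atomic_store(&sync_turn, -1);
	atomic_store(&sync_done, -1);
	atomic_store(&overlaps, 0);
	atomic_store(&order_len, 0);

	/* "Parallel bringup": every target thread starts immediately. */
	for (int cpu = 1; cpu < ncpus; cpu++)
		pthread_create(&tids[cpu], NULL, target_cpu, (void *)(long)cpu);

	for (int cpu = 1; cpu < ncpus; cpu++) {
		atomic_store(&sync_turn, cpu);      /* it is now this CPU's turn */
		while (atomic_load(&sync_done) != cpu)
			sched_yield();              /* wait for it to finish */
	}

	for (int cpu = 1; cpu < ncpus; cpu++)
		pthread_join(tids[cpu], NULL);

	for (int i = 0; i < atomic_load(&order_len); i++)
		order_out[i] = order[i];
	return atomic_load(&overlaps);
}
```

Calling run_serialized_sync(8, seen) starts seven target threads at once. It should return 0 (no overlapping syncs) and leave seen[] holding 1..7 in order. The second atomic (sync_done) is an assumption of this model, used so the source knows when to hand out the next turn; the kernel's actual source/target handshake may differ.
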
In future this could be optimised by inventing a 1:many TSC
synchronization algorithm, perhaps falling back to 1:1 if a warp is
observed, but doing all CPUs in parallel for the common case where no
adjustment is needed. Alternatively, the sync could be skipped
entirely in cases like kexec, where we trust that the TSCs were
already in sync.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>