x86/apic/x2apic: Fix parallel handling of cluster_mask

For each CPU being brought up, the alloc_clustermask() function
allocates a new struct cluster_mask just in case it's needed. Then the
target CPU actually runs, and in init_x2apic_ldr() it either uses a
cluster_mask from a previous CPU in the same cluster, or consumes the
"spare" one and sets the global pointer to NULL.

That isn't going to parallelise stunningly well.

Ditch the global variable, let alloc_clustermask() install the struct
*directly* in the per_cpu data for the CPU being brought up. As an
optimisation, actually make it do so for *all* present CPUs in the same
cluster, which means only one iteration over for_each_present_cpu()
instead of doing so repeatedly, once for each CPU.
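
As a rough illustration of the scheme described above, here is a minimal
userspace sketch, not the kernel code itself. It assumes a fixed NR_CPUS, a
hypothetical cpu_to_cluster() helper that derives the cluster from the CPU
number, a plain array standing in for the per_cpu pointer, and an unsigned
long standing in for struct cpumask. It is single-threaded, so the bit set
in init_x2apic_ldr() is not atomic here.

```c
#include <stdlib.h>

#define NR_CPUS          8	/* assumption: small fixed CPU count */
#define CPUS_PER_CLUSTER 4	/* assumption: hypothetical cluster size */

struct cluster_mask {
	unsigned int	clusterid;
	unsigned long	mask;	/* stand-in for struct cpumask */
};

/* Stand-in for the per_cpu cluster mask pointer, one slot per CPU. */
static struct cluster_mask *cluster_masks[NR_CPUS];

/* Hypothetical helper: map a CPU number to its cluster ID. */
static unsigned int cpu_to_cluster(unsigned int cpu)
{
	return cpu / CPUS_PER_CLUSTER;
}

/*
 * Runs on the control CPU before the target is started. There is no
 * global "spare": the mask is installed directly in the slot of every
 * CPU in the same cluster, in a single pass over the CPUs.
 */
static int alloc_clustermask(unsigned int cpu)
{
	unsigned int cluster = cpu_to_cluster(cpu);
	struct cluster_mask *cmsk;
	unsigned int sibling;

	/* A sibling brought up earlier has already installed the mask. */
	if (cluster_masks[cpu])
		return 0;

	cmsk = calloc(1, sizeof(*cmsk));
	if (!cmsk)
		return -1;
	cmsk->clusterid = cluster;

	for (sibling = 0; sibling < NR_CPUS; sibling++) {
		if (cpu_to_cluster(sibling) == cluster)
			cluster_masks[sibling] = cmsk;
	}
	return 0;
}

/*
 * Runs on the target CPU. It only touches its own slot, so nothing is
 * handed over through shared global state.
 */
static void init_x2apic_ldr(unsigned int cpu)
{
	cluster_masks[cpu]->mask |= 1UL << cpu;
}
```

In this sketch, calling alloc_clustermask() for CPU 0 also fills the slots
for CPUs 1-3, so later calls for those CPUs return early without allocating.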

This was a harmless "bug" while CPU bringup wasn't actually happening in
parallel. It's about to become less harmless...

Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>