From: Henry Willard
Date: Thu, 14 Mar 2019 18:01:01 +0000 (-0700)
Subject: x86/apic: Make arch_setup_hwirq NUMA node aware
X-Git-Tag: v4.1.12-124.31.3~236
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=ff3efcf39b34994ab1ba9390bd3a79c1a985c895;p=users%2Fjedix%2Flinux-maple.git

x86/apic: Make arch_setup_hwirq NUMA node aware

In a Xen VM with vNUMA enabled, irq affinity for a device on node 1 may
become stuck on CPU 0. /proc/irq/nnn/smp_affinity_list may show affinity
for all the CPUs on node 1, but this is wrong: all interrupts are
delivered to the first CPU of node 0, which is usually CPU 0.

The problem occurs when __assign_irq_vector() is called by
arch_setup_hwirq() with a mask of all online CPUs and then called later
with a mask including only the node 1 CPUs. The first call assigns
affinity to CPU 0, and the second tries to move affinity to the first
online node 1 CPU, which in the reported case is always CPU 2. For some
reason the CPU 0 affinity is never cleaned up, so all interrupts remain
on CPU 0. Because an incomplete move appears to be in progress, all
further attempts to reassign affinity for the irq fail. Due to a quirk
in how affinity is displayed in /proc/irq/nnn/smp_affinity_list, changes
may appear to work temporarily.

The problem was not reproducible on bare metal on the machine available
for testing, but it may be observable on other machines. It does not
appear in UEK5: the APIC and IRQ code there is very different, and the
code changed here does not exist. It is unknown whether KVM guests might
see the same problem with UEK4.

Making arch_setup_hwirq() NUMA node aware eliminates the problem by
using the correct cpumask for the node for the initial assignment, so
the second assignment becomes a no-op. After initialization is complete,
affinity can be moved to any CPU on any node and back without a problem.

Orabug: 29292411

Signed-off-by: Henry Willard
Reviewed-by: Boris Ostrovsky
Signed-off-by: Brian Maly
---

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 4902161b69e3..4345000c6632 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -509,7 +509,17 @@ int arch_setup_hwirq(unsigned int irq, int node)
 		return -ENOMEM;
 
 	raw_spin_lock_irqsave(&vector_lock, flags);
-	ret = __assign_irq_vector(irq, cfg, apic->target_cpus());
+	if (node != NUMA_NO_NODE) {
+		const struct cpumask *node_mask = cpumask_of_node(node);
+		struct cpumask apic_mask;
+
+		cpumask_copy(&apic_mask, apic->target_cpus());
+		if (cpumask_intersects(&apic_mask, node_mask))
+			cpumask_and(&apic_mask, &apic_mask, node_mask);
+		ret = __assign_irq_vector(irq, cfg, &apic_mask);
+	} else {
+		ret = __assign_irq_vector(irq, cfg, apic->target_cpus());
+	}
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 
 	if (!ret)
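
Not part of the patch: for readers who want to experiment with the mask-selection
rule outside the kernel, the following is a minimal userspace C sketch. It models
cpumasks as plain unsigned long bitmaps, and every name in it (pick_vector_mask,
the example masks) is invented for illustration; passing 0 for the node mask
stands in for the NUMA_NO_NODE case.

/*
 * Illustrative userspace model of the mask selection added to
 * arch_setup_hwirq(); not kernel code. Cpumasks are modeled as
 * unsigned long bitmaps, and pick_vector_mask() is a made-up name.
 */
#include <stdio.h>

/* Prefer the intersection of the APIC target mask with the node's CPUs;
 * fall back to the full target mask when there is no overlap
 * (node_cpus == 0 here stands in for NUMA_NO_NODE). */
static unsigned long pick_vector_mask(unsigned long target_cpus,
				      unsigned long node_cpus)
{
	if (target_cpus & node_cpus)
		return target_cpus & node_cpus;
	return target_cpus;
}

int main(void)
{
	unsigned long target = 0xful;	/* CPUs 0-3 online */
	unsigned long node1 = 0xcul;	/* CPUs 2-3 sit on node 1 */

	/* Device on node 1: the initial vector assignment already lands on
	 * node 1 (mask 0xc), so the later per-node call becomes a no-op
	 * instead of an affinity move that can get stuck. */
	printf("node 1 device -> mask %#lx\n", pick_vector_mask(target, node1));

	/* No node information: behave exactly as before the patch. */
	printf("no node info  -> mask %#lx\n", pick_vector_mask(target, 0));
	return 0;
}

Run with the example masks, this prints 0xc for the node 1 device and 0xf when
no node is known, mirroring the branch structure of the hunk above.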