In a Xen VM with vNUMA enabled, IRQ affinity for a device on node 1
may become stuck on CPU 0. /proc/irq/nnn/smp_affinity_list may show
affinity for all the CPUs on node 1, but this is wrong: all interrupts
are actually delivered to the first CPU of node 0, which is usually CPU 0.
The problem occurs when __assign_irq_vector() is first called by
arch_setup_hwirq() with a mask of all online CPUs, and then called again
later with a mask containing only the node 1 CPUs. The first call assigns
affinity to CPU 0, and the second tries to move affinity to the first
online node 1 CPU, which in the reported case is always CPU 2. For some
reason, the CPU 0 affinity is never cleaned up, and all interrupts remain
on CPU 0. Since an incomplete move appears to be in progress, all
subsequent attempts to reassign affinity for the IRQ fail. Because of a
quirk in how affinity is displayed in /proc/irq/nnn/smp_affinity_list,
changes may appear to work temporarily.
The problem was not reproducible on bare metal on the machine I had
available for testing, but it is possible that it has been observed on
other machines. It does not appear in UEK5: the APIC and IRQ code there
is very different, and the code changed here does not exist. It is
unknown whether KVM guests might see the same problem with UEK4.
Making arch_setup_hwirq() NUMA-aware eliminates the problem by using
the correct cpumask for the device's node for the initial assignment; the
second assignment then becomes a no-op. After initialization is complete,
affinity can be moved to any CPU on any node and back without a problem.
Orabug: 29292411
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
 		return -ENOMEM;
 
 	raw_spin_lock_irqsave(&vector_lock, flags);
-	ret = __assign_irq_vector(irq, cfg, apic->target_cpus());
+	if (node != NUMA_NO_NODE) {
+		const struct cpumask *node_mask = cpumask_of_node(node);
+		struct cpumask apic_mask;
+
+		cpumask_copy(&apic_mask, apic->target_cpus());
+		if (cpumask_intersects(&apic_mask, node_mask))
+			cpumask_and(&apic_mask, &apic_mask, node_mask);
+		ret = __assign_irq_vector(irq, cfg, &apic_mask);
+	} else {
+		ret = __assign_irq_vector(irq, cfg, apic->target_cpus());
+	}
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	if (!ret)