www.infradead.org Git - users/jedix/linux-maple.git/commit
x86/irq: Plug irq vector hotplug race
author Thomas Gleixner <tglx@linutronix.de>
Sun, 5 Jul 2015 17:12:32 +0000 (17:12 +0000)
committer Ethan Zhao <ethan.zhao@oracle.com>
Wed, 23 Aug 2017 03:57:00 +0000 (12:57 +0900)
commit 5b1e36a0f31a886413d5023085d1fabb64cc2100
tree f4c9ae79885d1590e0860bc8ac6e4d814ef48c5b
parent e9c7554a174f1fa23e4dd4ebc0cee2f8e3e31bf7

Jin debugged a nasty cpu hotplug race which results in leaking an irq
vector on the newly hotplugged cpu.

cpu N                           cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu                     free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       setup_vector_irq                  arch_teardown_msi_irq
        __setup_vector_irq                 native_teardown_msi_irq
          lock(vector_lock)                  destroy_irq
          install vectors
          unlock(vector_lock)
                                               lock(vector_lock)
--->                                           __clear_irq_vector
                                               unlock(vector_lock)
    lock(vector_lock)
    set_cpu_online
    unlock(vector_lock)

This leaves the irq vector(s) which are torn down on CPU M stale in
the vector array of CPU N, because CPU M does not see CPU N online
yet. There is a similar issue with concurrent newly setup interrupts.

The alloc/free protection of irq descriptors does not prevent the
above race, because it merely prevents interrupt descriptors from
going away or changing concurrently.

Prevent this by moving the call to setup_vector_irq() into the
vector_lock held region which protects set_cpu_online():

cpu N                           cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu                     free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       lock(vector_lock)                arch_teardown_msi_irq
       setup_vector_irq()
        __setup_vector_irq                 native_teardown_msi_irq
          install vectors                    destroy_irq
       set_cpu_online
       unlock(vector_lock)
                                               lock(vector_lock)
                                               __clear_irq_vector
                                               unlock(vector_lock)

So cpu M either sees the cpu N online before clearing the vector or
cpu N installs the vectors after cpu M has cleared it.
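
The two orderings can be replayed in a small userspace model. This is
only a sketch, not kernel code: every name below (the model_* helpers,
the run_* functions, the table sizes) is hypothetical, and holding
vector_lock is modeled simply by running each critical section as one
uninterrupted step. CPU 0 plays the role of cpu M, CPU 1 the role of
the newly hotplugged cpu N.

```c
#include <stdbool.h>

#define NR_CPUS    2
#define NR_IRQS    4
#define NR_VECTORS 8
#define NONE       (-1)

/* Toy state; in the kernel all of this is protected by vector_lock. */
static bool cpu_online[NR_CPUS];
static int  irq_to_vector[NR_IRQS];           /* irq -> assigned vector */
static int  vector_irq[NR_CPUS][NR_VECTORS];  /* per-cpu vector -> irq  */

/* CPU 0 is online with irq i on vector i + 1; CPU 1 is offline. */
static void reset_model(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_online[cpu] = (cpu == 0);
        for (int v = 0; v < NR_VECTORS; v++)
            vector_irq[cpu][v] = NONE;
    }
    for (int irq = 0; irq < NR_IRQS; irq++) {
        irq_to_vector[irq] = irq + 1;
        vector_irq[0][irq + 1] = irq;
    }
}

/* Stand-in for __setup_vector_irq(): install all live vectors on cpu. */
static void model_setup_vector_irq(int cpu)
{
    for (int irq = 0; irq < NR_IRQS; irq++)
        if (irq_to_vector[irq] != NONE)
            vector_irq[cpu][irq_to_vector[irq]] = irq;
}

static void model_set_cpu_online(int cpu)
{
    cpu_online[cpu] = true;
}

/* Stand-in for __clear_irq_vector(): only online cpus are visited. */
static void model_clear_irq_vector(int irq)
{
    int v = irq_to_vector[irq];

    if (v == NONE)
        return;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_online[cpu])
            vector_irq[cpu][v] = NONE;
    irq_to_vector[irq] = NONE;
}

/* Count entries on cpu that point at an irq which no longer owns them. */
static int count_stale_vectors(int cpu)
{
    int stale = 0;

    for (int v = 0; v < NR_VECTORS; v++) {
        int irq = vector_irq[cpu][v];

        if (irq != NONE && irq_to_vector[irq] != v)
            stale++;
    }
    return stale;
}

/* Old ordering: teardown slips in between install and set online. */
int run_old_ordering(void)
{
    reset_model();
    model_setup_vector_irq(1);   /* own lock/unlock section        */
    model_clear_irq_vector(2);   /* cpu M does not see cpu 1 online */
    model_set_cpu_online(1);     /* separate lock/unlock section   */
    return count_stale_vectors(1);
}

/* New ordering, teardown after the combined critical section. */
int run_new_ordering_teardown_last(void)
{
    reset_model();
    model_setup_vector_irq(1);   /* one section: install ...       */
    model_set_cpu_online(1);     /* ... and set online             */
    model_clear_irq_vector(2);
    return count_stale_vectors(1);
}

/* New ordering, teardown before the combined critical section. */
int run_new_ordering_teardown_first(void)
{
    reset_model();
    model_clear_irq_vector(2);
    model_setup_vector_irq(1);
    model_set_cpu_online(1);
    return count_stale_vectors(1);
}
```

In this model run_old_ordering() reports one stale vector on the new
cpu, while both run_new_ordering_*() variants report none, which
mirrors the two cases described above.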

Reported-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
(cherry picked from commit 5a3f75e3f02836518ce49536e9c460ca8e1fa290)

Orabug: 25671838

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Conflicts:
arch/x86/kernel/smpboot.c
arch/x86/kernel/apic/vector.c
arch/x86/kernel/smpboot.c