]> www.infradead.org Git - users/willy/linux.git/commit
mm, swap: fix race between swapoff and some swap operations
authorHuang Ying <ying.huang@intel.com>
Wed, 5 Dec 2018 00:14:03 +0000 (11:14 +1100)
committerStephen Rothwell <sfr@canb.auug.org.au>
Thu, 6 Dec 2018 22:08:52 +0000 (09:08 +1100)
commit09126a9c1be2bded95d9145ee8421e98ce5a626e
tree4af96fdb6edca82b6efe253cb2b3c90b21e021f7
parent6240f88b6c38252a6dd4ed2006c2e2efa938ea8e
mm, swap: fix race between swapoff and some swap operations

When swapin is performed, after getting the swap entry information from
the page table, system will swap in the swap entry, without any lock held
to prevent the swap device from being swapoff.  This may cause the race
like below,

CPU 1 CPU 2
----- -----
do_swap_page
  swapin_readahead
    __read_swap_cache_async
swapoff       swapcache_prepare
  p->swap_map = NULL         __swap_duplicate
  p->swap_map[?] /* !!! NULL pointer access */

Because swapoff is usually done when system shutdown only, the race may
not hit many people in practice.  But it is still a race need to be fixed.

To fix the race, get_swap_device() is added to check whether the specified
swap entry is valid in its swap device.  If so, it will keep the swap
entry valid via preventing the swap device from being swapoff, until
put_swap_device() is called.

Because swapoff() is very rare code path, to make the normal path runs as
fast as possible, disabling preemption + stop_machine() instead of
reference count is used to implement get/put_swap_device().  From
get_swap_device() to put_swap_device(), the preemption is disabled, so
stop_machine() in swapoff() will wait until put_swap_device() is called.

In addition to swap_map, cluster_info, etc.  data structure in the struct
swap_info_struct, the swap cache radix tree will be freed after swapoff,
so this patch fixes the race between swap cache looking up and swapoff
too.

Races between some other swap cache usages protected via disabling
preemption and swapoff are fixed too via calling stop_machine() between
clearing PageSwapCache() and freeing swap cache data structure.

Alternative implementation could be replacing disable preemption with
rcu_read_lock_sched and stop_machine() with synchronize_sched().

Link: http://lkml.kernel.org/r/20180213014220.2464-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
include/linux/swap.h
mm/memory.c
mm/swap_state.c
mm/swapfile.c