slab: switch percpu sheaves locking to localtry_lock
Instead of local_lock_irqsave(), use localtry_trylock() when potential
callers include irq context, and localtry_lock() otherwise (such as when
we already know the gfp flags allow blocking).
This should reduce the locking (due to irq disabling/enabling) overhead.
Failing to use percpu sheaves in an irq due to preempting an already
locked user of sheaves should be rare so it's a favorable tradeoff.