kmem_cache_setup_percpu_array() will allocate a per-cpu array of the
given capacity for caching objects allocated from and freed to the
cache. The cache must have been created with the SLAB_NO_MERGE flag.
Subsequent allocations from the cache will be served from the per-cpu
array as long as they have no NUMA node preference. Frees will also go
to the array.
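
For example, a minimal setup sketch (the exact signature of
kmem_cache_setup_percpu_array(), taking the cache and a capacity, is an
assumption based on the description above; the cache name and the
capacity of 32 are arbitrary):

  struct foo { unsigned long data[4]; };

  static struct kmem_cache *foo_cache;

  static int __init foo_init(void)
  {
      struct foo *obj;
      int err;

      /* The cache must be unmergeable. */
      foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                    SLAB_NO_MERGE, NULL);
      if (!foo_cache)
          return -ENOMEM;

      /* Assumed signature: attach a per-cpu array of capacity 32. */
      err = kmem_cache_setup_percpu_array(foo_cache, 32);
      if (err)
          return err;

      /* May now be served from, and freed back to, the array. */
      obj = kmem_cache_alloc(foo_cache, GFP_KERNEL);
      if (obj)
          kmem_cache_free(foo_cache, obj);
      return 0;
  }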
When the array is found empty during an allocation that is allowed to
block, half of the array is refilled from slabs by an internal bulk
alloc operation. When the array is found full during freeing, half of
the array is flushed by an internal bulk free operation.
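
An illustrative sketch of that policy follows; the structure and the
internal_bulk_alloc()/internal_bulk_free() helpers are hypothetical
stand-ins for the internal bulk operations, not the actual
implementation:

  /* Hypothetical layout, for illustration only. */
  struct percpu_array {
      unsigned int count, capacity;
      void *objects[];
  };

  static void *array_alloc(struct kmem_cache *s,
                           struct percpu_array *pca, gfp_t gfp)
  {
      if (!pca->count && gfpflags_allow_blocking(gfp))
          /* Empty and allowed to block: refill half the capacity. */
          pca->count = internal_bulk_alloc(s, gfp, pca->capacity / 2,
                                           pca->objects);

      return pca->count ? pca->objects[--pca->count] : NULL;
  }

  static void array_free(struct kmem_cache *s,
                         struct percpu_array *pca, void *obj)
  {
      if (pca->count == pca->capacity) {
          /* Full: flush half the capacity back to slabs. */
          pca->count -= pca->capacity / 2;
          internal_bulk_free(s, pca->capacity / 2,
                             &pca->objects[pca->count]);
      }
      pca->objects[pca->count++] = obj;
  }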
The array does not distinguish NUMA locality of the cached objects. If
an allocation is requested with kmem_cache_alloc_node() with a specific
node (not NUMA_NO_NODE), the array is bypassed.
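
For example (using foo_cache from the sketch above; node 1 is
arbitrary):

  /* No node preference, so these may be served from the array. */
  obj = kmem_cache_alloc(foo_cache, GFP_KERNEL);
  obj = kmem_cache_alloc_node(foo_cache, GFP_KERNEL, NUMA_NO_NODE);

  /* A specific node is requested, so the array is bypassed. */
  obj = kmem_cache_alloc_node(foo_cache, GFP_KERNEL, 1);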
The bulk operations exposed to slab users also try to use the array
when possible. However, if the array cannot serve the request
completely, they do not refill or flush it; the internal bulk
alloc/free is used only to fulfil the remaining part of the request,
which can leave the array empty or full.
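
For example, with the existing bulk API (a sketch, again using
foo_cache from above):

  void *objs[16];
  int allocated;

  /*
   * Served from the array first; any remainder comes from an internal
   * bulk alloc, and the array is not refilled afterwards.
   */
  allocated = kmem_cache_alloc_bulk(foo_cache, GFP_KERNEL, 16, objs);

  /*
   * Freed into the array up to its capacity; any remainder is bulk
   * freed to slabs, and the array is not flushed afterwards.
   */
  kmem_cache_free_bulk(foo_cache, allocated, objs);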
If kmemcg is enabled and active, bulk freeing skips the array
completely, as using it would be less efficient than bypassing it.
kmem_cache_prefill_percpu_array() can be called to ensure that the
array on the current cpu contains at least the given number of objects.
If the array needs to be refilled, it is refilled beyond the indicated
count, to avoid many small prefills each followed by only a few actual
allocations.
However, the prefill is only opportunistic, as there is no cpu pinning
or disabled preemption between the prefill and the actual allocations.
Therefore allocations cannot fully rely on the array for success even
after a prefill. But misses should be rare enough that e.g. GFP_ATOMIC
allocations performed in a restricted context following a prefill
should be acceptable.
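
A sketch of the intended pattern (the prefill signature, with a gfp
argument, is an assumption; the spinlock stands in for any context that
cannot block):

  struct foo *obj;
  int err;

  /* Opportunistically ensure at least 4 objects cached on this cpu. */
  err = kmem_cache_prefill_percpu_array(foo_cache, 4, GFP_KERNEL);
  if (err)
      return err;

  spin_lock(&lock);
  /*
   * Cannot block here; this should normally hit the prefilled array,
   * but may still miss if we migrated to another cpu or the array was
   * drained in the meantime.
   */
  obj = kmem_cache_alloc(foo_cache, GFP_ATOMIC);
  spin_unlock(&lock);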
Sysfs stat counters alloc_cpu_cache and free_cpu_cache count objects
allocated or freed using the percpu array; counters cpu_cache_refill and
cpu_cache_flush count objects refilled into or flushed from the array.
The
efficiency of reusing objects in the array can thus be determined by
comparing the alloc_cpu_cache/free_cpu_cache counters with the
refill/flush counters. The refill and flush operations will also count
towards the usual alloc_fastpath/slowpath, free_fastpath/slowpath and
other counters.
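For example, if alloc_cpu_cache reads 10000 and cpu_cache_refill reads
1000, then roughly 9 out of 10 objects allocated from the array were
reused from earlier frees rather than refilled from slabs.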
Access to the array is protected by local_lock_irqsave() operations.
When slub_debug is enabled for a cache with percpu array, the objects in
the array are considered as allocated from the slub_debug perspective,
and the alloc/free debugging hooks occur when moving the objects between
the array and slab pages. This means that e.g. a use-after-free that
occurs for an object cached in the array is undetected. Collected
alloc/free stacktraces might also be less useful. This limitation could
be changed in the future.
On the other hand, KASAN, kmemcg and other hooks are executed on actual
allocations and frees by kmem_cache users even if those use the array,
so their debugging or accounting accuracy should be unaffected.