Paul E. McKenney [Thu, 9 Dec 2021 19:38:09 +0000 (11:38 -0800)]
Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD
Frederic Weisbecker [Tue, 23 Nov 2021 00:37:08 +0000 (01:37 +0100)]
rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and rcu_spawn_one_nocb_kthread()
The rcu_spawn_one_nocb_kthread() function is called only from
rcu_spawn_cpu_nocb_kthread(). Therefore, inline the former into
the latter, saving a few lines of code.
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Allow the rcu_nocbs kernel parameter to be specified just by itself,
without specifying any CPUs. This allows system administrators to use
"rcu_nocbs" to specify that none of the CPUs are to be offloaded at boot
time, but that any of them may be offloaded at runtime via cpusets.
In contrast, if the "rcu_nocbs" or "nohz_full" kernel parameters are not
specified at all, then not only are none of the CPUs offloaded at boot,
none of them can be offloaded at runtime, either.
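For example (illustrative boot-parameter settings, not taken from the patch
itself; the CPU list is hypothetical):
```
rcu_nocbs=0-3   # offload CPUs 0-3 at boot; others may be offloaded at runtime
rcu_nocbs       # offload nothing at boot, but allow runtime offload via cpusets
                # (with neither rcu_nocbs nor nohz_full, runtime offload is unavailable)
```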
While in the area, modernize the description of the "rcuo" kthreads'
naming scheme.
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 23 Nov 2021 00:37:06 +0000 (01:37 +0100)]
rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full=" are passed
In order to be able to (de-)offload any CPU using cpusets in the future,
create the NOCB data structures for all possible CPUs. For now this is
done only as long as the "rcu_nocbs=" or "nohz_full=" kernel parameters
are passed to avoid the unnecessary overhead for most users.
Note that the rcuog and rcuoc kthreads are not created until at least
one of the corresponding CPUs comes online. This approach avoids the
creation of excess kthreads when firmware lies about the number of CPUs
present on the system.
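A rough sketch of the resulting shape, assuming a boot-time flag such as
rcu_nocb_is_setup and a hypothetical helper rcu_init_one_nocb_rdp() (names
illustrative, not the exact ones from the patch):
```c
static void __init rcu_init_nocb_structures(void)
{
	int cpu;

	if (!rcu_nocb_is_setup)		/* neither rcu_nocbs= nor nohz_full= given */
		return;
	for_each_possible_cpu(cpu)	/* data structures for all possible CPUs */
		rcu_init_one_nocb_rdp(per_cpu_ptr(&rcu_data, cpu));
}

static int rcutree_nocb_cpu_online(unsigned int cpu)
{
	/* rcuog/rcuoc kthreads are deferred until the CPU actually appears,
	 * so firmware overstating the CPU count cannot create excess kthreads. */
	rcu_spawn_cpu_nocb_kthread(cpu);
	return 0;
}
```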
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 23 Nov 2021 00:37:05 +0000 (01:37 +0100)]
rcu/nocb: Optimize kthreads and rdp initialization
Currently, cpumask_available() is used to prevent unwanted NOCB
initialization. However, if neither the "rcu_nocbs=" nor the "nohz_full="
parameter is passed to a kernel built with CONFIG_CPUMASK_OFFSTACK=n,
the initialization path is still taken, running through all sorts of
needless operations and iterations on an empty cpumask.
Fix this by relying on a real initialization state instead. This also
optimizes kthread creation, preventing needless iteration over all online
CPUs when the kernel is booted without any offloaded CPUs.
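In outline, this amounts to gating initialization on a real state flag
rather than on cpumask_available() (a simplified sketch; details illustrative):
```c
/* True only if "rcu_nocbs=" or "nohz_full=" actually requested offloading. */
static bool rcu_nocb_is_setup;

void __init rcu_init_nohz(void)
{
	if (!rcu_nocb_is_setup)
		return;	/* No needless iteration over an empty cpumask. */
	/* ... perform the real NOCB initialization here ... */
}
```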
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 23 Nov 2021 00:37:04 +0000 (01:37 +0100)]
rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
In order to be able to toggle the offloaded state from cpusets, a nocb
kthread will need to be created for all possible CPUs whenever either
the "rcu_nocbs=" or the "nohz_full=" parameter is specified.
Therefore, the nocb_cb_wait() kthread must be prepared to start running
on a de-offloaded rdp. To accomplish this, simply move the sleeping
condition to the beginning of the nocb_cb_wait() function, which prevents
this kthread from attempting to invoke callbacks before the corresponding
CPU is offloaded.
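Conceptually, the kthread now sleeps before touching any callbacks (a
simplified sketch using the rdp's existing wait queue; the condition helper
name follows the patch's apparent intent):
```c
static void nocb_cb_wait(struct rcu_data *rdp)
{
	/* Sleep first: a kthread created for a (still) de-offloaded rdp
	 * parks here rather than invoking callbacks it does not yet own. */
	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
					    nocb_cb_can_run(rdp));

	/* ... invoke ready callbacks only past this point ... */
}
```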
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 23 Nov 2021 00:37:03 +0000 (01:37 +0100)]
rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
The nocb_gp_wait() function iterates over all CPUs in its group,
including even those CPUs that have been de-offloaded. This is of
course suboptimal, especially if none of the CPUs within the group are
currently offloaded. This will become even more of a problem once a
nocb kthread is created for all possible CPUs.
Therefore, use a standard doubly linked list to link all the offloaded
rcu_data structures and safely add or delete these structures as we
offload or de-offload them, respectively.
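A sketch of the bookkeeping (field names as the patch appears to intend;
treat them as illustrative):
```c
/* (De-)offload time: keep nocb_gp_wait()'s iteration list accurate. */
static void nocb_track_offloaded_rdp(struct rcu_data *rdp_gp,
				     struct rcu_data *rdp)
{
	list_add_tail(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);
}

static void nocb_untrack_offloaded_rdp(struct rcu_data *rdp)
{
	list_del(&rdp->nocb_entry_rdp);	/* skipped by nocb_gp_wait() from now on */
}
```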
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 30 Nov 2021 00:52:31 +0000 (16:52 -0800)]
rcu-tasks: Use fewer callbacks queues if callback flood ends
By default, when lock contention is encountered, the RCU Tasks flavors
of RCU switch to using per-CPU queueing. However, if the callback
flood ends, per-CPU queueing continues to be used, which introduces
significant additional overhead, especially for callback invocation,
which fans out a series of workqueue handlers.
This commit therefore switches back to single-queue operation if at the
beginning of a grace period there are very few callbacks. The definition
of "very few" is set by the rcupdate.rcu_task_collapse_lim module
parameter, which defaults to 10. This switch happens in two phases,
with the first phase causing future callbacks to be enqueued on CPU 0's
queue, but with all queues continuing to be checked for grace periods
and callback invocation. The second phase checks to see if an RCU grace
period has elapsed and if all remaining RCU-Tasks callbacks are queued
on CPU 0. If so, only CPU 0 is checked for future grace periods and
callback operation.
Of course, the return of contention anywhere during this process will
result in returning to per-CPU callback queueing.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 29 Nov 2021 19:46:33 +0000 (11:46 -0800)]
rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
Decreasing the number of callback queues is a bit tricky because it
is necessary to handle callbacks that were queued before the number of
queues decreased, but which were not ready to invoke until afterwards.
This commit takes a first step in this direction by maintaining a separate
->percpu_dequeue_lim to control callback dequeueing, in addition to the
existing ->percpu_enqueue_lim which now controls only enqueueing.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 24 Nov 2021 23:12:15 +0000 (15:12 -0800)]
rcu-tasks: Use more callback queues if contention encountered
The rcupdate.rcu_task_enqueue_lim module parameter allows system
administrators to tune the number of callback queues used by the RCU
Tasks flavors. However, if callback storms are infrequent, it would
be better to operate with a single queue on a given system unless and
until that system actually needs more queues. Systems not needing
more queues can then avoid the overhead of checking the extra queues
and especially avoid the overhead of fanning workqueue handlers out to
all CPUs to invoke callbacks.
This commit therefore switches to using all the CPUs' callback queues if
call_rcu_tasks_generic() encounters too much lock contention. The amount
of lock contention to tolerate defaults to 100 contended lock acquisitions
per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim
module parameter.
Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim
module parameter is negative, which is its default value (-1).
This allows savvy systems administrators to set the number of queues
to some known good value and to not have to worry about the kernel doing
any second guessing.
[ paulmck: Apply feedback from Guillaume Tucker and kernelci. ]
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 24 Nov 2021 00:16:50 +0000 (16:16 -0800)]
rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
If the caller of call_rcu_tasks(), call_rcu_tasks_rude(),
or call_rcu_tasks_trace() holds a raw spinlock, and then if
call_rcu_tasks_generic() determines that the grace-period kthread must
be awakened, then the wakeup might acquire a normal spinlock while a
raw spinlock is held. This results in lockdep splats when the
kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y.
This commit therefore defers the wakeup using irq_work_queue().
It would be nice to directly invoke wakeup when a raw spinlock is not
held, but there is currently no way to check for this in all kernels.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 22 Nov 2021 21:38:42 +0000 (13:38 -0800)]
rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
This commit converts the unconditional raw_spin_lock_rcu_node() lock
acquisition in call_rcu_tasks_generic() to a trylock followed by an
unconditional acquisition if the trylock fails. If the trylock fails,
the failure is counted, but the count is reset to zero on each new jiffy.
This statistic will be used to determine when to move from a single
callback queue to per-CPU callback queues.
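In outline (a sketch of the call_rcu_tasks_generic() hot path; field names
illustrative):
```c
if (!raw_spin_trylock_rcu_node(rtpcp)) {
	raw_spin_lock_rcu_node(rtpcp);		/* unconditional fallback */
	if (rtpcp->rtp_jiffies != jiffies) {	/* new jiffy: reset the count */
		rtpcp->rtp_jiffies = jiffies;
		rtpcp->rtp_n_lock_retries = 0;
	}
	rtpcp->rtp_n_lock_retries++;		/* one more contended acquisition */
}
```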
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 12 Nov 2021 15:33:40 +0000 (07:33 -0800)]
rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
This commit adds a rcupdate.rcu_task_enqueue_lim module parameter that
sets the initial number of callback queues to use for the RCU Tasks
family of RCU implementations. This parameter allows testing of various
fanout values.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 11 Nov 2021 22:53:43 +0000 (14:53 -0800)]
rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
Currently, rcu_barrier_tasks(), rcu_barrier_tasks_rude(),
and rcu_barrier_tasks_trace() simply invoke the corresponding
synchronize_rcu_tasks*() function. This works because there is only
one callback queue.
However, there will soon be multiple callback queues. This commit
therefore scans the queues currently in use, entraining a callback on
each non-empty queue. Sequence numbers and reference counts are used
to synchronize this process in a manner similar to the approach taken
by rcu_barrier().
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 10 Nov 2021 23:56:40 +0000 (15:56 -0800)]
rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
If there is a flood of callbacks, it is necessary to put multiple
CPUs to work invoking those callbacks. This commit therefore uses a
workqueue-flooding approach to parallelize RCU Tasks callback execution.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 9 Nov 2021 21:37:34 +0000 (13:37 -0800)]
rcu-tasks: Abstract checking of callback lists
This commit adds a rcu_tasks_need_gpcb() function that returns an
indication of whether another grace period is required, and if no grace
period is required, whether there are callbacks that need to be invoked.
The function scans all per-CPU lists currently in use.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 9 Nov 2021 19:11:32 +0000 (11:11 -0800)]
rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
This commit adds a ->percpu_enqueue_lim field to the rcu_tasks structure.
This field contains two to the power of the ->percpu_enqueue_shift
field, easing construction of iterators over the per-CPU queues that
might contain RCU Tasks callbacks. Such iterators will be introduced
in later commits.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Neeraj Upadhyay [Tue, 9 Nov 2021 11:22:14 +0000 (16:52 +0530)]
rcu-tasks: Inspect stalled task's trc state in locked state
On an RCU Tasks Trace stall, inspect the RCU-Tasks-Trace-specific state
of the stalled task while that task is locked down, using
try_invoke_on_locked_down_task(), in order to get a reliable trc state
for a non-running stalled task.
Paul E. McKenney [Tue, 9 Nov 2021 00:52:02 +0000 (16:52 -0800)]
rcu-tasks: Use spin_lock_rcu_node() and friends
This commit renames the rcu_tasks_percpu structure's ->cbs_pcpu_lock
to ->lock and then uses spin_lock_rcu_node() and friends to acquire and
release this lock, preparing for upcoming commits that will spread the
grace-period process across multiple CPUs and kthreads.
[ paulmck: Apply feedback from kernel test robot. ]
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 23 Nov 2021 21:51:11 +0000 (13:51 -0800)]
rcutorture: Combine n_max_cbs from all kthreads in a callback flood
With the addition of multiple callback-flood kthreads, the maximum number
of callbacks from any one of those kthreads is reported in the rcutorture
run summary. This commit changes this to report the sum of each kthread's
maximum number of callbacks in a given callback-flooding episode.
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 23 Nov 2021 19:53:52 +0000 (11:53 -0800)]
rcutorture: Add ability to limit callback-flood intensity
The RCU tasks flavors of RCU now need concurrent callback flooding to
test their ability to switch between single-queue mode and per-CPU queue
mode, but their lack of heavy-duty forward-progress features rules out
the use of rcutorture's current callback-flooding code. This commit
therefore provides the ability to limit the intensity of the callback
floods using a new ->cbflood_max field in the rcu_operations structure.
When this field is zero, there is no limit, otherwise, each callback-flood
kthread allocates at most ->cbflood_max callbacks.
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit converts the rcutorture.fwd_progress module parameter from
bool to int, so that it specifies the number of callback-flood kthreads.
Values less than zero specify one kthread per CPU; however, the number of
kthreads executing concurrently is limited to the number of online CPUs.
This commit also reverses the order of the need-resched and callback-flood
operations to cause the callback flooding to happen more nearly at the
same time.
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 12 Nov 2021 16:16:42 +0000 (08:16 -0800)]
rcutorture: Test RCU-tasks multiqueue callback queueing
This commit modifies the TASKS01 scenario to use four callback queues
and the TRACE01 scenario to use two queues, thus providing testing of
multiple queues by default.
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Li Zhijian [Mon, 25 Oct 2021 03:26:58 +0000 (11:26 +0800)]
refscale: Always log the error message
An OOM is a serious error that should be logged even in non-verbose runs.
This commit therefore adds an unconditional SCALEOUT_ERRSTRING() macro
and uses it instead of VERBOSE_SCALEOUT_ERRSTRING() when reporting an OOM.
[ paulmck: Drop do-while from SCALEOUT_ERRSTRING() due to only single statement. ]
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 9 Nov 2021 00:18:57 +0000 (16:18 -0800)]
rcu_tasks: Convert bespoke callback list to rcu_segcblist structure
This commit moves from a bespoke head and tail pointer in the
rcu_tasks_percpu structure to an rcu_segcblist structure, thus allowing
associating the grace-period sequence number with groups of callbacks.
This in turn will allow callbacks to be invoked independently on
different CPUs.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 8 Nov 2021 22:14:43 +0000 (14:14 -0800)]
rcu-tasks: Convert grace-period counter to grace-period sequence number
This commit moves the rcu_tasks structure's ->n_gps grace-period-counter
field to a ->task_gp_seq grace-period sequence number in order to enable
use of the rcu_segcblist structure for the callback lists. This in turn
permits CPUs to lag behind the RCU Tasks grace-period sequence number
without suffering long-term slowdowns in callback invocation.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 8 Nov 2021 18:51:13 +0000 (10:51 -0800)]
rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection
This commit introduces a ->percpu_enqueue_shift field to the rcu_tasks
structure, and uses it to shift down the CPU number in order to
select a rcu_tasks_percpu structure. This field is currently set to a
sufficiently large shift count to always select the CPU-0 instance of
the rcu_tasks_percpu structure, and later commits will adjust this.
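So queue selection reduces to a shift of the CPU number (sketch):
```c
/* With a sufficiently large shift, every CPU maps to the CPU-0 queue. */
rtpcp = per_cpu_ptr(rtp->rtpcpu,
		    raw_smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
```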
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 6 Nov 2021 04:52:00 +0000 (21:52 -0700)]
rcu-tasks: Create per-CPU callback lists
Currently, RCU Tasks Trace (as well as the other two flavors of RCU Tasks)
uses a single global callback list. This works well and is simple, but
expected changes in workload will cause this list to become a bottleneck.
This commit therefore creates per-CPU callback lists for the various
flavors of RCU Tasks, but continues queueing on a single list, namely
that of CPU 0. Later commits will dynamically vary the number of lists
in use to accommodate dynamic changes in workload.
Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Tested-by: kernel test robot <beibei.si@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:16 +0000 (02:08 +0200)]
rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
rcu_core() tries to ensure that its self-invocation in case of callback
overload only happens in softirq/rcuc mode. Indeed, it doesn't make sense
to trigger local RCU core from the nocb_cb kthread, since that kthread may
execute on a CPU different from the target rdp. Also, in case of overload,
the nocb_cb kthread simply iterates through another loop of callback
processing.
However, the "offloaded" check that aims at preventing misplaced
rcu_core() invocations is wrong. First of all, that state is volatile,
and second, softirq/rcuc can execute while the target rdp is offloaded.
As a result, rcu_core() can be invoked on the wrong CPU while in the
process of (de-)offloading.
Fix that by moving the rcu_core() self-invocation to rcu_core() itself,
irrespective of the rdp offloaded state.
Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:15 +0000 (02:08 +0200)]
rcu: Apply callbacks processing time limit only on softirq
The time limit only makes sense when callbacks are serviced in softirq
mode because:
_ In case we need to get back to the scheduler,
cond_resched_tasks_rcu_qs() is called after each callback.
_ In case some other softirq vector needs the CPU, the call to
local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
it via a call to do_softirq().
Therefore, make sure the time limit applies only in softirq mode.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:14 +0000 (02:08 +0200)]
rcu: Fix callbacks processing time limit retaining cond_resched()
The callbacks-processing time limit makes sure we are not exceeding a
given amount of time executing the queue.
However, its "continue" clause bypasses the cond_resched() call on
rcuc and NOCB kthreads, delaying it until we reach the limit, which can
take a very long time...
Make sure the scheduler has a higher priority than the time limit.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:13 +0000 (02:08 +0200)]
rcu/nocb: Limit number of softirq callbacks only on softirq
The current condition limiting the number of callbacks executed in a
row checks the offloaded state of the rdp. Not only is that state
volatile, but the check is also misleading: rcu_core() may well be
executing callbacks concurrently with NOCB kthreads, and the offloaded
state would then be observed in both cases. As a result, the limit would
spuriously cease to apply in softirq while in the middle of the
(de-)offloading process.
Fix and clarify the condition with those constraints in mind (see the
sketch after this list):
_ If callbacks are processed either by the rcuc or a NOCB kthread, the call
to cond_resched_tasks_rcu_qs() is enough to take care of the overload.
_ If instead callbacks are processed by softirqs:
* If need_resched(), exit the callbacks processing.
* Otherwise, if the CPU is idle, we can continue.
* Otherwise, exit, because a softirq shouldn't interrupt a task for too
long nor deprive other pending softirq vectors of the CPU.
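A hedged sketch of the resulting exit condition in rcu_do_batch() (bl is
the batch limit; details illustrative):
```c
if (in_serving_softirq()) {
	if (count >= bl && (need_resched() || !is_idle_task(current)))
		break;	/* yield to the interrupted task or other softirqs */
} else {
	/* rcuc/NOCB kthread: a reschedule point handles overload instead. */
	local_bh_enable();
	cond_resched_tasks_rcu_qs();
	local_bh_disable();
}
```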
Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:11 +0000 (02:08 +0200)]
rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
It's not entirely obvious why rdp->qlen_last_fqs_check is updated before
processing the queue only on offloaded rdps. Doing so can have different
effects, either in favour of triggering the force-quiescent-state path
or not. For example:
1) If the number of callbacks has decreased since the last
rdp->qlen_last_fqs_check update (because we recently called
rcu_do_batch() and executed below qhimark callbacks), and the number
of callbacks processed on a subsequent rcu_do_batch() arranges for
exceeding qhimark on the non-offloaded but not on the offloaded setup,
then we may spare a later trip to the force-quiescent-state
slow path in __call_rcu_nocb_wake(), as compared to the non-offloaded
counterpart scenario.
In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
would be 1000 and the fqs slowpath would have executed.
2) If the number of callbacks has increased since the last
rdp->qlen_last_fqs_check update (because we recently queued below
qhimark callbacks) and the number of callbacks executed in rcu_do_batch()
doesn't exceed qhimark for either the offloaded or the non-offloaded
setup, then it's possible that the offloaded scenario later runs the
force-quiescent-state slow path in __call_rcu_nocb_wake() while the
non-offloaded one doesn't.
In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
would be 3000 and the fqs slowpath would have executed.
The reason for updating rdp->qlen_last_fqs_check when invoking callbacks
for offloaded CPUs is that there is usually no point in waking up either
the rcuog or rcuoc kthreads while in this state. After all, both threads
are prohibited from indefinite sleeps.
The exception is when some huge number of callbacks are enqueued while
rcu_do_batch() is in the midst of invoking, in which case interrupting
the rcuog kthread's timed sleep might get more callbacks set up for the
next grace period.
Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Original-patch-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:10 +0000 (02:08 +0200)]
rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
When callbacks are offloaded, the NOCB kthreads handle the callbacks
progression on behalf of rcu_core().
However, during the (de-)offloading process, the kthread may not be
entirely up to the task. As a result, some callbacks' grace-period
sequence numbers may remain stale for a while because rcu_core() won't
take care of them either.
Fix this by forcing callback acceleration from rcu_core() as long
as the offloading process isn't complete.
Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Thomas Gleixner [Tue, 19 Oct 2021 00:08:09 +0000 (02:08 +0200)]
rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence numbers have been
assigned a stale value.
This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.
However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.
This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.
Fix this by moving the offloaded check inside the rnp locking section.
Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:08 +0000 (02:08 +0200)]
rcu/nocb: Invoke rcu_core() at the start of deoffloading
On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.
In the worst case this work is postponed until the next rcu_pending()
check that can take a jiffy to reach, which can be a problem in case
of callbacks flooding.
Solve that by invoking rcu_core() early in the de-offloading process.
This way, any work dismissed by an ongoing rcu_core() call fooled by
a preempting de-offloading process will be caught up by a nearby future
call to rcu_core(), this time fully aware of the de-offloading state.
Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 19 Oct 2021 00:08:07 +0000 (02:08 +0200)]
rcu/nocb: Prepare state machine for a new step
Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
segcblist flags because it is an exclusive state that doesn't mix up
with the other flags. Remove it in favour of:
_ A flag specifying that rcu_core() needs to perform callbacks execution
and acceleration
and
_ A flag specifying we want the nocb lock to be held in any needed
circumstances
This clarifies the code and is more flexible: it allows a state
where rcu_core() runs with locking while offloading hasn't started yet.
This is a necessary step to prepare for triggering rcu_core() at the
very beginning of the de-offloading process so that rcu_core() won't
dismiss work while being preempted by the de-offloading process, at
least not without a pending subsequent rcu_core() that will quickly
catch up.
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Mon, 11 Oct 2021 14:51:30 +0000 (16:51 +0200)]
rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
rcu_nocb_lock_irqsave() can be preempted between the call to
rcu_segcblist_is_offloaded() and the actual locking. This matters now
that rcu_core() is preemptible on PREEMPT_RT and the (de-)offloading
process can interrupt the softirq or the rcuc kthread.
As a result, we may locklessly call into code that requires nocb locking.
In practice, this is a problem when we accelerate callbacks on rcu_core().
Simply disabling interrupts before (instead of after) checking the NOCB
offload state fixes the issue.
Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 17 Sep 2021 22:04:48 +0000 (15:04 -0700)]
rcu: Tighten rcu_advance_cbs_nowake() checks
Currently, rcu_advance_cbs_nowake() checks that a grace period is in
progress; however, that grace period could end just after the check.
This commit rechecks that a grace period is still in progress while
holding the rcu_node structure's lock. The grace period cannot end while
the current CPU's rcu_node structure's ->lock is held, thus avoiding
false positives from the WARN_ON_ONCE().
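The resulting check sequence looks approximately like this (sketch):
```c
/* Lockless pre-check, then recheck while holding the rcu_node lock. */
if (!rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) ||
    !raw_spin_trylock_rcu_node(rnp))
	return;
/* The grace period cannot end while this rcu_node's ->lock is held. */
if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
	WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
raw_spin_unlock_rcu_node(rnp);
```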
As Daniel Vacek noted, it is not necessary for the rcu_node structure
to have a CPU that has not yet passed through its quiescent state.
Tested-by: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Tue, 30 Nov 2021 16:21:08 +0000 (17:21 +0100)]
rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which
takes two passes over the leaf rcu_node structure's CPUs. The first
pass gathers up the current CPU and CPUs that are in dynticks idle mode.
The workqueue will report a quiescent state on their behalf later.
The second pass sends IPIs to the rest of the CPUs, but excludes the
current CPU, incorrectly assuming it has been included in the first
pass's list of CPUs.
Unfortunately the current CPU may have changed between the first and
second pass, due to the fact that the various rcu_node structures'
->lock fields have been dropped, thus momentarily enabling preemption.
This means that if the second pass's CPU was not on the first pass's
list, it will be ignored completely. There will be no IPI sent to
it, and there will be no reporting of quiescent states on its behalf.
Unfortunately, the expedited grace period will nevertheless be waiting
for that CPU to report a quiescent state, but with that CPU having no
reason to believe that such a report is needed.
The result will be an expedited grace period stall.
Fix this by no longer excluding the current CPU from consideration during
the second pass.
Fixes: b9ad4d6ed18e ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()") Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 29 Sep 2021 18:09:34 +0000 (11:09 -0700)]
rcu: Make idle entry report expedited quiescent states
In non-preemptible kernels, an unfortunately timed expedited grace period
can result in the rcu_exp_handler() IPI handler setting the rcu_data
structure's cpu_no_qs.b.exp field just as the target CPU enters idle.
There are situations in which this field will not be checked until after
that CPU exits idle. The resulting grace-period latency does not qualify
as "expedited".
This commit therefore checks this field upon non-preemptible idle entry in
the rcu_preempt_deferred_qs() function. It also qualifies the rcu_core()
preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent
false-positive quiescent states from count-free kernels.
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 29 Sep 2021 16:21:34 +0000 (09:21 -0700)]
rcu: Prevent expedited GP from enabling tick on offline CPU
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
Paul E. McKenney [Tue, 28 Sep 2021 18:06:35 +0000 (11:06 -0700)]
rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
The sync_sched_exp_online_cleanup() is called from rcutree_online_cpu(),
which can be invoked with interrupts enabled. This means that
the ->cpu_no_qs.b.exp field is subject to data races from the
rcu_exp_handler() IPI handler, so this commit marks the load from
that field.
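The marked load amounts to (sketch):
```c
/* rcutree_online_cpu() runs with interrupts enabled, so this read can
 * race with rcu_exp_handler() writing the field from an IPI. */
if (!READ_ONCE(rdp->cpu_no_qs.b.exp))
	return;
```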
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu_no_qs.b.exp
Having two fields for the same purpose with subtle differences on
different RCU flavours is confusing, especially when both fields always
exist on both RCU flavours.
Fortunately, it is now safe for preemptible RCU to rely on the rcu_data
structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU.
This commit therefore removes the ad-hoc ->exp_deferred_qs field.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
On non-preemptible RCU, move the clearing of the rcu_data structure's
->cpu_no_qs.b.exp field to the actual expedited quiescent-state report
function, matching how preemptible RCU handles the ->exp_deferred_qs field.
This prepares for removing ->exp_deferred_qs in favor of ->cpu_no_qs.b.exp
for both preemptible and non-preemptible RCU.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp,
instead using a separate ->exp_deferred_qs field to record the need for
an expedited quiescent state.
In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because
preemptible RCU's expedited grace periods use other mechanisms to record
quiescent states.
This commit therefore removes the implicit rcu_qs() reference to
->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 25 Nov 2021 00:04:51 +0000 (16:04 -0800)]
rcutorture: Test RCU Tasks lock-contention detection
This commit adjusts the TRACE02 scenario to use a pair of callback-flood
kthreads. This in turn forces lock contention on the single RCU Tasks
Trace callback queue, which forces use of all CPUs' queues, thus testing
this transition. (No, there is not yet any way to transition back.)
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 22 Nov 2021 20:47:50 +0000 (12:47 -0800)]
torture: Retry download once before giving up
Currently, a transient network error can kill a run if it happens while
downloading the tarball to one of the target systems. This commit
therefore does a 60-second wait and then a retry. If further experience
indicates a need, a more elaborate mechanism might be used later.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Li Zhijian [Mon, 25 Oct 2021 03:26:57 +0000 (11:26 +0800)]
refscale: Prevent buffer to pr_alert() being too long
0Day/LKP observed that the refscale results fail to complete when larger
values of nrun (such as 300) are specified. The problem is that printk()
can accept at most a 1024-byte buffer. This commit therefore prints
the buffer whenever its length exceeds 800 bytes.
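Schematically (threshold and variable names illustrative):
```c
/* Flush well before printk()'s ~1024-byte limit is reached. */
if (strlen(buf) >= 800) {
	pr_alert("%s", buf);
	buf[0] = '\0';
}
```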
CC: Philip Li <philip.li@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 25 Sep 2021 04:30:26 +0000 (21:30 -0700)]
rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
Because Tiny srcu_read_unlock() directly calls swake_up_one(), lockdep
complains when a pi lock is held across that srcu_read_unlock().
Although this is a lockdep false positive (there is no other CPU to
complete the deadlock cycle), lockdep is what it is at the moment.
This commit therefore prevents rcutorture from holding pi lock across
a Tiny srcu_read_unlock().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 23 Sep 2021 03:49:12 +0000 (20:49 -0700)]
rcutorture: More thoroughly test nested readers
Currently, nested readers occur only when a timer handler interrupts a
reader. This is rare, and is thus insufficient testing of the transition
between nesting levels. This commit therefore causes rcutorture nested
readers to be the rule rather than the exception.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 23 Sep 2021 03:31:44 +0000 (20:31 -0700)]
rcutorture: Sanitize RCUTORTURE_RDR_MASK
RCUTORTURE_RDR_MASK is currently not the bit indicated by
RCUTORTURE_RDR_SHIFT, but is instead all the bits less significant than
that one. This is an accident waiting to happen, so this commit makes
RCUTORTURE_RDR_MASK be that one bit and adjusts uses accordingly.
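Schematically (the _OLD name is added here only for contrast):
```c
#define RCUTORTURE_RDR_SHIFT	8
/* Before: all bits below the shift -- the accident waiting to happen. */
#define RCUTORTURE_RDR_MASK_OLD	((1 << RCUTORTURE_RDR_SHIFT) - 1)
/* After: exactly the one bit indicated by the shift. */
#define RCUTORTURE_RDR_MASK	(1 << RCUTORTURE_RDR_SHIFT)
```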
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sun, 19 Sep 2021 03:40:48 +0000 (20:40 -0700)]
rcu-tasks: Don't remove tasks with pending IPIs from holdout list
Currently, the check_all_holdout_tasks_trace() function removes all tasks
marked with ->trc_reader_checked from the holdout list, including those
with IPIs pending. This means that the IPI handler might arrive at
a task that has already been removed from the list, which is at best
an accident waiting to happen.
This commit therefore avoids removing tasks with IPIs pending from
the holdout list. This in turn means that the "if" condition in the
for_each_online_cpu() loop in rcu_tasks_trace_postgp() should always
evaluate to false, so a WARN_ON_ONCE() is added to check that.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tiny SRCU readers can appear at task level, but also in interrupt and
softirq handlers. Because Tiny SRCU is selected only in kernels built
with CONFIG_SMP=n and CONFIG_PREEMPTION=n, it is not possible for a grace
period to start while there is a non-task-level SRCU reader executing.
This means that it does not make sense for __srcu_read_unlock() to awaken
the Tiny SRCU grace period, because that can only happen when the grace
period is waiting for one value of ->srcu_idx and __srcu_read_unlock()
is ending the last reader for some other value of ->srcu_idx. After all,
any such wakeup will be redundant.
Worse yet, in some cases, such wakeups generate lockdep splats:
======================================================
WARNING: possible circular locking dependency detected
5.15.0-rc1+ #3758 Not tainted
------------------------------------------------------
rcu_torture_rea/53 is trying to acquire lock: ffffffff9514e6a8 (srcu_ctl.srcu_wq.lock){..-.}-{2:2}, at:
xa/0x30
but task is already holding lock: ffff95c642479d80 (&p->pi_lock){-.-.}-{2:2}, at:
_extend+0x370/0x400
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
This is a false positive because there is only one CPU, and both locks
are raw (non-preemptible) spinlocks. However, it is worthwhile getting
rid of the redundant wakeup, which has the side effect of breaking
the theoretical deadlock cycle. This commit therefore eliminates the
redundant wakeups.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Mark Brown [Sun, 24 Oct 2021 17:43:23 +0000 (19:43 +0200)]
tools/nolibc: Implement gettid()
Allow test programs to determine their thread ID.
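Following nolibc's usual syscall-wrapper pattern, the addition looks
roughly like this (sketch):
```c
static pid_t sys_gettid(void)
{
	return my_syscall0(__NR_gettid);
}

static __attribute__((unused))
pid_t gettid(void)
{
	return sys_gettid();
}
```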
Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Sun, 24 Oct 2021 17:43:22 +0000 (19:43 +0200)]
tools/nolibc: x86-64: Use `mov $60,%eax` instead of `mov $60,%rax`
Note that a mov to a 32-bit register zero-extends into the corresponding
64-bit register. Thus `mov $60,%eax` has the same effect as `mov $60,%rax`.
Use the shorter opcode to achieve the same thing.
```
b8 3c 00 00 00 mov $60,%eax (5 bytes) [1]
48 c7 c0 3c 00 00 00 mov $60,%rax (7 bytes) [2]
```
Currently, we use [2]. Change it to [1] for shorter code.
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Sun, 24 Oct 2021 17:43:21 +0000 (19:43 +0200)]
tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list
Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").
- rax for the return value.
- rcx to save the return address.
- r11 to save the rflags.
Other registers are preserved.
Having r8, r9 and r10 in the syscall clobber list is harmless, but it
results in a missed optimization.
As the syscall doesn't clobber r8-r10, GCC should be allowed to reuse
their values after the syscall returns to userspace. But since they are
in the clobber list, GCC will always miss this opportunity.
Remove them from the x86-64 syscall clobber list to help GCC generate
better code, and fix the comment.
See also the x86-64 ABI, section A.2 AMD64 Linux Kernel Conventions,
A.2.1 Calling Conventions [1].
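A sketch of the resulting wrapper shape (nolibc-style, abbreviated to one
argument; treat details as illustrative):
```c
#define my_syscall1(num, arg1)					\
({								\
	long _ret;						\
	register long _num  __asm__("rax") = (num);		\
	register long _arg1 __asm__("rdi") = (arg1);		\
								\
	__asm__ volatile ("syscall"				\
		: "=a"(_ret)					\
		: "r"(_arg1), "0"(_num)				\
		: "rcx", "r11", "memory", "cc");		\
	_ret;							\
})
```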
Extra note:
Some people may think removing r8, r9 and r10 from the syscall clobber
list brings no real benefit, under the impression that a syscall is like
a C function call, and a function call always clobbers those 3 registers.
However, that is not the case for nolibc.h, because we can potentially
inline the "syscall" instruction (whose opcode is "0f 05") into the
user functions.
All syscalls in nolibc.h are written as static functions with inline
ASM and are likely to always be inlined when optimization flags are used,
so it is a win not to have r8, r9 and r10 in the clobber list.
Here is the example where this matters.
Consider the following C code:
```
#include "tools/include/nolibc/nolibc.h"
#define read_abc(a, b, c) __asm__ volatile("nop"::"r"(a),"r"(b),"r"(c))
int main(void)
{
	int a = 0xaa;
	int b = 0xbb;
	int c = 0xcc;
	read_abc(a, b, c);
	write(1, "test\n", 5);	/* the inlined syscall sits here */
	read_abc(a, b, c);
	return 0;
}
```
GCC thinks that the syscall will clobber r8, r9 and r10, so it spills 0xaa,
0xbb and 0xcc to callee-saved registers (r12, rbp and rbx). This is
clearly extra memory access and extra stack size for preserving them.
But the syscall does not actually clobber them, so this is a missed
optimization.
Now without r8, r9, r10 in the clobber list, GCC generates better code:
Willy Tarreau [Sun, 24 Oct 2021 17:28:16 +0000 (19:28 +0200)]
tools/nolibc: fix incorrect truncation of exit code
Ammar Faizi reported that our exit code handling is wrong. We truncate
it to the lowest 8 bits but the syscall itself is expected to take a
regular 32-bit signed integer, not an unsigned char. It's the kernel
that later truncates it to the lowest 8 bits. The difference is visible
in strace, where the program below used to show exit(255) instead of
exit(-1):
int main(void)
{
return -1;
}
This patch applies the fix to all archs. x86_64, i386, arm64, armv7 and
mips were all tested and confirmed to work fine now. Risc-v was not
tested but the change is trivial and exactly the same as for other archs.
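The fix amounts to passing the full int through and letting the kernel do
the truncation (sketch):
```c
static __attribute__((noreturn, unused))
void exit(int status)
{
	my_syscall1(__NR_exit, status);	/* previously truncated: status & 255 */
	while (1);			/* never reached */
}
```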
Reported-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Sun, 24 Oct 2021 17:28:15 +0000 (19:28 +0200)]
tools/nolibc: i386: fix initial stack alignment
After re-checking the spec and comparing stack offsets with glibc,
the last pushed argument must be 16-byte aligned (i.e. the stack must be
aligned before the call) so that in the callee esp+4 is a multiple of 16.
The principle is the 32-bit equivalent of what Ammar fixed for x86_64.
It's possible that 32-bit code using SSE2 or MMX could have been affected.
In addition, the frame pointer ought to be zero at the deepest level.
Ammar Faizi [Sun, 24 Oct 2021 17:28:14 +0000 (19:28 +0200)]
tools/nolibc: x86-64: Fix startup code bug
Before this patch, the `_start` function looks like this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: and $0xfffffffffffffff0,%rsp
117d: sub $0x8,%rsp
1181: call 1000 <main>
1186: movzbq %al,%rdi
118a: mov $0x3c,%rax
1191: syscall
1193: hlt
1194: data16 cs nopw 0x0(%rax,%rax,1)
119f: nop
```
Note the "and" to %rsp with $-16, it makes the %rsp be 16-byte aligned,
but then there is a "sub" with $0x8 which makes the %rsp no longer
16-byte aligned, then it calls main. That's the bug!
What actually the x86-64 System V ABI mandates is that right before the
"call", the %rsp must be 16-byte aligned, not after the "call". So the
"sub" with $0x8 here breaks the alignment. Remove it.
An example where this rule matters is when the callee needs to align
its stack at 16-byte for aligned move instruction, like `movdqa` and
`movaps`. If the callee can't align its stack properly, it will result
in segmentation fault.
x86-64 System V ABI also mandates the deepest stack frame should be
zero. Just to be safe, let's zero the %rbp on startup as the content
of %rbp may be unspecified when the program starts. Now it looks like
this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: xor %ebp,%ebp # zero the %rbp
117b: and $0xfffffffffffffff0,%rsp # align the %rsp
117f: call 1000 <main>
1184: movzbq %al,%rdi
1188: mov $0x3c,%rax
118f: syscall
1191: hlt
1192: data16 cs nopw 0x0(%rax,%rax,1)
119d: nopl (%rax)
```
Cc: Bedirhan KURT <windowz414@gnuweeb.org> Cc: Louvian Lyndal <louvianlyndal@gmail.com> Reported-by: Peter Cordes <peter@cordes.ca> Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
[wt: I did this on purpose due to a misunderstanding of the spec, other
archs will thus have to be rechecked, particularly i386] Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Jun Miao [Mon, 15 Nov 2021 23:23:02 +0000 (07:23 +0800)]
rcu: Avoid alloc_pages() when recording stack
The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
which in turn can then call alloc_pages(GFP_NOWAIT, ...). In general, however,
it is not possible to use either GFP_ATOMIC or GFP_NOWAIT in certain
non-preemptive contexts, including on RT kernels and under raw_spin_locks
(see gfp.h and commit ab00db216c9c7).
Fix this by using kasan_record_aux_stack_noalloc(), which instructs
stackdepot not to expand the stack storage via alloc_pages() should it
run out.
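The call-site change is minimal (sketch):
```c
/* In __call_rcu() and kvfree_call_rcu(): record the stack for KASAN
 * without allowing stackdepot to fall back to alloc_pages(). */
kasan_record_aux_stack_noalloc(head);	/* was: kasan_record_aux_stack(head) */
```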
Jianwei Hu reported:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
softirqs last enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 6 PID: 15319 Comm: python3 Tainted: G W O 5.15-rc7-preempt-rt #1
Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
Call Trace:
show_stack+0x52/0x58
dump_stack+0xa1/0xd6
___might_sleep.cold+0x11c/0x12d
rt_spin_lock+0x3f/0xc0
rmqueue+0x100/0x1460
rmqueue+0x100/0x1460
mark_usage+0x1a0/0x1a0
ftrace_graph_ret_addr+0x2a/0xb0
rmqueue_pcplist.constprop.0+0x6a0/0x6a0
__kasan_check_read+0x11/0x20
__zone_watermark_ok+0x114/0x270
get_page_from_freelist+0x148/0x630
is_module_text_address+0x32/0xa0
__alloc_pages_nodemask+0x2f6/0x790
__alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
create_prof_cpu_mask+0x30/0x30
alloc_pages_current+0xb1/0x150
stack_depot_save+0x39f/0x490
kasan_save_stack+0x42/0x50
kasan_save_stack+0x23/0x50
kasan_record_aux_stack+0xa9/0xc0
__call_rcu+0xff/0x9c0
call_rcu+0xe/0x10
put_object+0x53/0x70
__delete_object+0x7b/0x90
kmemleak_free+0x46/0x70
slab_free_freelist_hook+0xb4/0x160
kfree+0xe5/0x420
kfree_const+0x17/0x30
kobject_cleanup+0xaa/0x230
kobject_put+0x76/0x90
netdev_queue_update_kobjects+0x17d/0x1f0
... ...
ksys_write+0xd9/0x180
__x64_sys_write+0x42/0x50
do_syscall_64+0x38/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642 Fixes: 84109ab58590 ("rcu: Record kvfree_call_rcu() call stack for KASAN") Fixes: 26e760c9a7c8 ("rcu: kasan: record and print call_rcu() call stack") Reported-by: Jianwei Hu <jianwei.hu@windriver.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jun Miao <jun.miao@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Zqiang [Mon, 15 Nov 2021 05:15:46 +0000 (13:15 +0800)]
rcu: Avoid running boost kthreads on isolated CPUs
When the boost kthreads are created on systems with nohz_full CPUs,
the cpus_allowed_ptr is set to housekeeping_cpumask(HK_FLAG_KTHREAD).
However, when rcu_boost_kthread_setaffinity() is called, the original
affinity is changed and these kthreads can subsequently run on
nohz_full CPUs. This commit makes rcu_boost_kthread_setaffinity()
restrict these boost kthreads to housekeeping CPUs.
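A rough sketch of the affinity clamp (the exact housekeeping flag is a
detail of the patch; HK_FLAG_RCU is shown here as an assumption):
```c
/* After computing cm, the candidate cpumask for the boost kthread: */
cpumask_and(cm, cm, housekeeping_cpumask(HK_FLAG_RCU));
if (cpumask_empty(cm))	/* fall back rather than leaving cm empty */
	cpumask_copy(cm, housekeeping_cpumask(HK_FLAG_RCU));
set_cpus_allowed_ptr(rnp->boost_kthread_task, cm);
```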
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
This commit replaces both ________p1 and _________p1 with __UNIQUE_ID(rcu),
and also adjusts the callers of the affected macros.
__UNIQUE_ID(rcu) will generate unique variable names during compilation,
which eliminates the need for ________p1 and _________p1 (both having 4
occurrences prior to the code change). This also avoids the variable
name shadowing issue, or at least makes those wishing to cause shadowing
problems work much harder to do so.
The same idea is used for the min/max macros (commit 589a978 and commit e9092d0).
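For reference, __UNIQUE_ID() pastes a prefix together with __COUNTER__,
so each macro expansion yields a fresh identifier (definition paraphrased
from the kernel's compiler headers; the before/after below is an
illustrative sketch, not the exact rcu_dereference() code):

	#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)

	/* Before: a fixed local name that callers could accidentally shadow. */
	#define deref_old(p) \
	({ typeof(*(p)) *________p1 = (typeof(*(p)) *)READ_ONCE(p); ________p1; })

	/* After: the outer macro passes a compile-time-unique name down,
	 * so every expansion declares a differently named local variable.
	 */
	#define deref_new(p) __deref_new(p, __UNIQUE_ID(rcu))
	#define __deref_new(p, local) \
	({ typeof(*(p)) *local = (typeof(*(p)) *)READ_ONCE(p); local; })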
Signed-off-by: Jim Huang <jserv@ccns.ncku.edu.tw> Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 27 Sep 2021 21:30:20 +0000 (14:30 -0700)]
rcu: Move rcu_needs_cpu() to tree.c
Now that RCU_FAST_NO_HZ is no more, there is but one implementation of
the rcu_needs_cpu() function. This commit therefore moves this function
from kernel/rcu/tree_plugin.c to kernel/rcu/tree.c.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 27 Sep 2021 21:18:51 +0000 (14:18 -0700)]
rcu: Remove the RCU_FAST_NO_HZ Kconfig option
All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
systems with RCU callbacks offloaded. In this situation, all that this
Kconfig option does is slow down idle entry/exit with an additional
always-taken early exit. If this is the only use case, then this
Kconfig option is nothing but an attractive nuisance that needs to go away.
This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 27 Sep 2021 21:17:35 +0000 (14:17 -0700)]
torture: Remove RCU_FAST_NO_HZ from rcu scenarios
All of the rcu scenarios that mention CONFIG_RCU_FAST_NO_HZ disable it.
But this Kconfig option is disabled by default, so this commit removes
the pointless "CONFIG_RCU_FAST_NO_HZ=n" lines from these scenarios.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 27 Sep 2021 21:13:24 +0000 (14:13 -0700)]
torture: Remove RCU_FAST_NO_HZ from rcuscale and refscale scenarios
All of the rcuscale and refscale scenarios that mention the Kconfig option
CONFIG_RCU_FAST_NO_HZ disable it. But this Kconfig option is disabled by
default, so this commit removes the pointless "CONFIG_RCU_FAST_NO_HZ=n"
lines from these scenarios.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Akira Yokosawa [Tue, 2 Nov 2021 23:48:15 +0000 (08:48 +0900)]
doc: RCU: Avoid 'Symbol' font-family in SVG figures
On Ubuntu Focal, strings in some of the SVG files under
Documentation/RCU/Design cannot be rendered properly when
converted to PDF.
Ubuntu releases since Focal and Debian bullseye have trouble
with "Symbol" font-family in SVG files.
As those strings are mostly API names such as "READ_ONCE()",
"WRITE_ONCE()", "rcu_read_lock()", and so on, using a generic
monospace font-family should be a good alternative.
Substitute the font-family name by a simple sed pattern:
's/Symbol/monospace/g'
Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
NeilBrown [Tue, 26 Oct 2021 04:53:38 +0000 (15:53 +1100)]
doc: Add refcount analogy to What is RCU
The reader-writer-lock analogy is a useful way to think about RCU, but
it is not always applicable. It is useful to have other analogies to
work with, and particularly to emphasise that no single analogy is
perfect.
This patch adds an "RCU as reference count" analogy to the "What is RCU" document.
See https://lwn.net/Articles/872559/
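As a quick illustration of the analogy (a hypothetical reader-side
sketch; gp and do_something_with() are made-up names):

	rcu_read_lock();		/* conceptually: "get" a reference on all RCU-protected data */
	p = rcu_dereference(gp);	/* fetch the RCU-protected pointer */
	if (p)
		do_something_with(p);	/* safe: our "reference" pins p's lifetime */
	rcu_read_unlock();		/* conceptually: "put" the reference */

On the update side, synchronize_rcu() then plays the role of waiting for
all such "references" to be dropped before the old data is freed.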
[ paulmck: Apply Akira Yokosawa feedback. ] Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This document advises building with both CONFIG_NO_HZ=y and
CONFIG_RCU_FAST_NO_HZ=y. However, callbacks are offloaded from
all nohz_full CPUs, and CPUs with offloaded callbacks do not benefit from
CONFIG_RCU_FAST_NO_HZ=y. Quite the opposite: CONFIG_RCU_FAST_NO_HZ=y
simply adds a bit of idle entry/exit overhead.
This commit therefore changes that advice to only CONFIG_NO_HZ=y.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 18 Nov 2021 02:57:01 +0000 (18:57 -0800)]
rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
With CONFIG_PREEMPT_DYNAMIC=y, the kernel builds with CONFIG_PREEMPTION=y
because preemption can be enabled at runtime. This prevents any tests
of Tiny RCU or Tiny SRCU from running correctly. This commit therefore
explicitly sets CONFIG_PREEMPT_DYNAMIC=n for those scenarios.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Gustavo A. R. Silva [Sun, 14 Nov 2021 00:57:25 +0000 (18:57 -0600)]
kconfig: Add support for -Wimplicit-fallthrough
Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.
The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.
Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.
This concludes a long journey, and we are now finally getting rid of
the unintentional fallthrough bug class in the kernel entirely. :)
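For illustration, the bug class being eliminated looks like the sketch
below (hypothetical enum values and helpers); -Wimplicit-fallthrough
warns unless the fall-through is annotated with the kernel's fallthrough
pseudo-keyword:

	switch (event) {
	case EVENT_PREPARE:
		prepare();
		fallthrough;	/* intentional: preparation is followed by commit */
	case EVENT_COMMIT:
		commit();
		break;
	default:
		break;
	}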
Linus Torvalds [Sun, 14 Nov 2021 20:18:22 +0000 (12:18 -0800)]
Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs cleanups from Darrick Wong:
"The most 'exciting' aspect of this branch is that the xfsprogs
maintainer and I have worked through the last of the code
discrepancies between kernel and userspace libxfs such that there are
no code differences between the two except for #includes.
IOWs, diff suffices to demonstrate that the userspace tools behave the
same as the kernel, and kernel-only bits are clearly marked in the
/kernel/ source code instead of just the userspace source.
Summary:
- Clean up open-coded swap() calls.
- A little bit of #ifdef golf to complete the reunification of the
kernel and userspace libxfs source code"
* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: sync xfs_btree_split macros with userspace libxfs
xfs: #ifdef out perag code for userspace
xfs: use swap() to make dabtree code cleaner
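The swap() cleanup mentioned above replaces the classic three-statement
idiom; a minimal sketch with hypothetical variables:

	/* Before: open-coded swap via a temporary. */
	tmp = left;
	left = right;
	right = tmp;

	/* After: the kernel's swap() macro expresses the intent directly. */
	swap(left, right);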
Linus Torvalds [Sun, 14 Nov 2021 19:53:59 +0000 (11:53 -0800)]
Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull more parisc fixes from Helge Deller:
"Fix a build error in stracktrace.c, fix resolving of addresses to
function names in backtraces, fix single-stepping in assembly code and
flush userspace pte's when using set_pte_at()"
* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/entry: fix trace test in syscall exit path
parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
parisc: Fix implicit declaration of function '__kernel_text_address'
parisc: Fix backtrace to always include init funtion names
Linus Torvalds [Sun, 14 Nov 2021 19:37:49 +0000 (11:37 -0800)]
Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh
Pull arch/sh updates from Rich Felker.
* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
sh: pgtable-3level: Fix cast to pointer from integer of different size
sh: fix READ/WRITE redefinition warnings
sh: define __BIG_ENDIAN for math-emu
sh: math-emu: drop unused functions
sh: fix kconfig unmet dependency warning for FRAME_POINTER
sh: Cleanup about SPARSE_IRQ
sh: kdump: add some attribute to function
maple: fix wrong return value of maple_bus_init().
sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
sh: boards: Fix the cacography in irq.c
sh: check return code of request_irq
sh: fix trivial misannotations
Linus Torvalds [Sun, 14 Nov 2021 19:30:50 +0000 (11:30 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Fix early_iounmap
- Drop cc-option fallbacks for architecture selection
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9156/1: drop cc-option fallbacks for architecture selection
ARM: 9155/1: fix early early_iounmap()
Linus Torvalds [Sun, 14 Nov 2021 19:11:51 +0000 (11:11 -0800)]
Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Two fixes due to DT node name changes on Arm, Ltd. boards
- Treewide rename of Ingenic CGU headers
- Update ST email addresses
- Remove Netlogic DT bindings
- Drop a few more cases of redundant 'maxItems' in schemas
- Convert toshiba,tc358767 bridge binding to schema
* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: watchdog: sunxi: fix error in schema
bindings: media: venus: Drop redundant maxItems for power-domain-names
dt-bindings: Remove Netlogic bindings
clk: versatile: clk-icst: Ensure clock names are unique
of: Support using 'mask' in making device bus id
dt-bindings: treewide: Update @st.com email address to @foss.st.com
dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
dt-bindings: media: Update maintainers for st,stm32-cec.yaml
dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
dt-bindings: timer: Update maintainers for st,stm32-timer
dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
Linus Torvalds [Sun, 14 Nov 2021 18:43:38 +0000 (10:43 -0800)]
Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because
copy_process() copies state information which is only valid for the
parent task"
* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
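A sketch of the shape of this fix (the field name is taken from the
shortlog entry above; the helper name and body are an assumed
reconstruction, not the verbatim patch):

	/* Called from copy_process(): the child must not inherit the
	 * parent's posix_cputimers_work state, which refers to timer
	 * work pending on behalf of the parent only.
	 */
	void clear_posix_cputimers_work(struct task_struct *p)
	{
		memset(&p->posix_cputimers_work, 0,
		       sizeof(p->posix_cputimers_work));
	}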
Linus Torvalds [Sun, 14 Nov 2021 18:38:27 +0000 (10:38 -0800)]
Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem
Core code:
- A regression fix for the Open Firmware interrupt mapping code where
an interrupt controller property in a node caused a map property in
the same node to be ignored.
Interrupt chip drivers:
- Workaround a limitation in SiFive PLIC interrupt chip which
silently ignores an EOI when the interrupt line is masked.
- Provide the missing mask/unmask implementation for the CSKY MP
interrupt controller.
PCI/MSI:
- Prevent a use after free when PCI/MSI interrupts are released by
destroying the sysfs entries before freeing the memory which is
accessed in the sysfs show() function.
- Implement a mask quirk for the Nvidia ION AHCI chip which does not
advertise masking capability despite implementing it. Even worse,
the chip comes out of reset with all MSI entries masked, which, due
to the missing masking capability, never get unmasked.
- Move the check which prevents accessing the MSI[X] masking for XEN
back into the low level accessors. The recent consolidation missed
that these accessors can be invoked from places which do not have
that check which broke XEN. Move them back to the original place
instead of sprinkling tons of these checks all over the code"
* tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
of/irq: Don't ignore interrupt-controller when interrupt-map failed
irqchip/sifive-plic: Fixup EOI failed when masked
irqchip/csky-mpintc: Fixup mask/unmask implementation
PCI/MSI: Destroy sysfs before freeing entries
PCI: Add MSI masking quirk for Nvidia ION AHCI
PCI/MSI: Deal with devices lying about their MSI mask capability
PCI/MSI: Move non-mask check back into low level accessors
Linus Torvalds [Sun, 14 Nov 2021 18:30:17 +0000 (10:30 -0800)]
Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 static call update from Thomas Gleixner:
"A single fix for static calls to make the trampoline patching more
robust by placing explicit signature bytes after the call trampoline
to prevent patching random other jumps like the CFI jump table
entries"
* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
static_call,x86: Robustify trampoline patching
Linus Torvalds [Sun, 14 Nov 2021 17:39:03 +0000 (09:39 -0800)]
Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Avoid touching ~100 config files in order to be able to select the
preemption model
- Clear cluster CPU masks too, on the CPU unplug path
- Prevent use-after-free in CFS
- Prevent a race condition when updating CPU cache domains
- Factor out common shared part of smp_prepare_cpus() into a common
helper which can be called by both baremetal and Xen, in order to fix
booting of Xen PV guests
* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Restore preemption model selection configs
arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
sched/fair: Prevent dead task groups from regaining cfs_rq's
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
x86/smp: Factor out parts of native_smp_prepare_cpus()
Linus Torvalds [Sun, 14 Nov 2021 17:33:12 +0000 (09:33 -0800)]
Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Prevent unintentional page sharing by checking whether a page
reference to a PMU samples page was actually acquired before
releasing it
- Make sure the LBR_SELECT MSR is saved/restored too
- Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
residual data left
* tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Avoid put_page() when GUP fails
perf/x86/vlbr: Add c->flags to vlbr event constraints
perf/x86/lbr: Reset LBR_SELECT during vlbr reset