The brcmnand driver contains a bug: if a page (for example 2KB) is read from the parallel/ONFI NAND and, within that page, a subpage (512 bytes) with correctable errors is followed by a subpage with uncorrectable errors, the page read returns the wrong status of correctable (as opposed to the actual status of uncorrectable).
The bug is in the function brcmnand_read_by_pio, where the check for uncorrectable bits is preempted if a correctable-bits status was already detected.
The fix is to stop checking for bad bits only once we already have a bad-bits (uncorrectable) status.
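A minimal standalone sketch of the intended accumulation (not the literal brcmnand_read_by_pio() code; the enum and function names here are purely illustrative): an uncorrectable subpage must win over any correctable ones, regardless of order.

	#include <stddef.h>

	enum read_status { STATUS_OK, STATUS_CORRECTABLE, STATUS_UNCORRECTABLE };

	/* Accumulate per-subpage results into one page-level status. */
	static enum read_status page_status(const enum read_status *subpage, size_t n)
	{
		enum read_status status = STATUS_OK;

		for (size_t i = 0; i < n; i++) {
			if (subpage[i] == STATUS_UNCORRECTABLE)
				return STATUS_UNCORRECTABLE;	/* worst status wins */
			if (subpage[i] == STATUS_CORRECTABLE)
				status = STATUS_CORRECTABLE;	/* never overrides worse */
		}
		return status;
	}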
While interacting with a NAND chip on an IPQ6018, I found that the qcomsmem NAND
partition parser was returning -EPROBE_DEFER while waiting for the main smem
driver to load.
This caused the board to reset. Playing about with the probe() function
shows that the problem lies in the core clock being switched off before the
nandc_unalloc() routine has completed.
If we look at how qcom_nandc_remove() tears down allocated resources we see
the expected order is
Daniel Gibson reports that the n_tty code gets line termination wrong in
very specific cases:
"If you feed a line with exactly 64 chars + terminating newline, and
directly afterwards (without reading) another line into a pseudo
terminal, the first read() on the other side will return the 64
char line *without* terminating newline, and the next read() will
return the missing terminating newline AND the complete next line (if
it fits in the buffer)"
and bisected the behavior to commit 3b830a9c34d5 ("tty: convert
tty_ldisc_ops 'read()' function to take a kernel pointer").
Now, digging deeper, it turns out that the behavior isn't exactly new:
what changed in commit 3b830a9c34d5 was that the tty line discipline
.read() function is now passed an intermediate kernel buffer rather than
the final user space buffer.
And that intermediate kernel buffer is 64 bytes in size - thus that
special case with exactly 64 bytes plus terminating newline.
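A minimal user-space sketch of the reported scenario (a hypothetical illustration built from the report above, not a verified regression test): write exactly 64 characters plus a newline into a pty, immediately queue a second line, then read from the other side and check whether the first read is newline-terminated.

	#define _XOPEN_SOURCE 600
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[256];
		char line[65];
		int master, slave;
		ssize_t n;

		master = posix_openpt(O_RDWR | O_NOCTTY);
		if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0)
			return 1;
		slave = open(ptsname(master), O_RDWR | O_NOCTTY);
		if (slave < 0)
			return 1;

		memset(line, 'x', 64);
		line[64] = '\n';
		write(master, line, 65);		/* exactly 64 chars + newline */
		write(master, "second line\n", 12);	/* queued before any read   */

		n = read(slave, buf, sizeof(buf));	/* canonical-mode read */
		printf("first read: %zd bytes, newline-terminated: %s\n",
		       n, (n > 0 && buf[n - 1] == '\n') ? "yes" : "no");
		return 0;
	}

Per the report, an affected kernel returns the 64 characters without the newline on the first read; a fixed kernel returns all 65 bytes.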
The same problem did exist before, but historically the boundary was not
the 64-byte chunk, but the user-supplied buffer size, which is obviously
generally bigger (and potentially bigger than N_TTY_BUF_SIZE, which
would hide the issue entirely).
The reason is that the n_tty canon_copy_from_read_buf() code would look
ahead for the EOL character one byte further than it would actually
copy. It would then decide that it had found the terminator, and unmark
it as an EOL character - which in turn explains why the next read
wouldn't then be terminated by it.
Now, the reason it did all this in the first place is related to some
historical and pretty obscure EOF behavior, see commit ac8f3bf8832a
("n_tty: Fix poll() after buffer-limited eof push read") and commit 40d5e0905a03 ("n_tty: Fix EOF push handling").
And the reason for the EOL confusion is that we treat EOF as a special
EOL condition, with the EOL character being NUL (aka "__DISABLED_CHAR"
in the kernel sources).
So that EOF look-ahead also affects the normal EOL handling.
This patch just removes the look-ahead that causes problems, because EOL
is much more critical than the historical "EOF in the middle of a line
that coincides with the end of the buffer" handling ever was.
Now, it is possible that we should indeed re-introduce the "look at next
character to see if it's an EOF" behavior, but if so, that should be done
not at the kernel buffer chunk boundary in canon_copy_from_read_buf(),
but at a higher level, when we run out of the user buffer.
In particular, the place to do that would be at the top of
'n_tty_read()', where we check if it's a continuation of a previously
started read: if it is, and there is no more buffer space left, we
could decide to just eat the __DISABLED_CHAR at that point.
But that would be a separate patch, because I suspect nobody actually
cares, and I'd like to get a report about it before bothering.
Fixes: 3b830a9c34d5 ("tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer") Fixes: ac8f3bf8832a ("n_tty: Fix poll() after buffer-limited eof push read") Fixes: 40d5e0905a03 ("n_tty: Fix EOF push handling") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215611 Reported-and-tested-by: Daniel Gibson <metalcaedes@gmail.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.
Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory
fails) [1], part of Linux since 5.17-rc2, introduced a regression, where
a symbolic link on an NFS mount to a directory on another NFS does not
resolve(?) the first time it is accessed:
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
wbt is now disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when switching the elevator to bfq, and when we
remove the scsi device, wbt is re-enabled by wbt_enable_default().
This can produce a false positive between wbt_wait() and wbt_track()
when submitting a write request.
The following is the scenario that triggered the problem.
T1                              T2                              T3

elevator_switch_mq
  bfq_init_queue
    wbt_disable_default
    <= Set rwb->enable_state (OFF)
                                submit_bio
                                  blk_mq_make_request
                                    rq_qos_throttle
                                    <= rwb->enable_state (OFF)
                                                                scsi_remove_device
                                                                  sd_remove
                                                                    del_gendisk
                                                                      blk_unregister_queue
                                                                        elv_unregister_queue
                                                                          wbt_enable_default
                                                                          <= Set rwb->enable_state (ON)
                                    rq_qos_track
                                    <= rwb->enable_state (ON)
This request is marked WBT_TRACKED without the corresponding inflight add, which
lets rqw->inflight drop to -1 in wbt_done() and triggers an IO hang.
Fix this by moving wbt_enable_default() from elv_unregister to
bfq_exit_queue(), so wbt is only re-enabled when bfq exits.
Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Also remove a one-line stale comment and kill a single-use local variable.
Signed-off-by: Ming Lei <ming.lei@rehdat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/ Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Always signal that emulation is possible for !SEV guests regardless of
whether or not the CPU provided a valid instruction byte stream. KVM can
read all guest state (memory and registers) for !SEV guests, i.e. can
fetch the code stream from memory even if the CPU failed to do so because
of the SMAP errata.
Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[jwang: adjust context for kernel 5.10.101] Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the '#ifdef CONFIG_PPC64' around the 'ptesync' in the function
'emulate_update_regs()', like it is in 'analyse_instr()', since it looks like
it got dropped inadvertently by commit 3cdfcbfd32b9 ("powerpc: Change
analyse_instr so it doesn't modify *regs").
A key detail is that analyse_instr() will never recognise lwsync or
ptesync on 32-bit (because of the existing ifdef), and as a result
emulate_update_regs() should never be called with an op specifying
either of those on 32-bit. So removing them from emulate_update_regs()
should be a nop in terms of runtime behaviour.
Fixes: 3cdfcbfd32b9 ("powerpc: Change analyse_instr so it doesn't modify *regs") Cc: stable@vger.kernel.org # v4.14+ Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[mpe: Add last paragraph of change log mentioning analyse_instr() details] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220211005113.1361436-1-anders.roxell@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When writing out a stereo control we discard the change notification from
the first channel, meaning that events are only generated based on changes
to the second channel. Ensure that we report a change if either channel
has changed.
When writing out a stereo control we discard the change notification from
the first channel, meaning that events are only generated based on changes
to the second channel. Ensure that we report a change if either channel
has changed.
For some unknown reason, the BIOS on the Shenker Dock 15 doesn't set up the
codec mask properly for the onboard audio. Let's set the forced codec
mask to enable the codec discovery.
The forced probe mask via probe_mask 0x100 bit doesn't work any longer
as expected since the bus init code was moved and it's clearing the
codec_mask value that was set beforehand. This patch fixes the
long-time regression by moving the check_probe_mask() call.
The recently introduced coef_mutex for Realtek codecs seems to cause a
deadlock when the relevant code is invoked from the power-off state;
the HD-audio core then tries to power up internally, and this kicks
off the codec runtime PM code that tries to take the same coef_mutex.
In order to avoid the deadlock, do the temporary power up/down around
the coef_mutex acquisition and release. This assures that the
power-up sequence runs before the mutex, hence no re-entrance will
happen.
The Legion Y9000X 2019 has the same speaker as the Y9000X 2020,
but with a different quirk address. Add one quirk entry
to make the speaker work on the Y9000X 2019 too.
The 'non-regular' test file needs to be compiled and then copied to the output
directory. Remove it from TEST_PROGS and add it to TEST_GEN_PROGS. This
removes the error thrown by rsync when the non-regular object isn't found:
rsync: [sender] link_stat "/linux/tools/testing/selftests/exec/non-regular" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
This was detected by the gcc in Fedora Rawhide:
50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
1225 | *key_scan_pos += strlen(map_opt);
| ^~~~~~~~~~~~~~~
util/bpf-loader.c:1223:9: note: call to 'free' here
1223 | free(map_name);
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
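A hedged standalone sketch of that pattern (names borrowed from the warning above, not the literal util/bpf-loader.c hunk): take the length while the buffer is still allocated, and only then free it.

	#include <stdlib.h>
	#include <string.h>

	/* map_opt points into the map_name allocation, so it must be used
	 * before free(map_name), not after. */
	static void update_scan_pos(size_t *key_scan_pos, char *map_name,
				    const char *map_opt)
	{
		*key_scan_pos += strlen(map_opt);	/* use while still allocated */
		free(map_name);				/* now it is safe to free    */
	}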
Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1588 Single Step Timestamping code path uses a mutex to
enforce atomicity for two events:
- update of ptp single step register
- transmit ptp event packet
Before this patch the mutex was not initialized. This
caused unexpected crashes in the Tx function.
Fixes: c55211892f463 ("dpaa2-eth: support PTP Sync packet one-step timestamping") Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 12 correctly reports a potential use-after-free condition in the
xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)"
when size == 0:
In file included from help.c:12:
In function 'xrealloc',
inlined from 'add_cmdname' at help.c:24:2: subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free]
56 | ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free]
58 | ret = realloc(ptr, 1);
| ^~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
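A hedged sketch of the idea behind the fix (not the literal tools/lib/subcmd/subcmd-util.h code): never pass size == 0 to realloc(), because that may implicitly free ptr, and never touch ptr again after a realloc() call that could have freed it.

	#include <stdio.h>
	#include <stdlib.h>

	static void *xrealloc(void *ptr, size_t size)
	{
		/* Clamp size to at least 1 so realloc() never acts as free(). */
		void *ret = realloc(ptr, size ? size : 1);

		if (!ret) {
			fprintf(stderr, "xrealloc: out of memory\n");
			exit(1);
		}
		return ret;	/* ptr itself is never reused after realloc() */
	}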
syzbot reported that two threads might write over agg_select_timer
at the same time. Make agg_select_timer atomic to fix the races.
BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G W 5.17.0-rc4-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
trace_napi_poll_hit() is reading stat->dev while another thread can write
on it from dropmon_net_event()
Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already,
we only have to take care of load/store tearing.
BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
notifier_call_chain kernel/notifier.c:84 [inline]
raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
call_netdevice_notifiers net/core/dev.c:1945 [inline]
unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
ops_exit_list net/core/net_namespace.c:173 [inline]
cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
trace_napi_poll include/trace/events/napi.h:14 [inline]
__napi_poll+0x36b/0x3f0 net/core/dev.c:6366
napi_poll net/core/dev.c:6432 [inline]
net_rx_action+0x29e/0x650 net/core/dev.c:6519
__do_softirq+0x158/0x2de kernel/softirq.c:558
do_softirq+0xb1/0xf0 kernel/softirq.c:459
__local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:394 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0xffff88815883e000 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
Fixes: 4ea7e38696c7 ("dropmon: add ability to detect when hardware dropsrxpackets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression when matching sk_bound_dev_if against dif:
the RAW socket uses inet_iif() while the PING socket lookup uses
skb->dev->ifindex, so the command below fails because of this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
https://github.com/iputils/iputils/issues/104
But it was fixed in iputils in the wrong way, by not binding to the device
when the destination IP is on the device, and that will cause some kselftests
to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These periods are expressed in time units (microseconds), while 40 and 12
are the number of symbol durations these periods last. We need to
multiply them both by the symbol_duration in order to get these
values in microseconds.
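A hedged two-line sketch of what that means (assuming the usual wpan_phy field names; not necessarily the exact hunk):

	/* 40 and 12 are symbol counts, so scale them by the symbol duration
	 * to obtain the LIFS/SIFS periods in microseconds. */
	wpan_phy->lifs_period = 40 * wpan_phy->symbol_duration;
	wpan_phy->sifs_period = 12 * wpan_phy->symbol_duration;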
The reset input to the LAN9303 chip is active low, and devicetree
gpio handles reflect this. Therefore, the gpio should be requested
with an initial state of high in order for the reset signal to be
asserted. Other uses of the gpio already use the correct polarity.
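A hedged fragment showing the pattern (identifiers such as chip->reset_gpio and the "reset" con_id are illustrative, not the exact lan9303 code): the devicetree gpio handle already carries the active-low polarity, so requesting a logical "high" initial state asserts the reset line during setup.

	/* Request the reset line already asserted (logical high = physically low). */
	chip->reset_gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
	if (IS_ERR(chip->reset_gpio))
		return PTR_ERR(chip->reset_gpio);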
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fianelil <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220209145454.19749-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IPv6 flowlabels historically require a reservation before use,
optionally in exclusive mode (e.g., user-private).
Commit 59c820b2317f ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
Changes in v2:
- wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
When the nft_concat_range test fails, the code explicitly exits with 1.
But when part of, or all of, the test passes, it fails the
[ ${passed} -eq 0 ] check and thus also exits with 1, which is the same
exit value as a failure. Fix it by exiting 0 when passed is not 0.
Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.
Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
Note for backporting: this patch requires d5afa82c977e ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.
The mapping from enum port to whatever port numbering scheme is used by
the SWSCI Display Power State Notification is odd, and the memory of it
has faded. In any case, the parameter only has space for ports numbered
[0..4], and UBSAN reports bit shift beyond it when the platform has port
F or more.
Since the SWSCI functionality is supposed to be obsolete for new
platforms (i.e. ones that might have port F or more), just bail out
early if the mapped and mangled port number is beyond what the Display
Power State Notification can support.
Fixes: 9c4b0a683193 ("drm/i915: add opregion function to notify bios of encoder enable/disable") Cc: <stable@vger.kernel.org> # v3.13+ Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4800 Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/cc363f42d6b5a5932b6d218fefcc8bdfb15dbbe5.1644489329.git.jani.nikula@intel.com
(cherry picked from commit 24a644ebbfd3b13cda702f98907f9dd123e34bf9) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If no firmware was present at all (or, presumably, all of the
firmware files failed to parse), we end up unbinding by calling
device_release_driver(), which calls remove(), which then in
iwlwifi calls iwl_drv_stop(), freeing the 'drv' struct. However
the new code I added will still erroneously access it after it
was freed.
Set 'failure=false' in this case to avoid the access, all data
was already freed anyway.
Cc: stable@vger.kernel.org Reported-by: Stefan Agner <stefan@agner.ch> Reported-by: Wolfgang Walter <linux@stwm.de> Reported-by: Jason Self <jason@bluehome.net> Reported-by: Dominik Behr <dominik@dominikbehr.com> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: ab07506b0454 ("iwlwifi: fix leaks/bad data after failed firmware load") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220208114728.e6b514cf4c85.Iffb575ca2a623d7859b542c33b2a507d01554251@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge module sections only when using Clang LTO. With ld.bfd, merging
sections does not appear to update the symbol tables for the module,
e.g. 'readelf -s' shows the value that a symbol would have had, if
sections were not merged. ld.lld does not show this problem.
The stale symbol table breaks gdb's function disassembler, and presumably
other things, e.g.
LLD always splits sections with LTO, which increases module sizes. This
change adds linker script rules to merge the split sections in the final
module.
Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201211184633.3213045-6-samitolvanen@google.com Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While the stackleak plugin was already using notrace, objtool is now a
bit more picky. Update the notrace uses to noinstr. Silences the
following objtool warnings when building with:
vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.
Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We need to finish all async code before the module init sequence is
done. In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule(). Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked. This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().
For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:
We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread. As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.
The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)
Commit 21c3c5d28007 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e862
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.
Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.
Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.
This started out with me noticing that "dom0_max_vcpus=<N>" with <N>
larger than the number of physical CPUs reported through ACPI tables
would not bring up the "excess" vCPU-s. Addressing this is the primary
purpose of the change; CPU maps handling is being tidied only as far as
is necessary for the change here (with the effect of also avoiding the
setting up of too much per-CPU infrastructure, i.e. for CPUs which can
never come online).
Noticing that xen_fill_possible_map() is called way too early, whereas
xen_filter_cpu_maps() is called too late (after per-CPU areas were
already set up), and further observing that each of the functions serves
only one of Dom0 or DomU, it looked like it was better to simplify this.
Use the .get_smp_config hook instead, uniformly for Dom0 and DomU.
xen_fill_possible_map() can be dropped altogether, while
xen_filter_cpu_maps() is re-purposed but not otherwise changed.
We probably never trigger this, but the logic inside the check is
inverted.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While nvme_rdma_submit_async_event_work is checking the ctrl and queue
state before preparing the AER command and scheduling io_work, in order
to fully prevent a race where this check is not reliable the error
recovery work must flush async_event_work before continuing to destroy
the admin queue after setting the ctrl state to RESETTING such that
there is no race between .submit_async_event and the error recovery handler
itself changing the ctrl state.
While nvme_tcp_submit_async_event_work is checking the ctrl and queue
state before preparing the AER command and scheduling io_work, in order
to fully prevent a race where this check is not reliable the error
recovery work must flush async_event_work before continuing to destroy
the admin queue after setting the ctrl state to RESETTING such that
there is no race between .submit_async_event and the error recovery handler
itself changing the ctrl state.
Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
readiness for AER submission. This may lead to a use-after-free
condition that was observed with nvme-tcp.
The race condition may happen in the following scenario:
1. driver executes its reset_ctrl_work
2. -> nvme_stop_ctrl - flushes ctrl async_event_work
3. ctrl sends AEN which is received by the host, which in turn
schedules AEN handling
4. teardown admin queue (which releases the queue socket)
5. AEN processed, submits another AER, calling the driver to submit
6. driver attempts to send the cmd
==> use-after-free
In order to fix that, add ctrl state check to validate the ctrl
is actually able to accept the AER submission.
This addresses the above race in controller resets because the driver
during teardown should:
1. change ctrl state to RESETTING
2. flush async_event_work (as well as other async work elements)
So after 1,2, any other AER command will find the
ctrl state to be RESETTING and bail out without submitting the AER.
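A hedged fragment of the kind of check described (assuming the generic ctrl->state / NVME_CTRL_LIVE names; not the exact driver hunk):

	/* Inside the driver's .submit_async_event path (sketch): drop the AER
	 * unless the controller is live, e.g. while a reset is tearing down
	 * the admin queue. */
	if (ctrl->state != NVME_CTRL_LIVE)
		return;

	/* ... build and send the AER command as before ... */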
Currently a use-after-free may occur if a sas_task is aborted by the upper
layer before we handle the I/O completion in mpi_ssp_completion() or
mpi_sata_completion().
In this case, the following are the two steps in handling those I/O
completions:
- Call complete() to inform the upper layer handler of completion of
the I/O.
- Release driver resources associated with the sas_task in
pm8001_ccb_task_free() call.
When complete() is called, the upper layer may free the sas_task. As such,
we should not touch the associated sas_task afterwards, but we do so in the
pm8001_ccb_task_free() call.
Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering.
Link: https://lore.kernel.org/r/1643289172-165636-4-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently a use-after-free may occur if a TMF sas_task is aborted before we
handle the IO completion in mpi_ssp_completion(). The abort occurs due to
timeout.
When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the
sas_task is freed in pm8001_exec_internal_tmf_task().
However, if the I/O completion occurs later, the I/O completion still
thinks that the sas_task is available. Fix this by clearing the ccb->task
if the TMF times out - the I/O completion handler does nothing if this
pointer is cleared.
Link: https://lore.kernel.org/r/1643289172-165636-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The previous commit 1ade48d0c27d ("ax25: NPD bug when detaching
AX25 device") introduced lock_sock() into ax25_kill_by_device to
prevent the NPD bug. But a concurrent NPD or UAF bug can still occur
when lock_sock() or release_sock() dereferences ax25_cb->sock.
The NULL pointer dereference bug can be shown as below:
The root cause is that the sock is set to null before dereference
site (1) or (2). Therefore, this patch extracts the ax25_cb->sock
in advance, and uses ax25_list_lock to protect it, which can synchronize
with ax25_cb_del() and ensure the value of sock is not null before
dereference sites.
The root cause is that the sock is released before dereference
site (1) or (2). Therefore, this patch uses sock_hold() to increase
the refcount of sock and uses ax25_list_lock to protect it, which
can synchronize with ax25_cb_del() in ax25_destroy_socket() and
ensure the sock will not be released before the dereference sites.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Report mincore.check_file_mmap as SKIP instead of FAIL if the underlying
filesystem lacks support of O_TMPFILE or fallocate since such failures
are not really related to mincore functionality.
If the zram-generator package is installed and working, then we cannot remove
the zram module because zram swap is being used. This test needs a clean zram
environment, so change it to use the hot_add/hot_remove interface. Then, even
if a zram device is already in use, we can still add zram devices and remove
them in cleanup.
The two interfaces were introduced by kernel commit 6566d1a32bf7 ("zram:
add dynamic device add/remove functionality") in v4.2-rc1. If the kernel
supports these two interfaces, we use hot_add/hot_remove to solve this
problem; if not, just check whether zram is being used or built in, and
skip the test on old kernels.
zram01 uses `free -m` to measure zram memory usage. The results are
meaningless because they are polluted by all running processes on the system.
We should only calculate the free memory delta for the current process.
So use the third field of /sys/block/zram<id>/mm_stat to measure memory
usage instead. The file has been available since kernel 4.1.
orig_data_size(first): uncompressed size of data stored in this disk.
compr_data_size(second): compressed size of data stored in this disk
mem_used_total(third): the amount of memory allocated for this disk
Also remove the useless zram cleanup call in zram_fill_fs so we don't
need to clean up zram twice if it fails.
Since commit 43209ea2d17a ("zram: remove max_comp_streams internals"), zram
has switched to per-cpu streams. The kernel still keeps this interface for
some reason, but writing to max_comp_stream doesn't take any effect. So
skip it on newer kernels, i.e. 4.7 and later.
The code comparing kernel versions is from the xfstests testsuite, ext4/053.
Upon error the ieee802154_xmit_complete() helper is not called. Only
ieee802154_wake_queue() is called manually. In the Tx case we then leak
the skb structure.
Free the skb structure upon error before returning when appropriate.
As the 'is_tx = 0' cannot be moved in the complete handler because of a
possible race between the delay in switching to STATE_RX_AACK_ON and a
new interrupt, we introduce an intermediate 'was_tx' boolean just for
this purpose.
There is no Fixes tag applying here; many changes have been made in this
area and the issue has, in one form or another, always existed.
Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220125121426.848337-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have many cases that create child processes as well, such as
pidfd_wait. Previously, we would signal/kill the parent process when it
timed out, but this signal would not be sent to its child processes. In
such cases, if a child process doesn't terminate itself, the kselftest
framework will hang forever.
Here we group all of its child processes so that kill() can signal all
of them on timeout.
The timeout setting for the rtc kselftest is currently 90 seconds. This
setting is used by the kselftest runner to stop running a test if it
takes longer than the assigned value.
However, two of the test cases inside rtc set alarms. These alarms are
set to the next beginning of the minute, so each of these test cases may
take up to, in the worst case, 60 seconds.
In order to allow for all test cases in rtc to run, even in the worst
case, when using the kselftest runner, the timeout value should be
increased to at least 120. Set it to 180, so there's some additional
slack.
Correct operation can be tested by running the following command right
after the start of a minute (low second count), and checking that all
test cases run:
./run_kselftest.sh -c rtc
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The call sequence suggested that the misc_device /dev file can be opened
before the misc device has even been registered, which is done only once.
Here punit_misc_dev_lock was used as a common lock to protect the
registration by multiple ISST HW drivers, the one-time setup, prevention of
duplicate misc device registration, and load/unload while the device
is open.
We can split this into two locks:
- One which just prevents duplicate calls to misc_register() and the
one-time setup. It also ensures misc_register() is never called again if
it, or the required one-time setup, previously failed. This lock is not
shared with any misc device callbacks.
- The other lock protects registration, load and unload of the HW drivers.
Sequence in isst_if_cdev_register():
- Register callbacks under punit_misc_dev_open_lock
- Call isst_misc_reg(), which registers the misc_device on the first
registration; this is done under punit_misc_dev_reg_lock, which is not
shared with the callbacks.
The sequence in isst_if_cdev_unregister() is just the opposite of
isst_if_cdev_register().
Reported-and-tested-by: Liwei Song <liwei.song@windriver.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20220112022521.54669-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add touchscreen info for RWC NANOTE P8 (AY07J) 2-in-1.
Signed-off-by: Yuka Kawajiri <yukx00@gmail.com> Link: https://lore.kernel.org/r/20220111154019.4599-1-yukx00@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, if we get an IO error while doing a send, we abort without
logging information about which file caused the issue. So log it to help
with debugging.
CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Dāvis Mosāns <davispuh@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's a followup to the previous commit f15309d7ad5d ("parisc: Add
ioread64_hi_lo() and iowrite64_hi_lo()"), which does only half of
the job. Add the rest, so we won't get new kernel test robot
reports.
Fixes: f15309d7ad5d ("parisc: Add ioread64_hi_lo() and iowrite64_hi_lo()") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the kernel boots with a NUMA topology where some NUMA nodes are offline,
the PCI driver should only set an online NUMA node on the device. This can
happen during KDUMP, where some NUMA nodes are not brought online by the
KDUMP kernel.
This patch also fixes the case where the kernel is booting with "numa=off".
Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2") Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@microsoft.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/1643247814-15184-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:
"All the details are in the bug, but the bottom line is that somehow,
this patch causes corruption when the numa balancing feature is
enabled AND we don't use process affinity AND we use GUP to pin pages
so our accelerator can DMA to/from system memory.
Either disabling numa balancing, using process affinity to bind to
specific numa-node or reverting this patch causes the bug to
disappear"
and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").
Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW. But it appears it
does. Suspicious.
However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical. It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.
The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings might be
paged out and the only reference to them would be in the page count.
Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information). Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.
The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.
On reads with MMC_READ_MULTIPLE_BLOCK that fail,
the recovery handler will use MMC_READ_SINGLE_BLOCK for
each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
The logic for this is fixed to never report unsuccessful reads
as success to the block layer.
On a command error with retries remaining, blk_update_request() was
called with whatever value 'error' was last set to.
If it was last set to BLK_STS_OK (the default), the read would be
reported as a success even though no data was read from the device.
This could happen on a CRC mismatch for the response, or on the card
rejecting the command (e.g. again due to a CRC mismatch).
If it was last set to BLK_STS_IOERR, the error is reported correctly,
but no retries will be attempted.
The bug is caused by overrunning the sglist and incorrectly testing
sg_dma_len(sglist) before nents. Normally this doesn't cause a crash,
but in this case sglist crossed a page boundary. This occurs in the
following code:
while (sg_dma_len(sglist) && nents--) {
The fix is simply to test nents first and move the decrement of nents
into the loop.
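A hedged sketch of the corrected loop shape (the body is elided; this is not the literal driver hunk):

	/* Test nents before touching the entry, and only consume an entry
	 * once it has actually been processed. */
	while (nents && sg_dma_len(sglist)) {
		/* ... process this scatterlist entry ... */
		nents--;
		++sglist;
	}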
Reported-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a build error when using a kernel .config file from
'kernel test robot' for a different build problem:
hppa64-linux-ld: drivers/tty/serial/8250/8250_gsc.o: in function `.LC3':
(.data.rel.ro+0x18): undefined reference to `iosapic_serial_irq'
when:
CONFIG_GSC=y
CONFIG_SERIO_GSCPS2=y
CONFIG_SERIAL_8250_GSC=y
CONFIG_PCI is not set
and hence PCI_LBA is not set.
IOSAPIC depends on PCI_LBA, so IOSAPIC is not set/enabled.
Make the use of iosapic_serial_irq() conditional to fix the build error.
Remove the WARN on an "AVIC IPI invalid target" exit; the WARN is trivial
to trigger from the guest, as it will fail on any destination APIC ID that
doesn't exist from the guest's perspective.
Don't bother recording anything in the kernel log, the common tracepoint
for kvm_avic_incomplete_ipi() is sufficient for debugging.
This patch adds support for the UGTABLET WP5540 digitizer tablet
devices. Without it, the pen moves the cursor, but neither the
buttons nor the tap sensor in the tip do work.
Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those requested by the driver itself.
After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.
Correct this by having the thread that sets SLI_ACTIVE spot an active mailbox
command and allow it to complete before proceeding.
Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP
socket is switched into functional addressing mode, where only single frame
(SF) protocol data units can be send on the specified CAN interface and the
given tp.tx_id after bind().
In contrast to normal and extended addressing, this socket does not register a
CAN-ID for reception, which would be needed for a 1-to-1 ISOTP connection with a
segmented bi-directional data transfer.
Sending SFs on this socket is therefore a TX-only 'broadcast' operation.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Thomas Wagner <thwa1@web.de> Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A race condition was found in isotp_setsockopt() which allows to
change socket options after the socket was bound.
For the specific case of SF_BROADCAST support, this might lead to possible
use-after-free because can_rx_unregister() is not called.
Checking for the flag under the socket lock in isotp_bind() and taking
the lock in isotp_setsockopt() fixes the issue.
Fixes: 921ca574cd38 ("can: isotp: add SF_BROADCAST support for functional addressing") Link: https://lore.kernel.org/r/trinity-e6ae9efa-9afb-4326-84c0-f3609b9b8168-1620773528307@3c-app-gmx-bs06 Reported-by: Norbert Slusarek <nslusarek@gmx.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The reproducer was trying to read /proc/$PID/smaps while calling
MADV_FREE at the same time. MADV_FREE may split THPs if it is called
for a partial THP. It may trigger the race below:
CPU A                                 CPU B
-----                                 -----
smaps walk:
  page_mapcount()
    PageCompound()
                                      MADV_FREE:
                                        split_huge_page()
                                          page = compound_head(page)
    PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevating the refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems like overkill because the race can only happen when the PMD is
split, so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1, as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4] Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()] Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com Acked-by: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed. It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably. Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.
That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with). And as a result it
shows the extra "re-check fd" cost as a sore thumb.
Happily, the regression is fixable by just writing the code to look up
the fd to be better and clearer. There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.
[ Side note: this patch is also a classic case of one that looks very
messy with the default greedy Myers diff - it's much more legible with
either the patience of histogram diff algorithm ]
An outgoing CPU is marked offline in a stop-machine handler and most
of that CPU's services stop at that point, including IRQ work queues.
However, that CPU must take another pass through the scheduler and through
a number of CPU-hotplug notifiers, many of which contain RCU readers.
In the past, these readers were not a problem because the outgoing CPU
has interrupts disabled, so that rcu_read_unlock_special() would not
be invoked, and thus RCU would never attempt to queue IRQ work on the
outgoing CPU.
This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD
Kconfig option, in which rcu_read_unlock_special() is invoked upon exit
from almost all RCU read-side critical sections. Worse yet, because
interrupts are disabled, rcu_read_unlock_special() cannot immediately
report a quiescent state and will therefore attempt to defer this
reporting, for example, by queueing IRQ work. Which fails with a splat
because the CPU is already marked as being offline.
But it turns out that there is no need to report this quiescent state
because rcu_report_dead() will do this job shortly after the outgoing
CPU makes its final dive into the idle loop. This commit therefore
makes rcu_read_unlock_special() refrain from queuing IRQ work onto
outgoing CPUs.
Fixes: 44bad5b3cca2 ("rcu: Do full report for .need_qs for strict GPs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock: 00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example, a slab allocation from __send_signal() caused a refill
and drain of a percpu objcg stock, which resulted in the release of
another, unrelated objcg. The objcg release path requires taking
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighand lock, which the
freezer code takes while holding css_set_lock (to freeze a task).
In general, using css_set_lock to synchronize objcg lists makes any slab
allocation or deallocation performed with css_set_lock (or any
intervening lock) held risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
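A minimal sketch of the approach (the lock name and list member follow
the memcg code, but the body is an illustration, not the actual diff):

  /* Dedicated lock for the objcg lists, so releasing an objcg no
   * longer nests under css_set_lock. */
  static DEFINE_SPINLOCK(objcg_lock);

  static void objcg_list_del_sketch(struct obj_cgroup *objcg)
  {
  	unsigned long flags;

  	spin_lock_irqsave(&objcg_lock, flags);	/* was css_set_lock */
  	list_del(&objcg->list);
  	spin_unlock_irqrestore(&objcg_lock, flags);
  }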
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KASAN has reported the following use-after-free on dev->iommu.
When a device probe fails and dev->iommu is in the process of being
freed in the dev_iommu_free() function, a deferred_probe_work_func runs
in parallel and tries to access dev->iommu->fwspec in the
of_iommu_configure() path, causing a use-after-free.
BUG: KASAN: use-after-free in of_iommu_configure+0xb4/0x4a4
Read of size 8 at addr ffffff87a2f1acb8 by task kworker/u16:2/153
There's list corruption on cgrp_cpuctx_list. This happens on the
following path:
perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
cpu_ctx_sched_in
ctx_sched_in
ctx_pinned_sched_in
merge_sched_in
perf_cgroup_event_disable: remove the event from the list
Use list_for_each_entry_safe() to allow removing an entry during
iteration.
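A minimal illustration of the difference (the local variable names are
illustrative; the per-CPU list and member names follow the perf code):

  struct perf_cpu_context *cpuctx, *tmp;

  /* The _safe variant caches the next entry up front, so the current
   * entry may be unlinked (here, via perf_cgroup_event_disable()
   * deeper in the call chain) without corrupting the traversal. */
  list_for_each_entry_safe(cpuctx, tmp, this_cpu_ptr(&cgrp_cpuctx_list),
  			 cgrp_cpuctx_entry) {
  	/* cpu_ctx_sched_in() may end up removing cpuctx from the list */
  }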
Fixes: 058fe1c0440e ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The port node does not need a unit-address; remove it.
This fixes the warnings:
lcd-controller@30320000: 'port' is a required property
lcd-controller@30320000: 'port@0' does not match any of the regexes:
'pinctrl-[0-9]+'
Messages around firmware download were incorrectly tagged as being related
to discovery trace events. Thus, firmware download status ended up dumping
the trace log as well as the firmware update message. As there were a
couple of log messages in this state, the trace log was dumped multiple
times.
Resolve this by converting from trace events to SLI events.
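A hedged sketch of what such a conversion looks like (the log level,
message number and text are illustrative, not taken from the driver):

  /* Before: tagging the message as a trace event dumps the trace log. */
  lpfc_printf_log(phba, KERN_INFO, LOG_TRACE_EVENT,
  		"XXXX Firmware update in progress\n");

  /* After: tag it as an SLI event instead. */
  lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
  		"XXXX Firmware update in progress\n");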
Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver is initiating NVMe PRLIs to determine device NVMe support. This
should not be occurring if CONFIG_NVME_FC support is disabled.
Correct this by changing the default value for FC4 support. Currently it
defaults to FCP and NVMe. With this change, when NVME_FC support is not enabled
in the kernel, the default value is just FCP.
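A hedged sketch of the kind of default involved (the macro names are
assumptions based on the existing FC4 type flags, not a copy of the
patch):

  /* Default FC4 support: only advertise NVMe when NVMe-over-FC is built. */
  #if IS_ENABLED(CONFIG_NVME_FC)
  #define LPFC_DEF_ENBL_FC4_TYPE	(LPFC_ENABLE_FCP | LPFC_ENABLE_NVME)
  #else
  #define LPFC_DEF_ENBL_FC4_TYPE	LPFC_ENABLE_FCP
  #endif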
Link: https://lore.kernel.org/r/20220207180516.73052-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 43a08c3bdac4 ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent
access in isotp_sendmsg()") introduced a new locking scheme that may leave
the userspace application in a locked state when an error is detected.
This issue shows up under high load on simultaneously running isotp channels
with identical configuration, which is against the ISO specification and
therefore breaks any reasonable PDU communication anyway.
Fixes: 43a08c3bdac4 ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()")
Link: https://lore.kernel.org/all/20220209073601.25728-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-Wunaligned-access is a new warning in clang that is default enabled for
arm and arm64 under certain circumstances within the clang frontend (see
LLVM commit below). On v5.17-rc2, an ARCH=arm allmodconfig build shows
1284 total/70 unique instances of this warning (most of the instances
are in header files), which is quite noisy.
To keep a normal build green through CONFIG_WERROR, only show this
warning with W=1. This still allows automated build systems to catch new
instances of the warning so that the total number can eventually be
driven down to zero, since catching unaligned accesses at compile time
is generally useful.
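For reference, a hedged example of the kind of code that trips the
warning (a packed struct whose member is less aligned than its type;
the names are made up):

  struct __attribute__((packed)) pkt {
  	unsigned char type;
  	unsigned int  len;	/* 4-byte field at offset 1 */
  };

  unsigned int pkt_len(const struct pkt *p)
  {
  	/* clang warns that 'len' is less aligned than 'unsigned int'
  	 * because 'pkt' is packed, so this access may be unaligned. */
  	return p->len;
  }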
When setting the fan speed, i8k_set_fan() calls i8k_get_fan_status(),
causing an unnecessary SMM call: of the two users of this function,
only i8k_ioctl_unlocked() needs to know the new fan status, while
dell_smm_write() ignores it.
Since SMM calls can be very slow while also making error reporting
difficult for dell_smm_write(), remove the function call from
i8k_set_fan() and call it separately in i8k_ioctl_unlocked().
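A hedged sketch of the shape of the change (signatures and helpers are
approximated from the driver, not copied from the patch):

  /* i8k_set_fan() now only sets the speed ... */
  static int i8k_set_fan(const struct dell_smm_data *data, u8 fan, int speed)
  {
  	struct smm_regs regs = { .eax = I8K_SMM_SET_FAN, };

  	regs.ebx = fan | (speed << 8);
  	return i8k_smm(&regs);	/* no trailing i8k_get_fan_status() */
  }

  /* ... and i8k_ioctl_unlocked() queries the status itself when needed:
   *
   *	err = i8k_set_fan(data, fan, speed);
   *	if (err < 0)
   *		return err;
   *	err = i8k_get_fan_status(data, fan);
   */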
_get_table_maxdiv() tries to access the "clk_div_table" array, defined in
phy-j721e-wiz.c, out of bounds. Add a sentinel entry to prevent the
following global-out-of-bounds error reported when KASAN is enabled.
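A hedged sketch of the fix (the table contents are assumed; the point is
the zero-filled sentinel entry that _get_table_maxdiv() stops at):

  static const struct clk_div_table clk_div_table[] = {
  	{ .val = 0, .div = 1 },
  	{ .val = 1, .div = 2 },
  	{ .val = 2, .div = 4 },
  	{ .val = 3, .div = 8 },
  	{ /* sentinel */ },
  };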
Commit d97a9d7aea04 ("staging/speakup: Add inflection synth parameter")
introduced the inflection parameter but inadvertently dropped the pitch
parameter from the dectlk driver. This restores it.
Programmable lab power supplies made by GW Instek, such as the
GPP-2323, have a USB port exposing a serial port to control the device.
Running strings on the supplied Windows driver turns up references to the
ch341 chip. Binding the existing ch341 driver to the VID/PID of the GPP-2323
("GW Instek USB2.0-Serial" per the USB product name) works out of the
box; communication and control are now possible.
This patch should work with any GPP series power supply due to
similarities in the product line.
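A hedged sketch of how such a device gets picked up by the driver (the
VID/PID macros are placeholders, not the real values):

  #define GWINSTEK_VENDOR_ID	0x0000	/* placeholder */
  #define GPP2323_PRODUCT_ID	0x0000	/* placeholder */

  static const struct usb_device_id id_table[] = {
  	/* ... existing ch341 entries ... */
  	{ USB_DEVICE(GWINSTEK_VENDOR_ID, GPP2323_PRODUCT_ID) },
  	{ }	/* terminating entry */
  };
  MODULE_DEVICE_TABLE(usb, id_table);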