The ->srcu_last_gp_end field is accessed from any CPU at any time
by synchronize_srcu(), so non-initialization references need to use
READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change.
Reported-by: syzbot+08f3e9d26e5541e1ecf2@syzkaller.appspotmail.com Acked-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1:
rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline]
rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546
rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611
rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com>
[ paulmck: Added another READ_ONCE() for RCU CPU stall warnings. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The rcu_node structure's ->expmask field is accessed locklessly when
starting a new expedited grace period and when reporting an expedited
RCU CPU stall warning. This commit therefore handles the former by
taking a snapshot of ->expmask while the lock is held and the latter
by applying READ_ONCE() to lockless reads and WRITE_ONCE() to the
corresponding updates.
Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).
[ May be the fix for this bug ] Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com Cc: stable@vger.kernel.org CC: Steven Rostedt (VMware) <rostedt@goodmis.org> CC: Joel Fernandes (Google) <joel@joelfernandes.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Paul E. McKenney <paulmck@linux.ibm.com> Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Test case 66 'Use vfs_getname probe to get syscall args filenames'
> is broken on s390, but works on x86. The test case fails with:
>
> [root@m35lp76 perf]# perf test -F 66
> 66: Use vfs_getname probe to get syscall args filenames
> :Recording open file:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.004 MB /tmp/__perf_test.perf.data.TCdYj\
> (20 samples) ]
> Looking at perf.data file for vfs_getname records for the file we touched:
> FAILED!
> [root@m35lp76 perf]#
The root cause was the print_fmt of the kprobe event that referenced the
"ustring"
A use of uninitialized memory in msgctl_down() because msqid64 in
ksys_msgctl hasn't been initialized. The local | msqid64 | is created in
ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64
is never initialized before msgctl_down() checks msqid64->msg_qbytes.
KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
reports:
==================================================================
BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022
ALSA PCM core recently introduced a new managed PCM buffer allocation
mode that does allocate / free automatically at hw_params and
hw_free. However, it overlooked the code path directly calling
hw_free PCM ops at releasing the PCM substream, and it may result in a
memory leak as spotted by syzkaller when no buffer preallocation is
used (e.g. vmalloc buffer).
This patch papers over it with a slight refactoring. The hw_free ops
call and relevant tasks are unified in a new helper function, and call
it from both places.
find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
However, in the error path there can be a follow-up recvmsg() without
the mutex held. Use the start() function of struct netlink_dump_control
instead of dump() to verify and report if the specified set does not
exist.
Thanks to Pablo Neira Ayuso for helping me to understand the subleties
of the netlink protocol.
The problem with this descriptor is that it is self-referential: the
source ID of 0 matches itself! This causes the 'struct uvc_entity'
representing the display to be added to its chain list twice during
'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is
processed directly from the 'dev->entities' list and then again
immediately afterwards when trying to follow the source ID in
'uvc_scan_chain_forward()'
Add a check before adding an entity to a chain list to ensure that the
entity is not already part of a chain.
Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/ Cc: <stable@vger.kernel.org> Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current logic that calls pci_disable_device() in __bnxt_close_nic()
during firmware reset is flawed. If firmware is still alive, we're
disabling the device too early, causing some firmware commands to
not reach the firmware.
Fix it by moving the logic to bnxt_reset_close(). If firmware is
in fatal condition, we call pci_disable_device() before we free
any of the rings to prevent DMA corruption of the freed rings. If
firmware is still alive, we call pci_disable_device() after the
last firmware message has been sent.
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.
Test commands:
modprobe netdevsim
echo "1000000000 0" > /sys/bus/netdevsim/new_device
Commit 6d97985072dc ("isdn: move capi drivers to staging") cleaned up the
isdn drivers and split the MAINTAINERS section for ISDN, but missed to add
the terminal slash for the two directories mISDN and hardware. Hence, all
files in those directories were not part of the new ISDN/mISDN SUBSYSTEM,
but were considered to be part of "THE REST".
Rectify the situation, and while at it, also complete the section with two
further build files that belong to that subsystem.
This was identified with a small script that finds all files belonging to
"THE REST" according to the current MAINTAINERS file, and I investigated
upon its output.
bnxt_ulp_start() needs to be called before SRIOV is re-enabled after
firmware reset. Re-enabling SRIOV may consume all the resources and
may cause the RDMA driver to fail to get MSIX and other resources.
Fix it by calling bnxt_ulp_start() first before calling
bnxt_reenable_sriov().
We re-arrange the logic so that we call bnxt_ulp_start() and
bnxt_reenable_sriov() in proper sequence in bnxt_fw_reset_task() and
bnxt_open(). The former is the normal coordinated firmware reset sequence
and the latter is firmware reset while the function is down. This new
logic is now more straight forward and will now fix both scenarios.
Fixes: f3a6d206c25a ("bnxt_en: Call bnxt_ulp_stop()/bnxt_ulp_start() during error recovery.") Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Put the current logic in bnxt_open() to re-enable SRIOV after detecting
firmware reset into a new function bnxt_reenable_sriov(). This call
needs to be invoked in the firmware reset path also in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a call is disconnected, the connection pointer from the call is
cleared to make sure it isn't used again and to prevent further attempted
transmission for the call. Unfortunately, there might be a daemon trying
to use it at the same time to transmit a packet.
Fix this by keeping call->conn set, but setting a flag on the call to
indicate disconnection instead.
Remove also the bits in the transmission functions where the conn pointer is
checked and a ref taken under spinlock as this is now redundant.
Fixes: 8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The introduction of a split between the reference count on rxrpc_local
objects and the usage count didn't quite go far enough. A number of kernel
work items need to make use of the socket to perform transmission. These
also need to get an active count on the local object to prevent the socket
from being closed.
Fix this by getting the active count in those places.
Also split out the raw active count get/put functions as these places tend
to hold refs on the rxrpc_local object already, so getting and putting an
extra object ref is just a waste of time.
In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence
number of the packet is immediately following the hard-ack point at the end
of the function. However, this isn't sufficient, since the recvmsg side
may have been advancing the window and then overrun the position in which
we're adding - at which point rx_hard_ack >= seq0 and no notification is
generated.
Fix this by always generating a notification at the end of the input
function.
Without this, a long call may stall, possibly indefinitely.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix rxrpc_put_local() to not access local->debug_id after calling
atomic_dec_return() as, unless that returned n==0, we no longer have the
right to access the object.
Fixes: 06d9532fa6b3 ("rxrpc: Fix read-after-free in rxrpc_queue_local()") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 6fa8c0144b77 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running v5.5 with a rootfs on NFS, memory abort may happen in
the system resume stage:
Unable to handle kernel paging request at virtual address dead00000000012a
[dead00000000012a] address between user and kernel address ranges
pc : run_timer_softirq+0x334/0x3d8
lr : run_timer_softirq+0x244/0x3d8
x1 : ffff800011cafe80 x0 : dead000000000122
Call trace:
run_timer_softirq+0x334/0x3d8
efi_header_end+0x114/0x234
irq_exit+0xd0/0xd8
__handle_domain_irq+0x60/0xb0
gic_handle_irq+0x58/0xa8
el1_irq+0xb8/0x180
arch_cpu_idle+0x10/0x18
do_idle+0x1d8/0x2b0
cpu_startup_entry+0x24/0x40
secondary_start_kernel+0x1b4/0x208
Code: f9000693a9400660f9000020b4000040 (f9000401)
---[ end trace bb83ceeb4c482071 ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Kernel Offset: disabled
CPU features: 0x00002,2300aa30
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
It's found that stmmac_xmit() and stmmac_resume() sometimes might
run concurrently, possibly resulting in a race condition between
mod_timer() and setup_timer(), being called by stmmac_xmit() and
stmmac_resume() respectively.
Since the resume() runs setup_timer() every time, it'd be safer to
have del_timer_sync() in the suspend() as the counterpart.
The driver currently only calls netdev_set_tc_queue when the number of
TCs is greater than 1. Instead, the comparison should be greater than
or equal to 1. Even with 1 TC, we need to set the queue mapping.
This bug can cause warnings when the number of TCs is changed back to 1.
Fixes: 7809592d3e2e ("bnxt_en: Enable MSIX early in bnxt_init_one().") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to fix XDP support if sw buffer management is used as fallback
for hw bm devices, define MVNETA_SKB_HEADROOM as maximum between
XDP_PACKET_HEADROOM and NET_SKB_PAD and let the hw aligns the IP header
to 4-byte boundary.
Fix rx_offset_correction initialization if mvneta_bm_port_init fails in
mvneta_resume routine
Fixes: 0db51da7a8e9 ("net: mvneta: add basic XDP support") Tested-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VF numbers were assigned to node_guid and port_guid, but cleared
right before such query calls were issued. It caused to return
node/port GUIDs of VF index 0 for all VFs.
Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs") Reported-by: Adrian Chiris <adrianc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As Eric noticed, tcindex_alloc_perfect_hash() uses cp->hash
to compute the size of memory allocation, but cp->hash is
set again after the allocation, this caused an out-of-bound
access.
So we have to move all cp->hash initialization and computation
before the memory allocation. Move cp->mask and cp->shift together
as cp->hash may need them for computation too.
Reported-and-tested-by: syzbot+35d4dea36c387813ed31@syzkaller.appspotmail.com Fixes: 331b72922c5f ("net: sched: RCU cls_tcindex") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the past it was possible to create multiple L2TPv3 sessions with the
same session id as long as the sessions belonged to different tunnels.
The resulting sessions had issues when used with IP encapsulated tunnels,
but worked fine with UDP encapsulated ones. Some applications began to
rely on this behaviour to avoid having to negotiate unique session ids.
Some time ago a change was made to require session ids to be unique across
all tunnels, breaking the applications making use of this "feature".
This change relaxes the duplicate session id check to allow duplicates
if both of the colliding sessions belong to UDP encapsulated tunnels.
Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation") Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gtp hashtable size is received by user-space.
So, this hashtable size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the gtp module.
So, this patch adds __GFP_NOWARN to avoid this message.
Latest kernels get the phys_port_name via devlink, if
ndo_get_phys_port_name is not defined. To provide the phys_port_name
correctly, register devlink before registering netdev.
Also call devlink_port_type_eth_set() after registering netdev as
devlink port updates the netdev structure and notifies user.
Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As discussed in the strace issue tracker, it appears that the sparc32
sysvipc support has been broken for the past 11 years. It was however
working in compat mode, which is how it must have escaped most of the
regular testing.
The problem is that a cleanup patch inadvertently changed the uid/gid
fields in struct ipc64_perm from 32-bit types to 16-bit types in uapi
headers.
Both glibc and uclibc-ng still use the original types, so they should
work fine with compat mode, but not natively. Change the definitions
to use __kernel_uid32_t and __kernel_gid32_t again.
Fixes: 83c86984bff2 ("sparc: unify ipcbuf.h") Link: https://github.com/strace/strace/issues/116 Cc: <stable@vger.kernel.org> # v2.6.29 Cc: Sam Ravnborg <sam@ravnborg.org> Cc: "Dmitry V . Levin" <ldv@altlinux.org> Cc: Rich Felker <dalias@libc.org> Cc: libc-alpha@sourceware.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was some logic added a while ago to clear out f_bavail in statfs()
if we did not have enough free metadata space to satisfy our global
reserve. This was incorrect at the time, however didn't really pose a
problem for normal file systems because we would often allocate chunks
if we got this low on free metadata space, and thus wouldn't really hit
this case unless we were actually full.
Fast forward to today and now we are much better about not allocating
metadata chunks all of the time. Couple this with d792b0f19711 ("btrfs:
always reserve our entire size for the global reserve") which now means
we'll easily have a larger global reserve than our free space, we are
now more likely to trip over this while still having plenty of space.
Fix this by skipping this logic if the global rsv's space_info is not
full. space_info->full is 0 unless we've attempted to allocate a chunk
for that space_info and that has failed. If this happens then the space
for the global reserve is definitely sacred and we need to report
b_avail == 0, but before then we can just use our calculated b_avail.
Reported-by: Martin Steigerwald <martin@lichtvoll.de> Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted") CC: stable@vger.kernel.org # 4.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-By: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.
When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.
The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.
The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.
Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzbot managed to trigger a use after free "KASAN: use-after-free Write
in hci_sock_bind". I have reviewed the code manually and one possibly
cause I have found is that we are not holding lock_sock(sk) when we do
the hci_dev_put(hdev) in hci_sock_release(). My theory is that the bind
and the release are racing against each other which results in this use
after free.
Reported-by: syzbot+eba992608adf3d796bcc@syzkaller.appspotmail.com Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tpk_write()/tpk_close() could be interrupted when holding a mutex, then
in timer handler tpk_write() may be called again trying to acquire same
mutex, lead to deadlock.
Google syzbot reported this issue with CONFIG_DEBUG_ATOMIC_SLEEP
enabled:
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:938
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
1 lock held by swapper/1/0:
...
Call Trace:
<IRQ>
dump_stack+0x197/0x210
___might_sleep.cold+0x1fb/0x23e
__might_sleep+0x95/0x190
__mutex_lock+0xc5/0x13c0
mutex_lock_nested+0x16/0x20
tpk_write+0x5d/0x340
resync_tnc+0x1b6/0x320
call_timer_fn+0x1ac/0x780
run_timer_softirq+0x6c3/0x1790
__do_softirq+0x262/0x98c
irq_exit+0x19b/0x1e0
smp_apic_timer_interrupt+0x1a3/0x610
apic_timer_interrupt+0xf/0x20
</IRQ>
See link https://syzkaller.appspot.com/bug?extid=2eeef62ee31f9460ad65 for
more details.
Fix it by using spinlock in process context instead of mutex and having
interrupt disabled in critical section.
syzbot is reporting that there is a race at tomoyo_stat_update() [1].
Although it is acceptable to fail to track exact number of times policy
was updated, convert to atomic_t because this is not a hot path.
Allocate gspca_dev->usb_buf with kzalloc instead of kmalloc to
ensure it is property zeroed. This fixes various syzbot errors
about uninitialized data.
When a filesystem is mounted with jdev mount option, we store the
journal device name in an allocated string in superblock. However we
fail to ever free that string. Fix it.
Reported-by: syzbot+1c6756baf4b16b94d2a6@syzkaller.appspotmail.com Fixes: c3aa077648e1 ("reiserfs: Properly display mount options in /proc/mounts") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
What we are trying to do is change the '=' character to a NUL terminator
and then at the end of the function we restore it back to an '='. The
problem is there are two error paths where we jump to the end of the
function before we have replaced the '=' with NUL.
We end up putting the '=' in the wrong place (possibly one element
before the start of the buffer).
Link: http://lkml.kernel.org/r/20200115055426.vdjwvry44nfug7yy@kili.mountain Reported-by: syzbot+e64a13c5369a194d67df@syzkaller.appspotmail.com Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Dmitry Vyukov <dvyukov@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since v4.3-rc1 commit 0723c05fb75e44 ("arm64: enable more compressed
Image formats"), it is possible to build Image.{bz2,lz4,lzma,lzo}
AArch64 images. However, the commit missed adding support for removing
those images on 'make ARCH=arm64 (dist)clean'.
Fix this by adding them to the target list.
Make sure to match the order of the recipes in the makefile.
Disable a couple of compilation warnings (which are treated as errors)
on strlcpy() definition and declaration, allowing users to compile perf
and kernel (objtool) when:
1. glibc have strlcpy() (such as in ALT Linux since 2004) objtool and
perf build fails with this (in gcc):
In file included from exec-cmd.c:3:
tools/include/linux/string.h:20:15: error: redundant redeclaration of ‘strlcpy’ [-Werror=redundant-decls]
20 | extern size_t strlcpy(char *dest, const char *src, size_t size);
2. clang ignores `-Wredundant-decls', but produces another warning when
building perf:
CC util/string.o
../lib/string.c:99:8: error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
size_t __weak strlcpy(char *dest, const char *src, size_t size)
../../tools/include/linux/compiler.h:66:34: note: expanded from macro '__weak'
# define __weak __attribute__((weak))
/usr/include/bits/string_fortified.h:151:8: note: previous definition is here
__NTH (strlcpy (char *__restrict __dest, const char *__restrict __src,
Committer notes:
The
#pragma GCC diagnostic
directive was introduced in gcc 4.6, so check for that as well.
The commit 4585fbcb5331 ("PM / devfreq: Modify the device name as devfreq(X) for
sysfs") changed the node name to devfreq(x). After this commit, it is not
possible to get the device name through /sys/class/devfreq/devfreq(X)/*.
Add new name attribute in order to get device name.
Cc: stable@vger.kernel.org Fixes: 4585fbcb5331 ("PM / devfreq: Modify the device name as devfreq(X) for sysfs") Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 722ddfde366f ("perf tools: Fix time sorting") changed - correctly
so - hist_entry__sort to return int64. Unfortunately several of the
builtin-c2c.c comparison routines only happened to work due the cast
caused by the wrong return type.
This causes meaningless ordering of both the cacheline list, and the
cacheline details page. E.g a simple:
perf c2c record -a sleep 3
perf c2c report
will result in cacheline table like
=================================================
Shared Data Cache Line Table
=================================================
#
# ------- Cacheline ---------- Total Tot - LLC Load Hitm - - Store Reference - - Load Dram - LLC Total - Core Load Hit - - LLC Load Hit -
# Index Address Node PA cnt records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rmt
# ..... .............. .... ...... ....... ...... ..... ..... ... .... ..... ...... ...... .... ...... ..... ..... ..... ... .... .......
On filesystems with a block size smaller than the page size,
gfs2_find_jhead can split a page across two bios (for example, when
blocks are not allocated consecutively). When that happens, the first
bio that completes will unlock the page in its bi_end_io handler even
though the page hasn't been read completely yet. Fix that by using a
chained bio for the rest of the page.
While at it, clean up the sector calculation logic in
gfs2_log_alloc_bio. In gfs2_find_jhead, simplify the disk block and
offset calculation logic and fix a variable name.
Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race condition in the following scenario which results in an
use-after-free issue when reading a monitoring file and deleting the
parent ctrl_mon group concurrently:
Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls
kernfs_break_active_protection() to drop the active reference of kernfs
node in rdtgroup_kn_lock_live().
In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub
kernfs nodes to drop the active reference when removing all subtree
kernfs nodes recursively. Thread 2 could block on kernfs_remove() until
Thread 1 calls kernfs_break_active_protection(). Only after
kernfs_remove() completes the refcount of rdtgrp could be trusted.
Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(),
Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0
instead of 1 due to the race.
In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.
Thread 1 (rdtgroup_mondata_show) Thread 2 (rdtgroup_rmdir)
-------------------------------- -------------------------
rdtgroup_kn_lock_live
/*
* kn active protection until
* kernfs_break_active_protection(kn)
*/
rdtgrp = kernfs_to_rdtgroup(kn)
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
mutex_lock
rdtgroup_rmdir_ctrl
free_all_child_rdtgrp
/*
* sentry->waitcount should be 1
* but is 0 now due to the race.
*/
kfree(sentry)*[1]
/*
* Only after kernfs_remove()
* completes, the refcount of
* rdtgrp could be trusted.
*/
atomic_inc(&rdtgrp->waitcount)
/* kn->active-- */
kernfs_break_active_protection(kn)
rdtgroup_ctrl_remove
rdtgrp->flags = RDT_DELETED
/*
* Blocking routine, wait for
* all sub kernfs nodes to drop
* active reference in
* kernfs_break_active_protection.
*/
kernfs_remove(rdtgrp->kn)
rdtgroup_kn_unlock
mutex_unlock
atomic_dec_and_test(
&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
kernfs_unbreak_active_protection(kn)
kfree(rdtgrp)
mutex_lock
mon_event_read
rdtgroup_kn_unlock
mutex_unlock
/*
* Use-after-free: refer to earlier rdtgrp
* memory which was freed in [1].
*/
atomic_dec_and_test(&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
/* kn->active++ */
kernfs_unbreak_active_protection(kn)
kfree(rdtgrp)
Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in
rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp.
A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount)
that indicates how many waiters expect this rdtgrp to exist. Waiters
could be waiting on rdtgroup_mutex or some work sitting on a task's
workqueue for when the task returns from kernel mode or exits.
The deletion of a rdtgrp is intended to have two phases:
(1) while holding rdtgroup_mutex the necessary cleanup is done and
rdtgrp->flags is set to RDT_DELETED,
(2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
only if there are no waiters and its flag is set to RDT_DELETED. Upon
gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
for the RDT_DELETED flag.
When unmounting the resctrl file system or deleting ctrl_mon groups,
all of the subdirectories are removed and the data structure of rdtgrp
is forcibly freed without checking rdtgrp->waitcount. If at this point
there was a waiter on rdtgrp then a use-after-free issue occurs when the
waiter starts running and accesses the rdtgrp structure it was waiting
on.
See kfree() calls in [1], [2] and [3] in these two call paths in
following scenarios:
(1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()
There are several scenarios that result in use-after-free issue in
following:
Scenario 1:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdt_kill_sb() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.
Thread 1 (rdtgroup_tasks_write) Thread 2 (rdt_kill_sb)
------------------------------- ----------------------
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
mutex_lock
rdtgroup_move_task
__rdtgroup_move_task
/*
* Take an extra refcount, so rdtgrp cannot be freed
* before the call back move_myself has been invoked
*/
atomic_inc(&rdtgrp->waitcount)
/* Callback move_myself will be scheduled for later */
task_work_add(move_myself)
rdtgroup_kn_unlock
mutex_unlock
atomic_dec_and_test(&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
mutex_lock
rmdir_all_sub
/*
* sentry and rdtgrp are freed
* without checking refcount
*/
free_all_child_rdtgrp
kfree(sentry)*[1]
kfree(rdtgrp)*[2]
mutex_unlock
/*
* Callback is scheduled to execute
* after rdt_kill_sb is finished
*/
move_myself
/*
* Use-after-free: refer to earlier rdtgrp
* memory which was freed in [1] or [2].
*/
atomic_dec_and_test(&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
kfree(rdtgrp)
Scenario 2:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.
Thread 1 (rdtgroup_tasks_write) Thread 2 (rdtgroup_rmdir)
------------------------------- -------------------------
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
mutex_lock
rdtgroup_move_task
__rdtgroup_move_task
/*
* Take an extra refcount, so rdtgrp cannot be freed
* before the call back move_myself has been invoked
*/
atomic_inc(&rdtgrp->waitcount)
/* Callback move_myself will be scheduled for later */
task_work_add(move_myself)
rdtgroup_kn_unlock
mutex_unlock
atomic_dec_and_test(&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
mutex_lock
rdtgroup_rmdir_ctrl
free_all_child_rdtgrp
/*
* sentry is freed without
* checking refcount
*/
kfree(sentry)*[3]
rdtgroup_ctrl_remove
rdtgrp->flags = RDT_DELETED
rdtgroup_kn_unlock
mutex_unlock
atomic_dec_and_test(
&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
kfree(rdtgrp)
/*
* Callback is scheduled to execute
* after rdt_kill_sb is finished
*/
move_myself
/*
* Use-after-free: refer to earlier rdtgrp
* memory which was freed in [3].
*/
atomic_dec_and_test(&rdtgrp->waitcount)
&& (flags & RDT_DELETED)
kfree(rdtgrp)
If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed
like following. Note that "0x6b" is POISON_FREE after kfree(). The
corrupted bits "0x6a", "0x64" at offset 0x424 correspond to
waitcount member of struct rdtgroup which was freed:
Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkjkkkkkkkkkkk
Single bit error detected. Probably bad RAM.
Run memtest86+ or a similar memory test tool.
Next obj: start=ffff9504c5b0d800, len=2048
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Fix this by taking reference count (waitcount) of rdtgrp into account in
the two call paths that currently do not do so. Instead of always
freeing the resource group it will only be freed if there are no waiters
on it. If there are waiters, the resource group will have its flags set
to RDT_DELETED.
It will be left to the waiter to free the resource group when it starts
running and finding that it was the last waiter and the resource group
has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb()
-> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() ->
rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()
Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.
Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.
As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.
rdtgroup_ctrl_remove
rdtgrp->flags = RDT_DELETED
kernfs_get(kn)
kernfs_remove(rdtgrp->kn)
__kernfs_remove
/* "mon_groups", sub_kn */
atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
kernfs_drain(sub_kn)
/*
* sub_kn->active == KN_DEACTIVATED_BIAS + 1,
* waiting on sub_kn->active to drop, but it
* never drops in Thread 2 which is blocked
* on getting rdtgroup_mutex.
*/
Thread 1 hangs here ---->
wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
...
rdtgroup_mkdir
rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
mkdir_rdt_prepare(parent_kn, prgrp_kn)
rdtgroup_kn_lock_live(prgrp_kn)
atomic_inc(&rdtgrp->waitcount)
/*
* "c1", prgrp_kn->active--
*
* The active reference on "c1" is
* dropped, but not matching the
* actual active reference taken
* on "mon_groups", thus causing
* Thread 1 to wait forever while
* holding rdtgroup_mutex.
*/
kernfs_break_active_protection(
prgrp_kn)
/*
* Trying to get rdtgroup_mutex
* which is held by Thread 1.
*/
Thread 2 hangs here ----> mutex_lock
...
The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.
Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.
This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.
In recent DFS updates we have a new variable controlling how many times we will
retry to reconnect the share.
If DFS is not used, then this variable is initialized to 0 in:
This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1
and never actually get to pass this conditional:
if (--retries)
continue;
The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN
and basically the kernel threads are virtually hung and unkillable.
Fixes: a3a53b7603798fd8 (cifs: Add support for failover in smb2_reconnect()) Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode. I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.
Analysis:
- at the entry into do_last() and all the way to step_into(): dir (aka
nd->path.dentry) is known not to have been freed; so's nd->inode and
it's equal to dir->d_inode unless we are already doomed to -ECHILD.
inode of the file to get opened is not known.
- after step_into(): inode of the file to get opened is known; dir
might be pointing to freed memory/be negative/etc.
- at the call of may_create_in_sticky(): guaranteed to be out of RCU
mode; inode of the file to get opened is known and pinned; dir might
be garbage.
The last was the reason for the original patch. Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.
In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ADC in the JZ4740 can work either in high-precision mode with a 2.5V
range, or in low-precision mode with a 7.5V range. The code in place in
this driver will select the proper scale according to the maximum
voltage of the battery.
The JZ4770 however only has one mode, with a 6.6V range. If only one
scale is available, there's no need to change it (and nothing to change
it to), and trying to do so will fail with -EINVAL.
Fixes: fb24ccfbe1e0 ("power: supply: add Ingenic JZ47xx battery driver.") Signed-off-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Artur Rojek <contact@artur-rojek.eu> Cc: stable@vger.kernel.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS").
There are two issues with this commit, uncovered by Anton in tests
on some (Debian) systems:
1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
isn't set. Don't recall now if it just wasn't needed on my system, or
if I never tested this case.
2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
unexpected since whatever wanted to run is likely to have to run
before the kernel init etc. that calls the constructors in this case.
Basically, some constructors that gcc emits (libc has?) need to run
very early during init; the failure mode otherwise was that the ptrace
fork test already failed:
----------------------
$ ./linux mem=512M
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
Aborted
----------------------
Thinking more about this, it's clear that we simply cannot support
CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
involve not use of the __attribute__((constructor)), but instead
some constructor code/entry generated by gcc. Therefore, we cannot
distinguish between kernel constructors and system constructors.
Thus, revert this commit.
Cc: stable@vger.kernel.org [5.4+] Fixes: 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS") Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Richard Weinberger <richard@nod.at>
On VHE systems arch.mdcr_el2 is written to mdcr_el2 at vcpu_load time to
set options for self-hosted debug and the performance monitors
extension.
Unfortunately the value of arch.mdcr_el2 is not calculated until
kvm_arm_setup_debug() in the run loop after the vcpu has been loaded.
This means that the initial brief iterations of the run loop use a zero
value of mdcr_el2 - until the vcpu is preempted. This also results in a
delay between changes to vcpu->guest_debug taking effect.
Fix this by writing to mdcr_el2 in kvm_arm_setup_debug() on VHE systems
when a change to arch.mdcr_el2 has been detected.
Fixes: d5a21bcc2995 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions") Cc: <stable@vger.kernel.org> # 4.17.x- Suggested-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On module unload of pcrypt we must unregister the crypto algorithms
first and then tear down the padata structure. As otherwise the
crypto algorithms are still alive and can be used while the padata
structure is being freed.
In commit 'a1cf573ee95 ("crypto: caam - select DMA address size at runtime")'
CAAM pointer size (caam_ptr_size) is changed from
sizeof(dma_addr_t) to runtime value computed from MCFGR register.
Therefore, do not reset MCFGR[PS].
When the kernel XTS implementation was extended to deal with ciphertext
stealing in commit 8083b1bf8163 ("crypto: xts - add support for ciphertext
stealing"), a check was added to reject inputs that were too short.
However, in the vmx enablement - commit 239668419349 ("crypto: vmx/xts -
use fallback for ciphertext stealing"), that check wasn't added to the
vmx implementation. This disparity leads to errors like the following:
alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[<flush>66.99%@+10, 33.1%@alignmask+1155]"
Return -EINVAL if asked to operate with a cryptlen smaller than the AES
block size. This brings vmx in line with the generic implementation.
Reported-by: Erhard Furtner <erhard_f@mailbox.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206049 Fixes: 239668419349 ("crypto: vmx/xts - use fallback for ciphertext stealing") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[dja: commit message] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As af_alg_release_parent may be called from BH context (most notably
due to an async request that only completes after socket closure,
or as reported here because of an RCU-delayed sk_destruct call), we
must use bh_lock_sock instead of lock_sock.
The driver would fail to stop the command timer in most error paths,
something which specifically could lead to the timer being freed while
still active on I/O errors during probe.
Fix this by making sure that each function starting the timer also stops
it in all relevant error paths.
Reported-by: syzbot+1d1597a5aa3679c65b9f@syzkaller.appspotmail.com Fixes: b78e91bcfb33 ("rsi: Add new firmware loading method") Cc: stable <stable@vger.kernel.org> # 4.12 Cc: Prameela Rani Garnepudi <prameela.j04cs@gmail.com> Cc: Amitkumar Karwar <amit.karwar@redpinesignals.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The subpacket scanning loop in rxrpc_receive_data() references the
subpacket count in the private data part of the sk_buff in the loop
termination condition. However, when the final subpacket is pasted into
the ring buffer, the function is no longer has a ref on the sk_buff and
should not be looking at sp->* any more. This point is actually marked in
the code when skb is cleared (but sp is not - which is an error).
Fix this by caching sp->nr_subpackets in a local variable and using that
instead.
Also clear 'sp' to catch accesses after that point.
This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
trashed by the sk_buff getting freed and reused in the meantime.
Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Include the size of struct nhmsg size when calculating
how much of a payload to allocate in a new netlink nexthop
notification message.
Without this, we will fail to fill the skbuff at certain nexthop
group sizes.
You can reproduce the failure with the following iproute2 commands:
ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link add dummy3 type dummy
ip link add dummy4 type dummy
ip link add dummy5 type dummy
ip link add dummy6 type dummy
ip link add dummy7 type dummy
ip link add dummy8 type dummy
ip link add dummy9 type dummy
ip link add dummy10 type dummy
ip link add dummy11 type dummy
ip link add dummy12 type dummy
ip link add dummy13 type dummy
ip link add dummy14 type dummy
ip link add dummy15 type dummy
ip link add dummy16 type dummy
ip link add dummy17 type dummy
ip link add dummy18 type dummy
ip link add dummy19 type dummy
ip ro add 1.1.1.1/32 dev dummy1
ip ro add 1.1.1.2/32 dev dummy2
ip ro add 1.1.1.3/32 dev dummy3
ip ro add 1.1.1.4/32 dev dummy4
ip ro add 1.1.1.5/32 dev dummy5
ip ro add 1.1.1.6/32 dev dummy6
ip ro add 1.1.1.7/32 dev dummy7
ip ro add 1.1.1.8/32 dev dummy8
ip ro add 1.1.1.9/32 dev dummy9
ip ro add 1.1.1.10/32 dev dummy10
ip ro add 1.1.1.11/32 dev dummy11
ip ro add 1.1.1.12/32 dev dummy12
ip ro add 1.1.1.13/32 dev dummy13
ip ro add 1.1.1.14/32 dev dummy14
ip ro add 1.1.1.15/32 dev dummy15
ip ro add 1.1.1.16/32 dev dummy16
ip ro add 1.1.1.17/32 dev dummy17
ip ro add 1.1.1.18/32 dev dummy18
ip ro add 1.1.1.19/32 dev dummy19
ip next add id 1 via 1.1.1.1 dev dummy1
ip next add id 2 via 1.1.1.2 dev dummy2
ip next add id 3 via 1.1.1.3 dev dummy3
ip next add id 4 via 1.1.1.4 dev dummy4
ip next add id 5 via 1.1.1.5 dev dummy5
ip next add id 6 via 1.1.1.6 dev dummy6
ip next add id 7 via 1.1.1.7 dev dummy7
ip next add id 8 via 1.1.1.8 dev dummy8
ip next add id 9 via 1.1.1.9 dev dummy9
ip next add id 10 via 1.1.1.10 dev dummy10
ip next add id 11 via 1.1.1.11 dev dummy11
ip next add id 12 via 1.1.1.12 dev dummy12
ip next add id 13 via 1.1.1.13 dev dummy13
ip next add id 14 via 1.1.1.14 dev dummy14
ip next add id 15 via 1.1.1.15 dev dummy15
ip next add id 16 via 1.1.1.16 dev dummy16
ip next add id 17 via 1.1.1.17 dev dummy17
ip next add id 18 via 1.1.1.18 dev dummy18
ip next add id 19 via 1.1.1.19 dev dummy19
ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
ip next del id 1111
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Multicast and broadcast packets can be looped from egress to ingress
pre segmentation with dev_loopback_xmit. That function unconditionally
sets ip_summed to CHECKSUM_UNNECESSARY.
udp_rcv_segment segments gso packets in the udp rx path. Segmentation
usually executes on egress, and does not expect packets of this type.
__udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
the offsets are not correct for gso_make_checksum.
UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
to the correct pseudo header checksum. Reset ip_summed to this type.
(CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)
Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix xdp_result initialization in netsec_process_rx in order to not
increase rx counters if there is no bpf program attached to the xdp hook
and napi_gro_receive returns GRO_DROP
Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix possible use-after-free in in netsec_process_rx that can occurs if
the first packet is sent to the normal networking stack and the
following one is dropped by the bpf program attached to the xdp hook.
Fix the issue defining the skb pointer in the 'budget' loop
Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
sport 80 0xffff flowid 1:3
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
sport 25 0xffff flowid 1:4
where filters are installed on qdisc 1:0, so we can't merely
search from class 1:1 when creating class 1:3 and class 1:4. We have
to walk through all the child classes of the direct parent qdisc.
Otherwise we would miss filters those need reverse binding.
Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().
In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.
Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.
Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently XDP Support was added to the mvneta driver
for software buffer management only.
It is still possible to attach an XDP program if
hardware buffer management is used.
It is not doing anything at that point.
The patch disallows attaching XDP programs to mvneta
if hardware buffer management is used.
I am sorry about that. It is my first submission and I am having
some troubles with the format of my emails.
v4 -> v5:
- Remove extra tabs
v3 -> v4:
- Please ignore v3 I accidentally submitted
my other patch with git-send-mail and v4 is correct
v2 -> v3:
- My mailserver corrupted the patch
resubmission with git-send-email
v1 -> v2:
- Fixing the patches indentation
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to use the current alternate setting when verifying the
interface descriptors to avoid binding to an invalid interface.
Failing to do so could cause the driver to misbehave or trigger a WARN()
in usb_submit_urb() that kernels with panic_on_warn set would choke on.
Fixes: 71bb244ba2fd ("brcm80211: fmac: add USB support for bcm43235/6/8 chipsets") Cc: stable <stable@vger.kernel.org> # 3.4 Cc: Arend van Spriel <arend@broadcom.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Fixes: 9150c3adbf24 ("CIFS: Close open handle after interrupted close") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix two places where we need to adjust down the max response size for
ioctl when it is used together with compounding.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().
This leads to a use-after-free of the task struct in cifs_wake_up_task:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
Read of size 8 at addr ffff8880103e3a68 by task cifsd/630
This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:
Fix this by holding a reference to the task struct until the MID is
freed.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The chelsio crypto driver is casting 'struct crypto_aead' directly to
'struct crypto_tfm', which is incorrect because the crypto_tfm isn't the
first field of 'struct crypto_aead'. Consequently, the calls to
crypto_tfm_set_flags() are modifying some other field in the struct.
Also, the driver is setting CRYPTO_TFM_RES_BAD_KEY_LEN in
->setauthsize(), not just in ->setkey(). This is incorrect since this
flag is for bad key lengths, not for bad authentication tag lengths.
Fix these bugs by removing the broken crypto_tfm_set_flags() calls from
->setauthsize() and by fixing them in ->setkey().
Fixes: 324429d74127 ("chcr: Support for Chelsio's Crypto Hardware") Cc: <stable@vger.kernel.org> # v4.9+ Cc: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 57ea974fb871 ("driver core: Rewrite test_async_driver_probe
to cover serialization and NUMA affinity"), running the test with NUMA
disabled results in warning messages similar to the following.
test_async_driver test_async_driver.12: NUMA node mismatch -1 != 0
If CONFIG_NUMA=n, dev_to_node(dev) returns -1, and numa_node_id()
returns 0. Both are widely used, so it appears risky to change return
values. Augment the check with IS_ENABLED(CONFIG_NUMA) instead
to fix the problem.
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Fixes: 57ea974fb871 ("driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: stable <stable@vger.kernel.org> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Link: https://lore.kernel.org/r/20191127202453.28087-1-linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 41c128cb25ce ("iio: st_gyro: Add lsm9ds0-gyro support")
assumes that gyro in LSM9DS0 is the same as others with 0xd4 WAI ID,
but datasheet tells slight different story, i.e. the first scale factor
for the chip is 245 dps, and not 250 dps.
Correct this by introducing a separate settings for LSM9DS0.
Fixes: 41c128cb25ce ("iio: st_gyro: Add lsm9ds0-gyro support")
Depends-on: 45a4e4220bf4 ("iio: gyro: st_gyro: fix L3GD20H support") Cc: Leonard Crestez <leonard.crestez@nxp.com> Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 43e23b6c0b01 ("debugfs: log errors when something goes wrong")
debugfs logs attempts to create existing files.
However binder attempts to create multiple debugfs files with
the same name when a single PID has multiple contexts, this leads
to log spamming during an Android boot (17 such messages during
boot on my system).
Fix this by checking if we already know the PID and only create
the debugfs entry for the first context per PID.
The match data does not have to be a struct device pointer, and indeed
very often is not. Attempt to treat it as such easily results in a
crash.
For the components that are not registered, we don't know which device
is missing. Once it it is there, we can use the struct component to get
the device and whether it's bound or not.
Change the return type to int and return -EPERM when lockdown is enabled
to remove the warning above. Also rename debugfs_is_locked_down to
debugfs_locked_down to make it sound less like it returns a boolean.
Fixes: 5496197f9b08 ("debugfs: Restrict debugfs when the kernel is locked down") Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable <stable@vger.kernel.org> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20191207161603.35907-1-eric.snowberg@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>