While reading sysctl_tcp_limit_output_bytes, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.
Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tx side sets EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq when
it was sent. These bits should not be checked on completion path
regardless whether it's the Tx or the Rx. DD bit serves this purpose and
it indicates that a particular descriptor is either for Rx or was
successfully Txed. EOF is also set as loopback test does not xmit
fragmented frames.
Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead
of EOP and RS pair.
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.
Fix the calculation so that the entire range will get migrated if
possible.
Signed-off-by: Alistair Popple <apopple@nvidia.com> Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration") Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch slightly reworks the s390 arch_get_random_seed_{int,long}
implementation: Make sure the CPACF trng instruction is never
called in any interrupt context. This is done by adding an
additional condition in_task().
Justification:
There are some constrains to satisfy for the invocation of the
arch_get_random_seed_{int,long}() functions:
- They should provide good random data during kernel initialization.
- They should not be called in interrupt context as the TRNG
instruction is relatively heavy weight and may for example
make some network loads cause to timeout and buck.
However, it was not clear what kind of interrupt context is exactly
encountered during kernel init or network traffic eventually calling
arch_get_random_seed_long().
After some days of investigations it is clear that the s390
start_kernel function is not running in any interrupt context and
so the trng is called:
which confirms that the call is in softirq context. So in_task() covers exactly
the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
not in soft irq but in normal task context and during kernel init.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier")
[agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
==================================================================
BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
Read of size 2 at addr ffff8880751acee8 by task a.out/879
This commit introduced a regression that can cause mount hung. The
changes in __ocfs2_find_empty_slot causes that any node with none-zero
node number can grab the slot that was already taken by node 0, so node 1
will access the same journal with node 0, when it try to grab journal
cluster lock, it will hung because it was already acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first, then
node 1, you will see the following call trace from node 1.
To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.
Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.
refcount_t: increment on 0; use-after-free.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
Google Inc. MSM sm8150 Flame DVT (DT)
Workqueue: hci0 hci_rx_work
Call trace:
dump_backtrace+0x0/0x378
show_stack+0x20/0x2c
dump_stack+0x124/0x148
print_address_description+0x80/0x2e8
__kasan_report+0x168/0x188
kasan_report+0x10/0x18
__asan_load4+0x84/0x8c
refcount_dec_and_test+0x20/0xd0
l2cap_chan_put+0x48/0x12c
l2cap_recv_frame+0x4770/0x6550
l2cap_recv_acldata+0x44c/0x7a4
hci_acldata_packet+0x100/0x188
hci_rx_work+0x178/0x23c
process_one_work+0x35c/0x95c
worker_thread+0x4cc/0x960
kthread+0x1a8/0x1c4
ret_from_fork+0x10/0x18
Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sedat Dilek noticed that I had an extraneous semicolon at the end of a
line in the previous patch.
It's harmless, but unintentional, and while compilers just treat it as
an extra empty statement, for all I know some other tooling might warn
about it. So clean it up before other people notice too ;)
Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
"message authentication code incorrect"
This happens because the tcp message sent is corrupted during the
USB "Bulk out". The device calculate the tcp checksum and send a
valid tcp message to the remote sshd. Then the encryption detects
the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Stop normal work without any log message.
The "Bulk in" continue receiving packets normally.
The host sends "Bulk out" and the device responds with -ECONNRESET.
(The netusb.c code tx_complete ignore -ECONNRESET)
Under normal conditions these errors take days to happen and in
intense usage take hours.
A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================
Signed-off-by: Jose Alonso <joalonsof@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race in pty_write(). pty_write() can be called in parallel
with e.g. ioctl(TIOCSTI) or ioctl(TCXONC) which also inserts chars to
the buffer. Provided, tty_flip_buffer_push() in pty_write() is called
outside the lock, it can commit inconsistent tail. This can lead to out
of bounds writes and other issues. See the Link below.
To fix this, we have to introduce a new helper called
tty_insert_flip_string_and_push_buffer(). It does both
tty_insert_flip_string() and tty_flip_buffer_commit() under the port
lock. It also calls queue_work(), but outside the lock. See 71a174b39f10 (pty: do tty_flip_buffer_push without port->lock in
pty_write) for the reasons.
Keep the helper internal-only (in drivers' tty.h). It is not intended to
be used widely.
Link: https://seclists.org/oss-sec/2022/q2/155 Fixes: 71a174b39f10 (pty: do tty_flip_buffer_push without port->lock in pty_write) Cc: 一只狗 <chennbnbnb@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220707082558.9250-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit a9c3f68f3cd8d (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). All
users were converted in the previous patches, so remove
tty_schedule_flip() completely while inlining its body into
tty_flip_buffer_push().
Since commit a9c3f68f3cd8d (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). We are
going to remove the latter (as it is used less), so call the former in
the rest of the users.
Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: William Hubbs <w.d.hubbs@gmail.com> Cc: Chris Brannon <chris@the-brannons.com> Cc: Kirk Reiser <kirk@reisers.ca> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20211122111648.30379-3-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit a9c3f68f3cd8d (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). We are
going to remove the latter (as it is used less), so call the former in
drivers/tty/.
Cc: Vladimir Zapolskiy <vz@mleia.com> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20211122111648.30379-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the pipe is closed, we mark the associated watchqueue defunct by
calling watch_queue_clear(). However, while that is protected by the
watchqueue lock, new watchqueue entries aren't actually added under that
lock at all: they use the pipe->rd_wait.lock instead, and looking up
that pipe happens without any locking.
The watchqueue code uses the RCU read-side section to make sure that the
wqueue entry itself hasn't disappeared, but that does not protect the
pipe_info in any way.
So make sure to actually hold the wqueue lock when posting watch events,
properly serializing against the pipe being torn down.
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.
And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.
Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
shall return the partial chunks it could allocate instead of freeing
everything as otherwise it can cause problems like bellow.
Fixes: 81be03e026dc ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan) Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Passing NULL to PTR_ERR will result in 0 (success), also since the likes of
bt_skb_sendmsg does never return NULL it is safe to replace the instances of
IS_ERR_OR_NULL with IS_ERR when checking its return.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bt_skb_sendmsg helps takes care of allocation the skb and copying the
the contents of msg over to the skb while checking for possible errors
so it should be safe to call it without holding lock_sock.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the standard memory allocator (snd_dma_malloc_pages*())
passes the byte size to allocate as is. Most of the backends
allocates real pages, hence the actual allocations are aligned in page
size. However, the genalloc doesn't seem assuring the size alignment,
hence it may result in the access outside the buffer when the whole
memory pages are exposed via mmap.
For avoiding such inconsistencies, this patch makes the allocation
size always to be aligned in page size.
Note that, after this change, snd_dma_buffer.bytes field contains the
aligned size, not the originally requested size. This value is also
used for releasing the pages in return.
Fix unused but set variable warning building with `make W=1`:
drivers/gpu/drm/imx/dcss/dcss-plane.c:270:6: warning:
variable ‘pixel_format’ set but not used [-Wunused-but-set-variable]
u32 pixel_format;
^~~~~~~~~~~~
Fixes: 9021c317b770 ("drm/imx: Add initial support for DCSS on iMX8MQ") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200911014414.4663-1-bobo.shaobowang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at
every kernel entry/exit. On Enhanced IBRS parts setting
MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at
every kernel entry/exit incur unnecessary performance loss.
When Enhanced IBRS feature is present, print a warning about this
unnecessary performance loss.
Tasks the are being deboosted from SCHED_DEADLINE might enter
enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
since they are not boosted anymore, the if (is_dl_boosted()) branch is
not taken, but the else if (!dl_prio) is and inside this one we
BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
otherwise we had entered the if branch above. Long story short, the
current condition doesn't make sense and always leads to triggering of a
BUG.
Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
to be present, but additional flags are not a problem.
Fixes: 64be6f1f5f71 ("sched/deadline: Don't replenish from a !SCHED_DEADLINE entity") Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220714151908.533052-1-juri.lelli@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: f9aefd6b2aa3 ("net: warn if mac header was not set") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707123900.945305-1-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when
pol->mode is MPOL_LOCAL. Check pol->mode before access
pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c).
A KVM device cleanup happens in either of two callbacks:
1) destroy() which is called when the VM is being destroyed;
2) release() which is called when a device fd is closed.
Most KVM devices use 1) but Book3s's interrupt controller KVM devices
(XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during
the machine execution. The error handling in kvm_ioctl_create_device()
assumes destroy() is always defined which leads to NULL dereference as
discovered by Syzkaller.
This adds a checks for destroy!=NULL and adds a missing release().
This is not changing kvm_destroy_devices() as devices with defined
release() should have been removed from the KVM devices list by then.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case a IRQ based transfer times out the bcm2835_spi_handle_err()
function is called. Since commit 1513ceee70f2 ("spi: bcm2835: Drop
dma_pending flag") the TX and RX DMA transfers are unconditionally
canceled, leading to NULL pointer derefs if ctlr->dma_tx or
ctlr->dma_rx are not set.
Fix the NULL pointer deref by checking that ctlr->dma_tx and
ctlr->dma_rx are valid pointers before accessing them.
Fixes: 1513ceee70f2 ("spi: bcm2835: Drop dma_pending flag") Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20220719072234.2782764-1-mkl@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_early_retrans, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In dcss_dev_create() and dcss_dev_destroy(), we should call of_node_put()
in fail path or before the dcss's destroy as of_graph_get_port_by_id() has
increased the refcount.
Fixes: 9021c317b770 ("drm/imx: Add initial support for DCSS on iMX8MQ") Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Signed-off-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220714081337.374761-1-windhl@126.com Signed-off-by: Sasha Levin <sashal@kernel.org>
be_cmd_read_port_transceiver_data assumes that it is given a buffer that
is at least PAGE_DATA_LEN long, or twice that if the module supports SFF
8472. However, this is not always the case.
Fix this by passing the desired offset and length to
be_cmd_read_port_transceiver_data so that we only copy the bytes once.
regmap will sync a range of registers, here use the correct range
to make sure the sync do not touch other unexpected registers.
Find on pca9557pw on imx8qxp/dxl evk board, this device support
8 pin, so only need one register(8 bits) to cover all the 8 pins's
property setting. But when sync the output, we find it actually
update two registers, output register and the following register.
For the device use NO AI mode(not support auto address increment),
only use the single read/write when config the regmap.
We meet issue on PCA9557PW on i.MX8QXP/DXL evk board, this device
do not support AI mode, but when do the regmap sync, regmap will
sync 3 byte data to register 1, logically this means write first
data to register 1, write second data to register 2, write third data
to register 3. But this device do not support AI mode, finally, these
three data write only into register 1 one by one. the reault is the
value of register 1 alway equal to the latest data, here is the third
data, no operation happened on register 2 and register 3. This is
not what we expect.
This can be eventually be reproduced with the following script:
while :
do
echo 63 > /sys/class/net/<devname>/device/sriov_numvfs
sleep 1
echo 0 > /sys/class/net/<devname>/device/sriov_numvfs
sleep 1
done
Add lock when disabling SR-IOV to prevent process VF mailbox communication.
Fixes: d773d1310625 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned") Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.
The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.
Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.
Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support") Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix memory leak caused by not handling dummy receive descriptor properly.
iavf_get_rx_buffer now sets the rx_buffer return value for dummy receive
descriptors. Without this patch, when the hardware writes a dummy
descriptor, iavf would not free the page allocated for the previous receive
buffer. This is an unlikely event but can still happen.
While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: cf1ef3f0719b ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_fastopen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 2100c8d2d9db ("net-tcp: Fast Open base") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_igmp_llm_reports, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.
if (ipv4_is_local_multicast(pmc->multiaddr) &&
!READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
Fixes: df2cf4a78e48 ("IGMP: Inhibit reports for local multicast groups") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.
In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
all the way until it flushes the gc work, and returns without freeing
the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
work and frees the context's device resources.
Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization. This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Problems were observed on the Xilinx ZynqMP platform with large I2C reads.
When a read of 277 bytes was performed, the controller NAKed the transfer
after only 252 bytes were transferred and returned an ENXIO error on the
transfer.
There is some code in cdns_i2c_master_isr to handle this case by resetting
the transfer count in the controller before it reaches 0, to allow larger
transfers to work, but it was conditional on the CDNS_I2C_BROKEN_HOLD_BIT
quirk being set on the controller, and ZynqMP uses the r1p14 version of
the core where this quirk is not being set. The requirement to do this to
support larger reads seems like an inherently required workaround due to
the core only having an 8-bit transfer size register, so it does not
appear that this should be conditional on the broken HOLD bit quirk which
is used elsewhere in the driver.
Remove the dependency on the CDNS_I2C_BROKEN_HOLD_BIT for this transfer
size reset logic to fix this problem.
Fixes: 63cab195bf49 ("i2c: removed work arounds in i2c driver for Zynq Ultrascale+ MPSoC") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Current stmmac driver will prepare/enable ptp_ref clock in
stmmac_init_tstamp_counter().
The stmmac_pltfr_noirq_suspend will disable it once in suspend flow.
But in resume flow,
stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter
stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter
ptp_ref clock reference counter increases twice, which leads to unbalance
ptp clock when resume back.
Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it.
Fixes: 0735e639f129d ("net: stmmac: skip only stmmac_ptp_register when resume from suspend") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_probe_interval, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 05cbc0db03e8 ("ipv4: Create probe timer for tcp PMTU as per RFC4821") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_probe_threshold, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 6b58e0a5f32d ("ipv4: Use binary search to choose tcp PMTU probe_size") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_mtu_probe_floor, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: c04b79b6cfd7 ("tcp: add new tcp_mtu_probe_floor sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_base_mss, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 5d424d5a674f ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_mtu_probing, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 5d424d5a674f ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_ip_autobind_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 4b01a9674231 ("tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_ip_fwd_update_priority, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: f87c10a8aa1e ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The initially merged version of the igc driver code (via commit 146740f9abc4, "igc: Add support for PF") contained the following
IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors:
/* reads should not return all F's */
if (!(~value) && (!reg || !(~readl(hw_addr))))
hw->hw_addr = NULL;
return value;
}
And:
#define wr32(reg, val) \
do { \
u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \
if (!IGC_REMOVED(hw_addr)) \
writel((val), &hw_addr[(reg)]); \
} while (0)
E.g. igb has similar checks in its MMIO accessors, and has a similar
macro E1000_REMOVED, which is implemented as follows:
#define E1000_REMOVED(h) unlikely(!(h))
These checks serve to detect and take note of an 0xffffffff MMIO read
return from the device, which can be caused by a PCIe link flap or some
other kind of PCI bus error, and to avoid performing MMIO reads and
writes from that point onwards.
However, the IGC_REMOVED macro was not originally implemented:
This led to the IGC_REMOVED logic to be removed entirely in a
subsequent commit (commit 3c215fb18e70, "igc: remove IGC_REMOVED
function"), with the rationale that such checks matter only for
virtualization and that igc does not support virtualization -- but a
PCIe device can become detached even without virtualization being in
use, and without proper checks, a PCIe bus error affecting an igc
adapter will lead to various NULL pointer dereferences, as the first
access after the error will set hw->hw_addr to NULL, and subsequent
accesses will blindly dereference this now-NULL pointer.
This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and
implements IGC_REMOVED the way it is done for igb, by checking for the
unlikely() case of hw_addr being NULL. This change prevents the oopses
seen when a PCIe link flap occurs on an igc adapter.
Fixes: 146740f9abc4 ("igc: Add support for PF") Signed-off-by: Lennert Buytenhek <buytenh@arista.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Stutter mode is a power saving feature on GPUs, however at
least one early raven system exhibits stability issues with
it. Add a quirk to disable it for that system.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214417 Fixes: 005440066f929b ("drm/amdgpu: enable gfxoff again on raven series (v2)") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:
again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}
The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.
Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole") Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lkml.kernel.org/r/YsQ3jm2GR38SW7uD@worktop.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>