Christoph Hellwig [Thu, 11 Jul 2024 07:17:03 +0000 (09:17 +0200)]
nfs: split nfs_read_folio
nfs_read_folio is a bit hard to follow because it mixes highlevel logic
with the actual data read. Split the latter into a helper and update
the comments to be more accurate.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Thu, 11 Jul 2024 07:17:02 +0000 (09:17 +0200)]
nfs: pass explicit offset/count to trace events
nfs_folio_length is unsafe to use without having the folio locked and a
check for a NULL ->f_mapping that protects against truncations and can
lead to kernel crashes. E.g. when running xfstests generic/065 with
all nfs trace points enabled.
Follow the model of the XFS trace points and pass in an explіcit offset
and length. This has the additional benefit that these values can
be more accurate as some of the users touch partial folio ranges.
Fixes: eb5654b3b89d ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Fri, 5 Jul 2024 05:42:51 +0000 (07:42 +0200)]
nfs: do not extend writes to the entire folio
nfs_update_folio has code to extend a write to the entire page under
certain conditions. With the support for large folios this now
suddenly extents to the variable sized and potentially much larger folio.
Add code to limit the extension to the page boundaries of the start and
end of the write, which matches the historic expecation and the code
comments.
Fixes: b73fe2dd6cd5 ("nfs: add support for large folios") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Fri, 5 Jul 2024 16:51:41 +0000 (18:51 +0200)]
nfs/blocklayout: add support for NVMe
Look for the udev generated persistent device name for NVMe devices
in addition to the SCSI ones and the Redhat-specific device mapper
name.
This is the client side implementation of RFC 9561 "Using the Parallel
NFS (pNFS) SCSI Layout to Access Non-Volatile Memory Express (NVMe)
Storage Devices".
Note that the udev rules for nvme are a bit of a mess and udev will only
create a link for the uuid if the NVMe namespace has one, and not the
NGUID. As the current RFCs don't support UUID based identifications this
means the layout can't be used on such namespaces out of the box. A
small tweak to the udev rules can work around it, and as the real fix I
will submit a draft to the IETF NFSv4 working group to support UUID-based
identifiers for SCSI and NVMe.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 1 Jul 2024 05:26:54 +0000 (07:26 +0200)]
nfs: don't reuse partially completed requests in nfs_lock_and_join_requests
When NFS requests are split into sub-requests, nfs_inode_remove_request
calls nfs_page_group_sync_on_bit to set PG_REMOVE on this sub-request and
only completes the head requests once PG_REMOVE is set on all requests.
This means that when nfs_lock_and_join_requests sees a PG_REMOVE bit, I/O
on the request is in progress and has partially completed. If such a
request is returned to nfs_try_to_update_request, it could be extended
with the newly dirtied region and I/O for the combined range will be
re-scheduled, leading to extra I/O.
Change the logic to instead restart the search for a request when any
PG_REMOVE bit is set, as the completion handler will remove the request
as soon as it can take the page group lock. This not only avoid
extending the I/O but also does the right thing for the callers that
want to cancel or flush the request.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 1 Jul 2024 05:26:52 +0000 (07:26 +0200)]
nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests
Fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests to
prepare for future changes to this code, and move the helpers to write.c
as well.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 1 Jul 2024 05:26:50 +0000 (07:26 +0200)]
nfs: simplify nfs_folio_find_and_lock_request
nfs_folio_find_and_lock_request and the nfs_page_group_lock_head helper
called by it spend quite some effort to deal with head vs subrequests.
But given that only the head request can be stashed in the folio private
data, non of that is required.
Fold the locking logic from nfs_page_group_lock_head into
nfs_folio_find_and_lock_request and simplify the result based on the
invariant that we always find the head request in the folio private data.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 1 Jul 2024 05:26:49 +0000 (07:26 +0200)]
nfs: remove nfs_folio_private_request
nfs_folio_private_request is a trivial wrapper around, which itself has
fallen out of favor and has been replaced with plain ->private
dereferences in recent folio conversions. Do the same for nfs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 1 Jul 2024 05:26:48 +0000 (07:26 +0200)]
nfs: remove dead code for the old swap over NFS implementation
Remove the code testing folio_test_swapcache either explicitly or
implicitly in pagemap.h headers, as is now handled using the direct I/O
path and not the buffered I/O path that these helpers are located in.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Mon, 24 Jun 2024 13:28:27 +0000 (09:28 -0400)]
NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
Previously in order to mark the communication with the DS server,
we tried to use NFS_CS_DS in cl_flags. However, this flag would
only be saved for the DS server and in case where DS equals MDS,
the client would not find a matching nfs_client in nfs_match_client
that represents the MDS (but is also a DS).
Instead, don't rely on the NFS_CS_DS but instead use NFS_CS_PNFS.
Fixes: 379e4adfddd6 ("NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jan Kara [Mon, 1 Jul 2024 10:50:48 +0000 (12:50 +0200)]
nfs: Block on write congestion
Commit 6df25e58532b ("nfs: remove reliance on bdi congestion")
introduced NFS-private solution for limiting number of writes
outstanding against a particular server. Unlike previous bdi congestion
this algorithm actually works and limits number of outstanding writeback
pages to nfs_congestion_kb which scales with amount of client's memory
and is capped at 256 MB. As a result some workloads such as random
buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
fio command to reproduce is:
This happens because the client sends ~256 MB worth of dirty pages to
the server and any further background writeback request is ignored until
the number of writeback pages gets below the threshold of 192 MB. By the
time this happens and clients decides to trigger another round of
writeback, the server often has no pages to write and the disk is idle.
To fix this problem and make the client react faster to eased congestion
of the server by blocking waiting for congestion to resolve instead of
aborting writeback. This improves the random 4k buffered write
throughput to 184 MB/s.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jan Kara [Mon, 1 Jul 2024 10:50:47 +0000 (12:50 +0200)]
nfs: Properly initialize server->writeback
Atomic types should better be initialized with atomic_long_set() instead
of relying on zeroing done by kzalloc(). Clean this up.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jan Kara [Mon, 1 Jul 2024 10:50:46 +0000 (12:50 +0200)]
nfs: Drop pointless check from nfs_commit_release_pages()
nfss->writeback is updated only when we are ending page writeback and at
that moment we also clear nfss->write_congested. So there's no point in
rechecking congestion state in nfs_commit_release_pages(). Drop the
pointless check.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 25 Jun 2024 20:02:08 +0000 (16:02 -0400)]
nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
An administrator cannot take action on these messages, but the
reported errors might be helpful for troubleshooting. Transition
them to trace points so these events appear in the trace log and
can be easily lined up with other traced NFS client operations.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 25 Jun 2024 20:02:07 +0000 (16:02 -0400)]
nfs/blocklayout: Report only when /no/ device is found
Since commit f931d8374cad ("nfs/blocklayout: refactor block device
opening"), an error is reported when no multi-path device is found.
But this isn't a fatal error if the subsequent device open is
successful. On systems without multi-path devices, this message
always appears whether there is a problem or not.
Instead, generate less system journal noise by reporting an error
only when both open attempts fail. The new error message is more
actionable since it indicates that there is a real configuration
issue to be addressed.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
nfs4_find_get_deviceid() does a lookup without the spin lock first.
If it can't find a matching deviceid, it creates a new device_info
(which calls bl_alloc_deviceid_node, and that registers the device's
PR key).
Then it takes the nfs4_deviceid_lock and looks up the deviceid again.
If it finds it this time, bl_find_get_deviceid() frees the spare
(new) device_info, which unregisters the PR key for the same device.
Any subsequent I/O from this client on that device gets EBADE.
The umount later unregisters the device's PR key again.
To prevent this problem, register the PR key after the deviceid_node
lookup.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 13 Jun 2024 05:00:55 +0000 (01:00 -0400)]
NFSv4/pNFS: Do layout state recovery upon reboot
Some pNFS implementations, such as flexible files, want the client to
send the layout stats and layout errors that may have incurred while the
metadata server was booting. To do so, the client sends a layoutreturn
with an all-zero stateid while the server is in grace during reboot
recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 13 Jun 2024 05:00:52 +0000 (01:00 -0400)]
NFSv4/pNFS: Retry the layout return later in case of a timeout or reboot
If the layout return failed due to a timeout or reboot, then leave the
layout segments on the list so that the layout return gets replayed
later.
The exception would be if we're freeing the inode.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 13 Jun 2024 05:00:50 +0000 (01:00 -0400)]
NFSv4/pNFS: Add a helper to defer failed layoutreturn calls
If the layoutreturn-on-close fails due to an RPC layer problem, such as
a timeout, then we want to retry at a later time. Add a helper function
to allow this.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 13 Jun 2024 05:00:49 +0000 (01:00 -0400)]
NFSv4/pnfs: Add support for the PNFS_LAYOUT_FILE_BULK_RETURN flag
Add a flag PNFS_LAYOUT_FILE_BULK_RETURN, that will attempt to return all
the layouts in a pnfs_layout_destroy_byfsid/pnfs_layout_destroy_byclid
call, instead of just invalidating them.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 13 Jun 2024 05:00:48 +0000 (01:00 -0400)]
pNFS: Add a flag argument to pnfs_destroy_layouts_byclid()
Change the bool argument to a flag so that we can add different modes
for doing bulk destroy of a layout. In particular, we will want the
ability to schedule return of all the layouts associated with a given
NFS server when it reboots.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 17 Jun 2024 01:21:37 +0000 (21:21 -0400)]
NFSv4: Don't send delegation-related share access modes to CLOSE
When we set the new share access modes for CLOSE in nfs4_close_prepare().
we should only set a mode of NFS4_SHARE_ACCESS_READ, NFS4_SHARE_ACCESS_WRITE
or NFS4_SHARE_ACCESS_BOTH. Currently, we may also be passing in the NFSv4.1
share modes for controlling delegation requests in OPEN, which is wrong.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 17 Jun 2024 01:21:33 +0000 (21:21 -0400)]
NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
If the server supports the NFSv4.2 protocol extension to optimise away
returning a stateid when it returns a delegation, then we cache that
information in another capability flag.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 17 Jun 2024 01:21:31 +0000 (21:21 -0400)]
NFSv4: Don't request atime/mtime/size if they are delegated to us
If the timestamps and size are delegated to the client, then it is
authoritative w.r.t. their values, so we should not be requesting those
values from the server.
Note that this allows us to optimise away most GETATTR calls if the only
changes to the attributes are the result of read() or write().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 17 Jun 2024 01:21:25 +0000 (21:21 -0400)]
NFSv4: Add support for delegated atime and mtime attributes
Ensure that we update the mtime and atime correctly when we read
or write data to the file and when we truncate. Let the server manage
ctime on other attribute updates.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 17 Jun 2024 01:21:19 +0000 (21:21 -0400)]
NFSv4: Clean up open delegation return structure
Instead of having the fields open coded in the struct nfs_openres,
add a separate structure for them so that we can reuse that code
for the WANT_DELEGATION case.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Johnson [Tue, 25 Jun 2024 16:42:49 +0000 (09:42 -0700)]
fs: nfs: add missing MODULE_DESCRIPTION() macros
Fix the 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs_common/nfs_acl.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs_common/grace.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs/nfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs/nfsv2.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs/nfsv3.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfs/nfsv4.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Dr. David Alan Gilbert [Fri, 31 May 2024 00:00:33 +0000 (01:00 +0100)]
NFS: remove unused struct 'mnt_fhstatus'
'mnt_fhstatus' has been unused since
commit 065015e5efff ("NFS: Remove unused XDR decoder functions").
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Wed, 19 Jun 2024 01:05:13 +0000 (11:05 +1000)]
SUNRPC: avoid soft lockup when transmitting UDP to reachable server.
Prior to the commit identified below, call_transmit_status() would
handle -EPERM and other errors related to an unreachable server by
falling through to call_status() which added a 3-second delay and
handled the failure as a timeout.
Since that commit, call_transmit_status() falls through to
handle_bind(). For UDP this moves straight on to handle_connect() and
handle_transmit() so we immediately retransmit - and likely get the same
error.
This results in an indefinite loop in __rpc_execute() which triggers a
soft-lockup warning.
For the errors that indicate an unreachable server,
call_transmit_status() should fall back to call_status() as it did
before. This cannot cause the thundering herd that the previous patch
was avoiding, as the call_status() will insert a delay.
Fixes: ed7dc973bd91 ("SUNRPC: Prevent thundering herd when the socket is not connected") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 4 Jun 2024 19:45:27 +0000 (15:45 -0400)]
xprtrdma: Remove temp allocation of rpcrdma_rep objects
The original code was designed so that most calls to
rpcrdma_rep_create() would occur on the NUMA node that the device
preferred. There are a few cases where that's not possible, so
those reps are marked as temporary.
However, we have the device (and its preferred node) already in
rpcrdma_rep_create(), so let's use that to guarantee the memory
is allocated from the correct node.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 4 Jun 2024 19:45:26 +0000 (15:45 -0400)]
xprtrdma: Clean up synopsis of frwr_mr_unmap()
Commit 7a03aeb66c41 ("xprtrdma: Micro-optimize MR DMA-unmapping")
removed the last use of the @r_xprt parameter in this function, but
neglected to remove the parameter itself.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 4 Jun 2024 19:45:25 +0000 (15:45 -0400)]
xprtrdma: Handle device removal outside of the CM event handler
Wait for all disconnects to complete to ensure the transport has
divested all of its hardware resources before the underlying RDMA
device can be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 4 Jun 2024 19:45:24 +0000 (15:45 -0400)]
rpcrdma: Implement generic device removal
Commit e87a911fed07 ("nvme-rdma: use ib_client API to detect device
removal") explains the benefits of handling device removal outside
of the CM event handler.
Sketch in an IB device removal notification mechanism that can be
used by both the client and server side RPC-over-RDMA transport
implementations.
Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 4 Jun 2024 19:45:23 +0000 (15:45 -0400)]
xprtrdma: Fix rpcrdma_reqs_reset()
Avoid FastReg operations getting MW_BIND_ERR after a reconnect.
rpcrdma_reqs_reset() is called on transport tear-down to get each
rpcrdma_req back into a clean state.
MRs on req->rl_registered are waiting for a FastReg, are already
registered, or are waiting for invalidation. If the transport is
being torn down when reqs_reset() is called, the matching LocalInv
might never be posted. That leaves these MR registered /and/ on
req->rl_free_mrs, where they can be re-used for the next
connection.
Since xprtrdma does not keep specific track of the MR state, it's
not possible to know what state these MRs are in, so the only safe
thing to do is release them immediately.
Fixes: 5de55ce951a1 ("xprtrdma: Release in-flight MRs on disconnect") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 27 May 2024 16:36:09 +0000 (18:36 +0200)]
nfs: add support for large folios
NFS already is void of folio size assumption, so just pass the chunk size
to __filemap_get_folio and set the large folio address_space flag for all
regular files.
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tanzir Hasan [Tue, 2 Jan 2024 22:22:18 +0000 (22:22 +0000)]
xprtrdma: removed asm-generic headers from verbs.c
asm-generic/barrier.h and asm/bitops.h are already brought into the
file through the header linux/sunrpc/svc_rdma.h and the file can
still be built with their removal. They have been replaced with the
preferred linux/bitops.h and asm/barrier.h to remove the need for the
asm-generic header.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A set of clk fixes for the Qualcomm, Mediatek, and Allwinner drivers:
- Fix the Qualcomm Stromer Plus PLL set_rate() clk_op to explicitly
set the alpha enable bit and not set bits that don't exist
- Mark Qualcomm IPQ9574 crypto clks as voted to avoid stuck clk
warnings
- Fix the parent of some PLLs on Qualcomm sm6530 so their rate is
correct
- Fix the min/max rate clamping logic in the Allwinner driver that
got broken in v6.9
- Limit runtime PM enabling in the Mediatek driver to only
mt8183-mfgcfg so that system wide resume doesn't break on other
Mediatek SoCs"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: mediatek: mt8183: Only enable runtime PM on mt8183-mfgcfg
clk: sunxi-ng: common: Don't call hw_to_ccu_common on hw without common
clk: qcom: gcc-ipq9574: Add BRANCH_HALT_VOTED flag
clk: qcom: apss-ipq-pll: remove 'config_ctl_hi_val' from Stromer pll configs
clk: qcom: clk-alpha-pll: set ALPHA_EN bit for Stromer Plus PLLs
clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents
Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix unnecessary copy to 0 when kernel is booted at address 0
- Fix usercopy crash when dumping dtl via debugfs
- Avoid possible crash when PCI hotplug races with error handling
- Fix kexec crash caused by scv being disabled before other CPUs
call-in
- Fix powerpc selftests build with USERCFLAGS set
Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.
* tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix build with USERCFLAGS set
powerpc/pseries: Fix scv instruction crash with kexec
powerpc/eeh: avoid possible crash when edev->pdev changes
powerpc/pseries: Whitelist dtl slub object for copying to userspace
powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
Merge tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"An i2c driver fix"
* tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
Michael Ellerman [Sat, 6 Jul 2024 12:08:33 +0000 (22:08 +1000)]
selftests/powerpc: Fix build with USERCFLAGS set
Currently building the powerpc selftests with USERCFLAGS set to anything
causes the build to break:
$ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
...
gcc -Wno-error cache_shape.c ...
cache_shape.c:18:10: fatal error: utils.h: No such file or directory
18 | #include "utils.h"
| ^~~~~~~~~
compilation terminated.
This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which
causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at
all, resulting in none of the usual CFLAGS being passed. That can
be seen in the output above, the only flag passed to the compiler is
-Wno-error.
Fix it by dropping the conditional setting of CFLAGS in flags.mk.
Instead always set CFLAGS, but also append USERCFLAGS if they are set.
Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk
is included by multiple Makefiles (to support partial builds), causing
CFLAGS to be appended to multiple times. Additionally that would place
the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS
couldn't override the standard flags. Being able to override the
standard flags is desirable, for example for adding -Wno-error.
With the fix in place, the CFLAGS are set correctly, including the
USERCFLAGS:
Merge tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fix from Mimi Zohar:
"A single bug fix to properly remove all of the securityfs IMA
measurement lists"
* tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: fix wrong zero-assignment during securityfs dentry remove
Merge tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci update from Bjorn Helgaas:
- Update MAINTAINERS and CREDITS to credit Gustavo Pimentel with the
Synopsys DesignWare eDMA driver and reflect that he is no longer at
Synopsys and isn't in a position to maintain the DesignWare xData
traffic generator (Bjorn Helgaas)
* tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
CREDITS: Add Synopsys DesignWare eDMA driver for Gustavo Pimentel
MAINTAINERS: Orphan Synopsys DesignWare xData traffic generator
Merge tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the CMODX example in the recently added icache flushing
prctl()
- A fix to the perf driver to avoid corrupting event data on counter
overflows when external overflow handlers are in use
- A fix to clear all hardware performance monitor events on boot, to
avoid dangling events firmware or previously booted kernels from
triggering spuriously
- A fix to the perf event probing logic to avoid erroneously reporting
the presence of unimplemented counters. This also prevents some
implemented counters from being reported
- A build fix for the vector sigreturn selftest on clang
- A fix to ftrace, which now requires the previously optional index
argument to ftrace_graph_ret_addr()
- A fix to avoid deadlocking if kexec crash handling triggers in an
interrupt context
* tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kexec: Avoid deadlock in kexec crash path
riscv: stacktrace: fix usage of ftrace_graph_ret_addr()
riscv: selftests: Fix vsetivli args for clang
perf: RISC-V: Check standard event availability
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
drivers/perf: riscv: Do not update the event data if uptodate
documentation: Fix riscv cmodx example
- xe: migration error handling + typoed register name in gt setup
- i915: usb-c fix to shut up warnings on MTL+
- panthor: fix sync-only jobs + ioctl validation fix to not EINVAL
wrongly
- panel quirks
- nouveau: NULL deref in get_modes
drm core:
- fbdev big endian fix for the dma memory backed variant
drivers/firmware:
- fix sysfb refcounting"
* tag 'drm-fixes-2024-07-05' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/mcr: Avoid clobbering DSS steering
drm/xe: fix error handling in xe_migrate_update_pgtables
drm/ttm: Always take the bo delayed cleanup path for imported bos
drm/fbdev-generic: Fix framebuffer on big endian devices
drm/panthor: Fix sync-only jobs
drm/panthor: Don't check the array stride on empty uobj arrays
drm/amdgpu/atomfirmware: silence UBSAN warning
drm/radeon: check bo_va->bo is non-NULL before using it
drm/amd/display: Fix array-index-out-of-bounds in dml2/FCLKChangeSupport
drm/amd/display: Update efficiency bandwidth for dcn351
drm/amd/display: Fix refresh rate range for some panel
drm/amd/display: Account for cursor prefetch BW in DML1 mode support
drm/amd/display: Add refresh rate range check
drm/amd/display: Reset freesync config before update new state
drm: panel-orientation-quirks: Add labels for both Valve Steam Deck revisions
drm: panel-orientation-quirks: Add quirk for Valve Galileo
drm/i915/display: For MTL+ platforms skip mg dp programming
drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes
firmware: sysfb: Fix reference count of sysfb parent device
Merge tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Two OF lookup quirks and one fix for an issue in the generic gpio-mmio
driver:
- add two OF lookup quirks for TSC2005 and MIPS Lantiq
- don't try to figure out bgpio_bits from the 'ngpios' property in
gpio-mmio"
* tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: of: add polarity quirk for TSC2005
gpio: mmio: do not calculate bgpio_bits via "ngpios"
gpiolib: of: fix lookup quirk for MIPS Lantiq
Merge tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM fixes from Jarkko Sakkinen:
"This contains the fixes for !chip->auth condition, preventing the
breakage of:
- tpm_ftpm_tee.c
- tpm_i2c_nuvoton.c
- tpm_ibmvtpm.c
- tpm_tis_i2c_cr50.c
- tpm_vtpm_proxy.c
All drivers will continue to work as they did in 6.9, except a single
warning (dev_warn() not WARN()) is printed to klog only to inform that
authenticated sessions are not enabled"
* tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Address !chip->auth in tpm_buf_append_hmac_session*()
tpm: Address !chip->auth in tpm_buf_append_name()
tpm: Address !chip->auth in tpm2_*_auth_session()
Wolfram Sang [Fri, 5 Jul 2024 14:08:55 +0000 (16:08 +0200)]
Merge tag 'i2c-host-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
This tag includes a nice fix in the PNX driver that has been
pending for a long time. Piotr has replaced a potential lock in
the interrupt context with a more efficient and straightforward
handling of the timeout signaling.
DTS for Nokia N900 incorrectly specifies "active high" polarity for
the reset line, while the chip documentation actually specifies it as
"active low". In the past the driver fudged gpiod API and inverted
the logic internally, but it was changed in d0d89493bff8.
Jarkko Sakkinen [Wed, 3 Jul 2024 15:47:46 +0000 (18:47 +0300)]
tpm: Address !chip->auth in tpm_buf_append_hmac_session*()
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can
cause a null derefence in tpm_buf_hmac_session*(). Thus, address
!chip->auth in tpm_buf_hmac_session*() and remove the fallback
implementation for !TCG_TPM2_HMAC.
Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Jarkko Sakkinen [Wed, 3 Jul 2024 15:33:14 +0000 (18:33 +0300)]
tpm: Address !chip->auth in tpm_buf_append_name()
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can
cause a null derefence in tpm_buf_append_name(). Thus, address
!chip->auth in tpm_buf_append_name() and remove the fallback
implementation for !TCG_TPM2_HMAC.
Cc: stable@vger.kernel.org # v6.10+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: d0a25bb961e6 ("tpm: Add HMAC session name/handle append") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Jarkko Sakkinen [Wed, 3 Jul 2024 16:39:27 +0000 (19:39 +0300)]
tpm: Address !chip->auth in tpm2_*_auth_session()
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause
a null derefence in tpm2_*_auth_session(). Thus, address !chip->auth in
tpm2_*_auth_session().
Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Merge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix folio refcounting when releasing them (encoded write, dummy
extent buffer)
- fix out of bounds read when checking qgroup inherit data
- fix how configurable chunk size is handled in zoned mode
- in the ref-verify tool, fix uninitialized return value when checking
extent owner ref and simple quota are not enabled
* tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix folio refcount in __alloc_dummy_extent_buffer()
btrfs: fix folio refcount in btrfs_do_encoded_write()
btrfs: fix uninitialized return value in the ref-verify tool
btrfs: always do the basic checks for btrfs_qgroup_inherit structure
btrfs: zoned: fix calc_available_free_space() for zoned mode
Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, wireless and netfilter.
There's one fix for power management with Intel's e1000e here,
Thorsten tells us there's another problem that started in v6.9. We're
trying to wrap that up but I don't think it's blocking.
Current release - new code bugs:
- wifi: mac80211: disable softirqs for queued frame handling
- af_unix: fix uninit-value in __unix_walk_scc(), with the new
garbage collection algo
Previous releases - regressions:
- Bluetooth:
- qca: fix BT enable failure for QCA6390 after warm reboot
- add quirk to ignore reserved PHY bits in LE Extended Adv Report,
abused by some Broadcom controllers found on Apple machines
- wifi: wilc1000: fix ies_len type in connect path
Previous releases - always broken:
- tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
avoid premature timeouts
- net: make sure skb_datagram_iter maps fragments page by page, in
case we somehow get compound highmem mixed in
- eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
queues are used
Misc:
- MAINTAINERS: Remembering Larry Finger"
* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
bnxt_en: Fix the resource check condition for RSS contexts
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
inet_diag: Initialize pad field in struct inet_diag_req_v2
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
selftests: make order checking verbose in msg_zerocopy selftest
selftests: fix OOM in msg_zerocopy selftest
ice: use proper macro for testing bit
ice: Reject pin requests with unsupported flags
ice: Don't process extts if PTP is disabled
ice: Fix improper extts handling
selftest: af_unix: Add test case for backtrack after finalising SCC.
af_unix: Fix uninit-value in __unix_walk_scc()
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
net: rswitch: Avoid use-after-free in rswitch_poll()
netfilter: nf_tables: unconditionally flush pending work before notifier
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
wifi: iwlwifi: mvm: avoid link lookup in statistics
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
wifi: wilc1000: fix ies_len type in connect path
...
Merge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix and add physical to virtual address translations in dasd and
virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
More code cleanup will follow.
- Small defconfig updates
* tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
s390/vfio_ccw: Fix target addresses of TIC CCWs
s390: Update defconfigs
Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from, Andrew Morton:
"6 hotfies, all cc:stable. Some fixes for longstanding nilfs2 issues
and three unrelated MM fixes"
* tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix incorrect inode allocation from reserved inodes
nilfs2: add missing check for inode numbers on directory entries
nilfs2: fix inode number range checks
mm: avoid overflows in dirty throttling logic
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
mm: optimize the redundant loop of mm_update_owner_next()
bnxt_en: Fix the resource check condition for RSS contexts
While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available. If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.
There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts. It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.
Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted. The resources will be available for
use when new contexts are created later.
Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts. In the future, we will conditionally
disable this workaround when the firmware fix is available.
Aleksandr Mishin [Wed, 3 Jul 2024 20:32:51 +0000 (23:32 +0300)]
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
In case of invalid INI file mlxsw_linecard_types_init() deallocates memory
but doesn't reset pointer to NULL and returns 0. In case of any error
occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init()
calls mlxsw_linecard_types_fini() which performs memory deallocation again.
Add pointer reset to NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 4 Jul 2024 14:31:54 +0000 (07:31 -0700)]
Merge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.10
Hopefully the last fixes for v6.10. Fix a regression in wilc1000
where bitrate Information Elements longer than 255 bytes were broken.
Few fixes also to mac80211 and iwlwifi.
* tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
wifi: iwlwifi: mvm: avoid link lookup in statistics
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
wifi: wilc1000: fix ies_len type in connect path
wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
====================
Paolo Abeni [Thu, 4 Jul 2024 13:31:26 +0000 (15:31 +0200)]
Merge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains a oneliner patch to inconditionally flush
workqueue containing stale objects to be released, syzbot managed to
trigger UaF. Patch from Florian Westphal.
netfilter pull request 24-07-04
* tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: unconditionally flush pending work before notifier
====================
Shigeru Yoshida [Wed, 3 Jul 2024 09:16:49 +0000 (18:16 +0900)]
inet_diag: Initialize pad field in struct inet_diag_req_v2
KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.
inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.
Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
When we process segments with TCP AO, we don't check it in
tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.
Let's avoid the unnecessary BPF invocation.
Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matt Roper [Wed, 26 Jun 2024 21:05:37 +0000 (14:05 -0700)]
drm/xe/mcr: Avoid clobbering DSS steering
A couple copy/paste mistakes in the code that selects steering targets
for OADDRM and INSTANCE0 unintentionally clobbered the steering target
for DSS ranges in some cases.
The OADDRM/INSTANCE0 values were also not assigned as intended, although
that mistake wound up being harmless since the desired values for those
specific ranges were '0' which the kzalloc of the GT structure should
have already taken care of implicitly.
Matthew Auld [Thu, 20 Jun 2024 10:20:26 +0000 (11:20 +0100)]
drm/xe: fix error handling in xe_migrate_update_pgtables
Don't call drm_suballoc_free with sa_bo pointing to PTR_ERR.
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2120 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620102025.127699-2-matthew.auld@intel.com
(cherry picked from commit ce6b63336f79ec5f3996de65f452330e395f99ae) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Thomas Hellström [Fri, 28 Jun 2024 15:38:48 +0000 (17:38 +0200)]
drm/ttm: Always take the bo delayed cleanup path for imported bos
Bos can be put with multiple unrelated dma-resv locks held. But
imported bos attempt to grab the bo dma-resv during dma-buf detach
that typically happens during cleanup. That leads to lockde splats
similar to the below and a potential ABBA deadlock.
Fix this by always taking the delayed workqueue cleanup path for
imported bos.
Requesting stable fixes from when the Xe driver was introduced,
since its usage of drm_exec and wide vm dma_resvs appear to be
the first reliable trigger of this.
[22982.116427] ============================================
[22982.116428] WARNING: possible recursive locking detected
[22982.116429] 6.10.0-rc2+ #10 Tainted: G U W
[22982.116430] --------------------------------------------
[22982.116430] glxgears:sh0/5785 is trying to acquire lock:
[22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0
[22982.116438]
but task is already holding lock:
[22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116442]
other info that might help us debug this:
[22982.116442] Possible unsafe locking scenario:
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240628153848.4989-1-thomas.hellstrom@linux.intel.com
====================
fix OOM and order check in msg_zerocopy selftest
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================