]> www.infradead.org Git - users/dwmw2/linux.git/log
users/dwmw2/linux.git
15 months agonfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
Jeff Layton [Fri, 3 Feb 2023 18:18:34 +0000 (13:18 -0500)]
nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open

[ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ]

The nested if statements here make no sense, as you can never reach
"else" branch in the nested statement. Fix the error handling for
when there is a courtesy client that holds a conflicting deny mode.

Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server")
Reported-by: 張智諺 <cc85nod@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: fix problems with cleanup on errors in nfsd4_copy
Dai Ngo [Tue, 31 Jan 2023 19:12:29 +0000 (11:12 -0800)]
NFSD: fix problems with cleanup on errors in nfsd4_copy

[ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ]

When nfsd4_copy fails to allocate memory for async_copy->cp_src, or
nfs4_init_copy_state fails, it calls cleanup_async_copy to do the
cleanup for the async_copy which causes page fault since async_copy
is not yet initialized.

This patche rearranges the order of initializing the fields in
async_copy and adds checks in cleanup_async_copy to skip un-initialized
fields.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: don't hand out delegation on setuid files being opened for write
Jeff Layton [Fri, 27 Jan 2023 12:09:33 +0000 (07:09 -0500)]
nfsd: don't hand out delegation on setuid files being opened for write

[ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ]

We had a bug report that xfstest generic/355 was failing on NFSv4.0.
This test sets various combinations of setuid/setgid modes and tests
whether DIO writes will cause them to be stripped.

What I found was that the server did properly strip those bits, but
the client didn't notice because it held a delegation that was not
recalled. The recall didn't occur because the client itself was the
one generating the activity and we avoid recalls in that case.

Clearing setuid bits is an "implicit" activity. The client didn't
specifically request that we do that, so we need the server to issue a
CB_RECALL, or avoid the situation entirely by not issuing a delegation.

The easiest fix here is to simply not give out a delegation if the file
is being opened for write, and the mode has the setuid and/or setgid bit
set. Note that there is a potential race between the mode and lease
being set, so we test for this condition both before and after setting
the lease.

This patch fixes generic/355, generic/683 and generic/684 for me. (Note
that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and
fail).

Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: fix leaked reference count of nfsd4_ssc_umount_item
Dai Ngo [Tue, 24 Jan 2023 05:34:13 +0000 (21:34 -0800)]
NFSD: fix leaked reference count of nfsd4_ssc_umount_item

[ Upstream commit 34e8f9ec4c9ac235f917747b23a200a5e0ec857b ]

The reference count of nfsd4_ssc_umount_item is not decremented
on error conditions. This prevents the laundromat from unmounting
the vfsmount of the source file.

This patch decrements the reference count of nfsd4_ssc_umount_item
on error.

Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: clean up potential nfsd_file refcount leaks in COPY codepath
Jeff Layton [Tue, 17 Jan 2023 19:38:31 +0000 (14:38 -0500)]
nfsd: clean up potential nfsd_file refcount leaks in COPY codepath

[ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ]

There are two different flavors of the nfsd4_copy struct. One is
embedded in the compound and is used directly in synchronous copies. The
other is dynamically allocated, refcounted and tracked in the client
struture. For the embedded one, the cleanup just involves releasing any
nfsd_files held on its behalf. For the async one, the cleanup is a bit
more involved, and we need to dequeue it from lists, unhash it, etc.

There is at least one potential refcount leak in this code now. If the
kthread_create call fails, then both the src and dst nfsd_files in the
original nfsd4_copy object are leaked.

The cleanup in this codepath is also sort of weird. In the async copy
case, we'll have up to four nfsd_file references (src and dst for both
flavors of copy structure). They are both put at the end of
nfsd4_do_async_copy, even though the ones held on behalf of the embedded
one outlive that structure.

Change it so that we always clean up the nfsd_file refs held by the
embedded copy structure before nfsd4_copy returns. Rework
cleanup_async_copy to handle both inter and intra copies. Eliminate
nfsd4_cleanup_intra_ssc since it now becomes a no-op.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: allow nfsd_file_get to sanely handle a NULL pointer
Jeff Layton [Fri, 6 Jan 2023 15:33:47 +0000 (10:33 -0500)]
nfsd: allow nfsd_file_get to sanely handle a NULL pointer

[ Upstream commit 70f62231cdfd52357836733dd31db787e0412ab2 ]

...and remove some now-useless NULL pointer checks in its callers.

Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: enhance inter-server copy cleanup
Dai Ngo [Mon, 19 Dec 2022 00:55:53 +0000 (16:55 -0800)]
NFSD: enhance inter-server copy cleanup

[ Upstream commit df24ac7a2e3a9d0bc68f1756a880e50bfe4b4522 ]

Currently nfsd4_setup_inter_ssc returns the vfsmount of the source
server's export when the mount completes. After the copy is done
nfsd4_cleanup_inter_ssc is called with the vfsmount of the source
server and it searches nfsd_ssc_mount_list for a matching entry
to do the clean up.

The problems with this approach are (1) the need to search the
nfsd_ssc_mount_list and (2) the code has to handle the case where
the matching entry is not found which looks ugly.

The enhancement is instead of nfsd4_setup_inter_ssc returning the
vfsmount, it returns the nfsd4_ssc_umount_item which has the
vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called
it's passed with the nfsd4_ssc_umount_item directly to do the
clean up so no searching is needed and there is no need to handle
the 'not found' case.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: adjusted whitespace and variable/function names ]
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
15 months agonfsd: don't destroy global nfs4_file table in per-net shutdown
Jeff Layton [Sat, 11 Feb 2023 12:50:08 +0000 (07:50 -0500)]
nfsd: don't destroy global nfs4_file table in per-net shutdown

[ Upstream commit 4102db175b5d884d133270fdbd0e59111ce688fc ]

The nfs4_file table is global, so shutting it down when a containerized
nfsd is shut down is wrong and can lead to double-frees. Tear down the
nfs4_file_rhltable in nfs4_state_shutdown instead of
nfs4_state_shutdown_net.

Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017
Reported-by: JianHong Yin <jiyin@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: don't free files unconditionally in __nfsd_file_cache_purge
Jeff Layton [Fri, 20 Jan 2023 19:52:14 +0000 (14:52 -0500)]
nfsd: don't free files unconditionally in __nfsd_file_cache_purge

[ Upstream commit 4bdbba54e9b1c769da8ded9abd209d765715e1d6 ]

nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.

Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.

Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: replace delayed_work with work_struct for nfsd_client_shrinker
Dai Ngo [Thu, 12 Jan 2023 00:06:51 +0000 (16:06 -0800)]
NFSD: replace delayed_work with work_struct for nfsd_client_shrinker

[ Upstream commit 7c24fa225081f31bc6da6a355c1ba801889ab29a ]

Since nfsd4_state_shrinker_count always calls mod_delayed_work with
0 delay, we can replace delayed_work with work_struct to save some
space and overhead.

Also add the call to cancel_work after unregister the shrinker
in nfs4_state_shutdown_net.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
Dai Ngo [Wed, 11 Jan 2023 20:17:09 +0000 (12:17 -0800)]
NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time

[ Upstream commit f385f7d244134246f984975ed34cd75f77de479f ]

Currently the nfsd-client shrinker is registered and unregistered at
the time the nfsd module is loaded and unloaded. The problem with this
is the shrinker is being registered before all of the relevant fields
in nfsd_net are initialized when nfsd is started. This can lead to an
oops when memory is low and the shrinker is called while nfsd is not
running.

This patch moves the  register/unregister of nfsd-client shrinker from
module load/unload time to nfsd startup/shutdown time.

Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use set_bit(RQ_DROPME)
Chuck Lever [Sat, 7 Jan 2023 15:15:35 +0000 (10:15 -0500)]
NFSD: Use set_bit(RQ_DROPME)

[ Upstream commit 5304930dbae82d259bcf7e5611db7c81e7a42eff ]

The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.

Fixes: 9315564747cb ("NFSD: Use only RQ_DROPME to signal the need to drop a reply")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoRevert "SUNRPC: Use RMW bitops in single-threaded hot paths"
Chuck Lever [Fri, 6 Jan 2023 17:43:37 +0000 (12:43 -0500)]
Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"

[ Upstream commit 7827c81f0248e3c2f40d438b020f3d222f002171 ]

The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.

Found via KCSAN.

Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix handling of cached open files in nfsd4_open codepath
Jeff Layton [Thu, 5 Jan 2023 19:55:56 +0000 (14:55 -0500)]
nfsd: fix handling of cached open files in nfsd4_open codepath

[ Upstream commit 0b3a551fa58b4da941efeb209b3770868e2eddd7 ]

Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a
regular NFSv4 file") added the ability to cache an open fd over a
compound. There are a couple of problems with the way this currently
works:

It's racy, as a newly-created nfsd_file can end up with its PENDING bit
cleared while the nf is hashed, and the nf_file pointer is still zeroed
out. Other tasks can find it in this state and they expect to see a
valid nf_file, and can oops if nf_file is NULL.

Also, there is no guarantee that we'll end up creating a new nfsd_file
if one is already in the hash. If an extant entry is in the hash with a
valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with
the value of op_file and the old nf_file will leak.

Fix both issues by making a new nfsd_file_acquirei_opened variant that
takes an optional file pointer. If one is present when this is called,
we'll take a new reference to it instead of trying to open the file. If
the nfsd_file already has a valid nf_file, we'll just ignore the
optional file and pass the nfsd_file back as-is.

Also rework the tracepoints a bit to allow for an "opened" variant and
don't try to avoid counting acquisitions in the case where we already
have a cached open file.

Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file")
Cc: Trond Myklebust <trondmy@hammerspace.com>
Reported-by: Stanislav Saner <ssaner@redhat.com>
Reported-and-Tested-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-and-Tested-by: Torkil Svensgaard <torkil@drcmr.dk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: rework refcounting in filecache
Jeff Layton [Sun, 11 Dec 2022 11:19:33 +0000 (06:19 -0500)]
nfsd: rework refcounting in filecache

[ Upstream commit ac3a2585f018f10039b4a856dcb122da88c1c1c9 ]

The filecache refcounting is a bit non-standard for something searchable
by RCU, in that we maintain a sentinel reference while it's hashed. This
in turn requires that we have to do things differently in the "put"
depending on whether its hashed, which we believe to have led to races.

There are other problems in here too. nfsd_file_close_inode_sync can end
up freeing an nfsd_file while there are still outstanding references to
it, and there are a number of subtle ToC/ToU races.

Rework the code so that the refcount is what drives the lifecycle. When
the refcount goes to zero, then unhash and rcu free the object. A task
searching for a nfsd_file is allowed to bump its refcount, but only if
it's not already 0. Ensure that we don't make any other changes to it
until a reference is held.

With this change, the LRU carries a reference. Take special care to deal
with it when removing an entry from the list, and ensure that we only
repurpose the nf_lru list_head when the refcount is 0 to ensure
exclusive access to it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Avoid clashing function prototypes
Kees Cook [Fri, 2 Dec 2022 20:48:59 +0000 (12:48 -0800)]
NFSD: Avoid clashing function prototypes

[ Upstream commit e78e274eb22d966258a3845acc71d3c5b8ee2ea8 ]

When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].

There were 97 warnings produced by NFS. For example:

fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
        [OP_ACCESS]             = (nfsd4_dec)nfsd4_decode_access,
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The enc/dec callbacks were defined as passing "void *" as the second
argument, but were being implicitly cast to a new type. Replace the
argument with union nfsd4_op_u, and perform explicit member selection
in the function body. There are no resulting binary differences.

Changes were made mechanically using the following Coccinelle script,
with minor by-hand fixes for members that didn't already match their
existing argument name:

@find@
identifier func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
        [N] = (T) func,
 };

@already_void@
identifier find.func;
identifier name;
@@

 func(...,
-void
+union nfsd4_op_u
 *name)
 {
        ...
 }

@proto depends on !already_void@
identifier find.func;
type T;
identifier name;
position p;
@@

 func@p(...,
        T name
 ) {
        ...
   }

@script:python get_member@
type_name << proto.T;
member;
@@

coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])

@convert@
identifier find.func;
type proto.T;
identifier proto.name;
position proto.p;
identifier get_member.member;
@@

 func@p(...,
-       T name
+       union nfsd4_op_u *u
 ) {
+       T name = &u->member;
        ...
   }

@cast@
identifier find.func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
        [N] =
-       (T)
        func,
 };

Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use only RQ_DROPME to signal the need to drop a reply
Chuck Lever [Sat, 26 Nov 2022 20:55:30 +0000 (15:55 -0500)]
NFSD: Use only RQ_DROPME to signal the need to drop a reply

[ Upstream commit 9315564747cb6a570e99196b3a4880fb817635fd ]

Clean up: NFSv2 has the only two usages of rpc_drop_reply in the
NFSD code base. Since NFSv2 is going away at some point, replace
these in order to simplify the "drop this reply?" check in
nfsd_dispatch().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: add CB_RECALL_ANY tracepoints
Dai Ngo [Thu, 17 Nov 2022 03:44:48 +0000 (19:44 -0800)]
NFSD: add CB_RECALL_ANY tracepoints

[ Upstream commit 638593be55c0b37a1930038460a9918215d5c24b ]

Add tracepoints to trace start and end of CB_RECALL_ANY operation.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
[ cel: added show_rca_mask() macro ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: add delegation reaper to react to low memory condition
Dai Ngo [Thu, 17 Nov 2022 03:44:47 +0000 (19:44 -0800)]
NFSD: add delegation reaper to react to low memory condition

[ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ]

The delegation reaper is called by nfsd memory shrinker's on
the 'count' callback. It scans the client list and sends the
courtesy CB_RECALL_ANY to the clients that hold delegations.

To avoid flooding the clients with CB_RECALL_ANY requests, the
delegation reaper sends only one CB_RECALL_ANY request to each
client per 5 seconds.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
[ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: add support for sending CB_RECALL_ANY
Dai Ngo [Thu, 17 Nov 2022 03:44:46 +0000 (19:44 -0800)]
NFSD: add support for sending CB_RECALL_ANY

[ Upstream commit 3959066b697b5dfbb7141124ae9665337d4bc638 ]

Add XDR encode and decode function for CB_RECALL_ANY.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
Dai Ngo [Thu, 17 Nov 2022 03:44:45 +0000 (19:44 -0800)]
NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker

[ Upstream commit a1049eb47f20b9eabf9afb218578fff16b4baca6 ]

Refactoring courtesy_client_reaper to generic low memory
shrinker so it can be used for other purposes.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agotrace: Relocate event helper files
Chuck Lever [Mon, 14 Nov 2022 13:57:43 +0000 (08:57 -0500)]
trace: Relocate event helper files

[ Upstream commit 247c01ff5f8d66e62a404c91733be52fecb8b7f6 ]

Steven Rostedt says:
> The include/trace/events/ directory should only hold files that
> are to create events, not headers that hold helper functions.
>
> Can you please move them out of include/trace/events/ as that
> directory is "special" in the creation of events.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Stable-dep-of: 638593be55c0 ("NFSD: add CB_RECALL_ANY tracepoints")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agolockd: fix file selection in nlmsvc_cancel_blocked
Jeff Layton [Fri, 11 Nov 2022 19:36:38 +0000 (14:36 -0500)]
lockd: fix file selection in nlmsvc_cancel_blocked

[ Upstream commit 9f27783b4dd235ef3c8dbf69fc6322777450323c ]

We currently do a lock_to_openmode call based on the arguments from the
NLM_UNLOCK call, but that will always set the fl_type of the lock to
F_UNLCK, and the O_RDONLY descriptor is always chosen.

Fix it to use the file_lock from the block instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agolockd: ensure we use the correct file descriptor when unlocking
Jeff Layton [Fri, 11 Nov 2022 19:36:37 +0000 (14:36 -0500)]
lockd: ensure we use the correct file descriptor when unlocking

[ Upstream commit 69efce009f7df888e1fede3cb2913690eb829f52 ]

Shared locks are set on O_RDONLY descriptors and exclusive locks are set
on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once
for each descriptor, but it doesn't reset fl_file. Ensure that it does.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agolockd: set missing fl_flags field when retrieving args
Jeff Layton [Fri, 11 Nov 2022 19:36:36 +0000 (14:36 -0500)]
lockd: set missing fl_flags field when retrieving args

[ Upstream commit 75c7940d2a86d3f1b60a0a265478cb8fc887b970 ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use struct_size() helper in alloc_session()
Xiu Jianfeng [Fri, 11 Nov 2022 09:18:35 +0000 (17:18 +0800)]
NFSD: Use struct_size() helper in alloc_session()

[ Upstream commit 85a0d0c9a58002ef7d1bf5e3ea630f4fbd42a4f0 ]

Use struct_size() helper to simplify the code, no functional changes.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: return error if nfs4_setacl fails
Jeff Layton [Mon, 7 Nov 2022 11:58:41 +0000 (06:58 -0500)]
nfsd: return error if nfs4_setacl fails

[ Upstream commit 01d53a88c08951f88f2a42f1f1e6568928e0590e ]

With the addition of POSIX ACLs to struct nfsd_attrs, we no longer
return an error if setting the ACL fails. Ensure we return the na_aclerr
error on SETATTR if there is one.

Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Neil Brown <neilb@suse.de>
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Add an nfsd_file_fsync tracepoint
Chuck Lever [Thu, 3 Nov 2022 20:22:48 +0000 (16:22 -0400)]
NFSD: Add an nfsd_file_fsync tracepoint

[ Upstream commit d7064eaf688cfe454c50db9f59298463d80d403c ]

Add a tracepoint to capture the number of filecache-triggered fsync
calls and which files needed it. Also, record when an fsync triggers
a write verifier reset.

Examples:

<...>-97    [007]   262.505611: nfsd_file_free:       inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400
<...>-97    [007]   262.505612: nfsd_file_fsync:      inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0
<...>-97    [007]   262.505623: nfsd_file_free:       inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00
<...>-97    [007]   262.505624: nfsd_file_fsync:      inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix up the filecache laundrette scheduling
Jeff Layton [Wed, 2 Nov 2022 18:44:50 +0000 (14:44 -0400)]
nfsd: fix up the filecache laundrette scheduling

[ Upstream commit 22ae4c114f77b55a4c5036e8f70409a0799a08f8 ]

We don't really care whether there are hashed entries when it comes to
scheduling the laundrette. They might all be non-gc entries, after all.
We only want to schedule it if there are entries on the LRU.

Switch to using list_lru_count, and move the check into
nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to
count entries, since it only schedules the laundrette after adding an
entry to the LRU.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agofilelock: add a new locks_inode_context accessor function
Jeff Layton [Wed, 16 Nov 2022 14:02:30 +0000 (09:02 -0500)]
filelock: add a new locks_inode_context accessor function

[ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ]

There are a number of places in the kernel that are accessing the
inode->i_flctx field without smp_load_acquire. This is required to
ensure that the caller doesn't see a partially-initialized structure.

Add a new accessor function for it to make this clear and convert all of
the relevant accesses in locks.c to use it. Also, convert
locks_free_lock_context to use the helper as well instead of just doing
a "bare" assignment.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 77c67530e1f9 ("nfsd: use locks_inode_context helper")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: reorganize filecache.c
Jeff Layton [Wed, 2 Nov 2022 18:44:48 +0000 (14:44 -0400)]
nfsd: reorganize filecache.c

[ Upstream commit 8214118589881b2d390284410c5ff275e7a5e03c ]

In a coming patch, we're going to rework how the filecache refcounting
works. Move some code around in the function to reduce the churn in the
later patches, and rename some of the functions with (hopefully) clearer
names: nfsd_file_flush becomes nfsd_file_fsync, and
nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue.

Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free,
to better match the name of the function from which it's called.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: remove the pages_flushed statistic from filecache
Jeff Layton [Wed, 2 Nov 2022 18:44:47 +0000 (14:44 -0400)]
nfsd: remove the pages_flushed statistic from filecache

[ Upstream commit 1f696e230ea5198e393368b319eb55651828d687 ]

We're counting mapping->nrpages, but not all of those are necessarily
dirty. We don't really have a simple way to count just the dirty pages,
so just remove this stat since it's not accurate.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Fix licensing header in filecache.c
Chuck Lever [Mon, 31 Oct 2022 13:53:26 +0000 (09:53 -0400)]
NFSD: Fix licensing header in filecache.c

[ Upstream commit 3f054211b29c0fa06dfdcab402c795fd7e906be1 ]

Add a missing SPDX header.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use rhashtable for managing nfs4_file objects
Chuck Lever [Fri, 28 Oct 2022 14:47:53 +0000 (10:47 -0400)]
NFSD: Use rhashtable for managing nfs4_file objects

[ Upstream commit d47b295e8d76a4d69f0e2ea0cd8a79c9d3488280 ]

fh_match() is costly, especially when filehandles are large (as is
the case for NFSv4). It needs to be used sparingly when searching
data structures. Unfortunately, with common workloads, I see
multiple thousands of objects stored in file_hashtbl[], which has
just 256 buckets, making its bucket hash chains quite lengthy.

Walking long hash chains with the state_lock held blocks other
activity that needs that lock. Sizable hash chains are a common
occurrance once the server has handed out some delegations, for
example -- IIUC, each delegated file is held open on the server by
an nfs4_file object.

To help mitigate the cost of searching with fh_match(), replace the
nfs4_file hash table with an rhashtable, which can dynamically
resize its bucket array to minimize hash chain length.

The result of this modification is an improvement in the latency of
NFSv4 operations, and the reduction of nfsd CPU utilization due to
eliminating the cost of multiple calls to fh_match() and reducing
the CPU cache misses incurred while walking long hash chains in the
nfs4_file hash table.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Refactor find_file()
Chuck Lever [Fri, 28 Oct 2022 14:47:47 +0000 (10:47 -0400)]
NFSD: Refactor find_file()

[ Upstream commit 15424748001a9b5ea62b3e6ad45f0a8b27f01df9 ]

find_file() is now the only caller of find_file_locked(), so just
fold these two together.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Clean up find_or_add_file()
Chuck Lever [Fri, 28 Oct 2022 14:47:41 +0000 (10:47 -0400)]
NFSD: Clean up find_or_add_file()

[ Upstream commit 9270fc514ba7d415636b23bcb937573a1ce54f6a ]

Remove the call to find_file_locked() in insert_nfs4_file(). Tracing
shows that over 99% of these calls return NULL. Thus it is not worth
the expense of the extra bucket list traversal. insert_file() already
deals correctly with the case where the item is already in the hash
bucket.

Since nfsd4_file_hash_insert() is now just a wrapper around
insert_file(), move the meat of insert_file() into
nfsd4_file_hash_insert() and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Add a nfsd4_file_hash_remove() helper
Chuck Lever [Fri, 28 Oct 2022 14:47:34 +0000 (10:47 -0400)]
NFSD: Add a nfsd4_file_hash_remove() helper

[ Upstream commit 3341678f2fd6106055cead09e513fad6950a0d19 ]

Refactor to relocate hash deletion operation to a helper function
that is close to most other nfs4_file data structure operations.

The "noinline" annotation will become useful in a moment when the
hlist_del_rcu() is replaced with a more complex rhash remove
operation. It also guarantees that hash remove operations can be
traced with "-p function -l remove_nfs4_file_locked".

This also simplifies the organization of forward declarations: the
to-be-added rhashtable and its param structure will be defined
/after/ put_nfs4_file().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Clean up nfsd4_init_file()
Chuck Lever [Fri, 28 Oct 2022 14:47:28 +0000 (10:47 -0400)]
NFSD: Clean up nfsd4_init_file()

[ Upstream commit 81a21fa3e7fdecb3c5b97014f0fc5a17d5806cae ]

Name this function more consistently. I'm going to use nfsd4_file_
and nfsd4_file_hash_ for these helpers.

Change the @fh parameter to be const pointer for better type safety.

Finally, move the hash insertion operation to the caller. This is
typical for most other "init_object" type helpers, and it is where
most of the other nfs4_file hash table operations are located.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Update file_hashtbl() helpers
Chuck Lever [Fri, 28 Oct 2022 14:47:22 +0000 (10:47 -0400)]
NFSD: Update file_hashtbl() helpers

[ Upstream commit 3fe828caddd81e68e9d29353c6e9285a658ca056 ]

Enable callers to use const pointers for type safety.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use const pointers as parameters to fh_ helpers
Chuck Lever [Fri, 28 Oct 2022 14:47:16 +0000 (10:47 -0400)]
NFSD: Use const pointers as parameters to fh_ helpers

[ Upstream commit b48f8056c034f28dd54668399f1d22be421b0bef ]

Enable callers to use const pointers where they are able to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Trace delegation revocations
Chuck Lever [Fri, 28 Oct 2022 14:47:09 +0000 (10:47 -0400)]
NFSD: Trace delegation revocations

[ Upstream commit a1c74569bbde91299f24535abf711be5c84df9de ]

Delegation revocation is an exceptional event that is not otherwise
visible externally (eg, no network traffic is emitted). Generate a
trace record when it occurs so that revocation can be observed or
other activity can be triggered. Example:

nfsd-1104  [005]  1912.002544: nfsd_stid_revoke:        client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG

Trace infrastructure is provided for subsequent additional tracing
related to nfs4_stid activity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Trace stateids returned via DELEGRETURN
Chuck Lever [Fri, 28 Oct 2022 14:47:03 +0000 (10:47 -0400)]
NFSD: Trace stateids returned via DELEGRETURN

[ Upstream commit 20eee313ff4b8a7e71ae9560f5c4ba27cd763005 ]

Handing out a delegation stateid is recorded with the
nfsd_deleg_read tracepoint, but there isn't a matching tracepoint
for recording when the stateid is returned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Clean up nfs4_preprocess_stateid_op() call sites
Chuck Lever [Fri, 28 Oct 2022 14:46:57 +0000 (10:46 -0400)]
NFSD: Clean up nfs4_preprocess_stateid_op() call sites

[ Upstream commit eeff73f7c1c583f79a401284f46c619294859310 ]

Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op()
call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Flesh out a documenting comment for filecache.c
Chuck Lever [Tue, 1 Nov 2022 17:30:46 +0000 (13:30 -0400)]
NFSD: Flesh out a documenting comment for filecache.c

[ Upstream commit b3276c1f5b268ff56622e9e125b792b4c3dc03ac ]

Record what we've learned recently about the NFSD filecache in a
documenting comment so our future selves don't forget what all this
is for.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
Chuck Lever [Fri, 28 Oct 2022 14:46:51 +0000 (10:46 -0400)]
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection

[ Upstream commit 4d1ea8455716ca070e3cd85767e6f6a562a58b1b ]

NFSv4 operations manage the lifetime of nfsd_file items they use by
means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
garbage collected.

Introduce a mechanism to enable garbage collection for nfsd_file
items used only by NFSv2/3 callers.

Note that the change in nfsd_file_put() ensures that both CLOSE and
DELEGRETURN will actually close out and free an nfsd_file on last
reference of a non-garbage-collected file.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
Chuck Lever [Fri, 28 Oct 2022 14:46:44 +0000 (10:46 -0400)]
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"

[ Upstream commit dcf3f80965ca787c70def402cdf1553c93c75529 ]

This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2.

That commit attempted to make files available to other users as soon
as all NFSv4 clients were done with them, rather than waiting until
the filecache LRU had garbage collected them.

It gets the reference counting wrong, for one thing.

But it also misses that DELEGRETURN should release a file in the
same fashion. In fact, any nfsd_file_put() on an file held open
by an NFSv4 client needs potentially to release the file
immediately...

Clear the way for implementing that idea.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Pass the target nfsd_file to nfsd_commit()
Chuck Lever [Fri, 28 Oct 2022 14:46:38 +0000 (10:46 -0400)]
NFSD: Pass the target nfsd_file to nfsd_commit()

[ Upstream commit c252849082ff525af18b4f253b3c9ece94e951ed ]

In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoexportfs: use pr_debug for unreachable debug statements
David Disseldorp [Fri, 21 Oct 2022 12:24:14 +0000 (14:24 +0200)]
exportfs: use pr_debug for unreachable debug statements

[ Upstream commit 427505ffeaa464f683faba945a88d3e3248f6979 ]

expfs.c has a bunch of dprintk statements which are unusable due to:
 #define dprintk(fmt, args...) do{}while(0)
Use pr_debug so that they can be enabled dynamically.
Also make some minor changes to the debug statements to fix some
incorrect types, and remove __func__ which can be handled by dynamic
debug separately.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: allow disabling NFSv2 at compile time
Jeff Layton [Tue, 18 Oct 2022 11:47:56 +0000 (07:47 -0400)]
nfsd: allow disabling NFSv2 at compile time

[ Upstream commit 2f3a4b2ac2f28b9be78ad21f401f31e263845214 ]

rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical
step toward deprecating it and allow NFSv2 support to be compiled out.

Add a new CONFIG_NFSD_V2 option that can be turned off and rework the
CONFIG_NFSD_V?_ACL option dependencies. Add a description that
discourages enabling it.

Also, change the description of CONFIG_NFSD to state that the always-on
version is now 3 instead of 2.

Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2
is disabled at compile time, this should make the kernel ignore attempts
to disable it at runtime, but still error out when trying to enable it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: move nfserrno() to vfs.c
Jeff Layton [Tue, 18 Oct 2022 11:47:55 +0000 (07:47 -0400)]
nfsd: move nfserrno() to vfs.c

[ Upstream commit cb12fae1c34b1fa7eaae92c5aadc72d86d7fae19 ]

nfserrno() is common to all nfs versions, but nfsproc.c is specifically
for NFSv2. Move it to vfs.c, and the prototype to vfs.h.

While we're in here, remove the #ifdef EDQUOT check in this function.
It's apparently a holdover from the initial merge of the nfsd code in
1997. No other place in the kernel checks that that symbol is defined
before using it, so I think we can dispense with it here.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: ignore requests to disable unsupported versions
Jeff Layton [Tue, 18 Oct 2022 11:47:54 +0000 (07:47 -0400)]
nfsd: ignore requests to disable unsupported versions

[ Upstream commit 8e823bafff2308753d430566256c83d8085952da ]

The kernel currently errors out if you attempt to enable or disable a
version that it doesn't recognize. Change it to ignore attempts to
disable an unrecognized version. If we don't support it, then there is
no harm in doing so.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Finish converting the NFSv3 GETACL result encoder
Chuck Lever [Sun, 16 Oct 2022 15:47:08 +0000 (11:47 -0400)]
NFSD: Finish converting the NFSv3 GETACL result encoder

[ Upstream commit 841fd0a3cb490eae5dfd262eccb8c8b11d57f8b8 ]

For some reason, the NFSv2 GETACL result encoder was fully converted
to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was
not similarly converted.

Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Remove redundant assignment to variable host_err
Colin Ian King [Mon, 10 Oct 2022 20:24:23 +0000 (21:24 +0100)]
NFSD: Remove redundant assignment to variable host_err

[ Upstream commit 69eed23baf877bbb1f14d7f4df54f89807c9ee2a ]

Variable host_err is assigned a value that is never read, it is being
re-assigned a value in every different execution path in the following
switch statement. The assignment is redundant and can be removed.

Cleans up clang-scan warning:
warning: Value stored to 'host_err' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Simplify READ_PLUS
Anna Schumaker [Tue, 13 Sep 2022 18:01:51 +0000 (14:01 -0400)]
NFSD: Simplify READ_PLUS

[ Upstream commit eeadcb75794516839078c28b3730132aeb700ce6 ]

Chuck had suggested reverting READ_PLUS so it returns a single DATA
segment covering the requested read range. This prepares the server for
a future "sparse read" function so support can easily be added without
needing to rip out the old READ_PLUS code at the same time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use locks_inode_context helper
Jeff Layton [Wed, 16 Nov 2022 14:36:07 +0000 (09:36 -0500)]
nfsd: use locks_inode_context helper

[ Upstream commit 77c67530e1f95ac25c7075635f32f04367380894 ]

nfsd currently doesn't access i_flctx safely everywhere. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agolockd: use locks_inode_context helper
Jeff Layton [Wed, 16 Nov 2022 14:19:43 +0000 (09:19 -0500)]
lockd: use locks_inode_context helper

[ Upstream commit 98b41ffe0afdfeaa1439a5d6bd2db4a94277e31b ]

lockd currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Fix reads with a non-zero offset that don't end on a page boundary
Chuck Lever [Wed, 23 Nov 2022 19:14:32 +0000 (14:14 -0500)]
NFSD: Fix reads with a non-zero offset that don't end on a page boundary

[ Upstream commit ac8db824ead0de2e9111337c401409d010fba2f0 ]

This was found when virtual machines with nfs-mounted qcow2 disks
failed to boot properly.

Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Fix trace_nfsd_fh_verify_err() crasher
Chuck Lever [Sat, 12 Nov 2022 20:06:07 +0000 (15:06 -0500)]
NFSD: Fix trace_nfsd_fh_verify_err() crasher

[ Upstream commit 5a01c805441bdc86e7af206d8a03735cc9394ffb ]

Now that the nfsd_fh_verify_err() tracepoint is always called on
error, it needs to handle cases where the filehandle is not yet
fully formed.

Fixes: 93c128e709ae ("nfsd: ensure we always call fh_verify_error tracepoint")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: put the export reference in nfsd4_verify_deleg_dentry
Jeff Layton [Tue, 8 Nov 2022 16:23:11 +0000 (11:23 -0500)]
nfsd: put the export reference in nfsd4_verify_deleg_dentry

[ Upstream commit 50256e4793a5e5ab77703c82a47344ad2e774a59 ]

nfsd_lookup_dentry returns an export reference in addition to the dentry
ref. Ensure that we put it too.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866
Fixes: 876c553cb410 ("NFSD: verify the opened dentry after setting a delegation")
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
Jeff Layton [Sat, 5 Nov 2022 13:49:26 +0000 (09:49 -0400)]
nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint

[ Upstream commit bdd6b5624c62d0acd350d07564f1c82fe649235f ]

When we fail to insert into the hashtable with a non-retryable error,
we'll free the object and then goto out_status. If the tracepoint is
enabled, it'll end up accessing the freed object when it tries to
grab the fields out of it.

Set nf to NULL after freeing it to avoid the issue.

Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix net-namespace logic in __nfsd_file_cache_purge
Jeff Layton [Mon, 31 Oct 2022 15:49:21 +0000 (11:49 -0400)]
nfsd: fix net-namespace logic in __nfsd_file_cache_purge

[ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ]

If the namespace doesn't match the one in "net", then we'll continue,
but that doesn't cause another rhashtable_walk_next call, so it will
loop infinitely.

Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable")
Reported-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: ensure we always call fh_verify_error tracepoint
Jeff Layton [Wed, 12 Oct 2022 18:42:54 +0000 (14:42 -0400)]
nfsd: ensure we always call fh_verify_error tracepoint

[ Upstream commit 93c128e709aec23b10f3a2f78a824080d4085318 ]

This is a conditional tracepoint. Call it every time, not just when
nfs_permission fails.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: unregister shrinker when nfsd_init_net() fails
Tetsuo Handa [Mon, 10 Oct 2022 05:59:02 +0000 (14:59 +0900)]
NFSD: unregister shrinker when nfsd_init_net() fails

[ Upstream commit bd86c69dae65de30f6d47249418ba7889809e31a ]

syzbot is reporting UAF read at register_shrinker_prepared() [1], for
commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on
low memory condition") missed that nfsd4_leases_net_shutdown() from
nfsd_exit_net() is called only when nfsd_init_net() succeeded.
If nfsd_init_net() fails due to nfsd_reply_cache_init() failure,
register_shrinker() from nfsd4_init_leases_net() has to be undone
before nfsd_init_net() returns.

Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89
Reported-by: syzbot <syzbot+ff796f04613b4c84ad89@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: rework hashtable handling in nfsd_do_file_acquire
Jeff Layton [Tue, 4 Oct 2022 19:41:10 +0000 (15:41 -0400)]
nfsd: rework hashtable handling in nfsd_do_file_acquire

[ Upstream commit 243a5263014a30436c93ed3f1f864c1da845455e ]

nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough
to get a reference after finding it in the hash. Take the
rcu_read_lock() and call rhashtable_lookup directly.

Switch to using rhashtable_lookup_insert_key as well, and use the usual
retry mechanism if we hit an -EEXIST. Rename the "retry" bool to
open_retry, and eliminiate the insert_err goto target.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix nfsd_file_unhash_and_dispose
Jeff Layton [Fri, 30 Sep 2022 20:56:02 +0000 (16:56 -0400)]
nfsd: fix nfsd_file_unhash_and_dispose

[ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ]

nfsd_file_unhash_and_dispose() is called for two reasons:

We're either shutting down and purging the filecache, or we've gotten a
notification about a file delete, so we want to go ahead and unhash it
so that it'll get cleaned up when we close.

We're either walking the hashtable or doing a lookup in it and we
don't take a reference in either case. What we want to do in both cases
is to try and unhash the object and put it on the dispose list if that
was successful. If it's no longer hashed, then we don't want to touch
it, with the assumption being that something else is already cleaning
up the sentinel reference.

Instead of trying to selectively decrement the refcount in this
function, just unhash it, and if that was successful, move it to the
dispose list. Then, the disposal routine will just clean that up as
usual.

Also, just make this a void function, drop the WARN_ON_ONCE, and the
comments about deadlocking since the nature of the purported deadlock
is no longer clear.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agofanotify: Remove obsoleted fanotify_event_has_path()
Gaosheng Cui [Mon, 26 Sep 2022 02:30:18 +0000 (10:30 +0800)]
fanotify: Remove obsoleted fanotify_event_has_path()

[ Upstream commit 7a80bf902d2bc722b4477442ee772e8574603185 ]

All uses of fanotify_event_has_path() have
been removed since commit 9c61f3b560f5 ("fanotify: break up
fanotify_alloc_event()"), now it is useless, so remove it.

Link: https://lore.kernel.org/r/20220926023018.1505270-1-cuigaosheng1@huawei.com
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
15 months agofsnotify: remove unused declaration
Gaosheng Cui [Fri, 9 Sep 2022 03:38:28 +0000 (11:38 +0800)]
fsnotify: remove unused declaration

[ Upstream commit f847c74d6e89f10926db58649a05b99237258691 ]

fsnotify_alloc_event_holder() and fsnotify_destroy_event_holder()
has been removed since commit 7053aee26a35 ("fsnotify: do not share
events between notification groups"), so remove it.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agofs/notify: constify path
Al Viro [Thu, 4 Aug 2022 16:57:38 +0000 (12:57 -0400)]
fs/notify: constify path

[ Upstream commit d5bf88895f24686641c39420ee6df716dc1d95d8 ]

Reviewed-by: Matthew Bobrowski <repnop@google.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: extra checks when freeing delegation stateids
Jeff Layton [Mon, 26 Sep 2022 18:41:02 +0000 (14:41 -0400)]
nfsd: extra checks when freeing delegation stateids

[ Upstream commit 895ddf5ed4c54ea9e3533606d7a8b4e4f27f95ef ]

We've had some reports of problems in the refcounting for delegation
stateids that we've yet to track down. Add some extra checks to ensure
that we've removed the object from various lists before freeing it.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: make nfsd4_run_cb a bool return function
Jeff Layton [Mon, 26 Sep 2022 18:41:01 +0000 (14:41 -0400)]
nfsd: make nfsd4_run_cb a bool return function

[ Upstream commit b95239ca4954a0d48b19c09ce7e8f31b453b4216 ]

queue_work can return false and not queue anything, if the work is
already queued. If that happens in the case of a CB_RECALL, we'll have
taken an extra reference to the stid that will never be put. Ensure we
throw a warning in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: fix comments about spinlock handling with delegations
Jeff Layton [Mon, 26 Sep 2022 16:38:45 +0000 (12:38 -0400)]
nfsd: fix comments about spinlock handling with delegations

[ Upstream commit 25fbe1fca14142beae6c882f7906510363d42bff ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: only fill out return pointer on success in nfsd4_lookup_stateid
Jeff Layton [Mon, 26 Sep 2022 16:38:44 +0000 (12:38 -0400)]
nfsd: only fill out return pointer on success in nfsd4_lookup_stateid

[ Upstream commit 4d01416ab41540bb13ec4a39ac4e6c4aa5934bc9 ]

In the case of a revoked delegation, we still fill out the pointer even
when returning an error, which is bad form. Only overwrite the pointer
on success.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Cap rsize_bop result based on send buffer size
Chuck Lever [Thu, 1 Sep 2022 19:29:55 +0000 (15:29 -0400)]
NFSD: Cap rsize_bop result based on send buffer size

[ Upstream commit 76ce4dcec0dc08a032db916841ddc4e3998be317 ]

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

Add an NFSv4 helper that computes the size of the send buffer. It
replaces svc_max_payload() in spots where svc_max_payload() returns
a value that might be larger than the remaining send buffer space.
Callers who need to know the transport's actual maximum payload size
will continue to use svc_max_payload().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Rename the fields in copy_stateid_t
Chuck Lever [Thu, 22 Sep 2022 17:10:35 +0000 (13:10 -0400)]
NFSD: Rename the fields in copy_stateid_t

[ Upstream commit 781fde1a2ba2391f31142f46f964cf1148ca1791 ]

Code maintenance: The name of the copy_stateid_t::sc_count field
collides with the sc_count field in struct nfs4_stid, making the
latter difficult to grep for when auditing stateid reference
counting.

No behavior change expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
ChenXiaoSong [Thu, 22 Sep 2022 16:31:56 +0000 (00:31 +0800)]
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops

[ Upstream commit 1342f9dd3fc219089deeb2620f6790f19b4129b1 ]

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
ChenXiaoSong [Thu, 22 Sep 2022 16:31:55 +0000 (00:31 +0800)]
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops

[ Upstream commit 64776611a06322b99386f8dfe3b3ba1aa0347a38 ]

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

nfsd_net is converted from seq_file->file instead of seq_file->private in
nfsd_reply_cache_stats_show().

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
ChenXiaoSong [Thu, 22 Sep 2022 16:31:54 +0000 (00:31 +0800)]
nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops

[ Upstream commit 1d7f6b302b75ff7acb9eb3cab0c631b10cfa7542 ]

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

inode is converted from seq_file->file instead of seq_file->private in
client_info_show().

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes...
ChenXiaoSong [Thu, 22 Sep 2022 16:31:53 +0000 (00:31 +0800)]
nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops

[ Upstream commit 9beeaab8e05d353d709103cafa1941714b4d5d94 ]

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
ChenXiaoSong [Thu, 22 Sep 2022 16:31:52 +0000 (00:31 +0800)]
nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops

[ Upstream commit 0cfb0c4228a5c8e2ed2b58f8309b660b187cef02 ]

Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Pack struct nfsd4_compoundres
Chuck Lever [Mon, 12 Sep 2022 21:23:36 +0000 (17:23 -0400)]
NFSD: Pack struct nfsd4_compoundres

[ Upstream commit 9f553e61bd36c1048543ac2f6945103dd2f742be ]

Remove a couple of 4-byte holes on platforms with 64-bit pointers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Remove unused nfsd4_compoundargs::cachetype field
Chuck Lever [Mon, 12 Sep 2022 21:23:30 +0000 (17:23 -0400)]
NFSD: Remove unused nfsd4_compoundargs::cachetype field

[ Upstream commit 77e378cf2a595d8e39cddf28a31efe6afd9394a0 ]

This field was added by commit 1091006c5eb1 ("nfsd: turn on reply
cache for NFSv4") but was never put to use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Remove "inline" directives on op_rsize_bop helpers
Chuck Lever [Mon, 12 Sep 2022 21:23:25 +0000 (17:23 -0400)]
NFSD: Remove "inline" directives on op_rsize_bop helpers

[ Upstream commit 6604148cf961b57fc735e4204f8996536da9253c ]

These helpers are always invoked indirectly, so the compiler can't
inline these anyway. While we're updating the synopses of these
helpers, defensively convert their parameters to const pointers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Clean up nfs4svc_encode_compoundres()
Chuck Lever [Mon, 12 Sep 2022 21:23:19 +0000 (17:23 -0400)]
NFSD: Clean up nfs4svc_encode_compoundres()

[ Upsteam commit 9993a66317fc9951322483a9edbfae95a640b210 ]

In today's Linux NFS server implementation, the NFS dispatcher
initializes each XDR result stream, and the NFSv4 .pc_func and
.pc_encode methods all use xdr_stream-based encoding. This keeps
rq_res.len automatically updated. There is no longer a need for
the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Clean up WRITE arg decoders
Chuck Lever [Mon, 12 Sep 2022 21:23:07 +0000 (17:23 -0400)]
NFSD: Clean up WRITE arg decoders

[ Upstream commit d4da5baa533215b14625458e645056baf646bb2e ]

xdr_stream_subsegment() already returns a boolean value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
Chuck Lever [Mon, 12 Sep 2022 21:23:02 +0000 (17:23 -0400)]
NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks

[ Upstream commit c3d2a04f05c590303c125a176e6e43df4a436fdb ]

Replace the check for buffer over/underflow with a helper that is
commonly used for this purpose. The helper also sets xdr->nwords
correctly after successfully linearizing the symlink argument into
the stream's scratch buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Refactor common code out of dirlist helpers
Chuck Lever [Mon, 12 Sep 2022 21:22:56 +0000 (17:22 -0400)]
NFSD: Refactor common code out of dirlist helpers

[ Upstream commit 98124f5bd6c76699d514fbe491dd95265369cc99 ]

The dust has settled a bit and it's become obvious what code is
totally common between nfsd_init_dirlist_pages() and
nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.

The new helper brackets the existing xdr_init_decode_pages() API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
Chuck Lever [Mon, 12 Sep 2022 21:22:44 +0000 (17:22 -0400)]
NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing

[ Upstream commit 3fdc546462348b8a497c72bc894e0cde9f10fc40 ]

Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear it's own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoSUNRPC: Parametrize how much of argsize should be zeroed
Chuck Lever [Mon, 12 Sep 2022 21:22:38 +0000 (17:22 -0400)]
SUNRPC: Parametrize how much of argsize should be zeroed

[ Upstream commit 103cc1fafee48adb91fca0e19deb869fd23e46ab ]

Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.

The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 uses.

This patch should cause no behavior changes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: add shrinker to reap courtesy clients on low memory condition
Dai Ngo [Wed, 14 Sep 2022 15:54:26 +0000 (08:54 -0700)]
NFSD: add shrinker to reap courtesy clients on low memory condition

[ Upstream commit 7746b32f467b3813fb61faaab3258de35806a7ac ]

Add courtesy_client_reaper to react to low memory condition triggered
by the system memory shrinker.

The delayed_work for the courtesy_client_reaper is scheduled on
the shrinker's count callback using the laundry_wq.

The shrinker's scan callback is not used for expiring the courtesy
clients due to potential deadlocks.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: keep track of the number of courtesy clients in the system
Dai Ngo [Wed, 14 Sep 2022 15:54:25 +0000 (08:54 -0700)]
NFSD: keep track of the number of courtesy clients in the system

[ Upstream commit 3a4ea23d86a317c4b68b9a69d51f7e84e1e04357 ]

Add counter nfs4_courtesy_client_count to nfsd_net to keep track
of the number of courtesy clients in the system.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
Chuck Lever [Thu, 8 Sep 2022 22:14:25 +0000 (18:14 -0400)]
NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY

[ Upstream commit 5f5f8b6d655fd947e899b1771c2f7cb581a06764 ]

nfsd_unlink() can kick off a CB_RECALL (via
vfs_unlink() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_unlink()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
Chuck Lever [Thu, 8 Sep 2022 22:14:19 +0000 (18:14 -0400)]
NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY

[ Upstream commit 68c522afd0b1936b48a03a4c8b81261e7597c62d ]

nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

This version of the patch handles renaming an existing file,
but does not deal with renaming onto an existing file. That
case will still always trigger an NFS4ERR_DELAY.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
Chuck Lever [Thu, 8 Sep 2022 22:14:13 +0000 (18:14 -0400)]
NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY

[ Upstream commit 34b91dda7124fc3259e4b2ae53e0c933dedfec01 ]

nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Refactor nfsd_setattr()
Chuck Lever [Thu, 8 Sep 2022 22:14:07 +0000 (18:14 -0400)]
NFSD: Refactor nfsd_setattr()

[ Upstream commit c0aa1913db57219e91a0a8832363cbafb3a9cf8f ]

Move code that will be retried (in a subsequent patch) into a helper
function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Add a mechanism to wait for a DELEGRETURN
Chuck Lever [Thu, 8 Sep 2022 22:14:00 +0000 (18:14 -0400)]
NFSD: Add a mechanism to wait for a DELEGRETURN

[ Upstream commit c035362eb935fe9381d9d1cc453bc2a37460e24c ]

Subsequent patches will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.

The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN:

            nfsd-1155  [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Add tracepoints to report NFSv4 callback completions
Chuck Lever [Thu, 8 Sep 2022 22:13:54 +0000 (18:13 -0400)]
NFSD: Add tracepoints to report NFSv4 callback completions

[ Upstream commit 1035d65446a018ca2dd179e29a2fcd6d29057781 ]

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

            nfsd-1153  [002]   211.986391: nfsd_cb_recall:       addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

            nfsd-1153  [002]   212.095634: nfsd_compound:        xid=0x0000002c opcnt=2
            nfsd-1153  [002]   212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
            nfsd-1153  [002]   212.095658: nfsd_file_put:        hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
            nfsd-1153  [002]   212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
   kworker/u25:8-148   [002]   212.096713: nfsd_cb_recall_done:  client 62ea82e4:fee7492a stateid 00000003:00000001 status=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Trace NFSv4 COMPOUND tags
Chuck Lever [Thu, 8 Sep 2022 22:13:48 +0000 (18:13 -0400)]
NFSD: Trace NFSv4 COMPOUND tags

[ Upstream commit de29cf7e6cbbe236c3a51999c188fcd467762899 ]

The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agoNFSD: Replace dprintk() call site in fh_verify()
Chuck Lever [Thu, 8 Sep 2022 22:13:42 +0000 (18:13 -0400)]
NFSD: Replace dprintk() call site in fh_verify()

[ Upstream commit 948755efc951de75c87d4fa916d9d36b58299295 ]

Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: remove nfsd4_prepare_cb_recall() declaration
Gaosheng Cui [Fri, 9 Sep 2022 06:59:10 +0000 (14:59 +0800)]
nfsd: remove nfsd4_prepare_cb_recall() declaration

[ Upstream commit 18224dc58d960c65446971930d0487fc72d00598 ]

nfsd4_prepare_cb_recall() has been removed since
commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
15 months agonfsd: clean up mounted_on_fileid handling
Jeff Layton [Thu, 8 Sep 2022 16:31:07 +0000 (12:31 -0400)]
nfsd: clean up mounted_on_fileid handling

[ Upstream commit 6106d9119b6599fa23dc556b429d887b4c2d9f62 ]

We only need the inode number for this, not a full rack of attributes.
Rename this function make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>