]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
10 months agoio_uring/kbuf: fix unintentional sign extension on shift of reg.bgid
Colin Ian King [Wed, 4 Dec 2024 15:39:23 +0000 (15:39 +0000)]
io_uring/kbuf: fix unintentional sign extension on shift of reg.bgid

Shifting reg.bgid << IORING_OFF_PBUF_SHIFT results in a promotion
from __u16 to a 32 bit signed integer, this is then sign extended
to a 64 bit unsigned long on 64 bit architectures. If reg.bgid is
greater than 0x7fff then this leads to a sign extended result where
all the upper 32 bits of mmap_offset are set to 1. Fix this by
casting reg.bgid to the same type as mmap_offset before performing
the shift.

Fixes: ef62de3c4ad5 ("io_uring/kbuf: use region api for pbuf rings")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20241204153923.401674-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: make bio_integrity_map_user() static inline
Jens Axboe [Fri, 29 Nov 2024 22:53:58 +0000 (15:53 -0700)]
block: make bio_integrity_map_user() static inline

If CONFIG_BLK_DEV_INTEGRITY isn't set, then the dummy helper must be
static inline to avoid complaints about the function being unused.

Fixes: fe8f4ca7107e ("block: modify bio_integrity_map_user to accept iov_iter as argument")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411300229.y7h60mDg-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: add support to pass user meta buffer
Kanchan Joshi [Thu, 28 Nov 2024 11:22:40 +0000 (16:52 +0530)]
block: add support to pass user meta buffer

If an iocb contains metadata, extract that and prepare the bip.
Based on flags specified by the user, set corresponding guard/app/ref
tags to be checked in bip.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-11-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoscsi: add support for user-meta interface
Anuj Gupta [Thu, 28 Nov 2024 11:22:39 +0000 (16:52 +0530)]
scsi: add support for user-meta interface

Add support for sending user-meta buffer. Set tags to be checked
using flags specified by user/block-layer.
With this change, BIP_CTRL_NOCHECK becomes unused. Remove it.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-10-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agonvme: add support for passing on the application tag
Kanchan Joshi [Thu, 28 Nov 2024 11:22:38 +0000 (16:52 +0530)]
nvme: add support for passing on the application tag

With user integrity buffer, there is a way to specify the app_tag.
Set the corresponding protocol specific flags and send the app_tag down.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-9-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags
Anuj Gupta [Thu, 28 Nov 2024 11:22:37 +0000 (16:52 +0530)]
block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags

This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which
indicate how the hardware should check the integrity payload.
BIP_CHECK_GUARD/REFTAG are conversion of existing semantics, while
BIP_CHECK_APPTAG is a new flag. The driver can now just rely on block
layer flags, and doesn't need to know the integrity source. Submitter
of PI decides which tags to check. This would also give us a unified
interface for user and kernel generated integrity.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-8-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring: introduce attributes for read/write and PI support
Anuj Gupta [Thu, 28 Nov 2024 11:22:36 +0000 (16:52 +0530)]
io_uring: introduce attributes for read/write and PI support

Add the ability to pass additional attributes along with read/write.
Application can prepare attibute specific information and pass its
address using the SQE field:
__u64 attr_ptr;

Along with setting a mask indicating attributes being passed:
__u64 attr_type_mask;

Overall 64 attributes are allowed and currently one attribute
'IORING_RW_ATTR_FLAG_PI' is supported.

With PI attribute, userspace can pass following information:
- flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
- len: length of PI/metadata buffer
- addr: address of metadata buffer
- seed: seed value for reftag remapping
- app_tag: application defined 16b value

Process this information to prepare uio_meta_descriptor and pass it down
using kiocb->private.

PI attribute is supported only for direct IO.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20241128112240.8867-7-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agofs: introduce IOCB_HAS_METADATA for metadata
Anuj Gupta [Thu, 28 Nov 2024 11:22:35 +0000 (16:52 +0530)]
fs: introduce IOCB_HAS_METADATA for metadata

Introduce an IOCB_HAS_METADATA flag for the kiocb struct, for handling
requests containing meta payload.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-6-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agofs, iov_iter: define meta io descriptor
Anuj Gupta [Thu, 28 Nov 2024 11:22:34 +0000 (16:52 +0530)]
fs, iov_iter: define meta io descriptor

Add flags to describe checks for integrity meta buffer. Also, introduce
a  new 'uio_meta' structure that upper layer can use to pass the
meta/integrity information.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-5-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: modify bio_integrity_map_user to accept iov_iter as argument
Anuj Gupta [Thu, 28 Nov 2024 11:22:33 +0000 (16:52 +0530)]
block: modify bio_integrity_map_user to accept iov_iter as argument

This patch refactors bio_integrity_map_user to accept iov_iter as
argument. This is a prep patch.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-4-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: copy back bounce buffer to user-space correctly in case of split
Christoph Hellwig [Thu, 28 Nov 2024 11:22:32 +0000 (16:52 +0530)]
block: copy back bounce buffer to user-space correctly in case of split

Copy back the bounce buffer to user-space in entirety when the parent
bio completes. The existing code uses bip_iter.bi_size for sizing the
copy, which can be modified. So move away from that and fetch it from
the vector passed to the block layer. While at it, switch to using
better variable names.

Fixes: 492c5d455969f ("block: bio-integrity: directly map user buffers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-3-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblock: define set of integrity flags to be inherited by cloned bip
Anuj Gupta [Thu, 28 Nov 2024 11:22:31 +0000 (16:52 +0530)]
block: define set of integrity flags to be inherited by cloned bip

Introduce BIP_CLONE_FLAGS describing integrity flags that should be
inherited in the cloned bip from the parent.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-2-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: unify io_uring mmap'ing code
Pavel Begunkov [Fri, 29 Nov 2024 13:34:39 +0000 (13:34 +0000)]
io_uring/memmap: unify io_uring mmap'ing code

All mapped memory is now backed by regions and we can unify and clean
up io_region_validate_mmap() and io_uring_mmap(). Extract a function
looking up a region, the rest of the handling should be generic and just
needs the region.

There is one more ring type specific code, i.e. the mmaping size
truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f5e1eda1562bfd34276de07465525ae5f10e1e84.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/kbuf: use region api for pbuf rings
Pavel Begunkov [Fri, 29 Nov 2024 13:34:38 +0000 (13:34 +0000)]
io_uring/kbuf: use region api for pbuf rings

Convert internal parts of the provided buffer ring managment to the
region API. It's the last non-region mapped ring we have, so it also
kills a bunch of now unused memmap.c helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6c40cf7beaa648558acd4d84bc0fb3279a35d74b.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/kbuf: remove pbuf ring refcounting
Pavel Begunkov [Fri, 29 Nov 2024 13:34:37 +0000 (13:34 +0000)]
io_uring/kbuf: remove pbuf ring refcounting

struct io_buffer_list refcounting was needed for RCU based sync with
mmap, now  we can kill it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a9cc54bf0077bb2bf2f3daf917549ddd41080da.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/kbuf: use mmap_lock to sync with mmap
Pavel Begunkov [Fri, 29 Nov 2024 13:34:36 +0000 (13:34 +0000)]
io_uring/kbuf: use mmap_lock to sync with mmap

A preparation / cleanup patch simplifying the buf ring - mmap
synchronisation. Instead of relying on RCU, which is trickier, do it by
grabbing the mmap_lock when when anyone tries to publish or remove a
registered buffer to / from ->io_bl_xa.

Modifications of the xarray should always be protected by both
->uring_lock and ->mmap_lock, while lookups should hold either of them.
While a struct io_buffer_list is in the xarray, the mmap related fields
like ->flags and ->buf_pages should stay stable.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring: use region api for CQ
Pavel Begunkov [Fri, 29 Nov 2024 13:34:35 +0000 (13:34 +0000)]
io_uring: use region api for CQ

Convert internal parts of the CQ/SQ array managment to the region API.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring: use region api for SQ
Pavel Begunkov [Fri, 29 Nov 2024 13:34:34 +0000 (13:34 +0000)]
io_uring: use region api for SQ

Convert internal parts of the SQ managment to the region API.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring: pass ctx to io_register_free_rings
Pavel Begunkov [Fri, 29 Nov 2024 13:34:33 +0000 (13:34 +0000)]
io_uring: pass ctx to io_register_free_rings

A preparation patch, pass the context to io_register_free_rings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c1865fd2b3d4db22d1a1aac7dd06ea22cb990834.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: implement mmap for regions
Pavel Begunkov [Fri, 29 Nov 2024 13:34:32 +0000 (13:34 +0000)]
io_uring/memmap: implement mmap for regions

The patch implements mmap for the param region and enables the kernel
allocation mode. Internally it uses a fixed mmap offset, however the
user has to use the offset returned in
struct io_uring_region_desc::mmap_offset.

Note, mmap doesn't and can't take ->uring_lock and the region / ring
lookup is protected by ->mmap_lock, and it's directly peeking at
ctx->param_region. We can't protect io_create_region() with the
mmap_lock as it'd deadlock, which is why io_create_region_mmap_safe()
initialises it for us in a temporary variable and then publishes it
with the lock taken. It's intentionally decoupled from main region
helpers, and in the future we might want to have a list of active
regions, which then could be protected by the ->mmap_lock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0f1212bd6af7fb39b63514b34fae8948014221d1.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: implement kernel allocated regions
Pavel Begunkov [Fri, 29 Nov 2024 13:34:31 +0000 (13:34 +0000)]
io_uring/memmap: implement kernel allocated regions

Allow the kernel to allocate memory for a region. That's the classical
way SQ/CQ are allocated. It's not yet useful to user space as there
is no way to mmap it, which is why it's explicitly disabled in
io_register_mem_region().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b8c40e6542546bbf93f4842a9a42a7373b81e0d.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: add IO_REGION_F_SINGLE_REF
Pavel Begunkov [Fri, 29 Nov 2024 13:34:30 +0000 (13:34 +0000)]
io_uring/memmap: add IO_REGION_F_SINGLE_REF

Kernel allocated compound pages will have just one reference for the
entire page array, add a flag telling io_free_region about that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7abfa7535e9728d5fcade29a1ea1605ec2c04ce.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: helper for pinning region pages
Pavel Begunkov [Fri, 29 Nov 2024 13:34:29 +0000 (13:34 +0000)]
io_uring/memmap: helper for pinning region pages

In preparation to adding kernel allocated regions extract a new helper
that pins user pages.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a17d7c39c3de4266b66b75b2dcf768150e1fc618.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: optimise single folio regions
Pavel Begunkov [Fri, 29 Nov 2024 13:34:28 +0000 (13:34 +0000)]
io_uring/memmap: optimise single folio regions

We don't need to vmap if memory is already physically contiguous. There
are two important cases it covers: PAGE_SIZE regions and huge pages.
Use io_check_coalesce_buffer() to get the number of contiguous folios.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5240af23064a824c29d14d2406f1ae764bf4505.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: reuse io_free_region for failure path
Pavel Begunkov [Fri, 29 Nov 2024 13:34:27 +0000 (13:34 +0000)]
io_uring/memmap: reuse io_free_region for failure path

Regions are going to become more complex with allocation options and
optimisations, I want to split initialisation into steps and for that it
needs a sane fail path. Reuse io_free_region(), it's smart enough to
undo only what's needed and leaves the structure in a consistent state.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b853b4ec407cc80d033d021bdd2c14e22378fc78.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: account memory before pinning
Pavel Begunkov [Fri, 29 Nov 2024 13:34:26 +0000 (13:34 +0000)]
io_uring/memmap: account memory before pinning

Move memory accounting before page pinning. It shouldn't even try to pin
pages if it's not allowed, and accounting is also relatively
inexpensive. It also give a better code structure as we do generic
accounting and then can branch for different mapping types.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e242b8038411a222e8b269d35e021fa5015289f.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: flag regions with user pages
Pavel Begunkov [Fri, 29 Nov 2024 13:34:25 +0000 (13:34 +0000)]
io_uring/memmap: flag regions with user pages

In preparation to kernel allocated regions add a flag telling if
the region contains user pinned pages or not.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0dc91564642654405bab080b7ec911cb4a43ec6e.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/memmap: flag vmap'ed regions
Pavel Begunkov [Fri, 29 Nov 2024 13:34:24 +0000 (13:34 +0000)]
io_uring/memmap: flag vmap'ed regions

Add internal flags for struct io_mapped_region. The first flag we need
is IO_REGION_F_VMAPPED, that indicates that the pointer has to be
unmapped on region destruction. For now all regions are vmap'ed, so it's
set unconditionally.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5a3d8046a038da97c0f8a8c8f1733fa3fc689d31.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring/rsrc: export io_check_coalesce_buffer
Pavel Begunkov [Fri, 29 Nov 2024 13:34:23 +0000 (13:34 +0000)]
io_uring/rsrc: export io_check_coalesce_buffer

io_try_coalesce_buffer() is a useful helper collecting useful info about
a set of pages, I want to reuse it for analysing ring/etc. mappings. I
don't need the entire thing and only interested if it can be coalesced
into a single page, but that's better than duplicating the parsing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/353b447953cd5d34c454a7d909bb6024c391d6e2.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoio_uring: rename ->resize_lock
Pavel Begunkov [Fri, 29 Nov 2024 13:34:22 +0000 (13:34 +0000)]
io_uring: rename ->resize_lock

->resize_lock is used for resizing rings, but it's a good idea to reuse
it in other cases as well. Rename it into mmap_lock as it's protects
from races with mmap.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoLinux 6.13-rc4
Linus Torvalds [Sun, 22 Dec 2024 21:22:21 +0000 (13:22 -0800)]
Linux 6.13-rc4

10 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 22 Dec 2024 20:16:41 +0000 (12:16 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM x86 fixes from Paolo Bonzini:

 - Disable AVIC on SNP-enabled systems that don't allow writes to the
   virtual APIC page, as such hosts will hit unexpected RMP #PFs in the
   host when running VMs of any flavor.

 - Fix a WARN in the hypercall completion path due to KVM trying to
   determine if a guest with protected register state is in 64-bit mode
   (KVM's ABI is to assume such guests only make hypercalls in 64-bit
   mode).

 - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix
   a regression with Windows guests, and because KVM's read-only
   behavior appears to be entirely made up.

 - Treat TDP MMU faults as spurious if the faulting access is allowed
   given the existing SPTE. This fixes a benign WARN (other than the
   WARN itself) due to unexpectedly replacing a writable SPTE with a
   read-only SPTE.

 - Emit a warning when KVM is configured with ignore_msrs=1 and also to
   hide the MSRs that the guest is looking for from the kernel logs.
   ignore_msrs can trick guests into assuming that certain processor
   features are present, and this in turn leads to bogus bug reports.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: let it be known that ignore_msrs is a bad idea
  KVM: VMX: don't include '<linux/find.h>' directly
  KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed
  KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits
  KVM: x86: Play nice with protected guests in complete_hypercall_exit()
  KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature

10 months agoMerge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEAD
Paolo Bonzini [Sun, 22 Dec 2024 17:07:16 +0000 (12:07 -0500)]
Merge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.13:

 - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual
   APIC page, as such hosts will hit unexpected RMP #PFs in the host when
   running VMs of any flavor.

 - Fix a WARN in the hypercall completion path due to KVM trying to determine
   if a guest with protected register state is in 64-bit mode (KVM's ABI is to
   assume such guests only make hypercalls in 64-bit mode).

 - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a
   regression with Windows guests, and because KVM's read-only behavior appears
   to be entirely made up.

 - Treat TDP MMU faults as spurious if the faulting access is allowed given the
   existing SPTE.  This fixes a benign WARN (other than the WARN itself) due to
   unexpectedly replacing a writable SPTE with a read-only SPTE.

10 months agoKVM: x86: let it be known that ignore_msrs is a bad idea
Paolo Bonzini [Thu, 19 Dec 2024 12:43:20 +0000 (07:43 -0500)]
KVM: x86: let it be known that ignore_msrs is a bad idea

When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has
no clue that that the guest is being lied to.  This may cause bug reports
such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling
a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and
being lied about the existence of MSR_CU_DEF_ERR caused the guest to assume
other things about the local APIC which were not true:

  Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.
  Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40)
  Sep 14 12:02:53 kernel: Call Trace:
  ...
  Sep 14 12:02:53 kernel:  native_apic_msr_read+0x20/0x30
  Sep 14 12:02:53 kernel:  setup_APIC_eilvt+0x47/0x110
  Sep 14 12:02:53 kernel:  mce_amd_feature_init+0x485/0x4e0
  ...
  Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu

Without reported_ignored_msrs=0 at least the host kernel log will contain
enough information to avoid going on a wild goose chase.  But if reports
about individual MSR accesses are being silenced too, at least complain
loudly the first time a VM is started.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 months agoKVM: VMX: don't include '<linux/find.h>' directly
Wolfram Sang [Tue, 17 Dec 2024 07:05:40 +0000 (08:05 +0100)]
KVM: VMX: don't include '<linux/find.h>' directly

The header clearly states that it does not want to be included directly,
only via '<linux/bitmap.h>'. Replace the include accordingly.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-ID: <20241217070539.2433-2-wsa+renesas@sang-engineering.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 months agoMerge tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 22 Dec 2024 16:40:23 +0000 (08:40 -0800)]
Merge tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Disable #address-cells/#size-cells warning on coreboot (Chromebooks)
   platforms

 - Add missing root #address-cells/#size-cells in default empty DT

 - Fix uninitialized variable in of_irq_parse_one()

 - Fix interrupt-map cell length check in of_irq_parse_imap_parent()

 - Fix refcount handling in __of_get_dma_parent()

 - Fix error path in of_parse_phandle_with_args_map()

 - Fix dma-ranges handling with flags cells

 - Drop explicit fw_devlink handling of 'interrupt-parent'

 - Fix "compression" typo in fixed-partitions binding

 - Unify "fsl,liodn" property type definitions

* tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add coreboot firmware to excluded default cells list
  of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one()
  of/irq: Fix interrupt-map cell length check in of_irq_parse_imap_parent()
  of: Fix refcount leakage for OF node returned by __of_get_dma_parent()
  of: Fix error path in of_parse_phandle_with_args_map()
  dt-bindings: mtd: fixed-partitions: Fix "compression" typo
  of: Add #address-cells/#size-cells in the device-tree root empty node
  dt-bindings: Unify "fsl,liodn" type definitions
  of: address: Preserve the flags portion on 1:1 dma-ranges mapping
  of/unittest: Add empty dma-ranges address translation tests
  of: property: fw_devlink: Do not use interrupt-parent directly

10 months agoMerge tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sat, 21 Dec 2024 23:45:06 +0000 (15:45 -0800)]
Merge tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "Two more small fixes, correcting the cacheline size on Raspberry Pi 5
  and fixing a logic mistake in the microchip mpfs firmware driver"

* tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5
  firmware: microchip: fix UL_IAP lock check in mpfs_auto_update_state()

10 months agoMerge tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 21 Dec 2024 23:31:56 +0000 (15:31 -0800)]
Merge tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "25 hotfixes.  16 are cc:stable.  19 are MM and 6 are non-MM.

  The usual bunch of singletons and doubletons - please see the relevant
  changelogs for details"

* tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  mm: huge_memory: handle strsep not finding delimiter
  alloc_tag: fix set_codetag_empty() when !CONFIG_MEM_ALLOC_PROFILING_DEBUG
  alloc_tag: fix module allocation tags populated area calculation
  mm/codetag: clear tags before swap
  mm/vmstat: fix a W=1 clang compiler warning
  mm: convert partially_mapped set/clear operations to be atomic
  nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
  vmalloc: fix accounting with i915
  mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
  fork: avoid inappropriate uprobe access to invalid mm
  nilfs2: prevent use of deleted inode
  zram: fix uninitialized ZRAM not releasing backing device
  zram: refuse to use zero sized block device as backing device
  mm: use clear_user_(high)page() for arch with special user folio handling
  mm: introduce cpu_icache_is_aliasing() across all architectures
  mm: add RCU annotation to pte_offset_map(_lock)
  mm: correctly reference merged VMA
  mm: use aligned address in copy_user_gigantic_page()
  mm: use aligned address in clear_gigantic_page()
  mm: shmem: fix ShmemHugePages at swapout
  ...

10 months agostaging: gpib: Fix allyesconfig build failures
Steven Rostedt [Tue, 17 Dec 2024 15:19:04 +0000 (10:19 -0500)]
staging: gpib: Fix allyesconfig build failures

My tests run an allyesconfig build and it failed with the following errors:

    LD [M]  samples/kfifo/dma-example.ko
  ld.lld: error: undefined symbol: nec7210_board_reset
  ld.lld: error: undefined symbol: nec7210_read
  ld.lld: error: undefined symbol: nec7210_write

It appears that some modules call the function nec7210_board_reset()
that is defined in nec7210.c.  In an allyesconfig build, these other
modules are built in.  But the file that holds nec7210_board_reset()
has:

  obj-m += nec7210.o

Where that "-m" means it only gets built as a module. With the other
modules built in, they have no access to nec7210_board_reset() and the build
fails.

This isn't the only function. After fixing that one, I hit another:

  ld.lld: error: undefined symbol: push_gpib_event
  ld.lld: error: undefined symbol: gpib_match_device_path

Where push_gpib_event() was also used outside of the file it was defined
in, and that file too only was built as a module.

Since the directory that nec7210.c is only traversed when
CONFIG_GPIB_NEC7210 is set, and the directory with gpib_common.c is only
traversed when CONFIG_GPIB_COMMON is set, use those configs as the
option to build those modules.  When it is an allyesconfig, then they
will both be built in and their functions will be available to the other
modules that are also built in.

Fixes: 3ba84ac69b53e ("staging: gpib: Add nec7210 GPIB chip driver")
Fixes: 9dde4559e9395 ("staging: gpib: Add GPIB common core driver")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 months agoMerge tag 'kbuild-fixes-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 21 Dec 2024 19:24:32 +0000 (11:24 -0800)]
Merge tag 'kbuild-fixes-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Remove stale code in usr/include/headers_check.pl

 - Fix issues in the user-mode-linux Debian package

 - Fix false-positive "export twice" errors in modpost

* tag 'kbuild-fixes-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: distinguish same module paths from different dump files
  kbuild: deb-pkg: Do not install maint scripts for arch 'um'
  kbuild: deb-pkg: add debarch for ARCH=um
  kbuild: Drop support for include/asm-<arch> in headers_check.pl

10 months agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Sat, 21 Dec 2024 19:07:19 +0000 (11:07 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull BPF fixes from Daniel Borkmann:

 - Fix inlining of bpf_get_smp_processor_id helper for !CONFIG_SMP
   systems (Andrea Righi)

 - Fix BPF USDT selftests helper code to use asm constraint "m" for
   LoongArch (Tiezhu Yang)

 - Fix BPF selftest compilation error in get_uprobe_offset when
   PROCMAP_QUERY is not defined (Jerome Marchand)

 - Fix BPF bpf_skb_change_tail helper when used in context of BPF
   sockmap to handle negative skb header offsets (Cong Wang)

 - Several fixes to BPF sockmap code, among others, in the area of
   socket buffer accounting (Levi Zim, Zijian Zhang, Cong Wang)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Test bpf_skb_change_tail() in TC ingress
  selftests/bpf: Introduce socket_helpers.h for TC tests
  selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()
  bpf: Check negative offsets in __bpf_skb_min_len()
  tcp_bpf: Fix copied value in tcp_bpf_sendmsg
  skmsg: Return copied bytes in sk_msg_memcopy_from_iter
  tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
  tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
  selftests/bpf: Fix compilation error in get_uprobe_offset()
  selftests/bpf: Use asm constraint "m" for LoongArch
  bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP

10 months agoMerge tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 21 Dec 2024 18:56:34 +0000 (10:56 -0800)]
Merge tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - fix a clang build issue with mediatec vcodec

 - add missing variable initialization to dib3000mb write function

* tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: mediatek: vcodec: mark vdec_vp9_slice_map_counts_eob_coef noinline
  media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg

10 months agoMerge tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 21 Dec 2024 18:51:04 +0000 (10:51 -0800)]
Merge tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Krzysztof Wilczyński:
 "Two small patches that are important for fixing boot time hang on
  Intel JHL7540 'Titan Ridge' platforms equipped with a Thunderbolt
  controller.

  The boot time issue manifests itself when a PCI Express bandwidth
  control is unnecessarily enabled on the Thunderbolt controller
  downstream ports, which only supports a link speed of 2.5 GT/s in
  accordance with USB4 v2 specification (p. 671, sec. 11.2.1, "PCIe
  Physical Layer Logical Sub-block").

  As such, there is no need to enable bandwidth control on such
  downstream port links, which also works around the issue.

  Both patches were tested by the original reporter on the hardware on
  which the failure origin golly manifested itself. Both fixes were
  proven to resolve the reported boot hang issue, and both patches have
  been in linux-next this week with no reported problems"

* tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/bwctrl: Enable only if more than one speed is supported
  PCI: Honor Max Link Speed when determining supported speeds

10 months agoMerge tag 'pm-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 21 Dec 2024 18:47:47 +0000 (10:47 -0800)]
Merge tag 'pm-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix some amd-pstate driver issues:

   - Detect preferred core support in amd-pstate before driver
     registration to avoid initialization ordering issues (K Prateek
     Nayak)

   - Fix issues with with boost numerator handling in amd-pstate leading
     to inconsistently programmed CPPC max performance values (Mario
     Limonciello)"

* tag 'pm-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Use boost numerator for upper bound of frequencies
  cpufreq/amd-pstate: Store the boost numerator as highest perf again
  cpufreq/amd-pstate: Detect preferred core support before driver registration

10 months agoMerge tag 'thermal-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 21 Dec 2024 18:44:44 +0000 (10:44 -0800)]
Merge tag 'thermal-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Fix two issues with the user thermal thresholds feature introduced in
  this development cycle (Daniel Lezcano)"

* tag 'thermal-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/thresholds: Fix boundaries and detection routine
  thermal/thresholds: Fix uapi header macros leading to a compilation error

10 months agoMerge tag 'acpi-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 21 Dec 2024 18:42:35 +0000 (10:42 -0800)]
Merge tag 'acpi-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Unbreak ACPI EC support on LoongArch that has been broken earlier in
  this development cycle (Huacai Chen)"

* tag 'acpi-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: EC: Enable EC support on LoongArch by default

10 months agoMerge tag '6.13-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 21 Dec 2024 17:35:18 +0000 (09:35 -0800)]
Merge tag '6.13-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix regression in display of write stats

 - fix rmmod failure with network namespaces

 - two minor cleanups

* tag '6.13-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: fix bytes written value in /proc/fs/cifs/Stats
  smb: client: fix TCP timers deadlock after rmmod
  smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig
  smb: use macros instead of constants for leasekey size and default cifsattrs value

10 months agoMerge tag 'nfs-for-6.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sat, 21 Dec 2024 17:32:24 +0000 (09:32 -0800)]
Merge tag 'nfs-for-6.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:

 - NFS/pnfs: Fix a live lock between recalled layouts and layoutget

 - Fix a build warning about an undeclared symbol 'nfs_idmap_cache_timeout'

* tag 'nfs-for-6.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  fs/nfs: fix missing declaration of nfs_idmap_cache_timeout
  NFS/pnfs: Fix a live lock between recalled layouts and layoutget

10 months agoMerge tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client
Linus Torvalds [Sat, 21 Dec 2024 17:29:46 +0000 (09:29 -0800)]
Merge tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A handful of important CephFS fixes from Max, Alex and myself: memory
  corruption due to a buffer overrun, potential infinite loop and
  several memory leaks on the error paths. All but one marked for
  stable"

* tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client:
  ceph: allocate sparse_ext map only for sparse reads
  ceph: fix memory leak in ceph_direct_read_write()
  ceph: improve error handling and short/overflow-read logic in __ceph_sync_read()
  ceph: validate snapdirname option length when mounting
  ceph: give up on paths longer than PATH_MAX
  ceph: fix memory leaks in __ceph_sync_read()

10 months agomodpost: distinguish same module paths from different dump files
Masahiro Yamada [Thu, 12 Dec 2024 15:46:15 +0000 (00:46 +0900)]
modpost: distinguish same module paths from different dump files

Since commit 13b25489b6f8 ("kbuild: change working directory to external
module directory with M="), module paths are always relative to the top
of the external module tree.

The module paths recorded in Module.symvers are no longer globally unique
when they are passed via KBUILD_EXTRA_SYMBOLS for building other external
modules, which may result in false-positive "exported twice" errors.
Such errors should not occur because external modules should be able to
override in-tree modules.

To address this, record the dump file path in struct module and check it
when searching for a module.

Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/all/eb21a546-a19c-40df-b821-bbba80f19a3d@nvidia.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
10 months agokbuild: deb-pkg: Do not install maint scripts for arch 'um'
Nicolas Schier [Thu, 12 Dec 2024 13:05:29 +0000 (14:05 +0100)]
kbuild: deb-pkg: Do not install maint scripts for arch 'um'

Stop installing Debian maintainer scripts when building a
user-mode-linux Debian package.

Debian maintainer scripts are used for e.g. requesting rebuilds of
initrd, rebuilding DKMS modules and updating of grub configuration.  As
all of this is not relevant for UML but also may lead to failures while
processing the kernel hooks, do no more install maintainer scripts for
the UML package.

Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
10 months agokbuild: deb-pkg: add debarch for ARCH=um
Masahiro Yamada [Tue, 3 Dec 2024 11:14:45 +0000 (20:14 +0900)]
kbuild: deb-pkg: add debarch for ARCH=um

'make ARCH=um bindeb-pkg' shows the following warning.

  $ make ARCH=um bindeb-pkg
     [snip]
    GEN     debian

  ** ** **  WARNING  ** ** **

  Your architecture doesn't have its equivalent
  Debian userspace architecture defined!
  Falling back to the current host architecture (amd64).
  Please add support for um to ./scripts/package/mkdebian ...

This commit hard-codes i386/amd64 because UML is only supported for x86.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
10 months agokbuild: Drop support for include/asm-<arch> in headers_check.pl
Geert Uytterhoeven [Thu, 5 Dec 2024 13:20:43 +0000 (14:20 +0100)]
kbuild: Drop support for include/asm-<arch> in headers_check.pl

"include/asm-<arch>" was replaced by "arch/<arch>/include/asm" a long
time ago.  All assembler header files are now included using
"#include <asm/*>", so there is no longer a need to rewrite paths.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
10 months agoselftests/bpf: Test bpf_skb_change_tail() in TC ingress
Cong Wang [Fri, 13 Dec 2024 03:40:57 +0000 (19:40 -0800)]
selftests/bpf: Test bpf_skb_change_tail() in TC ingress

Similarly to the previous test, we also need a test case to cover
positive offsets as well, TC is an excellent hook for this.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Zijian Zhang <zijianzhang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-5-xiyou.wangcong@gmail.com
10 months agoselftests/bpf: Introduce socket_helpers.h for TC tests
Cong Wang [Fri, 13 Dec 2024 03:40:56 +0000 (19:40 -0800)]
selftests/bpf: Introduce socket_helpers.h for TC tests

Pull socket helpers out of sockmap_helpers.h so that they can be reused
for TC tests as well. This prepares for the next patch.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-4-xiyou.wangcong@gmail.com
10 months agoselftests/bpf: Add a BPF selftest for bpf_skb_change_tail()
Cong Wang [Fri, 13 Dec 2024 03:40:55 +0000 (19:40 -0800)]
selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()

As requested by Daniel, we need to add a selftest to cover
bpf_skb_change_tail() cases in skb_verdict. Here we test trimming,
growing and error cases, and validate its expected return values and the
expected sizes of the payload.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-3-xiyou.wangcong@gmail.com
10 months agobpf: Check negative offsets in __bpf_skb_min_len()
Cong Wang [Fri, 13 Dec 2024 03:40:54 +0000 (19:40 -0800)]
bpf: Check negative offsets in __bpf_skb_min_len()

skb_network_offset() and skb_transport_offset() can be negative when
they are called after we pull the transport header, for example, when
we use eBPF sockmap at the point of ->sk_data_ready().

__bpf_skb_min_len() uses an unsigned int to get these offsets, this
leads to a very large number which then causes bpf_skb_change_tail()
failed unexpectedly.

Fix this by using a signed int to get these offsets and ensure the
minimum is at least zero.

Fixes: 5293efe62df8 ("bpf: add bpf_skb_change_tail helper")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-2-xiyou.wangcong@gmail.com
10 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 20 Dec 2024 22:10:01 +0000 (14:10 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix a sparse warning in the arm64 signal code dealing with the user
  shadow stack register, GCSPR_EL0"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/signal: Silence sparse warning storing GCSPR_EL0

10 months agotcp_bpf: Fix copied value in tcp_bpf_sendmsg
Levi Zim [Sat, 30 Nov 2024 13:38:23 +0000 (21:38 +0800)]
tcp_bpf: Fix copied value in tcp_bpf_sendmsg

bpf kselftest sockhash::test_txmsg_cork_hangs in test_sockmap.c triggers a
kernel NULL pointer dereference:

BUG: kernel NULL pointer dereference, address: 0000000000000008
 ? __die_body+0x6e/0xb0
 ? __die+0x8b/0xa0
 ? page_fault_oops+0x358/0x3c0
 ? local_clock+0x19/0x30
 ? lock_release+0x11b/0x440
 ? kernelmode_fixup_or_oops+0x54/0x60
 ? __bad_area_nosemaphore+0x4f/0x210
 ? mmap_read_unlock+0x13/0x30
 ? bad_area_nosemaphore+0x16/0x20
 ? do_user_addr_fault+0x6fd/0x740
 ? prb_read_valid+0x1d/0x30
 ? exc_page_fault+0x55/0xd0
 ? asm_exc_page_fault+0x2b/0x30
 ? splice_to_socket+0x52e/0x630
 ? shmem_file_splice_read+0x2b1/0x310
 direct_splice_actor+0x47/0x70
 splice_direct_to_actor+0x133/0x300
 ? do_splice_direct+0x90/0x90
 do_splice_direct+0x64/0x90
 ? __ia32_sys_tee+0x30/0x30
 do_sendfile+0x214/0x300
 __se_sys_sendfile64+0x8e/0xb0
 __x64_sys_sendfile64+0x25/0x30
 x64_sys_call+0xb82/0x2840
 do_syscall_64+0x75/0x110
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

This is caused by tcp_bpf_sendmsg() returning a larger value(12289) than
size (8192), which causes the while loop in splice_to_socket() to release
an uninitialized pipe buf.

The underlying cause is that this code assumes sk_msg_memcopy_from_iter()
will copy all bytes upon success but it actually might only copy part of
it.

This commit changes it to use the real copied bytes.

Signed-off-by: Levi Zim <rsworktech@outlook.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241130-tcp-bpf-sendmsg-v1-2-bae583d014f3@outlook.com
10 months agoskmsg: Return copied bytes in sk_msg_memcopy_from_iter
Levi Zim [Sat, 30 Nov 2024 13:38:22 +0000 (21:38 +0800)]
skmsg: Return copied bytes in sk_msg_memcopy_from_iter

Previously sk_msg_memcopy_from_iter returns the copied bytes from the
last copy_from_iter{,_nocache} call upon success.

This commit changes it to return the total number of copied bytes on
success.

Signed-off-by: Levi Zim <rsworktech@outlook.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241130-tcp-bpf-sendmsg-v1-1-bae583d014f3@outlook.com
10 months agoMerge tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 20 Dec 2024 21:48:41 +0000 (13:48 -0800)]
Merge tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fix reporting of negative temperature, current, and voltage values in
   the tmp513 driver

* tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit Registers
  hwmon: (tmp513) Fix Current Register value interpretation
  hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit Registers

10 months agoof: Add coreboot firmware to excluded default cells list
Rob Herring (Arm) [Fri, 20 Dec 2024 21:06:47 +0000 (15:06 -0600)]
of: Add coreboot firmware to excluded default cells list

Google Juniper and other Chromebook platforms have a very old bootloader
which populates /firmware node without proper address/size-cells leading
to warnings:

  Missing '#address-cells' in /firmware
  WARNING: CPU: 0 PID: 1 at drivers/of/base.c:106 of_bus_n_addr_cells+0x90/0xf0
  Modules linked in:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0 #1 933ab9971ff4d5dc58cb378a96f64c7f72e3454d
  Hardware name: Google juniper sku16 board (DT)
  ...
  Missing '#size-cells' in /firmware
  WARNING: CPU: 0 PID: 1 at drivers/of/base.c:133 of_bus_n_size_cells+0x90/0xf0
  Modules linked in:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.12.0 #1 933ab9971ff4d5dc58cb378a96f64c7f72e3454d
  Tainted: [W]=WARN
  Hardware name: Google juniper sku16 board (DT)

These platform won't receive updated bootloader/firmware, so add an
exclusion for platforms with a "coreboot" compatible node. While this is
wider than necessary, that's the easiest fix and it doesn't doesn't
matter if we miss checking other platforms using coreboot.

We may revisit this later and address with a fixup to the DT itself.

Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/Z0NUdoG17EwuCigT@sashalap/
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chen-Yu Tsai <wenst@chromium.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
10 months agoMerge tag 'block-6.13-20241220' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 20 Dec 2024 21:37:58 +0000 (13:37 -0800)]
Merge tag 'block-6.13-20241220' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Minor cleanups for bdev/nvme using the helpers introduced

 - Revert of a deadlock fix that still needs more work

 - Fix a UAF of hctx in the cpu hotplug code

* tag 'block-6.13-20241220' of git://git.kernel.dk/linux:
  block: avoid to reuse `hctx` not removed from cpuhp callback list
  block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"
  nvme: use blk_validate_block_size() for max LBA check
  block/bdev: use helper for max block size check

10 months agoMerge tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 20 Dec 2024 21:32:43 +0000 (13:32 -0800)]
Merge tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix for a file ref leak for registered ring fds

 - Turn the ->timeout_lock into a raw spinlock, as it nests under the
   io-wq lock which is a raw spinlock as it's called from the scheduler
   side

 - Limit ring resizing to DEFER_TASKRUN for now. We will broaden this in
   the future, but for now, ensure that it's only feasible on rings with
   a single user

 - Add sanity check for io-wq enqueuing

* tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux:
  io_uring: check if iowq is killed before queuing
  io_uring/register: limit ring resizing to DEFER_TASKRUN
  io_uring: Fix registered ring file refcount leak
  io_uring: make ctx->timeout_lock a raw spinlock

10 months agoMerge tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Fri, 20 Dec 2024 19:09:40 +0000 (11:09 -0800)]
Merge tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some important, and small, fixes for USB and Thunderbolt
  issues that have come up in the -rc releases. And some new device ids
  for good measure. Included in here are:

   - Much reported xhci bugfix for usb-storage devices (and other
     devices as well, tripped me up on a video camera)

   - thunderbolt fixes for some small reported issues

   - new usb-serial device ids

  All of these have been in linux-next this week with no reported issues"

* tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: xhci: fix ring expansion regression in 6.13-rc1
  xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic
  thunderbolt: Improve redrive mode handling
  USB: serial: option: add Telit FE910C04 rmnet compositions
  USB: serial: option: add MediaTek T7XX compositions
  USB: serial: option: add Netprisma LCUK54 modules for WWAN Ready
  USB: serial: option: add MeiG Smart SLM770A
  USB: serial: option: add TCL IK512 MBIM & ECM
  thunderbolt: Don't display nvm_version unless upgrade supported
  thunderbolt: Add support for Intel Panther Lake-M/P

10 months agoMerge tag 'spi-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Fri, 20 Dec 2024 19:06:25 +0000 (11:06 -0800)]
Merge tag 'spi-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "A fix for the remove path of the Rockchip driver, the code was just
  clearly and obviously wrong"

* tag 'spi-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: rockchip-sfc: Fix error in remove progress

10 months agoMerge tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 20 Dec 2024 19:04:02 +0000 (11:04 -0800)]
Merge tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "The recently added regulator-uv-survival-time-ms property was renamed
  during the review of the series that added it, but unfortunately only
  in the DT binding and not in the code that parses the binding.

  This brings the code in line with the binding, if someone started
  using the original name we can add compat support for it but there's
  nothing upstream yet and it's a very niche feature so hopefully not"

* tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: rename regulator-uv-survival-time-ms according to DT binding

10 months agoMerge tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 20 Dec 2024 18:17:53 +0000 (10:17 -0800)]
Merge tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Probably the last pull before Christmas holidays, I'll still be around
  for most of the time anyways, nothing too major in here, bunch of
  amdgpu and i915 along with a smattering of fixes across the board.

  core:
   - fix FB dependency
   - avoid div by 0 more in vrefresh
   - maintainers update

  display:
   - fix DP tunnel error path

  dma-buf:
   - fix !DEBUG_FS

  sched:
   - docs warning fix

  panel:
   - collection of misc panel fixes

  i915:
   - Reset engine utilization buffer before registration
   - Ensure busyness counter increases motonically
   - Accumulate active runtime on gt reset

  amdgpu:
   - Disable BOCO when CONFIG_HOTPLUG_PCI_PCIE is not enabled
   - scheduler job fixes
   - IP version check fixes
   - devcoredump fix
   - GPUVM update fix
   - NBIO 2.5 fix

  udmabuf:
   - fix memory leak on last export
   - sealing fixes

  ivpu:
   - fix NULL pointer
   - fix memory leak
   - fix WARN"

* tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
  drm/sched: Fix drm_sched_fini() docu generation
  accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
  accel/ivpu: Fix memory leak in ivpu_mmu_reserved_context_init()
  accel/ivpu: Fix general protection fault in ivpu_bo_list()
  drm/amdgpu/nbio7.0: fix IP version check
  drm/amd: Update strapping for NBIO 2.5.0
  drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
  drm/amdgpu: fix amdgpu_coredump
  drm/amdgpu/smu14.0.2: fix IP version check
  drm/amdgpu/gfx12: fix IP version check
  drm/amdgpu/mmhub4.1: fix IP version check
  drm/amdgpu/nbio7.11: fix IP version check
  drm/amdgpu/nbio7.7: fix IP version check
  drm/amdgpu: don't access invalid sched
  drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO
  drm: rework FB_CORE dependency
  drm/fbdev: Select FB_CORE dependency for fbdev on DMA and TTM
  fbdev: Fix recursive dependencies wrt BACKLIGHT_CLASS_DEVICE
  i915/guc: Accumulate active runtime on gt reset
  i915/guc: Ensure busyness counter increases motonically
  ...

10 months agoMerge tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 20 Dec 2024 18:13:26 +0000 (10:13 -0800)]
Merge tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer fixes from Steven Rostedt:

 - Fix possible overflow of mmapped ring buffer with bad offset

   If the mmap() to the ring buffer passes in a start address that is
   passed the end of the mmapped file, it is not caught and a
   slab-out-of-bounds is triggered.

   Add a check to make sure the start address is within the bounds

 - Do not use TP_printk() to boot mapped ring buffers

   As a boot mapped ring buffer's data may have pointers that map to the
   previous boot's memory map, it is unsafe to allow the TP_printk() to
   be used to read the boot mapped buffer's events. If a TP_printk()
   points to a static string from within the kernel it will not match
   the current kernel mapping if KASLR is active, and it can fault.

   Have it simply print out the raw fields.

* tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
  ring-buffer: Fix overflow in __rb_map_vma

10 months agoMerge tag 'arm-soc/for-6.13/devicetree-arm64-fixes' of https://github.com/Broadcom...
Arnd Bergmann [Fri, 20 Dec 2024 17:02:27 +0000 (18:02 +0100)]
Merge tag 'arm-soc/for-6.13/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM64-based SoCs Device Tree fixes
for 6.13, please pull the following:

- Willow corrects the L2 cache line size on the Raspberry Pi 5 (2712) to
  the correct value of 64 bytes

* tag 'arm-soc/for-6.13/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux:
  arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5

Link: https://lore.kernel.org/r/20241217190547.868744-1-florian.fainelli@broadcom.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
10 months agoMerge tag 'riscv-soc-fixes-for-v6.13-rc4' of https://git.kernel.org/pub/scm/linux...
Arnd Bergmann [Fri, 20 Dec 2024 17:00:30 +0000 (18:00 +0100)]
Merge tag 'riscv-soc-fixes-for-v6.13-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes

RISC-V soc driver fixes for v6.13-rc4

A single fix for the Auto Update driver, where a mistake in array
indexing (accessing as a u32 rather than a u8) caused the driver to read
the wrong feature disable bits.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-soc-fixes-for-v6.13-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  firmware: microchip: fix UL_IAP lock check in mpfs_auto_update_state()

Link: https://lore.kernel.org/r/20241218-suffrage-unfazed-fa0113072a42@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
10 months agotcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
Zijian Zhang [Tue, 10 Dec 2024 01:20:39 +0000 (01:20 +0000)]
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection

When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will
be created out of the skb, and the rmem accounting of the sk_msg will be
handled by the skb.

For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting
to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to
the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result,
except for the global memory limit, the rmem of sk_redir is nearly
unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer.

Since the function sk_msg_recvmsg and __sk_psock_purge_ingress_msg are
used in these two paths. We use "msg->skb" to test whether the sk_msg is
skb backed up. If it's not, we shall do the memory accounting explicitly.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241210012039.1669389-3-zijianzhang@bytedance.com
10 months agotcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
Cong Wang [Tue, 10 Dec 2024 01:20:38 +0000 (01:20 +0000)]
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()

When bpf_tcp_ingress() is called, the skmsg is being redirected to the
ingress of the destination socket. Therefore, we should charge its
receive socket buffer, instead of sending socket buffer.

Because sk_rmem_schedule() tests pfmemalloc of skb, we need to
introduce a wrapper and call it for skmsg.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com
10 months agoarm64/signal: Silence sparse warning storing GCSPR_EL0
Mark Brown [Sat, 14 Dec 2024 02:12:06 +0000 (02:12 +0000)]
arm64/signal: Silence sparse warning storing GCSPR_EL0

We are seeing a sparse warning in gcs_restore_signal():

  arch/arm64/kernel/signal.c:1054:9: sparse: sparse: cast removes address space '__user' of expression

when storing the final GCSPR_EL0 value back into the register, caused by
the fact that write_sysreg_s() casts the value it writes to a u64 which
sparse sees as discarding the __userness of the pointer.

Avoid this by treating the address as an integer, casting to a pointer only
when using it to write to userspace.

While we're at it also inline gcs_signal_cap_valid() into it's one user
and make equivalent updates to gcs_signal_entry().

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412082005.OBJ0BbWs-lkp@intel.com/
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241214-arm64-gcs-signal-sparse-v3-1-5e8d18fffc0c@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agoMerge tag 'amd-drm-fixes-6.13-2024-12-18' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 20 Dec 2024 02:53:42 +0000 (12:53 +1000)]
Merge tag 'amd-drm-fixes-6.13-2024-12-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.13-2024-12-18:

amdgpu:
- Disable BOCO when CONFIG_HOTPLUG_PCI_PCIE is not enabled
- scheduler job fixes
- IP version check fixes
- devcoredump fix
- GPUVM update fix
- NBIO 2.5 fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218204637.2966198-1-alexander.deucher@amd.com
10 months agoKVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed
Sean Christopherson [Wed, 18 Dec 2024 21:36:11 +0000 (13:36 -0800)]
KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed

Treat slow-path TDP MMU faults as spurious if the access is allowed given
the existing SPTE to fix a benign warning (other than the WARN itself)
due to replacing a writable SPTE with a read-only SPTE, and to avoid the
unnecessary LOCK CMPXCHG and subsequent TLB flush.

If a read fault races with a write fault, fast GUP fails for any reason
when trying to "promote" the read fault to a writable mapping, and KVM
resolves the write fault first, then KVM will end up trying to install a
read-only SPTE (for a !map_writable fault) overtop a writable SPTE.

Note, it's not entirely clear why fast GUP fails, or if that's even how
KVM ends up with a !map_writable fault with a writable SPTE.  If something
else is going awry, e.g. due to a bug in mmu_notifiers, then treating read
faults as spurious in this scenario could effectively mask the underlying
problem.

However, retrying the faulting access instead of overwriting an existing
SPTE is functionally correct and desirable irrespective of the WARN, and
fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed
bit in primary MMU's PTE is toggled and causes a PTE value mismatch.  The
WARN was also recently added, specifically to track down scenarios where
KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious
doesn't regress KVM's bug-finding capabilities in any way.  In short,
letting the WARN linger because there's a tiny chance it's due to a bug
elsewhere would be excessively paranoid.

Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable")
Reported-by: Lei Yang <leiyang@redhat.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218213611.3181643-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
10 months agoKVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits
Sean Christopherson [Wed, 11 Dec 2024 17:29:52 +0000 (09:29 -0800)]
KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits

Drop KVM's arbitrary behavior of making DE_CFG.LFENCE_SERIALIZE read-only
for the guest, as rejecting writes can lead to guest crashes, e.g. Windows
in particular doesn't gracefully handle unexpected #GPs on the WRMSR, and
nothing in the AMD manuals suggests that LFENCE_SERIALIZE is read-only _if
it exists_.

KVM only allows LFENCE_SERIALIZE to be set, by the guest or host, if the
underlying CPU has X86_FEATURE_LFENCE_RDTSC, i.e. if LFENCE is guaranteed
to be serializing.  So if the guest sets LFENCE_SERIALIZE, KVM will provide
the desired/correct behavior without any additional action (the guest's
value is never stuffed into hardware).  And having LFENCE be serializing
even when it's not _required_ to be is a-ok from a functional perspective.

Fixes: 74a0e79df68a ("KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value")
Fixes: d1d93fa90f1a ("KVM: SVM: Add MSR-based feature support for serializing LFENCE")
Reported-by: Simon Pilkington <simonp.git@mailbox.org>
Closes: https://lore.kernel.org/all/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20241211172952.1477605-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
10 months agoKVM: x86: Play nice with protected guests in complete_hypercall_exit()
Sean Christopherson [Thu, 28 Nov 2024 00:43:39 +0000 (16:43 -0800)]
KVM: x86: Play nice with protected guests in complete_hypercall_exit()

Use is_64_bit_hypercall() instead of is_64_bit_mode() to detect a 64-bit
hypercall when completing said hypercall.  For guests with protected state,
e.g. SEV-ES and SEV-SNP, KVM must assume the hypercall was made in 64-bit
mode as the vCPU state needed to detect 64-bit mode is unavailable.

Hacking the sev_smoke_test selftest to generate a KVM_HC_MAP_GPA_RANGE
hypercall via VMGEXIT trips the WARN:

  ------------[ cut here ]------------
  WARNING: CPU: 273 PID: 326626 at arch/x86/kvm/x86.h:180 complete_hypercall_exit+0x44/0xe0 [kvm]
  Modules linked in: kvm_amd kvm ... [last unloaded: kvm]
  CPU: 273 UID: 0 PID: 326626 Comm: sev_smoke_test Not tainted 6.12.0-smp--392e932fa0f3-feat #470
  Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
  RIP: 0010:complete_hypercall_exit+0x44/0xe0 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_vcpu_ioctl_run+0x2400/0x2720 [kvm]
   kvm_vcpu_ioctl+0x54f/0x630 [kvm]
   __se_sys_ioctl+0x6b/0xc0
   do_syscall_64+0x83/0x160
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>
  ---[ end trace 0000000000000000 ]---

Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20241128004344.4072099-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
10 months agoKVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature
Suravee Suthikulpanit [Mon, 4 Nov 2024 07:58:45 +0000 (07:58 +0000)]
KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature

On SNP-enabled system, VMRUN marks AVIC Backing Page as in-use while
the guest is running for both secure and non-secure guest. Any hypervisor
write to the in-use vCPU's AVIC backing page (e.g. to inject an interrupt)
will generate unexpected #PF in the host.

Currently, attempt to run AVIC guest would result in the following error:

    BUG: unable to handle page fault for address: ff3a442e549cc270
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x80000003) - RMP violation
    PGD b6ee01067 P4D b6ee02067 PUD 10096d063 PMD 11c540063 PTE 80000001149cc163
    SEV-SNP: PFN 0x1149cc unassigned, dumping non-zero entries in 2M PFN region: [0x114800 - 0x114a00]
    ...

Newer AMD system is enhanced to allow hypervisor to modify the backing page
for non-secure guest on SNP-enabled system. This enhancement is available
when the CPUID Fn8000_001F_EAX bit 30 is set (HvInUseWrAllowed).

This table describes AVIC support matrix w.r.t. SNP enablement:

               | Non-SNP system |     SNP system
-----------------------------------------------------
 Non-SNP guest |  AVIC Activate | AVIC Activate iff
               |                | HvInuseWrAllowed=1
-----------------------------------------------------
     SNP guest |      N/A       |    Secure AVIC

Therefore, check and disable AVIC in kvm_amd driver when the feature is not
available on SNP-enabled system.

See the AMD64 Architecture Programmer’s Manual (APM) Volume 2 for detail.
(https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/
programmer-references/40332.pdf)

Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20241104075845.7583-1-suravee.suthikulpanit@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
10 months agoMerge tag 'drm-misc-fixes-2024-12-19' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 19 Dec 2024 21:13:44 +0000 (07:13 +1000)]
Merge tag 'drm-misc-fixes-2024-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.13-rc4:
- udma-buf fixes related to sealing.
- dma-buf build warning fix when debugfs is not enabled.
- Assorted drm/panel fixes.
- Correct error return in drm_dp_tunnel_mgr_create.
- Fix even more divide by zero in drm_mode_vrefresh.
- Fix FBDEV dependencies in Kconfig.
- Documentation fix for drm_sched_fini.
- IVPU NULL pointer, memory leak and WARN fix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d0763051-87b7-483e-89e0-a9f993383450@linux.intel.com
10 months agoio_uring: check if iowq is killed before queuing
Pavel Begunkov [Thu, 19 Dec 2024 19:52:58 +0000 (19:52 +0000)]
io_uring: check if iowq is killed before queuing

task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.

Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.

Cc: stable@vger.kernel.org
Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd")
Fixes: 773af69121ecc ("io_uring: always reissue from task_work context")
Reported-by: Will <willsroot@protonmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agosmb: fix bytes written value in /proc/fs/cifs/Stats
Bharath SM [Thu, 19 Dec 2024 17:58:50 +0000 (23:28 +0530)]
smb: fix bytes written value in /proc/fs/cifs/Stats

With recent netfs apis changes, the bytes written
value was not getting updated in /proc/fs/cifs/Stats.
Fix this by updating tcon->bytes in write operations.

Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agoMerge tag 'net-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 19 Dec 2024 17:19:11 +0000 (09:19 -0800)]
Merge tag 'net-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can and netfilter.

  Current release - regressions:

   - rtnetlink: try the outer netns attribute in rtnl_get_peer_net()

   - rust: net::phy fix module autoloading

  Current release - new code bugs:

   - phy: avoid undefined behavior in *_led_polarity_set()

   - eth: octeontx2-pf: fix netdev memory leak in rvu_rep_create()

  Previous releases - regressions:

   - smc: check sndbuf_space again after NOSPACE flag is set in smc_poll

   - ipvs: fix clamp() of ip_vs_conn_tab on small memory systems

   - dsa: restore dsa_software_vlan_untag() ability to operate on
     VLAN-untagged traffic

   - eth:
       - tun: fix tun_napi_alloc_frags()
       - ionic: no double destroy workqueue
       - idpf: trigger SW interrupt when exiting wb_on_itr mode
       - rswitch: rework ts tags management
       - team: fix feature exposure when no ports are present

  Previous releases - always broken:

   - core: fix repeated netlink messages in queue dump

   - mdiobus: fix an OF node reference leak

   - smc: check iparea_offset and ipv6_prefixes_cnt when receiving
     proposal msg

   - can: fix missed interrupts with m_can_pci

   - eth: oa_tc6: fix infinite loop error when tx credits becomes 0"

* tag 'net-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: mctp: handle skb cleanup on sock_queue failures
  net: mdiobus: fix an OF node reference leak
  octeontx2-pf: fix error handling of devlink port in rvu_rep_create()
  octeontx2-pf: fix netdev memory leak in rvu_rep_create()
  psample: adjust size if rate_as_probability is set
  netdev-genl: avoid empty messages in queue dump
  net: dsa: restore dsa_software_vlan_untag() ability to operate on VLAN-untagged traffic
  selftests: openvswitch: fix tcpdump execution
  net: usb: qmi_wwan: add Quectel RG255C
  net: phy: avoid undefined behavior in *_led_polarity_set()
  netfilter: ipset: Fix for recursive locking warning
  ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
  can: m_can: fix missed interrupts with m_can_pci
  can: m_can: set init flag earlier in probe
  rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
  net: netdevsim: fix nsim_pp_hold_write()
  idpf: trigger SW interrupt when exiting wb_on_itr mode
  idpf: add support for SW triggered interrupts
  qed: fix possible uninit pointer read in qed_mcp_nvm_info_populate()
  net: ethernet: bgmac-platform: fix an OF node reference leak
  ...

10 months agoMerge tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Thu, 19 Dec 2024 16:53:51 +0000 (08:53 -0800)]
Merge tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

 - mtk-sd: Cleanup the wakeup configuration in error/remove-path

 - sdhci-tegra: Correct quirk for ADMA2 length

* tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mtk-sd: disable wakeup in .remove() and in the error path of .probe()
  mmc: sdhci-tegra: Remove SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC quirk

10 months agoMerge tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 19 Dec 2024 16:50:05 +0000 (08:50 -0800)]
Merge tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm fix from Uwe Kleine-König:
 "Fix regression in pwm-stm32 driver when converting to new waveform
  support

  Fabrice Gasnier found and fixed a regression I introduced with
  v6.13-rc1 when converting the stm32 pwm driver to support the new
  waveform stuff. On some hardware variants this completely broke the
  driver"

* tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: stm32: Fix complementary output in round_waveform_tohw()

10 months agoMerge tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 19 Dec 2024 16:45:37 +0000 (08:45 -0800)]
Merge tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Two fixes for better handling maximum outstanding requests

 - Fix simultaneous negotiate protocol race

* tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: conn lock to serialize smb2 negotiate
  ksmbd: fix broken transfers when exceeding max simultaneous operations
  ksmbd: count all requests in req_running counter

10 months agoPCI/bwctrl: Enable only if more than one speed is supported
Lukas Wunner [Tue, 17 Dec 2024 09:51:02 +0000 (10:51 +0100)]
PCI/bwctrl: Enable only if more than one speed is supported

If a PCIe port only supports a single speed, enabling bandwidth control
is pointless:  There's no need to monitor autonomous speed changes, nor
can the speed be changed.

Not enabling it saves a small amount of memory and compute resources,
but also fixes a boot hang reported by Niklas:  It occurs when enabling
bandwidth control on Downstream Ports of Intel JHL7540 "Titan Ridge 2018"
Thunderbolt controllers.  The ports only support 2.5 GT/s in accordance
with USB4 v2 sec 11.2.1, so the present commit works around the issue.

PCIe r6.2 sec 8.2.1 prescribes that:

   "A device must support 2.5 GT/s and is not permitted to skip support
    for any data rates between 2.5 GT/s and the highest supported rate."

Consequently, bandwidth control is currently only disabled if a port
doesn't support higher speeds than 2.5 GT/s.  However the Implementation
Note in PCIe r6.2 sec 7.5.3.18 cautions:

   "It is strongly encouraged that software primarily utilize the
    Supported Link Speeds Vector instead of the Max Link Speed field,
    so that software can determine the exact set of supported speeds on
    current and future hardware.  This can avoid software being confused
    if a future specification defines Links that do not require support
    for all slower speeds."

In other words, future revisions of the PCIe Base Spec may allow gaps
in the Supported Link Speeds Vector.  To be future-proof, don't just
check whether speeds above 2.5 GT/s are supported, but rather check
whether *more than one* speed is supported.

Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Closes: https://lore.kernel.org/r/db8e457fcd155436449b035e8791a8241b0df400.camel@kernel.org
Link: https://lore.kernel.org/r/3564908a9c99fc0d2a292473af7a94ebfc8f5820.1734428762.git.lukas@wunner.de
Reported-by: Niklas Schnelle <niks@kernel.org>
Tested-by: Niklas Schnelle <niks@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Jonathan Cameron <Jonthan.Cameron@huawei.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
10 months agoPCI: Honor Max Link Speed when determining supported speeds
Lukas Wunner [Tue, 17 Dec 2024 09:51:01 +0000 (10:51 +0100)]
PCI: Honor Max Link Speed when determining supported speeds

The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds.  The Max Link Speed field in the
Link Capabilities Register indicates the *maximum* of those speeds.

pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
will thus incorrectly deem higher speeds as supported.  Fix it.

One user-visible issue addressed here is an incorrect value in the sysfs
attribute "max_link_speed".

But the main motivation is a boot hang reported by Niklas:  Intel JHL7540
"Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds,
but indicate 2.5 GT/s as maximum.  Ilpo recalls seeing this on more
devices.  It can be explained by the controller's Downstream Ports
supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
sec 11.2.1:

   "This section defines the functionality of an Internal PCIe Port that
    interfaces to a PCIe Adapter. [...]
    The Logical sub-block shall update the PCIe configuration registers
    with the following characteristics: [...]
    Max Link Speed field in the Link Capabilities Register set to 0001b
    (data rate of 2.5 GT/s only).
    Note: These settings do not represent actual throughput. Throughput
    is implementation specific and based on the USB4 Fabric performance."

The present commit is not sufficient on its own to fix Niklas' boot hang,
but it is a prerequisite:  A subsequent commit will fix the boot hang by
enabling bandwidth control only if more than one speed is supported.

The GENMASK() macro used herein specifies 0 as lowest bit, even though
the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
would be invalid as the lowest bit is greater than the highest bit.
Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
Endpoints in particular, so it does occur in practice.  (The Link
Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)

Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org
Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de
Reported-by: Niklas Schnelle <niks@kernel.org>
Tested-by: Niklas Schnelle <niks@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
10 months agoio_uring/register: limit ring resizing to DEFER_TASKRUN
Jens Axboe [Thu, 19 Dec 2024 16:32:26 +0000 (09:32 -0700)]
io_uring/register: limit ring resizing to DEFER_TASKRUN

With DEFER_TASKRUN, we know the ring can't be both waited upon and
resized at the same time. This is important for CQ resizing. Allowing SQ
ring resizing is more trivial, but isn't the interesting use case. Hence
limit ring resizing in general to DEFER_TASKRUN only for now. This isn't
a huge problem as CQ ring resizing is generally the most useful on
networking type of workloads where it can be hard to size the ring
appropriately upfront, and those should be using DEFER_TASKRUN for
better performance.

Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agosmb: client: fix TCP timers deadlock after rmmod
Enzo Matsumiya [Tue, 10 Dec 2024 21:15:12 +0000 (18:15 -0300)]
smb: client: fix TCP timers deadlock after rmmod

Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).

The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.

Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)

Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets.  And net/ implementations
seem to base their user vs kernel space operations on it.

e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets"))

net/ipv4/tcp.c:
    void tcp_close(struct sock *sk, long timeout)
    {
     lock_sock(sk);
     __tcp_close(sk, timeout);
     release_sock(sk);
     if (!sk->sk_net_refcnt)
     inet_csk_clear_xmit_timers_sync(sk);
     sock_put(sk);
    }

Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().

A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'.  A few seconds later, the deadlock/lockdep
warning shows up.

Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.

Also change __sock_create() to sock_create_kern() for explicitness.

As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released.  This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.

Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
Cc: stable@vger.kernel.org
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agosmb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig
Dragan Simic [Tue, 17 Dec 2024 09:25:10 +0000 (10:25 +0100)]
smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig

Repeating automatically selected options in Kconfig files is redundant, so
let's delete repeated "select NETFS_SUPPORT" that was added accidentally.

Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agosmb: use macros instead of constants for leasekey size and default cifsattrs value
Bharath SM [Mon, 16 Dec 2024 18:39:36 +0000 (00:09 +0530)]
smb: use macros instead of constants for leasekey size and default cifsattrs value

Replace default hardcoded value for cifsAttrs with ATTR_ARCHIVE macro
Use SMB2_LEASE_KEY_SIZE macro for leasekey size in smb2_lease_break

Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agodrm/sched: Fix drm_sched_fini() docu generation
Bagas Sanjaya [Tue, 17 Dec 2024 03:49:15 +0000 (10:49 +0700)]
drm/sched: Fix drm_sched_fini() docu generation

Commit baf4afc5831438 ("drm/sched: Improve teardown documentation")
added a list of drm_sched_fini()'s problems. The list triggers htmldocs
warning (but renders correctly in htmldocs output):

Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation.

Separate the list from the preceding paragraph by a blank line to fix
the warning. While at it, also end the aforementioned paragraph by a
colon.

Fixes: baf4afc58314 ("drm/sched: Improve teardown documentation")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
[phasta: Adjust commit message]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com
10 months agoselftests/bpf: Fix compilation error in get_uprobe_offset()
Jerome Marchand [Wed, 18 Dec 2024 17:57:24 +0000 (18:57 +0100)]
selftests/bpf: Fix compilation error in get_uprobe_offset()

In get_uprobe_offset(), the call to procmap_query() use the constant
PROCMAP_QUERY_VMA_EXECUTABLE, even if PROCMAP_QUERY is not defined.

Define PROCMAP_QUERY_VMA_EXECUTABLE when PROCMAP_QUERY isn't.

Fixes: 4e9e07603ecd ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available")
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20241218175724.578884-1-jmarchan@redhat.com
10 months agoaccel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
Jacek Lawrynowicz [Tue, 10 Dec 2024 13:09:39 +0000 (14:09 +0100)]
accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()

Move pm_runtime_set_active() to ivpu_pm_init() so when
ivpu_ipc_send_receive_internal() is executed before ivpu_pm_enable()
it already has correct runtime state, even if last resume was
not successful.

Fixes: 8ed520ff4682 ("accel/ivpu: Move set autosuspend delay to HW specific code")
Cc: stable@vger.kernel.org # v6.7+
Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241210130939.1575610-4-jacek.lawrynowicz@linux.intel.com
10 months agoaccel/ivpu: Fix memory leak in ivpu_mmu_reserved_context_init()
Jacek Lawrynowicz [Tue, 10 Dec 2024 13:09:38 +0000 (14:09 +0100)]
accel/ivpu: Fix memory leak in ivpu_mmu_reserved_context_init()

Add appropriate error handling to ensure all allocated resources are
released upon encountering an error.

Fixes: a74f4d991352 ("accel/ivpu: Defer MMU root page table allocation")
Cc: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241210130939.1575610-3-jacek.lawrynowicz@linux.intel.com
10 months agoaccel/ivpu: Fix general protection fault in ivpu_bo_list()
Jacek Lawrynowicz [Tue, 10 Dec 2024 13:09:37 +0000 (14:09 +0100)]
accel/ivpu: Fix general protection fault in ivpu_bo_list()

Check if ctx is not NULL before accessing its fields.

Fixes: 37dee2a2f433 ("accel/ivpu: Improve buffer object debug logs")
Cc: stable@vger.kernel.org # v6.8
Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241210130939.1575610-2-jacek.lawrynowicz@linux.intel.com
10 months agoselftests/bpf: Use asm constraint "m" for LoongArch
Tiezhu Yang [Thu, 19 Dec 2024 11:15:06 +0000 (19:15 +0800)]
selftests/bpf: Use asm constraint "m" for LoongArch

Currently, LoongArch LLVM does not support the constraint "o" and no plan
to support it, it only supports the similar constraint "m", so change the
constraints from "nor" in the "else" case to arch-specific "nmr" to avoid
the build error such as "unexpected asm memory constraint" for LoongArch.

Fixes: 630301b0d59d ("selftests/bpf: Add basic USDT selftests")
Suggested-by: Weining Lu <luweining@loongson.cn>
Suggested-by: Li Chen <chenli@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: stable@vger.kernel.org
Link: https://llvm.org/docs/LangRef.html#supported-constraint-code-list
Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp#L172
Link: https://lore.kernel.org/bpf/20241219111506.20643-1-yangtiezhu@loongson.cn
10 months agoMerge tag 'thunderbolt-for-v6.13-rc4' of ssh://gitolite.kernel.org/pub/scm/linux...
Greg Kroah-Hartman [Thu, 19 Dec 2024 11:35:02 +0000 (12:35 +0100)]
Merge tag 'thunderbolt-for-v6.13-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus

Mika writes:

thunderbolt: Fixes for v6.13-rc4

This includes following USB4/Thunderbolt fixes for v6.13-rc4:

  - Add Intel Panther Lake PCI IDs
  - Do not show nvm_version for retimers that are not supported
  - Fix redrive mode handling.

All these have been in linux-next with no reported issues.

* tag 'thunderbolt-for-v6.13-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
  thunderbolt: Improve redrive mode handling
  thunderbolt: Don't display nvm_version unless upgrade supported
  thunderbolt: Add support for Intel Panther Lake-M/P

10 months agoregulator: rename regulator-uv-survival-time-ms according to DT binding
Ahmad Fatoum [Wed, 18 Dec 2024 19:54:53 +0000 (20:54 +0100)]
regulator: rename regulator-uv-survival-time-ms according to DT binding

The regulator bindings don't document regulator-uv-survival-time-ms, but
the more descriptive regulator-uv-less-critical-window-ms instead.

Looking back at v3[1] and v4[2] of the series adding the support,
the property was indeed renamed between these patch series, but
unfortunately the rename only made it into the DT bindings with the
driver code still using the old name.

Let's therefore rename the property in the driver code to follow suit.
This will break backwards compatibility, but there are no upstream
device trees using the property and we never documented the old name
of the property anyway. ¯\_(ツ)_/¯"

[1]: https://lore.kernel.org/all/20231025084614.3092295-7-o.rempel@pengutronix.de/
[2]: https://lore.kernel.org/all/20231026144824.4065145-5-o.rempel@pengutronix.de/

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://patch.msgid.link/20241218-regulator-uv-survival-time-ms-rename-v1-1-6cac9c3c75da@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>