]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
5 months agocxl/edac: Add support for PERFORM_MAINTENANCE command
Shiju Jose [Wed, 21 May 2025 12:47:43 +0000 (13:47 +0100)]
cxl/edac: Add support for PERFORM_MAINTENANCE command

Add support for PERFORM_MAINTENANCE command.

CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command.
This command requests the device to execute the maintenance operation
specified by the maintenance operation class and the maintenance operation
subclass.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-6-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agocxl/edac: Add CXL memory device ECS control feature
Shiju Jose [Wed, 21 May 2025 12:47:42 +0000 (13:47 +0100)]
cxl/edac: Add CXL memory device ECS control feature

CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check
Scrub) control feature.
The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
Specification (JESD79-5) and allows the DRAM to internally read, correct
single-bit errors, and write back corrected data bits to the DRAM array
while providing transparency to error counts.

The ECS control allows the requester to change the log entry type, the ECS
threshold count (provided the request falls within the limits specified in
DDR5 mode registers), switch between codeword mode and row count mode, and
reset the ECS counter.

Register with EDAC device driver, which retrieves the ECS attribute
descriptors from the EDAC ECS and exposes the ECS control attributes to
userspace via sysfs. For example, the ECS control for the memory media FRU0
in CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-5-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agocxl/edac: Add CXL memory device patrol scrub control feature
Shiju Jose [Wed, 21 May 2025 12:47:41 +0000 (13:47 +0100)]
cxl/edac: Add CXL memory device patrol scrub control feature

CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
control feature. The device patrol scrub proactively locates and makes
corrections to errors in regular cycle.

Allow specifying the number of hours within which the patrol scrub must be
completed, subject to minimum and maximum limits reported by the device.
Also allow disabling scrub allowing trade-off error rates against
performance.

Add support for patrol scrub control on CXL memory devices.
Register with the EDAC device driver, which retrieves the scrub attribute
descriptors from EDAC scrub and exposes the sysfs scrub control attributes
to userspace. For example, scrub control for the CXL memory device
"cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.

Additionally, add support for region-based CXL memory patrol scrub control.
CXL memory regions may be interleaved across one or more CXL memory
devices. For example, region-based scrub control for "cxl_region1" is
exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.

[dj: A few formatting fixes from Jonathan]

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-4-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agocxl: Update prototype of function get_support_feature_info()
Shiju Jose [Wed, 21 May 2025 12:47:40 +0000 (13:47 +0100)]
cxl: Update prototype of function get_support_feature_info()

Add following changes to function get_support_feature_info()
1. Make generic to share between cxl-fwctl and cxl-edac paths.
2. Rename get_support_feature_info() to cxl_feature_info()
3. Change parameter const struct fwctl_rpc_cxl *rpc_in to
   const uuid_t *uuid.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-3-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agoEDAC: Update documentation for the CXL memory patrol scrub control feature
Shiju Jose [Wed, 21 May 2025 12:47:39 +0000 (13:47 +0100)]
EDAC: Update documentation for the CXL memory patrol scrub control feature

Update the Documentation/edac/scrub.rst to include use cases and
policies for CXL memory device-based, CXL region-based patrol scrub
control and CXL Error Check Scrub (ECS).

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250521124749.817-2-shiju.jose@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agocxl/features: Remove the inline specifier from to_cxlfs()
Alison Schofield [Wed, 21 May 2025 23:36:23 +0000 (16:36 -0700)]
cxl/features: Remove the inline specifier from to_cxlfs()

to_cxlfs() was declared 'inline' in the header but only defined in
drivers/cxl/core/features.c. This has worked because features.c was
the only file using the function and the definition happened to be
available in the same compilation unit.

However, in preparation for a second .c file using the header and
needing to call the function, the inline specifier became an issue.
Sparse flagged the declaration as invalid since 'inline' requires a
visible definition at the point of use.

Defining the function in the header was considered but rejected, as
it depends on internal symbols not visible at that level.

Remove the inline specifier to correct the linkage violation.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
Link: https://patch.msgid.link/20250521233625.1745849-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
5 months agoLinux 6.15-rc4
Linus Torvalds [Sun, 27 Apr 2025 22:19:23 +0000 (15:19 -0700)]
Linux 6.15-rc4

5 months agoMerge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 26 Apr 2025 20:02:36 +0000 (13:02 -0700)]
Merge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

 - When releasing a start-aligned resource, e.g., a bridge window, save
   start/end/flags for the next assignment attempt; fixes a v6.15-rc1
   regression (Ilpo Järvinen)

 - Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl
   selftest v6.15-rc1 regression (Ilpo Järvinen)

 - Add Manivannan Sadhasivam as maintainer of native host bridge and
   endpoint drivers (Manivannan Sadhasivam)

 - In endpoint test driver, defer IRQ allocation from .probe() until
   ioctl() to fix a regression on platforms where the Vendor/Device ID
   match doesn't include driver_data (Niklas Cassel)

* tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)
  MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer
  selftests/pcie_bwctrl: Fix test progs list
  PCI: Restore assigned resources fully after release

5 months agoMerge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 26 Apr 2025 17:43:03 +0000 (10:43 -0700)]
Merge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Revert a v6.15 patch due to a report of SELinux test failures

* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

5 months agoMerge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:45:54 +0000 (09:45 -0700)]
Merge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix 32-bit kernel boot crash if passed physical memory with more than
   32 address bits

 - Fix Xen PV crash

 - Work around build bug in certain limited build environments

 - Fix CTEST instruction decoding in insn_decoder_test

* tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Fix CTEST instruction decoding
  x86/boot: Work around broken busybox 'truncate' tool
  x86/mm: Fix _pgd_alloc() for Xen PV mode
  x86/e820: Discard high memory that can't be addressed by 32-bit systems

5 months agoMerge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Apr 2025 16:23:20 +0000 (09:23 -0700)]
Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix sporadic crashes in dequeue_entities() due to ... bad math.

  [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
    complex math being correct it could have de-escalated a crash into
    a warning, but that's for a different patch ]"

* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

5 months agoMerge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:13:09 +0000 (09:13 -0700)]
Merge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc perf events fixes from Ingo Molnar:

 - Use POLLERR for events in error state, instead of the ambiguous
   POLLHUP error value

 - Fix non-sampling (counting) events on certain x86 platforms

* tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix non-sampling (counting) events on certain x86 platforms
  perf/core: Change to POLLERR for pinned events with error

5 months agoMerge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:08:45 +0000 (09:08 -0700)]
Merge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix crashes in the gic-v2m irqchip driver, caused by an incorrect
  __init annotation"

* tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

5 months agoMerge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:02:41 +0000 (09:02 -0700)]
Merge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Add a missing Kconfig option, fix some bugs in exception handlers,
  memory management and KVM"

* tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
  LoongArch: KVM: Fully clear some CSRs when VM reboot
  LoongArch: KVM: Fix multiple typos of KVM code
  LoongArch: Return NULL from huge_pte_offset() for invalid PMD
  LoongArch: Remove a bogus reference to ZONE_DMA
  LoongArch: Handle fp, lsx, lasx and lbt assembly symbols
  LoongArch: Make do_xyz() exception handlers more robust
  LoongArch: Make regs_irqs_disabled() more clear
  LoongArch: Select ARCH_USE_MEMTEST

5 months agoMerge tag 'for-linus' of https://github.com/openrisc/linux
Linus Torvalds [Sat, 26 Apr 2025 16:01:13 +0000 (09:01 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

 - Support for cacheinfo API to expose OpenRISC cache info via sysfs,
   this also translated to some cleanups to OpenRISC cache flush and
   invalidate API's

 - Documentation updates for new mailing list and toolchain binaries

* tag 'for-linus' of https://github.com/openrisc/linux:
  Documentation: openrisc: Update toolchain binaries URL
  Documentation: openrisc: Update mailing list
  openrisc: Add cacheinfo support
  openrisc: Introduce new utility functions to flush and invalidate caches
  openrisc: Refactor struct cpuinfo_or1k to reduce duplication

5 months agoRevert "sunrpc: clean cache_detail immediately when flush is written frequently"
Chuck Lever [Thu, 24 Apr 2025 13:27:35 +0000 (09:27 -0400)]
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

Ondrej reports that certain SELinux tests are failing after commit
fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is
written frequently"), merged during the v6.15 merge window.

Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
5 months agoMerge tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Apr 2025 15:55:24 +0000 (08:55 -0700)]
Merge tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kunit fix from Kees Cook:
 "A single fix for the kunit lib/tests/ relocation:

   - Ensure prime numbers tests are included in KUnit test runs (Mark Brown)"

* tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib: Ensure prime numbers tests are included in KUnit test runs

5 months agoMerge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 26 Apr 2025 15:32:29 +0000 (08:32 -0700)]
Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a
  couple of minor fixes, seems a bit quiet, but probably some lag from
  Easter holidays.

  amdgpu:
   - P2P DMA fixes
   - Display reset fixes
   - DCN 3.5 fixes
   - ACPI EDID fix
   - LTTPR fix
   - mode_valid() fix

  exynos:
   - fix spelling error
   - remove redundant error handling in exynos_drm_vidi.c module
   - marks struct decon_data as const in the exynos7_drm_decon driver
     since it is only read
   - Remove unnecessary checking in exynos_drm_drv.c module

  meson:
   - Fix VCLK calculation

  panel:
   - jd9365a: Fix reset polarity"

* tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel:
  drm/exynos: Fix spelling mistake "enqueu" -> "enqueue"
  drm/exynos: exynos7_drm_decon: Consstify struct decon_data
  drm/exynos: fixed a spelling error
  drm/exynos/vidi: Remove redundant error handling in vidi_get_modes()
  drm/exynos: Remove unnecessary checking
  drm/amd/display: do not copy invalid CRTC timing info
  drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
  drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
  drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
  drm/amdgpu: Allow P2P access through XGMI
  drm/amd/display: Enable urgent latency adjustment on DCN35
  drm/amd/display: Force full update in gpu reset
  drm/amd/display: Fix gpu reset in multidisplay config
  drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
  drm/amdgpu: Use allowed_domains for pinning dmabufs
  drm: panel: jd9365da: fix reset signal polarity in unprepare
  drm/meson: use unsigned long long / Hz for frequency types
  Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"

5 months agosched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
Omar Sandoval [Fri, 25 Apr 2025 08:51:24 +0000 (01:51 -0700)]
sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.

The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:

1. In the if (entity_is_task(se)) else block at the beginning of
   dequeue_entities(), slice is set to
   cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
   it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
   tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
   first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
   the saved slice, which is still U64_MAX.

This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:

6. In update_entity_lag(), se->slice is used to calculate limit, which
   ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
   is negative, vlag > limit, so se->vlag is set to the same huge
   negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
   another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
   decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
    incorrectly returns false because the vruntime is so far from the
    other vruntimes on the queue, causing the
    (vruntime - cfs_rq->min_vruntime) * load calulation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
    pick_eevdf() and crashes.

Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).

Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
5 months agoirqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
Suzuki K Poulose [Tue, 22 Apr 2025 16:16:16 +0000 (17:16 +0100)]
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

With ACPI in place, gicv2m_get_fwnode() is registered with the pci
subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime
during a PCI host bridge probe. But, the call back is wrongly marked as
__init, causing it to be freed, while being registered with the PCI
subsystem and could trigger:

 Unable to handle kernel paging request at virtual address ffff8000816c0400
  gicv2m_get_fwnode+0x0/0x58 (P)
  pci_set_bus_msi_domain+0x74/0x88
  pci_register_host_bridge+0x194/0x548

This is easily reproducible on a Juno board with ACPI boot.

Retain the function for later use.

Fixes: 0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
5 months agoLoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally

In function kvm_pre_enter_guest(), it prepares to enter guest and check
whether there are pending signals or events. And it will not enter guest
if there are, PMU pass-through preparation for guest should be cancelled
and host should own PMU hardware.

Cc: stable@vger.kernel.org
Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: KVM: Fully clear some CSRs when VM reboot
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fully clear some CSRs when VM reboot

Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are
partly cleared with function _kvm_setcsr(). This comes from the hardware
specification, some bits are read only in VM mode, and however they can
be written in host mode. So they are partly cleared in VM mode, and can
be fully cleared in host mode.

These read only bits show pending interrupt or exception status. When VM
reset, the read-only bits should be cleared, otherwise vCPU will receive
unknown interrupts in boot stage.

Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared
in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: KVM: Fix multiple typos of KVM code
Yulong Han [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix multiple typos of KVM code

Fix multiple typos inside arch/loongarch/kvm.

Cc: stable@vger.kernel.org
Reviewed-by: Yuli Wang <wangyuli@uniontech.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Yulong Han <wheatfox17@icloud.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Return NULL from huge_pte_offset() for invalid PMD
Ming Wang [Thu, 24 Apr 2025 12:15:47 +0000 (20:15 +0800)]
LoongArch: Return NULL from huge_pte_offset() for invalid PMD

LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
even if the underlying entry points to invalid_pte_table (indicating no
mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
value (the address of invalid_pte_table) via this pointer.

The generic is_swap_pte() check then incorrectly identifies this address
as a swap entry on LoongArch, because it satisfies the "!pte_present()
&& !pte_none()" conditions. This misinterpretation, combined with a
coincidental match by is_migration_entry() on the address bits, leads to
kernel crashes in pfn_swap_entry_to_page().

Fix this at the architecture level by modifying huge_pte_offset() to
check the PMD entry's content using pmd_none() before returning. If the
entry is invalid (i.e., it points to invalid_pte_table), return NULL
instead of the pointer to the slot.

Cc: stable@vger.kernel.org
Acked-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Remove a bogus reference to ZONE_DMA
Petr Tesarik [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Remove a bogus reference to ZONE_DMA

Remove dead code. LoongArch does not have a DMA memory zone (24bit DMA).
The architecture does not even define MAX_DMA_PFN.

Cc: stable@vger.kernel.org
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Handle fp, lsx, lasx and lbt assembly symbols
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Handle fp, lsx, lasx and lbt assembly symbols

Like the other relevant symbols, export some fp, lsx, lasx and lbt
assembly symbols and put the function declarations in header files
rather than source files.

While at it, use "asmlinkage" for the other existing C prototypes
of assembly functions and also do not use the "extern" keyword with
function declarations according to the document coding-style.rst.

Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Make do_xyz() exception handlers more robust
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Make do_xyz() exception handlers more robust

Currently, interrupts need to be disabled before single-step mode is
set, it requires that CSR_PRMD_PIE be cleared in save_local_irqflag()
which is called by setup_singlestep(), this is reasonable.

But in the first kprobe breakpoint exception, if the irq is enabled at
the beginning of do_bp(), it will not be disabled at the end of do_bp()
due to the CSR_PRMD_PIE has been cleared in save_local_irqflag(). So for
this case, it may corrupt exception context when restoring the exception
after do_bp() in handle_bp(), this is not reasonable.

In order to restore exception safely in handle_bp(), it needs to ensure
the irq is disabled at the end of do_bp(), so just add a local variable
to record the original interrupt status in the parent context, then use
it as the check condition to enable and disable irq in do_bp().

While at it, do the similar thing for other do_xyz() exception handlers
to make them more robust.

Fixes: 6d4cc40fb5f5 ("LoongArch: Add kprobes support")
Suggested-by: Jinyang He <hejinyang@loongson.cn>
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Make regs_irqs_disabled() more clear
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Make regs_irqs_disabled() more clear

In the current code, the definition of regs_irqs_disabled() is actually
"!(regs->csr_prmd & CSR_CRMD_IE)" because arch_irqs_disabled_flags() is
defined as "!(flags & CSR_CRMD_IE)", it looks a little strange.

Define regs_irqs_disabled() as !(regs->csr_prmd & CSR_PRMD_PIE) directly
to make it more clear, no functional change.

While at it, the return value of regs_irqs_disabled() is true or false,
so change its type to reflect that and also make it always inline.

Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoLoongArch: Select ARCH_USE_MEMTEST
Yuli Wang [Thu, 24 Apr 2025 12:15:22 +0000 (20:15 +0800)]
LoongArch: Select ARCH_USE_MEMTEST

As of commit dce44566192e ("mm/memtest: add ARCH_USE_MEMTEST"),
architectures must select ARCH_USE_MEMTESET to enable CONFIG_MEMTEST.

Commit 628c3bb40e9a ("LoongArch: Add boot and setup routines") added
support for early_memtest but did not select ARCH_USE_MEMTESET.

Fixes: 628c3bb40e9a ("LoongArch: Add boot and setup routines")
Tested-by: Erpeng Xu <xuerpeng@uniontech.com>
Tested-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
5 months agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Sat, 26 Apr 2025 00:53:09 +0000 (17:53 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Add namespace to BPF internal symbols (Alexei Starovoitov)

 - Fix possible endless loop in BPF map iteration (Brandon Kammerdiener)

 - Fix compilation failure for samples/bpf on LoongArch (Haoran Jiang)

 - Disable a part of sockmap_ktls test (Ihor Solodrai)

 - Correct typo in __clang_major__ macro (Peilin Ye)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Correct typo in __clang_major__ macro
  samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora
  bpf: Add namespace to BPF internal symbols
  selftests/bpf: add test for softlock when modifying hashmap while iterating
  bpf: fix possible endless loop in BPF map iteration
  selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure

5 months agoselftests/bpf: Correct typo in __clang_major__ macro
Peilin Ye [Fri, 25 Apr 2025 21:37:10 +0000 (21:37 +0000)]
selftests/bpf: Correct typo in __clang_major__ macro

Make sure that CAN_USE_BPF_ST test (compute_live_registers/store) is
enabled when __clang_major__ >= 18.

Fixes: 2ea8f6a1cda7 ("selftests/bpf: test cases for compute_live_registers()")
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250425213712.1542077-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 months agoMerge tag 'ata-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata...
Linus Torvalds [Fri, 25 Apr 2025 23:31:10 +0000 (16:31 -0700)]
Merge tag 'ata-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Damien Le Moal:

 - Fix the incorrect return type of ata_mselect_control_ata_feature()

 - Several fixes for the control of the Command Duration Limits feature
   to avoid unnecessary enable and disable actions. Avoiding the
   unnecessary enable action also avoids unwanted resets of the CDL
   statistics log page as that is implied for any enable action.

 - Fix the translation for sensing the control mode page to correctly
   return the last enable or disable action performed, as defined in
   SAT-6. This correct mode sense information is used to fix the
   behavior of the scsi layer to avoid unnecessary mode select command
   issuing.

* tag 'ata-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  scsi: Improve CDL control
  ata: libata-scsi: Improve CDL control
  ata: libata-scsi: Fix ata_msense_control_ata_feature()
  ata: libata-scsi: Fix ata_mselect_control_ata_feature() return type

5 months agoMerge tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Fri, 25 Apr 2025 22:57:21 +0000 (15:57 -0700)]
Merge tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - For some reason we went from zero to three maintainers for HFS/HFS+
   in a matter of days. The lesson to learn from this might just be that
   we need to threaten code removal more often!?

 - Fix a regression introduced by enabling large folios for lage logical
   block sizes. This has caused issues for noref migration with large
   folios due to sleeping while in an atomic context.

   New sleeping variants of pagecache lookup helpers are introduced.
   These helpers take the folio lock instead of the mapping's private
   spinlock. The problematic users are converted to the sleeping
   variants and serialize against noref migration. Atomic users will
   bail on seeing the new BH_Migrate flag.

   This also shrinks the critical region of the mapping's private lock
   and the new blocking callers reduce contention on the spinlock for
   bdev mappings.

 - Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The
   first bug is using a mountpoint that is located on a mount we're not
   holding a reference to. The second bug is putting the mountpoint
   after we've called namespace_unlock() as it's no longer guaranteed
   that it does stay a mountpoint.

 - Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code
   just to query i_mode instead of simply querying the inode directly.
   This also avoids lifetime issues for the dm code by an earlier bugfix
   this cycle that moved bdev_statx() handling into vfs_getattr_nosec().

 - Fix AT_FDCWD handling with getname_maybe_null() in the xattr code.

 - Fix a performance regression for files when multiple callers issue a
   close when it's not the last reference.

 - Remove a duplicate noinline annotation from pipe_clear_nowait().

* tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)
  MAINTAINERS: hfs/hfsplus: add myself as maintainer
  splice: remove duplicate noinline from pipe_clear_nowait
  devtmpfs: don't use vfs_getattr_nosec to query i_mode
  fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()
  fs: fall back to file_ref_put() for non-last reference
  mm/migrate: fix sleep in atomic for large folios and buffer heads
  fs/ext4: use sleeping version of sb_find_get_block()
  fs/jbd2: use sleeping version of __find_get_block()
  fs/ocfs2: use sleeping version of __find_get_block()
  fs/buffer: use sleeping version of __find_get_block()
  fs/buffer: introduce sleeping flavors for pagecache lookups
  MAINTAINERS: add HFS/HFS+ maintainers
  fs/buffer: split locking for pagecache lookups

5 months agoMerge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 25 Apr 2025 22:51:28 +0000 (15:51 -0700)]
Merge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A small CephFS encryption-related fix and a dead code cleanup"

* tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client:
  ceph: Fix incorrect flush end position calculation
  ceph: Remove osd_client deadcode

5 months agoMerge tag 'cxl-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Fri, 25 Apr 2025 22:21:11 +0000 (15:21 -0700)]
Merge tag 'cxl-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dave Jiang:
 "The fixes address global persistent flush (GPF) changes and CXL
  Features support changes that went in the 6.15 merge window. And also
  a fix to an issue observed on CXL 1.1 platform during device
  enumeration.

  Summary:

   - Fix using the wrong GPF DVSEC location:
       - Fix caching of dport GPF DVSEC from the first endpoint
       - Ensure that the GPF phase timeout is only updated once by first
         endpoint
       - Drop is_port parameter for cxl_gpf_get_dvsec()

   - Fix the devm_* call host device for CXL fwctl setup

   - Set the out_len in Set Features failure case

   - Fix RCD initialization by skipping unneeded mem_en check"

* tag 'cxl-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/core/regs.c: Skip Memory Space Enable check for RCD and RCH Ports
  cxl/feature: Update out_len in set feature failure case
  cxl: Fix devm host device for CXL fwctl initialization
  cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec()
  cxl/pci: Update Port GPF timeout only when the first EP attaching
  cxl/core: Fix caching dport GPF DVSEC issue

5 months agoMerge tag 'amd-drm-fixes-6.15-2025-04-23' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 25 Apr 2025 22:12:15 +0000 (08:12 +1000)]
Merge tag 'amd-drm-fixes-6.15-2025-04-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.15-2025-04-23:

amdgpu:
- P2P DMA fixes
- Display reset fixes
- DCN 3.5 fixes
- ACPI EDID fix
- LTTPR fix
- mode_valid() fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250423183045.2886753-1-alexander.deucher@amd.com
5 months agoMerge tag 'exynos-drm-fixes-for-v6.15-rc4' of git://git.kernel.org/pub/scm/linux...
Dave Airlie [Fri, 25 Apr 2025 22:11:49 +0000 (08:11 +1000)]
Merge tag 'exynos-drm-fixes-for-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Several fixups
- fix spelling error
- remove redundant error handling in exynos_drm_vidi.c module.
- marks struct decon_data as const in the exynos7_drm_decon driver since it is only read.

Cleanup
- Remove unnecessary checking in exynos_drm_drv.c module

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://lore.kernel.org/r/20250423143044.46165-1-inki.dae@samsung.com
5 months agoMerge tag 'drm-misc-fixes-2025-04-22' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 25 Apr 2025 21:59:30 +0000 (07:59 +1000)]
Merge tag 'drm-misc-fixes-2025-04-22' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

meson:
- Fix VCLK calculation

panel:
- jd9365a: Fix reset polarity

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250422151209.GA24823@2a02-2454-fd5e-fd00-5cc9-93f1-8e9a-df9b.dyn6.pyur.net
5 months agoMerge tag 'riscv-for-linus-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 25 Apr 2025 20:22:08 +0000 (13:22 -0700)]
Merge tag 'riscv-for-linus-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for a missing icache flush in uprobes, which manifests as at
   least a BFF selftest failure on the Spacemit X1

 - A workaround for build warnings in flush_icache_range()

* tag 'riscv-for-linus-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: uprobes: Add missing fence.i after building the XOL buffer
  riscv: Replace function-like macro by static inline function

5 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 25 Apr 2025 19:00:56 +0000 (12:00 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Single fix for broken usage of 'multi-MIDR' infrastructure in PI
     code, adding an open-coded erratum check for everyone's favorite
     pile of sand: Cavium ThunderX

  x86:

   - Bugfixes from a planned posted interrupt rework

   - Do not use kvm_rip_read() unconditionally to cater for guests with
     inaccessible register state"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Do not use kvm_rip_read() unconditionally for KVM_PROFILING
  KVM: x86: Do not use kvm_rip_read() unconditionally in KVM tracepoints
  KVM: SVM: WARN if an invalid posted interrupt IRTE entry is added
  iommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrrupts
  iommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTE
  KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer
  KVM: x86: Explicitly treat routing entry type changes as changes
  KVM: x86: Reset IRTE to host control if *new* route isn't postable
  KVM: SVM: Allocate IR data using atomic allocation
  KVM: SVM: Don't update IRTEs if APICv/AVIC is disabled
  KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline
  arm64: Rework checks for broken Cavium HW in the PI code

5 months agoMerge tag 'block-6.15-20250424' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 25 Apr 2025 18:34:39 +0000 (11:34 -0700)]
Merge tag 'block-6.15-20250424' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Fix autoloading of drivers from stat*(2)

 - Fix losing read-ahead setting one suspend/resume, when a device is
   re-probed.

 - Fix race between setting the block size and page cache updates.
   Includes a helper that a coming XFS fix will use as well.

 - ublk cancelation fixes.

 - ublk selftest additions and fixes.

 - NVMe pull via Christoph:
      - fix an out-of-bounds access in nvmet_enable_port (Richard
        Weinberger)

* tag 'block-6.15-20250424' of git://git.kernel.dk/linux:
  ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
  ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
  block: don't autoload drivers on blk-cgroup configuration
  block: don't autoload drivers on stat
  block: remove the backing_inode variable in bdev_statx
  block: move blkdev_{get,put} _no_open prototypes out of blkdev.h
  block: never reduce ra_pages in blk_apply_bdi_limits
  selftests: ublk: common: fix _get_disk_dev_t for pre-9.0 coreutils
  selftests: ublk: remove useless 'delay_us' from 'struct dev_ctx'
  selftests: ublk: fix recover test
  block: hoist block size validation code to a separate function
  block: fix race between set_blocksize and read paths
  nvmet: fix out-of-bounds access in nvmet_enable_port

5 months agoMerge tag 'io_uring-6.15-20250424' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 25 Apr 2025 18:31:47 +0000 (11:31 -0700)]
Merge tag 'io_uring-6.15-20250424' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix an older bug for handling of fallback task_work, when the task is
   exiting. Found by code inspection while reworking cancelation.

 - Fix duplicate flushing in one of the CQE posting helpers.

* tag 'io_uring-6.15-20250424' of git://git.kernel.dk/linux:
  io_uring: fix 'sync' handling of io_fallback_tw()
  io_uring: don't duplicate flushing in io_req_post_cqe

5 months agoMerge tag 'pm-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 25 Apr 2025 17:56:19 +0000 (10:56 -0700)]
Merge tag 'pm-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These are cpufreq driver fixes addressing multiple assorted issues:

   - Fix possible out-of-bound / NULL-ptr-deref in cpufreq drivers
     (Henry Martin, Andre Przywara)

   - Fix Kconfig issues with compile-test in cpufreq drivers (Krzysztof
     Kozlowski, Johan Hovold)

   - Fix invalid return value in .get() in the CPPC cpufreq driver (Marc
     Zyngier)

   - Add SM8650 to cpufreq-dt-platdev blocklist (Pengyu Luo)"

* tag 'pm-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: fix compile-test defaults
  cpufreq: cppc: Fix invalid return value in .get() callback
  cpufreq: scpi: Fix null-ptr-deref in scpi_cpufreq_get_rate()
  cpufreq: scmi: Fix null-ptr-deref in scmi_cpufreq_get_rate()
  cpufreq: apple-soc: Fix null-ptr-deref in apple_soc_cpufreq_get_rate()
  cpufreq: Do not enable by default during compile testing
  cpufreq: Add SM8650 to cpufreq-dt-platdev blocklist
  cpufreq: sun50i: prevent out-of-bounds access

5 months agoMerge tag 'usb-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Fri, 25 Apr 2025 17:48:16 +0000 (10:48 -0700)]
Merge tag 'usb-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes and new device ids for 6.15-rc4.
  Nothing major in here, just the normal set of issues that have cropped
  up after -rc1:

   - new device ids for usb-serial drivers

   - new device quirks added

   - typec driver fixes

   - chipidea driver fixes

   - xhci driver fixes

   - wdm driver fixes

   - cdns3 driver fixes

   - MAINTAINERS file update

  All of these, except for the MAINTAINERS file update, have been in
  linux-next for a while with no reported issues"

* tag 'usb-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (27 commits)
  MAINTAINERS: Assign maintainer for the port controller drivers
  USB: serial: simple: add OWON HDS200 series oscilloscope support
  USB: serial: ftdi_sio: add support for Abacus Electrics Optical Probe
  USB: serial: option: add Sierra Wireless EM9291
  usb: typec: class: Unlocked on error in typec_register_partner()
  usb: quirks: Add delay init quirk for SanDisk 3.2Gen1 Flash Drive
  USB: wdm: add annotation
  USB: wdm: wdm_wwan_port_tx_complete mutex in atomic context
  USB: wdm: close race between wdm_open and wdm_wwan_port_stop
  USB: wdm: handle IO errors in wdm_wwan_port_start
  USB: VLI disk crashes if LPM is used
  usb: dwc3: gadget: check that event count does not exceed event buffer length
  USB: storage: quirk for ADATA Portable HDD CH94
  usb: quirks: add DELAY_INIT quirk for Silicon Motion Flash Drive
  USB: OHCI: Add quirk for LS7A OHCI controller (rev 0x02)
  usb: dwc3: xilinx: Prevent spike in reset signal
  usb: cdns3: Fix deadlock when using NCM gadget
  usb: chipidea: ci_hdrc_imx: implement usb_phy_init() error handling
  usb: chipidea: ci_hdrc_imx: fix call balance of regulator routines
  usb: chipidea: ci_hdrc_imx: fix usbmisc handling
  ...

5 months agoMerge tag 'tty-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Fri, 25 Apr 2025 17:44:07 +0000 (10:44 -0700)]
Merge tag 'tty-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are three small tty/serial driver fixes for 6.15-rc4 to resolve
  some reported issues. They are:

   - permissions change for TIOCL_SELMOUSEREPORT to resolve a relaxing
     of permissions that showed up 6.14 that wasn't _quite_ right.

   - sifive serial driver fix

   - msm serial driver fix

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'tty-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: sifive: lock port in startup()/shutdown() callbacks
  tty: Require CAP_SYS_ADMIN for all usages of TIOCL_SELMOUSEREPORT
  serial: msm: Configure correct working mode before starting earlycon

5 months agoMerge tag 'char-misc-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Fri, 25 Apr 2025 17:30:40 +0000 (10:30 -0700)]
Merge tag 'char-misc-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes to resolve reported
  problems for 6.15-rc4. Included in here are:

   - misc chrdev region range fix reported by many people

   - nvmem driver fixes and dt updates

   - mei new device id and fixes

   - comedi driver fix

   - pps driver fix

   - binder debug log fix

   - pci1xxxx driver fixes

   - firmware driver fix

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'char-misc-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (25 commits)
  firmware: stratix10-svc: Add of_platform_default_populate()
  mei: vsc: Use struct vsc_tp_packet as vsc-tp tx_buf and rx_buf type
  mei: vsc: Fix fortify-panic caused by invalid counted_by() use
  pps: generators: tio: fix platform_set_drvdata()
  mcb: fix a double free bug in chameleon_parse_gdd()
  misc: microchip: pci1xxxx: Fix incorrect IRQ status handling during ack
  misc: microchip: pci1xxxx: Fix Kernel panic during IRQ handler registration
  char: misc: register chrdev region with all possible minors
  mei: me: add panther lake H DID
  comedi: jr3_pci: Fix synchronous deletion of timer
  binder: fix offset calculation in debug log
  intel_th: avoid using deprecated page->mapping, index fields
  dt-bindings: nvmem: Add compatible for MSM8960
  dt-bindings: nvmem: Add compatible for IPQ5018
  nvmem: qfprom: switch to 4-byte aligned reads
  nvmem: core: update raw_len if the bit reading is required
  nvmem: core: verify cell's raw_len
  nvmem: core: fix bit offsets of more than one byte
  dt-bindings: nvmem: fixed-cell: increase bits start value to 31
  dt-bindings: nvmem: Add compatible for MS8937
  ...

5 months agoMerge tag 'driver-core-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 25 Apr 2025 17:02:59 +0000 (10:02 -0700)]
Merge tag 'driver-core-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core fixes to resolve a number of reported
  problems. Included in here are:

   - driver core sync fix revert to resolve a much reported problem,
     hopefully this is finally resolved

   - MAINTAINERS file update, documenting that the driver-core tree is
     now under a "shared" maintainership model, thanks to Rafael and
     Danilo for offering to do this!

   - auxbus documentation and MAINTAINERS file update

   - MAINTAINERS file update for Rust PCI code

   - firmware rust binding fixup

   - software node link fix

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'driver-core-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  drivers/base/memory: Avoid overhead from for_each_present_section_nr()
  software node: Prevent link creation failure from causing kobj reference count imbalance
  device property: Add a note to the fwnode.h
  drivers/base: Add myself as auxiliary bus reviewer
  drivers/base: Extend documentation with preferred way to use auxbus
  driver core: fix potential NULL pointer dereference in dev_uevent()
  driver core: introduce device_set_driver() helper
  Revert "drivers: core: synchronize really_probe() and dev_uevent()"
  MAINTAINERS: update the location of the driver-core git tree
  rust: firmware: Use `ffi::c_char` type in `FwFunc`
  MAINTAINERS: pci: add entry for Rust PCI code

5 months agoMerge tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 25 Apr 2025 16:44:53 +0000 (09:44 -0700)]
Merge tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-maping fixes from Marek Szyprowski:

 - avoid unused variable warnings (Arnd Bergmann, Marek Szyprowski)

 - add runtume warnings and debug messages for devices with limited DMA
   capabilities (Balbir Singh, Chen-Yu Tsai)

* tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask
  dma-mapping: Fix warning reported for missing prototype
  dma-mapping: avoid potential unused data compilation warning
  dma/mapping.c: dev_dbg support for dma_addressing_limited
  dma/contiguous: avoid warning about unused size_bytes

5 months agoMerge tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 25 Apr 2025 16:37:21 +0000 (09:37 -0700)]
Merge tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:
 "This contains a fix for a build failure on some 32-bit architectures
  and a warning generating docs"

* tag 'xfs-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove duplicate Zoned Filesystems sections in admin-guide
  XFS: fix zoned gc threshold math for 32-bit arches

5 months agosamples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora
Haoran Jiang [Fri, 25 Apr 2025 09:50:42 +0000 (17:50 +0800)]
samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora

When building the latest samples/bpf on LoongArch Fedora

     make M=samples/bpf

There are compilation errors as follows:

In file included from ./linux/samples/bpf/sockex2_kern.c:2:
In file included from ./include/uapi/linux/in.h:25:
In file included from ./include/linux/socket.h:8:
In file included from ./include/linux/uio.h:9:
In file included from ./include/linux/thread_info.h:60:
In file included from ./arch/loongarch/include/asm/thread_info.h:15:
In file included from ./arch/loongarch/include/asm/processor.h:13:
In file included from ./arch/loongarch/include/asm/cpu-info.h:11:
./arch/loongarch/include/asm/loongarch.h:13:10: fatal error: 'larchintrin.h' file not found
         ^~~~~~~~~~~~~~~
1 error generated.

larchintrin.h is included in /usr/lib64/clang/14.0.6/include,
and the header file location is specified at compile time.

Test on LoongArch Fedora:
https://github.com/fedora-remix-loongarch/releases-info

Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
Signed-off-by: zhangxi <zhangxi@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250425095042.838824-1-jianghaoran@kylinos.cn
5 months agobpf: Add namespace to BPF internal symbols
Alexei Starovoitov [Fri, 25 Apr 2025 01:45:42 +0000 (18:45 -0700)]
bpf: Add namespace to BPF internal symbols

Add namespace to BPF internal symbols used by light skeleton
to prevent abuse and document with the code their allowed usage.

Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20250425014542.62385-1-alexei.starovoitov@gmail.com
5 months agoMerge tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Fri, 25 Apr 2025 16:06:14 +0000 (09:06 -0700)]
Merge tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

 - Case insensitive directories now work

 - Ciemap now correctly reports on unwritten pagecache data

 - bcachefs tools 1.25.1 was incorrectly picking unaligned bucket sizes;
   fix journal and write path bugs this uncovered

And assorted smaller fixes...

* tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs: (24 commits)
  bcachefs: Rework fiemap transaction restart handling
  bcachefs: add fiemap delalloc extent detection
  bcachefs: refactor fiemap processing into extent helper and struct
  bcachefs: track current fiemap offset in start variable
  bcachefs: drop duplicate fiemap sync flag
  bcachefs: Fix btree_iter_peek_prev() at end of inode
  bcachefs: Make btree_iter_peek_prev() assert more precise
  bcachefs: Unit test fixes
  bcachefs: Print mount opts earlier
  bcachefs: unlink: casefold d_invalidate
  bcachefs: Fix casefold lookups
  bcachefs: Casefold is now a regular opts.h option
  bcachefs: Implement fileattr_(get|set)
  bcachefs: Allocator now copes with unaligned buckets
  bcachefs: Start copygc, rebalance threads earlier
  bcachefs: Refactor bch2_run_recovery_passes()
  bcachefs: bch2_copygc_wakeup()
  bcachefs: Fix ref leak in write_super()
  bcachefs: Change __journal_entry_close() assert to ERO
  bcachefs: Ensure journal space is block size aligned
  ...

5 months agoMerge branch 'bpf-fix-softlock-condition-in-bpf-hashmap-interation'
Alexei Starovoitov [Fri, 25 Apr 2025 15:36:59 +0000 (08:36 -0700)]
Merge branch 'bpf-fix-softlock-condition-in-bpf-hashmap-interation'

Brandon Kammerdiener says:

====================
This patchset fixes an endless loop condition that can occur in
bpf_for_each_hash_elem, causing the core to softlock. My understanding is
that a combination of RCU list deletion and insertion introduces the new
element after the iteration cursor and that there is a chance that an RCU
reader may in fact use this new element in iteration. The patch uses a
_safe variant of the macro which gets the next element to iterate before
executing the loop body for the current element.

I have also added a subtest in the for_each selftest that can trigger this
condition without the fix.

Changes since v2:
- Renaming and additional checks in selftests/bpf/prog_tests/for_each.c

Changes since v1:
- Added missing Signed-off-by lines to both patches
====================

Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://patch.msgid.link/20250424153246.141677-1-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 months agoselftests/bpf: add test for softlock when modifying hashmap while iterating
Brandon Kammerdiener [Thu, 24 Apr 2025 15:32:55 +0000 (11:32 -0400)]
selftests/bpf: add test for softlock when modifying hashmap while iterating

Add test that modifies the map while it's being iterated in such a way that
hangs the kernel thread unless the _safe fix is applied to
bpf_for_each_hash_elem.

Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Link: https://lore.kernel.org/r/20250424153246.141677-3-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
5 months agobpf: fix possible endless loop in BPF map iteration
Brandon Kammerdiener [Thu, 24 Apr 2025 15:32:51 +0000 (11:32 -0400)]
bpf: fix possible endless loop in BPF map iteration

The _safe variant used here gets the next element before running the callback,
avoiding the endless loop condition.

Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Link: https://lore.kernel.org/r/20250424153246.141677-2-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
5 months agoMAINTAINERS: Assign maintainer for the port controller drivers
Heikki Krogerus [Mon, 7 Apr 2025 13:33:06 +0000 (16:33 +0300)]
MAINTAINERS: Assign maintainer for the port controller drivers

Especially the port manager (tcpm.c) is so major driver that
it should have somebody watching over it who really
understands it, and the port controller interface in
general. Assigning Badhri as the designated reviewer and
restoring the status to Maintained from Orphan.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Badhri Jagan Sridharan <badhri@google.com>
Acked-by: Badhri Jagan Sridharan <badhri@google.com>
Link: https://lore.kernel.org/r/20250407133306.387576-1-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 months agofs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)
Jan Kara [Thu, 24 Apr 2025 13:22:47 +0000 (15:22 +0200)]
fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)

Currently, setxattrat(2) and getxattrat(2) are wrongly handling the
calls of the from setxattrat(AF_FDCWD, NULL, AT_EMPTY_PATH, ...) and
fail with -EBADF error instead of operating on CWD. Fix it.

Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/20250424132246.16822-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 months agoMAINTAINERS: hfs/hfsplus: add myself as maintainer
Yangtao Li [Wed, 23 Apr 2025 12:34:22 +0000 (06:34 -0600)]
MAINTAINERS: hfs/hfsplus: add myself as maintainer

I used to maintain Allwinner SoC cpufreq and thermal drivers and
have some work experience in the F2FS file system.

I volunteered to maintain the code together with Slava and Adrian.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Link: https://lore.kernel.org/20250423123423.2062619-1-frank.li@vivo.com
Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 months agosplice: remove duplicate noinline from pipe_clear_nowait
T.J. Mercier [Wed, 23 Apr 2025 18:00:23 +0000 (18:00 +0000)]
splice: remove duplicate noinline from pipe_clear_nowait

pipe_clear_nowait has two noinline macros, but we only need one.

I checked the whole tree, and this is the only occurrence:

$ grep -r "noinline .* noinline"
fs/splice.c:static noinline void noinline pipe_clear_nowait(struct file *file)
$

Fixes: 0f99fc513ddd ("splice: clear FMODE_NOWAIT on file if splice/vmsplice is used")
Signed-off-by: "T.J. Mercier" <tjmercier@google.com>
Link: https://lore.kernel.org/20250423180025.2627670-1-tjmercier@google.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 months agodevtmpfs: don't use vfs_getattr_nosec to query i_mode
Christoph Hellwig [Wed, 23 Apr 2025 04:59:41 +0000 (06:59 +0200)]
devtmpfs: don't use vfs_getattr_nosec to query i_mode

The recent move of the bdev_statx call to the low-level vfs_getattr_nosec
helper caused it being used by devtmpfs, which leads to deadlocks in
md teardown due to the block device lookup and put interfering with the
unusual lifetime rules in md.

But as handle_remove only works on inodes created and owned by devtmpfs
itself there is no need to use vfs_getattr_nosec vs simply reading the
mode from the inode directly.  Switch to that to avoid the bdev lookup
or any other unintentional side effect.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Xiao Ni <xni@redhat.com>
Fixes: 777d0961ff95 ("fs: move the bdex_statx call to vfs_getattr_nosec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250423045941.1667425-1-hch@lst.de
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Xiao Ni <xni@redhat.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
5 months agoublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
Ming Lei [Fri, 25 Apr 2025 01:37:40 +0000 (09:37 +0800)]
ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd

ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
we may have scheduled task work via io_uring_cmd_complete_in_task() for
dispatching request, then kernel crash can be triggered.

Fix it by not trying to canceling the command if ublk block request is
started.

Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Reported-by: Jared Holzman <jholzman@nvidia.com>
Tested-by: Jared Holzman <jholzman@nvidia.com>
Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250425013742.1079549-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agoublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
Ming Lei [Fri, 25 Apr 2025 01:37:39 +0000 (09:37 +0800)]
ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA

We call io_uring_cmd_complete_in_task() to schedule task_work for handling
UBLK_U_IO_NEED_GET_DATA.

This way is really not necessary because the current context is exactly
the ublk queue context, so call ublk_dispatch_req() directly for handling
UBLK_U_IO_NEED_GET_DATA.

Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Tested-by: Jared Holzman <jholzman@nvidia.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250425013742.1079549-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agobcachefs: Rework fiemap transaction restart handling
Kent Overstreet [Tue, 22 Apr 2025 15:52:45 +0000 (11:52 -0400)]
bcachefs: Rework fiemap transaction restart handling

Restart handling in the previous patch was incorrect, so: move btree
operations into a separate helper, and run it with a lockrestart_do().

Additionally, clarify whether pagecache or the btree takes precedence.

Right now, the btree takes precedence: this is incorrect, but it's
needed to pass fstests. Add a giant comment explaining why.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: add fiemap delalloc extent detection
Brian Foster [Mon, 15 Jan 2024 19:46:00 +0000 (14:46 -0500)]
bcachefs: add fiemap delalloc extent detection

bcachefs currently populates fiemap data from the extents btree.
This works correctly when the fiemap sync flag is provided, but if
not, it skips all delalloc extents that have not yet been flushed.
This is because delalloc extents from buffered writes are first
stored as reservation in the pagecache, and only become resident in
the extents btree after writeback completes.

Update the fiemap implementation to process holes between extents by
scanning pagecache for data, via seek data/hole. If a valid data
range is found over a hole in the extent btree, fake up an extent
key and flag the extent as delalloc for reporting to userspace.

Note that this does not necessarily change behavior for the case
where there is dirty pagecache over already written extents, where
when in COW mode, writeback will allocate new blocks for the
underlying ranges. The existing behavior is consistent with btrfs
and it is recommended to use the sync flag for the most up to date
extent state from fiemap.

Signed-off-by: Brian Foster <bfoster@redhat.com>
5 months agobcachefs: refactor fiemap processing into extent helper and struct
Brian Foster [Mon, 15 Jan 2024 19:27:49 +0000 (14:27 -0500)]
bcachefs: refactor fiemap processing into extent helper and struct

The bulk of the loop in bch2_fiemap() involves processing the
current extent key from the iter, including following indirections
and trimming the extent size and such. This patch makes a few
changes to reduce the size of the loop and facilitate future changes
to support delalloc extents.

Define a new bch_fiemap_extent structure to wrap the bkey buffer
that holds the extent key to report to userspace along with
associated fiemap flags. Update bch2_fill_extent() to take the
bch_fiemap_extent as a param instead of the individual fields.
Finally, lift the bulk of the extent processing into a
bch2_fiemap_extent() helper that takes the current key and formats
the bch_fiemap_extent appropriately for the fill function.

No functional changes intended by this patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
5 months agobcachefs: track current fiemap offset in start variable
Brian Foster [Mon, 15 Jan 2024 19:21:15 +0000 (14:21 -0500)]
bcachefs: track current fiemap offset in start variable

Signed-off-by: Brian Foster <bfoster@redhat.com>
5 months agobcachefs: drop duplicate fiemap sync flag
Brian Foster [Mon, 15 Jan 2024 17:26:56 +0000 (12:26 -0500)]
bcachefs: drop duplicate fiemap sync flag

FIEMAP_FLAG_SYNC handling was deliberately moved into core code in
commit 45dd052e67ad ("fs: handle FIEMAP_FLAG_SYNC in fiemap_prep"),
released in kernel v5.8. Update bcachefs accordingly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
5 months agobcachefs: Fix btree_iter_peek_prev() at end of inode
Kent Overstreet [Tue, 22 Apr 2025 08:54:54 +0000 (04:54 -0400)]
bcachefs: Fix btree_iter_peek_prev() at end of inode

At the end of the inode, on an extents iterator, peek_slot() has to
advance to the next position to avoid returning a 0 size extent, which
is not allowed.

Changing iter->pos confuses peek_prev(), but we don't need to call
peek_slot() in this case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: Make btree_iter_peek_prev() assert more precise
Kent Overstreet [Tue, 22 Apr 2025 08:45:25 +0000 (04:45 -0400)]
bcachefs: Make btree_iter_peek_prev() assert more precise

The issue this assert is guarding against is that in
BTREE_ITER_filter_snapshots mode we only want to be iterating within a
single inode number - if we iterate into another inode number with keys
for a different snapshot tree, we'll loop arbitrarily long before
finding a key we can return.

This comes up in the unit tests, where we're using inode 0 for our test
keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: Unit test fixes
Kent Overstreet [Tue, 22 Apr 2025 08:44:00 +0000 (04:44 -0400)]
bcachefs: Unit test fixes

The peek_end() tests expect an empty btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: Print mount opts earlier
Kent Overstreet [Tue, 22 Apr 2025 01:16:24 +0000 (21:16 -0400)]
bcachefs: Print mount opts earlier

If we aren't mounting with the correct degraded option, it's helpful to
know that before we fail to mount degraded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: unlink: casefold d_invalidate
Kent Overstreet [Thu, 24 Apr 2025 22:07:06 +0000 (18:07 -0400)]
bcachefs: unlink: casefold d_invalidate

casefolding results in additional aliases on lookup for the
non-casefolded names - these need invalidating on unlink.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: Fix casefold lookups
Kent Overstreet [Thu, 24 Apr 2025 16:58:06 +0000 (12:58 -0400)]
bcachefs: Fix casefold lookups

Add casefolding to bch2_lookup_trans:

During the delay between when casefolding was written and when it was
merged, the main filesystem lookup path grew self healing - which meant
it was no longer using bch2_dirent_lookup_trans(), where casefolding on
lookups happens.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agobcachefs: Casefold is now a regular opts.h option
Kent Overstreet [Sun, 20 Apr 2025 16:15:15 +0000 (12:15 -0400)]
bcachefs: Casefold is now a regular opts.h option

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 months agoriscv: uprobes: Add missing fence.i after building the XOL buffer
Björn Töpel [Sat, 19 Apr 2025 11:14:00 +0000 (13:14 +0200)]
riscv: uprobes: Add missing fence.i after building the XOL buffer

The XOL (execute out-of-line) buffer is used to single-step the
replaced instruction(s) for uprobes. The RISC-V port was missing a
proper fence.i (i$ flushing) after constructing the XOL buffer, which
can result in incorrect execution of stale/broken instructions.

This was found running the BPF selftests "test_progs:
uprobe_autoattach, attach_probe" on the Spacemit K1/X60, where the
uprobes tests randomly blew up.

Reviewed-by: Guo Ren <guoren@kernel.org>
Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
5 months agoriscv: Replace function-like macro by static inline function
Björn Töpel [Sat, 19 Apr 2025 11:13:59 +0000 (13:13 +0200)]
riscv: Replace function-like macro by static inline function

The flush_icache_range() function is implemented as a "function-like
macro with unused parameters", which can result in "unused variables"
warnings.

Replace the macro with a static inline function, as advised by
Documentation/process/coding-style.rst.

Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
5 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 24 Apr 2025 20:01:31 +0000 (13:01 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "The single core change is an obvious bug fix (and falls within the LF
  guidelines for patches from sanctioned entities). The other driver
  changes are a bit larger but likewise pretty obvious"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpi3mr: Add level check to control event logging
  scsi: ufs: core: Add NULL check in ufshcd_mcq_compl_pending_transfer()
  scsi: core: Clear flags for scsi_cmnd that did not complete
  scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices
  scsi: ufs: qcom: Add quirks for Samsung UFS devices
  scsi: target: iscsi: Fix timeout on deleted connection
  scsi: mpi3mr: Reset the pending interrupt flag
  scsi: mpi3mr: Fix pending I/O counter
  scsi: ufs: mcq: Add NULL check in ufshcd_mcq_abort()

5 months agoMerge tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic...
Linus Torvalds [Thu, 24 Apr 2025 19:59:05 +0000 (12:59 -0700)]
Merge tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
 "Fix some Landlock audit issues, add related tests, and updates
  documentation"

* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Update log documentation
  landlock: Fix documentation for landlock_restrict_self(2)
  landlock: Fix documentation for landlock_create_ruleset(2)
  selftests/landlock: Add PID tests for audit records
  selftests/landlock: Factor out audit fixture in audit_test
  landlock: Log the TGID of the domain creator
  landlock: Remove incorrect warning

5 months agox86/insn: Fix CTEST instruction decoding
Kirill A. Shutemov [Wed, 23 Apr 2025 06:58:15 +0000 (09:58 +0300)]
x86/insn: Fix CTEST instruction decoding

insn_decoder_test found a problem with decoding APX CTEST instructions:

Found an x86 instruction decoder bug, please report this.
ffffffff810021df 62 54 94 05 85 ff     ctestneq
objdump says 6 bytes, but insn_get_length() says 5

It happens because x86-opcode-map.txt doesn't specify arguments for the
instruction and the decoder doesn't expect to see ModRM byte.

Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v6.10+
Link: https://lore.kernel.org/r/20250423065815.2003231-1-kirill.shutemov@linux.intel.com
5 months agoperf/x86: Fix non-sampling (counting) events on certain x86 platforms
Luo Gengkun [Wed, 23 Apr 2025 06:47:24 +0000 (06:47 +0000)]
perf/x86: Fix non-sampling (counting) events on certain x86 platforms

Perf doesn't work at perf stat for hardware events on certain x86 platforms:

 $perf stat -- sleep 1
 Performance counter stats for 'sleep 1':
             16.44 msec task-clock                       #    0.016 CPUs utilized
                 2      context-switches                 #  121.691 /sec
                 0      cpu-migrations                   #    0.000 /sec
                54      page-faults                      #    3.286 K/sec
   <not supported> cycles
   <not supported> instructions
   <not supported> branches
   <not supported> branch-misses

The reason is that the check in x86_pmu_hw_config() for sampling events is
unexpectedly applied to counting events as well.

It should only impact x86 platforms with limit_period used for non-PEBS
events. For Intel platforms, it should only impact some older platforms,
e.g., HSW, BDW and NHM.

Fixes: 88ec7eedbbd2 ("perf/x86: Fix low freqency setting issue")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250423064724.3716211-1-luogengkun@huaweicloud.com
5 months agoMerge tag 'kvmarm-fixes-6.15-2' of https://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Thu, 24 Apr 2025 17:28:53 +0000 (13:28 -0400)]
Merge tag 'kvmarm-fixes-6.15-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.15, round #2

 - Single fix for broken usage of 'multi-MIDR' infrastructure in PI
   code, adding an open-coded erratum check for everyone's favorite pile
   of sand: Cavium ThunderX

5 months agoio_uring: fix 'sync' handling of io_fallback_tw()
Jens Axboe [Thu, 24 Apr 2025 16:28:14 +0000 (10:28 -0600)]
io_uring: fix 'sync' handling of io_fallback_tw()

A previous commit added a 'sync' parameter to io_fallback_tw(), which if
true, means the caller wants to wait on the fallback thread handling it.
But the logic is somewhat messed up, ensure that ctxs are swapped and
flushed appropriately.

Cc: stable@vger.kernel.org
Fixes: dfbe5561ae93 ("io_uring: flush offloaded and delayed task_work on exit")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agox86/boot: Work around broken busybox 'truncate' tool
Ard Biesheuvel [Thu, 24 Apr 2025 10:19:18 +0000 (12:19 +0200)]
x86/boot: Work around broken busybox 'truncate' tool

The GNU coreutils version of truncate, which is the original, accepts a
% prefix for the -s size argument which means the file in question
should be padded to a multiple of the given size. This is currently used
to pad the setup block of bzImage to a multiple of 4k before appending
the decompressor.

busybox reimplements truncate but does not support this idiom, and
therefore fails the build since commit

  9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it")

Since very little build code within the kernel depends on the 'truncate'
utility, work around this incompatibility by avoiding truncate altogether,
and relying on dd to perform the padding.

Fixes: 9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it")
Reported-by: <phasta@kernel.org>
Tested-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250424101917.1552527-2-ardb+git@google.com
5 months agoMerge tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 24 Apr 2025 16:14:50 +0000 (09:14 -0700)]
Merge tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "No fixes from any subtree.

  Current release - regressions:

   - net: fix the missing unlock for detached devices

  Previous releases - regressions:

   - sched: fix UAF vulnerability in HFSC qdisc

   - lwtunnel: disable BHs when required

   - mptcp: pm: defer freeing of MPTCP userspace path manager entries

   - tipc: fix NULL pointer dereference in tipc_mon_reinit_self()

   - eth: virtio-net: disable delayed refill when pausing rx

  Previous releases - always broken:

   - phylink: fix suspend/resume with WoL enabled and link down

   - eth:
       - mlx5: fix null-ptr-deref in mlx5_create_{inner_,}ttc_table()
       - xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()
       - enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS
       - stmmac: fix dwmac1000 ptp timestamp status offset
       - pds_core: prevent possible adminq overflow/stuck condition

  Misc:

   - a bunch of MAINTAINERS updates"

* tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits)
  net: stmmac: fix multiplication overflow when reading timestamp
  net: stmmac: fix dwmac1000 ptp timestamp status offset
  net: dp83822: Fix OF_MDIO config check
  pds_core: make wait_context part of q_info
  pds_core: Remove unnecessary check in pds_client_adminq_cmd()
  pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
  pds_core: Prevent possible adminq overflow/stuck condition
  net: dsa: mt7530: sync driver-specific behavior of MT7531 variants
  selftests/tc-testing: Add test for HFSC queue emptying during peek operation
  net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too
  net_sched: hfsc: Fix a UAF vulnerability in class handling
  selftests: mptcp: diag: use mptcp_lib_get_info_value
  mptcp: pm: Defer freeing of MPTCP userspace path manager entries
  net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration
  tipc: fix NULL pointer dereference in tipc_mon_reinit_self()
  virtio-net: disable delayed refill when pausing rx
  net: phy: leds: fix memory leak
  net: phylink: mac_link_(up|down)() clarifications
  net: phylink: fix suspend/resume with WoL enabled and link down
  net: lwtunnel: disable BHs when required
  ...

5 months agoMerge tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 24 Apr 2025 16:10:01 +0000 (09:10 -0700)]
Merge tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Revert acomp multibuffer tests which were buggy

 - Fix off-by-one regression in new scomp code

 - Lower quality setting on atmel-sha204a as it may not be random

* tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: atmel-sha204a - Set hwrng quality to lowest possible
  crypto: scomp - Fix off-by-one bug when calculating last page
  Revert "crypto: testmgr - Add multibuffer acomp testing"

5 months agoKVM: x86: Do not use kvm_rip_read() unconditionally for KVM_PROFILING
Adrian Hunter [Tue, 15 Apr 2025 10:48:21 +0000 (13:48 +0300)]
KVM: x86: Do not use kvm_rip_read() unconditionally for KVM_PROFILING

Not all VMs allow access to RIP.  Check guest_state_protected before
calling kvm_rip_read().

This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for
TDX VMs.

Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20250415104821.247234-3-adrian.hunter@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: x86: Do not use kvm_rip_read() unconditionally in KVM tracepoints
Adrian Hunter [Tue, 15 Apr 2025 10:48:20 +0000 (13:48 +0300)]
KVM: x86: Do not use kvm_rip_read() unconditionally in KVM tracepoints

Not all VMs allow access to RIP.  Check guest_state_protected before
calling kvm_rip_read().

This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for
TDX VMs.

Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20250415104821.247234-2-adrian.hunter@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: SVM: WARN if an invalid posted interrupt IRTE entry is added
Sean Christopherson [Fri, 4 Apr 2025 19:38:22 +0000 (12:38 -0700)]
KVM: SVM: WARN if an invalid posted interrupt IRTE entry is added

Now that the AMD IOMMU doesn't signal success incorrectly, WARN if KVM
attempts to track an AMD IRTE entry without metadata.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoiommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrrupts
Sean Christopherson [Fri, 4 Apr 2025 19:38:21 +0000 (12:38 -0700)]
iommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrrupts

WARN if KVM attempts to set vCPU affinity when posted interrupts aren't
enabled, as KVM shouldn't try to enable posting when they're unsupported,
and the IOMMU driver darn well should only advertise posting support when
AMD_IOMMU_GUEST_IR_VAPIC() is true.

Note, KVM consumes is_guest_mode only on success.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoiommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTE
Sean Christopherson [Fri, 4 Apr 2025 19:38:20 +0000 (12:38 -0700)]
iommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTE

Return -EINVAL instead of success if amd_ir_set_vcpu_affinity() is
invoked without use_vapic; lying to KVM about whether or not the IRTE was
configured to post IRQs is all kinds of bad.

Fixes: d98de49a53e4 ("iommu/amd: Enable vAPIC interrupt remapping mode by default")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer
Sean Christopherson [Fri, 4 Apr 2025 19:38:19 +0000 (12:38 -0700)]
KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer

Take irqfds.lock when adding/deleting an IRQ bypass producer to ensure
irqfd->producer isn't modified while kvm_irq_routing_update() is running.
The only lock held when a producer is added/removed is irqbypass's mutex.

Fixes: 872768800652 ("KVM: x86: select IRQ_BYPASS_MANAGER")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: x86: Explicitly treat routing entry type changes as changes
Sean Christopherson [Fri, 4 Apr 2025 19:38:18 +0000 (12:38 -0700)]
KVM: x86: Explicitly treat routing entry type changes as changes

Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-is.

Fixes: 515a0c79e796 ("kvm: irqfd: avoid update unmodified entries of the routing")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: x86: Reset IRTE to host control if *new* route isn't postable
Sean Christopherson [Fri, 4 Apr 2025 19:38:17 +0000 (12:38 -0700)]
KVM: x86: Reset IRTE to host control if *new* route isn't postable

Restore an IRTE back to host control (remapped or posted MSI mode) if the
*new* GSI route prevents posting the IRQ directly to a vCPU, regardless of
the GSI routing type.  Updating the IRTE if and only if the new GSI is an
MSI results in KVM leaving an IRTE posting to a vCPU.

The dangling IRTE can result in interrupts being incorrectly delivered to
the guest, and in the worst case scenario can result in use-after-free,
e.g. if the VM is torn down, but the underlying host IRQ isn't freed.

Fixes: efc644048ecd ("KVM: x86: Update IRTE for posted-interrupts")
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: SVM: Allocate IR data using atomic allocation
Sean Christopherson [Fri, 4 Apr 2025 19:38:16 +0000 (12:38 -0700)]
KVM: SVM: Allocate IR data using atomic allocation

Allocate SVM's interrupt remapping metadata using GFP_ATOMIC as
svm_ir_list_add() is called with IRQs are disabled and irqfs.lock held
when kvm_irq_routing_update() reacts to GSI routing changes.

Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: SVM: Don't update IRTEs if APICv/AVIC is disabled
Sean Christopherson [Tue, 1 Apr 2025 16:18:03 +0000 (09:18 -0700)]
KVM: SVM: Don't update IRTEs if APICv/AVIC is disabled

Skip IRTE updates if AVIC is disabled/unsupported, as forcing the IRTE
into remapped mode (kvm_vcpu_apicv_active() will never be true) is
unnecessary and wasteful.  The IOMMU driver is responsible for putting
IRTEs into remapped mode when an IRQ is allocated by a device, long before
that device is assigned to a VM.  I.e. the kernel as a whole has major
issues if the IRTE isn't already in remapped mode.

Opportunsitically kvm_arch_has_irq_bypass() to query for APICv/AVIC, so
so that all checks in KVM x86 incorporate the same information.

Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250401161804.842968-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoKVM: arm64, x86: make kvm_arch_has_irq_bypass() inline
Paolo Bonzini [Fri, 18 Apr 2025 17:03:08 +0000 (13:03 -0400)]
KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline

kvm_arch_has_irq_bypass() is a small function and even though it does
not appear in any *really* hot paths, it's also not entirely rare.
Make it inline---it also works out nicely in preparation for using it in
kvm-intel.ko and kvm-amd.ko, since the function is not currently exported.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 months agoblock: don't autoload drivers on blk-cgroup configuration
Christoph Hellwig [Wed, 23 Apr 2025 05:37:42 +0000 (07:37 +0200)]
block: don't autoload drivers on blk-cgroup configuration

Loading a driver just to configure blk-cgroup doesn't make sense, as that
assumes and already existing device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agoblock: don't autoload drivers on stat
Christoph Hellwig [Wed, 23 Apr 2025 05:37:41 +0000 (07:37 +0200)]
block: don't autoload drivers on stat

blkdev_get_no_open can trigger the legacy autoload of block drivers.  A
simple stat of a block device has not historically done that, so disable
this behavior again.

Fixes: 9abcfbd235f5 ("block: Add atomic write support for statx")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agoblock: remove the backing_inode variable in bdev_statx
Christoph Hellwig [Wed, 23 Apr 2025 05:37:40 +0000 (07:37 +0200)]
block: remove the backing_inode variable in bdev_statx

backing_inode is only used once, so remove it and update the comment
describing the bdev lookup to be a bit more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 months agoblock: move blkdev_{get,put} _no_open prototypes out of blkdev.h
Christoph Hellwig [Wed, 23 Apr 2025 05:37:39 +0000 (07:37 +0200)]
block: move blkdev_{get,put} _no_open prototypes out of blkdev.h

These are only to be used by block internal code.  Remove the comment
as we grew more users due to reworking block device node opening.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>