www.infradead.org Git - nvme.git/log

Merge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

- Fix hypercall detection on Xen guests

- Extend the AMD microcode loader SHA check to Zen5, to block loading
   of any unreleased standalone Zen5 microcode patches

- Add new Intel CPU model number for Bartlett Lake

- Fix the workaround for AMD erratum 1054

- Fix buggy early memory acceptance between SEV-SNP guests and the EFI
   stub

* tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/sev: Avoid shared GHCB page for early memory acceptance
  x86/cpu/amd: Fix workaround for erratum 1054
  x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores
  x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches
  x86/xen: Fix __xen_hypercall_setfunc()

Merge tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"Fix a lockdep false positive in the i8253 driver"

* tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/i8253: Call clockevent_i8253_disable() with interrupts disabled

Merge tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event fixes from Ingo Molnar:
"Miscellaneous fixes and a hardware-enabling change:

   - Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR
     systems

   - Fix Intel PEBS buffer overflow handling

   - Fix skid in Intel PEBS sampling of user-space general purpose
     registers

   - Enable Panther Lake PMU support - similar to Lunar Lake"

* tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add Panther Lake support
  perf/x86/intel: Allow to update user space GPRs from PEBS records
  perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX
  perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR

Merge tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc irq fixes from Ingo Molnar:

- Fix BCM2712 irqchip driver Kconfig dependencies required on the
   Raspberry PI5

- Fix spurious interrupts on RZ/G3E SMARC EVK systems

- Fix crash regression on Sun/NIU hardware

- Apply MSI driver quirk for Sun Neptune chips

* tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled
  irqchip/renesas-rzv2h: Prevent TINT spurious interrupt
  net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads
  PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads

Merge tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc core fixes from Ingo Molnar:
"Fix a genksyms related bug, triggered by recent changes to the percpu
  code, and update the .clang-format file to not include obsolete
  function names"

* tag 'core-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genksyms: Handle typeof_unqual keyword and __seg_{fs,gs} qualifiers
  clang-format: Update the ForEachMacros list for v6.15-rc1

Merge tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

- lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert
   Uytterhoeven)

- ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh)

- ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan
   Chancellor)

- string: Add load_unaligned_zeropad() code path to sized_strscpy()
   (Peter Collingbourne)

- kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo
   Frascino)

- Disable GCC randstruct for COMPILE_TEST

* tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/prime_numbers: KUnit test should not select PRIME_NUMBERS
  ubsan: Fix panic from test_ubsan_out_of_bounds
  lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP
  hardening: Disable GCC randstruct for COMPILE_TEST
  kasan: Add strscpy() test to trigger tag fault on arm64
  string: Add load_unaligned_zeropad() code path to sized_strscpy()

Merge tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:

- check for both the new AND old (deprecated) setter callback when
changing GPIO direction to output

* tag 'gpio-fixes-for-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Allow to use setters with return value for output-only gpios

Merge tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"Add missing DVFS support flags for the Lunar Lake and Panther Lake
  platforms to the int340x Intel thermal driver and fix DLVR support
  for Panther Lake in it (Srinivas Pandruvada)"

* tag 'thermal-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: Fix Panther Lake DLVR support
  thermal: intel: int340x: Add missing DVFS support flags

Merge tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These are mostly cpufreq fixes, some of which address recent
  regressions and some address older issues that have come to light
  during the last two weeks, and a runtime PM documentation correction:

   - Fix the performance-to-frequency scaling factor computation on
     systems using HWP in the intel_pstate driver after a recent
     incorrect update of it (Rafael Wysocki)

   - Fix the usage of the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
     in the schedutil cpufreq governor after a recent update of it that
     has caused frequency limits changes to be missed sometimes (Rafael
     Wysocki)

   - Address some recently discovered synchronization issues related to
     frequency limits changes in the schedutil cpufreq governor and in
     the cpufreq core (Rafael Wysocki)

   - Fix ITMT support in the amd-pstate cpufreq driver so that it is
     enabled after asym priorities have been correctly initialized for
     all CPUs (K Prateek Nayak)

   - Fix changing min/max limits in the amd-pstate cpufreq driver while
     on the performance governor (Dhananjay Ugwekar)

   - Fix a function name in the runtime PM documentation that was
     previously incorrectly updated by mistake (Sakari Ailus)"

* tag 'pm-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Avoid using inconsistent policy->min and policy->max
  cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
  cpufreq/sched: Explicitly synchronize limits_changed flag handling
  cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
  Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
  cpufreq: intel_pstate: Fix hwp_get_cpu_scaling()
  cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
  cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor

Merge branch 'pm-docs'

Merge a runtime PM documentation correction for 6.15-rc3.

* pm-docs:
Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()

Merge tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix for an issue where C instructions ended up in non-C builds, due
   to some broken inline assembly in the KGDB breakpoint insertion code

- A fix to avoid spurious printk messages about misaligned access
   performance probing

- A fix for a handful of issues with /proc/iomem's reserved region
   handling

- A pair of fixes for module relocation processing

- A few build-time fixes

* tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break
  riscv: KGDB: Do not inline arch_kgdb_breakpoint()
  riscv: Avoid fortify warning in syscall_get_arguments()
  riscv: Provide all alternative macros all the time
  riscv: module: Allocate PLT entries for R_RISCV_PLT32
  riscv: module: Fix out-of-bounds relocation access
  riscv: Properly export reserved regions in /proc/iomem
  riscv: Fix unaligned access info messages
  riscv: Avoid fortify warning in syscall_get_arguments()
  Documentation: riscv: Fix typo MIMPLID -> MIMPID
  riscv: Use kvmalloc_array on relocation_hashtable

Merge tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fix from Shuah Khan:
"Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline"

* tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: qemu_configs: SH: Respect kunit cmdline

Merge tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
"Fixes dynevent_limitations.tc test failure on dash by detecting and
handling bash and dash differences in evaluating \\"

* tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc

Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix integer overflow in server disconnect deadtime calculation

- Three fixes for potential use after frees: one for oplocks, and one
   for leases and one for kerberos authentication

- Fix to prevent attempted write to directory

- Fix locking warning for durable scavenger thread

* tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Prevent integer overflow in calculation of deadtime
  ksmbd: fix the warning from __kernel_write_iter
  ksmbd: fix use-after-free in smb_break_all_levII_oplock()
  ksmbd: fix use-after-free in __smb2_lease_break_noti()
  ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING"
  ksmbd: Fix dangling pointer in krb_authenticate

Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- MD pull via Yu:
      - fix raid10 missing discard IO accounting (Yu Kuai)
      - fix bitmap stats for bitmap file (Zheng Qixing)
      - fix oops while reading all member disks failed during
        check/repair (Meir Elisha)

- NVMe pull via Christoph:
      - fix scan failure for non-ANA multipath controllers (Hannes
        Reinecke)
      - fix multipath sysfs links creation for some cases (Hannes
        Reinecke)
      - PCIe endpoint fixes (Damien Le Moal)
      - use NULL instead of 0 in the auth code (Damien Le Moal)

- Various ublk fixes:
      - Slew of selftest additions
      - Improvements and fixes for IO cancelation
      - Tweak to Kconfig verbiage

- Fix for page dirtying for blk integrity mapped pages

- loop fixes:
      - buffered IO fix
      - uevent fixes
      - request priority inheritance fix

- Various little fixes

* tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits)
  selftests: ublk: add generic_06 for covering fault inject
  ublk: simplify aborting ublk request
  ublk: remove __ublk_quiesce_dev()
  ublk: improve detection and handling of ublk server exit
  ublk: move device reset into ublk_ch_release()
  ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
  ublk: add ublk_force_abort_dev()
  ublk: properly serialize all FETCH_REQs
  selftests: ublk: move creating UBLK_TMP into _prep_test()
  selftests: ublk: add test_stress_05.sh
  selftests: ublk: support user recovery
  selftests: ublk: support target specific command line
  selftests: ublk: increase max nr_queues and queue depth
  selftests: ublk: set queue pthread's cpu affinity
  selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
  selftests: ublk: add two stress tests for zero copy feature
  selftests: ublk: run stress tests in parallel
  selftests: ublk: make sure _add_ublk_dev can return in sub-shell
  selftests: ublk: cleanup backfile automatically
  selftests: ublk: add io_uring uapi header
  ...

Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Correctly cap iov_iter->nr_segs for imports of registered buffers,
   both kbuf and normal ones.

   Three cleanups to make it saner first, then two fixes for each of the
   buffer types.

   This fixes a performance regression where partial buffer usage
   doesn't trim the tail number of segments, leading the block layer to
   iterate the IOs to check if it needs splitting.

- Two patches tweaking the newly introduced zero-copy rx API, mostly to
   keep the API consistent once we add multiple interface queues per
   ring support in the 6.16 release.

- zc rx unmapping fix for a dead device

* tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux:
  io_uring/zcrx: fix late dma unmap for a dead dev
  io_uring/rsrc: ensure segments counts are correct on kbuf buffers
  io_uring/rsrc: send exact nr_segs for fixed buffer
  io_uring/rsrc: refactor io_import_fixed
  io_uring/rsrc: separate kbuf offset adjustments
  io_uring/rsrc: don't skip offset calculation
  io_uring/zcrx: add pp to ifq conversion helper
  io_uring/zcrx: return ifq id to the user

x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com
Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com
Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com

x86/cpu/amd: Fix workaround for erratum 1054

Erratum 1054 affects AMD Zen processors that are a part of Family 17h
Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However,
when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected
processors was incorrectly changed in a way that the IRPerfEn bit gets
set only for unaffected Zen 1 processors.

Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This
includes a subset of Zen 1 (Family 17h Models 30h and above) and all
later processors. Also clear X86_FEATURE_IRPERF on affected processors
so that the IRPerfCount register is not used by other entities like the
MSR PMU driver.

Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com

io_uring/zcrx: fix late dma unmap for a dead dev

There is a problem with page pools not dma-unmapping immediately when
the device is going down, and delaying it until the page pool is
destroyed, which is not allowed (see links). That just got fixed for
normal page pools, and we need to address memory providers as well.

Unmap pages in the memory provider uninstall callback, and protect it
with a new lock. There is also a gap between when a dma mapping is
created and the mp is installed, so if the device is killed in between,
io_uring would be holding on to dma mappings to a dead device with no
one to call ->uninstall. Move it to page pool init and rely on
->is_mapped to make sure it's only done once.

Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/
Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

- Revert a reset patch that broke VFIO passthrough because devices
ended up with no available reset mechanisms (Alex Williamson)

* tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Avoid reset when disabled via sysfs"

Merge tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"Usual set of small fixes/logging improvements.

  One bigger user reported fix, for inode <-> dirent inconsistencies
  reported in fsck, after moving a subvolume that had been snapshotted"

* tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix snapshotting a subvolume, then renaming it
  bcachefs: Add missing READ_ONCE() for metadata replicas
  bcachefs: snapshot_node_missing is now autofix
  bcachefs: Log message when incompat version requested but not enabled
  bcachefs: Print version_incompat_allowed on startup
  bcachefs: Silence extent_poisoned error messages
  bcachefs: btree_root_unreadable_and_scan_found_nothing now AUTOFIX
  bcachefs: fix bch2_dev_usage_full_read_fast()
  bcachefs: Don't print data read retry success on non-errors
  bcachefs: Add missing error handling
  bcachefs: Prevent granting write refs when filesystem is read-only

Merge tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio

Pull vfio fix from Alex Williamson:

- Include devices where the platform indicates PCI INTx is not routed
   by setting pdev->irq to zero in the expanded virtualization of the
   PCI pin register. This provides consistency in the INFO and SET_IRQS
   ioctls (Alex Williamson)

* tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Virtualize zero INTx PIN if no pdev->irq

Merge tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few more device specific fixes plus one trivial quirk.

  There's a couple of patches for Tegra which avoid some fairly
  spectacular log spam if the hardware breaks in ways which were
  actually seen in production, plus a fix for the i.MX driver to
  propagate errors properly when setting up the hardware.

  We also have a trivial patch marking the sun4i driver as being
  compatible with GPIO chip selects"

* tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-imx: Add check for spi_imx_setupxfer()
  spi: tegra210-quad: add rate limiting and simplify timeout error message
  spi: tegra210-quad: use WARN_ON_ONCE instead of WARN_ON for timeouts
  spi: sun4i: add support for GPIO chip select lines

Merge tag 'net-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, CAN and Netfilter.

  Current release - regressions:

   - two fixes for the netdev per-instance locking

   - batman-adv: fix double-hold of meshif when getting enabled

  Current release - new code bugs:

   - Bluetooth: increment TX timestamping tskey always for stream
     sockets

   - wifi: static analysis and build fixes for the new Intel sub-driver

  Previous releases - regressions:

   - net: fib_rules: fix iif / oif matching on L3 master (VRF) device

   - ipv6: add exception routes to GC list in rt6_insert_exception()

   - netfilter: conntrack: fix erroneous removal of offload bit

   - Bluetooth:
       - fix sending MGMT_EV_DEVICE_FOUND for invalid address
       - l2cap: process valid commands in too long frame
       - btnxpuart: Revert baudrate change in nxp_shutdown

  Previous releases - always broken:

   - ethtool: fix memory corruption during SFP FW flashing

   - eth:
       - hibmcge: fixes for link and MTU handling, pause frames etc
       - igc: fixes for PTM (PCIe timestamping)

   - dsa: b53: enable BPDU reception for management port

  Misc:

   - fixes for Netlink protocol schemas"

* tag 'net-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings
  net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps
  net: ethernet: mtk_eth_soc: reapply mdc divider on reset
  net: ti: icss-iep: Fix possible NULL pointer dereference for perout request
  net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()
  net: ti: icssg-prueth: Fix kernel warning while bringing down network interface
  netfilter: conntrack: fix erronous removal of offload bit
  net: don't try to ops lock uninitialized devs
  ptp: ocp: fix start time alignment in ptp_ocp_signal_set
  net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails
  net: dsa: free routing table on probe failure
  net: dsa: clean up FDB, MDB, VLAN entries on unbind
  net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupported
  net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registered
  net: txgbe: fix memory leak in txgbe_probe() error path
  net: bridge: switchdev: do not notify new brentries as changed
  net: b53: enable BPDU reception for management port
  netlink: specs: rt-neigh: prefix struct nfmsg members with ndm
  netlink: specs: rt-link: adjust mctp attribute naming
  netlink: specs: rtnetlink: attribute naming corrections
  ...

bcachefs: Fix snapshotting a subvolume, then renaming it

Subvolume roots and the dirents that point to them are special; they
don't obey the normal snapshot versioning rules because they cross
snapshot boundaries.

We don't keep around older versions of subvolume dirents on rename - we
don't need to, because subvolume dirents are only visible in the parent
subvolume, and we wouldn't be able to match up the different dirent and
inode versions due to crossing the snapshot ID boundary.

That means that when we rename a subvolume, that's been snapshotted, the
older version of the subvolume root will become dangling - it won't have
a dirent that points to it.

That's expected, we just need to tell fsck that this is ok.

Fixes: https://github.com/koverstreet/bcachefs/issues/856
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

io_uring/rsrc: ensure segments counts are correct on kbuf buffers

kbuf imports have the front offset adjusted and segments removed, but
the tail segments are still included in the segment count that gets
passed in the iov_iter. As the segments aren't necessarily all the
same size, move importing to a separate helper and iterate the
mapped length to get an exact count.

Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"Just a single fix for the Xen multicall driver avoiding a percpu
variable referencing initdata by its initializer"

* tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: fix multicall debug feature

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull fwctl fixes from Jason Gunthorpe:
"Three small changes from further build testing:

   - Don't rely on the userspace uuid.h for the uapi header

   - Fix sparse warnings in pds

   - Typo in log message"

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  fwctl: Fix repeated device word in log message
  pds_fwctl: Fix type and endian complaints
  fwctl/cxl: Fix uuid_t usage in uapi

Merge tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All are device-specific like quirks, new
  IDs, and other safe (or rather boring) changes"

* tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw version
  ASoC: codecs: Add of_match_table for aw888081 driver
  ASoC: fsl: fsl_qmc_audio: Reset audio data pointers on TRIGGER_START event
  mailmap: Add entry for Srinivas Kandagatla
  MAINTAINERS: use kernel.org alias
  ASoC: cs42l43: Reset clamp override on jack removal
  ALSA: hda/realtek - Fixed ASUS platform headset Mic issue
  ALSA: hda/cirrus_scodec_test: Don't select dependencies
  ALSA: azt2320: Replace deprecated strcpy() with strscpy()
  ASoC: hdmi-codec: use RTD ID instead of DAI ID for ELD entry
  ASoC: Intel: avs: Constrain path based on BE capabilities
  ALSA: hda/tas2781: Remove unnecessary NULL check before release_firmware()
  ASoC: Intel: avs: Fix null-ptr-deref in avs_component_probe()
  ASoC: fsl_asrc_dma: get codec or cpu dai from backend
  ASoC: qcom: Fix sc7280 lpass potential buffer overflow
  ASoC: dwc: always enable/disable i2s irqs
  ASoC: Intel: sof_sdw: Add quirk for Asus Zenbook S16
  ASoC: codecs:lpass-wsa-macro: Fix logic of enabling vi channels
  ASoC: codecs:lpass-wsa-macro: Fix vi feedback rate

Merge tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:
"Fixes:
   - amd/pmf: Fix STT limits
   - asus-laptop: Fix an uninitialized variable
   - intel_pmc_ipc: Allow building without ACPI
   - mlxbf-bootctl: Use sysfs_emit_at() in secure_boot_fuse_state_show()
   - msi-wmi-platform: Add locking to workaround ACPI firmware bug

  New HW support:
   - alienware-wmi-wmax:
      - Extended thermal control support to:
         - Alienware Area-51m R2
         - Alienware m16 R1
         - Alienware m16 R2
         - Dell G16 7630
         - Dell G5 5505 SE
      - G-Mode support to Alienware m16 R1
   - x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data"

* tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug
  platform/x86: msi-wmi-platform: Rename "data" variable
  platform/x86: alienware-wmi-wmax: Extend support to more laptops
  platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1
  platform/x86: amd: pmf: Fix STT limits
  mlxbf-bootctl: use sysfs_emit_at() in secure_boot_fuse_state_show()
  platform/x86: x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data
  platform/x86: x86-android-tablets: Add "9v" to Vexia EDU ATLA 10 tablet symbols
  asus-laptop: Fix an uninitialized variable
  platform/x86: intel_pmc_ipc: add option to build without ACPI

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Small drivers fixes, except for ufs which has two large updates, one
  for exposing the device level feature, which is a new addition to the
  device spec and the other reworking the exynos driver to fix coherence
  issues on some android phones"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: megaraid_sas: Driver version update to 07.734.00.00-rc1
  scsi: megaraid_sas: Block zero-length ATA VPD inquiry
  scsi: scsi_transport_srp: Replace min/max nesting with clamp()
  scsi: ufs: core: Add device level exception support
  scsi: ufs: core: Rename ufshcd_wb_presrv_usrspc_keep_vcc_on()
  scsi: smartpqi: Use is_kdump_kernel() to check for kdump
  scsi: pm80xx: Set phy_attached to zero when device is gone
  scsi: ufs: exynos: gs101: Put UFS device in reset on .suspend()
  scsi: ufs: exynos: Move phy calls to .exit() callback
  scsi: ufs: exynos: Enable PRDT pre-fetching with UFSHCD_CAP_CRYPTO
  scsi: ufs: exynos: Ensure consistent phy reference counts
  scsi: ufs: exynos: Disable iocc if dma-coherent property isn't set
  scsi: ufs: exynos: Move UFS shareability value to drvdata
  scsi: ufs: exynos: Ensure pre_link() executes before exynos_ufs_phy_init()
  scsi: iscsi: Fix missing scsi_host_put() in error path
  scsi: ufs: core: Fix a race condition related to device commands
  scsi: hisi_sas: Fix I/O errors caused by hardware port ID changes
  scsi: hisi_sas: Enable force phy when SATA disk directly connected

Merge tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

- Fix how sense data from the sense data for successfull NCQ commands
   log page is used to fully initialize the result_tf of a completed
   command, so that the sense data returned to the scsi layer is fully
   initialized with all the device provided information (from Niklas)

* tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-sata: Save all fields from sense data descriptor

Merge tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS fixes from Carlos Maiolino:
"This mostly includes fixes and documentation for the zoned allocator
  feature merged during previous merge window, but it also adds a sysfs
  tunable for the zone garbage collector.

  There is also a fix for a regression to the RT device that we'd like
  to fix ASAP now that we're getting more users on the RT zoned
  allocator"

* tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: document zoned rt specifics in admin-guide
  xfs: fix fsmap for internal zoned devices
  xfs: Fix spelling mistake "drity" -> "dirty"
  xfs: compute buffer address correctly in xmbuf_map_backing_mem
  xfs: add tunable threshold parameter for triggering zone GC
  xfs: mark xfs_buf_free as might_sleep()
  xfs: remove the leftover xfs_{set,clear}_li_failed infrastructure

Merge tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- handle encoded read ioctl returning EAGAIN so it does not mistakenly
   free the work structure

- escape subvolume path in mount option list so it cannot be wrongly
   parsed when the path contains ","

- remove folio size assertions when writing super block to device with
   enabled large folios

* tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: remove folio order ASSERT()s in super block writeback path
  btrfs: correctly escape subvol in btrfs_show_options()
  btrfs: ioctl: don't free iov when btrfs_encoded_read() returns -EAGAIN

Merge tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Stable fix adding zero initialization of slab->obj_ext to prevent
crashes with allocation profiling (Suren Baghdasaryan)

* tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: ensure slab->obj_exts is clear in a newly allocated slab page

Merge tag 'amd-pstate-v6.15-2025-04-15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge amd-pstate content for 6.15 (4/15/25) from Mario Limonciello:

"Add a fix for X3D processors where depending upon what BIOS was
set initially rankings might be set improperly.

Add a fix for changing min/max limits while on the performance
governor."

* tag 'amd-pstate-v6.15-2025-04-15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Enable ITMT support after initializing core rankings
cpufreq/amd-pstate: Fix min_limit perf and freq updation for performance governor

cpufreq: Avoid using inconsistent policy->min and policy->max

Since cpufreq_driver_resolve_freq() can run in parallel with
cpufreq_set_policy() and there is no synchronization between them,
the former may access policy->min and policy->max while the latter
is updating them and it may see intermediate values of them due
to the way the update is carried out. Also the compiler is free
to apply any optimizations it wants both to the stores in
cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
which may result in additional inconsistencies.

To address this, use WRITE_ONCE() when updating policy->min and
policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
in cpufreq_set_policy() to avoid storing intermediate values in
policy->min and policy->max with the help of the observation that
their new values are expected to be properly ordered upfront.

Also modify cpufreq_driver_resolve_freq() to take the possible reverse
ordering of policy->min and policy->max, which may happen depending on
the ordering of operations when this function and cpufreq_set_policy()
run concurrently, into account by always honoring the max when it
turns out to be less than the min (in case it comes from thermal
throttling or similar).

Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/5907080.DvuYhMxLoT@rjwysocki.net

cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()

Notice that ignore_dl_rate_limit() need not piggy back on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).

Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net

cpufreq/sched: Explicitly synchronize limits_changed flag handling

The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.

Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.

Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.

To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing up
with that code.

Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net

cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS

Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.

However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).

Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.

Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net

net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings

The QDMA packet scheduler suffers from a performance issue.
Fix this by picking up changes from MediaTek's SDK which change to use
Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration.

Fixes: 160d3a9b1929 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/18040f60f9e2f5855036b75b28c4332a2d2ebdd8.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps

Without this patch, the maximum weight of the queue limit will be
incorrect when linked at 100Mbps due to an apparent typo.

Fixes: f63959c7eec31 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/74111ba0bdb13743313999ed467ce564e8189006.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: reapply mdc divider on reset

In the current method, the MDC divider was reset to the default setting
of 2.5MHz after the NETSYS SER. Therefore, we need to reapply the MDC
divider configuration function in mtk_hw_init() after reset.

Fixes: c0a440031d431 ("net: ethernet: mtk_eth_soc: set MDIO bus clock frequency")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/8ab7381447e6cdcb317d5b5a6ddd90a1734efcb0.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring/rsrc: send exact nr_segs for fixed buffer

Sending exact nr_segs, avoids bio split check and processing in
block layer, which takes around 5%[1] of overall CPU utilization.

In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2]
and 5% less CPU utilization.

[1]
     3.52%  io_uring         [kernel.kallsyms]     [k] bio_split_rw_at
     1.42%  io_uring         [kernel.kallsyms]     [k] bio_split_rw
     0.62%  io_uring         [kernel.kallsyms]     [k] bio_submit_split

[2]
sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
-r4 /dev/nvme0n1 /dev/nvme1n1

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
[Pavel: fixed for kbuf, rebased and reworked on top of cleanups]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a1a49a8d053bd617c244291d63dbfbc07afde36.1744882081.git.asml.silence@gmail.com
[axboe: fold in fix factoring in buf reg offset]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following batch contains one Netfilter fix for net:

1) conntrack offload bit is erroneously unset in a race scenario,
from Florian Westphal.

netfilter pull request 25-04-17

* tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: fix erronous removal of offload bit
====================

Link: https://patch.msgid.link/20250417102847.16640-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

io_uring/rsrc: refactor io_import_fixed

io_import_fixed is a mess. Even though we know the final len of the
iterator, we still assign offset + len and do some magic after to
correct for that.

Do offset calculation first and finalise it with iov_iter_bvec at the
end.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2d5107fed24f8b23245ef2ede9a5a7f7c426df61.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rsrc: separate kbuf offset adjustments

Kernel registered buffers are special because segments are not uniform
in size, and we have a bunch of optimisations based on that uniformity
for normal buffers. Handle kbuf separately, it'll be cleaner this way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4e9e5990b0ab5aee723c0be5cd9b5bcf810375f9.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rsrc: don't skip offset calculation

Don't optimise for requests with offset=0. Large registered buffers are
the preference and hence the user is likely to pass an offset, and the
adjustments are not expensive and will be made even cheaper in following
patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1c2beb20470ee3c886a363d4d8340d3790db19f3.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

perf/x86/intel: Add Panther Lake support

From PMU's perspective, Panther Lake is similar to the previous
generation Lunar Lake. Both are hybrid platforms, with e-core and
p-core.

The key differences are the ARCH PEBS feature and several new events.
The ARCH PEBS is supported in the following patches.
The new events will be supported later in perf tool.

Share the code path with the Lunar Lake. Only update the name.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20250415114428.341182-2-dapeng1.mi@linux.intel.com

perf/x86/intel: Allow to update user space GPRs from PEBS records

Currently when a user samples user space GPRs (--user-regs option) with
PEBS, the user space GPRs actually always come from software PMI
instead of from PEBS hardware. This leads to the sampled GPRs to
possibly be inaccurate for single PEBS record case because of the
skid between counter overflow and GPRs sampling on PMI.

For the large PEBS case, it is even worse. If user sets the
exclude_kernel attribute, large PEBS would be used to sample user space
GPRs, but since PEBS GPRs group is not really enabled, it leads to all
samples in the large PEBS record to share the same piece of user space
GPRs, like this reproducer shows:

  $ perf record -e branches:pu --user-regs=ip,ax -c 100000 ./foo
  $ perf report -D | grep "AX"

  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead
  .... AX    0x000000003a0d4ead

So enable GPRs group for user space GPRs sampling and prioritize reading
GPRs from PEBS. If the PEBS sampled GPRs is not user space GPRs (single
PEBS record case), perf_sample_regs_user() modifies them to user space
GPRs.

[ mingo: Clarified the changelog. ]

Fixes: c22497f5838c ("perf/x86/intel: Support adaptive PEBS v4")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com

perf/x86/intel: Don't clear perf metrics overflow bit unconditionally

The below code would always unconditionally clear other status bits like
perf metrics overflow bit once PEBS buffer overflows:

status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;

This is incorrect. Perf metrics overflow bit should be cleared only when
fixed counter 3 in PEBS counter group. Otherwise perf metrics overflow
could be missed to handle.

Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/
Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com

Merge tag 'nvme-6.15-2025-04-17' of git://git.infradead.org/nvme into block-6.15

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.15

- fix scan failure for non-ANA multipath controllers (Hannes Reinecke)
- fix multipath sysfs links creation for some cases (Hannes Reinecke)
- PCIe endpoint fixes (Damien Le Moal)
- use NULL instead of 0 in the auth code (Damien Le Moal)"

* tag 'nvme-6.15-2025-04-17' of git://git.infradead.org/nvme:
  nvmet: pci-epf: cleanup link state management
  nvmet: pci-epf: clear CC and CSTS when disabling the controller
  nvmet: pci-epf: always fully initialize completion entries
  nvmet: auth: use NULL to clear a pointer in nvmet_auth_sq_free()
  nvme-multipath: sysfs links may not be created for devices
  nvme: fixup scan failure for non-ANA multipath controllers

spi: spi-imx: Add check for spi_imx_setupxfer()

Add check for the return value of spi_imx_setupxfer().
spi_imx->rx and spi_imx->tx function pointer can be NULL when
spi_imx_setupxfer() return error, and make NULL pointer dereference.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
  0x0
  spi_imx_pio_transfer+0x50/0xd8
  spi_imx_transfer_one+0x18c/0x858
  spi_transfer_one_message+0x43c/0x790
  __spi_pump_transfer_message+0x238/0x5d4
  __spi_sync+0x2b0/0x454
  spi_write_then_read+0x11c/0x200

Signed-off-by: Tamura Dai <kirinode0@gmail.com>
Reviewed-by: Carlos Song <carlos.song@nxp.com>
Link: https://patch.msgid.link/20250417011700.14436-1-kirinode0@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- l2cap: Process valid commands in too long frame
- vhci: Avoid needless snprintf() calls

* tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: vhci: Avoid needless snprintf() calls
Bluetooth: l2cap: Process valid commands in too long frame
====================

Link: https://patch.msgid.link/20250416210126.2034212-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR

The scale of IIO bandwidth in free running counters is inherited from
the ICX. The counter increments for every 32 bytes rather than 4 bytes.

The IIO bandwidth out free running counters don't increment with a
consistent size. The increment depends on the requested size. It's
impossible to find a fixed increment. Remove it from the event_descs.

Fixes: 0378c93a92e2 ("perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server")
Reported-by: Tang Jun <dukang.tj@alibaba-inc.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-3-kan.liang@linux.intel.com

perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX

There was a mistake in the ICX uncore spec too. The counter increments
for every 32 bytes rather than 4 bytes.

The same as SNR, there are 1 ioclk and 8 IIO bandwidth in free running
counters. Reuse the snr_uncore_iio_freerunning_events().

Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reported-by: Tang Jun <dukang.tj@alibaba-inc.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-2-kan.liang@linux.intel.com

perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR

There was a mistake in the SNR uncore spec. The counter increments for
every 32 bytes of data sent from the IO agent to the SOC, not 4 bytes
which was documented in the spec.

The event list has been updated:

  "EventName": "UNC_IIO_BANDWIDTH_IN.PART0_FREERUN",
  "BriefDescription": "Free running counter that increments for every 32
       bytes of data sent from the IO agent to the SOC",

Update the scale of the IIO bandwidth in free running counters as well.

Fixes: 210cc5f9db7a ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250416142426.3933977-1-kan.liang@linux.intel.com

Merge branch 'bug-fixes-from-xdp-and-perout-series'

Meghana Malladi says:

====================
Bug fixes from XDP and perout series

This patch series consists of bug fixes from the XDP series:
1. Fixes a kernel warning that occurs when bringing down the
   network interface.
2. Resolves a potential NULL pointer dereference in the
   emac_xmit_xdp_frame() function.
3. Resolves a potential NULL pointer dereference in the
   icss_iep_perout_enable() function

v3: https://lore.kernel.org/all/20250328102403.2626974-1-m-malladi@ti.com/
====================

Link: https://patch.msgid.link/20250415090543.717991-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icss-iep: Fix possible NULL pointer dereference for perout request

The ICSS IEP driver tracks perout and pps enable state with flags.
Currently when disabling pps and perout signals during icss_iep_exit(),
results in NULL pointer dereference for perout.

To fix the null pointer dereference issue, the icss_iep_perout_enable_hw
function can be modified to directly clear the IEP CMP registers when
disabling PPS or PEROUT, without referencing the ptp_perout_request
structure, as its contents are irrelevant in this case.

Fixes: 9b115361248d ("net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/7b1c7c36-363a-4085-b26c-4f210bee1df6@stanley.mountain/
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-4-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()

There is an error check inside emac_xmit_xdp_frame() function which
is called when the driver wants to transmit XDP frame, to check if
the allocated tx descriptor is NULL, if true to exit and return
ICSSG_XDP_CONSUMED implying failure in transmission.

In this case trying to free a descriptor which is NULL will result
in kernel crash due to NULL pointer dereference. Fix this error handling
and increase netdev tx_dropped stats in the caller of this function
if the function returns ICSSG_XDP_CONSUMED.

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/70d8dd76-0c76-42fc-8611-9884937c82f5@stanley.mountain/
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-3-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Fix kernel warning while bringing down network interface

During network interface initialization, the NIC driver needs to register
its Rx queue with the XDP, to ensure the incoming XDP buffer carries a
pointer reference to this info and is stored inside xdp_rxq_info.

While this struct isn't tied to XDP prog, if there are any changes in
Rx queue, the NIC driver needs to stop the Rx queue by unregistering
with XDP before purging and reallocating memory. Drop page_pool destroy
during Rx channel reset as this is already handled by XDP during
xdp_rxq_info_unreg (Rx queue unregister), failing to do will cause the
following warning:

warning logs: https://gist.github.com/MeghanaMalladiTI/eb627e5dc8de24e42d7d46572c13e576

Fixes: 46eeb90f03e0 ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-2-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netfilter: conntrack: fix erronous removal of offload bit

The blamed commit exposes a possible issue with flow_offload_teardown():
We might remove the offload bit of a conntrack entry that has been
offloaded again.

1. conntrack entry c1 is offloaded via flow f1 (f1->ct == c1).
2. f1 times out and is pushed back to slowpath, c1 offload bit is
   removed.  Due to bug, f1 is not unlinked from rhashtable right away.
3. a new packet arrives for the flow and re-offload is triggered, i.e.
   f2->ct == c1.  This is because lookup in flowtable skip entries with
   teardown bit set.
4. Next flowtable gc cycle finds f1 again
5. flow_offload_teardown() is called again for f1 and c1 offload bit is
   removed again, even though we have f2 referencing the same entry.

This is harmless, but clearly not correct.
Fix the bug that exposes this: set 'teardown = true' to have the gc
callback unlink the flowtable entry from the table right away instead of
the unintentional defer to the next round.

Also prevent flow_offload_teardown() from fixing up the ct state more than
once: We could also be called from the data path or a notifier, not only
from the flowtable gc callback.

NF_FLOW_TEARDOWN can never be unset, so we can use it as synchronization
point: if we observe did not see a 0 -> 1 transition, then another CPU
is already doing the ct state fixups for us.

Fixes: 03428ca5cee9 ("netfilter: conntrack: rework offload nf_conn timeout extension logic")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

xfs: document zoned rt specifics in admin-guide

Document the lifetime, nolifetime and max_open_zones mount options
added for zoned rt file systems.

Also add documentation describing the max_open_zones sysfs attribute
exposed in /sys/fs/xfs/<dev>/zoned/

Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'md-6.15-20250416' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.15

Pull MD fixes from Yu:

"- fix raid10 missing discard IO accounting (Yu Kuai)
- fix bitmap stats for bitmap file (Zheng Qixing)
- fix oops while reading all member disks failed during check/repair
   (Meir Elisha)"

* tag 'md-6.15-20250416' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
  md/raid1: Add check for missing source disk in process_checks()
  md/md-bitmap: fix stats collection for external bitmaps
  md/raid10: fix missing discard IO accounting

Merge tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"31 hotfixes.

  9 are cc:stable and the remainder address post-6.15 issues or aren't
  considered necessary for -stable kernels.

  22 patches are for MM, 9 are otherwise"

* tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  MAINTAINERS: update HUGETLB reviewers
  mm: fix apply_to_existing_page_range()
  selftests/mm: fix compiler -Wmaybe-uninitialized warning
  alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
  mailmap: add entry for Jean-Michel Hautbois
  mm: (un)track_pfn_copy() fix + doc improvements
  mm: fix filemap_get_folios_contig returning batches of identical folios
  mm/hugetlb: add a line break at the end of the format string
  selftests: mincore: fix tmpfs mincore test failure
  mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
  mm/cma: report base address of single range correctly
  mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  kunit: slub: add module description
  mm/kasan: add module decription
  ucs2_string: add module description
  zlib: add module description
  fpga: tests: add module descriptions
  samples/livepatch: add module descriptions
  ASN.1: add module description
  mm/vma: add give_up_on_oom option on modify/merge, use in uffd release
  ...

selftests: ublk: add generic_06 for covering fault inject

Add one simple fault inject target, and verify if an application using ublk
device sees an I/O error quickly after the ublk server dies.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: simplify aborting ublk request

Now ublk_abort_queue() is moved to ublk char device release handler,
meantime our request queue is "quiesced" because either ->canceling was
set from uring_cmd cancel function or all IOs are inflight and can't be
completed by ublk server, things becomes easy much:

- all uring_cmd are done, so we needn't to mark io as UBLK_IO_FLAG_ABORTED
for handling completion from uring_cmd

- ublk char device is closed, no one can hold IO request reference any more,
so we can simply complete this request or requeue it for ublk_nosrv_should_reissue_outstanding.

Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: remove __ublk_quiesce_dev()

Remove __ublk_quiesce_dev() and open code for updating device state as
QUIESCED.

We needn't to drain inflight requests in __ublk_quiesce_dev() any more,
because all inflight requests are aborted in ublk char device release
handler.

Also we needn't to set ->canceling in __ublk_quiesce_dev() any more
because it is done unconditionally now in ublk_ch_release().

Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: improve detection and handling of ublk server exit

There are currently two ways in which ublk server exit is detected by
ublk_drv:

1. uring_cmd cancellation. If there are any outstanding uring_cmds which
   have not been completed to the ublk server when it exits, io_uring
   calls the uring_cmd callback with a special cancellation flag as the
   issuing task is exiting.
2. I/O timeout. This is needed in addition to the above to handle the
   "saturated queue" case, when all I/Os for a given queue are in the
   ublk server, and therefore there are no outstanding uring_cmds to
   cancel when the ublk server exits.

There are a couple of issues with this approach:

- It is complex and inelegant to have two methods to detect the same
  condition
- The second method detects ublk server exit only after a long delay
  (~30s, the default timeout assigned by the block layer). This delays
  the nosrv behavior from kicking in and potential subsequent recovery
  of the device.

The second issue is brought to light with the new test_generic_06 which
will be added in following patch. It fails before this fix:

selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 30.0611 s, 0.0 kB/s
DEAD
dd took 31 seconds to exit (>= 5s tolerance)!
generic_06 : [FAIL]

Fix this by instead detecting and handling ublk server exit in the
character file release callback. This has several advantages:

- This one place can handle both saturated and unsaturated queues. Thus,
  it replaces both preexisting methods of detecting ublk server exit.
- It runs quickly on ublk server exit - there is no 30s delay.
- It starts the process of removing task references in ublk_drv. This is
  needed if we want to relax restrictions in the driver like letting
  only one thread serve each queue

There is also the disadvantage that the character file release callback
can also be triggered by intentional close of the file, which is a
significant behavior change. Preexisting ublk servers (libublksrv) are
dependent on the ability to open/close the file multiple times. To
address this, only transition to a nosrv state if the file is released
while the ublk device is live. This allows for programs to open/close
the file multiple times during setup. It is still a behavior change if a
ublk server decides to close/reopen the file while the device is LIVE
(i.e. while it is responsible for serving I/O), but that would be highly
unusual. This behavior is in line with what is done by FUSE, which is
very similar to ublk in that a userspace daemon is providing services
traditionally provided by the kernel.

With this change in, the new test (and all other selftests, and all
ublksrv tests) pass:

selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.0376731 s, 0.0 kB/s
DEAD
generic_04 : [PASS]

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: move device reset into ublk_ch_release()

ublk_ch_release() is called after ublk char device is closed, when all
uring_cmd are done, so it is perfect fine to move ublk device reset to
ublk_ch_release() from ublk_ctrl_start_recovery().

This way can avoid to grab the exiting daemon task_struct too long.

However, reset of the following ublk IO flags has to be moved until ublk
io_uring queues are ready:

- ubq->canceling

For requeuing IO in case of ublk_nosrv_dev_should_queue_io() before device
is recovered

- ubq->fail_io

For failing IO in case of UBLK_F_USER_RECOVERY_FAIL_IO before device is
recovered

- ublk_io->flags

For preventing using io->cmd

With this way, recovery is simplified a lot.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io

Now ublk deals with ublk_nosrv_dev_should_queue_io() by keeping request
queue as quiesced. This way is fragile because queue quiesce crosses syscalls
or process contexts.

Switch to rely on ubq->canceling for dealing with
ublk_nosrv_dev_should_queue_io(), because it has been used for this purpose
during io_uring context exiting, and it can be reused before recovering too.
In ublk_queue_rq(), the request will be added to requeue list without
kicking off requeue in case of ubq->canceling, and finally requests added in
requeue list will be dispatched from either ublk_stop_dev() or
ublk_ctrl_end_recovery().

Meantime we have to move reset of ubq->canceling from ublk_ctrl_start_recovery()
to ublk_ctrl_end_recovery(), when IO handling can be recovered completely.

Then blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() are always used
in same context.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Link: https://lore.kernel.org/r/20250416035444.99569-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: add ublk_force_abort_dev()

Add ublk_force_abort_dev() for handling ublk_nosrv_dev_should_queue_io()
in ublk_stop_dev(). Then queue quiesce and unquiesce can be paired in
single function.

Meantime not change device state to QUIESCED any more, since the disk is
going to be removed soon.

Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: properly serialize all FETCH_REQs

Most uring_cmds issued against ublk character devices are serialized
because each command affects only one queue, and there is an early check
which only allows a single task (the queue's ubq_daemon) to issue
uring_cmds against that queue. However, this mechanism does not work for
FETCH_REQs, since they are expected before ubq_daemon is set. Since
FETCH_REQs are only used at initialization and not in the fast path,
serialize them using the per-ublk-device mutex. This fixes a number of
data races that were previously possible if a badly behaved ublk server
decided to issue multiple FETCH_REQs against the same qid/tag
concurrently.

Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: move creating UBLK_TMP into _prep_test()

test may exit early because of missing program or not having required
feature before calling _prep_test(), then $UBLK_TMP isn't cleaned.

Fix it by moving creating $UBLK_TMP into _prep_test(), any resources
created since _prep_test() will be cleaned by _cleanup_test().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-14-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: add test_stress_05.sh

Add test_stress_05.sh for covering removing device with recovery
enabled.

io-hang has been observed with the following patch:

https://lore.kernel.org/linux-block/20250403-ublk_timeout-v3-1-aa09f76c7451@purestorage.com/

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-13-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: support user recovery

Add user recovery feature.

Meantime add user recovery test: generic_04 and generic_05(zero copy)

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-12-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: support target specific command line

Support target specific command line for making related command line code
handling more readable & clean.

Also helps for adding new features.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: increase max nr_queues and queue depth

Increase max nr_queues to 32, and queue depth to 1024.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: set queue pthread's cpu affinity

In NUMA machine, ublk IO performance is very sensitive with queue
pthread's affinity setting.

Retrieve queue's affinity and select the 1st cpu as queue thread's sched
affinity, and it is observed that single cpu task affinity can get
stable & good performance if client application is put on proper cpu.

Dump this info when adding one ublk device. Use shmem to communicate
queue's tid between parent and daemon.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN

It is observed that this way is more efficient for fast nvme backing
file.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: add two stress tests for zero copy feature

Add stress_03 & stress_04 for covering zero copy feature.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: run stress tests in parallel

Run stress tests in parallel, meantime add shell local function to
simplify the two stress tests.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: make sure _add_ublk_dev can return in sub-shell

Detach ublk daemon from the starting process completely by double-fork and
clearing its process group, so that `_add_ublk_dev` can return from sub-shell.

Then it is more friendly for writing shell test script for adding/recovering
ublk device.

Prepare for running ublk test in parallel.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: cleanup backfile automatically

Use global array of $UBLK_BACKFILES for storing all backfile name, then
clean them automatically.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: add io_uring uapi header

Add io_uring UAPI header so that ublk can work with latest uapi
definition.

Fix the following build failure:

stripe.c: In function ‘stripe_to_uring_op’:
stripe.c:120:29: error: ‘IORING_OP_READV_FIXED’ undeclared (first use in this function); did you mean ‘IORING_OP_READ_FIXED’?
  120 |                 return zc ? IORING_OP_READV_FIXED : IORING_OP_READV;
      |                             ^~~~~~~~~~~~~~~~~~~~~
      |                             IORING_OP_READ_FIXED

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Fixes: 57ed58c13256 ("selftests: ublk: enable zero copy for stripe target")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

selftests: ublk: fix ublk_find_tgt()

Bounds check for iterator variable `i` is missed, so add it and fix
ublk_find_tgt().

Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250412023035.2649275-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

net: don't try to ops lock uninitialized devs

We need to be careful when operating on dev while in rtnl_create_link().
Some devices (vxlan) initialize netdev_ops in ->newlink, so later on.
Avoid using netdev_lock_ops(), the device isn't registered so we
cannot legally call its ops or generate any notifications for it.

netdev_ops_assert_locked_or_invisible() is safe to use, it checks
registration status first.

Reported-by: syzbot+de1c7d68a10e3f123bdd@syzkaller.appspotmail.com
Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250415151552.768373-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ocp: fix start time alignment in ptp_ocp_signal_set

In ptp_ocp_signal_set, the start time for periodic signals is not
aligned to the next period boundary. The current code rounds up the
start time and divides by the period but fails to multiply back by
the period, causing misaligned signal starts. Fix this by multiplying
the rounded-up value by the period to ensure the start time is the
closest next period.

Fixes: 4bd46bb037f8e ("ptp: ocp: Use DIV64_U64_ROUND_UP for rounding.")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250415053131.129413-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'collection-of-dsa-bug-fixes'

Vladimir Oltean says:

====================
Collection of DSA bug fixes

Prompted by Russell King's 3 DSA bug reports from Friday (linked in
their respective patches: 1, 2 and 3), I am providing fixes to those, as
well as flushing the queue with 2 other bug fixes I had.

1: fix NULL pointer dereference during mv88e6xxx driver unbind, on old
   switch models which lack PVT and/or STU. Seen on the ZII dev board
   rev B.
2: fix failure to delete bridge port VLANs on old mv88e6xxx chips which
   lack STU. Seen on the same board.
3: fix WARN_ON() and resource leak in DSA core on driver unbind. Seen on
   the same board but is a much more widespread issue.
4: fix use-after-free during probing of DSA trees with >= 3 switches,
   if -EPROBE_DEFER exists. In principle issue also exists for the ZII
   board, I reproduced on Turris MOX.
5: fix incorrect use of refcount API in DSA core for those switches
   which use tag_8021q (felix, sja1105, vsc73xx). Returning an error
   when attempting to delete a tag_8021q VLAN prints a WARN_ON(), which
   is harmless but might be problematic with CONFIG_PANIC_ON_OOPS.
====================

Link: https://patch.msgid.link/20250414212708.2948164-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails

This is very similar to the problem and solution from commit
232deb3f9567 ("net: dsa: avoid refcount warnings when
->port_{fdb,mdb}_del returns error"), except for the
dsa_port_do_tag_8021q_vlan_del() operation.

Fixes: c64b9c05045a ("net: dsa: tag_8021q: add proper cross-chip notifier support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414213020.2959021-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: free routing table on probe failure

If complete = true in dsa_tree_setup(), it means that we are the last
switch of the tree which is successfully probing, and we should be
setting up all switches from our probe path.

After "complete" becomes true, dsa_tree_setup_cpu_ports() or any
subsequent function may fail. If that happens, the entire tree setup is
in limbo: the first N-1 switches have successfully finished probing
(doing nothing but having allocated persistent memory in the tree's
dst->ports, and maybe dst->rtable), and switch N failed to probe, ending
the tree setup process before anything is tangible from the user's PoV.

If switch N fails to probe, its memory (ports) will be freed and removed
from dst->ports. However, the dst->rtable elements pointing to its ports,
as created by dsa_link_touch(), will remain there, and will lead to
use-after-free if dereferenced.

If dsa_tree_setup_switches() returns -EPROBE_DEFER, which is entirely
possible because that is where ds->ops->setup() is, we get a kasan
report like this:

==================================================================
BUG: KASAN: slab-use-after-free in mv88e6xxx_setup_upstream_port+0x240/0x568
Read of size 8 at addr ffff000004f56020 by task kworker/u8:3/42

Call trace:
__asan_report_load8_noabort+0x20/0x30
mv88e6xxx_setup_upstream_port+0x240/0x568
mv88e6xxx_setup+0xebc/0x1eb0
dsa_register_switch+0x1af4/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

Allocated by task 42:
__kasan_kmalloc+0x84/0xa0
__kmalloc_cache_noprof+0x298/0x490
dsa_switch_touch_ports+0x174/0x3d8
dsa_register_switch+0x800/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

Freed by task 42:
__kasan_slab_free+0x48/0x68
kfree+0x138/0x418
dsa_register_switch+0x2694/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

The simplest way to fix the bug is to delete the routing table in its
entirety. dsa_tree_setup_routing_table() has no problem in regenerating
it even if we deleted links between ports other than those of switch N,
because dsa_link_touch() first checks whether the port pair already
exists in dst->rtable, allocating if not.

The deletion of the routing table in its entirety already exists in
dsa_tree_teardown(), so refactor that into a function that can also be
called from the tree setup error path.

In my analysis of the commit to blame, it is the one which added
dsa_link elements to dst->rtable. Prior to that, each switch had its own
ds->rtable which is freed when the switch fails to probe. But the tree
is potentially persistent memory.

Fixes: c5f51765a1f6 ("net: dsa: list DSA links in the fabric")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414213001.2957964-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: clean up FDB, MDB, VLAN entries on unbind

As explained in many places such as commit b117e1e8a86d ("net: dsa:
delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given
the assumption that higher layers have balanced additions/deletions.
As such, it only makes sense to be extremely vocal when those
assumptions are violated and the driver unbinds with entries still
present.

But Ido Schimmel points out a very simple situation where that is wrong:
https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/
(also briefly discussed by me in the aforementioned commit).

Basically, while the bridge bypass operations are not something that DSA
explicitly documents, and for the majority of DSA drivers this API
simply causes them to go to promiscuous mode, that isn't the case for
all drivers. Some have the necessary requirements for bridge bypass
operations to do something useful - see dsa_switch_supports_uc_filtering().

Although in tools/testing/selftests/net/forwarding/local_termination.sh,
we made an effort to popularize better mechanisms to manage address
filters on DSA interfaces from user space - namely macvlan for unicast,
and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the
fact is that 'bridge fdb add ... self static local' also exists as
kernel UAPI, and might be useful to someone, even if only for a quick
hack.

It seems counter-productive to block that path by implementing shim
.ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP
in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from
running, although we could do that.

Accepting that cleanup is necessary seems to be the only option.
Especially since we appear to be coming back at this from a different
angle as well. Russell King is noticing that the WARN_ON() triggers even
for VLANs:
https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/

What happens in the bug report above is that dsa_port_do_vlan_del() fails,
then the VLAN entry lingers on, and then we warn on unbind and leak it.

This is not a straight revert of the blamed commit, but we now add an
informational print to the kernel log (to still have a way to see
that bugs exist), and some extra comments gathered from past years'
experience, to justify the logic.

Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupported

Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
from a user port fails with -ENOENT:
https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/

This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
which tries to find an MST entry in &chip->msts associated with the SID,
but fails and returns -ENOENT as such.

But we know that this chip does not support MST at all, so that is not
surprising. The question is why does the guard in mv88e6xxx_mst_put()
not exit early:

if (!sid)
return 0;

And the answer seems to be simple: the sid comes from vlan.sid which
supposedly was previously populated by mv88e6xxx_vtu_get().
But some chip->info->ops->vtu_getnext() implementations do not populate
vlan.sid, for example see mv88e6185_g1_vtu_getnext(). In that case,
later in mv88e6xxx_port_vlan_leave() we are using a garbage sid which is
just residual stack memory.

Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge
VLAN mapped to the default MSTI. For some chips, SID 0 is valid and
installed by mv88e6xxx_stu_setup(). A chip which does not support the
STU would implicitly only support mapping all VLANs to the default MSTI,
so although SID 0 is not valid, it would be sufficient, if we were to
zero-initialize the vlan structure, to fix the bug, due to the
coincidence that a test for vlan.sid == 0 already exists and leads to
the same (correct) behavior.

Another option which would be sufficient would be to add a test for
mv88e6xxx_has_stu() inside mv88e6xxx_mst_put(), symmetric to the one
which already exists in mv88e6xxx_mst_get(). But that placement means
the caller will have to dereference vlan.sid, which means it will access
uninitialized memory, which is not nice even if it ignores it later.

So we end up making both modifications, in order to not rely just on the
sid == 0 coincidence, but also to avoid having uninitialized structure
fields which might get temporarily accessed.

Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212913.2955253-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registered

Russell King reports that a system with mv88e6xxx dereferences a NULL
pointer when unbinding this driver:
https://lore.kernel.org/netdev/Z_lRkMlTJ1KQ0kVX@shell.armlinux.org.uk/

The crash seems to be in devlink_region_destroy(), which is not NULL
tolerant but is given a NULL devlink global region pointer.

At least on some chips, some devlink regions are conditionally registered
since the blamed commit, see mv88e6xxx_setup_devlink_regions_global():

if (cond && !cond(chip))
continue;

These are MV88E6XXX_REGION_STU and MV88E6XXX_REGION_PVT. If the chip
does not have an STU or PVT, it should crash like this.

To fix the issue, avoid unregistering those regions which are NULL, i.e.
were skipped at mv88e6xxx_setup_devlink_regions_global() time.

Fixes: 836021a2d0e0 ("net: dsa: mv88e6xxx: Export cross-chip PVT as devlink region")
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212850.2953957-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: fix memory leak in txgbe_probe() error path

When txgbe_sw_init() is called, memory is allocated for wx->rss_key
in wx_init_rss_key(). However, in txgbe_probe() function, the subsequent
error paths after txgbe_sw_init() don't free the rss_key. Fix that by
freeing it in error path along with wx->mac_table.

Also change the label to which execution jumps when txgbe_sw_init()
fails, because otherwise, it could lead to a double free for rss_key,
when the mac_table allocation fails in wx_sw_init().

Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number")
Reported-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415032910.13139-1-abdun.nihaal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: switchdev: do not notify new brentries as changed

When adding a bridge vlan that is pvid or untagged after the vlan has
already been added to any other switchdev backed port, the vlan change
will be propagated as changed, since the flags change.

This causes the vlan to not be added to the hardware for DSA switches,
since the DSA handler ignores any vlans for the CPU or DSA ports that
are changed.

E.g. the following order of operations would work:

$ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0
$ ip link set lan1 master swbridge
$ bridge vlan add dev swbridge vid 1 pvid untagged self
$ bridge vlan add dev lan1 vid 1 pvid untagged

but this order would break:

$ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0
$ ip link set lan1 master swbridge
$ bridge vlan add dev lan1 vid 1 pvid untagged
$ bridge vlan add dev swbridge vid 1 pvid untagged self

Additionally, the vlan on the bridge itself would become undeletable:

$ bridge vlan
port              vlan-id
lan1              1 PVID Egress Untagged
swbridge          1 PVID Egress Untagged
$ bridge vlan del dev swbridge vid 1 self
$ bridge vlan
port              vlan-id
lan1              1 PVID Egress Untagged
swbridge          1 Egress Untagged

since the vlan was never added to DSA's vlan list, so deleting it will
cause an error, causing the bridge code to not remove it.

Fix this by checking if flags changed only for vlans that are already
brentry and pass changed as false for those that become brentries, as
these are a new vlan (member) from the switchdev point of view.

Since *changed is set to true for becomes_brentry = true regardless of
would_change's value, this will not change any rtnetlink notification
delivery, just the value passed on to switchdev in vlan->changed.

Fixes: 8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250414200020.192715-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: b53: enable BPDU reception for management port

For STP to work, receiving BPDUs is essential, but the appropriate bit
was never set. Without GC_RX_BPDU_EN, the switch chip will filter all
BPDUs, even if an appropriate PVID VLAN was setup.

Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250414200434.194422-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ynl-avoid-leaks-in-attr-override-and-spec-fixes-for-c'

Jakub Kicinski says:

====================
ynl: avoid leaks in attr override and spec fixes for C

The C rt-link work revealed more problems in existing codegen
and classic netlink specs.

Patches 1 - 4 fix issues with the codegen. Patches 1 and 2 are
pre-requisites for patch 3. Patch 3 fixes leaking memory if user
tries to override already set attr. Patch 4 validates attrs in case
kernel sends something we don't expect.

Remaining patches fix and align the specs. Patch 5 changes nesting,
the rest are naming adjustments.
====================

Link: https://patch.msgid.link/20250414211851.602096-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt-neigh: prefix struct nfmsg members with ndm

Attach ndm- to all members of struct nfmsg. We could possibly
use name-prefix just for C, but I don't think we have any precedent
for using name-prefix on structs, and other rtnetlink sub-specs
give full names for fixed header struct members.

Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt-link: adjust mctp attribute naming

MCTP attribute naming is inconsistent. In C we have:
    IFLA_MCTP_NET,
    IFLA_MCTP_PHYS_BINDING,
         ^^^^

but in YAML:
    - mctp-net
    - phys-binding
      ^
       no "mctp"

It's unclear whether the "mctp" part of the name is supposed
to be a prefix or part of attribute name. Make it a prefix,
seems cleaner, even tho technically phys-binding was added later.

Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>