www.infradead.org Git - nvme.git/log

Merge tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

- Revert a reset patch that broke VFIO passthrough because devices
ended up with no available reset mechanisms (Alex Williamson)

* tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Avoid reset when disabled via sysfs"

Merge tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"Usual set of small fixes/logging improvements.

  One bigger user reported fix, for inode <-> dirent inconsistencies
  reported in fsck, after moving a subvolume that had been snapshotted"

* tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix snapshotting a subvolume, then renaming it
  bcachefs: Add missing READ_ONCE() for metadata replicas
  bcachefs: snapshot_node_missing is now autofix
  bcachefs: Log message when incompat version requested but not enabled
  bcachefs: Print version_incompat_allowed on startup
  bcachefs: Silence extent_poisoned error messages
  bcachefs: btree_root_unreadable_and_scan_found_nothing now AUTOFIX
  bcachefs: fix bch2_dev_usage_full_read_fast()
  bcachefs: Don't print data read retry success on non-errors
  bcachefs: Add missing error handling
  bcachefs: Prevent granting write refs when filesystem is read-only

Merge tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio

Pull vfio fix from Alex Williamson:

- Include devices where the platform indicates PCI INTx is not routed
   by setting pdev->irq to zero in the expanded virtualization of the
   PCI pin register. This provides consistency in the INFO and SET_IRQS
   ioctls (Alex Williamson)

* tag 'vfio-v6.15-rc3' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Virtualize zero INTx PIN if no pdev->irq

Merge tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few more device specific fixes plus one trivial quirk.

  There's a couple of patches for Tegra which avoid some fairly
  spectacular log spam if the hardware breaks in ways which were
  actually seen in production, plus a fix for the i.MX driver to
  propagate errors properly when setting up the hardware.

  We also have a trivial patch marking the sun4i driver as being
  compatible with GPIO chip selects"

* tag 'spi-fix-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-imx: Add check for spi_imx_setupxfer()
  spi: tegra210-quad: add rate limiting and simplify timeout error message
  spi: tegra210-quad: use WARN_ON_ONCE instead of WARN_ON for timeouts
  spi: sun4i: add support for GPIO chip select lines

Merge tag 'net-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, CAN and Netfilter.

  Current release - regressions:

   - two fixes for the netdev per-instance locking

   - batman-adv: fix double-hold of meshif when getting enabled

  Current release - new code bugs:

   - Bluetooth: increment TX timestamping tskey always for stream
     sockets

   - wifi: static analysis and build fixes for the new Intel sub-driver

  Previous releases - regressions:

   - net: fib_rules: fix iif / oif matching on L3 master (VRF) device

   - ipv6: add exception routes to GC list in rt6_insert_exception()

   - netfilter: conntrack: fix erroneous removal of offload bit

   - Bluetooth:
       - fix sending MGMT_EV_DEVICE_FOUND for invalid address
       - l2cap: process valid commands in too long frame
       - btnxpuart: Revert baudrate change in nxp_shutdown

  Previous releases - always broken:

   - ethtool: fix memory corruption during SFP FW flashing

   - eth:
       - hibmcge: fixes for link and MTU handling, pause frames etc
       - igc: fixes for PTM (PCIe timestamping)

   - dsa: b53: enable BPDU reception for management port

  Misc:

   - fixes for Netlink protocol schemas"

* tag 'net-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings
  net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps
  net: ethernet: mtk_eth_soc: reapply mdc divider on reset
  net: ti: icss-iep: Fix possible NULL pointer dereference for perout request
  net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()
  net: ti: icssg-prueth: Fix kernel warning while bringing down network interface
  netfilter: conntrack: fix erronous removal of offload bit
  net: don't try to ops lock uninitialized devs
  ptp: ocp: fix start time alignment in ptp_ocp_signal_set
  net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails
  net: dsa: free routing table on probe failure
  net: dsa: clean up FDB, MDB, VLAN entries on unbind
  net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupported
  net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registered
  net: txgbe: fix memory leak in txgbe_probe() error path
  net: bridge: switchdev: do not notify new brentries as changed
  net: b53: enable BPDU reception for management port
  netlink: specs: rt-neigh: prefix struct nfmsg members with ndm
  netlink: specs: rt-link: adjust mctp attribute naming
  netlink: specs: rtnetlink: attribute naming corrections
  ...

bcachefs: Fix snapshotting a subvolume, then renaming it

Subvolume roots and the dirents that point to them are special; they
don't obey the normal snapshot versioning rules because they cross
snapshot boundaries.

We don't keep around older versions of subvolume dirents on rename - we
don't need to, because subvolume dirents are only visible in the parent
subvolume, and we wouldn't be able to match up the different dirent and
inode versions due to crossing the snapshot ID boundary.

That means that when we rename a subvolume that's been snapshotted, the
older version of the subvolume root will become dangling - it won't have
a dirent that points to it.

That's expected, we just need to tell fsck that this is ok.

Fixes: https://github.com/koverstreet/bcachefs/issues/856
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"Just a single fix for the Xen multicall driver avoiding a percpu
variable referencing initdata by its initializer"

* tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: fix multicall debug feature

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull fwctl fixes from Jason Gunthorpe:
"Three small changes from further build testing:

   - Don't rely on the userspace uuid.h for the uapi header

   - Fix sparse warnings in pds

   - Typo in log message"

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  fwctl: Fix repeated device word in log message
  pds_fwctl: Fix type and endian complaints
  fwctl/cxl: Fix uuid_t usage in uapi

Merge tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All are device-specific like quirks, new
  IDs, and other safe (or rather boring) changes"

* tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw version
  ASoC: codecs: Add of_match_table for aw888081 driver
  ASoC: fsl: fsl_qmc_audio: Reset audio data pointers on TRIGGER_START event
  mailmap: Add entry for Srinivas Kandagatla
  MAINTAINERS: use kernel.org alias
  ASoC: cs42l43: Reset clamp override on jack removal
  ALSA: hda/realtek - Fixed ASUS platform headset Mic issue
  ALSA: hda/cirrus_scodec_test: Don't select dependencies
  ALSA: azt2320: Replace deprecated strcpy() with strscpy()
  ASoC: hdmi-codec: use RTD ID instead of DAI ID for ELD entry
  ASoC: Intel: avs: Constrain path based on BE capabilities
  ALSA: hda/tas2781: Remove unnecessary NULL check before release_firmware()
  ASoC: Intel: avs: Fix null-ptr-deref in avs_component_probe()
  ASoC: fsl_asrc_dma: get codec or cpu dai from backend
  ASoC: qcom: Fix sc7280 lpass potential buffer overflow
  ASoC: dwc: always enable/disable i2s irqs
  ASoC: Intel: sof_sdw: Add quirk for Asus Zenbook S16
  ASoC: codecs:lpass-wsa-macro: Fix logic of enabling vi channels
  ASoC: codecs:lpass-wsa-macro: Fix vi feedback rate

Merge tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:
"Fixes:
   - amd/pmf: Fix STT limits
   - asus-laptop: Fix an uninitialized variable
   - intel_pmc_ipc: Allow building without ACPI
   - mlxbf-bootctl: Use sysfs_emit_at() in secure_boot_fuse_state_show()
   - msi-wmi-platform: Add locking to workaround ACPI firmware bug

  New HW support:
   - alienware-wmi-wmax:
      - Extended thermal control support to:
         - Alienware Area-51m R2
         - Alienware m16 R1
         - Alienware m16 R2
         - Dell G16 7630
         - Dell G5 5505 SE
      - G-Mode support to Alienware m16 R1
   - x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data"

* tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug
  platform/x86: msi-wmi-platform: Rename "data" variable
  platform/x86: alienware-wmi-wmax: Extend support to more laptops
  platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1
  platform/x86: amd: pmf: Fix STT limits
  mlxbf-bootctl: use sysfs_emit_at() in secure_boot_fuse_state_show()
  platform/x86: x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data
  platform/x86: x86-android-tablets: Add "9v" to Vexia EDU ATLA 10 tablet symbols
  asus-laptop: Fix an uninitialized variable
  platform/x86: intel_pmc_ipc: add option to build without ACPI

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Small drivers fixes, except for ufs which has two large updates, one
  for exposing the device level feature, which is a new addition to the
  device spec and the other reworking the exynos driver to fix coherence
  issues on some android phones"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: megaraid_sas: Driver version update to 07.734.00.00-rc1
  scsi: megaraid_sas: Block zero-length ATA VPD inquiry
  scsi: scsi_transport_srp: Replace min/max nesting with clamp()
  scsi: ufs: core: Add device level exception support
  scsi: ufs: core: Rename ufshcd_wb_presrv_usrspc_keep_vcc_on()
  scsi: smartpqi: Use is_kdump_kernel() to check for kdump
  scsi: pm80xx: Set phy_attached to zero when device is gone
  scsi: ufs: exynos: gs101: Put UFS device in reset on .suspend()
  scsi: ufs: exynos: Move phy calls to .exit() callback
  scsi: ufs: exynos: Enable PRDT pre-fetching with UFSHCD_CAP_CRYPTO
  scsi: ufs: exynos: Ensure consistent phy reference counts
  scsi: ufs: exynos: Disable iocc if dma-coherent property isn't set
  scsi: ufs: exynos: Move UFS shareability value to drvdata
  scsi: ufs: exynos: Ensure pre_link() executes before exynos_ufs_phy_init()
  scsi: iscsi: Fix missing scsi_host_put() in error path
  scsi: ufs: core: Fix a race condition related to device commands
  scsi: hisi_sas: Fix I/O errors caused by hardware port ID changes
  scsi: hisi_sas: Enable force phy when SATA disk directly connected

Merge tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

- Fix how sense data from the sense data for successful NCQ commands
   log page is used to fully initialize the result_tf of a completed
   command, so that the sense data returned to the scsi layer is fully
   initialized with all the device provided information (from Niklas)

* tag 'ata-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-sata: Save all fields from sense data descriptor

Merge tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS fixes from Carlos Maiolino:
"This mostly includes fixes and documentation for the zoned allocator
  feature merged during previous merge window, but it also adds a sysfs
  tunable for the zone garbage collector.

  There is also a fix for a regression to the RT device that we'd like
  to fix ASAP now that we're getting more users on the RT zoned
  allocator"

* tag 'xfs-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: document zoned rt specifics in admin-guide
  xfs: fix fsmap for internal zoned devices
  xfs: Fix spelling mistake "drity" -> "dirty"
  xfs: compute buffer address correctly in xmbuf_map_backing_mem
  xfs: add tunable threshold parameter for triggering zone GC
  xfs: mark xfs_buf_free as might_sleep()
  xfs: remove the leftover xfs_{set,clear}_li_failed infrastructure

Merge tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- handle encoded read ioctl returning EAGAIN so it does not mistakenly
   free the work structure

- escape subvolume path in mount option list so it cannot be wrongly
   parsed when the path contains ","

- remove folio size assertions when writing super block to device with
   enabled large folios

* tag 'for-6.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: remove folio order ASSERT()s in super block writeback path
  btrfs: correctly escape subvol in btrfs_show_options()
  btrfs: ioctl: don't free iov when btrfs_encoded_read() returns -EAGAIN

Merge tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Stable fix adding zero initialization of slab->obj_exts to prevent
crashes with allocation profiling (Suren Baghdasaryan)

* tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: ensure slab->obj_exts is clear in a newly allocated slab page

net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings

The QDMA packet scheduler suffers from a performance issue.
Fix this by picking up changes from MediaTek's SDK which change to use
Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration.

Fixes: 160d3a9b1929 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/18040f60f9e2f5855036b75b28c4332a2d2ebdd8.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps

Without this patch, the maximum weight of the queue limit will be
incorrect when linked at 100Mbps due to an apparent typo.

Fixes: f63959c7eec31 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/74111ba0bdb13743313999ed467ce564e8189006.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: reapply mdc divider on reset

Currently, the MDC divider is reset to the default setting of 2.5MHz
after a NETSYS SER. Therefore, we need to reapply the MDC divider
configuration in mtk_hw_init() after reset.

Fixes: c0a440031d431 ("net: ethernet: mtk_eth_soc: set MDIO bus clock frequency")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/8ab7381447e6cdcb317d5b5a6ddd90a1734efcb0.1744764277.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following batch contains one Netfilter fix for net:

1) conntrack offload bit is erroneously unset in a race scenario,
from Florian Westphal.

netfilter pull request 25-04-17

* tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: fix erronous removal of offload bit
====================

Link: https://patch.msgid.link/20250417102847.16640-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

spi: spi-imx: Add check for spi_imx_setupxfer()

Add check for the return value of spi_imx_setupxfer().
The spi_imx->rx and spi_imx->tx function pointers can be NULL when
spi_imx_setupxfer() returns an error, which leads to a NULL pointer
dereference:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
  0x0
  spi_imx_pio_transfer+0x50/0xd8
  spi_imx_transfer_one+0x18c/0x858
  spi_transfer_one_message+0x43c/0x790
  __spi_pump_transfer_message+0x238/0x5d4
  __spi_sync+0x2b0/0x454
  spi_write_then_read+0x11c/0x200
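
The pattern behind this fix can be sketched in plain userspace C. This is
only an illustration, not the driver code: struct xfer_ctx, setup_xfer()
and do_transfer() are hypothetical stand-ins. The one idea taken from the
commit message is that the setup return value is checked before calling
through function pointers that only a successful setup populates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state: the callbacks are only valid
 * after a successful setup. */
struct xfer_ctx {
	void (*tx)(struct xfer_ctx *ctx);
	void (*rx)(struct xfer_ctx *ctx);
	int tx_calls;
	int rx_calls;
};

static void demo_tx(struct xfer_ctx *ctx) { ctx->tx_calls++; }
static void demo_rx(struct xfer_ctx *ctx) { ctx->rx_calls++; }

/* Hypothetical setup step: on error it leaves both callbacks NULL. */
static int setup_xfer(struct xfer_ctx *ctx, int bits_per_word)
{
	assert(ctx != NULL);
	ctx->tx = NULL;
	ctx->rx = NULL;
	if (bits_per_word <= 0 || bits_per_word > 32)
		return -EINVAL;
	ctx->tx = demo_tx;
	ctx->rx = demo_rx;
	return 0;
}

/* Propagate the setup error instead of calling through NULL pointers. */
static int do_transfer(struct xfer_ctx *ctx, int bits_per_word)
{
	int ret = setup_xfer(ctx, bits_per_word);

	if (ret < 0)
		return ret;
	ctx->tx(ctx);
	ctx->rx(ctx);
	return 0;
}
```

Without the "ret < 0" check, a failed setup would fall through to
ctx->tx(ctx) with tx == NULL, which matches the "0x0" frame at the top of
the call trace above.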

Signed-off-by: Tamura Dai <kirinode0@gmail.com>
Reviewed-by: Carlos Song <carlos.song@nxp.com>
Link: https://patch.msgid.link/20250417011700.14436-1-kirinode0@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- l2cap: Process valid commands in too long frame
- vhci: Avoid needless snprintf() calls

* tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: vhci: Avoid needless snprintf() calls
Bluetooth: l2cap: Process valid commands in too long frame
====================

Link: https://patch.msgid.link/20250416210126.2034212-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'bug-fixes-from-xdp-and-perout-series'

Meghana Malladi says:

====================
Bug fixes from XDP and perout series

This patch series consists of bug fixes from the XDP series:
1. Fixes a kernel warning that occurs when bringing down the
   network interface.
2. Resolves a potential NULL pointer dereference in the
   emac_xmit_xdp_frame() function.
3. Resolves a potential NULL pointer dereference in the
   icss_iep_perout_enable() function

v3: https://lore.kernel.org/all/20250328102403.2626974-1-m-malladi@ti.com/
====================

Link: https://patch.msgid.link/20250415090543.717991-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icss-iep: Fix possible NULL pointer dereference for perout request

The ICSS IEP driver tracks perout and pps enable state with flags.
Currently, disabling the pps and perout signals during icss_iep_exit()
results in a NULL pointer dereference for perout.

To fix the null pointer dereference issue, the icss_iep_perout_enable_hw
function can be modified to directly clear the IEP CMP registers when
disabling PPS or PEROUT, without referencing the ptp_perout_request
structure, as its contents are irrelevant in this case.

Fixes: 9b115361248d ("net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/7b1c7c36-363a-4085-b26c-4f210bee1df6@stanley.mountain/
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-4-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()

emac_xmit_xdp_frame(), which is called when the driver wants to
transmit an XDP frame, checks whether the allocated tx descriptor is
NULL and, if so, exits and returns ICSSG_XDP_CONSUMED to signal a
transmission failure.

In this case, trying to free a descriptor which is NULL results in a
kernel crash due to a NULL pointer dereference. Fix this error handling
and increase the netdev tx_dropped stats in the caller of this function
if the function returns ICSSG_XDP_CONSUMED.
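
A minimal userspace sketch of that error handling, under stated
assumptions: the descriptor pool, xmit_frame(), send_frame() and the
XMIT_* values are all hypothetical names invented for illustration and
are not the icssg-prueth API. It only shows the shape described above: a
failed allocation returns early without freeing anything, and the caller
accounts for the drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, standing in for the driver's XDP verdicts. */
enum xmit_result { XMIT_TX = 0, XMIT_CONSUMED = 1 };

struct desc { int in_use; };

/* Tiny fixed-size descriptor pool so allocation can actually fail. */
struct desc_pool { struct desc slots[2]; };

struct tx_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

static struct desc *pool_alloc(struct desc_pool *pool)
{
	size_t n = sizeof(pool->slots) / sizeof(pool->slots[0]);

	for (size_t i = 0; i < n; i++) {
		if (!pool->slots[i].in_use) {
			pool->slots[i].in_use = 1;
			return &pool->slots[i];
		}
	}
	return NULL;
}

static enum xmit_result xmit_frame(struct desc_pool *pool)
{
	struct desc *d;

	assert(pool != NULL);
	d = pool_alloc(pool);
	if (!d)
		return XMIT_CONSUMED;	/* nothing was allocated, so nothing to free */
	/* A real driver would hand the descriptor to hardware here. */
	return XMIT_TX;
}

/* The caller, not the transmit helper, updates the drop counter. */
static void send_frame(struct desc_pool *pool, struct tx_stats *stats)
{
	if (xmit_frame(pool) == XMIT_CONSUMED)
		stats->tx_dropped++;
	else
		stats->tx_packets++;
}
```

With a two-slot pool, the third send_frame() call exhausts the pool and
is counted in tx_dropped rather than crashing in a free path.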

Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/70d8dd76-0c76-42fc-8611-9884937c82f5@stanley.mountain/
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-3-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Fix kernel warning while bringing down network interface

During network interface initialization, the NIC driver needs to register
its Rx queue with the XDP, to ensure the incoming XDP buffer carries a
pointer reference to this info and is stored inside xdp_rxq_info.

While this struct isn't tied to XDP prog, if there are any changes in
Rx queue, the NIC driver needs to stop the Rx queue by unregistering
with XDP before purging and reallocating memory. Drop the page_pool
destroy during Rx channel reset, as this is already handled by XDP
during xdp_rxq_info_unreg (Rx queue unregister); failing to do so
causes the following warning:

warning logs: https://gist.github.com/MeghanaMalladiTI/eb627e5dc8de24e42d7d46572c13e576

Fixes: 46eeb90f03e0 ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250415090543.717991-2-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netfilter: conntrack: fix erronous removal of offload bit

The blamed commit exposes a possible issue with flow_offload_teardown():
We might remove the offload bit of a conntrack entry that has been
offloaded again.

1. conntrack entry c1 is offloaded via flow f1 (f1->ct == c1).
2. f1 times out and is pushed back to slowpath, c1 offload bit is
   removed.  Due to bug, f1 is not unlinked from rhashtable right away.
3. a new packet arrives for the flow and re-offload is triggered, i.e.
   f2->ct == c1.  This is because lookup in the flowtable skips entries
   with the teardown bit set.
4. Next flowtable gc cycle finds f1 again
5. flow_offload_teardown() is called again for f1 and c1 offload bit is
   removed again, even though we have f2 referencing the same entry.

This is harmless, but clearly not correct.
Fix the bug that exposes this: set 'teardown = true' to have the gc
callback unlink the flowtable entry from the table right away instead of
the unintentional defer to the next round.

Also prevent flow_offload_teardown() from fixing up the ct state more than
once: We could also be called from the data path or a notifier, not only
from the flowtable gc callback.

NF_FLOW_TEARDOWN can never be unset, so we can use it as a
synchronization point: if we did not observe a 0 -> 1 transition, then
another CPU is already doing the ct state fixups for us.
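
The "only the 0 -> 1 observer does the fixup" idea can be sketched with
C11 atomics. This is a userspace illustration under assumptions: struct
flow, FLOW_TEARDOWN_BIT and flow_teardown() are hypothetical, and the
kernel uses its own atomic bit helpers rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FLOW_TEARDOWN_BIT (1u << 0)	/* hypothetical flag layout */

struct flow {
	atomic_uint flags;
	int ct_fixups;	/* how many times the state fixup ran */
};

/*
 * Atomically set the teardown bit. Because the bit is never cleared,
 * exactly one caller sees it go from 0 to 1, and only that caller runs
 * the fixup. Returns true for that caller, false for everyone else.
 */
static bool flow_teardown(struct flow *f)
{
	unsigned int old;

	assert(f != NULL);
	old = atomic_fetch_or(&f->flags, FLOW_TEARDOWN_BIT);
	if (old & FLOW_TEARDOWN_BIT)
		return false;	/* someone else already tore this flow down */
	f->ct_fixups++;		/* stand-in for the conntrack state fixup */
	return true;
}
```

A second call on the same flow, whether from a gc callback, the data
path or a notifier, returns false and leaves the fixup count at one.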

Fixes: 03428ca5cee9 ("netfilter: conntrack: rework offload nf_conn timeout extension logic")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

xfs: document zoned rt specifics in admin-guide

Document the lifetime, nolifetime and max_open_zones mount options
added for zoned rt file systems.

Also add documentation describing the max_open_zones sysfs attribute
exposed in /sys/fs/xfs/<dev>/zoned/

Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"31 hotfixes.

  9 are cc:stable and the remainder address post-6.15 issues or aren't
  considered necessary for -stable kernels.

  22 patches are for MM, 9 are otherwise"

* tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  MAINTAINERS: update HUGETLB reviewers
  mm: fix apply_to_existing_page_range()
  selftests/mm: fix compiler -Wmaybe-uninitialized warning
  alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
  mailmap: add entry for Jean-Michel Hautbois
  mm: (un)track_pfn_copy() fix + doc improvements
  mm: fix filemap_get_folios_contig returning batches of identical folios
  mm/hugetlb: add a line break at the end of the format string
  selftests: mincore: fix tmpfs mincore test failure
  mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
  mm/cma: report base address of single range correctly
  mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  kunit: slub: add module description
  mm/kasan: add module decription
  ucs2_string: add module description
  zlib: add module description
  fpga: tests: add module descriptions
  samples/livepatch: add module descriptions
  ASN.1: add module description
  mm/vma: add give_up_on_oom option on modify/merge, use in uffd release
  ...

net: don't try to ops lock uninitialized devs

We need to be careful when operating on dev while in rtnl_create_link().
Some devices (vxlan) initialize netdev_ops only later, in ->newlink.
Avoid using netdev_lock_ops(), the device isn't registered so we
cannot legally call its ops or generate any notifications for it.

netdev_ops_assert_locked_or_invisible() is safe to use, it checks
registration status first.

Reported-by: syzbot+de1c7d68a10e3f123bdd@syzkaller.appspotmail.com
Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250415151552.768373-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ocp: fix start time alignment in ptp_ocp_signal_set

In ptp_ocp_signal_set, the start time for periodic signals is not
aligned to the next period boundary. The current code rounds up the
start time and divides by the period but fails to multiply back by
the period, causing misaligned signal starts. Fix this by multiplying
the rounded-up value by the period to ensure the start time is the
closest next period.
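
As a worked illustration of the arithmetic, here is a small userspace
sketch. align_start_to_period() is a hypothetical helper, not the driver
function, and it uses plain integer division instead of the kernel's
DIV64_U64_ROUND_UP macro named in the Fixes tag.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round start_ns up to the next multiple of period_ns.
 * Step 1 (round-up division) yields a count of periods; step 2
 * (multiplying back) turns that count into a time again. Omitting
 * step 2 is the kind of bug described above.
 */
static uint64_t align_start_to_period(uint64_t start_ns, uint64_t period_ns)
{
	assert(period_ns != 0);

	uint64_t periods = start_ns / period_ns + (start_ns % period_ns != 0);

	return periods * period_ns;
}
```

For example, with a start of 1050 ns and a period of 100 ns, the round-up
division gives 11 periods. Returning 11 would be a misaligned start time;
multiplying back gives 1100 ns, the next period boundary.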

Fixes: 4bd46bb037f8e ("ptp: ocp: Use DIV64_U64_ROUND_UP for rounding.")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250415053131.129413-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'collection-of-dsa-bug-fixes'

Vladimir Oltean says:

====================
Collection of DSA bug fixes

Prompted by Russell King's 3 DSA bug reports from Friday (linked in
their respective patches: 1, 2 and 3), I am providing fixes to those, as
well as flushing the queue with 2 other bug fixes I had.

1: fix NULL pointer dereference during mv88e6xxx driver unbind, on old
   switch models which lack PVT and/or STU. Seen on the ZII dev board
   rev B.
2: fix failure to delete bridge port VLANs on old mv88e6xxx chips which
   lack STU. Seen on the same board.
3: fix WARN_ON() and resource leak in DSA core on driver unbind. Seen on
   the same board but is a much more widespread issue.
4: fix use-after-free during probing of DSA trees with >= 3 switches,
   if -EPROBE_DEFER is returned. In principle the issue also exists for
   the ZII board; I reproduced it on Turris MOX.
5: fix incorrect use of refcount API in DSA core for those switches
   which use tag_8021q (felix, sja1105, vsc73xx). Returning an error
   when attempting to delete a tag_8021q VLAN prints a WARN_ON(), which
   is harmless but might be problematic with CONFIG_PANIC_ON_OOPS.
====================

Link: https://patch.msgid.link/20250414212708.2948164-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails

This is very similar to the problem and solution from commit
232deb3f9567 ("net: dsa: avoid refcount warnings when
->port_{fdb,mdb}_del returns error"), except for the
dsa_port_do_tag_8021q_vlan_del() operation.

Fixes: c64b9c05045a ("net: dsa: tag_8021q: add proper cross-chip notifier support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414213020.2959021-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: free routing table on probe failure

If complete = true in dsa_tree_setup(), it means that we are the last
switch of the tree which is successfully probing, and we should be
setting up all switches from our probe path.

After "complete" becomes true, dsa_tree_setup_cpu_ports() or any
subsequent function may fail. If that happens, the entire tree setup is
in limbo: the first N-1 switches have successfully finished probing
(doing nothing but having allocated persistent memory in the tree's
dst->ports, and maybe dst->rtable), and switch N failed to probe, ending
the tree setup process before anything is tangible from the user's PoV.

If switch N fails to probe, its memory (ports) will be freed and removed
from dst->ports. However, the dst->rtable elements pointing to its ports,
as created by dsa_link_touch(), will remain there, and will lead to
use-after-free if dereferenced.

If dsa_tree_setup_switches() returns -EPROBE_DEFER, which is entirely
possible because that is where ds->ops->setup() is, we get a kasan
report like this:

==================================================================
BUG: KASAN: slab-use-after-free in mv88e6xxx_setup_upstream_port+0x240/0x568
Read of size 8 at addr ffff000004f56020 by task kworker/u8:3/42

Call trace:
__asan_report_load8_noabort+0x20/0x30
mv88e6xxx_setup_upstream_port+0x240/0x568
mv88e6xxx_setup+0xebc/0x1eb0
dsa_register_switch+0x1af4/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

Allocated by task 42:
__kasan_kmalloc+0x84/0xa0
__kmalloc_cache_noprof+0x298/0x490
dsa_switch_touch_ports+0x174/0x3d8
dsa_register_switch+0x800/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

Freed by task 42:
__kasan_slab_free+0x48/0x68
kfree+0x138/0x418
dsa_register_switch+0x2694/0x2ae0
mv88e6xxx_register_switch+0x1b8/0x2a8
mv88e6xxx_probe+0xc4c/0xf60
mdio_probe+0x78/0xb8
really_probe+0x2b8/0x5a8
__driver_probe_device+0x164/0x298
driver_probe_device+0x78/0x258
__device_attach_driver+0x274/0x350

The simplest way to fix the bug is to delete the routing table in its
entirety. dsa_tree_setup_routing_table() has no problem in regenerating
it even if we deleted links between ports other than those of switch N,
because dsa_link_touch() first checks whether the port pair already
exists in dst->rtable, allocating if not.
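
Why deleting the whole table is safe can be sketched with a simplified
userspace model. The types and helpers below (struct tree, link_touch(),
rtable_free_all(), rtable_count()) are hypothetical and much simpler than
the DSA structures; they only mirror the "look up first, allocate if
missing" behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified link between two ports. */
struct link {
	int from_port;
	int to_port;
	struct link *next;
};

/* Hypothetical tree holding the routing table as a singly linked list. */
struct tree {
	struct link *rtable;
};

/* Return the existing link for the pair, or allocate and add a new one. */
static struct link *link_touch(struct tree *t, int from_port, int to_port)
{
	struct link *l;

	assert(t != NULL);
	for (l = t->rtable; l; l = l->next)
		if (l->from_port == from_port && l->to_port == to_port)
			return l;

	l = malloc(sizeof(*l));
	if (!l)
		return NULL;
	l->from_port = from_port;
	l->to_port = to_port;
	l->next = t->rtable;
	t->rtable = l;
	return l;
}

/* Drop every link so no entry can point at freed port memory. */
static void rtable_free_all(struct tree *t)
{
	struct link *l = t->rtable;

	while (l) {
		struct link *next = l->next;

		free(l);
		l = next;
	}
	t->rtable = NULL;
}

static int rtable_count(const struct tree *t)
{
	int n = 0;

	for (const struct link *l = t->rtable; l; l = l->next)
		n++;
	return n;
}
```

Because link_touch() is idempotent, a later setup pass can rebuild any
links that were deleted along with the failing switch's entries.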

The deletion of the routing table in its entirety already exists in
dsa_tree_teardown(), so refactor that into a function that can also be
called from the tree setup error path.

In my analysis of the commit to blame, it is the one which added
dsa_link elements to dst->rtable. Prior to that, each switch had its own
ds->rtable which is freed when the switch fails to probe. But the tree
is potentially persistent memory.

Fixes: c5f51765a1f6 ("net: dsa: list DSA links in the fabric")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414213001.2957964-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: clean up FDB, MDB, VLAN entries on unbind

As explained in many places such as commit b117e1e8a86d ("net: dsa:
delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given
the assumption that higher layers have balanced additions/deletions.
As such, it only makes sense to be extremely vocal when those
assumptions are violated and the driver unbinds with entries still
present.

But Ido Schimmel points out a very simple situation where that is wrong:
https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/
(also briefly discussed by me in the aforementioned commit).

Basically, while the bridge bypass operations are not something that DSA
explicitly documents, and for the majority of DSA drivers this API
simply causes them to go to promiscuous mode, that isn't the case for
all drivers. Some have the necessary requirements for bridge bypass
operations to do something useful - see dsa_switch_supports_uc_filtering().

Although in tools/testing/selftests/net/forwarding/local_termination.sh,
we made an effort to popularize better mechanisms to manage address
filters on DSA interfaces from user space - namely macvlan for unicast,
and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the
fact is that 'bridge fdb add ... self static local' also exists as
kernel UAPI, and might be useful to someone, even if only for a quick
hack.

It seems counter-productive to block that path by implementing shim
.ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP
in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from
running, although we could do that.

Accepting that cleanup is necessary seems to be the only option.
Especially since we appear to be coming back at this from a different
angle as well. Russell King is noticing that the WARN_ON() triggers even
for VLANs:
https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/

What happens in the bug report above is that dsa_port_do_vlan_del() fails,
then the VLAN entry lingers on, and then we warn on unbind and leak it.

This is not a straight revert of the blamed commit, but we now add an
informational print to the kernel log (to still have a way to see
that bugs exist), and some extra comments gathered from past years'
experience, to justify the logic.

Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupported

Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
from a user port fails with -ENOENT:
https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/

This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
which tries to find an MST entry in &chip->msts associated with the SID,
but fails and returns -ENOENT as such.

But we know that this chip does not support MST at all, so that is not
surprising. The question is why does the guard in mv88e6xxx_mst_put()
not exit early:

if (!sid)
return 0;

And the answer seems to be simple: the sid comes from vlan.sid which
supposedly was previously populated by mv88e6xxx_vtu_get().
But some chip->info->ops->vtu_getnext() implementations do not populate
vlan.sid, for example see mv88e6185_g1_vtu_getnext(). In that case,
later in mv88e6xxx_port_vlan_leave() we are using a garbage sid which is
just residual stack memory.

Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge
VLAN mapped to the default MSTI. For some chips, SID 0 is valid and
installed by mv88e6xxx_stu_setup(). A chip which does not support the
STU would implicitly only support mapping all VLANs to the default MSTI,
so although SID 0 is not valid, it would be sufficient, if we were to
zero-initialize the vlan structure, to fix the bug, due to the
coincidence that a test for vlan.sid == 0 already exists and leads to
the same (correct) behavior.

Another option which would be sufficient would be to add a test for
mv88e6xxx_has_stu() inside mv88e6xxx_mst_put(), symmetric to the one
which already exists in mv88e6xxx_mst_get(). But that placement means
the caller will have to dereference vlan.sid, which means it will access
uninitialized memory, which is not nice even if it ignores it later.

So we end up making both modifications, in order to not rely just on the
sid == 0 coincidence, but also to avoid having uninitialized structure
fields which might get temporarily accessed.
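
A minimal userspace sketch of the two modifications combined, assuming a reduced struct vtu_entry and helper names (vtu_getnext_no_sid(), mst_put(), port_vlan_leave()) that are illustrative only and do not match the driver's real signatures:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced VTU entry; only the fields needed here. */
struct vtu_entry {
	unsigned short vid;
	unsigned char sid;
	bool valid;
};

/* Models a vtu_getnext() implementation that never writes entry->sid. */
void vtu_getnext_no_sid(struct vtu_entry *entry, unsigned short vid)
{
	entry->vid = vid;
	entry->valid = true;
}

/* Models the put side: exit early without an STU, then for SID 0. */
int mst_put(bool has_stu, unsigned char sid)
{
	if (!has_stu)
		return 0;

	if (!sid)
		return 0;

	/* This sketch tracks no MST entries, so any other SID is unknown. */
	return -ENOENT;
}

int port_vlan_leave(bool has_stu, unsigned short vid)
{
	struct vtu_entry vlan;

	/* Zero the structure so no field holds residual stack contents. */
	memset(&vlan, 0, sizeof(vlan));
	vtu_getnext_no_sid(&vlan, vid);

	return mst_put(has_stu, vlan.sid);
}
```

Either guard alone makes port_vlan_leave() return 0 here; keeping both means the result does not hinge on the sid == 0 coincidence.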

Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212913.2955253-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registered

Russell King reports that a system with mv88e6xxx dereferences a NULL
pointer when unbinding this driver:
https://lore.kernel.org/netdev/Z_lRkMlTJ1KQ0kVX@shell.armlinux.org.uk/

The crash seems to be in devlink_region_destroy(), which is not NULL
tolerant but is given a NULL devlink global region pointer.

At least on some chips, some devlink regions are conditionally registered
since the blamed commit, see mv88e6xxx_setup_devlink_regions_global():

if (cond && !cond(chip))
continue;

These are MV88E6XXX_REGION_STU and MV88E6XXX_REGION_PVT. If the chip
does not have an STU or PVT, it should crash like this.

To fix the issue, avoid unregistering those regions which are NULL, i.e.
were skipped at mv88e6xxx_setup_devlink_regions_global() time.
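
The shape of the fix can be sketched in userspace as follows. The names (struct chip, setup_regions(), teardown_regions()) and the rule that only the last region is conditional are assumptions made for the example; free() stands in for a destroy call that, unlike free(), is not NULL tolerant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_REGIONS 3

struct region {
	int id;
};

struct chip {
	bool has_stu;
	struct region *regions[NUM_REGIONS];
};

/* Destroy only what was registered; returns how many were destroyed. */
int teardown_regions(struct chip *chip)
{
	int destroyed = 0;

	for (int i = 0; i < NUM_REGIONS; i++) {
		if (!chip->regions[i])
			continue;	/* skipped at setup time */

		free(chip->regions[i]);
		chip->regions[i] = NULL;
		destroyed++;
	}
	return destroyed;
}

/* In this sketch region 2 is only registered when the chip has an STU. */
int setup_regions(struct chip *chip)
{
	for (int i = 0; i < NUM_REGIONS; i++) {
		if (i == 2 && !chip->has_stu)
			continue;	/* slot stays NULL */

		chip->regions[i] = malloc(sizeof(struct region));
		if (!chip->regions[i]) {
			teardown_regions(chip);
			return -1;
		}
		chip->regions[i]->id = i;
	}
	return 0;
}
```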

Fixes: 836021a2d0e0 ("net: dsa: mv88e6xxx: Export cross-chip PVT as devlink region")
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250414212850.2953957-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: fix memory leak in txgbe_probe() error path

When txgbe_sw_init() is called, memory is allocated for wx->rss_key
in wx_init_rss_key(). However, in txgbe_probe() function, the subsequent
error paths after txgbe_sw_init() don't free the rss_key. Fix that by
freeing it in the error path along with wx->mac_table.

Also change the label to which execution jumps when txgbe_sw_init()
fails, because otherwise, it could lead to a double free for rss_key,
when the mac_table allocation fails in wx_sw_init().
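
A small sketch of the label arrangement, under the assumption that the init step cleans up after itself when it fails. struct adapter, sw_init() and probe() are hypothetical reductions, and the fail_* parameters exist only to force each path.

```c
#include <stdlib.h>

struct adapter {
	void *rss_key;
	void *mac_table;
};

/* On failure this leaves nothing allocated, so the caller must not free. */
int sw_init(struct adapter *a, int fail_mac_table)
{
	a->rss_key = malloc(40);
	if (!a->rss_key)
		return -1;

	a->mac_table = fail_mac_table ? NULL : malloc(64);
	if (!a->mac_table) {
		free(a->rss_key);
		a->rss_key = NULL;
		return -1;
	}
	return 0;
}

int probe(struct adapter *a, int fail_mac_table, int fail_later)
{
	int err;

	err = sw_init(a, fail_mac_table);
	if (err)
		goto err_out;		/* jumping to err_free_tables would double free */

	if (fail_later) {
		err = -1;
		goto err_free_tables;	/* later failures own both allocations */
	}
	return 0;

err_free_tables:
	free(a->rss_key);
	free(a->mac_table);
	a->rss_key = NULL;
	a->mac_table = NULL;
err_out:
	return err;
}
```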

Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number")
Reported-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415032910.13139-1-abdun.nihaal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: switchdev: do not notify new brentries as changed

When adding a bridge vlan that is pvid or untagged after the vlan has
already been added to any other switchdev backed port, the vlan change
will be propagated as changed, since the flags change.

This causes the vlan to not be added to the hardware for DSA switches,
since the DSA handler ignores any vlans for the CPU or DSA ports that
are changed.

E.g. the following order of operations would work:

$ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0
$ ip link set lan1 master swbridge
$ bridge vlan add dev swbridge vid 1 pvid untagged self
$ bridge vlan add dev lan1 vid 1 pvid untagged

but this order would break:

$ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0
$ ip link set lan1 master swbridge
$ bridge vlan add dev lan1 vid 1 pvid untagged
$ bridge vlan add dev swbridge vid 1 pvid untagged self

Additionally, the vlan on the bridge itself would become undeletable:

$ bridge vlan
port              vlan-id
lan1              1 PVID Egress Untagged
swbridge          1 PVID Egress Untagged
$ bridge vlan del dev swbridge vid 1 self
$ bridge vlan
port              vlan-id
lan1              1 PVID Egress Untagged
swbridge          1 Egress Untagged

since the vlan was never added to DSA's vlan list, so deleting it will
cause an error, causing the bridge code to not remove it.

Fix this by checking if flags changed only for vlans that are already
brentry and pass changed as false for those that become brentries, as
these are a new vlan (member) from the switchdev point of view.

Since *changed is set to true for becomes_brentry = true regardless of
would_change's value, this will not change any rtnetlink notification
delivery, just the value passed on to switchdev in vlan->changed.
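
The decision described above can be modelled roughly as below. This only covers a VLAN that already exists in the bridge's table; struct vlan_state, struct add_result and vlan_add_existing() are illustrative names, not the bridge code's.

```c
#include <stdbool.h>

struct vlan_state {
	bool is_brentry;
	unsigned int flags;
};

struct add_result {
	bool switchdev_changed;	/* value handed on to switchdev */
	bool notify;		/* whether rtnetlink listeners are told */
};

struct add_result vlan_add_existing(struct vlan_state *v, unsigned int new_flags)
{
	struct add_result res = { false, false };
	bool becomes_brentry = !v->is_brentry;
	bool would_change = false;

	/* Only an existing brentry can "change"; a new one is simply new. */
	if (!becomes_brentry)
		would_change = v->flags != new_flags;

	res.switchdev_changed = would_change;
	res.notify = becomes_brentry || would_change;

	v->is_brentry = true;
	v->flags = new_flags;
	return res;
}
```

With this split, a VLAN that becomes a brentry reaches switchdev as a new member, while user space still receives its notification.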

Fixes: 8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250414200020.192715-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: b53: enable BPDU reception for management port

For STP to work, receiving BPDUs is essential, but the appropriate bit
was never set. Without GC_RX_BPDU_EN, the switch chip will filter all
BPDUs, even if an appropriate PVID VLAN was set up.

Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250414200434.194422-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ynl-avoid-leaks-in-attr-override-and-spec-fixes-for-c'

Jakub Kicinski says:

====================
ynl: avoid leaks in attr override and spec fixes for C

The C rt-link work revealed more problems in existing codegen
and classic netlink specs.

Patches 1 - 4 fix issues with the codegen. Patches 1 and 2 are
pre-requisites for patch 3. Patch 3 fixes leaking memory if user
tries to override already set attr. Patch 4 validates attrs in case
kernel sends something we don't expect.

Remaining patches fix and align the specs. Patch 5 changes nesting,
the rest are naming adjustments.
====================

Link: https://patch.msgid.link/20250414211851.602096-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt-neigh: prefix struct nfmsg members with ndm

Attach ndm- to all members of struct nfmsg. We could possibly
use name-prefix just for C, but I don't think we have any precedent
for using name-prefix on structs, and other rtnetlink sub-specs
give full names for fixed header struct members.

Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt-link: adjust mctp attribute naming

MCTP attribute naming is inconsistent. In C we have:
    IFLA_MCTP_NET,
    IFLA_MCTP_PHYS_BINDING,
         ^^^^

but in YAML:
    - mctp-net
    - phys-binding
      ^
       no "mctp"

It's unclear whether the "mctp" part of the name is supposed
to be a prefix or part of attribute name. Make it a prefix,
seems cleaner, even tho technically phys-binding was added later.

Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rtnetlink: attribute naming corrections

Some attribute names diverge in very minor ways from the C names.
These are most likely typos, and they prevent the C codegen from
working.

Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink")
Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt-link: add an attr layer around alt-ifname

alt-ifname attr is directly placed in requests (as an alternative
to ifname) but in responses it's wrapped up in IFLA_PROP_LIST
and only there may it be multi-attr. See rtnl_fill_prop_list().

Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: make sure we validate subtype of array-nest

ArrayNest AKA indexed-array support currently skips inner type
validation. We count the attributes and then we parse them,
make sure we call validate, too. Otherwise buggy / unexpected
kernel response may lead to crashes.

Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: individually free previous values on double set

When user calls request_attrA_set() multiple times (for the same
attribute), and attrA is of type which allocates memory -
we try to free the previously associated values. For array
types (including multi-attr) we have only freed the array,
but the array may have contained pointers.

Refactor the code generation for free attr and reuse the generated
lines in setters to flush out the previous state. Since setters
are static inlines in the header we need to add forward declarations
for the free helpers of pure nested structs. Track which types get
used by arrays and include the right forward declarations.

At least ethtool string set and bit set would not be freed without
this. Tho, admittedly, overriding already set attribute twice is likely
a very very rare thing to do.
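
To make the leak concrete, here is a hand-written sketch of a setter for an array-of-strings attribute. It is not generated YNL output; struct req, req_free_names() and req_set_names() are hypothetical. The point is that a second call has to free each previously stored element, not just the array holding them.

```c
#include <stdlib.h>
#include <string.h>

struct req {
	unsigned int n_names;
	char **names;
};

/* Free every element, then the array itself. */
void req_free_names(struct req *r)
{
	for (unsigned int i = 0; i < r->n_names; i++)
		free(r->names[i]);
	free(r->names);
	r->names = NULL;
	r->n_names = 0;
}

int req_set_names(struct req *r, const char *const *names, unsigned int n)
{
	char **copy;
	unsigned int i;

	copy = calloc(n ? n : 1, sizeof(*copy));
	if (!copy)
		return -1;

	for (i = 0; i < n; i++) {
		copy[i] = malloc(strlen(names[i]) + 1);
		if (!copy[i]) {
			while (i--)
				free(copy[i]);
			free(copy);
			return -1;
		}
		strcpy(copy[i], names[i]);
	}

	/* Flush the previous state, elements included, before overriding. */
	req_free_names(r);
	r->names = copy;
	r->n_names = n;
	return 0;
}
```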

Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: move local vars after the opening bracket

The "function writing helper" tries to put local variables
between prototype and the opening bracket. Clearly wrong,
but up until now nothing actually uses it to write local
vars so it wasn't noticed.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: don't declare loop iterator in place

The codegen tries to follow the "old" C style and declare loop
iterators at the start of the block / function. Only nested
request handling breaks this style, so adjust it.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414211851.602096-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cxgb4: fix memory leak in cxgb4_init_ethtool_filters() error path

In the for loop used to allocate the loc_array and bmap for each port, a
memory leak is possible when the allocation for loc_array succeeds,
but the allocation for bmap fails. This is because when the control flow
goes to the label free_eth_finfo, only the allocations starting from
(i-1)th iteration are freed.

Fix that by freeing the loc_array in the bmap allocation error path.
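
The pattern is easy to reproduce outside the driver. In the sketch below, struct port_filters and init_filters() are hypothetical, and fail_bmap_at is a test knob that forces the second allocation to fail for one port.

```c
#include <stdlib.h>

struct port_filters {
	void *loc_array;
	void *bmap;
};

/* fail_bmap_at: index whose bmap allocation should fail, or -1 for none. */
int init_filters(struct port_filters *p, int nports, int fail_bmap_at)
{
	int i;

	for (i = 0; i < nports; i++) {
		p[i].loc_array = malloc(32);
		if (!p[i].loc_array)
			goto free_prev;

		p[i].bmap = (i == fail_bmap_at) ? NULL : malloc(8);
		if (!p[i].bmap) {
			/* The unwind below starts at i - 1, so free this one here. */
			free(p[i].loc_array);
			p[i].loc_array = NULL;
			goto free_prev;
		}
	}
	return 0;

free_prev:
	while (i-- > 0) {
		free(p[i].bmap);
		free(p[i].loc_array);
		p[i].bmap = NULL;
		p[i].loc_array = NULL;
	}
	return -1;
}
```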

Fixes: d915c299f1da ("cxgb4: add skeleton for ethtool n-tuple filters")
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250414170649.89156-1-abdun.nihaal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bcachefs: Add missing READ_ONCE() for metadata replicas

If we race with the user changing the metadata_replicas setting, this
could cause us to get an incorrectly sized disk reservation.
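
One way to picture the problem, as an assumption about its general shape rather than a description of the bcachefs code: if the setting is read more than once while computing a reservation, two reads can disagree. Taking a single snapshot avoids that. READ_ONCE_UINT(), struct fs_opts and reserve_metadata() below are illustrative names.

```c
/* Userspace approximation of a "read exactly once" access. */
#define READ_ONCE_UINT(x) (*(const volatile unsigned int *)&(x))

struct fs_opts {
	unsigned int metadata_replicas;	/* may be changed concurrently */
};

struct reservation {
	unsigned int nr_replicas;
	unsigned long sectors;
};

struct reservation reserve_metadata(struct fs_opts *opts,
				    unsigned long sectors_per_replica)
{
	/* Snapshot the option once; every derived value uses the same copy. */
	unsigned int replicas = READ_ONCE_UINT(opts->metadata_replicas);
	struct reservation res;

	res.nr_replicas = replicas;
	res.sectors = (unsigned long)replicas * sectors_per_replica;
	return res;
}
```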

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Bluetooth: vhci: Avoid needless snprintf() calls

Avoid double-copying of string literals. Use a "const char *" for each
string instead of copying from .rodata into stack and then into the skb.
We can go directly from .rodata to the skb.
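
In plain C the idea looks roughly like this. The enum, the placeholder strings and the append_state() helper are made up for illustration; memcpy() into a caller buffer stands in for appending to the skb.

```c
#include <stddef.h>
#include <string.h>

enum dump_state { DUMP_DONE, DUMP_ABORT, DUMP_TIMEOUT };

/* Hand back a pointer to the literal instead of formatting a stack copy. */
const char *dump_state_str(enum dump_state s)
{
	switch (s) {
	case DUMP_DONE:
		return "done";
	case DUMP_ABORT:
		return "abort";
	case DUMP_TIMEOUT:
		return "timeout";
	}
	return "unknown";
}

/* One copy, straight from the literal to the destination buffer. */
size_t append_state(char *dst, size_t room, enum dump_state s)
{
	const char *str = dump_state_str(s);
	size_t len = strlen(str);

	if (len > room)
		return 0;

	memcpy(dst, str, len);
	return len;
}
```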

This also works around a Clang bug (that has since been fixed[1]).

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401250927.1poZERd6-lkp@intel.com/
Fixes: ab4e4380d4e1 ("Bluetooth: Add vhci devcoredump support")
Link: https://github.com/llvm/llvm-project/commit/ea2e66aa8b6e363b89df66dc44275a0d7ecd70ce
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: l2cap: Process valid commands in too long frame

This is required for passing PTS test cases:
- L2CAP/COS/CED/BI-14-C
  Multiple Signaling Command in one PDU, Data Truncated, BR/EDR,
  Connection Request
- L2CAP/COS/CED/BI-15-C
  Multiple Signaling Command in one PDU, Data Truncated, BR/EDR,
  Disconnection Request

The test procedure defined in L2CAP.TS.p39 for both tests is:
1. The Lower Tester sends a C-frame to the IUT with PDU Length set
   to 8 and Channel ID set to the correct signaling channel for the
   logical link. The Information payload contains one L2CAP_ECHO_REQ
   packet with Data Length set to 0 with 0 octets of echo data and
   one command packet and Data Length set as specified in Table 4.6
   and the correct command data.
2. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester.
3. Perform alternative 3A, 3B, 3C, or 3D depending on the IUT’s
   response.
   Alternative 3A (IUT terminates the link):
     3A.1 The IUT terminates the link.
     3A.2 The test ends with a Pass verdict.
   Alternative 3B (IUT discards the frame):
     3B.1 The IUT does not send a reply to the Lower Tester.
   Alternative 3C (IUT rejects PDU):
     3C.1 The IUT sends an L2CAP_COMMAND_REJECT_RSP PDU to the
          Lower Tester.
   Alternative 3D (Any other IUT response):
     3D.1 The Upper Tester issues a warning and the test ends.
4. The Lower Tester sends a C-frame to the IUT with PDU Length set
   to 4 and Channel ID set to the correct signaling channel for the
   logical link. The Information payload contains Data Length set to
   0 with an L2CAP_ECHO_REQ packet with 0 octets of echo data.
5. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester.

With expected outcome:
  In Steps 2 and 5, the IUT responds with an L2CAP_ECHO_RSP.
  In Step 3A.1, the IUT terminates the link.
  In Step 3B.1, the IUT does not send a reply to the Lower Tester.
  In Step 3C.1, the IUT rejects the PDU.
  In Step 3D.1, the IUT sends any valid response.

Currently PTS fails with the following logs:
  Failed to receive ECHO RESPONSE.

And HCI logs:
> ACL Data RX: Handle 11 flags 0x02 dlen 20
      L2CAP: Information Response (0x0b) ident 2 len 12
        Type: Fixed channels supported (0x0003)
        Result: Success (0x0000)
        Channels: 0x000000000000002e
          L2CAP Signaling (BR/EDR)
          Connectionless reception
          AMP Manager Protocol
          L2CAP Signaling (LE)
> ACL Data RX: Handle 11 flags 0x02 dlen 13
        frame too long
        08 01 00 00 08 02 01 00 aa                       .........

Cc: stable@vger.kernel.org
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Merge tag 'devicetree-fixes-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- A couple of maintainers updates

- Remove obsolete Renesas TPU timer binding

- Add i.MX94 support to nxp,sysctr-timer and fsl,irqsteer

- Add support for 'data-lanes' property in fsl,imx8mq-nwl-dsi binding

* tag 'devicetree-fixes-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: soc: fsl: fsl,ls1028a-reset: Fix maintainer entry
  dt-bindings: timer: renesas,tpu: remove obsolete binding
  dt-bindings: timer: nxp,sysctr-timer: Add i.MX94 support
  dt-bindings: interrupt-controller: fsl,irqsteer: Add i.MX94 support
  dt-bindings: display: nwl-dsi: Allow 'data-lanes' property for port@1
  dt-bindings: xilinx: Remove myself from maintainership

Merge tag 'v6.15-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Disable ahash request chaining as it causes problems with the sa2ul
   driver

- Fix a couple of bugs in the new scomp stream freeing code

- Fix an old caam refcount underflow that is possibly showing up now
   because of the new parallel self-tests

- Fix regression in the tegra driver

* tag 'v6.15-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ahash - Disable request chaining
  crypto: scomp - Fix wild memory accesses in scomp_free_streams
  crypto: caam/qi - Fix drv_ctx refcount bug
  crypto: scomp - Fix null-pointer deref when freeing streams
  crypto: tegra - Fix IV usage for AES ECB

xfs: fix fsmap for internal zoned devices

Filesystems with an internal zoned rt section use xfs_rtblock_t values
that are relative to the start of the data device.  When fsmap reports
on internal rt sections, it reports the space used by the data section
as "OWN_FS".

Unfortunately, the logic for resuming a query isn't quite right, so
xfs/273 fails because it stress-tests GETFSMAP with a single-record
buffer.  If we enter the "report fake space as OWN_FS" block with a
nonzero key[0].fmr_length, we should add that to key[0].fmr_physical
and recheck if we still need to emit the fake record.  We should /not/
just return 0 from the whole function because that prevents all rmap
record iteration.

If we don't enter that block, the resumption is still wrong.
keys[*].fmr_physical is a reflection of what we copied out to userspace
on a previous query, which means that it already accounts for rgstart.
It is not correct to add rtstart_daddr when computing start_rtb or
end_rtb, so stop that.

While we're at it, add a xfs_has_zoned to make it clear that this is a
zoned filesystem thing.

Fixes: e50ec7fac81aa2 ("xfs: enable fsmap reporting for internal RT devices")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: Fix spelling mistake "drity" -> "dirty"

There is a spelling mistake in fs/xfs/xfs_log.c. Fix it.

Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

ata: libata-sata: Save all fields from sense data descriptor

When filling the taskfile result for a successful NCQ command, we use
the SDB FIS from the FIS Receive Area, see e.g. ahci_qc_ncq_fill_rtf().

However, the SDB FIS only has fields STATUS and ERROR.

For a successful NCQ command that has sense data, we will have a
successful sense data descriptor, in the Sense Data for Successful NCQ
Commands log.

Since we have access to additional taskfile result fields, fill in these
additional fields in qc->result_tf.

This matches how for failing/aborted NCQ commands, we will use e.g.
ahci_qc_fill_rtf() to fill in some fields, but then for the command that
actually caused the NCQ error, we will use ata_eh_read_log_10h(), which
provides additional fields, saving additional fields/overriding the
qc->result_tf that was fetched using ahci_qc_fill_rtf().

Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug

The ACPI byte code inside the ACPI control method responsible for
handling the WMI method calls uses a global buffer for constructing
the return value, yet the ACPI control method itself is not marked
as "Serialized".
This means that calling WMI methods on this WMI device is not
thread-safe, as concurrent WMI method calls will corrupt the global
buffer.

Fix this by serializing the WMI method calls using a mutex.
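
A userspace analogy, with a pthread mutex in place of the kernel mutex: shared_buf models the firmware's global buffer and firmware_method() the unserialized control method. All names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Models the global buffer the firmware uses to build its return value. */
static unsigned char shared_buf[32];
static pthread_mutex_t wmi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the firmware method: it scribbles its result into shared_buf. */
static void firmware_method(unsigned char arg)
{
	memset(shared_buf, arg, sizeof(shared_buf));
}

/* Call the method and copy the result out while holding the lock. */
int query(unsigned char arg, unsigned char *out, size_t len)
{
	if (len > sizeof(shared_buf))
		return -1;

	pthread_mutex_lock(&wmi_lock);
	firmware_method(arg);
	memcpy(out, shared_buf, len);
	pthread_mutex_unlock(&wmi_lock);

	return 0;
}
```

The copy has to happen inside the critical section; releasing the lock before reading the buffer would reopen the race.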

Cc: stable@vger.kernel.org # 6.x.x: 912d614ac99e: platform/x86: msi-wmi-platform: Rename "data" variable
Fixes: 9c0beb6b29e7 ("platform/x86: wmi: Add MSI WMI Platform driver")
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250414140453.7691-2-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Merge tag 'linux-can-fixes-for-6.15-20250415' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-04-15

The first patch is by Davide Caratti and fixes the missing decrement in
the protocol inuse counter for the J1939 CAN protocol.

The last patch is by Weizhao Ouyang and fixes a broken quirks check in
the rockchip CAN-FD driver.

* tag 'linux-can-fixes-for-6.15-20250415' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: rockchip_canfd: fix broken quirks checks
can: fix missing decrement of j1939_proto.inuse_idx
====================

Link: https://patch.msgid.link/20250415103401.445981-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

batman-adv: Fix double-hold of meshif when getting enabled

It was originally meant to replace the dev_hold with netdev_hold. But this
was missed in batadv_hardif_enable_interface(). As result, there was an
imbalance and a hang when trying to remove the mesh-interface with
(previously) active hard-interfaces:

unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

Fixes: 00b35530811f ("batman-adv: adopt netdev_hold() / netdev_put()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ff3aa851d46ab82953a3@syzkaller.appspotmail.com
Reported-by: syzbot+4036165fc595a74b09b2@syzkaller.appspotmail.com
Reported-by: syzbot+c35d73ce910d86c0026e@syzkaller.appspotmail.com
Reported-by: syzbot+48c14f61594bdfadb086@syzkaller.appspotmail.com
Reported-by: syzbot+f37372d86207b3bb2941@syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250414-double_hold_fix-v5-1-10e056324cde@narfation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fib_rules-fix-iif-oif-matching-on-l3-master-device'

Ido Schimmel says:

====================
fib_rules: Fix iif / oif matching on L3 master device

Patch #1 fixes a recently reported regression regarding FIB rules that
match on iif / oif being a VRF device.

Patch #2 adds test cases to the FIB rules selftest.
====================

Link: https://patch.msgid.link/20250414172022.242991-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_rule_tests: Add VRF match tests

Add tests for FIB rules that match on iif / oif being a VRF device. Test
both good and bad flows.

With previous patch ("net: fib_rules: Fix iif / oif matching on L3
master device"):

# ./fib_rule_tests.sh
[...]
Tests passed: 328
Tests failed: 0

Without it:

# ./fib_rule_tests.sh
[...]
Tests passed: 324
Tests failed: 4

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250414172022.242991-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fib_rules: Fix iif / oif matching on L3 master device

Before commit 40867d74c374 ("net: Add l3mdev index to flow struct and
avoid oif reset for port devices") it was possible to use FIB rules to
match on a L3 domain. This was done by having a FIB rule match on iif /
oif being a L3 master device. It worked because prior to the FIB rule
lookup the iif / oif fields in the flow structure were reset to the
index of the L3 master device to which the input / output device was
enslaved.

The above scheme made it impossible to match on the original input /
output device. Therefore, cited commit stopped overwriting the iif / oif
fields in the flow structure and instead stored the index of the
enslaving L3 master device in a new field ('flowi_l3mdev') in the flow
structure.

While the change enabled new use cases, it broke the original use case
of matching on a L3 domain. Fix this by interpreting the iif / oif
matching on a L3 master device as a match against the L3 domain. In
other words, if the iif / oif in the FIB rule points to a L3 master
device, compare the provided index against 'flowi_l3mdev' rather than
'flowi_{i,o}if'.

Before cited commit, a FIB rule that matched on 'iif vrf1' would only
match incoming traffic from devices enslaved to 'vrf1'. With the
proposed change (i.e., comparing against 'flowi_l3mdev'), the rule would
also match traffic originating from a socket bound to 'vrf1'. Avoid that
by adding a new flow flag ('FLOWI_FLAG_L3MDEV_OIF') that indicates if
the L3 domain was derived from the output interface or the input
interface (when not set) and take this flag into account when evaluating
the FIB rule against the flow structure.

Avoid unnecessary checks in the data path by detecting that a rule
matches on a L3 master device when the rule is installed and marking it
as such.
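
The matching rule can be sketched as follows. struct flow and struct rule are reduced, hypothetical versions of the flow structure and the FIB rule, and FLOW_FLAG_L3MDEV_OIF stands in for FLOWI_FLAG_L3MDEV_OIF; the *_is_l3_master fields represent the marking done at rule install time.

```c
#include <stdbool.h>

#define FLOW_FLAG_L3MDEV_OIF 0x1u	/* L3 domain was derived from the oif */

struct flow {
	int iif;
	int oif;
	int l3mdev;		/* index of the enslaving L3 master, if any */
	unsigned int flags;
};

struct rule {
	int iifindex;		/* 0 means "no iif selector" */
	bool iif_is_l3_master;
	int oifindex;		/* 0 means "no oif selector" */
	bool oif_is_l3_master;
};

bool rule_iif_match(const struct rule *r, const struct flow *fl)
{
	if (!r->iifindex)
		return true;

	if (r->iif_is_l3_master)
		return !(fl->flags & FLOW_FLAG_L3MDEV_OIF) &&
		       r->iifindex == fl->l3mdev;

	return r->iifindex == fl->iif;
}

bool rule_oif_match(const struct rule *r, const struct flow *fl)
{
	if (!r->oifindex)
		return true;

	if (r->oif_is_l3_master)
		return (fl->flags & FLOW_FLAG_L3MDEV_OIF) &&
		       r->oifindex == fl->l3mdev;

	return r->oifindex == fl->oif;
}
```

With the flag in place, an 'iif vrf1' rule does not match traffic that merely originates from a socket bound to 'vrf1'.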

Tested using the following script [1].

Output before 40867d74c374 (v5.4.291):

default dev dummy1 table 100 scope link
default dev dummy1 table 200 scope link

Output after 40867d74c374:

default dev dummy1 table 300 scope link
default dev dummy1 table 300 scope link

Output with this patch:

default dev dummy1 table 100 scope link
default dev dummy1 table 200 scope link

[1]
#!/bin/bash

ip link add name vrf1 up type vrf table 10
ip link add name dummy1 up master vrf1 type dummy

sysctl -wq net.ipv4.conf.all.forwarding=1
sysctl -wq net.ipv4.conf.all.rp_filter=0

ip route add table 100 default dev dummy1
ip route add table 200 default dev dummy1
ip route add table 300 default dev dummy1

ip rule add prio 0 oif vrf1 table 100
ip rule add prio 1 iif vrf1 table 200
ip rule add prio 2 table 300

ip route get 192.0.2.1 oif dummy1 fibmatch
ip route get 192.0.2.1 iif dummy1 from 198.51.100.1 fibmatch

Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: hanhuihui <hanhuihui5@huawei.com>
Closes: https://lore.kernel.org/netdev/ec671c4f821a4d63904d0da15d604b75@huawei.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250414172022.242991-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: bnxt: fix missing ring index trim on error path

Commit under Fixes converted tx_prod to be free running but missed
masking it on the Tx error path. This crashes on error conditions,
for example when DMA mapping fails.
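
For readers unfamiliar with free-running indices, a minimal sketch: the producer counter wraps at the width of its type, not at the ring size, so it must be masked before every use as an array index, including during unwinding. RING_SIZE, struct tx_ring, ring_idx() and unwind() are hypothetical.

```c
#include <stdint.h>

#define RING_SIZE 8u			/* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct tx_ring {
	uint16_t prod;			/* free running; not bounded by RING_SIZE */
	int slots[RING_SIZE];
};

/* Reduce a free-running index to a valid slot number. */
unsigned int ring_idx(uint16_t free_running)
{
	return free_running & RING_MASK;
}

/* Error path: release n slots that were filled starting at r->prod. */
void unwind(struct tx_ring *r, unsigned int n)
{
	uint16_t p = r->prod;

	for (unsigned int i = 0; i < n; i++, p++)
		r->slots[ring_idx(p)] = 0;	/* unmasked p would overrun slots[] */
}
```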

Fixes: 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250414143210.458625-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: am65-cpsw: fix port_np reference counting

A reference to the device tree node is stored in a private struct, thus
the reference count has to be incremented. Also, decrement the count on
device removal and in the error path.
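
The rule being applied can be sketched in user-space C as below. The
node_get()/node_put() helpers are hypothetical stand-ins for the kernel's
of_node_get()/of_node_put(), and the port structure is invented for
illustration.

```c
#include <stddef.h>

/* Hypothetical refcounted node, standing in for a device tree node. */
struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Private struct that keeps a pointer to the node. */
struct port {
	struct node *np;
};

/* Returns 0 on success, -1 on a (simulated) later failure. */
static int port_init(struct port *p, struct node *np, int fail_later)
{
	p->np = node_get(np);      /* the stored pointer holds its own reference */
	if (fail_later) {
		node_put(p->np);   /* error path drops that reference again */
		p->np = NULL;
		return -1;
	}
	return 0;
}

static void port_remove(struct port *p)
{
	node_put(p->np);           /* removal drops the stored reference */
	p->np = NULL;
}
```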

Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250414083942.4015060-1-mwalle@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-pf: handle otx2_mbox_get_rsp errors

Add an error pointer check after calling otx2_mbox_get_rsp().

This is similar to the commit bd3110bc102a
("octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_flows.c").
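
For readers unfamiliar with the error pointer convention, a simplified
user-space sketch follows. The err_ptr()/is_err()/ptr_err() helpers are
reduced stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and
get_rsp() is a hypothetical lookup, not otx2_mbox_get_rsp() itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical response lookup that may fail with an error pointer. */
static void *get_rsp(int ok, int *storage)
{
	return ok ? (void *)storage : err_ptr(-ENODEV);
}

static int read_rsp(int ok, int *storage, int *out)
{
	void *rsp = get_rsp(ok, storage);

	if (is_err(rsp))               /* check before dereferencing */
		return (int)ptr_err(rsp);

	*out = *(int *)rsp;
	return 0;
}
```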

Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Fixes: 6c40ca957fe5 ("octeontx2-pf: Adds TC offload support")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250412183327.3550970-1-chenyuan0y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "PCI: Avoid reset when disabled via sysfs"

This reverts commit 479380efe1625e251008d24b2810283db60d6fcd.

The reset_method attribute on a PCI device is only intended to manage the
availability of function scoped resets for a device. It was never intended
to restrict resets targeting the bus or slot.

In introducing a restriction that each device must support function level
reset by testing pci_reset_supported(), we essentially create a catch-22,
that a device must have a function scope reset in order to support bus/slot
reset, when we use bus/slot reset to effect a reset of a device that does
not support a function scoped reset, especially multi-function devices.

This breaks the majority of use cases where vfio-pci uses bus/slot resets
to manage multifunction devices that do not support function scoped resets.

Fixes: 479380efe162 ("PCI: Avoid reset when disabled via sysfs")
Reported-by: Cal Peake <cp@absolutedigital.net>
Closes: https://lore.kernel.org/all/808e1111-27b7-f35b-6d5c-5b275e73677b@absolutedigital.net
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220010
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250414211828.3530741-1-alex.williamson@redhat.com

bcachefs: snapshot_node_missing is now autofix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

spi: tegra210-quad: add rate limiting and simplify timeout error message

On malfunctioning hardware, timeout error messages can appear thousands
of times, creating unnecessary system pressure and log bloat. This patch
makes two improvements:

1. Replace dev_err() with dev_err_ratelimited() to prevent log flooding
when hardware errors persist
2. Remove the redundant timeout value parameter from the error message,
as 'ret' is always zero in this error path
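
The effect of rate limiting can be pictured with the following
user-space sketch of an interval/burst limiter. It is a simplified model
under assumed parameters, not the kernel's ratelimit implementation; the
struct, the function name and the explicit "now" argument are
illustrative.

```c
#include <stdbool.h>

/* Allow at most 'burst' messages per 'interval' time units. */
struct ratelimit {
	long interval;
	int burst;
	long begin;    /* start of the current window */
	int printed;   /* messages allowed in the current window */
	int missed;    /* messages suppressed so far */
};

static bool ratelimit_allow(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With an interval of 5 and a burst of 10, a thousand back-to-back errors
at the same instant produce ten messages in this model and the rest are
only counted.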

These changes reduce logging overhead while maintaining necessary error
reporting for debugging purposes.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250401-tegra-v2-2-126c293ec047@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: tegra210-quad: use WARN_ON_ONCE instead of WARN_ON for timeouts

Some machines with tegra_qspi_combined_seq_xfer hardware issues generate
excessive kernel warnings, severely polluting the logs:

dmesg | grep -i "WARNING:.*tegra_qspi_transfer_one_message" | wc -l
94451

This patch replaces WARN_ON with WARN_ON_ONCE for timeout conditions to
reduce log spam. The subsequent error message still prints on each
occurrence, providing sufficient information about the failure, while
the stack trace is only needed once for debugging purposes.
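
The behaviour described above (one stack trace, but an error message on
every occurrence) can be modelled in user-space C roughly like this. The
helper names and counters are hypothetical; the kernel's WARN_ON_ONCE()
is a macro with a per-call-site flag, which this sketch imitates with a
function-local static.

```c
#include <stdbool.h>
#include <stdio.h>

static int traces_printed;   /* how many "stack traces" were emitted */
static int errors_logged;    /* how many error messages were emitted */

/* Emit the expensive warning only the first time cond is true. */
static bool warn_on_once(bool cond, bool *warned, const char *where)
{
	if (cond && !*warned) {
		*warned = true;
		traces_printed++;
		fprintf(stderr, "WARNING: timeout in %s\n", where);
	}
	return cond;
}

static int handle_transfer(bool timed_out)
{
	static bool warned;   /* one flag per call site */

	if (warn_on_once(timed_out, &warned, __func__)) {
		errors_logged++;   /* the error message is still per occurrence */
		return -1;
	}
	return 0;
}
```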

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250401-tegra-v2-1-126c293ec047@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>

bcachefs: Log message when incompat version requested but not enabled

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print version_incompat_allowed on startup

Let users know if incompatible features aren't enabled.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Silence extent_poisoned error messages

extent poisoning is partly so that we don't keep spewing the dmesg log
when we've got unreadable data - we don't want to print these.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'edac_urgent_for_v6.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:
"Two fixes to the AMD translation library for the MI300 side of things:

   - Use the row[13] bit when calculating the memory row to retire

   - Mask the physical row address in order to avoid creating duplicate
     error records"

* tag 'edac_urgent_for_v6.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  RAS/AMD/FMPM: Get masked address
  RAS/AMD/ATL: Include row[13] bit in row retirement

Merge tag 'fs_for_v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull isofs fix from Jan Kara:
"Fix a case where isofs could be reading beyond end of the passed
file handle if its type was incorrectly set"

* tag 'fs_for_v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Prevent the use of too small fid

net: ngbe: fix memory leak in ngbe_probe() error path

When ngbe_sw_init() is called, memory is allocated for wx->rss_key
in wx_init_rss_key(). However, in ngbe_probe() function, the subsequent
error paths after ngbe_sw_init() don't free the rss_key. Fix that by
freeing it in error path along with wx->mac_table.

Also change the label to which execution jumps when ngbe_sw_init()
fails, because otherwise, it could lead to a double free for rss_key,
when the mac_table allocation fails in wx_sw_init().
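
The label choice can be illustrated with a small user-space sketch of
goto-based unwinding. The struct, sizes and function names are
hypothetical, and the assumption that the init helper frees its own
partial allocations on failure is inferred from the description above.

```c
#include <stdlib.h>

struct dev {
	void *rss_key;
	void *mac_table;
};

/* On failure this helper frees whatever it allocated itself. */
static int sw_init(struct dev *d, int fail_mac_table)
{
	d->rss_key = malloc(40);
	if (!d->rss_key)
		return -1;

	d->mac_table = fail_mac_table ? NULL : malloc(64);
	if (!d->mac_table) {
		free(d->rss_key);
		d->rss_key = NULL;
		return -1;
	}
	return 0;
}

static int probe(struct dev *d, int fail_mac_table, int fail_later)
{
	int err;

	err = sw_init(d, fail_mac_table);
	if (err)
		goto err_out;    /* sw_init cleaned up: must skip err_free */

	if (fail_later) {
		err = -1;
		goto err_free;   /* later failures free both allocations */
	}
	return 0;

err_free:
	free(d->mac_table);
	free(d->rss_key);
	d->mac_table = NULL;
	d->rss_key = NULL;
err_out:
	return err;
}
```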

Fixes: 02338c484ab6 ("net: ngbe: Initialize sw info and register netdev")
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250412154927.25908-1-abdun.nihaal@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

platform/x86: msi-wmi-platform: Rename "data" variable

Rename the "data" variable inside msi_wmi_platform_read() to avoid
a name collision when the driver adds support for a state container
struct (that is to be called "data" too) in the future.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250414140453.7691-1-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: alienware-wmi-wmax: Extend support to more laptops

Extend thermal control support to:

- Alienware Area-51m R2
- Alienware m16 R1
- Alienware m16 R2
- Dell G16 7630
- Dell G5 5505 SE

Cc: stable@vger.kernel.org
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250411-awcc-support-v1-2-09a130ec4560@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1

Some users report that Alienware m16 R1 models support G-Mode. This was
manually verified by inspecting their ACPI tables.

Cc: stable@vger.kernel.org
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250411-awcc-support-v1-1-09a130ec4560@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

can: rockchip_canfd: fix broken quirks checks

First get the devtype_data then check quirks.

Fixes: bbdffb341498 ("can: rockchip_canfd: add quirk for broken CAN-FD support")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250324114416.10160-1-o451686892@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: fix missing decrement of j1939_proto.inuse_idx

As with other protocols on top of the AF_CAN family, j1939_proto.inuse_idx
also needs to be decremented on socket dismantle.

Fixes: 6bffe88452db ("can: add protocol counter for AF_CAN sockets")
Reported-by: Oliver Hartkopp <socketcan@hartkopp.net>
Closes: https://lore.kernel.org/linux-can/7e35b13f-bbc4-491e-9081-fb939e1b8df0@hartkopp.net/
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/09ce71f281b9e27d1e3d1104430bf3fceb8c7321.1742292636.git.dcaratti@redhat.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: openvswitch: fix nested key length validation in the set() action

It's not safe to access nla_len(ovs_key) if the data is smaller than
the netlink header. Check that the attribute is OK first.
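
The kind of check meant here can be sketched in user-space C as follows.
The attr_hdr struct mirrors the general shape of a netlink attribute
header (16-bit length, 16-bit type), but the names are illustrative and
this is not the kernel's nla_ok().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header included) and type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/*
 * Validate an attribute before trusting its length field:
 *  1. enough bytes remain to hold a header at all,
 *  2. the claimed length covers at least the header,
 *  3. the claimed length does not exceed what remains.
 */
static bool attr_ok(const void *buf, int remaining)
{
	struct attr_hdr hdr;

	if (remaining < (int)sizeof(hdr))
		return false;   /* reading hdr.len would be out of bounds */

	memcpy(&hdr, buf, sizeof(hdr));
	return hdr.len >= sizeof(hdr) && hdr.len <= remaining;
}
```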

Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Reported-by: syzbot+b07a9da40df1576b8048@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b07a9da40df1576b8048
Tested-by: syzbot+b07a9da40df1576b8048@syzkaller.appspotmail.com
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250412104052.2073688-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
igc: Fix PTM timeout

Christopher S M Hall says:

There have been sporadic reports of PTM timeouts using i225/i226 devices.

These timeouts have been root caused to:

1) Manipulating the PTM status register while PTM is enabled
   and triggered
2) The hardware retrying too quickly when an inappropriate response
   is received from the upstream device

The issue can be reproduced with the following:

$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m

Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.

PHC2SYS exits with:

"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
  fails

The first patch in this series also resolves an issue reported by Corinna
Vinschen relating to kdump:

  This patch also fixes a hang in igc_probe() when loading the igc
  driver in the kdump kernel on systems supporting PTM.

  The igc driver running in the base kernel enables PTM trigger in
  igc_probe().  Therefore the driver is always in PTM trigger mode,
  except in brief periods when manually triggering a PTM cycle.

  When a crash occurs, the NIC is reset while PTM trigger is enabled.
  Due to a hardware problem, the NIC is subsequently in a bad busmaster
  state and doesn't handle register reads/writes.  When running
  igc_probe() in the kdump kernel, the first register access to a NIC
  register hangs driver probing and ultimately breaks kdump.

  With this patch, igc has PTM trigger disabled most of the time,
  and the trigger is only enabled for very brief (10 - 100 us) periods
  when manually triggering a PTM cycle.  Chances that a crash occurs
  during a PTM trigger are not zero, but extremely reduced.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: add lock preventing multiple simultaneous PTM transactions
  igc: cleanup PTP module if probe fails
  igc: handle the IGC_PTP_ENABLED flag correctly
  igc: move ktime snapshot into PTM retry loop
  igc: increase wait time before retrying PTM
  igc: fix PTM cycle trigger logic
====================

Link: https://patch.msgid.link/20250411162857.2754883-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: update HUGETLB reviewers

I have done quite some review on hugetlb code over the years, and some
work on it as well, the latest being the hugetlb pagewalk unification
which is a work in progress, and touches hugetlb code to great lengths.

HugeTLB does not have many reviewers, so I would like to help out by
offering myself as an official Reviewer.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Link: https://lkml.kernel.org/r/20250409082452.269180-1-osalvador@suse.de
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

netlink: specs: ovs_vport: align with C codegen capabilities

We started generating C code for OvS a while back, but actually
C codegen only supports fixed headers specified at the family
level right now (schema also allows specifying them per op).
ovs_flow and ovs_datapath already specify the fixed header
at the family level but ovs_vport does it per op.
Move the property, all ops use the same header.

This ensures YNL C sees the correct hdr_len:

   const struct ynl_family ynl_ovs_vport_family =  {
          .name           = "ovs_vport",
  -       .hdr_len        = sizeof(struct genlmsghdr),
  +       .hdr_len        = sizeof(struct genlmsghdr) + sizeof(struct ovs_header),
   };

Fixes: 7c59c9c8f202 ("tools: ynl: generate code for ovs families")
Link: https://patch.msgid.link/20250409145541.580674-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: don't mix device locking in dev_close_many() calls

Lockdep found the following dependency:

  &dev_instance_lock_key#3 -->
     &rdev->wiphy.mtx -->
        &net->xdp.lock -->
           &xs->mutex -->
              &dev_instance_lock_key#3

The first dependency is the problem. wiphy mutex should be outside
the instance locks. The problem happens in notifiers (as always)
for CLOSE. We only hold the instance lock for ops locked devices
during CLOSE, and WiFi netdevs are not ops locked. Unfortunately,
when we dev_close_many() during netns dismantle we may be holding
the instance lock of _another_ netdev when issuing a CLOSE for
a WiFi device.

Lockdep's "Possible unsafe locking scenario" only prints 3 locks
and we have 4, plus I think we'd need 3 CPUs, like this:

       CPU0                 CPU1              CPU2
       ----                 ----              ----
  lock(&xs->mutex);
                       lock(&dev_instance_lock_key#3);
                                         lock(&rdev->wiphy.mtx);
                                         lock(&net->xdp.lock);
                                         lock(&xs->mutex);
                       lock(&rdev->wiphy.mtx);
  lock(&dev_instance_lock_key#3);

Though, I don't think that's possible as CPU1 and CPU2 would
be under rtnl_lock. Even if we have per-netns rtnl_lock and
wiphy can span network namespaces - CPU0 and CPU1 must be
in the same netns to see dev_instance_lock, so CPU0 can't
be installing a socket as CPU1 is tearing the netns down.

Regardless, our expected lock ordering is that wiphy lock
is taken before instance locks, so let's fix this.

Go over the ops locked and non-locked devices separately.
Note that calling dev_close_many() on an empty list is perfectly
fine. All processing (including RCU syncs) is already conditional
on the list not being empty.

Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Reported-by: syzbot+6f588c78bf765b62b450@syzkaller.appspotmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250412233011.309762-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

- Fix hang in bnxt_re due to miscomputing the budget

- Avoid a -Wformat-security message in dev_set_name()

- Avoid an unused definition warning in fs.c with some kconfigs

- Fix error handling in usnic and remove IS_ERR_OR_NULL() usage

- Regression in RXE support found by blktests due to missing ODP
   exclusions

- Set the dma_segment_size on HNS so it doesn't corrupt DMA when using
   very large IOs

- Move an INIT_WORK to near when the work is allocated in cm.c to fix a
   racy crash where work in progress was being init'd

- Use __GFP_NOWARN to not dump in kvcalloc() if userspace requests a
   very big MR

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: Remove unusable nq variable
  RDMA/core: Silence oversized kvmalloc() warning
  RDMA/cma: Fix workqueue crash in cma_netevent_work_handler
  RDMA/hns: Fix wrong maximum DMA segment size
  RDMA/rxe: Fix null pointer dereference in ODP MR check
  RDMA/mlx5: Fix compilation warning when USER_ACCESS isn't set
  RDMA/usnic: Fix passing zero to PTR_ERR in usnic_ib_pci_probe()
  RDMA/ucaps: Avoid format-security warning
  RDMA/bnxt_re: Fix budget handling of notification queue

Merge tag 'vfs-6.15-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Fix NULL pointer dereference in virtiofs

- Fix slab OOB access in hfs/hfsplus

- Only create /proc/fs/netfs when CONFIG_PROC_FS is set

- Fix getname_flags() to initialize pointer correctly

- Convert dentry flags to enum

- Don't allow datadir without lowerdir in overlayfs

- Use namespace_{lock,unlock} helpers in dissolve_on_fput() instead of
   plain namespace_sem so unmounted mounts are properly cleaned up

- Skip unnecessary ifs_block_is_uptodate check in iomap

- Remove an unused forward declaration in overlayfs

- Fix devpts uid/gid handling after converting to the new mount api

- Fix afs_dynroot_readdir() to not use the RCU read lock

- Fix mount_setattr() and open_tree_attr() to not pointlessly do path
   lookup or walk the mount tree if no mount option change has been
   requested

* tag 'vfs-6.15-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: use namespace_{lock,unlock} in dissolve_on_fput()
  iomap: skip unnecessary ifs_block_is_uptodate check
  fs: Fix filename init after recent refactoring
  netfs: Only create /proc/fs/netfs with CONFIG_PROC_FS
  mount: ensure we don't pointlessly walk the mount tree
  dcache: convert dentry flag macros to enum
  afs: Fix afs_dynroot_readdir() to not use the RCU read lock
  hfs/hfsplus: fix slab-out-of-bounds in hfs_bnode_read_key
  virtiofs: add filesystem context source name check
  devpts: Fix type for uid and gid params
  ovl: remove unused forward declaration
  ovl: don't allow datadir only

Merge tag 'perf-tools-fixes-for-v6.15-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
"A couple of fixes and the usual tooling header updates:

   - fix a build error on ARM64 when libunwind is requested

   - fix an infinite loop with branch stack on AMD Zen3

   - sync tooling headers with the kernel source"

* tag 'perf-tools-fixes-for-v6.15-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf tools: Remove evsel__handle_error_quirks()
  perf libunwind arm64: Fix missing close parens in an if statement
  tools headers: Update the arch/x86/lib/memset_64.S copy with the kernel sources
  tools headers: Update the x86 headers with the kernel sources
  tools headers: Update the linux/unaligned.h copy with the kernel sources
  tools headers: Update the uapi/asm-generic/mman-common.h copy with the kernel sources
  tools headers: Update the uapi/linux/prctl.h copy with the kernel sources
  tools headers: Update the syscall table with the kernel sources
  tools headers: Update the VFS headers with the kernel sources
  tools headers: Update the uapi/linux/perf_event.h copy with the kernel sources
  tools headers: Update the socket headers with the kernel sources
  tools headers: Update the KVM headers with the kernel sources

vfio/pci: Virtualize zero INTx PIN if no pdev->irq

Typically pdev->irq is consistent with whether the device itself
supports INTx, where device support is reported via the PIN register.
Therefore the PIN register is often already zero if pdev->irq is zero.

Recently virtualization of the PIN register was expanded to include
the case where the device supports INTx but the platform does not
route the interrupt. This is reported by a value of IRQ_NOTCONNECTED
on some architectures. Other architectures just report zero for
pdev->irq.

We already disallow INTx setup if pdev->irq is zero, therefore add
this to the PIN register virtualization criteria so that a consistent
view is provided to userspace through virtualized config space and
ioctls.

Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://lore.kernel.org/all/174231895238.2295.12586708771396482526.stgit@linux.ibm.com/
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://lore.kernel.org/r/20250320194145.2816379-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

spi: sun4i: add support for GPIO chip select lines

Set use_gpio_descriptors to true so that GPIOs can be used for chip
select in accordance with the DT binding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20250410115303.5150-1-mans@mansr.com
Signed-off-by: Mark Brown <broonie@kernel.org>

slab: ensure slab->obj_exts is clear in a newly allocated slab page

ktest recently reported crashes while running several buffered io tests
with __alloc_tagging_slab_alloc_hook() at the top of the crash call stack.
The signature indicates an invalid address dereference with low bits of
slab->obj_exts being set. The bits were outside of the range used by
page_memcg_data_flags and objext_flags and hence were not masked out
by slab_obj_exts() when obtaining the pointer stored in slab->obj_exts.
The typical crash log looks like this:

00510 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
00510 Mem abort info:
00510   ESR = 0x0000000096000045
00510   EC = 0x25: DABT (current EL), IL = 32 bits
00510   SET = 0, FnV = 0
00510   EA = 0, S1PTW = 0
00510   FSC = 0x05: level 1 translation fault
00510 Data abort info:
00510   ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
00510   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
00510   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00510 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000104175000
00510 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00510 Internal error: Oops: 0000000096000045 [#1]  SMP
00510 Modules linked in:
00510 CPU: 10 UID: 0 PID: 7692 Comm: cat Not tainted 6.15.0-rc1-ktest-g189e17946605 #19327 NONE
00510 Hardware name: linux,dummy-virt (DT)
00510 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00510 pc : __alloc_tagging_slab_alloc_hook+0xe0/0x190
00510 lr : __kmalloc_noprof+0x150/0x310
00510 sp : ffffff80c87df6c0
00510 x29: ffffff80c87df6c0 x28: 000000000013d1ff x27: 000000000013d200
00510 x26: ffffff80c87df9e0 x25: 0000000000000000 x24: 0000000000000001
00510 x23: ffffffc08041953c x22: 000000000000004c x21: ffffff80c0002180
00510 x20: fffffffec3120840 x19: ffffff80c4821000 x18: 0000000000000000
00510 x17: fffffffec3d02f00 x16: fffffffec3d02e00 x15: fffffffec3d00700
00510 x14: fffffffec3d00600 x13: 0000000000000200 x12: 0000000000000006
00510 x11: ffffffc080bb86c0 x10: 0000000000000000 x9 : ffffffc080201e58
00510 x8 : ffffff80c4821060 x7 : 0000000000000000 x6 : 0000000055555556
00510 x5 : 0000000000000001 x4 : 0000000000000010 x3 : 0000000000000060
00510 x2 : 0000000000000000 x1 : ffffffc080f50cf8 x0 : ffffff80d801d000
00510 Call trace:
00510  __alloc_tagging_slab_alloc_hook+0xe0/0x190 (P)
00510  __kmalloc_noprof+0x150/0x310
00510  __bch2_folio_create+0x5c/0xf8
00510  bch2_folio_create+0x2c/0x40
00510  bch2_readahead+0xc0/0x460
00510  read_pages+0x7c/0x230
00510  page_cache_ra_order+0x244/0x3a8
00510  page_cache_async_ra+0x124/0x170
00510  filemap_readahead.isra.0+0x58/0xa0
00510  filemap_get_pages+0x454/0x7b0
00510  filemap_read+0xdc/0x418
00510  bch2_read_iter+0x100/0x1b0
00510  vfs_read+0x214/0x300
00510  ksys_read+0x6c/0x108
00510  __arm64_sys_read+0x20/0x30
00510  invoke_syscall.constprop.0+0x54/0xe8
00510  do_el0_svc+0x44/0xc8
00510  el0_svc+0x18/0x58
00510  el0t_64_sync_handler+0x104/0x130
00510  el0t_64_sync+0x154/0x158
00510 Code: d5384100 f9401c01 b9401aa3 b40002e1 (f8227881)
00510 ---[ end trace 0000000000000000 ]---
00510 Kernel panic - not syncing: Oops: Fatal exception
00510 SMP: stopping secondary CPUs
00510 Kernel Offset: disabled
00510 CPU features: 0x0000,000000e0,00000410,8240500b
00510 Memory Limit: none

Investigation indicates that these bits are already set when we allocate
a slab page and are not zeroed out after allocation. We are not yet sure
why these crashes started happening only recently but regardless of the
reason, not initializing a field that gets used later is wrong. Fix it
by initializing slab->obj_exts during slab page allocation.

Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Tested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250411155737.1360746-1-surenb@google.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

xfs: compute buffer address correctly in xmbuf_map_backing_mem

Prior to commit e614a00117bc2d, xmbuf_map_backing_mem relied on
folio_file_page to return the base page for the xmbuf's loff_t in the
xfile, and set b_addr to the page_address of that base page.

Now that folio_file_page has been removed from xmbuf_map_backing_mem, we
always set b_addr to the folio_address of the folio. This is correct
for the situation where the folio size matches the buffer size, but it's
totally wrong if tmpfs uses large folios. We need to use
offset_in_folio here.
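
The address computation can be illustrated with a user-space sketch. It
assumes folios are power-of-two sized and naturally aligned within the
file; the function names are hypothetical stand-ins and do not reproduce
the kernel's folio API.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of file position 'pos' within the folio that contains it. */
static size_t offset_in_folio_sketch(size_t folio_size, uint64_t pos)
{
	return (size_t)(pos & (folio_size - 1));
}

/*
 * Buffer address for file position 'pos': the start of the folio's
 * memory plus the offset of 'pos' inside the folio. Returning
 * folio_base alone is only right when the offset happens to be zero,
 * for example when the folio is exactly as large as the buffer.
 */
static unsigned char *buf_addr(unsigned char *folio_base, size_t folio_size,
			       uint64_t pos)
{
	return folio_base + offset_in_folio_sketch(folio_size, pos);
}
```

For a 16 KiB folio, a buffer at file offset 4096 therefore lives 4096
bytes into the folio's memory rather than at its start.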

Found via xfs/801, which demonstrated evidence of corruption of an
in-memory rmap btree block right after initializing an adjacent block.

Fixes: e614a00117bc2d ("xfs: cleanup mapping tmpfs folios into the buffer cache")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: add tunable threshold parameter for triggering zone GC

Presently we start garbage collection late - when we start running
out of free zones to backfill max_open_zones. This is a reasonable
default as it minimizes write amplification. The longer we wait,
the more blocks are invalidated and reclaim costs less in terms
of blocks to relocate.

Starting this late however introduces a risk of GC being outcompeted
by user writes. If GC can't keep up, user writes will be forced to
wait for free zones with high tail latencies as a result.

This is not a problem under normal circumstances, but if fragmentation
is bad and user write pressure is high (multiple full-throttle
writers) we will "bottom out" of free zones.

To mitigate this, introduce a zonegc_low_space tunable that lets the
user specify a percentage of how much of the unused space that GC
should keep available for writing. A high value will reclaim more of
the space occupied by unused blocks, creating a larger buffer against
write bursts.

This comes at a cost as write amplification is increased. To
illustrate this using a sample workload, setting zonegc_low_space to
60% avoids high (500ms) max latencies while increasing write
amplification by 15%.
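
One way to picture such a threshold is the sketch below. It is only an
assumed reading of the description: "unused" is taken to mean blocks not
holding valid data, "available" means blocks that can be written without
reclaim, and the function name and exact comparison are hypothetical
rather than the XFS implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether GC should run: keep at least low_space_pct percent of
 * the unused blocks immediately available for writing.
 */
static bool gc_wanted(uint64_t unused_blocks, uint64_t available_blocks,
		      unsigned int low_space_pct)
{
	/* unused * pct / 100, split to avoid overflow on large counts */
	uint64_t target = unused_blocks / 100 * low_space_pct +
			  unused_blocks % 100 * low_space_pct / 100;

	return available_blocks < target;
}
```

In this model a value of 0 never triggers GC through this check, while
60 asks GC to start as soon as fewer than 60% of the unused blocks are
directly writable.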

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: mark xfs_buf_free as might_sleep()

xfs_buf_free can call vunmap, which can sleep. The vunmap path is an
unlikely one, so add might_sleep to ensure calling xfs_buf_free from
atomic context gets caught more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: remove the leftover xfs_{set,clear}_li_failed infrastructure

Marking a log item as failed kept a buffer reference around for
resubmission of inode and dquot items.

For inode items commit 298f7bec503f3 ("xfs: pin inode backing buffer to
the inode log item") started pinning the inode item buffers
unconditionally and removed the need for this. Later commit acc8f8628c37
("xfs: attach dquot buffer to dquot log item buffer") did the same for
dquot items but didn't fully clean up the xfs_clear_li_failed side
for them. Stop adding the extra pin for dquot items and remove the
helpers.

This happens to fix a call to xfs_buf_free with the AIL lock held,
which would be incorrect in the unlikely case that freeing the buffer
ends up calling vfree.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

bcachefs: btree_root_unreadable_and_scan_found_nothing now AUTOFIX

This will likely mean that the btree had only one node - there was
nothing or almost nothing in it, and we should reconstruct and continue.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Linux 6.15-rc2

bcachefs: fix bch2_dev_usage_full_read_fast()

One reference to bch_dev_usage wasn't updated, which meant we weren't
reading the full bch_dev_usage_full - oops.

Fixes: 955ba7b5ea03 ("bcachefs: bch_dev_usage_full")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

- Properly handle errors when file-backed I/O fails

- Fix compilation issues on ARM platform (arm-linux-gnueabi)

- Fix parsing of encoded extents

- Minor cleanup

* tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: remove duplicate code
  erofs: fix encoded extents handling
  erofs: add __packed annotation to union(__le16..)
  erofs: set error to bio if file-backed IO fails