]> www.infradead.org Git - users/hch/xfs.git/log
users/hch/xfs.git
2 months agoxfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay
Christoph Hellwig [Tue, 10 Sep 2024 04:58:17 +0000 (07:58 +0300)]
xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay

The zone allocator wants to be able to remove a delalloc mapping in the
COW fork while keeping the block reservation.  To support that pass the
flags argument down to xfs_bmap_del_extent_delay and support the
XFS_BMAPI_REMAP flag to keep the reservation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: refine the unaligned check for always COW inodes in xfs_file_dio_write
Christoph Hellwig [Fri, 27 Oct 2023 07:58:24 +0000 (09:58 +0200)]
xfs: refine the unaligned check for always COW inodes in xfs_file_dio_write

For always COW inodes we also must check the alignment of each individual
iovec segment, as they could end up with different I/Os due to the way
bio_iov_iter_get_pages works, and we'd then overwrite an already written
block.  The existing always_cow sysctl based code doesn't catch this
because nothing enforces that blocks aren't rewritten, but for zoned XFS
on sequential write required zones this is a hard error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: skip always_cow inodes in xfs_reflink_trim_around_shared
Christoph Hellwig [Sat, 14 Oct 2023 05:50:35 +0000 (07:50 +0200)]
xfs: skip always_cow inodes in xfs_reflink_trim_around_shared

xfs_reflink_trim_around_shared tries to find shared blocks in the
refcount btree.  Always_cow inodes don't have that tree, so don't
bother.

For the existing always_cow code this is a minor optimization.  For
the upcoming zoned code that can do COW without the rtreflink code it
avoids triggering a NULL pointer dereference.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c
Christoph Hellwig [Sun, 17 Nov 2024 09:22:50 +0000 (10:22 +0100)]
xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c

Delalloc reservations are not supported in userspace, and thus it doesn't
make sense to share this helper with xfsprogs.c.  Move it to xfs_iomap.c
toward the two callers.

Note that there rest of the delalloc handling should probably eventually
also move out of xfs_bmap.c, but that will require a bit more surgery.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: add a rtg_blocks helper
Christoph Hellwig [Sat, 21 Dec 2024 08:38:28 +0000 (08:38 +0000)]
xfs: add a rtg_blocks helper

Shortcut dereferencing the xg_block_count field in the generic group
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: factor out a xfs_rt_check_size helper
Christoph Hellwig [Tue, 30 Jul 2024 23:42:42 +0000 (16:42 -0700)]
xfs: factor out a xfs_rt_check_size helper

Add a helper to check that the last block of a RT device is readable
to share the code between mount and growfs.  This also adds the mount
time overflow check to growfs and improves the error messages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: reduce metafile reservations
Christoph Hellwig [Fri, 17 Jan 2025 05:55:08 +0000 (06:55 +0100)]
xfs: reduce metafile reservations

There is no point in reserving more space than actually available
on the data device for the worst case scenario that is unlikely to
happen.  Reserve at most 1/4th of the data device blocks, which is
still a heuristic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: make metabtree reservations global
Christoph Hellwig [Sat, 15 Feb 2025 06:48:28 +0000 (07:48 +0100)]
xfs: make metabtree reservations global

Currently each metabtree inode has it's own space reservation to ensure
it can be expanded to the maximum size, mirroring what is done for the
AG-based btrees.  But unlike the AG-based btrees the metabtree inodes
aren't restricted to allocate from a single AG but can use free space
form the entire file system.  And unlike AG-based btrees where the
required reservation shrinks with the available free space due to this,
the metabtree reservations for the rtrmap and rtfreflink trees are not
bound in any way by the data device free space as they track RT extent
allocations.  This is not very efficient as it requires a large number
of blocks to be set aside that can't be used at all by other btrees.

Switch to a model that uses a global pool instead in preparation for
reducing the amount of reserved space, which now also removes the
overloading of the i_nblocks field for metabtree inodes, which would
create problems if metabtree inodes ever had a big enough xattr fork
to require xattr blocks outside the inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: fixup the metabtree reservation in xrep_reap_metadir_fsblocks
Christoph Hellwig [Sat, 15 Feb 2025 06:43:02 +0000 (07:43 +0100)]
xfs: fixup the metabtree reservation in xrep_reap_metadir_fsblocks

All callers of xrep_reap_metadir_fsblocks need to fix up the metabtree
reservation, otherwise they'd leave the reservations in an incoherent
state.  Move the call to xrep_reset_metafile_resv into
xrep_reap_metadir_fsblocks so it always is taken care of, and remove
now superfluous helper functions in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: trace in-memory freecounter reservations
Christoph Hellwig [Sun, 9 Feb 2025 05:08:20 +0000 (06:08 +0100)]
xfs: trace in-memory freecounter reservations

Add two tracepoints when the freecounter dips into the reserved pool
and when it is entirely out of space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: support reserved blocks for the rt extent counter
Christoph Hellwig [Sun, 9 Feb 2025 05:19:06 +0000 (06:19 +0100)]
xfs: support reserved blocks for the rt extent counter

The zoned space allocator will need reserved RT extents for garbage
collection and zeroing of partial blocks.  Move the resblks related
fields into the freecounter array so that they can be used for all
counters.

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoxfs: generalize the freespace and reserved blocks handling
Christoph Hellwig [Sun, 9 Feb 2025 04:43:50 +0000 (05:43 +0100)]
xfs: generalize the freespace and reserved blocks handling

xfs_{add,dec}_freecounter already handles the block and RT extent
percpu counters, but it currently hardcodes the passed in counter.

Add a freecounter abstraction that uses an enum to designate the counter
and add wrappers that hide the actual percpu_counters.  This will allow
expanding the reserved block handling to the RT extent counter in the
next step, and also prepares for adding yet another such counter that
can share the code.  Both these additions will be needed for the zoned
allocator.

Also switch the flooring of the frextents counter to 0 in statfs for the
rthinherit case to a manual min_t call to match the handling of the
fdblocks counter for normal file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2 months agoxfs: reflow xfs_dec_freecounter
Christoph Hellwig [Sun, 9 Feb 2025 04:33:43 +0000 (05:33 +0100)]
xfs: reflow xfs_dec_freecounter

Let the successful allocation be the main path through the function
with exception handling in branches to make the code easier to
follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: pass private data to iomap_truncate_page
Christoph Hellwig [Fri, 16 Aug 2024 16:49:13 +0000 (18:49 +0200)]
iomap: pass private data to iomap_truncate_page

Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: pass private data to iomap_zero_range
Christoph Hellwig [Fri, 16 Aug 2024 16:48:16 +0000 (18:48 +0200)]
iomap: pass private data to iomap_zero_range

Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: pass private data to iomap_page_mkwrite
Christoph Hellwig [Sat, 30 Nov 2024 03:43:35 +0000 (04:43 +0100)]
iomap: pass private data to iomap_page_mkwrite

Allow the file system to pass private data which can be used by the
iomap_begin and iomap_end methods through the private pointer in the
iomap_iter structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: add a io_private field to struct iomap_ioend
Christoph Hellwig [Tue, 7 Jan 2025 07:07:15 +0000 (08:07 +0100)]
iomap: add a io_private field to struct iomap_ioend

Add a private data field to struct iomap_ioend so that the file system
can attach information to it.  Zoned XFS will use this for a pointer to
the open zone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2 months agoiomap: optionally use ioends for direct I/O
Christoph Hellwig [Wed, 8 Jan 2025 18:36:49 +0000 (19:36 +0100)]
iomap: optionally use ioends for direct I/O

struct iomap_ioend currently tracks outstanding buffered writes and has
some really nice code in core iomap and XFS to merge contiguous I/Os
an defer them to userspace for completion in a very efficient way.

For zoned writes we'll also need a per-bio user context completion to
record the written blocks, and the infrastructure for that would look
basically like the ioend handling for buffered I/O.

So instead of reinventing the wheel, reuse the existing infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: factor out a iomap_dio_done helper
Christoph Hellwig [Mon, 16 Dec 2024 05:39:10 +0000 (06:39 +0100)]
iomap: factor out a iomap_dio_done helper

Split out the struct iomap-dio level final completion from
iomap_dio_bio_end_io into a helper to clean up the code and make it
reusable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: move common ioend code to ioend.c
Christoph Hellwig [Mon, 16 Dec 2024 05:32:35 +0000 (06:32 +0100)]
iomap: move common ioend code to ioend.c

This code will be reused for direct I/O soon, so split it out of
buffered-io.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: split bios to zone append limits in the submission handlers
Christoph Hellwig [Mon, 27 Jan 2025 14:33:47 +0000 (15:33 +0100)]
iomap: split bios to zone append limits in the submission handlers

Provide helpers for file systems to split bios in the direct I/O and
writeback I/O submission handlers.  The split ioends are chained to
the parent ioend so that only the parent ioend originally generated
by the iomap layer will be processed after all the chained off children
have completed.  This is based on the block layer bio chaining that has
supported a similar mechanism for a long time.

This Follows btrfs' lead and don't try to build bios to hardware limits
for zone append commands, but instead build them as normal unconstrained
bios and split them to the hardware limits in the I/O submission handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: add a IOMAP_F_ANON_WRITE flag
Christoph Hellwig [Sun, 5 Nov 2023 05:40:52 +0000 (06:40 +0100)]
iomap: add a IOMAP_F_ANON_WRITE flag

Add a IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not
have a target block assigned to it yet at iomap time and the file system
will do that in the bio submission handler, splitting the I/O as needed.

This is used to implement Zone Append based I/O for zoned XFS, where
splitting writes to the hardware limits and assigning a zone to them
happens just before sending the I/O off to the block layer, but could
also be useful for other things like compressed I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: simplify io_flags and io_type in struct iomap_ioend
Christoph Hellwig [Sun, 24 Nov 2024 12:53:36 +0000 (13:53 +0100)]
iomap: simplify io_flags and io_type in struct iomap_ioend

The ioend fields for distinct types of I/O are a bit complicated.
Consolidate them into a single io_flag field with it's own flags
decoupled from the iomap flags.  This also prepares for adding a new
flag that is unrelated to both of the iomap namespaces.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoiomap: allow the file system to submit the writeback bios
Christoph Hellwig [Tue, 5 Nov 2024 07:33:10 +0000 (08:33 +0100)]
iomap: allow the file system to submit the writeback bios

Change ->prepare_ioend to ->submit_ioend and require file systems that
implement it to submit the bio.  This is needed for file systems that
do their own work on the bios before submitting them to the block layer
like btrfs or zoned xfs.  To make this easier also pass the writeback
context to the method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2 months agoDocumentation: Document the new zoned loop block device driver
Damien Le Moal [Fri, 31 Jan 2025 06:41:46 +0000 (15:41 +0900)]
Documentation: Document the new zoned loop block device driver

Introduce the zoned_loop.rst documentation file under
admin-guide/blockdev to document the zoned loop block device driver.
An overview of the driver is provided and its usage to create and delete
zoned devices described.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2 months agoblock: new zoned loop block device driver
Damien Le Moal [Fri, 31 Jan 2025 06:41:45 +0000 (15:41 +0900)]
block: new zoned loop block device driver

The zoned loop block device driver allows a user to create emulated
zoned block devices using one regular file per zone as backing storage.
Compared to null_blk or scsi_debug, it has the advantage of allowing
emulating large zoned devices without requiring the same amount of
memory as the capacity of the emulated device. Furthermore, zoned
devices emulated with this driver can be re-started after a host reboot
without any loss of the state of the device zones, which is something
that null_blk and scsi_debug do not support.

This initial implementation is simple and does not support zone resource
limits. That is, a zoned loop block device limits for the maximum number
of open zones and maximum number of active zones is always 0.

This driver can be either compiled in-kernel or as a module, named
"zloop". Compilation of this driver depends on the block layer support
for zoned block device (CONFIG_BLK_DEV_ZONED must be set).

Using the zloop driver to create and delete zoned block devices is
done by writing commands to the zoned loop control character device file
(/dev/zloop-control). Creating a device is done with:

  $ echo "add [options]" > /dev/zloop-control

The options available for the "add" operation cat be listed by reading
the zloop-control device file:

  $ cat /dev/zloop-control
  add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u
  remove id=%d

The options available allow controlling the zoned device total
capacity, zone size, zone capactity of sequential zones, total number
of conventional zones, base directory for the zones backing file, number
of I/O queues and the maximum queue depth of I/O queues.

Deleting a device is done using the "remove" command:

  $ echo "remove id=0" > /dev/zloop-control

This implementation passes various tests using zonefs and fio (t/zbd
tests) and provides a state machine for zone conditions that is
compliant with the T10 ZBC and NVMe ZNS specifications.

Co-developed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2 months agoTEMP: nvme-pci: disable async probe
Christoph Hellwig [Sun, 15 Oct 2023 07:26:20 +0000 (09:26 +0200)]
TEMP: nvme-pci: disable async probe

This keeps getting my ZNS vs ZNS drivers reordered a bit and is annoying
for testing.

Not-really-signed-off-by: Christoph Hellwig <hch@lst.de>
2 months agoxfs: flush inodegc before swapon
Christoph Hellwig [Thu, 6 Feb 2025 06:15:00 +0000 (07:15 +0100)]
xfs: flush inodegc before swapon

Fix the brand new xfstest that tries to swapon on a recently unshared
file and use the chance to document the other bit of magic in this
function.

The big comment is taken from a mailinglist post by Dave Chinner.

Fixes: 5e672cd69f0a53 ("xfs: introduce xfs_inodegc_push()")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate
Christoph Hellwig [Thu, 6 Feb 2025 06:15:01 +0000 (07:15 +0100)]
xfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate

Match the method name and the naming convention or address_space
operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: Do not allow norecovery mount with quotacheck
Carlos Maiolino [Mon, 3 Feb 2025 13:04:57 +0000 (14:04 +0100)]
xfs: Do not allow norecovery mount with quotacheck

Mounting a filesystem that requires quota state changing will generate a
transaction.

We already check for a read-only device; we should do that for
norecovery too.

A quotacheck on a norecovery mount, and with the right log size, will cause
the mount process to hang on:

[<0>] xlog_grant_head_wait+0x5d/0x2a0 [xfs]
[<0>] xlog_grant_head_check+0x112/0x180 [xfs]
[<0>] xfs_log_reserve+0xe3/0x260 [xfs]
[<0>] xfs_trans_reserve+0x179/0x250 [xfs]
[<0>] xfs_trans_alloc+0x101/0x260 [xfs]
[<0>] xfs_sync_sb+0x3f/0x80 [xfs]
[<0>] xfs_qm_mount_quotas+0xe3/0x2f0 [xfs]
[<0>] xfs_mountfs+0x7ad/0xc20 [xfs]
[<0>] xfs_fs_fill_super+0x762/0xa50 [xfs]
[<0>] get_tree_bdev_flags+0x131/0x1d0
[<0>] vfs_get_tree+0x26/0xd0
[<0>] vfs_cmd_create+0x59/0xe0
[<0>] __do_sys_fsconfig+0x4e3/0x6b0
[<0>] do_syscall_64+0x82/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This is caused by a transaction running with bogus initialized head/tail

I initially hit this while running generic/050, with random log
sizes, but I managed to reproduce it reliably here with the steps
below:

mkfs.xfs -f -lsize=1025M -f -b size=4096 -m crc=1,reflink=1,rmapbt=1, -i
sparse=1 /dev/vdb2 > /dev/null
mount -o usrquota,grpquota,prjquota /dev/vdb2 /mnt
xfs_io -x -c 'shutdown -f' /mnt
umount /mnt
mount -o ro,norecovery,usrquota,grpquota,prjquota  /dev/vdb2 /mnt

Last mount hangs up

As we add yet another validation if quota state is changing, this also
add a new helper named xfs_qm_validate_state_change(), factoring the
quota state changes out of xfs_qm_newmount() to reduce cluttering
within it.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: do not check NEEDSREPAIR if ro,norecovery mount.
Lukas Herbolt [Mon, 3 Feb 2025 08:55:13 +0000 (09:55 +0100)]
xfs: do not check NEEDSREPAIR if ro,norecovery mount.

If there is corrutpion on the filesystem andxfs_repair
fails to repair it. The last resort of getting the data
is to use norecovery,ro mount. But if the NEEDSREPAIR is
set the filesystem cannot be mounted. The flag must be
cleared out manually using xfs_db, to get access to what
left over of the corrupted fs.

Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix data fork format filtering during inode repair
Darrick J. Wong [Mon, 3 Feb 2025 00:50:14 +0000 (16:50 -0800)]
xfs: fix data fork format filtering during inode repair

Coverity noticed that xrep_dinode_bad_metabt_fork never runs because
XFS_DINODE_FMT_META_BTREE is always filtered out in the mode selection
switch of xrep_dinode_check_dfork.

Metadata btrees are allowed only in the data forks of regular files, so
add this case explicitly.  I guess this got fubard during a refactoring
prior to 6.13 and I didn't notice until now. :/

Coverity-id: 1617714
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoxfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n
Darrick J. Wong [Mon, 3 Feb 2025 00:50:14 +0000 (16:50 -0800)]
xfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n

I received a report from the release engineering side of the house that
xfs_scrub without the -n flag (aka fix it mode) would try to fix a
broken filesystem even on a kernel that doesn't have online repair built
into it:

 # xfs_scrub -dTvn /mnt/test
 EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
 Phase 1: Find filesystem geometry.
 /mnt/test: using 1 threads to scrub.
 Phase 1: Memory used: 132k/0k (108k/25k), time:  0.00/ 0.00/ 0.00s
 <snip>
 Phase 4: Repair filesystem.
 <snip>
 Info: /mnt/test/some/victimdir directory entries: Attempting repair. (repair.c line 351)
 Corruption: /mnt/test/some/victimdir directory entries: Repair unsuccessful; offline repair required. (repair.c line 204)

Source: https://blogs.oracle.com/linux/post/xfs-online-filesystem-repair

It is strange that xfs_scrub doesn't refuse to run, because the kernel
is supposed to return EOPNOTSUPP if we actually needed to run a repair,
and xfs_io's repair subcommand will perror that.  And yet:

 # xfs_io -x -c 'repair probe' /mnt/test
 #

The first problem is commit dcb660f9222fd9 (4.15) which should have had
xchk_probe set the CORRUPT OFLAG so that any of the repair machinery
will get called at all.

It turns out that some refactoring that happened in the 6.6-6.8 era
broke the operation of this corner case.  What we *really* want to
happen is that all the predicates that would steer xfs_scrub_metadata()
towards calling xrep_attempt() should function the same way that they do
when repair is compiled in; and then xrep_attempt gets to return the
fatal EOPNOTSUPP error code that causes the probe to fail.

Instead, commit 8336a64eb75cba (6.6) started the failwhale swimming by
hoisting OFLAG checking logic into a helper whose non-repair stub always
returns false, causing scrub to return "repair not needed" when in fact
the repair is not supported.  Prior to that commit, the oflag checking
that was open-coded in scrub.c worked correctly.

Similarly, in commit 4bdfd7d15747b1 (6.8) we hoisted the IFLAG_REPAIR
and ALREADY_FIXED logic into a helper whose non-repair stub always
returns false, so we never enter the if test body that would have called
xrep_attempt, let alone fail to decode the OFLAGs correctly.

The final insult (yes, we're doing The Naked Gun now) is commit
48a72f60861f79 (6.8) in which we hoisted the "are we going to try a
repair?" predicate into yet another function with a non-repair stub
always returns false.

Fix xchk_probe to trigger xrep_probe if repair is enabled, or return
EOPNOTSUPP directly if it is not.  For all the other scrub types, we
need to fix the header predicates so that the ->repair functions (which
are all xrep_notsupported) get called to return EOPNOTSUPP.  Commit
48a72 is tagged here because the scrub code prior to LTS 6.12 are
incomplete and not worth patching.

Reported-by: David Flynn <david.flynn@oracle.com>
Cc: <stable@vger.kernel.org> # v6.8
Fixes: 8336a64eb75c ("xfs: don't complain about unfixed metadata when repairs were injected")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 months agoLinux 6.14-rc2
Linus Torvalds [Sun, 9 Feb 2025 20:45:03 +0000 (12:45 -0800)]
Linux 6.14-rc2

2 months agoMerge tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Sun, 9 Feb 2025 18:05:32 +0000 (10:05 -0800)]
Merge tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Suppress false-positive -Wformat-{overflow,truncation}-non-kprintf
   warnings regardless of the W= option

 - Avoid CONFIG_TRIM_UNUSED_KSYMS dropping symbols passed to symbol_get()

 - Fix a build regression of the Debian linux-headers package

* tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: install-extmod-build: add missing quotation marks for CC variable
  kbuild: fix misspelling in scripts/Makefile.lib
  kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS
  scripts/Makefile.extrawarn: Do not show clang's non-kprintf warnings at W=1

2 months agoMerge tag 'pm-6.14-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sun, 9 Feb 2025 17:47:06 +0000 (09:47 -0800)]
Merge tag 'pm-6.14-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a recently introduced kernel crash due to a NULL pointer
  dereference during system-wide suspend (Rafael Wysocki)"

* tag 'pm-6.14-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: core: Restrict power.set_active propagation

2 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 9 Feb 2025 17:41:38 +0000 (09:41 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Correctly clean the BSS to the PoC before allowing EL2 to access it
     on nVHE/hVHE/protected configurations

   - Propagate ownership of debug registers in protected mode after the
     rework that landed in 6.14-rc1

   - Stop pretending that we can run the protected mode without a GICv3
     being present on the host

   - Fix a use-after-free situation that can occur if a vcpu fails to
     initialise the NV shadow S2 MMU contexts

   - Always evaluate the need to arm a background timer for fully
     emulated guest timers

   - Fix the emulation of EL1 timers in the absence of FEAT_ECV

   - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0

  s390:

   - move some of the guest page table (gmap) logic into KVM itself,
     inching towards the final goal of completely removing gmap from the
     non-kvm memory management code.

     As an initial set of cleanups, move some code from mm/gmap into kvm
     and start using __kvm_faultin_pfn() to fault-in pages as needed;
     but especially stop abusing page->index and page->lru to aid in the
     pgdesc conversion.

  x86:

   - Add missing check in the fix to defer starting the huge page
     recovery vhost_task

   - SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
  KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking
  KVM: remove kvm_arch_post_init_vm
  KVM: selftests: Fix spelling mistake "initally" -> "initially"
  kvm: x86: SRSO_USER_KERNEL_NO is not synthesized
  KVM: arm64: timer: Don't adjust the EL2 virtual timer offset
  KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV
  KVM: arm64: timer: Always evaluate the need for a soft timer
  KVM: arm64: Fix nested S2 MMU structures reallocation
  KVM: arm64: Fail protected mode init if no vgic hardware is present
  KVM: arm64: Flush/sync debug state in protected mode
  KVM: s390: selftests: Streamline uc_skey test to issue iske after sske
  KVM: s390: remove the last user of page->index
  KVM: s390: move PGSTE softbits
  KVM: s390: remove useless page->index usage
  KVM: s390: move gmap_shadow_pgt_lookup() into kvm
  KVM: s390: stop using lists to keep track of used dat tables
  KVM: s390: stop using page->index for non-shadow gmaps
  KVM: s390: move some gmap shadowing functions away from mm/gmap.c
  KVM: s390: get rid of gmap_translate()
  KVM: s390: get rid of gmap_fault()
  ...

2 months agoPM: sleep: core: Restrict power.set_active propagation
Rafael J. Wysocki [Sat, 8 Feb 2025 17:54:28 +0000 (18:54 +0100)]
PM: sleep: core: Restrict power.set_active propagation

Commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of
parents and children") exposed an issue related to simple_pm_bus_pm_ops
that uses pm_runtime_force_suspend() and pm_runtime_force_resume() as
bus type PM callbacks for the noirq phases of system-wide suspend and
resume.

The problem is that pm_runtime_force_suspend() does not distinguish
runtime-suspended devices from devices for which runtime PM has never
been enabled, so if it sees a device with runtime PM status set to
RPM_ACTIVE, it will assume that runtime PM is enabled for that device
and so it will attempt to suspend it with the help of its runtime PM
callbacks which may not be ready for that.  As it turns out, this
causes simple_pm_bus_runtime_suspend() to crash due to a NULL pointer
dereference.

Another problem related to the above commit and simple_pm_bus_pm_ops is
that setting runtime PM status of a device handled by the latter to
RPM_ACTIVE will actually prevent it from being resumed because
pm_runtime_force_resume() only resumes devices with runtime PM status
set to RPM_SUSPENDED.

To mitigate these issues, do not allow power.set_active to propagate
beyond the parent of the device with DPM_FLAG_SMART_SUSPEND set that
will need to be resumed, which should be a sufficient stop-gap for the
time being, but they will need to be properly addressed in the future
because in general during system-wide resume it is necessary to resume
all devices in a dependency chain in which at least one device is going
to be resumed.

Fixes: 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of parents and children")
Closes: https://lore.kernel.org/linux-pm/1c2433d4-7e0f-4395-b841-b8eac7c25651@nvidia.com/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/6137505.lOV4Wx5bFT@rjwysocki.net
2 months agoMerge tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 8 Feb 2025 22:12:17 +0000 (14:12 -0800)]
Merge tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:
 "Address a KUnit stack initialization regression that got tickled on
  m68k, and solve a Clang(v14 and earlier) bug found by 0day:

   - Fix stackinit KUnit regression on m68k

   - Use ARRAY_SIZE() for memtostr*()/strtomem*()"

* tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
  compiler.h: Introduce __must_be_byte_array()
  compiler.h: Move C string helpers into C-only kernel section
  stackinit: Fix comment for test_small_end
  stackinit: Keep selftest union size small on m68k

2 months agoMerge tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Sat, 8 Feb 2025 22:04:21 +0000 (14:04 -0800)]
Merge tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:
 "This is really a work-around for x86_64 having grown a syscall to
  implement uretprobe, which has caused problems since v6.11.

  This may change in the future, but for now, this fixes the unintended
  seccomp filtering when uretprobe switched away from traps, and does so
  with something that should be easy to backport.

   - Allow uretprobe on x86_64 to avoid behavioral complications (Eyal
     Birger)"

* tag 'seccomp-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: validate uretprobe syscall passes through seccomp
  seccomp: passthrough uretprobe systemcall without filtering

2 months agoMerge tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Sat, 8 Feb 2025 21:59:24 +0000 (13:59 -0800)]
Merge tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fix from Kees Cook:
 "This is an alpha-specific fix, but since it touched ELF I was asked to
  carry it.

   - alpha/elf: Fix misc/setarch test of util-linux by removing 32bit
     support (Eric W. Biederman)"

* tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  alpha/elf: Fix misc/setarch test of util-linux by removing 32bit support

2 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 8 Feb 2025 21:45:34 +0000 (13:45 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "A number of fairly small fixes, mostly in drivers but two in the core
  to change a retry for depopulation (a trendy new hdd thing that
  reorganizes blocks away from failing elements) and one to fix a GFP_
  annotation to avoid a lock dependency (the third core patch is all in
  testing)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla1280: Fix kernel oops when debug level > 2
  scsi: ufs: core: Fix error return with query response
  scsi: storvsc: Set correct data length for sending SCSI command without payload
  scsi: ufs: core: Fix use-after free in init error and remove paths
  scsi: core: Do not retry I/Os during depopulation
  scsi: core: Use GFP_NOIO to avoid circular locking dependency
  scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed
  scsi: ufs: core: Ensure clk_gating.lock is used only after initialization
  scsi: ufs: core: Simplify temperature exception event handling
  scsi: target: core: Add line break to status show
  scsi: ufs: core: Fix the HIGH/LOW_TEMP Bit Definitions
  scsi: core: Add passthrough tests for success and no failure definitions

2 months agoMerge tag 'i2c-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 8 Feb 2025 21:35:17 +0000 (13:35 -0800)]
Merge tag 'i2c-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c reverts from Wolfram Sang:
 "It turned out the new mechanism for handling created devices does not
  handle all muxing cases.

  Revert the changes to give a proper solution more time"

* tag 'i2c-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: Replace list-based mechanism for handling auto-detected clients"
  Revert "i2c: Replace list-based mechanism for handling userspace-created clients"

2 months agoMerge tag 'rust-fixes-6.14' of https://github.com/Rust-for-Linux/linux
Linus Torvalds [Sat, 8 Feb 2025 20:22:21 +0000 (12:22 -0800)]
Merge tag 'rust-fixes-6.14' of https://github.com/Rust-for-Linux/linux

Pull rust fixes from Miguel Ojeda:

 - Do not export KASAN ODR symbols to avoid gendwarfksyms warnings

 - Fix future Rust 1.86.0 (to be released 2025-04-03) x86_64 builds

 - Clean future Rust 1.86.0 (to be released 2025-04-03) warning

 - Fix future GCC 15 (to be released in a few months) builds

 - Fix `rusttest` target in macOS

* tag 'rust-fixes-6.14' of https://github.com/Rust-for-Linux/linux:
  x86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0
  rust: kbuild: do not export generated KASAN ODR symbols
  rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflags
  rust: init: use explicit ABI to clean warning in future compilers
  rust: kbuild: use host dylib naming in rusttestlib-kernel

2 months agoMerge tag 'ftrace-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sat, 8 Feb 2025 20:18:02 +0000 (12:18 -0800)]
Merge tag 'ftrace-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Function graph fix of notrace functions.

  When the function graph tracer was restructured to use the global
  section of the meta data in the shadow stack, the bit logic was
  changed. There's a TRACE_GRAPH_NOTRACE_BIT that is the bit number in
  the mask that tells if the function graph tracer is currently in the
  "notrace" mode. The TRACE_GRAPH_NOTRACE is the mask with that bit set.

  But when the code we restructured, the TRACE_GRAPH_NOTRACE_BIT was
  used when it should have been the TRACE_GRAPH_NOTRACE mask. This made
  notrace not work properly"

* tag 'ftrace-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BIT

2 months agoMerge tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 8 Feb 2025 20:04:00 +0000 (12:04 -0800)]
Merge tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "Fix a build regression on GCC 15 builds, caused by GCC changing the
  default C version that is overriden in the main Makefile but not in
  the x86 boot code Makefile"

* tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Use '-std=gnu11' to fix build with GCC 15

2 months agoMerge tag 'timers-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 8 Feb 2025 19:55:03 +0000 (11:55 -0800)]
Merge tag 'timers-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
 "Fix a PREEMPT_RT bug in the clocksource verification code that caused
  false positive warnings.

  Also fix a timer migration setup bug when new CPUs are added"

* tag 'timers-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Fix off-by-one root mis-connection
  clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context

2 months agoMerge tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 8 Feb 2025 19:16:22 +0000 (11:16 -0800)]
Merge tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Fix a cfs_rq->h_nr_runnable accounting bug that trips up a defensive
  SCHED_WARN_ON() on certain workloads. The bug is believed to be
  (accidentally) self-correcting, hence no behavioral side effects are
  expected.

  Also print se.slice in debug output, since this value can now be set
  via the syscall ABI and can be useful to track"

* tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Provide slice length for fair tasks
  sched/fair: Fix inaccurate h_nr_runnable accounting with delayed dequeue

2 months agoMerge tag 'irq-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 8 Feb 2025 19:05:54 +0000 (11:05 -0800)]
Merge tag 'irq-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Another followup fix for the procps genirq output formatting
  regression caused by an optimization"

* tag 'irq-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Remove leading space from irq_chip::irq_print_chip() callbacks

2 months agoMerge tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 8 Feb 2025 18:54:11 +0000 (10:54 -0800)]
Merge tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "Fix a dangling pointer bug in the futex code used by the uring code.

  It isn't causing problems at the moment due to uring ABI limitations
  leaving it essentially unused in current usages, but is a good idea to
  fix nevertheless"

* tag 'locking-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Pass in task to futex_queue()

2 months agofgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BIT
Steven Rostedt [Sat, 8 Feb 2025 05:15:11 +0000 (00:15 -0500)]
fgraph: Fix set_graph_notrace with setting TRACE_GRAPH_NOTRACE_BIT

The code was restructured where the function graph notrace code, that
would not trace a function and all its children is done by setting a
NOTRACE flag when the function that is not to be traced is hit.

There's a TRACE_GRAPH_NOTRACE_BIT which defines the bit in the flags and a
TRACE_GRAPH_NOTRACE which is the mask with that bit set. But the
restructuring used TRACE_GRAPH_NOTRACE_BIT when it should have used
TRACE_GRAPH_NOTRACE.

For example:

 # cd /sys/kernel/tracing
 # echo set_track_prepare stack_trace_save  > set_graph_notrace
 # echo function_graph > current_tracer
 # cat trace
[..]
 0)               |                          __slab_free() {
 0)               |                            free_to_partial_list() {
 0)               |                                  arch_stack_walk() {
 0)               |                                    __unwind_start() {
 0)   0.501 us    |                                      get_stack_info();

Where a non filter trace looks like:

 # echo > set_graph_notrace
 # cat trace
 0)               |                            free_to_partial_list() {
 0)               |                              set_track_prepare() {
 0)               |                                stack_trace_save() {
 0)               |                                  arch_stack_walk() {
 0)               |                                    __unwind_start() {

Where the filter should look like:

 # cat trace
 0)               |                            free_to_partial_list() {
 0)               |                              _raw_spin_lock_irqsave() {
 0)   0.350 us    |                                preempt_count_add();
 0)   0.351 us    |                                do_raw_spin_lock();
 0)   2.440 us    |                              }

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250208001511.535be150@batman.local.home
Fixes: b84214890a9bc ("function_graph: Move graph notrace bit to shadow stack global var")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2 months agokbuild: Move -Wenum-enum-conversion to W=2
Nathan Chancellor [Thu, 17 Oct 2024 17:09:22 +0000 (10:09 -0700)]
kbuild: Move -Wenum-enum-conversion to W=2

-Wenum-enum-conversion was strengthened in clang-19 to warn for C, which
caused the kernel to move it to W=1 in commit 75b5ab134bb5 ("kbuild:
Move -Wenum-{compare-conditional,enum-conversion} into W=1") because
there were numerous instances that would break builds with -Werror.
Unfortunately, this is not a full solution, as more and more developers,
subsystems, and distributors are building with W=1 as well, so they
continue to see the numerous instances of this warning.

Since the move to W=1, there have not been many new instances that have
appeared through various build reports and the ones that have appeared
seem to be following similar existing patterns, suggesting that most
instances of this warning will not be real issues. The only alternatives
for silencing this warning are adding casts (which is generally seen as
an ugly practice) or refactoring the enums to macro defines or a unified
enum (which may be undesirable because of type safety in other parts of
the code).

Move the warning to W=2, where warnings that occur frequently but may be
relevant should reside.

Cc: stable@vger.kernel.org
Fixes: 75b5ab134bb5 ("kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1")
Link: https://lore.kernel.org/ZwRA9SOcOjjLJcpi@google.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 8 Feb 2025 03:23:06 +0000 (19:23 -0800)]
Merge tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Three DFS fixes: DFS mount fix, fix for noisy log msg and one to
   remove some unused code

 - SMB3 Lease fix

* tag 'v6.14rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: change lease epoch type from unsigned int to __u16
  smb: client: get rid of kstrdup() in get_ses_refpath()
  smb: client: fix noisy when tree connecting to DFS interlink targets
  smb: client: don't trust DFSREF_STORAGE_SERVER bit

2 months agoMerge tag 'drm-fixes-2025-02-08' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 7 Feb 2025 20:21:54 +0000 (12:21 -0800)]
Merge tag 'drm-fixes-2025-02-08' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Just regular drm fixes, amdgpu, xe and i915 mostly, but a few
  scattered fixes. I think one of the i915 fixes fixes some build combos
  that Guenter was seeing.

  amdgpu:
   - Add new tiling flag for DCC write compress disable
   - Add BO metadata flag for DCC
   - Fix potential out of bounds access in display
   - Seamless boot fix
   - CONFIG_FRAME_WARN fix
   - PSR1 fix

  xe:
   - OA uAPI related fixes
   - Fix SRIOV migration initialization
   - Restore devcoredump to a sane state

  i915:
   - Fix the build error with clamp after WARN_ON on gcc 13.x+
   - HDCP related fixes
   - PMU fix zero delta busyness issue
   - Fix page cleanup on DMA remap failure
   - Drop 64bpp YUV formats from ICL+ SDR planes
   - GuC log related fix
   - DisplayPort related fixes

  ivpu:
   - Fix error handling

  komeda:
   - add return check

  zynqmp:
   - fix locking in DP code

  ast:
   - fix AST DP timeout

  cec:
   - fix broken CEC adapter check"

* tag 'drm-fixes-2025-02-08' of https://gitlab.freedesktop.org/drm/kernel: (29 commits)
  drm/i915/dp: Fix potential infinite loop in 128b/132b SST
  Revert "drm/amd/display: Use HW lock mgr for PSR1"
  drm/amd/display: Respect user's CONFIG_FRAME_WARN more for dml files
  accel/amdxdna: Add MODULE_FIRMWARE() declarations
  drm/i915/dp: Iterate DSC BPP from high to low on all platforms
  drm/xe: Fix and re-enable xe_print_blob_ascii85()
  drm/xe/devcoredump: Move exec queue snapshot to Contexts section
  drm/xe/oa: Set stream->pollin in xe_oa_buffer_check_unlocked
  drm/xe/pf: Fix migration initialization
  drm/xe/oa: Preserve oa_ctrl unused bits
  drm/amd/display: Fix seamless boot sequence
  drm/amd/display: Fix out-of-bound accesses
  drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan
  drm/i915/backlight: Return immediately when scale() finds invalid parameters
  drm/i915/dp: Return min bpc supported by source instead of 0
  drm/i915/dp: fix the Adaptive sync Operation mode for SDP
  drm/i915/guc: Debug print LRC state entries only if the context is pinned
  drm/i915: Drop 64bpp YUV formats from ICL+ SDR planes
  drm/i915: Fix page cleanup on DMA remap failure
  drm/i915/pmu: Fix zero delta busyness issue
  ...

2 months agoMerge tag 'stable/for-linus-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Fri, 7 Feb 2025 19:05:50 +0000 (11:05 -0800)]
Merge tag 'stable/for-linus-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft

Pull ibft fixes from Konrad Rzeszutek Wilk:
 "Two tiny fixes to IBFT code: one for Kconfig and another for IPv6"

* tag 'stable/for-linus-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
  iscsi_ibft: Fix UBSAN shift-out-of-bounds warning in ibft_attr_show_nic()
  firmware: iscsi_ibft: fix ISCSI_IBFT Kconfig entry

2 months agoMerge tag 'block-6.14-20250207' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 7 Feb 2025 19:00:33 +0000 (11:00 -0800)]
Merge tag 'block-6.14-20250207' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - MD pull request via Song:
      - fix an error handling path for md-linear

 - NVMe pull request via Keith:
      - Connection fixes for fibre channel transport (Daniel)
      - Endian fixes (Keith, Christoph)
      - Cleanup fix for host memory buffer (Francis)
      - Platform specific power quirks (Georg)
      - Target memory leak (Sagi)
      - Use appropriate controller state accessor (Daniel)

 - Fixup for a regression introduced last week, where sunvdc wasn't
   updated for an API change, causing compilation failures on sparc64.

* tag 'block-6.14-20250207' of git://git.kernel.dk/linux:
  drivers/block/sunvdc.c: update the correct AIP call
  md: Fix linear_set_limits()
  nvme-fc: use ctrl state getter
  nvme: make nvme_tls_attrs_group static
  nvmet: add a missing endianess conversion in nvmet_execute_admin_connect
  nvmet: the result field in nvmet_alloc_ctrl_args is little endian
  nvmet: fix a memory leak in controller identify
  nvme-fc: do not ignore connectivity loss during connecting
  nvme: handle connectivity loss in nvme_set_queue_count
  nvme-fc: go straight to connecting state when initializing
  nvme-pci: Add TUXEDO IBP Gen9 to Samsung sleep quirk
  nvme-pci: Add TUXEDO InfinityFlex to Samsung sleep quirk
  nvme-pci: remove redundant dma frees in hmb
  nvmet: fix rw control endian access

2 months agokbuild: install-extmod-build: add missing quotation marks for CC variable
WangYuli [Fri, 7 Feb 2025 07:08:55 +0000 (15:08 +0800)]
kbuild: install-extmod-build: add missing quotation marks for CC variable

While attempting to build a Debian packages with CC="ccache gcc", I
saw the following error as builddeb builds linux-headers-$KERNELVERSION:

  make HOSTCC=ccache gcc VPATH= srcroot=. -f ./scripts/Makefile.build obj=debian/linux-headers-6.14.0-rc1/usr/src/linux-headers-6.14.0-rc1/scripts
  make[6]: *** No rule to make target 'gcc'.  Stop.

Upon investigation, it seems that one instance of $(CC) variable reference
in ./scripts/package/install-extmod-build was missing quotation marks,
causing the above error.

Add the missing quotation marks around $(CC) to fix build.

Fixes: 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package")
Co-developed-by: Mingcong Bai <jeffbai@aosc.io>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2 months agoMerge tag 'pm-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 7 Feb 2025 18:34:50 +0000 (10:34 -0800)]
Merge tag 'pm-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a handful of issues in the amd-pstate driver, the airoha
  cpufreq driver build, a (recently added) possible NULL pointer
  dereference in the cpufreq code and a possible memory leak in the
  power capping subsystem:

   - Fix cpufreq_policy reference counting and prevent max_perf from
     going above the current limit in amd-pstate, and drop a redundant
     goto label from it (Dhananjay Ugwekar)

   - Prevent the per-policy boost_enabled flag in amd-pstate from
     getting out of sync with the actual state after boot failures
     (Lifeng Zheng)

   - Fix a recently added possible NULL pointer dereference in the
     cpufreq core (Aboorva Devarajan)

   - Fix a build issue related to CONFIG_OF and COMPILE_TEST
     dependencies in the airoha cpufreq driver (Arnd Bergmann)

   - Fix a possible memory leak in the power capping subsystem (Joe
     Hattori)"

* tag 'pm-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Fix cpufreq_policy ref counting
  cpufreq: prevent NULL dereference in cpufreq_online()
  cpufreq: airoha: modify CONFIG_OF dependency
  cpufreq/amd-pstate: Fix max_perf updation with schedutil
  cpufreq/amd-pstate: Remove the goto label in amd_pstate_update_limits
  cpufreq/amd-pstate: Fix per-policy boost flag incorrect when fail
  powercap: call put_device() on an error path in powercap_register_control_type()

2 months agoMerge tag 'acpi-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 7 Feb 2025 18:08:25 +0000 (10:08 -0800)]
Merge tag 'acpi-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix three assorted issues, including one recent regression:

   - Add an ACPI IRQ override quirk for Eluktronics MECH-17 to make the
     internal keyboard work (Gannon Kolding)

   - Make acpi_data_prop_read() reflect the OF counterpart behavior in
     error cases (Andy Shevchenko)

   - Remove recently added strict ACPI PRM handler address checks that
     prevented PRM from working on some platforms in the field (Aubrey
     Li)"

* tag 'acpi-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PRM: Remove unnecessary strict handler address checks
  ACPI: resource: IRQ override for Eluktronics MECH-17
  ACPI: property: Fix return value for nval == 0 in acpi_data_prop_read()

2 months agoMerge tag 'gpio-fixes-for-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 7 Feb 2025 17:50:33 +0000 (09:50 -0800)]
Merge tag 'gpio-fixes-for-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix interrupt support in gpio-pca953x

 - fix configfs attribute locking in gpio-sim

 - limit the visibility of the GPIO_GRGPIO Kconfig symbol to OF systems
   only

 - update MAINTAINERS

* tag 'gpio-fixes-for-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: Use my kernel.org address for ACPI GPIO work
  gpio: GPIO_GRGPIO should depend on OF
  gpio: sim: lock hog configfs items if present
  gpio: pca953x: Improve interrupt support

2 months agoMerge tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Fri, 7 Feb 2025 17:22:31 +0000 (09:22 -0800)]
Merge tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix fsnotify FMODE_NONOTIFY* handling.

   This also disables fsnotify on all pseudo files by default apart from
   very select exceptions. This carries a regression risk so we need to
   watch out and adapt accordingly. However, it is overall a significant
   improvement over the current status quo where every rando file can
   get fsnotify enabled.

 - Cleanup and simplify lockref_init() after recent lockref changes.

 - Fix vboxfs build with gcc-15.

 - Add an assert into inode_set_cached_link() to catch corrupt links.

 - Allow users to also use an empty string check to detect whether a
   given mount option string was empty or not.

 - Fix how security options were appended to statmount()'s ->mnt_opt
   field.

 - Fix statmount() selftests to always check the returned mask.

 - Fix uninitialized value in vfs_statx_path().

 - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading
   and preserve extensibility.

* tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: sanity check the length passed to inode_set_cached_link()
  pidfs: improve ioctl handling
  fsnotify: disable pre-content and permission events by default
  selftests: always check mask returned by statmount(2)
  fsnotify: disable notification by default for all pseudo files
  fs: fix adding security options to statmount.mnt_opt
  fsnotify: use accessor to set FMODE_NONOTIFY_*
  lockref: remove count argument of lockref_init
  gfs2: switch to lockref_init(..., 1)
  gfs2: use lockref_init for gl_lockref
  statmount: let unset strings be empty
  vboxsf: fix building with GCC 15
  fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()

2 months agoMerge tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Fri, 7 Feb 2025 17:16:07 +0000 (09:16 -0800)]
Merge tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Nothing major, things continue to be fairly quiet over here.

   - add a SubmittingPatches to clarify that patches submitted for
     bcachefs do, in fact, need to be tested

   - discard path now correctly issues journal flushes when needed, this
     fixes performance issues when the filesystem is nearly full and
     we're bottlenecked on copygc

   - fix a bug that could cause the pending rebalance work accounting to
     be off when devices are being onlined/offlined; users should report
     if they are still seeing this

   - and a few more trivial ones"

* tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs:
  bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance
  bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()
  bcachefs: Fix discard path journal flushing
  bcachefs: fix deadlock in journal_entry_open()
  bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()
  bcachefs docs: SubmittingPatches.rst

2 months agoMAINTAINERS: Remove myself
Hector Martin [Thu, 6 Feb 2025 18:21:46 +0000 (03:21 +0900)]
MAINTAINERS: Remove myself

I no longer have any faith left in the kernel development process or
community management approach.

Apple/ARM platform development will continue downstream. If I feel like
sending some patches upstream in the future myself for whatever subtree
I may, or I may not. Anyone who feels like fighting the upstreaming
fight themselves is welcome to do so.

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMAINTAINERS: Move Pavel to kernel.org address
Pavel Machek [Wed, 5 Feb 2025 18:42:01 +0000 (19:42 +0100)]
MAINTAINERS: Move Pavel to kernel.org address

I need to filter my emails better, switch to pavel@kernel.org address
to help with that.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'md-6.14-20250206' of https://git.kernel.org/pub/scm/linux/kernel/git/mdrai...
Jens Axboe [Fri, 7 Feb 2025 14:23:03 +0000 (07:23 -0700)]
Merge tag 'md-6.14-20250206' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14

Pull MD fix from Song:

"This patch, by Bart Van Assche, fixes an error handling path for
 md-linear."

* tag 'md-6.14-20250206' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
  md: Fix linear_set_limits()

2 months agoMerge branches 'acpi-property' and 'acpi-resource'
Rafael J. Wysocki [Fri, 7 Feb 2025 12:06:31 +0000 (13:06 +0100)]
Merge branches 'acpi-property' and 'acpi-resource'

Merge a new ACPI IRQ override quirk for Eluktronics MECH-17 (Gannon
Kolding) and an acpi_data_prop_read() fix making it reflect the OF
counterpart behavior in error cases (Andy Shevchenko).

* acpi-property:
  ACPI: property: Fix return value for nval == 0 in acpi_data_prop_read()

* acpi-resource:
  ACPI: resource: IRQ override for Eluktronics MECH-17

2 months agoMerge branch 'pm-powercap'
Rafael J. Wysocki [Fri, 7 Feb 2025 11:43:58 +0000 (12:43 +0100)]
Merge branch 'pm-powercap'

Fix a possible memory leak in the power capping subsystem (Joe Hattori).

* pm-powercap:
  powercap: call put_device() on an error path in powercap_register_control_type()

2 months agovfs: sanity check the length passed to inode_set_cached_link()
Mateusz Guzik [Tue, 4 Feb 2025 21:32:07 +0000 (22:32 +0100)]
vfs: sanity check the length passed to inode_set_cached_link()

This costs a strlen() call when instatianating a symlink.

Preferably it would be hidden behind VFS_WARN_ON (or compatible), but
there is no such facility at the moment. With the facility in place the
call can be patched out in production kernels.

In the meantime, since the cost is being paid unconditionally, use the
result to a fixup the bad caller.

This is not expected to persist in the long run (tm).

Sample splat:
bad length passed for symlink [/tmp/syz-imagegen43743633/file0/file0] (got 131109, expected 37)
[rest of WARN blurp goes here]

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250204213207.337980-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agopidfs: improve ioctl handling
Christian Brauner [Tue, 4 Feb 2025 13:51:20 +0000 (14:51 +0100)]
pidfs: improve ioctl handling

Pidfs supports extensible and non-extensible ioctls. The extensible
ioctls need to check for the ioctl number itself not just the ioctl
command otherwise both backward- and forward compatibility are broken.

The pidfs ioctl handler also needs to look at the type of the ioctl
command to guard against cases where "[...] a daemon receives some
random file descriptor from a (potentially less privileged) client and
expects the FD to be of some specific type, it might call ioctl() on
this FD with some type-specific command and expect the call to fail if
the FD is of the wrong type; but due to the missing type check, the
kernel instead performs some action that userspace didn't expect."
(cf. [1]]

Link: https://lore.kernel.org/r/20250204-work-pidfs-ioctl-v1-1-04987d239575@kernel.org
Link: https://lore.kernel.org/r/CAG48ez2K9A5GwtgqO31u9ZL292we8ZwAA=TJwwEv7wRuJ3j4Lw@mail.gmail.com
Fixes: 8ce352818820 ("pidfs: check for valid ioctl commands")
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org # v6.13; please backport with 8ce352818820 ("pidfs: check for valid ioctl commands")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agoMerge patch series "Fix for huge faults regression"
Christian Brauner [Tue, 4 Feb 2025 10:25:45 +0000 (11:25 +0100)]
Merge patch series "Fix for huge faults regression"

Amir Goldstein <amir73il@gmail.com> says:

The two Fix patches have been tested by Alex together and each one
independently.

I also verified that they pass the LTP inoityf/fanotify tests.

* patches from https://lore.kernel.org/r/20250203223205.861346-1-amir73il@gmail.com:
  fsnotify: disable pre-content and permission events by default
  fsnotify: disable notification by default for all pseudo files
  fsnotify: use accessor to set FMODE_NONOTIFY_*

Link: https://lore.kernel.org/r/20250203223205.861346-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agofsnotify: disable pre-content and permission events by default
Amir Goldstein [Mon, 3 Feb 2025 22:32:05 +0000 (23:32 +0100)]
fsnotify: disable pre-content and permission events by default

After introducing pre-content events, we had a regression related to
disabling huge faults on files that should never have pre-content events
enabled.

This happened because the default f_mode of allocated files (0) does
not disable pre-content events.

Pre-content events are disabled in file_set_fsnotify_mode_by_watchers()
but internal files may not get to call this helper.

Initialize f_mode to disable permission and pre-content events for all
files and if needed they will be enabled for the callers of
file_set_fsnotify_mode_by_watchers().

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agoselftests: always check mask returned by statmount(2)
Miklos Szeredi [Wed, 29 Jan 2025 16:06:41 +0000 (17:06 +0100)]
selftests: always check mask returned by statmount(2)

STATMOUNT_MNT_OPTS can actually be missing if there are no options.  This
is a change of behavior since 75ead69a7173 ("fs: don't let statmount return
empty strings").

The other checks shouldn't actually trigger, but add them for correctness
and for easier debugging if the test fails.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agofsnotify: disable notification by default for all pseudo files
Amir Goldstein [Mon, 3 Feb 2025 22:32:04 +0000 (23:32 +0100)]
fsnotify: disable notification by default for all pseudo files

Most pseudo files are not applicable for fsnotify events at all,
let alone to the new pre-content events.

Disable notifications to all files allocated with alloc_file_pseudo()
and enable legacy inotify events for the specific cases of pipe and
socket, which have known users of inotify events.

Pre-content events are also kept disabled for sockets and pipes.

Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agofs: fix adding security options to statmount.mnt_opt
Miklos Szeredi [Wed, 29 Jan 2025 15:12:53 +0000 (16:12 +0100)]
fs: fix adding security options to statmount.mnt_opt

Prepending security options was made conditional on sb->s_op->show_options,
but security options are independent of sb options.

Fixes: 056d33137bf9 ("fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()")
Fixes: f9af549d1fd3 ("fs: export mount options via statmount()")
Cc: stable@vger.kernel.org # v6.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250129151253.33241-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agofsnotify: use accessor to set FMODE_NONOTIFY_*
Amir Goldstein [Mon, 3 Feb 2025 22:32:03 +0000 (23:32 +0100)]
fsnotify: use accessor to set FMODE_NONOTIFY_*

The FMODE_NONOTIFY_* bits are a 2-bits mode.  Open coding manipulation
of those bits is risky.  Use an accessor file_set_fsnotify_mode() to
set the mode.

Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agoMerge patch series "further lockref cleanups"
Christian Brauner [Thu, 30 Jan 2025 15:04:42 +0000 (16:04 +0100)]
Merge patch series "further lockref cleanups"

Andreas Gruenbacher <agruenba@redhat.com> says:

Here's an updated version with an additional comment saying that
lockref_init() initializes count to 1.

* patches from https://lore.kernel.org/r/20250130135624.1899988-1-agruenba@redhat.com:
  lockref: remove count argument of lockref_init
  gfs2: switch to lockref_init(..., 1)
  gfs2: use lockref_init for gl_lockref

Link: https://lore.kernel.org/r/20250130135624.1899988-1-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agolockref: remove count argument of lockref_init
Andreas Gruenbacher [Thu, 30 Jan 2025 13:56:23 +0000 (14:56 +0100)]
lockref: remove count argument of lockref_init

All users of lockref_init() now initialize the count to 1, so hardcode
that and remove the count argument.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agogfs2: switch to lockref_init(..., 1)
Andreas Gruenbacher [Thu, 30 Jan 2025 13:56:22 +0000 (14:56 +0100)]
gfs2: switch to lockref_init(..., 1)

In qd_alloc(), initialize the lockref count to 1 to cover the common
case.  Compensate for that in gfs2_quota_init() by adjusting the count
back down to 0; this only occurs when mounting the filesystem rw.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-3-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agogfs2: use lockref_init for gl_lockref
Andreas Gruenbacher [Thu, 30 Jan 2025 13:56:21 +0000 (14:56 +0100)]
gfs2: use lockref_init for gl_lockref

Move the initialization of gl_lockref from gfs2_init_glock_once() to
gfs2_glock_get().  This allows to use lockref_init() there.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-2-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agostatmount: let unset strings be empty
Miklos Szeredi [Thu, 30 Jan 2025 12:15:00 +0000 (13:15 +0100)]
statmount: let unset strings be empty

Just like it's normal for unset values to be zero, unset strings should be
empty instead of containing random values.

It seems to be a typical mistake that the mask returned by statmount is not
checked, which can result in various bugs.

With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).

Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf4jzg@mail.gmail.com/
Fixes: 68385d77c05b ("statmount: simplify string option retrieval")
Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agovboxsf: fix building with GCC 15
Brahmajit Das [Tue, 21 Jan 2025 16:26:48 +0000 (21:56 +0530)]
vboxsf: fix building with GCC 15

Building with GCC 15 results in build error
fs/vboxsf/super.c:24:54: error: initializer-string for array of â€˜unsigned char’ is too long [-Werror=unterminated-string-initialization]
   24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375";
      |                                                      ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Due to GCC having enabled -Werror=unterminated-string-initialization[0]
by default. Separately initializing each array element of
VBSF_MOUNT_SIGNATURE to ensure NUL termination, thus satisfying GCC 15
and fixing the build error.

[0]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unterminated-string-initialization

Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com>
Link: https://lore.kernel.org/r/20250121162648.1408743-1-brahmajit.xyz@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agofs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
Su Hui [Sun, 19 Jan 2025 02:59:47 +0000 (10:59 +0800)]
fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()

Clang static checker(scan-build) warning:
fs/stat.c:287:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
  287 |                 stat->result_mask |= STATX_MNT_ID_UNIQUE;
      |                 ~~~~~~~~~~~~~~~~~ ^
fs/stat.c:290:21: warning: The left expression of the compound assignment is
an uninitialized value. The computed value will also be garbage.
  290 |                 stat->result_mask |= STATX_MNT_ID;

When vfs_getattr() failed because of security_inode_getattr(), 'stat' is
uninitialized. In this case, there is a harmless garbage problem in
vfs_statx_path(). It's better to return error directly when
vfs_getattr() failed, avoiding garbage value and more clearly.

Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20250119025946.1168957-1-suhui@nfschina.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2 months agotimers/migration: Fix off-by-one root mis-connection
Frederic Weisbecker [Wed, 5 Feb 2025 16:02:20 +0000 (17:02 +0100)]
timers/migration: Fix off-by-one root mis-connection

Before attaching a new root to the old root, the children counter of the
new root is checked to verify that only the upcoming CPU's top group have
been connected to it. However since the recently added commit b729cc1ec21a
("timers/migration: Fix another race between hotplug and idle entry/exit")
this check is not valid anymore because the old root is pre-accounted
as a child to the new root. Therefore after connecting the upcoming
CPU's top group to the new root, the children count to be expected must
be 2 and not 1 anymore.

This omission results in the old root to not be connected to the new
root. Then eventually the system may run with more than one top level,
which defeats the purpose of a single idle migrator.

Also the old root is pre-accounted but not connected upon the new root
creation. But it can be connected to the new root later on. Therefore
the old root may be accounted twice to the new root. The propagation of
such overcommit can end up creating a double final top-level root with a
groupmask incorrectly initialized. Although harmless given that the final
top level roots will never have a parent to walk up to, this oddity
opportunistically reported the core issue:

  WARNING: CPU: 8 PID: 0 at kernel/time/timer_migration.c:543 tmigr_requires_handle_remote
  CPU: 8 UID: 0 PID: 0 Comm: swapper/8
  RIP: 0010:tmigr_requires_handle_remote
  Call Trace:
   <IRQ>
   ? tmigr_requires_handle_remote
   ? hrtimer_run_queues
   update_process_times
   tick_periodic
   tick_handle_periodic
   __sysvec_apic_timer_interrupt
   sysvec_apic_timer_interrupt
  </IRQ>

Fix the problem by taking the old root into account in the children count
of the new root so the connection is not omitted.

Also warn when more than one top level group exists to better detect
similar issues in the future.

Fixes: b729cc1ec21a ("timers/migration: Fix another race between hotplug and idle entry/exit")
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250205160220.39467-1-frederic@kernel.org
2 months agogenirq: Remove leading space from irq_chip::irq_print_chip() callbacks
Geert Uytterhoeven [Wed, 5 Feb 2025 14:22:56 +0000 (15:22 +0100)]
genirq: Remove leading space from irq_chip::irq_print_chip() callbacks

The space separator was factored out from the multiple chip name prints,
but several irq_chip::irq_print_chip() callbacks still print a leading
space.  Remove the superfluous double spaces.

Fixes: 9d9f204bdf7243bf ("genirq/proc: Add missing space separator back")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/893f7e9646d8933cd6786d5a1ef3eb076d263768.1738764803.git.geert+renesas@glider.be
2 months agoMerge tag 'drm-intel-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Fri, 7 Feb 2025 05:37:12 +0000 (15:37 +1000)]
Merge tag 'drm-intel-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix the build error with clamp after WARN_ON on gcc 13.x+ (Guenter)
- HDCP related fixes (Suraj)
- PMU fix zero delta busyness issue (Umesh)
- Fix page cleanup on DMA remap failure (Brian)
- Drop 64bpp YUV formats from ICL+ SDR planes (Ville)
- GuC log related fix (Daniele)
- DisplayPort related fixes (Ankit, Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z6TDHpgI6dnOc0KI@intel.com
2 months agoMerge tag 'drm-xe-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 7 Feb 2025 05:27:23 +0000 (15:27 +1000)]
Merge tag 'drm-xe-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

UAPI Changes:
 - OA uAPI related fixes (Ashutosh)

Driver Changes:
 - Fix SRIOV migration initialization (Michal)
 - Restore devcoredump to a sane state (Lucas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z6S9rI1ScT_5Aw6_@intel.com
2 months agoMerge tag 'drm-misc-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 7 Feb 2025 04:47:11 +0000 (14:47 +1000)]
Merge tag 'drm-misc-fixes-2025-02-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A couple of fixes for ivpu to error handling, komeda for format
handling, AST DP timeout fix when enabling the output, locking fix for
zynqmp DP support, tiled format handling in drm/client, and refcounting
fix for bochs

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250206-encouraging-judicious-quoll-adc1dc@houat
2 months agoMerge tag 'amd-drm-fixes-6.14-2025-02-05' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 7 Feb 2025 03:53:59 +0000 (13:53 +1000)]
Merge tag 'amd-drm-fixes-6.14-2025-02-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.14-2025-02-05:

amdgpu:
- Add BO metadata flag for DCC
- Fix potential out of bounds access in display
- Seamless boot fix
- CONFIG_FRAME_WARN fix
- PSR1 fix

UAPI:
- Add new tiling flag for DCC write compress disable
  Proposed userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33255

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205214910.3664690-1-alexander.deucher@amd.com
2 months agobcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance
Kent Overstreet [Sun, 26 Jan 2025 02:29:45 +0000 (21:29 -0500)]
bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance

Previously, bch2_bkey_sectors_need_rebalance() called
bch2_target_accepts_data(), checking whether the target is writable.

However, this means that adding or removing devices from a target would
change the value of bch2_bkey_sectors_need_rebalance() for an existing
extent; this needs to be invariant so that the extent trigger can
correctly maintain rebalance_work accounting.

Instead, check target_accepts_data() in io_opts_to_rebalance_opts(),
before creating the bch_extent_rebalance entry.

This fixes (one?) cause of rebalance_work accounting being off.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()
Kent Overstreet [Mon, 3 Feb 2025 16:35:11 +0000 (11:35 -0500)]
bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()

Spotted by sparse.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: Fix discard path journal flushing
Kent Overstreet [Mon, 27 Jan 2025 06:21:44 +0000 (01:21 -0500)]
bcachefs: Fix discard path journal flushing

The discard path is supposed to issue journal flushes when there's too
many buckets empty buckets that need a journal commit before they can be
written to again, but at some point this code seems to have been lost.

Bring it back with a new optimization to make sure we don't issue too
many journal flushes: the journal now tracks the sequence number of the
most recent flush in progress, which the discard path uses when deciding
which buckets need a journal flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: fix deadlock in journal_entry_open()
Jeongjun Park [Sun, 2 Feb 2025 06:13:51 +0000 (15:13 +0900)]
bcachefs: fix deadlock in journal_entry_open()

In the previous commit b3d82c2f2761, code was added to prevent journal sequence
overflow. Among them, the code added to journal_entry_open() uses the
bch2_fs_fatal_err_on() function to handle errors.

However, __journal_res_get() , which calls journal_entry_open() , calls
journal_entry_open() while holding journal->lock , but bch2_fs_fatal_err_on()
internally tries to acquire journal->lock , which results in a deadlock.

So we need to add a locked helper to handle fatal errors even when the
journal->lock is held.

Fixes: b3d82c2f2761 ("bcachefs: Guard against journal seq overflow")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: fix incorrect pointer check in __bch2_subvolume_delete()
Jeongjun Park [Fri, 31 Jan 2025 16:20:31 +0000 (01:20 +0900)]
bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()

For some unknown reason, checks on struct bkey_s_c_snapshot and struct
bkey_s_c_snapshot_tree pointers are missing.

Therefore, I think it would be appropriate to fix the incorrect pointer checking
through this patch.

Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs docs: SubmittingPatches.rst
Kent Overstreet [Sat, 1 Feb 2025 17:56:51 +0000 (12:56 -0500)]
bcachefs docs: SubmittingPatches.rst

Add an (initial?) patch submission checklist, focusing mainly on
testing.

Yes, all patches must be tested, and that starts (but does not end) with
the patch author.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agostring.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
Kees Cook [Wed, 5 Feb 2025 21:45:26 +0000 (13:45 -0800)]
string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()

The destination argument of memtostr*() and strtomem*() must be a
fixed-size char array at compile time, so there is no need to use
__builtin_object_size() (which is useful for when an argument is
either a pointer or unknown). Instead use ARRAY_SIZE(), which has the
benefit of working around a bug in Clang (fixed[1] in 15+) that got
__builtin_object_size() wrong sometimes.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501310832.kiAeOt2z-lkp@intel.com/
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://github.com/llvm/llvm-project/commit/d8e0a6d5e9dd2311641f9a8a5d2bf90829951ddc
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
2 months agocompiler.h: Introduce __must_be_byte_array()
Kees Cook [Wed, 5 Feb 2025 20:48:07 +0000 (12:48 -0800)]
compiler.h: Introduce __must_be_byte_array()

In preparation for adding stricter type checking to the str/mem*()
helpers, provide a way to check that a variable is a byte array
via __must_be_byte_array().

Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kees Cook <kees@kernel.org>
2 months agocompiler.h: Move C string helpers into C-only kernel section
Kees Cook [Wed, 5 Feb 2025 20:32:49 +0000 (12:32 -0800)]
compiler.h: Move C string helpers into C-only kernel section

The C kernel helpers for evaluating C Strings were positioned where they
were visible to assembly inclusion, which was not intended. Move them
into the kernel and C-only area of the header so future changes won't
confuse the assembler.

Fixes: d7a516c6eeae ("compiler.h: Fix undefined BUILD_BUG_ON_ZERO()")
Fixes: 559048d156ff ("string: Check for "nonstring" attribute on strscpy() arguments")
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
2 months agox86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0
Alice Ryhl [Mon, 3 Feb 2025 08:40:57 +0000 (08:40 +0000)]
x86: rust: set rustc-abi=x86-softfloat on rustc>=1.86.0

When using Rust on the x86 architecture, we are currently using the
unstable target.json feature to specify the compilation target. Rustc is
going to change how softfloat is specified in the target.json file on
x86, thus update generate_rust_target.rs to specify softfloat using the
new option.

Note that if you enable this parameter with a compiler that does not
recognize it, then that triggers a warning but it does not break the
build.

[ For future reference, this solves the following error:

        RUSTC L rust/core.o
      error: Error loading target specification: target feature
      `soft-float` is incompatible with the ABI but gets enabled in
      target spec. Run `rustc --print target-list` for a list of
      built-in targets

  - Miguel ]

Cc: <stable@vger.kernel.org> # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust/pull/136146
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
Link: https://lore.kernel.org/r/20250203-rustc-1-86-x86-softfloat-v1-1-220a72a5003e@google.com
[ Added 6.13.y too to Cc: stable tag and added reasoning to avoid
  over-backporting. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2 months agoselftests/seccomp: validate uretprobe syscall passes through seccomp
Eyal Birger [Sun, 2 Feb 2025 16:29:21 +0000 (08:29 -0800)]
selftests/seccomp: validate uretprobe syscall passes through seccomp

The uretprobe syscall is implemented as a performance enhancement on
x86_64 by having the kernel inject a call to it on function exit; User
programs cannot call this system call explicitly.

As such, this syscall is considered a kernel implementation detail and
should not be filtered by seccomp.

Enhance the seccomp bpf test suite to check that uretprobes can be
attached to processes without the killing the process regardless of
seccomp policy.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com
[kees: Skip archs without __NR_uretprobe]
Signed-off-by: Kees Cook <kees@kernel.org>
2 months agoseccomp: passthrough uretprobe systemcall without filtering
Eyal Birger [Sun, 2 Feb 2025 16:29:20 +0000 (08:29 -0800)]
seccomp: passthrough uretprobe systemcall without filtering

When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.

The reason is that now that uretprobe is a system call the default seccomp
filters in docker block it as they only allow a specific set of known
syscalls. This is true for other userspace applications which use seccomp
to control their syscall surface.

Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.

Pass this systemcall through seccomp without depending on configuration.

Note: uretprobe is currently only x86_64 and isn't expected to ever be
supported in i386.

Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Rafael Buchbinder <rafi@rbk.io>
Closes: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/
Link: https://lore.kernel.org/lkml/20250121182939.33d05470@gandalf.local.home/T/#me2676c378eff2d6a33f3054fed4a5f3afa64e65b
Link: https://lore.kernel.org/lkml/20250128145806.1849977-1-eyal.birger@gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20250202162921.335813-2-eyal.birger@gmail.com
[kees: minimized changes for easier backporting, tweaked commit log]
Signed-off-by: Kees Cook <kees@kernel.org>