www.infradead.org Git - users/jedix/linux-maple.git/log
4 months ago bcachefs: implement eytzinger0_find_gt directly
Andreas Gruenbacher [Mon, 27 Jan 2025 16:52:39 +0000 (17:52 +0100)]
bcachefs: implement eytzinger0_find_gt directly

Instead of implementing eytzinger0_find_gt() in terms of
eytzinger0_find_le() and adjusting the result, implement it directly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
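
To illustrate what a direct search looks like, here is a minimal sketch of a 0-based eytzinger layout with find_le and find_gt written as independent descents. It is a Python model, not the kernel's C code: the function names mirror the commit titles, but the bodies are an assumption about the general technique and do not reproduce the actual bcachefs implementation.

```python
def eytzinger0_from_sorted(sorted_vals):
    """Lay out sorted values so that the children of slot i are 2i+1 and 2i+2."""
    n = len(sorted_vals)
    out = [None] * n
    it = iter(sorted_vals)

    def fill(i):
        # An in-order walk of the implicit tree visits slots in sorted order.
        if i < n:
            fill(2 * i + 1)
            out[i] = next(it)
            fill(2 * i + 2)

    fill(0)
    return out


def eytzinger0_find_le(a, search):
    """Index of the greatest element <= search, or -1 if there is none."""
    i, best = 0, -1
    while i < len(a):
        if a[i] <= search:
            best = i           # candidate; anything better is to the right
            i = 2 * i + 2
        else:
            i = 2 * i + 1
    return best


def eytzinger0_find_gt(a, search):
    """Index of the smallest element > search, or -1 if there is none."""
    i, best = 0, -1
    while i < len(a):
        if a[i] > search:
            best = i           # candidate; anything better is to the left
            i = 2 * i + 1
        else:
            i = 2 * i + 2
    return best
```

Written this way, find_gt does not need to call find_le and then step to the next element; it simply mirrors the comparison and the direction of descent.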
4 months ago bcachefs: add eytzinger0_find_gt self test
Andreas Gruenbacher [Mon, 27 Jan 2025 16:05:21 +0000 (17:05 +0100)]
bcachefs: add eytzinger0_find_gt self test

Add an eytzinger0_find_gt() self test similar to eytzinger0_find_le().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: simplify eytzinger0_find_le
Andreas Gruenbacher [Mon, 27 Jan 2025 13:33:20 +0000 (14:33 +0100)]
bcachefs: simplify eytzinger0_find_le

Replace the over-complicated implementation of eytzinger0_find_le() by
an equivalent, simpler version.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: convert eytzinger0_find_le to be 1-based
Andreas Gruenbacher [Tue, 28 Jan 2025 09:56:04 +0000 (10:56 +0100)]
bcachefs: convert eytzinger0_find_le to be 1-based

eytzinger0_find_le() is also easy to convert to 1-based eytzinger (but
see the next commit).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: improve eytzinger0_find_le self test
Andreas Gruenbacher [Sun, 26 Jan 2025 16:57:06 +0000 (17:57 +0100)]
bcachefs: improve eytzinger0_find_le self test

Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a
new eytzinger0_find_test_val() wrapper that calls it.

We have already established that the array is sorted in eytzinger order,
so we can use the eytzinger iterator functions and check the boundary
conditions to verify the result of eytzinger0_find_le().

Only scan the entire array if we get an incorrect result.  When we need
to scan, use eytzinger0_for_each_prev() so that we'll stop at the
highest matching element in the array in case there are duplicates;
going through the array linearly wouldn't give us that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: add eytzinger0_for_each_prev
Andreas Gruenbacher [Mon, 27 Jan 2025 16:26:05 +0000 (17:26 +0100)]
bcachefs: add eytzinger0_for_each_prev

Add an eytzinger0_for_each_prev() macro for iterating through an
eytzinger array in reverse.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
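
As a rough illustration of reverse iteration over a 0-based eytzinger array, the sketch below walks the implicit tree in descending order. It is a Python model with hypothetical helper names (eytzinger0_last, eytzinger0_prev); the kernel provides this as a C macro, and its index arithmetic may differ from the plain tree walk shown here.

```python
def eytzinger0_from_sorted(sorted_vals):
    """Build a 0-based eytzinger array (children of i are 2i+1 and 2i+2)."""
    n = len(sorted_vals)
    out = [None] * n
    it = iter(sorted_vals)

    def fill(i):
        if i < n:
            fill(2 * i + 1)
            out[i] = next(it)
            fill(2 * i + 2)

    fill(0)
    return out


def eytzinger0_last(n):
    """Slot holding the largest element: follow right children from the root."""
    if n == 0:
        return -1
    i = 0
    while 2 * i + 2 < n:
        i = 2 * i + 2
    return i


def eytzinger0_prev(i, n):
    """In-order predecessor of slot i, or -1 if i holds the smallest element."""
    left = 2 * i + 1
    if left < n:
        # Rightmost node of the left subtree.
        i = left
        while 2 * i + 2 < n:
            i = 2 * i + 2
        return i
    # No left subtree: climb while we are a left child (odd index).
    while i > 0 and i % 2 == 1:
        i = (i - 1) // 2
    return -1 if i == 0 else (i - 1) // 2


def eytzinger0_for_each_prev(n):
    """Yield slot indexes from the largest element down to the smallest."""
    i = eytzinger0_last(n)
    while i != -1:
        yield i
        i = eytzinger0_prev(i, n)
```

With duplicates in the array, a reverse walk like this reaches the highest-positioned matching element first, which is the property the self test in the neighbouring commit relies on.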
4 months ago bcachefs: eytzinger0_find_test improvement
Andreas Gruenbacher [Sun, 26 Jan 2025 10:22:33 +0000 (11:22 +0100)]
bcachefs: eytzinger0_find_test improvement

In eytzinger0_find_test(), remember the smallest element seen so far
instead of comparing adjacent array elements.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: eytzinger[01]_test improvement
Andreas Gruenbacher [Sun, 26 Jan 2025 10:28:59 +0000 (11:28 +0100)]
bcachefs: eytzinger[01]_test improvement

In eytzinger[01]_test(), make sure that eytzinger[01]_for_each()
iterates over all array elements.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: eytzinger self tests: fix cmp_u16 typo
Andreas Gruenbacher [Tue, 26 Nov 2024 22:33:55 +0000 (23:33 +0100)]
bcachefs: eytzinger self tests: fix cmp_u16 typo

Fix an obvious typo in cmp_u16().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: eytzinger self tests: missing newline termination
Andreas Gruenbacher [Tue, 26 Nov 2024 20:55:49 +0000 (21:55 +0100)]
bcachefs: eytzinger self tests: missing newline termination

pr_info() format strings need to be newline terminated.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: eytzinger self tests: loop cleanups
Andreas Gruenbacher [Tue, 26 Nov 2024 11:12:36 +0000 (12:12 +0100)]
bcachefs: eytzinger self tests: loop cleanups

The iterator variable of eytzinger0_for_each() loops has been changed to
be locally scoped at some point, so remove variables defined outside the
loop that are now unused.  In addition and for clarity, use a different
variable inside those loops where an outside variable would be shadowed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: EYTZINGER_DEBUG fix
Andreas Gruenbacher [Tue, 28 Jan 2025 00:39:23 +0000 (01:39 +0100)]
bcachefs: EYTZINGER_DEBUG fix

When EYTZINGER_DEBUG is defined, <linux/bug.h> needs to be included.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_blacklist_entries_gc cleanup
Andreas Gruenbacher [Tue, 28 Jan 2025 09:32:47 +0000 (10:32 +0100)]
bcachefs: bch2_blacklist_entries_gc cleanup

Use an eytzinger0_for_each() loop here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_bkey_ptr_data_type() now correctly returns cached for cached ptrs
Kent Overstreet [Fri, 7 Feb 2025 21:58:34 +0000 (16:58 -0500)]
bcachefs: bch2_bkey_ptr_data_type() now correctly returns cached for cached ptrs

Necessary for adding backpointers for cached pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Add time_stat for btree writes
Kent Overstreet [Mon, 27 Jan 2025 06:22:42 +0000 (01:22 -0500)]
bcachefs: Add time_stat for btree writes

We have other metadata IO types covered, this was missing.

Note: this includes the time until completion, i.e. including parent
pointer update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Add comment explaining why asserts in invalidate_one_bucket() are impossible
Kent Overstreet [Fri, 7 Feb 2025 22:12:47 +0000 (17:12 -0500)]
bcachefs: Add comment explaining why asserts in invalidate_one_bucket() are impossible

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Ignore backpointers to stripes in ec_stripe_update_extents()
Kent Overstreet [Sat, 8 Feb 2025 02:26:27 +0000 (21:26 -0500)]
bcachefs: Ignore backpointers to stripes in ec_stripe_update_extents()

Prep work for stripe backpointers: this path previously would get very
confused at being asked to process (remove redundant replicas) stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Increase JOURNAL_BUF_NR
Kent Overstreet [Thu, 23 Jan 2025 18:46:47 +0000 (13:46 -0500)]
bcachefs: Increase JOURNAL_BUF_NR

Increase journal pipelining.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Free journal bufs when not in use
Kent Overstreet [Thu, 6 Feb 2025 00:13:39 +0000 (19:13 -0500)]
bcachefs: Free journal bufs when not in use

Since we're increasing the number of 'struct journal_bufs', we don't
want them all permanently holding onto buffers for the journal data -
that'd be 16 * 2MB = 32MB, or potentially more.

Add a single-element mempool (open coded, since buffer size varies);
this also means we won't be hitting the memory allocator every time we
open and close a journal entry/buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
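
The idea of a one-slot buffer cache can be sketched as follows. This is a simplified Python model, not the kernel code: the class name, its methods, and the policy of keeping the larger buffer are all assumptions made for illustration. The constants only restate the arithmetic from the commit message (16 buffers of 2MB each).

```python
JOURNAL_BUF_NR = 16          # figure quoted in the commit message
BUF_SIZE_MB = 2              # figure quoted in the commit message
PERMANENT_COST_MB = JOURNAL_BUF_NR * BUF_SIZE_MB   # memory pinned if never freed


class SingleBufCache:
    """Hypothetical one-slot pool for variable-sized buffers."""

    def __init__(self):
        self._spare = None
        self.allocations = 0     # how often we fell through to the allocator

    def get(self, size):
        spare = self._spare
        if spare is not None and len(spare) >= size:
            # Reuse the cached buffer instead of allocating.
            self._spare = None
            return spare
        self.allocations += 1
        return bytearray(size)

    def put(self, buf):
        # Keep at most one buffer; prefer the larger one (an assumed policy).
        if self._spare is None or len(buf) > len(self._spare):
            self._spare = buf
```

Opening and closing a journal entry in a steady state then cycles the same buffer through get() and put(), so the allocator is only hit when the cached buffer is missing or too small.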
4 months ago bcachefs: Don't touch journal_buf->data->seq in journal_res_get
Kent Overstreet [Thu, 23 Jan 2025 18:06:35 +0000 (13:06 -0500)]
bcachefs: Don't touch journal_buf->data->seq in journal_res_get

This is a small optimization, reducing the number of cachelines we touch
in the fast path - and it's also necessary for the next patch that
increases JOURNAL_BUF_NR.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Kill journal_res.idx
Kent Overstreet [Thu, 23 Jan 2025 19:02:44 +0000 (14:02 -0500)]
bcachefs: Kill journal_res.idx

More dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Kill journal_res_state.unwritten_idx
Kent Overstreet [Thu, 23 Jan 2025 18:43:15 +0000 (13:43 -0500)]
bcachefs: Kill journal_res_state.unwritten_idx

Dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: add progress indicator to check_allocations
Kent Overstreet [Fri, 7 Feb 2025 19:01:05 +0000 (14:01 -0500)]
bcachefs: add progress indicator to check_allocations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Add a progress indicator to bch2_dev_data_drop()
Kent Overstreet [Thu, 6 Feb 2025 21:25:29 +0000 (16:25 -0500)]
bcachefs: Add a progress indicator to bch2_dev_data_drop()

This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.

But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Factor out progress.[ch]
Kent Overstreet [Thu, 6 Feb 2025 20:59:28 +0000 (15:59 -0500)]
bcachefs: Factor out progress.[ch]

The backpointers code has progress indicators; these aren't great, since
they print to the dmesg console and we much prefer to have progress
indicators reporting to a specific userspace program so they're not
spamming the system console.

But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts
Kent Overstreet [Fri, 7 Feb 2025 18:37:30 +0000 (13:37 -0500)]
bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts

We're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
Kent Overstreet [Fri, 7 Feb 2025 06:33:01 +0000 (01:33 -0500)]
bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number

We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Convert migrate to move_data_phys()
Kent Overstreet [Thu, 23 Jan 2025 16:45:22 +0000 (11:45 -0500)]
bcachefs: Convert migrate to move_data_phys()

Iterating over backpointers on a specific device is potentially much
cheaper than walking all filesystem data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Read/move path counter work
Kent Overstreet [Tue, 4 Feb 2025 01:15:52 +0000 (20:15 -0500)]
bcachefs: Read/move path counter work

Reorganize counters a bit, grouping related counters together.

New counters:
- io_read_inline
- io_read_hole

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Fix subtraction underflow
Alan Huang [Mon, 27 Jan 2025 09:12:41 +0000 (17:12 +0800)]
bcachefs: Fix subtraction underflow

When ancestor is less than IS_ANCESTOR_BITMAP, we would get an incorrect
result.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
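
For readers unfamiliar with this class of bug, the sketch below models unsigned 32-bit subtraction in Python. It assumes the affected expression has the general shape `ancestor - IS_ANCESTOR_BITMAP` compared against a snapshot ID; the constant's value (128) and both helper functions are illustrative assumptions, not code from the patch.

```python
U32_MASK = 0xFFFFFFFF
IS_ANCESTOR_BITMAP = 128     # assumed value, for illustration only


def u32_sub(a, b):
    """Subtract the way a C u32 does: wrap around instead of going negative."""
    return (a - b) & U32_MASK


def below_window_buggy(snapshot_id, ancestor):
    # If ancestor < IS_ANCESTOR_BITMAP the difference wraps to a huge value,
    # so the comparison is true for almost every snapshot_id.
    return snapshot_id < u32_sub(ancestor, IS_ANCESTOR_BITMAP)


def below_window_fixed(snapshot_id, ancestor):
    # Guard the subtraction so it is only evaluated when it cannot wrap.
    return (ancestor >= IS_ANCESTOR_BITMAP
            and snapshot_id < ancestor - IS_ANCESTOR_BITMAP)
```

For example, with ancestor = 5 the wrapped difference is 4294967173, so the unguarded comparison reports true for snapshot_id = 1000 even though no valid window exists.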
4 months ago bcachefs: Scrub
Kent Overstreet [Sun, 29 Dec 2024 00:59:55 +0000 (19:59 -0500)]
bcachefs: Scrub

Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.

- New helper: bch2_dev_idx_is_online(), so that we can bail out and
  report to userspace when we're unable to scrub because the device is
  offline

- data_update_opts, which controls the data move path, now understands
  scrub: data is only read, not written. The read path is responsible
  for rewriting on read error, as with other reads.

- scrub_pred skips data extents that don't have checksums

- bch_ioctl_data has a new scrub member, which has a data_types field
  for data types to check - i.e. all data types, or only metadata.

- Add new entries to bch_move_stats so that we can report numbers for
  corrected and uncorrected errors

- Add a new enum to bch_ioctl_data_event for explicitly reporting
  completion and return code (i.e. device offline)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
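
The overall flow described above can be modelled in a few lines. This is a toy Python sketch under stated assumptions: extents are plain dicts, checksums are CRC32 via zlib, and the function name, status strings, and stats keys are all hypothetical. It does not reflect the real data move path, ioctl layout, or bch_move_stats fields.

```python
import zlib


def scrub_device(extents, dev, online_devs):
    """Check every checksummed extent with a replica on `dev`; repair if possible."""
    stats = {"checked": 0, "skipped": 0, "corrected": 0, "uncorrected": 0}
    if dev not in online_devs:
        # Bail out and report, as the commit describes for offline devices.
        return "device_offline", stats

    for extent in extents:
        replicas = extent["replicas"]        # maps device index -> bytes
        if dev not in replicas:
            continue
        if extent["csum"] is None:
            # Without a checksum there is nothing to verify against.
            stats["skipped"] += 1
            continue

        stats["checked"] += 1
        if zlib.crc32(replicas[dev]) == extent["csum"]:
            continue

        # Read error: look for another online replica that verifies.
        good = next((data for d, data in replicas.items()
                     if d != dev and d in online_devs
                     and zlib.crc32(data) == extent["csum"]), None)
        if good is None:
            stats["uncorrected"] += 1
        else:
            replicas[dev] = good             # rewrite the bad copy
            stats["corrected"] += 1

    return "ok", stats
```

In the real implementation the rewrite is performed by the read path rather than by the scrub loop itself; the model folds the two together only to keep the example short.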
4 months ago bcachefs: bch2_btree_node_scrub()
Kent Overstreet [Mon, 30 Dec 2024 21:24:23 +0000 (16:24 -0500)]
bcachefs: bch2_btree_node_scrub()

Add a function for scrubbing btree nodes - reading them in, and kicking
off a rewrite if there's an error.

The btree_node_read_done() checks have to be duplicated because we're
not using a pointer to a struct btree - the btree node might already be
in cache, and we need to check a specific replica, which might not be
the one we previously read from.

This will be used in the next patch implementing high-level scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_bkey_pick_read_device() can now specify a device
Kent Overstreet [Sun, 29 Dec 2024 00:58:47 +0000 (19:58 -0500)]
bcachefs: bch2_bkey_pick_read_device() can now specify a device

To be used for scrub, where we want the read to come from a specific
device.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: __bch2_move_data_phys() now uses bch2_btree_node_rewrite_pos()
Kent Overstreet [Sun, 29 Dec 2024 02:04:36 +0000 (21:04 -0500)]
bcachefs: __bch2_move_data_phys() now uses bch2_btree_node_rewrite_pos()

Kill most of the separate logic for btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_move_data_phys()
Kent Overstreet [Sat, 28 Dec 2024 15:40:11 +0000 (10:40 -0500)]
bcachefs: bch2_move_data_phys()

Add a more general version of bch2_evacuate_bucket - to be used for
scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_btree_node_rewrite_pos()
Kent Overstreet [Sun, 29 Dec 2024 02:00:34 +0000 (21:00 -0500)]
bcachefs: bch2_btree_node_rewrite_pos()

Add a new helper for rewriting a btree node given just the key, not a
pointer to the node itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: backpointer_get_key() doesn't pull in btree node
Kent Overstreet [Sat, 28 Dec 2024 21:20:38 +0000 (16:20 -0500)]
bcachefs: backpointer_get_key() doesn't pull in btree node

We may not need to pull in a btree node when walking backpointers -
don't do so unnecessarily when using backpointer_get_key().

It'll still fall back to backpointer_get_node() in a few situations,
including btree roots (where an iterator can't point at just the key),
and races due to the interior update path not having deleted a
backpointer to an old node yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Internal reads can now correct errors
Kent Overstreet [Mon, 30 Dec 2024 21:32:57 +0000 (16:32 -0500)]
bcachefs: Internal reads can now correct errors

Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.

- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
  and a bounce.

  Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
  split bch_read_bio: we want to maintain the relationship between the
  bch_read_bio and the data_update it's embedded in.

  But correcting read errors requires allocating a split/bounce rbio
  that's embedded in a promote_op. We do still have a 1-1 relationship,
  i.e. we only allocate a single split/bounce if it's a
  BCH_READ_NODECODE, so things hopefully don't get too crazy.

- __bch2_read_extent() now is allowed to allocate the promote_op for
  rewriting after a failed read, even if it's BCH_READ_NODECODE.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Don't self-heal if a data update is already rewriting
Kent Overstreet [Mon, 20 Jan 2025 01:34:57 +0000 (20:34 -0500)]
bcachefs: Don't self-heal if a data update is already rewriting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Don't start promotes from bch2_rbio_free()
Kent Overstreet [Sat, 18 Jan 2025 00:26:10 +0000 (19:26 -0500)]
bcachefs: Don't start promotes from bch2_rbio_free()

We don't want to block completion of the read - starting a promote calls
into the write path, which will block.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Bail out early on alloc_nowait data updates
Kent Overstreet [Sun, 19 Jan 2025 18:55:33 +0000 (13:55 -0500)]
bcachefs: Bail out early on alloc_nowait data updates

If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Rework init order in bch2_data_update_init()
Kent Overstreet [Sun, 19 Jan 2025 18:43:44 +0000 (13:43 -0500)]
bcachefs: Rework init order in bch2_data_update_init()

Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
Kent Overstreet [Sat, 18 Jan 2025 07:05:57 +0000 (02:05 -0500)]
bcachefs: Self healing writes are BCH_WRITE_alloc_nowait

If a drive is failing and we're moving data off of it, we can't
necessarily depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.

And, we don't want to queue up infinite self healing moves anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Promotes should use BCH_WRITE_only_specified_devs
Kent Overstreet [Sun, 19 Jan 2025 18:11:24 +0000 (13:11 -0500)]
bcachefs: Promotes should use BCH_WRITE_only_specified_devs

Promotes, like most other internal moves, should only go to the
specified target and not fall back to allocating from the full
filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Be stricter in bch2_read_retry_nodecode()
Kent Overstreet [Thu, 16 Jan 2025 08:43:03 +0000 (03:43 -0500)]
bcachefs: Be stricter in bch2_read_retry_nodecode()

Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a data_update - and we can check in the retry
path if the extent has changed and bail out.

This likely fixes some subtle bugs with read errors and data moves.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: cleanup redundant code around data_update_op initialization
Kent Overstreet [Thu, 16 Jan 2025 05:40:43 +0000 (00:40 -0500)]
bcachefs: cleanup redundant code around data_update_op initialization

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_update_unwritten_extent() no longer depends on wbio
Kent Overstreet [Fri, 17 Jan 2025 19:26:30 +0000 (14:26 -0500)]
bcachefs: bch2_update_unwritten_extent() no longer depends on wbio

Prep work for improving bch2_data_update_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: promote_op uses embedded bch_read_bio
Kent Overstreet [Thu, 16 Jan 2025 03:22:29 +0000 (22:22 -0500)]
bcachefs: promote_op uses embedded bch_read_bio

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: data_update now embeds bch_read_bio
Kent Overstreet [Wed, 15 Jan 2025 23:53:55 +0000 (18:53 -0500)]
bcachefs: data_update now embeds bch_read_bio

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: rbio_init() cleanup
Kent Overstreet [Wed, 15 Jan 2025 17:59:43 +0000 (12:59 -0500)]
bcachefs: rbio_init() cleanup

Move more initialization to rbio_init(), to assist in further cleanups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: rbio_init_fragment()
Kent Overstreet [Tue, 14 Jan 2025 20:20:04 +0000 (15:20 -0500)]
bcachefs: rbio_init_fragment()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enums
Kent Overstreet [Sun, 19 Jan 2025 18:18:50 +0000 (13:18 -0500)]
bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enums

The uppercase/lowercase style is nice for making the namespace explicit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: x-macroize BCH_READ flags
Kent Overstreet [Fri, 17 Jan 2025 22:40:39 +0000 (17:40 -0500)]
bcachefs: x-macroize BCH_READ flags

Will be adding a bch2_read_bio_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: kill bch_read_bio.devs_have
Kent Overstreet [Fri, 17 Jan 2025 15:47:42 +0000 (10:47 -0500)]
bcachefs: kill bch_read_bio.devs_have

Dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_data_update_inflight_to_text()
Kent Overstreet [Tue, 31 Dec 2024 23:16:17 +0000 (18:16 -0500)]
bcachefs: bch2_data_update_inflight_to_text()

Add a new helper for bch2_moving_ctxt_to_text(), which may be used to
debug if moving_ios are getting stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: BCH_IOCTL_QUERY_COUNTERS
Kent Overstreet [Mon, 27 Jan 2025 03:05:02 +0000 (22:05 -0500)]
bcachefs: BCH_IOCTL_QUERY_COUNTERS

Add an ioctl for querying counters, the same ones provided in
/sys/fs/bcachefs/<uuid>/counters/, but more suitable for a 'bcachefs
top' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: BCH_COUNTER_bucket_discard_fast
Kent Overstreet [Thu, 30 Jan 2025 08:33:16 +0000 (03:33 -0500)]
bcachefs: BCH_COUNTER_bucket_discard_fast

Add a separate counter for fastpath bucket discards, which don't require
a journal flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: enum bch_persistent_counters_stable
Kent Overstreet [Thu, 30 Jan 2025 08:28:27 +0000 (03:28 -0500)]
bcachefs: enum bch_persistent_counters_stable

Persistent counters, like recovery passes, include a stable enum in
their definition - but this was never correctly plumbed.

This allows us to add new counters and properly organize them with a
non-stable "presentation order", which can also be used in userspace by
the new 'bcachefs fs top' tool.

Fortunately, since we haven't yet added any new counters where
presentation order ID doesn't match stable ID, this won't cause any
reordering issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
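
The split between a stable ID and a presentation order can be sketched as a small mapping table. This Python model is an illustration only: the counter names are taken from commits in this log, but the numeric stable IDs, the table layout, and both helper functions are invented for the example and do not match the kernel's definitions.

```python
# (name, stable_id) pairs, listed in presentation order.
# The stable IDs below are made up for illustration.
COUNTERS = [
    ("io_read", 0),
    ("io_read_inline", 7),
    ("io_read_hole", 8),
    ("io_write", 1),
    ("bucket_discard_fast", 9),
]


def to_stable(values):
    """Map values given in presentation order to a dict keyed by stable ID."""
    return {sid: v for (_, sid), v in zip(COUNTERS, values)}


def to_presentation(stored):
    """Read persisted values back in presentation order; missing IDs read as 0."""
    return [stored.get(sid, 0) for _, sid in COUNTERS]
```

Because persisted data is keyed by stable ID, related counters can be grouped or reordered for display without changing what is stored.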
4 months ago bcachefs: Separate running/runnable in wp stats
Kent Overstreet [Wed, 22 Jan 2025 19:38:59 +0000 (14:38 -0500)]
bcachefs: Separate running/runnable in wp stats

We've got per-writepoint statistics to see how well the writepoint index
update threads are pipelining; this separates running vs. runnable so we
can see at a glance if they're blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Move write_points to debugfs
Kent Overstreet [Wed, 22 Jan 2025 17:07:54 +0000 (12:07 -0500)]
bcachefs: Move write_points to debugfs

This was hitting the sysfs 4k limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Don't inc io_(read|write) counters for moves
Kent Overstreet [Thu, 30 Jan 2025 08:41:31 +0000 (03:41 -0500)]
bcachefs: Don't inc io_(read|write) counters for moves

This makes 'bcachefs fs top' more useful; we can now see at a glance
whether the IO to the device is being done for user reads/writes, or
copygc/rebalance.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Fix missing increment of move_extent_write counter
Kent Overstreet [Wed, 29 Jan 2025 20:51:37 +0000 (15:51 -0500)]
bcachefs: Fix missing increment of move_extent_write counter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: check_bp_exists() check for backpointers for stale pointers
Kent Overstreet [Tue, 11 Feb 2025 18:33:08 +0000 (13:33 -0500)]
bcachefs: check_bp_exists() check for backpointers for stale pointers

Early version of 'bcachefs_metadata_version_cached_backpointers' was
creating backpointers for stale cached pointers - whoops. Now we have to
repair those.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: btree_node_(rewrite|update_key) cleanup
Kent Overstreet [Tue, 25 Feb 2025 20:04:58 +0000 (15:04 -0500)]
bcachefs: btree_node_(rewrite|update_key) cleanup

Factor out get_iter_to_node() and use it for
btree_node_rewrite_get_iter(), to be used for fixing btree node write
error behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bs > ps support
Kent Overstreet [Thu, 20 Feb 2025 20:03:38 +0000 (15:03 -0500)]
bcachefs: bs > ps support

bcachefs removed most PAGE_SIZE references long ago, so this is easy;
only readpage_bio_extend() has to be tweaked to respect the minimum
order.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: fix build on 32 bit in get_random_u64_below()
Kent Overstreet [Fri, 14 Mar 2025 22:20:20 +0000 (18:20 -0400)]
bcachefs: fix build on 32 bit in get_random_u64_below()

Bare 64 bit divides are not allowed, whoops:

arm-linux-gnueabi-ld: drivers/char/random.o: in function `__get_random_u64_below':
drivers/char/random.c:602:(.text+0xc70): undefined reference to `__aeabi_uldivmod'

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Change btree wb assert to runtime error
Kent Overstreet [Fri, 14 Mar 2025 13:54:43 +0000 (09:54 -0400)]
bcachefs: Change btree wb assert to runtime error

We just had a report of the assert for "btree in write buffer for
non-write buffer btree" popping during the 6.14 upgrade.

- 150TB filesystem, after a reboot the upgrade was able to continue from
  where it left off, so no major damage.

But with 6.14 about to come out we want to get this tracked down asap,
and need more data if other users hit this.

Convert the BUG_ON() to an emergency read-only, and print out btree, the
key itself, and stack trace from the original write buffer update (which
did not have this check before).

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: bch2_get_random_u64_below()
Kent Overstreet [Thu, 13 Mar 2025 15:16:28 +0000 (11:16 -0400)]
bcachefs: bch2_get_random_u64_below()

Steal the (clever) algorithm from get_random_u32_below().

This fixes a bug where we were passing roundup_pow_of_two() a 64 bit
number - we're squaring device latencies now:

[  +1.681698] ------------[ cut here ]------------
[  +0.000010] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[  +0.000011] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[  +0.000011] CPU: 1 UID: 0 PID: 196 Comm: kworker/u32:13 Not tainted 6.14.0-rc6-dave+ #10
[  +0.000012] Hardware name: ASUS System Product Name/PRIME B460I-PLUS, BIOS 1301 07/13/2021
[  +0.000005] Workqueue: events_unbound __bch2_read_endio [bcachefs]
[  +0.000354] Call Trace:
[  +0.000005]  <TASK>
[  +0.000007]  dump_stack_lvl+0x5d/0x80
[  +0.000018]  ubsan_epilogue+0x5/0x30
[  +0.000008]  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
[  +0.000011]  bch2_rand_range.cold+0x17/0x20 [bcachefs]
[  +0.000231]  bch2_bkey_pick_read_device+0x547/0x920 [bcachefs]
[  +0.000229]  __bch2_read_extent+0x1e4/0x18e0 [bcachefs]
[  +0.000241]  ? bch2_btree_iter_peek_slot+0x3df/0x800 [bcachefs]
[  +0.000180]  ? bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[  +0.000230]  bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[  +0.000230]  bch2_rbio_retry+0x1fa/0x600 [bcachefs]
[  +0.000224]  ? bch2_printbuf_make_room+0x71/0xb0 [bcachefs]
[  +0.000243]  ? bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[  +0.000278]  bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[  +0.000227]  ? __bch2_read_endio+0x58b/0x870 [bcachefs]
[  +0.000220]  __bch2_read_endio+0x58b/0x870 [bcachefs]
[  +0.000268]  ? try_to_wake_up+0x31c/0x7f0
[  +0.000011]  ? process_one_work+0x176/0x330
[  +0.000008]  process_one_work+0x176/0x330
[  +0.000008]  worker_thread+0x252/0x390
[  +0.000008]  ? __pfx_worker_thread+0x10/0x10
[  +0.000006]  kthread+0xec/0x230
[  +0.000011]  ? __pfx_kthread+0x10/0x10
[  +0.000009]  ret_from_fork+0x31/0x50
[  +0.000009]  ? __pfx_kthread+0x10/0x10
[  +0.000008]  ret_from_fork_asm+0x1a/0x30
[  +0.000012]  </TASK>
[  +0.000046] ---[ end trace ]---

Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
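
The algorithm used by get_random_u32_below() is a multiply-and-shift with rejection (often attributed to Lemire). The sketch below is a Python model of how that idea extends to 64 bits; the function name, the injectable rand64 parameter, and the handling of ceil <= 0 are assumptions for illustration and are not taken from the bcachefs patch.

```python
import random

U64_MASK = (1 << 64) - 1


def random_u64_below(ceil, rand64=lambda: random.getrandbits(64)):
    """Return a uniformly distributed integer in [0, ceil), for 0 < ceil < 2**64."""
    if not 0 < ceil <= U64_MASK:
        raise ValueError("ceil must be in 1..2**64-1")   # assumed policy

    # The high 64 bits of the 128-bit product are the candidate result.
    mult = ceil * rand64()
    if (mult & U64_MASK) < ceil:
        # Only in this rare case can the result be biased; reject and retry
        # while the low half falls below 2**64 mod ceil.
        bound = ((1 << 64) - ceil) % ceil
        while (mult & U64_MASK) < bound:
            mult = ceil * rand64()
    return mult >> 64
```

Unlike an approach that rounds the range up to a power of two and masks, this needs no shift by the bit width of the type, so it avoids the shift-out-of-bounds reported in the trace above. In C the 128-bit product and the modulo need care on 32-bit builds, which Python's arbitrary-precision integers hide.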
4 months ago bcachefs: target_congested -> get_random_u32_below()
Kent Overstreet [Thu, 13 Mar 2025 13:56:07 +0000 (09:56 -0400)]
bcachefs: target_congested -> get_random_u32_below()

get_random_u32_below() has a better algorithm than bch2_rand_range(),
it just didn't exist at the time.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: fix tiny leak in bch2_dev_add()
Kent Overstreet [Wed, 12 Mar 2025 21:21:31 +0000 (17:21 -0400)]
bcachefs: fix tiny leak in bch2_dev_add()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months ago bcachefs: Make sure trans is unlocked when submitting read IO
Kent Overstreet [Tue, 11 Mar 2025 14:39:36 +0000 (10:39 -0400)]
bcachefs: Make sure trans is unlocked when submitting read IO

We were still using the trans after the unlock, leading to this bug in
the retry path:

00255 ------------[ cut here ]------------
00255 kernel BUG at fs/bcachefs/btree_iter.c:3348!
00255 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
00255 bcachefs (0ca38fe8-0a26-41f9-9b5d-6a27796c7803): /fiotest offset 86048768: no device to read from:
00255   u64s 8 type extent 4098:168192:U32_MAX len 128 ver 0: durability: 0 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c 0:8040a368  compress none ec: idx 83 block 1 ptr: 0:302:128 gen 0
00255 bcachefs (0ca38fe8-0a26-41f9-9b5d-6a27796c7803): /fiotest offset 85983232: no device to read from:
00255   u64s 8 type extent 4098:168064:U32_MAX len 128 ver 0: durability: 0 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c 0:43311336  compress none ec: idx 83 block 1 ptr: 0:302:0 gen 0
00255 Modules linked in:
00255 CPU: 5 UID: 0 PID: 304 Comm: kworker/u70:2 Not tainted 6.14.0-rc6-ktest-g526aae23d67d #16040
00255 Hardware name: linux,dummy-virt (DT)
00255 Workqueue: events_unbound bch2_rbio_retry
00255 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00255 pc : __bch2_trans_get+0x100/0x378
00255 lr : __bch2_trans_get+0xa0/0x378
00255 sp : ffffff80c865b760
00255 x29: ffffff80c865b760 x28: 0000000000000000 x27: ffffff80d76ed880
00255 x26: 0000000000000018 x25: 0000000000000000 x24: ffffff80f4ec3760
00255 x23: ffffff80f4010140 x22: 0000000000000056 x21: ffffff80f4ec0000
00255 x20: ffffff80f4ec3788 x19: ffffff80d75f8000 x18: 00000000ffffffff
00255 x17: 2065707974203820 x16: 7334367520200a3a x15: 0000000000000008
00255 x14: 0000000000000001 x13: 0000000000000100 x12: 0000000000000006
00255 x11: ffffffc080b47a40 x10: 0000000000000000 x9 : ffffffc08038dea8
00255 x8 : ffffff80d75fc018 x7 : 0000000000000000 x6 : 0000000000003788
00255 x5 : 0000000000003760 x4 : ffffff80c922de80 x3 : ffffff80f18f0000
00255 x2 : ffffff80c922de80 x1 : 0000000000000130 x0 : 0000000000000006
00255 Call trace:
00255  __bch2_trans_get+0x100/0x378 (P)
00255  bch2_read_io_err+0x98/0x260
00255  bch2_read_endio+0xb8/0x2d0
00255  __bch2_read_extent+0xce8/0xfe0
00255  __bch2_read+0x2a8/0x978
00255  bch2_rbio_retry+0x188/0x318
00255  process_one_work+0x154/0x390
00255  worker_thread+0x20c/0x3b8
00255  kthread+0xf0/0x1b0
00255  ret_from_fork+0x10/0x20
00255 Code: 6b01001f 54ffff01 79408460 3617fec0 (d4210000)
00255 ---[ end trace 0000000000000000 ]---
00255 Kernel panic - not syncing: Oops - BUG: Fatal exception
00255 SMP: stopping secondary CPUs
00255 Kernel Offset: disabled
00255 CPU features: 0x000,00000070,00000010,8240500b
00255 Memory Limit: none
00255 ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Initialize from_inode members for bch_io_opts
Roxana Nicolescu [Tue, 11 Mar 2025 15:06:10 +0000 (15:06 +0000)]
bcachefs: Initialize from_inode members for bch_io_opts

When there is no inode source, all "from_inode" members in the structure
bch_io_opts should be set to false.
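
A minimal sketch of the idea, assuming a simplified options structure: zero
the whole structure first so that every "from_inode" flag starts out false,
then fill in the values. struct demo_io_opts, its fields, and
demo_io_opts_init() are hypothetical and do not reproduce the real
bch_io_opts layout.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in: each option carries a "came from the inode" flag. */
struct demo_io_opts {
	unsigned int compression;
	bool compression_from_inode;
	unsigned int data_replicas;
	bool data_replicas_from_inode;
};

/* Initialize options that do not come from an inode. */
static void demo_io_opts_init(struct demo_io_opts *opts,
			      unsigned int compression,
			      unsigned int data_replicas)
{
	/* Clears every *_from_inode flag, so none is left uninitialized. */
	memset(opts, 0, sizeof(*opts));
	opts->compression = compression;
	opts->data_replicas = data_replicas;
}
```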

Fixes: 7a7c43a0c1ecf ("bcachefs: Add bch_io_opts fields for indicating whether the opts came from the inode")
Reported-by: syzbot+c17ad4b4367b72a853cb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c17ad4b4367b72a853cb
Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix b->written overflow
Alan Huang [Fri, 7 Mar 2025 16:58:27 +0000 (00:58 +0800)]
bcachefs: Fix b->written overflow

When a bset extends past the end of the btree node, we should not add its
sectors to b->written, as that would overflow b->written.
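
The check can be sketched as follows, assuming a 16-bit sector counter and
the invariant written <= sectors. struct demo_node and demo_account_bset()
are hypothetical names for illustration, not the bcachefs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the node bookkeeping. */
struct demo_node {
	uint16_t written;	/* sectors accounted for so far */
	uint16_t sectors;	/* total sectors in the node */
};

/*
 * Account for a bset of bset_sectors sectors. Returns false and leaves
 * written untouched if the bset would run past the end of the node.
 */
static bool demo_account_bset(struct demo_node *b, uint32_t bset_sectors)
{
	uint32_t remaining = (uint32_t)(b->sectors - b->written);

	if (bset_sectors > remaining)
		return false;

	b->written += (uint16_t)bset_sectors;
	return true;
}
```

Comparing against the remaining space before adding means the counter can
never wrap, whatever value is read from disk.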

Reported-by: syzbot+3cb3d9e8c3f197754825@syzkaller.appspotmail.com
Tested-by: syzbot+3cb3d9e8c3f197754825@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agoLinux 6.14-rc6
Linus Torvalds [Sun, 9 Mar 2025 23:45:25 +0000 (13:45 -1000)]
Linux 6.14-rc6

4 months agoMerge tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 9 Mar 2025 19:23:14 +0000 (09:23 -1000)]
Merge tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Use the specified $(LD) when building userprogs with Clang

 - Pass the correct target triple when compile-testing UAPI headers
   with Clang

 - Fix pacman-pkg build error with KBUILD_OUTPUT

* tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT
  docs: Kconfig: fix defconfig description
  kbuild: hdrcheck: fix cross build with clang
  kbuild: userprogs: use correct lld when linking through clang

4 months agoMerge tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 9 Mar 2025 19:14:07 +0000 (09:14 -1000)]
Merge tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for some reported issues. These
  contain:

   - typec driver fixes

   - dwc3 driver fixes

   - xhci driver fixes

   - renesas controller fixes

   - gadget driver fixes

   - a new USB quirk added

  All of these have been in linux-next with no reported issues"

* tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix NULL pointer access
  usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader
  usb: xhci: Fix host controllers "dying" after suspend and resume
  usb: dwc3: Set SUSPENDENABLE soon after phy init
  usb: hub: lack of clearing xHC resources
  usb: renesas_usbhs: Flush the notify_hotplug_work
  usb: renesas_usbhs: Use devm_usb_get_phy()
  usb: renesas_usbhs: Call clk_put()
  usb: dwc3: gadget: Prevent irq storm when TH re-executes
  usb: gadget: Check bmAttributes only if configuration is valid
  xhci: Restrict USB4 tunnel detection for USB3 devices to Intel hosts
  usb: xhci: Enable the TRB overfetch quirk on VIA VL805
  usb: gadget: Fix setting self-powered state on suspend
  usb: typec: ucsi: increase timeout for PPM reset operations
  acpi: typec: ucsi: Introduce a ->poll_cci method
  usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality
  usb: gadget: Set self-powered based on MaxPower and bmAttributes
  usb: gadget: u_ether: Set is_suspend flag if remote wakeup fails
  usb: atm: cxacru: fix a flaw in existing endpoint checks

4 months agoMerge tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 9 Mar 2025 19:11:42 +0000 (09:11 -1000)]
Merge tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is a single driver core fix that resolves a reported memory leak.

  It's been in linux-next for 2 weeks now with no reported problems"

* tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers: core: fix device leak in __fw_devlink_relax_cycles()

4 months agoMerge tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 9 Mar 2025 19:07:54 +0000 (09:07 -1000)]
Merge tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc/IIO driver fixes from Greg KH:
 "Here are a number of misc and char and iio driver fixes that have been
  sitting in my tree for way too long. They contain:

   - iio driver fixes for reported issues

   - regression fix for rtsx_usb card reader

   - mei and mhi driver fixes

   - small virt driver fixes

   - ntsync permissions fix

   - other tiny driver fixes for reported problems.

  All of these have been in linux-next for quite a while with no
  reported issues"

* tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
  Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection"
  ntsync: Check wait count based on byte size.
  bus: simple-pm-bus: fix forced runtime PM use
  char: misc: deallocate static minor in error path
  eeprom: digsy_mtc: Make GPIO lookup table match the device
  drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl
  binderfs: fix use-after-free in binder_devices
  slimbus: messaging: Free transaction ID in delayed interrupt scenario
  vbox: add HAS_IOPORT dependency
  cdx: Fix possible UAF error in driver_override_show()
  intel_th: pci: Add Panther Lake-P/U support
  intel_th: pci: Add Panther Lake-H support
  intel_th: pci: Add Arrow Lake support
  intel_th: msu: Fix less trivial kernel-doc warnings
  intel_th: msu: Fix kernel-doc warnings
  MAINTAINERS: change maintainer for FSI
  ntsync: Set the permissions to be 0666
  bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
  mei: vsc: Use "wakeuphostint" when getting the host wakeup GPIO
  mei: me: add panther lake P DID
  ...

4 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 9 Mar 2025 19:04:08 +0000 (09:04 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "arm64:

   - Fix a couple of bugs affecting pKVM's PSCI relay implementation
     when running in the hVHE mode, resulting in the host being entered
     with the MMU in an unknown state, and EL2 being in the wrong mode

  x86:

   - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow

   - Ensure DEBUGCTL is context switched on AMD to avoid running the
     guest with the host's value, which can lead to unexpected bus lock
     #DBs

   - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
     properly emulate BTF. KVM's lack of context switching has meant BTF
     has always been broken to some extent

   - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
     the guest can enable DebugSwap without KVM's knowledge

   - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
     to RO memory" phase without actually generating a write-protection
     fault

   - Fix a printf() goof in the SEV smoke test that causes build
     failures with -Werror

   - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
     PERFMON_V2 isn't supported by KVM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
  KVM: selftests: Fix printf() format goof in SEV smoke test
  KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
  KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
  KVM: SVM: Save host DR masks on CPUs with DebugSwap
  KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
  KVM: arm64: Initialize HCR_EL2.E2H early
  KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
  KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
  KVM: x86: Snapshot the host's DEBUGCTL in common x86
  KVM: SVM: Suppress DEBUGCTL.BTF on AMD
  KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
  KVM: selftests: Assert that STI blocking isn't set after event injection
  KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow

4 months agoMerge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD
Paolo Bonzini [Sun, 9 Mar 2025 07:44:06 +0000 (03:44 -0400)]
Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.14-rcN #2

 - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.

 - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
   the host's value, which can lead to unexpected bus lock #DBs.

 - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
   emulate BTF.  KVM's lack of context switching has meant BTF has always been
   broken to some extent.

 - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
   can enable DebugSwap without KVM's knowledge.

 - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
   memory" phase without actually generating a write-protection fault.

 - Fix a printf() goof in the SEV smoke test that causes build failures with
   -Werror.

 - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
   isn't supported by KVM.

4 months agoMerge tag 'kvmarm-fixes-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Sun, 9 Mar 2025 07:43:56 +0000 (03:43 -0400)]
Merge tag 'kvmarm-fixes-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.14, take #4

- Fix a couple of bugs affecting pKVM's PSCI relay implementation
  when running in the hVHE mode, resulting in the host being entered
  with the MMU in an unknown state, and EL2 being in the wrong mode.

4 months agoMerge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 9 Mar 2025 00:34:06 +0000 (14:34 -1000)]
Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "33 hotfixes. 24 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  26 are for MM and 7 are for non-MM.

   - "mm: memory_failure: unmap poisoned folio during migrate properly"
     from Ma Wupeng fixes a couple of two year old bugs involving the
     migration of hwpoisoned folios.

   - "selftests/damon: three fixes for false results" from SeongJae Park
     fixes three one year old bugs in the DAMON selftest code.

  The remainder are singletons and doubletons. Please see the individual
  changelogs for details"

* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
  mm/page_alloc: fix uninitialized variable
  rapidio: add check for rio_add_net() in rio_scan_alloc_net()
  rapidio: fix an API misues when rio_add_net() fails
  MAINTAINERS: .mailmap: update Sumit Garg's email address
  Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
  mm: fix finish_fault() handling for large folios
  mm: don't skip arch_sync_kernel_mappings() in error paths
  mm: shmem: remove unnecessary warning in shmem_writepage()
  userfaultfd: fix PTE unmapping stack-allocated PTE copies
  userfaultfd: do not block on locking a large folio with raised refcount
  mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
  mm: shmem: fix potential data corruption during shmem swapin
  mm: fix kernel BUG when userfaultfd_move encounters swapcache
  selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
  selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
  selftests/damon/damos_quota: make real expectation of quota exceeds
  include/linux/log2.h: mark is_power_of_2() with __always_inline
  NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
  mm, swap: avoid BUG_ON in relocate_cluster()
  mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
  ...

4 months agoMerge tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 8 Mar 2025 19:29:54 +0000 (09:29 -1000)]
Merge tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more x86 fixes from Ingo Molnar:

 - Add more model IDs to the AMD microcode version check, more people
   are hitting these checks

 - Fix a Xen guest boot warning related to AMD northbridge setup

 - Fix SEV guest bugs related to recent changes in its locking logic

 - Fix a missing definition of PTRS_PER_PMD that assembly builds can hit

* tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add some forgotten models to the SHA check
  x86/mm: Define PTRS_PER_PMD for assembly code too
  virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex
  virt: sev-guest: Allocate request data dynamically
  x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()

4 months agox86/microcode/AMD: Add some forgotten models to the SHA check
Borislav Petkov (AMD) [Fri, 7 Mar 2025 22:02:56 +0000 (23:02 +0100)]
x86/microcode/AMD: Add some forgotten models to the SHA check

Add some more forgotten models to the SHA check.

Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
4 months agoMerge branch 'linus' into x86/urgent, to pick up dependent patches
Ingo Molnar [Sat, 8 Mar 2025 19:09:27 +0000 (20:09 +0100)]
Merge branch 'linus' into x86/urgent, to pick up dependent patches

Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 months agoMerge tag 'loongarch-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 8 Mar 2025 17:21:41 +0000 (07:21 -1000)]
Merge tag 'loongarch-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix bugs in kernel build, hibernation, memory management and KVM"

* tag 'loongarch-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Fix GPA size issue about VM
  LoongArch: KVM: Reload guest CSR registers after sleep
  LoongArch: KVM: Add interrupt checking for AVEC
  LoongArch: Set hugetlb mmap base address aligned with pmd size
  LoongArch: Set max_pfn with the PFN of the last page
  LoongArch: Use polling play_dead() when resuming from hibernation
  LoongArch: Eliminate superfluous get_numa_distances_cnt()
  LoongArch: Convert unreachable() to BUG()

4 months agoLoongArch: KVM: Fix GPA size issue about VM
Bibo Mao [Sat, 8 Mar 2025 05:52:04 +0000 (13:52 +0800)]
LoongArch: KVM: Fix GPA size issue about VM

The physical address space is 48 bits wide on a Loongson-3A5000 physical
machine, but only 47 bits wide for a VM on a Loongson-3A5000 system. The
size of a VM's physical address space is the same as the size of the
virtual user space (one half) of the physical machine.

The variable cpu_vabits represents the user address space only; the kernel
address space is not included (user space and kernel space each take half
of the total). So cpu_vabits, rather than cpu_vabits - 1, represents the
size of the guest physical address space.

Also add strict checking of the page fault GPA: inject an error if it is
larger than the maximum GPA of the VM.
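
The arithmetic can be sketched as below, assuming cpu_vabits is 47 on the
machine described above. demo_gpa_size() and demo_gpa_valid() are
hypothetical helpers, not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of an address space that is vabits wide (vabits < 64). */
static uint64_t demo_gpa_size(unsigned int vabits)
{
	return UINT64_C(1) << vabits;
}

/* A faulting GPA is acceptable only if it lies inside that space. */
static bool demo_gpa_valid(uint64_t gpa, unsigned int vabits)
{
	return gpa < demo_gpa_size(vabits);
}
```

With vabits = 47 the guest gets 2^47 bytes; using vabits - 1 would halve
that to 2^46 bytes and wrongly reject addresses in the upper half.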

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: KVM: Reload guest CSR registers after sleep
Bibo Mao [Sat, 8 Mar 2025 05:52:01 +0000 (13:52 +0800)]
LoongArch: KVM: Reload guest CSR registers after sleep

On the host, the HW guest CSR registers are lost after a suspend and resume
operation. Because last_vcpu of the boot CPU still records the latest vCPU
pointer, reloading of the guest CSR registers is skipped when the boot CPU
resumes and the vCPU is scheduled.

Clear last_vcpu so that the guest CSR registers are reloaded from the
scheduled vCPU context after suspend and resume.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: KVM: Add interrupt checking for AVEC
Bibo Mao [Sat, 8 Mar 2025 05:51:59 +0000 (13:51 +0800)]
LoongArch: KVM: Add interrupt checking for AVEC

There is a newly added macro INT_AVEC for the CSR ESTAT register; it is
bit 14, used for LoongArch AVEC support. The AVEC interrupt status bit 14
is covered by the macro CSR_ESTAT_IS, so replace the hard-coded value
0x1fff with CSR_ESTAT_IS so that the AVEC interrupt status is also
supported by KVM.
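
A small sketch of why the hard-coded mask misses the new bit: 0x1fff covers
bits 0 to 12 only, so bit 14 is dropped. DEMO_WIDE_MASK is a hypothetical
value covering bits 0 to 14 and is not claimed to be the definition of
CSR_ESTAT_IS; the other DEMO_* names are likewise illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INT_AVEC	14		/* bit number given in the message above */
#define DEMO_OLD_MASK	0x1fffu		/* the hard-coded value being replaced */
#define DEMO_WIDE_MASK	0x7fffu		/* hypothetical mask covering bits 0..14 */

/* True if interrupt 'bit' is set in estat and survives the mask. */
static bool demo_irq_pending(uint32_t estat, uint32_t mask, unsigned int bit)
{
	return ((estat & mask) & (UINT32_C(1) << bit)) != 0;
}
```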

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: Set hugetlb mmap base address aligned with pmd size
Bibo Mao [Sat, 8 Mar 2025 05:51:32 +0000 (13:51 +0800)]
LoongArch: Set hugetlb mmap base address aligned with pmd size

Running the LTP test case "testcases/bin/hugefork02" produces a dmesg
error report such as:

 kernel BUG at mm/hugetlb.c:5550!
 Oops - BUG[#1]:
 CPU: 0 UID: 0 PID: 1517 Comm: hugefork02 Not tainted 6.14.0-rc2+ #241
 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
 pc 90000000004eaf1c ra 9000000000485538 tp 900000010edbc000 sp 900000010edbf940
 a0 900000010edbfb00 a1 9000000108d20280 a2 00007fffe9474000 a3 00007ffff3474000
 a4 0000000000000000 a5 0000000000000003 a6 00000000003cadd3 a7 0000000000000000
 t0 0000000001ffffff t1 0000000001474000 t2 900000010ecd7900 t3 00007fffe9474000
 t4 00007fffe9474000 t5 0000000000000040 t6 900000010edbfb00 t7 0000000000000001
 t8 0000000000000005 u0 90000000004849d0 s9 900000010edbfa00 s0 9000000108d20280
 s1 00007fffe9474000 s2 0000000002000000 s3 9000000108d20280 s4 9000000002b38b10
 s5 900000010edbfb00 s6 00007ffff3474000 s7 0000000000000406 s8 900000010edbfa08
    ra: 9000000000485538 unmap_vmas+0x130/0x218
   ERA: 90000000004eaf1c __unmap_hugepage_range+0x6f4/0x7d0
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
 PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
 Process hugefork02 (pid: 1517, threadinfo=00000000a670eaf4, task=000000007a95fc64)
 Call Trace:
 [<90000000004eaf1c>] __unmap_hugepage_range+0x6f4/0x7d0
 [<9000000000485534>] unmap_vmas+0x12c/0x218
 [<9000000000494068>] exit_mmap+0xe0/0x308
 [<900000000025fdc4>] mmput+0x74/0x180
 [<900000000026a284>] do_exit+0x294/0x898
 [<900000000026aa30>] do_group_exit+0x30/0x98
 [<900000000027bed4>] get_signal+0x83c/0x868
 [<90000000002457b4>] arch_do_signal_or_restart+0x54/0xfa0
 [<90000000015795e8>] irqentry_exit_to_user_mode+0xb8/0x138
 [<90000000002572d0>] tlb_do_page_fault_1+0x114/0x1b4

The problem is that the base address allocated from hugetlbfs is not
aligned to the pmd size. Add a check for hugetlbfs and align the base
address to the pmd size. With this patch the test case
"testcases/bin/hugefork02" passes.
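
The alignment arithmetic can be sketched as below. The 32 MiB size and the
address 0x7fffe9474000 are taken from register values in the report above,
on the assumption that they are the pmd size and the start of the mapping;
demo_is_aligned() and demo_align_down() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* size must be a power of two. */
static bool demo_is_aligned(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (size - 1)) == 0;
}

/* Round addr down to a multiple of size (a power of two). */
static uint64_t demo_align_down(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}
```

For example, 0x7fffe9474000 is not a multiple of 0x2000000; rounding it down
gives 0x7fffe8000000, which is.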

This is similar to the commit 7f24cbc9c4d42db8a3c8484d1 ("mm/mmap: teach
generic_get_unmapped_area{_topdown} to handle hugetlb mappings").

Cc: stable@vger.kernel.org # 6.13+
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: Set max_pfn with the PFN of the last page
Bibo Mao [Sat, 8 Mar 2025 05:51:32 +0000 (13:51 +0800)]
LoongArch: Set max_pfn with the PFN of the last page

The current max_pfn is zero. As a result, users cannot get some page
information, such as kpagecount, through the /proc filesystem. The
following messages are displayed by the stress-ng test suite with the
command "stress-ng --verbose --physpage 1 -t 1".

 # stress-ng --verbose --physpage 1 -t 1
 stress-ng: error: [1691] physpage: cannot read page count for address 0x134ac000 in /proc/kpagecount, errno=22 (Invalid argument)
 stress-ng: error: [1691] physpage: cannot read page count for address 0x7ffff207c3a8 in /proc/kpagecount, errno=22 (Invalid argument)
 stress-ng: error: [1691] physpage: cannot read page count for address 0x134b0000 in /proc/kpagecount, errno=22 (Invalid argument)
 ...

After applying this patch, the kernel can pass the test.

 # stress-ng --verbose --physpage 1 -t 1
 stress-ng: debug: [1701] physpage: [1701] started (instance 0 on CPU 3)
 stress-ng: debug: [1701] physpage: [1701] exited (instance 0 on CPU 3)
 stress-ng: debug: [1700] physpage: [1701] terminated (success)

Cc: stable@vger.kernel.org # 6.8+
Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: Use polling play_dead() when resuming from hibernation
Huacai Chen [Sat, 8 Mar 2025 05:51:32 +0000 (13:51 +0800)]
LoongArch: Use polling play_dead() when resuming from hibernation

When CONFIG_RANDOM_KMALLOC_CACHES or other randomization infrastructure is
enabled, the idle_task's stack may differ between the booting kernel
and the target kernel. So when resuming from hibernation, an ACTION_BOOT_CPU
IPI wakes up the idle instruction in arch_cpu_idle_dead() and jumps to the
interrupt handler. But since the stack pointer has changed, the interrupt
handler cannot restore the correct context.

So rename the current arch_cpu_idle_dead() to idle_play_dead(), make it
the default version of play_dead(), and have the new arch_cpu_idle_dead()
call play_dead() directly. For hibernation, implement an arch-specific
hibernate_resume_nonboot_cpu_disable() that uses the polling version of
play_dead(), i.e. poll_play_dead() (the idle instruction is replaced by
nop, and irq is disabled), to avoid the IPI handler corrupting the
idle_task's stack when resuming from hibernation.

This solution is a little similar to commit 406f992e4a372dafbe3c ("x86 /
hibernate: Use hlt_play_dead() when resuming from hibernation").

Cc: stable@vger.kernel.org
Tested-by: Erpeng Xu <xuerpeng@uniontech.com>
Tested-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: Eliminate superfluous get_numa_distances_cnt()
Yuli Wang [Sat, 8 Mar 2025 05:51:32 +0000 (13:51 +0800)]
LoongArch: Eliminate superfluous get_numa_distances_cnt()

In LoongArch, get_numa_distances_cnt() isn't in use, resulting in a
compiler warning.

Fix the following error with clang-18 when W=1e:

arch/loongarch/kernel/acpi.c:259:28: error: unused function 'get_numa_distances_cnt' [-Werror,-Wunused-function]
  259 | static inline unsigned int get_numa_distances_cnt(struct acpi_table_slit *slit)
      |                            ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Link: https://lore.kernel.org/all/Z7bHPVUH4lAezk0E@kernel.org/
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoLoongArch: Convert unreachable() to BUG()
Tiezhu Yang [Sat, 8 Mar 2025 05:50:45 +0000 (13:50 +0800)]
LoongArch: Convert unreachable() to BUG()

When compiling on LoongArch, there exists the following objtool warning
in arch/loongarch/kernel/machine_kexec.o:

  kexec_reboot() falls through to next function crash_shutdown_secondary()

Avoid using unreachable() as it can (and will in the absence of UBSAN)
generate fall-through code. Use BUG() so we get a "break BRK_BUG" trap
(with unreachable annotation).

Cc: stable@vger.kernel.org # 6.12+
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
4 months agoMerge tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 8 Mar 2025 02:21:02 +0000 (16:21 -1000)]
Merge tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix return address recovery of traced function in ftrace to ensure
   reliable stack unwinding

 - Fix compiler warnings and runtime crashes of vDSO selftests on s390
   by introducing a dedicated GNU hash bucket pointer with correct
   32-bit entry size

 - Fix test_monitor_call() inline asm, which misses CC clobber, by
   switching to an instruction that doesn't modify CC

* tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ftrace: Fix return address recovery of traced function
  selftests/vDSO: Fix GNU hash table entry size for s390x
  s390/traps: Fix test_monitor_call() inline assembly

4 months agox86/mm: Define PTRS_PER_PMD for assembly code too
Ingo Molnar [Thu, 6 Mar 2025 22:00:16 +0000 (23:00 +0100)]
x86/mm: Define PTRS_PER_PMD for assembly code too

Andy reported the following build warning from head_32.S:

  In file included from arch/x86/kernel/head_32.S:29:
  arch/x86/include/asm/pgtable_32.h:59:5: error: "PTRS_PER_PMD" is not defined, evaluates to 0 [-Werror=undef]
       59 | #if PTRS_PER_PMD > 1

The reason is that on 2-level i386 paging the folded in PMD's
PTRS_PER_PMD constant is not defined in assembly headers,
only in generic MM C headers.

Instead of trying to fish out the definition from the generic
headers, just define it - it even has a comment for it already...
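
A minimal sketch of the pattern, assuming a header shared by C and assembly
code: once the constant is defined unconditionally, the "#if" below no
longer evaluates an undefined name, which is what -Wundef complains about.
The DEMO_* names and demo_has_real_pmd() are hypothetical and do not
reproduce the kernel headers.

```c
/* Hypothetical shared header: define the folded-PMD constant directly. */
#ifndef DEMO_PTRS_PER_PMD
#define DEMO_PTRS_PER_PMD 1	/* folded PMD: a single entry */
#endif

/* With the definition above, this test is well defined under -Wundef. */
#if DEMO_PTRS_PER_PMD > 1
#define DEMO_HAS_REAL_PMD 1
#else
#define DEMO_HAS_REAL_PMD 0
#endif

static int demo_has_real_pmd(void)
{
	return DEMO_HAS_REAL_PMD;
}
```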

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z8oa8AUVyi2HWfo9@gmail.com
4 months agoMerge tag 'slab-for-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Fri, 7 Mar 2025 22:22:41 +0000 (12:22 -1000)]
Merge tag 'slab-for-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - Stable fix for kmem_cache_destroy() called from a WQ_MEM_RECLAIM
   workqueue causing a warning due to the new kvfree_rcu_barrier()
   (Uladzislau Rezki)

* tag 'slab-for-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq

4 months agoMerge tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 7 Mar 2025 22:17:42 +0000 (12:17 -1000)]
Merge tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Restore the previous behavior of the ACPI platform_profile sysfs
  interface that has been changed recently in a way incompatible with
  the existing user space (Mario Limonciello)"

* tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  platform/x86/amd: pmf: Add balanced-performance to hidden choices
  platform/x86/amd: pmf: Add 'quiet' to hidden choices
  ACPI: platform_profile: Add support for hidden choices

4 months agoMerge tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Fri, 7 Mar 2025 21:49:33 +0000 (11:49 -1000)]
Merge tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull core dumping fix from Kees Cook:

 - Only sort VMAs when core_sort_vma sysctl is set

* tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  coredump: Only sort VMAs when core_sort_vma sysctl is set

4 months agoMerge tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 7 Mar 2025 21:17:30 +0000 (11:17 -1000)]
Merge tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix leaked extent map after error when reading chunks

 - replace use of deprecated strncpy

 - in zoned mode, fix the range used when unlocking an extent range, which
   caused a hang

* tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix a leaked chunk map issue in read_one_chunk()
  btrfs: replace deprecated strncpy() with strscpy()
  btrfs: zoned: fix extent range end unlock in cow_file_range()