Kent Overstreet [Sat, 8 Feb 2025 00:56:11 +0000 (19:56 -0500)]
bcachefs: Advance bch_alloc.oldest_gen if no stale pointers
Now that we've got cached backpointers and aren't leaving around stale
pointers on bucket invalidation, we no longer need the periodic (rare)
gc_gens - which recalculates each bucket's oldest gen to avoid wraparound.
We can't delete that code because we've got to support existing
filesystems that will still have stale pointers, but this gets rid of
another scalability limit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbago collection - which requires a full metadata scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 11 Feb 2025 15:09:31 +0000 (10:09 -0500)]
bcachefs: Better trigger ordering
Transactional triggers need to run in a defined ordering, which is not
quite the same as btree ID integer comparison.
Previously this was handled in a hacky way in
bch2_trans_commit_run_triggers(), since it was only the alloc btree that
needed special handling, but upcoming stripe btree changes are going to
require more ordering changes - so, define that ordering.
Next patch will change the transaction commit path to use it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
FRAGMENTATION_START was incorrect, there's currently only one
fragmentation LRU (at the end of the reserved bits for LRU type), and
we're getting ready to add a stripe fragmentation lru - so give it a
better name.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 10 Feb 2025 22:04:08 +0000 (17:04 -0500)]
bcachefs: bch2_write_op_error() now prints info about data update
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.
This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.
Reported-by: Roland Vet <vet.roland@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
However, these properties are no longer relied upon and no longer
necessary, so remove the corresponding asserts and forbid the use of
eytzinger1_prev(0, size) and eytzinger1_next(0, size).
This allows to further simplify the code in eytzinger1_next() and
eytzinger1_prev(): where the left shifting happens, eytzinger1_next() is
trying to move i to the lowest child on the left, which is equivalent to
doubling i until the next doubling would cause it to be greater than
size. This is implemented by shifting i to the left so that the most
significant bits align and then shifting i to the right by one if the
result is greater than size.
Likewise, eytzinger1_prev() is trying to move to the lowest child on the
right; the same applies here.
The 1-offset in (size - 1) in eytzinger1_next() isn't needed at all, but
the equivalent offset in eytzinger1_prev() is surprisingly needed to
preserve the 'eytzinger1_prev(0, size) == eytzinger1_last(size)'
property. However, since we no longer support that property, we can get
rid of these offsets as well. This saves one addition in each function
and makes the code less confusing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Mon, 27 Jan 2025 19:54:52 +0000 (20:54 +0100)]
bcachefs: convert eytzinger sort to be 1-based (2)
In this second step, transform the eytzinger indexes i, j, and k in
eytzinger1_sort_r() from 0-based to 1-based. This step looks a bit
messy, but the resulting code is slightly better.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Tue, 28 Jan 2025 09:56:37 +0000 (10:56 +0100)]
bcachefs: convert eytzinger0_find to be 1-based
Several of the algorithms on eytzinger trees are implemented in terms of
the eytzinger0 primitives. However, those algorithms can just as easily
be expressed in terms of the eytzinger1 primitives, and that leads to
better and easier to understand code. Start by converting
eytzinger0_find().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Mon, 27 Jan 2025 16:15:36 +0000 (17:15 +0100)]
bcachefs: add eytzinger0_find_ge self test
Add an eytzinger0_find_ge() self test similar to eytzinger0_find_gt().
Note that this test requires eytzinger0_find_ge() to return the first
matching element in the array in case of duplicates. To prevent
bisection errors, we only add this test after strenghening the original
implementation (see the previous commit).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Mon, 27 Jan 2025 16:52:39 +0000 (17:52 +0100)]
bcachefs: implement eytzinger0_find_ge directly
Implement eytzinger0_find_ge() directly instead of implementing it in
terms of eytzinger0_find_le() and adjusting the result.
This turns eytzinger0_find_ge() into a minimum search, so when there are
duplicate elements, the result of eytzinger0_find_ge() will now always
point at the first matching element.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Sun, 26 Jan 2025 16:57:06 +0000 (17:57 +0100)]
bcachefs: improve eytzinger0_find_le self test
Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a
new eytzinger0_find_test_val() wrapper that calls it.
We have already established that the array is sorted in eytzinger order,
so we can use the eytzinger iterator functions and check the boundary
conditions to verify the result of eytzinger0_find_le().
Only scan the entire array if we get an incorrect result. When we need
to scan, use eytzinger0_for_each_prev() so that we'll stop at the
highest matching element in the array in case there are duplicates;
going through the array linearly wouldn't give us that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Andreas Gruenbacher [Tue, 26 Nov 2024 11:12:36 +0000 (12:12 +0100)]
bcachefs: eytzinger self tests: loop cleanups
The iterator variable of eytzinger0_for_each() loops has been changed to
be locally scoped at some point, so remove variables defined outside the
loop that are now unused. In addition and for clarity, use a different
variable inside those loops where an outside variable would be shadowed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 00:13:39 +0000 (19:13 -0500)]
bcachefs: Free journal bufs when not in use
Since we're increasing the number of 'struct journal_bufs', we don't
want them all permanently holding onto buffers for the journal data -
that'd be 16 * 2MB = 32MB, or potentially more.
Add a single-element mempool (open coded, since buffer size varies),
this also means we won't be hitting the memory allocator every time we
open and close a journal entry/buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Jan 2025 18:06:35 +0000 (13:06 -0500)]
bcachefs: Don't touch journal_buf->data->seq in journal_res_get
This is a small optimization, reducing the number of cachelines we touch
in the fast path - and it's also necessary for the next patch that
increases JOURNAL_BUF_NR.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 21:25:29 +0000 (16:25 -0500)]
bcachefs: Add a progress indicator to bch2_dev_data_drop()
This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.
But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 20:59:28 +0000 (15:59 -0500)]
bcachefs: Factor out progress.[ch]
the backpointers code has progress indicators; these aren't great, since
they print to the dmesg console and we much prefer to have progress
indicators reporting to a specific userspace program so they're not
spamming the system console.
But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 29 Dec 2024 00:59:55 +0000 (19:59 -0500)]
bcachefs: Scrub
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.
- New helper: bch2_dev_idx_is_online(), so that we can bail out and
report to userspace when we're unable to scrub because the device is
offline
- data_update_opts, which controls the data move path, now understands
scrub: data is only read, not written. The read path is responsible
for rewriting on read error, as with other reads.
- scrub_pred skips data extents that don't have checksums
- bch_ioctl_data has a new scrub member, which has a data_types field
for data types to check - i.e. all data types, or only metadata.
- Add new entries to bch_move_stats so that we can report numbers for
corrected and uncorrected errors
- Add a new enum to bch_ioctl_data_event for explicitly reporting
completion and return code (i.e. device offline)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Dec 2024 21:24:23 +0000 (16:24 -0500)]
bcachefs: bch2_btree_node_scrub()
Add a function for scrubbing btree nodes - reading them in, and kicking
off a rewrite if there's an error.
The btree_node_read_done() checks have to be duplicated because we're
not using a pointer to a struct btree - the btree node might already be
in cache, and we need to check a specific replica, which might not be
the one we previously read from.
This will be used in the next patch implementing high-level scrub.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 28 Dec 2024 21:20:38 +0000 (16:20 -0500)]
bcachefs: backpointer_get_key() doesn't pull in btree node
We may not need to pull in a btree node when walking backpointers -
don't do so unnecessarily when using backpointer_get_key().
It'll still fall back to backpointer_get_node() in a few situations,
including btree roots (where an iterator can't point at just the key),
and races due to the interior update path not having deleted a
backpointer to an old node yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Dec 2024 21:32:57 +0000 (16:32 -0500)]
bcachefs: Internal reads can now correct errors
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.
- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
and a bounce.
Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
split bch_read_bio: we want to maintain the relationship between the
bch_read_bio and the data_update it's embedded in.
But correcting read errors requires allocating a split/bounce rbio
that's embedded in a promote_op. We do still have a 1-1 relationship,
i.e. we only allocate a single split/bounce if it's a
BCH_READ_NODECODE, so things hopefully don't get too crazy.
- __bch2_read_extent() now is allowed to allocate the promote_op for
rewriting after a failed read, even if it's BCH_READ_NODECODE.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 19 Jan 2025 18:55:33 +0000 (13:55 -0500)]
bcachefs: Bail out early on alloc_nowait data updates
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 19 Jan 2025 18:43:44 +0000 (13:43 -0500)]
bcachefs: Rework init order in bch2_data_update_init()
Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 18 Jan 2025 07:05:57 +0000 (02:05 -0500)]
bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
If a drive is failing and we're moving data off of it, we can't
necessairly depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.
And, we don't want to queue up infinite self healing moves anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 16 Jan 2025 08:43:03 +0000 (03:43 -0500)]
bcachefs: Be stricter in bch2_read_retry_nodecode()
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a a data_update - and we can check in the retry
path if the extent has changed and bail out.
This likely fixes some subtle bugs with read errors and data moves.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 30 Jan 2025 08:28:27 +0000 (03:28 -0500)]
bcachefs: enum bch_persistent_counters_stable
Persistent counters, like recovery passes, include a stable enum in
their definition - but this was never correctly plumbed.
This allows us to add new counters and properly organize them with a
non-stable "presentation order", which can also be used in userspace by
the new 'bcachefs fs top' tool.
Fortunatel, since we haven't yet added any new counters where
presentation order ID doesn't match stable ID, this won't cause any
reordering issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 22 Jan 2025 19:38:59 +0000 (14:38 -0500)]
bcachefs: Separate running/runnable in wp stats
We've got per-writepoint statistics to see how well the writepoint index
update threads are pipelining; this separates running vs. runnable so we
can see at a glance if they're blocking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 30 Jan 2025 08:41:31 +0000 (03:41 -0500)]
bcachefs: Don't inc io_(read|write) counters for moves
This makes 'bcachefs fs top' more useful; we can now see at a glance
whether the IO to the device is being done for user reads/writes, or
copygc/rebalance.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 11 Feb 2025 18:33:08 +0000 (13:33 -0500)]
bcachefs: check_bp_exists() check for backpointers for stale pointers
Early version of 'bcachefs_metadata_version_cached_backpointers' was
creating backpointers for stale cached pointers - whoops. Now we have to
repair those.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 14 Mar 2025 22:20:20 +0000 (18:20 -0400)]
bcachefs: fix build on 32 bit in get_random_u64_below()
bare 64 bit divides not allowed, whoops
arm-linux-gnueabi-ld: drivers/char/random.o: in function `__get_random_u64_below':
drivers/char/random.c:602:(.text+0xc70): undefined reference to `__aeabi_uldivmod'
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 14 Mar 2025 13:54:43 +0000 (09:54 -0400)]
bcachefs: Change btree wb assert to runtime error
We just had a report of the assert for "btree in write buffer for
non-write buffer btree" popping during the 6.14 upgrade.
- 150TB filesystem, after a reboot the upgrade was able to continue from
where it left off, so no major damage.
But with 6.14 about to come out we want to get this tracked down asap,
and need more data if other users hit this.
Convert the BUG_ON() to an emergency read-only, and print out btree, the
key itself, and stack trace from the original write buffer update (which
did not have this check before).
Reported-by: Stijn Tintel <stijn@linux-ipv6.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Roxana Nicolescu [Tue, 11 Mar 2025 15:06:10 +0000 (15:06 +0000)]
bcachefs: Initialize from_inode members for bch_io_opts
When there is no inode source, all "from_inode" members in the structure
bhc_io_opts should be set false.
Fixes: 7a7c43a0c1ecf ("bcachefs: Add bch_io_opts fields for indicating whether the opts came from the inode") Reported-by: syzbot+c17ad4b4367b72a853cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c17ad4b4367b72a853cb Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sun, 9 Mar 2025 19:23:14 +0000 (09:23 -1000)]
Merge tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use the specified $(LD) when building userprogs with Clang
- Pass the correct target triple when compile-testing UAPI headers
with Clang
- Fix pacman-pkg build error with KBUILD_OUTPUT
* tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT
docs: Kconfig: fix defconfig description
kbuild: hdrcheck: fix cross build with clang
kbuild: userprogs: use correct lld when linking through clang
Linus Torvalds [Sun, 9 Mar 2025 19:14:07 +0000 (09:14 -1000)]
Merge tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for some reported issues. These
contain:
- typec driver fixes
- dwc3 driver fixes
- xhci driver fixes
- renesas controller fixes
- gadget driver fixes
- a new USB quirk added
All of these have been in linux-next with no reported issues"
* tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix NULL pointer access
usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader
usb: xhci: Fix host controllers "dying" after suspend and resume
usb: dwc3: Set SUSPENDENABLE soon after phy init
usb: hub: lack of clearing xHC resources
usb: renesas_usbhs: Flush the notify_hotplug_work
usb: renesas_usbhs: Use devm_usb_get_phy()
usb: renesas_usbhs: Call clk_put()
usb: dwc3: gadget: Prevent irq storm when TH re-executes
usb: gadget: Check bmAttributes only if configuration is valid
xhci: Restrict USB4 tunnel detection for USB3 devices to Intel hosts
usb: xhci: Enable the TRB overfetch quirk on VIA VL805
usb: gadget: Fix setting self-powered state on suspend
usb: typec: ucsi: increase timeout for PPM reset operations
acpi: typec: ucsi: Introduce a ->poll_cci method
usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality
usb: gadget: Set self-powered based on MaxPower and bmAttributes
usb: gadget: u_ether: Set is_suspend flag if remote wakeup fails
usb: atm: cxacru: fix a flaw in existing endpoint checks
Linus Torvalds [Sun, 9 Mar 2025 19:11:42 +0000 (09:11 -1000)]
Merge tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix that resolves a reported memory leak.
It's been in linux-next for 2 weeks now with no reported problems"
* tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: core: fix device leak in __fw_devlink_relax_cycles()
Linus Torvalds [Sun, 9 Mar 2025 19:07:54 +0000 (09:07 -1000)]
Merge tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver fixes from Greg KH:
"Here are a number of misc and char and iio driver fixes that have been
sitting in my tree for way too long. They contain:
- iio driver fixes for reported issues
- regression fix for rtsx_usb card reader
- mei and mhi driver fixes
- small virt driver fixes
- ntsync permissions fix
- other tiny driver fixes for reported problems.
All of these have been in linux-next for quite a while with no
reported issues"
* tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection"
ntsync: Check wait count based on byte size.
bus: simple-pm-bus: fix forced runtime PM use
char: misc: deallocate static minor in error path
eeprom: digsy_mtc: Make GPIO lookup table match the device
drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl
binderfs: fix use-after-free in binder_devices
slimbus: messaging: Free transaction ID in delayed interrupt scenario
vbox: add HAS_IOPORT dependency
cdx: Fix possible UAF error in driver_override_show()
intel_th: pci: Add Panther Lake-P/U support
intel_th: pci: Add Panther Lake-H support
intel_th: pci: Add Arrow Lake support
intel_th: msu: Fix less trivial kernel-doc warnings
intel_th: msu: Fix kernel-doc warnings
MAINTAINERS: change maintainer for FSI
ntsync: Set the permissions to be 0666
bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
mei: vsc: Use "wakeuphostint" when getting the host wakeup GPIO
mei: me: add panther lake P DID
...
Linus Torvalds [Sun, 9 Mar 2025 19:04:08 +0000 (09:04 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"arm64:
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode
x86:
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow
- Ensure DEBUGCTL is context switched on AMD to avoid running the
guest with the host's value, which can lead to unexpected bus lock
#DBs
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
properly emulate BTF. KVM's lack of context switching has meant BTF
has always been broken to some extent
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
the guest can enable DebugSwap without KVM's knowledge
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
to RO memory" phase without actually generating a write-protection
fault
- Fix a printf() goof in the SEV smoke test that causes build
failures with -Werror
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
PERFMON_V2 isn't supported by KVM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
KVM: selftests: Fix printf() format goof in SEV smoke test
KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
KVM: SVM: Save host DR masks on CPUs with DebugSwap
KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
KVM: arm64: Initialize HCR_EL2.E2H early
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Suppress DEBUGCTL.BTF on AMD
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: selftests: Assert that STI blocking isn't set after event injection
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
Paolo Bonzini [Sun, 9 Mar 2025 07:44:06 +0000 (03:44 -0400)]
Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.14-rcN #2
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
- Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
the host's value, which can lead to unexpected bus lock #DBs.
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
emulate BTF. KVM's lack of context switching has meant BTF has always been
broken to some extent.
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
can enable DebugSwap without KVM's knowledge.
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
memory" phase without actually generating a write-protection fault.
- Fix a printf() goof in the SEV smoke test that causes build failures with
-Werror.
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
isn't supported by KVM.