Linus Torvalds [Thu, 30 Jan 2025 17:13:35 +0000 (09:13 -0800)]
Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
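As an illustration of the calling convention described above, here is a minimal userspace sketch in C. Everything in it is an assumption made for the example: struct mock_inode, struct mock_qstr, struct mock_dentry, mock_server_lookup() and mock_d_revalidate() are hypothetical stand-ins, not kernel definitions, and the in-kernel prototype may differ. It shows only the idea that the caller hands the callback the parent and name it used for the lookup, so the callback never reads the dentry's own, possibly changing, name or parent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mock_inode { unsigned long ino; };

struct mock_qstr {
	const char *name;
	size_t len;
};

struct mock_dentry {
	unsigned long ino;	/* inode this cached entry currently points at */
};

/* Hypothetical "server" state: what (directory, name) resolves to right now. */
static const struct {
	unsigned long dir_ino;
	const char *name;
	unsigned long ino;
} mock_server[] = {
	{ 1, "a", 10 },
	{ 1, "b", 11 },
};

/* Returns the inode number for (dir, name), or 0 if there is no such entry. */
static unsigned long mock_server_lookup(const struct mock_inode *dir,
					const struct mock_qstr *name)
{
	for (size_t i = 0; i < sizeof(mock_server) / sizeof(mock_server[0]); i++) {
		if (mock_server[i].dir_ino == dir->ino &&
		    strlen(mock_server[i].name) == name->len &&
		    memcmp(mock_server[i].name, name->name, name->len) == 0)
			return mock_server[i].ino;
	}
	return 0;
}

/*
 * New-style callback: validity is checked against the parent and name
 * supplied by the caller, which the caller guarantees to be stable.
 * Returns 1 if the cached dentry is still valid, 0 otherwise.
 */
static int mock_d_revalidate(const struct mock_inode *dir,
			     const struct mock_qstr *name,
			     const struct mock_dentry *dentry)
{
	assert(dir != NULL && name != NULL && dentry != NULL);
	return mock_server_lookup(dir, name) == dentry->ino;
}
```

With this shape, the boilerplate for snapshotting the name or pinning the parent lives once in the caller rather than in every filesystem's callback.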
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
Linus Torvalds [Thu, 30 Jan 2025 16:47:17 +0000 (08:47 -0800)]
Merge tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
- unify inode corruption marking and mark them as bad immediately upon
detection of an error in attribute enumeration
- folio cleanup
* tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Unify inode corruption marking with _ntfs_bad_inode()
fs/ntfs3: Mark inode as bad as soon as error detected in mi_enum_attr()
ntfs3: Remove an access to page->index
Linus Torvalds [Thu, 30 Jan 2025 16:42:50 +0000 (08:42 -0800)]
Merge tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- second half of a fix for a bug that'd been causing oopses on
filesystems using snapshots with memory pressure (key cache fills for
snapshots btrees are tricky)
- build fix for strange compiler configurations that double stack frame
size
- "journal stuck timeout" now takes into account device latency: this
fixes some spurious warnings, and the main remaining source of SRCU
lock hold time warnings (I'm no longer seeing this in my CI, so any
users still seeing this should definitely ping me)
- fix for slow/hanging unmounts ("Improve journal pin flushing")
- some more tracepoint fixes/improvements, to chase down the "rebalance
isn't making progress" issues
* tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs:
bcachefs: Improve trace_move_extent_finish
bcachefs: Fix trace_copygc
bcachefs: Journal writes are now IOPRIO_CLASS_RT
bcachefs: Improve journal pin flushing
bcachefs: fix bch2_btree_node_flags
bcachefs: rebalance, copygc enabled are runtime opts
bcachefs: Improve decompression error messages
bcachefs: bset_blacklisted_journal_seq is now AUTOFIX
bcachefs: "Journal stuck" timeout now takes into account device latency
bcachefs: Reduce stack frame size of __bch2_str_hash_check_key()
bcachefs: Fix btree_trans_peek_key_cache()
Linus Torvalds [Wed, 29 Jan 2025 22:38:19 +0000 (14:38 -0800)]
Merge tag 'soundwire-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- SoundWire multi lane support to use multiple lanes if supported
- Stream handling of DEPREPARED state
- AMD wake register programming for power off mode
* tag 'soundwire-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: clear wake enable register for power off mode
soundwire: generic_bandwidth_allocation: count the bandwidth of active streams only
SoundWire: pass stream to compute_params()
soundwire: generic_bandwidth_allocation: add lane in sdw_group_params
soundwire: generic_bandwidth_allocation: select data lane
soundwire: generic_bandwidth_allocation: check required freq accurately
soundwire: generic_bandwidth_allocation: correct clk_freq check in sdw_select_row_col
Soundwire: generic_bandwidth_allocation: set frame shape on fly
Soundwire: stream: program BUSCLOCK_SCALE
Soundwire: add sdw_slave_get_scale_index helper
soundwire: generic_bandwidth_allocation: skip DEPREPARED streams
soundwire: stream: set DEPREPARED state earlier
soundwire: add lane_used_bandwidth in struct sdw_bus
soundwire: mipi_disco: read lane mapping properties from ACPI
soundwire: add lane field in sdw_port_runtime
soundwire: bus: Move irq mapping cleanup into devres
Linus Torvalds [Wed, 29 Jan 2025 22:29:57 +0000 (14:29 -0800)]
Merge tag 'dmaengine-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A bunch of new device support and updates to few drivers, biggest of
them the AMD ones.
New support:
- TI J722S CSI BCDMA controller support
- Intel idxd Panther Lake family platforms
- Allwinner F1C100s suniv DMA
- Qualcomm QCS615, QCS8300, SM8750, SA8775P GPI dma controller support
- AMD ae4dma controller support and reorganisation of amd driver
Updates:
- Channel page support for Nvidia Tegra210 adma driver
- Freescale support for S32G based platforms
- Yamlify atmel dma bindings"
* tag 'dmaengine-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (45 commits)
dmaengine: idxd: Enable Function Level Reset (FLR) for halt
dmaengine: idxd: Refactor halt handler
dmaengine: idxd: Add idxd_device_config_save() and idxd_device_config_restore() helpers
dmaengine: idxd: Binding and unbinding IDXD device and driver
dmaengine: idxd: Add idxd_pci_probe_alloc() helper
dt-bindings: dma: atmel: Convert to json schema
dt-bindings: dma: st-stm32-dmamux: Add description for dma-cell values
dmaengine: qcom: gpi: Add GPI immediate DMA support for SPI protocol
dt-bindings: dma: adi,axi-dmac: deprecate adi,channels node
dt-bindings: dma: adi,axi-dmac: convert to yaml schema
dmaengine: mv_xor: switch to for_each_child_of_node_scoped()
dmaengine: bcm2835-dma: Prevent suspend if DMA channel is busy
dmaengine: tegra210-adma: Support channel page
dt-bindings: dma: Support channel page to nvidia,tegra210-adma
dmaengine: ti: k3-udma: Add support for J722S CSI BCDMA
dt-bindings: dma: ti: k3-bcdma: Add J722S CSI BCDMA
dmaengine: ti: edma: fix OF node reference leaks in edma_driver
dmaengine: ti: edma: make the loop condition simpler in edma_probe()
dmaengine: fsl-edma: read/write multiple registers in cyclic transactions
dmaengine: fsl-edma: add support for S32G based platforms
...
Linus Torvalds [Wed, 29 Jan 2025 19:56:55 +0000 (11:56 -0800)]
Merge tag 'regulator-fix-v6.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes that have come in during the merge window: one that
operates the TPS6287x devices more within the design spec and can
prevent current surges when changing voltages and another more trivial
one for error message formatting"
* tag 'regulator-fix-v6.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Add missing newline character
regulator: TPS6287X: Use min/max uV to get VRANGE
Linus Torvalds [Wed, 29 Jan 2025 19:23:22 +0000 (11:23 -0800)]
Merge tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:
"A tweak to the HMAT output that was acked by Rafael, a prep patch for
CXL type2 devices support that's coming soon, refactoring of the CXL
regblock enumeration code, and a series of patches to update the event
records to CXL spec r3.1:
- Move HMAT printouts to pr_debug()
- Add CXL type2 support to cxl_dvsec_rr_decode() in preparation for
type2 support
- A series that updates CXL event records to spec r3.1 and related
changes
- Refactoring of cxl_find_regblock_instance() to count regblocks"
* tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core/regs: Refactor out functions to count regblocks of given type
cxl/test: Update test code for event records to CXL spec rev 3.1
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
cxl/events: Update General Media Event Record to CXL spec rev 3.1
cxl/events: Add Component Identifier formatting for CXL spec rev 3.1
cxl/events: Update Common Event Record to CXL spec rev 3.1
cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode()
ACPI/HMAT: Move HMAT messages to pr_debug()
Linus Torvalds [Wed, 29 Jan 2025 18:55:04 +0000 (10:55 -0800)]
Merge tag 'powerpc-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle PE state in pseries_eeh_get_state()
- Handle unset of tce window if it was never set
Thanks to Narayana Murty N, Ritesh Harjani (IBM), Shivaprasad G Bhat,
and Vaishnavi Bhat.
* tag 'powerpc-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: Don't unset window if it was never set
powerpc/pseries/eeh: Fix get PE state translation
Linus Torvalds [Wed, 29 Jan 2025 18:50:28 +0000 (10:50 -0800)]
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC cleanups from Eric Biggers:
"Simplify the kconfig options for controlling which CRC implementations
are built into the kernel, as was requested by Linus.
This means making the option to disable the arch code visible only
when CONFIG_EXPERT=y, and standardizing on a single generic
implementation of CRC32"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc32: remove other generic implementations
lib/crc: simplify the kconfig options for CRC implementations
Linus Torvalds [Wed, 29 Jan 2025 18:35:40 +0000 (10:35 -0800)]
Merge tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl table constification from Joel Granados:
"All ctl_table declared outside of functions and that remain unmodified
after initialization are const qualified.
This prevents unintended modifications to proc_handler function
pointers by placing them in the .rodata section.
This is a continuation of the tree-wide effort started a few releases
ago with the constification of the ctl_table struct arguments in the
sysctl API done in 78eb4ea25cd5 ("sysctl: treewide: constify the
ctl_table argument of proc_handlers")"
* tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
treewide: const qualify ctl_tables where applicable
Linus Torvalds [Wed, 29 Jan 2025 17:40:23 +0000 (09:40 -0800)]
Merge tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"Add support for io-uring communication between kernel and userspace
using IORING_OP_URING_CMD (Bernd Schubert). The following features
enable performance gains compared to the regular interface:
- Allow processing multiple requests with less syscall overhead
- Combine commit of old and fetch of new fuse request
- CPU/NUMA affinity of queues
Patches were reviewed by several people, including Pavel Begunkov,
io-uring co-maintainer"
* tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: prevent disabling io-uring on active connections
fuse: enable fuse-over-io-uring
fuse: block request allocation until io-uring init is complete
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: Allow to queue bg requests through io-uring
fuse: Allow to queue fg requests through io-uring
fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
fuse: {io-uring} Handle teardown of ring entries
fuse: Add io-uring sqe commit and fetch support
fuse: {io-uring} Make hash-list req unique finding functions non-static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: Make fuse_copy non static
fuse: {io-uring} Handle SQEs - register commands
fuse: make args->in_args[0] to be always the header
fuse: Add fuse-io-uring design documentation
fuse: Move request bits
fuse: Move fuse_get_dev to header file
fuse: rename to fuse_dev_end_requests and make non-static
Eric Biggers [Thu, 23 Jan 2025 21:29:04 +0000 (13:29 -0800)]
lib/crc32: remove other generic implementations
Now that we've standardized on the byte-by-byte implementation of CRC32
as the only generic implementation (see previous commit for the
rationale), remove the code for the other implementations.
Eric Biggers [Thu, 23 Jan 2025 21:29:03 +0000 (13:29 -0800)]
lib/crc: simplify the kconfig options for CRC implementations
Make the following simplifications to the kconfig options for choosing
CRC implementations for CRC32 and CRC_T10DIF:
1. Make the option to disable the arch-optimized code be visible only
when CONFIG_EXPERT=y.
2. Make a single option control the inclusion of the arch-optimized code
for all enabled CRC variants.
3. Make CRC32_SARWATE (a.k.a. slice-by-1 or byte-by-byte) be the only
generic CRC32 implementation.
The result is there is now just one option, CRC_OPTIMIZATIONS, which is
default y and can be disabled only when CONFIG_EXPERT=y.
Rationale:
1. Enabling the arch-optimized code is nearly always the right choice.
However, people trying to build the tiniest kernel possible would
find some use in disabling it. Anything we add to CRC32 is de facto
unconditional, given that CRC32 gets selected by something in nearly
all kernels. And unfortunately enabling the arch CRC code does not
eliminate the need to build the generic CRC code into the kernel too,
due to CPU feature dependencies. The size of the arch CRC code will
also increase slightly over time as more CRC variants get added and
more implementations targeting different instruction set extensions
get added. Thus, it seems worthwhile to still provide an option to
disable it, but it should be considered an expert-level tweak.
2. Considering the use case described in (1), there doesn't seem to be
sufficient value in making the arch-optimized CRC code be
independently configurable for different CRC variants. Note also
that multiple variants were already grouped together, e.g.
CONFIG_CRC32 actually enables three different variants of CRC32.
3. The bit-by-bit implementation is uselessly slow, whereas slice-by-n
for n=4 and n=8 use tables that are inconveniently large: 4096 bytes
and 8192 bytes respectively, compared to 1024 bytes for n=1. Higher
n gives higher instruction-level parallelism, so higher n easily wins
on traditional microbenchmarks on most CPUs. However, the larger
tables, which are accessed randomly, can be harmful in real-world
situations where the dcache may be cold or useful data may need to be
evicted from the dcache. Meanwhile, today most architectures have
much faster CRC32 implementations using dedicated CRC32 instructions
or carryless multiplication instructions anyway, which make the
generic code obsolete in most cases especially on long messages.
Another reason for going with n=1 is that this is already what is
used by all the other CRC variants in the kernel. CRC32 was unique
in having support for larger tables. But as per the above this can
be considered an outdated optimization.
The standardization on slice-by-1 a.k.a. CRC32_SARWATE makes much of
the code in lib/crc32.c unused. A later patch will clean that up.
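For readers unfamiliar with the term, the following is a minimal sketch of a slice-by-1 (byte-by-byte, Sarwate) CRC32 in C. It is not the code in lib/crc32.c: the names crc32_table, crc32_init_table(), crc32_update() and crc32_simple() are illustrative, and the inversion before and after the update is the conventional CRC-32 framing, assumed here for the example. The point is the single 256-entry table (1024 bytes) and one lookup per input byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256 entries * 4 bytes = 1024 bytes, the n=1 table size mentioned above. */
static uint32_t crc32_table[256];
static int crc32_table_ready;

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320. */
static void crc32_init_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc32_table[i] = c;
	}
	crc32_table_ready = 1;
}

/* Byte-by-byte update: exactly one table lookup per input byte. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	assert(p != NULL || len == 0);
	if (!crc32_table_ready)
		crc32_init_table();
	while (len--)
		crc = crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}

/* Conventional CRC-32 framing (an assumption of this sketch). */
static uint32_t crc32_simple(const void *data, size_t len)
{
	return crc32_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Slice-by-4 and slice-by-8 process several bytes per iteration using 4 or 8 such tables, which is where the 4096-byte and 8192-byte figures come from.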
Linus Torvalds [Tue, 28 Jan 2025 22:32:03 +0000 (14:32 -0800)]
Merge tag 'x86-urgent-2025-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix a potential early boot crash in SEV-SNP guests, where certain
config and build environment combinations can generate absolute
references to symbols in the early boot code"
* tag 'x86-urgent-2025-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Disable jump tables in SEV startup code
Linus Torvalds [Tue, 28 Jan 2025 22:23:46 +0000 (14:23 -0800)]
Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable using direct IO with localio
- Added localio related tracepoints
Bugfixes:
- Sunrpc fixes for working with a very large cl_tasks list
- Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
- Fixes for handling reconnections with localio
- Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
- Fix COPY_NOTIFY xdr_buf size calculations
- pNFS/Flexfiles fix for retrying requesting a layout segment for
reads
- Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
expired
Cleanups:
- Various other nfs & nfsd localio cleanups
- Preparatory patches for async copy improvements that are under
development
- Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
xprts
- Add netns inum and srcaddr to debugfs rpc_xprt info"
* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
pnfs/flexfiles: retry getting layout segment for reads
NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
NFS: Rename struct nfs4_offloadcancel_data
NFS: Fix typo in OFFLOAD_CANCEL comment
NFS: CB_OFFLOAD can return NFS4ERR_DELAY
nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
nfs: fix incorrect error handling in LOCALIO
nfs: probe for LOCALIO when v3 client reconnects to server
nfs: probe for LOCALIO when v4 client reconnects to server
nfs/localio: remove redundant code and simplify LOCALIO enablement
nfs_common: add nfs_localio trace events
nfs_common: track all open nfsd_files per LOCALIO nfs_client
nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
nfsd: update percpu_ref to manage references on nfsd_net
...
Linus Torvalds [Tue, 28 Jan 2025 22:16:46 +0000 (14:16 -0800)]
Merge tag 'vfio-v6.14-rc1' of https://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
- Extend vfio-pci 8-byte read/write support to include archs defining
CONFIG_GENERIC_IOMAP, such as x86, and remove now extraneous #ifdefs
around 64-bit accessors (Ramesh Thomas)
- Update vfio-pci shadow ROM handling and allow cached ROM from setup
data to be exposed as a functional ROM BAR region when available
(Yunxiang Li)
- Update nvgrace-gpu vfio-pci variant driver for new Grace Blackwell
hardware, conditionalizing the uncached BAR workaround for previous
generation hardware based on the presence of a flag in a new DVSEC
capability, and include a delay during probe for link training to
complete, a new requirement for GB devices (Ankit Agrawal)
* tag 'vfio-v6.14-rc1' of https://github.com/awilliam/linux-vfio:
vfio/nvgrace-gpu: Add GB200 SKU to the devid table
vfio/nvgrace-gpu: Check the HBM training and C2C link status
vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM
vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem
vfio/platform: check the bounds of read/write syscalls
vfio/pci: Expose setup ROM at ROM bar when needed
vfio/pci: Remove shadow ROM specific code paths
vfio/pci: Remove #ifdef iowrite64 and #ifdef ioread64
vfio/pci: Enable iowrite64 and ioread64 for vfio pci
Ard Biesheuvel [Mon, 27 Jan 2025 11:43:37 +0000 (12:43 +0100)]
x86/sev: Disable jump tables in SEV startup code
When retpolines and IBT are both disabled, the compiler is free to use
jump tables to optimize switch instructions. However, these are emitted
by Clang as absolute references into .rodata:
Given that this code will execute before that address in .rodata has even
been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is
difficult to diagnose.
So disable jump tables when building this code. It would be better if we
could attach this annotation to the __head macro but this appears to be
impossible.
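As a rough illustration of the construct involved, a dense switch such as the one in the C sketch below is the kind of code a compiler may lower to a jump table: an array of code addresses, typically placed in a read-only data section, indexed by the switch value. The enum and function names are hypothetical and unrelated to the SEV startup code; whether a jump table is actually emitted depends on the compiler, target and options. GCC and Clang both accept -fno-jump-tables, which asks the compiler to use compare-and-branch sequences instead.

```c
#include <assert.h>

/* Hypothetical opcode values, for illustration only. */
enum demo_op { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_SHL, OP_SHR };

/*
 * A dense switch over consecutive values. Without -fno-jump-tables the
 * compiler is free to implement it as an indirect jump through a table
 * of absolute addresses; with the flag it falls back to comparisons.
 */
static unsigned int demo_apply(enum demo_op op, unsigned int a, unsigned int b)
{
	assert(op <= OP_SHR);

	switch (op) {
	case OP_ADD: return a + b;
	case OP_SUB: return a - b;
	case OP_AND: return a & b;
	case OP_OR:  return a | b;
	case OP_XOR: return a ^ b;
	case OP_SHL: return a << (b & 31);
	case OP_SHR: return a >> (b & 31);
	}
	return 0;
}
```

The result of the function is the same either way; only the generated control flow, and therefore the memory it references, differs.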
Linus Torvalds [Tue, 28 Jan 2025 20:25:12 +0000 (12:25 -0800)]
Merge tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the big set of driver core and debugfs updates for 6.14-rc1.
Included in here is a bunch of driver core, PCI, OF, and platform rust
bindings (all acked by the different subsystem maintainers), hence the
merge conflict with the rust tree, and some driver core api updates to
mark things as const, which will also require some fixups due to new
stuff coming in through other trees in this merge window.
There are also a bunch of debugfs updates from Al, and there is at
least one user that does have a regression with these, but Al is
working on tracking down the fix for it. In my use (and everyone
else's linux-next use), it does not seem like a big issue at the
moment.
Here's a short list of the things in here:
- driver core rust bindings for PCI, platform, OF, and some i/o
functions.
We are almost at the "write a real driver in rust" stage now,
depending on what you want to do.
- misc device rust bindings and a sample driver to show how to use
them
- debugfs cleanups in the fs as well as the users of the fs api for
places where drivers got it wrong or were unnecessarily doing
things in complex ways.
- driver core const work, making more of the api take const * for
different parameters to make the rust bindings easier overall.
- other small fixes and updates
All of these have been in linux-next with all of the aforementioned
merge conflicts, and the one debugfs issue, which looks to be resolved
"soon""
* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
rust: device: Use as_char_ptr() to avoid explicit cast
rust: device: Replace CString with CStr in property_present()
devcoredump: Constify 'struct bin_attribute'
devcoredump: Define 'struct bin_attribute' through macro
rust: device: Add property_present()
saner replacement for debugfs_rename()
orangefs-debugfs: don't mess with ->d_name
octeontx2: don't mess with ->d_parent or ->d_parent->d_name
arm_scmi: don't mess with ->d_parent->d_name
slub: don't mess with ->d_name
sof-client-ipc-flood-test: don't mess with ->d_name
qat: don't mess with ->d_name
xhci: don't mess with ->d_iname
mtu3: don't mess wiht ->d_iname
greybus/camera - stop messing with ->d_iname
mediatek: stop messing with ->d_iname
netdevsim: don't embed file_operations into your structs
b43legacy: make use of debugfs_get_aux()
b43: stop embedding struct file_operations into their objects
carl9170: stop embedding file_operations into their objects
...
Linus Torvalds [Tue, 28 Jan 2025 19:35:58 +0000 (11:35 -0800)]
Merge tag 'stop-machine.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull stop_machine update from Paul McKenney:
"Move a misplaced call to rcu_momentary_eqs() from multi_cpu_stop() to
ensure that interrupts are disabled as required"
* tag 'stop-machine.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
stop_machine: Fix rcu_momentary_eqs() call in multi_cpu_stop()
Linus Torvalds [Tue, 28 Jan 2025 19:34:03 +0000 (11:34 -0800)]
Merge tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull CSD-lock update from Paul McKenney:
"Allow runtime modification of the csd_lock_timeout and
panic_on_ipistall module parameters"
* tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
locking/csd-lock: make CSD lock debug tunables writable in /sys
Linus Torvalds [Tue, 28 Jan 2025 17:55:04 +0000 (09:55 -0800)]
Merge tag 'tty-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the tty/serial driver set of changes for 6.14-rc1. Nothing
major in here, it was delayed a bit due to a regression found in
linux-next which has now been reverted and verified that it is fixed.
Other than the reverts, highlights include:
- 8250 work to get the nbcon mode working (partially reverted)
- altera_jtaguart minor fixes
- fsl_lpuart minor updates
- sh-sci driver minor updates
- other tiny driver updates and cleanups
All of these have been in linux-next for a while, and now with no
reports of problems (thanks to the reverts)"
* tag 'tty-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (44 commits)
Revert "serial: 8250: Switch to nbcon console"
Revert "serial: 8250: Revert "drop lockdep annotation from serial8250_clear_IER()""
serial: sh-sci: Increment the runtime usage counter for the earlycon device
serial: sh-sci: Clean sci_ports[0] after at earlycon exit
serial: sh-sci: Do not probe the serial port if its slot in sci_ports[] is in use
serial: sh-sci: Move runtime PM enable to sci_probe_single()
serial: sh-sci: Drop __initdata macro for port_cfg
serial: kgdb_nmi: Remove unused knock code
tty: Permit some TIOCL_SETSEL modes without CAP_SYS_ADMIN
tty: xilinx_uartps: split sysrq handling
serial: 8250: Revert "drop lockdep annotation from serial8250_clear_IER()"
serial: 8250: Switch to nbcon console
serial: 8250: Provide flag for IER toggling for RS485
serial: 8250: Use high-level writing function for FIFO
serial: 8250: Use frame time to determine timeout
serial: 8250: Adjust the timeout for FIFO mode
tty: atmel_serial: Use of_property_present() for non-boolean properties
serial: sc16is7xx: Add polling mode if no IRQ pin is available
dt-bindings: serial: sc16is7xx: Add description for polling mode
tty: serial: atmel: make it selectable for ARCH_LAN969X
...
Linus Torvalds [Tue, 28 Jan 2025 17:01:36 +0000 (09:01 -0800)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull KVM/arm64 updates from Will Deacon:
"New features:
- Support for non-protected guest in protected mode, achieving near
feature parity with the non-protected mode
- Support for the EL2 timers as part of the ongoing NV support
- Allow control of hardware tracing for nVHE/hVHE
Improvements, fixes and cleanups:
- Massive cleanup of the debug infrastructure, making it a bit less
awkward and definitely easier to maintain. This should pave the way
for further optimisations
- Complete rewrite of pKVM's fixed-feature infrastructure, aligning
it with the rest of KVM and making the code easier to follow
- Large simplification of pKVM's memory protection infrastructure
- Better handling of RES0/RES1 fields for memory-backed system
registers
- Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
from a pretty nasty timer bug
- Small collection of cleanups and low-impact fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
arm64/sysreg: Get rid of TRFCR_ELx SysregFields
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Apply RESx settings to sysreg reset values
KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
KVM: arm64: Fix selftests after sysreg field name update
coresight: Pass guest TRFCR value to KVM
KVM: arm64: Support trace filtering for guests
KVM: arm64: coresight: Give TRBE enabled state to KVM
coresight: trbe: Remove redundant disable call
arm64/sysreg/tools: Move TRFCR definitions to sysreg
tools: arm64: Update sysreg.h header files
KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
KVM: arm64: Drop pkvm_mem_transition for FF-A
KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
arm64: kvm: Introduce nvhe stack size constants
KVM: arm64: Fix nVHE stacktrace VA bits mask
KVM: arm64: Fix FEAT_MTE in pKVM
Documentation: Update the behaviour of "kvm-arm.mode"
...
Linus Torvalds [Tue, 28 Jan 2025 16:52:01 +0000 (08:52 -0800)]
Merge tag 'loongarch-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Migrate to the generic rule for built-in DTB
- Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled
- Derive timer max_delta from PRCFG1's timer_bits
- Correct the cacheinfo sharing information
- Add pgprot_nx() implementation
- Add debugfs entries to switch SFB/TSO state
- Change the maximum number of watchpoints
- Some bug fixes and other small changes
* tag 'loongarch-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Extend the maximum number of watchpoints
LoongArch: Change 8 to 14 for LOONGARCH_MAX_{BRP,WRP}
LoongArch: Add debugfs entries to switch SFB/TSO state
LoongArch: Fix warnings during S3 suspend
LoongArch: Adjust SETUP_SLEEP and SETUP_WAKEUP
LoongArch: Refactor bug_handler() implementation
LoongArch: Add pgprot_nx() implementation
LoongArch: Correct the __switch_to() prototype in comments
LoongArch: Correct the cacheinfo sharing information
LoongArch: Derive timer max_delta from PRCFG1's timer_bits
LoongArch: Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled
LoongArch: Migrate to the generic rule for built-in DTB
Linus Torvalds [Tue, 28 Jan 2025 16:38:30 +0000 (08:38 -0800)]
Merge tag 'sparc-for-6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
Pull sparc updates from Andreas Larsson:
- Improve performance for reading /proc/interrupts
- Simplify irq code for sun4v
- Replace zero-length array with flexible array in struct for pci for
sparc64
* tag 'sparc-for-6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
sparc/irq: Remove unneeded if check in sun4v_cookie_only_virqs()
sparc/irq: Use str_enabled_disabled() helper function
sparc: replace zero-length array with flexible-array member
sparc/irq: use seq_put_decimal_ull_width() for decimal values
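The zero-length array replacement mentioned in the summary above follows a common C pattern, sketched here under stated assumptions: struct demo_ranges and its helpers are hypothetical and are not the sparc64 PCI structure touched by the commit. The sketch shows a C99 flexible array member declared as entries[] rather than the older entries[0] idiom, with the header and trailing elements allocated together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure, for illustration only. */
struct demo_ranges {
	size_t count;
	unsigned int entries[];	/* flexible array member (was: entries[0]) */
};

/* Allocate the header plus `count` zeroed trailing elements in one block. */
static struct demo_ranges *demo_ranges_alloc(size_t count)
{
	struct demo_ranges *r;

	/* Refuse sizes whose byte count would overflow size_t. */
	if (count > (SIZE_MAX - sizeof(*r)) / sizeof(r->entries[0]))
		return NULL;

	r = malloc(sizeof(*r) + count * sizeof(r->entries[0]));
	if (!r)
		return NULL;
	r->count = count;
	for (size_t i = 0; i < count; i++)
		r->entries[i] = 0;
	return r;
}

/* Bounds-checked read of one trailing element. */
static unsigned int demo_ranges_get(const struct demo_ranges *r, size_t i)
{
	assert(r != NULL && i < r->count);
	return r->entries[i];
}
```

Unlike a zero-length array, the flexible array member is standard C, which lets compilers and bounds-checking tooling reason about the trailing storage.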
Joel Granados [Tue, 28 Jan 2025 12:48:37 +0000 (13:48 +0100)]
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/infiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
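The effect of the qualifier can be sketched in plain C. The miniature below is hypothetical: struct demo_ctl_entry, demo_table and demo_find() are stand-ins invented for the example and are not struct ctl_table or the sysctl registration API. It shows a file-scope table of handler pointers declared const, which a typical toolchain can place in a read-only section, and a lookup function that takes a pointer-to-const so that it cannot modify the handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a sysctl-style table; not struct ctl_table. */
typedef int (*demo_handler_t)(int value);

struct demo_ctl_entry {
	const char *procname;
	demo_handler_t handler;
};

static int demo_double(int v) { return 2 * v; }
static int demo_negate(int v) { return -v; }

/*
 * File-scope and const: the array, including its function pointers,
 * is eligible for placement in a read-only section such as .rodata.
 */
static const struct demo_ctl_entry demo_table[] = {
	{ "double", demo_double },
	{ "negate", demo_negate },
};

/* Takes a pointer-to-const, so the lookup cannot patch any handler. */
static const struct demo_ctl_entry *
demo_find(const struct demo_ctl_entry *table, size_t n, const char *name)
{
	assert(table != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i].procname, name) == 0)
			return &table[i];
	return NULL;
}
```

An assignment such as demo_table[0].handler = demo_negate; is rejected at compile time, which is the property the constification relies on.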
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Linus Torvalds [Tue, 28 Jan 2025 04:58:58 +0000 (20:58 -0800)]
Merge tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this series, there are several major improvements such as folio
conversion by Matthew, speed-up of block truncation, and caching more
dentry pages.
In addition, we implemented a linear dentry search to address recent
unicode regression, and figured out some false alarms that we could
get rid of.
Enhancements:
- folio conversion in various IO paths
- optimize f2fs_truncate_data_blocks_range()
- cache more dentry pages
- remove unnecessary blk_finish_plug
- procfs: show mtime in segment_bits
Bug fixes:
- introduce linear search for dentries
- don't call block truncation for aliased file
- fix using wrong 'submitted' value in f2fs_write_cache_pages
- fix to do sanity check correctly on i_inline_xattr_size
- avoid trying to get invalid block address
- fix inconsistent dirty state of atomic file"
* tag 'f2fs-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
f2fs: fix inconsistent dirty state of atomic file
f2fs: fix to avoid changing 'check only' behaior of recovery
f2fs: Clean up the loop outside of f2fs_invalidate_blocks()
f2fs: procfs: show mtime in segment_bits
f2fs: fix to avoid return invalid mtime from f2fs_get_section_mtime()
f2fs: Fix format specifier in sanity_check_inode()
f2fs: avoid trying to get invalid block address
f2fs: fix to do sanity check correctly on i_inline_xattr_size
f2fs: remove blk_finish_plug
f2fs: Optimize f2fs_truncate_data_blocks_range()
f2fs: fix using wrong 'submitted' value in f2fs_write_cache_pages
f2fs: add parameter @len to f2fs_invalidate_blocks()
f2fs: update_sit_entry_for_release() supports consecutive blocks.
f2fs: introduce update_sit_entry_for_release/alloc()
f2fs: don't call block truncation for aliased file
f2fs: Introduce linear search for dentries
f2fs: add parameter @len to f2fs_invalidate_internal_cache()
f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range()
f2fs: ensure that node info flags are always initialized
f2fs: The GC triggered by ioctl also needs to mark the segno as victim
...
This interoperates with similar functionality introduced into the
Linux NFS client in v6.11. An attribute delegation permits an NFS
client to manage a file's mtime, rather than flushing dirty data to
the NFS server so that the file's mtime reflects the last write, which
is considerably slower.
Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
This facility enables NFSD to increase or decrease the number of slots
per NFS session depending on server memory availability. More session
slots means greater parallelism.
Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
encoding screws up when crossing a page boundary in the encoding
buffer. This is a zero-day bug, but hitting it is rare and depends on
the NFS client implementation. The Linux NFS client does not happen to
trigger this issue.
A variety of bug fixes and other incremental improvements fill out the
list of commits in this release. Great thanks to all contributors,
reviewers, testers, and bug reporters who participated during this
development cycle"
* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
sunrpc: Remove gss_generic_token deadcode
sunrpc: Remove unused xprt_iter_get_xprt
Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
nfsd: handle delegated timestamps in SETATTR
nfsd: add support for delegated timestamps
nfsd: rework NFS4_SHARE_WANT_* flag handling
nfsd: add support for FATTR4_OPEN_ARGUMENTS
nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
nfsd: switch to autogenerated definitions for open_delegation_type4
nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
nfsd: fix handling of delegated change attr in CB_GETATTR
SUNRPC: Document validity guarantees of the pointer returned by reserve_space
NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
NFSD: Refactor nfsd4_do_encode_secinfo() again
NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
...
Linus Torvalds [Tue, 28 Jan 2025 01:06:42 +0000 (17:06 -0800)]
Merge tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- fix a spelling error in dm-raid
- change kzalloc to kcalloc
- remove useless test in alloc_multiple_bios
- disable REQ_NOWAIT for flushes
- dm-transaction-manager: use red-black trees instead of linear lists
- atomic writes support for dm-linear, dm-stripe and dm-mirror
- dm-crypt: code cleanups and two bugfixes
* tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt: track tag_offset in convert_context
dm-crypt: don't initialize cc_sector again
dm-crypt: don't update io->sector after kcryptd_crypt_write_io_submit()
dm-crypt: use bi_sector in bio when initialize integrity seed
dm-crypt: fully initialize clone->bi_iter in crypt_alloc_buffer()
dm-crypt: set atomic as false when calling crypt_convert() in kworker
dm-mirror: Support atomic writes
dm-io: Warn on creating multiple atomic write bios for a region
dm-stripe: Enable atomic writes
dm-linear: Enable atomic writes
dm: Ensure cloned bio is same length for atomic write
dm-table: atomic writes support
dm-transaction-manager: use red-black trees instead of linear lists
dm: disable REQ_NOWAIT for flushes
dm: remove useless test in alloc_multiple_bios
dm: change kzalloc to kcalloc
dm raid: fix spelling errors in raid_ctr()
Linus Torvalds [Tue, 28 Jan 2025 00:51:51 +0000 (16:51 -0800)]
Merge tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO driver updates from Greg KH:
"Here is the "big" set of char/misc/iio and other smaller driver
subsystem updates for 6.14-rc1. Loads of different things in here this
development cycle, highlights are:
- ntsync "driver" to handle Windows locking types enabling Wine to
work much better on many workloads (i.e. games). The driver
framework was in 6.13, but now it's enabled and fully working
properly. Should make many SteamOS users happy. Even comes with
tests!
- Large IIO driver updates and bugfixes
- FPGA driver updates
- Coresight driver updates
- MHI driver updates
- PPS driver updates
- const bin_attribute reworking for many drivers
- binder driver updates
- smaller driver updates and fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
ntsync: Fix reference leaks in the remaining create ioctls.
spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
spmi: Set fwnode for spmi devices
ntsync: fix a file reference leak in drivers/misc/ntsync.c
scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
interconnect: sm8750: Add missing const to static qcom_icc_desc
memstick: core: fix kernel-doc notation
intel_th: core: fix kernel-doc warnings
binder: log transaction code on failure
iio: dac: ad3552r-hs: clear reset status flag
iio: dac: ad3552r-common: fix ad3541/2r ranges
iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
misc: fastrpc: Fix copy buffer page size
misc: fastrpc: Fix registered buffer page address
misc: fastrpc: Deregister device nodes properly in error scenarios
nvmem: core: improve range check for nvmem_cell_write()
nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
...
Linus Torvalds [Tue, 28 Jan 2025 00:29:16 +0000 (16:29 -0800)]
Merge tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver updates from Greg KH:
"Here are the USB and Thunderbolt driver updates for 6.14-rc1. Nothing
huge in here, just lots of new hardware support and updates for
existing drivers. Changes here are:
- big gadget f_tcm driver update
- other gadget driver updates and fixes
- thunderbolt driver updates for new hardware and capabilities and
lots more debugging functionality to handle it when things aren't
working well.
- xhci driver updates
- new USB-serial device updates
- typec driver updates, including a chrome platform driver (acked by
the subsystem maintainers)
- other small driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (123 commits)
usb: hcd: Bump local buffer size in rh_string()
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
usb: typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR PPS
usb: xhci: tegra: Fix OF boolean read warning
usb: host: xhci-plat: add support compatible ID PNP0D15
usb: typec: ucsi: Add a macro definition for UCSI v1.0
usb: dwc3: core: Defer the probe until USB power supply ready
usbip: Correct format specifier for seqnum from %d to %u
usbip: Fix seqnum sign extension issue in vhci_tx_urb
dt-bindings: usb: snps,dwc3: Split core description
usb: quirks: Add NO_LPM quirk for TOSHIBA TransMemory-Mx device
usb: dwc3: gadget: Reinitiate stream for all host NoStream behavior
USB: Use str_enable_disable-like helpers
USB: gadget: Use str_enable_disable-like helpers
USB: phy: Use str_enable_disable-like helpers
USB: typec: Use str_enable_disable-like helpers
USB: host: Use str_enable_disable-like helpers
USB: Replace own str_plural with common one
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
usb: phy: Remove API devm_usb_put_phy()
...
Al Viro [Mon, 6 Jan 2025 02:33:17 +0000 (21:33 -0500)]
9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.
->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain. Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.
Solution:
* have d_move() done in ->rename() rather than in its caller
* maintain a 9p-private rwsem (per-filesystem)
* hold it exclusive over the relevant part of ->rename()
* hold it shared over the places where we want the path.
That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control. However, there's
also __d_unalias(), which isn't covered by any of that.
If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.
Add a couple of optional methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move(). 9p instances do down_write_trylock()
and up_write() of ->rename_sem.
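The trylock/unlock bracket around the move can be sketched in userspace C as
follows. This is only an illustration under stated assumptions: the toy_rwsem
type, struct alias, struct alias_ops and unalias() are hypothetical stand-ins
built on C11 atomics, not the kernel's rwsem or dcache code. The point it shows
is that a failed trylock turns into -ESTALE instead of blocking, and that both
methods are optional.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy rwsem (hypothetical): 0 = free, >0 = reader count, -1 = one writer. */
struct toy_rwsem {
	atomic_int count;
};

/* Take the lock shared; returns 1 on success, 0 if a writer holds it. */
static int toy_down_read_trylock(struct toy_rwsem *s)
{
	int c = atomic_load(&s->count);

	while (c >= 0)
		if (atomic_compare_exchange_weak(&s->count, &c, c + 1))
			return 1;
	return 0;
}

/* Take the lock exclusive; returns 1 on success, 0 if anyone holds it. */
static int toy_down_write_trylock(struct toy_rwsem *s)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&s->count, &expected, -1);
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(atomic_load(&s->count) == -1);
	atomic_store(&s->count, 0);
}

/* Hypothetical stand-ins for the two optional methods. */
struct alias;
struct alias_ops {
	int (*unalias_trylock)(struct alias *a);	/* nonzero on success */
	void (*unalias_unlock)(struct alias *a);
};

struct alias {
	const struct alias_ops *ops;
	struct toy_rwsem *rename_sem;	/* filesystem-private lock */
	int moved;			/* set once the "move" happened */
};

static int fs_unalias_trylock(struct alias *a)
{
	return toy_down_write_trylock(a->rename_sem);
}

static void fs_unalias_unlock(struct alias *a)
{
	toy_up_write(a->rename_sem);
}

static const struct alias_ops fs_ops = {
	.unalias_trylock = fs_unalias_trylock,
	.unalias_unlock = fs_unalias_unlock,
};

/* Mirrors the pseudocode: trylock if provided, move, unlock if provided. */
static int unalias(struct alias *a)
{
	if (a->ops->unalias_trylock && !a->ops->unalias_trylock(a))
		return -ESTALE;
	a->moved = 1;			/* stands in for __d_move(...) */
	if (a->ops->unalias_unlock)
		a->ops->unalias_unlock(a);
	return 0;
}
```

A trylock is used rather than a blocking acquire so that a path builder
holding the lock shared is never waited on from this path; the lookup simply
fails and can be retried by the caller.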
Al Viro [Sun, 8 Dec 2024 06:27:11 +0000 (01:27 -0500)]
nfs: fix ->d_revalidate() UAF on ->d_name accesses
Pass the stable name all the way down to ->rpc_ops->lookup() instances.
Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et al.
dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.
nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on server with fhandle preserved, we are in trouble
there...
UAF window is fairly narrow here and exfiltration requires the ability
to watch the traffic.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Jan 2025 06:23:50 +0000 (01:23 -0500)]
gfs2_drevalidate(): use stable parent inode and name passed by caller
No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable. Theoretically a UAF, but it's
hard to exfiltrate the information...
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Jan 2025 06:20:35 +0000 (01:20 -0500)]
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable - it's a real-life UAF.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Jan 2025 15:04:11 +0000 (10:04 -0500)]
ceph_d_revalidate(): propagate stable name down into request encoding
Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
and it gets that in almost all cases. The only exception is ->d_revalidate(),
where we have a stable name, but it's passed separately - dentry->d_name
is not stable there.
Propagate it down to get_fscrypt_altname() as a new field of struct
ceph_mds_request - ->r_dname, to be used instead of ->r_dentry->d_name
when non-NULL.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Jan 2025 05:54:18 +0000 (00:54 -0500)]
ceph_d_revalidate(): use stable parent inode passed by caller
No need to mess with the boilerplate for obtaining what we already
have. Note that ceph is one of the "will want a path from filesystem
root if we want to talk to server" cases, so the name of the last
component is of little use - it is passed to fscrypt_d_revalidate()
and it's used to deal with (also crypt-related) case in request
marshalling, when encrypted name turns out to be too long. The former
is not a problem, but the latter is racy; that part will be handled
in the next commit.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Jan 2025 05:27:27 +0000 (00:27 -0500)]
afs_d_revalidate(): use stable name and parent inode passed by caller
No need to bother with boilerplate for obtaining the latter and for
the former we really should not count upon ->d_name.name remaining
stable under us.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 8 Dec 2024 05:28:51 +0000 (00:28 -0500)]
Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller. We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.
It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable. There are a couple
of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.
It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.
This commit only changes the calling conventions; making use of supplied
values is left to followups.
NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate. This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).
One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but the next byte might very well be '/' rather than '\0'. Do not
ignore name->len.
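To make the name->len warning concrete, here is a small userspace sketch. The
struct name_ref type and both helpers are hypothetical simplifications (not the
kernel's struct qstr or any kernel helper); they only show why a length-bounded
comparison is needed when the name points into the middle of a longer pathname.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a (pointer, length) name. */
struct name_ref {
	const char *name;	/* points into the pathname being resolved */
	size_t len;		/* number of bytes that belong to this name */
};

/* Return the path component starting at byte offset 'start'. */
static struct name_ref component_at(const char *path, size_t start)
{
	struct name_ref n;

	assert(start <= strlen(path));
	n.name = path + start;
	n.len = strcspn(path + start, "/");	/* stop at '/' or the final NUL */
	return n;
}

/* Correct comparison: honours n->len, never reads past the component. */
static bool name_equals(const struct name_ref *n, const char *expected)
{
	size_t elen = strlen(expected);

	return n->len == elen && memcmp(n->name, expected, elen) == 0;
}
```

With the path "usr/lib/modules" and a component starting at offset 4, the byte
right after the 3-byte name is '/', so a strcmp() against "lib" would compare
"lib/modules" and report a mismatch, while the length-bounded comparison
matches.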
Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Dec 2024 07:23:00 +0000 (02:23 -0500)]
generic_ci_d_compare(): use shortname_storage
... and check the "name might be unstable" predicate
the right way.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 10 Dec 2024 01:03:33 +0000 (20:03 -0500)]
dissolve external_name.u into separate members
... and document the constraints on the layout. Kept separate from
the previous commit to keep the noise separate from actual changes.
The reason for explicit __aligned() on ->name[] rather than relying
upon the alignment of the previous field is that the previous iteration
of that commit tried to save 4 bytes on 64bit by eliminating a hole
in there, which broke the assumptions in dentry_string_cmp().
Better spell it out and avoid the temptation for the future...
Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Mon, 27 Jan 2025 23:45:29 +0000 (15:45 -0800)]
Merge tag 'pwm/for-6.14-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
"Two fixes.
Conor Dooley found and fixed a problem in the pwm-microchip-core
driver that existed since the driver's birth in v6.5-rc1. It's about a
corner case that only happens if two pwm devices of the same chip are
set to the same long period.
The other problem is about the new pwm API that currently is only
supported by two hardware drivers. The fix prevents a NULL pointer
exception if one of the new functions is called for a pwm device with
a driver that only provides the old callbacks"
* tag 'pwm/for-6.14-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: Ensure callbacks exist before calling them
pwm: microchip-core: fix incorrect comparison with max period
Linus Torvalds [Mon, 27 Jan 2025 23:37:16 +0000 (15:37 -0800)]
Merge tag 'for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- introduce power supply extensions, which allows adding properties
to a power supply device from a separate driver. This will be used
initially to extend the generic ACPI charger/battery driver with
vendor extensions for charge thresholds.
- convert all drivers from power_supply_for_each_device to new
power_supply_for_each_psy(), which avoids lots of casting being
done in the drivers.
- avoid LED trigger like values in uevent for
POWER_SUPPLY_PROP_CHARGE_BEHAVIOUR
- introduce POWER_SUPPLY_PROP_CHARGE_TYPES, which is similar to the
POWER_SUPPLY_PROP_CHARGE_TYPE property, but also lists the
available options on the specific platform
Power-supply drivers
- dell-laptop: use new power_supply_charge_types_show/_parse helpers
- stc3117: new driver for equally named fuel gauge chip
- bq24190: add support for new POWER_SUPPLY_PROP_CHARGE_TYPES
- bq24190: add BQ24297 support
- bq27xxx: add voltage min design for bq27000/bq27200
- cros_charge-control: convert to new power supply extension API
- ltc4162-l-charger: add ltc4162-f/s and ltc4015 support
- gpio-charger: support for default charge current limit
- misc small cleanups and fixes
Reset drivers:
- at91-poweroff: add sam9x7 support"
* tag 'for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (77 commits)
power: supply: max1720x: add support for reading internal and thermistor temperatures
power: supply: ltc4162l: Use GENMASK macro in bitmask operation
power: supply: max17042: add max77705 fuel gauge support
dt-bindings: power: supply: max17042: add max77705 support
power: supply: add undervoltage health status property
power: supply: max17042: add platform driver variant
power: supply: max17042: make interrupt shared
power: reset: keystone: Use syscon_regmap_lookup_by_phandle_args
power: supply: Use str_enable_disable-like helpers
platform/x86: dell-laptop: Use power_supply_charge_types_show/_parse() helpers
power: supply: bq2415x_charger: Immediately reschedule delayed work on notifier events
power: supply: Add STC3117 fuel gauge unit driver
dt-bindings: power: supply: Add STC3117 Fuel Gauge
power: supply: ug3105_battery: Let the core handle POWER_SUPPLY_PROP_TECHNOLOGY
power: supply: gpio-charger: add support for default charge current limit
dt-bindings: power: supply: gpio-charger: add support for default charge current limit
power: supply: Use power_supply_external_power_changed() in __power_supply_changed_work()
power: supply: core: fix build of extension sysfs group if CONFIG_SYSFS=n
power: supply: bq2415x_charger: report charging state changes to userspace
bq27xxx: add voltage min design for bq27000 and bq27200
...
Linus Torvalds [Mon, 27 Jan 2025 23:26:06 +0000 (15:26 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place:
- vdpa/octeon support for multiple interrupts
- virtio-pci support for error recovery
- vp_vdpa support for notification with data
- vhost/net fix to set num_buffers for spec compliance
- virtio-mem now works with kdump on s390
And small cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits)
virtio_blk: Add support for transport error recovery
virtio_pci: Add support for PCIe Function Level Reset
vhost/net: Set num_buffers for virtio 1.0
vdpa/octeon_ep: read vendor-specific PCI capability
virtio-pci: define type and header for PCI vendor data
vdpa/octeon_ep: handle device config change events
vdpa/octeon_ep: enable support for multiple interrupts per device
vdpa: solidrun: Replace deprecated PCI functions
s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
virtio-mem: remember usable region size
virtio-mem: mark device ready before registering callbacks in kdump mode
fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
fs/proc/vmcore: factor out freeing a list of vmcore ranges
fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
fs/proc/vmcore: move vmcore definitions out of kcore.h
fs/proc/vmcore: prefix all pr_* with "vmcore:"
fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
...
Bernd Schubert [Thu, 23 Jan 2025 16:55:32 +0000 (17:55 +0100)]
fuse: prevent disabling io-uring on active connections
The enable_uring module parameter allows administrators to enable/disable
io-uring support for FUSE at runtime. However, disabling io-uring while
connections already have it enabled can lead to an inconsistent state.
Fix this by keeping io-uring enabled on connections that were already using
it, even if the module parameter is later disabled. This ensures active
FUSE mounts continue to function correctly.
Bernd Schubert [Mon, 20 Jan 2025 01:29:09 +0000 (02:29 +0100)]
fuse: block request allocation until io-uring init is complete
Avoid races and block request allocation until io-uring
queues are ready.
This is especially important for background requests,
as bg request completion might cause lock order inversion
of the typical queue->lock and then fc->bg_lock
Bernd Schubert [Mon, 20 Jan 2025 01:29:08 +0000 (02:29 +0100)]
fuse: {io-uring} Prevent mount point hang on fuse-server termination
When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.
This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.
Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Bernd Schubert [Mon, 20 Jan 2025 01:29:04 +0000 (02:29 +0100)]
fuse: {io-uring} Handle teardown of ring entries
On teardown struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks giving warning messages in intervals and unreleased
struct file.
Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.
Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed
b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks if the connection is active and then completes the
io-uring itself with -ENOTCONN. I.e. special handling is not
needed.
This scheme is basically represented by the ring entry state
FRRS_WAIT and FRRS_USERSPACE.
Entries in state:
- FRRS_INIT: No action needed, do not contribute to
ring->queue_refs yet
- All other states: Are currently processed by other tasks,
async teardown is needed and it has to wait for the two
states above. It could be also solved without an async
teardown task, but would require additional if conditions
in hot code paths. Also in my personal opinion the code
looks cleaner with async teardown.
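The per-state handling described above can be sketched as a small decision
table. This is a rough model under assumptions, not the fuse code: only the
names FRRS_INIT, FRRS_WAIT and FRRS_USERSPACE come from the text; FRRS_BUSY,
the teardown_action enum, struct ring_ent and teardown_pass() are hypothetical,
with FRRS_BUSY standing for "all other states".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ring_ent_state {
	FRRS_INIT,	/* from the text: nothing to do */
	FRRS_WAIT,	/* from the text: waiting for a new fuse request */
	FRRS_USERSPACE,	/* from the text: request is with the fuse server */
	FRRS_BUSY,	/* hypothetical: any state processed by another task */
};

enum teardown_action {
	TD_NONE,		/* entry holds no reference yet */
	TD_COMPLETE_CMD,	/* would call io_uring_cmd_done() */
	TD_RELEASE,		/* just release the request */
	TD_RETRY_LATER,		/* async teardown has to come back */
};

static enum teardown_action teardown_action_for(enum ring_ent_state s)
{
	switch (s) {
	case FRRS_INIT:
		return TD_NONE;
	case FRRS_WAIT:
		return TD_COMPLETE_CMD;
	case FRRS_USERSPACE:
		return TD_RELEASE;
	default:
		return TD_RETRY_LATER;
	}
}

struct ring_ent {
	enum ring_ent_state state;
	bool released;
};

/* One teardown pass; returns how many entries still need another pass. */
static size_t teardown_pass(struct ring_ent *ents, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++) {
		if (ents[i].released)
			continue;
		switch (teardown_action_for(ents[i].state)) {
		case TD_COMPLETE_CMD:
		case TD_RELEASE:
			ents[i].released = true;
			break;
		case TD_RETRY_LATER:
			pending++;
			break;
		case TD_NONE:
			break;
		}
	}
	return pending;
}
```

A caller in this model would rerun teardown_pass() from delayed work until it
returns 0, which corresponds to the async teardown waiting for busy entries to
reach one of the two handled states.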
Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Linus Torvalds [Mon, 27 Jan 2025 17:00:25 +0000 (09:00 -0800)]
Merge tag 'mips_6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
"Cleanups and fixes"
* tag 'mips_6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: pci-legacy: Override pci_address_to_pio
MIPS: Loongson64: env: Use str_on_off() helper in prom_lefi_init_env()
MIPS: migrate to generic rule for built-in DTBs
mips: fix shmctl/semctl/msgctl syscall for o32
mips/math-emu: fix emulation of the prefx instruction
MIPS: Loongson: Add comments for interface_info
MIPS: Loongson64: remove ROM Size unit in boardinfo
MIPS: traps: Use str_enabled_disabled() in parity_protection_init()
MIPS: ftrace: Declare ftrace_get_parent_ra_addr() as static
Revert "MIPS: csrc-r4k: Select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT"
MIPS: Fix the wrong format specifier
MIPS: Add a blank line after __HEAD
MIPS: kernel: Rename read/write_c0_ecc to read/writec0_errctl
Ankit Agrawal [Fri, 24 Jan 2025 18:31:01 +0000 (18:31 +0000)]
vfio/nvgrace-gpu: Check the HBM training and C2C link status
In contrast to Grace Hopper systems, the HBM training has been moved
out of the UEFI on the Grace Blackwell systems. This reduces the system
bootup time significantly.
The onus of checking whether the HBM training has completed thus falls
on the module.
The HBM training status can be determined from a BAR0 register.
Similarly, another BAR0 register exposes the status of the CPU-GPU
chip-to-chip (C2C) cache coherent interconnect.
Based on testing, 30s is determined to be sufficient to ensure
initialization completion on all the Grace based systems. Thus poll
these registers for up to 30s. If the HBM training is not complete
or if the C2C link is not ready, fail the probe.
While the time is not required on Grace Hopper systems, it is
beneficial to make the check to ensure the device is in an
expected state. Hence keeping it generalized to both the generations.
Ensure that the BAR0 is enabled before accessing the registers.
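The poll-with-timeout logic can be sketched as below. This is a simulation
under assumptions, not the driver code: struct fake_dev, the two status helpers,
the 100 ms polling quantum and the -ETIMEDOUT return value are all hypothetical;
only the 30s budget and the "both HBM training and C2C link must be ready"
condition come from the text. A simulated clock replaces sleeping and real BAR0
reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define POLL_TIMEOUT_MS	(30 * 1000)	/* 30s budget from the text */
#define POLL_QUANTUM_MS	100		/* hypothetical polling interval */

/* Hypothetical simulated device: status flips once enough time passed. */
struct fake_dev {
	unsigned int hbm_ready_at_ms;
	unsigned int c2c_ready_at_ms;
	unsigned int now_ms;		/* simulated clock */
};

/* Stand-ins for reading the two BAR0 status registers. */
static bool hbm_training_done(const struct fake_dev *d)
{
	return d->now_ms >= d->hbm_ready_at_ms;
}

static bool c2c_link_ready(const struct fake_dev *d)
{
	return d->now_ms >= d->c2c_ready_at_ms;
}

/* Returns 0 once both conditions hold, -ETIMEDOUT after the budget. */
static int wait_device_ready(struct fake_dev *d)
{
	assert(POLL_QUANTUM_MS > 0);
	for (;;) {
		if (hbm_training_done(d) && c2c_link_ready(d))
			return 0;
		if (d->now_ms >= POLL_TIMEOUT_MS)
			return -ETIMEDOUT;
		d->now_ms += POLL_QUANTUM_MS;	/* stands in for a sleep */
	}
}
```

In this model a probe routine would call wait_device_ready() once and fail
with the returned error if either status never becomes ready within the
budget.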
CC: Alex Williamson <alex.williamson@redhat.com> CC: Kevin Tian <kevin.tian@intel.com> CC: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20250124183102.3976-4-ankita@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Ankit Agrawal [Fri, 24 Jan 2025 18:31:00 +0000 (18:31 +0000)]
vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM
There is a HW defect on Grace Hopper (GH) in supporting the
Multi-Instance GPU (MIG) feature [1] that necessitated the presence
of a 1G region carved out from the device memory and mapped as
uncached. The 1G region is shown as a fake BAR (comprising region 2 and 3)
to work around the issue.
The Grace Blackwell systems (GB) differ from GH systems in the following
aspects:
1. The aforementioned HW defect is fixed on GB systems.
2. There is a usable BAR1 (region 2 and 3) on GB systems for the
GPUdirect RDMA feature [2].
This patch accommodates those GB changes by showing the 64b physical
device BAR1 (region 2 and 3) to the VM instead of the fake one. This
takes care of both the differences.
Moreover, the entire device memory is exposed on GB as cacheable to
the VM as there is no carveout required.
Ankit Agrawal [Fri, 24 Jan 2025 18:30:59 +0000 (18:30 +0000)]
vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem
NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
continuation of the Grace Hopper (GH) Superchip: the CPU and GPU get
cache coherent access to each other's memory over an internal
proprietary chip-to-chip cache coherent interconnect.
There is a HW defect on GH systems in supporting the Multi-Instance
GPU (MIG) feature [1] that necessitated the presence of a 1G region
with uncached mapping carved out from the device memory. The 1G
region is shown as a fake BAR (comprising region 2 and 3) to
work around the issue. This is fixed on the GB systems.
The presence of the fix for the HW defect is communicated by the
device firmware through the DVSEC PCI config register with ID 3.
The module reads this to take a different codepath on GB vs GH.
Scan through the DVSEC registers to identify the correct one and use
it to determine the presence of the fix. Save the value in the device's
nvgrace_gpu_pci_core_device structure.
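Scanning for a DVSEC with a given ID can be sketched against a simulated
config space. This is an illustration under assumptions, not the driver code:
the array-backed config space and the helpers ext_cap_hdr(), put_dvsec() and
find_dvsec() are hypothetical. The layout used is the standard PCIe one:
extended capabilities start at offset 0x100, the header holds the capability ID
in bits 15:0 and the next pointer in bits 31:20, the DVSEC capability ID is
0x23, and the vendor ID and DVSEC ID sit in the low 16 bits of the two
following dwords. Which bit of the matching register signals the fix is not
stated in the text, so it is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE	4096u
#define CFG_DWORDS	(CFG_SIZE / 4)
#define EXT_CAP_START	0x100u
#define EXT_CAP_ID_DVSEC 0x23u

/* Build an extended capability header: ID, version, next offset. */
static uint32_t ext_cap_hdr(uint32_t id, uint32_t ver, uint32_t next)
{
	return (id & 0xffff) | ((ver & 0xf) << 16) | ((next & 0xfff) << 20);
}

/* Test helper (hypothetical): place a DVSEC capability at 'pos'. */
static void put_dvsec(uint32_t *cfg, uint32_t pos, uint32_t next,
		      uint16_t vendor, uint16_t dvsec_id)
{
	assert(pos >= EXT_CAP_START && pos + 12 <= CFG_SIZE && pos % 4 == 0);
	cfg[pos / 4] = ext_cap_hdr(EXT_CAP_ID_DVSEC, 1, next);
	cfg[pos / 4 + 1] = vendor | (1u << 16) | (0x10u << 20); /* rev, len */
	cfg[pos / 4 + 2] = dvsec_id;
}

/* Walk the extended capability list; return the offset or 0 if absent. */
static int find_dvsec(const uint32_t *cfg, uint16_t vendor, uint16_t dvsec_id)
{
	uint32_t pos = EXT_CAP_START;
	int ttl = (CFG_SIZE - EXT_CAP_START) / 8;	/* guards against loops */

	while (pos >= EXT_CAP_START && pos + 12 <= CFG_SIZE && ttl-- > 0) {
		uint32_t hdr = cfg[pos / 4];

		if (hdr == 0)
			break;
		if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC &&
		    (cfg[pos / 4 + 1] & 0xffff) == vendor &&
		    (cfg[pos / 4 + 2] & 0xffff) == dvsec_id)
			return (int)pos;
		pos = (hdr >> 20) & 0xffc;	/* next pointer, dword aligned */
	}
	return 0;
}
```

A caller would then read the register at the returned offset and cache the
decoded flag in its per-device structure, so the GB versus GH decision is made
once rather than on every access.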
Bernd Schubert [Mon, 20 Jan 2025 01:29:03 +0000 (02:29 +0100)]
fuse: Add io-uring sqe commit and fetch support
This adds support for fuse request completion through ring SQEs
(FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.
Fuse request data are copied through the mmapped ring buffer;
there is no zero-copy support yet.
Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring Reviewed-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Linus Torvalds [Mon, 27 Jan 2025 16:30:06 +0000 (08:30 -0800)]
Merge tag 'm68knommu-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer:
"Just a single fix to correct the clock rate defined for the internal
timer hardware blocks of the ColdFire 5441x family of SoC devices"
* tag 'm68knommu-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: coldfire: Use proper clock rate for timers
Linus Torvalds [Mon, 27 Jan 2025 16:16:33 +0000 (08:16 -0800)]
Merge tag 'xtensa-20250126' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa updates from Max Filippov:
- a few one-liner cleanups
* tag 'xtensa-20250126' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa/simdisk: Use str_write_read() helper in simdisk_transfer()
xtensa: Remove zero-length alignment array
xtensa: annotate dtb_start variable as static __initdata
Israel Rukshin [Wed, 27 Nov 2024 06:57:32 +0000 (08:57 +0200)]
virtio_blk: Add support for transport error recovery
Add support for proper cleanup and re-initialization of virtio-blk devices
during the transport reset error recovery flow.
This enhancement includes:
- Pre-reset handler (reset_prepare) to perform device-specific cleanup
- Post-reset handler (reset_done) to re-initialize the device
These changes allow the device to recover from various reset scenarios,
ensuring proper functionality after a reset event occurs.
Without this implementation, the device cannot properly recover from
resets, potentially leading to undefined behavior or device malfunction.
This feature has been tested using PCI transport with Function Level
Reset (FLR) as an example reset mechanism. The reset can be triggered
manually via sysfs (echo 1 > /sys/bus/pci/devices/$PCI_ADDR/reset).
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <1732690652-3065-3-git-send-email-israelr@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Israel Rukshin [Wed, 27 Nov 2024 06:57:31 +0000 (08:57 +0200)]
virtio_pci: Add support for PCIe Function Level Reset
Implement support for Function Level Reset (FLR) in virtio_pci devices.
This change adds reset_prepare and reset_done callbacks, allowing
drivers to properly handle FLR operations.
Without this patch, performing and recovering from an FLR is not possible
for virtio_pci devices. This implementation ensures proper FLR handling
and recovery for both physical and virtual functions.
The device reset can be triggered in case of error or manually via
sysfs:
echo 1 > /sys/bus/pci/devices/$PCI_ADDR/reset
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <1732690652-3065-2-git-send-email-israelr@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
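The reset_prepare/reset_done pairing used by the two reset commits above can be modelled in user space as follows. This is a hedged sketch of the callback ordering only: `struct mock_dev`, `struct mock_reset_ops` and `mock_function_level_reset()` are hypothetical stand-ins, not the virtio or PCI core types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct virtio_device. */
struct mock_dev {
	bool queues_active;   /* true while the device may process requests */
	int resets_seen;      /* number of completed resets */
};

/* Hypothetical callback pair modelled on reset_prepare/reset_done. */
struct mock_reset_ops {
	int (*reset_prepare)(struct mock_dev *dev);
	int (*reset_done)(struct mock_dev *dev);
};

/* Pre-reset: quiesce the device so nothing is in flight during the reset. */
static int mock_reset_prepare(struct mock_dev *dev)
{
	dev->queues_active = false;
	return 0;
}

/* Post-reset: re-initialize the device and resume operation. */
static int mock_reset_done(struct mock_dev *dev)
{
	dev->queues_active = true;
	dev->resets_seen++;
	return 0;
}

const struct mock_reset_ops mock_ops = {
	.reset_prepare = mock_reset_prepare,
	.reset_done    = mock_reset_done,
};

/* Simulate a bus core performing a reset around the driver callbacks. */
int mock_function_level_reset(struct mock_dev *dev, const struct mock_reset_ops *ops)
{
	int ret;

	assert(dev != NULL && ops != NULL);
	ret = ops->reset_prepare ? ops->reset_prepare(dev) : 0;
	if (ret)
		return ret;
	/* The hardware reset itself would happen here. */
	return ops->reset_done ? ops->reset_done(dev) : 0;
}
```

The point of the split is that cleanup runs before the transport loses its state and re-initialization runs only once the transport is usable again.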
Added support to read the vendor-specific PCI capability to identify the
type of device being emulated.
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Message-Id: <20250103153226.1933479-4-sthotton@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Shijith Thotton [Fri, 3 Jan 2025 15:31:36 +0000 (21:01 +0530)]
virtio-pci: define type and header for PCI vendor data
Added macro definition for VIRTIO_PCI_CAP_VENDOR_CFG to identify the PCI
vendor data type in the virtio_pci_cap structure. Defined a new struct
virtio_pci_vndr_data for the vendor data capability header as per the
specification.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Message-Id: <20250103153226.1933479-3-sthotton@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
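As a rough illustration of how a vendor data capability might be located, the sketch below walks the standard PCI capability list in a 256-byte config-space snapshot. The header layout and the value 9 for VIRTIO_PCI_CAP_VENDOR_CFG are assumptions taken from the virtio specification and should be checked against the actual header; `struct vndr_data_hdr` and `find_vendor_cfg()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SIZE              256
#define PCI_CAPABILITY_LIST       0x34  /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR           0x09  /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_VENDOR_CFG 9     /* assumed value, per the virtio spec */

/* Hypothetical mirror of the vendor data capability header. */
struct vndr_data_hdr {
	uint8_t  cap_vndr;   /* PCI_CAP_ID_VNDR */
	uint8_t  cap_next;   /* offset of the next capability */
	uint8_t  cap_len;    /* capability length in bytes */
	uint8_t  cfg_type;   /* identifies the virtio structure */
	uint16_t vendor_id;  /* identifies the vendor-specific format */
};

/* Return the offset of the first vendor data capability, or -1 if absent. */
int find_vendor_cfg(const uint8_t *cfg, struct vndr_data_hdr *out)
{
	unsigned int pos;
	int budget = 48; /* guards against looping capability lists */

	assert(cfg != NULL && out != NULL);
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && pos + 6 <= PCI_CFG_SIZE && budget-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_VNDR &&
		    cfg[pos + 3] == VIRTIO_PCI_CAP_VENDOR_CFG) {
			out->cap_vndr  = cfg[pos];
			out->cap_next  = cfg[pos + 1];
			out->cap_len   = cfg[pos + 2];
			out->cfg_type  = cfg[pos + 3];
			out->vendor_id = (uint16_t)(cfg[pos + 4] | (cfg[pos + 5] << 8));
			return (int)pos;
		}
		pos = cfg[pos + 1] & 0xfc;
	}
	return -1;
}
```

A real driver would also confirm that the device advertises a capability list before walking it; that check is omitted here for brevity.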
The first interrupt of the device is used to notify the host about
device configuration changes, such as link status updates. The ISR
configuration area is updated to indicate a config change event when
triggered.
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Message-Id: <20250103153226.1933479-2-sthotton@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Shijith Thotton [Fri, 3 Jan 2025 15:31:34 +0000 (21:01 +0530)]
vdpa/octeon_ep: enable support for multiple interrupts per device
Updated the driver to utilize all the MSI-X interrupt vectors supported
by each OCTEON endpoint VF, instead of relying on a single vector.
Enabling more interrupts allows packets from multiple rings to be
distributed across multiple cores, improving parallelism and
performance.
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Message-Id: <20250103153226.1933479-1-sthotton@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace these functions with their successors pcim_iomap_region() and
pcim_iounmap_region().
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Message-Id: <20241219094428.21511-2-phasta@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:43 +0000 (13:54 +0100)]
s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
Let's add support for including virtio-mem device RAM in the crash dump,
setting NEED_PROC_VMCORE_DEVICE_RAM, and implementing
elfcorehdr_fill_device_ram_ptload_elf64().
To avoid code duplication, factor out the code to fill a PT_LOAD entry.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-13-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
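Filling a PT_LOAD entry, as factored out in the commit above, might look like the following user-space sketch. It uses the `Elf64_Phdr` type from `<elf.h>`; `fill_ptload()` is a hypothetical helper, and the flag and alignment choices are assumptions for illustration rather than what the s390 code does.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096 /* illustrative alignment, not taken from the patch */

/*
 * Hypothetical helper: describe one RAM range as a PT_LOAD program header.
 * Returns the file offset immediately after this segment, so callers can
 * chain several ranges back to back.
 */
uint64_t fill_ptload(Elf64_Phdr *phdr, uint64_t paddr, uint64_t vaddr,
		     uint64_t size, uint64_t file_off)
{
	assert(phdr != NULL);
	memset(phdr, 0, sizeof(*phdr));
	phdr->p_type   = PT_LOAD;
	phdr->p_flags  = PF_R | PF_W | PF_X;
	phdr->p_offset = file_off;  /* where the segment data starts in the dump */
	phdr->p_vaddr  = vaddr;
	phdr->p_paddr  = paddr;
	phdr->p_filesz = size;
	phdr->p_memsz  = size;
	phdr->p_align  = ASSUMED_PAGE_SIZE;
	return file_off + size;
}
```

Having one helper means that both the regular memory ranges and the device RAM ranges produce headers with identical conventions.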
David Hildenbrand [Wed, 4 Dec 2024 12:54:42 +0000 (13:54 +0100)]
virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
Let's implement the get_device_ram() vmcore callback, so
architectures that select NEED_PROC_VMCORE_DEVICE_RAM, like s390
soon, can include that memory in a crash dump.
Merge ranges, and process ranges that might contain a mixture of plugged
and unplugged memory, to reduce the total number of ranges.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-12-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
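The range merging mentioned above can be illustrated with a small sketch that coalesces adjacent or overlapping ranges in a start-sorted array. It only shows the merging idea; `struct ram_range` and `merge_ranges()` are hypothetical, and the handling of partially plugged ranges in the driver is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one device RAM range. */
struct ram_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Merge touching or overlapping ranges in place. The input must be sorted
 * by start address. Returns the number of ranges left after merging.
 */
size_t merge_ranges(struct ram_range *r, size_t n)
{
	size_t out = 0;

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		uint64_t end = r[out].start + r[out].size;

		if (r[i].start <= end) {
			/* Extend the current range if the next one reaches further. */
			uint64_t new_end = r[i].start + r[i].size;

			if (new_end > end)
				r[out].size = new_end - r[out].start;
		} else {
			r[++out] = r[i]; /* gap: start a new output range */
		}
	}
	return out + 1;
}
```

Fewer ranges mean fewer PT_LOAD entries in the resulting dump.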
David Hildenbrand [Wed, 4 Dec 2024 12:54:41 +0000 (13:54 +0100)]
virtio-mem: remember usable region size
Let's remember the usable region size, which will be helpful in kdump
mode next.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-11-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:40 +0000 (13:54 +0100)]
virtio-mem: mark device ready before registering callbacks in kdump mode
After the callbacks are registered we may immediately get a callback. So
mark the device ready before registering the callbacks.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-10-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:39 +0000 (13:54 +0100)]
fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
s390 allocates+prepares the elfcore hdr in the dump (2nd) kernel, not in
the crashed kernel.
RAM provided by memory devices such as virtio-mem can only be detected
using the device driver; when vmcore_init() is called, these device
drivers are usually not loaded yet, or the devices did not get probed
yet. Consequently, on s390 these RAM ranges will not be included in
the crash dump, which makes the dump partially corrupt and is
unfortunate.
Instead of deferring the vmcore_init() call to an (unclear?) later point,
let's reuse the vmcore_cb infrastructure to obtain device RAM ranges as
the device drivers probe the device and get access to this information.
Then, we'll add these ranges to the vmcore, adding more PT_LOAD
entries and updating the offsets+vmcore size.
Use a separate Kconfig option to be set by an architecture to include this
code only if the arch really needs it. Further, we'll make the config
depend on the relevant drivers (i.e., virtio_mem) once they implement
support (next). The alternative of having a PROVIDE_PROC_VMCORE_DEVICE_RAM
config option was dropped for now for simplicity.
The current target use case is s390, which only creates an elf64
elfcore, so focusing on elf64 is sufficient.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-9-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:38 +0000 (13:54 +0100)]
fs/proc/vmcore: factor out freeing a list of vmcore ranges
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-8-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:37 +0000 (13:54 +0100)]
fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
Let's factor it out into include/linux/crash_dump.h, from where we can
use it also outside of vmcore.c later.
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-7-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
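The two helpers factored out by this commit and the previous one (allocate a range and append it to a list, free a whole list) follow a common pattern. A user-space analogue is sketched below with a plain singly linked list; the names `range_alloc_add()` and `range_free_all()` are hypothetical and the kernel versions operate on its own list type instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node describing one physical memory range. */
struct range_node {
	unsigned long long paddr;
	unsigned long long size;
	struct range_node *next;
};

struct range_list {
	struct range_node *head;
	struct range_node *tail;
};

/* Allocate a node for [paddr, paddr + size) and append it; -1 on failure. */
int range_alloc_add(struct range_list *list, unsigned long long paddr,
		    unsigned long long size)
{
	struct range_node *n;

	assert(list != NULL);
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->paddr = paddr;
	n->size = size;
	if (list->tail)
		list->tail->next = n;
	else
		list->head = n;
	list->tail = n;
	return 0;
}

/* Free every node and leave the list empty and reusable. */
void range_free_all(struct range_list *list)
{
	struct range_node *n;

	assert(list != NULL);
	n = list->head;
	while (n) {
		struct range_node *next = n->next;

		free(n);
		n = next;
	}
	list->head = list->tail = NULL;
}
```

Keeping both operations in a shared header lets code outside the original file build and tear down such lists without duplicating the bookkeeping.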
David Hildenbrand [Wed, 4 Dec 2024 12:54:36 +0000 (13:54 +0100)]
fs/proc/vmcore: move vmcore definitions out of kcore.h
These vmcore defines are not related to /proc/kcore, so move them out.
We'll move "struct vmcoredd_node" to vmcore.c, because it is only used
internally. While "struct vmcore" is only used internally for now,
we're planning on using it from inline functions in crash_dump.h next,
so move it to crash_dump.h.
While at it, rename "struct vmcore" to "struct vmcore_range", which is a
more suitable name and will make the usage of it outside of vmcore.c
clearer.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-6-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:35 +0000 (13:54 +0100)]
fs/proc/vmcore: prefix all pr_* with "vmcore:"
Let's use "vmcore: " as a prefix, converting the single "Kdump:
vmcore not initialized" one to effectively be "vmcore: not initialized".
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-5-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
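The prefixing convention described above relies on string-literal concatenation in a `pr_fmt()` macro. The sketch below imitates it in user space: `pr_warn_buf()` and `report_not_initialized()` are hypothetical stand-ins that format into a buffer instead of the kernel log, and `##__VA_ARGS__` is a GNU extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Every message formatted through pr_fmt() gains the "vmcore: " prefix. */
#define pr_fmt(fmt) "vmcore: " fmt

/* Hypothetical user-space stand-in for pr_warn(): formats into a buffer. */
#define pr_warn_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Format the message discussed above; returns the message length. */
int report_not_initialized(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return pr_warn_buf(buf, len, "not initialized\n");
}
```

Because the prefix is applied in one place, individual call sites no longer need to spell out a subsystem name of their own.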
David Hildenbrand [Wed, 4 Dec 2024 12:54:34 +0000 (13:54 +0100)]
fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
The vmcoredd_update_size() call and its effects (size/offset changes) are
currently completely unsynchronized, and will cause trouble when
performed concurrently, or when done while someone is already reading the
vmcore.
Let's protect all vmcore modifications by the vmcore_mutex, disallow vmcore
modifications while the vmcore is open, and warn on vmcore
modifications after the vmcore was already opened once: modifications
while the vmcore is open are unsafe, and modifications after the vmcore
was opened indicate trouble. Properly synchronize against concurrent
opening of the vmcore.
No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.
It's worth noting that modifications after the vmcore was opened are
completely unexpected, so failing if open, and warning if already opened
(+closed again) is good enough.
This change not only handles concurrent adding of device dumps +
concurrent reading of the vmcore properly, it also prepares for other
mechanisms that will modify the vmcore.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-4-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
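The gating policy described above (refuse modifications while open, warn on modifications after a previous open) can be modelled in user space with a single mutex. This is a sketch under assumptions: the `model_*` functions, the counters, and the choice of `-EBUSY` as the error are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t vmcore_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int open_count;          /* current number of openers */
static bool opened_once;                 /* set on first open, never cleared */
static unsigned long long vmcore_size;   /* state that modifications change */

/* Model of open(): taken under the mutex so it serializes with modifiers. */
int model_open(void)
{
	pthread_mutex_lock(&vmcore_mutex);
	open_count++;
	opened_once = true;
	pthread_mutex_unlock(&vmcore_mutex);
	return 0;
}

/* Model of release(): drop one opener. */
void model_release(void)
{
	pthread_mutex_lock(&vmcore_mutex);
	assert(open_count > 0);
	open_count--;
	pthread_mutex_unlock(&vmcore_mutex);
}

/* Model of a modification: fails while open, warns if opened before. */
int model_add_range(unsigned long long size)
{
	int ret = 0;

	pthread_mutex_lock(&vmcore_mutex);
	if (open_count) {
		ret = -EBUSY;                /* unsafe: a reader may be active */
		goto out;
	}
	if (opened_once)
		fprintf(stderr, "warning: modified after first open\n");
	vmcore_size += size;
out:
	pthread_mutex_unlock(&vmcore_mutex);
	return ret;
}

/* Readers need no lock once open: modifications are rejected while open. */
unsigned long long model_size(void)
{
	return vmcore_size;
}
```

Since every modification checks the open count under the same mutex that open takes, a reader that holds the file open never observes a size or offset change.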
David Hildenbrand [Wed, 4 Dec 2024 12:54:33 +0000 (13:54 +0100)]
fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
Now that we have a mutex that synchronizes against opening of the vmcore,
let's use that one to replace vmcoredd_mutex: there is no need to have
two separate ones.
This is a preparation for properly preventing vmcore modifications
after the vmcore was opened.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-3-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Wed, 4 Dec 2024 12:54:32 +0000 (13:54 +0100)]
fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
We want to protect vmcore modifications from concurrent opening of
the vmcore, and also serialize vmcore modification.
(a) We can currently modify the vmcore after it was opened. This can happen
if a vmcoredd is added after the vmcore module was initialized and
the vmcore was already opened by user space. We want to fix that and
prepare for new code wanting to serialize against concurrent opening.
(b) To handle it cleanly we need to protect the modifications against
concurrent opening. As the modifications end up allocating memory and
can sleep, we cannot rely on the spinlock.
Let's convert the spinlock into a mutex to prepare for further changes.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20241204125444.1734652-2-david@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Kent Overstreet [Sat, 25 Jan 2025 02:29:24 +0000 (21:29 -0500)]
bcachefs: Journal writes are now IOPRIO_CLASS_RT
System performance is particularly sensitive to journal write latency:
the number of outstanding journal writes is bounded, and we can't issue
journal flushes until other journal writes have completed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
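For readers unfamiliar with I/O priority encoding, the sketch below shows how a class such as IOPRIO_CLASS_RT is packed into a priority value. The constants mirror the Linux UAPI header linux/ioprio.h but are redefined locally so the example builds anywhere; `ioprio_value()` and `ioprio_class()` are hypothetical helpers, and the level used in the example is arbitrary rather than what bcachefs passes.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the values in linux/ioprio.h. */
#define IOPRIO_CLASS_SHIFT 13
enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,     /* real-time: served ahead of best-effort I/O */
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

/* Pack a class and a level (0 = highest within the class) into one value. */
uint16_t ioprio_value(unsigned int prio_class, unsigned int level)
{
	assert(prio_class <= IOPRIO_CLASS_IDLE && level < 8);
	return (uint16_t)((prio_class << IOPRIO_CLASS_SHIFT) | level);
}

/* Extract the class from a packed priority value. */
unsigned int ioprio_class(uint16_t value)
{
	return value >> IOPRIO_CLASS_SHIFT;
}
```

For example, `ioprio_value(IOPRIO_CLASS_RT, 0)` yields 0x2000, which an I/O scheduler that honours priorities would treat as more urgent than any best-effort request.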