Damien Le Moal [Thu, 6 Mar 2025 08:55:49 +0000 (17:55 +0900)]
nvmet: pci-epf: Do not add an IRQ vector if not needed
The function nvmet_pci_epf_create_cq() always unconditionally calls
nvmet_pci_epf_add_irq_vector() to add an IRQ vector for a completion
queue. But this is not correct if the host requested the creation of a
completion queue for polling, without an IRQ vector specified (i.e. the
flag NVME_CQ_IRQ_ENABLED is not set).
Fix this by calling nvmet_pci_epf_add_irq_vector() and setting the queue
flag NVMET_PCI_EPF_Q_IRQ_ENABLED for the cq only if NVME_CQ_IRQ_ENABLED
is set. While at it, also fix the error path to add the missing removal
of the added IRQ vector if nvmet_cq_create() fails.
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 6 Mar 2025 08:55:48 +0000 (17:55 +0900)]
nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created
The function nvmet_pci_epf_create_sq() use test_and_set_bit() to check
that a submission queue is not already live and if not, set the
NVMET_PCI_EPF_Q_LIVE queue flag to declare the sq live (ready to use).
However, this is done on entry to the function, before the submission
queue is actually fully initialized and ready to use. This creates a
race situation with the function nvmet_pci_epf_poll_sqs_work() which
looks at the NVMET_PCI_EPF_Q_LIVE queue flag to poll the submission
queue when it is live. This race can lead to invalid DMA transfers if
nvmet_pci_epf_poll_sqs_work() runs after the NVMET_PCI_EPF_Q_LIVE flag
is set but before setting the sq pci address and doorbell ofset.
Avoid this race by only testing the NVMET_PCI_EPF_Q_LIVE flag on entry
to nvmet_pci_epf_create_sq() and setting it after the submission queue
is fully setup before nvmet_pci_epf_create_sq() returns success.
Since the function nvmet_pci_epf_create_cq() also has the same racy flag
setting pattern, also make a similar change in that function.
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Keith Busch [Thu, 6 Mar 2025 22:25:57 +0000 (14:25 -0800)]
nvme-pci: fix stuck reset on concurrent DPC and HP
The PCIe error handling has the nvme driver quiesce the device, attempt
to restart it, then wait for that restart to complete.
A PCIe DPC event also toggles the PCIe link. If the slot doesn't have
out-of-band presence detection, this will trigger a pciehp
re-enumeration.
The error handling that calls nvme_error_resume is holding the device
lock while this happens. This lock blocks pciehp's request to disconnect
the driver from proceeding.
Meanwhile the nvme's reset can't make forward progress because its
device isn't there anymore with outstanding IO, and the timeout handler
won't do anything to fix it because the device is undergoing error
handling.
End result: deadlocked.
Fix this by having the timeout handler short cut the disabling for a
disconnected PCIe device. The downside is that we're relying on an IO
timeout to clean up this mess, which could be a minute by default.
Breno Leitao [Thu, 6 Mar 2025 16:27:51 +0000 (08:27 -0800)]
block: Name the RQF flags enum
Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit
indexes") converted the RQF flags to an anonymous enum, which was
a beneficial change. This patch goes one step further by naming the enum
as "rqf_flags".
This naming enables exporting these flags to BPF clients, eliminating
the need to duplicate these flags in BPF code. Instead, BPF clients can
now access the same kernel-side values through CO:RE (Compile Once, Run
Everywhere), as shown in this example:
Jens Axboe [Thu, 6 Mar 2025 11:32:46 +0000 (04:32 -0700)]
Merge tag 'nvme-6.14-2025-03-05' of git://git.infradead.org/nvme into block-6.14
Pull NVMe fixe from Keith:
"nvme fixes for Linux 6.14
- TCP use after free fix on polling (Sagi)
- Controller memory buffer cleanup fixes (Icenowy)
- Free leaking requests on bad user passthrough commands (Keith)
- TCP error message fix (Maurizio)
- TCP corruption fix on partial PDU (Maurizio)
- TCP memory ordering fix for weakly ordered archs (Meir)
- Type coercion fix on message error for TCP (Dan)"
* tag 'nvme-6.14-2025-03-05' of git://git.infradead.org/nvme:
nvme-tcp: fix signedness bug in nvme_tcp_init_connection()
nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()
nvme-tcp: Fix a C2HTermReq error message
nvmet: remove old function prototype
nvme-ioctl: fix leaked requests on mapping error
nvme-pci: skip CMB blocks incompatible with PCI P2P DMA
nvme-pci: clean up CMBMSC when registering CMB fails
nvme-tcp: fix possible UAF in nvme_tcp_poll
Dan Carpenter [Wed, 5 Mar 2025 15:52:59 +0000 (18:52 +0300)]
nvme-tcp: fix signedness bug in nvme_tcp_init_connection()
The kernel_recvmsg() function returns an int which could be either
negative error codes or the number of bytes received. The problem is
that the condition:
if (ret < sizeof(*icresp)) {
is type promoted to type unsigned long and negative values are treated
as high positive values which is success, when they should be treated as
failure. Handle invalid positive returns separately from negative
error codes to avoid this problem.
Olivier Gayot [Wed, 5 Mar 2025 02:21:54 +0000 (10:21 +0800)]
block: fix conversion of GPT partition name to 7-bit
The utf16_le_to_7bit function claims to, naively, convert a UTF-16
string to a 7-bit ASCII string. By naively, we mean that it:
* drops the first byte of every character in the original UTF-16 string
* checks if all characters are printable, and otherwise replaces them
by exclamation mark "!".
This means that theoretically, all characters outside the 7-bit ASCII
range should be replaced by another character. Examples:
* lower-case alpha (ɒ) 0x0252 becomes 0x52 (R)
* ligature OE (œ) 0x0153 becomes 0x53 (S)
* hangul letter pieup (ㅂ) 0x3142 becomes 0x42 (B)
* upper-case gamma (Ɣ) 0x0194 becomes 0x94 (not printable) so gets
replaced by "!"
The result of this conversion for the GPT partition name is passed to
user-space as PARTNAME via udev, which is confusing and feels questionable.
However, there is a flaw in the conversion function itself. By dropping
one byte of each character and using isprint() to check if the remaining
byte corresponds to a printable character, we do not actually guarantee
that the resulting character is 7-bit ASCII.
This happens because we pass 8-bit characters to isprint(), which
in the kernel returns 1 for many values > 0x7f - as defined in ctype.c.
This results in many values which should be replaced by "!" to be kept
as-is, despite not being valid 7-bit ASCII. Examples:
* e with acute accent (é) 0x00E9 becomes 0xE9 - kept as-is because
isprint(0xE9) returns 1.
* euro sign (€) 0x20AC becomes 0xAC - kept as-is because isprint(0xAC)
returns 1.
This way has broken pyudev utility[1], fixes it by using a mask of 7 bits
instead of 8 bits before calling isprint.
Uday Shankar [Tue, 4 Mar 2025 21:34:26 +0000 (14:34 -0700)]
ublk: set_params: properly check if parameters can be applied
The parameters set by the set_params call are only applied to the block
device in the start_dev call. So if a device has already been started, a
subsequently issued set_params on that device will not have the desired
effect, and should return an error. There is an existing check for this
- set_params fails on devices in the LIVE state. But this check is not
sufficient to cover the recovery case. In this case, the device will be
in the QUIESCED or FAIL_IO states, so set_params will succeed. But this
success is misleading, because the parameters will not be applied, since
the device has already been started (by a previous ublk server). The bit
UB_STATE_USED is set on completion of the start_dev; use it to detect
and fail set_params commands which arrive too late to be applied (after
start_dev).
Ming Lei [Fri, 28 Feb 2025 13:26:56 +0000 (21:26 +0800)]
block: fix 'kmem_cache of name 'bio-108' already exists'
Device mapper bioset often has big bio_slab size, which can be more than
1000, then 8byte can't hold the slab name any more, cause the kmem_cache
allocation warning of 'kmem_cache of name 'bio-108' already exists'.
Fix the warning by extending bio_slab->name to 12 bytes, but fix output
of /proc/slabinfo
Meir Elisha [Wed, 26 Feb 2025 07:28:12 +0000 (09:28 +0200)]
nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the
the correct reordering, set rcv_state using smp_store_release().
Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error") Signed-off-by: Meir Elisha <meir.elisha@volumez.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Maurizio Lombardi [Wed, 26 Feb 2025 13:42:18 +0000 (14:42 +0100)]
nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()
nvme_tcp_recv_pdu() doesn't check the validity of the header length.
When header digests are enabled, a target might send a packet with an
invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst()
to access memory outside the allocated area and cause memory corruptions
by overwriting it with the calculated digest.
Fix this by rejecting packets with an unexpected header length.
Maurizio Lombardi [Mon, 24 Feb 2025 14:40:58 +0000 (15:40 +0100)]
nvme-tcp: Fix a C2HTermReq error message
In H2CTermReq, a FES with value 0x05 means "R2T Limit Exceeded"; but
in C2HTermReq the same value has a different meaning (Data Transfer Limit
Exceeded).
Fixes: 84e009042d0f ("nvme-tcp: add basic support for the C2HTermReq PDU") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Fri, 14 Feb 2025 04:14:34 +0000 (13:14 +0900)]
block: Remove zone write plugs when handling native zone append writes
For devices that natively support zone append operations,
REQ_OP_ZONE_APPEND BIOs are not processed through zone write plugging
and are immediately issued to the zoned device. This means that there is
no write pointer offset tracking done for these operations and that a
zone write plug is not necessary.
However, when receiving a zone append BIO, we may already have a zone
write plug for the target zone if that zone was previously partially
written using regular write operations. In such case, since the write
pointer offset of the zone write plug is not incremented by the amount
of sectors appended to the zone, 2 issues arise:
1) we risk leaving the plug in the disk hash table if the zone is fully
written using zone append or regular write operations, because the
write pointer offset will never reach the "zone full" state.
2) Regular write operations that are issued after zone append operations
will always be failed by blk_zone_wplug_prepare_bio() as the write
pointer alignment check will fail, even if the user correctly
accounted for the zone append operations and issued the regular
writes with a correct sector.
Avoid these issues by immediately removing the zone write plug of zones
that are the target of zone append operations when blk_zone_plug_bio()
is called. The new function blk_zone_wplug_handle_native_zone_append()
implements this for devices that natively support zone append. The
removal of the zone write plug using disk_remove_zone_wplug() requires
aborting all plugged regular write using disk_zone_wplug_abort() as
otherwise the plugged write BIOs would never be executed (with the plug
removed, the completion path will never see again the zone write plug as
disk_get_zone_wplug() will return NULL). Rate-limited warnings are added
to blk_zone_wplug_handle_native_zone_append() and to
disk_zone_wplug_abort() to signal this.
Since blk_zone_wplug_handle_native_zone_append() is called in the hot
path for operations that will not be plugged, disk_get_zone_wplug() is
optimized under the assumption that a user issuing zone append
operations is not at the same time issuing regular writes and that there
are no hashed zone write plugs. The struct gendisk atomic counter
nr_zone_wplugs is added to check this, with this counter incremented in
disk_insert_zone_wplug() and decremented in disk_remove_zone_wplug().
To be consistent with this fix, we do not need to fill the zone write
plug hash table with zone write plugs for zones that are partially
written for a device that supports native zone append operations.
So modify blk_revalidate_seq_zone() to return early to avoid allocating
and inserting a zone write plug for partially written sequential zones
if the device natively supports zone append.
Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com> Fixes: 9b1ce7f0c6f8 ("block: Implement zone append emulation") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Jorgen Hansen <Jorgen.Hansen@wdc.com> Link: https://lore.kernel.org/r/20250214041434.82564-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Tue, 25 Feb 2025 01:13:30 +0000 (17:13 -0800)]
nvme-ioctl: fix leaked requests on mapping error
All the callers assume nvme_map_user_request() frees the request on a
failure. This wasn't happening on invalid metadata or io_uring command
flags, so we've been leaking those requests.
Fixes: 23fd22e55b767b ("nvme: wire up fixed buffer support for nvme passthrough") Fixes: 7c2fd76048e95d ("nvme: fix metadata handling in nvme-passthrough") Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Ming Lei [Tue, 25 Feb 2025 02:21:41 +0000 (10:21 +0800)]
block: make segment size limit workable for > 4K PAGE_SIZE
Using PAGE_SIZE as a minimum expected DMA segment size in consideration
of devices which have a max DMA segment size of < 64k when used on 64k
PAGE_SIZE systems leads to devices not being able to probe such as
eMMC and Exynos UFS controller [0] [1] you can end up with a probe failure
as follows:
WARNING: CPU: 2 PID: 397 at block/blk-settings.c:339 blk_validate_limits+0x364/0x3c0
Ensure we use min(max_seg_size, seg_boundary_mask + 1) as the new min segment
size when max segment size is < PAGE_SIZE for 16k and 64k base page size systems.
If anyone need to backport this patch, the following commits are depended:
Icenowy Zheng [Wed, 12 Feb 2025 17:04:44 +0000 (01:04 +0800)]
nvme-pci: skip CMB blocks incompatible with PCI P2P DMA
The PCI P2PDMA code will register the CMB block to the memory
hot-plugging subsystem, which have an alignment requirement. Memory
blocks that do not satisfy this alignment requirement (usually 2MB) will
lead to a WARNING from memory hotplugging.
Verify the CMB block's address and size against the alignment and only
try to send CMB blocks compatible with it to prevent this warning.
Tested on Intel DC D4502 SSD, which has a 512K CMB block that is too
small for memory hotplugging (thus PCI P2PDMA).
Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Sagi Grimberg [Thu, 20 Feb 2025 11:18:30 +0000 (13:18 +0200)]
nvme-tcp: fix possible UAF in nvme_tcp_poll
nvme_tcp_poll() may race with the send path error handler because
it may complete the request while it is actively being polled for
completion, resulting in a UAF panic [1]:
We should make sure to stop polling when we see an error when
trying to read from the socket. Hence make sure to propagate the
error so that the block layer breaks the polling cycle.
Jens Axboe [Fri, 21 Feb 2025 00:43:59 +0000 (17:43 -0700)]
Merge tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme into block-6.14
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.14
- FC controller state check fixes (Daniel)
- PCI Endpoint fixes (Damien)
- TCP connection failure fixe (Caleb)
- TCP handling C2HTermReq PDU (Maurizio)
- RDMA queue state check (Ruozhu)
- Apple controller fixes (Hector)
- Target crash on disbaled namespace (Hannes)"
* tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme:
nvme: only allow entering LIVE from CONNECTING state
nvme-fc: rely on state transitions to handle connectivity loss
apple-nvme: Support coprocessors left idle
apple-nvme: Release power domains when probe fails
nvmet: Use enum definitions instead of hardcoded values
nvme: Cleanup the definition of the controller config register fields
nvme/ioctl: add missing space in err message
nvme-tcp: fix connect failure on receiving partial ICResp PDU
nvme: tcp: Fix compilation warning with W=1
nvmet: pci-epf: Avoid RCU stalls under heavy workload
nvmet: pci-epf: Do not uselessly write the CSTS register
nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
nvmet-rdma: recheck queue state is LIVE in state lock in recv done
nvmet: Fix crash when a namespace is disabled
nvme-tcp: add basic support for the C2HTermReq PDU
nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
Daniel Wagner [Fri, 14 Feb 2025 08:02:03 +0000 (09:02 +0100)]
nvme: only allow entering LIVE from CONNECTING state
The fabric transports and also the PCI transport are not entering the
LIVE state from NEW or RESETTING. This makes the state machine more
restrictive and allows to catch not supported state transitions, e.g.
directly switching from RESETTING to LIVE.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
Daniel Wagner [Fri, 14 Feb 2025 08:02:04 +0000 (09:02 +0100)]
nvme-fc: rely on state transitions to handle connectivity loss
It's not possible to call nvme_state_ctrl_state with holding a spin
lock, because nvme_state_ctrl_state calls cancel_delayed_work_sync
when fastfail is enabled.
Instead syncing the ASSOC_FLAG and state transitions using a lock, it's
possible to only rely on the state machine transitions. That means
nvme_fc_ctrl_connectivity_loss should unconditionally call
nvme_reset_ctrl which avoids the read race on the ctrl state variable.
Actually, it's not necessary to test in which state the ctrl is, the
reset work will only scheduled when the state machine is in LIVE state.
In nvme_fc_create_association, the LIVE state can only be entered if it
was previously CONNECTING. If this is not possible then the reset
handler got triggered. Thus just error out here.
Fixes: ee59e3820ca9 ("nvme-fc: do not ignore connectivity loss during connecting") Closes: https://lore.kernel.org/all/denqwui6sl5erqmz2gvrwueyxakl5txzbbiu3fgebryzrfxunm@iwxuthct377m/ Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
Hector Martin [Thu, 13 Feb 2025 16:12:58 +0000 (11:12 -0500)]
apple-nvme: Support coprocessors left idle
iBoot on at least some firmwares/machines leaves ANS2 running, requiring
a wake command instead of a CPU boot (and if we reset ANS2 in that
state, everything breaks).
Only stop the CPU if RTKit was running, and only do the reset dance if
the CPU is stopped.
Normal shutdown handoff:
- RTKit not yet running
- CPU detected not running
- Reset
- CPU powerup
- RTKit boot wait
ANS2 left running/idle:
- RTKit not yet running
- CPU detected running
- RTKit wake message
Sleep/resume cycle:
- RTKit shutdown
- CPU stopped
- (sleep here)
- CPU detected not running
- Reset
- CPU powerup
- RTKit boot wait
Shutdown or device removal:
- RTKit shutdown
- CPU stopped
Therefore, the CPU running bit serves as a consistent flag of whether
the coprocessor is fully stopped or just idle.
Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Keith Busch <kbusch@kernel.org>
Hector Martin [Thu, 13 Feb 2025 16:12:59 +0000 (11:12 -0500)]
apple-nvme: Release power domains when probe fails
Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:50:00 +0000 (15:50 +0900)]
nvmet: Use enum definitions instead of hardcoded values
Change the definition of the inline functions nvmet_cc_en(),
nvmet_cc_css(), nvmet_cc_mps(), nvmet_cc_ams(), nvmet_cc_shn(),
nvmet_cc_iosqes(), and nvmet_cc_iocqes() to use the enum difinitions in
include/linux/nvme.h instead of hardcoded values.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:49:59 +0000 (15:49 +0900)]
nvme: Cleanup the definition of the controller config register fields
Reorganized the enum used to define the fields of the contrller
configuration (CC) register in include/linux/nvme.h to:
1) Group together all the values defined for each field.
2) Add the missing field masks definitions.
3) Add comments to describe the enum and each field.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
nvme_validate_passthru_nsid() logs an err message whose format string is
split over 2 lines. There is a missing space between the two pieces,
resulting in log lines like "... does not match nsid (1)of namespace".
Add the missing space between ")" and "of". Also combine the format
string pieces onto a single line to make the err message easier to grep.
Fixes: e7d4b5493a2d ("nvme: factor out a nvme_validate_passthru_nsid helper") Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
nvme-tcp: fix connect failure on receiving partial ICResp PDU
nvme_tcp_init_connection() attempts to receive an ICResp PDU but only
checks that the return value from recvmsg() is non-negative. If the
sender closes the TCP connection or sends fewer than 128 bytes, this
check will pass even though the full PDU wasn't received.
Ensure the full ICResp PDU is received by checking that recvmsg()
returns the expected 128 bytes.
Additionally set the MSG_WAITALL flag for recvmsg(), as a sender could
split the ICResp over multiple TCP frames. Without MSG_WAITALL,
recvmsg() could return prematurely with only part of the PDU.
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:52:31 +0000 (15:52 +0900)]
nvme: tcp: Fix compilation warning with W=1
When compiling with W=1, a warning result for the function
nvme_tcp_set_queue_io_cpu():
host/tcp.c:1578: warning: Function parameter or struct member 'queue'
not described in 'nvme_tcp_set_queue_io_cpu'
host/tcp.c:1578: warning: expecting prototype for Track the number of
queues assigned to each cpu using a global per(). Prototype was for
nvme_tcp_set_queue_io_cpu() instead
Avoid this warning by using the regular comment format for the function
nvme_tcp_set_queue_io_cpu() instead of the kdoc comment format.
Fixes: 32193789878c ("nvme-tcp: Fix I/O queue cpu spreading for multiple controllers") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:52:30 +0000 (15:52 +0900)]
nvmet: pci-epf: Avoid RCU stalls under heavy workload
The delayed work item function nvmet_pci_epf_poll_sqs_work() polls all
submission queues and keeps running in a loop as long as commands are
being submitted by the host. Depending on the preemption configuration
of the kernel, under heavy command workload, this function can thus run
for more than RCU_CPU_STALL_TIMEOUT seconds, leading to a RCU stall:
The solution for this is simply to explicitly allow rescheduling using
cond_resched(). However, since doing so for every loop of
nvmet_pci_epf_poll_sqs_work() significantly degrades performance
(for 4K random reads using 4 I/O queues, the maximum IOPS goes down from
137 KIOPS to 110 KIOPS), call cond_resched() every second to avoid the
RCU stalls.
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:52:29 +0000 (15:52 +0900)]
nvmet: pci-epf: Do not uselessly write the CSTS register
The function nvmet_pci_epf_poll_cc_work() will do nothing if there are
no changes to the controller configuration (CC) register. However, even
for such case, this function still calls nvmet_update_cc() and uselessly
writes the CSTS register. Avoid this by simply rescheduling the poll_cc
work if the CC register has not changed.
Also reschedule the poll_cc work if the function
nvmet_pci_epf_enable_ctrl() fails to allow the host the chance to try
again enabling the controller.
While at it, since there is no point in trying to handle the CC register
as quickly as possible, change the poll_cc work scheduling interval to
10 ms (from 5ms), to avoid excessive read accesses to that register.
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Damien Le Moal [Thu, 13 Feb 2025 06:52:28 +0000 (15:52 +0900)]
nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
The function nvmet_pci_epf_poll_cc_work() sets the NVME_CSTS_RDY bit of
the controller status register (CSTS) when nvmet_pci_epf_enable_ctrl()
returns success. However, since this function can be called several
times (e.g. if the host reboots), instead of setting the bit in
ctrl->csts, initialize this field to only have NVME_CSTS_RDY set.
Conversely, if nvmet_pci_epf_enable_ctrl() fails, make sure to clear all
bits from ctrl->csts.
To simplify nvmet_pci_epf_poll_cc_work(), initialize ctrl->csts to
NVME_CSTS_RDY directly inside nvmet_pci_epf_enable_ctrl() and clear this
field in that function as well in case of a failure. To be consistent,
move clearing the NVME_CSTS_RDY bit from ctrl->csts when the controller
is being disabled from nvmet_pci_epf_poll_cc_work() into
nvmet_pci_epf_disable_ctrl().
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Ruozhu Li [Sun, 16 Feb 2025 12:49:56 +0000 (20:49 +0800)]
nvmet-rdma: recheck queue state is LIVE in state lock in recv done
The queue state checking in nvmet_rdma_recv_done is not in queue state
lock.Queue state can transfer to LIVE in cm establish handler between
state checking and state lock here, cause a silent drop of nvme connect
cmd.
Recheck queue state whether in LIVE state in state lock to prevent this
issue.
Signed-off-by: Ruozhu Li <david.li@jaguarmicro.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Hannes Reinecke [Fri, 7 Feb 2025 12:41:34 +0000 (13:41 +0100)]
nvmet: Fix crash when a namespace is disabled
The namespace percpu counter protects pending I/O, and we can
only safely diable the namespace once the counter drop to zero.
Otherwise we end up with a crash when running blktests/nvme/058
(eg for loop transport):
as the queue is already torn down when calling submit_bio();
So we need to init the percpu counter in nvmet_ns_enable(), and
wait for it to drop to zero in nvmet_ns_disable() to avoid having
I/O pending after the namespace has been disabled.
Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath") Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
Maurizio Lombardi [Mon, 17 Feb 2025 16:08:27 +0000 (17:08 +0100)]
nvme-tcp: add basic support for the C2HTermReq PDU
Previously, the NVMe/TCP host driver did not handle the C2HTermReq PDU,
instead printing "unsupported pdu type (3)" when received. This patch adds
support for processing the C2HTermReq PDU, allowing the driver
to print the Fatal Error Status field.
Example of output:
nvme nvme4: Received C2HTermReq (FES = Invalid PDU Header Field)
Christopher Lentocha [Tue, 18 Feb 2025 13:59:29 +0000 (08:59 -0500)]
nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
In order for two Acer FA100 SSDs to work in one PC (in the case of
myself, a Lenovo Legion T5 28IMB05), and not show one drive and not
the other, and sometimes mix up what drive shows up (randomly), these
two lines of code need to be added, and then both of the SSDs will
show up and not conflict when booting off of one of them. If you boot
up your computer with both SSDs installed without this patch, you may
also randomly get into a kernel panic (if the initrd is not set up) or
stuck in the initrd "/init" process, it is set up, however, if you do
apply this patch, there should not be problems with booting or seeing
both contents of the drive. Tested with the btrfs filesystem with a
RAID configuration of having the root drive '/' combined to make two
256GB Acer FA100 SSDs become 512GB in total storage.
Signed-off-by: Christopher Lentocha <christopherericlentocha@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Ming Lei [Mon, 17 Feb 2025 03:16:26 +0000 (11:16 +0800)]
block: fix NULL pointer dereferenced within __blk_rq_map_sg
The block layer internal flush request may not have bio attached, so the
request iterator has to be initialized from valid req->bio, otherwise NULL
pointer dereferenced is triggered.
Cc: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Cheyenne Wills <cheyenne.wills@gmail.com> Fixes: b7175e24d6ac ("block: add a dma mapping iterator") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250217031626.461977-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jann Horn [Fri, 14 Feb 2025 01:39:50 +0000 (02:39 +0100)]
partitions: mac: fix handling of bogus partition table
Fix several issues in partition probing:
- The bailout for a bad partoffset must use put_dev_sector(), since the
preceding read_part_sector() succeeded.
- If the partition table claims a silly sector size like 0xfff bytes
(which results in partition table entries straddling sector boundaries),
bail out instead of accessing out-of-bounds memory.
- We must not assume that the partition table contains proper NUL
termination - use strnlen() and strncmp() instead of strlen() and
strcmp().
Bart Van Assche [Wed, 12 Feb 2025 17:11:07 +0000 (09:11 -0800)]
md/raid*: Fix the set_queue_limits implementations
queue_limits_cancel_update() must only be called if
queue_limits_start_update() is called first. Remove the
queue_limits_cancel_update() calls from the raid*_set_limits() functions
because there is no corresponding queue_limits_start_update() call.
Cc: Christoph Hellwig <hch@lst.de> Fixes: c6e56cf6b2e7 ("block: move integrity information into queue_limits") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-raid/20250212171108.3483150-1-bvanassche@acm.org/ Signed-off-by: Yu Kuai <yukuai@kernel.org>
Jens Axboe [Thu, 13 Feb 2025 15:18:46 +0000 (08:18 -0700)]
block: cleanup and fix batch completion adding conditions
The conditions for whether or not a request is allowed adding to a
completion batch are a bit hard to read, and they also have a few
issues. One is that ioerror may indeed be a random value on passthrough,
and it's being checked unconditionally of whether or not the given
request is a passthrough request or not.
Rewrite the conditions to be separate for easier reading, and only check
ioerror for non-passthrough requests. This fixes an issue with bio
unmapping on passthrough, where it fails getting added to a batch. This
both leads to suboptimal performance, and may trigger a potential
schedule-under-atomic condition for polled passthrough IO.
Jens Axboe [Mon, 3 Feb 2025 16:19:03 +0000 (09:19 -0700)]
Merge tag 'nvme-6.14-2025-01-31' of git://git.infradead.org/nvme into block-6.14
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.14
- Connection fixes for fibre channel transport (Daniel)
- Endian fixes (Keith, Christoph)
- Cleanup fix for host memory buffer (Francis)
- Platform specific power quirks (Georg)
- Target memory leak (Sagi)
- Use appropriate controller state accessor (Daniel)"
* tag 'nvme-6.14-2025-01-31' of git://git.infradead.org/nvme:
nvme-fc: use ctrl state getter
nvme: make nvme_tls_attrs_group static
nvmet: add a missing endianess conversion in nvmet_execute_admin_connect
nvmet: the result field in nvmet_alloc_ctrl_args is little endian
nvmet: fix a memory leak in controller identify
nvme-fc: do not ignore connectivity loss during connecting
nvme: handle connectivity loss in nvme_set_queue_count
nvme-fc: go straight to connecting state when initializing
nvme-pci: Add TUXEDO IBP Gen9 to Samsung sleep quirk
nvme-pci: Add TUXEDO InfinityFlex to Samsung sleep quirk
nvme-pci: remove redundant dma frees in hmb
nvmet: fix rw control endian access
Stephen Rothwell [Mon, 3 Feb 2025 01:47:17 +0000 (12:47 +1100)]
drivers/block/sunvdc.c: update the correct AIP call
My sparc64 defconfig build failed like this:
drivers/block/sunvdc.c: In function 'vdc_queue_drain':
drivers/block/sunvdc.c:1130:9: error: too many arguments to function 'blk_mq_unquiesce_queue'
1130 | blk_mq_unquiesce_queue(q, memflags);
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/block/sunvdc.c:10:
include/linux/blk-mq.h:895:6: note: declared here
895 | void blk_mq_unquiesce_queue(struct request_queue *q);
| ^~~~~~~~~~~~~~~~~~~~~~
drivers/block/sunvdc.c:1131:9: error: too few arguments to function 'blk_mq_unfreeze_queue'
1131 | blk_mq_unfreeze_queue(q);
| ^~~~~~~~~~~~~~~~~~~~~
In file included from drivers/block/sunvdc.c:10:
include/linux/blk-mq.h:914:1: note: declared here
914 | blk_mq_unfreeze_queue(struct request_queue *q, unsigned int memflags)
| ^~~~~~~~~~~~~~~~~~~~~
Fixes: 1e1a9cecfab3 ("block: force noio scope in blk_mq_freeze_queue") Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 29 Jan 2025 22:56:35 +0000 (14:56 -0800)]
md: Fix linear_set_limits()
queue_limits_cancel_update() must only be called if
queue_limits_start_update() is called first. Remove the
queue_limits_cancel_update() call from linear_set_limits() because
there is no corresponding queue_limits_start_update() call.
This bug was discovered by annotating all mutex operations with clang
thread-safety attributes and by building the kernel with clang and
-Wthread-safety.
Cc: Yu Kuai <yukuai3@huawei.com> Cc: Coly Li <colyli@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Fixes: 127186cfb184 ("md: reintroduce md-linear") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250129225636.2667932-1-bvanassche@acm.org Signed-off-by: Song Liu <song@kernel.org>
Keith Busch [Tue, 28 Jan 2025 15:22:31 +0000 (07:22 -0800)]
nvme: make nvme_tls_attrs_group static
To suppress the compiler "warning: symbol 'nvme_tls_attrs_group' was not
declared. Should it be static?"
Fixes: 1e48b34c9bc79a ("nvme: split off TLS sysfs attributes into a separate group") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Christoph Hellwig [Fri, 31 Jan 2025 12:03:47 +0000 (13:03 +0100)]
block: force noio scope in blk_mq_freeze_queue
When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock. Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.
Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.
Nilay Shroff [Tue, 28 Jan 2025 14:34:14 +0000 (20:04 +0530)]
block: fix nr_hw_queue update racing with disk addition/removal
The nr_hw_queue update could potentially race with disk addtion/removal
while registering/unregistering hctx sysfs files. The __blk_mq_update_
nr_hw_queues() runs with q->tag_list_lock held and so to avoid it racing
with disk addition/removal we should acquire q->tag_list_lock while
registering/unregistering hctx sysfs files.
With this patch, blk_mq_sysfs_register() (called during disk addition)
and blk_mq_sysfs_unregister() (called during disk removal) now runs
with q->tag_list_lock held so that it avoids racing with __blk_mq_update
_nr_hw_queues().
Nilay Shroff [Tue, 28 Jan 2025 14:34:13 +0000 (20:04 +0530)]
block: get rid of request queue ->sysfs_dir_lock
The request queue uses ->sysfs_dir_lock for protecting the addition/
deletion of kobject entries under sysfs while we register/unregister
blk-mq. However kobject addition/deletion is already protected with
kernfs/sysfs internal synchronization primitives. So use of q->sysfs_
dir_lock seems redundant.
Moreover, q->sysfs_dir_lock is also used at few other callsites along
with q->sysfs_lock for protecting the addition/deletion of kojects.
One such example is when we register with sysfs a set of independent
access ranges for a disk. Here as well we could get rid off q->sysfs_
dir_lock and only use q->sysfs_lock.
The only variable which q->sysfs_dir_lock appears to protect is q->
mq_sysfs_init_done which is set/unset while registering/unregistering
blk-mq with sysfs. But use of q->mq_sysfs_init_done could be easily
replaced using queue registered bit QUEUE_FLAG_REGISTERED.
So with this patch we remove q->sysfs_dir_lock from each callsite
and replace q->mq_sysfs_init_done using QUEUE_FLAG_REGISTERED.
Christoph Hellwig [Mon, 27 Jan 2025 14:30:44 +0000 (15:30 +0100)]
loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to
clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs
LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..).
Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS.
Fixes: ae074d07a0e5 ("loop: move updating lo_flag s out of loop_set_status_from_info") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 24 Jan 2025 19:42:48 +0000 (12:42 -0700)]
Merge tag 'md-6.14-20250124' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14
Pull MD fix from Song:
"Fix a md-cluster regression introduced in the 6.12 release."
* tag 'md-6.14-20250124' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
Root cause is that bitmap_get_stats() can be called at anytime if mddev
is still there, even if bitmap is destroyed, or not fully initialized.
Deferenceing bitmap in this case can crash the kernel. Meanwhile, the
above commit start to deferencing bitmap->storage, make the problem
easier to trigger.
Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex.
Cc: stable@vger.kernel.org # v6.12+ Fixes: 32a7627cf3a3 ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
Daniel Wagner [Thu, 9 Jan 2025 13:30:49 +0000 (14:30 +0100)]
nvme-fc: do not ignore connectivity loss during connecting
When a connectivity loss occurs while nvme_fc_create_assocation is
being executed, it's possible that the ctrl ends up stuck in the LIVE
state:
1) nvme nvme10: NVME-FC{10}: create association : ...
2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
Awaiting Reconnect
nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
3) nvme nvme10: Could not set queue count (880)
nvme nvme10: Failed to configure AEN (cfg 900)
4) nvme nvme10: NVME-FC{10}: controller connect complete
5) nvme nvme10: failed nvme_keep_alive_end_io error=4
A connection attempt starts 1) and the ctrl is in state CONNECTING.
Shortly after the LLDD driver detects a connection lost event and calls
nvme_fc_ctrl_connectivity_loss 2). Because we are still in CONNECTING
state, this event is ignored.
nvme_fc_create_association continues to run in parallel and tries to
communicate with the controller and these commands will fail. Though
these errors are filtered out, e.g in 3) setting the I/O queues numbers
fails which leads to an early exit in nvme_fc_create_io_queues. Because
the number of IO queues is 0 at this point, there is nothing left in
nvme_fc_create_association which could detected the connection drop.
Thus the ctrl enters LIVE state 4).
Eventually the keep alive handler times out 5) but because nothing is
being done, the ctrl stays in LIVE state.
There is already the ASSOC_FAILED flag to track connectivity loss event
but this bit is set too late in the recovery code path. Move this into
the connectivity loss event handler and synchronize it with the state
change. This ensures that the ASSOC_FAILED flag is seen by
nvme_fc_create_io_queues and it does not enter the LIVE state after a
connectivity loss event. If the connectivity loss event happens after we
entered the LIVE state the normal error recovery path is executed.
Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
Daniel Wagner [Thu, 9 Jan 2025 13:30:48 +0000 (14:30 +0100)]
nvme: handle connectivity loss in nvme_set_queue_count
When the set feature attempts fails with any NVME status code set in
nvme_set_queue_count, the function still report success. Though the
numbers of queues set to 0. This is done to support controllers in
degraded state (the admin queue is still up and running but no IO
queues).
Though there is an exception. When nvme_set_features reports an host
path error, nvme_set_queue_count should propagate this error as the
connectivity is lost, which means also the admin queue is not working
anymore.
Fixes: 9a0be7abb62f ("nvme: refactor set_queue_count") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
Daniel Wagner [Thu, 9 Jan 2025 13:30:47 +0000 (14:30 +0100)]
nvme-fc: go straight to connecting state when initializing
The initial controller initialization mimiks the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
Daniel Wagner [Thu, 23 Jan 2025 13:08:29 +0000 (14:08 +0100)]
blk-mq: create correct map for fallback case
The fallback code in blk_mq_map_hw_queues is original from
blk_mq_pci_map_queues and was added to handle the case where
pci_irq_get_affinity will return NULL for !SMP configuration.
blk_mq_map_hw_queues replaces besides blk_mq_pci_map_queues also
blk_mq_virtio_map_queues which used to use blk_mq_map_queues for the
fallback.
It's possible to use blk_mq_map_queues for both cases though.
blk_mq_map_queues creates the same map as blk_mq_clear_mq_map for !SMP
that is CPU 0 will be mapped to hctx 0.
The WARN_ON_ONCE has to be dropped for virtio as the fallback is also
taken for certain configuration on default. Though there is still a
WARN_ON_ONCE check in lib/group_cpus.c:
WARN_ON(nr_present + nr_others < numgrps);
which will trigger if the caller tries to create more hardware queues
than CPUs. It tests the same as the WARN_ON_ONCE in
blk_mq_pci_map_queues did.
Fixes: a5665c3d150c ("virtio: blk/scsi: replace blk_mq_virtio_map_queues with blk_mq_map_hw_queues") Reported-by: Steven Rostedt <rostedt@goodmis.org> Closes: https://lore.kernel.org/all/20250122093020.6e8a4e5b@gandalf.local.home/ Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20250123-fix-blk_mq_map_hw_queues-v1-1-08dbd01f2c39@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 23 Jan 2025 13:18:41 +0000 (06:18 -0700)]
block: don't revert iter for -EIOCBQUEUED
blkdev_read_iter() has a few odd checks, like gating the position and
count adjustment on whether or not the result is bigger-than-or-equal to
zero (where bigger than makes more sense), and not checking the return
value of blkdev_direct_IO() before doing an iov_iter_revert(). The
latter can lead to attempting to revert with a negative value, which
when passed to iov_iter_revert() as an unsigned value will lead to
throwing a WARN_ON() because unroll is bigger than MAX_RW_COUNT.
Be sane and don't revert for -EIOCBQUEUED, like what is done in other
spots.
Linus Torvalds [Wed, 22 Jan 2025 20:36:16 +0000 (12:36 -0800)]
Merge tag 'linux_kselftest-nolibc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull nolibc updates from Shuah Khan:
- add support for waitid()
- use waitid() over waitpid()
- use a pipe in vfprintf tests
- skip tests for unimplemented syscalls
- rename riscv to riscv64
- add configurations for riscv32
- add detecting missing toolchain to run-tests.sh
* tag 'linux_kselftest-nolibc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/nolibc: add configurations for riscv32
selftests/nolibc: rename riscv to riscv64
selftests/nolibc: skip tests for unimplemented syscalls
selftests/nolibc: use a pipe to in vfprintf tests
selftests/nolibc: use waitid() over waitpid()
tools/nolibc: add support for waitid()
selftests/nolibc: run-tests.sh: detect missing toolchain
Linus Torvalds [Wed, 22 Jan 2025 20:30:20 +0000 (12:30 -0800)]
Merge tag 'linux_kselftest-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- fixes, reporting improvements, and cleanup changes to several tests
- add support for DT_GNU_HASH to selftests/vDSO
* tag 'linux_kselftest-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/rseq: Fix handling of glibc without rseq support
selftests/resctrl: Discover SNC kernel support and adjust messages
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/ftrace: Make uprobe test more robust against binary name
selftests/ftrace: Fix to use remount when testing mount GID option
selftests: tmpfs: Add kselftest support to tmpfs
selftests: tmpfs: Add Test-skip if not run as root
selftests: harness: fix printing of mismatch values in __EXPECT()
selftests/ring-buffer: Add test for out-of-bound pgoff mapping
selftests/run_kselftest.sh: Fix help string for --per-test-log
selftests: acct: Add ksft_exit_skip if not running as root
selftests: kselftest: Fix the wrong format specifier
selftests: timers: clocksource-switch: Adapt progress to kselftest framework
selftests/zram: gitignore output file
selftests/filesystems: Add missing gitignore file
selftests: Warn about skipped tests in result summary
selftests: kselftest: Add ksft_test_result_xpass
selftests/vDSO: support DT_GNU_HASH
selftests/ipc: Remove unused variables
selftest: media_tests: fix trivial UAF typo
Linus Torvalds [Wed, 22 Jan 2025 20:12:42 +0000 (12:12 -0800)]
Merge tag 'input-for-v6.14-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- more conversions to the guard notation in the input core
- a fix for NXP BBNSM power key driver to clean up wake IRQ after
unbinding
- several new vendor/device ID pairs added to xpad game controller
driver
- several drivers switched to using str_enable_disable and similar
helpers instead of open-coding
- add mapping for F23 to atkbd driver so that MS "Copilot" key shortcut
works out of the box (if userspace is ready to handle it)
- evbug input handler has been removed (debugging through evdev is
strongly preferred to dumping all events into the kernel log).
* tag 'input-for-v6.14-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (22 commits)
Input: synaptics - fix crash when enabling pass-through port
Input: atkbd - map F23 key to support default copilot shortcut
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Input: xpad - add support for wooting two he (arm)
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Input: xpad - add QH Electronics VID/PID
Input: joystick - use str_off_on() helper in sw_connect()
Input: Use str_enable_disable-like helpers
Input: use guard notation in input core
Input: poller - convert locking to guard notation
Input: mt - make use of __free() cleanup facility
Input: mt - convert locking to guard notation
Input: ff-memless - make use of __free() cleanup facility
Input: ff-memless - convert locking to guard notation
Input: ff-core - make use of __free() cleanup facility
Input: ff-core - convert locking to guard notation
Input: remove evbug driver
Input: mma8450 - add chip ID check in probe
Input: bbnsm_pwrkey - add remove hook
...
Linus Torvalds [Wed, 22 Jan 2025 19:56:39 +0000 (11:56 -0800)]
Merge tag 'hid-for-linus-2025012001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- newly added support for Intel Touch Host Controller (Even Xu, Xinpeng
Sun)
- hid-core fix for long-standing syzbot-reported cornercase of
Resolution Multiplier not being present in any of the Logical
Collections in the device HID report descriptor (Alan Stern)
- improvement of behavior for non-standard LED brightness values for
Wacom driver (Jason Gerecke)
- PCI Wacom device support (depends on Intel THC support) (Even Xu)
- SteelSeries Arctis 9 support (Christian Mayer)
- constification of 'struct bin_attribute' in various HID driver
(Thomas Weißschuh)
- other assorted code cleanups / fixes and device ID additions
* tag 'hid-for-linus-2025012001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (63 commits)
HID: hid-asus: Disable OOBE mode on the ProArt P16
HID: steelseries: remove unnecessary return
HID: steelseries: export model and manufacturer
HID: steelseries: export charging state for the SteelSeries Arctis 9 headset
HID: steelseries: add SteelSeries Arctis 9 support
HID: steelseries: preparation for adding SteelSeries Arctis 9 support
HID: intel-thc-hid: fix build errors in um mode
HID: intel-thc-hid: intel-quicki2c: fix potential memory corruption
HID: intel-thc-hid: intel-thc: Fix error code in thc_i2c_subip_init()
HID: lenovo: Fix undefined platform_profile_cycle in ThinkPad X12 keyboard patch
HID: uclogic: make const read-only array touch_ring_model_params_buf static
HID: hid-steam: Make sure rumble work is canceled on removal
HID: Wacom: Add PCI Wacom device support
HID: intel-thc-hid: intel-quicki2c: Add PM implementation
HID: intel-thc-hid: intel-quicki2c: Complete THC QuickI2C driver
HID: intel-thc-hid: intel-quicki2c: Add HIDI2C protocol implementation
HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C ACPI interfaces
HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C driver hid layer
HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C driver skeleton
HID: intel-thc-hid: intel-quickspi: Add PM implementation
...
Linus Torvalds [Wed, 22 Jan 2025 19:37:33 +0000 (11:37 -0800)]
Merge tag 'thermal-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add support for Intel Panther Lake processors in multiple
places, modify Intel thermal drivers to stop selecting the user space
thermal governor which is not necessary for them to work any more and
clean up the thermal core somewhat:
- Add support for Panther Lake processors in multiple places (Zhang
Rui, Srinivas Pandruvada)
- Rename a few things and relocate a comment in the thermal subsystem
(Rafael Wysocki)"
* tag 'thermal-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Rename function argument related to trip crossing
thermal: gov_bang_bang: Relocate regulation logic description
thermal: core: Rename callback functions in two governors
thermal: intel: Fix compile issue when CONFIG_NET is not defined
thermal: intel: int340x: Panther Lake power floor and workload hint support
thermal: intel: int340x: Panther Lake DLVR support
thermal: intel: Remove explicit user_space governor selection
ACPI: DPTF: Support Panther Lake
thermal: intel: int340x: processor: Enable MMIO RAPL for Panther Lake
powercap: intel_rapl: Add support for Panther Lake platform
Linus Torvalds [Wed, 22 Jan 2025 19:28:39 +0000 (11:28 -0800)]
Merge tag 'acpi-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The most significant change here is replacing msleep() in
acpi_os_sleep() with usleep_range() to reduce spurious sleep time due
to timer inaccuracy which may spectacularly reduce the duration of
system suspend and resume transitions on some systems.
All of the other changes fall into the fixes and cleanups category
this time.
Specifics:
- Use usleep_range() instead of msleep() in acpi_os_sleep() to reduce
excessive delays due to timer inaccuracy, mostly affecting system
suspend and resume (Rafael Wysocki)
- Use str_enabled_disabled() string helpers in the ACPI tables
parsing code to make it easier to follow (Sunil V L)
- Update device properties parsing on systems using ACPI so that data
firmware nodes resulting from _DSD evaluation are treated as
available in firmware nodes walks (Sakari Ailus)
- Fix missing guid_t declaration in linux/prmt.h (Robert Richter)
- Update the GHES handling code to follow the global panic= policy
instead of overriding it by force-rebooting the system after a
fatal HW error has been reported (Borislav Petkov)
- Update messages printed by the ACPI battery driver to always refer
to driver extensions as "hooks" to avoid confusion with similar
functionality in the power supply subsystem in the future (Thomas
Weißschuh)
- Fix .probe() error path cleanup in the ACPI fan driver to avoid
memory leaks (Joe Hattori)
- Constify 'struct bin_attribute' in some places in the ACPI
subsystem and mark it as __ro_after_init in one place to prevent
binary blob attributes from being updated (Thomas Weißschuh)
- Add empty stubs for several ACPI-related symbols so that they can
be used when CONFIG_ACPI is unset and use them for removing
unnecessary conditional compilation from the ipu-bridge driver
(Ricardo Ribalda)"
* tag 'acpi-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
APEI: GHES: Have GHES honor the panic= setting
ACPI: PRM: Fix missing guid_t declaration in linux/prmt.h
ACPI: tables: Use string choice helpers
ACPI: property: Consider data nodes as being available
media: ipu-bridge: Remove unneeded conditional compilations
ACPI: bus: implement acpi_device_hid when !ACPI
ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI
ACPI: header: implement acpi_device_handle when !ACPI
ACPI: bus: implement acpi_get_physical_device_location when !ACPI
ACPI: bus: implement for_each_acpi_dev_match when !ACPI
ACPI: bus: change the prototype for acpi_get_physical_device_location
ACPI: fan: cleanup resources in the error path of .probe()
ACPI: battery: Rename extensions to hook in messages
ACPI: OSL: Use usleep_range() in acpi_os_sleep()
ACPI: sysfs: Constify 'struct bin_attribute'
ACPI: BGRT: Constify 'struct bin_attribute'
ACPI: BGRT: Mark bin_attribute as __ro_after_init
Linus Torvalds [Wed, 22 Jan 2025 19:16:14 +0000 (11:16 -0800)]
Merge tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The majority of changes here are cpufreq updates which are dominated
by amd-pstate driver changes, like in the previous cycle. Moreover,
changes related to amd-pstate are also the majority of cpupower
utility updates.
Included are some pieces of new hardware support, like the addition of
Clearwater Forest processors support to intel_idle, new cpufreq driver
for Airoha SoCs, and Apple cpufreq driver extensions to support more
SoCs. The intel_pstate driver is also extended to be able to support
new platforms by using ACPI CPPC to compute scaling factors between
HWP performance states and frequency.
The rest is mostly fixes and cleanups in assorted pieces of power
management code.
Specifics:
- Use str_enable_disable()-like helpers in cpufreq (Krzysztof
Kozlowski)
- Extend the Apple cpufreq driver to support more SoCs (Hector
Martin, Nick Chan)
- Add new cpufreq driver for Airoha SoCs (Christian Marangi)
- Fix using cpufreq-dt as module (Andreas Kemnade)
- Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
Edwards, Sibi Sankar, Manivannan Sadhasivam)
- Fix the maximum supported frequency computation in the ACPI cpufreq
driver to avoid relying on unfounded assumptions (Gautham Shenoy)
- Fix an amd-pstate driver regression with preferred core rankings
not being used (Mario Limonciello)
- Fix a precision issue with frequency calculation in the amd-pstate
driver (Naresh Solanki)
- Add ftrace event to the amd-pstate driver for active mode (Mario
Limonciello)
- Set default EPP policy on Ryzen processors in amd-pstate (Mario
Limonciello)
- Clean up the amd-pstate cpufreq driver and optimize it to increase
code reuse (Mario Limonciello, Dhananjay Ugwekar)
- Use CPPC to get scaling factors between HWP performance levels and
frequency in the intel_pstate driver and make it stop using a
built-in scaling factor for Arrow Lake processors (Rafael Wysocki)
- Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN
for consistency with CPU offline (Christian Loehle)
- Fix superfluous updates caused by need_freq_update in the schedutil
cpufreq governor (Sultan Alsawaf)
- Allow configuring the system suspend-resume (DPM) watchdog to warn
earlier than panic (Douglas Anderson)
- Implement devm_device_init_wakeup() helper and introduce a device-
managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan)
- Remove direct inclusions of 'pm_wakeup.h' which should be only
included via 'device.h' (Wolfram Sang)
- Clean up two comments in the core system-wide PM code (Rafael
Wysocki, Randy Dunlap)
- Add Clearwater Forest processor support to the intel_idle cpuidle
driver (Artem Bityutskiy)
- Clean up the Exynos devfreq driver and devfreq core (Markus
Elfring, Jeongjun Park)
- Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong,
Joe Hattori)
- Implement dev_pm_opp_get_bw() (Neil Armstrong)
- Expose OPP reference counting helpers for Rust (Viresh Kumar)
- Fix TSC MHz calculation in cpupower (He Rongguang)
- Add install and uninstall options to bindings Makefile and add
header changes for cpufreq.h to SWIG bindings in cpupower (John B.
Wyatt IV)
- Add missing residency header changes in cpuidle.h to SWIG bindings
in cpupower (John B. Wyatt IV)
- Add output files to .gitignore and clean them up in "make clean" in
selftests/cpufreq (Li Zhijian)
- Fix cross-compilation in cpupower Makefile (Peng Fan)
- Revise the is_valid flag handling for idle_monitor in the cpupower
utility (wangfushuai)
- Extend and clean up AMD processors support in cpupower (Mario
Limonciello)"
* tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
PM / OPP: Add reference counting helpers for Rust implementation
PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
cpufreq: Use str_enable_disable()-like helpers
cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
PM: sleep: convert comment from kernel-doc to plain comment
cpufreq: ACPI: Fix max-frequency computation
pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
PM / devfreq: exynos: remove unused function parameter
OPP: OF: Fix an OF node leak in _opp_add_static_v2()
cpufreq/amd-pstate: Refactor max frequency calculation
cpufreq/amd-pstate: Fix prefcore rankings
pm: cpupower: Add header changes for cpufreq.h to SWIG bindings
cpufreq: sparc: change kzalloc to kcalloc
cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
cpufreq: apple-soc: Increase cluster switch timeout to 400us
cpufreq: apple-soc: Use 32-bit read for status register
...
Linus Torvalds [Wed, 22 Jan 2025 18:57:57 +0000 (10:57 -0800)]
Merge tag 'for-linus-6.14-1' of https://github.com/cminyard/linux-ipmi
Pull ipmi updates from Corey Minyard:
- I'm switching to a new email address, so update that
- Minor fixes for formats and return values and missing ifdefs
- A fix for some error handling that causes a loss of messages
* tag 'for-linus-6.14-1' of https://github.com/cminyard/linux-ipmi:
MAINTAINERS: ipmi: update my email address
ipmi: ssif_bmc: Fix new request loss when bmc ready for a response
ipmi: make ipmi_destroy_user() return void
char:ipmi: Fix a not-used variable on a non-ACPI system
char:ipmi: Fix the wrong format specifier
ipmi: ipmb: Add check devm_kasprintf() returned value
Linus Torvalds [Wed, 22 Jan 2025 18:54:18 +0000 (10:54 -0800)]
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"A pretty quiet cycle this time around. We have a bunch of new Qualcomm
clk drivers, per usual, and then a handful of drivers for other SoCs.
Then the usual pile of cleanups is fairly small data fixes or
converting DT bindings to YAML so they can be validated.
No changes to the core framework besides an OF node refcount bump that
never got decremented.
New Drivers:
- 5L35023 variant of Versa 3 clock generator
- Various Qualcomm clk controllers: IPQ CMN PLL, SM6115 LPASS, SM750
global, tcsr, rpmh, and display. X Plus GPU and global. QCS615 rpmh
and MSM8937 and MSM8940 RPM.
- Qualcomm Pongo and Taycan Alpha PLLs
- Qualcomm IPQ5424 NoC-related interconnect clks
- Renesas RZ/G3E (R9A09G047) SoC clk driver
- SAMA7D65 SoC clk driver
- Samsung Exynos990 SoC clk driver"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (159 commits)
clk: analogbits: Fix incorrect calculation of vco rate delta
clk: bcm: rpi: Add disp clock
clk: bcm: rpi: Create helper to retrieve private data
clk: bcm: rpi: Enable minimize for all firmware clocks
clk: bcm: rpi: Allow cpufreq driver to also adjust gpu clocks
clk: bcm: rpi: Add ISP to exported clocks
clk: stm32f4: support spread spectrum clock generation
clk: stm32f4: use FIELD helpers to access the PLLCFGR fields
dt-bindings: clock: st,stm32-rcc: support spread spectrum clocking
dt-bindings: clock: convert stm32 rcc bindings to json-schema
clk: Use str_enable_disable-like helpers
clk: clk-loongson2: Fix the number count of clk provider
clk: clk-loongson2: Switch to use devm_clk_hw_register_fixed_rate_parent_data()
clk: starfive: Make _clk_get become a common helper function
clk: en7523: Add clock for eMMC for EN7581
dt-bindings: clock: add ID for eMMC for EN7581
dt-bindings: clock: drop NUM_CLOCKS define for EN7581
clk: en7523: Rework clock handling for different clock numbers
clk: thead: Fix cpu2vp_clk for TH1520 AP_SUBSYS clocks
clk: thead: Add CLK_IGNORE_UNUSED to fix TH1520 boot
...
Linus Torvalds [Wed, 22 Jan 2025 18:47:46 +0000 (10:47 -0800)]
Merge tag 'i2c-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"Core:
- list-based mechanisms for handling auto-detected and userspace
created clients are replaced with a flag-based approach. The
resulting code is much simpler as well as the locking.
- i2c clients now get a default debugfs dir managed by the I2C core.
Drivers don't have to maintain their own directory anymore.
Driver updates:
- xiic: atomic_transfer support
- imx-lpi2c: DMA and target mode support
- riic cleanups
- npcm: better timeout handling and more precise frequency setups
- davinci: remove unused platform_data
- at24: add new compatibles for variants from Giantec and Puya
Semiconductor (together with a new vendor prefix)"
* tag 'i2c-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (52 commits)
i2c: add kdoc for the new debugfs entry of clients
i2c: designware: Actually make use of the I2C_DW_COMMON and I2C_DW symbol namespaces
i2c: add core-managed per-client directory in debugfs
i2c: Force ELAN06FA touchpad I2C bus freq to 100KHz
i2c: riic: Add `riic_bus_barrier()` to check bus availability
i2c: riic: Use predefined macro and simplify clock tick calculation
i2c: riic: Mark riic_irqs array as const
i2c: riic: Make use of devres helper to request deasserted reset line
i2c: riic: Use GENMASK() macro for bitmask definitions
i2c: riic: Use BIT macro consistently
i2c: riic: Use local `dev` pointer in `dev_err_probe()`
i2c: riic: Use dev_err_probe in probe and riic_init_hw functions
i2c: riic: Introduce a separate variable for IRQ
i2c: amd756: Remove superfluous TODO
Revert "i2c: amd756: Fix endianness handling for word data"
i2c: i801: Add lis3lv02d for Dell Precision M6800
i2c: i801: Remove unnecessary PCI function call
i2c: core: Allocate temp client on the stack in i2c_detect
i2c: slave-eeprom: Constify 'struct bin_attribute'
i2c: imx-lpi2c: make controller available until the system enters suspend_noirq() and from resume_noirq().
...
Linus Torvalds [Wed, 22 Jan 2025 18:43:09 +0000 (10:43 -0800)]
Merge tag 'pwm/for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm updates from Uwe Kleine-König:
"This time there are very little changes for pwm. There is nothing new,
just a few maintenance cleanups.
The contributors this time around were Krzysztof Kozlowski, Mingwei
Zheng, Philipp Stanner, and Stanislav Jakubek. Thanks!"
* tag 'pwm/for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: stm32: Add check for clk_enable()
dt-bindings: pwm: Correct indentation and style in DTS example
pwm: stm32-lp: Add check for clk_enable()
dt-bindings: pwm: marvell,berlin-pwm: Convert from txt to yaml
dt-bindings: pwm: sprd,ums512-pwm: convert to YAML
pwm: Replace deprecated PCI functions
Linus Torvalds [Wed, 22 Jan 2025 18:39:17 +0000 (10:39 -0800)]
Merge tag 'mmc-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Share a helper to convert from crypto_profile to mmc_host
- Respect quirk_max_rate for non-UHS SDIO card too
MMC host:
- Add DT bindings for the mmc-slot
- Clarify DT bindings for the mmc-controller
- bcm2835: Add support for system-wide suspend/resume PM
- dw_mmc-exynos: Add support for the exynos8895 variant
- meson-mx-sdio: Convert DT bindings to dtschema
- mtk-sd: Fixup use of two register ranges
- mtk-sd: Add support for ignoring cmd response CRC
- sdhci-esdhc-imx: enable 'SDHCI_QUIRK_NO_LED' quirk for S32G
- sdhci-msm: Correctly set the load for the regulator
- sdhci-msm: Convert to use custom crypto profile
- sdhci-of-at91: Add support for the microchip sama7d65 variant"
* tag 'mmc-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (25 commits)
mmc: sdhci-msm: Correctly set the load for the regulator
mmc: hi3798mv200: Use syscon_regmap_lookup_by_phandle_args
mmc: Use of_property_present() for non-boolean properties
dt-bindings: mmc: samsung,exynos-dw-mshc: add specific compatible for exynos8895
mmc: sdhci-msm: convert to use custom crypto profile
mmc: crypto: add mmc_from_crypto_profile()
mmc: mtk-sd: Limit getting top_base to SoCs that require it
dt-bindings: mmc: mtk-sd: Document compatibles that need two register ranges
mmc: sdhci-acpi: Use devm_platform_ioremap_resource()
mmc: sdhci-acpi: Remove not so useful error message
dt-bindings: mmc: convert amlogic,meson-mx-sdio.txt to dtschema
dt-bindings: mmc: document mmc-slot
dt-bindings: mmc: controller: remove '|' when not needed
dt-bindings: mmc: controller: move properties common with slot out to mmc-controller-common
dt-bindings: mmc: controller: clarify the address-cells description
mmc: bcm2835: add suspend/resume pm support
dt-bindings: Drop Bhupesh Sharma from maintainers
mmc: core: don't include 'pm_wakeup.h' directly
mmc: mtk-sd: Add support for ignoring cmd response CRC
mmc: core: Introduce the MMC_RSP_R1B_NO_CRC response
...
Linus Torvalds [Wed, 22 Jan 2025 18:16:48 +0000 (10:16 -0800)]
Merge tag 'hwmon-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers:
- PMBus client driver for Intel CRPS185 power supply
- PMBus client driver for Texas Instruments TPS25990
Chip support added to existing drivers:
- pmbus/max15301: Add support for MAX15303
- pmbus/adm1275: Add adm1273 support
- lm75: Add NXP P3T1755 support; with it, add I3C support to the
driver
- asus-ec-sensors: Add TUF GAMING X670E PLUS
Other notable changes:
- nct6683: Add customer IDs for several MSI and ASRock boards
- tmp108: Add regulator support
- Improve write protect support in PMBus core
- pmbus/dps920ab: Add ability to instantiate through i2c
- The hwmon core now accepts NULL as device name parameter to
[devm_]hwmon_device_register_with_info ans uses the parent device
name as fallback in that case
- The PMBus core now provides the PMBUs revision in a debugfs file
- asus-ec-sensors: Support for optional CPU fan on AMD 600
motherboards
- raspberrypi: Add PM suspend/resume support
- dell-smm: Enable manual fan control support on Dell XPS 9370
- pwm-fan: Default to maximum cooling level if provided
And various other minor fixes and improvements"
* tag 'hwmon-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (44 commits)
hwmon: pmbus: dps920ab: Add ability to instantiate through i2c
hwmon: (pwm-fan) Default to the Maximum cooling level if provided
hwmon: (asus_atk0110) Use str_enabled_disabled() and str_enable_disable() helpers
hwmon: Fix help text for aspeed-g6-pwm-tach
hwmon: (dell-smm) Add Dell XPS 9370 to fan control whitelist
hwmon: (acpi_power_meter) Fix update the power trip points on failure
hwmon: (acpi_power_meter) Fix uninitialized variables
hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info
hwmon: (pmbus/max15301) Add support for MAX15303
hwmon: (pmbus/adm1275) add adm1273 support
dt-bindings: hwmon: adm1275: add adm1273
hwmon: (nct6683) Add another customer ID for MSI
hwmon: (pwm-fan): Make use of device properties everywhere
hwmon: (lm75) add I3C support for P3T1755
hwmon: (lm75) separate probe into common and I2C parts
hwmon: (lm75) Remove superfluous 'client' member from private struct
hwmon: (lm75) simplify regulator handling
hwmon: (lm75) simplify lm75_write_config()
hwmon: (lm75) Hide register size differences in regmap access functions
hwmon: (pmbus/crps) Add Intel CRPS185 power supply
...
Linus Torvalds [Wed, 22 Jan 2025 17:19:36 +0000 (09:19 -0800)]
Merge tag 'leds-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
- Allow struct bin_attribute instances to be placed in read-only memory
for enhanced protection
- Fix a memory leak in the cht-wcove driver by using
devm_led_classdev_register()
- Fix an OF node reference leak in the netxbig driver
- Ensure PWM is disabled properly in pwm-multicolor suspend
- Add support for Texas Instruments LP8864, LP8864S, LP8866
LED-backlight drivers
- Add support for STMicroelectronics's LED1202 12-channel LED driver
- Convert LP8860 bindings to YAML format
- Add bindings for the TI LP8864/LP8866 LED drivers
- Add LED1202 LED controller bindings
- Fix path to color definitions in leds-class-multicolor.yaml
- Add pm660l compatible to qcom,spmi-flash-led bindings
- Extend cznic,turris-omnia-leds binding with interrupts property
- Add documentation for the STMicroelectronics LED1202 driver
- Add entry for AAEON UP board FPGA drivers in MAINTAINERS
- Fix a wrong format specifier in the ledtrig-activity driver
- Fix a bug in the lp8860 driver where only half of the EEPROM was
written
* tag 'leds-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (28 commits)
leds: triggers: Constify 'struct bin_attribute'
leds: cht-wcove: Use devm_led_classdev_register() to avoid memory leak
leds: lp8864: Add support for Texas Instruments LP8864, LP8864S, LP8866 LED-backlights
dt-bindings: leds: Convert LP8860 into YAML format
leds: Add LED1202 I2C driver
dt-bindings: leds: Add LED1202 LED Controller
Documentation:leds: Add leds-st1202.rst
leds: pwm-multicolor: Disable PWM when going to suspend
leds: netxbig: Fix an OF node reference leak in netxbig_leds_get_of_pdata()
turris-omnia-mcu-interface.h: Move macro definitions outside of enums
MAINTAINERS: Add entry for AAEON UP board FPGA drivers
leds: Add AAEON UP board LED driver
leds: trigger: netdev: Check offload ability on interface up
leds: turris-omnia: Use uppercase first letter in all comments
leds: turris-omnia: Use dev_err_probe() where appropriate
leds: turris-omnia: Inform about missing LED gamma correction feature in the MCU driver
platform: cznic: turris-omnia-mcu: Inform about missing LED panel brightness change interrupt feature
leds: turris-omnia: Notify sysfs on MCU global LEDs brightness change
leds: turris-omnia: Document driver private structures
dt-bindings: leds: cznic,turris-omnia-leds: Allow interrupts property
...
Linus Torvalds [Wed, 22 Jan 2025 17:08:18 +0000 (09:08 -0800)]
Merge tag 'spi-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"This is a fairly quiet release for the most part, though we do have
one really nice improvement in the spi-mem framework which will
improve performance for flash devices especially when built on by
changes in the MTD subsystem which are also due to be sent this merge
window.
There's also been some substantial work on some of the drivers,
highlights include:
- Support for per-operation bus frequency in the spi-mem framework,
meaning speeds are no longer limited by the slowest operation
- ACPI support and improved power management for Rockchip SFC
controllers
- Support for Atmel SAM7G5 QuadSPI and KEBA SPI controllers"
* tag 'spi-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (49 commits)
spi: pxa2xx: Introduce __lpss_ssp_update_priv() helper
spi: ti-qspi: Use syscon_regmap_lookup_by_phandle_args
spi: amd: Fix -Wuninitialized in amd_spi_exec_mem_op()
spi: spi-mem: Estimate the time taken by operations
spi: spi-mem: Create macros for DTR operation
spi: spi-mem: Reorder spi-mem macro assignments
spi: zynqmp-gqspi: Support per spi-mem operation frequency switches
spi: zynq-qspi: Support per spi-mem operation frequency switches
spi: spi-ti-qspi: Support per spi-mem operation frequency switches
spi: spi-sn-f-ospi: Support per spi-mem operation frequency switches
spi: rockchip-sfc: Support per spi-mem operation frequency switches
spi: nxp-fspi: Support per spi-mem operation frequency switches
spi: mxic: Support per spi-mem operation frequency switches
spi: mt65xx: Support per spi-mem operation frequency switches
spi: microchip-core-qspi: Support per spi-mem operation frequency switches
spi: fsl-qspi: Support per spi-mem operation frequency switches
spi: dw: Support per spi-mem operation frequency switches
spi: cadence-qspi: Support per spi-mem operation frequency switches
spi: amlogic-spifc-a1: Support per spi-mem operation frequency switches
spi: amd: Drop redundant check
...
Linus Torvalds [Wed, 22 Jan 2025 17:03:41 +0000 (09:03 -0800)]
Merge tag 'regulator-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This was a very quiet release, aside from some smaller improvements we
have:
- Support for power budgeting on regulators, initially targeted at
some still in review support for PSE controllers but generally
useful
- Support for error interrupts from ROHM BD96801 devices
- Support for NXP PCA9452"
* tag 'regulator-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Add regulator-power-budget-milliwatt property
regulator: Add support for power budget
regulator: core: Resolve supply using of_node from regulator_config
regulator: of: Implement the unwind path of of_regulator_match()
regulator: tps65219: Remove debugging helper function
regulator: tps65219: Remove MODULE_ALIAS
regulator: tps65219: Update driver name
regulator: tps65219: Use dev_err_probe() instead of dev_err()
regulator: dt-bindings: mt6315: Drop regulator-compatible property
regulator: pca9450: Add PMIC pca9452 support
regulator: dt-bindings: pca9450: Add pca9452 support
regulator: pca9450: Use dev_err_probe() to simplify code
regulator: pca9450: add enable_value for all bucks
regulator: bd96801: Add ERRB IRQ
Linus Torvalds [Wed, 22 Jan 2025 16:57:52 +0000 (08:57 -0800)]
Merge tag 'regmap-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"There's one big bit of work this time around, the addition of support
for a greater range of MBQ access sizes to SoundWire devices together
with support for deferred read/write.
The MBQ register maps generally have variable register sizes, the
variable regiseter size support allows them to be handled much more
naturally within regmap with less open coding in drivers.
The deferred read/write support avoids spurious errors when devices
make use of a bus feature allowing them to indicate they're busy.
These changes pull in a supporting SoundWire change, and there's an
ASoC change building off the new code.
The remainder of the changes are code cleanups"
* tag 'regmap-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: sdw-mbq: Add support for SDCA deferred controls
regmap: sdw-mbq: Add support for further MBQ register sizes
ASoC: SDCA: Update list of entity_0 controls
soundwire: SDCA: Add additional SDCA address macros
regmap: regmap_multi_reg_read(): make register list const
regmap: cache: rbtree: use krealloc_array() to replace krealloc()
regmap: cache: mapple: use kmalloc_array() to replace kmalloc()
regmap: place foo / 8 and foo % 8 closer to each other
regmap: Use BITS_TO_BYTES()
regmap: cache: Use BITS_TO_BYTES()
Linus Torvalds [Wed, 22 Jan 2025 16:52:26 +0000 (08:52 -0800)]
Merge tag 'pwrseq-updates-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing updates from Bartosz Golaszewski:
"We added support for another Qualcomm WCN model and a FIXME comment
that explains why we still need to keep a GPIO workaround for now
despite having merged a set of changes to the PCI code that seemingly
fixed the underlying problem:
- support a new model in the qcom-wcn pwrseq driver
- explain the need to keep the WLAN_EN GPIO workaround for now with a
FIXME comment"
* tag 'pwrseq-updates-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: explain why we need the WLAN_EN GPIO hack
power: sequencing: qcom-wcn: add support for the WCN6750 PMU
Linus Torvalds [Wed, 22 Jan 2025 16:47:54 +0000 (08:47 -0800)]
Merge tag 'gpio-updates-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"Thanks to little activity in December, this is really tiny. Just a few
updates to drivers and device-tree bindings.
Driver improvements:
- support a new model in gpio-mpc8xxx
- refactor gpio-tqmx86 and add support for direction setting
- allow building gpio-omap with COMPILE_TEST=y
- use gpiochip_get_data() instead of dev_get_drvdata() in
gpio-twl6040
- drop unued field from driver data in gpio-altera
- use generic request/free callbacks in gpio-regmap for better
integration with pinctrl
- use dev_err_probe() where applicable in gpio-pca953x
- use existing dedicated GPIO defines in gpio-tps65219 instead of
custom ones
DT bindings:
- document a new model in fsl,qoriq-gpio
- explain the chip's latch clock pin and how it works like
chip-select in fairchild,74hc595
- enable the gpio-line-names property for gpio-brcmstb"
* tag 'gpio-updates-for-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: regmap: Use generic request/free ops
gpio: altera: Drop .mapped_irq from driver data
gpio: mpc8xxx: Add MPC8314 support
dt-bindings: gpio: fsl,qoriq-gpio: Add compatible string fsl,mpc8314-gpio
dt-bindings: gpio: fairchild,74hc595: Document chip select vs. latch clock
gpio: tps65219: Use existing kernel gpio macros
gpio: pca953x: log an error when failing to get the reset GPIO
dt-bindings: gpio: brcmstb: permit gpio-line-names property
gpio: tqmx86: add support for changing GPIO directions
gpio: tqmx86: introduce tqmx86_gpio_clrsetbits() helper
gpio: tqmx86: use cleanup guards for spinlock
gpio: tqmx86: consistently refer to IRQs by hwirq numbers
gpio: tqmx86: add macros for interrupt configuration
gpio: omap: allow building the module with COMPILE_TEST=y
gpio: twl4030: use gpiochip_get_data
Linus Torvalds [Wed, 22 Jan 2025 16:28:57 +0000 (08:28 -0800)]
Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
When the 'cachestat()' system call was added in commit cf264e1329fb
("cachestat: implement cachestat syscall"), it was meant to be a much
more convenient (and performant) version of mincore() that didn't need
mapping things into the user virtual address space in order to work.
But it ended up missing the "check for writability or ownership" fix for
mincore(), done in commit 134fca9063ad ("mm/mincore.c: make mincore()
more conservative").
This just adds equivalent logic to 'cachestat()', modified for the file
context (rather than vma).
Linus Torvalds [Wed, 22 Jan 2025 04:12:24 +0000 (20:12 -0800)]
Merge tag 'audit-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that fixes a problem when collecting pathnames
for audit PATH records that was caused by some faulty pathname
matching logic"
* tag 'audit-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix suffixed '/' filename matching
Linus Torvalds [Wed, 22 Jan 2025 04:09:14 +0000 (20:09 -0800)]
Merge tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Extended permissions supported in conditional policy
The SELinux extended permissions, aka "xperms", allow security admins
to target individuals ioctls, and recently netlink messages, with
their SELinux policy. Adding support for conditional policies allows
admins to toggle the granular xperms using SELinux booleans, helping
pave the way for greater use of xperms in general purpose SELinux
policies. This change bumps the maximum SELinux policy version to 34.
- Fix a SCTP/SELinux error return code inconsistency
Depending on the loaded SELinux policy, specifically it's
EXTSOCKCLASS support, the bind(2) LSM/SELinux hook could return
different error codes due to the SELinux code checking the socket's
SELinux object class (which can vary depending on EXTSOCKCLASS) and
not the socket's sk_protocol field. We fix this by doing the obvious,
and looking at the sock->sk_protocol field instead of the object
class.
- Makefile fixes to properly cleanup av_permissions.h
Add av_permissions.h to "targets" so that it is properly cleaned up
using the kbuild infrastructure.
- A number of smaller improvements by Christian Göttsche
A variety of straightforward changes to reduce code duplication,
reduce pointer lookups, migrate void pointers to defined types,
simplify code, constify function parameters, and correct iterator
types.
* tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: make more use of str_read() when loading the policy
selinux: avoid unnecessary indirection in struct level_datum
selinux: use known type instead of void pointer
selinux: rename comparison functions for clarity
selinux: rework match_ipv6_addrmask()
selinux: constify and reconcile function parameter names
selinux: avoid using types indicating user space interaction
selinux: supply missing field initializers
selinux: add netlink nlmsg_type audit message
selinux: add support for xperms in conditional policies
selinux: Fix SCTP error inconsistency in selinux_socket_bind()
selinux: use native iterator types
selinux: add generated av_permissions.h to targets
Linus Torvalds [Wed, 22 Jan 2025 04:03:04 +0000 (20:03 -0800)]
Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improved handling of LSM "secctx" strings through lsm_context struct
The LSM secctx string interface is from an older time when only one
LSM was supported, migrate over to the lsm_context struct to better
support the different LSMs we now have and make it easier to support
new LSMs in the future.
These changes explain the Rust, VFS, and networking changes in the
diffstat.
- Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
enabled
Small tweak to be a bit smarter about when we build the LSM's common
audit helpers.
- Check for absurdly large policies from userspace in SafeSetID
SafeSetID policies rules are fairly small, basically just "UID:UID",
it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
helps quiet a number of syzbot related issues. While work is being
done to address the syzbot issues through other mechanisms, this is a
trivial and relatively safe fix that we can do now.
- Various minor improvements and cleanups
A collection of improvements to the kernel selftests, constification
of some function parameters, removing redundant assignments, and
local variable renames to improve readability.
* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: initialize local array before use to quiet static analysis
safesetid: check size of policy writes
net: corrections for security_secid_to_secctx returns
lsm: rename variable to avoid shadowing
lsm: constify function parameters
security: remove redundant assignment to return variable
lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
binder: initialize lsm_context structure
rust: replace lsm context+len with lsm_context
lsm: secctx provider check on release
lsm: lsm_context in security_dentry_init_security
lsm: use lsm_context in security_inode_getsecctx
lsm: replace context+len with lsm_context
lsm: ensure the correct LSM context releaser
Linus Torvalds [Wed, 22 Jan 2025 03:54:32 +0000 (19:54 -0800)]
Merge tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"There's just a couple of changes: two kernel messages addressed, a
measurement policy collision addressed, and one policy cleanup.
Please note that the contents of the IMA measurement list is
potentially affected. The builtin tmpfs IMA policy rule change might
introduce additional measurements, while detecting a reboot might
eliminate some measurements"
* tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ignore suffixed policy rule comments
ima: limit the builtin 'tcb' dont_measure tmpfs policy rule
ima: kexec: silence RCU list traversal warning
ima: Suspend PCR extends and log appends when rebooting
Linus Torvalds [Wed, 22 Jan 2025 03:48:29 +0000 (19:48 -0800)]
Merge tag 'chrome-platform-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"New:
- Support new EC if the memory region information comes from the CRS
ACPI resource descriptor in cros_ec_lpc
Improvements:
- Make sure EC is in RW before probing
- Only check events on MKBP notifies to reduce the number of query
commands in cros_ec_lpc
Cleanups:
- Remove unused code and DT bindings for cros-kbd-led-backlight
- Constify 'struct bin_attribute' in cros_ec_vbc
- Use str_enabled_disabled() in cros_usbpd_logger"
* tag 'chrome-platform-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_lpc: Handle EC without CRS section
platform/chrome: cros_usbpd_logger: Use str_enabled_disabled() helper
platform/chrome: cros_ec_lpc: Support direct EC register memory access
platform/chrome: cros_ec_lpc: Merge lpc_driver_ops into ec private structure
platform/chrome: Update ChromeOS EC command tracing
platform/chrome: cros_ec_lpc: Only check for events on MKBP notifies
platform/chrome: cros_ec_vbc: Constify 'struct bin_attribute'
dt-bindings: cros-ec: Remove google,cros-kbd-led-backlight
platform/chrome: cros_kbd_led_backlight: Remove OF match
platform/chrome: cros_ec_proto: remove unnecessary retries
platform/chrome: cros_ec: jump to RW before probing
platform/chrome: cros_kbd_led_backlight: remove unneeded if-statement
Linus Torvalds [Wed, 22 Jan 2025 02:00:00 +0000 (18:00 -0800)]
Merge tag 'docs-6.14' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
- Quite a bit of Chinese and Spanish translation work
- Clarifying that Git commit IDs >12chars are OK
- A new nvme-multipath document
- A reorganization of the admin-guide top-level page to make it
readable
- Clarification of the role of Acked-by and maintainer discretion on
their acceptance
- Some reorganization of debugging-oriented docs
... and typo fixes, documentation updates, etc as usual
* tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits)
Documentation: Fix x86_64 UEFI outdated references to elilo
Documentation/sysctl: Add timer_migration to kernel.rst
docs/mm: Physical memory: Remove zone_t
docs: submitting-patches: clarify that signers may use their discretion on tags
docs: submitting-patches: clarify difference between Acked-by and Reviewed-by
docs: submitting-patches: clarify Acked-by and introduce "# Suffix"
Documentation: bug-hunting.rst: remove odd contact information
docs/zh_CN: Add sak index Chinese translation
doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes
doc: module: Fix documented type of namespace
Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
docs/zh_CN: Add landlock index Chinese translation
Documentation: Fix typo localmodonfig -> localmodconfig
overlayfs.rst: Fix and improve grammar
docs/zh_CN: Add siphash index Chinese translation
docs/zh_CN: Add security IMA-templates Chinese translation
docs/zh_CN: Add security digsig Chinese translation
Align git commit ID abbreviation guidelines and checks
docs: process: submitting-patches: split canonical patch format section
docs/zh_CN: Add security lsm Chinese translation
...
Linus Torvalds [Wed, 22 Jan 2025 01:48:03 +0000 (17:48 -0800)]
Merge tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Finish the move to custom FFI integer types started in the previous
cycle and finally map 'long' to 'isize' and 'char' to 'u8'. Do a
few cleanups on top thanks to that.
- Start to use 'derive(CoercePointee)' on Rust >= 1.84.0.
This is a major milestone on the path to build the kernel using
only stable Rust features. In particular, previously we were using
the unstable features 'coerce_unsized', 'dispatch_from_dyn' and
'unsize', and now we will use the new 'derive_coerce_pointee' one,
which is on track to stabilization. This new feature is a macro
that essentially expands into code that internally uses the
unstable features that we were using before, without having to
expose those.
With it, stable Rust users, including the kernel, will be able to
build custom smart pointers that work with trait objects, e.g.:
fn f(p: &Arc<dyn Display>) {
pr_info!("{p}\n");
}
let a: Arc<dyn Display> = Arc::new(42i32, GFP_KERNEL)?;
let b: Arc<dyn Display> = Arc::new("hello there", GFP_KERNEL)?;
f(&a); // Prints "42".
f(&b); // Prints "hello there".
Together with the 'arbitrary_self_types' feature that we started
using in the previous cycle, using our custom smart pointers like
'Arc' will eventually only rely in stable Rust.
- Introduce 'PROCMACROLDFLAGS' environment variable to allow to link
Rust proc macros using different flags than those used for linking
Rust host programs (e.g. when 'rustc' uses a different C library
than the host programs' one), which Android needs.
- Help kernel builds under macOS with Rust enabled by accomodating
other naming conventions for dynamic libraries (i.e. '.so' vs.
'.dylib') which are used for Rust procedural macros. The actual
support for macOS (i.e. the rest of the pieces needed) is provided
out-of-tree by others, following the policy used for other parts of
the kernel by Kbuild.
- Run Clippy for 'rusttest' code too and clean the bits it spotted.
- Provide Clippy with the minimum supported Rust version to improve
the suggestions it gives.
- Document 'bindgen' 0.71.0 regression.
'kernel' crate:
- 'build_error!': move users of the hidden function to the documented
macro, prevent such uses in the future by moving the function
elsewhere and add the macro to the prelude.
- 'types' module: add improved version of 'ForeignOwnable::borrow_mut'
(which was removed in the past since it was problematic); change
'ForeignOwnable' pointer type to '*mut'.
- 'alloc' module: implement 'Display' for 'Box' and align the 'Debug'
implementation to it; add example (doctest) for 'ArrayLayout::new()'
- 'sync' module: document 'PhantomData' in 'Arc'; use
'NonNull::new_unchecked' in 'ForeignOwnable for Arc' impl.
- 'uaccess' module: accept 'Vec's with different allocators in
'UserSliceReader::read_all'.
- 'workqueue' module: enable run-testing a couple more doctests.
- 'error' module: simplify 'from_errno()'.
- 'block' module: fix formatting in code documentation (a lint to catch
these is being implemented).
- Avoid 'unwrap()'s in doctests, which also improves the examples by
showing how kernel code is supposed to be written.
- Avoid 'as' casts with 'cast{,_mut}' calls which are a bit safer.
And a few other cleanups"
* tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (32 commits)
kbuild: rust: add PROCMACROLDFLAGS
rust: uaccess: generalize userSliceReader to support any Vec
rust: kernel: add improved version of `ForeignOwnable::borrow_mut`
rust: kernel: reorder `ForeignOwnable` items
rust: kernel: change `ForeignOwnable` pointer to mut
rust: arc: split unsafe block, add missing comment
rust: types: avoid `as` casts
rust: arc: use `NonNull::new_unchecked`
rust: use derive(CoercePointee) on rustc >= 1.84.0
rust: alloc: add doctest for `ArrayLayout::new()`
rust: init: update `stack_try_pin_init` examples
rust: error: import `kernel`'s `LayoutError` instead of `core`'s
rust: str: replace unwraps with question mark operators
rust: page: remove unnecessary helper function from doctest
rust: rbtree: remove unwrap in asserts
rust: init: replace unwraps with question mark operators
rust: use host dylib naming convention to support macOS
rust: add `build_error!` to the prelude
rust: kernel: move `build_error` hidden function to prevent mistakes
rust: use the `build_error!` macro, not the hidden function
...
Linus Torvalds [Wed, 22 Jan 2025 01:10:05 +0000 (17:10 -0800)]
Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
"Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never
execute relevant code on any other CPU. This is currently handled
by smpboot code which takes care of CPU-hotplug operations.
Affinity here is a correctness constraint.
2) Some kthreads _have_ to be affine to a specific set of CPUs and
can't run anywhere else. The affinity is set through
kthread_bind_mask() and the subsystem takes care by itself to
handle CPU-hotplug operations. Affinity here is assumed to be a
correctness constraint.
3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
This is not a correctness constraint but merely a preference in
terms of memory locality. kswapd and kcompactd both fall into this
category. The affinity is set manually like for any other task and
CPU-hotplug is supposed to be handled by the relevant subsystem so
that the task is properly reaffined whenever a given CPU from the
node comes up. Also care should be taken so that the node affinity
doesn't cross isolated (nohz_full) cpumask boundaries.
4) Similar to the previous point except kthreads have a _preferred_
affinity different than a node. Both RCU boost kthreads and RCU
exp kworkers fall into this category as they refer to "RCU nodes"
from a distinctly distributed tree.
Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with more or less success when it comes to handle
CPU-hotplug operations and CPU isolation. Each of which do it in its
own ad-hoc way.
This is an infrastructure proposal to handle this with the following
API changes:
- kthread_create_on_node() automatically affines the created kthread
to its target node unless it has been set as per-cpu or bound with
kthread_bind[_mask]() before the first wake-up.
- kthread_affine_preferred() is a new function that can be called
right after kthread_create_on_node() to specify a preferred
affinity different than the specified node.
When the preferred affinity can't be applied because the possible
targets are offline or isolated (nohz_full), the kthread is affine to
the housekeeping CPUs (which means to all online CPUs most of the time
or only the non-nohz_full CPUs when nohz_full= is set).
kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
converted, along with a few old drivers.
Summary of the changes:
- Consolidate a bunch of ad-hoc implementations of
kthread_run_on_cpu()
- Introduce task_cpu_fallback_mask() that defines the default last
resort affinity of a task to become nohz_full aware
- Add some correctness check to ensure kthread_bind() is always
called before the first kthread wake up.
- Default affine kthread to its preferred node.
- Convert kswapd / kcompactd and remove their halfway working ad-hoc
affinity implementation
- Implement kthreads preferred affinity
- Unify kthread worker and kthread API's style
- Convert RCU kthreads to the new API and remove the ad-hoc affinity
implementation"
* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
kthread: modify kernel-doc function name to match code
rcu: Use kthread preferred affinity for RCU exp kworkers
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
rcu: Use kthread preferred affinity for RCU boost
kthread: Implement preferred affinity
mm: Create/affine kswapd to its preferred node
mm: Create/affine kcompactd to its preferred node
kthread: Default affine kthread to its preferred NUMA node
kthread: Make sure kthread hasn't started while binding it
sched,arm64: Handle CPU isolation on last resort fallback rq selection
arm64: Exclude nohz_full CPUs from 32bits el0 support
lib: test_objpool: Use kthread_run_on_cpu()
kallsyms: Use kthread_run_on_cpu()
soc/qman: test: Use kthread_run_on_cpu()
arm/bL_switcher: Use kthread_run_on_cpu()
Linus Torvalds [Wed, 22 Jan 2025 00:09:47 +0000 (16:09 -0800)]
Merge tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
Linus Torvalds [Tue, 21 Jan 2025 23:19:20 +0000 (15:19 -0800)]
Merge tag 'trace-sorttable-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull scipts/sorttable updates from Steven Rostedt:
"The sorttable.c was a copy from recordmcount.c which is very hard to
maintain. That's because it uses macro helpers and places the code in
a header file sorttable.h to handle both the 64 bit and 32 bit version
of the Elf structures. It also uses _r()/r()/r2() wrappers around
accessing the data which will read the 64 bit or 32 bit version of the
data as well as handle endianess. If the wrong wrapper is used, an
invalid value will result, and this has been a cause for bugs in the
past. In fact the new ORC code doesn't even use it. That's fine
because ORC is only for 64 bit x86 which is the default parsing.
Instead of having a bunch of macros defined and then include the code
twice from a header, the Elf structures are each wrapped in a union.
The union holds the 64 bit and 32 bit version of the needed structure.
Then a structure of function pointers is used, along with helper
macros to access the ELF types appropriately for their byte size and
endianess. How to reference the data fields is moved from the code
that implements the sorting to the helper functions where all accesses
to a field will use he same helper function. As long as the helper
functions access the fields correctly, the code will also access the
fields. This is an improvement over having to code implementing the
sorting having to make sure it always uses the right accessor function
when reading an ELF field.
This is a clean up only, the functionality of the scripts/sorttable.c
does not change"
* tag 'trace-sorttable-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
scripts/sorttable: Use a structure of function pointers for elf helpers
scripts/sorttable: Get start/stop_mcount_loc from ELF file directly
scripts/sorttable: Move code from sorttable.h into sorttable.c
scripts/sorttable: Use uint64_t for mcount sorting
scripts/sorttable: Add helper functions for Elf_Sym
scripts/sorttable: Add helper functions for Elf_Shdr
scripts/sorttable: Add helper functions for Elf_Ehdr
scripts/sorttable: Convert Elf_Sym MACRO over to a union
scripts/sorttable: Replace Elf_Shdr Macro with a union
scripts/sorttable: Convert Elf_Ehdr to union
scripts/sorttable: Make compare_extable() into two functions
scripts/sorttable: Have the ORC code use the _r() functions to read
scripts/sorttable: Remove unneeded Elf_Rel
scripts/sorttable: Remove unused write functions
scripts/sorttable: Remove unused macro defines
Linus Torvalds [Tue, 21 Jan 2025 23:15:28 +0000 (15:15 -0800)]
Merge tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
Linus Torvalds [Tue, 21 Jan 2025 23:11:54 +0000 (15:11 -0800)]
Merge tag 'trace-ringbuffer-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace ring-buffer updates from Steven Rostedt:
- Clean up the __rb_map_vma() logic
The logic of __rb_map_vma() has a error check with WARN_ON() that
makes sure that the index does not go past the end of the array of
buffers. The test in the loop pretty much guarantees that it will
never happen, but since the relation of the variables used is a
little complex, the WARN_ON() check was added. It was noticed that
the array was dereferenced before this check and if the logic does
break and for some reason the logic goes past the array, there will
be an out of bounds access here. Move the access to after the
WARN_ON().
- Consolidate how the ring buffer is determined to be empty
Currently there's two ways that are used to determine if the ring
buffer is empty. One relies on the status of the commit and reader
pages and what was read, and the other is on what was written vs what
was read. By using the number of entries (written) method, it can be
used for reading events that are out of the kernel's control (what
pKVM will use). Move to this method to make it easier to implement a
pKVM ring buffer that the kernel can read.
* tag 'trace-ringbuffer-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Make reading page consistent with the code logic
ring-buffer: Check for empty ring-buffer with rb_num_of_entries()
Linus Torvalds [Tue, 21 Jan 2025 22:39:21 +0000 (14:39 -0800)]
Merge tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Uladzislau Rezki:
"Misc fixes:
- check if IRQs are disabled in rcu_exp_need_qs()
- instrument KCSAN exclusive-writer assertions
- add extra WARN_ON_ONCE() check
- set the cpu_no_qs.b.exp under lock
- warn if callback enqueued on offline CPU
Torture-test updates:
- add rcutorture.preempt_duration kernel module parameter
- make the TREE03 scenario do preemption
- improve pooling timeouts for rcu_torture_writer()
- improve output of "Failure/close-call rcutorture reader segments"
- add some reader-state debugging checks
- update doc of polled APIs
- add extra diagnostics for per-reader-segment preemption
- add an extra test for sched_clock()
- improve testing on unresponsive systems
SRCU updates:
- improve doc for srcu_read_lock() in terms of return value
- fix typo in comments
- remove redundant GP sequence checks in the srcu_funnel_gp_start"
* tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start
srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/
srcu: Guarantee non-negative return value from srcu_read_lock()
MAINTAINERS: Update RCU git tree
rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
rcu: Make preemptible rcu_exp_handler() check idempotency
rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
rcu: Report callbacks enqueued on offline CPU blind spot
rcutorture: Use symbols for SRCU reader flavors
rcutorture: Add per-reader-segment preemption diagnostics
rcutorture: Read CPU ID for decoration protected by both reader types
rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics
rcutorture: Add parameters to control polled/conditional wait interval
rcutorture: Add documentation for recent conditional and polled APIs
rcutorture: Ignore attempts to test preemption and forward progress
rcutorture: Make rcutorture_one_extend() check reader state
rcutorture: Pretty-print rcutorture reader segments
...