]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
3 months agomm: remove the for_reclaim field from struct writeback_control
Christoph Hellwig [Tue, 10 Jun 2025 05:49:42 +0000 (07:49 +0200)]
mm: remove the for_reclaim field from struct writeback_control

This field is now only set to one in the i915 gem code that only calls
writeback_iter on it, which ignores the flag.  All other checks are thuse
dead code and the field can be removed.

Link: https://lkml.kernel.org/r/20250610054959.2057526-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: stop passing a writeback_control structure to swap_writeout
Christoph Hellwig [Tue, 10 Jun 2025 05:49:41 +0000 (07:49 +0200)]
mm: stop passing a writeback_control structure to swap_writeout

swap_writeout only needs the swap_iocb cookie from the writeback_control
structure, so pass it explicitly.

Link: https://lkml.kernel.org/r/20250610054959.2057526-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: stop passing a writeback_control structure to __swap_writepage
Christoph Hellwig [Tue, 10 Jun 2025 05:49:40 +0000 (07:49 +0200)]
mm: stop passing a writeback_control structure to __swap_writepage

__swap_writepage only needs the swap_iocb cookie from the
writeback_control structure, so pass it explicitly and remove the now
unused swap_iocb member from struct writeback_control.

Link: https://lkml.kernel.org/r/20250610054959.2057526-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: tidy up swap_writeout
Christoph Hellwig [Tue, 10 Jun 2025 05:49:39 +0000 (07:49 +0200)]
mm: tidy up swap_writeout

Use a goto label to consolidate the unlock folio and return pattern and
don't bother with an else after a return / goto.

Link: https://lkml.kernel.org/r/20250610054959.2057526-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: stop passing a writeback_control structure to shmem_writeout
Christoph Hellwig [Tue, 10 Jun 2025 05:49:38 +0000 (07:49 +0200)]
mm: stop passing a writeback_control structure to shmem_writeout

shmem_writeout only needs the swap_iocb cookie and the split folio list.
Pass those explicitly and remove the now unused list member from struct
writeback_control.

Link: https://lkml.kernel.org/r/20250610054959.2057526-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: split out a writeout helper from pageout
Christoph Hellwig [Tue, 10 Jun 2025 05:49:37 +0000 (07:49 +0200)]
mm: split out a writeout helper from pageout

Patch series "stop passing a writeback_control to swap/shmem writeout",
v3.

This series was intended to remove the last remaining users of
AOP_WRITEPAGE_ACTIVATE after my other pending patches removed the rest,
but spectacularly failed at that.

But instead it nicely improves the code, and removes two pointers from
struct writeback_control.

This patch (of 6):

Move the code to write back swap / shmem folios into a self-contained
helper to keep prepare for refactoring it.

Link: https://lkml.kernel.org/r/20250610054959.2057526-1-hch@lst.de
Link: https://lkml.kernel.org/r/20250610054959.2057526-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Baolin Wang <baolin.wang@linux.aibaba.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm, list_lru: refactor the locking code
Kairui Song [Mon, 26 May 2025 18:06:38 +0000 (02:06 +0800)]
mm, list_lru: refactor the locking code

Cocci is confused by the try lock then release RCU and return logic here.
So separate the try lock part out into a standalone helper.  The code is
easier to follow too.

No feature change, fixes:

cocci warnings: (new ones prefixed by >>)
>> mm/list_lru.c:82:3-9: preceding lock on line 77
>> mm/list_lru.c:82:3-9: preceding lock on line 77
   mm/list_lru.c:82:3-9: preceding lock on line 75
   mm/list_lru.c:82:3-9: preceding lock on line 75

Link: https://lkml.kernel.org/r/20250526180638.14609-1-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202505252043.pbT1tBHJ-lkp@intel.com/
Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: rename CONFIG_PAGE_BLOCK_ORDER to CONFIG_PAGE_BLOCK_MAX_ORDER
Zi Yan [Wed, 4 Jun 2025 21:14:27 +0000 (17:14 -0400)]
mm: rename CONFIG_PAGE_BLOCK_ORDER to CONFIG_PAGE_BLOCK_MAX_ORDER

The config is in fact an additional upper limit of pageblock_order, so
rename it to avoid confusion.

Link: https://lkml.kernel.org/r/20250604211427.1590859-1-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Juan Yescas <jyescas@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/gup: remove (VM_)BUG_ONs
David Hildenbrand [Wed, 4 Jun 2025 14:05:44 +0000 (16:05 +0200)]
mm/gup: remove (VM_)BUG_ONs

Especially once we hit one of the assertions in
sanity_check_pinned_pages(), observing follow-up assertions failing in
other code can give good clues about what went wrong, so use
VM_WARN_ON_ONCE instead.

While at it, let's just convert all VM_BUG_ON to VM_WARN_ON_ONCE as well.
Add one comment for the pfn_valid() check.

We have to introduce VM_WARN_ON_ONCE_VMA() to make that fly.

Drop the BUG_ON after mmap_read_lock_killable(), if that ever returns
something > 0 we're in bigger trouble.  Convert the other BUG_ON's into
VM_WARN_ON_ONCE as well, they are in a similar domain "should never
happen", but more reasonable to check for during early testing.

[david@redhat.com: use the _FOLIO variant where possible, per Lorenzo]
Link: https://lkml.kernel.org/r/844bd929-a551-48e3-a12e-285cd65ba580@redhat.com
Link: https://lkml.kernel.org/r/20250604140544.688711-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoDocs/admin-guide/mm/damon: add DAMON_STAT usage document
SeongJae Park [Wed, 4 Jun 2025 18:31:27 +0000 (11:31 -0700)]
Docs/admin-guide/mm/damon: add DAMON_STAT usage document

Document DAMON_STAT usage and add a link to it on DAMON admin-guide page.

Link: https://lkml.kernel.org/r/20250604183127.13968-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/damon/stat: calculate and expose idle time percentiles
SeongJae Park [Wed, 4 Jun 2025 18:31:26 +0000 (11:31 -0700)]
mm/damon/stat: calculate and expose idle time percentiles

Knowing how much memory is how cold can be useful for understanding
coldness and utilization efficiency of memory.  The raw form of DAMON's
monitoring results has the information.  Convert the raw results into the
per-byte idle time distributions and expose it as percentiles metric to
users, as a read-only DAMON_STAT parameter.

In detail, the metrics are calculated as follows.  First, DAMON's
per-region access frequency and age information is converted into per-byte
idle time.  If access frequency of a region is higher than zero, every
byte of the region has zero idle time.  If the access frequency of a
region is zero, every byte of the region has idle time as the age of the
region.  Then the logic sorts the per-byte idle times and provides the
value at 0/100, 1/100, ..., 99/100 and 100/100 location of the sorted
array.

The metric can be easily aggregated and compared on large scale production
systems.  For example, if an average of 75-th percentile idle time of
machines that collected on similar time is two minutes, it means the
system's 25 percent memory is not accessed at all for two minutes or more
on average.  If a workload considers two minutes as unit work time, we can
conclude its working set size is only 75 percent of the memory.  If the
system utilizes proactive reclamation and it supports coldness-based
thresholds like DAMON_RECLAIM, the idle time percentiles can be used to
find a more safe or aggressive coldness threshold for aimed memory saving.

Link: https://lkml.kernel.org/r/20250604183127.13968-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/damon/stat: calculate and expose estimated memory bandwidth
SeongJae Park [Wed, 4 Jun 2025 18:31:25 +0000 (11:31 -0700)]
mm/damon/stat: calculate and expose estimated memory bandwidth

The raw form of DAMON's monitoring results captures many details of the
information.  However, not every bit of the information is always required
for understanding practical access patterns.  Especially on real world
production systems of high scale time and size, the raw form is difficult
to be aggregated and compared.

Convert the raw monitoring results into a single number metric, namely
estimated memory bandwidth and expose it to users as a read-only
DAMON_STAT parameter.  The metric represents access intensiveness
(hotness) of the system.  It can easily be aggregated and compared for
high level understanding of the access pattern on large systems.

Link: https://lkml.kernel.org/r/20250604183127.13968-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/damon: introduce DAMON_STAT module
SeongJae Park [Wed, 4 Jun 2025 18:31:24 +0000 (11:31 -0700)]
mm/damon: introduce DAMON_STAT module

Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.

DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations.  Introduce a static kernel module for making
it simple.  The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.

Background and Problems
=======================

DAMON can be used for monitoring data access patterns of the system and
workloads.  Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations.  The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.

The monitoring usage is not simple and practical enough for production
usage.  Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age).  In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.

The monitoring result is also too detailed to be used on production
environments.  The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.

Users have to implement and use their own automation of DAMON control and
results processing.  It is repetitive and challenging since there is no
good reference or guideline for such automation.

Solution: DAMON_STAT
====================

Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT.  It can be enabled at build, boot, or run time via its
build configuration or module parameter.  It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead.  It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.

Understanding of the metrics and the user interface of DAMON_STAT is
essential.  Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics.  For the
user interface, the standard module parameters system is used.  Refer to
the fourth patch of this patch series for details of the user interface.

Discussions
===========

The module aims to be useful on production environments constructed with a
large number of machines that run a long time.  The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs.  The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production.  The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern.  We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.

We hence expect the module is good enough to be just used in most
environments.  For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.

This patch (of 4):

To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals.  After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results.  It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.

For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed.  We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.

Implement such static kernel module for access monitoring, namely
DAMON_STAT.  It monitors the entire physical address space with auto-tuned
monitoring intervals.  The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum.  From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads.  The module therefore receives only one
user input, whether to enable or disable it.  It can be set on build or
boot time via build configuration or kernel boot command line.  It can
also be overridden at runtime.

Note that this commit only implements the DAMON control part of the
module.  Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way.  Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.

[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: remove unused mmap tracepoints
Caleb Sander Mateos [Fri, 11 Apr 2025 16:17:45 +0000 (10:17 -0600)]
mm: remove unused mmap tracepoints

The vma_mas_szero and vma_store tracepoints are unused since commit
fbcc3104b843 ("mmap: convert __vma_adjust() to use vma iterator").  Remove
them so they are no longer listed as available tracepoints.

Link: https://lkml.kernel.org/r/20250411161746.1043239-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reported-by: Eric Mueller <emueller@purestorage.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: Kconfig: use verb *use* in plural form in description
Paul Menzel [Tue, 3 Jun 2025 06:13:03 +0000 (08:13 +0200)]
mm: Kconfig: use verb *use* in plural form in description

*workloads* is plural requiring the verb *use* in plural form.

Link: https://lkml.kernel.org/r/20250603061303.479551-2-pmenzel@molgen.mpg.de
Fixes: e13e7922d034 ("mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order")
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/hugetlb: convert hugetlb_change_protection() to folios
Sidhartha Kumar [Wed, 28 May 2025 19:20:13 +0000 (15:20 -0400)]
mm/hugetlb: convert hugetlb_change_protection() to folios

The for loop inside hugetlb_change_protection() increments by the huge
page size:

psize = huge_page_size(h);
for (; address < end; address += psize)

so we are operating on the head page of the huge pages between address and
end.  We can safely convert the struct page usage to struct folio.

Link: https://lkml.kernel.org/r/20250528192013.91130-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agotools/testing/selftests: add VMA merge tests for KSM merge
Lorenzo Stoakes [Thu, 29 May 2025 17:15:48 +0000 (18:15 +0100)]
tools/testing/selftests: add VMA merge tests for KSM merge

Add test to assert that we have now allowed merging of VMAs when KSM
merging-by-default has been set by prctl(PR_SET_MEMORY_MERGE, ...).

We simply perform a trivial mapping of adjacent VMAs expecting a merge,
however prior to recent changes implementing this mode earlier than
before, these merges would not have succeeded.

Assert that we have fixed this!

Link: https://lkml.kernel.org/r/6dec7aabf062c6b121cfac992c9c716cefdda00c.1748537921.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Tested-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: prevent KSM from breaking VMA merging for new VMAs
Lorenzo Stoakes [Thu, 29 May 2025 17:15:47 +0000 (18:15 +0100)]
mm: prevent KSM from breaking VMA merging for new VMAs

If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.

This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this
flag for all existing VMAs and propagating this across fork/exec.

However it also breaks VMA merging for new VMAs, both in the process and
all forked (and fork/exec'd) child processes.

This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set.  However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.

To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge.  In the case of brk() this can always be done.

However on mmap() things are more complicated - while KSM is not supported
for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.

These mappings may have deprecated .mmap() callbacks specified which
could, in theory, adjust flags and thus KSM eligibility.

So we check to determine whether this is possible.  If not, we set
VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
previous behaviour.

This fixes VMA merging for all new anonymous mappings, which covers the
majority of real-world cases, so we should see a significant improvement
in VMA mergeability.

For MAP_PRIVATE file-backed mappings, those which implement the
.mmap_prepare() hook and shmem are both known to be safe, so we allow
these, disallowing all other cases.

Also add stubs for newly introduced function invocations to VMA userland
testing.

[lorenzo.stoakes@oracle.com: correctly invoke late KSM check after mmap hook]
Link: https://lkml.kernel.org/r/5861f8f6-cf5a-4d82-a062-139fb3f9cddb@lucifer.local
Link: https://lkml.kernel.org/r/3ba660af716d87a18ca5b4e635f2101edeb56340.1748537921.git.lorenzo.stoakes@oracle.com
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible()
Lorenzo Stoakes [Thu, 29 May 2025 17:15:46 +0000 (18:15 +0100)]
mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible()

There's no need to spell out all the special cases, also doing it this way
makes it absolutely clear that we preclude unmergeable VMAs in general,
and puts the other excluded flags in stark and clear contrast.

Link: https://lkml.kernel.org/r/c8be5b055163b164c8824020164076ee3b9389bd.1748537921.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: ksm: have KSM VMA checks not require a VMA pointer
Lorenzo Stoakes [Thu, 29 May 2025 17:15:45 +0000 (18:15 +0100)]
mm: ksm: have KSM VMA checks not require a VMA pointer

Patch series "mm: ksm: prevent KSM from breaking merging of new VMAs", v3.

When KSM-by-default is established using prctl(PR_SET_MEMORY_MERGE), this
defaults all newly mapped VMAs to having VM_MERGEABLE set, and thus makes
them available to KSM for samepage merging.  It also sets VM_MERGEABLE in
all existing VMAs.

However this causes an issue upon mapping of new VMAs - the initial flags
will never have VM_MERGEABLE set when attempting a merge with adjacent
VMAs (this is set later in the mmap() logic), and adjacent VMAs will
ALWAYS have VM_MERGEABLE set.

This renders all newly mapped VMAs unmergeable.

To avoid this, this series performs the check for PR_SET_MEMORY_MERGE far
earlier in the mmap() logic, prior to the merge being attempted.

However we run into complexity with the depreciated .mmap() callback - if
a driver hooks this, it might change flags which adjust KSM merge
eligibility.

We have to worry about this because, while KSM is only applicable to
private mappings, this includes both anonymous and MAP_PRIVATE-mapped
file-backed mappings.

This isn't a problem for brk(), where the VMA must be anonymous.  However
in mmap() we must be conservative - if the VMA is anonymous then we can
always proceed, however if not, we permit only shmem mappings (whose .mmap
hook does not affect KSM eligibility) and drivers which implement
.mmap_prepare() (invoked prior to the KSM eligibility check).

If we can't be sure of the driver changing things, then we maintain the
same behaviour of performing the KSM check later in the mmap() logic (and
thus losing new VMA mergeability).

A great many use-cases for this logic will use anonymous mappings any
rate, so this change should already cover the majority of actual KSM
use-cases.

This patch (of 4):

In subsequent commits we are going to determine KSM eligibility prior to a
VMA being constructed, at which point we will of course not yet have
access to a VMA pointer.

It is trivial to boil down the check logic to be parameterised on
mm_struct, file and VMA flags, so do so.

As a part of this change, additionally expose and use file_is_dax() to
determine whether a file is being mapped under a DAX inode.

Link: https://lkml.kernel.org/r/cover.1748537921.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/36ad13eb50cdbd8aac6dcfba22c65d5031667295.1748537921.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agotools/mm: add script to display page state for a given PID and VADDR
Ye Liu [Fri, 30 May 2025 05:58:55 +0000 (13:58 +0800)]
tools/mm: add script to display page state for a given PID and VADDR

Introduces a new drgn script, `show_page_info.py`, which allows users
to analyze the state of a page given a process ID (PID) and a virtual
address (VADDR).  This can help kernel developers or debuggers easily
inspect page-related information in a live kernel or vmcore.

The script extracts information such as the page flags, mapping, and
other metadata relevant to diagnosing memory issues.

Output example:
sudo ./show_page_info.py 1 0x7fc988181000
PID: 1 Comm: systemd mm: 0xffff8d22c4089700
RAW: 0017ffffc000416c fffff939062ff708 fffff939062ffe08 ffff8d23062a12a8
RAW: 0000000000000000 ffff8d2323438f60 0000002500000007 ffff8d23203ff500
Page Address:    0xfffff93905664e00
Page Flags:      PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|
                 PG_private|PG_reported|PG_has_hwpoisoned
Page Size:       4096
Page PFN:        0x159938
Page Physical:   0x159938000
Page Virtual:    0xffff8d2319938000
Page Refcount:   37
Page Mapcount:   7
Page Index:      0x0
Page Memcg Data: 0xffff8d23203ff500
Memcg Name:      init.scope
Memcg Path:      /sys/fs/cgroup/memory/init.scope
Page Mapping:    0xffff8d23062a12a8
Page Anon/File:  File
Page VMA:        0xffff8d22e06e0e40
VMA Start:       0x7fc988181000
VMA End:         0x7fc988185000
This page is part of a compound page.
This page is the head page of a compound page.
Head Page:       0xfffff93905664e00
Compound Order:  2
Number of Pages: 4

Link: https://lkml.kernel.org/r/20250530055855.687067-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Tested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: vmscan: apply proportional reclaim pressure for memcg when MGLRU is enabled
Koichiro Den [Fri, 30 May 2025 16:23:53 +0000 (01:23 +0900)]
mm: vmscan: apply proportional reclaim pressure for memcg when MGLRU is enabled

The scan implementation for MGLRU was missing proportional reclaim
pressure for memcg, which contradicts the description in
Documentation/admin-guide/cgroup-v2.rst (memory.{low,min} section).

This issue can be observed in kselftest cgroup:test_memcontrol
(specifically test_memcg_min and test_memcg_low).  The following table
shows the actual values observed in my local test env (on xfs) and the
error "e", which is the symmetric absolute percentage error from the ideal
values of 29M for c[0] and 21M for c[1].

  test_memcg_min

         | MGLRU enabled   | MGLRU enabled   | MGLRU disabled
         | Without patch   | With patch      |
    -----|-----------------|-----------------|---------------
    c[0] | 25964544 (e=8%) | 28770304 (e=3%) | 27820032 (e=4%)
    c[1] | 26214400 (e=9%) | 23998464 (e=4%) | 24776704 (e=6%)

  test_memcg_low

         | MGLRU enabled   | MGLRU enabled   | MGLRU disabled
         | Without patch   | With patch      |
    -----|-----------------|-----------------|---------------
    c[0] | 26214400 (e=7%) | 27930624 (e=4%) | 27688960 (e=5%)
    c[1] | 26214400 (e=9%) | 24764416 (e=6%) | 24920064 (e=6%)

Factor out the proportioning logic to a new function and have MGLRU reuse
it.  While at it, update the eviction behavior via debugfs 'lru_gen'
interface ('-' command with an explicit 'nr_to_reclaim' parameter) to
ensure eviction is limited to the specified number.

Link: https://lkml.kernel.org/r/20250530162353.541882-1-den@valinux.co.jp
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Reviewed-by: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agodocs/mm: expand vma doc to highlight pte freeing, non-vma traversal
Lorenzo Stoakes [Wed, 4 Jun 2025 18:03:08 +0000 (19:03 +0100)]
docs/mm: expand vma doc to highlight pte freeing, non-vma traversal

The process addresses documentation already contains a great deal of
information about mmap/VMA locking and page table traversal and
manipulation.

However it waves it hands about non-VMA traversal.  Add a section for this
and explain the caveats around this kind of traversal.

Additionally, commit 6375e95f381e ("mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)") caused zapping to also free empty PTE page
tables.  Highlight this.

Link: https://lkml.kernel.org/r/20250604180308.137116-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: restore documentation for __free_pages()
Matthew Wilcox (Oracle) [Wed, 4 Jun 2025 19:03:26 +0000 (20:03 +0100)]
mm: restore documentation for __free_pages()

The documentation was converted to be for ___free_pages(), which doesn't
need documentation as it's static.

Link: https://lkml.kernel.org/r/20250604190327.814086-1-willy@infradead.org
Fixes: 8c57b687e833 (mm, bpf: Introduce free_pages_nolock())
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoLinux 6.16-rc5
Linus Torvalds [Sun, 6 Jul 2025 21:10:26 +0000 (14:10 -0700)]
Linux 6.16-rc5

3 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 6 Jul 2025 20:10:39 +0000 (13:10 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull /proc/sys dcache lookup fix from Al Viro:
 "Fix for the breakage spotted by Neil in the interplay between
  /proc/sys ->d_compare() weirdness and parallel lookups"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix proc_sys_compare() handling of in-lookup dentries

3 months agoMerge tag 'sched_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 18:17:47 +0000 (11:17 -0700)]
Merge tag 'sched_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Fix the calculation of the deadline server task's runtime as this
   mishap was preventing realtime tasks from running

 - Avoid a race condition during migrate-swapping two tasks

 - Fix the string reported for the "none" dynamic preemption option

* tag 'sched_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix dl_server runtime calculation formula
  sched/core: Fix migrate_swap() vs. hotplug
  sched: Fix preemption string of preempt_dynamic_none

3 months agoMerge tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 17:55:59 +0000 (10:55 -0700)]
Merge tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:

 - Fix the compilation of an x86 kernel on a big engian machine due to a
   missed endianness conversion

* tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Add missing endian conversion to read_annotate()

3 months agoMerge tag 'perf_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 17:49:27 +0000 (10:49 -0700)]
Merge tag 'perf_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Revert uprobes to using CAP_SYS_ADMIN again as currently they can
   destructively modify kernel code from an unprivileged process

 - Move a warning to where it belongs

* tag 'perf_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Revert to requiring CAP_SYS_ADMIN for uprobes
  perf/core: Fix the WARN_ON_ONCE is out of lock protected region

3 months agoMerge tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 17:44:20 +0000 (10:44 -0700)]
Merge tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which
   prevents their TSCs from going skewed from the hypervisor's

* tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Use TSC_FACTOR for Secure TSC frequency calculation

3 months agoMerge tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 17:38:04 +0000 (10:38 -0700)]
Merge tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

 - Disable FUTEX_PRIVATE_HASH for this cycle due to a performance
   regression

 - Add a selftests compilation product to the corresponding .gitignore
   file

* tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/futex: Add futex_numa to .gitignore
  futex: Temporary disable FUTEX_PRIVATE_HASH

3 months agoMerge tag 'edac_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 16:29:24 +0000 (09:29 -0700)]
Merge tag 'edac_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

 - Initialize sysfs attributes properly to avoid lockdep complaining
   about an uninitialized lock class

* tag 'edac_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Initialize EDAC features sysfs attributes

3 months agoMerge tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 16:17:48 +0000 (09:17 -0700)]
Merge tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fixes from Borislav Petkov:

 - Do not remove the MCE sysfs hierarchy if thresholding sysfs nodes
   init fails due to new/unknown banks present, which in itself is not
   fatal anyway; add default names for new banks

 - Make sure MCE polling settings are honored after CMCI storms

 - Make sure MCE threshold limit is reset after the thresholding
   interrupt has been serviced

 - Clean up properly and disable CMCI banks on shutdown so that a
   second/kexec-ed kernel can rediscover those banks again

* tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make sure CMCI banks are cleared during shutdown on Intel
  x86/mce/amd: Fix threshold limit reset
  x86/mce/amd: Add default names for MCA banks and blocks
  x86/mce: Ensure user polling settings are honored when restarting timer
  x86/mce: Don't remove sysfs if thresholding sysfs init fails

3 months agoMerge tag 'irq_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Jul 2025 16:16:31 +0000 (09:16 -0700)]
Merge tag 'irq_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Borislav Petkov:

 - Have irq-msi-lib select CONFIG_GENERIC_MSI_IRQ explicitly as it uses
   its facilities

* tag 'irq_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-msi-lib: Select CONFIG_GENERIC_MSI_IRQ

3 months agoselftests/futex: Add futex_numa to .gitignore
Terry Tritton [Fri, 4 Jul 2025 10:37:49 +0000 (11:37 +0100)]
selftests/futex: Add futex_numa to .gitignore

futex_numa was never added to the .gitignore file.
Add it.

Fixes: 9140f57c1c13 ("futex,selftests: Add another FUTEX2_NUMA selftest")
Signed-off-by: Terry Tritton <terry.tritton@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/all/20250704103749.10341-1-terry.tritton@linaro.org
3 months agoMerge tag 'hid-for-linus-2025070502' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 5 Jul 2025 23:14:03 +0000 (16:14 -0700)]
Merge tag 'hid-for-linus-2025070502' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - Memory corruption fixes in hid-appletb-kbd driver (Qasim Ijaz)

 - New device ID in hid-elecom driver (Leonard Dizon)

 - Fixed several HID debugfs contants (Vicki Pfau)

* tag 'hid-for-linus-2025070502' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe
  HID: Fix debug name for BTN_GEAR_DOWN, BTN_GEAR_UP, BTN_WHEEL
  HID: elecom: add support for ELECOM HUGE 019B variant
  HID: appletb-kbd: fix memory corruption of input_handler_list

3 months agoMerge tag 'v6.16-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 5 Jul 2025 20:05:28 +0000 (13:05 -0700)]
Merge tag 'v6.16-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Two reconnect fixes including one for a reboot/reconnect race

 - Fix for incorrect file type that can be returned by SMB3.1.1 POSIX
   extensions

 - tcon initialization fix

 - Fix for resolving Windows symlinks with absolute paths

* tag 'v6.16-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix native SMB symlink traversal
  smb: client: fix race condition in negotiate timeout by using more precise timing
  cifs: all initializations for tcon should happen in tcon_info_alloc
  smb: client: fix warning when reconnecting channel
  smb: client: fix readdir returning wrong type with POSIX extensions

3 months agoMerge tag 'i2c-for-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 5 Jul 2025 19:54:24 +0000 (12:54 -0700)]
Merge tag 'i2c-for-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

 - designware: initialise msg_write_idx during transfer

 - microchip: check return value from core xfer call

 - realtek: add 'reg' property constraint to the device tree

* tag 'i2c-for-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: realtek,rtl9301: Fix missing 'reg' constraint
  i2c: microchip-core: re-fix fake detections w/ i2cdetect
  i2c/designware: Fix an initialization issue

3 months agoMerge tag 'pm-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 5 Jul 2025 00:27:30 +0000 (17:27 -0700)]
Merge tag 'pm-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These address system suspend failures under memory pressure in some
  configurations, fix up RAPL handling on platforms where PL1 cannot be
  disabled, and fix a documentation typo:

   - Prevent the Intel RAPL power capping driver from allowing PL1 to be
     exceeded by mistake on systems when PL1 cannot be disabled (Zhang
     Rui)

   - Fix a typo in the ABI documentation (Sumanth Gavini)

   - Allow swap to be used a bit longer during system suspend and
     hibernation to avoid suspend failures under memory pressure (Mario
     Limonciello)"

* tag 'pm-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: docs: Replace "diasble" with "disable"
  powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed
  PM: Restrict swap use to later in the suspend sequence

3 months agoMerge tag 'acpi-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 5 Jul 2025 00:25:41 +0000 (17:25 -0700)]
Merge tag 'acpi-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Revert a problematic ACPI battery driver change merged recently"

* tag 'acpi-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: battery: negate current when discharging"

3 months agoMerge branch 'pm-sleep'
Rafael J. Wysocki [Fri, 4 Jul 2025 19:54:55 +0000 (21:54 +0200)]
Merge branch 'pm-sleep'

Merge fixes related to system sleep for 6.16-rc5:

 - Fix typo in the ABI documentation (Sumanth Gavini).

 - Allow swap to be used a bit longer during system suspend and
   hibernation to avoid suspend failures under memory pressure (Mario
   Limonciello).

* pm-sleep:
  PM: sleep: docs: Replace "diasble" with "disable"
  PM: Restrict swap use to later in the suspend sequence

3 months agoMerge tag 'soc-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 4 Jul 2025 19:05:36 +0000 (12:05 -0700)]
Merge tag 'soc-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "A couple of fixes for firmware drivers have come up, addressing kernel
  side bugs in op-tee and ff-a code, as well as compatibility issues
  with exynos-acpm and ff-a protocols.

  The only devicetree fixes are for the Apple platform, addressing
  issues with conformance to the bindings for the wlan, spi and mipi
  nodes"

* tag 'soc-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: apple: Move touchbar mipi {address,size}-cells from dtsi to dts
  arm64: dts: apple: Drop {address,size}-cells from SPI NOR
  arm64: dts: apple: t8103: Fix PCIe BCM4377 nodename
  optee: ffa: fix sleep in atomic context
  firmware: exynos-acpm: fix timeouts on xfers handling
  arm64: defconfig: update renamed PHY_SNPS_EUSB2
  firmware: arm_ffa: Fix the missing entry in struct ffa_indirect_msg_hdr
  firmware: arm_ffa: Replace mutex with rwlock to avoid sleep in atomic context
  firmware: arm_ffa: Move memory allocation outside the mutex locking
  firmware: arm_ffa: Fix memory leak by freeing notifier callback node

3 months agoMerge tag 'riscv-for-linus-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Jul 2025 17:23:29 +0000 (10:23 -0700)]
Merge tag 'riscv-for-linus-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - kCFI is restricted to clang-17 or newer, as earlier versions have
   known bugs

 - sbi_hsm_hart_start is now staticly allocated, to avoid tripping up
   the SBI HSM page mapping on sparse systems.

* tag 'riscv-for-linus-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: cpu_ops_sbi: Use static array for boot_data
  riscv: Require clang-17 or newer for kCFI

3 months agoMerge tag 'regulator-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Jul 2025 17:14:49 +0000 (10:14 -0700)]
Merge tag 'regulator-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A few driver fixes (the GPIO one being potentially nasty, though it
  has been there for a while without anyone reporting it), and one core
  fix for the rarely used combination of coupled regulators and
  unbinding"

* tag 'regulator-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: gpio: Fix the out-of-bounds access to drvdata::gpiods
  regulator: mp886x: Fix ID table driver_data
  regulator: sy8824x: Fix ID table driver_data
  regulator: tps65219: Fix devm_kmalloc size allocation
  regulator: core: fix NULL dereference on unbind due to stale coupling data

3 months agoMerge tag 'spi-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Fri, 4 Jul 2025 17:10:49 +0000 (10:10 -0700)]
Merge tag 'spi-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "As well as a few driver specific fixes we've got a core change here
  which raises the hard coded limit on the number of devices we can
  support on one SPI bus since some FPGA based systems are running into
  the existing limit. This is not a good solution but it's one suitable
  for this point in the release cycle, we should dynamically size the
  relevant data structures which I hope will happen in the next couple
  of merge windows.

  We also pull in a MTD fix for the Qualcomm SNAND driver, the two fixes
  cover the same issue and merging them together minimises bisection
  issues"

* tag 'spi-fix-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cadence-quadspi: fix cleanup of rx_chan on failure paths
  spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  spi: Raise limit on number of chip selects to 24
  mtd: nand: qpic_common: prevent out of bounds access of BAM arrays
  spi: spi-qpic-snand: reallocate BAM transactions

3 months agoMerge tag 'platform-drivers-x86-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Jul 2025 17:05:31 +0000 (10:05 -0700)]
Merge tag 'platform-drivers-x86-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:
 "Mostly a few lines fixed here and there except amd/isp4 which improves
  swnodes relationships but that is a new driver not in any stable
  kernels yet. The think-lmi driver changes also look relatively large
  but there are just many fixes to it.

  The i2c/piix4 change is a effectively a revert of the commit
  7e173eb82ae9 ("i2c: piix4: Make CONFIG_I2C_PIIX4 dependent on
  CONFIG_X86") but that required moving the header out from arch/x86
  under include/linux/platform_data/

  Summary:

   - amd/isp4: Improve swnode graph (new driver exception)

   - asus-nb-wmi: Use duo keyboard quirk for Zenbook Duo UX8406CA

   - dell-lis3lv02d: Add Latitude 5500 accelerometer address

   - dell-wmi-sysman: Fix WMI data block retrieval and class dev unreg

   - hp-bioscfg: Fix class device unregistration

   - i2c: piix4: Re-enable on non-x86 + move FCH header under platform_data/

   - intel/hid: Wildcat Lake support

   - mellanox:
      - mlxbf-pmc: Fix duplicate event ID
      - mlxbf-tmfifo: Fix vring_desc.len assignment
      - mlxreg-lc: Fix bit-not-set logic check
      - nvsw-sn2201: Fix bus number in error message & spelling errors

   - portwell-ec: Move watchdog device under correct platform hierarchy

   - think-lmi: Error handling fixes (sysfs, kset, kobject, class dev unreg)

   - thinkpad_acpi: Handle HKEY 0x1402 event (2025 Thinkpads)

   - wmi: Fix WMI event enablement"

* tag 'platform-drivers-x86-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (22 commits)
  platform/x86: think-lmi: Fix sysfs group cleanup
  platform/x86: think-lmi: Fix kobject cleanup
  platform/x86: think-lmi: Create ksets consecutively
  platform/mellanox: mlxreg-lc: Fix logic error in power state check
  i2c: Re-enable piix4 driver on non-x86
  Move FCH header to a location accessible by all archs
  platform/x86/intel/hid: Add Wildcat Lake support
  platform/x86: dell-wmi-sysman: Fix class device unregistration
  platform/x86: think-lmi: Fix class device unregistration
  platform/x86: hp-bioscfg: Fix class device unregistration
  platform/x86: Update swnode graph for amd isp4
  platform/x86: dell-wmi-sysman: Fix WMI data block retrieval in sysfs callbacks
  platform/x86: wmi: Update documentation of WCxx/WExx ACPI methods
  platform/x86: wmi: Fix WMI event enablement
  platform/mellanox: nvsw-sn2201: Fix bus number in adapter error message
  platform/mellanox: Fix spelling and comment clarity in Mellanox drivers
  platform/mellanox: mlxbf-pmc: Fix duplicate event ID for CACHE_DATA1
  platform/x86: thinkpad_acpi: handle HKEY 0x1402 event
  platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8406CA
  platform/x86: dell-lis3lv02d: Add Latitude 5500
  ...

3 months agoMerge tag 'usb-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Fri, 4 Jul 2025 16:57:12 +0000 (09:57 -0700)]
Merge tag 'usb-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some USB driver fixes for 6.16-rc5. I originally wanted this
  to get into -rc4, but there were some regressions that had to be
  handled first. Now all looks good. Included in here are the following
  fixes:

   - cdns3 driver fixes

   - xhci driver fixes

   - typec driver fixes

   - USB hub fixes (this is what took the longest to get right)

   - new USB driver quirks added

   - chipidea driver fixes

  All of these have been in linux-next for a while and now we have no
  more reported problems with them"

* tag 'usb-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
  usb: hub: Fix flushing of delayed work used for post resume purposes
  xhci: dbc: Flush queued requests before stopping dbc
  xhci: dbctty: disable ECHO flag by default
  xhci: Disable stream for xHC controller with XHCI_BROKEN_STREAMS
  usb: xhci: quirk for data loss in ISOC transfers
  usb: dwc3: gadget: Fix TRB reclaim logic for short transfers and ZLPs
  usb: hub: Fix flushing and scheduling of delayed work that tunes runtime pm
  usb: typec: displayport: Fix potential deadlock
  usb: typec: altmodes/displayport: do not index invalid pin_assignments
  usb: cdnsp: Fix issue with CV Bad Descriptor test
  usb: typec: tcpm: apply vbus before data bringup in tcpm_src_attach
  Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
  usb: xhci: Skip xhci_reset in xhci_resume if xhci is being removed
  usb: gadget: u_serial: Fix race condition in TTY wakeup
  Revert "usb: gadget: u_serial: Add null pointer check in gs_start_io"
  usb: chipidea: udc: disconnect/reconnect from host when do suspend/resume
  usb: acpi: fix device link removal
  usb: hub: fix detection of high tier USB3 devices behind suspended hubs
  Logitech C-270 even more broken
  usb: dwc3: Abort suspend on soft disconnect failure
  ...

3 months agoMerge tag 'input-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 4 Jul 2025 16:54:15 +0000 (09:54 -0700)]
Merge tag 'input-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - support for Acer NGR 200 Controller added to xpad driver

 - xpad driver will no longer log errors about URBs at sudden disconnect

 - a fix for potential NULL dereference in cs40l50-vibra driver

 - several drivers have been switched to using scnprintf() to suppress
   warnings about potential output truncation

* tag 'input-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cs40l50-vibra - fix potential NULL dereference in cs40l50_upload_owt()
  Input: alps - use scnprintf() to suppress truncation warning
  Input: iqs7222 - explicitly define number of external channels
  Input: xpad - support Acer NGR 200 Controller
  Input: xpad - return errors from xpad_try_sending_next_out_packet() up
  Input: xpad - adjust error handling for disconnect
  Input: apple_z2 - drop default ARCH_APPLE in Kconfig
  Input: Fully open-code compatible for grepping
  dt-bindings: HID: i2c-hid: elan: Introduce Elan eKTH8D18
  Input: psmouse - switch to use scnprintf() to suppress truncation warning
  Input: lifebook - switch to use scnprintf() to suppress truncation warning
  Input: alps - switch to use scnprintf() to suppress truncation warning
  Input: atkbd - switch to use scnprintf() to suppress truncation warning
  Input: fsia6b - suppress buffer truncation warning for phys
  Input: iqs626a - replace snprintf() with scnprintf()

3 months agoMerge tag 'drm-fixes-2025-07-04' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 4 Jul 2025 16:48:36 +0000 (09:48 -0700)]
Merge tag 'drm-fixes-2025-07-04' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly drm fixes, bit of a bumper crop, the usual amdgpu/xe/i915
  suspects, then there is a large scattering of fixes across core and
  drivers. I think the simple panel lookup fix is probably the largest,
  the sched race fix is also fun, but I don't see anything standing out
  too badly.

  dma-buf:
   - fix timeout handling

  gem:
   - fix framebuffer object references

  sched:
   - fix spsc queue job count race

  bridge:
   - fix aux hpd bridge of node
   - panel: move missing flag handling
   - samsung-dsim: fix %pK usage to %p

  panel:
   - fix problem with simple panel lookup

  ttm:
   - fix error path handling

  amdgpu:
   - SDMA 5.x reset fix
   - Add missing firmware declaration
   - Fix leak in amdgpu_ctx_mgr_entity_fini()
   - Freesync fix
   - OLED backlight fix

  amdkfd:
   - mtype fix for ext coherent system memory
   - MMU notifier fix
   - gfx7/8 fix

  xe:
   - Fix chunking the PTE updates and overflowing the maximum number of
     dwords with with MI_STORE_DATA_IMM
   - Move WA BB to the LRC BO to mitigate hangs on context switch
   - Fix frequency/flush WAs for BMG
   - Fix kconfig prompt title and description
   - Do not require kunit
   - Extend 14018094691 WA to BMG
   - Fix wedging the device on signal

  i915:
   - Make mei interrupt top half irq disabled to fix RT builds
   - Fix timeline left held on VMA alloc error
   - Fix NULL pointer deref in vlv_dphy_param_init()
   - Fix selftest mock_request() to avoid NULL deref

  exynos:
   - switch to using %p instead of %pK
   - fix vblank NULL ptr race
   - fix lockup on samsung peach-pit/pi chromebooks

  vesadrm:
   - NULL ptr fix

  vmwgfx:
   - fix encrypted memory allocation bug

  v3d:
   - fix irq enabled during reset"

* tag 'drm-fixes-2025-07-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits)
  drm/xe: Do not wedge device on killed exec queues
  drm/xe: Extend WA 14018094691 to BMG
  drm/v3d: Disable interrupts before resetting the GPU
  drm/gem: Acquire references on GEM handles for framebuffers
  drm/sched: Increment job count before swapping tail spsc queue
  drm/xe: Allow dropping kunit dependency as built-in
  drm/xe: Fix kconfig prompt
  drm/xe/bmg: Update Wa_22019338487
  drm/xe/bmg: Update Wa_14022085890
  drm/xe: Split xe_device_td_flush()
  drm/xe/xe_guc_pc: Lock once to update stashed frequencies
  drm/xe/guc_pc: Add _locked variant for min/max freq
  drm/xe: Make WA BB part of LRC BO
  drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM
  drm/i915/gsc: mei interrupt top half should be in irq disabled context
  drm/i915/gt: Fix timeline left held on VMA alloc error
  drm/vmwgfx: Fix guests running with TDX/SEV
  drm/amd/display: Don't allow OLED to go down to fully off
  drm/amd/display: Added case for when RR equals panel's max RR using freesync
  drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8
  ...

3 months agoMerge tag 'iommu-fixes-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 4 Jul 2025 16:43:08 +0000 (09:43 -0700)]
Merge tag 'iommu-fixes-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - Rockchip: fix infinite loop caused by probing race condition

 - Intel VT-d: assign devtlb cache tag on ATS enablement

* tag 'iommu-fixes-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/vt-d: Assign devtlb cache tag on ATS enablement
  iommu/rockchip: prevent iommus dead loop when two masters share one IOMMU

3 months agoMerge tag 'block-6.16-20250704' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 4 Jul 2025 16:33:59 +0000 (09:33 -0700)]
Merge tag 'block-6.16-20250704' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe fixes via Christoph:
     - fix incorrect cdw15 value in passthru error logging (Alok Tiwari)
     - fix memory leak of bio integrity in nvmet (Dmitry Bogdanov)
     - refresh visible attrs after being checked (Eugen Hristev)
     - fix suspicious RCU usage warning in the multipath code (Geliang Tang)
     - correctly account for namespace head reference counter (Nilay Shroff)

 - Fix for a regression introduced in ublk in this cycle, where it would
   attempt to queue a canceled request.

 - brd RCU sleeping fix, also introduced in this cycle. Bare bones fix,
   should be improved upon for the next release.

* tag 'block-6.16-20250704' of git://git.kernel.dk/linux:
  brd: fix sleeping function called from invalid context in brd_insert_page()
  ublk: don't queue request if the associated uring_cmd is canceled
  nvme-multipath: fix suspicious RCU usage warning
  nvme-pci: refresh visible attrs after being checked
  nvmet: fix memory leak of bio integrity
  nvme: correctly account for namespace head reference counter
  nvme: Fix incorrect cdw15 value in passthru error logging

3 months agoMerge tag 'i2c-host-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Wolfram Sang [Fri, 4 Jul 2025 16:31:22 +0000 (18:31 +0200)]
Merge tag 'i2c-host-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.16-rc5

designware: initialise msg_write_idx during transfer
microchip: check return value from core xfer call
realtek: add 'reg' property constraint to the device tree

3 months agoMerge tag 'bcachefs-2025-07-03' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Fri, 4 Jul 2025 16:29:22 +0000 (09:29 -0700)]
Merge tag 'bcachefs-2025-07-03' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "The 'opts.casefold_disabled' patch is non critical, but would be a
  6.15 backport; it's to address the casefolding + overlayfs
  incompatibility that was discovvered late.

  It's late because I was hoping that this would be addressed on the
  overlayfs side (and will be in 6.17), but user reports keep coming in
  on this one (lots of people are using docker these days)"

* tag 'bcachefs-2025-07-03' of git://evilpiepirate.org/bcachefs:
  bcachefs: opts.casefold_disabled
  bcachefs: Work around deadlock to btree node rewrites in journal replay
  bcachefs: Fix incorrect transaction restart handling
  bcachefs: fix btree_trans_peek_prev_journal()
  bcachefs: mark invalid_btree_id autofix

3 months agoMerge tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Fri, 4 Jul 2025 16:06:49 +0000 (09:06 -0700)]
Merge tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix a regression caused by the anonymous inode rework. Making them
   regular files causes various places in the kernel to tip over
   starting with io_uring.

   Revert to the former status quo and port our assertion to be based on
   checking the inode so we don't lose the valuable VFS_*_ON_*()
   assertions that have already helped discover weird behavior our
   outright bugs.

 - Fix the the upper bound calculation in fuse_fill_write_pages()

 - Fix priority inversion issues in the eventpoll code

 - Make secretmen use anon_inode_make_secure_inode() to avoid bypassing
   the LSM layer

 - Fix a netfs hang due to missing case in final DIO read result
   collection

 - Fix a double put of the netfs_io_request struct

 - Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag
   wrangling

 - Fix infinite looping in netfs_wait_for_pause/request()

 - Fix a netfs ref leak on an extra subrequest inserted into a request's
   list of subreqs

 - Fix various cifs RPC callbacks to set NETFS_SREQ_NEED_RETRY if a
   subrequest fails retriably

 - Fix a cifs warning in the workqueue code when reconnecting a channel

 - Fix the updating of i_size in netfs to avoid a race between testing
   if we should have extended the file with a DIO write and changing
   i_size

 - Merge the places in netfs that update i_size on write

 - Fix coredump socket selftests

* tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  anon_inode: rework assertions
  netfs: Update tracepoints in a number of ways
  netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read
  netfs: Merge i_size update functions
  netfs: Fix i_size updating
  smb: client: set missing retry flag in cifs_writev_callback()
  smb: client: set missing retry flag in cifs_readv_callback()
  smb: client: set missing retry flag in smb2_writev_callback()
  netfs: Fix ref leak on inserted extra subreq in write retry
  netfs: Fix looping in wait functions
  netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling
  netfs: Fix double put of request
  netfs: Fix hang due to missing case in final DIO read result collection
  eventpoll: Fix priority inversion problem
  fuse: fix fuse_fill_write_pages() upper bound calculation
  fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass
  selftests/coredump: Fix "socket_detect_userspace_client" test failure

3 months agosched/deadline: Fix dl_server runtime calculation formula
kuyo chang [Wed, 2 Jul 2025 02:12:25 +0000 (10:12 +0800)]
sched/deadline: Fix dl_server runtime calculation formula

In our testing with 6.12 based kernel on a big.LITTLE system, we were
seeing instances of RT tasks being blocked from running on the LITTLE
cpus for multiple seconds of time, apparently by the dl_server. This
far exceeds the default configured 50ms per second runtime.

This is due to the fair dl_server runtime calculation being scaled
for frequency & capacity of the cpu.

Consider the following case under a Big.LITTLE architecture:
Assume the runtime is: 50,000,000 ns, and Frequency/capacity
scale-invariance defined as below:
Frequency scale-invariance: 100
Capacity scale-invariance: 50
First by Frequency scale-invariance,
the runtime is scaled to 50,000,000 * 100 >> 10 = 4,882,812
Then by capacity scale-invariance,
it is further scaled to 4,882,812 * 50 >> 10 = 238,418.
So it will scaled to 238,418 ns.

This smaller "accounted runtime" value is what ends up being
subtracted against the fair-server's runtime for the current period.
Thus after 50ms of real time, we've only accounted ~238us against the
fair servers runtime. This 209:1 ratio in this example means that on
the smaller cpu the fair server is allowed to continue running,
blocking RT tasks, for over 10 seconds before it exhausts its supposed
50ms of runtime.  And on other hardware configurations it can be even
worse.

For the fair deadline_server, to prevent realtime tasks from being
unexpectedly delayed, we really do want to use fixed time, and not
scaled time for smaller capacity/frequency cpus. So remove the scaling
from the fair server's accounting to fix this.

Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: John Stultz <jstultz@google.com>
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: John Stultz <jstultz@google.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20250702021440.2594736-1-kuyo.chang@mediatek.com
3 months agoiommu/vt-d: Assign devtlb cache tag on ATS enablement
Lu Baolu [Sat, 28 Jun 2025 10:03:51 +0000 (18:03 +0800)]
iommu/vt-d: Assign devtlb cache tag on ATS enablement

Commit <4f1492efb495> ("iommu/vt-d: Revert ATS timing change to fix boot
failure") placed the enabling of ATS in the probe_finalize callback. This
occurs after the default domain attachment, which is when the ATS cache
tag is assigned. Consequently, the device TLB cache tag is missed when the
domain is attached, leading to the device TLB not being invalidated in the
iommu_unmap paths.

Fix this by assigning the CACHE_TAG_DEVTLB cache tag when ATS is enabled.

Fixes: 4f1492efb495 ("iommu/vt-d: Revert ATS timing change to fix boot failure")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250625050135.3129955-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250628100351.3198955-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
3 months agoInput: cs40l50-vibra - fix potential NULL dereference in cs40l50_upload_owt()
Yunshui Jiang [Fri, 4 Jul 2025 04:56:02 +0000 (21:56 -0700)]
Input: cs40l50-vibra - fix potential NULL dereference in cs40l50_upload_owt()

The cs40l50_upload_owt() function allocates memory via kmalloc()
without checking for allocation failure, which could lead to a
NULL pointer dereference.

Return -ENOMEM in case allocation fails.

Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Fixes: c38fe1bb5d21 ("Input: cs40l50 - Add support for the CS40L50 haptic driver")
Link: https://lore.kernel.org/r/20250704024010.2353841-1-jiangyunshui@kylinos.cn
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 months agofix proc_sys_compare() handling of in-lookup dentries
Al Viro [Mon, 30 Jun 2025 06:52:13 +0000 (02:52 -0400)]
fix proc_sys_compare() handling of in-lookup dentries

There's one case where ->d_compare() can be called for an in-lookup
dentry; usually that's nothing special from ->d_compare() point of
view, but... proc_sys_compare() is weird.

The thing is, /proc/sys subdirectories can look differently for
different processes.  Up to and including having the same name
resolve to different dentries - all of them hashed.

The way it's done is ->d_compare() refusing to admit a match unless
this dentry is supposed to be visible to this caller.  The information
needed to discriminate between them is stored in inode; it is set
during proc_sys_lookup() and until it's done d_splice_alias() we really
can't tell who should that dentry be visible for.

Normally there's no negative dentries in /proc/sys; we can run into
a dying dentry in RCU dcache lookup, but those can be safely rejected.

However, ->d_compare() is also called for in-lookup dentries, before
they get positive - or hashed, for that matter.  In case of match
we will wait until dentry leaves in-lookup state and repeat ->d_compare()
afterwards.  In other words, the right behaviour is to treat the
name match as sufficient for in-lookup dentries; if dentry is not
for us, we'll see that when we recheck once proc_sys_lookup() is
done with it.

While we are at it, fix the misspelled READ_ONCE and WRITE_ONCE there.

Fixes: d9171b934526 ("parallel lookups machinery, part 4 (and last)")
Reported-by: NeilBrown <neilb@brown.name>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 months agoMerge tag 'drm-xe-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 4 Jul 2025 00:01:49 +0000 (10:01 +1000)]
Merge tag 'drm-xe-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix chunking the PTE updates and overflowing the maximum number of
  dwords with with MI_STORE_DATA_IMM (Jia Yao)
- Move WA BB to the LRC BO to mitigate hangs on context switch (Matthew
  Brost)
- Fix frequency/flush WAs for BMG (Vinay / Lucas)
- Fix kconfig prompt title and description (Lucas)
- Do not require kunit (Harry Austen / Lucas)
- Extend 14018094691 WA to BMG (Daniele)
- Fix wedging the device on signal (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/o5662wz6nrlf6xt5sjgxq5oe6qoujefzywuwblm3m626hreifv@foqayqydd6ig
3 months agosmb: client: fix native SMB symlink traversal
Paulo Alcantara [Thu, 3 Jul 2025 20:57:19 +0000 (17:57 -0300)]
smb: client: fix native SMB symlink traversal

We've seen customers having shares mounted in paths like /??/C:/ or
/??/UNC/foo.example.com/share in order to get their native SMB
symlinks successfully followed from different mounts.

After commit 12b466eb52d9 ("cifs: Fix creating and resolving absolute NT-style symlinks"),
the client would then convert absolute paths from "/??/C:/" to "/mnt/c/"
by default.  The absolute paths would vary depending on the value of
symlinkroot= mount option.

Fix this by restoring old behavior of not trying to convert absolute
paths by default.  Only do this if symlinkroot= was _explicitly_ set.

Before patch:

  $ mount.cifs //w22-fs0/test2 /mnt/1 -o vers=3.1.1,username=xxx,password=yyy
  $ ls -l /mnt/1/symlink2
  lrwxr-xr-x 1 root root 15 Jun 20 14:22 /mnt/1/symlink2 -> /mnt/c/testfile
  $ mkdir -p /??/C:; echo foo > //??/C:/testfile
  $ cat /mnt/1/symlink2
  cat: /mnt/1/symlink2: No such file or directory

After patch:

  $ mount.cifs //w22-fs0/test2 /mnt/1 -o vers=3.1.1,username=xxx,password=yyy
  $ ls -l /mnt/1/symlink2
  lrwxr-xr-x 1 root root 15 Jun 20 14:22 /mnt/1/symlink2 -> '/??/C:/testfile'
  $ mkdir -p /??/C:; echo foo > //??/C:/testfile
  $ cat /mnt/1/symlink2
  foo

Cc: linux-cifs@vger.kernel.org
Reported-by: Pierguido Lambri <plambri@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Fixes: 12b466eb52d9 ("cifs: Fix creating and resolving absolute NT-style symlinks")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 months agosmb: client: fix race condition in negotiate timeout by using more precise timing
Wang Zhaolong [Thu, 3 Jul 2025 13:29:52 +0000 (21:29 +0800)]
smb: client: fix race condition in negotiate timeout by using more precise timing

When the SMB server reboots and the client immediately accesses the mount
point, a race condition can occur that causes operations to fail with
"Host is down" error.

Reproduction steps:
  # Mount SMB share
  mount -t cifs //192.168.245.109/TEST /mnt/ -o xxxx
  ls /mnt

  # Reboot server
  ssh root@192.168.245.109 reboot
  ssh root@192.168.245.109 /path/to/cifs_server_setup.sh
  ssh root@192.168.245.109 systemctl stop firewalld

  # Immediate access fails
  ls /mnt
  ls: cannot access '/mnt': Host is down

  # But works if there is a delay

The issue is caused by a race condition between negotiate and reconnect.
The 20-second negotiate timeout mechanism can interfere with the normal
recovery process when both are triggered simultaneously.

  ls                              cifsd
---------------------------------------------------
 cifs_getattr
 cifs_revalidate_dentry
 cifs_get_inode_info
 cifs_get_fattr
 smb2_query_path_info
 smb2_compound_op
 SMB2_open_init
 smb2_reconnect
 cifs_negotiate_protocol
  smb2_negotiate
   cifs_send_recv
    smb_send_rqst
    wait_for_response
                            cifs_demultiplex_thread
                              cifs_read_from_socket
                              cifs_readv_from_socket
                                server_unresponsive
                                cifs_reconnect
                                  __cifs_reconnect
                                  cifs_abort_connection
                                    mid->mid_state = MID_RETRY_NEEDED
                                    cifs_wake_up_task
    cifs_sync_mid_result
     // case MID_RETRY_NEEDED
     rc = -EAGAIN;
   // In smb2_negotiate()
   rc = -EHOSTDOWN;

The server_unresponsive() timeout triggers cifs_reconnect(), which aborts
ongoing mid requests and causes the ls command to receive -EAGAIN, leading
to -EHOSTDOWN.

Fix this by introducing a dedicated `neg_start` field to
precisely tracks when the negotiate process begins. The timeout check
now uses this accurate timestamp instead of `lstrp`, ensuring that:

1. Timeout is only triggered after negotiate has actually run for 20s
2. The mechanism doesn't interfere with concurrent recovery processes
3. Uninitialized timestamps (value 0) don't trigger false timeouts

Fixes: 7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 months agoMerge tag 'samsung-dsim-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux...
Dave Airlie [Thu, 3 Jul 2025 23:40:17 +0000 (09:40 +1000)]
Merge tag 'samsung-dsim-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

- Fixed raw pointer leakage and unsafe behavior in printk()
  . Switch from %pK to %p for pointer formatting, as %p is now safer
    and prevents issues like raw pointer leakage and acquiring sleeping
    locks in atomic contexts.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://lore.kernel.org/r/20250629091742.29956-1-inki.dae@samsung.com
3 months agoMerge tag 'exynos-drm-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux...
Dave Airlie [Thu, 3 Jul 2025 23:37:57 +0000 (09:37 +1000)]
Merge tag 'exynos-drm-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Fixups
- Fixed raw pointer leakage and unsafe behavior in printk()
  . Switch from %pK to %p for pointer formatting, as %p is now safer
    and prevents issues like raw pointer leakage and acquiring sleeping
    locks in atomic contexts.

- Fixed kernel panic during boot
  . A NULL pointer dereference issue occasionally occurred
    when the vblank interrupt handler was called before
    the DRM driver was fully initialized during boot.
    So this patch fixes the issue by adding a check in the interrupt handler
    to ensure the DRM driver is properly initialized.

- Fixed a lockup issue on Samsung Peach-Pit/Pi Chromebooks
  . The issue occurred after commit c9b1150a68d9 changed
    the call order of CRTC enable/disable and bridge pre_enable/post_disable
    methods, causing fimd_dp_clock_enable() to be called
    before the FIMD device was activated. To fix this,
    runtime PM guards were added to fimd_dp_clock_enable()
    to ensure proper operation even when CRTC is not enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://lore.kernel.org/r/20250629083554.28628-1-inki.dae@samsung.com
3 months agoMerge tag 'drm-intel-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 3 Jul 2025 23:26:57 +0000 (09:26 +1000)]
Merge tag 'drm-intel-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Make mei interrupt top half irq disabled to fix RT builds
- Fix timeline left held on VMA alloc error
- Fix NULL pointer deref in vlv_dphy_param_init()
- Fix selftest mock_request() to avoid NULL deref

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://lore.kernel.org/r/aGYVPAA4KvsZqDFx@jlahtine-mobl
3 months agoMerge tag 'drm-misc-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 3 Jul 2025 23:06:56 +0000 (09:06 +1000)]
Merge tag 'drm-misc-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.16-rc5:
- Replace simple panel lookup hack with proper fix.
- nullpointer deref in vesadrm fix.
- fix dma_resv_wait_timeout.
- fix error handling in ttm_buffer_object_transfer.
- bridge fixes.
- Fix vmwgfx accidentally allocating encrypted memory.
- Fix race in spsc_queue_push()
- Add refcount on backing GEM objects during fb creation.
- Fix v3d irq's being enabled during gpu reset.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/a7461418-08dc-4b7c-b2fa-264155f66d5e@linux.intel.com
3 months agoMerge tag 'for-6.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 3 Jul 2025 20:29:56 +0000 (13:29 -0700)]
Merge tag 'for-6.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - tree-log fixes:
    - fixes of log tracking of directories and subvolumes
    - fix iteration and error handling of inode references
      during log replay

 - fix free space tree rebuild (reported by syzbot)

* tag 'for-6.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: use btrfs_record_snapshot_destroy() during rmdir
  btrfs: propagate last_unlink_trans earlier when doing a rmdir
  btrfs: record new subvolume in parent dir earlier to avoid dir logging races
  btrfs: fix inode lookup error handling during log replay
  btrfs: fix iteration of extrefs during log replay
  btrfs: fix missing error handling when searching for inode refs during log replay
  btrfs: fix failure to rebuild free space tree using multiple transactions

3 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 3 Jul 2025 18:52:39 +0000 (11:52 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Driver fixes plus core sd.c fix are all small and obvious.

  The larger change to hosts.c is less obvious, but required to avoid
  data corruption caused by bio splitting"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix spelling of a sysfs attribute name
  scsi: core: Enforce unlimited max_segment_size when virt_boundary_mask is set
  scsi: RDMA/srp: Don't set a max_segment_size when virt_boundary_mask is set
  scsi: sd: Fix VPD page 0xb7 length check
  scsi: qla4xxx: Fix missing DMA mapping error in qla4xxx_alloc_pdu()
  scsi: qla2xxx: Fix DMA mapping test in qla24xx_get_port_database()

3 months agoMerge tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 3 Jul 2025 16:18:55 +0000 (09:18 -0700)]
Merge tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth.

  Current release - new code bugs:

    - eth:
       - txgbe: fix the issue of TX failure
       - ngbe: specify IRQ vector when the number of VFs is 7

  Previous releases - regressions:

    - sched: always pass notifications when child class becomes empty

    - ipv4: fix stat increase when udp early demux drops the packet

    - bluetooth: prevent unintended pause by checking if advertising is active

    - virtio: fix error reporting in virtqueue_resize

    - eth:
       - virtio-net:
          - ensure the received length does not exceed allocated size
          - fix the xsk frame's length check
       - lan78xx: fix WARN in __netif_napi_del_locked on disconnect

  Previous releases - always broken:

    - bluetooth: mesh: check instances prior disabling advertising

    - eth:
       - idpf: convert control queue mutex to a spinlock
       - dpaa2: fix xdp_rxq_info leak
       - amd-xgbe: align CL37 AN sequence as per databook"

* tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  vsock/vmci: Clear the vmci transport packet properly when initializing it
  dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example
  net: ngbe: specify IRQ vector when the number of VFs is 7
  net: wangxun: revert the adjustment of the IRQ vector sequence
  net: txgbe: request MISC IRQ in ndo_open
  virtio_net: Enforce minimum TX ring size for reliability
  virtio_net: Cleanup '2+MAX_SKB_FRAGS'
  virtio_ring: Fix error reporting in virtqueue_resize
  virtio-net: xsk: rx: fix the frame's length check
  virtio-net: use the check_mergeable_len helper
  virtio-net: remove redundant truesize check with PAGE_SIZE
  virtio-net: ensure the received length does not exceed allocated size
  net: ipv4: fix stat increase when udp early demux drops the packet
  net: libwx: fix the incorrect display of the queue number
  amd-xgbe: do not double read link status
  net/sched: Always pass notifications when child class becomes empty
  nui: Fix dma_mapping_error() check
  rose: fix dangling neighbour pointers in rose_rt_device_down()
  enic: fix incorrect MTU comparison in enic_change_mtu()
  amd-xgbe: align CL37 AN sequence as per databook
  ...

3 months agoMerge tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 3 Jul 2025 16:00:04 +0000 (09:00 -0700)]
Merge tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - Fix umount hang with unflushable inodes (and add new tracepoint used
   for debugging this)

 - Fix ABBA deadlock in xfs_reclaim_inode() vs xfs_ifree_cluster()

 - Fix dquot buffer pin deadlock

* tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: add FALLOC_FL_ALLOCATE_RANGE to supported flags mask
  xfs: fix unmount hang with unflushable inodes stuck in the AIL
  xfs: factor out stale buffer item completion
  xfs: rearrange code in xfs_buf_item.c
  xfs: add tracepoints for stale pinned inode state debug
  xfs: avoid dquot buffer pin deadlock
  xfs: catch stale AGF/AGF metadata
  xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlock
  xfs: actually use the xfs_growfs_check_rtgeom tracepoint
  xfs: Improve error handling in xfs_mru_cache_create()
  xfs: move xfs_submit_zoned_bio a bit
  xfs: use xfs_readonly_buftarg in xfs_remount_rw
  xfs: remove NULL pointer checks in xfs_mru_cache_insert
  xfs: check for shutdown before going to sleep in xfs_select_zone

3 months agoMerge tag 'nvme-6.16-2025-07-03' of git://git.infradead.org/nvme into block-6.16
Jens Axboe [Thu, 3 Jul 2025 15:42:07 +0000 (09:42 -0600)]
Merge tag 'nvme-6.16-2025-07-03' of git://git.infradead.org/nvme into block-6.16

Pull NVMe fixes from Christoph:

"- fix incorrect cdw15 value in passthru error logging (Alok Tiwari)
 - fix memory leak of bio integrity in nvmet (Dmitry Bogdanov)
 - refresh visible attrs after being checked (Eugen Hristev)
 - fix suspicious RCU usage warning in the multipath code (Geliang Tang)
 - correctly account for namespace head reference counter (Nilay Shroff)"

* tag 'nvme-6.16-2025-07-03' of git://git.infradead.org/nvme:
  nvme-multipath: fix suspicious RCU usage warning
  nvme-pci: refresh visible attrs after being checked
  nvmet: fix memory leak of bio integrity
  nvme: correctly account for namespace head reference counter
  nvme: Fix incorrect cdw15 value in passthru error logging

3 months agoMerge tag 'apple-soc-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Thu, 3 Jul 2025 14:27:31 +0000 (16:27 +0200)]
Merge tag 'apple-soc-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into arm/fixes

Apple SoC fixes for 6.16

One devicetree fix for a dtbs_warning that's been present for a while:
- Rename the PCIe BCM4377 node to conform to the devicetree binding
  schema

Two devicetree fixes for W=1 warnings that have been introduced recently:
- Drop {address,size}-cells from SPI NOR which doesn't have any child
  nodes such that these don't make sense
- Move touchbar mipi {address,size}-cells from the dtsi file where the
  node is disabled and has no children to the dts file where it's
  enabled and its children are declared

Signed-off-by: Sven Peter <sven@kernel.org>
* tag 'apple-soc-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux:
  arm64: dts: apple: Move touchbar mipi {address,size}-cells from dtsi to dts
  arm64: dts: apple: Drop {address,size}-cells from SPI NOR
  arm64: dts: apple: t8103: Fix PCIe BCM4377 nodename

3 months agoMerge tag 'optee-fix-for-v6.16' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Thu, 3 Jul 2025 14:26:08 +0000 (16:26 +0200)]
Merge tag 'optee-fix-for-v6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee into arm/fixes

A fix in the OP-TEE driver for v6.16

Fixing a sleep in atomic context in the FF-A notification callback by
adding a work queue to process in a non-atomic context.

* tag 'optee-fix-for-v6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee:
  optee: ffa: fix sleep in atomic context

3 months agoMerge tag 'samsung-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Thu, 3 Jul 2025 14:23:53 +0000 (16:23 +0200)]
Merge tag 'samsung-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/fixes

Samsung SoC fixes for v6.16

1. Correct CONFIG option in arm64 defconfig enabling the Qualcomm SoC
   SNPS EUSB2 phy driver, because Kconfig entry was renamed when
   changing the driver to a common one, shared with Samsung SoC, thus
   defconfig lost that driver effectively.

2. Exynos ACPM: Fix timeouts happening with multiple requests.

* tag 'samsung-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  firmware: exynos-acpm: fix timeouts on xfers handling
  arm64: defconfig: update renamed PHY_SNPS_EUSB2

3 months agodrm/xe: Do not wedge device on killed exec queues
Matthew Brost [Tue, 24 Jun 2025 17:41:03 +0000 (10:41 -0700)]
drm/xe: Do not wedge device on killed exec queues

When a user closes an exec queue or interrupts an app with Ctrl-C,
this does not warrant wedging the device in mode 2.

Avoid this by skipping the wedge check for killed exec queues in
the TDR and LR exec queue cleanup worker.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com
(cherry picked from commit 5a2f117a80c207372513ca8964eeb178874f4990)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Extend WA 14018094691 to BMG
Daniele Ceraolo Spurio [Fri, 13 Jun 2025 23:11:29 +0000 (16:11 -0700)]
drm/xe: Extend WA 14018094691 to BMG

This WA is applicable to BMG as well.

Note that this is a GSC WA and we don't load the GSC on BMG, so
extending the WA to BMG won't do anything right now. However, it helps
future-proof the driver so that if we ever turn the GSC on we won't have
to remember to extend this WA.

v2: don't use VERSION_RANGE from 2001 to 2004 (Matt)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 1a5ce0c5b95b0624ebd44f574b98003a466973be)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agoMerge tag 'ffa-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep...
Arnd Bergmann [Thu, 3 Jul 2025 11:56:15 +0000 (13:56 +0200)]
Merge tag 'ffa-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

Arm FF-A fixes for v6.16

Couple of fixes to address:

1. The safety and memory issues in the FF-A notification callback handler:

   The fixes replaces a mutex with an rwlock to prevent sleeping in atomic
   context, resolving kernel warnings. Memory allocation is moved outside
   the lock to support this transition safely. Additionally, a memory leak
   in the notifier unregistration path is fixed by properly freeing the
   callback node.

2. The missing entry in struct ffa_indirect_msg_hdr:

   The fix adds the missing 32 bit reserved entry in the structure as
   required by the FF-A specification.

* tag 'ffa-fixes-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_ffa: Fix the missing entry in struct ffa_indirect_msg_hdr
  firmware: arm_ffa: Replace mutex with rwlock to avoid sleep in atomic context
  firmware: arm_ffa: Move memory allocation outside the mutex locking
  firmware: arm_ffa: Fix memory leak by freeing notifier callback node

Link: https://lore.kernel.org/r/20250609105207.1185570-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 months agoregulator: gpio: Fix the out-of-bounds access to drvdata::gpiods
Manivannan Sadhasivam [Thu, 3 Jul 2025 10:35:49 +0000 (16:05 +0530)]
regulator: gpio: Fix the out-of-bounds access to drvdata::gpiods

drvdata::gpiods is supposed to hold an array of 'gpio_desc' pointers. But
the memory is allocated for only one pointer. This will lead to
out-of-bounds access later in the code if 'config::ngpios' is > 1. So
fix the code to allocate enough memory to hold 'config::ngpios' of GPIO
descriptors.

While at it, also move the check for memory allocation failure to be below
the allocation to make it more readable.

Cc: stable@vger.kernel.org # 5.0
Fixes: d6cd33ad7102 ("regulator: gpio: Convert to use descriptors")
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250703103549.16558-1-mani@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
3 months agoRevert "ACPI: battery: negate current when discharging"
Rafael J. Wysocki [Thu, 3 Jul 2025 10:54:55 +0000 (12:54 +0200)]
Revert "ACPI: battery: negate current when discharging"

Revert commit 234f71555019 ("ACPI: battery: negate current when
discharging") breaks not one but several userspace implementations
of battery monitoring: Steam and MangoHud. Perhaps it breaks more,
but those are the two that have been tested.

Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Closes: https://lore.kernel.org/linux-acpi/87C1B2AF-D430-4568-B620-14B941A8ABA4@linux.dev/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 months agovsock/vmci: Clear the vmci transport packet properly when initializing it
HarshaVardhana S A [Tue, 1 Jul 2025 12:22:54 +0000 (14:22 +0200)]
vsock/vmci: Clear the vmci transport packet properly when initializing it

In vmci_transport_packet_init memset the vmci_transport_packet before
populating the fields to avoid any uninitialised data being left in the
structure.

Cc: Bryan Tan <bryan-bt.tan@broadcom.com>
Cc: Vishnu Dasa <vishnu.dasa@broadcom.com>
Cc: Broadcom internal kernel review list
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: stable <stable@kernel.org>
Signed-off-by: HarshaVardhana S A <harshavardhana.sa@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agodt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example
Krzysztof Kozlowski [Tue, 1 Jul 2025 06:36:22 +0000 (08:36 +0200)]
dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example

Examples should be complete and should not have a 'status' property,
especially a disabled one because this disables the dt_binding_check of
the example against the schema.  Dropping 'status' property shows
missing other properties - phy-mode and phy-handle.

Fixes: 114508a89ddc ("dt-bindings: net: Add support for Sophgo SG2044 dwmac")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://patch.msgid.link/20250701063621.23808-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoMerge branch 'fix-irq-vectors'
Paolo Abeni [Thu, 3 Jul 2025 09:51:41 +0000 (11:51 +0200)]
Merge branch 'fix-irq-vectors'

Jiawen Wu says:

====================
Fix IRQ vectors

The interrupt vector order was adjusted by [1]commit 937d46ecc5f9 ("net:
wangxun: add ethtool_ops for channel number") in Linux-6.8. Because at
that time, the MISC interrupt acts as the parent interrupt in the GPIO
IRQ chip. When the number of Rx/Tx ring changes, the last MISC
interrupt must be reallocated. Then the GPIO interrupt controller would
be corrupted. So the initial plan was to adjust the sequence of the
interrupt vectors, let MISC interrupt to be the first one and do not
free it.

Later, irq_domain was introduced in [2]commit aefd013624a1 ("net: txgbe:
use irq_domain for interrupt controller") to avoid this problem.
However, the vector sequence adjustment was not reverted. So there is
still one problem that has been left unresolved.

Due to hardware limitations of NGBE, queue IRQs can only be requested
on vector 0 to 7. When the number of queues is set to the maximum 8,
the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by
MISC interrupt, and althrough the vector 8 is used by queue interrupt,
it is unable to receive packets. This will cause some packets to be
dropped when RSS is enabled and they are assigned to queue 8.

This patch set fix the above problems.

[1] https://git.kernel.org/netdev/net-next/c/937d46ecc5f9
[2] https://git.kernel.org/netdev/net-next/c/aefd013624a1
====================

Link: https://patch.msgid.link/20250701063030.59340-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: ngbe: specify IRQ vector when the number of VFs is 7
Jiawen Wu [Tue, 1 Jul 2025 06:30:30 +0000 (14:30 +0800)]
net: ngbe: specify IRQ vector when the number of VFs is 7

For NGBE devices, the queue number is limited to be 1 when SRIOV is
enabled. In this case, IRQ vector[0] is used for MISC and vector[1] is
used for queue, based on the previous patches. But for the hardware
design, the IRQ vector[1] must be allocated for use by the VF[6] when
the number of VFs is 7. So the IRQ vector[0] should be shared for PF
MISC and QUEUE interrupts.

+-----------+----------------------+
| Vector    | Assigned To          |
+-----------+----------------------+
| Vector 0  | PF MISC and QUEUE    |
| Vector 1  | VF 6                 |
| Vector 2  | VF 5                 |
| Vector 3  | VF 4                 |
| Vector 4  | VF 3                 |
| Vector 5  | VF 2                 |
| Vector 6  | VF 1                 |
| Vector 7  | VF 0                 |
+-----------+----------------------+

Minimize code modifications, only adjust the IRQ vector number for this
case.

Fixes: 877253d2cbf2 ("net: ngbe: add sriov function support")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: wangxun: revert the adjustment of the IRQ vector sequence
Jiawen Wu [Tue, 1 Jul 2025 06:30:29 +0000 (14:30 +0800)]
net: wangxun: revert the adjustment of the IRQ vector sequence

Due to hardware limitations of NGBE, queue IRQs can only be requested
on vector 0 to 7. When the number of queues is set to the maximum 8,
the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by
MISC interrupt, and althrough the vector 8 is used by queue interrupt,
it is unable to receive packets. This will cause some packets to be
dropped when RSS is enabled and they are assigned to queue 8.

So revert the adjustment of the MISC IRQ location, to make it be the
last one in IRQ vectors.

Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agonet: txgbe: request MISC IRQ in ndo_open
Jiawen Wu [Tue, 1 Jul 2025 06:30:28 +0000 (14:30 +0800)]
net: txgbe: request MISC IRQ in ndo_open

Move the creating of irq_domain for MISC IRQ from .probe to .ndo_open,
and free it in .ndo_stop, to maintain consistency with the queue IRQs.
This it for subsequent adjustments to the IRQ vectors.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250701063030.59340-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoMerge branch 'virtio-fixes-for-tx-ring-sizing-and-resize-error-reporting'
Paolo Abeni [Thu, 3 Jul 2025 09:40:04 +0000 (11:40 +0200)]
Merge branch 'virtio-fixes-for-tx-ring-sizing-and-resize-error-reporting'

Laurent Vivier says:

====================
virtio: Fixes for TX ring sizing and resize error reporting

This patch series contains two fixes and a cleanup for the virtio subsystem.

The first patch fixes an error reporting bug in virtio_ring's
virtqueue_resize() function. Previously, errors from internal resize
helpers could be masked if the subsequent re-enabling of the virtqueue
succeeded. This patch restores the correct error propagation, ensuring that
callers of virtqueue_resize() are properly informed of underlying resize
failures.

The second patch does a cleanup of the use of '2+MAX_SKB_FRAGS'

The third patch addresses a reliability issue in virtio_net where the TX
ring size could be configured too small, potentially leading to
persistently stopped queues and degraded performance. It enforces a
minimum TX ring size to ensure there's always enough space for at least one
maximally-fragmented packet plus an additional slot.
====================

Link: https://patch.msgid.link/20250521092236.661410-1-lvivier@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio_net: Enforce minimum TX ring size for reliability
Laurent Vivier [Wed, 21 May 2025 09:22:36 +0000 (11:22 +0200)]
virtio_net: Enforce minimum TX ring size for reliability

The `tx_may_stop()` logic stops TX queues if free descriptors
(`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2).
If the total ring size (`ring_num`) is not strictly greater than this
value, queues can become persistently stopped or stop after minimal
use, severely degrading performance.

A single sk_buff transmission typically requires descriptors for:
- The virtio_net_hdr (1 descriptor)
- The sk_buff's linear data (head) (1 descriptor)
- Paged fragments (up to MAX_SKB_FRAGS descriptors)

This patch enforces that the TX ring size ('ring_num') must be strictly
greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is
always large enough to hold at least one maximally-fragmented packet
plus at least one additional slot.

Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio_net: Cleanup '2+MAX_SKB_FRAGS'
Laurent Vivier [Wed, 21 May 2025 09:22:35 +0000 (11:22 +0200)]
virtio_net: Cleanup '2+MAX_SKB_FRAGS'

Improve consistency by using everywhere it is needed
'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or
'2 + MAX_SKB_FRAGS'.

No functional change.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio_ring: Fix error reporting in virtqueue_resize
Laurent Vivier [Wed, 21 May 2025 09:22:34 +0000 (11:22 +0200)]
virtio_ring: Fix error reporting in virtqueue_resize

The virtqueue_resize() function was not correctly propagating error codes
from its internal resize helper functions, specifically
virtqueue_resize_packet() and virtqueue_resize_split(). If these helpers
returned an error, but the subsequent call to virtqueue_enable_after_reset()
succeeded, the original error from the resize operation would be masked.
Consequently, virtqueue_resize() could incorrectly report success to its
caller despite an underlying resize failure.

This change restores the original code behavior:

       if (vdev->config->enable_vq_after_reset(_vq))
               return -EBUSY;

       return err;

Fix: commit ad48d53b5b3f ("virtio_ring: separate the logic of reset/enable from virtqueue_resize")
Cc: xuanzhuo@linux.alibaba.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-2-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoHID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe
Qasim Ijaz [Tue, 24 Jun 2025 12:52:56 +0000 (13:52 +0100)]
HID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe

In probe appletb_kbd_probe() a "struct appletb_kbd *kbd" is allocated
via devm_kzalloc() to store touch bar keyboard related data.
Later on if backlight_device_get_by_name() finds a backlight device
with name "appletb_backlight" a timer (kbd->inactivity_timer) is setup
with appletb_inactivity_timer() and the timer is armed to run after
appletb_tb_dim_timeout (60) seconds.

A use-after-free is triggered when failure occurs after the timer is
armed. This ultimately means probe failure occurs and as a result the
"struct appletb_kbd *kbd" which is device managed memory is freed.
After 60 seconds the timer will have expired and __run_timers will
attempt to access the timer (kbd->inactivity_timer) however the kdb
structure has been freed causing a use-after free.

[   71.636938] ==================================================================
[   71.637915] BUG: KASAN: slab-use-after-free in __run_timers+0x7ad/0x890
[   71.637915] Write of size 8 at addr ffff8881178c5958 by task swapper/1/0
[   71.637915]
[   71.637915] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.16.0-rc2-00318-g739a6c93cc75-dirty #12 PREEMPT(voluntary)
[   71.637915] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   71.637915] Call Trace:
[   71.637915]  <IRQ>
[   71.637915]  dump_stack_lvl+0x53/0x70
[   71.637915]  print_report+0xce/0x670
[   71.637915]  ? __run_timers+0x7ad/0x890
[   71.637915]  kasan_report+0xce/0x100
[   71.637915]  ? __run_timers+0x7ad/0x890
[   71.637915]  __run_timers+0x7ad/0x890
[   71.637915]  ? __pfx___run_timers+0x10/0x10
[   71.637915]  ? update_process_times+0xfc/0x190
[   71.637915]  ? __pfx_update_process_times+0x10/0x10
[   71.637915]  ? _raw_spin_lock_irq+0x80/0xe0
[   71.637915]  ? _raw_spin_lock_irq+0x80/0xe0
[   71.637915]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[   71.637915]  run_timer_softirq+0x141/0x240
[   71.637915]  ? __pfx_run_timer_softirq+0x10/0x10
[   71.637915]  ? __pfx___hrtimer_run_queues+0x10/0x10
[   71.637915]  ? kvm_clock_get_cycles+0x18/0x30
[   71.637915]  ? ktime_get+0x60/0x140
[   71.637915]  handle_softirqs+0x1b8/0x5c0
[   71.637915]  ? __pfx_handle_softirqs+0x10/0x10
[   71.637915]  irq_exit_rcu+0xaf/0xe0
[   71.637915]  sysvec_apic_timer_interrupt+0x6c/0x80
[   71.637915]  </IRQ>
[   71.637915]
[   71.637915] Allocated by task 39:
[   71.637915]  kasan_save_stack+0x33/0x60
[   71.637915]  kasan_save_track+0x14/0x30
[   71.637915]  __kasan_kmalloc+0x8f/0xa0
[   71.637915]  __kmalloc_node_track_caller_noprof+0x195/0x420
[   71.637915]  devm_kmalloc+0x74/0x1e0
[   71.637915]  appletb_kbd_probe+0x37/0x3c0
[   71.637915]  hid_device_probe+0x2d1/0x680
[   71.637915]  really_probe+0x1c3/0x690
[   71.637915]  __driver_probe_device+0x247/0x300
[   71.637915]  driver_probe_device+0x49/0x210
[...]
[   71.637915]
[   71.637915] Freed by task 39:
[   71.637915]  kasan_save_stack+0x33/0x60
[   71.637915]  kasan_save_track+0x14/0x30
[   71.637915]  kasan_save_free_info+0x3b/0x60
[   71.637915]  __kasan_slab_free+0x37/0x50
[   71.637915]  kfree+0xcf/0x360
[   71.637915]  devres_release_group+0x1f8/0x3c0
[   71.637915]  hid_device_probe+0x315/0x680
[   71.637915]  really_probe+0x1c3/0x690
[   71.637915]  __driver_probe_device+0x247/0x300
[   71.637915]  driver_probe_device+0x49/0x210
[...]

The root cause of the issue is that the timer is not disarmed
on failure paths leading to it remaining active and accessing
freed memory. To fix this call timer_delete_sync() to deactivate
the timer.

Another small issue is that timer_delete_sync is called
unconditionally in appletb_kbd_remove(), fix this by checking
for a valid kbd->backlight_dev before calling timer_delete_sync.

Fixes: 93a0fc489481 ("HID: hid-appletb-kbd: add support for automatic brightness control while using the touchbar")
Cc: stable@vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00@gmail.com>
Reviewed-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
3 months agovirtio-net: xsk: rx: fix the frame's length check
Bui Quang Minh [Mon, 30 Jun 2025 15:13:14 +0000 (22:13 +0700)]
virtio-net: xsk: rx: fix the frame's length check

When calling buf_to_xdp, the len argument is the frame data's length
without virtio header's length (vi->hdr_len). We check that len with

xsk_pool_get_rx_frame_size() + vi->hdr_len

to ensure the provided len does not larger than the allocated chunk
size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk,
we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost
to start placing data from

hard_start + XDP_PACKET_HEADROOM - vi->hdr_len
not
hard_start + XDP_PACKET_HEADROOM

But the first buffer has virtio_header, so the maximum frame's length in
the first buffer can only be

xsk_pool_get_rx_frame_size()
not
xsk_pool_get_rx_frame_size() + vi->hdr_len

like in the current check.

This commit adds an additional argument to buf_to_xdp differentiate
between the first buffer and other ones to correctly calculate the maximum
frame's length.

Cc: stable@vger.kernel.org
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Fixes: a4e7ba702701 ("virtio_net: xsk: rx: support recv small mode")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoMerge branch 'virtio-net-fixes-for-mergeable-xdp-receive-path'
Paolo Abeni [Thu, 3 Jul 2025 08:56:57 +0000 (10:56 +0200)]
Merge branch 'virtio-net-fixes-for-mergeable-xdp-receive-path'

Bui Quang Minh says:

====================
virtio-net: fixes for mergeable XDP receive path

This series contains fixes for XDP receive path in virtio-net
- Patch 1: add a missing check for the received data length with our
allocated buffer size in mergeable mode.
- Patch 2: remove a redundant truesize check with PAGE_SIZE in mergeable
mode
- Patch 3: make the current repeated code use the check_mergeable_len to
check for received data length in mergeable mode
====================

Link: https://patch.msgid.link/20250630144212.48471-1-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio-net: use the check_mergeable_len helper
Bui Quang Minh [Mon, 30 Jun 2025 14:42:12 +0000 (21:42 +0700)]
virtio-net: use the check_mergeable_len helper

Replace the current repeated code to check received length in mergeable
mode with the new check_mergeable_len helper.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio-net: remove redundant truesize check with PAGE_SIZE
Bui Quang Minh [Mon, 30 Jun 2025 14:42:11 +0000 (21:42 +0700)]
virtio-net: remove redundant truesize check with PAGE_SIZE

The truesize is guaranteed not to exceed PAGE_SIZE in
get_mergeable_buf_len(). It is saved in mergeable context, which is not
changeable by the host side, so the check in receive path is quite
redundant.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agovirtio-net: ensure the received length does not exceed allocated size
Bui Quang Minh [Mon, 30 Jun 2025 14:42:10 +0000 (21:42 +0700)]
virtio-net: ensure the received length does not exceed allocated size

In xdp_linearize_page, when reading the following buffers from the ring,
we forget to check the received length with the true allocate size. This
can lead to an out-of-bound read. This commit adds that missing check.

Cc: <stable@vger.kernel.org>
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 months agoperf: Revert to requiring CAP_SYS_ADMIN for uprobes
Peter Zijlstra [Wed, 2 Jul 2025 16:21:44 +0000 (18:21 +0200)]
perf: Revert to requiring CAP_SYS_ADMIN for uprobes

Jann reports that uprobes can be used destructively when used in the
middle of an instruction. The kernel only verifies there is a valid
instruction at the requested offset, but due to variable instruction
length cannot determine if this is an instruction as seen by the
intended execution stream.

Additionally, Mark Rutland notes that on architectures that mix data
in the text segment (like arm64), a similar things can be done if the
data word is 'mistaken' for an instruction.

As such, require CAP_SYS_ADMIN for uprobes.

Fixes: c9e0924e5c2b ("perf/core: open access to probes for CAP_PERFMON privileged process")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAG48ez1n4520sq0XrWYDHKiKxE_+WCfAK+qt9qkY4ZiBGmL-5g@mail.gmail.com
3 months agoHID: Fix debug name for BTN_GEAR_DOWN, BTN_GEAR_UP, BTN_WHEEL
Vicki Pfau [Wed, 2 Jul 2025 03:46:42 +0000 (20:46 -0700)]
HID: Fix debug name for BTN_GEAR_DOWN, BTN_GEAR_UP, BTN_WHEEL

The name of BTN_GEAR_DOWN was WheelBtn and BTN_WHEEL was missing. Further,
BTN_GEAR_UP had a space in its name and no Btn, which is against convention.
This makes the names BtnGearDown, BtnGearUp, and BtnWheel, fixing the errors
and matching convention.

Signed-off-by: Vicki Pfau <vi@endrift.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
3 months agoHID: elecom: add support for ELECOM HUGE 019B variant
Leonard Dizon [Sun, 29 Jun 2025 21:48:30 +0000 (05:48 +0800)]
HID: elecom: add support for ELECOM HUGE 019B variant

The ELECOM M-HT1DRBK trackball has an additional device ID (056E:019B)
not yet recognized by the driver, despite using the same report
descriptor as earlier variants. This patch adds the new ID and applies
the same fixups, enabling all 8 buttons to function properly.

Signed-off-by: Leonard Dizon <leonard@snekbyte.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
3 months agoHID: appletb-kbd: fix memory corruption of input_handler_list
Qasim Ijaz [Fri, 27 Jun 2025 11:01:21 +0000 (12:01 +0100)]
HID: appletb-kbd: fix memory corruption of input_handler_list

In appletb_kbd_probe an input handler is initialised and then registered
with input core through input_register_handler(). When this happens input
core will add the input handler (specifically its node) to the global
input_handler_list. The input_handler_list is central to the functionality
of input core and is traversed in various places in input core. An example
of this is when a new input device is plugged in and gets registered with
input core.

The input_handler in probe is allocated as device managed memory. If a
probe failure occurs after input_register_handler() the input_handler
memory is freed, yet it will remain in the input_handler_list. This
effectively means the input_handler_list contains a dangling pointer
to data belonging to a freed input handler.

This causes an issue when any other input device is plugged in - in my
case I had an old PixArt HP USB optical mouse and I decided to
plug it in after a failure occurred after input_register_handler().
This lead to the registration of this input device via
input_register_device which involves traversing over every handler
in the corrupted input_handler_list and calling input_attach_handler(),
giving each handler a chance to bind to newly registered device.

The core of this bug is a UAF which causes memory corruption of
input_handler_list and to fix it we must ensure the input handler is
unregistered from input core, this is done through
input_unregister_handler().

[   63.191597] ==================================================================
[   63.192094] BUG: KASAN: slab-use-after-free in input_attach_handler.isra.0+0x1a9/0x1e0
[   63.192094] Read of size 8 at addr ffff888105ea7c80 by task kworker/0:2/54
[   63.192094]
[   63.192094] CPU: 0 UID: 0 PID: 54 Comm: kworker/0:2 Not tainted 6.16.0-rc2-00321-g2aa6621d
[   63.192094] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.164
[   63.192094] Workqueue: usb_hub_wq hub_event
[   63.192094] Call Trace:
[   63.192094]  <TASK>
[   63.192094]  dump_stack_lvl+0x53/0x70
[   63.192094]  print_report+0xce/0x670
[   63.192094]  kasan_report+0xce/0x100
[   63.192094]  input_attach_handler.isra.0+0x1a9/0x1e0
[   63.192094]  input_register_device+0x76c/0xd00
[   63.192094]  hidinput_connect+0x686d/0xad60
[   63.192094]  hid_connect+0xf20/0x1b10
[   63.192094]  hid_hw_start+0x83/0x100
[   63.192094]  hid_device_probe+0x2d1/0x680
[   63.192094]  really_probe+0x1c3/0x690
[   63.192094]  __driver_probe_device+0x247/0x300
[   63.192094]  driver_probe_device+0x49/0x210
[   63.192094]  __device_attach_driver+0x160/0x320
[   63.192094]  bus_for_each_drv+0x10f/0x190
[   63.192094]  __device_attach+0x18e/0x370
[   63.192094]  bus_probe_device+0x123/0x170
[   63.192094]  device_add+0xd4d/0x1460
[   63.192094]  hid_add_device+0x30b/0x910
[   63.192094]  usbhid_probe+0x920/0xe00
[   63.192094]  usb_probe_interface+0x363/0x9a0
[   63.192094]  really_probe+0x1c3/0x690
[   63.192094]  __driver_probe_device+0x247/0x300
[   63.192094]  driver_probe_device+0x49/0x210
[   63.192094]  __device_attach_driver+0x160/0x320
[   63.192094]  bus_for_each_drv+0x10f/0x190
[   63.192094]  __device_attach+0x18e/0x370
[   63.192094]  bus_probe_device+0x123/0x170
[   63.192094]  device_add+0xd4d/0x1460
[   63.192094]  usb_set_configuration+0xd14/0x1880
[   63.192094]  usb_generic_driver_probe+0x78/0xb0
[   63.192094]  usb_probe_device+0xaa/0x2e0
[   63.192094]  really_probe+0x1c3/0x690
[   63.192094]  __driver_probe_device+0x247/0x300
[   63.192094]  driver_probe_device+0x49/0x210
[   63.192094]  __device_attach_driver+0x160/0x320
[   63.192094]  bus_for_each_drv+0x10f/0x190
[   63.192094]  __device_attach+0x18e/0x370
[   63.192094]  bus_probe_device+0x123/0x170
[   63.192094]  device_add+0xd4d/0x1460
[   63.192094]  usb_new_device+0x7b4/0x1000
[   63.192094]  hub_event+0x234d/0x3fa0
[   63.192094]  process_one_work+0x5bf/0xfe0
[   63.192094]  worker_thread+0x777/0x13a0
[   63.192094]  </TASK>
[   63.192094]
[   63.192094] Allocated by task 54:
[   63.192094]  kasan_save_stack+0x33/0x60
[   63.192094]  kasan_save_track+0x14/0x30
[   63.192094]  __kasan_kmalloc+0x8f/0xa0
[   63.192094]  __kmalloc_node_track_caller_noprof+0x195/0x420
[   63.192094]  devm_kmalloc+0x74/0x1e0
[   63.192094]  appletb_kbd_probe+0x39/0x440
[   63.192094]  hid_device_probe+0x2d1/0x680
[   63.192094]  really_probe+0x1c3/0x690
[   63.192094]  __driver_probe_device+0x247/0x300
[   63.192094]  driver_probe_device+0x49/0x210
[   63.192094]  __device_attach_driver+0x160/0x320
[...]
[   63.192094]
[   63.192094] Freed by task 54:
[   63.192094]  kasan_save_stack+0x33/0x60
[   63.192094]  kasan_save_track+0x14/0x30
[   63.192094]  kasan_save_free_info+0x3b/0x60
[   63.192094]  __kasan_slab_free+0x37/0x50
[   63.192094]  kfree+0xcf/0x360
[   63.192094]  devres_release_group+0x1f8/0x3c0
[   63.192094]  hid_device_probe+0x315/0x680
[   63.192094]  really_probe+0x1c3/0x690
[   63.192094]  __driver_probe_device+0x247/0x300
[   63.192094]  driver_probe_device+0x49/0x210
[   63.192094]  __device_attach_driver+0x160/0x320
[...]

Fixes: 7d62ba8deacf ("HID: hid-appletb-kbd: add support for fn toggle between media and function mode")
Cc: stable@vger.kernel.org
Reviewed-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Qasim Ijaz <qasdev00@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
3 months agodrm/v3d: Disable interrupts before resetting the GPU
Maíra Canal [Sat, 28 Jun 2025 22:42:42 +0000 (19:42 -0300)]
drm/v3d: Disable interrupts before resetting the GPU

Currently, an interrupt can be triggered during a GPU reset, which can
lead to GPU hangs and NULL pointer dereference in an interrupt context
as shown in the following trace:

 [  314.035040] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0
 [  314.043822] Mem abort info:
 [  314.046606]   ESR = 0x0000000096000005
 [  314.050347]   EC = 0x25: DABT (current EL), IL = 32 bits
 [  314.055651]   SET = 0, FnV = 0
 [  314.058695]   EA = 0, S1PTW = 0
 [  314.061826]   FSC = 0x05: level 1 translation fault
 [  314.066694] Data abort info:
 [  314.069564]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
 [  314.075039]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
 [  314.080080]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 [  314.085382] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000102728000
 [  314.091814] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
 [  314.100511] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 [  314.106770] Modules linked in: v3d i2c_brcmstb vc4 snd_soc_hdmi_codec gpu_sched drm_shmem_helper drm_display_helper cec drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd backlight
 [  314.129654] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1  Debian 1:6.12.25-1+rpt1
 [  314.139388] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
 [  314.145211] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 [  314.152165] pc : v3d_irq+0xec/0x2e0 [v3d]
 [  314.156187] lr : v3d_irq+0xe0/0x2e0 [v3d]
 [  314.160198] sp : ffffffc080003ea0
 [  314.163502] x29: ffffffc080003ea0 x28: ffffffec1f184980 x27: 021202b000000000
 [  314.170633] x26: ffffffec1f17f630 x25: ffffff8101372000 x24: ffffffec1f17d9f0
 [  314.177764] x23: 000000000000002a x22: 000000000000002a x21: ffffff8103252000
 [  314.184895] x20: 0000000000000001 x19: 00000000deadbeef x18: 0000000000000000
 [  314.192026] x17: ffffff94e51d2000 x16: ffffffec1dac3cb0 x15: c306000000000000
 [  314.199156] x14: 0000000000000000 x13: b2fc982e03cc5168 x12: 0000000000000001
 [  314.206286] x11: ffffff8103f8bcc0 x10: ffffffec1f196868 x9 : ffffffec1dac3874
 [  314.213416] x8 : 0000000000000000 x7 : 0000000000042a3a x6 : ffffff810017a180
 [  314.220547] x5 : ffffffec1ebad400 x4 : ffffffec1ebad320 x3 : 00000000000bebeb
 [  314.227677] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
 [  314.234807] Call trace:
 [  314.237243]  v3d_irq+0xec/0x2e0 [v3d]
 [  314.240906]  __handle_irq_event_percpu+0x58/0x218
 [  314.245609]  handle_irq_event+0x54/0xb8
 [  314.249439]  handle_fasteoi_irq+0xac/0x240
 [  314.253527]  handle_irq_desc+0x48/0x68
 [  314.257269]  generic_handle_domain_irq+0x24/0x38
 [  314.261879]  gic_handle_irq+0x48/0xd8
 [  314.265533]  call_on_irq_stack+0x24/0x58
 [  314.269448]  do_interrupt_handler+0x88/0x98
 [  314.273624]  el1_interrupt+0x34/0x68
 [  314.277193]  el1h_64_irq_handler+0x18/0x28
 [  314.281281]  el1h_64_irq+0x64/0x68
 [  314.284673]  default_idle_call+0x3c/0x168
 [  314.288675]  do_idle+0x1fc/0x230
 [  314.291895]  cpu_startup_entry+0x3c/0x50
 [  314.295810]  rest_init+0xe4/0xf0
 [  314.299030]  start_kernel+0x5e8/0x790
 [  314.302684]  __primary_switched+0x80/0x90
 [  314.306691] Code: 940029eb 360ffc13 f9442ea0 52800001 (f9406017)
 [  314.312775] ---[ end trace 0000000000000000 ]---
 [  314.317384] Kernel panic - not syncing: Oops: Fatal exception in interrupt
 [  314.324249] SMP: stopping secondary CPUs
 [  314.328167] Kernel Offset: 0x2b9da00000 from 0xffffffc080000000
 [  314.334076] PHYS_OFFSET: 0x0
 [  314.336946] CPU features: 0x08,00002013,c0200000,0200421b
 [  314.342337] Memory Limit: none
 [  314.345382] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Before resetting the GPU, it's necessary to disable all interrupts and
deal with any interrupt handler still in-flight. Otherwise, the GPU might
reset with jobs still running, or yet, an interrupt could be handled
during the reset.

Cc: stable@vger.kernel.org
Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://lore.kernel.org/r/20250628224243.47599-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
3 months agoMerge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 2 Jul 2025 21:52:25 +0000 (14:52 -0700)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-01 (idpf, igc)

For idpf:
Michal returns 0 for key size when RSS is not supported.

Ahmed changes control queue to a spinlock due to sleeping calls.

For igc:
Vitaly disables L1.2 PCI-E link substate on I226 devices to resolve
performance issues.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: disable L1.2 PCI-E link substate to avoid performance issue
  idpf: convert control queue mutex to a spinlock
  idpf: return 0 size for RSS key if not supported
====================

Link: https://patch.msgid.link/20250701164317.2983952-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>