Liam R. Howlett [Fri, 17 Jan 2025 20:11:31 +0000 (15:11 -0500)]
mm/slub: Avoid waiting on empty per-cpu sheaf replacement
When the per-cpu sheaf is empty, attempt to replace it with a full one
from the barn. If the barn lock is contended then fail quickly and
enter the recovery code that will refill the empty sheaf.
This is to avoid congestion on the barn in the event of many tasks
requesting allocations at the same time. Discovered by brk1_processes()
regression from will-it-scale benchmarks.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Liam R. Howlett [Sun, 8 Dec 2024 02:29:07 +0000 (21:29 -0500)]
maple_tree: Convert forking to use the sheaf interface
Use the generic interface which should result in less bulk allocations
during a forking.
A part of this is to abstract the freeing of the sheaf or maple state
allocations into its own function so mas_destroy() and the tree
duplication code can use the same functionality to return any unused
resources.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Liam R. Howlett [Fri, 6 Dec 2024 20:08:19 +0000 (15:08 -0500)]
maple_tree: Add single node allocation support to maple state
The fast path through a write will require replacing a single node in
the tree. Using a sheaf (32 nodes) is too heavy for the fast path, so
special case the node store operation by just allocating one node in the
maple state.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Liam R. Howlett [Wed, 4 Dec 2024 20:26:33 +0000 (15:26 -0500)]
maple_tree: Sheaf conversion
Sheaf conversion for super duper fast allocations, probably.
Remove push/pop of nodes from maple state.
Remove unnecessary testing
ifdef out other testing that probably will be deleted
Fix testcase for testing race
Move some testing around in the same commit.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Liam R. Howlett [Mon, 2 Dec 2024 21:33:10 +0000 (16:33 -0500)]
tools: Add testing support for changes to rcu and slab for sheaves
Make testing work for the slab and rcu changes that have come in with
the sheaves work.
This only works with one kmem_cache, and only the first one used.
Subsequent setting of keme_cache will not update the active kmem_cache
and will be silently dropped because there are other tests which happen
after the kmem_cache of interest is set.
The saved active kmem_cache is used in the rcu callback, which passes
the object to be freed.
The rcu call takes the rcu_head, which is passed in as the field in the
struct (in this case rcu in the maple tree node), which is calculated by
pointer math. The offset of which is saved (in a global variable) for
restoring the node pointer on the callback after the rcu grace period
expires.
Don't use any of this outside of testing, please.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Vlastimil Babka [Tue, 5 Nov 2024 16:00:08 +0000 (17:00 +0100)]
mm, slub: sheaf prefilling for guaranteed allocations
Add three functions for efficient guaranteed allocations in a critical
section (that cannot sleep) when the exact number of allocations is not
known beforehand, but an upper limit can be calculated.
kmem_cache_prefill_sheaf() returns a sheaf containing at least given
number of objects.
kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
and is guaranteed not to fail until depleted.
kmem_cache_return_sheaf() is for giving the sheaf back to the slab
allocator after the critical section. This will also attempt to refill
it to cache's sheaf capacity for better efficiency of sheaves handling,
but it's not stricly necessary to succeed.
TODO: the current implementation is limited to cache's sheaf_capacity
Vlastimil Babka [Wed, 28 Aug 2024 09:28:19 +0000 (11:28 +0200)]
mm, slub: cheaper locking for percpu sheaves
Instead of local_lock_irqsave(), use just get_cpu_ptr() (which only
disables preemption) and then set an active flag. If potential callers
include irq handler, the operation must use a trylock variant that bails
out if the flag is already set to active because we interrupted another
operation in progress.
Changing the flag doesn't need to be atomic as the irq is one the same
cpu. This should make using percpu sheaves cheaper, with the downside of
some unlucky operations in irq handlers have to fallback to non-sheave
variants. That should be rare so there should be a net benefit.
On PREEMPT_RT we can use simply local_lock() as that does the right
thing without the need to disable irqs.
Thanks to Mateusz Guzik and Jann Horn for suggesting this kind of
locking scheme in online conversations. Initially attempted to fully
copy the page allocator's pcplist locking, but its reliance on
spin_trylock() made it much more costly.
Create the vm_area_struct cache with percpu sheaves of size 32 to
hopefully improve its performance. For CONFIG_PER_VMA_LOCK, change the
vma freeing from custom call_rcu() callback to kfree_rcu() which
will perform rcu_free sheaf batching. Since there may be additional
structures attached and they are freed only after the grace period,
create a __vma_area_rcu_free_dtor() to do that.
Note I have not investigated whether vma_numab_state_free() or
free_anon_vma_name() must really need to wait for the grace period. For
vma_lock_free() ideally we wouldn't free it at all when freeing the vma
to the sheaf (or even slab page), but that would require using also a
ctor for vmas to allocate the vma lock, and reintroducing dtor support
for deallocating the lock when freeing slab pages containing the vmas.
The plan is to move vma_lock into vma itself anyway, so if the rest can
be freed immediately, the whole destructor support won't be needed
anymore.
Vlastimil Babka [Mon, 7 Aug 2023 18:52:02 +0000 (20:52 +0200)]
maple_tree: use percpu sheaves for maple_node_cache
Setup the maple_node_cache with percpu sheaves of size 32 to hopefully
improve its performance. Change the single node rcu freeing in
ma_free_rcu() to use kfree_rcu() instead of the custom callback, which
allows the rcu_free sheaf batching to be used. Note there are other
users of mt_free_rcu() where larger parts of maple tree are submitted to
call_rcu() as a whole, and that cannot use the rcu_free sheaf, but it's
still possible for maple nodes freed this way to be reused via the barn,
even if only some cpus are allowed to process rcu callbacks.
mm/slub: add sheaf support for batching kfree_rcu() operations
Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
For caches where sheafs are initialized, on each cpu maintain a rcu_free
sheaf in addition to main and spare sheaves.
kfree_rcu() operations will try to put objects on this sheaf. Once full,
the sheaf is detached and submitted to call_rcu() with a handler that
will try to put in on the barn, or flush to slab pages using bulk free,
when the barn is full. Then a new empty sheaf must be obtained to put
more objects there.
It's possible that no free sheafs are available to use for a new
rcu_free sheaf, and the allocation in kfree_rcu() context can only use
GFP_NOWAIT and thus may fail. In that case, fall back to the existing
kfree_rcu() machinery.
Because some intended users will need to perform additonal cleanups
after the grace period and thus have custom rcu_call() callbacks today,
add the possibility to specify a kfree_rcu() specific destructor.
Because of the fall back possibility, the destructor now needs be
invoked also from within RCU, so add __kvfree_rcu() that RCU can use
instead of kvfree().
Expected advantages:
- batching the kfree_rcu() operations, that could eventually replace the
batching done in RCU itself
- sheafs can be reused via barn instead of being flushed to slabs, which
is more effective
- this includes cases where only some cpus are allowed to process rcu
callbacks (Android)
Possible disadvantage:
- objects might be waiting for more than their grace period (it is
determined by the last object freed into the sheaf), increasing memory
usage - but that might be true for the batching done by RCU as well?
RFC LIMITATIONS: - only tree rcu is converted, not tiny
- the rcu fallback might resort to kfree_bulk(), not kvfree(). Instead
of adding a variant of kfree_bulk() with destructors, is there an easy
way to disable the kfree_bulk() path in the fallback case?
Vlastimil Babka [Wed, 15 Nov 2023 10:38:15 +0000 (11:38 +0100)]
mm/slub: add opt-in caching layer of percpu sheaves
Specifying a non-zero value for a new struct kmem_cache_args field
sheaf_capacity will setup a caching layer of percpu arrays called
sheaves of given capacity for the created cache.
Allocations from the cache will allocate via the percpu sheaves (main or
spare) as long as they have no NUMA node preference. Frees will also
refill the the sheaves.
When both percpu sheaves are found empty during an allocation, an empty
sheaf may be replaced with a full one from the per-node barn. If none
are available and the allocation is allowed to block, an empty sheaf is
refilled from slab(s) by an internal bulk alloc operation. When both
percpu sheaves are full during freeing, the barn can replace a full one
with an empty one, unless over a full sheaves limit. In that case a
sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
sheaves and barns is also wired to the existing cpu flushing and cache
shrinking operations.
The sheaves do not distinguish NUMA locality of the cached objects. If
an allocation is requested with kmem_cache_alloc_node() with a specific
node (not NUMA_NO_NODE), sheaves are bypassed.
The bulk operations exposed to slab users also try to utilize the
sheaves as long as the necessary (full or empty) sheaves are available
on the cpu or in the barn. Once depleted, they will fallback to bulk
alloc/free to slabs directly to avoid double copying.
Sysfs stat counters alloc_cpu_sheaf and free_cpu_sheaf count objects
allocated or freed using the sheaves. Counters sheaf_refill,
sheaf_flush_main and sheaf_flush_other count objects filled or flushed
from or to slab pages, and can be used to assess how effective the
caching is. The refill and flush operations will also count towards the
usual alloc_fastpath/slowpath, free_fastpath/slowpath and other
counters.
Access to the percpu sheaves is protected by local_lock_irqsave()
operations, each per-NUMA-node barns has a spin_lock.
A current limitation is that when slub_debug is enabled for a cache with
percpu sheaves, the objects in the array are considered as allocated from
the slub_debug perspective, and the alloc/free debugging hooks occur
when moving the objects between the array and slab pages. This means
that e.g. an use-after-free that occurs for an object cached in the
array is undetected. Collected alloc/free stacktraces might also be less
useful. This limitation could be changed in the future.
On the other hand, KASAN, kmemcg and other hooks are executed on actual
allocations and frees by kmem_cache users even if those use the array,
so their debugging or accounting accuracy should be unaffected.
Vlastimil Babka [Tue, 28 Nov 2023 17:49:22 +0000 (18:49 +0100)]
SLUB percpu sheaves
Hi,
This is a RFC to add an opt-in percpu array-based caching layer to SLUB.
The name "sheaf" was invented by Matthew so we don't call it magazine
like the original Bonwick paper. The per-NUMA-node cache of sheaves is
thus called "barn".
This may seem similar to the arrays in SLAB, but the main differences
are:
- opt-in, not used for every cache
- does not distinguish NUMA locality, thus no "alien" arrays that would
need periodical flushing
- improves kfree_rcu() handling
- API for obtaining a preallocated sheaf that can be used for guaranteed
and efficient allocations in a restricted context, when upper bound is
known but rarely reached
The motivation comes mainly from the ongoing work related to VMA
scalability and the related maple tree operations. This is why maple
tree node and vma caches are sheaf-enabled in the RFC. Performance benefits
were measured by Suren in preliminary non-public versions.
A sheaf-enabled cache has the following expected advantages:
- Cheaper fast paths. For allocations, instead of local double cmpxchg,
with Patch 5 it's preempt_disable() and no atomic operations. Same for
freeing, which is normally a local double cmpxchg only for a short
term allocations (so the same slab is still active on the same cpu when
freeing the object) and a more costly locked double cmpxchg otherwise.
The downside is lack of NUMA locality guarantees for the allocated
objects.
I hope this scheme will also allow (non-guaranteed) slab allocations
in context where it's impossible today and achieved by building caches
on top of slab, i.e. the BPF allocator.
- kfree_rcu() batching. kfree_rcu() will put objects to a separate
percpu sheaf and only submit the whole sheaf to call_rcu() when full.
After the grace period, the sheaf can be used for allocations, which
is more efficient than handling individual slab objects (even with the
batching done by kfree_rcu() implementation itself). In case only some
cpus are allowed to handle rcu callbacks, the sheaf can still be made
available to other cpus on the same node via the shared barn.
Both maple_node and vma caches can benefit from this.
- Preallocation support. A prefilled sheaf can be borrowed for a short
term operation that is not allowed to block and may need to allocate
some objects. If an upper bound (worst case) for the number of
allocations is known, but only much fewer allocations actually needed
on average, borrowing and returning a sheaf is much more efficient then
a bulk allocation for the worst case followed by a bulk free of the
many unused objects. Maple tree write operations should benefit from
this.
Patch 1 implements the basic sheaf functionality and using
local_lock_irqsave() for percpu sheaf locking.
Patch 2 adds the kfree_rcu() support.
Patches 3 and 4 enable sheaves for maple tree nodes and vma's.
Patch 5 replaces the local_lock_irqsave() locking with a cheaper scheme
inspired by online conversations with Mateusz Guzik and Jann Horn. In
the past I have tried to copy the scheme from page allocator's pcplists
that also avoids disabling irqs by using a trylock for operations that
might be attempted from an irq handler conext. But spin locks used for
pcplists are more costly than a simple flag with only compiler barriers.
On the other hand it's not possible to take the lock from a different
cpu (except for hotplug handling when the actual local cpu cannot race
with us), but we don't need that remote locking for sheaves.
Patch 6 implements borrowing prefilled sheaf, with maple tree being the
ancticipated user once converted to use it by someone more knowledgeable
than myself.
(RFC) LIMITATIONS:
- with slub_debug enabled, objects in sheaves are considered allocated
so allocation/free stacktraces may become imprecise and checking of
e.g. redzone violations may be delayed
- kfree_rcu() via sheaf is only hooked to tree rcu, not tiny rcu. Also
in case we fail to allocate a sheaf, and fallback to the existing
implementation, it may use kfree_bulk() where destructors are not
hooked. It's however possible we won't need the destructor support
for now at all if vma_lock is moved to vma itself [1] and if it's
possible to free anon_name and numa balancing tracking immediately
and not after a grace period.
- in case a prefilled sheaf is requested with more objects than the
cache's sheaf_capacity, it will fail. This should be possible to
handle by allocating a bigger sheaf and then freeing it when returned,
to avoid mixing up different sizes. Ineffective, but acceptable if
very rare.
To: Suren Baghdasaryan <surenb@google.com>
To: Liam R. Howlett <Liam.Howlett@oracle.com>
To: Christoph Lameter <cl@linux.com>
To: David Rientjes <rientjes@google.com>
To: Pekka Enberg <penberg@kernel.org>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: rcu@vger.kernel.org Cc: maple-tree@lists.infradead.org
---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz
--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
"series": {
"revision": 2,
"change-id": "20231128-slub-percpu-caches-9441892011d7",
"prefixes": [
"RFC"
],
"from-thread": "20230810163627.6206-9-vbabka@suse.cz",
"history": {
"v3": [
"20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz"
],
"v1": [
"20240712-slub-percpu-caches-v1-0-3e61f0ad8996@suse.cz",
"20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz"
]
}
}
}
Linus Torvalds [Sun, 10 Nov 2024 22:16:28 +0000 (14:16 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of Qualcomm clk driver fixes:
- Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks
- Fix alpha PLL post_div mask for the cases where width is not
specified
- Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
trigger feature on the video clocks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
Linus Torvalds [Sun, 10 Nov 2024 22:13:05 +0000 (14:13 -0800)]
Merge tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c-host fixes for v6.12-rc7 (from Andi):
- Fix designware incorrect behavior when concluding a transmission
- Fix Mule multiplexer error value evaluation"
* tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
i2c: muxes: Fix return value check in mule_i2c_mux_probe()
If the caller supplies an iocb->ki_pos value that is close to the
filesystem upper limit, and an iterator with a count that causes us to
overflow that limit, then filemap_read() enters an infinite loop.
This behaviour was discovered when testing xfstests generic/525 with the
"localio" optimisation for loopback NFS mounts.
Reported-by: Mike Snitzer <snitzer@kernel.org> Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Tested-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 10 Nov 2024 17:37:47 +0000 (09:37 -0800)]
Merge tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Make sure GICv3 controller interrupt activation doesn't race with a
concurrent deactivation due to propagation delays of the register
write
* tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Force propagation of the active state with a read-back
Linus Torvalds [Sun, 10 Nov 2024 17:04:27 +0000 (09:04 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons"
* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Thorsten Blum
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
signal: restore the override_rlimit logic
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
ucounts: fix counter leak in inc_rlimit_get_ucounts()
selftests: hugetlb_dio: check for initial conditions to skip in the start
mm: fix docs for the kernel parameter ``thp_anon=``
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
mm/damon/core: handle zero schemes apply interval
mm/damon/core: handle zero {aggregation,ops_update} intervals
mm/mlock: set the correct prev on failure
objpool: fix to make percpu slot allocation more robust
mm/page_alloc: keep track of free highatomic
mm: resolve faulty mmap_region() error path behaviour
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: refactor map_deny_write_exec()
mm: unconditionally close VMAs on error
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm/thp: fix deferred split unqueue naming and locking
mm/thp: fix deferred split queue not partially_mapped
Linus Torvalds [Sun, 10 Nov 2024 16:56:48 +0000 (08:56 -0800)]
Merge tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small remaining USB and Thunderbolt fixes and device ids
for 6.12-rc7. Included in here are:
- new USB serial driver device ids
- thunderbolt driver fixes for reported problems
- typec bugfixes
- dwc3 driver fix
- musb driver fix
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: qcserial: add support for Sierra Wireless EM86xx
thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
usb: dwc3: fix fault at system suspend if device was already runtime suspended
usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier
usb: musb: sunxi: Fix accessing an released usb phy
USB: serial: io_edgeport: fix use after free in debug printk
USB: serial: option: add Quectel RG650V
USB: serial: option: add Fibocom FG132 0x0112 composition
thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
Linus Torvalds [Sun, 10 Nov 2024 16:53:24 +0000 (08:53 -0800)]
Merge tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small memory leak fixes for the vchiq_arm staging driver
that have been sitting in my tree for weeks and should get merged for
6.12-rc7 so that people don't keep tripping over them.
They both have been in linux-next for a while with no reported
problems"
* tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation
staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
Linus Torvalds [Fri, 8 Nov 2024 23:20:45 +0000 (13:20 -1000)]
Merge tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix one issue in the qcom lmh thermal driver, a DT handling
issue in the thermal core and two issues in the userspace thermal
library:
- Allow tripless thermal zones defined in a DT to be registered in
accordance with the thermal DT bindings (Icenowy Zheng)
- Annotate LMH IRQs with lockdep classes to prevent lockdep from
reporting a possible recursive locking issue that cannot really
occur (Dmitry Baryshkov)
- Improve the thermal library "make clean" to remove a leftover
symbolic link created during compilation and fix the sampling
handler invocation in that library to pass the correct pointer to
it (Emil Dahl Juhl, zhang jiao)"
* tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/of: support thermal zones w/o trips subnode
tools/lib/thermal: Remove the thermal.h soft link when doing make clean
tools/lib/thermal: Fix sampling handler context ptr
thermal/drivers/qcom/lmh: Remove false lockdep backtrace
Linus Torvalds [Fri, 8 Nov 2024 23:13:54 +0000 (13:13 -1000)]
Merge tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix the asymmetric CPU capacity support code in the intel_pstate
driver, added during this develompent cycle, to address a corner case
in which the capacity of a CPU going online is not updated (Rafael
Wysocki)"
* tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration
Linus Torvalds [Fri, 8 Nov 2024 23:08:23 +0000 (13:08 -1000)]
Merge tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix the ACPI processor driver initialization ordering after recent
changes to avoid calling init_freq_invariance_cppc() too early on AMD
platforms (Mario Limonciello)"
* tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Move arch_init_invariance_cppc() call later
Linus Torvalds [Fri, 8 Nov 2024 19:56:27 +0000 (09:56 -1000)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, the drivers one in ufs simply delays running a work
queue and the generic one in zoned storage switches to a more correct
API that tries the standard buddy allocator first (for small
allocations); this fixes an allocation problem with small allocations
seen under memory pressure"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Start the RTC update work later
scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer
Linus Torvalds [Fri, 8 Nov 2024 19:49:32 +0000 (09:49 -1000)]
Merge tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, usual leaders in amdgpu and xe, then a panel quirk, and
some fixes to imagination and panthor drivers. Seems around the usual
level for this time and don't know of any big problems.
panthor:
- Lock VM array
- Be strict about I/O mapping flags
xe:
- Fix ccs_mode setting for Xe2 and later
- Synchronize ccs_mode setting with client creation
- Apply scheduling WA for LNL in additional places as needed
- Fix leak and lock handling in error paths of xe_exec ioctl
- Fix GGTT allocation leak leading to eventual crash in SR-IOV
- Move run_ticks update out of job handling to avoid synchronization
with reader"
* tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
drm/panthor: Be stricter about IO mapping flags
drm/panthor: Lock XArray when getting entries for the VM
drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strict
drm/xe: Stop accumulating LRC timestamp on job_free
drm/xe/pf: Fix potential GGTT allocation leak
drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
drm/xe: Fix possible exec queue leak in exec IOCTL
drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
drm/amdgpu: Adjust debugfs eviction and IB access permissions
drm/amdgpu: Adjust debugfs register access permissions
drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
drm/amd/pm: correct the workload setting
drm/amd/pm: always pick the pptable from IFWI
drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
drm/amd/display: parse umc_info or vram_info based on ASIC
drm/amd/display: Fix brightness level not retained over reboot
drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout
drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout
drm/xe: Move LNL scheduling WA to xe_device.h
drm/xe: Use the filelist from drm for ccs_mode change
...
Dave Airlie [Fri, 8 Nov 2024 19:14:28 +0000 (05:14 +1000)]
Merge tag 'drm-xe-fixes-2024-11-08' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix ccs_mode setting for Xe2 and later (Balasubramani)
- Synchronize ccs_mode setting with client creation (Balasubramani)
- Apply scheduling WA for LNL in additional places as needed
(Nirmoy)
- Fix leak and lock handling in error paths of xe_exec ioctl
(Matthew Brost)
- Fix GGTT allocation leak leading to eventual crash in SR-IOV
(Michal Wajdeczko)
- Move run_ticks update out of job handling to avoid synchronization
with reader (Lucas)
Liu Peibao [Fri, 1 Nov 2024 08:12:43 +0000 (16:12 +0800)]
i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
When the Tx FIFO is empty and the last command has no STOP bit
set, the master holds SCL low. If I2C_DYNAMIC_TAR_UPDATE is not
set, BIT(13) MST_ON_HOLD of IC_RAW_INTR_STAT is not enabled,
causing the __i2c_dw_disable() timeout. This is quite similar to
commit 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in
case master is holding SCL low"). Also check BIT(7)
MST_HOLD_TX_FIFO_EMPTY in IC_STATUS, which is available when
IC_STAT_FOR_CLK_STRETCH is set.
Fixes: 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low") Co-developed-by: Xiaowu Ding <xiaowu.ding@jaguarmicro.com> Signed-off-by: Xiaowu Ding <xiaowu.ding@jaguarmicro.com> Co-developed-by: Angus Chen <angus.chen@jaguarmicro.com> Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com> Signed-off-by: Liu Peibao <loven.liu@jaguarmicro.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Linus Torvalds [Fri, 8 Nov 2024 17:44:28 +0000 (07:44 -1000)]
Merge tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Still more changes floating than wished at this late stage, but all
are small device-specific fixes, and look less troublesome.
Including a few ASoC quirk / ID additoins, a series of ASoC STM fixes,
HD-audio conexant codec regression fix, and other various quirks and
device-specific fixes"
* tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits
ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()
ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
ASoC: amd: yc: Support dmic on another model of Lenovo Thinkpad E14 Gen 6
ASoC: SOF: amd: Fix for incorrect DMA ch status register offset
ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022
ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove
MAINTAINERS: Generic Sound Card section
ALSA: usb-audio: Add quirk for HP 320 FHD Webcam
ASoC: tas2781: Add new driver version for tas2563 & tas2781 qfn chip
ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
ALSA: ump: Don't enumeration invalid groups for legacy rawmidi
Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"
Linus Torvalds [Fri, 8 Nov 2024 17:41:27 +0000 (07:41 -1000)]
Merge tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- dvb-core fixes for vb2 check and device registration
- v4l2-core: fix an issue with error handling for VIDIOC_G_CTRL
- vb2 core: fix an issue with vb plane copy logic
- videobuf2-core: copy vb planes unconditionally
- vivid: fix buffer overwrite when using > 32 buffers
- vivid: fix a potential division by zero due to an issue at v4l2-tpg
- some spectre vulnerability fixes
- several OOM access fixes
- some buffer overflow fixes
* tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2-core: copy vb planes unconditionally
media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
media: vivid: fix buffer overwrite when using > 32 buffers
media: pulse8-cec: fix data timestamp at pulse8_setup()
media: cec: extron-da-hd-4k-plus: don't use -1 as an error code
media: stb0899_algo: initialize cfr before using it
media: adv7604: prevent underflow condition when reporting colorspace
media: cx24116: prevent overflows on SNR calculus
media: ar0521: don't overflow when checking PLL values
media: s5p-jpeg: prevent buffer overflows
media: av7110: fix a spectre vulnerability
media: mgb4: protect driver against spectre
media: dvb_frontend: don't play tricks with underflow values
media: dvbdev: prevent the risk of out of memory access
media: v4l2-tpg: prevent the risk of a division by zero
media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl()
media: dvb-core: add missing buffer index check
Linus Torvalds [Fri, 8 Nov 2024 17:35:16 +0000 (07:35 -1000)]
Merge tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Fix for duplicate caches in some arm64 configurations with
CONFIG_SLAB_BUCKETS (Koichiro Den)
* tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create
Linus Torvalds [Fri, 8 Nov 2024 17:31:03 +0000 (07:31 -1000)]
Merge tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more one-liners that fix some user visible problems:
- use correct range when clearing qgroup reservations after COW
- properly reset freed delayed ref list head
- fix ro/rw subvolume mounts to be backward compatible with old and
new mount API"
* tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the length of reserved qgroup to free
btrfs: reinitialize delayed ref list after deleting it from the list
btrfs: fix per-subvolume RO/RW flags with new mount API
Linus Torvalds [Fri, 8 Nov 2024 17:27:14 +0000 (07:27 -1000)]
Merge tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Some trivial syzbot fixes, two more serious btree fixes found by
looping single_devices.ktest small_nodes:
- Topology error on split after merge, where we accidentaly picked
the node being deleted for the pivot, resulting in an assertion pop
- New nodes being preallocated were left on the freedlist, unlocked,
resulting in them sometimes being accidentally freed: this dated
from pre-cycle detector, when we could leave them locked. This
should have resulted in more explosions and fireworks, but turned
out to be surprisingly hard to hit because the preallocated nodes
were being used right away.
The fix for this is bigger than we'd like - reworking btree list
handling was a bit invasive - but we've now got more assertions and
it's well tested.
- Also another mishandled transaction restart fix (in
btree_node_prefetch) - we're almost done with those"
* tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix UAF in __promote_alloc() error path
bcachefs: Change OPT_STR max to be 1 less than the size of choices array
bcachefs: btree_cache.freeable list fixes
bcachefs: check the invalid parameter for perf test
bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery
bcachefs: Fix topology errors on split after merge
bcachefs: Ancient versions with bad bkey_formats are no longer supported
bcachefs: Fix error handling in bch2_btree_node_prefetch()
bcachefs: Fix null ptr deref in bucket_gen_get()
Linus Torvalds [Fri, 8 Nov 2024 17:19:58 +0000 (07:19 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here is a (hopefully) final round of arm64 fixes for 6.12 that address
some user-visible floating point register corruption. Both of the
Marks have been working on this for a couple of weeks and we've ended
up in a position where SVE is solid but SME still has enough pending
issues that the most pragmatic solution for the release and stable
backports is to disable the feature. Yes, it's a shame, but the
hardware is rare as hen's teeth at the moment and we're better off
getting back to a known good state before fixing it all properly.
We're also improving the selftests for 6.13 to help avoid merging
broken code in the future.
Anyway, the good news is that we're removing a lot more code than
we're adding.
Summary:
- Fix handling of SVE traps from userspace on preemptible kernels
when converting the saved floating point state into SVE state.
- Remove broken support for the SMCCCv1.3 "SVE discard hint"
optimisation.
- Disable SME support, as the current support code suffers from
numerous issues around signal delivery, ptrace access and
context-switch which can lead to user-visible corruption of the
register state"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Kconfig: Make SME depend on BROKEN for now
arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
arm64/sve: Discard stale CPU state when handling SVE traps
Linus Torvalds [Fri, 8 Nov 2024 17:16:01 +0000 (07:16 -1000)]
Merge tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Madhavan Srinivasan:
- Fix spurious interrupts in Book3S HV Nested KVM
Thanks to Gautam Menghani.
* tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts
Greg Kroah-Hartman [Fri, 8 Nov 2024 07:36:31 +0000 (08:36 +0100)]
Merge tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 6.12-rc7
Here's a fix for a long-standing use-after-free in an io_edgeport debug
printk and some new modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: qcserial: add support for Sierra Wireless EM86xx
USB: serial: io_edgeport: fix use after free in debug printk
USB: serial: option: add Quectel RG650V
USB: serial: option: add Fibocom FG132 0x0112 composition
Linus Torvalds [Thu, 7 Nov 2024 22:49:36 +0000 (12:49 -1000)]
Merge tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small fixes for drivers, nothing particularly remarkable"
* tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rk808: Add apply_bit for BUCK3 on RK809
regulator: rtq2208: Fix uninitialized use of regulator_config
Reproducer uses faultinject facility to fail ocfs2_xa_remove() ->
ocfs2_xa_value_truncate() with -ENOMEM.
In this case the comment mentions that we can return 0 if
ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
anyway. But the following 'rc' check is wrong and execution flow do
'ocfs2_xa_remove_entry(loc);' twice:
* 1st: in ocfs2_xa_cleanup_value_truncate();
* 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'.
Fix this by skipping the 2nd removal of the same entry and making
syzkaller repro happy.
Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.") Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Reported-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/ Tested-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Mon, 4 Nov 2024 19:54:19 +0000 (19:54 +0000)]
signal: restore the override_rlimit logic
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error. From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'. This can lead to unpredictable behavior and
crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set. This effectively
restores the old behavior.
Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Co-developed-by: Andrei Vagin <avagin@google.com> Signed-off-by: Andrei Vagin <avagin@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Alexey Gladkov <legion@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrei Vagin [Fri, 1 Nov 2024 19:19:40 +0000 (19:19 +0000)]
ucounts: fix counter leak in inc_rlimit_get_ucounts()
The inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit. If the value exceeds the limit, the function
returns an error without decrementing the counter.
Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting") Signed-off-by: Andrei Vagin <avagin@google.com> Co-developed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Alexey Gladkov <legion@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Andrei Vagin <avagin@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muhammad Usama Anjum [Fri, 1 Nov 2024 14:15:57 +0000 (19:15 +0500)]
selftests: hugetlb_dio: check for initial conditions to skip in the start
The test should be skipped if initial conditions aren't fulfilled in the
start instead of failing and outputting non-compliant TAP logs. This kind
of failure pollutes the results. The initial conditions are:
- The test should only execute if /tmp file can be allocated.
- The test should only execute if huge pages are free.
Before:
TAP version 13
1..4
Bail out! Error opening file
: Read-only file system (30)
# Planned tests != run tests (4 != 0)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
After:
TAP version 13
1..0 # SKIP Unable to allocate file: Read-only file system
Link: https://lkml.kernel.org/r/20241101141557.3159432-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()") Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```,
as [KMG] must follow each number to especify its unit. So, the correct
format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```.
Therefore, adjust the documentation to reflect the correct format of the
parameter ``thp_anon=``.
Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com Fixes: dd4d30d1cdbe ("mm: override mTHP "enabled" defaults at kernel cmdline") Signed-off-by: Maíra Canal <mcanal@igalia.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Thu, 31 Oct 2024 16:12:03 +0000 (09:12 -0700)]
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
damon_feed_loop_next_input() is inefficient and fragile to overflows.
Specifically, 'score_goal_diff_bp' calculation can overflow when 'score'
is high. The calculation is actually unnecessary at all because 'goal' is
a constant of value 10,000. Calculation of 'compensation' is again
fragile to overflow. Final calculation of return value for under-achiving
case is again fragile to overflow when the current score is
under-achieving the target.
Add two corner cases handling at the beginning of the function to make the
body easier to read, and rewrite the body of the function to avoid
overflows and the unnecessary bp value calcuation.
SeongJae Park [Thu, 31 Oct 2024 18:37:57 +0000 (11:37 -0700)]
mm/damon/core: handle zero schemes apply interval
DAMON's logics to determine if this is the time to apply damos schemes
assumes next_apply_sis is always set larger than current
passed_sample_intervals. And therefore assume continuously incrementing
passed_sample_intervals will make it reaches to the next_apply_sis in
future. The logic hence does apply the scheme and update next_apply_sis
only if passed_sample_intervals is same to next_apply_sis.
If Schemes apply interval is set as zero, however, next_apply_sis is set
same to current passed_sample_intervals, respectively. And
passed_sample_intervals is incremented before doing the next_apply_sis
check. Hence, next_apply_sis becomes larger than next_apply_sis, and the
logic says it is not the time to apply schemes and update next_apply_sis.
In other words, DAMON stops applying schemes until passed_sample_intervals
overflows.
Based on the documents and the common sense, a reasonable behavior for
such inputs would be applying the schemes for every sampling interval.
Handle the case by removing the assumption.
Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Thu, 31 Oct 2024 18:37:56 +0000 (11:37 -0700)]
mm/damon/core: handle zero {aggregation,ops_update} intervals
Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
DAMON's internal intervals accounting logic is not correctly handling
non-sampling intervals of zero values for a wrong assumption. This could
cause unexpected monitoring behavior, and even result in infinite hang of
DAMON sysfs interface user threads in case of zero aggregation interval.
Fix those by updating the intervals accounting logic. For details of the
root case and solutions, please refer to commit messages of fixes.
This patch (of 2):
DAMON's logics to determine if this is the time to do aggregation and ops
update assumes next_{aggregation,ops_update}_sis are always set larger
than current passed_sample_intervals. And therefore it further assumes
continuously incrementing passed_sample_intervals every sampling interval
will make it reaches to the next_{aggregation,ops_update}_sis in future.
The logic therefore make the action and update
next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same
to the counts, respectively.
If Aggregation interval or Ops update interval are zero, however,
next_aggregation_sis or next_ops_update_sis are set same to current
passed_sample_intervals, respectively. And passed_sample_intervals is
incremented before doing the next_{aggregation,ops_update}_sis check.
Hence, passed_sample_intervals becomes larger than
next_{aggregation,ops_update}_sis, and the logic says it is not the time
to do the action and update next_{aggregation,ops_update}_sis forever,
until an overflow happens. In other words, DAMON stops doing aggregations
or ops updates effectively forever, and users cannot get monitoring
results.
Based on the documents and the common sense, a reasonable behavior for
such inputs is doing an aggregation and an ops update for every sampling
interval. Handle the case by removing the assumption.
Note that this could incur particular real issue for DAMON sysfs interface
users, in case of zero Aggregation interval. When user starts DAMON with
zero Aggregation interval and asks online DAMON parameter tuning via DAMON
sysfs interface, the request is handled by the aggregation callback.
Until the callback finishes the work, the user who requested the online
tuning just waits. Hence, the user will be stuck until the
passed_sample_intervals overflows.
Wei Yang [Sun, 27 Oct 2024 12:33:21 +0000 (12:33 +0000)]
mm/mlock: set the correct prev on failure
After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma()
pattern for mprotect() et al."), if vma_modify_flags() return error, the
vma is set to an error code. This will lead to an invalid prev be
returned.
Generally this shouldn't matter as the caller should treat an error as
indicating state is now invalidated, however unfortunately
apply_mlockall_flags() does not check for errors and assumes that
mlock_fixup() correctly maintains prev even if an error were to occur.
This patch fixes that assumption.
[lorenzo.stoakes@oracle.com: provide a better fix and rephrase the log] Link: https://lkml.kernel.org/r/20241027123321.19511-1-richard.weiyang@gmail.com Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
objpool: fix to make percpu slot allocation more robust
Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it
will use kmalloc if user specifies that combination. Here the reason why
combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does
not support all GFP flag, especially GFP_ATOMIC. So we should check if
gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first. This
ensures caller can sleep. And for the robustness, even if vmalloc fails,
it should retry with kmalloc to allocate it.
Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com Fixes: aff1871bfc81 ("objpool: fix choosing allocation for percpu slots") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/ Cc: Leo Yan <leo.yan@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wu <wuqiang.matt@bytedance.com> Cc: Mikel Rychliski <mikel@mikelr.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The second line above shows that the OOM kill was due to the following
condition:
free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'.
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.
The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,
If the caller does not have rights to reserves below the min
watermark then subtract the high-atomic reserves. This will
over-estimate the size of the atomic reserve but it avoids a search.
However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory. And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.
Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.
Piotr Zalewski [Wed, 6 Nov 2024 19:46:30 +0000 (19:46 +0000)]
bcachefs: Change OPT_STR max to be 1 less than the size of choices array
Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices"
array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text.
The "_choices" array is a null-terminated array, so computing the maximum
using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since
bch2_opt_validate don't subtract 1, as bch2_opt_to_text does, values
bigger than the actual maximum would pass through option validation.
Reported-by: syzbot+bee87a0c3291c06aa8c6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6 Fixes: 63c4b2545382 ("bcachefs: Better superblock opt validation") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 05:17:54 +0000 (01:17 -0400)]
bcachefs: btree_cache.freeable list fixes
When allocating new btree nodes, we were leaving them on the freeable
list - unlocked - allowing them to be reclaimed: ouch.
Additionally, bch2_btree_node_free_never_used() ->
bch2_btree_node_hash_remove was putting it on the freelist, while
bch2_btree_node_free_never_used() was putting it back on the btree
update reserve list - ouch.
Originally, the code was written to always keep btree nodes on a list -
live or freeable - and this worked when new nodes were kept locked.
But now with the cycle detector, we can't keep nodes locked that aren't
tracked by the cycle detector; and this is fine as long as they're not
reachable.
We also have better and more robust leak detection now, with memory
allocation profiling, so the original justification no longer applies.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Tue, 29 Oct 2024 12:53:29 +0000 (20:53 +0800)]
bcachefs: check the invalid parameter for perf test
The perf_test does not check the number of iterations and threads
when it is zero. If nr_thread is 0, the perf test will keep
waiting for wakekup. If iteration is 0, it will cause exception
of division by zero. This can be reproduced by:
echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test
or
echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test
Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Pei Xiao [Wed, 30 Oct 2024 07:48:01 +0000 (15:48 +0800)]
bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
bio_kmalloc may return NULL, will cause NULL pointer dereference.
Add check NULL return for bio_kmalloc in journal_read_bucket.
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 06:36:21 +0000 (02:36 -0400)]
bcachefs: Ancient versions with bad bkey_formats are no longer supported
Syzbot found an assertion pop, by generating an ancient filesystem
version with an invalid bkey_format (with fields that can overflow) as
well as packed keys that aren't representable unpacked.
This breaks key comparisons in all sorts of painful ways.
Filesystems have been automatically rewriting nodes with such invalid
formats for years; we can safely drop support for them.
Reported-by: syzbot+8a0109511de9d4b61217@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 28 Oct 2024 00:40:20 +0000 (20:40 -0400)]
bcachefs: Fix null ptr deref in bucket_gen_get()
bucket_gen() checks if we're lookup up a valid bucket and returns NULL
otherwise, but bucket_gen_get() was failing to check; other callers were
correct.
Also do a bit of cleanup on callers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing
configuration when switching CAN modes
Misc:
- revert earlier hns3 fixes, they were ignoring IOMMU abstractions
and need to be reworked
- can: {cc770,sja1000}_isa: allow building on x86_64"
* tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path
net/smc: do not leave a dangling sk pointer in __smc_create()
rxrpc: Fix missing locking causing hanging calls
net/smc: Fix lookup of netdev by using ib_device_get_netdev()
net: arc: rockchip: fix emac mdio node support
net: arc: fix the device for dma_map_single/dma_unmap_single
virtio_net: Update rss when set queue
virtio_net: Sync rss config to device when virtnet_probe
virtio_net: Add hash_key_length check
virtio_net: Support dynamic rss indirection table size
netfilter: nf_tables: wait for rcu grace period on net_device removal
net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
net: vertexcom: mse102x: Fix possible double free of TX skb
mptcp: use sock_kfree_s instead of kfree
mptcp: no admin perm to list endpoints
net: phy: ti: add PHY_RST_AFTER_CLK_EN flag
net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7
net: hns3: fix kernel crash when uninstalling driver
Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"
...
The ionic_setup_one() creates a debugfs entry for ionic upon
successful execution. However, the ionic_probe() does not
release the dentry before returning, resulting in a memory
leak.
To fix this bug, we add the ionic_debugfs_del_dev() to release
the resources in a timely manner before returning.
Fixes: 0de38d9f1dba ("ionic: extract common bits from ionic_probe") Signed-off-by: Wentao Liang <Wentao_liang_g@163.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20241107021756.1677-1-liangwentao@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Wed, 6 Nov 2024 13:03:22 +0000 (13:03 +0000)]
rxrpc: Fix missing locking causing hanging calls
If a call gets aborted (e.g. because kafs saw a signal) between it being
queued for connection and the I/O thread picking up the call, the abort
will be prioritised over the connection and it will be removed from
local->new_client_calls by rxrpc_disconnect_client_call() without a lock
being held. This may cause other calls on the list to disappear if a race
occurs.
Fix this by taking the client_call_lock when removing a call from whatever
list its ->wait_link happens to be on.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org Reported-by: Marc Dionne <marc.dionne@auristor.com> Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Link: https://patch.msgid.link/726660.1730898202@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wenjia Zhang [Wed, 6 Nov 2024 08:26:12 +0000 (09:26 +0100)]
net/smc: Fix lookup of netdev by using ib_device_get_netdev()
The SMC-R variant of the SMC protocol used direct call to function
ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device
driver to run SMC-R, it failed to find a device, because in mlx5_ib the
internal net device management for retrieving net devices was replaced
by a common interface ib_device_get_netdev() in commit 8d159eb2117b
("RDMA/mlx5: Use IB set_netdev and get_netdev functions").
Since such direct accesses to the internal net device management is not
recommended at all, update the SMC-R code to use proper API
ib_device_get_netdev().
Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration") Reported-by: Aswin K <aswin@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20241106082612.57803-1-wenjia@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 7 Nov 2024 17:41:34 +0000 (07:41 -1000)]
Merge tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"Fix period setting in imx-tpm driver and a maintainer update
Erik Schumacher found and fixed a problem in the calculation of the
PWM period setting yielding too long periods. Trevor Gamblin - who
already cared about mainlining the pwm-axi-pwmgen driver - stepped
forward as an additional reviewer.
Thanks to Erik and Trevor"
* tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
MAINTAINERS: add self as reviewer for AXI PWM GENERATOR
pwm: imx-tpm: Use correct MODULO value for EPWM mode
David Wang [Wed, 6 Nov 2024 02:12:28 +0000 (10:12 +0800)]
proc/softirqs: replace seq_printf with seq_put_decimal_ull_width
seq_printf is costy, on a system with n CPUs, reading /proc/softirqs
would yield 10*n decimal values, and the extra cost parsing format string
grows linearly with number of cpus. Replace seq_printf with
seq_put_decimal_ull_width have significant performance improvement.
On an 8CPUs system, reading /proc/softirqs show ~40% performance
gain with this patch.
Signed-off-by: David Wang <00107082@163.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jann Horn [Mon, 4 Nov 2024 23:17:13 +0000 (00:17 +0100)]
drm/panthor: Be stricter about IO mapping flags
The current panthor_device_mmap_io() implementation has two issues:
1. For mapping DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET,
panthor_device_mmap_io() bails if VM_WRITE is set, but does not clear
VM_MAYWRITE. That means userspace can use mprotect() to make the mapping
writable later on. This is a classic Linux driver gotcha.
I don't think this actually has any impact in practice:
When the GPU is powered, writes to the FLUSH_ID seem to be ignored; and
when the GPU is not powered, the dummy_latest_flush page provided by the
driver is deliberately designed to not do any flushes, so the only thing
writing to the dummy_latest_flush could achieve would be to make *more*
flushes happen.
2. panthor_device_mmap_io() does not block MAP_PRIVATE mappings (which are
mappings without the VM_SHARED flag).
MAP_PRIVATE in combination with VM_MAYWRITE indicates that the VMA has
copy-on-write semantics, which for VM_PFNMAP are semi-supported but
fairly cursed.
In particular, in such a mapping, the driver can only install PTEs
during mmap() by calling remap_pfn_range() (because remap_pfn_range()
wants to **store the physical address of the mapped physical memory into
the vm_pgoff of the VMA**); installing PTEs later on with a fault
handler (as panthor does) is not supported in private mappings, and so
if you try to fault in such a mapping, vmf_insert_pfn_prot() splats when
it hits a BUG() check.
Fix it by clearing the VM_MAYWRITE flag (userspace writing to the FLUSH_ID
doesn't make sense) and requiring VM_SHARED (copy-on-write semantics for
the FLUSH_ID don't make sense).
Reproducers for both scenarios are in the notes of my patch on the mailing
list; I tested that these bugs exist on a Rock 5B machine.
Note that I only compile-tested the patch, I haven't tested it; I don't
have a working kernel build setup for the test machine yet. Please test it
before applying it.
Cc: stable@vger.kernel.org Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105-panthor-flush-page-fixes-v1-1-829aaf37db93@google.com
Jakub Kicinski [Thu, 7 Nov 2024 16:16:42 +0000 (08:16 -0800)]
Merge tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fix for net
The following series contains a Netfilter fix:
1) Wait for rcu grace period after netdevice removal is reported via event.
* tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: wait for rcu grace period on net_device removal
====================
Jyri Sarha [Thu, 7 Nov 2024 13:28:40 +0000 (15:28 +0200)]
ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits
Write the size of the optional payload of SOF_IPC4_MOD_INIT_INSTANCE
message to extension param_size-bits.
The previous IPC4 version does not set these bits that should indicate
the size of the optional payload (struct sof_ipc4_probe_cfg). The old
firmware side component code works well without these bits, but when
the probes are converted to use the generic module API, this does not
work anymore.
Fixes: f5623593060f ("ASoC: SOF: IPC4: probes: Implement IPC4 ops for probes client device") Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/20241107132840.17386-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
Liviu Dudau [Wed, 6 Nov 2024 18:58:06 +0000 (18:58 +0000)]
drm/panthor: Lock XArray when getting entries for the VM
Similar to commit cac075706f29 ("drm/panthor: Fix race when converting
group handle to group object") we need to use the XArray's internal
locking when retrieving a vm pointer from there.
v2: Removed part of the patch that was trying to protect fetching
the heap pointer from XArray, as that operation is protected by
the @pool->lock.
Fixes: 647810ec2476 ("drm/panthor: Add the MMU/VM logical block") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106185806.389089-1-liviu.dudau@arm.com
Hans de Goede [Sun, 25 Aug 2024 13:21:31 +0000 (15:21 +0200)]
drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strict
There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it
turns out that the 2G version has a DMI product name of
"CHERRYVIEW D1 PLATFORM" where as the 4G version has
"CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are
unique enough that the product-name check is not necessary.
Drop the product-name check so that the existing DMI match for the 4G
RAM version also matches the 2G RAM version.
Greg Kroah-Hartman [Thu, 7 Nov 2024 15:11:57 +0000 (16:11 +0100)]
Merge tag 'thunderbolt-for-v6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus
thunderbolt: Fixes for v6.12-rc7
This includes following USB4/Thunderbolt fixes for v6.12-rc7:
- Fix for retimer enumeration.
- Fix connection issue with Pluggable UD-4VPD USB4 dock.
Both have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
Chuck Lever [Wed, 6 Nov 2024 21:55:05 +0000 (16:55 -0500)]
NFSD: Fix READDIR on NFSv3 mounts of ext4 exports
I noticed that recently, simple operations like "make" started
failing on NFSv3 mounts of ext4 exports. Network capture shows that
READDIRPLUS operated correctly but READDIR failed with
NFS3ERR_INVAL. The vfs_llseek() call returned EINVAL when it is
passed a non-zero starting directory cookie.
I bisected to commit c689bdd3bffa ("nfsd: further centralize
protocol version checks.").
Turns out that nfsd3_proc_readdir() does not call fh_verify() before
it calls nfsd_readdir(), so the new fhp->fh_64bit_cookies boolean is
not set properly. This leaves the NFSD_MAY_64BIT_COOKIE unset when
the directory is opened.
For ext4, this causes the wrong "max file size" value to be used
when sanity checking the incoming directory cookie (which is a seek
offset value).
The fhp->fh_64bit_cookies boolean is /always/ properly initialized
after nfsd_open() returns. There doesn't seem to be a reason for the
generic NFSD open helper to handle the f_mode fix-up for
directories, so just move that to the one caller that tries to open
an S_IFDIR with NFSD_MAY_64BIT_COOKIE.
Suggested-by: NeilBrown <neilb@suse.de> Fixes: c689bdd3bffa ("nfsd: further centralize protocol version checks.") Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Luo Yifan [Thu, 7 Nov 2024 01:59:36 +0000 (09:59 +0800)]
ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()
This patch checks if div is less than or equal to zero (div <= 0). If
div is zero or negative, the function returns -EINVAL, ensuring the
division operation is safe to perform.
Luo Yifan [Wed, 6 Nov 2024 01:46:54 +0000 (09:46 +0800)]
ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
This patch checks if div is less than or equal to zero (div <= 0). If
div is zero or negative, the function returns -EINVAL, ensuring the
division operation (*prate / div) is safe to perform.
Paolo Abeni [Thu, 7 Nov 2024 12:39:43 +0000 (13:39 +0100)]
Merge branch 'fix-the-arc-emac-driver'
Andy Yan says:
====================
Fix the arc emac driver
The arc emac driver was broken for a long time,
The first broken happens when a dma releated fix introduced in Linux 5.10.
The second broken happens when a emac device tree node restyle introduced
in Linux 6.1.
These two patches are try to make the arc emac work again.
Changes in v2:
- Add cover letter.
- Add fix tag.
- Add more detail explaination.
====================
Johan Jonker [Mon, 4 Nov 2024 13:01:39 +0000 (21:01 +0800)]
net: arc: rockchip: fix emac mdio node support
The binding emac_rockchip.txt is converted to YAML.
Changed against the original binding is an added MDIO subnode.
This make the driver failed to find the PHY, and given the 'mdio
has invalid PHY address' it is probably looking in the wrong node.
Fix emac_mdio.c so that it can handle both old and new
device trees.
Fixes: 1dabb74971b3 ("ARM: dts: rockchip: restyle emac nodes") Signed-off-by: Johan Jonker <jbx6244@gmail.com> Tested-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20220603163539.537-3-jbx6244@gmail.com Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Johan Jonker [Mon, 4 Nov 2024 13:01:38 +0000 (21:01 +0800)]
net: arc: fix the device for dma_map_single/dma_unmap_single
The ndev->dev and pdev->dev aren't the same device, use ndev->dev.parent
which has dma_mask, ndev->dev.parent is just pdev->dev.
Or it would cause the following issue:
[ 39.933526] ------------[ cut here ]------------
[ 39.938414] WARNING: CPU: 1 PID: 501 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x90/0x1f8
Fixes: f959dcd6ddfd ("dma-direct: Fix potential NULL pointer dereference") Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Johan Jonker <jbx6244@gmail.com> Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Copy the relevant data from userspace to the vb->planes unconditionally
as it's possible some of the fields may have changed after the buffer
has been validated.
Keep the dma_buf_put(planes[plane].dbuf) calls in the first
`if (!reacquired)` case, in order to be close to the plane validation code
where the buffers were got in the first place.
Cc: stable@vger.kernel.org Fixes: 95af7c00f35b ("media: videobuf2-core: release all planes first in __prepare_dmabuf()") Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Tested-by: Will McVicker <willmcvicker@google.com> Acked-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
====================
virtio_net: Make RSS interact properly with queue number
With this patch set, RSS updates with queue_pairs changing:
- When virtnet_probe, init default rss and commit
- When queue_pairs changes _without_ user rss configuration, update rss
with the new queue number
- When queue_pairs changes _with_ user rss configuration, keep rss as user
configured
Patch 1 and 2 fix possible out of bound errors for indir_table and key.
Patch 3 and 4 add RSS update in probe() and set_queues().
====================
Philo Lu [Mon, 4 Nov 2024 08:57:06 +0000 (16:57 +0800)]
virtio_net: Update rss when set queue
RSS configuration should be updated with queue number. In particular, it
should be updated when (1) rss enabled and (2) default rss configuration
is used without user modification.
During rss command processing, device updates queue_pairs using
rss.max_tx_vq. That is, the device updates queue_pairs together with
rss, so we can skip the sperate queue_pairs update
(VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
because this is not used in the other hash_report case.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Philo Lu [Mon, 4 Nov 2024 08:57:05 +0000 (16:57 +0800)]
virtio_net: Sync rss config to device when virtnet_probe
During virtnet_probe, default rss configuration is initialized, but was
not committed to the device. This patch fix this by sending rss command
after device ready in virtnet_probe. Otherwise, the actual rss
configuration used by device can be different with that read by user
from driver, which may confuse the user.
If the command committing fails, driver rss will be disabled.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Philo Lu [Mon, 4 Nov 2024 08:57:04 +0000 (16:57 +0800)]
virtio_net: Add hash_key_length check
Add hash_key_length check in virtnet_probe() to avoid possible out of
bound errors when setting/reading the hash key.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>