Linus Torvalds [Sun, 19 Jan 2025 17:04:33 +0000 (09:04 -0800)]
Merge tag 'irq_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an OF node leak in irqchip init's error handling path
- Fix sunxi systems to wake up from suspend with an NMI by
pressing the power button
- Do not spuriously enable interrupts in gic-v3 in a nested
interrupts-off section
- Make sure gic-v3 handles properly a failure to enter a
low power state
* tag 'irq_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Plug a OF node reference leak in platform_irqchip_probe()
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctly
Linus Torvalds [Sun, 19 Jan 2025 17:01:17 +0000 (09:01 -0800)]
Merge tag 'sched_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Do not adjust the weight of empty group entities and avoid
scheduling artifacts
- Avoid scheduling lag by computing lag properly and thus address
an EEVDF entity placement issue
* tag 'sched_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE
sched/fair: Fix EEVDF entity placement bug causing scheduling lag
Linus Torvalds [Sat, 18 Jan 2025 21:22:53 +0000 (13:22 -0800)]
Merge tag 'trace-v6.13-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix regression in GFP output in trace events
It was reported that the GFP flags in trace events went from human
readable to just their hex values:
gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP to gfp_flags=0x140cca
This was caused by a change that added the use of enums in calculating
the GFP flags.
As defines get translated into their values in the trace event format
files, the user space tooling could easily convert the GFP flags into
their symbols via the __print_flags() helper macro.
The problem is that enums do not get converted, and the names of the
enums show up in the format files and user space tooling cannot
translate them.
Add TRACE_DEFINE_ENUM() around the enums used for GFP flags which is
the tracing infrastructure macro that informs the tracing subsystem
what the values for enums and it can then expose that to user space"
* tag 'trace-v6.13-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: gfp: Fix the GFP enum values shown for user space tracing tools
Linus Torvalds [Fri, 17 Jan 2025 23:01:24 +0000 (15:01 -0800)]
Merge tag 'devicetree-fixes-for-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Another fix and testcase to avoid the newly added WARN in the case of
non-translatable addresses"
* tag 'devicetree-fixes-for-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of/address: Fix WARN when attempting translating non-translatable addresses
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Linus Torvalds [Fri, 17 Jan 2025 22:49:53 +0000 (14:49 -0800)]
Merge tag 'soc-fixes-6.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Two last minute fixes: one build issue on TI soc drivers, and a
regression in the renesas reset controller driver"
* tag 'soc-fixes-6.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: ti: pruss: Fix pruss APIs
reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device
Linus Torvalds [Fri, 17 Jan 2025 22:22:36 +0000 (14:22 -0800)]
Merge tag 'mtd/fixes-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd revert from Miquel Raynal:
"Very late this cycle we identified a breakage that could potentially
hit several spi controller drivers because of a change in the way the
dummy cycles validity is checked.
We do not know at the moment how to handle the situation properly, so
we prefer to revert the faulty patch for the next release"
* tag 'mtd/fixes-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
Revert "mtd: spi-nor: core: replace dummy buswidth from addr to data"
Steven Rostedt [Thu, 16 Jan 2025 21:41:24 +0000 (16:41 -0500)]
tracing: gfp: Fix the GFP enum values shown for user space tracing tools
Tracing tools like perf and trace-cmd read the /sys/kernel/tracing/events/*/*/format
files to know how to parse the data and also how to print it. For the
"print fmt" portion of that file, if anything uses an enum that is not
exported to the tracing system, user space will not be able to parse it.
The GFP flags use to be defines, and defines get translated in the print
fmt sections. But now they are converted to use enums, which is not.
Where the enums names like ___GFP_KSWAPD_RECLAIM_BIT are shown and not their
values. User space has no way to convert these names to their values and
the output will fail to parse. What is shown is now:
The TRACE_DEFINE_ENUM() macro was created to handle enums in the print fmt
files. This causes them to be replaced at boot up with the numbers, so
that user space tooling can parse it. By using this macro, the output is
back to the human readable:
Linus Torvalds [Fri, 17 Jan 2025 20:31:37 +0000 (12:31 -0800)]
Merge tag 'hwmon-for-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ltc2991, tmp513: Fix problems seen when dividing negative numbers
- drivetemp: Handle large timeouts observed on some drives
- acpi_power_meter: Fix loading the driver on platforms without _PMD
method
* tag 'hwmon-for-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc2991) Fix mixed signed/unsigned in DIV_ROUND_CLOSEST
hwmon: (drivetemp) Set scsi command timeout to 10s
hwmon: (acpi_power_meter) Fix a check for the return value of read_domain_devices().
hwmon: (tmp513) Fix division of negative numbers
Linus Torvalds [Fri, 17 Jan 2025 05:24:34 +0000 (21:24 -0800)]
Merge tag 'mm-hotfixes-stable-2025-01-16-21-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 singleton hotfixes. 6 are MM.
Two are cc:stable and the remainder address post-6.12 issues"
* tag 'mm-hotfixes-stable-2025-01-16-21-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
ocfs2: check dir i_size in ocfs2_find_entry
mailmap: update entry for Ethan Carter Edwards
mm: zswap: move allocations during CPU init outside the lock
mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma
mm: shmem: use signed int for version handling in casefold option
alloc_tag: skip pgalloc_tag_swap if profiling is disabled
mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count
Linus Torvalds [Fri, 17 Jan 2025 05:18:12 +0000 (21:18 -0800)]
Merge tag '6.13-rc7-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix double free when reconnect racing with closing session
- fix SMB1 reconnect with password rotation
* tag '6.13-rc7-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix double free of TCP_Server_Info::hostname
cifs: support reconnect with alternate password for SMB1
Linus Torvalds [Fri, 17 Jan 2025 03:49:26 +0000 (19:49 -0800)]
Merge tag 'drm-fixes-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Final(?) set of fixes for 6.13, I think the holidays finally caught up
with everyone, the misc changes are 2 weeks worth, otherwise amdgpu
and xe are most of it. The largest pieces is a new test so I'm not too
worried about that.
kunit:
- Fix W=1 build for kunit tests
bridge:
- Handle YCbCr420 better in bridge code, with tests
- itee-it6263 error handling fix
xe:
- Add steering info support for GuC register lists
- Add means to wait for reset and synchronous reset
- Make changing ccs_mode a synchronous action
- Add missing mux registers
- Mark ComputeCS read mode as UC on iGPU, unblocking ULLS on iGPU
i915:
- Relax clear color alignment to 64 bytes [fb]
v3d:
- Fix warn when unloading v3d
nouveau:
- Fix cross-device fence handling in nouveau
- Fix backlight regression for macbooks 5,1
vmwgfx:
- Fix BO reservation handling in vmwgfx"
* tag 'drm-fixes-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
drm/xe: Mark ComputeCS read mode as UC on iGPU
drm/xe/oa: Add missing VISACTL mux registers
drm/xe: make change ccs_mode a synchronous action
drm/xe: introduce xe_gt_reset and xe_gt_wait_for_reset
drm/xe/guc: Adding steering info support for GuC register lists
drm/bridge: ite-it6263: Prevent error pointer dereference in probe()
drm/v3d: Ensure job pointer is set to NULL after job completion
drm/vmwgfx: Add new keep_resv BO param
drm/vmwgfx: Remove busy_places
drm/vmwgfx: Unreserve BO on error
drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}
drm/amdgpu: always sync the GFX pipe on ctx switch
drm/amdgpu: disable gfxoff with the compute workload on gfx12
drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation
drm/i915/fb: Relax clear color alignment to 64 bytes
drm/amd/display: Disable replay and psr while VRR is enabled
drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enable
nouveau/fence: handle cross device fences properly
drm/tests: connector: Add ycbcr_420_allowed tests
drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed
...
Linus Torvalds [Fri, 17 Jan 2025 01:02:28 +0000 (17:02 -0800)]
Merge tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"One fix for the error handling in buffer cloning, and one fix for the
ring resizing.
Two minor followups for the latter as well.
Both of these issues only affect 6.13, so not marked for stable"
* tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux:
io_uring/register: cache old SQ/CQ head reading for copies
io_uring/register: document io_register_resize_rings() shared mem usage
io_uring/register: use stable SQ/CQ ring data during resize
io_uring/rsrc: fixup io_clone_buffers() error handling
Dave Airlie [Thu, 16 Jan 2025 22:54:06 +0000 (08:54 +1000)]
Merge tag 'drm-xe-fixes-2025-01-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Add steering info support for GuC register lists (Jesus Narvaez)
- Add means to wait for reset and synchronous reset (Maciej)
- Make changing ccs_mode a synchronous action (Maciej)
- Add missing mux registers (Ashutosh)
- Mark ComputeCS read mode as UC on iGPU, unblocking ULLS on iGPU (Matt Brost)
Linus Torvalds [Fri, 17 Jan 2025 00:19:05 +0000 (16:19 -0800)]
Merge tag 'trace-v6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix a regression in the irqsoff and wakeup latency tracing
The function graph tracer infrastructure has become generic so that
fprobes and BPF can be based on it. As it use to only handle function
graph tracing, it would always calculate the time the function
entered so that it could then calculate the time it exits and give
the length of time the function executed for. But this is not needed
for the other users (fprobes and BPF) and reading the clock adds a
non-negligible overhead, so the calculation was moved into the
function graph tracer logic.
But the irqsoff and wakeup latency tracers, when the "display-graph"
option was set, would use the function graph tracer to calculate the
times of functions during the latency. The movement of the calltime
calculation made the value zero for these tracers, and the output no
longer showed the length of time of each tracer, but instead the
absolute timestamp of when the function returned (rettime - calltime
where calltime is now zero).
Have the irqsoff and wakeup latency tracers also do the calltime
calculation as the function graph tracer does and report the proper
length of the function timings.
- Update the tracing display to reflect the new preempt lazy model
When the system is configured with preempt lazy, the output of the
trace data would state "unknown" for the current preemption model.
Because the lazy preemption model was just added, make it known to
the tracing subsystem too. This is just a one line change.
- Document multiple function graph having slightly different timings
Now that function graph tracer infrastructure is separate, this also
allows the function graph tracer to run in multiple instances (it
wasn't able to do so before). If two instances ran the function graph
tracer and traced the same functions, the timings for them will be
slightly different because each does their own timings and collects
the timestamps differently. Document this to not have people be
confused by it.
* tag 'trace-v6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Document that multiple function_graph tracing may have different times
tracing: Print lazy preemption model
tracing: Fix irqsoff and wakeup latency tracers when using function graph
Matthew Brost [Tue, 14 Jan 2025 00:25:07 +0000 (16:25 -0800)]
drm/xe: Mark ComputeCS read mode as UC on iGPU
RING_CMD_CCTL read index should be UC on iGPU parts due to L3 caching
structure. Having this as WB blocks ULLS from being enabled. Change to
UC to unblock ULLS on iGPU.
v2:
- Drop internal communications commnet, bspec is updated
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: 328e089bfb37 ("drm/xe: Leverage ComputeCS read L3 caching") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114002507.114087-1-matthew.brost@intel.com
(cherry picked from commit 758debf35b9cda5450e40996991a6e4b222899bd) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
* tag 'net-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
netdev: avoid CFI problems with sock priv helpers
net/mlx5e: Always start IPsec sequence number from 1
net/mlx5e: Rely on reqid in IPsec tunnel mode
net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
net/mlx5: Clear port select structure when fail to create
net/mlx5: SF, Fix add port error handling
net/mlx5: Fix a lockdep warning as part of the write combining test
net/mlx5: Fix RDMA TX steering prio
net: make page_pool_ref_netmem work with net iovs
net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII
net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband
selftests: net: Adapt ethtool mq tests to fix in qdisc graft
net: fec: handle page_pool_dev_alloc_pages error
net: netpoll: ensure skb_pool list is always initialized
net: xilinx: axienet: Fix IRQ coalescing packet count overflow
nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
selftests: mptcp: avoid spurious errors on disconnect
mptcp: fix spurious wake-up on under memory pressure
mptcp: be sure to send ack when mptcp-level window re-opens
...
Linus Torvalds [Thu, 16 Jan 2025 17:04:10 +0000 (09:04 -0800)]
Merge tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Update the documentation of cpuidle governors that does not match the
code any more after previous functional changes (Rafael Wysocki) and
fix up the cpufreq Kconfig file broken inadvertently by a previous
update (Viresh Kumar)"
* tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Move endif to the end of Kconfig file
cpuidle: teo: Update documentation after previous changes
cpuidle: menu: Update documentation after previous changes
Linus Torvalds [Thu, 16 Jan 2025 17:02:10 +0000 (09:02 -0800)]
Merge tag 'acpi-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Prevent acpi_video_device_EDID() from returning a pointer to a memory
region that should not be passed to kfree() which causes one of its
users to crash randomly on attempts to free it (Chris Bainbridge)"
* tag 'acpi-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix random crashes due to bad kfree()
Linus Torvalds [Thu, 16 Jan 2025 16:54:33 +0000 (08:54 -0800)]
Merge tag 'for-6.13-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
- handle d_path() errors when canonicalizing device mapper paths during
device scan
* tag 'for-6.13-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add the missing error handling inside get_canonical_dev_path
Jakub Kicinski [Wed, 15 Jan 2025 16:14:36 +0000 (08:14 -0800)]
netdev: avoid CFI problems with sock priv helpers
Li Li reports that casting away callback type may cause issues
for CFI. Let's generate a small wrapper for each callback,
to make sure compiler sees the anticipated types.
Leon Romanovsky [Wed, 15 Jan 2025 11:39:10 +0000 (13:39 +0200)]
net/mlx5e: Always start IPsec sequence number from 1
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
This is applicable to both ESN and non-ESN mode, which was not covered
in commit mentioned in Fixes line.
Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leon Romanovsky [Wed, 15 Jan 2025 11:39:09 +0000 (13:39 +0200)]
net/mlx5e: Rely on reqid in IPsec tunnel mode
All packet offloads SAs have reqid in it to make sure they have
corresponding policy. While it is not strictly needed for transparent
mode, it is extremely important in tunnel mode. In that mode, policy and
SAs have different match criteria.
Policy catches the whole subnet addresses, and SA catches the tunnel gateways
addresses. The source address of such tunnel is not known during egress packet
traversal in flow steering as it is added only after successful encryption.
As reqid is required for packet offload and it is unique for every SA,
we can safely rely on it only.
The output below shows the configured egress policy and SA by strongswan:
[leonro@vm ~]$ sudo ip x s
src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
replay-window 0 flag af-unspec esn
aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 1, bitmap-length 1 00000000
crypto offload parameters: dev eth2 dir out mode packet
[leonro@064 ~]$ sudo ip x p
src 192.170.0.0/16 dst 192.170.0.0/16
dir out priority 383615 ptype main
tmpl src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
crypto offload parameters: dev eth2 mode packet
Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leon Romanovsky [Wed, 15 Jan 2025 11:39:08 +0000 (13:39 +0200)]
net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.12.0+ #4 Not tainted
-----------------------------------------------------
charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
and this task is already holding: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
which would create a new lock dependency:
(&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&x->lock){+.-.}-{3:3}
... which became SOFTIRQ-irq-safe at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
to a SOFTIRQ-irq-unsafe lock:
(&xa->xa_lock#24){+.+.}-{3:3}
Mark Zhang [Wed, 15 Jan 2025 11:39:07 +0000 (13:39 +0200)]
net/mlx5: Clear port select structure when fail to create
Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:
Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chris Mi [Wed, 15 Jan 2025 11:39:06 +0000 (13:39 +0200)]
net/mlx5: SF, Fix add port error handling
If failed to add SF, error handling doesn't delete the SF from the
SF table. But the hw resources are deleted. So when unload driver,
hw resources will be deleted again. Firmware will report syndrome
0x68def3 which means "SF is not allocated can not deallocate".
Fix it by delete SF from SF table if failed to add SF.
Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Patrisious Haddad [Wed, 15 Jan 2025 11:39:04 +0000 (13:39 +0200)]
net/mlx5: Fix RDMA TX steering prio
User added steering rules at RDMA_TX were being added to the first prio,
which is the counters prio.
Fix that so that they are correctly added to the BYPASS_PRIO instead.
Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Su Yue [Mon, 6 Jan 2025 14:06:40 +0000 (22:06 +0800)]
ocfs2: check dir i_size in ocfs2_find_entry
syz reports an out of bounds read:
==================================================================
BUG: KASAN: slab-out-of-bounds in ocfs2_match fs/ocfs2/dir.c:334
[inline]
BUG: KASAN: slab-out-of-bounds in ocfs2_search_dirblock+0x283/0x6e0
fs/ocfs2/dir.c:367
Read of size 1 at addr ffff88804d8b9982 by task syz-executor.2/14802
The two reports are all caused invalid negative i_size of dir inode. For
ocfs2, dir_inode can't be negative or zero.
Here add a check in which is called by ocfs2_check_dir_for_entry(). It
fixes the second report as ocfs2_check_dir_for_entry() must be called
before ocfs2_prepare_dir_for_insert(). Also set a up limit for dir with
OCFS2_INLINE_DATA_FL. The i_size can't be great than blocksize.
Yosry Ahmed [Mon, 13 Jan 2025 21:44:58 +0000 (21:44 +0000)]
mm: zswap: move allocations during CPU init outside the lock
In zswap_cpu_comp_prepare(), allocations are made and assigned to various
members of acomp_ctx under acomp_ctx->mutex. However, allocations may
recurse into zswap through reclaim, trying to acquire the same mutex and
deadlocking.
Move the allocations before the mutex critical section. Only the
initialization of acomp_ctx needs to be done with the mutex held.
Link: https://lkml.kernel.org/r/20250113214458.2123410-1-yosryahmed@google.com Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mattew Wilcox <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zihan zhou [Wed, 25 Dec 2024 02:10:35 +0000 (10:10 +0800)]
mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count
In the kernel, the zone's lowmem_reserve and _watermark, and the global
variable 'totalreserve_pages' depend on the value of managed_pages, but
after running adjust_managed_page_count, these values aren't updated,
which causes some problems.
For example, in a system with six 1GB large pages, we found that the value
of protection in zoneinfo (zone->lowmem_reserve), is not right. Its value
seems to be calculated from the initial managed_pages, but after the
managed_pages changed, was not updated. Only after reading the file
/proc/sys/vm/lowmem_reserve_ratio, updates happen.
lowmem_reserve increased also makes the totalreserve_pages increased,
which causes a decrease in available memory. The one above is just a test
machine, and the increase is not significant. On our online machine, the
reserved memory will increase by several GB due to reading this file. It
is clearly unreasonable to cause a sharp drop in available memory just by
reading a file.
In this patch, we update reserve memory when update managed_pages, The
size of reserved memory becomes stable. But it seems that the _watermark
should also be updated along with the managed_pages. We have not done it
because we are unsure if it is reasonable to set the watermark through the
initial managed_pages. If it is not reasonable, we will propose new
patch.
Pavel Begunkov [Wed, 8 Jan 2025 22:06:22 +0000 (14:06 -0800)]
net: make page_pool_ref_netmem work with net iovs
page_pool_ref_netmem() should work with either netmem representation, but
currently it casts to a page with netmem_to_page(), which will fail with
net iovs. Use netmem_get_pp_ref_count_ref() instead.
Fixes: 8ab79ed50cf1 ("page_pool: devmem support") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://lore.kernel.org/20250108220644.3528845-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paulo Alcantara [Tue, 14 Jan 2025 15:48:48 +0000 (12:48 -0300)]
smb: client: fix double free of TCP_Server_Info::hostname
When shutting down the server in cifs_put_tcp_session(), cifsd thread
might be reconnecting to multiple DFS targets before it realizes it
should exit the loop, so @server->hostname can't be freed as long as
cifsd thread isn't done. Otherwise the following can happen:
Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches") Reported-by: Jay Shin <jaeshin@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
David Lechner [Wed, 15 Jan 2025 20:48:27 +0000 (14:48 -0600)]
hwmon: (ltc2991) Fix mixed signed/unsigned in DIV_ROUND_CLOSEST
Fix use of DIV_ROUND_CLOSEST where a possibly negative value is divided
by an unsigned type by casting the unsigned type to the signed type of
the same size (st->r_sense_uohm[channel] has type of u32).
The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative
value by an unsigned type is undefined behavior. The actual behavior is
that it converts both values to unsigned before doing the division, for
example:
Heiner Kallweit [Sun, 12 Jan 2025 21:59:59 +0000 (22:59 +0100)]
net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES
included feature bits for aneg and TP/MII ports.
SUPPORTED_TP | \
SUPPORTED_MII)
SUPPORTED_10baseT_Full)
SUPPORTED_100baseT_Full)
SUPPORTED_1000baseT_Full)
PHY_100BT_FEATURES | \
PHY_DEFAULT_FEATURES)
PHY_1000BT_FEATURES)
Referenced commit expanded PHY_GBIT_FEATURES, silently removing
PHY_DEFAULT_FEATURES. The removed part can be re-added by using
the new PHY_GBIT_FEATURES definition.
Not clear to me is why nobody seems to have noticed this issue.
I stumbled across this when checking what it takes to make
phy_10_100_features_array et al private to phylib.
Vladimir Oltean [Tue, 14 Jan 2025 16:47:21 +0000 (18:47 +0200)]
net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII
xpcs_config_2500basex() sets DW_VR_MII_DIG_CTRL1_2G5_EN, but
xpcs_config_aneg_c37_sgmii() never unsets it. So, on a protocol change
from 2500base-x to sgmii, the DW_VR_MII_DIG_CTRL1_2G5_EN bit will remain
set.
Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 14 Jan 2025 16:47:20 +0000 (18:47 +0200)]
net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband
On a port with SGMII fixed-link at SPEED_1000, DW_VR_MII_DIG_CTRL1 gets
set to 0x2404. This is incorrect, because bit 2 (DW_VR_MII_DIG_CTRL1_2G5_EN)
is set.
It comes from the previous write to DW_VR_MII_AN_CTRL, because the "val"
variable is reused and is dirty. Actually, its value is 0x4, aka
FIELD_PREP(DW_VR_MII_PCS_MODE_MASK, DW_VR_MII_PCS_MODE_C37_SGMII).
Resolve the issue by clearing "val" to 0 when writing to a new register.
After the fix, the register value is 0x2400.
Prior to the blamed commit, when the read-modify-write was open-coded,
the code saved the content of the DW_VR_MII_DIG_CTRL1 register in the
"ret" variable.
Fixes: ce8d6081fcf4 ("net: pcs: xpcs: add _modify() accessors") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram Sang [Wed, 15 Jan 2025 16:23:47 +0000 (17:23 +0100)]
i2c: testunit: on errors, repeat NACK until STOP
This backend requests a NACK from the controller driver when it detects
an error. If that request gets ignored from some reason, subsequent
accesses will wrongly be handled OK. To fix this, an error now changes
the state machine, so the backend will report NACK until a STOP
condition has been detected. This make the driver more robust against
controllers which will sadly apply the NACK not to the current byte but
the next one.
Wolfram Sang [Wed, 15 Jan 2025 12:36:23 +0000 (13:36 +0100)]
i2c: rcar: fix NACK handling when being a target
When this controller is a target, the NACK handling had two issues.
First, the return value from the backend was not checked on the initial
WRITE_REQUESTED. So, the driver missed to send a NACK in this case.
Also, the NACK always arrives one byte late on the bus, even in the
WRITE_RECEIVED case. This seems to be a HW issue. We should then not
rely on the backend to correctly NACK the superfluous byte as well. Fix
both issues by introducing a flag which gets set whenever the backend
requests a NACK and keep sending it until we get a STOP condition.
The commit uses data nbits instead of addr nbits for dummy phase. This
causes a regression for all boards where spi-tx-bus-width is smaller
than spi-rx-bus-width. It is a common pattern for boards to have
spi-tx-bus-width == 1 and spi-rx-bus-width > 1. The regression causes
all reads with a dummy phase to become unavailable for such boards,
leading to a usually slower 0-dummy-cycle read being selected.
Most controllers' supports_op hooks call spi_mem_default_supports_op().
In spi_mem_default_supports_op(), spi_mem_check_buswidth() is called to
check if the buswidths for the op can actually be supported by the
board's wiring. This wiring information comes from (among other things)
the spi-{tx,rx}-bus-width DT properties. Based on these properties,
SPI_TX_* or SPI_RX_* flags are set by of_spi_parse_dt().
spi_mem_check_buswidth() then uses these flags to make the decision
whether an op can be supported by the board's wiring (in a way,
indirectly checking against spi-{rx,tx}-bus-width).
Now the tricky bit here is that spi_mem_check_buswidth() does:
if (op->dummy.nbytes &&
spi_check_buswidth_req(mem, op->dummy.buswidth, true))
return false;
The true argument to spi_check_buswidth_req() means the op is treated as
a TX op. For a board that has say 1-bit TX and 4-bit RX, a 4-bit dummy
TX is considered as unsupported, and the op gets rejected.
The commit being reverted uses the data buswidth for dummy buswidth. So
for reads, the RX buswidth gets used for the dummy phase, uncovering
this issue. In reality, a dummy phase is neither RX nor TX. As the name
suggests, these are just dummy cycles that send or receive no data, and
thus don't really need to have any buswidth at all.
Ideally, dummy phases should not be checked against the board's wiring
capabilities at all, and should only be sanity-checked for having a sane
buswidth value. Since we are now at rc7 and such a change might
introduce many unexpected bugs, revert the commit for now. It can be
sent out later along with the spi_mem_check_buswidth() fix.
Fixes: 98d1fb94ce75 ("mtd: spi-nor: core: replace dummy buswidth from addr to data") Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Closes: https://lore.kernel.org/linux-mtd/3342163.44csPzL39Z@steina-w/ Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Jens Axboe [Wed, 15 Jan 2025 15:39:15 +0000 (08:39 -0700)]
io_uring/register: cache old SQ/CQ head reading for copies
The SQ and CQ ring heads are read twice - once for verifying that it's
within bounds, and once inside the loops copying SQE and CQE entries.
This is technically incorrect, in case the values could get modified
in between verifying them and using them in the copy loop. While this
won't lead to anything truly nefarious, it may cause longer loop times
for the copies than expected.
Read the ring head values once, and use the verified value in the copy
loops.
Jens Axboe [Wed, 15 Jan 2025 15:23:55 +0000 (08:23 -0700)]
io_uring/register: document io_register_resize_rings() shared mem usage
It can be a bit hard to tell which parts of io_register_resize_rings()
are operating on shared memory, and which ones are not. And anything
reading or writing to those regions should really use the read/write
once primitives.
Hence add those, ensuring sanity in how this memory is accessed, and
helping document the shared nature of it.
Jens Axboe [Wed, 15 Jan 2025 14:39:12 +0000 (07:39 -0700)]
io_uring/register: use stable SQ/CQ ring data during resize
Normally the kernel would not expect an application to modify any of
the data shared with the kernel during a resize operation, but of
course the kernel cannot always assume good intent on behalf of the
application.
As part of resizing the rings, existing SQEs and CQEs are copied over
to the new storage. Resizing uses the masks in the newly allocated
shared storage to index the arrays, however it's possible that malicious
userspace could modify these after they have been sanity checked.
Use the validated and locally stored CQ and SQ ring sizing for masking
to ensure the values are both stable and valid.
Russell Harmon [Wed, 15 Jan 2025 13:13:41 +0000 (05:13 -0800)]
hwmon: (drivetemp) Set scsi command timeout to 10s
There's at least one drive (MaxDigitalData OOS14000G) such that if it
receives a large amount of I/O while entering an idle power state will
first exit idle before responding, including causing SMART temperature
requests to be delayed.
This causes the drivetemp request to exceed its timeout of 1 second.
Kazuhiro Abe [Wed, 15 Jan 2025 07:35:32 +0000 (07:35 +0000)]
hwmon: (acpi_power_meter) Fix a check for the return value of read_domain_devices().
After commit fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load
module on platform without _PMD method"),
the acpi_power_meter driver fails to load if the platform has _PMD method.
To address this, add a check for successful read_domain_devices().
Tested on Nvidia Grace machine.
Fixes: fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method") Signed-off-by: Kazuhiro Abe <fj1078ii@aa.jp.fujitsu.com> Link: https://lore.kernel.org/r/20250115073532.3211000-1-fj1078ii@aa.jp.fujitsu.com
[groeck: Dropped unnecessary () from expression] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Maciej Patelczyk [Wed, 11 Dec 2024 11:17:27 +0000 (12:17 +0100)]
drm/xe: make change ccs_mode a synchronous action
If ccs_mode is being modified via
/sys/class/drm/cardX/device/tileY/gtY/ccs_mode
the asynchronous reset is triggered and the write returns immediately.
With that some test receive false information about number of CCS engines
or even fail if they proceed without delay after changing the ccs_mode.
Changing the ccs_mode change from async to sync to prevent failures in
tests.
Jesus Narvaez [Thu, 12 Dec 2024 19:01:00 +0000 (11:01 -0800)]
drm/xe/guc: Adding steering info support for GuC register lists
The guc_mmio_reg interface supports steering, but it is currently not
implemented. This will allow the GuC to control steering of MMIO
registers after save-restore and avoid reading from fused off MCR
register instances.
Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com
(cherry picked from commit ee5a1321df90891d59d83b7c9d5b6c5b755d059d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Simons [Sun, 12 Jan 2025 12:34:02 +0000 (13:34 +0100)]
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Some boards with Allwinner SoCs connect the PMIC's IRQ pin to the SoC's NMI
pin instead of a normal GPIO. Since the power key is connected to the PMIC,
and people expect to wake up a suspended system via this key, the NMI IRQ
controller must stay alive when the system goes into suspend.
Add the SKIP_WAKE flag to prevent the sunxi NMI controller from going to
sleep, so that the power key can wake up those systems.
When a CPU attempts to enter low power mode, it disables the redistributor
and Group 1 interrupts and reinitializes the system registers upon wakeup.
If the transition into low power mode fails, then the CPU_PM framework
invokes the PM notifier callback with CPU_PM_ENTER_FAILED to allow the
drivers to undo the state changes.
The GIC V3 driver ignores CPU_PM_ENTER_FAILED, which leaves the GIC in
disabled state.
Handle CPU_PM_ENTER_FAILED in the same way as CPU_PM_EXIT to restore normal
operation.
[ tglx: Massage change log, add Fixes tag ]
Fixes: 3708d52fc6bb ("irqchip: gic-v3: Implement CPU PM notifier") Signed-off-by: Yogesh Lal <quic_ylal@quicinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220093907.2747601-1-quic_ylal@quicinc.com
Dan Carpenter [Tue, 12 Nov 2024 10:23:03 +0000 (13:23 +0300)]
drm/bridge: ite-it6263: Prevent error pointer dereference in probe()
If devm_i2c_new_dummy_device() fails then we were supposed to return an
error code, but instead the function continues and will crash on the next
line. Add the missing return statement.
Kevin Groeneveld [Mon, 13 Jan 2025 15:48:45 +0000 (10:48 -0500)]
net: fec: handle page_pool_dev_alloc_pages error
The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did
not handle the case when it returned NULL. There was a WARN_ON(!new_page)
but it would still proceed to use the NULL pointer and then crash.
This case does seem somewhat rare but when the system is under memory
pressure it can happen. One case where I can duplicate this with some
frequency is when writing over a smbd share to a SATA HDD attached to an
imx6q.
Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve
the problem for my test case. But it still seems wrong that the fec driver
ignores the memory allocation error and can crash.
This commit handles the allocation error by dropping the current packet.
Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Sperbeck [Tue, 14 Jan 2025 01:13:54 +0000 (17:13 -0800)]
net: netpoll: ensure skb_pool list is always initialized
When __netpoll_setup() is called directly, instead of through
netpoll_setup(), the np->skb_pool list head isn't initialized.
If skb_pool_flush() is later called, then we hit a NULL pointer
in skb_queue_purge_reason(). This can be seen with this repro,
when CONFIG_NETCONSOLE is enabled as a module:
ip tuntap add mode tap tap0
ip link add name br0 type bridge
ip link set dev tap0 master br0
modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/
rmmod netconsole
David Lechner [Tue, 14 Jan 2025 21:45:52 +0000 (15:45 -0600)]
hwmon: (tmp513) Fix division of negative numbers
Fix several issues with division of negative numbers in the tmp513
driver.
The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative
value by an unsigned type is undefined behavior. The driver was doing
this in several places, i.e. data->shunt_uohms has type of u32. The
actual "undefined" behavior is that it converts both values to unsigned
before doing the division, for example:
Furthermore the MILLI macro has a type of unsigned long. Multiplying a
signed long by an unsigned long results in an unsigned long.
So, we need to cast both MILLI and data data->shunt_uohms to long when
using the DIV_ROUND_CLOSEST macro.
Fixes: f07f9d2467f4 ("hwmon: (tmp513) Use SI constants from units.h") Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250114-fix-si-prefix-macro-sign-bugs-v1-1-696fd8d10f00@baylibre.com
[groeck: Drop some continuation lines] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Dan Carpenter [Mon, 13 Jan 2025 06:18:39 +0000 (09:18 +0300)]
nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could
potentially have an integer wrapping bug on 32bit systems. Check for
this and return an error.
Jakub Kicinski [Tue, 14 Jan 2025 21:41:16 +0000 (13:41 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Fix E825 initialization
Grzegorz Nitka says:
E825 products have incorrect initialization procedure, which may lead to
initialization failures and register values.
Fix E825 products initialization by adding correct sync delay, checking
the PHY revision only for current PHY and adding proper destination
device when reading port/quad.
In addition, E825 uses PF ID for indexing per PF registers and as
a primary PHY lane number, which is incorrect.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Add correct PHY lane assignment
ice: Fix ETH56G FC-FEC Rx offset value
ice: Fix quad registers read on E825
ice: Fix E825 initialization
====================
====================
mptcp: fixes for connect selftest flakes
Last week, Jakub reported [1] that the MPTCP Connect selftest was
unstable. It looked like it started after the introduction of some fixes
[2]. After analysis from Paolo, these patches revealed existing bugs,
that should be fixed by the following patches.
- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In
some corner cases, the other peer was not notified when more data
could be sent. A fix for v5.11, but depending on a feature introduced
in v5.19.
- Patch 2: Fix spurious wake-up under memory pressure. In this
situation, the userspace could be invited to read data not being there
yet. A fix for v6.7.
- Patch 3: Fix a false positive error when running the MPTCP Connect
selftest with the "disconnect" cases. The userspace could disconnect
the socket too soon, which would reset (MP_FASTCLOSE) the connection,
interpreted as an error by the test. A fix for v5.17.
Paolo Abeni [Mon, 13 Jan 2025 15:44:58 +0000 (16:44 +0100)]
selftests: mptcp: avoid spurious errors on disconnect
The disconnect test-case generates spurious errors:
INFO: disconnect
INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 140ms) [FAIL]
file received by server does not match (in, out):
Unexpected revents: POLLERR/POLLNVAL(19)
-rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
Trailing bytes are:
��\����R���!8��u2��5N%
-rw------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
Trailing bytes are:
��\����R���!8��u2��5N%
02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ]
04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ]
[FAIL] Tests of the full disconnection have failed
Time: 2 seconds
The root cause is actually in the user-space bits: the test program
currently disconnects as soon as all the pending data has been spooled,
generating an FASTCLOSE. If such option reaches the peer before the
latter has reached the closed status, the msk socket will report an
error to the user-space, as per protocol specification, causing the
above failure.
Address the issue explicitly waiting for all the relevant sockets to
reach a closed status before performing the disconnect.
Paolo Abeni [Mon, 13 Jan 2025 15:44:57 +0000 (16:44 +0100)]
mptcp: fix spurious wake-up on under memory pressure
The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 13 Jan 2025 15:44:56 +0000 (16:44 +0100)]
mptcp: be sure to send ack when mptcp-level window re-opens
mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.
After commit ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Maíra Canal [Mon, 13 Jan 2025 15:47:40 +0000 (12:47 -0300)]
drm/v3d: Ensure job pointer is set to NULL after job completion
After a job completes, the corresponding pointer in the device must
be set to NULL. Failing to do so triggers a warning when unloading
the driver, as it appears the job is still active. To prevent this,
assign the job pointer to NULL after completing the job, indicating
the job has finished.
Viresh Kumar [Fri, 10 Jan 2025 05:53:10 +0000 (11:23 +0530)]
cpufreq: Move endif to the end of Kconfig file
It is possible to enable few cpufreq drivers, without the framework
being enabled. This happened due to a bug while moving the entries
earlier. Fix it.
Fixes: 7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 14 Jan 2025 18:07:40 +0000 (10:07 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"One iscsi driver fix and one core fix.
The core fix is an important one because a retry efficiency update is
now causing some USB devices to get the wrong size on discovery (it
upset their retry logic for READ_CAPACITY_16)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
scsi: core: Fix command pass through retry regression
Ian Forbes [Fri, 10 Jan 2025 18:53:35 +0000 (12:53 -0600)]
drm/vmwgfx: Add new keep_resv BO param
Adds a new BO param that keeps the reservation locked after creation.
This removes the need to re-reserve the BO after creation which is a
waste of cycles.
This also fixes a bug in vmw_prime_import_sg_table where the imported
reservation is unlocked twice.
Linus Torvalds [Tue, 14 Jan 2025 17:54:57 +0000 (09:54 -0800)]
Merge tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 6.13. This became bigger than wished due to
the timing after holiday breaks.
The only large LOC is the additional document for Cirrus codec which
is nice for users (and absolutely safe). All the rest are small fixes
in ASoC Rcar and codecs as well as HD-audio quirks (And no fix for USB
guitar pedals seen yet :)"
* tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
ALSA: hda/realtek: fixup ASUS H7606W
ALSA: hda/realtek: fixup ASUS GA605W
ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA
ASoC: rsnd: check rsnd_adg_clk_enable() return value
ASoC: cs42l43: Add codec force suspend/resume ops
ALSA: doc: Add codecs/index.rst to top-level index
ALSA: doc: cs35l56: Add information about Cirrus Logic CS35L54/56/57
ASoC: samsung: Add missing depends on I2C
MAINTAINERS: add missing maintainers for Simple Audio Card
ASoC: samsung: Add missing selects for MFD_WM8994
ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
ASoC: wm8994: Add depends on MFD core
ASoC: tas2781: Fix occasional calibration failture
ASoC: codecs: ES8326: Adjust ANA_MICBIAS to reduce pop noise
Christian König [Fri, 20 Dec 2024 15:21:11 +0000 (16:21 +0100)]
drm/amdgpu: always sync the GFX pipe on ctx switch
That is needed to enforce isolation between contexts.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org
Jann reports he can trigger a UAF if the target ring unregisters
buffers before the clone operation is fully done. And additionally
also an issue related to node allocation failures. Both of those
stemp from the fact that the cleanup logic puts the buffers manually,
rather than just relying on io_rsrc_data_free() doing it. Hence kill
the manual cleanup code and just let io_rsrc_data_free() handle it,
it'll put the nodes appropriately.
Srinivasan Shanmugam [Thu, 9 Jan 2025 16:03:51 +0000 (21:33 +0530)]
drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation
This commit addresses a circular locking dependency issue within the GFX
isolation mechanism. The problem was identified by a warning indicating
a potential deadlock due to inconsistent lock acquisition order.
- The `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` functions previously
acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`,
leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is
called while `enforce_isolation_mutex` is held, and
`amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is
held, it can create a circular dependency.
By ensuring consistent lock usage, this fix resolves the issue:
[ 606.297333] ======================================================
[ 606.297343] WARNING: possible circular locking dependency detected
[ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE
[ 606.297365] ------------------------------------------------------
[ 606.297375] kworker/u96:3/3825 is trying to acquire lock:
[ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610
[ 606.297413]
but task is already holding lock:
[ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[ 606.297725]
which lock already depends on the new lock.
v2: Refactor lock handling to resolve circular dependency (Alex)
- Introduced a `sched_work` flag to defer the call to
`amdgpu_gfx_kfd_sch_ctrl` until after releasing
`enforce_isolation_mutex`.
- This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside
the critical section, preventing the circular dependency and deadlock.
- The `sched_work` flag is set within the mutex-protected section if
conditions are met, and the actual function call is made afterward.
- This approach ensures consistent lock acquisition order.
Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0b6b2dd38336d5fd49214f0e4e6495e658e3ab44) Cc: stable@vger.kernel.org
The yt2_1380_fc_serdev_probe() function calls devm_serdev_device_open()
before setting the client ops via serdev_device_set_client_ops(). This
ordering can trigger a NULL pointer dereference in the serdev controller's
receive_buf handler, as it assumes serdev->ops is valid when
SERPORT_ACTIVE is set.
This is similar to the issue fixed in commit 5e700b384ec1
("platform/chrome: cros_ec_uart: properly fix race condition") where
devm_serdev_device_open() was called before fully initializing the
device.
Fix the race by ensuring client ops are set before enabling the port via
devm_serdev_device_open().
Note, serdev_device_set_baudrate() and serdev_device_set_flow_control()
calls should be after the devm_serdev_device_open() call.
Fixes: b2ed33e8d486 ("platform/x86: Add lenovo-yoga-tab2-pro-1380-fastcharger driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180951.2277757-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
The dell_uart_bl_serdev_probe() function calls devm_serdev_device_open()
before setting the client ops via serdev_device_set_client_ops(). This
ordering can trigger a NULL pointer dereference in the serdev controller's
receive_buf handler, as it assumes serdev->ops is valid when
SERPORT_ACTIVE is set.
This is similar to the issue fixed in commit 5e700b384ec1
("platform/chrome: cros_ec_uart: properly fix race condition") where
devm_serdev_device_open() was called before fully initializing the
device.
Fix the race by ensuring client ops are set before enabling the port via
devm_serdev_device_open().
Note, serdev_device_set_baudrate() and serdev_device_set_flow_control()
calls should be after the devm_serdev_device_open() call.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180118.2274516-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Steven Rostedt [Tue, 14 Jan 2025 15:12:02 +0000 (10:12 -0500)]
ftrace: Document that multiple function_graph tracing may have different times
The function graph tracer now calculates the calltime internally and for
each instance. If there are two instances that are running function graph
tracer and are tracing the same functions, the timings of the length of
those functions may be slightly different:
# trace-cmd record -B foo -p function_graph -B bar -p function_graph sleep 5
# trace-cmd report
[..]
bar: sleep-981 [000] ...1. 1101.109027: funcgraph_entry: 0.764 us | mutex_unlock(); (ret=0xffff8abcc256c300)
foo: sleep-981 [000] ...1. 1101.109028: funcgraph_entry: 0.748 us | mutex_unlock(); (ret=0xffff8abcc256c300)
bar: sleep-981 [000] ..... 1101.109029: funcgraph_exit: 2.456 us | } (ret=0xffff8abcc256c300)
foo: sleep-981 [000] ..... 1101.109029: funcgraph_exit: 2.403 us | } (ret=0xffff8abcc256c300)
bar: sleep-981 [000] d..1. 1101.109031: funcgraph_entry: 0.844 us | fpregs_assert_state_consistent(); (ret=0x0)
foo: sleep-981 [000] d..1. 1101.109032: funcgraph_entry: 0.803 us | fpregs_assert_state_consistent(); (ret=0x0)