www.infradead.org Git - users/dwmw2/linux.git/log

nfsd: cancel nfsd_shrinker_work using sync mode in nfs4_state_shutdown_net

In the normal case, when we excute `echo 0 > /proc/fs/nfsd/threads`, the
function `nfs4_state_destroy_net` in `nfs4_state_shutdown_net` will
release all resources related to the hashed `nfs4_client`. If the
`nfsd_client_shrinker` is running concurrently, the `expire_client`
function will first unhash this client and then destroy it. This can
lead to the following warning. Additionally, numerous use-after-free
errors may occur as well.

nfsd_client_shrinker         echo 0 > /proc/fs/nfsd/threads

expire_client                nfsd_shutdown_net
  unhash_client                ...
                               nfs4_state_shutdown_net
                                 /* won't wait shrinker exit */
  /*                             cancel_work(&nn->nfsd_shrinker_work)
   * nfsd_file for this          /* won't destroy unhashed client1 */
   * client1 still alive         nfs4_state_destroy_net
   */

                               nfsd_file_cache_shutdown
                                 /* trigger warning */
                                 kmem_cache_destroy(nfsd_file_slab)
                                 kmem_cache_destroy(nfsd_file_mark_slab)
  /* release nfsd_file and mark */
  __destroy_client

====================================================================
BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on
__kmem_cache_shutdown()
--------------------------------------------------------------------
CPU: 4 UID: 0 PID: 764 Comm: sh Not tainted 6.12.0-rc3+ #1

dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xac/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e

====================================================================
BUG nfsd_file_mark (Tainted: G    B   W         ): Objects remaining
nfsd_file_mark on __kmem_cache_shutdown()
--------------------------------------------------------------------

dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xc8/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e

To resolve this issue, cancel `nfsd_shrinker_work` using synchronous
mode in nfs4_state_shutdown_net.

Fixes: 7c24fa225081 ("NFSD: replace delayed_work with work_struct for nfsd_client_shrinker")
Signed-off-by: Yang Erkun <yangerkun@huaweicloud.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space

There are two pages in one TLB entry on LoongArch system. For kernel
space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit
set, otherwise HW treats it as non-global tlb, there will be potential
problems if tlb entry for kernel space is not global. Such as fail to
flush kernel tlb with the function local_flush_tlb_kernel_range() which
supposed only flush tlb with global bit.

Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and
kasan areas. For these areas both two consecutive page table entries
should be enabled with PAGE_GLOBAL bit. So with function set_pte() and
pte_clear(), pte buddy entry is checked and set besides its own pte
entry. However it is not atomic operation to set both two pte entries,
there is problem with test_vmalloc test case.

So function kernel_pte_init() is added to init a pte table when it is
created for kernel address space, and the default initial pte value is
PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry
need update with function set_pte() and pte_clear(), nothing to do with
the pte buddy entry.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Don't crash in stack_top() for tasks without vDSO

Not all tasks have a vDSO mapped, for example kthreads never do. If such
a task ever ends up calling stack_top(), it will derefence the NULL vdso
pointer and crash.

This can for example happen when using kunit:

[<9000000000203874>] stack_top+0x58/0xa8
[<90000000002956cc>] arch_pick_mmap_layout+0x164/0x220
[<90000000003c284c>] kunit_vm_mmap_init+0x108/0x12c
[<90000000003c1fbc>] __kunit_add_resource+0x38/0x8c
[<90000000003c2704>] kunit_vm_mmap+0x88/0xc8
[<9000000000410b14>] usercopy_test_init+0xbc/0x25c
[<90000000003c1db4>] kunit_try_run_case+0x5c/0x184
[<90000000003c3d54>] kunit_generic_run_threadfn_adapter+0x24/0x48
[<900000000022e4bc>] kthread+0xc8/0xd4
[<9000000000200ce8>] ret_from_kernel_thread+0xc/0xa4

Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Set correct size for vDSO code mapping

The current size of vDSO code mapping is hardcoded to PAGE_SIZE. This
cannot work for 4KB page size after commit 18efd0b10e0fd77 ("LoongArch:
vDSO: Wire up getrandom() vDSO implementation") because the code size
increases to 8KB. Thus set the code mapping size to its real size, i.e.
PAGE_ALIGN(vdso_end - vdso_start).

Fixes: 18efd0b10e0fd77 ("LoongArch: vDSO: Wire up getrandom() vDSO implementation")
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context

Unaligned access exception can be triggered in irq-enabled context such
as user mode, in this case do_ale() may call get_user() which may cause
sleep. Then we will get:

BUG: sleeping function called from invalid context at arch/loongarch/kernel/access-helper.h:7
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 129, name: modprobe
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 UID: 0 PID: 129 Comm: modprobe Tainted: G        W          6.12.0-rc1+ #1723
Tainted: [W]=WARN
Stack : 9000000105e0bd48 0000000000000000 9000000003803944 9000000105e08000
         9000000105e0bc70 9000000105e0bc78 0000000000000000 0000000000000000
         9000000105e0bc78 0000000000000001 9000000185e0ba07 9000000105e0b890
         ffffffffffffffff 9000000105e0bc78 73924b81763be05b 9000000100194500
         000000000000020c 000000000000000a 0000000000000000 0000000000000003
         00000000000023f0 00000000000e1401 00000000072f8000 0000007ffbb0e260
         0000000000000000 0000000000000000 9000000005437650 90000000055d5000
         0000000000000000 0000000000000003 0000007ffbb0e1f0 0000000000000000
         0000005567b00490 0000000000000000 9000000003803964 0000007ffbb0dfec
         00000000000000b0 0000000000000007 0000000000000003 0000000000071c1d
         ...
Call Trace:
[<9000000003803964>] show_stack+0x64/0x1a0
[<9000000004c57464>] dump_stack_lvl+0x74/0xb0
[<9000000003861ab4>] __might_resched+0x154/0x1a0
[<900000000380c96c>] emulate_load_store_insn+0x6c/0xf60
[<9000000004c58118>] do_ale+0x78/0x180
[<9000000003801bc8>] handle_ale+0x128/0x1e0

So enable IRQ if unaligned access exception is triggered in irq-enabled
context to fix it.

Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Get correct cores_per_package for SMT systems

In loongson_sysconf, The "core" of cores_per_node and cores_per_package
stands for a logical core, which means in a SMT system it stands for a
thread indeed. This information is gotten from SMBIOS Type4 Structure,
so in order to get a correct cores_per_package for both SMT and non-SMT
systems in parse_cpu_table() we should use SMBIOS_THREAD_PACKAGE_OFFSET
instead of SMBIOS_CORE_PACKAGE_OFFSET.

Cc: stable@vger.kernel.org
Reported-by: Chao Li <lichao@loongson.cn>
Tested-by: Chao Li <lichao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Use "Exception return address" to comment ERA

The information contained in the comment for LOONGARCH_CSR_ERA is even
less informative than the macro itself, which can cause confusion for
junior developers. Let's use the full English term.

Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"

Commit e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to
be turned off when suspended") can cause the suspend process to hang as
the pmcdev->lock in the pmc_core_acpi_pm_timer_suspend_resume might already
be held by the pmc_core_mphy_pg_show or pmc_core_pll_show if the userspace
gets frozen when these functions are being executed.

Also, pmc_core_acpi_pm_timer_suspend_resume must not sleep, as this
function is called indirectly by the tick_freeze function in
kernel/time/tick-common.c, which holds the spinlock.

Revert the changes for now to fix these issues.

Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
Reported-by: Luca Coelho <luca@coelho.fi>
Closes: https://lore.kernel.org/lkml/40555604c3f4be43bf72e72d5409eaece4be9320.camel@coelho.fi/
Signed-off-by: Marek Maslanka <mmaslanka@google.com>
Link: https://lore.kernel.org/r/20241012182656.2107178-1-mmaslanka@google.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

drm/bridge: tc358767: fix missing of_node_put() in for_each_endpoint_of_node()

for_each_endpoint_of_node() requires a call to of_node_put() for every
early exit. A new error path was added to the loop without observing
this requirement.

Add the missing call to of_node_put() in the error path.

Fixes: 1fb4dceeedc5 ("drm/bridge: tc358767: Add configurable default preemphasis")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20241013-tc358767-of_node_put-v1-1-97431772c0ff@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241013-tc358767-of_node_put-v1-1-97431772c0ff@gmail.com

drm/bridge: Fix assignment of the of_node of the parent to aux bridge

The assignment of the of_node to the aux bridge needs to mark the
of_node as reused as well, otherwise resource providers like pinctrl will
report a gpio as already requested by a different device when both pinconf
and gpios property are present.
Fix that by using the device_set_of_node_from_dev() helper instead.

Fixes: 6914968a0b52 ("drm/bridge: properly refcount DT nodes in aux bridge drivers")
Cc: stable@vger.kernel.org # 6.8
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20241018-drm-aux-bridge-mark-of-node-reused-v2-1-aeed1b445c7d@linaro.org
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241018-drm-aux-bridge-mark-of-node-reused-v2-1-aeed1b445c7d@linaro.org

Merge patch series "fs/super.c: introduce get_tree_bdev_flags()"

Allison Karlitskaya <allison.karlitskaya@redhat.com> says:

In context of my work on composefs/bootc I've been testing the new support
for directly mounting files with erofs (ie: without a loopback device) and
it's working well.  Thanks for adding this feature --- it's a huge quality
of life improvement for us.

I've observed a strange behaviour, though: when mounting a file as an
erofs, if you read() the filesystem context fd, you always get the
following error message reported: Can't lookup blockdev.

That's caused by the code in erofs_fc_get_tree() trying to call
get_tree_bdev() and recovering from the error in case it was ENOTBLK and
CONFIG_EROFS_FS_BACKED_BY_FILE.  Unfortunately, get_tree_bdev() logs the
error directly on the fs_context, so you get the error message even on
successful mounts.

It looks something like this at the syscall level:

fsopen("erofs", FSOPEN_CLOEXEC)         = 3
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "source", "/home/lis/src/mountcfs/cfs", 0)
= 0
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0)          = 5
move_mount(5, "", AT_FDCWD, "/tmp/composefs.upper.KuT5aV",
MOVE_MOUNT_F_EMPTY_PATH) = 0
read(3, "e /home/lis/src/mountcfs/cfs: Can't lookup blockdev\n", 1024) = 52

This is kernel 6.12.0-0.rc0.20240926git11a299a7933e.13.fc42.x86_64 from
Fedora Rawhide.

It's a pretty minor issue, but it sent me on a wild goose chase for an hour
or two, so probably it should get fixed before the final release.

Gao Xiang <hsiangkao@linux.alibaba.com>:

Fix this by providing a get_tree_bdev_flags() helper which can be used
to silence such warnings.

* patches from https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com:
  erofs: use get_tree_bdev_flags() to avoid misleading messages
  fs/super.c: introduce get_tree_bdev_flags()

Link: https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

erofs: use get_tree_bdev_flags() to avoid misleading messages

Users can pass in an arbitrary source path for the proper type of
a mount then without "Can't lookup blockdev" error message.

Reported-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Closes: https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241009033151.2334888-2-hsiangkao@linux.alibaba.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

fs/super.c: introduce get_tree_bdev_flags()

As Allison reported [1], currently get_tree_bdev() will store
"Can't lookup blockdev" error message. Although it makes sense for
pure bdev-based fses, this message may mislead users who try to use
EROFS file-backed mounts since get_tree_nodev() is used as a fallback
then.

Add get_tree_bdev_flags() to specify extensible flags [2] and
GET_TREE_BDEV_QUIET_LOOKUP to silence "Can't lookup blockdev" message
since it's misleading to EROFS file-backed mounts now.

[1] https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com
[2] https://lore.kernel.org/r/ZwUkJEtwIpUA4qMz@infradead.org

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue

Add a DMI quirk for Samsung Galaxy Book2 to fix an initial lid state
detection issue.

The _LID device incorrectly returns the lid status as "closed" during
boot, causing the system to enter a suspend loop right after booting.

The quirk ensures that the correct lid state is reported initially,
preventing the system from immediately suspending after startup. It
only addresses the initial lid state detection and ensures proper
system behavior upon boot.

Signed-off-by: Shubham Panwar <shubiisp8@gmail.com>
Link: https://patch.msgid.link/20241020095045.6036-2-shubiisp8@gmail.com
[ rjw: Changelog edits ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[]

The LG Gram Pro 16 2-in-1 (2024) the 16T90SP has its keybopard IRQ (1)
described as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh
which breaks the keyboard.

Add the 16T90SP to the irq1_level_low_skip_override[] quirk table to fix
this.

Reported-by: Dirk Holten <dirk.holten@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219382
Cc: All applicable <stable@vger.kernel.org>
Suggested-by: Dirk Holten <dirk.holten@gmx.de>
Signed-off-by: Christian Heusel <christian@heusel.eu>
Link: https://patch.msgid.link/20241017-lg-gram-pro-keyboard-v2-1-7c8fbf6ff718@heusel.eu
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ASoC: rt722-sdca: increase clk_stop_timeout to fix clock stop issue

clk_stop_timeout should be increased to 900ms to fix clock stop issue.

Signed-off-by: Jack Yu <jack.yu@realtek.com>
Link: https://patch.msgid.link/cd26275d9fc54374a18dc016755cb72d@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context

PRMT needs to find the correct type of block to translate the PA-VA
mapping for EFI runtime services.

The issue arises because the PRMT is finding a block of type
EFI_CONVENTIONAL_MEMORY, which is not appropriate for runtime services
as described in Section 2.2.2 (Runtime Services) of the UEFI
Specification [1]. Since the PRM handler is a type of runtime service,
this causes an exception when the PRM handler is called.

    [Firmware Bug]: Unable to handle paging request in EFI runtime service
    WARNING: CPU: 22 PID: 4330 at drivers/firmware/efi/runtime-wrappers.c:341
        __efi_queue_work+0x11c/0x170
    Call trace:

Let PRMT find a block with EFI_MEMORY_RUNTIME for PRM handler and PRM
context.

If no suitable block is found, a warning message will be printed, but
the procedure continues to manage the next PRM handler.

However, if the PRM handler is actually called without proper allocation,
it would result in a failure during error handling.

By using the correct memory types for runtime services, ensure that the
PRM handler and the context are properly mapped in the virtual address
space during runtime, preventing the paging request error.

The issue is really that only memory that has been remapped for runtime
by the firmware can be used by the PRM handler, and so the region needs
to have the EFI_MEMORY_RUNTIME attribute.

Link: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf
Fixes: cefc7ca46235 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Koba Ko <kobak@nvidia.com>
Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://patch.msgid.link/20241012205010.4165798-1-kobak@nvidia.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request()

The caller of the function dev_pm_qos_add_request() checks again a non
zero value but dev_pm_qos_add_request() can return '1' if the request
already exists. Therefore, the setup function fails while the QoS
request actually did not failed.

Fix that by changing the check against a negative value like all the
other callers of the function.

Fixes: e44655617317 ("powercap/drivers/dtpm: Add dtpm devfreq with energy model support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20241018021205.46460-1-yuancan@huawei.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: docs: Reflect latency changes in docs

There were two changes related to transition latency recently.
Namely commit e13aa799c2a6 ("cpufreq: Change default transition delay
to 2ms") and
commit 37c6dccd6837 ("cpufreq: Remove LATENCY_MULTIPLIER").

Both changed the defaults / maximums so let the documentation
reflect that.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/46853b6e-bad5-4ace-9b23-ff157f234ae3@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

virtio_net: fix integer overflow in stats

Static analysis on linux-next has detected the following issue
in function virtnet_stats_ctx_init, in drivers/net/virtio_net.c :

        if (vi->device_stats_cap & VIRTIO_NET_STATS_TYPE_CVQ) {
                queue_type = VIRTNET_Q_TYPE_CQ;
                ctx->bitmap[queue_type]   |= VIRTIO_NET_STATS_TYPE_CVQ;
                ctx->desc_num[queue_type] += ARRAY_SIZE(virtnet_stats_cvq_desc);
                ctx->size[queue_type]     += sizeof(struct virtio_net_stats_cvq);
        }

ctx->bitmap is declared as a u32 however it is being bit-wise or'd with
VIRTIO_NET_STATS_TYPE_CVQ and this is defined as 1 << 32:

include/uapi/linux/virtio_net.h:#define VIRTIO_NET_STATS_TYPE_CVQ (1ULL << 32)

..and hence the bit-wise or operation won't set any bits in ctx->bitmap
because 1ULL < 32 is too wide for a u32.

In fact, the field is read into a u64:

       u64 offset, bitmap;
....
       bitmap = ctx->bitmap[queue_type];

so to fix, it is enough to make bitmap an array of u64.

Fixes: 941168f8b40e5 ("virtio_net: support device stats")
Reported-by: "Colin King (gmail)" <colin.i.king@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/53e2bd6728136d5916e384a7840e5dc7eebff832.1729099611.git.mst@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fix races in netdev_tx_sent_queue()/dev_watchdog()

Some workloads hit the infamous dev_watchdog() message:

"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"

It seems possible to hit this even for perfectly normal
BQL enabled drivers:

1) Assume a TX queue was idle for more than dev->watchdog_timeo
   (5 seconds unless changed by the driver)

2) Assume a big packet is sent, exceeding current BQL limit.

3) Driver ndo_start_xmit() puts the packet in TX ring,
   and netdev_tx_sent_queue() is called.

4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
   before txq->trans_start has been written.

5) txq->trans_start is written later, from netdev_start_xmit()

    if (rc == NETDEV_TX_OK)
          txq_trans_update(txq)

dev_watchdog() running on another cpu could read the old
txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.

To solve the issue, write txq->trans_start right before one XOFF bit
is set :

- _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()

From dev_watchdog(), we have to read txq->state before txq->trans_start.

Add memory barriers to enforce correct ordering.

In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.

Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wwan: fix global oob in wwan_rtnl_policy

The variable wwan_rtnl_link_ops assign a *bigger* maxtype which leads to
a global out-of-bounds read when parsing the netlink attributes. Exactly
same bug cause as the oob fixed in commit b33fb5b801c6 ("net: qualcomm:
rmnet: fix global oob in rmnet_policy").

==================================================================
BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:388 [inline]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603
Read of size 1 at addr ffffffff8b09cb60 by task syz.1.66276/323862

CPU: 0 PID: 323862 Comm: syz.1.66276 Not tainted 6.1.70 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x14f/0x750 mm/kasan/report.c:395
kasan_report+0x139/0x170 mm/kasan/report.c:495
validate_nla lib/nlattr.c:388 [inline]
__nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603
__nla_parse+0x3c/0x50 lib/nlattr.c:700
nla_parse_nested_deprecated include/net/netlink.h:1269 [inline]
__rtnl_newlink net/core/rtnetlink.c:3514 [inline]
rtnl_newlink+0x7bc/0x1fd0 net/core/rtnetlink.c:3623
rtnetlink_rcv_msg+0x794/0xef0 net/core/rtnetlink.c:6122
netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508
netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline]
netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352
netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874
sock_sendmsg_nosec net/socket.c:716 [inline]
__sock_sendmsg net/socket.c:728 [inline]
____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499
___sys_sendmsg+0x21c/0x290 net/socket.c:2553
__sys_sendmsg net/socket.c:2582 [inline]
__do_sys_sendmsg net/socket.c:2591 [inline]
__se_sys_sendmsg+0x19e/0x270 net/socket.c:2589
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f67b19a24ad
RSP: 002b:00007f67b17febb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f67b1b45f80 RCX: 00007f67b19a24ad
RDX: 0000000000000000 RSI: 0000000020005e40 RDI: 0000000000000004
RBP: 00007f67b1a1e01d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd2513764f R14: 00007ffd251376e0 R15: 00007f67b17fed40
</TASK>

The buggy address belongs to the variable:
wwan_rtnl_policy+0x20/0x40

The buggy address belongs to the physical page:
page:ffffea00002c2700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xb09c
flags: 0xfff00000001000(reserved|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000001000 ffffea00002c2708 ffffea00002c2708 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner info is not present (never set?)

Memory state around the buggy address:
ffffffff8b09ca00: 05 f9 f9 f9 05 f9 f9 f9 00 01 f9 f9 00 01 f9 f9
ffffffff8b09ca80: 00 00 00 05 f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9
>ffffffff8b09cb00: 00 00 00 00 05 f9 f9 f9 00 00 00 00 f9 f9 f9 f9
^
ffffffff8b09cb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

According to the comment of `nla_parse_nested_deprecated`, use correct size
`IFLA_WWAN_MAX` here to fix this issue.

Fixes: 88b710532e53 ("wwan: add interface creation support")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241015131621.47503-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netfilter: xtables: fix typo causing some targets not to load on IPv6

- There is no NFPROTO_IPV6 family for mark and NFLOG.
- TRACE is also missing module autoload with NFPROTO_IPV6.

This results in ip6tables failing to restore a ruleset. This issue has been
reported by several users providing incomplete patches.

Very similar to Ilya Katsnelson's patch including a missing chunk in the
TRACE extension.

Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed")
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Reported-by: Ilya Katsnelson <me@0upti.me>
Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

fbdev: wm8505fb: select CONFIG_FB_IOMEM_FOPS

The fb_io_mmap() function is used in the file operations but
not enabled in all configurations unless FB_IOMEM_FOPS gets
selected:

ld.lld-20: error: undefined symbol: fb_io_mmap
referenced by wm8505fb.c
drivers/video/fbdev/wm8505fb.o:(wm8505fb_ops) in archive vmlinux.a

Fixes: 11754a504608 ("fbdev/wm8505fb: Initialize fb_ops to fbdev I/O-memory helpers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

Merge branch 'fsl-fman-fix-refcount-handling-of-fman-related-devices'

Aleksandr Mishin says:

====================
fsl/fman: Fix refcount handling of fman-related devices

The series is intended to fix refcount handling for fman-related "struct
device" objects - the devices are not released upon driver removal or in
the error paths during probe. This leads to device reference leaks.

The device pointers are now saved to struct mac_device and properly handled
in the driver's probe and removal functions.

Originally reported by Simon Horman <horms@kernel.org>
(https://lore.kernel.org/all/20240702133651.GK598357@kernel.org/)

Compile tested only.
====================

Link: https://patch.msgid.link/20241015060122.25709-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fsl/fman: Fix refcount handling of fman-related devices

In mac_probe() there are multiple calls to of_find_device_by_node(),
fman_bind() and fman_port_bind() which takes references to of_dev->dev.
Not all references taken by these calls are released later on error path
in mac_probe() and in mac_remove() which lead to reference leaks.

Add references release.

Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fsl/fman: Save device references taken in mac_probe()

In mac_probe() there are calls to of_find_device_by_node() which takes
references to of_dev->dev. These references are not saved and not released
later on error path in mac_probe() and in mac_remove().

Add new fields into mac_device structure to save references taken for
future use in mac_probe() and mac_remove().

This is a preparation for further reference leaks fix.

Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback"

This reverts commit 672c3b7457fcee9656c36a29a4b21ec4a652433e.

fuse_writepages() might be called with no dirty pages after all writable
opens were closed. In this case __fuse_write_file_get() will return NULL
which will trigger the WARNING.

The exact conditions under which this is triggered is unclear and syzbot
didn't find a reproducer yet.

Reported-by: syzbot+217a976dc26ef2fa8711@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/CAJnrk1aQwfvb51wQ5rUSf9N8j1hArTFeSkHqC_3T-mU6_BCD=A@mail.gmail.com/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

bpf, arm64: Fix address emission with tag-based KASAN enabled

When BPF_TRAMP_F_CALL_ORIG is enabled, the address of a bpf_tramp_image
struct on the stack is passed during the size calculation pass and
an address on the heap is passed during code generation. This may
cause a heap buffer overflow if the heap address is tagged because
emit_a64_mov_i64() will emit longer code than it did during the size
calculation pass. The same problem could occur without tag-based
KASAN if one of the 16-bit words of the stack address happened to
be all-ones during the size calculation pass. Fix the problem by
assuming the worst case (4 instructions) when calculating the size
of the bpf_tramp_image address emission.

Fixes: 19d3c179a377 ("bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://linux-review.googlesource.com/id/I1496f2bc24fba7a1d492e16e2b94cf43714f2d3c
Link: https://lore.kernel.org/bpf/20241018221644.3240898-1-pcc@google.com

ALSA: hda/tas2781: select CRC32 instead of CRC32_SARWATE

Fix the kconfig option for the tas2781 HDA driver to select CRC32 rather
than CRC32_SARWATE. CRC32_SARWATE is an option from the kconfig
'choice' that selects the specific CRC32 implementation. Selecting a
'choice' option seems to have no effect, but even if it did work, it
would be incorrect for a random driver to override the user's choice.
CRC32 is the correct option to select for crc32() to be available.

Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20241020175624.7095-1-ebiggers@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek: Add subwoofer quirk for Acer Predator G9-593

The Acer Predator G9-593 has a 2+1 speaker system which isn't probed
correctly.
This patch adds a quirk with the proper pin connections.

Note that I do not own this laptop, so I cannot guarantee that this
fixes the issue.
Testing was done by other users here:
https://discussion.fedoraproject.org/t/-/118482

This model appears to have two different dev IDs...

- 0x1177 (as seen on the forum link above)
- 0x1178 (as seen on https://linux-hardware.org/?probe=127df9999f)

I don't think the audio system was changed between model revisions, so
the patch applies for both IDs.

Signed-off-by: José Relvas <josemonsantorelvas@gmail.com>
Link: https://patch.msgid.link/20241020102756.225258-1-josemonsantorelvas@gmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: firewire-lib: Avoid division by zero in apply_constraint_to_size()

The step variable is initialized to zero. It is changed in the loop,
but if it's not changed it will remain zero. Add a variable check
before the division.

The observed behavior was introduced by commit 826b5de90c0b
("ALSA: firewire-lib: fix insufficient PCM rule for period/buffer size"),
and it is difficult to show that any of the interval parameters will
satisfy the snd_interval_test() condition with data from the
amdtp_rate_table[] table.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 826b5de90c0b ("ALSA: firewire-lib: fix insufficient PCM rule for period/buffer size")
Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://patch.msgid.link/20241018060018.1189537-1-shum.sdl@nppct.ru
Signed-off-by: Takashi Iwai <tiwai@suse.de>

i915: fix DRM_I915_GVT_KVMGT dependencies

Depending on x86 and KVM is not enough, as the kvm helper functions
that get called here are controlled by CONFIG_KVM_X86, which is
disabled if both KVM_INTEL and KVM_AMD are turned off.

ERROR: modpost: "kvm_write_track_remove_gfn" [drivers/gpu/drm/i915/kvmgt.ko] undefined!
ERROR: modpost: "kvm_page_track_register_notifier" [drivers/gpu/drm/i915/kvmgt.ko] undefined!
ERROR: modpost: "kvm_page_track_unregister_notifier" [drivers/gpu/drm/i915/kvmgt.ko] undefined!
ERROR: modpost: "kvm_write_track_add_gfn" [drivers/gpu/drm/i915/kvmgt.ko] undefined!

Change the dependency to CONFIG_KVM_X86 instead.

Fixes: ea4290d77bda ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015152157.2955229-1-arnd@kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 341e4023032fba6c02326bfc6babd63ef4039712)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

uprobe: avoid out-of-bounds memory access of fetching args

Uprobe needs to fetch args into a percpu buffer, and then copy to ring
buffer to avoid non-atomic context problem.

Sometimes user-space strings, arrays can be very large, but the size of
percpu buffer is only page size. And store_trace_args() won't check
whether these data exceeds a single page or not, caused out-of-bounds
memory access.

It could be reproduced by following steps:
1. build kernel with CONFIG_KASAN enabled
2. save follow program as test.c

```
\#include <stdio.h>
\#include <stdlib.h>
\#include <string.h>

// If string length large than MAX_STRING_SIZE, the fetch_store_strlen()
// will return 0, cause __get_data_size() return shorter size, and
// store_trace_args() will not trigger out-of-bounds access.
// So make string length less than 4096.
\#define STRLEN 4093

void generate_string(char *str, int n)
{
    int i;
    for (i = 0; i < n; ++i)
    {
        char c = i % 26 + 'a';
        str[i] = c;
    }
    str[n-1] = '\0';
}

void print_string(char *str)
{
    printf("%s\n", str);
}

int main()
{
    char tmp[STRLEN];

    generate_string(tmp, STRLEN);
    print_string(tmp);

    return 0;
}
```
3. compile program
`gcc -o test test.c`

4. get the offset of `print_string()`
```
objdump -t test | grep -w print_string
0000000000401199 g     F .text  000000000000001b              print_string
```

5. configure uprobe with offset 0x1199
```
off=0x1199

cd /sys/kernel/debug/tracing/
echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring"
> uprobe_events
echo 1 > events/uprobes/enable
echo 1 > tracing_on
```

6. run `test`, and kasan will report error.
==================================================================
BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0
Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18
Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x27/0x310
kasan_report+0x10f/0x120
? strncpy_from_user+0x1d6/0x1f0
strncpy_from_user+0x1d6/0x1f0
? rmqueue.constprop.0+0x70d/0x2ad0
process_fetch_insn+0xb26/0x1470
? __pfx_process_fetch_insn+0x10/0x10
? _raw_spin_lock+0x85/0xe0
? __pfx__raw_spin_lock+0x10/0x10
? __pte_offset_map+0x1f/0x2d0
? unwind_next_frame+0xc5f/0x1f80
? arch_stack_walk+0x68/0xf0
? is_bpf_text_address+0x23/0x30
? kernel_text_address.part.0+0xbb/0xd0
? __kernel_text_address+0x66/0xb0
? unwind_get_return_address+0x5e/0xa0
? __pfx_stack_trace_consume_entry+0x10/0x10
? arch_stack_walk+0xa2/0xf0
? _raw_spin_lock_irqsave+0x8b/0xf0
? __pfx__raw_spin_lock_irqsave+0x10/0x10
? depot_alloc_stack+0x4c/0x1f0
? _raw_spin_unlock_irqrestore+0xe/0x30
? stack_depot_save_flags+0x35d/0x4f0
? kasan_save_stack+0x34/0x50
? kasan_save_stack+0x24/0x50
? mutex_lock+0x91/0xe0
? __pfx_mutex_lock+0x10/0x10
prepare_uprobe_buffer.part.0+0x2cd/0x500
uprobe_dispatcher+0x2c3/0x6a0
? __pfx_uprobe_dispatcher+0x10/0x10
? __kasan_slab_alloc+0x4d/0x90
handler_chain+0xdd/0x3e0
handle_swbp+0x26e/0x3d0
? __pfx_handle_swbp+0x10/0x10
? uprobe_pre_sstep_notifier+0x151/0x1b0
irqentry_exit_to_user_mode+0xe2/0x1b0
asm_exc_int3+0x39/0x40
RIP: 0033:0x401199
Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce
RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206
RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2
RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0
RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20
R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040
R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000
</TASK>

This commit enforces the buffer's maxlen less than a page-size to avoid
store_trace_args() out-of-memory access.

Link: https://lore.kernel.org/all/20241015060148.1108331-1-mqaio@linux.alibaba.com/
Fixes: dcad1a204f72 ("tracing/uprobes: Fetch args before reserving a ring buffer")
Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Linux 6.12-rc4

bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path

This fixes a fsck bug on a very old filesystem (pre mainline merge).

Fixes: 72350ee0ea22 ("bcachefs: Kill snapshot arg to fsck_write_inode()")
Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark more errors as AUTOFIX

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Pull bluetooth fixes from Luiz Augusto Von Dentz:

- ISO: Fix multiple init when debugfs is disabled

- Call iso_exit() on module unload

- Remove debugfs directory on module init failure

- btusb: Fix not being able to reconnect after suspend

- btusb: Fix regression with fake CSR controllers 0a12:0001

- bnep: fix wild-memory-access in proto_unregister

Note: normally the bluetooth fixes go through the networking tree, but
this missed the weekly merge, and two of the commits fix regressions
that have caused a fair amount of noise and have now hit stable too:

  https://lore.kernel.org/all/4e1977ca-6166-4891-965e-34a6f319035f@leemhuis.info/

So I'm pulling it directly just to expedite things and not miss yet
another -rc release. This is not meant to become a new pattern.

* tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
  Bluetooth: bnep: fix wild-memory-access in proto_unregister
  Bluetooth: btusb: Fix not being able to reconnect after suspend
  Bluetooth: Remove debugfs directory on module init failure
  Bluetooth: Call iso_exit() on module unload
  Bluetooth: ISO: Fix multiple init when debugfs is disabled

Merge tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Mostly error path fixes, but one pretty serious interrupt problem in
  the Ocelot driver as well:

   - Fix two error paths and a missing semicolon in the Intel driver

   - Add a missing ACPI ID for the Intel Panther Lake

   - Check return value of devm_kasprintf() in the Apple and STM32
     drivers

   - Add a missing mutex_destroy() in the aw9523 driver

   - Fix a double free in cv1800_pctrl_dt_node_to_map() in the Sophgo
     driver

   - Fix a double free in ma35_pinctrl_dt_node_to_map_func() in the
     Nuvoton driver

   - Fix a bug in the Ocelot interrupt handler making the system hang"

* tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: ocelot: fix system hang on level based interrupts
  pinctrl: nuvoton: fix a double free in ma35_pinctrl_dt_node_to_map_func()
  pinctrl: sophgo: fix double free in cv1800_pctrl_dt_node_to_map()
  pinctrl: intel: platform: Add Panther Lake to the list of supported
  pinctrl: aw9523: add missing mutex_destroy
  pinctrl: stm32: check devm_kasprintf() returned value
  pinctrl: apple: check devm_kasprintf() returned value
  pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment
  pinctrl: intel: platform: fix error path in device_for_each_child_node()

bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations

kvmalloc() doesn't support allocations > INT_MAX, but vmalloc() does -
the limit should be lifted, but we can work around this for now.

A user with a 75 TB filesystem reported the following journal replay
error:
https://github.com/koverstreet/bcachefs/issues/769

In journal replay we have to sort and dedup all the keys from the
journal, which means we need a large contiguous allocation. Given that
the user has 128GB of ram, the 2GB limit on allocation size has become
far too small.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use wait_event_interruptible() in recovery

Fix a bug where mount was failing with -ERESTARTSYS:
https://github.com/koverstreet/bcachefs/issues/741

We only want the interruptible wait when called from fsync.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix __bch2_fsck_err() warning

We only warn about having a btree_trans that wasn't passed in if we'll
be prompting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fixes from Greg KH:
"Here are a number of small char/misc/iio driver fixes for 6.12-rc4:

   - loads of small iio driver fixes for reported problems

   - parport driver out-of-bounds fix

   - Kconfig description and MAINTAINERS file updates

  All of these, except for the Kconfig and MAINTAINERS file updates have
  been in linux-next all week. Those other two are just documentation
  changes and will have no runtime issues and were merged on Friday"

* tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (39 commits)
  misc: rtsx: list supported models in Kconfig help
  MAINTAINERS: Remove some entries due to various compliance requirements.
  misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for OTP device
  misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for EEPROM device
  parport: Proper fix for array out-of-bounds access
  iio: frequency: admv4420: fix missing select REMAP_SPI in Kconfig
  iio: frequency: {admv4420,adrf6780}: format Kconfig entries
  iio: adc: ad4695: Add missing Kconfig select
  iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
  iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()
  iioc: dac: ltc2664: Fix span variable usage in ltc2664_channel_config()
  iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig
  iio: dac: ltc1660: add missing select REGMAP_SPI in Kconfig
  iio: dac: ad5770r: add missing select REGMAP_SPI in Kconfig
  iio: amplifiers: ada4250: add missing select REGMAP_SPI in Kconfig
  iio: frequency: adf4377: add missing select REMAP_SPI in Kconfig
  iio: resolver: ad2s1210: add missing select (TRIGGERED_)BUFFER in Kconfig
  iio: resolver: ad2s1210 add missing select REGMAP in Kconfig
  iio: proximity: mb1232: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
  iio: pressure: bm1390: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
  ...

Merge tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.12-rc4:

   - qcom-geni serial driver fixes, wow what a mess of a UART chip that
     thing is...

   - vt infoleak fix for odd font sizes

   - imx serial driver bugfix

   - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in
     that piece of code

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: qcom-geni: rename suspend functions
  serial: qcom-geni: drop unused receive parameter
  serial: qcom-geni: drop flip buffer WARN()
  serial: qcom-geni: fix rx cancel dma status bit
  serial: qcom-geni: fix receiver enable
  serial: qcom-geni: fix dma rx cancellation
  serial: qcom-geni: fix shutdown race
  serial: qcom-geni: revert broken hibernation support
  serial: qcom-geni: fix polled console initialisation
  serial: imx: Update mctrl old_status on RTSD interrupt
  tty: n_gsm: Fix use-after-free in gsm_cleanup_mux
  vt: prevent kernel-infoleak in con_font_get()

Merge tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 6.12-rc4:

   - xhci driver fixes for a number of reported issues

   - new usb-serial driver ids

   - dwc3 driver fixes for reported problems.

   - usb gadget driver fixes for reported problems

   - typec driver fixes

   - MAINTAINER file updates

  All of these have been in linux-next this week with no reported issues"

* tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add Telit FN920C04 MBIM compositions
  USB: serial: option: add support for Quectel EG916Q-GL
  xhci: dbc: honor usb transfer size boundaries.
  usb: xhci: Fix handling errors mid TD followed by other errors
  xhci: Mitigate failed set dequeue pointer commands
  xhci: Fix incorrect stream context type macro
  USB: gadget: dummy-hcd: Fix "task hung" problem
  usb: gadget: f_uac2: fix return value for UAC2_ATTRIBUTE_STRING store
  usb: dwc3: core: Fix system suspend on TI AM62 platforms
  xhci: tegra: fix checked USB2 port number
  usb: dwc3: Wait for EndXfer completion before restoring GUSB2PHYCFG
  usb: typec: qcom-pmic-typec: fix sink status being overwritten with RP_DEF
  usb: typec: altmode should keep reference to parent
  MAINTAINERS: usb: raw-gadget: add bug tracker link
  MAINTAINERS: Add an entry for the LJCA drivers

Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Explicitly disable the TSC deadline timer when going idle to address
   some CPU errata in that area

- Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path

- Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation

- Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit

- Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature

- Other small cleanups and improvements

* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Always explicitly disarm TSC-deadline timer
  x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
  x86/bugs: Use code segment selector for VERW operand
  x86/entry_32: Clear CPU buffers after register restore in NMI return
  x86/entry_32: Do not clobber user EFLAGS.ZF
  x86/resctrl: Annotate get_mem_config() functions as __init
  x86/resctrl: Avoid overflow in MB settings in bw_validate()
  x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h

Merge tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

- Fix a case for sifive-plic where an interrupt gets disabled *and*
   masked and remains masked when it gets reenabled later

- Plug a small race in GIC-v4 where userspace can force an affinity
   change of a virtual CPU (vPE) in its unmapping path

- Do not mix the two sets of ocelot irqchip's registers in the mask
   calculation of the main interrupt sticky register

- Other smaller fixlets and cleanups

* tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Fix missing put_device
  irqchip/riscv-intc: Fix SMP=n boot with ACPI
  irqchip/sifive-plic: Unmask interrupt in plic_irq_enable()
  irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
  irqchip/sifive-plic: Return error code on failure
  irqchip/riscv-imsic: Fix output text of base address
  irqchip/ocelot: Comment sticky register clearing code
  irqchip/ocelot: Fix trigger register address
  irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI

Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduling fixes from Borislav Petkov:

- Add PREEMPT_RT maintainers

- Fix another aspect of delayed dequeued tasks wrt determining their
   state, i.e., whether they're runnable or blocked

- Handle delayed dequeued tasks and their migration wrt PSI properly

- Fix the situation where a delayed dequeue task gets enqueued into a
   new class, which should not happen

- Fix a case where memory allocation would happen while the runqueue
   lock is held, which is a no-no

- Do not over-schedule when tasks with shorter slices preempt the
   currently running task

- Make sure delayed to deque entities are properly handled before
   unthrottling

- Other smaller cleanups and improvements

* tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add an entry for PREEMPT_RT.
  sched/fair: Fix external p->on_rq users
  sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
  sched/core: Dequeue PSI signals for blocked tasks that are delayed
  sched: Fix delayed_dequeue vs switched_from_fair()
  sched/core: Disable page allocation in task_tick_mm_cid()
  sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
  sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
  sched: Fix sched_delayed vs cfs_bandwidth

Merge tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A single fix for a build failure introduced this merge window"

* tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Remove dependency between pciback and privcmd

Merge tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
"Just another small tracing fix from Sean"

* tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory

Merge tag 'kvmarm-fixes-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #3

- Stop wasting space in the HYP idmap, as we are dangerously close
to the 4kB limit, and this has already exploded in -next

- Fix another race in vgic_init()

- Fix a UBSAN error when faking the cache topology with MTE
enabled

Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #2

- Fix the guest view of the ID registers, making the relevant fields
  writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)

- Correcly expose S1PIE to guests, fixing a regression introduced
  in 6.12-rc1 with the S1POE support

- Fix the recycling of stage-2 shadow MMUs by tracking the context
  (are we allowed to block or not) as well as the recycling state

- Address a couple of issues with the vgic when userspace misconfigures
  the emulation, resulting in various splats. Headaches courtesy
  of our Syzkaller friends

RISCV: KVM: use raw_spinlock for critical section in imsic

For the external interrupt updating procedure in imsic, there was a
spinlock to protect it already. But since it should not be preempted in
any cases, we should turn to use raw_spinlock to prevent any preemption
in case PREEMPT_RT was enabled.

Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Reviewed-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Message-ID: <20240919160126.44487-1-cyan.yang@sifive.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups

When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the
walk based on the number of array _entries_, not the size in bytes of
the array. Iterating based on the total size of the array can result in
false passes, e.g. if the random data beyond the array happens to match
a CPUID entry's function and index.

Fixes: fb18d053b7f8 ("selftest: kvm: x86: test KVM_GET_CPUID2 and guest visible CPUIDs against KVM_GET_SUPPORTED_CPUID")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-ID: <20241003234337.273364-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: x86: Avoid using SSE/AVX instructions

Some distros switched gcc to '-march=x86-64-v3' by default and while it's
hard to find a CPU which doesn't support it today, many KVM selftests fail
with

  ==== Test Assertion Failure ====
    lib/x86_64/processor.c:570: Unhandled exception in guest
    pid=72747 tid=72747 errno=4 - Interrupted system call
    Unhandled exception '0x6' at guest RIP '0x4104f7'

The failure is easy to reproduce elsewhere with

   $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test

The root cause of the problem seems to be that with '-march=x86-64-v3' GCC
uses AVX* instructions (VMOVQ in the example above) and without prior
XSETBV() in the guest this results in #UD. It is certainly possible to add
it there, e.g. the following saves the day as well:

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-ID: <20240920154422.2890096-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

MAINTAINERS: add samples/pktgen to NETWORKING [GENERAL]

samples/pktgen is missing in the MAINTAINERS file.

Suggested-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Message-ID: <20241018005301.10052-1-liuhangbin@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

mailmap: update entry for Jesper Dangaard Brouer

Mapping all my previously used emails to my kernel.org email.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Message-ID: <172909057364.2452383.8019986488234344607.stgit@firesoul>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

net: dsa: mv88e6xxx: Fix error when setting port policy on mv88e6393x

mv88e6393x_port_set_policy doesn't correctly shift the ptr value when
converting the policy format between the old and new styles, so the
target register ends up with the ptr being written over the data bits.

Shift the pointer to align with the format expected by
mv88e6393x_port_policy_write().

Fixes: 6584b26020fc ("net: dsa: mv88e6xxx: implement .port_set_policy for Amethyst")
Signed-off-by: Peter Rashleigh <peter@rashleigh.ca>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241016040822.3917-1-peter@rashleigh.ca>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory

Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
enforce 32-byte alignment of nCR3.

In the absolute worst case scenario, failure to ignore bits 4:0 can result
in an out-of-bounds read, e.g. if the target page is at the end of a
memslot, and the VMM isn't using guard pages.

Per the APM:

  The CR3 register points to the base address of the page-directory-pointer
  table. The page-directory-pointer table is aligned on a 32-byte boundary,
  with the low 5 address bits 4:0 assumed to be 0.

And the SDM's much more explicit:

  4:0    Ignored

Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow
that is broken.

Fixes: e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
Reported-by: Kirk Swidowski <swidowski@google.com>
Cc: Andy Nguyen <theflow@google.com>
Cc: 3pvd <3pvd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009140838.1036226-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()

Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data.  Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:

- vCPU is just created, and the vCPU thread is preempted before
   SS.AR_BYTES is written in vmx_vcpu_reset().

- When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
   vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.

- vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
   invoke vmx_segment_cache_clear() to invalidate the cache.

As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM
selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.

Invalidating the cache after writing the VMCS doesn't address the general
issue of cache accesses from IRQ context being unsafe, but it does prevent
KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
bug that results in an unsafe cache access.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields")
[sean: rework changelog to account for previous patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009175002.1118178-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL

Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that
it applies to moved memslots as well as deleted memslots, to avoid KVM's
"fast zap" terminology (which has no meaning for userspace), and to reword
the documented targeted zap behavior to specifically say that KVM _may_
zap a subset of all SPTEs. As evidenced by the fix to zap non-leafs SPTEs
with gPTEs, formally documenting KVM's exact internal behavior is risky
and unnecessary.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009192345.1148353-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()

Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
mmu_invalidate_in_progress is elevated, or that the range is being zapped
due to memslot removal (loosely detected by slots_lock being held).
Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
as KVM's page fault path snapshots state before acquiring mmu_lock, and
thus can create SPTEs with stale information if vCPUs aren't forced to
retry faults (due to seeing an in-progress or past MMU invalidation).

Memslot removal is a special case, as the memslot is retrieved outside of
mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
instead relies on SRCU synchronization to ensure any in-flight page faults
are fully resolved before zapping SPTEs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009192345.1148353-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot

When performing a targeted zap on memslot removal, zap only MMU pages that
shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
unnecessary.  Furthermore, for_each_gfn_valid_sp() arguably shouldn't
exist, because it doesn't do what most people would it expect it to do.
The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
means that the exact gfn comparison will not get a match, even when a SP
does "cover" a gfn, or was even created specifically for a gfn.

For memslot deletion specifically, KVM's behavior will vary significantly
based on the size and alignment of a memslot, and in weird ways.  E.g. for
a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
it's only 4KiB aligned.  And as described below, zapping SPs in the
aligned case overzaps for direct MMUs, as odds are good the upper-level
SPs are serving other memslots.

To iterate over all potentially-relevant gfns, KVM would need to make a
pass over the hash table for each level, with the gfn used for lookup
rounded for said level.  And then check that the SP is of the correct
level, too, e.g. to avoid over-zapping.

But even then, KVM would massively overzap, as processing every level is
all but guaranteed to zap SPs that serve other memslots, especially if the
memslot being removed is relatively small.  KVM could mitigate that issue
by processing only levels that can be possible guest huge pages, i.e. are
less likely to be re-used for other memslot, but while somewhat logical,
that's quite arbitrary and would be a bit of a mess to implement.

So, zap only SPs with gPTEs, as the resulting behavior is easy to describe,
is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that
absolutely must be zapped.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241009192345.1148353-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

x86/kvm: Override default caching mode for SEV-SNP and TDX

AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).

This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().

Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic

The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-3-linux@treblig.org>
[Adjust Documentation/virt/kvm/locking.rst. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Remove unused kvm_vcpu_gfn_to_pfn

The last use of kvm_vcpu_gfn_to_pfn was removed by commit
b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-2-linux@treblig.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux

Pull one more io_uring fix from Jens Axboe:
"Fix for a regression introduced in 6.12-rc2, where a condition check
  was negated and hence -EAGAIN would bubble back up up to userspace
  rather than trigger a retry condition"

* tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux:
  io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()

octeon_ep: Add SKB allocation failures handling in __octep_oq_process_rx()

build_skb() returns NULL in case of a memory allocation failure so handle
it inside __octep_oq_process_rx() to avoid NULL pointer dereference.

__octep_oq_process_rx() is called during NAPI polling by the driver. If
skb allocation fails, keep on pulling packets out of the Rx DMA queue: we
shouldn't break the polling immediately and thus falsely indicate to the
octep_napi_poll() that the Rx pressure is going down. As there is no
associated skb in this case, don't process the packets and don't push them
up the network stack - they are skipped.

Helper function is implemented to unmmap/flush all the fragment buffers
used by the dropped packet. 'alloc_failures' counter is incremented to
mark the skb allocation error in driver statistics.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

octeon_ep: Implement helper for iterating packets in Rx queue

The common code with some packet and index manipulations is extracted and
moved to newly implemented helper to make the code more readable and avoid
duplication. This is a preparation for skb allocation failure handling.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Suggested-by: Simon Horman <horms@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

bnxt_en: replace ptp_lock with irqsave variant

In netpoll configuration the completion processing can happen in hard
irq context which will break with spin_lock_bh() for fullfilling RX
timestamp in case of all packets timestamping. Replace it with
spin_lock_irqsave() variant.

Fixes: 7f5515d19cd7 ("bnxt_en: Get the RX packet timestamp")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Message-ID: <20241016195234.2622004-1-vadfed@meta.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

net: phy: dp83822: Fix reset pin definitions

This change fixes a rare issue where the PHY fails to detect a link
due to incorrect reset behavior.

The SW_RESET definition was incorrectly assigned to bit 14, which is the
Digital Restart bit according to the datasheet. This commit corrects
SW_RESET to bit 15 and assigns DIG_RESTART to bit 14 as per the
datasheet specifications.

The SW_RESET define is only used in the phy_reset function, which fully
re-initializes the PHY after the reset is performed. The change in the
bit definitions should not have any negative impact on the functionality
of the PHY.

v2:
- added Fixes tag
- improved commit message

Cc: stable@vger.kernel.org
Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Alex Michel <alex.michel@wiedemann-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Message-ID: <AS1P250MB0608A798661549BF83C4B43EA9462@AS1P250MB0608.EURP250.PROD.OUTLOOK.COM>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

MAINTAINERS: add Simon as an official reviewer

Simon has been diligently and consistently reviewing networking
changes for at least as long as our development statistics
go back. Often if not usually topping the list of reviewers.
Make his role official.

Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241015153005.2854018-1-kuba@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

net: plip: fix break; causing plip to never transmit

Since commit
71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang")

plip was not able to send any packets, this patch replaces one
unintended break; with fallthrough; which was originally missed by
commit 9525d69a3667 ("net: plip: mark expected switch fall-throughs").

I have verified with a real hardware PLIP connection that everything
works once again after applying this patch.

Fixes: 71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang")
Signed-off-by: Jakub Boehm <boehm.jakub@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241015-net-plip-tx-fix-v1-1-32d8be1c7e0b@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

be2net: fix potential memory leak in be_xmit()

The be_xmit() returns NETDEV_TX_OK without freeing skb
in case of be_xmit_enqueue() fails, add dev_kfree_skb_any() to fix it.

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Message-ID: <20241015144802.12150-1-wanghai38@huawei.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()

The sun3_82586_send_packet() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241015144148.7918-1-wanghai38@huawei.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

net: pse-pd: Fix out of bound for loop

Adjust the loop limit to prevent out-of-bounds access when iterating over
PI structures. The loop should not reach the index pcdev->nr_lines since
we allocate exactly pcdev->nr_lines number of PI structures. This fix
ensures proper bounds are maintained during iterations.

Fixes: 9be9567a7c59 ("net: pse-pd: Add support for PSE PIs")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Message-ID: <20241015130255.125508-1-kory.maincent@bootlin.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Fixes all in drivers. The largest is the mpi3mr which corrects a phy
  count limit that should only apply to the controller but was being
  incorrectly applied to expander phys"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: core: Fix null-ptr-deref in target_alloc_device()
  scsi: mpi3mr: Validate SAS port assignments
  scsi: ufs: core: Set SDEV_OFFLINE when UFS is shut down
  scsi: ufs: core: Requeue aborted request
  scsi: ufs: core: Fix the issue of ICU failure

Merge tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace fixes from Steven Rostedt:
"A couple of fixes to function graph infrastructure:

   - Fix allocation of idle shadow stack allocation during hotplug

     If function graph tracing is started when a CPU is offline, if it
     were come online during the trace then the idle task that
     represents the CPU will not get a shadow stack allocated for it.
     This means all function graph hooks that happen while that idle
     task is running (including in interrupt mode) will have all its
     events dropped.

     Switch over to the CPU hotplug mechanism that will have any newly
     brought on line CPU get a callback that can allocate the shadow
     stack for its idle task.

   - Fix allocation size of the ret_stack_list array

     When function graph tracing converted over to allowing more than
     one user at a time, it had to convert its shadow stack from an
     array of ret_stack structures to an array of unsigned longs. The
     shadow stacks are allocated in batches of 32 at a time and assigned
     to every running task. The batch is held by the ret_stack_list
     array.

     But when the conversion happened, instead of allocating an array of
     32 pointers, it was allocated as a ret_stack itself (PAGE_SIZE).
     This ret_stack_list gets passed to a function that iterates over
     what it believes is its size defined by the
     FTRACE_RETSTACK_ALLOC_SIZE macro (which is 32).

     Luckily (PAGE_SIZE) is greater than 32 * sizeof(long), otherwise
     this would have been an array overflow. This still should be fixed
     and the ret_stack_list should be allocated to the size it is
     expected to be as someday it may end up being bigger than
     SHADOW_STACK_SIZE"

* tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fgraph: Allocate ret_stack_list with proper size
  fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks

Merge tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe

Pull ipe fixes from Fan Wu:
"This addresses several issues identified by Luca when attempting to
  enable IPE on Debian and systemd:

   - address issues with IPE policy update errors and policy update
     version check, improving the clarity of error messages for better
     understanding by userspace programs.

   - enable IPE policies to be signed by secondary and platform
     keyrings, facilitating broader use across general Linux
     distributions like Debian.

   - updates the IPE entry in the MAINTAINERS file to reflect the new
     tree URL and my updated email from kernel.org"

* tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
  MAINTAINERS: update IPE tree url and Fan Wu's email
  ipe: fallback to platform keyring also if key in trusted keyring is rejected
  ipe: allow secondary and platform keyrings to install/update policies
  ipe: also reject policy updates with the same version
  ipe: return -ESTALE instead of -EINVAL on update when new policy has a lower version

Merge tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- a fix for Zinitix driver to not fail probing if the property enabling
   touch keys functionality is not defined. Support for touch keys was
   added in 6.12 merge window so this issue does not affect users of
   released kernels

- a couple new vendor/device IDs in xpad driver to enable support for
   more hardware

* tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: zinitix - don't fail if linux,keycodes prop is absent
  Input: xpad - add support for MSI Claw A1M
  Input: xpad - add support for 8BitDo Ultimate 2C Wireless Controller

Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux

Pull 9p fixes from Dominique Martinet:
"Mashed-up update that I sat on too long:

   - fix for multiple slabs created with the same name

   - enable multipage folios

   - theorical fix to also look for opened fids by inode if none was
     found by dentry"

[ Enabling multi-page folios should have been done during the merge
  window, but it's a one-liner, and the actual meat of the enablement
  is in netfs and already in use for other filesystems...  - Linus ]

* tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
  9p: Avoid creating multiple slab caches with the same name
  9p: Enable multipage folios
  9p: v9fs_fid_find: also lookup by inode if not found dentry

Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux

Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

   - Fix several issues with the 'rustc-option' macro. It includes a
     refactor from Masahiro of three '{cc,rust}-*' macros, which is not
     a fix but avoids repeating the same commands (which would be
     several lines in the case of 'rustc-option').

   - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
     includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
     a fix but is needed for the actual fix.

  And a trivial grammar fix"

* tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
  cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
  kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
  kbuild: fix issues with rustc-option
  kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
  lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW

io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()

A previous commit improved how !FMODE_NOWAIT is dealt with, but
inadvertently negated a check whilst doing so. This caused -EAGAIN to be
returned from reading files with O_NONBLOCK set. Fix up the check for
REQ_F_SUPPORT_NOWAIT.

Reported-by: Julian Orth <ju.orth@gmail.com>
Link: https://github.com/axboe/liburing/issues/1270
Fixes: f7c913438533 ("io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fgraph: Allocate ret_stack_list with proper size

The ret_stack_list is an array of ret_stack shadow stacks for the function
graph usage. When the first function graph is enabled, all tasks in the
system get a shadow stack. The ret_stack_list is a 32 element array of
pointers to these shadow stacks. It allocates the shadow stack in batches
(32 stacks at a time), assigns them to running tasks, and continues until
all tasks are covered.

When the function graph shadow stack changed from an array of
ftrace_ret_stack structures to an array of longs, the allocation of
ret_stack_list went from allocating an array of 32 elements to just a
block defined by SHADOW_STACK_SIZE. Luckily, that's defined as PAGE_SIZE
and is much more than enough to hold 32 pointers. But it is way overkill
for the amount needed to allocate.

Change the allocation of ret_stack_list back to a kcalloc() of
FTRACE_RETSTACK_ALLOC_SIZE pointers.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241018215212.23f13f40@rorschach
Fixes: 42675b723b484 ("function_graph: Convert ret_stack to a series of longs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks

The function graph infrastructure allocates a shadow stack for every task
when enabled. This includes the idle tasks. The first time the function
graph is invoked, the shadow stacks are created and never freed until the
task exits. This includes the idle tasks.

Only the idle tasks that were for online CPUs had their shadow stacks
created when function graph tracing started. If function graph tracing is
enabled and a CPU comes online, the idle task representing that CPU will
not have its shadow stack created, and all function graph tracing for that
idle task will be silently dropped.

Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
This will include idle tasks for CPUs that come online during tracing.

This issue can be reproduced by:

# cd /sys/kernel/tracing
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > set_ftrace_pid
# echo function_graph > current_tracer
# echo 1 > options/funcgraph-proc
# echo 1 > /sys/devices/system/cpu/cpu1
# grep '<idle>' per_cpu/cpu1/trace | head

Before, nothing would show up.

After:
1)    <idle>-0    |   0.811 us    |                        __enqueue_entity();
1)    <idle>-0    |   5.626 us    |                      } /* enqueue_entity */
1)    <idle>-0    |               |                      dl_server_update_idle_time() {
1)    <idle>-0    |               |                        dl_scaled_delta_exec() {
1)    <idle>-0    |   0.450 us    |                          arch_scale_cpu_capacity();
1)    <idle>-0    |   1.242 us    |                        }
1)    <idle>-0    |   1.908 us    |                      }
1)    <idle>-0    |               |                      dl_server_start() {
1)    <idle>-0    |               |                        enqueue_dl_entity() {
1)    <idle>-0    |               |                          task_contending() {

Note, if tracing stops and restarts, the old way would then initialize
the onlined CPUs.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/20241018214300.6df82178@rorschach
Fixes: 868baf07b1a25 ("ftrace: Fix memory leak with function graph and cpu hotplug")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

- Fix BPF verifier to not affect subreg_def marks in its range
   propagation (Eduard Zingerman)

- Fix a truncation bug in the BPF verifier's handling of
   coerce_reg_to_size_sx (Dimitar Kanaliev)

- Fix the BPF verifier's delta propagation between linked registers
   under 32-bit addition (Daniel Borkmann)

- Fix a NULL pointer dereference in BPF devmap due to missing rxq
   information (Florian Kauer)

- Fix a memory leak in bpf_core_apply (Jiri Olsa)

- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
   arrays of nested structs (Hou Tao)

- Fix build ID fetching where memory areas backing the file were
   created with memfd_secret (Andrii Nakryiko)

- Fix BPF task iterator tid filtering which was incorrectly using pid
   instead of tid (Jordan Rome)

- Several fixes for BPF sockmap and BPF sockhash redirection in
   combination with vsocks (Michal Luczaj)

- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)

- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
   of an infinite BPF tailcall (Pu Lehui)

- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
   be resolved (Thomas Weißschuh)

- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
   was returned (Toke Høiland-Jørgensen)

- Fix a BPF selftest compilation error in cgroup-related tests with
   musl libc (Tony Ambardar)

- Several fixes to BPF link info dumps to fill missing fields (Tyrone
   Wu)

- Add BPF selftests for kfuncs from multiple modules, checking that the
   correct kfuncs are called (Simon Sundberg)

- Ensure that internal and user-facing bpf_redirect flags don't overlap
   (Toke Høiland-Jørgensen)

- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
   Riel)

- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
   under RT (Wander Lairson Costa)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
  lib/buildid: Handle memfd_secret() files in build_id_parse()
  selftests/bpf: Add test case for delta propagation
  bpf: Fix print_reg_state's constant scalar dump
  bpf: Fix incorrect delta propagation between linked registers
  bpf: Properly test iter/task tid filtering
  bpf: Fix iter/task tid filtering
  riscv, bpf: Make BPF_CMPXCHG fully ordered
  bpf, vsock: Drop static vsock_bpf_prot initialization
  vsock: Update msg_count on read_skb()
  vsock: Update rx_bytes on read_skb()
  bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
  selftests/bpf: Add asserts for netfilter link info
  bpf: Fix link info netfilter flags to populate defrag flag
  selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
  selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
  bpf: Fix truncation bug in coerce_reg_to_size_sx()
  selftests/bpf: Assert link info uprobe_multi count & path_size if unset
  bpf: Fix unpopulated path_size when uprobe_multi fields unset
  selftests/bpf: Fix cross-compiling urandom_read
  selftests/bpf: Add test for kfunc module order
  ...

Merge tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:

- fix test makefile to install tests directory without which the test
fails with errors

* tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftest: hid: add the missing tests directory

Input: zinitix - don't fail if linux,keycodes prop is absent

When initially adding the touchkey support, a mistake was made in the
property parsing code. The possible negative errno from
device_property_count_u32() was never checked, which was an oversight
left from converting to it from the of_property as part of the review
fixes.

Re-add the correct handling of the absent property, in which case zero
touchkeys should be assumed, which would disable the feature.

Reported-by: Jakob Hauser <jahau@rocketmail.com>
Tested-by: Jakob Hauser <jahau@rocketmail.com>
Fixes: 075d9b22c8fe ("Input: zinitix - add touchkey support")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Tested-by: Yassine Oudjana <y.oudjana@protonmail.com>
Link: https://lore.kernel.org/r/20241004-zinitix-no-keycodes-v2-1-876dc9fea4b6@trvn.ru
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'block-6.12-20241018' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
     - Fix target passthrough identifier (Nilay)
     - Fix tcp locking (Hannes)
     - Replace list with sbitmap for tracking RDMA rsp tags (Guixen)
     - Remove unnecessary fallthrough statements (Tokunori)
     - Remove ready-without-media support (Greg)
     - Fix multipath partition scan deadlock (Keith)
     - Fix concurrent PCI reset and remove queue mapping (Maurizio)
     - Fabrics shutdown fixes (Nilay)

- Fix for a kerneldoc warning (Keith)

- Fix a race with blk-rq-qos and wakeups (Omar)

- Cleanup of checking for always-set tag_set (SurajSonawane2415)

- Fix for a crash with CPU hotplug notifiers (Ming)

- Don't allow zero-copy ublk on unprivileged device (Ming)

- Use array_index_nospec() for CDROM (Josh)

- Remove dead code in drbd (David)

- Tweaks to elevator loading (Breno)

* tag 'block-6.12-20241018' of git://git.kernel.dk/linux:
  cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed()
  nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
  nvme: make keep-alive synchronous operation
  nvme-loop: flush off pending I/O while shutting down loop controller
  nvme-pci: fix race condition between reset and nvme_dev_disable()
  ublk: don't allow user copy for unprivileged device
  blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race
  nvme-multipath: defer partition scanning
  blk-mq: setup queue ->tag_set before initializing hctx
  elevator: Remove argument from elevator_find_get
  elevator: do not request_module if elevator exists
  drbd: Remove unused conn_lowest_minor
  nvme: disable CC.CRIME (NVME_CC_CRIME)
  nvme: delete unnecessary fallthru comment
  nvmet-rdma: use sbitmap to replace rsp free list
  block: Fix elevator_get_default() checking for NULL q->tag_set
  nvme: tcp: avoid race between queue_lock lock and destroy
  nvmet-passthru: clear EUID/NGUID/UUID while using loop target
  block: fix blk_rq_map_integrity_sg kernel-doc

Merge tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Fix a regression this merge window where cloning of registered
   buffers didn't take into account the dummy_ubuf

- Fix a race with reading how many SQRING entries are available,
   causing userspace to need to loop around io_uring_sqring_wait()
   rather than being able to rely on SQEs being available when it
   returned

- Ensure that the SQPOLL thread is TASK_RUNNING before running
   task_work off the cancelation exit path

* tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux:
  io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
  io_uring/rsrc: ignore dummy_ubuf for buffer cloning
  io_uring/sqpoll: close race on waiting for sqring entries

Input: xpad - add support for MSI Claw A1M

Add MSI Claw A1M controller to xpad_device match table when in xinput mode.
Add MSI VID as XPAD_XBOX360_VENDOR.

Signed-off-by: John Edwards <uejji@uejji.net>
Reviewed-by: Derek J. Clark <derekjohn.clark@gmail.com>
Reviewed-by: Christopher Snowhill <kode54@gmail.com>
Link: https://lore.kernel.org/r/20241010232020.3292284-4-uejji@uejji.net
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

ASoC/SoundWire: clean up link DMA during stop for IPC4

Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>:

Clean up the link DMA for playback during stop for IPC4 is required to
reset the DMA read/write pointers when the stream is prepared and
restarted after a call to snd_pcm_drain()/snd_pcm_drop().

The change is mainly on ASoC. We may go via ASoC tree with Vinod's
Acked-by tag

Ranjani Sridharan (4):
  ASoC: SOF: ipc4-topology: Do not set ALH node_id for aggregated DAIs
  ASoC: SOF: Intel: hda: Handle prepare without close for non-HDA DAI's
  soundwire: intel_ace2x: Send PDI stream number during prepare
  ASoC: SOF: Intel: hda: Always clean up link DMA during stop

drivers/soundwire/intel_ace2x.c   | 19 +++++-----------
sound/soc/sof/intel/hda-dai-ops.c | 23 +++++++++----------
sound/soc/sof/intel/hda-dai.c     | 37 +++++++++++++++++++++++++++----
sound/soc/sof/ipc4-topology.c     | 15 +++++++++++--
4 files changed, 62 insertions(+), 32 deletions(-)

--
2.43.0

nfsd: fix race between laundromat and free_stateid

There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.

Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:

kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20

This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).

First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.

Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.

Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

MAINTAINERS: update IPE tree url and Fan Wu's email

Update Integrity Policy Enforcement (IPE) LSM tree url and
maintainer's email to the newly issued kernel.org tree/email.

Signed-off-by: Fan Wu <wufan@kernel.org>

ipe: fallback to platform keyring also if key in trusted keyring is rejected

If enabled, we fallback to the platform keyring if the trusted keyring
doesn't have the key used to sign the ipe policy. But if pkcs7_verify()
rejects the key for other reasons, such as usage restrictions, we do not
fallback. Do so, following the same change in dm-verity.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Suggested-by: Serge Hallyn <serge@hallyn.com>
[FW: fixed some line length issues and a typo in the commit message]
Signed-off-by: Fan Wu <wufan@kernel.org>

Merge tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix possible double free setting xattrs

- Fix slab out of bounds with large ioctl payload

- Remove three unused functions, and an unused variable that could be
   confusing

* tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Remove unused functions
  smb/client: Fix logically dead code
  smb: client: fix OOBs when building SMB2_IOCTL request
  smb: client: fix possible double free in smb2_set_ea()

Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- Fix integer overflow in xrep_bmap

- Fix stale dealloc punching for COW IO

* tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: punch delalloc extents from the COW fork for COW writes
  xfs: set IOMAP_F_SHARED for all COW fork allocations
  xfs: share more code in xfs_buffered_write_iomap_begin
  xfs: support the COW fork in xfs_bmap_punch_delalloc_range
  xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
  xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
  xfs: factor out a xfs_file_write_zero_eof helper
  iomap: move locking out of iomap_write_delalloc_release
  iomap: remove iomap_file_buffered_write_punch_delalloc
  iomap: factor out a iomap_last_written_block helper
  xfs: fix integer overflow in xrep_bmap

Merge tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix two issues in the amd-pstate cpufreq driver and update the
  intel_rapl power capping driver with a new processor ID.

  Specifics:

   - Enable ACPI CPPC in amd_pstate_register_driver() after disabling it
     in amd_pstate_unregister_driver() when switching driver operation
     modes (Dhananjay Ugwekar)

   - Make amd-pstate use nominal performance as the maximum performance
     level when boost is disabled (Mario Limonciello)

   - Add ArrowLake-H to the list of processors where PL4 is supported in
     the MSR part of the intel_rapl power capping driver (Srinivas
     Pandruvada)"

* tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap: intel_rapl_msr: Add PL4 support for ArrowLake-H
  cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
  cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems

Merge tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix auto-detect regression in jc42 driver"

* tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
[PATCH} hwmon: (jc42) Properly detect TSE2004-compliant devices again

Merge tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, msm and xe are the two main ones, with a bunch of
  scattered fixes including a largish revert in mgag200, then amdgpu,
  vmwgfx and scattering of other minor ones.

  All seems pretty regular.

  msm:
   - Display:
      - move CRTC resource assignment to atomic_check otherwise to make
        consecutive calls to atomic_check() consistent
      - fix rounding / sign-extension issues with pclk calculation in
        case of DSC
      - cleanups to drop incorrect null checks in dpu snapshots
      - fix to use kvzalloc in dpu snapshot to avoid allocation issues
        in heavily loaded system cases
      - Fix to not program merge_3d block if dual LM is not being used
      - Fix to not flush merge_3d block if its not enabled otherwise
        this leads to false timeouts
   - GPU:
      - a7xx: add a fence wait before SMMU table update

  xe:
   - New workaround to Xe2 (Aradhya)
   - Fix unbalanced rpm put (Matthew Auld)
   - Remove fragile lock optimization (Matthew Brost)
   - Fix job release, delegating it to the drm scheduler (Matthew Brost)
   - Fix timestamp bit width for Xe2 (Lucas)
   - Fix external BO's dma-resv usag (Matthew Brost)
   - Fix returning success for timeout in wait_token (Nirmoy)
   - Initialize fence to avoid it being detected as signaled (Matthew
     Auld)
   - Improve cache flush for BMG (Matthew Auld)
   - Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)

  amdgpu:
   - SR-IOV fix
   - CS chunk handling fix
   - MES fixes
   - SMU13 fixes

  amdkfd:
   - VRAM usage reporting fix

  radeon:
   - Fix possible_clones handling

  i915:
   - Two DP bandwidth related MST fixes

  ast:
   - Clear EDID on unplugged connectors

  host1x:
   - Fix boot on Tegra186
   - Set DMA parameters

  mgag200:
   - Revert VBLANK support

  panel:
   - himax-hx83192: Adjust power and gamma

  qaic:
   - Sgtable loop fixes

  vmwgfx:
   - Limit display layout allocatino size
   - Handle allocation errors in connector checks
   - Clean up KMS code for 2d-only setup
   - Report surface-check errors correctly
   - Remove NULL test around kvfree()"

* tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits)
  drm/ast: vga: Clear EDID if no display is connected
  drm/ast: sil164: Clear EDID if no display is connected
  Revert "drm/mgag200: Add vblank support"
  drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs
  drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
  drm/xe/bmg: improve cache flushing behaviour
  drm/xe/xe_sync: initialise ufence.signalled
  drm/xe/ufence: ufence can be signaled right after wait_woken
  drm/xe: Use bookkeep slots for external BO's in exec IOCTL
  drm/xe/query: Increase timestamp width
  drm/xe: Don't free job in TDR
  drm/xe: Take job list lock in xe_sched_add_pending_job
  drm/xe: fix unbalanced rpm put() with declare_wedged()
  drm/xe: fix unbalanced rpm put() with fence_fini()
  drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
  drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
  drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
  drm/msm/a6xx+: Insert a fence wait before SMMU table update
  drm/msm/dpu: don't always program merge_3d block
  drm/msm/dpu: Don't always set merge_3d pending flush
  ...