]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
10 months agoselftests/bpf: Fix fill_link_info selftest on powerpc
Saket Kumar Bhaskar [Mon, 9 Dec 2024 06:57:20 +0000 (12:27 +0530)]
selftests/bpf: Fix fill_link_info selftest on powerpc

With CONFIG_KPROBES_ON_FTRACE enabled on powerpc, ftrace_location_range
returns ftrace location for bpf_fentry_test1 at offset of 4 bytes from
function entry. This is because branch to _mcount function is at offset
of 4 bytes in function profile sequence.

To fix this, add entry_offset of 4 bytes while verifying the address for
kprobe entry address of bpf_fentry_test1 in verify_perf_link_info in
selftest, when CONFIG_KPROBES_ON_FTRACE is enabled.

Disassemble of bpf_fentry_test1:

c000000000e4b080 <bpf_fentry_test1>:
c000000000e4b080:       a6 02 08 7c     mflr    r0
c000000000e4b084:       b9 e2 22 4b     bl      c00000000007933c <_mcount>
c000000000e4b088:       01 00 63 38     addi    r3,r3,1
c000000000e4b08c:       b4 07 63 7c     extsw   r3,r3
c000000000e4b090:       20 00 80 4e     blr

When CONFIG_PPC_FTRACE_OUT_OF_LINE [1] is enabled, these function profile
sequence is moved out of line with an unconditional branch at offset 0.
So, the test works without altering the offset for
'CONFIG_KPROBES_ON_FTRACE && CONFIG_PPC_FTRACE_OUT_OF_LINE' case.

Disassemble of bpf_fentry_test1:

c000000000f95190 <bpf_fentry_test1>:
c000000000f95190:       00 00 00 60     nop
c000000000f95194:       01 00 63 38     addi    r3,r3,1
c000000000f95198:       b4 07 63 7c     extsw   r3,r3
c000000000f9519c:       20 00 80 4e     blr

[1] https://lore.kernel.org/all/20241030070850.1361304-13-hbathini@linux.ibm.com/

Fixes: 23cf7aa539dc ("selftests/bpf: Add selftest for fill_link_info")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241209065720.234344-1-skb99@linux.ibm.com
10 months agoselftests/bpf: Add more stats into veristat
Mykyta Yatsenko [Mon, 9 Dec 2024 13:04:55 +0000 (13:04 +0000)]
selftests/bpf: Add more stats into veristat

Extend veristat to collect and print more stats, namely:
  - program size in instructions
  - jited program size in bytes
  - program type
  - attach type
  - stack depth

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241209130455.94592-1-mykyta.yatsenko5@gmail.com
10 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov [Mon, 9 Dec 2024 01:01:51 +0000 (17:01 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Cross-merge bpf fixes after downstream PR.

Trivial conflict:
tools/testing/selftests/bpf/prog_tests/verifier.c

Adjacent changes in:
Auto-merging kernel/bpf/verifier.c
Auto-merging samples/bpf/Makefile
Auto-merging tools/testing/selftests/bpf/.gitignore
Auto-merging tools/testing/selftests/bpf/Makefile
Auto-merging tools/testing/selftests/bpf/prog_tests/verifier.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agoLinux 6.13-rc2
Linus Torvalds [Sun, 8 Dec 2024 22:03:39 +0000 (14:03 -0800)]
Linux 6.13-rc2

10 months agoMerge tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Sun, 8 Dec 2024 20:01:06 +0000 (12:01 -0800)]
Merge tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix a section mismatch warning in modpost

 - Fix Debian package build error with the O= option

* tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: deb-pkg: fix build error with O=
  modpost: Add .irqentry.text to OTHER_SECTIONS

10 months agoMerge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 8 Dec 2024 19:54:04 +0000 (11:54 -0800)]
Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a /proc/interrupts formatting regression

 - Have the BCM2836 interrupt controller enter power management states
   properly

 - Other fixlets

* tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
  genirq/proc: Add missing space separator back
  irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/gic-v3: Fix irq_complete_ack() comment

10 months agoMerge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 8 Dec 2024 19:51:29 +0000 (11:51 -0800)]
Merge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Handle the case where clocksources with small counter width can,
   in conjunction with overly long idle sleeps, falsely trigger the
   negative motion detection of clocksources

* tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Make negative motion detection more robust

10 months agoMerge tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 8 Dec 2024 19:38:56 +0000 (11:38 -0800)]
Merge tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Have the Automatic IBRS setting check on AMD does not falsely fire in
   the guest when it has been set already on the host

 - Make sure cacheinfo structures memory is allocated to address a boot
   NULL ptr dereference on Intel Meteor Lake which has different numbers
   of subleafs in its CPUID(4) leaf

 - Take care of the GDT restoring on the kexec path too, as expected by
   the kernel

 - Make sure SMP is not disabled when IO-APIC is disabled on the kernel
   cmdline

 - Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
   propagate changes to the kernelmode page tables, to the user portion,
   in PTI

 - Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
   can get lost and thus user-visible delays happen

 - Make sure PKRU is properly restored with XRSTOR on AMD after a PRKU
   write of 0 (WRPKRU) which will mark PKRU in its init state and thus
   lose the actual buffer

* tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
  x86/cacheinfo: Delete global num_cache_leaves
  cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
  x86/kexec: Restore GDT on return from ::preserve_context kexec
  x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
  x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
  x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
  x86/pkeys: Ensure updated PKRU value is XRSTOR'd
  x86/pkeys: Change caller of update_pkru_in_sigframe()

10 months agoMerge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 8 Dec 2024 19:26:13 +0000 (11:26 -0800)]
Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "24 hotfixes.  17 are cc:stable.  15 are MM and 9 are non-MM.

  The usual bunch of singletons - please see the relevant changelogs for
  details"

* tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
  iio: magnetometer: yas530: use signed integer type for clamp limits
  sched/numa: fix memory leak due to the overwritten vma->numab_state
  mm/damon: fix order of arguments in damos_before_apply tracepoint
  lib: stackinit: hide never-taken branch from compiler
  mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
  scatterlist: fix incorrect func name in kernel-doc
  mm: correct typo in MMAP_STATE() macro
  mm: respect mmap hint address when aligning for THP
  mm: memcg: declare do_memsw_account inline
  mm/codetag: swap tags when migrate pages
  ocfs2: update seq_file index in ocfs2_dlm_seq_next
  stackdepot: fix stack_depot_save_flags() in NMI context
  mm: open-code page_folio() in dump_page()
  mm: open-code PageTail in folio_flags() and const_folio_flags()
  mm: fix vrealloc()'s KASAN poisoning logic
  Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
  selftests/damon: add _damon_sysfs.py to TEST_FILES
  selftest: hugetlb_dio: fix test naming
  ocfs2: free inode when ocfs2_get_init_inode() fails
  nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
  ...

10 months agokbuild: deb-pkg: fix build error with O=
Masahiro Yamada [Sun, 8 Dec 2024 07:56:45 +0000 (16:56 +0900)]
kbuild: deb-pkg: fix build error with O=

Since commit 13b25489b6f8 ("kbuild: change working directory to external
module directory with M="), the Debian package build fails if a relative
path is specified with the O= option.

  $ make O=build bindeb-pkg
    [ snip ]
  dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'.
  Rebuilding host programs with x86_64-linux-gnu-gcc...
  make[6]: Entering directory '/home/masahiro/linux/build'
  /home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist.  Stop.

This occurs because the sub_make_done flag is cleared, even though the
working directory is already in the output directory.

Passing KBUILD_OUTPUT=. resolves the issue.

Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
10 months agomodpost: Add .irqentry.text to OTHER_SECTIONS
Thomas Gleixner [Sun, 1 Dec 2024 11:17:30 +0000 (12:17 +0100)]
modpost: Add .irqentry.text to OTHER_SECTIONS

The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:

  WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...

  The relocation at __ex_table+0x447c references section ".irqentry.text"
  which is not in the list of authorized sections.

Add .irqentry.text to OTHER_SECTIONS to cure the issue.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
10 months agoMerge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 8 Dec 2024 01:27:25 +0000 (17:27 -0800)]
Merge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - DFS fix (for race with tree disconnect and dfs cache worker)

 - Four fixes for SMB3.1.1 posix extensions:
      - improve special file support e.g. to Samba, retrieving the file
        type earlier
      - reduce roundtrips (e.g. on ls -l, in some cases)

* tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix potential race in cifs_put_tcon()
  smb3.1.1: fix posix mounts to older servers
  fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points
  fs/smb/client: Implement new SMB3 POSIX type
  fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX

10 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 8 Dec 2024 01:17:38 +0000 (17:17 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Large number of small fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
  scsi: scsi_debug: Fix hrtimer support for ndelay
  scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
  scsi: ufs: core: Add missing post notify for power mode change
  scsi: sg: Fix slab-use-after-free read in sg_release()
  scsi: ufs: core: sysfs: Prevent div by zero
  scsi: qla2xxx: Update version to 10.02.09.400-k
  scsi: qla2xxx: Supported speed displayed incorrectly for VPorts
  scsi: qla2xxx: Fix NVMe and NPIV connect issue
  scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt
  scsi: qla2xxx: Fix use after free on unload
  scsi: qla2xxx: Fix abort in bsg timeout
  scsi: mpi3mr: Update driver version to 8.12.0.3.50
  scsi: mpi3mr: Handling of fault code for insufficient power
  scsi: mpi3mr: Start controller indexing from 0
  scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
  scsi: mpi3mr: Synchronize access to ioctl data buffer
  scsi: mpt3sas: Update driver version to 51.100.00.00
  scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
  scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
  scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
  ...

10 months agoMerge tag 'block-6.13-20241207' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 7 Dec 2024 18:07:05 +0000 (10:07 -0800)]
Merge tag 'block-6.13-20241207' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Target fix using incorrect zero buffer (Nilay)
      - Device specifc deallocate quirk fixes (Christoph, Keith)
      - Fabrics fix for handling max command target bugs (Maurizio)
      - Cocci fix usage for kzalloc (Yu-Chen)
      - DMA size fix for host memory buffer feature (Christoph)
      - Fabrics queue cleanup fixes (Chunguang)

 - CPU hotplug ordering fixes

 - Add missing MODULE_DESCRIPTION for rnull

 - bcache error value fix

 - virtio-blk queue freeze fix

* tag 'block-6.13-20241207' of git://git.kernel.dk/linux:
  blk-mq: move cpuhp callback registering out of q->sysfs_lock
  blk-mq: register cpuhp callback after hctx is added to xarray table
  virtio-blk: don't keep queue frozen during system suspend
  nvme-tcp: simplify nvme_tcp_teardown_io_queues()
  nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  nvme-rdma: unquiesce admin_q before destroy it
  nvme-tcp: fix the memleak while create new ctrl failed
  nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
  nvmet: replace kmalloc + memset with kzalloc for data allocation
  nvme-fabrics: handle zero MAXCMD without closing the connection
  bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again
  nvme-pci: remove two deallocate zeroes quirks
  block: rnull: add missing MODULE_DESCRIPTION
  nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
  nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()

10 months agoMerge tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 7 Dec 2024 18:01:13 +0000 (10:01 -0800)]
Merge tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
 "A single fix for a parameter type which affects 32-bit"

* tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux:
  io_uring: Change res2 parameter type in io_uring_cmd_done

10 months agoMerge tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 7 Dec 2024 17:57:38 +0000 (09:57 -0800)]
Merge tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull jffs2 fix from Richard Weinberger:

 - Fixup rtime compressor bounds checking

* tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: Fix rtime decompressor

10 months agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Fri, 6 Dec 2024 23:07:48 +0000 (15:07 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann::

 - Fix several issues for BPF LPM trie map which were found by syzbot
   and during addition of new test cases (Hou Tao)

 - Fix a missing process_iter_arg register type check in the BPF
   verifier (Kumar Kartikeya Dwivedi, Tao Lyu)

 - Fix several correctness gaps in the BPF verifier when interacting
   with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi,
   Eduard Zingerman, Tao Lyu)

 - Fix OOB BPF map writes when deleting elements for the case of xsk map
   as well as devmap (Maciej Fijalkowski)

 - Fix xsk sockets to always clear DMA mapping information when
   unmapping the pool (Larysa Zaremba)

 - Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after
   sent bytes have been finalized (Zijian Zhang)

 - Fix BPF sockmap with vsocks which was missing a queue check in poll
   and sockmap cleanup on close (Michal Luczaj)

 - Fix tools infra to override makefile ARCH variable if defined but
   empty, which addresses cross-building tools. (Björn Töpel)

 - Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols
   (Thomas Weißschuh)

 - Fix a NULL pointer dereference in bpftool (Amir Mohammadi)

 - Fix BPF selftests to check for CONFIG_PREEMPTION instead of
   CONFIG_PREEMPT (Sebastian Andrzej Siewior)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (31 commits)
  selftests/bpf: Add more test cases for LPM trie
  selftests/bpf: Move test_lpm_map.c to map_tests
  bpf: Use raw_spinlock_t for LPM trie
  bpf: Switch to bpf mem allocator for LPM trie
  bpf: Fix exact match conditions in trie_get_next_key()
  bpf: Handle in-place update for full LPM trie correctly
  bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
  bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
  bpf: Remove unnecessary check when updating LPM trie
  selftests/bpf: Add test for narrow spill into 64-bit spilled scalar
  selftests/bpf: Add test for reading from STACK_INVALID slots
  selftests/bpf: Introduce __caps_unpriv annotation for tests
  bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
  bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
  samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS
  bpf: Zero index arg error string for dynptr and iter
  selftests/bpf: Add tests for iter arg check
  bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
  tools: Override makefile ARCH variable if defined, but empty
  selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap
  ...

10 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 6 Dec 2024 21:47:55 +0000 (13:47 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Nothing major, some left-overs from the recent merging window (MTE,
  coco) and some newly found issues like the ptrace() ones.

   - MTE/hugetlbfs:

      - Set VM_MTE_ALLOWED in the arch code and remove it from the core
        code for hugetlbfs mappings

      - Fix copy_highpage() warning when the source is a huge page but
        not MTE tagged, taking the wrong small page path

   - drivers/virt/coco:

      - Add the pKVM and Arm CCA drivers under the arm64 maintainership

      - Fix the pkvm driver to fall back to ioremap() (and warn) if the
        MMIO_GUARD hypercall fails

      - Keep the Arm CCA driver default 'n' rather than 'm'

   - A series of fixes for the arm64 ptrace() implementation,
     potentially leading to the kernel consuming uninitialised stack
     variables when PTRACE_SETREGSET is invoked with a length of 0

   - Fix zone_dma_limit calculation when RAM starts below 4GB and
     ZONE_DMA is capped to this limit

   - Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
     call to page_to_phys() (from patch_map()) which checks pfn_valid()
     before vmemmap has been set up

   - Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
     ops when the kernel assumes 8-bit ASIDs but running under a
     hypervisor on a system that implements 16-bit ASIDs (found running
     Linux under Parallels on Apple M4)

   - ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
     is using the same SMMU PMCG as HIP09 and suffers from the same
     errata

   - Add GCS to cpucap_is_possible(), missed in the recent merge"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
  arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
  arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
  arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
  arm64: cpufeature: Add GCS to cpucap_is_possible()
  coco: virt: arm64: Do not enable cca guest driver by default
  arm64: mte: Fix copy_highpage() warning on hugetlb folios
  arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
  ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
  MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
  drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
  arm64: patching: avoid early page_to_phys()
  arm64: mm: Fix zone_dma_limit calculation
  arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place

10 months agoMerge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Fri, 6 Dec 2024 21:42:03 +0000 (13:42 -0800)]
Merge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:
 "Restore check for node validity in arch_numa.

  The rework of NUMA initialization in arch_numa dropped a check that
  refused to accept configurations with invalid node IDs.

  Restore that check to ensure that when firmware passes invalid nodes,
  such configuration is rejected and kernel gracefully falls back to
  dummy NUMA"

* tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  arch_numa: Restore nid checks before registering a memblock with a node
  memblock: allow zero threshold in validate_numa_converage()

10 months agoMerge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 6 Dec 2024 21:16:41 +0000 (13:16 -0800)]
Merge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm fixes from Simona Vetter:
 "Due to mailing list unreliability we missed the amdgpu pull, hence
  part two with that now included:

   - amdgu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume
     fixes

   - amdkfd fixes"

* tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu: rework resume handling for display (v2)
  drm/amd/pm: fix and simplify workload handling
  Revert "drm/amd/pm: correct the workload setting"
  drm/amdgpu: fix sriov reinit late orders
  drm/amdgpu: Fix ISP hw init issue
  drm/amd/display: Add hblank borrowing support
  drm/amd/display: Limit VTotal range to max hw cap minus fp
  drm/amd/display: Correct prefetch calculation
  drm/amd/display: Add option to retrieve detile buffer size
  drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm
  drm/amdkfd: hard-code cacheline for gc943,gc944
  drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12
  drm/amd/display: Fix programming backlight on OLED panels
  drm/amd: Sanity check the ACPI EDID
  drm/amdgpu/hdp7.0: do a posting read when flushing HDP
  drm/amdgpu/hdp6.0: do a posting read when flushing HDP
  drm/amdgpu/hdp5.2: do a posting read when flushing HDP
  drm/amdgpu/hdp5.0: do a posting read when flushing HDP
  drm/amdgpu/hdp4.0: do a posting read when flushing HDP
  drm/amdgpu/jpeg1.0: fix idle work handler

10 months agoMerge tag 'amd-drm-fixes-6.13-2024-12-04' of https://gitlab.freedesktop.org/agd5f...
Simona Vetter [Fri, 6 Dec 2024 20:54:04 +0000 (21:54 +0100)]
Merge tag 'amd-drm-fixes-6.13-2024-12-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.13-2024-12-04:

amdgpu:
- Jpeg work handler fix for VCN 1.0
- HDP flush fixes
- ACPI EDID sanity check
- OLED panel backlight fix
- DC YCbCr fix
- DC Detile buffer size debugging
- DC prefetch calculation fix
- DC VTotal handling fix
- DC HBlank fix
- ISP fix
- SR-IOV fix
- Workload profile fixes
- DCN 4.0.1 resume fix

amdkfd:
- GC 12.x fix
- GC 9.4.x fix

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241206190452.2571042-1-alexander.deucher@amd.com
10 months agoMerge tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 6 Dec 2024 19:52:15 +0000 (11:52 -0800)]
Merge tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Pretty quiet week which is probably expected after US holidays, the
  dma-fence and displayport MST message handling fixes make up the bulk
  of this, along with a couple of minor xe and other driver fixes.

  dma-fence:
   - Fix reference leak on fence-merge failure path
   - Simplify fence merging with kernel's sort()
   - Fix dma_fence_array_signaled() to ensure forward progress

  dp_mst:
   - Fix MST sideband message body length check
   - Fix a bunch of locking/state handling with DP MST msgs

  sti:
   - Add __iomem for mixer_dbg_mxn()'s parameter

  xe:
   - Missing init value and 64-bit write-order check
   - Fix a memory allocation issue causing lockdep violation

  v3d:
   - Performance counter fix"

* tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel:
  drm/v3d: Enable Performance Counters before clearing them
  drm/dp_mst: Use reset_msg_rx_state() instead of open coding it
  drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
  drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
  drm/dp_mst: Fix down request message timeout handling
  drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()
  drm/dp_mst: Verify request type in the corresponding down message reply
  drm/dp_mst: Fix resetting msg rx state after topology removal
  drm/xe: Move the coredump registration to the worker thread
  drm/xe/guc: Fix missing init value and add register order check
  drm/sti: Add __iomem for mixer_dbg_mxn's parameter
  drm/dp_mst: Fix MST sideband message body length check
  dma-buf: fix dma_fence_array_signaled v4
  dma-fence: Use kernel's sort for merging fences
  dma-fence: Fix reference leak on fence merge failure path

10 months agoMerge tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 6 Dec 2024 19:46:39 +0000 (11:46 -0800)]
Merge tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes that have been gathered in the week.

   - Fix the missing XRUN handling in USB-audio low latency mode

   - Fix regression by the previous USB-audio hadening change

   - Clean up old SH sound driver to use the standard helpers

   - A few further fixes for MIDI 2.0 UMP handling

   - Various HD-audio and USB-audio quirks

   - Fix jack handling at PM on ASoC Intel AVS

   - Misc small fixes for ASoC SOF and Mediatek"

* tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"
  ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codec
  ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops
  ALSA: usb-audio: Add extra PID for RME Digiface USB
  ALSA: usb-audio: Fix a DMA to stack memory bug
  ASoC: SOF: ipc3-topology: fix resource leaks in sof_ipc3_widget_setup_comp_dai()
  ALSA: hda/realtek: Add support for Samsung Galaxy Book3 360 (NP730QFG)
  ASoC: Intel: avs: da7219: Remove suspend_pre() and resume_post()
  ALSA: hda/tas2781: Fix error code tas2781_read_acpi()
  ALSA: hda/realtek: Enable mute and micmute LED on HP ProBook 430 G8
  ALSA: usb-audio: add mixer mapping for Corsair HS80
  ALSA: ump: Shut up truncated string warning
  ALSA: sh: Use standard helper for buffer accesses
  ALSA: usb-audio: Notify xrun for low-latency mode
  ALSA: hda/conexant: fix Z60MR100 startup pop issue
  ALSA: ump: Update legacy substream names upon FB info update
  ALSA: ump: Indicate the inactive group in legacy substream names
  ALSA: ump: Don't open legacy substream for an inactive group
  ALSA: seq: ump: Fix seq port updates per FB info notify

10 months agoMerge tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 6 Dec 2024 19:43:22 +0000 (11:43 -0800)]
Merge tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "A couple of small fixes, fixing an incorrect format specifier in a log
  message and adding missing cleanup of the devres data used to support
  dev_get_regmap() when a device is unregistered"

* tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: detach regmap from dev on regmap_exit
  regmap: Use correct format specifier for logging range errors

10 months agoMerge tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Fri, 6 Dec 2024 19:36:48 +0000 (11:36 -0800)]
Merge tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few small driver specific fixes and device ID updates for SPI.

  The Apple change flags the driver as being compatible with the core's
  GPIO chip select support, fixing support for some systems"

* tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
  spi: intel: Add Panther Lake SPI controller support
  spi: apple: Set use_gpio_descriptors to true
  spi: mpc52xx: Add cancel_work_sync before module remove

10 months agoMerge tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 6 Dec 2024 19:27:10 +0000 (11:27 -0800)]
Merge tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "Core:
   - Further prevent card detect during shutdown

  Host drivers:
   - sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10
     tablet"

* tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Further prevent card detect during shutdown
  mmc: sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10 tablet

10 months agoMerge tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Fri, 6 Dec 2024 19:24:00 +0000 (11:24 -0800)]
Merge tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
 "Core:
   - Fix a couple of memory-leaks during genpd init/remove

  Providers:
   - imx: Adjust delay for gpcv2 to fix power up handshake
   - mediatek: Fix DT bindings by adding another nested power-domain
     layer"

* tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: imx: gpcv2: Adjust delay after power up handshake
  pmdomain: core: Fix error path in pm_genpd_init() when ida alloc fails
  pmdomain: core: Add missing put_device()
  dt-bindings: power: mediatek: Add another nested power-domain layer

10 months agox86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
Sean Christopherson [Fri, 6 Dec 2024 16:20:06 +0000 (08:20 -0800)]
x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails

When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code
from msr_set_bit(), as '1' is used to indicate the WRMSR was successful
('0' indicates the MSR bit was already set).

Fixes: 8cc68c9c9e92 ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z1MkNofJjt7Oq0G6@google.com
Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X
10 months agoselftests/bpf: Consolidate kernel modules into common directory
Toke Høiland-Jørgensen [Wed, 4 Dec 2024 13:28:26 +0000 (14:28 +0100)]
selftests/bpf: Consolidate kernel modules into common directory

The selftests build four kernel modules which use copy-pasted Makefile
targets. This is a bit messy, and doesn't scale so well when we add more
modules, so let's consolidate these rules into a single rule generated
for each module name, and move the module sources into a single
directory.

To avoid parallel builds of the different modules stepping on each
other's toes during the 'modpost' phase of the Kbuild 'make modules',
the module files should really be a grouped target. However, make only
added explicit support for grouped targets in version 4.3, which is
newer than the minimum version supported by the kernel. However, make
implicitly treats pattern matching rules with multiple targets as a
grouped target, so we can work around this by turning the rule into a
pattern matching target. We do this by replacing '.ko' with '%ko' in the
targets with subst().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/bpf/20241204-bpf-selftests-mod-compile-v5-1-b96231134a49@redhat.com
10 months agoMerge branch 'fixes-for-lpm-trie'
Alexei Starovoitov [Fri, 6 Dec 2024 17:14:26 +0000 (09:14 -0800)]
Merge branch 'fixes-for-lpm-trie'

Hou Tao says:

====================
This patch set fixes several issues for LPM trie. These issues were
found during adding new test cases or were reported by syzbot.

The patch set is structured as follows:

Patch #1~#2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
Patch #4 fixes the accounting of n_entries when doing in-place update.
Patch #5 fixes the exact match condition in trie_get_next_key() and it
may skip keys when the passed key is not found in the map.
Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie
to fix several lock order warnings reported by syzbot. It also enables
raw_spinlock_t for LPM trie again. After these changes, the LPM trie will
be closer to being usable in any context (though the reentrance check of
trie->lock is still missing, but it is on my todo list).
Patch #8: move test_lpm_map to map_tests to make it run regularly.
Patch #9: add test cases for the issues fixed by patch #3~#5.

Please see individual patches for more details. Comments are always
welcome.

Change Log:
v3:
  * patch #2: remove the unnecessary NULL-init for im_node
  * patch #6: alloc the leaf node before disabling IRQ to low
    the possibility of -ENOMEM when leaf_size is large; Free
    these nodes outside the trie lock (Suggested by Alexei)
  * collect review and ack tags (Thanks for Toke & Daniel)

v2: https://lore.kernel.org/bpf/20241127004641.1118269-1-houtao@huaweicloud.com/
  * collect review tags (Thanks for Toke)
  * drop "Add bpf_mem_cache_is_mergeable() helper" patch
  * patch #3~#4: add fix tag
  * patch #4: rename the helper to trie_check_add_elem() and increase
    n_entries in it.
  * patch #6: use one bpf mem allocator and update commit message to
    clarify that using bpf mem allocator is more appropriate.
  * patch #7: update commit message to add the possible max running time
    for update operation.
  * patch #9: update commit message to specify the purpose of these test
    cases.

v1: https://lore.kernel.org/bpf/20241118010808.2243555-1-houtao@huaweicloud.com/
====================

Link: https://lore.kernel.org/all/20241206110622.1161752-1-houtao@huaweicloud.com/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agoselftests/bpf: Add more test cases for LPM trie
Hou Tao [Fri, 6 Dec 2024 11:06:22 +0000 (19:06 +0800)]
selftests/bpf: Add more test cases for LPM trie

Add more test cases for LPM trie in test_maps:

1) test_lpm_trie_update_flags
It constructs various use cases for BPF_EXIST and BPF_NOEXIST and check
whether the return value of update operation is expected.

2) test_lpm_trie_update_full_maps
It tests the update operations on a full LPM trie map. Adding new node
will fail and overwriting the value of existed node will succeed.

3) test_lpm_trie_iterate_strs and test_lpm_trie_iterate_ints
There two test cases test whether the iteration through get_next_key is
sorted and expected. These two test cases delete the minimal key after
each iteration and check whether next iteration returns the second
minimal key. The only difference between these two test cases is the
former one saves strings in the LPM trie and the latter saves integers.
Without the fix of get_next_key, these two cases will fail as shown
below:
  test_lpm_trie_iterate_strs(1091):FAIL:iterate #2 got abc exp abS
  test_lpm_trie_iterate_ints(1142):FAIL:iterate #1 got 0x2 exp 0x1

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agoselftests/bpf: Move test_lpm_map.c to map_tests
Hou Tao [Fri, 6 Dec 2024 11:06:21 +0000 (19:06 +0800)]
selftests/bpf: Move test_lpm_map.c to map_tests

Move test_lpm_map.c to map_tests/ to include LPM trie test cases in
regular test_maps run. Most code remains unchanged, including the use of
assert(). Only reduce n_lookups from 64K to 512, which decreases
test_lpm_map runtime from 37s to 0.7s.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Use raw_spinlock_t for LPM trie
Hou Tao [Fri, 6 Dec 2024 11:06:20 +0000 (19:06 +0800)]
bpf: Use raw_spinlock_t for LPM trie

After switching from kmalloc() to the bpf memory allocator, there will be
no blocking operation during the update of LPM trie. Therefore, change
trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
atomic context, even on RT kernels.

The max value of prefixlen is 2048. Therefore, update or deletion
operations will find the target after at most 2048 comparisons.
Constructing a test case which updates an element after 2048 comparisons
under a 8 CPU VM, and the average time and the maximal time for such
update operation is about 210us and 900us.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Switch to bpf mem allocator for LPM trie
Hou Tao [Fri, 6 Dec 2024 11:06:19 +0000 (19:06 +0800)]
bpf: Switch to bpf mem allocator for LPM trie

Multiple syzbot warnings have been reported. These warnings are mainly
about the lock order between trie->lock and kmalloc()'s internal lock.
See report [1] as an example:

======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
------------------------------------------------------
syz.3.2069/15008 is trying to acquire lock:
ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...

but task is already holding lock:
ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&trie->lock){-.-.}-{2:2}:
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x3a/0x60
       trie_delete_elem+0xb0/0x820
       ___bpf_prog_run+0x3e51/0xabd0
       __bpf_prog_run32+0xc1/0x100
       bpf_dispatcher_nop_func
       ......
       bpf_trace_run2+0x231/0x590
       __bpf_trace_contention_end+0xca/0x110
       trace_contention_end.constprop.0+0xea/0x170
       __pv_queued_spin_lock_slowpath+0x28e/0xcc0
       pv_queued_spin_lock_slowpath
       queued_spin_lock_slowpath
       queued_spin_lock
       do_raw_spin_lock+0x210/0x2c0
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x42/0x60
       __put_partials+0xc3/0x170
       qlink_free
       qlist_free_all+0x4e/0x140
       kasan_quarantine_reduce+0x192/0x1e0
       __kasan_slab_alloc+0x69/0x90
       kasan_slab_alloc
       slab_post_alloc_hook
       slab_alloc_node
       kmem_cache_alloc_node_noprof+0x153/0x310
       __alloc_skb+0x2b1/0x380
       ......

-> #0 (&n->list_lock){-.-.}-{2:2}:
       check_prev_add
       check_prevs_add
       validate_chain
       __lock_acquire+0x2478/0x3b30
       lock_acquire
       lock_acquire+0x1b1/0x560
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x3a/0x60
       get_partial_node.part.0+0x20/0x350
       get_partial_node
       get_partial
       ___slab_alloc+0x65b/0x1870
       __slab_alloc.constprop.0+0x56/0xb0
       __slab_alloc_node
       slab_alloc_node
       __do_kmalloc_node
       __kmalloc_node_noprof+0x35c/0x440
       kmalloc_node_noprof
       bpf_map_kmalloc_node+0x98/0x4a0
       lpm_trie_node_alloc
       trie_update_elem+0x1ef/0xe00
       bpf_map_update_value+0x2c1/0x6c0
       map_update_elem+0x623/0x910
       __sys_bpf+0x90c/0x49a0
       ...

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&trie->lock);
                               lock(&n->list_lock);
                               lock(&trie->lock);
  lock(&n->list_lock);

 *** DEADLOCK ***

[1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7

A bpf program attached to trace_contention_end() triggers after
acquiring &n->list_lock. The program invokes trie_delete_elem(), which
then acquires trie->lock. However, it is possible that another
process is invoking trie_update_elem(). trie_update_elem() will acquire
trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
get_partial_node() and try to acquire &n->list_lock (not necessarily the
same lock object). Therefore, lockdep warns about the circular locking
dependency.

Invoking kmalloc() before acquiring trie->lock could fix the warning.
However, since BPF programs call be invoked from any context (e.g.,
through kprobe/tracepoint/fentry), there may still be lock ordering
problems for internal locks in kmalloc() or trie->lock itself.

To eliminate these potential lock ordering problems with kmalloc()'s
internal locks, replacing kmalloc()/kfree()/kfree_rcu() with equivalent
BPF memory allocator APIs that can be invoked in any context. The lock
ordering problems with trie->lock (e.g., reentrance) will be handled
separately.

Three aspects of this change require explanation:

1. Intermediate and leaf nodes are allocated from the same allocator.
Since the value size of LPM trie is usually small, using a single
alocator reduces the memory overhead of the BPF memory allocator.

2. Leaf nodes are allocated before disabling IRQs. This handles cases
where leaf_size is large (e.g., > 4KB - 8) and updates require
intermediate node allocation. If leaf nodes were allocated in
IRQ-disabled region, the free objects in BPF memory allocator would not
be refilled timely and the intermediate node allocation may fail.

3. Paired migrate_{disable|enable}() calls for node alloc and free. The
BPF memory allocator uses per-CPU struct internally, these paired calls
are necessary to guarantee correctness.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Fix exact match conditions in trie_get_next_key()
Hou Tao [Fri, 6 Dec 2024 11:06:18 +0000 (19:06 +0800)]
bpf: Fix exact match conditions in trie_get_next_key()

trie_get_next_key() uses node->prefixlen == key->prefixlen to identify
an exact match, However, it is incorrect because when the target key
doesn't fully match the found node (e.g., node->prefixlen != matchlen),
these two nodes may also have the same prefixlen. It will return
expected result when the passed key exist in the trie. However when a
recently-deleted key or nonexistent key is passed to
trie_get_next_key(), it may skip keys and return incorrect result.

Fix it by using node->prefixlen == matchlen to identify exact matches.
When the condition is true after the search, it also implies
node->prefixlen equals key->prefixlen, otherwise, the search would
return NULL instead.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Handle in-place update for full LPM trie correctly
Hou Tao [Fri, 6 Dec 2024 11:06:17 +0000 (19:06 +0800)]
bpf: Handle in-place update for full LPM trie correctly

When a LPM trie is full, in-place updates of existing elements
incorrectly return -ENOSPC.

Fix this by deferring the check of trie->n_entries. For new insertions,
n_entries must not exceed max_entries. However, in-place updates are
allowed even when the trie is full.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
Hou Tao [Fri, 6 Dec 2024 11:06:16 +0000 (19:06 +0800)]
bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie

Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST
flags. These flags can be specified by users and are relevant since LPM
trie supports exact matches during update.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
Hou Tao [Fri, 6 Dec 2024 11:06:15 +0000 (19:06 +0800)]
bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem

There is no need to call kfree(im_node) when updating element fails,
because im_node must be NULL. Remove the unnecessary kfree() for
im_node.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agobpf: Remove unnecessary check when updating LPM trie
Hou Tao [Fri, 6 Dec 2024 11:06:14 +0000 (19:06 +0800)]
bpf: Remove unnecessary check when updating LPM trie

When "node->prefixlen == matchlen" is true, it means that the node is
fully matched. If "node->prefixlen == key->prefixlen" is false, it means
the prefix length of key is greater than the prefix length of node,
otherwise, matchlen will not be equal with node->prefixlen. However, it
also implies that the prefix length of node must be less than
max_prefixlen.

Therefore, "node->prefixlen == trie->max_prefixlen" will always be false
when the check of "node->prefixlen == key->prefixlen" returns false.
Remove this unnecessary comparison.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 months agoblk-mq: move cpuhp callback registering out of q->sysfs_lock
Ming Lei [Fri, 6 Dec 2024 11:16:07 +0000 (19:16 +0800)]
blk-mq: move cpuhp callback registering out of q->sysfs_lock

Registering and unregistering cpuhp callback requires global cpu hotplug lock,
which is used everywhere. Meantime q->sysfs_lock is used in block layer
almost everywhere.

It is easy to trigger lockdep warning[1] by connecting the two locks.

Fix the warning by moving blk-mq's cpuhp callback registering out of
q->sysfs_lock. Add one dedicated global lock for covering registering &
unregistering hctx's cpuhp, and it is safe to do so because hctx is
guaranteed to be live if our request_queue is live.

[1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/

Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Newman <peternewman@google.com>
Cc: Babu Moger <babu.moger@amd.com>
Reported-by: Luck Tony <tony.luck@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoblk-mq: register cpuhp callback after hctx is added to xarray table
Ming Lei [Fri, 6 Dec 2024 11:16:06 +0000 (19:16 +0800)]
blk-mq: register cpuhp callback after hctx is added to xarray table

We need to retrieve 'hctx' from xarray table in the cpuhp callback, so the
callback should be registered after this 'hctx' is added to xarray table.

Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Newman <peternewman@google.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Luck Tony <tony.luck@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20241206111611.978870-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agosmb: client: fix potential race in cifs_put_tcon()
Paulo Alcantara [Fri, 6 Dec 2024 14:49:07 +0000 (11:49 -0300)]
smb: client: fix potential race in cifs_put_tcon()

dfs_cache_refresh() delayed worker could race with cifs_put_tcon(), so
make sure to call list_replace_init() on @tcon->dfs_ses_list after
kworker is cancelled or finished.

Fixes: 4f42a8b54b5c ("smb: client: fix DFS interlink failover")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agosmb3.1.1: fix posix mounts to older servers
Steve French [Wed, 4 Dec 2024 23:46:00 +0000 (17:46 -0600)]
smb3.1.1: fix posix mounts to older servers

Some servers which implement the SMB3.1.1 POSIX extensions did not
set the file type in the mode in the infolevel 100 response.
With the recent changes for checking the file type via the mode field,
this can cause the root directory to be reported incorrectly and
mounts (e.g. to ksmbd) to fail.

Fixes: 6a832bc8bbb2 ("fs/smb/client: Implement new SMB3 POSIX type")
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Cc: Ralph Boehme <slow@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 months agox86/cacheinfo: Delete global num_cache_leaves
Ricardo Neri [Thu, 28 Nov 2024 00:22:47 +0000 (16:22 -0800)]
x86/cacheinfo: Delete global num_cache_leaves

Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.

init_cache_level() no longer needs to set num_leaves. Also, it never had to
set num_levels as it is unnecessary in x86. Keep checking for zero cache
leaves. Such condition indicates a bug.

  [ bp: Cleanup. ]

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # 6.3+
Link: https://lore.kernel.org/r/20241128002247.26726-3-ricardo.neri-calderon@linux.intel.com
10 months agocacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
Ricardo Neri [Thu, 28 Nov 2024 00:22:46 +0000 (16:22 -0800)]
cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU

Commit

  5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")

adds functionality that architectures can use to optionally allocate and
build cacheinfo early during boot. Commit

  6539cffa9495 ("cacheinfo: Add arch specific early level initializer")

lets secondary CPUs correct (and reallocate memory) cacheinfo data if
needed.

If the early build functionality is not used and cacheinfo does not need
correction, memory for cacheinfo is never allocated. x86 does not use
the early build functionality. Consequently, during the cacheinfo CPU
hotplug callback, last_level_cache_is_valid() attempts to dereference
a NULL pointer:

  BUG: kernel NULL pointer dereference, address: 0000000000000100
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEPMT SMP NOPTI
  CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1
  RIP: 0010: last_level_cache_is_valid+0x95/0xe0a

Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback
if not done earlier.

Moreover, before determining the validity of the last-level cache info,
ensure that it has been allocated. Simply checking for non-zero
cache_leaves() is not sufficient, as some architectures (e.g., Intel
processors) have non-zero cache_leaves() before allocation.

Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
This function iterates over all online CPUs. However, a CPU may have come
online recently, but its cacheinfo may not have been allocated yet.

While here, remove an unnecessary indentation in allocate_cache_info().

  [ bp: Massage. ]

Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org # 6.3+
Link: https://lore.kernel.org/r/20241128002247.26726-2-ricardo.neri-calderon@linux.intel.com
10 months agox86/kexec: Restore GDT on return from ::preserve_context kexec
David Woodhouse [Thu, 5 Dec 2024 15:05:07 +0000 (15:05 +0000)]
x86/kexec: Restore GDT on return from ::preserve_context kexec

The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.

Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.

Test program:

 #include <unistd.h>
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <linux/kexec.h>
 #include <linux/reboot.h>
 #include <sys/reboot.h>
 #include <sys/syscall.h>

 int main (void)
 {
        struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;

segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}

ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
 }

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
10 months agoiio: magnetometer: yas530: use signed integer type for clamp limits
Jakob Hauser [Fri, 29 Nov 2024 21:25:07 +0000 (22:25 +0100)]
iio: magnetometer: yas530: use signed integer type for clamp limits

In the function yas537_measure() there is a clamp_val() with limits of
-BIT(13) and BIT(13) - 1.  The input clamp value h[] is of type s32.  The
BIT() is of type unsigned long integer due to its define in
include/vdso/bits.h.  The lower limit -BIT(13) is recognized as -8192 but
expressed as an unsigned long integer.  The size of an unsigned long
integer differs between 32-bit and 64-bit architectures.  Converting this
to type s32 may lead to undesired behavior.

Additionally, in the calculation lines h[0], h[1] and h[2] the unsigned
long integer divisor BIT(13) causes an unsigned division, shifting the
left-hand side of the equation back and forth, possibly ending up in large
positive values instead of negative values on 32-bit architectures.

To solve those two issues, declare a signed integer with a value of
BIT(13).

There is another omission in the clamp line: clamp_val() returns a value
and it's going nowhere here.  Self-assign it to h[i] to make use of the
clamp macro.

Finally, replace clamp_val() macro by clamp() because after changing the
limits from type unsigned long integer to signed integer it's fine that
way.

Link: https://lkml.kernel.org/r/11609b2243c295d65ab4d47e78c239d61ad6be75.1732914810.git.jahau@rocketmail.com
Fixes: 65f79b501030 ("iio: magnetometer: yas530: Add YAS537 variant")
Signed-off-by: Jakob Hauser <jahau@rocketmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411230458.dhZwh3TT-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202411282222.oF0B4110-lkp@intel.com/
Reviewed-by: David Laight <david.laight@aculab.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agosched/numa: fix memory leak due to the overwritten vma->numab_state
Adrian Huang [Wed, 13 Nov 2024 10:21:46 +0000 (18:21 +0800)]
sched/numa: fix memory leak due to the overwritten vma->numab_state

[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.

  # /opt/ltp/testcases/bin/hackbench 20 thread 1000
  Running with 20*40 (== 800) tasks.

  # dmesg | grep kmemleak
  ...
  kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0xffff888cd8ca2c40 (size 64):
    comm "hackbench", pid 17142, jiffies 4299780315
    hex dump (first 32 bytes):
      ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00  .tI.....L.I.....
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace (crc bff18fd4):
      [<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
      [<ffffffff8113f715>] task_numa_work+0x725/0xa00
      [<ffffffff8110f878>] task_work_run+0x58/0x90
      [<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
      [<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
      [<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  ...

This issue can be consistently reproduced on three different servers:
  * a 448-core server
  * a 256-core server
  * a 192-core server

[Root Cause]
Since multiple threads are created by the hackbench program (along with
the command argument 'thread'), a shared vma might be accessed by two or
more cores simultaneously. When two or more cores observe that
vma->numab_state is NULL at the same time, vma->numab_state will be
overwritten.

Although current code ensures that only one thread scans the VMAs in a
single 'numa_scan_period', there might be a chance for another thread
to enter in the next 'numa_scan_period' while we have not gotten till
numab_state allocation [1].

Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot the reproduce the issue. It is verified with 200+ test runs.

[Solution]
Use the cmpxchg atomic operation to ensure that only one thread executes
the vma->numab_state assignment.

[1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/

Link: https://lkml.kernel.org/r/20241113102146.2384-1-ahuang12@lenovo.com
Fixes: ef6a22b70f6d ("sched/numa: apply the scan delay to every new vma")
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reported-by: Jiwei Sun <sunjw10@lenovo.com>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm/damon: fix order of arguments in damos_before_apply tracepoint
Akinobu Mita [Fri, 15 Nov 2024 18:20:23 +0000 (10:20 -0800)]
mm/damon: fix order of arguments in damos_before_apply tracepoint

Since the order of the scheme_idx and target_idx arguments in TP_ARGS is
reversed, they are stored in the trace record in reverse.

Link: https://lkml.kernel.org/r/20241115182023.43118-1-sj@kernel.org
Link: https://patch.msgid.link/20241112154828.40307-1-akinobu.mita@gmail.com
Fixes: c603c630b509 ("mm/damon/core: add a tracepoint for damos apply target regions")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agolib: stackinit: hide never-taken branch from compiler
Kees Cook [Sun, 17 Nov 2024 11:38:13 +0000 (03:38 -0800)]
lib: stackinit: hide never-taken branch from compiler

The never-taken branch leads to an invalid bounds condition, which is by
design. To avoid the unwanted warning from the compiler, hide the
variable from the optimizer.

../lib/stackinit_kunit.c: In function 'do_nothing_u16_zero':
../lib/stackinit_kunit.c:51:49: error: array subscript 1 is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds=]
   51 | #define DO_NOTHING_RETURN_SCALAR(ptr)           *(ptr)
      |                                                 ^~~~~~
../lib/stackinit_kunit.c:219:24: note: in expansion of macro 'DO_NOTHING_RETURN_SCALAR'
  219 |                 return DO_NOTHING_RETURN_ ## which(ptr + 1);    \
      |                        ^~~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20241117113813.work.735-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
David Hildenbrand [Fri, 29 Nov 2024 12:53:03 +0000 (13:53 +0100)]
mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()

The folio can get freed + buddy-merged + reallocated in the meantime,
resulting in us calling folio_test_locked() possibly on a tail page.

This makes const_folio_flags VM_BUG_ON_PGFLAGS() when stumbling over the
tail page.

Could this result in other issues?  Doesn't look like it.  False positives
and false negatives don't really matter, because this folio would get
skipped either way when detecting that they have been reallocated in the
meantime.

Fix it by performing the folio_test_locked() checked after grabbing a
reference.  If this ever becomes a real problem, we could add a special
helper that racily checks if the bit is set even on tail pages ...  but
let's hope that's not required so we can just handle it cleaner: work on
the folio after we hold a reference.

Do we really need the folio_test_locked() check if we are going to trylock
briefly after?  Well, we can at least avoid a xas_reload().

It's a bit unclear which exact change introduced that issue.  Likely, ever
since we made PG_locked obey to the PF_NO_TAIL policy it could have been
triggered in some way.

Link: https://lkml.kernel.org/r/20241129125303.4033164-1-david@redhat.com
Fixes: 48c935ad88f5 ("page-flags: define PG_locked behavior on compound pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+9f9a7f73fb079b2387a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/674184c9.050a0220.1cc393.0001.GAE@google.com/
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscatterlist: fix incorrect func name in kernel-doc
Randy Dunlap [Sat, 30 Nov 2024 02:24:06 +0000 (18:24 -0800)]
scatterlist: fix incorrect func name in kernel-doc

Fix a kernel-doc warning by making the kernel-doc function description
match the function name:

include/linux/scatterlist.h:323: warning: expecting prototype for sg_unmark_bus_address(). Prototype was for sg_dma_unmark_bus_address() instead

Link: https://lkml.kernel.org/r/20241130022406.537973-1-rdunlap@infradead.org
Fixes: 42399301203e ("lib/scatterlist: add flag for indicating P2PDMA segments in an SGL")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: correct typo in MMAP_STATE() macro
Lorenzo Stoakes [Mon, 18 Nov 2024 17:54:14 +0000 (17:54 +0000)]
mm: correct typo in MMAP_STATE() macro

We mistakenly refer to len rather than len_ here.  The only existing
caller passes len to the len_ parameter so this has no impact on the code,
but it is obviously incorrect to do this, so fix it.

Link: https://lkml.kernel.org/r/20241118175414.390827-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: respect mmap hint address when aligning for THP
Kalesh Singh [Mon, 18 Nov 2024 21:46:48 +0000 (13:46 -0800)]
mm: respect mmap hint address when aligning for THP

Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") updated __get_unmapped_area() to align the start address for
the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.

It does this by effectively looking up a region that is of size,
request_size + PMD_SIZE, and aligning up the start to a PMD boundary.

Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on
32 bit") opted out of this for 32bit due to regressions in mmap base
randomization.

Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous mappings
to PMD-aligned sizes") restricted this to only mmap sizes that are
multiples of the PMD_SIZE due to reported regressions in some performance
benchmarks -- which seemed mostly due to the reduced spatial locality of
related mappings due to the forced PMD-alignment.

Another unintended side effect has emerged: When a user specifies an mmap
hint address, the THP alignment logic modifies the behavior, potentially
ignoring the hint even if a sufficiently large gap exists at the requested
hint location.

Example Scenario:

Consider the following simplified virtual address (VA) space:

    ...

    0x200000-0x400000 --- VMA A
    0x400000-0x600000 --- Hole
    0x600000-0x800000 --- VMA B

    ...

A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:

  - Before THP alignment: The requested region (size 0x200000) fits into
    the gap at 0x400000, so the hint is respected.

  - After alignment: The logic searches for a region of size
    0x400000 (len + PMD_SIZE) starting at 0x400000.
    This search fails due to the mapping at 0x600000 (VMA B), and the hint
    is ignored, falling back to arch_get_unmapped_area[_topdown]().

In general the hint is effectively ignored, if there is any existing
mapping in the below range:

     [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)

This changes the semantics of mmap hint; from ""Respect the hint if a
sufficiently large gap exists at the requested location" to "Respect the
hint only if an additional PMD-sized gap exists beyond the requested
size".

This has performance implications for allocators that allocate their heap
using mmap but try to keep it "as contiguous as possible" by using the end
of the exisiting heap as the address hint.  With the new behavior it's
more likely to get a much less contiguous heap, adding extra fragmentation
and performance overhead.

To restore the expected behavior; don't use
thp_get_unmapped_area_vmflags() when the user provided a hint address, for
anonymous mappings.

Note: As Yang Shi pointed out: the issue still remains for filesystems
which are using thp_get_unmapped_area() for their get_unmapped_area() op.
It is unclear what worklaods will regress for if we ignore THP alignment
when the hint address is provided for such file backed mappings -- so this
fix will be handled separately.

Link: https://lkml.kernel.org/r/20241118214650.3667577-1-kaleshsingh@google.com
Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hans Boehm <hboehm@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: memcg: declare do_memsw_account inline
John Sperbeck [Thu, 28 Nov 2024 20:39:59 +0000 (12:39 -0800)]
mm: memcg: declare do_memsw_account inline

In commit 66d60c428b23 ("mm: memcg: move legacy memcg event code into
memcontrol-v1.c"), the static do_memsw_account() function was moved from a
.c file to a .h file.  Unfortunately, the traditional inline keyword
wasn't added.  If a file (e.g., a unit test) includes the .h file, but
doesn't refer to do_memsw_account(), it will get a warning like:

mm/memcontrol-v1.h:41:13: warning: unused function 'do_memsw_account' [-Wunused-function]
   41 | static bool do_memsw_account(void)
      |             ^~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20241128203959.726527-1-jsperbeck@google.com
Fixes: 66d60c428b23 ("mm: memcg: move legacy memcg event code into memcontrol-v1.c")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm/codetag: swap tags when migrate pages
David Wang [Fri, 29 Nov 2024 02:52:13 +0000 (10:52 +0800)]
mm/codetag: swap tags when migrate pages

Current solution to adjust codetag references during page migration is
done in 3 steps:

1. sets the codetag reference of the old page as empty (not pointing
   to any codetag);

2. subtracts counters of the new page to compensate for its own
   allocation;

3. sets codetag reference of the new page to point to the codetag of
   the old page.

This does not work if CONFIG_MEM_ALLOC_PROFILING_DEBUG=n because
set_codetag_empty() becomes NOOP.  Instead, let's simply swap codetag
references so that the new page is referencing the old codetag and the old
page is referencing the new codetag.  This way accounting stays valid and
the logic makes more sense.

Link: https://lkml.kernel.org/r/20241129025213.34836-1-00107082@163.com
Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()")
Signed-off-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20241124074318.399027-1-00107082@163.com/
Acked-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoocfs2: update seq_file index in ocfs2_dlm_seq_next
Wengang Wang [Tue, 19 Nov 2024 17:45:00 +0000 (09:45 -0800)]
ocfs2: update seq_file index in ocfs2_dlm_seq_next

The following INFO level message was seen:

seq_file: buggy .next function ocfs2_dlm_seq_next [ocfs2] did not
update position index

Fix:
Update *pos (so m->index) to make seq_read_iter happy though the index its
self makes no sense to ocfs2_dlm_seq_next.

Link: https://lkml.kernel.org/r/20241119174500.9198-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agostackdepot: fix stack_depot_save_flags() in NMI context
Marco Elver [Fri, 22 Nov 2024 15:39:47 +0000 (16:39 +0100)]
stackdepot: fix stack_depot_save_flags() in NMI context

Per documentation, stack_depot_save_flags() was meant to be usable from
NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset.  However, it still
would try to take the pool_lock in an attempt to save a stack trace in the
current pool (if space is available).

This could result in deadlock if an NMI is handled while pool_lock is
already held.  To avoid deadlock, only try to take the lock in NMI context
and give up if unsuccessful.

The documentation is fixed to clearly convey this.

Link: https://lkml.kernel.org/r/Z0CcyfbPqmxJ9uJH@elver.google.com
Link: https://lkml.kernel.org/r/20241122154051.3914732-1-elver@google.com
Fixes: 4434a56ec209 ("stackdepot: make fast paths lock-less again")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: open-code page_folio() in dump_page()
Matthew Wilcox (Oracle) [Mon, 25 Nov 2024 20:17:19 +0000 (20:17 +0000)]
mm: open-code page_folio() in dump_page()

page_folio() calls page_fixed_fake_head() which will misidentify this page
as being a fake head and load off the end of 'precise'.  We may have a
pointer to a fake head, but that's OK because it contains the right
information for dump_page().

gcc-15 is smart enough to catch this with -Warray-bounds:

In function 'page_fixed_fake_head',
    inlined from '_compound_head' at ../include/linux/page-flags.h:251:24,
    inlined from '__dump_page' at ../mm/debug.c:123:11:
../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside
+array bounds of 'struct page[1]' [-Warray-bounds=]

Link: https://lkml.kernel.org/r/20241125201721.2963278-2-willy@infradead.org
Fixes: fae7d834c43c ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: open-code PageTail in folio_flags() and const_folio_flags()
Matthew Wilcox (Oracle) [Mon, 25 Nov 2024 20:17:18 +0000 (20:17 +0000)]
mm: open-code PageTail in folio_flags() and const_folio_flags()

It is unsafe to call PageTail() in dump_page() as page_is_fake_head() will
almost certainly return true when called on a head page that is copied to
the stack.  That will cause the VM_BUG_ON_PGFLAGS() in const_folio_flags()
to trigger when it shouldn't.  Fortunately, we don't need to call
PageTail() here; it's fine to have a pointer to a virtual alias of the
page's flag word rather than the real page's flag word.

Link: https://lkml.kernel.org/r/20241125201721.2963278-1-willy@infradead.org
Fixes: fae7d834c43c ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm: fix vrealloc()'s KASAN poisoning logic
Andrii Nakryiko [Tue, 26 Nov 2024 00:52:06 +0000 (16:52 -0800)]
mm: fix vrealloc()'s KASAN poisoning logic

When vrealloc() reuses already allocated vmap_area, we need to re-annotate
poisoned and unpoisoned portions of underlying memory according to the new
size.

This results in a KASAN splat recorded at [1].  A KASAN mis-reporting
issue where there is none.

Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct,
but KASAN flag logic is pretty involved and spread out throughout
__vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and
leaving the rest to mm people to refactor this logic and reuse it here.

Link: https://lkml.kernel.org/r/20241126005206.3457974-1-andrii@kernel.org
Link: https://lore.kernel.org/bpf/67450f9b.050a0220.21d33d.0004.GAE@google.com/
Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoRevert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
Jan Kara [Tue, 26 Nov 2024 14:52:08 +0000 (15:52 +0100)]
Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"

This reverts commit 7c877586da3178974a8a94577b6045a48377ff25.

Anders and Philippe have reported that recent kernels occasionally hang
when used with NFS in readahead code.  The problem has been bisected to
7c877586da3 ("readahead: properly shorten readahead when falling back to
do_page_cache_ra()").  The cause of the problem is that ra->size can be
shrunk by read_pages() call and subsequently we end up calling
do_page_cache_ra() with negative (read huge positive) number of pages.
Let's revert 7c877586da3 for now until we can find a proper way how the
logic in read_pages() and page_cache_ra_order() can coexist.  This can
lead to reduced readahead throughput due to readahead window confusion but
that's better than outright hangs.

Link: https://lkml.kernel.org/r/20241126145208.985-1-jack@suse.cz
Fixes: 7c877586da31 ("readahead: properly shorten readahead when falling back to do_page_cache_ra()")
Reported-by: Anders Blomdell <anders.blomdell@gmail.com>
Reported-by: Philippe Troin <phil@fifi.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Philippe Troin <phil@fifi.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoselftests/damon: add _damon_sysfs.py to TEST_FILES
Maximilian Heyne [Wed, 27 Nov 2024 12:08:53 +0000 (12:08 +0000)]
selftests/damon: add _damon_sysfs.py to TEST_FILES

When running selftests I encountered the following error message with
some damon tests:

 # Traceback (most recent call last):
 #   File "[...]/damon/./damos_quota.py", line 7, in <module>
 #     import _damon_sysfs
 # ModuleNotFoundError: No module named '_damon_sysfs'

Fix this by adding the _damon_sysfs.py file to TEST_FILES so that it
will be available when running the respective damon selftests.

Link: https://lkml.kernel.org/r/20241127-picks-visitor-7416685b-mheyne@amazon.de
Fixes: 306abb63a8ca ("selftests/damon: implement a python module for test-purpose DAMON sysfs controls")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoselftest: hugetlb_dio: fix test naming
Mark Brown [Wed, 27 Nov 2024 16:14:22 +0000 (16:14 +0000)]
selftest: hugetlb_dio: fix test naming

The string logged when a test passes or fails is used by the selftest
framework to identify which test is being reported.  The hugetlb_dio test
not only uses the same strings for every test that is run but it also uses
different strings for test passes and failures which means that test
automation is unable to follow what the test is doing at all.

Pull the existing duplicated logging of the number of free huge pages
before and after the test out of the conditional and replace that and the
logging of the result with a single ksft_print_result() which incorporates
the parameters passed into the test into the output.

Link: https://lkml.kernel.org/r/20241127-kselftest-mm-hugetlb-dio-names-v1-1-22aab01bf550@kernel.org
Fixes: fae1980347bf ("selftests: hugetlb_dio: fixup check for initial conditions to skip in the start")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoocfs2: free inode when ocfs2_get_init_inode() fails
Tetsuo Handa [Sat, 23 Nov 2024 13:28:34 +0000 (22:28 +0900)]
ocfs2: free inode when ocfs2_get_init_inode() fails

syzbot is reporting busy inodes after unmount, for commit 9c89fe0af826
("ocfs2: Handle error from dquot_initialize()") forgot to call iput() when
new_inode() succeeded and dquot_initialize() failed.

Link: https://lkml.kernel.org/r/e68c0224-b7c6-4784-b4fa-a9fc8c675525@I-love.SAKURA.ne.jp
Fixes: 9c89fe0af826 ("ocfs2: Handle error from dquot_initialize()")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0af00f6a2cba2058b5db
Tested-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agonilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
Ryusuke Konishi [Tue, 19 Nov 2024 17:23:37 +0000 (02:23 +0900)]
nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()

Syzbot reported that when searching for records in a directory where the
inode's i_size is corrupted and has a large value, memory access outside
the folio/page range may occur, or a use-after-free bug may be detected if
KASAN is enabled.

This is because nilfs_last_byte(), which is called by nilfs_find_entry()
and others to calculate the number of valid bytes of directory data in a
page from i_size and the page index, loses the upper 32 bits of the 64-bit
size information due to an inappropriate type of local variable to which
the i_size value is assigned.

This caused a large byte offset value due to underflow in the end address
calculation in the calling nilfs_find_entry(), resulting in memory access
that exceeds the folio/page size.

Fix this issue by changing the type of the local variable causing the bit
loss from "unsigned int" to "u64".  The return value of nilfs_last_byte()
is also of type "unsigned int", but it is truncated so as not to exceed
PAGE_SIZE and no bit loss occurs, so no change is required.

Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624
Tested-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agokasan: make report_lock a raw spinlock
Jared Kangas [Tue, 19 Nov 2024 21:02:34 +0000 (13:02 -0800)]
kasan: make report_lock a raw spinlock

If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled.  However, KASAN reports may be triggered
in such contexts.  For example:

        char *s = kzalloc(1, GFP_KERNEL);
        kfree(s);
        local_irq_disable();
        char c = *s;  /* KASAN report here leads to spin_lock() */
        local_irq_enable();

Make report_spinlock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.

Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
Signed-off-by: Jared Kangas <jkangas@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
David Hildenbrand [Wed, 20 Nov 2024 20:11:51 +0000 (21:11 +0100)]
mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM

We currently assume that there is at least one VMA in a MM, which isn't
true.

So we might end up having find_vma() return NULL, to then de-reference
NULL.  So properly handle find_vma() returning NULL.

This fixes the report:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS:  00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
 __do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
 __se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
 __x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[akpm@linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agomm/gup: handle NULL pages in unpin_user_pages()
John Hubbard [Thu, 21 Nov 2024 03:49:33 +0000 (19:49 -0800)]
mm/gup: handle NULL pages in unpin_user_pages()

The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array.  That's not the case, as I discovered when I ran on a new
configuration on my test machine.

Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.

Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:

    tools/testing/selftests/mm/gup_longterm

...I get the following crash:

BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
 <TASK>
 ? __die_body+0x66/0xb0
 ? page_fault_oops+0x30c/0x3b0
 ? do_user_addr_fault+0x6c3/0x720
 ? irqentry_enter+0x34/0x60
 ? exc_page_fault+0x68/0x100
 ? asm_exc_page_fault+0x22/0x30
 ? sanity_check_pinned_pages+0x3a/0x2d0
 unpin_user_pages+0x24/0xe0
 check_and_migrate_movable_pages_or_folios+0x455/0x4b0
 __gup_longterm_locked+0x3bf/0x820
 ? mmap_read_lock_killable+0x12/0x50
 ? __pfx_mmap_read_lock_killable+0x10/0x10
 pin_user_pages+0x66/0xa0
 gup_test_ioctl+0x358/0xb20
 __se_sys_ioctl+0x6b/0xc0
 do_syscall_64+0x7b/0x150
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://lkml.kernel.org/r/20241121034933.77502-1-jhubbard@nvidia.com
Fixes: 94efde1d1539 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agofs/proc/vmcore.c: fix warning when CONFIG_MMU=n
Andrew Morton [Thu, 14 Nov 2024 23:44:21 +0000 (15:44 -0800)]
fs/proc/vmcore.c: fix warning when CONFIG_MMU=n

>> fs/proc/vmcore.c:424:19: warning: 'mmap_vmcore_fault' defined but not used [-Wunused-function]
     424 | static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf)

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411140156.2o0nS4fl-lkp@intel.com/
Cc: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agolibbpf: Fix segfault due to libelf functions not setting errno
Quentin Monnet [Thu, 5 Dec 2024 13:59:42 +0000 (13:59 +0000)]
libbpf: Fix segfault due to libelf functions not setting errno

Libelf functions do not set errno on failure. Instead, it relies on its
internal _elf_errno value, that can be retrieved via elf_errno (or the
corresponding message via elf_errmsg()). From "man libelf":

    If a libelf function encounters an error it will set an internal
    error code that can be retrieved with elf_errno. Each thread
    maintains its own separate error code. The meaning of each error
    code can be determined with elf_errmsg, which returns a string
    describing the error.

As a consequence, libbpf should not return -errno when a function from
libelf fails, because an empty value will not be interpreted as an error
and won't prevent the program to stop. This is visible in
bpf_linker__add_file(), for example, where we call a succession of
functions that rely on libelf:

    err = err ?: linker_load_obj_file(linker, filename, opts, &obj);
    err = err ?: linker_append_sec_data(linker, &obj);
    err = err ?: linker_append_elf_syms(linker, &obj);
    err = err ?: linker_append_elf_relos(linker, &obj);
    err = err ?: linker_append_btf(linker, &obj);
    err = err ?: linker_append_btf_ext(linker, &obj);

If the object file that we try to process is not, in fact, a correct
object file, linker_load_obj_file() may fail with errno not being set,
and return 0. In this case we attempt to run linker_append_elf_sysms()
and may segfault.

This can happen (and was discovered) with bpftool:

    $ bpftool gen object output.o sample_ret0.bpf.c
    libbpf: failed to get ELF header for sample_ret0.bpf.c: invalid `Elf' handle
    zsh: segmentation fault (core dumped)  bpftool gen object output.o sample_ret0.bpf.c

Fix the issue by returning a non-null error code (-EINVAL) when libelf
functions fail.

Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs")
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241205135942.65262-1-qmo@kernel.org
10 months agoMerge tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Thu, 5 Dec 2024 23:11:39 +0000 (15:11 -0800)]
Merge tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit build problem workaround from Paul Moore:
 "A minor audit patch that shuffles some code slightly to workaround a
  GCC bug affecting a number of people.

  The GCC folks have been able to reproduce the problem and are
  discussing solutions (see the bug report link in the commit), but
  since the workaround is trivial let's do that in the kernel so we can
  unblock people who are hitting this"

* tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: workaround a GCC bug triggered by task comm changes

10 months agoMerge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg...
Linus Torvalds [Thu, 5 Dec 2024 23:02:20 +0000 (15:02 -0800)]
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:
 "One bug fix and some documentation updates:

   - Correct typos in comments

   - Elaborate a comment about how the uAPI works for
     IOMMU_HW_INFO_TYPE_ARM_SMMUV3

   - Fix a double free on error path and add test coverage for the bug"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3
  iommufd/selftest: Cover IOMMU_FAULT_QUEUE_ALLOC in iommufd_fail_nth
  iommufd: Fix out_fput in iommufd_fault_alloc()
  iommufd: Fix typos in kernel-doc comments

10 months agoMerge tag 'drm-misc-fixes-2024-12-05' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 5 Dec 2024 22:40:46 +0000 (08:40 +1000)]
Merge tag 'drm-misc-fixes-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes v6.13-rc2:
- v3d performance counter fix.
- A lot of DP-MST related fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2ce1650d-801f-4265-a876-5a8743f1c82b@linux.intel.com
10 months agoMerge tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 5 Dec 2024 22:38:49 +0000 (14:38 -0800)]
Merge tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Three fixes for potential out of bound accesses in read and write
   paths (e.g. when alternate data streams enabled)

 - GCC 15 build fix

* tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: align aux_payload_buf to avoid OOB reads in cryptographic operations
  ksmbd: fix Out-of-Bounds Write in ksmbd_vfs_stream_write
  ksmbd: fix Out-of-Bounds Read in ksmbd_vfs_stream_read
  smb: server: Fix building with GCC 15

10 months agoMerge tag 'drm-xe-fixes-2024-12-04' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Thu, 5 Dec 2024 21:53:59 +0000 (07:53 +1000)]
Merge tag 'drm-xe-fixes-2024-12-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Missing init value and 64-bit write-order check (Zhanjung)
- Fix a memory allocation issue causing lockdep violation (John)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z1BidZBFQOLjz__J@fedora
10 months agosamples/bpf: Pass TPROGS_USER_CFLAGS to libbpf makefile
Eduard Zingerman [Wed, 4 Dec 2024 17:34:16 +0000 (09:34 -0800)]
samples/bpf: Pass TPROGS_USER_CFLAGS to libbpf makefile

Before commit [1], the value of a variable TPROGS_USER_CFLAGS was
passed to libbpf make command as a part of EXTRA_CFLAGS.
This commit makes sure that the value of TPROGS_USER_CFLAGS is still
passed to libbpf make command, in order to maintain backwards build
scripts compatibility.

[1] commit 5a6ea7022ff4 ("samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS")

Fixes: 5a6ea7022ff4 ("samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS")
Suggested-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/bpf/20241204173416.142240-1-eddyz87@gmail.com
10 months agoMerge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 5 Dec 2024 18:25:06 +0000 (10:25 -0800)]
Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can and netfilter.

  Current release - regressions:

   - rtnetlink: fix double call of rtnl_link_get_net_ifla()

   - tcp: populate XPS related fields of timewait sockets

   - ethtool: fix access to uninitialized fields in set RXNFC command

   - selinux: use sk_to_full_sk() in selinux_ip_output()

  Current release - new code bugs:

   - net: make napi_hash_lock irq safe

   - eth:
      - bnxt_en: support header page pool in queue API
      - ice: fix NULL pointer dereference in switchdev

  Previous releases - regressions:

   - core: fix icmp host relookup triggering ip_rt_bug

   - ipv6:
      - avoid possible NULL deref in modify_prefix_route()
      - release expired exception dst cached in socket

   - smc: fix LGR and link use-after-free issue

   - hsr: avoid potential out-of-bound access in fill_frame_info()

   - can: hi311x: fix potential use-after-free

   - eth: ice: fix VLAN pruning in switchdev mode

  Previous releases - always broken:

   - netfilter:
      - ipset: hold module reference while requesting a module
      - nft_inner: incorrect percpu area handling under softirq

   - can: j1939: fix skb reference counting

   - eth:
      - mlxsw: use correct key block on Spectrum-4
      - mlx5: fix memory leak in mlx5hws_definer_calc_layout"

* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
  net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
  net: avoid potential UAF in default_operstate()
  vsock/test: verify socket options after setting them
  vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
  vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
  net/mlx5e: Remove workaround to avoid syndrome for internal port
  net/mlx5e: SD, Use correct mdev to build channel param
  net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
  net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
  net/mlx5: HWS: Properly set bwc queue locks lock classes
  net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
  bnxt_en: handle tpa_info in queue API implementation
  bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
  bnxt_en: refactor tpa_info alloc/free into helpers
  geneve: do not assume mac header is set in geneve_xmit_skb()
  mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
  ethtool: Fix wrong mod state in case of verbose and no_mask bitset
  ipmr: tune the ipmr_can_free_table() checks.
  netfilter: nft_set_hash: skip duplicated elements pending gc run
  netfilter: ipset: Hold module reference while requesting a module
  ...

10 months agoMerge tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 5 Dec 2024 18:17:55 +0000 (10:17 -0800)]
Merge tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix trace histogram sort function cmp_entries_dup()

   The sort function cmp_entries_dup() returns either 1 or 0, and not -1
   if parameter "a" is less than "b" by memcmp().

 - Fix archs that call trace_hardirqs_off() without RCU watching

   Both x86 and arm64 no longer call any tracepoints with RCU not
   watching. It was assumed that it was safe to get rid of
   trace_*_rcuidle() version of the tracepoint calls. This was needed to
   get rid of the SRCU protection and be able to implement features like
   faultable traceponits and add rust tracepoints.

   Unfortunately, there were a few architectures that still relied on
   that logic. There's only one file that has tracepoints that are
   called without RCU watching. Add macro logic around the tracepoints
   for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined
   will check if the code is in the idle path (the only place RCU isn't
   watching), and enable RCU around calling the tracepoint, but only do
   it if the tracepoint is enabled.

* tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix archs that still call tracepoints without RCU watching
  tracing: Fix cmp_entries_dup() to respect sort() comparison rules

10 months agoMerge tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 5 Dec 2024 18:06:47 +0000 (10:06 -0800)]
Merge tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Benjamin Tissoires:

 - regression fix in suspend/resume for i2c-hid (Kenny Levinsen)

 - fix wacom driver assuming a name can not be null (WangYuli)

 - a couple of constify changes/fixes (Thomas Weißschuh)

 - a couple of selftests/hid fixes (Maximilian Heyne & Benjamin
   Tissoires)

* tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  selftests/hid: fix kfunc inclusions with newer bpftool
  HID: bpf: drop unneeded casts discarding const
  HID: bpf: constify hid_ops
  selftests: hid: fix typo and exit code
  HID: wacom: fix when get product name maybe null pointer
  HID: i2c-hid: Revert to using power commands to wake on resume

10 months agoarm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
Mark Rutland [Thu, 5 Dec 2024 12:16:55 +0000 (12:16 +0000)]
arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS

Currently gcs_set() doesn't initialize the temporary 'user_gcs'
variable, and a SETREGSET call with a length of 0, 8, or 16 will leave
some portion of this uninitialized. Consequently some arbitrary
uninitialized values may be written back to the relevant fields in task
struct, potentially leaking up to 192 bits of memory from the kernel
stack. The read is limited to a specific slot on the stack, and the
issue does not provide a write mechanism.

As gcs_set() rejects cases where user_gcs::features_enabled has bits set
other than PR_SHADOW_STACK_SUPPORTED_STATUS_MASK, a SETREGSET call with
a length of zero will randomly succeed or fail depending on the value of
the uninitialized value, it isn't possible to leak the full 192 bits.
With a length of 8 or 16, user_gcs::features_enabled can be initialized
to an accepted value, making it practical to leak 128 or 64 bits.

Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length or partial write, the
existing contents of the fields which are not written to will be
retained.

To ensure that the extraction and insertion of fields is consistent
across the GETREGSET and SETREGSET calls, new task_gcs_to_user() and
task_gcs_from_user() helpers are added, matching the style of
pac_address_keys_to_user() and pac_address_keys_from_user().

Before this patch:

| # ./gcs-test
| Attempting to write NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x0000000000000000,
|     .gcspr_el0        = 0x900d900d900d900d,
| }
| SETREGSET(nt=0x410, len=24) wrote 24 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x0000000000000000,
|     .gcspr_el0        = 0x900d900d900d900d,
| }
|
| Attempting partial write NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x1de7ec7edbadc0de,
|     .gcspr_el0        = 0x1de7ec7edbadc0de,
| }
| SETREGSET(nt=0x410, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x000000000093e780,
|     .gcspr_el0        = 0xffff800083a63d50,
| }

After this patch:

| # ./gcs-test
| Attempting to write NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x0000000000000000,
|     .gcspr_el0        = 0x900d900d900d900d,
| }
| SETREGSET(nt=0x410, len=24) wrote 24 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x0000000000000000,
|     .gcspr_el0        = 0x900d900d900d900d,
| }
|
| Attempting partial write NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x1de7ec7edbadc0de,
|     .gcspr_el0        = 0x1de7ec7edbadc0de,
| }
| SETREGSET(nt=0x410, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_GCS::user_gcs
| GETREGSET(nt=0x410, len=24) read 24 bytes
| Read NT_ARM_GCS::user_gcs = {
|     .features_enabled = 0x0000000000000000,
|     .features_locked  = 0x0000000000000000,
|     .gcspr_el0        = 0x900d900d900d900d,
| }

Fixes: 7ec3b57cb29f ("arm64/ptrace: Expose GCS via ptrace and core files")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agoarm64: ptrace: fix partial SETREGSET for NT_ARM_POE
Mark Rutland [Thu, 5 Dec 2024 12:16:54 +0000 (12:16 +0000)]
arm64: ptrace: fix partial SETREGSET for NT_ARM_POE

Currently poe_set() doesn't initialize the temporary 'ctrl' variable,
and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently an arbitrary value will be written back to
target->thread.por_el0, potentially leaking up to 64 bits of memory from
the kernel stack. The read is limited to a specific slot on the stack,
and the issue does not provide a write mechanism.

Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
contents of POR_EL1 will be retained.

Before this patch:

| # ./poe-test
| Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
| SETREGSET(nt=0x40f, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
|
| Attempting to write NT_ARM_POE (zero length)
| SETREGSET(nt=0x40f, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0xffff8000839c3d50

After this patch:

| # ./poe-test
| Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d
| SETREGSET(nt=0x40f, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d
|
| Attempting to write NT_ARM_POE (zero length)
| SETREGSET(nt=0x40f, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_POE::por_el0
| GETREGSET(nt=0x40f, len=8) read 8 bytes
| Read NT_ARM_POE::por_el0 = 0x900d900d900d900d

Fixes: 175198199262 ("arm64/ptrace: add support for FEAT_POE")
Cc: <stable@vger.kernel.org> # 6.12.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agoarm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
Mark Rutland [Thu, 5 Dec 2024 12:16:53 +0000 (12:16 +0000)]
arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR

Currently fpmr_set() doesn't initialize the temporary 'fpmr' variable,
and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently an arbitrary value will be written back to
target->thread.uw.fpmr, potentially leaking up to 64 bits of memory from
the kernel stack. The read is limited to a specific slot on the stack,
and the issue does not provide a write mechanism.

Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
contents of FPMR will be retained.

Before this patch:

| # ./fpmr-test
| Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
| SETREGSET(nt=0x40e, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
|
| Attempting to write NT_ARM_FPMR (zero length)
| SETREGSET(nt=0x40e, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0xffff800083963d50

After this patch:

| # ./fpmr-test
| Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d
| SETREGSET(nt=0x40e, len=8) wrote 8 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d
|
| Attempting to write NT_ARM_FPMR (zero length)
| SETREGSET(nt=0x40e, len=0) wrote 0 bytes
|
| Attempting to read NT_ARM_FPMR::fpmr
| GETREGSET(nt=0x40e, len=8) read 8 bytes
| Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d

Fixes: 4035c22ef7d4 ("arm64/ptrace: Expose FPMR via ptrace")
Cc: <stable@vger.kernel.org> # 6.9.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agoMerge tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Thu, 5 Dec 2024 18:03:43 +0000 (10:03 -0800)]
Merge tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - Add support for exynosautov920 SoC

 - Add support for Airoha EN7851 watchdog

 - Add support for MT6735 TOPRGU/WDT

 - Delete the cpu5wdt driver

 - Always print when registering watchdog fails

 - Several other small fixes and improvements

* tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits)
  watchdog: rti: of: honor timeout-sec property
  watchdog: s3c2410_wdt: add support for exynosautov920 SoC
  dt-bindings: watchdog: Document ExynosAutoV920 watchdog bindings
  watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
  watchdog: mediatek: Make sure system reset gets asserted in mtk_wdt_restart()
  dt-bindings: watchdog: fsl-imx-wdt: Add missing 'big-endian' property
  dt-bindings: watchdog: Document Qualcomm QCS8300
  docs: ABI: Fix spelling mistake in pretimeout_avaialable_governors
  Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs"
  watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler
  watchdog: Switch back to struct platform_driver::remove()
  watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
  watchdog: da9063: Remove __maybe_unused notations
  watchdog: da9063: Do not use a global variable
  watchdog: Delete the cpu5wdt driver
  watchdog: Add support for Airoha EN7851 watchdog
  dt-bindings: watchdog: airoha: document watchdog for Airoha EN7581
  watchdog: sl28cpld_wdt: don't print out if registering watchdog fails
  watchdog: rza_wdt: don't print out if registering watchdog fails
  watchdog: rti_wdt: don't print out if registering watchdog fails
  ...

10 months agoarm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
Mark Rutland [Thu, 5 Dec 2024 12:16:52 +0000 (12:16 +0000)]
arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL

Currently tagged_addr_ctrl_set() doesn't initialize the temporary 'ctrl'
variable, and a SETREGSET call with a length of zero will leave this
uninitialized. Consequently tagged_addr_ctrl_set() will consume an
arbitrary value, potentially leaking up to 64 bits of memory from the
kernel stack. The read is limited to a specific slot on the stack, and
the issue does not provide a write mechanism.

As set_tagged_addr_ctrl() only accepts values where bits [63:4] zero and
rejects other values, a partial SETREGSET attempt will randomly succeed
or fail depending on the value of the uninitialized value, and the
exposure is significantly limited.

Fix this by initializing the temporary value before copying the regset
from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG,
NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing
value of the tagged address ctrl will be retained.

The NT_ARM_TAGGED_ADDR_CTRL regset is only visible in the
user_aarch64_view used by a native AArch64 task to manipulate another
native AArch64 task. As get_tagged_addr_ctrl() only returns an error
value when called for a compat task, tagged_addr_ctrl_get() and
tagged_addr_ctrl_set() should never observe an error value from
get_tagged_addr_ctrl(). Add a WARN_ON_ONCE() to both to indicate that
such an error would be unexpected, and error handlnig is not missing in
either case.

Fixes: 2200aa7154cb ("arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241205121655.1824269-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agodrm/v3d: Enable Performance Counters before clearing them
Maíra Canal [Wed, 4 Dec 2024 12:28:31 +0000 (09:28 -0300)]
drm/v3d: Enable Performance Counters before clearing them

On the Raspberry Pi 5, performance counters are not being cleared
when `v3d_perfmon_start()` is called, even though we write to the
CLR register. As a result, their values accumulate until they
overflow.

The expected behavior is for performance counters to reset to zero
at the start of a job. When the job finishes and the perfmon is
stopped, the counters should accurately reflect the values for that
specific job.

To ensure this behavior, the performance counters are now enabled
before being cleared. This allows the CLR register to function as
intended, zeroing the counter values when the job begins.

Fixes: 26a4dc29b74a ("drm/v3d: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241204122831.17015-1-mcanal@igalia.com
10 months agoarm64: cpufeature: Add GCS to cpucap_is_possible()
Robin Murphy [Thu, 5 Dec 2024 13:48:10 +0000 (13:48 +0000)]
arm64: cpufeature: Add GCS to cpucap_is_possible()

Since system_supports_gcs() ends up referring to cpucap_is_possible(),
teach the latter about GCS for consistency with similar features.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/416c7369fcdce4ebb2a8f12daae234507be27e38.1733406275.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agoMerge tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme into block-6.13
Jens Axboe [Thu, 5 Dec 2024 17:14:36 +0000 (10:14 -0700)]
Merge tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme into block-6.13

Pull NVMe fixess from Keith:

"nvme fixes for Linux 6.13

 - Target fix using incorrect zero buffer (Nilay)
 - Device specifc deallocate quirk fixes (Christoph, Keith)
 - Fabrics fix for handling max command target bugs (Maurizio)
 - Cocci fix usage for kzalloc (Yu-Chen)
 - DMA size fix for host memory buffer feature (Christoph)
 - Fabrics queue cleanup fixes (Chunguang)"

* tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme:
  nvme-tcp: simplify nvme_tcp_teardown_io_queues()
  nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  nvme-rdma: unquiesce admin_q before destroy it
  nvme-tcp: fix the memleak while create new ctrl failed
  nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
  nvmet: replace kmalloc + memset with kzalloc for data allocation
  nvme-fabrics: handle zero MAXCMD without closing the connection
  nvme-pci: remove two deallocate zeroes quirks
  nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
  nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()

10 months agoMerge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Thu, 5 Dec 2024 17:09:29 +0000 (18:09 +0100)]
Merge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.13

A few small fixes for v6.13, all system specific - the biggest thing is
the fix for jack handling over suspend on some Intel laptops.

10 months agovirtio-blk: don't keep queue frozen during system suspend
Ming Lei [Tue, 12 Nov 2024 12:58:21 +0000 (20:58 +0800)]
virtio-blk: don't keep queue frozen during system suspend

Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.

block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().

[1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/

Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.

Cc: Yi Sun <yi.sun@unisoc.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux.dev
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
10 months agoclocksource: Make negative motion detection more robust
Thomas Gleixner [Tue, 3 Dec 2024 10:16:30 +0000 (11:16 +0100)]
clocksource: Make negative motion detection more robust

Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.

It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.

  max_idle_ns:                    597268854 ns
  negative motion tripping point: 671088640 ns

If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.

Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.

For the case at hand this results in a tripping point of 1174405120ns.

Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.

Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.

Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.net
10 months agococo: virt: arm64: Do not enable cca guest driver by default
Suzuki K Poulose [Thu, 5 Dec 2024 14:36:34 +0000 (14:36 +0000)]
coco: virt: arm64: Do not enable cca guest driver by default

As per the guidelines, new drivers may not be set to default on.
An expert user can always select it.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Sami Mujawar <sami.mujawar@arm.com>
Link: https://lore.kernel.org/r/6750c695194cd_2508129427@dwillia2-xfh.jf.intel.com.notmuch
Link: https://lore.kernel.org/r/20241205143634.306114-1-suzuki.poulose@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 months agodrm/dp_mst: Use reset_msg_rx_state() instead of open coding it
Imre Deak [Tue, 3 Dec 2024 16:02:23 +0000 (18:02 +0200)]
drm/dp_mst: Use reset_msg_rx_state() instead of open coding it

Use reset_msg_rx_state() in drm_dp_mst_handle_up_req() instead of
open-coding it.

Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-8-imre.deak@intel.com
10 months agotracing: Fix archs that still call tracepoints without RCU watching
Steven Rostedt [Wed, 4 Dec 2024 15:04:14 +0000 (10:04 -0500)]
tracing: Fix archs that still call tracepoints without RCU watching

Tracepoints require having RCU "watching" as it uses RCU to do updates to
the tracepoints. There are some cases that would call a tracepoint when
RCU was not "watching". This was usually in the idle path where RCU has
"shutdown". For the few locations that had tracepoints without RCU
watching, there was an trace_*_rcuidle() variant that could be used. This
used SRCU for protection.

There are tracepoints that trace when interrupts and preemption are
enabled and disabled. In some architectures, these tracepoints are called
in a path where RCU is not watching. When x86 and arm64 removed these
locations, it was incorrectly assumed that it would be safe to remove the
trace_*_rcuidle() variant and also remove the SRCU logic, as it made the
code more complex and harder to implement new tracepoint features (like
faultable tracepoints and tracepoints in rust).

Instead of bringing back the trace_*_rcuidle(), as it will not be trivial
to do as new code has already been added depending on its removal, add a
workaround to the one file that still requires it (trace_preemptirq.c). If
the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if
the code is in the idle path, and if so, call ct_irq_enter/exit() which
will enable RCU around the tracepoint.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20241204100414.4d3e06d0@gandalf.local.home
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()")
Closes: https://lore.kernel.org/all/bddb02de-957a-4df5-8e77-829f55728ea2@roeck-us.net/
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
10 months agodrm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
Imre Deak [Tue, 3 Dec 2024 16:02:22 +0000 (18:02 +0200)]
drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()

After an out-of-memory error the reception state should be reset, so
that the next attempt receiving a message doesn't fail (due to getting a
start-of-message packet, while the reception state has already the
start-of-message flag set).

Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-7-imre.deak@intel.com
10 months agodrm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
Imre Deak [Wed, 4 Dec 2024 13:20:07 +0000 (15:20 +0200)]
drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()

While receiving an MST up request message from one thread in
drm_dp_mst_handle_up_req(), the MST topology could be removed from
another thread via drm_dp_mst_topology_mgr_set_mst(false), freeing
mst_primary and setting drm_dp_mst_topology_mgr::mst_primary to NULL.
This could lead to a NULL deref/use-after-free of mst_primary in
drm_dp_mst_handle_up_req().

Avoid the above by holding a reference for mst_primary in
drm_dp_mst_handle_up_req() while it's used.

v2: Fix kfreeing the request if getting an mst_primary reference fails.

Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241204132007.3132494-1-imre.deak@intel.com
10 months agodrm/dp_mst: Fix down request message timeout handling
Imre Deak [Tue, 3 Dec 2024 17:46:32 +0000 (19:46 +0200)]
drm/dp_mst: Fix down request message timeout handling

If receiving a reply for an MST down request message times out, the
thread receiving the reply in drm_dp_mst_handle_down_rep() could try to
dereference the drm_dp_sideband_msg_tx txmsg request message after the
thread waiting for the reply - calling drm_dp_mst_wait_tx_reply() - has
timed out and freed txmsg, hence leading to a use-after-free in
drm_dp_mst_handle_down_rep().

Prevent the above by holding the drm_dp_mst_topology_mgr::qlock in
drm_dp_mst_handle_down_rep() for the whole duration txmsg is looked up
from the request list and dereferenced.

v2: Fix unlocking mgr->qlock after verify_rx_request_type() fails.

Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203174632.2941402-1-imre.deak@intel.com
10 months agodrm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()
Imre Deak [Tue, 3 Dec 2024 16:02:19 +0000 (18:02 +0200)]
drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()

Simplify the error return path in drm_dp_mst_handle_down_rep(),
preparing for the next patch.

While at it use reset_msg_rx_state() instead of open-coding it.

Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-4-imre.deak@intel.com
10 months agodrm/dp_mst: Verify request type in the corresponding down message reply
Imre Deak [Tue, 3 Dec 2024 16:02:18 +0000 (18:02 +0200)]
drm/dp_mst: Verify request type in the corresponding down message reply

After receiving the response for an MST down request message, the
response should be accepted/parsed only if the response type matches
that of the request. Ensure this by checking if the request type code
stored both in the request and the reply match, dropping the reply in
case of a mismatch.

This fixes the topology detection for an MST hub, as described in the
Closes link below, where the hub sends an incorrect reply message after
a CLEAR_PAYLOAD_TABLE -> LINK_ADDRESS down request message sequence.

Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12804
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-3-imre.deak@intel.com
10 months agodrm/dp_mst: Fix resetting msg rx state after topology removal
Imre Deak [Tue, 3 Dec 2024 16:02:17 +0000 (18:02 +0200)]
drm/dp_mst: Fix resetting msg rx state after topology removal

If the MST topology is removed during the reception of an MST down reply
or MST up request sideband message, the
drm_dp_mst_topology_mgr::up_req_recv/down_rep_recv states could be reset
from one thread via drm_dp_mst_topology_mgr_set_mst(false), racing with
the reading/parsing of the message from another thread via
drm_dp_mst_handle_down_rep() or drm_dp_mst_handle_up_req(). The race is
possible since the reader/parser doesn't hold any lock while accessing
the reception state. This in turn can lead to a memory corruption in the
reader/parser as described by commit bd2fccac61b4 ("drm/dp_mst: Fix MST
sideband message body length check").

Fix the above by resetting the message reception state if needed before
reading/parsing a message. Another solution would be to hold the
drm_dp_mst_topology_mgr::lock for the whole duration of the message
reception/parsing in drm_dp_mst_handle_down_rep() and
drm_dp_mst_handle_up_req(), however this would require a bigger change.
Since the fix is also needed for stable, opting for the simpler solution
in this patch.

Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 1d082618bbf3 ("drm/display/dp_mst: Fix down/up message handling after sink disconnect")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13056
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203160223.2926014-2-imre.deak@intel.com