Helge Deller [Fri, 25 Aug 2023 19:50:33 +0000 (21:50 +0200)]
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.
This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less
tested.
A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
Linus Torvalds [Fri, 25 Aug 2023 18:44:43 +0000 (11:44 -0700)]
Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
issues or aren't considered suitable for a -stable backport"
* tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
shmem: fix smaps BUG sleeping while atomic
selftests: cachestat: catch failing fsync test on tmpfs
selftests: cachestat: test for cachestat availability
maple_tree: disable mas_wr_append() when other readers are possible
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
mm: multi-gen LRU: don't spin during memcg release
mm: memory-failure: fix unexpected return value in soft_offline_page()
radix tree: remove unused variable
mm: add a call to flush_cache_vmap() in vmap_pfn()
selftests/mm: FOLL_LONGTERM need to be updated to 0x100
nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
selftests: cgroup: fix test_kmem_basic less than error
mm: enable page walking API to lock vmas during the walk
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Linus Torvalds [Fri, 25 Aug 2023 16:29:47 +0000 (09:29 -0700)]
Merge tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This is obviously not ideal, particularly for something this late in
the cycle.
Unfortunately we found some uABI issues in the vector support while
reviewing the GDB port, which has triggered a revert -- probably a
good sign we should have reviewed GDB before merging this, I guess I
just dropped the ball because I was so worried about the context
extension and libc suff I forgot. Hence the late revert.
There's some risk here as we're still exposing the vector context for
signal handlers, but changing that would have meant reverting all of
the vector support. The issues we've found so far have been fixed
already and they weren't absolute showstoppers, so we're essentially
just playing it safe by holding ptrace support for another release (or
until we get through a proper userspace code review).
Summary:
- The vector ucontext extension has been extended with vlenb
- The vector registers ELF core dump note type has been changed to
avoid aliasing with the CSR type used in embedded systems
- Support for accessing vector registers via ptrace() has been
reverted
- Another build fix for the ISA spec changes around Zifencei/Zicsr
that manifests on some systems built with binutils-2.37 and
gcc-11.2"
* tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix build errors using binutils2.37 toolchains
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Linus Torvalds [Fri, 25 Aug 2023 16:18:22 +0000 (09:18 -0700)]
Merge tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an irq mapping leak in gpio-sim
- associate the GPIO device's software node with the irq domain in
gpio-sim
* tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: pass the GPIO device's software node to irq domain
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
Linus Torvalds [Fri, 25 Aug 2023 16:10:16 +0000 (09:10 -0700)]
Merge tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are some Renesas and AMD driver fixes, the AMD fix affects
important laptops in the wild so this one is pretty important. It
seems a bit tough to get this right.
- Fix DT parsing and related locking in the Renesas driver.
- Fix wakeup IRQs in the AMD driver once again. Really tricky this
one"
* tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Mask wake bits on probe again
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
Linus Torvalds [Fri, 25 Aug 2023 15:48:14 +0000 (08:48 -0700)]
Merge tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last bits for 6.5. It's slightly higher LOCs than
wished, but it doesn't look scary.
The biggest change is MAINTAINERS update for TI; it's good to have the
update before the final release, so that people can contact to the
right persons for bug reports (which shouldn't happen of course!)
The rest are all device-specific fixes and quirks, most for various
ASoC platforms"
* tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
ASoC: cs35l41: Correct amp_gain_tlv values
ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
ASoC: tas2781: fixed register access error when switching to other chips
ASoC: cs35l56: Add an ACPI match table
ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
ASoC: SOF: ipc4-pcm: fix possible null pointer deference
MAINTAINERS: Add entries for TEXAS INSTRUMENTS ASoC DRIVERS
Linus Torvalds [Fri, 25 Aug 2023 15:38:40 +0000 (08:38 -0700)]
Merge tag 'drm-fixes-2023-08-25' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"A bit bigger than I'd care for, but it's mostly a single vmwgfx fix
and a fix for an i915 hotplug probing. Otherwise misc i915, bridge,
panfrost and dma-buf fixes.
core:
- add a HPD poll helper
i915:
- fix regression in i915 polling
- fix docs build warning
- fix DG2 idle power consumption
* tag 'drm-fixes-2023-08-25' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
drm/vmwgfx: Fix possible invalid drm gem put calls
drm/vmwgfx: Fix shader stage validation
dma-buf/sw_sync: Avoid recursive lock during fence signal
drm/i915: fix Sphinx indentation warning
drm/i915/dgfx: Enable d3cold at s2idle
drm/display/dp: Fix the DP DSC Receiver cap size
drm/panfrost: Skip speed binning on EOPNOTSUPP
drm: bridge: samsung-dsim: Fix init during host transfer
Linus Torvalds [Fri, 25 Aug 2023 02:39:20 +0000 (19:39 -0700)]
Merge tag 'trace-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix ring buffer being permanently disabled due to missed
record_disabled()
Changing the trace cpu mask will disable the ring buffers for the
CPUs no longer in the mask. But it fails to update the snapshot
buffer. If a snapshot takes place, the accounting for the ring buffer
being disabled is corrupted and this can lead to the ring buffer
being permanently disabled.
- Add test case for snapshot and cpu mask working together
- Fix memleak by the function graph tracer not getting closed properly.
The iterator is used to read the ring buffer. When it opens, it calls
the open function of a tracer, and when it is closed, it calls the
close iteration. While a trace is being read, it is still possible to
change the tracer.
If this happens between the function graph tracer and the wakeup
tracer (which uses function graph tracing), the tracers are not
closed properly during when the iterator sees the switch, and the
wakeup function did not initialize its private pointer to NULL, which
is used to know if the function graph tracer was the last tracer. It
could be fooled in thinking it is, but then on exit it does not call
the close function of the function graph tracer to clean up its data.
- Fix synthetic events on big endian machines, by introducing a union
that does the conversions properly.
- Fix synthetic events from printing out the number of elements in the
stacktrace when it shouldn't.
- Fix synthetic events stacktrace to not print a bogus value at the
end.
- Introduce a pipe_cpumask that prevents the trace_pipe files from
being opened by more than one task (file descriptor).
There was a race found where if splice is called, the iter->ent could
become stale and events could be missed. There's no point reading a
producer/consumer file by more than one task as they will corrupt
each other anyway. Add a cpumask that keeps track of the per_cpu
trace_pipe files as well as the global trace_pipe file that prevents
more than one open of a trace_pipe file that represents the same ring
buffer. This prevents the race from happening.
- Fix ftrace samples for arm64 to work with older compilers.
* tag 'trace-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
samples: ftrace: Replace bti assembly with hint for older compiler
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
tracing: Fix memleak due to race between current_tracer and trace
tracing/synthetic: Allocate one additional element for size
tracing/synthetic: Skip first entry for stack traces
tracing/synthetic: Use union instead of casts
selftests/ftrace: Add a basic testcase for snapshot
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Linus Torvalds [Fri, 25 Aug 2023 02:10:53 +0000 (19:10 -0700)]
Merge tag 'media/v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"Fix a potential array out-of-bounds in the mediatek vcodec driver"
* tag 'media/v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Hugh Dickins [Wed, 23 Aug 2023 05:14:47 +0000 (22:14 -0700)]
shmem: fix smaps BUG sleeping while atomic
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andre Przywara [Mon, 21 Aug 2023 16:05:34 +0000 (17:05 +0100)]
selftests: cachestat: catch failing fsync test on tmpfs
The cachestat kselftest runs a test on a normal file, which is created
temporarily in the current directory. Among the tests it runs there is a
call to fsync(), which is expected to clean all dirty pages used by the
file.
However the tmpfs filesystem implements fsync() as noop_fsync(), so the
call will not even attempt to clean anything when this test file happens
to live on a tmpfs instance. This happens in an initramfs, or when the
current directory is in /dev/shm or sometimes /tmp.
To avoid this test failing wrongly, use statfs() to check which filesystem
the test file lives on. If that is "tmpfs", we skip the fsync() test.
Since the fsync test is only one part of the "normal file" test, we now
execute this twice, skipping the fsync part on the first call. This way
only the second test, including the fsync part, would be skipped.
Link: https://lkml.kernel.org/r/20230821160534.3414911-3-andre.przywara@arm.com Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andre Przywara [Mon, 21 Aug 2023 16:05:33 +0000 (17:05 +0100)]
selftests: cachestat: test for cachestat availability
Patch series "selftests: cachestat: fix run on older kernels", v2.
I ran all kernel selftests on some test machine, and stumbled upon
cachestat failing (among others). These patches fix the run on older
kernels and when the current directory is on a tmpfs instance.
This patch (of 2):
As cachestat is a new syscall, it won't be available on older kernels, for
instance those running on a development machine. At the moment the test
reports all tests as "not ok" in this case.
Test for the cachestat syscall availability first, before doing further
tests, and bail out early with a TAP SKIP comment.
This also uses the opportunity to add the proper TAP headers, and add one
check for proper error handling (illegal file descriptor).
Liam R. Howlett [Sat, 19 Aug 2023 00:43:55 +0000 (20:43 -0400)]
maple_tree: disable mas_wr_append() when other readers are possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Yin Fengwei [Tue, 8 Aug 2023 02:09:17 +0000 (10:09 +0800)]
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Fengwei [Tue, 8 Aug 2023 02:09:16 +0000 (10:09 +0800)]
madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
Commit fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to
use a folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-3-fengwei.yin@intel.com Fixes: fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to use a folio") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yin Fengwei [Tue, 8 Aug 2023 02:09:15 +0000 (10:09 +0800)]
madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Thu, 24 Aug 2023 21:30:47 +0000 (14:30 -0700)]
Merge tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Two last-minute one-liners for v6.5-rc. One got lost in the shuffle,
and the other was reported just this morning"
- Close race window when handling FREE_STATEID operations
- Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc"
* tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix a thinko introduced by recent trace point changes
nfsd: Fix race to FREE_STATEID and cl_revoked
Linus Torvalds [Thu, 24 Aug 2023 20:55:35 +0000 (13:55 -0700)]
Merge tag 'spi-fix-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple more small driver specific fixes for v6.5.
The device mode for Cadence had been broken by some recent updates
done for host mode and large transfers for multi-byte words on stm32
had been broken by an API update in what I think was a rebasing
incident"
* tag 'spi-fix-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence: Fix data corruption issues in slave mode
spi: stm32: fix accidential revert to byte-sized transfer splitting
Mingzheng Xing [Thu, 24 Aug 2023 19:08:52 +0000 (03:08 +0800)]
riscv: Fix build errors using binutils2.37 toolchains
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0,
the following error occurs:
Assembler messages:
Error: cannot find default versions of the ISA extension `zicsr'
Error: cannot find default versions of the ISA extension `zifencei'
The above error originated from this commit of binutils[0], which has been
resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2].
So fix this by change the GCC version in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.
Dave Airlie [Thu, 24 Aug 2023 19:15:54 +0000 (05:15 +1000)]
Merge tag 'drm-misc-fixes-2023-08-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A samsung-dsim initialization fix, a devfreq fix for panfrost, a DP DSC
define fix, a recursive lock fix for dma-buf, a shader validation fix
and a reference counting fix for vmwgfx
- wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
- batman:
- do not get eth header before batadv_check_management_packet
- fix batadv_v_ogm_aggr_send memory leak
- bonding: fix macvlan over alb bond support
- mlxsw: set time stamp fields also when its type is MIRROR_UTC"
* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
selftests: bonding: add macvlan over bond testing
selftest: bond: add new topo bond_topo_2d1c.sh
bonding: fix macvlan over alb bond support
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
ibmveth: Use dcbf rather than dcbfl
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
net/sched: fix a qdisc modification with ambiguous command request
igc: Fix the typo in the PTM Control macro
batman-adv: Hold rtnl lock during MTU update via netlink
igb: Avoid starting unnecessary workqueues
can: raw: add missing refcount for memory leak fix
can: isotp: fix support for transmission of SF without flow control
bnx2x: new flag for track HW resource allocation
sfc: allocate a big enough SKB for loopback selftest packet
...
Mario Limonciello [Thu, 24 Aug 2023 01:11:49 +0000 (20:11 -0500)]
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
Lenovo 82SJ doesn't have DMIC connected like 82V2 does. Narrow
the match down to only cover 82V2.
Reported-by: prosenfeld@Yuhsbstudents.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217063 Fixes: 2232b2dd8cd4 ("ASoC: amd: yc: Add Lenovo Yoga Slim 7 Pro X to quirks table") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com Link: https://lore.kernel.org/r/20230824011149.1395-1-mario.limonciello@amd.com Signed-off-by: Mark Brown <broonie@kernel.org
Paolo Abeni [Thu, 24 Aug 2023 08:33:22 +0000 (10:33 +0200)]
Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
First patch fixes table validation, I broke this in 6.4 when tracking
validation state per table, reported by Pablo, fixup from myself.
Second patch makes sure objects waiting for memory release have been
released, this was broken in 6.1, patch from Pablo Neira Ayuso.
Patch three is a fix-for-fix from previous PR: In case a transaction
gets aborted, gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.
Same for patch 4: gc list needs to use gc list lock, not destroy lock,
also from Pablo.
Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations, broken since feature
was added in 5.6, from myself.
Patch 6 fixes a double-free bug that was also added via previous PR:
We must not schedule gc work if the previous batch is still queued.
netfilter pull request 2023-08-23
* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
====================
Paolo Abeni [Thu, 24 Aug 2023 08:07:16 +0000 (10:07 +0200)]
Merge branch 'fix-macvlan-over-alb-bond-support'
Hangbin Liu says:
====================
fix macvlan over alb bond support
Currently, the macvlan over alb bond is broken after commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds").
Fix this and add relate tests.
====================
Hangbin Liu [Wed, 23 Aug 2023 07:19:04 +0000 (15:19 +0800)]
bonding: fix macvlan over alb bond support
The commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit 14af9963ba1e in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes: 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") Reported-by: susan.zheng@veritas.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ido Schimmel [Wed, 23 Aug 2023 06:43:48 +0000 (09:43 +0300)]
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Negative ifindexes are illegal, but the kernel does not validate the
ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
the kernel generating a warning [1] when such an ifindex is specified.
Takashi Iwai [Thu, 24 Aug 2023 07:20:36 +0000 (09:20 +0200)]
Merge tag 'asoc-fix-v6.5-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.5
A relatively large but generally not super urgent set of fixes for ASoC,
including some quirks and a MAINTAINERS update. There's also an update
to cs35l56 to change the firmware ABI, there are no current shipping
systems which use the current interface and the sooner we get the new
interface in the less likely it is that something will start.
It'd be nice if these landed for v6.5 but not the end of the world if
they wait till v6.6.
Linus Torvalds [Wed, 23 Aug 2023 21:28:19 +0000 (14:28 -0700)]
Merge tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro
16 M work as intended (Hans de Goede)"
* tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
Imre Deak [Tue, 22 Aug 2023 11:30:15 +0000 (14:30 +0300)]
drm/i915: Fix HPD polling, reenabling the output poll work as needed
After the commit in the Fixes: line below, HPD polling stopped working
on i915, since after that change calling drm_kms_helper_poll_enable()
doesn't restart drm_mode_config::output_poll_work if the work was
stopped (no connectors needing polling) and enabling polling for a
connector (during runtime suspend or detecting an HPD IRQ storm).
After the above change calling drm_kms_helper_poll_enable() is a nop
after it's been called already and polling for some connectors was
disabled/re-enabled.
Fix this by calling drm_kms_helper_poll_reschedule() added in the
previous patch instead, which reschedules the work whenever expected.
Fixes: d33a54e3991d ("drm/probe_helper: sort out poll_running vs poll_enabled") CC: stable@vger.kernel.org # 6.4+ Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-2-imre.deak@intel.com
(cherry picked from commit 50452f2f76852322620b63e62922b85e955abe94) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Imre Deak [Tue, 22 Aug 2023 11:30:14 +0000 (14:30 +0300)]
drm: Add an HPD poll helper to reschedule the poll work
Add a helper to reschedule drm_mode_config::output_poll_work after
polling has been enabled for a connector (and needing a reschedule,
since previously polling was disabled for all connectors and hence
output_poll_work was not running).
This is needed by the next patch fixing HPD polling on i915.
Palmer Dabbelt [Tue, 22 Aug 2023 20:55:14 +0000 (13:55 -0700)]
Merge patch series "riscv: fix ptrace and export VLENB"
Andy Chiu <andy.chiu@sifive.com> says:
We add a vlenb field in Vector context and save it with the
riscv_vstate_save() macro. It should not cause performance regression as
VLENB is a design-time constant and is frequently used by hardware.
Also, adding this field into the __sc_riscv_v_state may benifit us on a
future compatibility issue becuse a hardware may have writable VLENB.
Adding and saving VLENB have an immediate benifit as it gives ptrace a
better view of the Vector extension and makes it possible to reconstruct
Vector register files from the dump without doing an additional csr read.
This patchset also sync the number of note types between us and gdb for
riscv to solve a conflicting note.
This is not an ABI break given that 6.5 has not been released yet.
* b4-shazam-merge:
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Bartosz Golaszewski [Tue, 22 Aug 2023 19:29:43 +0000 (21:29 +0200)]
gpio: sim: pass the GPIO device's software node to irq domain
Associate the swnode of the GPIO device's (which is the interrupt
controller here) with the irq domain. Otherwise the interrupt-controller
device attribute is a no-op.
Fixes: cb8c474e79be ("gpio: sim: new testing module") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Bartosz Golaszewski [Tue, 22 Aug 2023 19:29:42 +0000 (21:29 +0200)]
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
If a GPIO simulator device is unbound with interrupts still requested,
we will hit a use-after-free issue in __irq_domain_deactivate_irq(). The
owner of the irq domain must dispose of all mappings before destroying
the domain object.
Fixes: cb8c474e79be ("gpio: sim: new testing module") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Zack Rusin [Fri, 18 Aug 2023 04:13:01 +0000 (00:13 -0400)]
drm/vmwgfx: Fix possible invalid drm gem put calls
vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr deref's on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.
Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.
Signed-off-by: Zack Rusin <zackr@vmware.com> Reported-by: Ian Forbes <iforbes@vmware.com> Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon") Cc: <stable@vger.kernel.org> # v6.4+ Reviewed-by: Maaz Mombasawala<mombasawalam@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@kde.org
Zack Rusin [Fri, 16 Jun 2023 19:09:34 +0000 (15:09 -0400)]
drm/vmwgfx: Fix shader stage validation
For multiple commands the driver was not correctly validating the shader
stages resulting in possible kernel oopses. The validation code was only.
if ever, checking the upper bound on the shader stages but never a lower
bound (valid shader stages start at 1 not 0).
Takashi Iwai [Wed, 23 Aug 2023 16:16:25 +0000 (18:16 +0200)]
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
Like a few other drivers, YMFPCI driver needs to clean up with
snd_card_free() call at an error path of the probe; otherwise the
other devres resources are released before the card and it results in
the UAF.
This patch uses the helper for handling the probe error gracefully.
Linus Torvalds [Wed, 23 Aug 2023 15:32:52 +0000 (08:32 -0700)]
Merge tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Final set of three small fixes for 6.5"
* tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
Shih-Yi Chen [Mon, 21 Aug 2023 15:06:27 +0000 (11:06 -0400)]
platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
rshim console does not show all entries of dmesg.
Fixed by setting MLXBF_TM_TX_LWM_IRQ for every CONSOLE notification.
Signed-off-by: Shih-Yi Chen <shihyic@nvidia.com> Reviewed-by: Liming Sung <limings@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20230821150627.26075-1-shihyic@nvidia.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Florian Westphal [Tue, 22 Aug 2023 20:03:57 +0000 (22:03 +0200)]
netfilter: nf_tables: defer gc run if previous batch is still pending
Don't queue more gc work, else we may queue the same elements multiple
times.
If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 22 Aug 2023 17:49:52 +0000 (19:49 +0200)]
netfilter: nf_tables: fix out of memory error handling
Several instances of pipapo_resize() don't propagate allocation failures,
this causes a crash when fault injection is enabled for gfp_kernel slabs.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Pablo Neira Ayuso [Mon, 21 Aug 2023 12:33:32 +0000 (14:33 +0200)]
netfilter: nf_tables: use correct lock to protect gc_list
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Thu, 17 Aug 2023 23:13:52 +0000 (01:13 +0200)]
netfilter: nf_tables: GC transaction race with abort path
Abort path is missing a synchronization point with GC transactions. Add
GC sequence number hence any GC transaction losing race will be
discarded.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Thu, 17 Aug 2023 23:13:31 +0000 (01:13 +0200)]
netfilter: nf_tables: flush pending destroy work before netlink notifier
Destroy work waits for the RCU grace period then it releases the objects
with no mutex held. All releases objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.
However, netlink notifier might interfer with pending destroy work.
rcu_barrier() is not correct because objects are not release via RCU
callback. Flush destroy work before releasing objects from netlink
notifier path.
Fixes: d4bc8271db21 ("netfilter: nf_tables: netlink notifier might race to release objects") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Thu, 17 Aug 2023 18:28:32 +0000 (20:28 +0200)]
netfilter: nf_tables: validate all pending tables
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.
Moreover, we can't init table->validate to _SKIP when a table object
is allocated.
If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.
Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
Fixes: 00c320f9b755 ("netfilter: nf_tables: make validation state per table") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Charles Keepax [Wed, 23 Aug 2023 08:53:08 +0000 (09:53 +0100)]
ASoC: cs35l41: Correct amp_gain_tlv values
The current analog gain TLV seems to have completely incorrect values in
it. The gain starts at 0.5dB, proceeds in 1dB steps, and has no mute
value, correct the control to match.
Andrii Staikov [Tue, 22 Aug 2023 22:16:53 +0000 (15:16 -0700)]
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 22 Aug 2023 10:12:31 +0000 (06:12 -0400)]
net/sched: fix a qdisc modification with ambiguous command request
When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.
While at it, fix the code and comments to improve readability.
While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/ Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Clark [Fri, 18 Aug 2023 14:59:38 +0000 (07:59 -0700)]
dma-buf/sw_sync: Avoid recursive lock during fence signal
If a signal callback releases the sw_sync fence, that will trigger a
deadlock as the timeline_fence_release recurses onto the fence->lock
(used both for signaling and the the timeline tree).
To avoid that, temporarily hold an extra reference to the signalled
fences until after we drop the lock.
(This is an alternative implementation of https://patchwork.kernel.org/patch/11664717/
which avoids some potential UAF issues with the original patch.)
v2: Remove now obsolete comment, use list_move_tail() and
list_del_init()
Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free") Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230818145939.39697-1-robdclark@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
Wei Chen [Thu, 10 Aug 2023 08:23:33 +0000 (08:23 +0000)]
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
variable *nplanes is provided by user via system call argument. The
possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8. The array access by index i can cause array
out-of-bounds.
Fix this bug by checking *nplanes against the array size.
Sasha Neftin [Mon, 21 Aug 2023 17:17:21 +0000 (10:17 -0700)]
igc: Fix the typo in the PTM Control macro
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
requests. The bit resolution of this field is six bits. That bit five was
missing in the mask. This patch comes to correct the typo in the
IGC_PTM_CTRL_SHRT_CYC macro.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Eckelmann [Mon, 21 Aug 2023 19:48:48 +0000 (21:48 +0200)]
batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable@vger.kernel.org Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU") Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 23 Aug 2023 00:18:56 +0000 (17:18 -0700)]
Merge branch 'can-fixes-for-6-5-rc7'
Oliver Hartkopp says:
====================
CAN fixes for 6.5-rc7
The isotp fix removes an unnecessary check which leads to delays and/or
a wrong error notification.
The fix for the CAN_RAW socket solves the last issue that has been
introduced with commit ee8b94c8510c ("can: raw: fix receiver memory leak")
in this upstream cycle (detected by Eric Dumazet).
====================
Oliver Hartkopp [Mon, 21 Aug 2023 14:45:47 +0000 (16:45 +0200)]
can: raw: add missing refcount for memory leak fix
Commit ee8b94c8510c ("can: raw: fix receiver memory leak") introduced
a new reference to the CAN netdevice that has assigned CAN filters.
But this new ro->dev reference did not maintain its own refcount which
lead to another KASAN use-after-free splat found by Eric Dumazet.
This patch ensures a proper refcount for the CAN nedevice.
Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver Hartkopp [Mon, 21 Aug 2023 14:45:46 +0000 (16:45 +0200)]
can: isotp: fix support for transmission of SF without flow control
The original implementation had a very simple handling for single frame
transmissions as it just sent the single frame without a timeout handling.
With the new echo frame handling the echo frame was also introduced for
single frames but the former exception ('simple without timers') has been
maintained by accident. This leads to a 1 second timeout when closing the
socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.
As the echo handling is always active (also for single frames) remove the
wrong extra condition for single frames.
Thinh Tran [Fri, 18 Aug 2023 16:14:40 +0000 (11:14 -0500)]
bnx2x: new flag for track HW resource allocation
While injecting PCIe errors to the upstream PCIe switch of
a BCM57810 NIC, system hangs/crashes were observed.
After several calls to bnx2x_tx_timout() complete,
bnx2x_nic_unload() is called to free up HW resources
and bnx2x_napi_disable() is called to release NAPI objects.
Later, when the EEH driver calls bnx2x_io_slot_reset() to
complete the recovery process, bnx2x attempts to disable
NAPI again by calling bnx2x_napi_disable() and freeing
resources which have already been freed, resulting in a
hang or crash.
Introduce a new flag to track the HW resource and NAPI
allocation state, refactor duplicated code into a single
function, check page pool allocation status before freeing,
and reduces debug output when a TX timeout event occurs.
Reviewed-by: Manish Chopra <manishc@marvell.com> Tested-by: Abdul Haleem <abdhalee@in.ibm.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20230818161443.708785-2-thinhtr@linux.vnet.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andy Chiu [Wed, 16 Aug 2023 15:54:49 +0000 (15:54 +0000)]
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
VLENB is critical for callers of ptrace to reconstruct Vector register
files from the register dump of NT_RISCV_VECTOR. Also, future systems
may will have a writable VLENB, so add it now to potentially save future
compatibility issue.
Palmer Dabbelt [Wed, 16 Aug 2023 15:54:48 +0000 (15:54 +0000)]
RISC-V: Remove ptrace support for vectors
We've found two bugs here: NT_RISCV_VECTOR steps on NT_RISCV_CSR (which
is only for embedded), and we don't have vlenb in the core dumps. Given
that we've have a pair of bugs croup up as part of the GDB review we've
probably got other issues, so let's just cut this for 6.5 and get it
right.
Fixes: 0c59922c769a ("riscv: Add ptrace vector support") Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20230816155450.26200-2-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Tue, 22 Aug 2023 18:16:23 +0000 (11:16 -0700)]
Merge tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix DT node refcount when creating platform devices
- Fix deadlock in changeset code due to printing with devtree_lock held
- Fix unittest EXPECT strings for parse_phandle_with_args_map() test
- Fix IMA kexec memblock freeing
* tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of/platform: increase refcount of fwnode
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Edward Cree [Mon, 21 Aug 2023 18:01:53 +0000 (19:01 +0100)]
sfc: allocate a big enough SKB for loopback selftest packet
Cited commits passed a size to alloc_skb that was only big enough for
the actual packet contents, but the following skb_put + memcpy writes
the whole struct efx_loopback_payload including leading and trailing
padding bytes (which are then stripped off with skb_pull/skb_trim).
This could cause an skb_over_panic, although in practice we get saved
by kmalloc_size_roundup.
Pass the entire size we use, instead of the size of the final packet.
Reported-by: Andy Moreton <andy.moreton@amd.com> Fixes: cf60ed469629 ("sfc: use padding to fix alignment in loopback test") Fixes: 30c24dd87f3f ("sfc: siena: use padding to fix alignment in loopback test") Fixes: 1186c6b31ee1 ("sfc: falcon: use padding to fix alignment in loopback test") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821180153.18652-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 22 Aug 2023 18:04:01 +0000 (11:04 -0700)]
Merge tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Two fixes:
- reorder buffer filter checks can cause bad shift/UBSAN
warning with newer HW, avoid the check (mac80211)
- add Kconfig dependency for iwlwifi for PTP clock usage
* tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
wifi: iwlwifi: mvm: add dependency for PTP clock
====================
Marek Behún [Mon, 21 Aug 2023 12:14:53 +0000 (14:14 +0200)]
leds: trigger: netdev: rename 'hw_control' sysfs entry to 'offloaded'
Commit b655892ffd6d ("leds: trigger: netdev: expose hw_control status
via sysfs") exposed to sysfs the flag that tells whether the LED trigger
is offloaded to hardware, under the name "hw_control", since that is the
name under which this setting is called in the code.
Everywhere else in kernel when some work that is normally done in
software can be made to be done by hardware instead, we use the word
"offloading" to describe this, e.g. "LED blinking is offloaded to
hardware".
Normally renaming sysfs entries is a no-go because of backwards
compatibility. But since this patch was not yet released in a stable
kernel, I think it is still possible to rename it, if there is
consensus.
Fixes: b655892ffd6d ("leds: trigger: netdev: expose hw_control status via sysfs") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230821121453.30203-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Mon, 21 Aug 2023 16:12:44 +0000 (17:12 +0100)]
net: ethernet: mtk_eth_soc: fix NULL pointer on hw reset
When a hardware reset is triggered on devices not initializing WED the
calls to mtk_wed_fe_reset and mtk_wed_fe_reset_complete dereference a
pointer on uninitialized stack memory.
Break out of both functions in case a hw_list entry is 0.
Linus Torvalds [Tue, 22 Aug 2023 17:50:17 +0000 (10:50 -0700)]
Merge tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- fix a use after free in nfs_direct_join_group() (Cc: stable)
- fix sysfs server name memory leak
- fix lock recovery hang in NFSv4.0
- fix page free in the error path for nfs42_proc_getxattr() and
__nfs4_get_acl_uncached()
- SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect
* tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
xprtrdma: Remap Receive buffers after a reconnect
NFSv4: fix out path in __nfs4_get_acl_uncached
NFSv4.2: fix error handling in nfs42_proc_getxattr
NFS: Fix sysfs server name memory leak
NFS: Fix a use after free in nfs_direct_join_group()
NFSv4: Fix dropped lock for racing OPEN and delegation return
Linus Torvalds [Tue, 22 Aug 2023 17:38:29 +0000 (10:38 -0700)]
Merge tag 'selinux-pr-20230821' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A small fix for a potential problem when cleaning up after a failed
SELinux policy load (list next pointer not being properly initialized
to NULL early enough)"
* tag 'selinux-pr-20230821' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: set next pointer before attaching to list
Anshuman Gupta [Wed, 16 Aug 2023 12:52:16 +0000 (18:22 +0530)]
drm/i915/dgfx: Enable d3cold at s2idle
System wide suspend already has support for lmem save/restore during
suspend therefore enabling d3cold for s2idle and keepng it disable for
runtime PM.(Refer below commit for d3cold runtime PM disable justification)
'commit 66eb93e71a7a ("drm/i915/dgfx: Keep PCI autosuspend control
'on' by default on all dGPU")'
It will reduce the DG2 Card power consumption to ~0 Watt
for s2idle power KPI.
Kees Cook [Fri, 18 Aug 2023 17:54:21 +0000 (10:54 -0700)]
tg3: Use slab_build_skb() when needed
The tg3 driver will use kmalloc() under some conditions. Check the
frag_size and use slab_build_skb() when frag_size is 0. Silences
the warning introduced by commit ce098da1497c ("skbuff: Introduce
slab_build_skb()"):
Use slab_build_skb() instead
...
tg3_poll_work+0x638/0xf90 [tg3]
Hangbin Liu [Thu, 17 Aug 2023 08:24:59 +0000 (16:24 +0800)]
selftests: bonding: do not set port down before adding to bond
Before adding a port to bond, it need to be set down first. In the
lacpdu test the author set the port down specifically. But commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up")
changed the operation order, the kernel will set the port down _after_
adding to bond. So all the ports will be down at last and the test failed.
In fact, the veth interfaces are already inactive when added. This
means there's no need to set them down again before adding to the bond.
Let's just remove the link down operation.
Fixes: a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") Reported-by: Zhengchao Shao <shaozhengchao@huawei.com> Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peng Fan [Mon, 21 Aug 2023 02:39:28 +0000 (10:39 +0800)]
of/platform: increase refcount of fwnode
commit 0f8e5651095b
("of/platform: Propagate firmware node by calling device_set_node()")
use of_fwnode_handle to replace of_node_get, which introduces a side
effect that the refcount is not increased. Then the out of tree
jailhouse hypervisor enable/disable test will trigger kernel dump in
of_overlay_remove, with the following sequence
"
of_changeset_revert(&overlay_changeset);
of_changeset_destroy(&overlay_changeset);
of_overlay_remove(&overlay_id);
"
So increase the refcount to avoid issues.
This patch also release the refcount when releasing amba device to avoid
refcount leakage.
Fixes: 0f8e5651095b ("of/platform: Propagate firmware node by calling device_set_node()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20230821023928.3324283-2-peng.fan@oss.nxp.com Signed-off-by: Rob Herring <robh@kernel.org>
T.J. Mercier [Mon, 14 Aug 2023 15:16:36 +0000 (15:16 +0000)]
mm: multi-gen LRU: don't spin during memcg release
When a memcg is in the process of being released mem_cgroup_tryget will
fail because its reference count has already reached 0. This can happen
during reclaim if the memcg has already been offlined, and we reclaim all
remaining pages attributed to the offlined memcg. shrink_many attempts to
skip the empty memcg in this case, and continue reclaiming from the
remaining memcgs in the old generation. If there is only one memcg
remaining, or if all remaining memcgs are in the process of being released
then shrink_many will spin until all memcgs have finished being released.
The release occurs through a workqueue, so it can take a while before
kswapd is able to make any further progress.
This fix results in reductions in kswapd activity and direct reclaim in
a test where 28 apps (working set size > total memory) are repeatedly
launched in a random sequence:
Miaohe Lin [Tue, 27 Jun 2023 11:28:08 +0000 (19:28 +0800)]
mm: memory-failure: fix unexpected return value in soft_offline_page()
When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page succeeds while it in
fact failed. So the user will not try again later in this case.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexandre Ghiti [Wed, 9 Aug 2023 16:46:33 +0000 (18:46 +0200)]
mm: add a call to flush_cache_vmap() in vmap_pfn()
flush_cache_vmap() must be called after new vmalloc mappings are installed
in the page table in order to allow architectures to make sure the new
mapping is visible.
It could lead to a panic since on some architectures (like powerpc),
the page table walker could see the wrong pte value and trigger a
spurious page fault that can not be resolved (see commit f1cb8f9beba8
("powerpc/64s/radix: avoid ptesync after set_pte and
ptep_set_access_flags")).
But actually the patch is aiming at riscv: the riscv specification
allows the caching of invalid entries in the TLB, and since we recently
removed the vmalloc page fault handling, we now need to emit a tlb
shootdown whenever a new vmalloc mapping is emitted
(https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/).
That's a temporary solution, there are ways to avoid that :)
Ayush Jain [Tue, 8 Aug 2023 12:43:47 +0000 (07:43 -0500)]
selftests/mm: FOLL_LONGTERM need to be updated to 0x100
After commit 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to
internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at
include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM Updating same
here as well.
Before this change test goes in an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test
==========================================================
RUN hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================
Ryusuke Konishi [Sat, 5 Aug 2023 13:20:38 +0000 (22:20 +0900)]
nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
A syzbot stress test reported that create_empty_buffers() called from
nilfs_lookup_dirty_data_buffers() can cause a general protection fault.
Analysis using its reproducer revealed that the back reference "mapping"
from a page/folio has been changed to NULL after dirty page/folio gang
lookup in nilfs_lookup_dirty_data_buffers().
Fix this issue by excluding pages/folios from being collected if, after
acquiring a lock on each page/folio, its back reference "mapping" differs
from the pointer to the address space struct that held the page/folio.
David Hildenbrand [Sat, 5 Aug 2023 10:12:56 +0000 (12:12 +0200)]
mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for
example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of a hugetlb page
-- not the first "head" PTE of such a cont mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb
page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:
Always detects "not exclusive" for passed tail pages and refuses to PIN
the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fallback
to ordinary GUP. As ordinary GUP properly considers the logical hugetlb
PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
succeed when looking at the PageAnonExclusive on the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fallback from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and do the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.
On production kernels (and even most debug kernels, that don't set
CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required
to be backported. But of course, it does not hurt.
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lucas Karpinski [Fri, 4 Aug 2023 19:35:29 +0000 (15:35 -0400)]
selftests: cgroup: fix test_kmem_basic less than error
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object. After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries. The
test passes the majority of the time because when slab1 or current is
calculated, it is often above 0, however, 0 is also an acceptable value.
Link: https://lkml.kernel.org/r/7d6gcuyzdjcice6qbphrmpmv5skr5jtglg375unnjxqhstvhxc@qkn6dw6bao6v Signed-off-by: Lucas Karpinski <lkarpins@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Fri, 4 Aug 2023 15:27:19 +0000 (08:27 -0700)]
mm: enable page walking API to lock vmas during the walk
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.
The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks. With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them. The change ensures vmas are properly locked during such
walks.
A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration. Without this change a concurrent page
can be faulted into the area and be left out of migration.
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Suggested-by: Jann Horn <jannh@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Thu, 3 Aug 2023 14:32:03 +0000 (16:32 +0200)]
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
We shouldn't be using a GUP-internal helper if it can be avoided.
Similar to smaps_pte_entry() that uses vm_normal_page(), let's use
vm_normal_page_pmd() that similarly refuses to return the huge zeropage.
In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd():
(1) Will always return the head page, not a tail page of a THP.
If we'd ever call smaps_account with a tail page while setting "compound
= true", we could be in trouble, because smaps_account() would look at
the memmap of unrelated pages.
If we're unlucky, that memmap does not exist at all. Before we removed
PG_doublemap, we could have triggered something similar as in
commit 24d7275ce279 ("fs/proc: task_mmu.c: don't read mapcount for
migration entry").
This can theoretically happen ever since commit ff9f47f6f00c ("mm: proc:
smaps_rollup: do not stall write attempts on mmap_lock"):
(a) We're in show_smaps_rollup() and processed a VMA
(b) We release the mmap lock in show_smaps_rollup() because it is
contended
(c) We merged that VMA with another VMA
(d) We collapsed a THP in that merged VMA at that position
If the end address of the original VMA falls into the middle of a THP
area, we would call smap_gather_stats() with a start address that falls
into a PMD-mapped THP. It's probably very rare to trigger when not
really forced.
(2) Will succeed on a is_pci_p2pdma_page(), like vm_normal_page()
Treat such PMDs here just like smaps_pte_entry() would treat such PTEs.
If such pages would be anonymous, we most certainly would want to
account them.
(3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap()
As noted in vm_normal_page(), that is only for handling legacy ZONE_DEVICE
pages. So just like smaps_pte_entry(), we'll now also ignore such PMD
entries.
Especially, follow_pmd_mask() never ends up calling
follow_trans_huge_pmd() on pmd_devmap(). Instead it calls
follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN
is set.
So skipping pmd_devmap() pages seems to be the right thing to do.
(4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page()
We won't be returning a memmap that should be ignored by core-mm, or
worse, a memmap that does not even exist. Note that while
walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't.
Most probably this case doesn't currently really happen on the PMD level,
otherwise we'd already be able to trigger kernel crashes when reading
smaps / smaps_rollup.
So most probably only (1) is relevant in practice as of now, but could only
cause trouble in extreme corner cases.
Let's move follow_trans_huge_pmd() to mm/internal.h to discourage future
reuse in wrong context.
Link: https://lkml.kernel.org/r/20230803143208.383663-3-david@redhat.com Fixes: ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: liubo <liubo254@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Thu, 3 Aug 2023 14:32:02 +0000 (16:32 +0200)]
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documenation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
Petr Oros [Fri, 11 Aug 2023 08:07:02 +0000 (10:07 +0200)]
ice: Fix NULL pointer deref during VF reset
During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.
More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and cause panic. The window for this racing condition
has been around for a long time but it's much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.
Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()
With our reproducer, we can hit BUG:
~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h
There was commit cf90b74341ee ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.
After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.
Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Thu, 10 Aug 2023 23:51:10 +0000 (16:51 -0700)]
ice: fix receive buffer size miscalculation
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.
Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.
Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Zheng Yejian [Fri, 18 Aug 2023 02:26:45 +0000 (10:26 +0800)]
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.
After this patch, users will find that:
- Main trace_pipe can be opened by only one user, and if it is
opened, all per_cpu trace_pipes cannot be opened;
- Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
trace_pipe can only be opened by one user. And if one of them is
opened, main trace_pipe cannot be opened.
David Michael [Wed, 16 Aug 2023 01:42:41 +0000 (21:42 -0400)]
drm/panfrost: Skip speed binning on EOPNOTSUPP
Encountered on an ARM Mali-T760 MP4, attempting to read the nvmem
variable can also return EOPNOTSUPP instead of ENOENT when speed
binning is unsupported.
Cc: <stable@vger.kernel.org> Fixes: 7d690f936e9b ("drm/panfrost: Add basic support for speed binning") Signed-off-by: David Michael <fedora.dm0@gmail.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/87msyryd7y.fsf@gmail.com
Rob Herring [Fri, 18 Aug 2023 20:40:57 +0000 (15:40 -0500)]
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
While originally it was fine to format strings using "%pOF" while
holding devtree_lock, this now causes a deadlock. Lockdep reports:
of_get_parent from of_fwnode_get_parent+0x18/0x24
^^^^^^^^^^^^^
of_fwnode_get_parent from fwnode_count_parents+0xc/0x28
fwnode_count_parents from fwnode_full_name_string+0x18/0xac
fwnode_full_name_string from device_node_string+0x1a0/0x404
device_node_string from pointer+0x3c0/0x534
pointer from vsnprintf+0x248/0x36c
vsnprintf from vprintk_store+0x130/0x3b4
Fix this by moving the printing in __of_changeset_entry_apply() outside
the lock. As the only difference in the multiple prints is the action
name, use the existing "action_names" to refactor the prints into a
single print.
Fixes: a92eb7621b9fb2c2 ("lib/vsprintf: Make use of fwnode API to obtain node names and separators") Cc: stable@vger.kernel.org Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-2-5f0410e007dd@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Fri, 18 Aug 2023 20:40:56 +0000 (15:40 -0500)]
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
Commit 12e17243d8a1 ("of: base: improve error msg in
of_phandle_iterator_next()") added printing of the phandle value on
error, but failed to update the unittest.
Srinivas Goud [Mon, 21 Aug 2023 09:30:16 +0000 (15:00 +0530)]
spi: spi-cadence: Fix data corruption issues in slave mode
Remove 10us delay in cdns_spi_process_fifo() (called from cdns_spi_irq())
to fix data corruption issue on Master side when this driver
configured in Slave mode, as Slave is failed to prepare the date
on time due to above delay.
Add 10us delay before processing the RX FIFO as TX empty doesn't
guarantee valid data in RX FIFO.
Leonard Göhrs [Wed, 16 Aug 2023 14:52:37 +0000 (16:52 +0200)]
spi: stm32: fix accidential revert to byte-sized transfer splitting
Commit 6f486556abe35 ("spi: stm32: renaming of spi_master into
spi_controller") included an accidential reverted of a change added in
commit 1e4929112507f ("spi: stm32: split large transfers based on word
size instead of bytes").
This breaks large SPI transfers with word sizes > 8 bits, which are
e.g. common when driving MIPI DBI displays.
Fix this by using `spi_split_transfers_maxwords()` instead of
`spi_split_transfers_maxsize()`.
André Apitzsch [Sat, 19 Aug 2023 07:12:15 +0000 (09:12 +0200)]
platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
The Lenovo Thinkbook 14s Yoga ITL has 4 new symbols/shortcuts on their
F9-F11 and PrtSc keys:
F9: Has a symbol of a head with a headset, the manual says "Service key"
F10: Has a symbol of a telephone horn which has been picked up from the
receiver, the manual says: "Answer incoming calls"
F11: Has a symbol of a telephone horn which is resting on the receiver,
the manual says: "Reject incoming calls"
PrtSc: Has a symbol of a siccor and a dashed ellipse, the manual says:
"Open the Windows 'Snipping' Tool app"
This commit adds support for these 4 new hkey events.