Linus Torvalds [Fri, 21 Oct 2022 21:43:09 +0000 (14:43 -0700)]
Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Just two fixes for the new 'virtio with grants' feature"
* tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts
xen/virtio: Handle cases when page offset > PAGE_SIZE properly
Linus Torvalds [Fri, 21 Oct 2022 21:33:36 +0000 (14:33 -0700)]
Merge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A small SELinux fix for a GFP_KERNEL allocation while a spinlock is
held.
The patch, while still fairly small, is a bit larger than one might
expect from a simple s/GFP_KERNEL/GFP_ATOMIC/ conversion because we
added support for the function to be called with different gfp flags
depending on the context, preserving GFP_KERNEL for those cases that
can safely sleep"
* tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
Linus Torvalds [Fri, 21 Oct 2022 19:33:03 +0000 (12:33 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morron:
"Seventeen hotfixes, mainly for MM.
Five are cc:stable and the remainder address post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nouveau: fix migrate_to_ram() for faulting page
mm/huge_memory: do not clobber swp_entry_t during THP split
hugetlb: fix memory leak associated with vma_lock structure
mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
mm: /proc/pid/smaps_rollup: fix maple tree search
mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
mm/mmap: fix MAP_FIXED address return on VMA merge
mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
mm/mmap: undo ->mmap() when mas_preallocate() fails
init: Kconfig: fix spelling mistake "satify" -> "satisfy"
ocfs2: clear dinode links count in case of error
ocfs2: fix BUG when iput after ocfs2_mknod fails
gcov: support GCC 12.1 and newer compilers
zsmalloc: zs_destroy_pool: add size_class NULL check
mm/mempolicy: fix mbind_range() arguments to vma_merge()
mailmap: update email for Qais Yousef
mailmap: update Dan Carpenter's email address
Alistair Popple [Wed, 19 Oct 2022 12:29:34 +0000 (23:29 +1100)]
nouveau: fix migrate_to_ram() for faulting page
Commit 16ce101db85d ("mm/memory.c: fix race when faulting a device private
page") changed the migrate_to_ram() callback to take a reference on the
device page to ensure it can't be freed while handling the fault.
Unfortunately the corresponding update to Nouveau to accommodate this
change was inadvertently dropped from that patch causing GPU to CPU
migration to fail so add it here.
Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com Fixes: 16ce101db85d ("mm/memory.c: fix race when faulting a device private page") Signed-off-by: Alistair Popple <apopple@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The problem can be reproduced with the mmtests config
config-workload-stressng-mmap. It does not always happen and when it
triggers is variable but it has happened on multiple machines.
The intent of commit b653db77350c patch was to avoid the case where
PG_private is clear but folio->private is not-NULL. However, THP tail
pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
stated in the documentation for struct folio. This patch only clobbers
page->private for tail pages if the head page was not in swapcache and
warns once if page->private had an unexpected value.
Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <shy828301@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Kravetz [Wed, 19 Oct 2022 20:19:57 +0000 (13:19 -0700)]
hugetlb: fix memory leak associated with vma_lock structure
The hugetlb vma_lock structure hangs off the vm_private_data pointer of
sharable hugetlb vmas. The structure is vma specific and can not be
shared between vmas. At fork and various other times, vmas are duplicated
via vm_area_dup(). When this happens, the pointer in the newly created
vma must be cleared and the structure reallocated. Two hugetlb specific
routines deal with this hugetlb_dup_vma_private and hugetlb_vm_op_open.
Both routines are called for newly created vmas. hugetlb_dup_vma_private
would always clear the pointer and hugetlb_vm_op_open would allocate the
new vms_lock structure. This did not work in the case of this calling
sequence pointed out in [1].
move_vma
copy_vma
new_vma = vm_area_dup(vma);
new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
is_vm_hugetlb_page(vma)
clear_vma_resv_huge_pages
hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL
When clearing hugetlb_dup_vma_private we actually leak the associated
vma_lock structure.
The vma_lock structure contains a pointer to the associated vma. This
information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open
to ensure we only clear the vm_private_data of newly created (copied)
vmas. In such cases, the vma->vma_lock->vma field will not point to the
vma.
Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear
vm_private_data if vma->vma_lock->vma == vma. Also, log a warning if
hugetlb_vm_op_open ever encounters the case where vma_lock has already
been correctly allocated for the vma.
Liam R. Howlett [Tue, 31 May 2022 13:20:51 +0000 (09:20 -0400)]
mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
Try to avoid using the left over split page on the next request for a page
by calling __free_pages_ok() with FPI_TO_TAIL. This increases the
potential of defragmenting memory when it's used for a short period of
time.
Rik van Riel [Tue, 18 Oct 2022 00:25:05 +0000 (20:25 -0400)]
mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
The h->*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.
This could lead to a corrupted value of h->resv_huge_pages, which we have
observed on our systems.
Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
potential race.
Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Glen McCready <gkmccready@meta.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Tue, 18 Oct 2022 19:17:12 +0000 (19:17 +0000)]
mm/mmap: fix MAP_FIXED address return on VMA merge
mmap should return the start address of newly mapped area when successful.
On a successful merge of a VMA, the return address was changed and thus
was violating that expectation from userspace.
This is a restoration of functionality provided by 309d08d9b3a3
(mm/mmap.c: fix mmap return value when vma is merged after call_mmap()).
For completeness of fixing MAP_FIXED, implement the comments from the
previous discussion to never update the address and fail if the address
changes. Leaving the error as a WARN_ON() to avoid crashing the kernel.
Mike Kravetz [Tue, 18 Oct 2022 02:49:45 +0000 (19:49 -0700)]
mm/mmap: undo ->mmap() when mas_preallocate() fails
A memory leak in hugetlb_reserve_pages was reported in [1]. The root
cause was traced to an error path in mmap_region when mas_preallocate()
fails. In this case, the vma is freed after a successful call to
filesystem specific mmap. The hugetlbfs mmap routine may allocate data
structures pointed to by m_private_data. These need to be cleaned up by
the hugetlb vm_ops->close() routine.
The same issue was addressed by commit deb0f6562884 ("mm/mmap: undo
->mmap() when arch_validate_flags() fails") for the arch_validate_flags()
test. Go to the same close_and_free_vma label if mas_preallocate() fails.
Joseph Qi [Mon, 17 Oct 2022 13:02:27 +0000 (21:02 +0800)]
ocfs2: clear dinode links count in case of error
In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.
So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well. Also do the same change for ocfs2_symlink().
Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Mon, 17 Oct 2022 13:02:26 +0000 (21:02 +0800)]
ocfs2: fix BUG when iput after ocfs2_mknod fails
Commit b1529a41f777 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later. But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g. i_generation. Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)
We could make this inode as bad, but we did want to do operations like
wipe in some cases. Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem. So just leave it as is by revert the reclaim logic.
Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Martin Liska [Thu, 13 Oct 2022 07:40:59 +0000 (09:40 +0200)]
gcov: support GCC 12.1 and newer compilers
Starting with GCC 12.1, the created .gcda format can't be read by gcov
tool. There are 2 significant changes to the .gcda file format that
need to be supported:
Tested with GCC 7.5, 10.4, 12.2 and the current master.
Link: https://lkml.kernel.org/r/624bda92-f307-30e9-9aaa-8cc678b2dfb2@suse.cz Signed-off-by: Martin Liska <mliska@suse.cz> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Inside the zs_destroy_pool() function, there can still be NULL size_class
pointers: if when the next size_class is allocated, inside
zs_create_pool() function, kzalloc will return NULL and handling the error
condition, zs_create_pool() will call zs_destroy_pool().
Liam Howlett [Sat, 15 Oct 2022 02:12:33 +0000 (02:12 +0000)]
mm/mempolicy: fix mbind_range() arguments to vma_merge()
Fuzzing produced an invalid argument to vma_merge() which was caught by
the newly added verification of the number of VMAs being removed on
process exit. Analyzing the failure eventually resulted in finding an
issue with the search of a VMA that started at address 0, which caused an
underflow and thus the loss of many VMAs being tracked in the tree. Fix
the underflow by changing the search of the maple tree to use the start
address directly.
Dan Carpenter [Wed, 12 Oct 2022 13:19:39 +0000 (16:19 +0300)]
mailmap: update Dan Carpenter's email address
My time at Oracle is ending at the end of the month. Update my email
address accordingly.
Link: https://lkml.kernel.org/r/Y0a+6+5SHMdvUnpg@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
amdgpu:
- Mode2 reset fixes for Sienna Cichlid
- Revert broken fan speed sensor fix
- SMU 13.x fixes
- GC 11.x fixes
- RAS fixes
- SR-IOV fixes
- Fix BO move breakage on SI
- Misc compiler fixes
- Fix gfx9 APU regression caused by PCI AER fix
vc4:
- HDMI fixes
panfrost:
- compiler fixes"
* tag 'drm-fixes-2022-10-21' of git://anongit.freedesktop.org/drm/drm: (35 commits)
drm/amdgpu: fix sdma doorbell init ordering on APUs
drm/panfrost: replace endian-specific types with native ones
drm/panfrost: Remove type name from internal structs
drm/connector: Set DDC pointer in drmm_connector_init
drm: tests: Fix a buffer overflow in format_helper_test
drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
drm/amdgpu: Fix for BO move issue
drm/amdgpu: dequeue mes scheduler during fini
drm/amd/pm: enable thermal alert on smu_v13_0_10
drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
drm/amd/pm: update SMU IP v13.0.4 driver interface version
drm/amd/pm: Init pm_attr_list when dpm is disabled
drm/amd/pm: disable cstate feature for gpu reset scenario
drm/amd/pm: fulfill SMU13.0.7 cstate control interface
drm/amd/pm: fulfill SMU13.0.0 cstate control interface
drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
...
* tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
net: phy: dp83822: disable MDI crossover status change interrupt
net: sched: fix race condition in qdisc_graft()
net: hns: fix possible memory leak in hnae_ae_register()
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
sfc: include vport_id in filter spec hash and equal()
genetlink: fix kdoc warnings
selftests: add selftest for chaining of tc ingress handling to egress
net: Fix return value of qdisc ingress handling on success
net: sched: sfb: fix null pointer access issue when sfb_init() fails
Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
net: sched: cake: fix null pointer access issue when cake_init() fails
ethernet: marvell: octeontx2 Fix resource not freed after malloc
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
ionic: catch NULL pointer issue on reconfig
net: hsr: avoid possible NULL deref in skb_clone()
bnxt_en: fix memory leak in bnxt_nvm_test()
ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed
udp: Update reuse->has_conns under reuseport_lock.
net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable()
...
Linus Torvalds [Fri, 21 Oct 2022 00:00:54 +0000 (17:00 -0700)]
Merge tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix dm-bufio to use test_bit_acquire to properly test_bit on arches
with weaker memory ordering.
- DM core replace DMWARN with DMERR or DMCRIT for fatal errors.
- Enable WQ_HIGHPRI on DM verity target's verify_wq.
- Add documentation for DM verity's try_verify_in_tasklet option.
- Various typo and redundant word fixes in code and/or comments.
* tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm clone: Fix typo in block_device format specifier
dm: remove unnecessary assignment statement in alloc_dev()
dm verity: Add documentation for try_verify_in_tasklet option
dm cache: delete the redundant word 'each' in comment
dm raid: fix typo in analyse_superblocks code comment
dm verity: enable WQ_HIGHPRI on verify_wq
dm raid: delete the redundant word 'that' in comment
dm: change from DMWARN to DMERR or DMCRIT for fatal errors
dm bufio: use the acquire memory barrier when testing for B_READING
Alex Deucher [Wed, 19 Oct 2022 20:57:42 +0000 (16:57 -0400)]
drm/amdgpu: fix sdma doorbell init ordering on APUs
Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error. This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.
This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.
Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: skhan@linuxfoundation.org Cc: stable@vger.kernel.org
Steven Price [Mon, 17 Oct 2022 10:46:02 +0000 (11:46 +0100)]
drm/panfrost: replace endian-specific types with native ones
__le32 and __le64 types aren't portable and are not available on
FreeBSD (which uses the same uAPI).
Instead of attempting to always output little endian, just use native
endianness in the dumps. Tools can detect the endianness in use by
looking at the 'magic' field, but equally we don't expect big-endian to
be used with Mali (there are no known implementations out there).
Steven Price [Mon, 17 Oct 2022 10:46:01 +0000 (11:46 +0100)]
drm/panfrost: Remove type name from internal structs
The two structs internal to struct panfrost_dump_object_header were
named, but sadly that is incompatible with C++, causing an error: "an
anonymous union may only have public non-static data members".
However nothing refers to struct pan_reg_hdr and struct pan_bomap_hdr
and there's no need to export these definitions, so lets drop them. This
fixes the C++ build error with the minimum change in userspace API.
Maxime Ripard [Wed, 19 Oct 2022 14:34:42 +0000 (16:34 +0200)]
drm/connector: Set DDC pointer in drmm_connector_init
Commit 35a3b82f1bdd ("drm/connector: Introduce drmm_connector_init")
introduced the function drmm_connector_init() with a parameter for an
optional ddc pointer to the i2c controller used to access the DDC bus.
However, the underlying call to __drm_connector_init() was always
setting it to NULL instead of passing the ddc argument around.
This resulted in unexpected null pointer dereference on platforms
expecting to get a DDC controller.
David Gow [Wed, 19 Oct 2022 07:32:40 +0000 (15:32 +0800)]
drm: tests: Fix a buffer overflow in format_helper_test
The xrgb2101010 format conversion test (unlike for other formats) does
an endianness conversion on the results. However, it always converts
TEST_BUF_SIZE 32-bit integers, which results in reading from (and
writing to) more memory than in present in the result buffer. Instead,
use the buffer size, divided by sizeof(u32).
The issue could be reproduced with KASAN:
./tools/testing/kunit/kunit.py run --kunitconfig drivers/gpu/drm/tests \
--kconfig_add CONFIG_KASAN=y --kconfig_add CONFIG_KASAN_VMALLOC=y \
--kconfig_add CONFIG_KASAN_KUNIT_TEST=y \
drm_format_helper_test.*xrgb2101010
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Fixes: 453114319699 ("drm/format-helper: Add KUnit tests for drm_fb_xrgb8888_to_xrgb2101010()") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Maíra Canal <mairacanal@riseup.net> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: José Expósito <jose.exposito89@gmail.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221019073239.3779180-1-davidgow@google.com
Felix Riemann [Tue, 18 Oct 2022 10:47:54 +0000 (12:47 +0200)]
net: phy: dp83822: disable MDI crossover status change interrupt
If the cable is disconnected the PHY seems to toggle between MDI and
MDI-X modes. With the MDI crossover status interrupt active this causes
roughly 10 interrupts per second.
As the crossover status isn't checked by the driver, the interrupt can
be disabled to reduce the interrupt load.
Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission") Signed-off-by: Felix Riemann <felix.riemann@sma.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 18 Oct 2022 20:32:58 +0000 (20:32 +0000)]
net: sched: fix race condition in qdisc_graft()
We had one syzbot report [1] in syzbot queue for a while.
I was waiting for more occurrences and/or a repro but
Dmitry Vyukov spotted the issue right away.
<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>
Indeed, RCU rules are clear when replacing a data structure.
The visible pointer (dev->qdisc in this case) must be updated
to the new object _before_ RCU grace period is started
(qdisc_put(old) in this case).
[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000, ffff88802065e400)
Yang Yingliang [Tue, 18 Oct 2022 12:24:51 +0000 (20:24 +0800)]
net: hns: fix possible memory leak in hnae_ae_register()
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().
Yang Yingliang [Tue, 18 Oct 2022 13:16:07 +0000 (21:16 +0800)]
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().
Pieter Jansen van Vuuren [Tue, 18 Oct 2022 09:28:41 +0000 (10:28 +0100)]
sfc: include vport_id in filter spec hash and equal()
Filters on different vports are qualified by different implicit MACs and/or
VLANs, so shouldn't be considered equal even if their other match fields
are identical.
Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10") Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1) Missing flowi uid field in nft_fib expression, from Guillaume Nault.
This is broken since the creation of the fib expression.
2) Relax sanity check to fix bogus EINVAL error when deleting elements
belonging set intervals. Broken since 6.0-rc.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
====================
Jakub Kicinski [Tue, 18 Oct 2022 23:13:10 +0000 (16:13 -0700)]
genetlink: fix kdoc warnings
Address a bunch of kdoc warnings:
include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family'
include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead
include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast'
include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead
include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info'
GONG, Ruiqi [Wed, 19 Oct 2022 02:57:10 +0000 (10:57 +0800)]
selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
The following warning was triggered on a hardware environment:
SELinux: Converting 162 SID table entries...
BUG: sleeping function called from invalid context at
__might_sleep+0x60/0x74 0x0
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 5943, name: tar
CPU: 7 PID: 5943 Comm: tar Tainted: P O 5.10.0 #1
Call trace:
dump_backtrace+0x0/0x1c8
show_stack+0x18/0x28
dump_stack+0xe8/0x15c
___might_sleep+0x168/0x17c
__might_sleep+0x60/0x74
__kmalloc_track_caller+0xa0/0x7dc
kstrdup+0x54/0xac
convert_context+0x48/0x2e4
sidtab_context_to_sid+0x1c4/0x36c
security_context_to_sid_core+0x168/0x238
security_context_to_sid_default+0x14/0x24
inode_doinit_use_xattr+0x164/0x1e4
inode_doinit_with_dentry+0x1c0/0x488
selinux_d_instantiate+0x20/0x34
security_d_instantiate+0x70/0xbc
d_splice_alias+0x4c/0x3c0
ext4_lookup+0x1d8/0x200 [ext4]
__lookup_slow+0x12c/0x1e4
walk_component+0x100/0x200
path_lookupat+0x88/0x118
filename_lookup+0x98/0x130
user_path_at_empty+0x48/0x60
vfs_statx+0x84/0x140
vfs_fstatat+0x20/0x30
__se_sys_newfstatat+0x30/0x74
__arm64_sys_newfstatat+0x1c/0x2c
el0_svc_common.constprop.0+0x100/0x184
do_el0_svc+0x1c/0x2c
el0_svc+0x20/0x34
el0_sync_handler+0x80/0x17c
el0_sync+0x13c/0x140
SELinux: Context system_u:object_r:pssp_rsyslog_log_t:s0:c0 is
not valid (left unmapped).
It was found that within a critical section of spin_lock_irqsave in
sidtab_context_to_sid(), convert_context() (hooked by
sidtab_convert_params.func) might cause the process to sleep via
allocating memory with GFP_KERNEL, which is problematic.
As Ondrej pointed out [1], convert_context()/sidtab_convert_params.func
has another caller sidtab_convert_tree(), which is okay with GFP_KERNEL.
Therefore, fix this problem by adding a gfp_t argument for
convert_context()/sidtab_convert_params.func and pass GFP_KERNEL/_ATOMIC
properly in individual callers.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221018120111.1474581-1-gongruiqi1@huawei.com/ Reported-by: Tan Ninghao <tanninghao1@huawei.com> Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance") Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: wrap long BUG() output lines, tweak subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
Paul Blakey [Tue, 18 Oct 2022 07:34:39 +0000 (10:34 +0300)]
selftests: add selftest for chaining of tc ingress handling to egress
This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.
Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Tue, 18 Oct 2022 07:34:38 +0000 (10:34 +0300)]
net: Fix return value of qdisc ingress handling on success
Currently qdisc ingress handling (sch_handle_ingress()) doesn't
set a return value and it is left to the old return value of
the caller (__netif_receive_skb_core()) which is RX drop, so if
the packet is consumed, caller will stop and return this value
as if the packet was dropped.
This causes a problem in the kernel tcp stack when having a
egress tc rule forwarding to a ingress tc rule.
The tcp stack sending packets on the device having the egress rule
will see the packets as not successfully transmitted (although they
actually were), will not advance it's internal state of sent data,
and packets returning on such tcp stream will be dropped by the tcp
stack with reason ack-of-unsent-data. See reproduction in [0] below.
Fix that by setting the return value to RX success if
the packet was handled successfully.
[0] Reproduction steps:
$ ip link add veth1 type veth peer name peer1
$ ip link add veth2 type veth peer name peer2
$ ifconfig peer1 5.5.5.6/24 up
$ ip netns add ns0
$ ip link set dev peer2 netns ns0
$ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
$ ifconfig veth2 0 up
$ ifconfig veth1 0 up
#ingress forwarding veth1 <-> veth2
$ tc qdisc add dev veth2 ingress
$ tc qdisc add dev veth1 ingress
$ tc filter add dev veth2 ingress prio 1 proto all flower \
action mirred egress redirect dev veth1
$ tc filter add dev veth1 ingress prio 1 proto all flower \
action mirred egress redirect dev veth2
#steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
$ tc qdisc add dev peer1 clsact
$ tc filter add dev peer1 egress prio 20 proto ip flower \
action mirred ingress redirect dev veth1
#run iperf and see connection not running
$ iperf3 -s&
$ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
#delete egress rule, and run again, now should work
$ tc filter del dev peer1 egress
$ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
Fixes: f697c3e8b35c ("[NET]: Avoid unnecessary cloning for ingress filtering") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Oct 2022 12:47:09 +0000 (13:47 +0100)]
Merge branch 'qdisc-null-deref'
Zhengchao Shao says:
====================
net: fix null pointer access issue in qdisc
These three patches fix the same type of problem. Set the default qdisc,
and then construct an init failure scenario when the dev qdisc is
configured on mqprio to trigger the reset process. NULL pointer access
may occur during the reset process.
---
v2: for fq_codel, revert the patch
---
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is sfb, if the qdisc of dev_queue fails to be
inited during mqprio_init(), sfb_reset() is invoked to clear resources.
In this case, the q->qdisc is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
sfb_init()
tcf_block_get() --->failed, q->qdisc is NULL
...
qdisc_put()
...
sfb_reset()
qdisc_reset(q->qdisc) --->q->qdisc is NULL
ops = qdisc->ops
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
RIP: 0010:qdisc_reset+0x2b/0x6f0
Call Trace:
<TASK>
sfb_reset+0x37/0xd0
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2164122d04
</TASK>
Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is fq_codel, if the qdisc of dev_queue fails to be
inited during mqprio_init(), fq_codel_reset() is invoked to clear
resources. In this case, the flow is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
fq_codel_init()
...
q->flows_cnt = 1024;
...
q->flows = kvcalloc(...) --->failed, q->flows is NULL
...
qdisc_put()
...
fq_codel_reset()
...
flow = q->flows + i --->q->flows is NULL
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
RIP: 0010:fq_codel_reset+0x14d/0x350
Call Trace:
<TASK>
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fd272b22d04
</TASK>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is cake, if the qdisc of dev_queue fails to be
inited during mqprio_init(), cake_reset() is invoked to clear
resources. In this case, the tins is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
cake_init()
q->tins = kvcalloc(...) --->failed, q->tins is NULL
...
qdisc_put()
...
cake_reset()
...
cake_dequeue_one()
b = &q->tins[...] --->q->tins is NULL
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:cake_dequeue_one+0xc9/0x3c0
Call Trace:
<TASK>
cake_reset+0xb1/0x140
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f89e5122d04
</TASK>
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Christian König [Fri, 7 Oct 2022 07:51:13 +0000 (09:51 +0200)]
drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.
Guillaume Nault [Thu, 13 Oct 2022 14:37:47 +0000 (16:37 +0200)]
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
Currently netfilter's rpfilter and fib modules implicitely initialise
->flowic_uid with 0. This is normally the root UID. However, this isn't
the case in user namespaces, where user ID 0 is mapped to a different
kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
the root UID of the user namespace, thus keeping the same behaviour
whether or not we're running in a user namepspace.
Note, this is similar to commit 8bcfd0925ef1 ("ipv4: add missing
initialization for flowi4_uid"), which fixed the rp_filter sysctl.
Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Brett Creeley [Mon, 17 Oct 2022 23:31:23 +0000 (16:31 -0700)]
ionic: catch NULL pointer issue on reconfig
It's possible that the driver will dereference a qcq that doesn't exist
when calling ionic_reconfigure_queues(), which causes a page fault BUG.
If a reduction in the number of queues is followed by a different
reconfig such as changing the ring size, the driver can hit a NULL
pointer when trying to clean up non-existent queues.
Fix this by checking to make sure both the qcqs array and qcq entry
exists bofore trying to use and free the entry.
Fixes: 101b40a0171f ("ionic: change queue count with no reset") Signed-off-by: Brett Creeley <brett@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.
In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.
Fixes: 312b4dc11d4f ("drm/amdgpu: Fix VRAM BO swap issue") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YuBiao Wang [Thu, 13 Oct 2022 03:31:55 +0000 (11:31 +0800)]
drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.
[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.
v2: Move the dequeue operation inside kiq_hw_fini.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang's kernel Control Flow Integrity (kCFI) makes sure that all
indirect call targets have a type that exactly matches the function
pointer prototype. In this case, hqd_destroy()'s third parameter,
reset_type, should have a type of 'uint32_t' but every implementation of
this callback has a third parameter type of 'enum kfd_preempt_type'.
Update the function pointer prototype to match reality so that there is
no more CFI violation.
Guenter Roeck [Thu, 13 Oct 2022 18:25:23 +0000 (11:25 -0700)]
drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
Building 32-bit images may fail with the following error.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
In function ‘dml32_UseMinimumDCFCLK’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
error: the frame size of 1096 bytes is larger than 1024 bytes
This is seen when building i386:allmodconfig with any of the following
compilers.
The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
because in that case CONFIG_FRAME_WARN is already set to 2048 even for
32-bit builds.
dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85e433
("drm/amd/display: DML changes for DCN32/321"). It declares a large
number of local variables. Increase the frame size for the affected
file to 2048, similar to other files in the same directory, to enable
32-bit build tests with affected compilers.
Tim Huang [Thu, 29 Sep 2022 07:06:47 +0000 (15:06 +0800)]
drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
The pmfw has changed the driver interface version, so keep same with the
fw.
Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tim Huang [Thu, 29 Sep 2022 06:39:21 +0000 (14:39 +0800)]
drm/amd/pm: update SMU IP v13.0.4 driver interface version
Update the SMU driver interface version to V7.
Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ZhenGuo Yin [Wed, 12 Oct 2022 08:54:38 +0000 (16:54 +0800)]
drm/amd/pm: Init pm_attr_list when dpm is disabled
[Why]
In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't
be initialized. There will be a NULL pointer call trace after
removing the dpm check condition in amdgpu_pm_sysfs_fini.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu]
Call Trace:
<TASK>
amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu]
amdgpu_device_fini_hw+0xdf/0x290 [amdgpu]
[How]
List pm_attr_list should be initialized when dpm is disabled.
Fixes: a6ad27cec585fe ("drm/amd/pm: Remove redundant check condition") Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Zhao [Thu, 13 Oct 2022 07:53:19 +0000 (15:53 +0800)]
drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask
v2: squash in unused variable removal (Alex)
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.
Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Danijel Slivka [Tue, 4 Oct 2022 13:39:44 +0000 (15:39 +0200)]
drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.
v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime
Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Tue, 18 Oct 2022 18:25:50 +0000 (11:25 -0700)]
Merge tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fiemap fixes:
- add missing path cache update
- fix processing of delayed data and tree refs during backref
walking, this could lead to reporting incorrect extent sharing
- fix extent range locking under heavy contention to avoid deadlocks
- make it possible to test send v3 in debugging mode
- update links in MAINTAINERS
* tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: update btrfs website links and files
btrfs: ignore fiemap path cache if we have multiple leaves for a data extent
btrfs: fix processing of delayed tree block refs during backref walking
btrfs: fix processing of delayed data refs during backref walking
btrfs: delete stale comments after merge conflict resolution
btrfs: unlock locked extent area if we have contention
btrfs: send: update command for protocol version check
btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG
btrfs: add missing path cache update during fiemap
Linus Torvalds [Tue, 18 Oct 2022 18:18:26 +0000 (11:18 -0700)]
Merge tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix invalid unmapped accesses when initializing compressed inodes
- Fix up very rare hung on page lock after enabling compressed data
deduplication
- Fix up inplace decompression success rate
- Take s_inode_list_lock to protect sb->s_inodes for fscache shared
domain
* tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: protect s_inodes with s_inode_list_lock for fscache
erofs: fix up inplace decompression success rate
erofs: shouldn't churn the mapping page for duplicated copies
erofs: fix illegal unmapped accesses in z_erofs_fill_inode_lazy()
Mikulas Patocka [Tue, 18 Oct 2022 14:06:45 +0000 (10:06 -0400)]
dm bufio: use the acquire memory barrier when testing for B_READING
The function test_bit doesn't provide any memory barrier. It may be
possible that the read requests that follow test_bit(B_READING, &b->state)
are reordered before the test, reading invalid data that existed before
B_READING was cleared.
Fix this bug by changing test_bit to test_bit_acquire. This is
particularly important on arches with weak(er) memory ordering
(e.g. arm64).
Depends-On: 8238b4579866 ("wait_on_bit: add an acquire memory barrier")
Depends-On: d6ffe6067a54 ("provide arch_test_bit_acquire for architectures that define test_bit") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Zhengchao Shao [Mon, 17 Oct 2022 08:03:31 +0000 (16:03 +0800)]
ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed
If the initialization fails in calling addrconf_init_net(), devconf_all is
the pointer that has been released. Then ip6mr_sk_done() is called to
release the net, accessing devconf->mc_forwarding directly causes invalid
pointer access.
The process is as follows:
setup_net()
ops_init()
addrconf_init_net()
all = kmemdup(...) ---> alloc "all"
...
net->ipv6.devconf_all = all;
__addrconf_sysctl_register() ---> failed
...
kfree(all); ---> ipv6.devconf_all invalid
...
ops_exit_list()
...
ip6mr_sk_done()
devconf = net->ipv6.devconf_all;
//devconf is invalid pointer
if (!devconf || !atomic_read(&devconf->mc_forwarding))
The following is the Call Trace information:
BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0
Read of size 4 at addr ffff888075508e88 by task ip/14554
Call Trace:
<TASK>
dump_stack_lvl+0x8e/0xd1
print_report+0x155/0x454
kasan_report+0xba/0x1f0
kasan_check_range+0x35/0x1b0
ip6mr_sk_done+0x112/0x3a0
rawv6_close+0x48/0x70
inet_release+0x109/0x230
inet6_release+0x4c/0x70
sock_release+0x87/0x1b0
igmp6_net_exit+0x6b/0x170
ops_exit_list+0xb0/0x170
setup_net+0x7ac/0xbd0
copy_net_ns+0x2e6/0x6b0
create_new_namespaces+0x382/0xa50
unshare_nsproxy_namespaces+0xa6/0x1c0
ksys_unshare+0x3a4/0x7e0
__x64_sys_unshare+0x2d/0x40
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f7963322547
Kuniyuki Iwashima [Fri, 14 Oct 2022 18:26:25 +0000 (11:26 -0700)]
udp: Update reuse->has_conns under reuseport_lock.
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could select a unconnected socket wrongly for packets sent to the
connected socket.
However, the current way to set has_conns is illegal and possible to
trigger that problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater. Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.
For this reason, there is a race below where we fail to set has_conns
resulting in the wrong socket selection. To avoid the race, let's split
the reader and updater with proper locking.
Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]
For the record, usually, sk_reuseport_cb is changed under lock_sock().
The only exception is reuseport_grow() & TCP reqsk migration case.
1) shutdown() TCP listener, which is moved into the latter part of
reuse->socks[] to migrate reqsk.
2) New listen() overflows reuse->socks[] and call reuseport_grow().
3) reuse->max_socks overflows u16 with the new listener.
4) reuseport_grow() pops the old shutdown()ed listener from the array
and update its sk->sk_reuseport_cb as NULL without lock_sock().
shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
so likely(reuse) never be false in reuseport_has_conns_set().
Linus Torvalds [Tue, 18 Oct 2022 01:52:43 +0000 (18:52 -0700)]
Merge tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix a recent regression where a sleeping kernfs function is called
with css_set_lock (spinlock) held
- Revert the commit to enable cgroup1 support for cgroup_get_from_fd/file()
Multiple users assume that the lookup only works for cgroup2 and
breaks when fed a cgroup1 file. Instead, introduce a separate set of
functions to lookup both v1 and v2 and use them where the user
explicitly wants to support both versions.
- Compat update for tools/perf/util/bpf_skel/bperf_cgroup.bpf.c.
- Add Josef Bacik as a blkcg maintainer.
* tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Update MAINTAINERS entry
mm: cgroup: fix comments for get from fd/file helpers
perf stat: Support old kernels for bperf cgroup counting
bpf: cgroup_iter: support cgroup1 using cgroup fd
cgroup: add cgroup_v1v2_get_from_[fd/file]()
Revert "cgroup: enable cgroup_get_from_file() on cgroup1"
cgroup: Reorganize css_set_lock and kernfs path processing