Linus Torvalds [Fri, 6 Jan 2023 21:12:42 +0000 (13:12 -0800)]
Merge tag 'block-2023-01-06' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"The big change here is obviously the revert of the pktcdvd driver
removal. Outside of that, just minor tweaks. In detail:
- Re-instate the pktcdvd driver, which necessitates adding back
bio_copy_data_iter() and the fops->devnode() hook for now (me)
- Fix for splitting of a bio marked as NOWAIT, causing either nowait
reads or writes to error with EAGAIN even if parts of the IO
completed (me)
- Fix for ublk, punting management commands to io-wq as they can all
easily block for extended periods of time (Ming)
- Removal of SRCU dependency for the block layer (Paul)"
* tag 'block-2023-01-06' of git://git.kernel.dk/linux:
block: Remove "select SRCU"
Revert "pktcdvd: remove driver."
Revert "block: remove devnode callback from struct block_device_operations"
Revert "block: bio_copy_data_iter"
ublk: honor IO_URING_F_NONBLOCK for handling control command
block: don't allow splitting of a REQ_NOWAIT bio
block: handle bio_split_to_limits() NULL return
Linus Torvalds [Fri, 6 Jan 2023 20:54:51 +0000 (12:54 -0800)]
Merge tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linux
Pull arm TIF_NOTIFY_SIGNAL fixup from Jens Axboe:
"Hui Tang reported a performance regressions with _TIF_WORK_MASK in
newer kernels, which he tracked to a change that went into 5.11. After
this change, we'll call do_work_pending() more often than we need to,
because we're now testing bits 0..15 rather than just 0..7.
Shuffle the bits around to avoid this"
* tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linux:
ARM: renumber bits related to _TIF_WORK_MASK
Linus Torvalds [Fri, 6 Jan 2023 20:11:41 +0000 (12:11 -0800)]
Merge tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two file locking fixes from Xiubo"
* tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client:
ceph: avoid use-after-free in ceph_fl_release_lock()
ceph: switch to vfs_inode_has_locks() to fix file lock bug
Linus Torvalds [Fri, 6 Jan 2023 20:07:00 +0000 (12:07 -0800)]
Merge tag 'fixes_for_v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
"Two fixups of the UDF changes that went into 6.2-rc1"
* tag 'fixes_for_v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: initialize newblock to 0
udf: Fix extension of the last extent in the file
Linus Torvalds [Fri, 6 Jan 2023 20:01:49 +0000 (12:01 -0800)]
Merge tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more regression and regular fixes:
- regressions:
- fix assertion condition using = instead of ==
- fix false alert on bad tree level check
- fix off-by-one error in delalloc search during lseek
- fix compat ro feature check at read-write remount
- handle case when read-repair happens with ongoing device replace
- updated error messages"
* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix compat_ro checks against remount
btrfs: always report error in run_one_delayed_ref()
btrfs: handle case when repair happens with dev-replace
btrfs: fix off-by-one in delalloc search during lseek
btrfs: fix false alert on bad tree level check
btrfs: add error message for metadata level mismatch
btrfs: fix ASSERT em->len condition in btrfs_get_extent
Linus Torvalds [Fri, 6 Jan 2023 19:33:42 +0000 (11:33 -0800)]
Merge tag 'riscv-for-linus-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- use the correct mask for c.jr/c.jalr when decoding instructions
- build fix for get_user() to avoid a sparse warning
* tag 'riscv-for-linus-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
Linus Torvalds [Fri, 6 Jan 2023 19:23:58 +0000 (11:23 -0800)]
Merge tag 'perf-tools-fixes-for-v6.2-1-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix segfault when trying to process tracepoints present in a
perf.data file and not linked with libtraceevent.
- Fix build on uClibc systems by adding missing sys/types.h include,
that was being obtained indirectly which stopped being the case when
tools/lib/traceevent was removed.
- Don't show commands in 'perf help' that depend on linking with
libtraceevent when not building with that library, which is now a
possibility since we no longer ship a copy in tools/lib/traceevent.
- Fix failure in 'perf test' entry testing the combination of 'perf
probe' user space function + 'perf record' + 'perf script' where it
expects a backtrace leading to glibc's inet_pton() from 'ping' that
now happens more than once with glibc 2.35 for IPv6 addreses.
- Fix for the inet_pton perf test on s/390 where
'text_to_binary_address' now appears on the backtrace.
- Fix build error on riscv due to missing header for 'struct
perf_sample'.
- Fix 'make -C tools perf_install' install variant by not propagating
the 'subdir' to submakes for the 'install_headers' targets.
- Fix handling of unsupported cgroup events when using BPF counters in
'perf stat'.
- Count all cgroups, not just the last one when using 'perf stat' and
combining --for-each-cgroup with --bpf-counters.
This makes the output using BPF counters match the output without
using it, which was the intention all along, the output should be the
same using --bpf-counters or not.
- Fix 'perf lock contention' core dump related to not finding the
"__sched_text_end" symbol on s/390.
- Fix build failure when HEAD is signed: exclude the signature from the
version string.
- Add missing closedir() calls to in perf_data__open_dir(), plugging a
fd leak.
* tag 'perf-tools-fixes-for-v6.2-1-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Fix build on uClibc systems by adding missing sys/types.h include
perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
perf stat: Fix handling of unsupported cgroup events when using BPF counters
perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace
perf lock contention: Fix core dump related to not finding the "__sched_text_end" symbol on s/390
perf build: Don't propagate subdir to submakes for install_headers
perf test record_probe_libc_inet_pton: Fix failure due to extra inet_pton() backtrace in glibc >= 2.35
perf tools: Fix segfault when trying to process tracepoints in perf.data and not linked with libtraceevent
perf tools: Don't include signature in version strings
perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands
perf tools riscv: Fix build error on riscv due to missing header for 'struct perf_sample'
perf tools: Fix resources leak in perf_data__open_dir()
Linus Torvalds [Fri, 6 Jan 2023 19:20:12 +0000 (11:20 -0800)]
Merge tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"Intel RAPL updates for new model IDs"
* tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Add support for Intel Emerald Rapids
perf/x86/rapl: Add support for Intel Meteor Lake
perf/x86/rapl: Treat Tigerlake like Icelake
Linus Torvalds [Fri, 6 Jan 2023 19:14:11 +0000 (11:14 -0800)]
Merge tag 'v6.2-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a CFI crash in arm64/sm4 as well as a regression in the
caam driver"
* tag 'v6.2-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/sm4 - fix possible crash with CFI enabled
crypto: caam - fix CAAM io mem access in blob_gen
Tom Rix [Fri, 30 Dec 2022 17:53:41 +0000 (12:53 -0500)]
udf: initialize newblock to 0
The clang build reports this error
fs/udf/inode.c:805:6: error: variable 'newblock' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (*err < 0)
^~~~~~~~
newblock is never set before error handling jump.
Initialize newblock to 0 and remove redundant settings.
Fixes: d8b39db5fab8 ("udf: Handle error when adding extent to a file") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20221230175341.1629734-1-trix@redhat.com>
Jan Kara [Wed, 21 Dec 2022 16:45:51 +0000 (17:45 +0100)]
udf: Fix extension of the last extent in the file
When extending the last extent in the file within the last block, we
wrongly computed the length of the last extent. This is mostly a
cosmetical problem since the extent does not contain any data and the
length will be fixed up by following operations but still.
Fixes: 1f3868f06855 ("udf: Fix extending file within last block") Signed-off-by: Jan Kara <jack@suse.cz>
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
Ben Dooks [Thu, 29 Dec 2022 17:05:45 +0000 (17:05 +0000)]
riscv: uaccess: fix type of 0 variable on error in get_user()
If the get_user(x, ptr) has x as a pointer, then the setting
of (x) = 0 is going to produce the following sparse warning,
so fix this by forcing the type of 'x' when access_ok() fails.
fs/aio.c:2073:21: warning: Using plain integer as NULL pointer
Linus Torvalds [Thu, 5 Jan 2023 20:06:40 +0000 (12:06 -0800)]
Merge tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A reference leak fix, two fixes for using uninitialized variables and
more drivers converted to using immutable irqchips:
- fix a reference leak in gpio-sifive
- fix a potential use of an uninitialized variable in core gpiolib
- fix a potential use of an uninitialized variable in gpio-pca953x
- make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd
and gpio-sprd"
* tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sifive: Fix refcount leak in sifive_gpio_probe
gpio: sprd: Make the irqchip immutable
gpio: pmic-eic-sprd: Make the irqchip immutable
gpio: eic-sprd: Make the irqchip immutable
gpio: pca953x: avoid to use uninitialized value pinctrl
gpiolib: Fix using uninitialized lookup-flags on ACPI platforms
Linus Torvalds [Thu, 5 Jan 2023 19:24:33 +0000 (11:24 -0800)]
Merge tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
- Fix Matrox G200eW initialization failure
- Fix build failure of offb driver when built as module
- Optimize stack usage in omapfb
* tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: omapfb: avoid stack overflow warning
fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
fbdev: atyfb: use strscpy() to instead of strncpy()
fbdev: omapfb: use strscpy() to instead of strncpy()
fbdev: make offb driver tristate
Paul E. McKenney [Thu, 5 Jan 2023 00:37:53 +0000 (16:37 -0800)]
block: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 5 Jan 2023 10:49:15 +0000 (10:49 +0000)]
io_uring: fix CQ waiting timeout handling
Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in
particular we rearm it anew every time we get into
io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2
CQEs and getting a task_work in the middle may double the timeout value,
or even worse in some cases task may wait indefinitely.
Arnd Bergmann [Thu, 15 Dec 2022 17:02:28 +0000 (18:02 +0100)]
fbdev: omapfb: avoid stack overflow warning
The dsi_irq_stats structure is a little too big to fit on the
stack of a 32-bit task, depending on the specific gcc options:
fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs':
fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since this is only a debugfs file, performance is not critical,
so just dynamically allocate it, and print an error message
in there in place of a failure code when the allocation fails.
Caleb Sander [Tue, 3 Jan 2023 23:30:21 +0000 (16:30 -0700)]
qed: allow sleep in qed_mcp_trace_dump()
By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop
that can run 500K times, so calls to qed_mcp_nvm_rd_cmd()
may block the current thread for over 5s.
We observed thread scheduling delays over 700ms in production,
with stacktraces pointing to this code as the culprit.
qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted.
It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd().
Add a "can sleep" parameter to qed_find_nvram_image() and
qed_nvram_read() so they can sleep during qed_mcp_trace_dump().
qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(),
called only by qed_mcp_trace_dump(), allow these functions to sleep.
I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep,
so keep b_can_sleep set to false when it calls these functions.
Jakub Kicinski [Thu, 5 Jan 2023 04:17:19 +0000 (20:17 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
bpf 2023-01-04
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 5 files changed, 112 insertions(+), 18 deletions(-).
The main changes are:
1) Always use maximal size for copy_array in the verifier to fix
KASAN tracking, from Kees.
2) Fix bpf task iterator walking through dead tasks, from Kui-Feng.
3) Make sure livepatch and bpf fexit can coexist, from Chuang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Always use maximal size for copy_array()
selftests/bpf: add a test for iter/task_vma for short-lived processes
bpf: keep a reference to the mm, in case the task is dead.
selftests/bpf: Temporarily disable part of btf_dump:var_data test.
bpf: Fix panic due to wrong pageattr of im->image
====================
Linus Torvalds [Thu, 5 Jan 2023 01:13:53 +0000 (17:13 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Mostly fixes all over the place, a couple of cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits)
virtio_blk: Fix signedness bug in virtblk_prep_rq()
vdpa_sim_net: should not drop the multicast/broadcast packet
vdpasim: fix memory leak when freeing IOTLBs
vdpa: conditionally fill max max queue pair for stats
vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
vduse: Validate vq_num in vduse_validate_config()
tools/virtio: remove smp_read_barrier_depends()
tools/virtio: remove stray characters
vhost_vdpa: fix the crash in unmap a large memory
virtio: Implementing attribute show with sysfs_emit
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
tools/virtio: Variable type completion
vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
virtio_blk: use UINT_MAX instead of -1U
vhost-vdpa: fix an iotlb memory leak
vhost: fix range used in translate_desc()
vringh: fix range used in iotlb_translate()
vhost/vsock: Fix error handling in vhost_vsock_init()
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
tools: Delete the unneeded semicolon after curly braces
...
There are apparently still users out there of this driver. While we'd
love to remove it to ease the maintenance burden, let's reinstate it
for now until better (userspace) solutions can be developed.
Jens Axboe [Wed, 4 Jan 2023 20:49:54 +0000 (13:49 -0700)]
io_uring: move 'poll_multi_queue' bool in io_ring_ctx
The cacheline section holding this variable has two gaps, where one is
caused by this bool not packing well with structs. This causes it to
blow into the next cacheline. Move the variable, shrinking io_ring_ctx
by a full cacheline in size.
Jens Axboe [Wed, 4 Jan 2023 15:52:06 +0000 (08:52 -0700)]
block: don't allow splitting of a REQ_NOWAIT bio
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious
EAGAIN if constituent parts of that split bio end up failing request
allocations. Parts will complete just fine, but just a single failure
in one of the chained bios will yield an EAGAIN final result for the
parent bio.
Return EAGAIN early if we end up needing to split such a bio, which
allows for saner recovery handling.
Cc: stable@vger.kernel.org # 5.15+ Link: https://github.com/axboe/liburing/issues/766 Reported-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Wed, 4 Jan 2023 20:11:29 +0000 (12:11 -0800)]
Merge tag 'x86-urgent-2023-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
"Fix a double-free bug, a binutils warning, a header namespace clash
and a bug in ib_prctl_set()"
* tag 'x86-urgent-2023-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Flush IBP in ib_prctl_set()
x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type
x86/asm: Fix an assembler warning with current binutils
x86/kexec: Fix double-free of elf header buffer
Linus Torvalds [Wed, 4 Jan 2023 20:02:26 +0000 (12:02 -0800)]
Merge tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
- fix a null pointer dereference in f2fs_issue_flush, which occurs by
the combination of mount/remount options.
- fix a bug in per-block age-based extent_cache newly introduced in
6.2-rc1, which reported a wrong age information in extent_cache.
- fix a kernel panic if extent_tree was not created, which was caught
by a wrong BUG_ON
* tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: let's avoid panic if extent_tree is not created
f2fs: should use a temp extent_info for lookup
f2fs: don't mix to use union values in extent_info
f2fs: initialize extent_cache parameter
f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()
Jesus Sanchez-Palencia [Wed, 4 Jan 2023 19:34:14 +0000 (11:34 -0800)]
perf tools: Fix build on uClibc systems by adding missing sys/types.h include
Not all libc implementations define ssize_t as part of stdio.h like
glibc does since the standard only requires this type to be defined by
unistd.h and sys/types.h. For this reason the perf build is currently
broken for toolchains based on uClibc, for instance.
Include sys/types.h explicitly to fix that.
Committer notes:
In addition, in the past this worked in uClibc test systems as there was
another way to get to sys/types.h that got removed in that cset:
tools/perf/util/trace-event.h
/usr/include/traceevent/event_parse.h # This got removed from util/trace-event.h in 378ef0f5d9d7f465
/usr/include/regex.h
/usr/include/sys/types.h
typedef __ssize_t ssize_t;
So the size_t that is used in tools/perf/util/trace-event.h was being
obtained indirectly, by chance.
Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system") Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20230104193414.606905-1-jesussanp@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Wed, 4 Jan 2023 19:26:36 +0000 (11:26 -0800)]
Merge tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a filecache UAF during NFSD shutdown
- Avoid exposing automounted mounts on NFS re-exports
* tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix handling of readdir in v4root vs. mount upcall timeout
nfsd: shut down the NFSv4 state objects before the filecache
Jens Axboe [Wed, 4 Jan 2023 15:51:19 +0000 (08:51 -0700)]
block: handle bio_split_to_limits() NULL return
This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.
Jens Axboe [Wed, 4 Jan 2023 14:48:37 +0000 (07:48 -0700)]
ARM: renumber bits related to _TIF_WORK_MASK
We want to ensure that the mask related to calling do_work_pending()
is within the first 16 bits. Move bits unrelated to that outside of
that range, to avoid spuriously calling do_work_pending() when we don't
need to.
Cc: stable@vger.kernel.org Fixes: 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL") Reported-and-tested-by: Hui Tang <tanghui20@huawei.com> Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Link: https://lore.kernel.org/lkml/7ecb8f3c-2aeb-a905-0d4a-aa768b9649b5@huawei.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
Namhyung Kim [Wed, 4 Jan 2023 06:44:02 +0000 (22:44 -0800)]
perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
The --for-each-cgroup can have the same cgroup multiple times, but this
confuses BPF counters (since they have the same cgroup id), making only
the last cgroup events to be counted.
Let's check the cgroup name before adding a new entry to the cgroups
list.
Before:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
As a reminder, to test with BPF counters one has to use BUILD_BPF_SKEL=1
in the make command line and have clang/llvm installed when building
perf, otherwise the --bpf-counters option will not be available:
# perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Error: unknown option `bpf-counters'
Usage: perf stat [<options>] [<command>]
-a, --all-cpus system-wide collection from all CPUs
<SNIP>
#
Fixes: bb1c15b60b981d10 ("perf stat: Support regex pattern in --for-each-cgroup") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: bpf@vger.kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20230104064402.1551516-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 4 Jan 2023 06:44:01 +0000 (22:44 -0800)]
perf stat: Fix handling of unsupported cgroup events when using BPF counters
When --for-each-cgroup option is used, it fails when any of events is
not supported and exits immediately. This is not how 'perf stat'
handles unsupported events.
Let's ignore the failure and proceed with others so that the output is
similar to when BPF counters are not used:
Before:
$ sudo ./perf stat -a --bpf-counters -e L1-icache-loads,L1-dcache-loads --for-each-cgroup system.slice,user.slice sleep 1
Failed to open first cgroup events
$
After it shows output similat to when --bpf-counters isn't specified:
$ sudo ./perf stat -a --bpf-counters -e L1-icache-loads,L1-dcache-loads --for-each-cgroup system.slice,user.slice sleep 1
Thomas Richter [Wed, 28 Dec 2022 14:57:03 +0000 (15:57 +0100)]
perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace
perf test '84: probe libc's inet_pton & backtrace it with ping' fails on
s390. Debugging revealed a changed stack trace for the ping command
using probes:
Srivatsa S. Bhat (VMware) [Tue, 3 Jan 2023 22:09:41 +0000 (14:09 -0800)]
MAINTAINERS: Update maintainers for ptp_vmw driver
Vivek has decided to transfer the maintainership of the VMware virtual
PTP clock driver (ptp_vmw) to Srivatsa and Deep. Update the
MAINTAINERS file to reflect this change, and also add Alexey as a
reviewer for the driver.
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Acked-by: Vivek Thampi <vivek@vivekthampi.com> Acked-by: Deep Shah <sdeep@vmware.com> Acked-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Wed, 4 Jan 2023 01:34:02 +0000 (01:34 +0000)]
io_uring: pin context while queueing deferred tw
Unlike normal tw, nothing prevents deferred tw to be executed right
after an tw item added to ->work_llist in io_req_local_work_add(). For
instance, the waiting task may get waken up by CQ posting or a normal
tw. Thus we need to pin the ring for the rest of io_req_local_work_add()
Thomas Richter [Fri, 30 Dec 2022 10:26:27 +0000 (11:26 +0100)]
perf lock contention: Fix core dump related to not finding the "__sched_text_end" symbol on s/390
The test case perf lock contention dumps core on s390. Run the following
commands:
# ./perf lock record -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 2.799 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
#
# ./perf lock contention
Segmentation fault (core dumped)
#
The function call stack is lengthy, here are the top 5 functions:
# gdb ./perf core.24048
GNU gdb (GDB) Fedora Linux 12.1-6.fc37
Core was generated by `./perf lock contention'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
(gdb) where
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
#1 0x000000000109f244 in callchain_id (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:957
#2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290, addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:586
#3 0x000000000109f4d0 in report_lock_contention_begin_event (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1004
#4 0x00000000010a00ae in evsel__process_contention_begin (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1254
#5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, event=0x3ff85601ef8, sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
.....
The issue is in function machine__is_lock_function() in file
./util/machine.c lines 3355:
/* should not fail from here */
sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end", &kmap);
machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)
On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
access and generates the core dump.
The reason why __sched_text_end is not in the symbol list on s390 is
simple:
When the symbol list is created at perf start up with function calls
two symbols have identical addresses and __sched_text_end is considered
duplicate (in ascending sort order) and removed from the symbol list.
Therefore it is missing and an invalid pointer reference occurs. The
code checks for symbol __sched_text_start and when it exists assumes
symbol __sched_text_end is also in the symbol table. However this is not
the case on s390.
Same situation exists for symbol __lock_text_start:
This symbol is also removed from the symbol table but used in function
machine__is_lock_function().
To fix this and keep duplicate symbols in the symbol table, set
symbol_conf.allow_aliases to true. This prevents the removal of
duplicate symbols in function symbols__fixup_duplicate().
Output After:
# ./perf lock contention
contended total wait max wait avg wait type caller
48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a
47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132
5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140
4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8
#
Fixes: 0d2997f750d1de39 ("perf lock: Look up callchain for the contended locks") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20221230102627.2410847-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 3 Jan 2023 07:09:16 +0000 (23:09 -0800)]
perf build: Don't propagate subdir to submakes for install_headers
subdir is added to the OUTPUT which fails as part of building
install_headers when passed from "make -C tools perf_install".
Committer testing:
The original reporter (see the Link: below) had trouble with this:
$ make -C tools perf_install
That ended up with errors like this:
/var/home/acme/git/perf-urgent/tools/scripts/Makefile.include:17: *** output directory "/var/home/acme/git/perf-urgent/tools/perf/libperf/perf/" does not exist. Stop.
With this patch applied we now get it installed at:
$ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
-rw-r--r--. 1 acme acme 1146 Jan 3 15:42 /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
And if we clean tools with:
$ make -C tools clean
it gets cleaned up:
$ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
ls: cannot access '/var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h': No such file or directory
$
Fixes: 746bd29e348f99b4 ("perf build: Use tools/lib headers from install path") Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/fa4b3115-d555-3d7f-54d1-018002e99350@secunet.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below
testcase,
- mount -o ro /dev/vda /mnt/f2fs
- mount -o rw,remount /dev/vda /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync
In order to fix this issue, let change as below:
- fix error path handling in f2fs_create_flush_cmd_control().
- allocate SM_I(sbi)->fcc_info even if readonly is on.
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Qu Wenruo [Wed, 21 Dec 2022 23:59:17 +0000 (07:59 +0800)]
btrfs: fix compat_ro checks against remount
[BUG]
Even with commit 81d5d61454c3 ("btrfs: enhance unsupported compat RO
flags handling"), btrfs can still mount a fs with unsupported compat_ro
flags read-only, then remount it RW:
# mount /dev/loop0 /mnt/btrfs
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
^^^ RW mount failed as expected ^^^
# dmesg -t | tail -n5
loop0: detected capacity change from 0 to 1048576
BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146)
BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
BTRFS info (device loop0): using free space tree
BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403)
BTRFS error (device loop0): open_ctree failed
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
^^^ RW remount succeeded unexpectedly ^^^
[CAUSE]
Currently we use btrfs_check_features() to check compat_ro flags against
our current mount flags.
That function get reused between open_ctree() and btrfs_remount().
But for btrfs_remount(), the super block we passed in still has the old
mount flags, thus btrfs_check_features() still believes we're mounting
read-only.
[FIX]
Replace the existing @sb argument with @is_rw_mount.
As originally we only use @sb to determine if the mount is RW.
Now it's callers' responsibility to determine if the mount is RW, and
since there are only two callers, the check is pretty simple:
- caller in open_ctree()
Just pass !sb_rdonly().
- caller in btrfs_remount()
Pass !(*flags & SB_RDONLY), as our check should be against the new
flags.
Now we can correctly reject the RW remount:
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
mount: /mnt/btrfs: mount point not mounted or bad option.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n 1
BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403)
Qu Wenruo [Mon, 26 Dec 2022 01:00:40 +0000 (09:00 +0800)]
btrfs: always report error in run_one_delayed_ref()
Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
if end users hit such problem, there will be no chance that
btrfs_debug() is enabled. This can lead to very little useful info for
debugging.
This patch will:
- Add extra info for error reporting
Including:
* logical bytenr
* num_bytes
* type
* action
* ref_mod
- Replace the btrfs_debug() with btrfs_err()
- Move the error reporting into run_one_delayed_ref()
This is to avoid use-after-free, the @node can be freed in the caller.
This error should only be triggered at most once.
As if run_one_delayed_ref() failed, we trigger the error message, then
causing the call chain to error out:
And we will abort the current transaction in btrfs_run_delayed_refs().
If we have to run delayed refs for the abort transaction,
run_one_delayed_ref() will just cleanup the refs and do nothing, thus no
new error messages would be output.
Qu Wenruo [Sun, 1 Jan 2023 01:02:21 +0000 (09:02 +0800)]
btrfs: handle case when repair happens with dev-replace
[BUG]
There is a bug report that a BUG_ON() in btrfs_repair_io_failure()
(originally repair_io_failure() in v6.0 kernel) got triggered when
replacing a unreliable disk:
Before the BUG_ON(), we got some read errors from the replace target
first, note the mirror number (3, which is beyond RAID1 duplication,
thus it's read from the replace target device).
Then at the BUG_ON() location, we are trying to writeback the repaired
sectors back the failed device.
The check looks like this:
ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
&map_length, &bioc, mirror_num);
if (ret)
goto out_counter_dec;
BUG_ON(mirror_num != bioc->mirror_num);
But inside btrfs_map_block(), we can modify bioc->mirror_num especially
for dev-replace:
Thus if we're repairing the replace target device, we're going to
trigger that BUG_ON().
But in reality, the read failure from the replace target device may be
that, our replace hasn't reached the range we're reading, thus we're
reading garbage, but with replace running, the range would be properly
filled later.
Thus in that case, we don't need to do anything but let the replace
routine to handle it.
[FIX]
Instead of a BUG_ON(), just skip the repair if we're repairing the
device replace target device.
Filipe Manana [Fri, 23 Dec 2022 18:28:53 +0000 (18:28 +0000)]
btrfs: fix off-by-one in delalloc search during lseek
During lseek, when searching for delalloc in a range that represents a
hole and that range has a length of 1 byte, we end up not doing the actual
delalloc search in the inode's io tree, resulting in not correctly
reporting the offset with data or a hole. This actually only happens when
the start offset is 0 because with any other start offset we round it down
by sector size.
Reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo
$ xfs_io -c "seek -d 0" /mnt/sdc/foo
Whence Result
DATA EOF
It should have reported an offset of 0 instead of EOF.
Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits()
to deal with inclusive ranges properly. These functions are already
supposed to work with inclusive end offsets, they just got it wrong in a
couple places due to off-by-one mistakes.
A test case for fstests will be added later.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/ Fixes: b6e833567ea1 ("btrfs: make hole and data seeking a lot more efficient") CC: stable@vger.kernel.org # 6.1 Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 28 Dec 2022 23:32:24 +0000 (07:32 +0800)]
btrfs: fix false alert on bad tree level check
[BUG]
There is a bug report that on a RAID0 NVMe btrfs system, under heavy
write load the filesystem can flip RO randomly.
With extra debugging, it shows some tree blocks failed to pass their
level checks, and if that happens at critical path of a transaction, we
abort the transaction:
BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1
BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5)
BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure
BTRFS info (device nvme0n1p3: state EA): forced readonly
[CAUSE]
The reporter has already bisected to commit 947a629988f1 ("btrfs: move
tree block parentness check into validate_extent_buffer()").
And with extra debugging, it shows we can have btrfs_tree_parent_check
filled with all zeros in the following call trace:
Currently we only copy the btrfs_tree_parent_check structure into bbio
at read_extent_buffer_pages() after we have assembled the bbio.
But as shown above, submit_extent_page() itself can already submit the
bbio, leaving the bbio->parent_check uninitialized, and cause the false
alert.
[FIX]
Instead of copying @check into bbio after bbio is assembled, we pass
@check in btrfs_bio_ctrl::parent_check, and copy the content of
parent_check in submit_one_bio() for metadata read.
By this we should be able to pass the needed info for metadata endio
verification, and fix the false alert.
Qu Wenruo [Wed, 28 Dec 2022 23:32:23 +0000 (07:32 +0800)]
btrfs: add error message for metadata level mismatch
From a recent regression report, we found that after commit 947a629988f1
("btrfs: move tree block parentness check into
validate_extent_buffer()") if we have a level mismatch (false alert
though), there is no error message at all.
This makes later debugging harder. This patch will add the proper error
message for such case.
Tanmay Bhushan [Sat, 31 Dec 2022 15:05:01 +0000 (16:05 +0100)]
btrfs: fix ASSERT em->len condition in btrfs_get_extent
The em->len value is supposed to be verified in the assertion condition
though we expect it to be same as the sectorsize.
Fixes: a196a8944f77 ("btrfs: do not reset extent map members for inline extents read") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
We see that its just the first call to inet_pton() that didn't came thru
getaddrinfo(), so if we ignore the first the script matches what it
expects, testing that using 'perf probe' + 'perf record' + 'perf script'
with callchains on userspace targets is producing the expected results.
Since we don't have a 'perf script --skip' to help us here, use tac +
grep to do that, resulting in a one liner that makes this script work on
both older glibc versions as well as with 2.35.
With it, on fedora 36, x86, glibc 2.35:
# perf test inet_pton
90: probe libc's inet_pton & backtrace it with ping : Ok
# perf test -v inet_pton
90: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 627197
ping 627220 1 267956.962402: probe_libc:inet_pton_1: (7f488bf314c0)
1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6)
fa6c6 getaddrinfo+0x126 (/usr/lib64/libc.so.6)
491e n (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
#
And on Ubuntu 22.04.1 LTS on a Libre Computer ROC-RK3399-PC arm64 system:
Before this patch it works (see that the script used has no 'tac' to
remove the first event):
root@roc-rk3399-pc:~# dpkg -l | grep libc-bin
ii libc-bin 2.35-0ubuntu3.1 arm64 GNU C Library: Binaries
root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh
root@roc-rk3399-pc:~# perf test inet_pton
86: probe libc's inet_pton & backtrace it with ping : Ok
root@roc-rk3399-pc:~# perf test -v inet_pton
86: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 1375
ping 1399 [000] 4114.417450: probe_libc:inet_pton: (ffffb3e26120)
106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6)
d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6)
2b68 [unknown] (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
root@roc-rk3399-pc:~#
And after it continues to work:
root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh
perf script -i $perf_data | tac | grep -m1 ^ping -B9 | tac > $perf_script
root@roc-rk3399-pc:~# perf test inet_pton
86: probe libc's inet_pton & backtrace it with ping : Ok
root@roc-rk3399-pc:~# perf test -v inet_pton
86: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 6995
ping 7019 [005] 4832.160741: probe_libc:inet_pton: (ffffa62e6120)
106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6)
d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6)
2b68 [unknown] (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
root@roc-rk3399-pc:~#
Reported-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/Y7QyPkPlDYip3cZH@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Szymon Heidrich [Tue, 3 Jan 2023 09:17:09 +0000 (10:17 +0100)]
usb: rndis_host: Secure rndis_query check against int overflow
Variables off and len typed as uint32 in rndis_query function
are controlled by incoming RNDIS response message thus their
value may be manipulated. Setting off to a unexpectetly large
value will cause the sum with len and 8 to overflow and pass
the implemented validation step. Consequently the response
pointer will be referring to a location past the expected
buffer boundaries allowing information leakage e.g. via
RNDIS_OID_802_3_PERMANENT_ADDRESS OID.
Fixes: ddda08624013 ("USB: rndis_host, various cleanups") Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Tue, 3 Jan 2023 06:50:38 +0000 (01:50 -0500)]
net: dpaa: Fix dtsec check for PCS availability
We want to fail if the PCS is not available, not if it is available. Fix
this condition.
Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink") Reported-by: Christian Zigotzky <info@xenosoft.de> Signed-off-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Tue, 3 Jan 2023 03:50:12 +0000 (09:20 +0530)]
octeontx2-pf: Fix lmtst ID used in aura free
Current code uses per_cpu pointer to get the lmtst_id mapped to
the core on which aura_free() is executed. Using per_cpu pointer
without preemption disable causing mismatch between lmtst_id and
core on which pointer gets freed. This patch fixes the issue by
disabling preemption around aura_free.
Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for net:
1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
Reported by static analysis tooling, patch from Florian Westphal.
2) Missing set type checks in nf_tables: Validate that set declaration
matches the an existing set type, otherwise bail out with EEXIST.
Currently, nf_tables silently accepts the re-declaration with a
different type but it bails out later with EINVAL when the user adds
entries to the set. This fix is relatively large because it requires
two preparation patches that are included in this batch.
3) Do not ignore updates of timeout and gc_interval parameters in
existing sets.
4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
ipset. Except hash:net,port,net and hash:net,iface, the set types don't
support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
a special handling in hash:net,port,net which was missing (hash:net,iface
was not affected by this bug), from Jozsef Kadlecsik.
5) When adding/deleting large number of elements in one step in ipset,
it can take a reasonable amount of time and can result in soft lockup
errors. This patch is a complete rework of the previous version in order
to use a smaller internal batch limit and at the same time removing
the external hard limit to add arbitrary number of elements in one step.
Also from Jozsef Kadlecsik.
Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 2 Jan 2023 19:06:18 +0000 (11:06 -0800)]
Merge tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"First batch of regression and regular fixes:
- regressions:
- fix error handling after conversion to qstr for paths
- fix raid56/scrub recovery caused by uninitialized variable
after conversion to error bitmaps
- restore qgroup backref lookup behaviour after recent
refactoring
- fix leak of device lists at module exit time
- fix resolving backrefs for inline extent followed by prealloc
- reset defrag ioctl buffer on memory allocation error"
* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fscrypt name leak after failure to join log transaction
btrfs: scrub: fix uninitialized return value in recover_scrub_rbio
btrfs: fix resolving backrefs for inline extent followed by prealloc
btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup
btrfs: fix leak of fs devices after removing btrfs module
btrfs: fix an error handling path in btrfs_defrag_leaves()
btrfs: fix an error handling path in btrfs_rename()
Tetsuo Handa [Mon, 2 Jan 2023 14:05:33 +0000 (23:05 +0900)]
fs/ntfs3: don't hold ni_lock when calling truncate_setsize()
syzbot is reporting hung task at do_user_addr_fault() [1], for there is
a silent deadlock between PG_locked bit and ni_lock lock.
Since filemap_update_page() calls filemap_read_folio() after calling
folio_trylock() which will set PG_locked bit, ntfs_truncate() must not
call truncate_setsize() which will wait for PG_locked bit to be cleared
when holding ni_lock lock.
Takashi Iwai [Tue, 22 Nov 2022 11:51:22 +0000 (12:51 +0100)]
x86/kexec: Fix double-free of elf header buffer
After
b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"),
freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.
Drop the superfluous vfree() call at the error path of
crash_load_segments().
Program received signal SIGSEGV, Segmentation fault.
0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830
warning: Source file is more recent than executable.
2830 if (event_name[0] == '%') {
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-11.fc36.x86_64 cyrus-sasl-lib-2.1.27-18.fc36.x86_64 elfutils-debuginfod-client-0.188-3.fc36.x86_64 elfutils-libelf-0.188-3.fc36.x86_64 elfutils-libs-0.188-3.fc36.x86_64 glibc-2.35-20.fc36.x86_64 keyutils-libs-1.6.1-4.fc36.x86_64 krb5-libs-1.19.2-12.fc36.x86_64 libbrotli-1.0.9-7.fc36.x86_64 libcap-2.48-4.fc36.x86_64 libcom_err-1.46.5-2.fc36.x86_64 libcurl-7.82.0-12.fc36.x86_64 libevent-2.1.12-6.fc36.x86_64 libgcc-12.2.1-4.fc36.x86_64 libidn2-2.3.4-1.fc36.x86_64 libnghttp2-1.51.0-1.fc36.x86_64 libpsl-0.21.1-5.fc36.x86_64 libselinux-3.3-4.fc36.x86_64 libssh-0.9.6-4.fc36.x86_64 libstdc++-12.2.1-4.fc36.x86_64 libunistring-1.0-1.fc36.x86_64 libunwind-1.6.2-2.fc36.x86_64 libxcrypt-4.4.33-4.fc36.x86_64 libzstd-1.5.2-2.fc36.x86_64 numactl-libs-2.0.14-5.fc36.x86_64 opencsd-1.2.0-1.fc36.x86_64 openldap-2.6.3-1.fc36.x86_64 openssl-libs-3.0.5-2.fc36.x86_64 slang-2.3.2-11.fc36.x86_64 xz-libs-5.2.5-9.fc36.x86_64 zlib-1.2.11-33.fc36.x86_64
(gdb) bt
#0 0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830
#1 0x0000000000552416 in add_dynamic_entry (evlist=0xfda7b0, tok=0xffb6eb "trace", level=2) at util/sort.c:2976
#2 0x0000000000552d26 in sort_dimension__add (list=0xf93e00 <perf_hpp_list>, tok=0xffb6eb "trace", evlist=0xfda7b0, level=2) at util/sort.c:3193
#3 0x0000000000552e1c in setup_sort_list (list=0xf93e00 <perf_hpp_list>, str=0xffb6eb "trace", evlist=0xfda7b0) at util/sort.c:3227
#4 0x00000000005532fa in __setup_sorting (evlist=0xfda7b0) at util/sort.c:3381
#5 0x0000000000553cdc in setup_sorting (evlist=0xfda7b0) at util/sort.c:3608
#6 0x000000000042eb9f in cmd_report (argc=0, argv=0x7fffffffe470) at builtin-report.c:1596
#7 0x00000000004aee7e in run_builtin (p=0xf64ca0 <commands+288>, argc=3, argv=0x7fffffffe470) at perf.c:330
#8 0x00000000004af0f2 in handle_internal_command (argc=3, argv=0x7fffffffe470) at perf.c:384
#9 0x00000000004af241 in run_argv (argcp=0x7fffffffe29c, argv=0x7fffffffe290) at perf.c:428
#10 0x00000000004af5fc in main (argc=3, argv=0x7fffffffe470) at perf.c:562
(gdb)
So check if we have tracepoint events in add_dynamic_entry() and bail
out instead:
# perf report --stdio -f
This perf binary isn't linked with libtraceevent, can't process probe_perf:lzma_decompress_to_file
Error:
Unknown --sort key: `trace'
#
Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system") Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/Y7MDb7kRaHZB6APC@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jeff Layton [Tue, 13 Dec 2022 18:08:26 +0000 (13:08 -0500)]
nfsd: fix handling of readdir in v4root vs. mount upcall timeout
If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.
That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.
If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.
Cc: Steve Dickson <steved@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777 Reported-by: JianHong Yin <yin-jianhong@163.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org>
Ahelenia Ziemiańska [Tue, 27 Dec 2022 20:58:00 +0000 (21:58 +0100)]
perf tools: Don't include signature in version strings
This explodes the build if HEAD is signed, since the generated version
is gpg: Signature made Mon 26 Dec 2022 20:34:48 CET, then a few more
lines, then the SHA.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/7c9637711271f50ec2341fb8a7c29585335dab04.1672174189.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yang Jihong [Mon, 26 Dec 2022 08:57:03 +0000 (08:57 +0000)]
perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands
Commands such as kmem, kwork, lock, sched, trace and timechart depend on
libtraceevent, these commands need to be isolated using HAVE_LIBTRACEEVENT
macro when cmdlist generation.
The output of the generate-cmdlist.sh script is as follows:
static struct cmdname_help common_cmds[] = {
{"annotate", "Read perf.data (created by perf record) and display annotated code"},
{"archive", "Create archive with object files with build-ids found in perf.data file"},
{"bench", "General framework for benchmark suites"},
{"buildid-cache", "Manage build-id cache."},
{"buildid-list", "List the buildids in a perf.data file"},
{"c2c", "Shared Data C2C/HITM Analyzer."},
{"config", "Get and set variables in a configuration file."},
{"daemon", "Run record sessions on background"},
{"data", "Data file related processing"},
{"diff", "Read perf.data files and display the differential profile"},
{"evlist", "List the event names in a perf.data file"},
{"ftrace", "simple wrapper for kernel's ftrace functionality"},
{"inject", "Filter to augment the events stream with additional information"},
{"iostat", "Show I/O performance metrics"},
{"kallsyms", "Searches running kernel for symbols"},
{"kvm", "Tool to trace/measure kvm guest os"},
{"list", "List all symbolic event types"},
{"mem", "Profile memory accesses"},
{"record", "Run a command and record its profile into perf.data"},
{"report", "Read perf.data (created by perf record) and display the profile"},
{"script", "Read perf.data (created by perf record) and display trace output"},
{"stat", "Run a command and gather performance counter statistics"},
{"test", "Runs sanity tests."},
{"top", "System profiling tool."},
{"version", "display the version of perf binary"},
#ifdef HAVE_LIBELF_SUPPORT
{"probe", "Define new dynamic tracepoints"},
#endif /* HAVE_LIBELF_SUPPORT */
#if defined(HAVE_LIBTRACEEVENT) && (defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT))
{"trace", "strace inspired tool"},
#endif /* HAVE_LIBTRACEEVENT && (HAVE_LIBAUDIT_SUPPORT || HAVE_SYSCALL_TABLE_SUPPORT) */
#ifdef HAVE_LIBTRACEEVENT
{"kmem", "Tool to trace/measure kernel memory properties"},
{"kwork", "Tool to trace/measure kernel work properties (latencies)"},
{"lock", "Analyze lock events"},
{"sched", "Tool to trace/measure scheduler properties (latencies)"},
{"timechart", "Tool to visualize total system behavior during a workload"},
#endif /* HAVE_LIBTRACEEVENT */
};
Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221226085703.95081-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Eric Lin [Sat, 31 Dec 2022 05:27:31 +0000 (05:27 +0000)]
perf tools riscv: Fix build error on riscv due to missing header for 'struct perf_sample'
Since the definition of 'struct perf_sample' has been moved to sample.h,
we need to include this header file to fix the build error as follows:
arch/riscv/util/unwind-libdw.c: In function 'libdw__arch_set_initial_registers':
arch/riscv/util/unwind-libdw.c:12:50: error: invalid use of undefined type 'struct perf_sample'
12 | struct regs_dump *user_regs = &ui->sample->user_regs;
| ^~
Fixes: 9823147da6c893d9 ("perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers") Signed-off-by: Eric Lin <eric.lin@sifive.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: greentime.hu@sifive.com Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-riscv@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Chen <vincent.chen@sifive.com> Link: https://lore.kernel.org/r/20221231052731.24908-1-eric.lin@sifive.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paul Menzel [Mon, 2 Jan 2023 13:57:30 +0000 (14:57 +0100)]
fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
Commit 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to
the same as vbG200 to avoid black screen") accidently decreases the
maximum memory size for the Matrox G200eW (102b:0532) from 8 MB to 1 MB
by missing one zero. This caused the driver initialization to fail with
the messages below, as the minimum required VRAM size is 2 MB:
So, add the missing 0 to make it the intended 16 MB. Successfully tested on
the Dell PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013, that the warning is
gone.
While at it, add a leading 0 to the maxdisplayable entry, so it’s aligned
properly. The value could probably also be increased from 8 MB to 16 MB, as
the G200 uses the same values, but I have not checked any datasheet.
Note, matroxfb is obsolete and superseded by the maintained DRM driver
mga200, which is used by default on most systems where both drivers are
available. Therefore, on most systems it was only a cosmetic issue.
Jozsef Kadlecsik [Fri, 30 Dec 2022 12:24:38 +0000 (13:24 +0100)]
netfilter: ipset: Rework long task execution when adding/deleting entries
When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.
The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.
Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Fri, 30 Dec 2022 12:24:37 +0000 (13:24 +0100)]
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch 5f7b51bf09baca8e brings it out earlier.
Handle /0 subnets properly in hash:net,port,net set types.
Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") Reported-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Horatiu Vultur [Mon, 2 Jan 2023 12:12:15 +0000 (13:12 +0100)]
net: sparx5: Fix reading of the MAC address
There is an issue with the checking of the return value of
'of_get_mac_address', which returns 0 on success and negative value on
failure. The driver interpretated the result the opposite way. Therefore
if there was a MAC address defined in the DT, then the driver was
generating a random MAC address otherwise it would use address 0.
Fix this by checking correctly the return value of 'of_get_mac_address'
Fixes: b74ef9f9cb91 ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 2 Jan 2023 06:55:56 +0000 (08:55 +0200)]
vxlan: Fix memory leaks in error path
The memory allocated by vxlan_vnigroup_init() is not freed in the error
path, leading to memory leaks [1]. Fix by calling
vxlan_vnigroup_uninit() in the error path.
The leaks can be reproduced by annotating gro_cells_init() with
ALLOW_ERROR_INJECTION() and then running:
# echo "100" > /sys/kernel/debug/fail_function/probability
# echo "1" > /sys/kernel/debug/fail_function/times
# echo "gro_cells_init" > /sys/kernel/debug/fail_function/inject
# printf %#x -12 > /sys/kernel/debug/fail_function/gro_cells_init/retval
# ip link add name vxlan0 type vxlan dstport 4789 external vnifilter
RTNETLINK answers: Cannot allocate memory
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 2 Jan 2023 07:17:37 +0000 (23:17 -0800)]
net: sched: htb: fix htb_classify() kernel-doc
Fix W=1 kernel-doc warning:
net/sched/sch_htb.c:214: warning: expecting prototype for htb_classify(). Prototype was for HTB_DIRECT() instead
by moving the HTB_DIRECT() macro above the function.
Add kernel-doc notation for function parameters as well.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jan 2023 13:32:43 +0000 (13:32 +0000)]
Merge branch 'cls_drop-fix'
Jamal Hadi Salim says:
====================
net: dont intepret cls results when asked to drop
It is possible that an error in processing may occur in tcf_classify() which
will result in res.classid being some garbage value. Example of such a code path
is when the classifier goes into a loop due to bad policy. See patch 1/2
for a sample splat.
While the core code reacts correctly and asks the caller to drop the packet
(by returning TC_ACT_SHOT) some callers first intepret the res.class as
a pointer to memory and end up dropping the packet only after some activity with
the pointer. There is likelihood of this resulting in an exploit. So lets fix
all the known qdiscs that behave this way.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 1 Jan 2023 21:57:44 +0000 (16:57 -0500)]
net: sched: cbq: dont intepret cls results when asked to drop
If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that
res.class contains a valid pointer
Sample splat reported by Kyle Zeng
[ 5.405624] 0: reclassify loop, rule prio 0, protocol 800
[ 5.406326] ==================================================================
[ 5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0
[ 5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299
[ 5.408731]
[ 5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15
[ 5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
[ 5.410439] Call Trace:
[ 5.410764] dump_stack+0x87/0xcd
[ 5.411153] print_address_description+0x7a/0x6b0
[ 5.411687] ? vprintk_func+0xb9/0xc0
[ 5.411905] ? printk+0x76/0x96
[ 5.412110] ? cbq_enqueue+0x54b/0xea0
[ 5.412323] kasan_report+0x17d/0x220
[ 5.412591] ? cbq_enqueue+0x54b/0xea0
[ 5.412803] __asan_report_load1_noabort+0x10/0x20
[ 5.413119] cbq_enqueue+0x54b/0xea0
[ 5.413400] ? __kasan_check_write+0x10/0x20
[ 5.413679] __dev_queue_xmit+0x9c0/0x1db0
[ 5.413922] dev_queue_xmit+0xc/0x10
[ 5.414136] ip_finish_output2+0x8bc/0xcd0
[ 5.414436] __ip_finish_output+0x472/0x7a0
[ 5.414692] ip_finish_output+0x5c/0x190
[ 5.414940] ip_output+0x2d8/0x3c0
[ 5.415150] ? ip_mc_finish_output+0x320/0x320
[ 5.415429] __ip_queue_xmit+0x753/0x1760
[ 5.415664] ip_queue_xmit+0x47/0x60
[ 5.415874] __tcp_transmit_skb+0x1ef9/0x34c0
[ 5.416129] tcp_connect+0x1f5e/0x4cb0
[ 5.416347] tcp_v4_connect+0xc8d/0x18c0
[ 5.416577] __inet_stream_connect+0x1ae/0xb40
[ 5.416836] ? local_bh_enable+0x11/0x20
[ 5.417066] ? lock_sock_nested+0x175/0x1d0
[ 5.417309] inet_stream_connect+0x5d/0x90
[ 5.417548] ? __inet_stream_connect+0xb40/0xb40
[ 5.417817] __sys_connect+0x260/0x2b0
[ 5.418037] __x64_sys_connect+0x76/0x80
[ 5.418267] do_syscall_64+0x31/0x50
[ 5.418477] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 5.418770] RIP: 0033:0x473bb7
[ 5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00
00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34
24 89
[ 5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[ 5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7
[ 5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007
[ 5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004
[ 5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002
[ 5.422471]
[ 5.422562] Allocated by task 299:
[ 5.422782] __kasan_kmalloc+0x12d/0x160
[ 5.423007] kasan_kmalloc+0x5/0x10
[ 5.423208] kmem_cache_alloc_trace+0x201/0x2e0
[ 5.423492] tcf_proto_create+0x65/0x290
[ 5.423721] tc_new_tfilter+0x137e/0x1830
[ 5.423957] rtnetlink_rcv_msg+0x730/0x9f0
[ 5.424197] netlink_rcv_skb+0x166/0x300
[ 5.424428] rtnetlink_rcv+0x11/0x20
[ 5.424639] netlink_unicast+0x673/0x860
[ 5.424870] netlink_sendmsg+0x6af/0x9f0
[ 5.425100] __sys_sendto+0x58d/0x5a0
[ 5.425315] __x64_sys_sendto+0xda/0xf0
[ 5.425539] do_syscall_64+0x31/0x50
[ 5.425764] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 5.426065]
[ 5.426157] The buggy address belongs to the object at ffff88800e312200
[ 5.426157] which belongs to the cache kmalloc-128 of size 128
[ 5.426955] The buggy address is located 42 bytes to the right of
[ 5.426955] 128-byte region [ffff88800e312200, ffff88800e312280)
[ 5.427688] The buggy address belongs to the page:
[ 5.427992] page:000000009875fabc refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xe312
[ 5.428562] flags: 0x100000000000200(slab)
[ 5.428812] raw: 0100000000000200dead000000000100dead000000000122 ffff888007843680
[ 5.429325] raw: 0000000000000000000000000010001000000001ffffffff ffff88800e312401
[ 5.429875] page dumped because: kasan: bad access detected
[ 5.430214] page->mem_cgroup:ffff88800e312401
[ 5.430471]
[ 5.430564] Memory state around the buggy address:
[ 5.430846] ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 5.431267] ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[ 5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 5.432123] ^
[ 5.432391] ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[ 5.432810] ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 5.433229] ==================================================================
[ 5.433648] Disabling lock debugging due to kernel taint
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 1 Jan 2023 21:57:43 +0000 (16:57 -0500)]
net: sched: atm: dont intepret cls results when asked to drop
If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume
res.class contains a valid pointer Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Miaoqian Lin [Mon, 2 Jan 2023 08:20:39 +0000 (12:20 +0400)]
gpio: sifive: Fix refcount leak in sifive_gpio_probe
of_irq_find_parent() returns a node pointer with refcount incremented,
We should use of_node_put() on it when not needed anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 96868dce644d ("gpio/sifive: Add GPIO driver for SiFive SoCs") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Xiubo Li [Thu, 17 Nov 2022 02:57:53 +0000 (10:57 +0800)]
ceph: avoid use-after-free in ceph_fl_release_lock()
When ceph releasing the file_lock it will try to get the inode pointer
from the fl->fl_file, which the memory could already be released by
another thread in filp_close(). Because in VFS layer the fl->fl_file
doesn't increase the file's reference counter.
Will switch to use ceph dedicate lock info to track the inode.
And in ceph_fl_release_lock() we should skip all the operations if the
fl->fl_u.ceph.inode is not set, which should come from the request
file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the
inode lock list, which is when copying the lock.
Link: https://tracker.ceph.com/issues/57986 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Xiubo Li [Thu, 17 Nov 2022 02:43:21 +0000 (10:43 +0800)]
ceph: switch to vfs_inode_has_locks() to fix file lock bug
For the POSIX locks they are using the same owner, which is the
thread id. And multiple POSIX locks could be merged into single one,
so when checking whether the 'file' has locks may fail.
For a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they
only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Linus Torvalds [Sun, 1 Jan 2023 19:27:00 +0000 (11:27 -0800)]
Merge tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Pass only an initialized perf event attribute to the LSM hook
- Fix a use-after-free on the perf syscall's error path
- A potential integer overflow fix in amd_core_pmu_init()
- Fix the cgroup events tracking after the context handling rewrite
- Return the proper value from the inherit_event() function on error
* tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Call LSM hook after copying perf_event_attr
perf: Fix use-after-free in error path
perf/x86/amd: fix potential integer overflow on shift of a int
perf/core: Fix cgroup events tracking
perf core: Return error pointer if inherit_event() fails to find pmu_ctx
Linus Torvalds [Sun, 1 Jan 2023 19:11:13 +0000 (11:11 -0800)]
Merge tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"I'm just back from the mountains, and Dave is out at the beach and
should be back in a week again. Just i915 fixes and since Rodrigo
bothered to make the pull last week I figured I should warm up gpg and
forward this in a nice signed tag as a new years present!
- i915 fixes for newer platforms
- i915 locking rework to not give up in vm eviction fallback path too
early"
* tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm:
drm/i915/dsi: fix MIPI_BKLT_EN_1 native GPIO index
drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence
drm/i915/uc: Fix two issues with over-size firmware files
drm/i915: improve the catch-all evict to handle lock contention
drm/i915: Remove __maybe_unused from mtl_info
drm/i915: fix TLB invalidation for Gen12.50 video and compute engines
As stated in marvell-orion-mdio.txt deleted in commit 0781434af811f
("dt-bindings: net: orion-mdio: Convert to JSON schema") if
'interrupts' property is present, width of 'reg' should be 0x84.
Otherwise, width of 'reg' should be 0x4. Fix 'examples:' and add
constraints checking whether 'interrupts' property is present
and validate it against fixed values in reg.
Signed-off-by: Michał Grzelak <mig@semihalf.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This property has always been supported by the Linux driver; see
commit 9f93ac8d4085 ("net-next: stmmac: Add dwmac-sun8i"). In fact, the
original driver submission includes the phy-supply code but no mention
of it in the binding, so the omission appears to be accidental. In
addition, the property is documented in the binding for the previous
hardware generation, allwinner,sun7i-a20-gmac.
Document phy-supply in the binding to fix devicetree validation for the
25+ boards that already use this property.
Fixes: 0441bde003be ("dt-bindings: net-next: Add DT bindings documentation for Allwinner dwmac-sun8i") Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: David S. Miller <davem@davemloft.net>