Arnaldo Carvalho de Melo [Fri, 13 Dec 2024 19:29:54 +0000 (16:29 -0300)]
tools build feature: Don't set feature-libslang-include-subdir=1 if test-all.c builds
As it is not really included in tools/build/feature/test-all.c, so any
questioning about this feature should really try to build
tools/build/feature/test-libslang-include-subdir.c and not set it as
detected when test-all.c builds.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20241213195052.914914-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 13 Dec 2024 17:59:08 +0000 (14:59 -0300)]
perf tests switch-tracking: Set this test to run exclusively
This test was failing when run with the default 'perf test' mode, which
is to run multiple regression tests in parallel.
Since it checks system_wide mode, set it to run in exclusive mode.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z1yPYqYYs_isO1PJ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Fri, 13 Dec 2024 01:33:20 +0000 (17:33 -0800)]
Merge tag 'v6.13-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix ctime setting in setattr
- fix reference count on user session to avoid potential race with
session expire
- fix query dir issue
* tag 'v6.13-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: set ATTR_CTIME flags when setting mtime
ksmbd: fix racy issue from session lookup and expire
ksmbd: retry iterate_dir in smb2_query_dir
Linus Torvalds [Fri, 13 Dec 2024 01:26:55 +0000 (17:26 -0800)]
Merge tag 'perf-tools-fixes-for-v6.13-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A set of random fixes for this cycle.
perf record:
- Fix build-id event size calculation in perf record
- Fix perf record -C/--cpu option on hybrid systems
- Fix perf mem record with precise-ip on SapphireRapids
perf test:
- Refresh hwmon directory before reading the test files
- Make sure system_tsc_freq event is tested on x86 only
Others:
- Usual header file sync
- Fix undefined behavior in perf ftrace profile
- Properly initialize a return variable in perf probe"
* tag 'perf-tools-fixes-for-v6.13-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits)
perf probe: Fix uninitialized variable
libperf: evlist: Fix --cpu argument on hybrid platform
perf test expr: Fix system_tsc_freq for only x86
perf test hwmon_pmu: Fix event file location
perf hwmon_pmu: Use openat rather than dup to refresh directory
perf ftrace: Fix undefined behavior in cmp_profile_data()
perf tools: Fix precise_ip fallback logic
perf tools: Fix build error on generated/fs_at_flags_array.c
tools headers: Sync uapi/linux/prctl.h with the kernel sources
tools headers: Sync uapi/linux/mount.h with the kernel sources
tools headers: Sync uapi/linux/fcntl.h with the kernel sources
tools headers: Sync uapi/asm-generic/mman.h with the kernel sources
tools headers: Sync *xattrat syscall changes with the kernel sources
tools headers: Sync arm64 kvm header with the kernel sources
tools headers: Sync x86 kvm and cpufeature headers with the kernel
tools headers: Sync uapi/linux/kvm.h with the kernel sources
tools headers: Sync uapi/linux/perf_event.h with the kernel sources
tools headers: Sync uapi/drm/drm.h with the kernel sources
perf machine: Initialize machine->env to address a segfault
perf test: Don't signal all processes on system when interrupting tests
...
- wifi: mac80211:
- fix a queue stall in certain cases of channel switch
- wake the queues in case of failure in resume
- splice: do not checksum AF_UNIX sockets
- virtio_net: fix BUG()s in BQL support due to incorrect accounting
of purged packets during interface stop
- eth:
- stmmac: fix TSO DMA API mis-usage causing oops
- bnxt_en: fixes for HW GRO: GSO type on 5750X chips and oops
due to incorrect aggregation ID mask on 5760X chips
Previous releases - always broken:
- Bluetooth: improve setsockopt() handling of malformed user input
- eth: ocelot: fix PTP timestamping in presence of packet loss
- ptp: kvm: x86: avoid "fail to initialize ptp_kvm" when simply not
supported"
* tag 'net-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
net: dsa: tag_ocelot_8021q: fix broken reception
net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
net: renesas: rswitch: fix initial MPIC register setting
Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
Bluetooth: iso: Fix circular lock in iso_conn_big_sync
Bluetooth: iso: Fix circular lock in iso_listen_bis
Bluetooth: SCO: Add support for 16 bits transparent voice setting
Bluetooth: iso: Fix recursive locking warning
Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
Bluetooth: hci_core: Fix sleeping function called from invalid context
team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
team: Fix initial vlan_feature set in __team_compute_features
bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
net, team, bonding: Add netdev_base_features helper
net/sched: netem: account for backlog updates from child qdisc
net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
splice: do not checksum AF_UNIX sockets
net: usb: qmi_wwan: add Telit FE910C04 compositions
...
Levi Yun [Fri, 8 Nov 2024 14:34:25 +0000 (14:34 +0000)]
perf expr: Initialize is_test value in expr__ctx_new()
When expr_parse_ctx is allocated by expr_ctx_new(),
expr_scanner_ctx->is_test isn't initialize, so it has garbage value.
this can affects the result of expr__parse() return when it parses
non-exist event literal according to garbage value.
Use calloc instead of malloc in expr_ctx_new() to fix this.
Fixes: 3340a08354ac286e ("perf pmu-events: Fix testing with JEVENTS_ARCH=all") Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241108143424.819126-1-yeoreum.yun@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiapeng Chong [Fri, 15 Nov 2024 09:15:27 +0000 (17:15 +0800)]
perf tests: Fix an incorrect type in append_script()
The return value from the call to readlink() is ssize_t. However, the
return value is being assigned to an size_t variable 'len', so making
'len' an ssize_t.
./tools/perf/tests/tests-scripts.c:182:5-8: WARNING: Unsigned expression compared with zero: len < 0.
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11909 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241115091527.128923-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 20 Nov 2024 06:52:24 +0000 (22:52 -0800)]
perf string: Avoid undefined NULL+1
While the value NULL+1 is never used it triggers a ubsan warning.
Restructure and comment the loop to avoid this.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241120065224.286813-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Wed, 20 Nov 2024 14:37:31 +0000 (14:37 +0000)]
perf vendor events arm64: Update N2/V2 events from source
Update using the new data [1] for these changes:
* Scale some metrics like dtlb_walk_ratio to percent so they display
better with Perf's 2 dp precision
* Description typos, grammar and clarifications
* Unnecessary metric formula brackets seem to have been removed in the
source but this is not a functional change
* New sve_all_percentage metric
The following command was used to generate this commit:
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241120143739.243728-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Thu, 28 Nov 2024 01:03:25 +0000 (17:03 -0800)]
perf tools: Avoid unaligned pointer operations
The sample data is 64-bit aligned basically but raw data starts with
32-bit length field and data follows. In perf_event__synthesize_sample
it treats the sample data as a 64-bit array. And it needs some trick
to update the raw data properly.
But it seems some compilers are not happy with this and the program dies
siliently. I found the sample parsing test failed without any messages
on affected systems.
Let's update the code to use a 32-bit pointer directly and make sure the
result is 64-bit aligned again. No functional changes intended.
Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241128010325.946897-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 11 Dec 2024 19:51:08 +0000 (16:51 -0300)]
tools build feature: Don't set feature-libcap=1 if libcap-devel isn't available
libcap isn't tested in the tools/build/feature/test-all.c fast path
feature detection process, so don't set it as available if test-all
manages to build.
There are other users of this feature detection mechanism, and they
explicitely ask for libcap to be tested, so are not affected by this
patch, for instance, with this patch in place:
$ make -C tools/bpf/bpftool/ clean
<SNIP>
make: Leaving directory '/home/acme/git/perf-tools-next/tools/bpf/bpftool'
⬢ [acme@toolbox perf-tools-next]$ make -C tools/bpf/bpftool/
make: Entering directory '/home/acme/git/perf-tools-next/tools/bpf/bpftool'
Auto-detecting system features:
... clang-bpf-co-re: [ on ]
... llvm: [ on ]
... libcap: [ on ]
... libbfd: [ on ]
... libelf-zstd: [ on ]
<SNIP>
LINK bpftool
make: Leaving directory '/home/acme/git/perf-tools-next/tools/bpf/bpftool'
$
$ sudo rpm -e libcap-devel
$ make -C tools/bpf/bpftool/
<SNIP>
make: Entering directory '/home/acme/git/perf-tools-next/tools/bpf/bpftool'
Auto-detecting system features:
... clang-bpf-co-re: [ on ]
... llvm: [ on ]
... libcap: [ OFF ]
... libbfd: [ on ]
... libelf-zstd: [ on ]
$
Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Quentin Monnet <qmo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20241211224509.797827-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 11 Dec 2024 19:30:02 +0000 (16:30 -0300)]
tools build feature: Add some comments to explain the FEATURE_TESTS logic
The tools/build/feature/test-all.c works in conjunction with the
tools/build/Makefile.feature FEATURE_TESTS_BASIC and FEATURE_TESTS_EXTRA
contents, so that if test-all.c manages to be built, we go on and
iterate all entries in FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA setting
them to 1.
There are inconsistencies that are being audited, as can be seen above
with the libcap case, that is not linked with test-all.bin nor is
present in test-all.c, so shouldn't be set as present. Further patches
are going to address those inconsistencies, but lets document this a bit
more to reduce the chances of this happening again.
Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20241211224509.797827-2-acme@kernel.org
[ Fixed typo pointed out by Ian Rogers ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jakub Kicinski [Thu, 12 Dec 2024 15:10:39 +0000 (07:10 -0800)]
Merge tag 'for-net-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- SCO: Fix transparent voice setting
- ISO: Locking fixes
- hci_core: Fix sleeping function called from invalid context
- hci_event: Fix using rcu_read_(un)lock while iterating
- btmtk: avoid UAF in btmtk_process_coredump
* tag 'for-net-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
Bluetooth: iso: Fix circular lock in iso_conn_big_sync
Bluetooth: iso: Fix circular lock in iso_listen_bis
Bluetooth: SCO: Add support for 16 bits transparent voice setting
Bluetooth: iso: Fix recursive locking warning
Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
Bluetooth: hci_core: Fix sleeping function called from invalid context
Bluetooth: Improve setsockopt() handling of malformed user input
====================
Robert Hodaszi [Wed, 11 Dec 2024 14:47:41 +0000 (15:47 +0100)]
net: dsa: tag_ocelot_8021q: fix broken reception
The blamed commit changed the dsa_8021q_rcv() calling convention to
accept pre-populated source_port and switch_id arguments. If those are
not available, as in the case of tag_ocelot_8021q, the arguments must be
pre-initialized with -1.
Due to the bug of passing uninitialized arguments in tag_ocelot_8021q,
dsa_8021q_rcv() does not detect that it needs to populate the
source_port and switch_id, and this makes dsa_conduit_find_user() fail,
which leads to packet loss on reception.
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()") Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Van Gavere [Wed, 11 Dec 2024 09:29:32 +0000 (10:29 +0100)]
net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
Commit 8d7ae22ae9f8 ("net: dsa: microchip: KSZ9477 register regmap
alignment to 32 bit boundaries") fixed an issue whereby regmap_reg_range
did not allow writes as 32 bit words to KSZ9477 PHY registers, this fix
for KSZ9896 is adapted from there as the same errata is present in
KSZ9896C as "Module 5: Certain PHY registers must be written as pairs
instead of singly" the explanation below is likewise taken from this
commit.
The commit provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accessed as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.
Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.
As a result, following error is observed and KSZ9896 is not properly
configured:
ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.
Thadeu Lima de Souza Cascardo [Tue, 10 Dec 2024 19:36:10 +0000 (16:36 -0300)]
Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
hci_devcd_append may lead to the release of the skb, so it cannot be
accessed once it is called.
==================================================================
BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk]
Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82
The buggy address belongs to the object at ffff888033cfab40
which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 112 bytes inside of
freed 232-byte region [ffff888033cfab40, ffff888033cfac28)
Memory state around the buggy address: ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Check if we need to call hci_devcd_complete before calling
hci_devcd_append. That requires that we check data->cd_info.cnt >=
MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we
increment data->cd_info.cnt only once the call to hci_devcd_append
succeeds.
Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Iulia Tanasescu [Mon, 9 Dec 2024 09:42:18 +0000 (11:42 +0200)]
Bluetooth: iso: Fix circular lock in iso_conn_big_sync
This fixes the circular locking dependency warning below, by reworking
iso_sock_recvmsg, to ensure that the socket lock is always released
before calling a function that locks hdev.
[ 561.670344] ======================================================
[ 561.670346] WARNING: possible circular locking dependency detected
[ 561.670349] 6.12.0-rc6+ #26 Not tainted
[ 561.670351] ------------------------------------------------------
[ 561.670353] iso-tester/3289 is trying to acquire lock:
[ 561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
at: iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670405]
but task is already holding lock:
[ 561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
[ 561.670450]
which lock already depends on the new lock.
Fixes: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Iulia Tanasescu [Mon, 9 Dec 2024 09:42:17 +0000 (11:42 +0200)]
Bluetooth: iso: Fix circular lock in iso_listen_bis
This fixes the circular locking dependency warning below, by
releasing the socket lock before enterning iso_listen_bis, to
avoid any potential deadlock with hdev lock.
[ 75.307983] ======================================================
[ 75.307984] WARNING: possible circular locking dependency detected
[ 75.307985] 6.12.0-rc6+ #22 Not tainted
[ 75.307987] ------------------------------------------------------
[ 75.307987] kworker/u81:2/2623 is trying to acquire lock:
[ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
at: iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308021]
but task is already holding lock:
[ 75.308022] ffff8fdd61a10078 (&hdev->lock)
at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308053]
which lock already depends on the new lock.
Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Frédéric Danis [Thu, 5 Dec 2024 15:51:59 +0000 (16:51 +0100)]
Bluetooth: SCO: Add support for 16 bits transparent voice setting
The voice setting is used by sco_connect() or sco_conn_defer_accept()
after being set by sco_sock_setsockopt().
The PCM part of the voice setting is used for offload mode through PCM
chipset port.
This commits add support for mSBC 16 bits offloading, i.e. audio data
not transported over HCI.
The BCM4349B1 supports 16 bits transparent data on its I2S port.
If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this
gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives
correct audio.
This has been tested with connection to iPhone 14 and Samsung S24.
Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option") Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Iulia Tanasescu [Wed, 4 Dec 2024 12:28:49 +0000 (14:28 +0200)]
Bluetooth: iso: Fix recursive locking warning
This updates iso_sock_accept to use nested locking for the parent
socket, to avoid lockdep warnings caused because the parent and
child sockets are locked by the same thread:
[ 41.585683] ============================================
[ 41.585688] WARNING: possible recursive locking detected
[ 41.585694] 6.12.0-rc6+ #22 Not tainted
[ 41.585701] --------------------------------------------
[ 41.585705] iso-tester/3139 is trying to acquire lock:
[ 41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH)
at: bt_accept_dequeue+0xe3/0x280 [bluetooth]
[ 41.585905]
but task is already holding lock:
[ 41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
at: iso_sock_accept+0x61/0x2d0 [bluetooth]
[ 41.586064]
other info that might help us debug this:
[ 41.586069] Possible unsafe locking scenario:
Iulia Tanasescu [Wed, 4 Dec 2024 12:28:48 +0000 (14:28 +0200)]
Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Since hci_get_route holds the device before returning, the hdev
should be released with hci_dev_put at the end of iso_listen_bis
even if the function returns with an error.
Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Wed, 4 Dec 2024 16:40:59 +0000 (11:40 -0500)]
Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is
not safe since for the most part entries fetched this way shall be
treated as rcu_dereference:
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section [1]_.
For example, the following is **not** legal::
rcu_read_lock();
p = rcu_dereference(head.next);
rcu_read_unlock();
x = p->address; /* BUG!!! */
rcu_read_lock();
y = p->data; /* BUG!!! */
rcu_read_unlock();
Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Paolo Abeni [Thu, 12 Dec 2024 12:11:38 +0000 (13:11 +0100)]
Merge tag 'nf-24-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix bogus test reports in rpath.sh selftest by adding permanent
neighbor entries, from Phil Sutter.
2) Lockdep reports possible ABBA deadlock in xt_IDLETIMER, fix it by
removing sysfs out of the mutex section, also from Phil Sutter.
3) It is illegal to release basechain via RCU callback, for several
reasons. Keep it simple and safe by calling synchronize_rcu() instead.
This is a partially reverting a botched recent attempt of me to fix
this basechain release path on netdevice removal.
From Florian Westphal.
netfilter pull request 24-12-11
* tag 'nf-24-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: do not defer rule destruction via call_rcu
netfilter: IDLETIMER: Fix for possible ABBA deadlock
selftests: netfilter: Stabilize rpath.sh
====================
Daniel Borkmann [Tue, 10 Dec 2024 14:12:45 +0000 (15:12 +0100)]
team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Similar to bonding driver, add NETIF_F_GSO_ENCAP_ALL to TEAM_VLAN_FEATURES
in order to support slave devices which propagate NETIF_F_GSO_UDP_TUNNEL &
NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features.
Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:44 +0000 (15:12 +0100)]
team: Fix initial vlan_feature set in __team_compute_features
Similarly as with bonding, fix the calculation of vlan_features to reuse
netdev_base_features() in order derive the set in the same way as
ndo_fix_features before iterating through the slave devices to refine the
feature set.
Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-4-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:43 +0000 (15:12 +0100)]
bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Drivers like mlx5 expose NIC's vlan_features such as
NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are
later not propagated when the underlying devices are bonded and
a vlan device created on top of the bond.
Right now, the more cumbersome workaround for this is to create
the vlan on top of the mlx5 and then enslave the vlan devices
to a bond.
To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES
such that bond_compute_features() can probe and propagate the
vlan_features from the slave devices up to the vlan device.
# ethtool -k enp2s0f0np0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k enp2s0f1np1 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k bond0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Before:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: off [requested on]
tx-udp_tnl-csum-segmentation: off [requested on]
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
After:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Various users have run into this reporting performance issues when
configuring Cilium in vxlan tunneling mode and having the combination
of bond & vlan for the core devices connecting the Kubernetes cluster
to the outside world.
Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:42 +0000 (15:12 +0100)]
bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
If a bonding device has slave devices, then the current logic to derive
the feature set for the master bond device is limited in that flags which
are fully supported by the underlying slave devices cannot be propagated
up to vlan devices which sit on top of bond devices. Instead, these get
blindly masked out via current NETIF_F_ALL_FOR_ALL logic.
vlan_features and mpls_features should reuse netdev_base_features() in
order derive the set in the same way as ndo_fix_features before iterating
through the slave devices to refine the feature set.
Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Fixes: 2e770b507ccd ("net: bonding: Inherit MPLS features from slave devices") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Both bonding and team driver have logic to derive the base feature
flags before iterating over their slave devices to refine the set
via netdev_increment_features().
Add a small helper netdev_base_features() so this can be reused
instead of having it open-coded multiple times.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Martin Ottens [Tue, 10 Dec 2024 13:14:11 +0000 (14:14 +0100)]
net/sched: netem: account for backlog updates from child qdisc
In general, 'qlen' of any classful qdisc should keep track of the
number of packets that the qdisc itself and all of its children holds.
In case of netem, 'qlen' only accounts for the packets in its internal
tfifo. When netem is used with a child qdisc, the child qdisc can use
'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
or dropped SKBs. This function updates 'qlen' and the backlog statistics
of netem, but netem does not account for changes made by a child qdisc.
'qlen' then indicates the wrong number of packets in the tfifo.
If a child qdisc creates new SKBs during enqueue and informs its parent
about this, netem's 'qlen' value is increased. When netem dequeues the
newly created SKBs from the child, the 'qlen' in netem is not updated.
If 'qlen' reaches the configured sch->limit, the enqueue function stops
working, even though the tfifo is not full.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configure netem as root
qdisc and tbf as its child on the outgoing interface of the machine
as follows:
$ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
$ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on the machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
Statistics after 10s of iPerf3 TCP test before the fix (note that
netem's backlog > limit, netem stopped accepting packets):
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent 2767766 bytes 1848 pkt (dropped 652, overlimits 0 requeues 0)
backlog 4294528236b 1155p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent 2767766 bytes 1848 pkt (dropped 327, overlimits 7601 requeues 0)
backlog 0b 0p requeues 0
tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'.
The interface fully stops transferring packets and "locks". In this case,
the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
its limit and no more packets are accepted.
This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
only decreased when a packet is returned by its dequeue function, and not
during enqueuing into the child qdisc. External updates to 'qlen' are thus
accounted for and only the behavior of the backlog statistics changes. As
in other qdiscs, 'qlen' then keeps track of how many packets are held in
netem and all of its children. As before, sch->limit remains as the
maximum number of packets in the tfifo. The same applies to netem's
backlog statistics.
Jakub Kicinski [Thu, 12 Dec 2024 04:25:59 +0000 (20:25 -0800)]
Merge tag 'batadv-net-pullrequest-20241210' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix TT unitialized data and size limit issues, by Remi Pommarel
(3 patches)
* tag 'batadv-net-pullrequest-20241210' of git://git.open-mesh.org/linux-merge:
batman-adv: Do not let TT changes list grows indefinitely
batman-adv: Remove uninitialized data in full table TT response
batman-adv: Do not send uninitialized TT changes
====================
Vladimir Oltean [Tue, 10 Dec 2024 13:26:40 +0000 (15:26 +0200)]
net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
With this port schedule:
tc qdisc replace dev $send_if parent root handle 100 taprio \
num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
map 0 1 2 3 4 5 6 7 \
base-time 0 cycle-time 10000 \
sched-entry S 01 1250 \
sched-entry S 02 1250 \
sched-entry S 04 1250 \
sched-entry S 08 1250 \
sched-entry S 10 1250 \
sched-entry S 20 1250 \
sched-entry S 40 1250 \
sched-entry S 80 1250 \
flags 2
ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4134.168]: port 2: send peer delay response failed
It turns out that the driver can't take their TX timestamps because it
can't transmit them in the first place. And there's nothing special
about the Pdelay_Resp packets - they're just regular 68 byte packets.
But with this taprio configuration, the switch would refuse to send even
the ETH_ZLEN minimum packet size.
This should have definitely not been the case. When applying the taprio
config, the driver prints:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
without problems. Yet it's not.
For the forwarding path, the configuration is fine, yet packets injected
from Linux get stuck with this schedule no matter what.
The first hint that the static guard bands are the cause of the problem
is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
re-enable TAS guard band mode") made things work. It must be that the
guard bands are calculated incorrectly.
I remembered that there is a magic constant in the driver, set to 33 ns
for no logical reason other than experimentation, which says "never let
the static guard bands get so large as to leave less than this amount of
remaining space in the time slot, because the queue system will refuse
to schedule packets otherwise, and they will get stuck". I had a hunch
that my previous experimentally-determined value was only good for
packets coming from the forwarding path, and that the CPU injection path
needed more.
I came to the new value of 35 ns through binary search, after seeing
that with 544 ns (the bit time required to send the Pdelay_Resp packet
at gigabit) it works. Again, this is purely experimental, there's no
logic and the manual doesn't say anything.
The new driver prints for this schedule look like this:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
So yes, the maximum MTU is now even smaller by 1 byte than before.
This is maybe counter-intuitive, but makes more sense with a diagram of
one time slot.
Before:
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<----><---------------------------------------------->
33 ns 1217 ns static guard band
useful
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<-----><--------------------------------------------->
35 ns 1215 ns static guard band
useful
The static guard band implemented by this switch hardware directly
determines the maximum allowable MTU for that traffic class. The larger
it is, the earlier the switch will stop scheduling frames for
transmission, because otherwise they might overrun the gate close time
(and avoiding that is the entire purpose of Michael's patch).
So, we now have guard bands smaller by 2 ns, thus, in this particular
case, we lose a byte of the maximum MTU.
Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Frederik Deweerdt [Tue, 10 Dec 2024 05:06:48 +0000 (21:06 -0800)]
splice: do not checksum AF_UNIX sockets
When `skb_splice_from_iter` was introduced, it inadvertently added
checksumming for AF_UNIX sockets. This resulted in significant
slowdowns, for example when using sendfile over unix sockets.
Using the test code in [1] in my test setup (2G single core qemu),
the client receives a 1000M file in:
- without the patch: 1482ms (+/- 36ms)
- with the patch: 652.5ms (+/- 22.9ms)
This commit addresses the issue by marking checksumming as unnecessary in
`unix_stream_sendmsg`
Cc: stable@vger.kernel.org Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com> Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES") Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/Z1fMaHkRf8cfubuE@xiberoa Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Levitsky [Mon, 9 Dec 2024 17:57:50 +0000 (12:57 -0500)]
net: mana: Fix memory leak in mana_gd_setup_irqs
Commit 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
added memory allocation in mana_gd_setup_irqs of 'irqs' but the code
doesn't free this temporary array in the success path.
This was caught by kmemleak.
Fixes: 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Link: https://patch.msgid.link/20241209175751.287738-2-mlevitsk@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnaldo Carvalho de Melo [Wed, 11 Dec 2024 15:24:21 +0000 (12:24 -0300)]
tools build: Remove the libunwind feature tests from the ones detected when test-all.o builds
We have a tools/build/feature/test-all.c that has the most common set of
features that perf uses and are expected to have its development files
available when building perf.
When we made libwunwind opt-in we forgot to remove them from the list of
features that are assumed to be available when test-all.c builds, remove
them.
extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
$
After this patch:
$ grep feature-libunwind- /tmp/b/FEATURE-DUMP
$
Now an audit on what is being enabled when test-all.c builds will be
performed.
Fixes: 176c9d1e6a06f2fa ("tools features: Don't check for libunwind devel files by default") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Florian Westphal [Sat, 7 Dec 2024 11:14:48 +0000 (12:14 +0100)]
netfilter: nf_tables: do not defer rule destruction via call_rcu
nf_tables_chain_destroy can sleep, it can't be used from call_rcu
callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding,
while transaction mutex is held and the to-be-desroyed rule was not
exposed to either dataplane or dumps, as it deactives+frees without
the required synchronize_rcu() in-between.
nft_rule_expr_deactivate() callbacks will change ->use counters
of other chains/sets, see e.g. nft_lookup .deactivate callback, these
must be serialized via transaction mutex.
Also add a few lockdep asserts to make this more explicit.
Calling synchronize_rcu() isn't ideal, but fixing this without is hard
and way more intrusive. As-is, we can get:
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects + one nft_trans_chain
object, deactivate the rules + the chain and then defer the freeing to the
nft destroy workqueue. We'd still need to keep the synchronize_rcu path as
a fallback to handle -ENOMEM corner cases though.
Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/ Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 6 Dec 2024 14:08:40 +0000 (15:08 +0100)]
selftests: netfilter: Stabilize rpath.sh
On some systems, neighbor discoveries from ns1 for fec0:42::1 (i.e., the
martian trap address) would happen at the wrong time and cause
false-negative test result.
Problem analysis also discovered that IPv6 martian ping test was broken
in that sent neighbor discoveries, not echo requests were inadvertently
trapped
Avoid the race condition by introducing the neighbors to each other
upfront. Also pin down the firewall rules to matching on echo requests
only.
Fixes: efb056e5f1f0 ("netfilter: ip6t_rpfilter: Fix regression with VRF interfaces") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.
So now you can't look up those names, because they hash to something
different.
Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.
The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that. People still see it
as a feature, not a bug.
Reported-by: Qi Han <hanqi@vivo.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Cc: Gabriel Krisman Bertazi <krisman@suse.de> Requested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 11 Dec 2024 21:48:25 +0000 (13:48 -0800)]
Merge tag 'vfio-v6.13-rc3' of https://github.com/awilliam/linux-vfio
Pull vfio fix from Alex Williamson:
- Fix migration dirty page tracking support in the mlx5-vfio-pci
variant driver in configurations where the system page size exceeds
the device maximum message size, and anticipate device updates where
the opposite may also be required (Yishai Hadas)
* tag 'vfio-v6.13-rc3' of https://github.com/awilliam/linux-vfio:
vfio/mlx5: Align the page tracking max message size with the device capability
Linus Torvalds [Wed, 11 Dec 2024 21:41:41 +0000 (13:41 -0800)]
Merge tag 'linux_kselftest-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
- fix the offset for kprobe syntax error test case when checking the
BTF arguments on 64-bit powerpc
* tag 'linux_kselftest-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: adjust offset for kprobe syntax error test
James Clark [Thu, 14 Nov 2024 16:04:48 +0000 (16:04 +0000)]
libperf: evlist: Fix --cpu argument on hybrid platform
Since the linked fixes: commit, specifying a CPU on hybrid platforms
results in an error because Perf tries to open an extended type event
on "any" CPU which isn't valid. Extended type events can only be opened
on CPUs that match the type.
Before (working):
$ perf record --cpu 1 -- true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.385 MB perf.data (7 samples) ]
After (not working):
$ perf record -C 1 -- true
WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cycles:P'
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:P/).
/bin/dmesg | grep -i perf may provide additional information.
(Ignore the warning message, that's expected and not particularly
relevant to this issue).
This is because perf_cpu_map__intersect() of the user specified CPU (1)
and one of the PMU's CPUs (16-27) correctly results in an empty (NULL)
CPU map. However for the purposes of opening an event, libperf converts
empty CPU maps into an any CPU (-1) which the kernel rejects.
Fix it by deleting evsels with empty CPU maps in the specific case where
user requested CPU maps are evaluated.
Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241114160450.295844-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 5 Dec 2024 02:23:05 +0000 (18:23 -0800)]
perf test expr: Fix system_tsc_freq for only x86
The refactoring of tool PMU events to have a PMU then adding the expr
literals to the tool PMU made it so that the literal system_tsc_freq
was only supported on x86. Update the test expectations to match -
namely the parsing is x86 specific and only yields a non-zero value on
Intel.
Fixes: 609aa2667f67 ("perf tool_pmu: Switch to standard pmu functions and json descriptions") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/20241022140156.98854-1-atrajeev@linux.vnet.ibm.com/ Co-developed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241205022305.158202-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Hari Bathini [Fri, 29 Nov 2024 20:26:21 +0000 (01:56 +0530)]
selftests/ftrace: adjust offset for kprobe syntax error test
In 'NOFENTRY_ARGS' test case for syntax check, any offset X of
`vfs_read+X` except function entry offset (0) fits the criterion,
even if that offset is not at instruction boundary, as the parser
comes before probing. But with "ENDBR64" instruction on x86, offset
4 is treated as function entry. So, X can't be 4 as well. Thus, 8
was used as offset for the test case. On 64-bit powerpc though, any
offset <= 16 can be considered function entry depending on build
configuration (see arch_kprobe_on_func_entry() for implementation
details). So, use `vfs_read+20` to accommodate that scenario too.
Link: https://lore.kernel.org/r/20241129202621.721159-1-hbathini@linux.ibm.com Fixes: 4231f30fcc34a ("selftests/ftrace: Add BTF arguments test cases") Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Michal Luczaj [Tue, 19 Nov 2024 13:31:40 +0000 (14:31 +0100)]
Bluetooth: Improve setsockopt() handling of malformed user input
The bt_copy_from_sockptr() return value is being misinterpreted by most
users: a non-zero result is mistakenly assumed to represent an error code,
but actually indicates the number of bytes that could not be copied.
Remove bt_copy_from_sockptr() and adapt callers to use
copy_safe_from_sockptr().
For sco_sock_setsockopt() (case BT_CODEC) use copy_struct_from_sockptr() to
scrub parts of uninitialized buffer.
Opportunistically, rename `len` to `optlen` in hci_sock_setsockopt_old()
and hci_sock_setsockopt().
Fixes: 51eda36d33e4 ("Bluetooth: SCO: Fix not validating setsockopt user input") Fixes: a97de7bff13b ("Bluetooth: RFCOMM: Fix not validating setsockopt user input") Fixes: 4f3951242ace ("Bluetooth: L2CAP: Fix not validating setsockopt user input") Fixes: 9e8742cdfc4b ("Bluetooth: ISO: Fix not validating setsockopt user input") Fixes: b2186061d604 ("Bluetooth: hci_sock: Fix not validating setsockopt user input") Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: David Wei <dw@davidwei.uk> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Nikita Yushchenko [Mon, 9 Dec 2024 11:32:04 +0000 (16:32 +0500)]
net: renesas: rswitch: handle stop vs interrupt race
Currently the stop routine of rswitch driver does not immediately
prevent hardware from continuing to update descriptors and requesting
interrupts.
It can happen that when rswitch_stop() executes the masking of
interrupts from the queues of the port being closed, napi poll for
that port is already scheduled or running on a different CPU. When
execution of this napi poll completes, it will unmask the interrupts.
And unmasked interrupt can fire after rswitch_stop() returns from
napi_disable() call. Then, the handler won't mask it, because
napi_schedule_prep() will return false, and interrupt storm will
happen.
This can't be fixed by making rswitch_stop() call napi_disable() before
masking interrupts. In this case, the interrupt storm will happen if
interrupt fires between napi_disable() and masking.
Fix this by checking for priv->opened_ports bit when unmasking
interrupts after napi poll. For that to be consistent, move
priv->opened_ports changes into spinlock-protected areas, and reorder
other operations in rswitch_open() and rswitch_stop() accordingly.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Link: https://patch.msgid.link/20241209113204.175015-1-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:04 +0000 (14:50 +0500)]
net: renesas: rswitch: avoid use-after-put for a device tree node
The device tree node saved in the rswitch_device structure is used at
several driver locations. So passing this node to of_node_put() after
the first use is wrong.
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:03 +0000 (14:50 +0500)]
net: renesas: rswitch: fix leaked pointer on error path
If error path is taken while filling descriptor for a frame, skb
pointer is left in the entry. Later, on the ring entry reuse, the
same entry could be used as a part of a multi-descriptor frame,
and skb for that new frame could be stored in a different entry.
Then, the stale pointer will reach the completion routine, and passed
to the release operation.
Fix that by clearing the saved skb pointer at the error path.
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:02 +0000 (14:50 +0500)]
net: renesas: rswitch: fix race window between tx start and complete
If hardware is already transmitting, it can start handling the
descriptor being written to immediately after it observes updated DT
field, before the queue is kicked by a write to GWTRC.
If the start_xmit() execution is preempted at unfortunate moment, this
transmission can complete, and interrupt handled, before gq->cur gets
updated. With the current implementation of completion, this will cause
the last entry not completed.
Fix that by changing completion loop to check DT values directly, instead
of depending on gq->cur.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://patch.msgid.link/20241208095004.69468-3-nikita.yoush@cogentembedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:01 +0000 (14:50 +0500)]
net: renesas: rswitch: fix possible early skb release
When sending frame split into multiple descriptors, hardware processes
descriptors one by one, including writing back DT values. The first
descriptor could be already marked as completed when processing of
next descriptors for the same frame is still in progress.
Although only the last descriptor is configured to generate interrupt,
completion of the first descriptor could be noticed by the driver when
handling interrupt for the previous frame.
Currently, driver stores skb in the entry that corresponds to the first
descriptor. This results into skb could be unmapped and freed when
hardware did not complete the send yet. This opens a window for
corrupting the data being sent.
Fix this by saving skb in the entry that corresponds to the last
descriptor used to send the frame.
Jakub Kicinski [Wed, 11 Dec 2024 02:44:24 +0000 (18:44 -0800)]
Merge tag 'wireless-2024-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A small set of fixes:
- avoid CSA warnings during link removal
(by changing link bitmap after remove)
- fix # of spatial streams initialisation
- fix queues getting stuck in some CSA cases
and resume failures
- fix interface address when switching monitor mode
- fix MBSS change flags 32-bit stack corruption
- more UBSAN __counted_by "fixes" ...
- fix link ID netlink validation
* tag 'wireless-2024-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: sme: init n_channels before channels[] access
wifi: mac80211: fix station NSS capability initialization order
wifi: mac80211: fix vif addr when switching from monitor to station
wifi: mac80211: fix a queue stall in certain cases of CSA
wifi: mac80211: wake the queues in case of failure in resume
wifi: cfg80211: clear link ID from bitmap during link delete after clean up
wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon
wifi: mac80211: fix mbss changed flags corruption on 32 bit systems
wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one
====================
Petr Machata [Mon, 9 Dec 2024 11:05:31 +0000 (12:05 +0100)]
Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
provide the view of nexthop objects through the usual lens of the route
UAPI. As nexthop objects evolved, the information provided through this
lens became incomplete. For example, details of resilient nexthop groups
are obviously omitted.
Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
convey the >8-bit weight accurately. Instead of inventing workarounds for
an obsolete interface, just document the expectations of inaccuracy.
Michael Chan [Mon, 9 Dec 2024 01:54:48 +0000 (17:54 -0800)]
bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5). However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits. The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID. The aggregation ID mask was not modified
when adding support for P7 chips. Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer. It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits. This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.
Fixes: 13d2d3d381ee ("bnxt_en: Add new P7 hardware interface definitions") Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 11 Dec 2024 02:21:40 +0000 (18:21 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two reverts and two EN7581 driver fixes:
- Revert the attempt to make CLK_GET_RATE_NOCACHE flag work in
clk_set_rate() because it led to problems with the Qualcomm CPUFreq
driver
- Revert Amlogic reset driver back to the initial implementation.
This broke probe of the audio subsystem on axg based platforms and
also had compilation problems. We'll try again next time.
- Fix a clk frequency and fix array bounds runtime checks in the
Airoha EN7581 driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: en7523: Initialize num before accessing hws in en7523_register_clocks()
clk: en7523: Fix wrong BUS clock for EN7581
clk: amlogic: axg-audio: revert reset implementation
Revert "clk: Fix invalid execution of clk_set_rate"
Linus Torvalds [Wed, 11 Dec 2024 02:18:01 +0000 (18:18 -0800)]
Merge tag 'for-6.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes. Apart from the one liners and updated bio splitting
error handling there's a fix for subvolume mount with different flags.
This was known and fixed for some time but I've delayed it to give it
more testing.
- fix unbalanced locking when swapfile activation fails when the
subvolume gets deleted in the meantime
- add btrfs error handling after bio_split() calls that got error
handling recently
- during unmount, flush delalloc workers at the right time before the
cleaner thread is shut down
- fix regression in buffered write folio conversion, explicitly wait
for writeback as FGP_STABLE flag is currently a no-op on btrfs
- handle race in subvolume mount with different flags, the conversion
to the new mount API did not handle the case where multiple
subvolumes get mounted in parallel, which is a distro use case"
* tag 'for-6.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: flush delalloc workers queue before stopping cleaner kthread during unmount
btrfs: handle bio_split() errors
btrfs: properly wait for writeback before buffered write
btrfs: fix missing snapshot drew unlock when root is dead during swap activation
btrfs: fix mount failure due to remount races
Linus Torvalds [Wed, 11 Dec 2024 02:15:25 +0000 (18:15 -0800)]
Merge tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull eprobes fix from Masami Hiramatsu:
- release eprobe when failing to add dyn_event.
This unregisters event call and release eprobe when it fails to add a
dynamic event. Found in cleaning up.
* tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobe: Fix to release eprobe when failed to add dyn_event
ksmbd is trying to set the atime and mtime via notify_change without also
setting the ctime. so This patch add ATTR_CTIME flags when setting mtime
to avoid a warning.
Reported-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Hobin Woo [Thu, 5 Dec 2024 02:31:19 +0000 (11:31 +0900)]
ksmbd: retry iterate_dir in smb2_query_dir
Some file systems do not ensure that the single call of iterate_dir
reaches the end of the directory. For example, FUSE fetches entries from
a daemon using 4KB buffer and stops fetching if entries exceed the
buffer. And then an actor of caller, KSMBD, is used to fill the entries
from the buffer.
Thus, pattern searching on FUSE, files located after the 4KB could not
be found and STATUS_NO_SUCH_FILE was returned.
Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Yoonho Shin <yoonho.shin@samsung.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Zhongqiu Han [Thu, 5 Dec 2024 08:45:00 +0000 (16:45 +0800)]
perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info()
If perf_env__insert_bpf_prog_info() returns false due to a duplicate bpf
prog info node insertion, the temporary info_node and info_linear memory
will leak. Add a check to ensure the memory is freed if the function
returns false.
Fixes: d56354dc49091e33 ("perf tools: Save bpf_prog_info and BTF of new BPF programs") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-4-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zhongqiu Han [Thu, 5 Dec 2024 08:44:59 +0000 (16:44 +0800)]
perf header: Fix one memory leakage in process_bpf_prog_info()
Function __perf_env__insert_bpf_prog_info() will return without inserting
bpf prog info node into perf env again due to a duplicate bpf prog info
node insertion, causing the temporary info_linear and info_node memory to
leak. Modify the return type of this function to bool and add a check to
ensure the memory is freed if the function returns false.
Fixes: 606f972b1361f477 ("perf bpf: Save bpf_prog_info information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-3-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zhongqiu Han [Thu, 5 Dec 2024 08:44:58 +0000 (16:44 +0800)]
perf header: Fix one memory leakage in process_bpf_btf()
If __perf_env__insert_btf() returns false due to a duplicate btf node
insertion, the temporary node will leak. Add a check to ensure the memory
is freed if the function returns false.
Fixes: a70a1123174ab592 ("perf bpf: Save BTF information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-2-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 13 Nov 2024 16:55:58 +0000 (08:55 -0800)]
perf jevents: Fix build issue in '*/' in event descriptions
For big string offsets we output comments for what string the offset
is for. If the string contains a '*/' as seen in Intel Arrowlake event
descriptions, then this causes C parsing issues for the generated
pmu-events.c. Catch such '*/' values and escape to avoid this.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20241113165558.628856-1-irogers@google.com
[ Used return s.replace('*/', r'\*\/') based on failure followed by request by Ian ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Veronika Molnarova [Tue, 29 Oct 2024 14:43:47 +0000 (15:43 +0100)]
perf test: Parse 'perf stat' Topdown events for aarch64
The 'perf stat' output on aarch64 machines with topdown events wasn't
counted for in the 'perf stat STD output linter' test case. Add the
topdown metric to the skip_metric list as it is done for topdown events
on other systems.
The Topdown events are also disabled on aarch64 KVM guests because the
value of caps/slots is set to 0 due to the part of the system register
being a stub.
This prevents the metric for the topdown events from being computed,
leaving the 'perf stat' topdown metric without any value at all.
Add the "TopdownL1" to the skip_metric list as well to handle this
possibility.
Before aarch64:
100: perf stat STD output linter:
--- start ---
test child forked, pid 403305
Checking STD output: no args Unknown event name in TopdownL1 # 4.3 percent of slots slots_lost_misspeculation_fraction
---- end(-1) ----
100: perf stat STD output linter : FAILED!
Before aarch64 KVM:
100: perf stat STD output linter:
--- start ---
test child forked, pid 404671
Checking STD output: no args Unknown event name in TopdownL1
---- end(-1) ----
100: perf stat STD output linter : FAILED!
After:
100: perf stat STD output linter:
--- start ---
test child forked, pid 404777
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread [Success]
Checking STD output: per node [Success]
Checking STD output: system wide no aggregation [Success]
Checking STD output: per core [Success]
Checking STD output: per cache instance [Success]
Checking STD output: per cluster [Success]
Checking STD output: per die [Success]
Checking STD output: per socket [Success]
---- end(0) ----
100: perf stat STD output linter : Ok
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241029144347.25651-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gabriele Monaco [Tue, 12 Nov 2024 18:12:14 +0000 (15:12 -0300)]
perf ftrace latency: Add --max-latency option
This patch adds a max-latency option as discussed, in case the number of
buckets is more than 22, we don't observe the setting (for now, let's
say).
By default or if 0 is passed, the value is automatically determined
based on the number of buckets, range and minimum, so that we fill all
available buffers (equivalent to the behaviour before this patch).
We now get something like this:
# perf ftrace latency --bucket-range=20 \
--min-latency 10 \
--max-latency=100 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 10 us | 1731 | ################ |
10 - 30 us | 1 | |
30 - 50 us | 0 | |
50 - 70 us | 0 | |
70 - 90 us | 0 | |
90 - 100 us | 0 | |
100 - ... us | 0 | |
Note the maximum is observed also if it doesn't cover completely a full
range (the second to last range is 10us long to let the last start at
100 sharp), this looks to me more sensible and eases the computations,
since we don't need to account for the range while filling the buckets.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now we want focus on the latencies starting at 1.2us, with a finer
grained range of 20ns:
This is all on a live system, so statistically interesting, but not
narrowing down on the same numbers, so a 'perf ftrace latency record'
seems interesting to then use all on the same snapshot of latencies.
A --max-latency counterpart should come next, at first limiting the
max-latency to 20 * bucket-size, as we have a fixed buckets array with
20 + 2 entries (+ for the outliers) and thus would need to make it
larger for higher latencies.
We also may need a way to ask for not considering the out of range
values (first and last buckets) when drawing the buckets bars.
Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 12 Nov 2024 18:12:11 +0000 (15:12 -0300)]
perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args
The ftrace->use_nsec arg is being passed to both make_historgram() and
display_histogram(), since another ftrace field will be passed to those
functions in a followup patch, make them look like other functions in
this codebase that receive the 'struct perf_ftrace' pointer.
No change in logic.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masahiro Yamada [Mon, 2 Dec 2024 06:28:22 +0000 (15:28 +0900)]
openrisc: place exception table at the head of vmlinux
Since commit 0043ecea2399 ("vmlinux.lds.h: Adjust symbol ordering in
text output section"), the exception table in arch/openrisc/kernel/head.S
is no longer positioned at the very beginning of the kernel image, which
causes a boot failure.
Currently, the exception table resides in the regular .text section.
Previously, it was placed at the head by relying on the linker receiving
arch/openrisc/kernel/head.o as the first object. However, this behavior
has changed because sections like .text.{asan,unknown,unlikely,hot} now
precede the regular .text section.
The .head.text section is intended for entry points requiring special
placement. However, in OpenRISC, this section has been misused: instead
of the entry points, it contains boot code meant to be discarded after
booting. This feature is typically handled by the .init.text section.
This commit addresses the issue by replacing the current __HEAD marker
with __INIT and re-annotating the entry points with __HEAD. Additionally,
it adds __REF to entry.S to suppress the following modpost warning:
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
scenarios for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
This patch series resolves the issue and also addresses similar existing
problems:
(a). Drop netdev_tx_reset_queue() from open/close path. This eliminates the
BQL crashes due to the problematic open/close path.
(b). As a result of (a), netdev_tx_reset_queue() is now explicitly required
in freeze/restore path. Add netdev_tx_reset_queue() immediately after
free_unused_bufs() invocation.
(c). Fix missing resetting in virtnet_tx_resize().
virtnet_tx_resize() has lacked proper resetting since commit c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits").
(d). Fix missing resetting in the XDP_SETUP_XSK_POOL path.
Similar to (c), this path lacked proper resetting. Call
netdev_tx_reset_queue() when virtqueue_reset() has actually recycled
unused buffers.
This patch series consists of six commits:
[1/6]: Resolves (a) and (b). # also -stable 6.11.y
[2/6]: Minor fix to make [4/6] streamlined.
[3/6]: Prerequisite for (c). # also -stable 6.11.y
[4/6]: Resolves (c) (incl. Prerequisite for (d)) # also -stable 6.11.y
[5/6]: Preresuisite for (d).
[6/6]: Resolves (d).
Changes for v4:
- move netdev_tx_reset_queue() out of free_unused_bufs()
- submit to net, not net-next
Changes for v3:
- replace 'flushed' argument with 'recycle_done'
Changes for v2:
- add tx queue resetting for (b) to (d) above
Koichiro Den [Fri, 6 Dec 2024 01:10:47 +0000 (10:10 +0900)]
virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset when flushing has actually occurred, Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize()
to handle this.
Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx") Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:46 +0000 (10:10 +0900)]
virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when it really occurs.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:45 +0000 (10:10 +0900)]
virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
handle this.
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits") Cc: <stable@vger.kernel.org> # v6.11+ Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:44 +0000 (10:10 +0900)]
virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when the recycle really occurs.
Cc: <stable@vger.kernel.org> # v6.11+ Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:43 +0000 (10:10 +0900)]
virtio_net: replace vq2rxq with vq2txq where appropriate
While not harmful, using vq2rxq where it's always sq appears odd.
Replace it with the more appropriate vq2txq for clarity and correctness.
Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer") Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:42 +0000 (10:10 +0900)]
virtio_net: correct netdev_tx_reset_queue() invocation point
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
cases for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
the device is not stopped in any case. For BQL core part, it's just like
traffic nearly ceases to exist for some period. For stall detector added
to BQL, even if virtnet_close could somehow lead to some TX completions
delayed for long, followed by virtnet_open, we can just take it as stall
as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
based on BQL"). Note also that users can still reset stall_max via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now
explicitly required in freeze/restore path. This patch adds it to
immediately after free_unused_bufs(), following the rule of thumb:
netdev_tx_reset_queue() should follow any SKB freeing not followed by
netdev_tx_completed_queue(). This seems the most consistent and
streamlined approach, and now netdev_tx_reset_queue() runs whenever
free_unused_bufs() is done.
Geetha sowjanya [Thu, 5 Dec 2024 11:34:35 +0000 (17:04 +0530)]
octeontx2-af: Fix installation of PF multicast rule
Due to target variable is being reassigned in npc_install_flow()
function, PF multicast rules are not getting installed.
This patch addresses the issue by fixing the "IF" condition
checks when rules are installed by AF.
Stefan Wahren [Fri, 6 Dec 2024 18:46:43 +0000 (19:46 +0100)]
qca_spi: Make driver probing reliable
The module parameter qcaspi_pluggable controls if QCA7000 signature
should be checked at driver probe (current default) or not. Unfortunately
this could fail in case the chip is temporary in reset, which isn't under
total control by the Linux host. So disable this check per default
in order to avoid unexpected probe failures.
Stefan Wahren [Fri, 6 Dec 2024 18:46:42 +0000 (19:46 +0100)]
qca_spi: Fix clock speed for multiple QCA7000
Storing the maximum clock speed in module parameter qcaspi_clkspeed
has the unintended side effect that the first probed instance
defines the value for all other instances. Fix this issue by storing
it in max_speed_hz of the relevant SPI device.
This fix keeps the priority of the speed parameter (module parameter,
device tree property, driver default). Btw this uses the opportunity
to get the rid of the unused member clkspeed.
t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
uses port number to get mac addr, this leads to error when an attempt
to set MAC address on VF's of PF2 and PF3.
This patch fixes the issue by using port number to set mac address.
Fixes: e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF") Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ian Rogers [Fri, 6 Dec 2024 04:23:06 +0000 (20:23 -0800)]
perf test hwmon_pmu: Fix event file location
The temp directory is made and a known fake hwmon PMU created within
it. Prior to this fix the events were being incorrectly written to the
temp directory rather than the fake PMU directory. This didn't impact
the test as the directory fd matched the wrong location, but it
doesn't mirror what a hwmon PMU would actually look like.
Ian Rogers [Fri, 6 Dec 2024 04:23:05 +0000 (20:23 -0800)]
perf hwmon_pmu: Use openat rather than dup to refresh directory
The hwmon PMU test will make a temp directory, open the directory with
O_DIRECTORY then fill it with contents. As the open is before the
filling the contents the later fdopendir may reflect the initial empty
state, meaning no events are seen. Change to re-open the directory,
rather than dup the fd, so the latest contents are seen.
Kuan-Wei Chiu [Mon, 9 Dec 2024 13:42:26 +0000 (21:42 +0800)]
perf ftrace: Fix undefined behavior in cmp_profile_data()
The comparison function cmp_profile_data() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
When v1 and v2 are equal, the function incorrectly returns 1, breaking
symmetry and transitivity. This causes undefined behavior, which can
lead to memory corruption in certain versions of glibc [1].
Fix the issue by returning 0 when v1 and v2 are equal, ensuring
compliance with the C standard and preventing undefined behavior.
Ian Rogers [Fri, 6 Dec 2024 04:23:06 +0000 (20:23 -0800)]
perf test hwmon_pmu: Fix event file location
The temp directory is made and a known fake hwmon PMU created within
it. Prior to this fix the events were being incorrectly written to the
temp directory rather than the fake PMU directory. This didn't impact
the test as the directory fd matched the wrong location, but it
doesn't mirror what a hwmon PMU would actually look like.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>