David Wei [Wed, 28 Feb 2024 23:22:53 +0000 (15:22 -0800)]
netdevsim: fix rtnetlink.sh selftest
I cleared IFF_NOARP flag from netdevsim dev->flags in order to support
skb forwarding. This breaks the rtnetlink.sh selftest
kci_test_ipsec_offload() test because ipsec does not connect to peers it
cannot transmit to.
Fix the issue by adding a neigh entry manually. ipsec_offload test now
successfully pass.
Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei [Wed, 28 Feb 2024 23:22:52 +0000 (15:22 -0800)]
netdevsim: add selftest for forwarding skb between connected ports
Connect two netdevsim ports in different namespaces together, then send
packets between them using socat.
Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei [Wed, 28 Feb 2024 23:22:51 +0000 (15:22 -0800)]
netdevsim: add ndo_get_iflink() implementation
Add an implementation for ndo_get_iflink() in netdevsim that shows the
ifindex of the linked peer, if any.
Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei [Wed, 28 Feb 2024 23:22:50 +0000 (15:22 -0800)]
netdevsim: forward skbs from one connected port to another
Forward skbs sent from one netdevsim port to its connected netdevsim
port using dev_forward_skb, in a spirit similar to veth.
Add a tx_dropped variable to struct netdevsim, tracking the number of
skbs that could not be forwarded using dev_forward_skb().
The xmit() function accessing the peer ptr is protected by an RCU read
critical section. The rcu_read_lock() is functionally redundant as since
v5.0 all softirqs are implicitly RCU read critical sections; but it is
useful for human readers.
If another CPU is concurrently in nsim_destroy(), then it will first set
the peer ptr to NULL. This does not affect any existing readers that
dereferenced a non-NULL peer. Then, in unregister_netdevice(), there is
a synchronize_rcu() before the netdev is actually unregistered and
freed. This ensures that any readers i.e. xmit() that got a non-NULL
peer will complete before the netdev is freed.
Any readers after the RCU_INIT_POINTER() but before synchronize_rcu()
will dereference NULL, making it safe.
The codepath to nsim_destroy() and nsim_create() takes both the newly
added nsim_dev_list_lock and rtnl_lock. This makes it safe with
concurrent calls to linking two netdevsims together.
Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei [Wed, 28 Feb 2024 23:22:49 +0000 (15:22 -0800)]
netdevsim: allow two netdevsim ports to be connected
Add two netdevsim bus attribute to sysfs:
/sys/bus/netdevsim/link_device
/sys/bus/netdevsim/unlink_device
Writing "A M B N" to link_device will link netdevsim M in netnsid A with
netdevsim N in netnsid B.
Writing "A M" to unlink_device will unlink netdevsim M in netnsid A from
its peer, if any.
rtnl_lock is taken to ensure nothing changes during the linking.
Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Mar 2024 10:30:30 +0000 (10:30 +0000)]
Merge branch 'selftests-xfail'
Jakub Kicinski says:
====================
selftests: kselftest_harness: support using xfail
When running selftests for our subsystem in our CI we'd like all
tests to pass. Currently some tests use SKIP for cases they
expect to fail, because the kselftest_harness limits the return
codes to pass/fail/skip. XFAIL which would be a great match
here cannot be used.
Remove the no_print handling and use vfork() to run the test in
a different process than the setup. This way we don't need to
pass "failing step" via the exit code. Further clean up the exit
codes so that we can use all KSFT_* values. Rewrite the result
printing to make handling XFAIL/XPASS easier. Support tests
declaring combinations of fixture + variant they expect to fail.
Merge plan is to put it on top of -rc6 and merge into net-next.
That way others should be able to pull the patches without
any networking changes.
v4:
- rebase on top of Mickael's vfork() changes
v3: https://lore.kernel.org/all/20240220192235.2953484-1-kuba@kernel.org/
- combine multiple series
- change to "list of expected failures" rather than SKIP()-like handling
v2: https://lore.kernel.org/all/20240216002619.1999225-1-kuba@kernel.org/
- fix alignment
follow up RFC: https://lore.kernel.org/all/20240216004122.2004689-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20240213154416.422739-1-kuba@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:18 +0000 (16:59 -0800)]
selftests: kselftest_harness: support using xfail
Currently some tests report skip for things they expect to fail
e.g. when given combination of parameters is known to be unsupported.
This is confusing because in an ideal test environment and fully
featured kernel no tests should be skipped.
Selftest summary line already includes xfail and xpass counters,
e.g.:
Jakub Kicinski [Thu, 29 Feb 2024 00:59:16 +0000 (16:59 -0800)]
selftests: kselftest_harness: separate diagnostic message with # in ksft_test_result_code()
According to the spec we should always print a # if we add
a diagnostic message. Having the caller pass in the new line
as part of diagnostic message makes handling this a bit
counter-intuitive, so append the new line in the helper.
Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:15 +0000 (16:59 -0800)]
selftests: kselftest_harness: print test name for SKIP
Jakub points out that for parsers it's rather useful to always
have the test name on the result line. Currently if we SKIP
(or soon XFAIL or XPASS), we will print:
ok 17 # SKIP SCTP doesn't support IP_BIND_ADDRESS_NO_PORT
^
no test name
Always print the test name.
KTAP format seems to allow or even call for it, per:
https://docs.kernel.org/dev-tools/ktap.html
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/all/87jzn6lnou.fsf@cloudflare.com/ Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:14 +0000 (16:59 -0800)]
selftests: kselftest: add ksft_test_result_code(), handling all exit codes
For generic test harness code it's more useful to deal with exit
codes directly, rather than having to switch on them and call
the right ksft_test_result_*() helper. Add such function to kselftest.h.
Note that "directive" and "diagnostic" are what ktap docs call
those parts of the message.
Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:13 +0000 (16:59 -0800)]
selftests: kselftest_harness: use exit code to store skip
We always use skip in combination with exit_code being 0
(KSFT_PASS). This are basic KSFT / KTAP semantics.
Store the right KSFT_* code in exit_code directly.
This makes it easier to support tests reporting other
extended KSFT_* codes like XFAIL / XPASS.
Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:12 +0000 (16:59 -0800)]
selftests: kselftest_harness: save full exit code in metadata
Instead of tracking passed = 0/1 rename the field to exit_code
and invert the values so that they match the KSFT_* exit codes.
This will allow us to fold SKIP / XFAIL into the same value.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:11 +0000 (16:59 -0800)]
selftests: kselftest_harness: generate test name once
Since we added variant support generating full test case
name takes 4 string arguments. We're about to need it
in another two places. Stop the duplication and print
once into a temporary buffer.
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 29 Feb 2024 00:59:10 +0000 (16:59 -0800)]
selftests: kselftest_harness: use KSFT_* exit codes
Now that we no longer need low exit codes to communicate
assertion steps - use normal KSFT exit codes.
Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Mickaël Salaün [Thu, 29 Feb 2024 00:59:09 +0000 (16:59 -0800)]
selftests/harness: Merge TEST_F_FORK() into TEST_F()
Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which
brings four related changes:
Run TEST_F()'s tests in a grandchild process to make it possible to
drop privileges and delegate teardown to the parent.
Compared to TEST_F_FORK(), simplify handling of the test grandchild
process thanks to vfork(2), and makes it generic (e.g. no explicit
conversion between exit code and _metadata).
Compared to TEST_F_FORK(), run teardown even when tests failed with an
assert thanks to commit 63e6b2a42342 ("selftests/harness: Run TEARDOWN
for ASSERT failures").
Simplify the test harness code by removing the no_print and step fields
which are not used. I added this feature just after I made
kselftest_harness.h more broadly available but this step counter
remained even though it wasn't needed after all. See commit 369130b63178
("selftests: Enhance kselftest_harness.h to print which assert failed").
Replace spaces with tabs in one line of __TEST_F_IMPL().
Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Mickaël Salaün [Thu, 29 Feb 2024 00:59:08 +0000 (16:59 -0800)]
selftests/landlock: Redefine TEST_F() as TEST_F_FORK()
This has the effect of creating a new test process for either TEST_F()
or TEST_F_FORK(), which doesn't change tests but will ease potential
backports. See next commit for the TEST_F_FORK() merge into TEST_F().
Cc: Günther Noack <gnoack@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:54:00 +0000 (14:54 -0800)]
net: bcmasp: Add support for PHY interrupts
Hook up the phy interrupts for internal phys to reduce mdio traffic
and improve responsiveness of link changes.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:53:59 +0000 (14:53 -0800)]
net: bcmasp: Keep buffers through power management
There is no advantage of freeing and re-allocating buffers through
suspend and resume. This waste cycles and makes suspend/resume time
longer. We also open ourselves to failed allocations in systems with
heavy memory fragmentation.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:53:58 +0000 (14:53 -0800)]
net: phy: mdio-bcm-unimac: Add asp v2.2 support
Add mdio compat string for ASP 2.0 ethernet driver.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:53:57 +0000 (14:53 -0800)]
net: bcmasp: Add support for ASP 2.2
ASP 2.2 improves power savings during low power modes.
A new register was added to toggle to a slower clock during low
power modes.
EEE was broken for ASP 2.0/2.1. A HW workaround was added for
ASP 2.2 that requires toggling a chicken bit.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:53:56 +0000 (14:53 -0800)]
dt-bindings: net: brcm,asp-v2.0: Add asp-v2.2
Add support for ASP 2.2.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Wed, 28 Feb 2024 22:53:55 +0000 (14:53 -0800)]
dt-bindings: net: brcm,unimac-mdio: Add asp-v2.2
The ASP 2.2 Ethernet controller uses a brcm unimac.
Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Mar 2024 08:56:39 +0000 (08:56 +0000)]
Merge branch 'qcom-phy-possible'
Robert Marko says:
====================
net: phy: qcom: qca808x: fill in possible_interfaces
QCA808x does not currently fill in the possible_interfaces.
This leads to Phylink not being aware that it supports 2500Base-X as well
so in cases where it is connected to a DSA switch like MV88E6393 it will
limit that port to phy-mode set in the DTS.
That means that if SGMII is used you are limited to 1G only while if
2500Base-X was set you are limited to 2.5G only.
Populating the possible_interfaces fixes this.
Changes in v2:
* Get rid of the if/else by Russels suggestion in the helper
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Marko [Wed, 28 Feb 2024 17:24:10 +0000 (18:24 +0100)]
net: phy: qcom: qca808x: fill in possible_interfaces
Currently QCA808x driver does not fill the possible_interfaces.
2.5G QCA808x support SGMII and 2500Base-X while 1G model only supports
SGMII, so fill the possible_interfaces accordingly.
Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Marko [Wed, 28 Feb 2024 17:24:09 +0000 (18:24 +0100)]
net: phy: qcom: qca808x: add helper for checking for 1G only model
There are 2 versions of QCA808x, one 2.5G capable and one 1G capable.
Currently, this matter only in the .get_features call however, it will
be required for filling supported interface modes so lets add a helper
that can be reused.
Signed-off-by: Robert Marko <robimarko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Wed, 28 Feb 2024 14:05:29 +0000 (15:05 +0100)]
Simplify net_dbg_ratelimited() dummy
There is no need to wrap calls to the no_printk() helper inside an
always-false check, as no_printk() already does that internally.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 28 Feb 2024 13:54:39 +0000 (13:54 +0000)]
ipv6: use xa_array iterator to implement inet6_netconf_dump_devconf()
1) inet6_netconf_dump_devconf() can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
3) Do not use inet6_base_seq() anymore, for_each_netdev_dump()
has nice properties. Restarting a GETDEVCONF dump if a device has
been added/removed or if net->ipv6.dev_addr_genid has changed is moot.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 28 Feb 2024 13:54:29 +0000 (13:54 +0000)]
ipv6: annotate data-races around cnf.hop_limit
idev->cnf.hop_limit and net->ipv6.devconf_all->hop_limit
might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> # for netfilter parts Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/
Adjacent changes:
drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Linus Torvalds [Thu, 29 Feb 2024 20:40:20 +0000 (12:40 -0800)]
Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may be a
LOCKDEP false positive, not a blocker.
Current release - regressions:
- netfilter: nf_tables: re-allow NFPROTO_INET in
nft_(match/target)_validate()
- eth: ionic: fix error handling in PCI reset code
Current release - new code bugs:
- eth: stmmac: complete meta data only when enabled, fix null-deref
- kunit: fix again checksum tests on big endian CPUs
Previous releases - regressions:
- veth: try harder when allocating queue memory
- Bluetooth:
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
Previous releases - always broken:
- info leak in __skb_datagram_iter() on netlink socket
- mptcp:
- map v4 address to v6 when destroying subflow
- fix potential wake-up event loss due to sndbuf auto-tuning
- fix double-free on socket dismantle
- wifi: nl80211: reject iftype change with mesh ID change
- fix small out-of-bound read when validating netlink be16/32 types
- rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
- ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
- ip_tunnel: prevent perpetual headroom growth with huge number of
tunnels on top of each other
- mctp: fix skb leaks on error paths of mctp_local_output()
- eth: ice: fixes for DPLL state reporting
- dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
- eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
tree"
* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
dpll: fix build failure due to rcu_dereference_check() on unknown type
kunit: Fix again checksum tests on big endian CPUs
tls: fix use-after-free on failed backlog decryption
tls: separate no-async decryption request handling from async
tls: fix peeking with sync+async decryption
tls: decrement decrypt_pending if no async completion will be called
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
igb: extend PTP timestamp adjustments to i211
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
tools: ynl: fix handling of multiple mcast groups
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
...
Christophe Leroy [Fri, 23 Feb 2024 10:41:52 +0000 (11:41 +0100)]
kunit: Fix again checksum tests on big endian CPUs
Commit b38460bc463c ("kunit: Fix checksum tests on big endian CPUs")
fixed endianness issues with kunit checksum tests, but then
commit 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and
ip_fast_csum") introduced new issues on big endian CPUs. Those issues
are once again reflected by the warnings reported by sparse.
So, fix them with the same approach, perform proper conversion in
order to support both little and big endian CPUs. Once the conversions
are properly done and the right types used, the sparse warnings are
cleared as well.
Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 29 Feb 2024 17:10:24 +0000 (09:10 -0800)]
Merge tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- mgmt: Fix limited discoverable off timeout
- hci_qca: Set BDA quirk bit if fwnode exists in DT
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_sync: Check the correct flag before starting a scan
- Enforce validation on max value of connection interval
- hci_sync: Fix accept_list when attempting to suspend
- hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
- Avoid potential use-after-free in hci_error_reset
- rfcomm: Fix null-ptr-deref in rfcomm_check_security
- hci_event: Fix wrongly recorded wakeup BD_ADDR
- qca: Fix wrong event type for patch config command
- qca: Fix triggering coredump implementation
* tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
Bluetooth: hci_sync: Fix accept_list when attempting to suspend
Bluetooth: Avoid potential use-after-free in hci_error_reset
Bluetooth: hci_sync: Check the correct flag before starting a scan
Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
====================
====================
tls: a few more fixes for async decrypt
The previous patchset [1] took care of "full async". This adds a few
fixes for cases where only part of the crypto operations go the async
route, found by extending my previous debug patch [2] to do N
synchronous operations followed by M asynchronous ops (with N and M
configurable).
Sabrina Dubroca [Wed, 28 Feb 2024 22:44:00 +0000 (23:44 +0100)]
tls: fix use-after-free on failed backlog decryption
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async
decryptions have completed. If one of them fails, tls_do_decryption
will return -EBADMSG and tls_decrypt_sg jumps to the error path,
releasing all the pages. But the pages have been passed to the async
callback, and have already been released by tls_decrypt_done.
The only true async case is when crypto_aead_decrypt returns
-EINPROGRESS. With -EBUSY, we already waited so we can tell
tls_sw_recvmsg that the data is available for immediate copy, but we
need to notify tls_decrypt_sg (via the new ->async_done flag) that the
memory has already been released.
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:59 +0000 (23:43 +0100)]
tls: separate no-async decryption request handling from async
If we're not doing async, the handling is much simpler. There's no
reference counting, we just need to wait for the completion to wake us
up and return its result.
We should preferably also use a separate crypto_wait. I'm not seeing a
UAF as I did in the past, I think aec7961916f3 ("tls: fix race between
async notify and socket close") took care of it.
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:58 +0000 (23:43 +0100)]
tls: fix peeking with sync+async decryption
If we peek from 2 records with a currently empty rx_list, and the
first record is decrypted synchronously but the second record is
decrypted async, the following happens:
1. decrypt record 1 (sync)
2. copy from record 1 to the userspace's msg
3. queue the decrypted record to rx_list for future read(!PEEK)
4. decrypt record 2 (async)
5. queue record 2 to rx_list
6. call process_rx_list to copy data from the 2nd record
We currently pass copied=0 as skip offset to process_rx_list, so we
end up copying once again from the first record. We should skip over
the data we've already copied.
Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:57 +0000 (23:43 +0100)]
tls: decrement decrypt_pending if no async completion will be called
With mixed sync/async decryption, or failures of crypto_aead_decrypt,
we increment decrypt_pending but we never do the corresponding
decrement since tls_decrypt_done will not be called. In this case, we
should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue prequeue test gets stuck with mixed
modes (one async decrypt + one sync decrypt).
The commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf") added a field in struct_netdevice, which tells what
type of statistics the driver supports.
That field is used primarily to allocate stats structures automatically,
but, it also could leveraged to simplify the drivers even further, such
as, if the driver relies in the default stats collection, then it
doesn't need to assign to .ndo_get_stats64. That means that drivers only
assign functions to .ndo_get_stats64 if they are using something
special.
I started to move some of these drivers[1][2][3] to use the core
allocation, and with this change in, I just need to touch the driver
once, and be able to simplify the whole stats allocation and collection
for generic case.
There are 44 devices today that could benefit from this simplification.
Breno Leitao [Wed, 28 Feb 2024 11:31:22 +0000 (03:31 -0800)]
net: sit: Do not set .ndo_get_stats64
If the driver is using the network core allocation mechanism, by setting
NETDEV_PCPU_STAT_TSTATS, as this driver is, then, it doesn't need to set
the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Since
the network core calls it automatically, and .ndo_get_stats64 should
only be set if the driver needs special treatment.
This simplifies the driver, since all the generic statistics is now
handled by core.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Wed, 28 Feb 2024 11:31:21 +0000 (03:31 -0800)]
net: get stats64 if device if driver is configured
If the network driver is relying in the net core to do stats allocation,
then we want to dev_get_tstats64() instead of netdev_stats_to_stats64(),
since there are per-cpu stats that needs to be taken in consideration.
This will also simplify the drivers in regard to statistics. Once the
driver sets NETDEV_PCPU_STAT_TSTATS, it doesn't not need to allocate the
stacks, neither it needs to set `.ndo_get_stats64 = dev_get_tstats64`
for the generic stats collection function anymore.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 29 Feb 2024 11:16:07 +0000 (12:16 +0100)]
Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.
There is a day 0 bug in br_netfilter when used with connection tracking.
Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.
For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.
This patch disables NAT and conntrack helpers for multicast packets.
Patch #3 adds a selftest to cover for the br_netfilter bug.
netfilter pull request 24-02-29
* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================
The current code adds extra two bytes (i.e. sizeof(struct hsr_sup_tlv))
when offset for skb_pull() is calculated.
This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
already have 'struct hsr_sup_tag' defined in them, so there is no need
for adding extra two bytes.
This code was working correctly as with no RedBox support, the check for
HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to
zeroed padded bytes for minimal packet size.
Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames") Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function
Amethyst family (MV88E6191X/6193X/6393X) has a simplified SMI GPIO setting
via the Scratch and Misc register so it requires family specific function.
In the v1 review, Andrew pointed out that it would make sense to rename the
existing mv88e6xxx_g2_scratch_gpio_set_smi as it only works on the MV6390
family.
Changes in v2:
* Add rename of mv88e6xxx_g2_scratch_gpio_set_smi to
mv88e6390_g2_scratch_gpio_set_smi
====================
Robert Marko [Tue, 27 Feb 2024 17:54:22 +0000 (18:54 +0100)]
net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function
The existing mv88e6390_g2_scratch_gpio_set_smi() cannot be used on the
88E6393X as it requires certain P0_MODE, it also checks the CPU mode
as it impacts the bit setting value.
This is all irrelevant for Amethyst (MV88E6191X/6193X/6393X) as only
the default value of the SMI_PHY Config bit is set to CPU_MGD bootstrap
pin value but it can be changed without restrictions so that GPIO pins
9 and 10 are used as SMI pins.
So, introduce Amethyst specific function and call that if the Amethyst
family wants to setup the external PHY.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Robert Marko <robimarko@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The name mv88e6xxx_g2_scratch_gpio_set_smi is a bit ambiguous as it appears
to only be applicable to the 6390 family, so lets rename it to
mv88e6390_g2_scratch_gpio_set_smi to make it more obvious.
Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 27 Feb 2024 22:22:59 +0000 (22:22 +0000)]
inet6: expand rcu_read_lock() scope in inet6_dump_addr()
I missed that inet6_dump_addr() is calling in6_dump_addrs()
from two points.
First one under RTNL protection, and second one under rcu_read_lock().
Since we want to remove RTNL use from inet6_dump_addr() very soon,
no longer assume in6_dump_addrs() is protected by RTNL (even
if this is still the case).
Use rcu_read_lock() earlier to fix this lockdep splat:
Eric Dumazet [Tue, 27 Feb 2024 21:01:04 +0000 (21:01 +0000)]
net: call skb_defer_free_flush() from __napi_busy_loop()
skb_defer_free_flush() is currently called from net_rx_action()
and napi_threaded_poll().
We should also call it from __napi_busy_loop() otherwise
there is the risk the percpu queue can grow until an IPI
is forced from skb_attempt_defer_free() adding a latency spike.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Samiullah Khawaja <skhawaja@google.com> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227210105.3815474-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Tue, 27 Feb 2024 18:49:41 +0000 (10:49 -0800)]
igb: extend PTP timestamp adjustments to i211
The i211 requires the same PTP timestamp adjustments as the i210,
according to its datasheet. To ensure consistent timestamping across
different platforms, this change extends the existing adjustments to
include the i211.
The adjustment result are tested and comparable for i210 and i211 based
systems.
Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Tue, 27 Feb 2024 18:23:36 +0000 (10:23 -0800)]
net: bridge: Do not allocate stats in the driver
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the bridge driver and leverage the network
core allocation.
Lin Ma [Tue, 27 Feb 2024 12:11:28 +0000 (20:11 +0800)]
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic
in the function `rtnl_bridge_setlink` to enable the loop to also check
the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment
removed the `break` statement and led to an error logic of the flags
writing back at the end of this function.
if (have_flags)
memcpy(nla_data(attr), &flags, sizeof(flags));
// attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS.
However, this is not necessarily true fow now as the updated loop will let
the attr point to the last NLA, even an invalid NLA which could cause
overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer
that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned
error logic.
Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhengchao Shao [Tue, 27 Feb 2024 09:36:04 +0000 (17:36 +0800)]
netlabel: remove impossible return value in netlbl_bitmap_walk
Since commit 446fda4f2682 ("[NetLabel]: CIPSOv4 engine"), *bitmap_walk
function only returns -1. Nearly 18 years have passed, -2 scenes never
come up, so there's no need to consider it.
Eric Dumazet [Tue, 27 Feb 2024 09:24:11 +0000 (09:24 +0000)]
inet: use xa_array iterator to implement inet_netconf_dump_devconf()
1) inet_netconf_dump_devconf() can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
3) Do not use inet_base_seq() anymore, for_each_netdev_dump()
has nice properties. Restarting a GETDEVCONF dump if a device has
been added/removed or if net->ipv4.dev_addr_genid has changed is moot.
Catalin Popescu [Mon, 26 Feb 2024 16:23:39 +0000 (17:23 +0100)]
net: phy: dp83826: disable WOL at init
Commit d1d77120bc28 ("net: phy: dp83826: support TX data voltage tuning")
introduced a regression in that WOL is not disabled by default for DP83826.
WOL should normally be enabled through ethtool.
Chengming Zhou [Wed, 28 Feb 2024 03:06:58 +0000 (03:06 +0000)]
net: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the
series[1] went on to mark it obsolete to avoid confusion for users.
Here we can just remove all its users, which has no functional change.
Jakub Kicinski [Wed, 28 Feb 2024 23:25:47 +0000 (15:25 -0800)]
Merge branch 'tools-ynl-stop-using-libmnl'
Jakub Kicinski says:
====================
tools: ynl: stop using libmnl
There is no strong reason to stop using libmnl in ynl but there
are a few small ones which add up.
First (as I remembered immediately after hitting send on v1),
C++ compilers do not like the libmnl for_each macros.
I haven't tried it myself, but having all the code directly
in YNL makes it easier for folks porting to C++ to modify them
and/or make YNL more C++ friendly.
Second, we do much more advanced netlink level parsing in ynl
than libmnl so it's hard to say that libmnl abstracts much from us.
The fact that this series, removing the libmnl dependency, only
adds <300 LoC shows that code savings aren't huge.
OTOH when new types are added (e.g. auto-int) we need to add
compatibility to deal with older version of libmnl (in fact,
even tho patches have been sent months ago, auto-ints are still
not supported in libmnl.git).
Thrid, the dependency makes ynl less self contained, and harder
to vendor in. Whether vendoring libraries into projects is a good
idea is a separate discussion, nonetheless, people want to do it.
Fourth, there are small annoyances with the libmnl APIs which
are hard to fix in backward-compatible ways. See the last patch
for example.
All in all, libmnl is a great library, but with all the code
generation and structured parsing, ynl is better served by going
its own way.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:32 +0000 (14:30 -0800)]
tools: ynl: use MSG_DONTWAIT for getting notifications
To stick to libmnl wrappers in the past we had to use poll()
to check if there are any outstanding notifications on the socket.
This is no longer necessary, we can use MSG_DONTWAIT.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:30 +0000 (14:30 -0800)]
tools: ynl: stop using mnl socket helpers
Most libmnl socket helpers can be replaced by direct calls to
the underlying libc API. We need portid, the netlink manpage
suggests we bind() address of zero.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:27 +0000 (14:30 -0800)]
tools: ynl: stop using mnl_cb_run2()
There's only one set of callbacks in YNL, for netlink control
messages, and most of them are trivial. So implement the message
walking directly without depending on mnl_cb_run2().
Jakub Kicinski [Tue, 27 Feb 2024 22:30:26 +0000 (14:30 -0800)]
tools: ynl: use ynl_sock_read_msgs() for ACK handling
ynl_recv_ack() is simple and it's the only user of mnl_cb_run().
Now that ynl_sock_read_msgs() exists it's actually less code
to use ynl_sock_read_msgs() instead of being special.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:25 +0000 (14:30 -0800)]
tools: ynl: wrap recv() + mnl_cb_run2() into a single helper
All callers to mnl_cb_run2() call mnl_socket_recvfrom() right before.
Wrap the two in a helper, take typed arguments (struct ynl_parse_arg),
instead of hoping that all callers remember that parser error handling
requires yarg.
In case of ynl_sock_read_family() we will no longer check for kernel
returning no data, but that would be a kernel bug, not worth complicating
the code to catch this. Calling mnl_cb_run2() on an empty buffer
is legal and results in STOP (1).
Jakub Kicinski [Tue, 27 Feb 2024 22:30:24 +0000 (14:30 -0800)]
tools: ynl-gen: remove unused parse code
Commit f2ba1e5e2208 ("tools: ynl-gen: stop generating common notification handlers")
removed the last caller of the parse_cb_run() helper.
We no longer need to export ynl_cb_array.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:23 +0000 (14:30 -0800)]
tools: ynl: make yarg the first member of struct ynl_dump_state
All YNL parsing code expects a pointer to struct ynl_parse_arg AKA yarg.
For dump was pass in struct ynl_dump_state, which works fine, because
struct ynl_dump_state and struct ynl_parse_arg have identical layout
for the members that matter.. but it's a bit hacky.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:19 +0000 (14:30 -0800)]
tools: ynl: create local attribute helpers
Don't use mnl attr helpers, we're trying to remove the libmnl
dependency. Create both signed and unsigned helpers, libmnl
had unsigned helpers, so code generator no longer needs
the mnl_type() hack.
The new helpers are written from first principles, but are
hopefully not too buggy.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:18 +0000 (14:30 -0800)]
tools: ynl: give up on libmnl for auto-ints
The temporary auto-int helpers are not really correct.
We can't treat signed and unsigned ints the same when
determining whether we need full 8B. I realized this
before sending the patch to add support in libmnl.
Unfortunately, that patch has not been merged,
so time to fix our local helpers. Use the mnl* name
for now, subsequent patches will address that.
Jakub Kicinski [Mon, 26 Feb 2024 21:40:18 +0000 (13:40 -0800)]
tools: ynl: fix handling of multiple mcast groups
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.
As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:
$ ./samples/netdev
YNL: Multicast group 'mgmt' not found
Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.
Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 26 Feb 2024 22:58:06 +0000 (14:58 -0800)]
tools: ynl: protect from old OvS headers
Since commit 7c59c9c8f202 ("tools: ynl: generate code for ovs families")
we need relatively recent OvS headers to get YNL to compile.
Add the direct include workaround to fix compilation on less
up-to-date OSes like CentOS 9.