Jijie Shao [Tue, 15 Oct 2024 12:35:12 +0000 (20:35 +0800)]
net: hibmcge: Implement .ndo_start_xmit function
Implement .ndo_start_xmit function to fill the information of the packet
to be transmitted into the tx descriptor, and then the hardware will
transmit the packet using the information in the tx descriptor.
In addition, we also implemented the tx_handler function to enable the
tx descriptor to be reused, and .ndo_tx_timeout function to print some
information when the hardware is busy.
Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Tue, 15 Oct 2024 12:35:11 +0000 (20:35 +0800)]
net: hibmcge: Implement some .ndo functions
Implement the .ndo_open() .ndo_stop() .ndo_set_mac_address()
and .ndo_change_mtu functions().
And .ndo_validate_addr calls the eth_validate_addr function directly
Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Tue, 15 Oct 2024 12:35:10 +0000 (20:35 +0800)]
net: hibmcge: Add interrupt supported in this module
The driver supports four interrupts: TX interrupt, RX interrupt,
mdio interrupt, and error interrupt.
Actually, the driver does not use the mdio interrupt.
Therefore, the driver does not request the mdio interrupt.
The error interrupt distinguishes different error information
by using different masks. To distinguish different errors,
the statistics count is added for each error.
To ensure the consistency of the code process, masks are added for the
TX interrupt and RX interrupt.
This patch implements interrupt request, and provides a
unified entry for the interrupt handler function. However,
the specific interrupt handler function of each interrupt
is not implemented currently.
Because of pcim_enable_device(), the interrupt vector
is already device managed and does not need to be free actively.
Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Tue, 15 Oct 2024 12:35:08 +0000 (20:35 +0800)]
net: hibmcge: Add read/write registers supported through the bar space
Add support for to read and write registers through the pic bar space.
Some driver parameters, such as mac_id, are determined by the
board form. Therefore, these parameters are initialized
from the register as device specifications.
the device specifications register are initialized and written by bmc.
driver will read these registers when loading.
Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Tue, 15 Oct 2024 12:35:07 +0000 (20:35 +0800)]
net: hibmcge: Add pci table supported in this module
Add pci table supported in this module, and implement pci_driver function
to initialize this driver.
hibmcge is a passthrough network device. Its software runs
on the host side, and the MAC hardware runs on the BMC side
to reduce the host CPU area. The software interacts with the
MAC hardware through the PCIe.
WangYuli [Fri, 18 Oct 2024 02:19:10 +0000 (10:19 +0800)]
eth: Fix typo 'accelaration'. 'exprienced' and 'rewritting'
There are some spelling mistakes of 'accelaration', 'exprienced' and
'rewritting' in comments which should be 'acceleration', 'experienced'
and 'rewriting'.
Suggested-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/all/20241017162846.GA51712@kernel.org/ Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <90D42CB167CA0842+20241018021910.31359-1-wangyuli@uniontech.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Heiner Kallweit [Thu, 17 Oct 2024 20:27:44 +0000 (22:27 +0200)]
r8169: enable EEE at 2.5G per default on RTL8125B
Register a6d/12 is shadowing register MDIO_AN_EEE_ADV2. So this line
disables advertisement of EEE at 2.5G. Latest vendor driver r8125
doesn't do this (any longer?), so this mode seems to be safe.
EEE saves quite some energy, therefore enable this mode per default.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <95dd5a0c-09ea-4847-94d9-b7aa3063e8ff@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Heiner Kallweit [Thu, 17 Oct 2024 16:01:13 +0000 (18:01 +0200)]
net: phy: realtek: add RTL8125D-internal PHY
The first boards show up with Realtek's RTL8125D. This MAC/PHY chip
comes with an integrated 2.5Gbps PHY with ID 0x001cc841. It's not
clear yet whether there's an external version of this PHY and how
Realtek calls it, therefore use the numeric id for now.
Heiner Kallweit [Wed, 16 Oct 2024 20:29:39 +0000 (22:29 +0200)]
r8169: avoid duplicated messages if loading firmware fails and switch to warn level
In case of a problem with firmware loading we inform at the driver level,
in addition the firmware load code itself issues warnings. Therefore
switch to firmware_request_nowarn() to avoid duplicated error messages.
In addition switch to warn level because the firmware is optional and
typically just fixes compatibility issues.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <d9c5094c-89a6-40e2-b5fe-8df7df4624ef@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Heiner Kallweit [Wed, 16 Oct 2024 20:06:53 +0000 (22:06 +0200)]
r8169: replace custom flag with disable_work() et al
So far we use a custom flag to define when a task can be scheduled and
when not. Let's use the standard mechanism with disable_work() et al
instead.
Note that in rtl8169_close() we can remove the call to cancel_work()
because we now call disable_work_sync() in rtl8169_down() already.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Heiner Kallweit [Wed, 16 Oct 2024 20:05:57 +0000 (22:05 +0200)]
r8169: don't take RTNL lock in rtl_task()
There's not really a benefit here in taking the RTNL lock. The task
handler does exception handling only, so we're in trouble anyway when
we come here, and there's no need to protect against e.g. a parallel
ethtool call.
A benefit of removing the RTNL lock here is that we now can
synchronously cancel the workqueue from a context holding the RTNL mutex.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
fbnic fails to link as built-in when PTP support is in a loadable
module:
aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_ethtool.o: in function `fbnic_get_ts_info':
fbnic_ethtool.c:(.text+0x428): undefined reference to `ptp_clock_index'
aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_time.o: in function `fbnic_time_start':
fbnic_time.c:(.text+0x820): undefined reference to `ptp_schedule_worker'
aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_time.o: in function `fbnic_ptp_setup':
fbnic_time.c:(.text+0xa68): undefined reference to `ptp_clock_register'
Menglong Dong [Tue, 15 Oct 2024 09:02:44 +0000 (17:02 +0800)]
net: vxlan: update the document for vxlan_snoop()
The function vxlan_snoop() returns drop reasons now, so update the
document of it too.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Tue, 15 Oct 2024 08:28:30 +0000 (16:28 +0800)]
net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUND
Replace the drop reason "SKB_DROP_REASON_VXLAN_INVALID_HDR" with
"SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND" in encap_bypass_if_local(), as the
latter is more accurate.
Fixes: 790961d88b0e ("net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 15 Oct 2024 07:58:09 +0000 (09:58 +0200)]
net: airoha: Fix typo in REG_CDM2_FWD_CFG configuration
Fix typo in airoha_fe_init routine configuring CDM2_OAM_QSEL_MASK field
of REG_CDM2_FWD_CFG register.
This bug is not introducing any user visible problem since Frame Engine
CDM2 port is used just by the second QDMA block and we currently enable
just QDMA1 block connected to the MT7530 dsa switch via CDM1 port.
Introduced by commit 23020f049327 ("net: airoha: Introduce ethernet
support for EN7581 SoC")
Reported-by: ChihWei Cheng <chihwei.cheng@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241015-airoha-eth-cdm2-fixes-v1-1-9dc6993286c3@kernel.org> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Paul Barker [Tue, 15 Oct 2024 13:36:31 +0000 (14:36 +0100)]
net: ravb: Simplify UDP TX checksum offload
The GbEth IP will pass through a zero UDP checksum without asserting any
error flags so we do not need to resort to software checksum calculation
in this case.
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Paul Barker [Tue, 15 Oct 2024 13:36:30 +0000 (14:36 +0100)]
net: ravb: Disable IP header TX checksum offloading
For IPv4 packets, the header checksum will always be calculated in software
in the TX path (Documentation/networking/checksum-offloads.rst says "No
offloading of the IP header checksum is performed; it is always done in
software.") so there is no advantage in asking the hardware to also
calculate this checksum.
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Paul Barker [Tue, 15 Oct 2024 13:36:29 +0000 (14:36 +0100)]
net: ravb: Simplify types in RX csum validation
The hardware checksum value is used as a 16-bit flag, it is zero when
the checksum has been validated and non-zero otherwise. Therefore we
don't need to treat this as an actual __wsum type or call csum_unfold(),
we can just use a u16 pointer.
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Paul Barker [Tue, 15 Oct 2024 13:36:28 +0000 (14:36 +0100)]
net: ravb: Combine if conditions in RX csum validation
We can merge the two if conditions on skb_is_nonlinear(). Since
skb_frag_size_sub() and skb_trim() do not free memory, it is still safe
to access the trimmed bytes at the end of the packet after these calls.
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Paul Barker [Tue, 15 Oct 2024 13:36:26 +0000 (14:36 +0100)]
net: ravb: Disable IP header RX checksum offloading
For IPv4 packets, the header checksum will always be checked in software
in the RX path (inet_gro_receive() calls ip_fast_csum() unconditionally)
so there is no advantage in asking the hardware to also calculate this
checksum.
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Andy Shevchenko [Wed, 16 Oct 2024 09:05:54 +0000 (12:05 +0300)]
tg3: Increase buffer size for IRQ label
GCC is not happy with the current code, e.g.:
.../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=]
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
| ^~~~~~
.../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
When `make W=1` is supplied, this prevents kernel building. Fix it by
increasing the buffer size for IRQ label and use sizeoF() instead of
hard coded constants.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Message-ID: <20241016090647.691022-1-andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:44 +0000 (10:58 +0100)]
net: phylink: remove "using_mac_select_pcs"
With DSA's implementation of the mac_select_pcs() method removed, we
can now remove the detection of mac_select_pcs() implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:39 +0000 (10:58 +0100)]
net: phylink: remove use of pl->pcs in phylink_validate_mac_and_pcs()
When the mac_select_pcs() method is not implemented, there is no way
for pl->pcs to be set to a non-NULL value. This was here to support
the old phylink_set_pcs() method which has been removed a few years
ago. Simplify the code in phylink_validate_mac_and_pcs().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:34 +0000 (10:58 +0100)]
net: phylink: allow mac_select_pcs() to remove a PCS
phylink has historically not permitted a PCS to be removed. An attempt
to permit this with phylink_set_pcs() resulted in comments indicating
that there was no need for this. This behaviour has been propagated
forward to the mac_select_pcs() approach as it was believed from these
comments that changing this would be NAK'd.
However, with mac_select_pcs(), it takes more code and thus complexity
to maintain this behaviour, which can - and in this case has - resulted
in a bug. If mac_select_pcs() returns NULL for a particular interface
type, but there is already a PCS in-use, then we skip the pcs_validate()
method, but continue using the old PCS. Also, it wouldn't be expected
behaviour by implementers of mac_select_pcs().
Allow this by removing this old unnecessary restriction.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
There is no longer any reason to implement the mac_select_pcs()
callback in DSA. Returning ERR_PTR(-EOPNOTSUPP) is functionally
equivalent to not providing the function.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Andy Shevchenko [Wed, 16 Oct 2024 13:25:26 +0000 (16:25 +0300)]
net: ks8851: use %*ph to print small buffer
Use %*ph format to print small buffer as hex string. It will change
the output format from 32-bit words to byte hexdump, but this is not
critical as it's only a debug message.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241016132615.899037-1-andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Simon Horman [Wed, 16 Oct 2024 14:31:14 +0000 (15:31 +0100)]
net: usb: sr9700: only store little-endian values in __le16 variable
In sr_mdio_read() the local variable res is used to store both
little-endian and host byte order values. This prevents Sparse
from helping us by flagging when endian miss matches occur - the
detection process hinges on the type of variables matching the
byte order of values stored in them.
Address this by adding a new local variable, word, to store little-endian
values; change the type of res to int, and use it to store host-byte
order values.
Flagged by Sparse as:
.../sr9700.c:205:21: warning: incorrect type in assignment (different base types)
.../sr9700.c:205:21: expected restricted __le16 [addressable] [usertype] res
.../sr9700.c:205:21: got int
.../sr9700.c:207:21: warning: incorrect type in assignment (different base types)
.../sr9700.c:207:21: expected restricted __le16 [addressable] [usertype] res
.../sr9700.c:207:21: got int
.../sr9700.c:212:16: warning: incorrect type in return expression (different base types)
.../sr9700.c:212:16: expected int
.../sr9700.c:212:16: got restricted __le16 [addressable] [usertype] res
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Message-ID: <20241016-blackbird-le16-v1-1-97ba8de6b38f@kernel.org> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
The *ndev pointer needs to be set or it leads to an uninitialized variable
bug in the caller.
Fixes: 4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org>
Message-ID: <b168d5c7-704b-4452-84f9-1c1762b1f4ce@stanley.mountain> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Linus Torvalds [Thu, 17 Oct 2024 16:31:18 +0000 (09:31 -0700)]
Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in
vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration
for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers"
* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5e: Don't call cleanup on profile rollback failure
net/mlx5: Unregister notifier on eswitch init failure
net/mlx5: Fix command bitmask initialization
net/mlx5: Check for invalid vector index on EQ creation
net/mlx5: HWS, use lock classes for bwc locks
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
net/mlx5: HWS, fixed double free in error flow of definer layout
net/mlx5: HWS, removed wrong access to a number of rules variable
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
net: phy: mdio-bcm-unimac: Add BCM6846 support
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
udp: Compute L4 checksum as usual when not segmenting the skb
genetlink: hold RCU in genlmsg_mcast()
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
...
Heiner Kallweit [Tue, 15 Oct 2024 05:47:14 +0000 (07:47 +0200)]
net: phy: realtek: merge the drivers for internal NBase-T PHY's
The Realtek RTL8125/RTL8126 NBase-T MAC/PHY chips have internal PHY's
which are register-compatible, at least for the registers we use here.
So let's use just one PHY driver to support all of them.
These internal PHY's exist also as external C45 PHY's, but on the
internal PHY's no access to MMD registers is possible. This can be
used to differentiate between the internal and external version.
As a side effect the drivers for two now external-only drivers don't
require read_mmd/write_mmd hooks any longer.
Sanman Pradhan [Mon, 14 Oct 2024 15:27:09 +0000 (08:27 -0700)]
eth: fbnic: Add hardware monitoring support via HWMON interface
This patch adds support for hardware monitoring to the fbnic driver,
allowing for temperature and voltage sensor data to be exposed to
userspace via the HWMON interface. The driver registers a HWMON device
and provides callbacks for reading sensor data, enabling system
admins to monitor the health and operating conditions of fbnic.
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:08 +0000 (12:32 +0300)]
net/mlx5e: Don't call cleanup on profile rollback failure
When profile rollback fails in mlx5e_netdev_change_profile, the netdev
profile var is left set to NULL. Avoid a crash when unloading the driver
by not calling profile->cleanup in such a case.
This was encountered while testing, with the original trigger that
the wq rescuer thread creation got interrupted (presumably due to
Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by
mlx5e_priv_init, the profile rollback also fails for the same reason
(signal still active) so the profile is left as NULL, leading to a crash
later in _mlx5e_remove.
Shay Drory [Tue, 15 Oct 2024 09:32:06 +0000 (12:32 +0300)]
net/mlx5: Fix command bitmask initialization
Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit
isn't Initialize during command bitmask Initialization, only during
MANAGE_PAGES.
In addition, mlx5_cmd_trigger_completions() is trying to trigger
completion for MANAGE_PAGES command as well.
Hence, in case health error occurred before any MANAGE_PAGES command
have been invoke (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for
MANAGE_PAGES command, which will result in null-ptr-deref error.[1]
Fix it by Initialize command bitmask correctly.
While at it, re-write the code for better understanding.
Maher Sanalla [Tue, 15 Oct 2024 09:32:05 +0000 (12:32 +0300)]
net/mlx5: Check for invalid vector index on EQ creation
Currently, mlx5 driver does not enforce vector index to be lower than
the maximum number of supported completion vectors when requesting a
new completion EQ. Thus, mlx5_comp_eqn_get() fails when trying to
acquire an IRQ with an improper vector index.
To prevent the case above, enforce that vector index value is
valid and lower than maximum in mlx5_comp_eqn_get() before handling the
request.
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:04 +0000 (12:32 +0300)]
net/mlx5: HWS, use lock classes for bwc locks
The HWS BWC API uses one lock per queue and usually acquires one of
them, except when doing changes which require locking all queues in
order. Naturally, lockdep isn't too happy about acquiring the same lock
class multiple times, so inform it that each queue lock is a different
class to avoid false positives.
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:03 +0000 (12:32 +0300)]
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
hws_send_queues_bwc_locks_destroy destroyed more queue locks than
allocated, leading to memory corruption (occasionally) and warnings such
as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because
sometimes, the 'mutex' being destroyed was random memory.
The severity of this problem is proportional to the number of queues
configured because the code overreaches beyond the end of the
bwc_send_queue_locks array by 2x its length.
Fix that by using the correct number of bwc queues.
Yevgeny Kliteynik [Tue, 15 Oct 2024 09:32:01 +0000 (12:32 +0300)]
net/mlx5: HWS, removed wrong access to a number of rules variable
Removed wrong access to the num_of_rules field of the matcher.
This is a usual u32 variable, but the access was as if it was atomic.
This fixes the following CI warnings:
mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty;
the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]
Fixes: 510f9f61a112 ("net/mlx5: HWS, added API and enabled HWS support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409291101.6NdtMFVC-lkp@intel.com/ Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
Syzkaller reported this splat:
==================================================================
BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662
The buggy address belongs to the object at ffff8880569ac800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 88 bytes inside of
freed 512-byte region [ffff8880569ac800, ffff8880569aca00)
That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.
Fixes: 1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req") Cc: stable@vger.kernel.org Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com Closes: https://lore.kernel.org/670d7337.050a0220.4cbc0.004f.GAE@google.com Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Felix Fietkau [Tue, 15 Oct 2024 08:17:55 +0000 (10:17 +0200)]
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
The loop responsible for allocating up to MTK_FQ_DMA_LENGTH buffers must
only touch as many descriptors, otherwise it ends up corrupting unrelated
memory. Fix the loop iteration count accordingly.
Fixes: c57e55819443 ("net: ethernet: mtk_eth_soc: handle dma buffer size soc specific") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241015081755.31060-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Mon, 14 Oct 2024 19:03:11 +0000 (21:03 +0200)]
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
Andrew and Nikolay reported connectivity issues with Cilium's service
load-balancing in case of vmxnet3.
If a BPF program for native XDP adds an encapsulation header such as
IPIP and transmits the packet out the same interface, then in case
of vmxnet3 a corrupted packet is being sent and subsequently dropped
on the path.
vmxnet3_xdp_xmit_frame() which is called e.g. via vmxnet3_run_xdp()
through vmxnet3_xdp_xmit_back() calculates an incorrect DMA address:
The above assumes a fixed offset (VMXNET3_XDP_HEADROOM), but the XDP
BPF program could have moved xdp->data. While the passed buf_size is
correct (xdpf->len), the dma_addr needs to have a dynamic offset which
can be calculated as xdpf->data - (void *)xdpf, that is, xdp->data -
xdp->data_hard_start.
Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.") Reported-by: Andrew Sauber <andrew.sauber@isovalent.com> Reported-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com> Acked-by: Anton Protopopov <aspsk@isovalent.com> Cc: William Tu <witu@nvidia.com> Cc: Ronak Doshi <ronak.doshi@broadcom.com> Link: https://patch.msgid.link/a0888656d7f09028f9984498cc698bb5364d89fc.1728931137.git.daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
It should be invalid to delete an rss context while it is being
referenced from an ntuple filter. ethtool core should prevent this
from happening. This patch adds a testcase to verify this behavior.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Zahka [Fri, 11 Oct 2024 18:35:47 +0000 (11:35 -0700)]
ethtool: rss: prevent rss ctx deletion when in use
ntuple filters can specify an rss context to use for packet hashing
and queue selection. When a filter is referencing an rss context, it
should be invalid for that context to be deleted. A list of active
ntuple filters and their associated rss contexts can be compiled by
querying a device's ethtool_ops.get_rxnfc. This patch checks to see if
any ntuple filters are referencing an rss context during context
deletion, and prevents the deletion if the requested context is still
in use.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Thu, 10 Oct 2024 13:07:26 +0000 (14:07 +0100)]
net: phy: realtek: change order of calls in C22 read_status()
Always call rtlgen_read_status() first, so genphy_read_status() which
is called by it clears bits in case auto-negotiation has not completed.
Also clear 10GBT link-partner advertisement bits in case auto-negotiation
is disabled or has not completed.
Daniel Golle [Thu, 10 Oct 2024 13:07:16 +0000 (14:07 +0100)]
net: phy: realtek: read duplex and gbit master from PHYSR register
The PHYSR MMD register is present and defined equally for all RTL82xx
Ethernet PHYs.
Read duplex and Gbit master bits from rtlgen_decode_speed() and rename
it to rtlgen_decode_physr().
Linus Torvalds [Wed, 16 Oct 2024 20:37:59 +0000 (13:37 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several miscellaneous fixes. A lot of bnxt_re activity, there will be
more rc patches there coming.
- Many bnxt_re bug fixes - Memory leaks, kasn, NULL pointer deref,
soft lockups, error unwinding and some small functional issues
- Error unwind bug in rdma netlink
- Two issues with incorrect VLAN detection for iWarp
- skb_splice_from_iter() splat in siw
- Give SRP slab caches unique names to resolve the merge window
WARN_ON regression"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/bnxt_re: Fix the GID table length
RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
RDMA/bnxt_re: Fix a possible NULL pointer dereference
RDMA/bnxt_re: Return more meaningful error
RDMA/bnxt_re: Fix incorrect dereference of srq in async event
RDMA/bnxt_re: Fix out of bound check
RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
RDMA/srpt: Make slab cache names unique
RDMA/irdma: Fix misspelling of "accept*"
RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
RDMA/core: Fix ENODEV error for iWARP test over vlan
RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
RDMA/bnxt_re: Add a check for memory allocation
RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
RDMA/bnxt_re: Fix a possible memory leak
Linus Torvalds [Wed, 16 Oct 2024 16:30:20 +0000 (09:30 -0700)]
Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix: dirty extents tracked in xarray for qgroups must be
adjusted for 32bit platforms
- fix potentially freeing uninitialized name in fscrypt structure
- fix warning about unneeded variable in a send callback
* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix uninitialized pointer free on read_alloc_one_name() error
btrfs: send: cleanup unneeded return variable in changed_verity()
btrfs: fix uninitialized pointer free in add_inode_ref()
btrfs: use sector numbers as keys for the dirty extents xarray
Linus Torvalds [Wed, 16 Oct 2024 16:15:43 +0000 (09:15 -0700)]
Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix race between session setup and session logoff
- add supplementary group support
* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add support for supplementary groups
ksmbd: fix user-after-free from session log off
Linus Torvalds [Wed, 16 Oct 2024 02:47:19 +0000 (19:47 -0700)]
Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- More issues reported in the enable/disable paths on large machines
with many tasks due to scx_tasks_lock being held too long. Break up
the task iterations
- Remove ops.select_cpu() dependency in bypass mode so that a
misbehaving implementation can't live-lock the machine by pushing all
tasks to few CPUs in bypass mode
- Other misc fixes
* tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Remove unnecessary cpu_relax()
sched_ext: Don't hold scx_tasks_lock for too long
sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
sched_ext: bypass mode shouldn't depend on ops.select_cpu()
sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
sched_ext: Start schedulers with consistent p->scx.slice values
Revert "sched_ext: Use shorter slice while bypassing"
sched_ext: use correct function name in pick_task_scx() warning message
selftests: sched_ext: Add sched_ext as proper selftest target
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:26 +0000 (13:18 -0700)]
dcb: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:25 +0000 (13:18 -0700)]
ipmr: Use rtnl_register_many().
We will remove rtnl_register() and rtnl_register_module() in favour
of rtnl_register_many().
When it succeeds for built-in callers, rtnl_register_many() guarantees
all rtnetlink types in the passed array are supported, and there is no
chance that a part of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:23 +0000 (13:18 -0700)]
ipv4: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:22 +0000 (13:18 -0700)]
net: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:21 +0000 (13:18 -0700)]
net: sched: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Let's use rtnl_register_many() instead.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20241014201828.91221-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:20 +0000 (13:18 -0700)]
neighbour: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:19 +0000 (13:18 -0700)]
rtnetlink: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().
When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:18 +0000 (13:18 -0700)]
rtnetlink: Panic when __rtnl_register_many() fails for builtin callers.
We will replace all rtnl_register() and rtnl_register_module() with
rtnl_register_many().
Currently, rtnl_register() returns nothing and prints an error message
when it fails to register a rtnetlink message type and handlers.
The failure happens only when rtnl_register_internal() fails to allocate
rtnl_msg_handlers[protocol][msgtype], but it's unlikely for built-in
callers on boot time.
rtnl_register_many() unwinds the previous successful registrations on
failure and returns an error, but it will be useless for built-in callers,
especially some subsystems that do not have the legacy ioctl() interface
and do not work without rtnetlink.
Instead of booting up without rtnetlink functionality, let's panic on
failure for built-in rtnl_register_many() callers.
Jakub Kicinski [Wed, 16 Oct 2024 01:50:14 +0000 (18:50 -0700)]
Merge branch 'gve-adopt-page-pool'
Harshitha Ramamurthy says:
====================
gve: adopt page pool
This patchset implements page pool support for gve.
The first patch deals with movement of code to make
page pool adoption easier in the next patch. The
second patch adopts the page pool API. The third patch
adds basic per queue stats which includes page pool
allocation failures as well.
====================
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:08 +0000 (13:21 -0700)]
gve: add support for basic queue stats
Implement netdev_stats_ops to export basic per-queue stats.
With page pool support for DQO added in the previous patches,
rx-alloc-fail captures failures in page pool allocations as
well since the rx_buf_alloc_fail stat tracked in the driver
is incremented when gve_alloc_buffer returns error.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241014202108.1051963-4-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:07 +0000 (13:21 -0700)]
gve: adopt page pool for DQ RDA mode
For DQ queue format in raw DMA addressing(RDA) mode,
implement page pool recycling of buffers by leveraging
a few helper functions.
DQ QPL mode will continue to use the exisiting recycling
logic. This is because in QPL mode, the pages come from a
constant set of pages that the driver pre-allocates and
registers with the device.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241014202108.1051963-3-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:06 +0000 (13:21 -0700)]
gve: move DQO rx buffer management related code to a new file
In preparation for the upcoming page pool adoption for DQO
raw addressing mode, move RX buffer management code to a new
file. In the follow on patches, page pool code will be added
to this file.
No functional change, just movement of code.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241014202108.1051963-2-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
do not leave dangling sk pointers in pf->create functions
Some protocol family create() implementations have an error path after
allocating the sk object and calling sock_init_data(). sock_init_data()
attaches the allocated sk object to the sock object, provided by the
caller.
If the create() implementation errors out after calling sock_init_data(),
it releases the allocated sk object, but the caller ends up having a
dangling sk pointer in its sock object on return. Subsequent manipulations
on this sock object may try to access the sk pointer, because it is not
NULL thus creating a use-after-free scenario.
We have implemented a stable hotfix in commit 631083143315
("net: explicitly clear the sk pointer, when pf->create fails"), but this
series aims to fix it properly by going through each of the pf->create()
implementations and making sure they all don't return a sock object with
a dangling pointer on error.
====================
Ignat Korchagin [Mon, 14 Oct 2024 15:38:06 +0000 (16:38 +0100)]
net: inet6: do not leave a dangling sk pointer in inet6_create()
sock_init_data() attaches the allocated sk pointer to the provided sock
object. If inet6_create() fails later, the sk object is released, but the
sock object retains the dangling sk pointer, which may cause use-after-free
later.
Clear the sock sk pointer on error.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-8-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignat Korchagin [Mon, 14 Oct 2024 15:38:05 +0000 (16:38 +0100)]
net: inet: do not leave a dangling sk pointer in inet_create()
sock_init_data() attaches the allocated sk object to the provided sock
object. If inet_create() fails later, the sk object is freed, but the
sock object retains the dangling pointer, which may create use-after-free
later.
Clear the sk pointer in the sock object on error.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-7-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignat Korchagin [Mon, 14 Oct 2024 15:38:04 +0000 (16:38 +0100)]
net: ieee802154: do not leave a dangling sk pointer in ieee802154_create()
sock_init_data() attaches the allocated sk object to the provided sock
object. If ieee802154_create() fails later, the allocated sk object is
freed, but the dangling pointer remains in the provided sock object, which
may allow use-after-free.
Ignat Korchagin [Mon, 14 Oct 2024 15:38:03 +0000 (16:38 +0100)]
net: af_can: do not leave a dangling sk pointer in can_create()
On error can_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object. This will leave a
dangling sk pointer in the sock object and may cause use-after-free later.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://patch.msgid.link/20241014153808.51894-5-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignat Korchagin [Mon, 14 Oct 2024 15:38:02 +0000 (16:38 +0100)]
Bluetooth: RFCOMM: avoid leaving dangling sk pointer in rfcomm_sock_alloc()
bt_sock_alloc() attaches allocated sk object to the provided sock object.
If rfcomm_dlc_alloc() fails, we release the sk object, but leave the
dangling pointer in the sock object, which may cause use-after-free.
Fix this by swapping calls to bt_sock_alloc() and rfcomm_dlc_alloc().
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-4-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignat Korchagin [Mon, 14 Oct 2024 15:38:01 +0000 (16:38 +0100)]
Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()
bt_sock_alloc() allocates the sk object and attaches it to the provided
sock object. On error l2cap_sock_alloc() frees the sk object, but the
dangling pointer is still attached to the sock object, which may create
use-after-free in other code.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-3-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ignat Korchagin [Mon, 14 Oct 2024 15:38:00 +0000 (16:38 +0100)]
af_packet: avoid erroring out after sock_init_data() in packet_create()
After sock_init_data() the allocated sk object is attached to the provided
sock object. On error, packet_create() frees the sk object leaving the
dangling pointer in the sock object on return. Some other code may try
to use this pointer and cause use-after-free.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014153808.51894-2-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Mon, 14 Oct 2024 15:30:41 +0000 (18:30 +0300)]
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
Similar to the situation described for sja1105 in commit 1f9fc48fd302
("net: dsa: sja1105: fix reception from VLAN-unaware bridges"), the
vsc73xx driver uses tag_8021q and doesn't need the ds->untag_bridge_pvid
request. In fact, this option breaks packet reception.
The ds->untag_bridge_pvid option strips VLANs from packets received on
VLAN-unaware bridge ports. But those VLANs should already be stripped
by tag_vsc73xx_8021q.c as part of vsc73xx_rcv() - they are not VLANs in
VLAN-unaware mode, but DSA tags. Thus, dsa_software_vlan_untag() tries
to untag a VLAN that doesn't exist, corrupting the packet.
Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges") Tested-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20241014153041.1110364-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niklas Söderlund [Mon, 14 Oct 2024 12:43:43 +0000 (14:43 +0200)]
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
Recent work moving the reporting of Rx software timestamps to the core
[1] highlighted an issue where hardware time stamping was advertised
for the platforms where it is not supported.
Fix this by covering advertising support for hardware timestamps only if
the hardware supports it. Due to the Tx implementation in RAVB software
Tx timestamping is also only considered if the hardware supports
hardware timestamps. This should be addressed in future, but this fix
only reflects what the driver currently implements.
1. Commit 277901ee3a26 ("ravb: Remove setting of RX software timestamp")
Fixes: 7e09a052dc4e ("ravb: Exclude gPTP feature support for RZ/G2L") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com> Tested-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/20241014124343.3875285-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jinjie Ruan [Mon, 14 Oct 2024 12:19:22 +0000 (20:19 +0800)]
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
Commit a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in
kunit test") fixed the use-after-free error, but introduced below
memory leaks by removing necessary vcap_free_rule(), add it to fix it.
Elena Salomatkina [Sun, 13 Oct 2024 12:45:29 +0000 (15:45 +0300)]
net/sched: cbs: Fix integer overflow in cbs_set_port_rate()
The subsequent calculation of port_rate = speed * 1000 * BYTES_PER_KBIT,
where the BYTES_PER_KBIT is of type LL, may cause an overflow.
At least when speed = SPEED_20000, the expression to the left of port_rate
will be greater than INT_MAX.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
As pointed out by Florian:
https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
The BCM6846 has a few extra registers and cannot reuse the
compatible string from other variants of the Unimac
MDIO block: we need to be able to tell them apart.
====================