www.infradead.org Git - users/griffoul/linux.git/log

net: phy: aquantia: merge and rename aqr105_read_status() and aqr107_read_status()

aqr105_read_status() and aqr107_read_status() are very similar.
In fact, they are identical, save from a code snippet accessing a Gen2
feature (rate adaptation), placed at the end of aqr107_read_rate(), and
absent from aqr105_read_rate().

The code structure is:

aqr105_read_status() aqr107_read_status()
-> aqr105_read_rate() -> aqr107_read_rate()

After the recent change "net: phy: aquantia: use cached GLOBAL_CFG
registers in aqr107_read_rate()", it is absolutely trivial to
restructure the code as follows:

aqr_gen2_read_status()
-> aqr_gen1_read_status()
-> Gen2-specific stuff (read GLOBAL_CFG registers to set rate_matching)

Doing so reduces code duplication.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-10-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: use cached GLOBAL_CFG registers in aqr107_read_rate()

aqr107_read_rate() - called from aqr107_read_status() even periodically
if there is no PHY IRQ - currently reads GLOBAL_CFG registers to
determine what kind of rate adaptation is in use for the current
phydev->speed. However, GLOBAL_CFG registers are runtime invariants, so
accessing the slow MDIO bus is unnecessary.

Reimplement aqr107_read_rate() by reading from the
priv->global_cfg[i].rade_adapt variables (where i is the entry
corresponding to the current phydev->speed).

Making this change also helps disentangle the code delta between
aqr105_read_rate() and aqr107_read_rate(). They are now identical up to
the code snippet which iterates over priv->global_cfg[]. This will help
eliminate the duplicate code in the upcoming patch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-9-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: remove handling for get_rate_matching(PHY_INTERFACE_MODE_NA)

After commit 7642cc28fd37 ("net: phylink: fix PHY validation with rate
adaption"), the API contract changed and PHY drivers are no longer
required to respond to the .get_rate_matching() method for
PHY_INTERFACE_MODE_NA. This was later followed up by documentation
commit 6d4cfcf97986 ("net: phy: Update documentation for
get_rate_matching").

As such, handling PHY_INTERFACE_MODE_NA in the Aquantia PHY driver
implementation of this method is unnecessary and confusing. Remove it.

Cc: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: save a local shadow of GLOBAL_CFG register values

Currently, aqr_gen2_fill_interface_modes() reads VEND1_GLOBAL_CFG_*
registers to populate phydev->supported_interfaces. But this is not
the only place which needs to read these registers. There is also
aqr107_read_rate().

Based on the premise that these values are statically set by firmware
and the driver only needs to read them, the proposal is to read them
only once, at config_init() time, and use the cached values also in
aqr107_read_rate().

This patch only refactors the aqr_gen2_fill_interface_modes() code to
save the registers to driver memory, and to populate supported_interfaces
based on that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: fill supported_interfaces for all aqr_gen2_config_init() callers

Since aqr_gen2_config_init() and aqr_gen2_fill_interface_modes() refer to
the feature set common to the same generation, it means all callers of
aqr_gen2_config_init() also support the Global System Configuration
registers at addresses 1E.31B -> 1E.31F, and these should be read by the
driver to figure out the list of supported interfaces for phylink.

This affects the following PHYs supported by this driver:
- Gen2: AQR107
- Gen3: AQR111, AQR111B0
- Gen4: AQR114C, AQR813.

AQR113C, a Gen4 PHY, has unmodified logic after this change, because
currently, the aqr_gen2_fill_interface_modes() call is chained after
aqr_gen2_config_init(), and after this patch, it is tail-called from the
latter function, leading to the same code flow.

At the same time, move aqr_gen2_fill_interface_modes() upwards of its
new caller, aqr_gen2_config_init(), to avoid a forward declaration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: rename some aqr107 functions according to generation

Establish a more intuitive function naming convention in this driver.
A GenX PHY must only call aqr_genY_ functions, where Y <= X.

Loosely speaking, aqr107_ is representative of Gen2 and above, except for:
- aqr107_config_init()
- aqr107_suspend()
- aqr107_resume()
- aqr107_wait_processor_intensive_op()

which are also called by AQR105, so these are renamed to Gen1.

Actually aqr107_config_init() is renamed to aqr_gen1_config_init() when
called by AQR105, and aqr_gen2_config_init() when called by all other
PHYs. The Gen2 function calls the Gen1 function, so there is no
functional change. This prefaces further Gen2-specific initialization
steps which must be omitted for AQR105. These will be added to
aqr_gen2_config_init().

In fact, many PHY drivers call an aqr*_config_init() beneath their
generation's feature set: AQR114C is a Gen4 PHY which calls
aqr_gen2_config_init(), even though AQR113C, also a Gen4 PHY which
differs only in maximum link speed, calls the richer
aqr113c_config_init() which also sets phydev->possible_interfaces.
Many of the more subtle inconsistencies of this kind will be fixed up in
later changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: reorder AQR113C PMD Global Transmit Disable bit clearing with supported_interfaces

Introduced in commit bed90b06b681 ("net: phy: aquantia: clear PMD Global
Transmit Disable bit during init"), the clearing of MDIO_PMA_TXDIS plus
the call to aqr107_wait_processor_intensive_op() are only by chance
placed between aqr107_config_init() and aqr107_fill_interface_modes().
In other words, aqr107_fill_interface_modes() does not depend in any way
on these 2 operations.

I am only 90% sure of that, and I intend to move aqr107_fill_interface_modes()
to be a part of aqr107_config_init() in the future. So to isolate the
issue for blame attribution purposes, make these 2 functions adjacent to
each other again.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: merge aqr113c_fill_interface_modes() into aqr107_fill_interface_modes()

I'm unsure whether intentionate or not, but I think the (partially
observed) naming convention in this driver is that function prefixes
denote the earliest generation when a feature is available. In case of
aqr107_fill_interface_modes(), that means that the GLOBAL_CFG registers
are a Gen2 feature. Supporting evidence: the AQR105, a Gen1 PHY, does
not have these registers, thus the function is not named aqr105_*.

Based on this inferred naming scheme, I am proposing a refinement of
commit a7f3abcf6357 ("net: phy: aquantia: only poll GLOBAL_CFG regs on
aqr113, aqr113c and aqr115c") which introduced aqr113c_fill_interface_modes(),
suggesting this may be a Gen4 PHY feature.

The long-term goal is for aqr107_config_init() to tail-call
aqr107_fill_interface_modes(), such that the latter function is also
called by AQR107 itself, and many other PHY drivers. Currently it can't,
because aqr113c_config_init() calls aqr107_config_init() and then
aqr113c_fill_interface_modes(). So this would lead to a duplicate call
to aqr107_fill_interface_modes() for AQR113C.

Centralize the reading of GLOBAL_CFG registers in the AQR107 method, and
create a boolean, set to true by AQR113C, which tests whether waiting
for a non-zero value in the GLOBAL_CFG_100M register is necessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: rename AQR412 to AQR412C and add real AQR412

I have noticed from schematics and firmware images that the PHY for
which I've previously added support in commit 973fbe68df39 ("net: phy:
aquantia: add AQR112 and AQR412 PHY IDs") is actually an AQR412C, not
AQR412.

These are actually PHYs from the same generation, and Marvell documents
them as differing only in the size of the FCCSP package: 19x19 mm for
the AQR412, vs 14x12mm for the Compact AQR412C.

I don't think there is any point in backporting this to stable kernels,
since the PHYs are identical in capabilities, and no functional
difference is expected regardless of how the PHY is identified.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250821152022.1065237-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-wangxun-complete-ethtool-coalesce-options'

Jiawen Wu says:

====================
net: wangxun: complete ethtool coalesce options

Support to use adaptive RX coalescing. Change the default RX coalesce
usecs and limit the range of parameters for various types of devices,
according to their hardware design.
====================

Link: https://patch.msgid.link/20250821023408.53472-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: support to use adaptive RX/TX coalescing

Support to turn on/off adaptive RX/TX coalesce. When adaptive coalesce
is on, use DIM algorithm for a dynamic interrupt moderation.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250821023408.53472-5-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: cleanup the code in wx_set_coalesce()

Cleanup the code for the next patch to add adaptive RX coalesce.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250821023408.53472-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: limit tx_max_coalesced_frames_irq

Add limitation on tx_max_coalesced_frames_irq as 0 ~ 65535, because
'wx->tx_work_limit' is declared as a member of type u16.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250821023408.53472-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: change the default ITR setting

Change the default RX/TX ITR for wx_mac_em devices from 20K to 7K, which
is an experience value from out-of-tree ngbe driver, to get higher
performance on some platforms with weak single-core performance.

TCP_SRTEAM test on Phytium 2000+ shows that the throughput of 64-Byte
packets is increased from 350.53Mbits/s to 395.92Mbits/s.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250821023408.53472-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-hinic3-add-a-driver-for-huawei-3rd-gen-nic-management-interfaces'

Fan Gong says:

====================
net: hinic3: Add a driver for Huawei 3rd gen NIC - management interfaces

This is the 2/3 patch of the patch-set described below.

The patch-set contains driver for Huawei's 3rd generation HiNIC
Ethernet device that will be available in the future.

This is an SRIOV device, designed for data centers.
Initially, the driver only supports VFs.

Following the discussion over RFC01, the code will be submitted in
separate smaller patches where until the last patch the driver is
non-functional. The RFC02 submission contains overall view of the entire
driver but every patch will be posted as a standalone submission.
====================

Link: https://patch.msgid.link/cover.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Interrupt request configuration

Configure interrupt request initialization.
It allows driver to receive packets and management information
from HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/37615d5d87ced741e522cd966948d11ec87e4ad6.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Mailbox management interfaces

Add mailbox management interfaces initialization.
It enables mailbox to communicate with event queues from HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/3ce856068d23a0bbce74157e16f701c58ebbb1ce.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Mailbox framework

Add mailbox framework initialization.
It allows driver to send commands to HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/084f22f0155aaa713fa583205d540cb2bf3c3c2d.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: TX & RX Queue coalesce interfaces

Add TX RX queue coalesce interfaces initialization.
It configures the parameters of tx & tx msix coalesce.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20bdb94d91e5dcbb3257b7486830ea4109922169.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Command Queue interfaces

Add Command Queue interfaces initialization.
It enables communictaion and operation with HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/6a3ce147e1b4623f84407b9796eade137ddcf9dc.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Command Queue framework

Add Command Queue framework initialization.
It is used to set the related table items of the driver and obtain the
HW configuration.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/1aeed56de39078bde8fff4597d7aa22d350058fc.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Complete Event Queue interfaces

Add complete event queue interfaces initialization.
It informs that driver should handle the messages from HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/837837f13b96c7155644428a329d5d47b7242153.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Async Event Queue interfaces

Add async event queue interfaces initialization.
It allows driver to handle async events reported by HW.

Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/553ebd562b61cd854a2beb25c3d4d98ad3073db0.1755673097.git.zhuyikai1@h-partners.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rds-fix-semantic-annotations'

Ujwal Kundur says:

====================
rds: Fix semantic annotations

This patchset addresses all semantic warnings flagged by Sparse for
net/rds.

v1:https://lore.kernel.org/20250810171155.3263-1-ujwal.kundur@gmail.com
====================

Link: https://patch.msgid.link/20250820175550.498-1-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: Fix endianness annotations for RDS extension headers

Per the RDS 3.1 spec [1], RDS extension headers EXTHDR_NPATHS and
EXTHDR_GEN_NUM are be16 and be32 values respectively, exchanged during
normal operations over-the-wire (RDS Ping/Pong). This contrasts their
declarations as host endian unsigned ints.

Fix the annotations across occurrences. Flagged by Sparse.

[1] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html

Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250820175550.498-5-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: Fix endianness annotation for RDS_MPATH_HASH

jhash_1word accepts host endian inputs while rs_bound_port is a be16
value (sockaddr_in6.sin6_port). Use ntohs() for consistency.

Flagged by Sparse.

Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250820175550.498-4-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: Fix endianness annotation of jhash wrappers

__ipv6_addr_jhash (wrapper around jhash2()) and __inet_ehashfn (wrapper
around jhash_3words()) work with u32 (host endian) values but accept big
endian inputs. Declare the local variables as big endian to avoid
unnecessary casts.

Flagged by Sparse.

Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250820175550.498-3-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: Replace POLLERR with EPOLLERR

Both constants are 1<<3, but EPOLLERR uses the correct annotations.

Flagged by Sparse.

Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250820175550.498-2-ujwal.kundur@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-user_mss-and-tcp_maxseg-series'

Eric Dumazet says:

====================
tcp: user_mss and TCP_MAXSEG series

Annotate data-races around tp->rx_opt.user_mss and make
TCP_MAXSEG lockless.
====================

Link: https://patch.msgid.link/20250821141901.18839-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: lockless TCP_MAXSEG option

setsockopt(TCP_MAXSEG) writes over a field that does not need
socket lock protection anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250821141901.18839-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: annotate data-races around tp->rx_opt.user_mss

This field is already read locklessly for listeners,
next patch will make setsockopt(TCP_MAXSEG) lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250821141901.18839-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: do not linearize big TSO packets

idpf has a limit on number of scatter-gather frags
that can be used per segment.

Currently, idpf_tx_start() checks if the limit is hit
and forces a linearization of the whole packet.

This requires high order allocations that can fail
under memory pressure. A full size BIG-TCP packet
would require order-7 alocation on x86_64 :/

We can move the check earlier from idpf_features_check()
for TSO packets, to force GSO in this case, removing the
cost of a big copy.

This means that a linearization will eventually happen
with sizes smaller than one MSS.

__idpf_chk_linearize() is renamed to idpf_chk_tso_segment()
and moved to idpf_lib.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Reviewed-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Brian Vazquez <brianvv@google.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250818195934.757936-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-test-xdp_tx-for-single-buffer'

Dimitri Daskalakis says:

====================
selftests: Test XDP_TX for single-buffer

Ensure single buffer XDP functions correctly by covering the following cases:
1) Zero size payload
2) Full MTU
3) Single buffer packets through a multi-buffer XDP program

These changes were tested with netdevsim and fbnic.

# ./ksft-net-drv/drivers/net/xdp.py
TAP version 13
1..10
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_sb
ok 6 xdp.test_xdp_native_tx_mb
# Failed run: pkt_sz 2048, offset 1. Last successful run: pkt_sz 1024, offset 256. Reason: Adjustment failed
ok 7 xdp.test_xdp_native_adjst_tail_grow_data
ok 8 xdp.test_xdp_native_adjst_tail_shrnk_data
# Failed run: pkt_sz 512, offset -256. Last successful run: pkt_sz 512, offset -128. Reason: Adjustment failed
ok 9 xdp.test_xdp_native_adjst_head_grow_data
# Failed run: pkt_sz (2048) > HDS threshold (1536) and offset 64 > 48
ok 10 xdp.test_xdp_native_adjst_head_shrnk_data
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0
====================

Link: https://patch.msgid.link/20250821014023.1481662-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: xdp: Validate single-buff XDP_TX in multi-buff mode

Validate that drivers with multi-buff XDP programs properly reinitialize
xdp_buff between packets.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20250821014023.1481662-4-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: xdp: Add a single-buffer XDP_TX test.

Test single-buffer XDP_TX for packets with various payload sizes.
Update the socat TX command to generate packets with 0 length payloads.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20250821014023.1481662-3-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: xdp: Extract common XDP_TX setup/validation.

In preparation of single-buffer XDP_TX tests, refactor common test code
into the _test_xdp_native_tx method. Add support for multiple payload
sizes, and additional validation for RX packet count. Pass the -n flag
to echo to avoid adding an extra byte into the TX packet.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20250821014023.1481662-2-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Octeontx2-af: Broadcast XON on all channels

The NIX block receives traffic from multiple channels, including:

MAC block (RPM)
Loopback module (LBK)
CPT block

                     RPM
                      |
                -----------------
       LBK   --|     NIX         |
                -----------------
                     |
                    CPT

Due to a hardware errata,  CN10k and earlier Octeon silicon series,
the hardware may incorrectly assert XOFF on certain channels during
reset. As a workaround, a write operation to the NIX_AF_RX_CHANX_CFG
register can be performed to broadcast XON signals on the affected
channels

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250820064625.1464361-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: support unreadable netmem

Declare PP_FLAG_ALLOW_UNREADABLE_NETMEM to turn on unreadable netmem
support in GVE.

We also drop any net_iov packets where header split is not enabled.
We're unable to process packets where the header landed in unreadable
netmem.

Use page_pool_dma_sync_netmem_for_cpu in lieu of
dma_sync_single_range_for_cpu to correctly handle unreadable netmem
that should not be dma-sync'd.

Disable rx_copybreak optimization if payload is unreadable netmem as
that needs access to the payload.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250818210507.3781705-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-25-08-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

First patch gets rid of refcounting for dying list dumping,
use a cookie value instead of keeping the object around.

Remaining patches extend nftables pipapo (concatenated ranges) set type.

Make the AVX2 optimized version available from the control plane as
well, then use it during insert.  This gives a nice speedup for large
sets. All from myself.

On PREEMPT_RT, we can't rely on local_bh_disable to protect the
access to the percpu scratch maps.  Use nested-BH locking for this,
From Sebastian Siewior.

* tag 'nf-next-25-08-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch
  netfilter: nft_set_pipapo: Store real pointer, adjust later.
  netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
  netfilter: nft_set_pipapo_avx2: split lookup function in two parts
  netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
  netfilter: ctnetlink: remove refcounting in dying list dumping
====================

Link: https://patch.msgid.link/20250820144738.24250-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix stmmac_simple_pm_ops build errors

The kernel test robot reports that various drivers have an undefined
reference to stmmac_simple_pm_ops. This is caused by
EXPORT_SYMBOL_GPL_SIMPLE_DEV_PM_OPS() defining the struct as static
and omitting the export when CONFIG_PM=n, unlike DEFINE_SIMPLE_PM_OPS()
which still defines the struct non-static.

Switch to using DEFINE_SIMPLE_PM_OPS() + EXPORT_SYMBOL_GPL(), which
means we always define stmmac_simple_pm_ops, and it will always be
visible for dwmac-* to reference whether modular or built-in.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508132051.a7hJXkrd-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202508132158.dEwQdick-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202508140029.V6tDuUxc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202508161406.RwQuZBkA-lkp@intel.com/
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uojpo-00BMoL-4W@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-remove-the-use-of-dev_err_probe'

Xichao Zhao says:

====================
net: Remove the use of dev_err_probe()

The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
====================

Link: https://patch.msgid.link/20250820085749.397586-1-zhao.xichao@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Remove the use of dev_err_probe()

The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250820085749.397586-3-zhao.xichao@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hibmcge: Remove the use of dev_err_probe()

The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Link: https://patch.msgid.link/20250820085749.397586-2-zhao.xichao@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-08-21

We've added 9 non-merge commits during the last 3 day(s) which contain
a total of 13 files changed, 1027 insertions(+), 27 deletions(-).

The main changes are:

1) Added bpf dynptr support for accessing the metadata of a skb,
   from Jakub Sitnicki.
   The patches are merged from a stable branch bpf-next/skb-meta-dynptr.
   The same patches have also been merged into bpf-next/master.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Cover metadata access from a modified skb clone
  selftests/bpf: Cover read/write to skb metadata at an offset
  selftests/bpf: Cover write access to skb metadata via dynptr
  selftests/bpf: Cover read access to skb metadata via dynptr
  selftests/bpf: Parametrize test_xdp_context_tuntap
  selftests/bpf: Pass just bpf_map to xdp_context_test helper
  selftests/bpf: Cover verifier checks for skb_meta dynptr type
  bpf: Enable read/write access to skb metadata through a dynptr
  bpf: Add dynptr type for skb metadata
====================

Link: https://patch.msgid.link/20250821191827.2099022-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: fix memory leak in tls.c

To free memory and close fd after use

Suggested-by: Jun Zhan <zhanjun@uniontech.com>
Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Link: https://patch.msgid.link/20250819-memoryleak-v1-1-d4c70a861e62@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth.

  Current release - fix to a fix:

   - usb: asix_devices: fix PHY address mask in MDIO bus initialization

  Current release - regressions:

   - Bluetooth: fixes for the split between BIS_LINK and PA_LINK

   - Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
     flag", breaks compatibility with some existing device tree blobs

   - dsa: b53: fix reserved register access in b53_fdb_dump()

  Current release - new code bugs:

   - sched: dualpi2: run probability update timer in BH to avoid
     deadlock

   - eth: libwx: fix the size in RSS hash key population

   - pse-pd: pd692x0: improve power budget error paths and handling

  Previous releases - regressions:

   - tls: fix handling of zero-length records on the rx_list

   - hsr: reject HSR frame if skb can't hold tag

   - bonding: fix negotiation flapping in 802.3ad passive mode

  Previous releases - always broken:

   - gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM

   - sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
     avoid packet drops with low buffer_limit, remove unnecessary WARN()

   - sched: fix backlog accounting after modifying config of a qdisc in
     the middle of the hierarchy

   - mptcp: improve handling of skb extension allocation failures

   - eth: mlx5:
       - fixes for the "HW Steering" flow management method
       - fixes for QoS and device buffer management"

* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  netfilter: nf_reject: don't leak dst refcount for loopback packets
  net/mlx5e: Preserve shared buffer capacity during headroom updates
  net/mlx5e: Query FW for buffer ownership
  net/mlx5: Restore missing scheduling node cleanup on vport enable failure
  net/mlx5: Fix QoS reference leak in vport enable error path
  net/mlx5: Destroy vport QoS element when no configuration remains
  net/mlx5e: Preserve tc-bw during parent changes
  net/mlx5: Remove default QoS group and attach vports directly to root TSAR
  net/mlx5: Base ECVF devlink port attrs from 0
  net: pse-pd: pd692x0: Skip power budget configuration when undefined
  net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
  Octeontx2-af: Skip overlap check for SPI field
  selftests: tls: add tests for zero-length records
  tls: fix handling of zero-length records on the rx_list
  net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
  selftests: bonding: add test for passive LACP mode
  bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
  bonding: update LACP activity flag after setting lacp_active
  Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
  ipv6: sr: Fix MAC comparison to be constant-time
  ...

netfilter: nf_reject: don't leak dst refcount for loopback packets

recent patches to add a WARN() when replacing skb dst entry found an
old bug:

WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline]
WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline]
WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234
[..]
Call Trace:
nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325
nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27
expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
..

This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.

Instead of checking hook just check if the skb already has a route
attached to it.

Fixes: f53b9b0bdc59 ("netfilter: introduce support for reject at prerouting stage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: add page_pool_get()

There is a page_pool_put() function but no get equivalent.
Having multiple references to a page pool is quite useful.
It avoids branching in create / destroy paths in drivers
which support memory providers.

Use the new helper in bnxt.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250820025704.166248-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-misx-fixes-2025-08-20'

Mark Bloch says:

====================
mlx5 misx fixes 2025-08-20

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.

v1: https://lore.kernel.org/1755095476-414026-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20250820133209.389065-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Preserve shared buffer capacity during headroom updates

When port buffer headroom changes, port_update_shared_buffer()
recalculates the shared buffer size and splits it in a 3:1 ratio
(lossy:lossless) - Currently, the calculation is:
lossless = shared / 4;
lossy = (shared / 4) * 3;

Meaning, the calculation dropped the remainder of shared % 4 due to
integer division, unintentionally reducing the total shared buffer
by up to three cells on each update. Over time, this could shrink
the buffer below usable size.

Fix it by changing the calculation to:
lossless = shared / 4;
lossy = shared - lossless;

This retains all buffer cells while still approximating the
intended 3:1 split, preventing capacity loss over time.

While at it, perform headroom calculations in units of cells rather than
in bytes for more accurate calculations avoiding extra divisions.

Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Armen Ratner <armeng@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Query FW for buffer ownership

The SW currently saves local buffer ownership when setting
the buffer.
This means that the SW assumes it has ownership of the buffer
after the command is set.

If setting the buffer fails and we remain in FW ownership,
the local buffer ownership state incorrectly remains as SW-owned.
This leads to incorrect behavior in subsequent PFC commands,
causing failures.

Instead of saving local buffer ownership in SW,
query the FW for buffer ownership when setting the buffer.
This ensures that the buffer ownership state is accurately
reflected, avoiding the issues caused by incorrect ownership
states.

Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Restore missing scheduling node cleanup on vport enable failure

Restore the __esw_qos_free_node() call removed by the offending commit.

Fixes: 97733d1e00a0 ("net/mlx5: Add traffic class scheduling support for vport QoS")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix QoS reference leak in vport enable error path

Add missing esw_qos_put() call when __esw_qos_alloc_node() fails in
mlx5_esw_qos_vport_enable().

Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Destroy vport QoS element when no configuration remains

If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.

Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Preserve tc-bw during parent changes

When changing parent of a node/leaf with tc-bw configured, the code
saves and restores tc-bw values. However, it was reading the converted
hardware bw_share values (where 0 becomes 1) instead of the original
user values, causing incorrect tc-bw calculations after parent change.

Store original tc-bw values in the node structure and use them directly
for save/restore operations.

Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove default QoS group and attach vports directly to root TSAR

Currently, the driver creates a default group (`node0`) and attaches
all vports to it unless the user explicitly sets a parent group. As a
result, when a user configures tx_share on a group and tx_share on
a VF, the expectation is for the group and the VF to share bandwidth
relatively. However, since the VF is not connected to the same parent
(but to the default node), the proportional share logic is not applied
correctly.

To fix this, remove the default group (`node0`) and instead connect
vports directly to the root TSAR when no parent is specified. This
ensures that vports and groups share the same root scheduler and their
tx_share values are compared directly under the same hierarchy.

Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Base ECVF devlink port attrs from 0

Adjust the vport number by the base ECVF vport number so the port
attributes start at 0. Previously the port attributes would start 1
after the maximum number of host VFs.

Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: Skip power budget configuration when undefined

If the power supply's power budget is not defined in the device tree,
the current code still requests power and configures the PSE manager
with a 0W power limit, which is undesirable behavior.

Skip power budget configuration entirely when the budget is zero,
avoiding unnecessary power requests and preventing invalid 0W limits
from being set on the PSE manager.

Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250820133321.841054-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: Fix power budget leak in manager setup error path

Fix a resource leak where manager power budgets were freed on both
success and error paths during manager setup. Power budgets should
only be freed on error paths after regulator registration or during
driver removal.

Refactor cleanup logic by extracting OF node cleanup and power budget
freeing into separate helper functions for better maintainability.

Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250820132708.837255-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Octeontx2-af: Skip overlap check for SPI field

Octeontx2/CN10K silicon supports generating a 256-bit key per packet.
The specific fields to be extracted from a packet for key generation
are configurable via a Key Extraction (MKEX) Profile.

The AF driver scans the configured extraction profile to ensure that
fields from upper layers do not overwrite fields from lower layers in
the key.

Example Packet Field Layout:
LA: DMAC + SMAC
LB: VLAN
LC: IPv4/IPv6
LD: TCP/UDP

Valid MKEX Profile Configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[30-31]

Invalid MKEX profile configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[2-3]  // Overlaps with DMAC field

In another scenario, if the MKEX profile is configured to extract
the SPI field from both AH and ESP headers at the same key offset,
the driver rejecting this configuration. In a regular traffic,
ipsec packet will be having either AH(LD) or ESP (LE). This patch
relaxes the check for the same.

Fixes: 12aa0a3b93f3 ("octeontx2-af: Harden rule validation.")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250820063919.1463518-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tls: add tests for zero-length records

Test various combinations of zero-length records.
Unfortunately, kernel cannot be coerced into producing those,
so hardcode the ciphertext messages in the test.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: fix handling of zero-length records on the rx_list

Each recvmsg() call must process either
- only contiguous DATA records (any number of them)
- one non-DATA record

If the next record has different type than what has already been
processed we break out of the main processing loop. If the record
has already been decrypted (which may be the case for TLS 1.3 where
we don't know type until decryption) we queue the pending record
to the rx_list. Next recvmsg() will pick it up from there.

Queuing the skb to rx_list after zero-copy decrypt is not possible,
since in that case we decrypted directly to the user space buffer,
and we don't have an skb to queue (darg.skb points to the ciphertext
skb for access to metadata like length).

Only data records are allowed zero-copy, and we break the processing
loop after each non-data record. So we should never zero-copy and
then find out that the record type has changed. The corner case
we missed is when the initial record comes from rx_list, and it's
zero length.

Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Fix a lot of build warnings for LTO-enabled objtool check, increase
  COMMAND_LINE_SIZE up to 4096, rename a missing GCC_PLUGIN_STACKLEAK to
  KSTACK_ERASE, and fix some bugs about arch timer, module loading, LBT
  and KVM"

* tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Add address alignment check in pch_pic register access
  LoongArch: KVM: Use kvm_get_vcpu_by_id() instead of kvm_get_vcpu()
  LoongArch: KVM: Fix stack protector issue in send_ipi_data()
  LoongArch: KVM: Make function kvm_own_lbt() robust
  LoongArch: Rename GCC_PLUGIN_STACKLEAK to KSTACK_ERASE
  LoongArch: Save LBT before FPU in setup_sigcontext()
  LoongArch: Optimize module load time by optimizing PLT/GOT counting
  LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads
  LoongArch: Increase COMMAND_LINE_SIZE up to 4096
  LoongArch: Pass annotate-tablejump option if LTO is enabled
  objtool/LoongArch: Get table size correctly if LTO is enabled

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fixes from Eric Biggers:
"Fix a regression where 'make clean' stopped removing some of the
  generated assembly files on arm and arm64"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: ensure generated *.S files are removed on make clean
  lib/crypto: sha: Update Kconfig help for SHA1 and SHA256

Merge tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- fix refcount issue that can cause memory leak

- rate limit repeated connections from IPv6, not just IPv4 addresses

- fix potential null pointer access of smb direct work queue

* tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix refcount leak causing resource not released
  ksmbd: extend the connection limiting mechanism to support IPv6
  smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()

Merge branch 'add-ppe-driver-for-qualcomm-ipq9574-soc'

Luo Jie says:

====================
Add PPE driver for Qualcomm IPQ9574 SoC

The PPE (packet process engine) hardware block is available in Qualcomm
IPQ chipsets that support PPE architecture, such as IPQ9574 and IPQ5332.
The PPE in the IPQ9574 SoC includes six Ethernet ports (6 GMAC and 6
XGMAC), which are used to connect with external PHY devices by PCS. The
PPE also includes packet processing offload capabilities for various
networking functions such as route and bridge flows, VLANs, different
tunnel protocols and VPN. It also includes an L2 switch function for
bridging packets among the 6 Ethernet ports and the CPU port. The CPU
port enables packet transfer between the Ethernet ports and the ARM
cores in the SoC, using the Ethernet DMA.

This patch series is the first part of a three part series that will
together enable Ethernet function for IPQ9574 SoC. While support is
initially being added for IPQ9574 SoC, the driver will be easily
extendable to enable Ethernet support for other IPQ SoC such as IPQ5332.
The driver can also be extended later for adding support for L2/L3
network offload features that the PPE can support. The functionality
to be enabled by each of the three series (to be posted sequentially)
is as below:

Part 1: The PPE patch series (this series), which enables the platform
driver, probe and initialization/configuration of different PPE hardware
blocks.

Part 2: The PPE MAC patch series, which enables the phylink operations
for the PPE Ethernet ports.

Part 3: The PPE EDMA patch series, which enables the Rx/Tx Ethernet DMA
and netdevice driver for the 6 PPE Ethernet ports.

A more detailed description of the functions enabled by part 1 is below:
1. Initialize PPE device hardware functions such as buffer management,
   queue management, scheduler and clocks in order to bring up PPE
   device.
2. Enable platform driver and probe functions
3. Register debugfs file to provide access to various PPE packet
   counters. These statistics are recorded by the various hardware
   process counters, such as port RX/TX, CPU code and hardware queue
   counters.
4. A detailed introduction of PPE along with the PPE hardware diagram
   in the first two patches (dt-bindings and documentation).

Below is a reference to an earlier RFC discussion with the community
about enabling Ethernet driver support for Qualcomm IPQ9574 SoC. This
writeup can help provide a higher level architectural view of various
other drivers that support the PPE such as clock and PCS drivers.
Topic: RFC: Advice on adding support for Qualcomm IPQ9574 SoC Ethernet.
https://lore.kernel.org/linux-arm-msm/d2929bd2-bc9e-4733-a89f-2a187e8bf917@quicinc.com/

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
====================

Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-0-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

MAINTAINERS: Add maintainer for Qualcomm PPE driver

Add maintainer entry for PPE (Packet Process Engine) driver supported
for Qualcomm IPQ SoCs.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-14-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Add PPE debugfs support for PPE counters

The PPE hardware counters maintain counters for packets handled by the
various functional blocks of PPE. They help in tracing the packets
passed through PPE and debugging any packet drops.

The counters displayed by this debugfs file are ones that are common
for all Ethernet ports, and they do not include the counters that are
specific for a MAC port. Hence they cannot be displayed using ethtool.
The per-MAC counters will be supported using "ethtool -S" along with
the netdevice driver.

The PPE hardware various type counters are made available through the
debugfs files under directory "/sys/kernel/debug/ppe/".

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-13-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE L2 bridge settings

Initialize the L2 bridge settings for the PPE ports to only enable
L2 frame forwarding between CPU port and PPE Ethernet ports.

The per-port L2 bridge settings are initialized as follows:
For PPE CPU port, the PPE bridge TX is enabled and FDB learning is
disabled. For PPE physical ports, the default L2 forwarding action
is initialized to forward to CPU port only.

L2/FDB learning and forwarding will not be enabled for PPE physical
ports yet, since the port's VSI (Virtual Switch Instance) and VSI
membership are not yet configured, which are required for FDB
forwarding. The VSI and FDB forwarding will later be enabled when
switchdev is enabled.

Signed-off-by: Lei Wei <quic_leiwei@quicinc.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-12-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE queue to Ethernet DMA ring mapping

Configure the selected queues to map with an Ethernet DMA ring for the
packet to receive on ARM cores.

As default initialization, all queues assigned to CPU port 0 are mapped
to the EDMA ring 0. This configuration is later updated during Ethernet
DMA initialization.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-11-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE RSS hash settings

The PPE RSS hash is generated during PPE receive, based on the packet
content (3 tuples or 5 tuples) and as per the configured RSS seed. The
hash is then used to select the queue to transmit the packet to the
ARM CPU.

This patch initializes the RSS hash settings that are used to generate
the hash for the packet during PPE packet receive.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-10-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE port control settings

Configure the default action as drop when the packet size is more than
the configured MTU of physical port. Also enable port specific counters
in PPE.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-9-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE service code settings

PPE service code is a special code (0-255) that is defined by PPE for
PPE's packet processing stages, as per the network functions required
for the packet.

For packet being sent out by ARM cores on Ethernet ports, The service
code 1 is used as the default service code. This service code is used
to bypass most of packet processing stages of the PPE before the packet
transmitted out PPE port, since the software network stack has already
processed the packet.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-8-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE queue settings

Configure unicast and multicast hardware queues for the PPE ports to
enable packet forwarding between the ports.

Each PPE port is assigned with a range of queues. The queue ID selection
for the packet is decided by the queue base and queue offset that is
configured based on the internal priority and the RSS hash value of the
packet.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-7-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize the PPE scheduler settings

The PPE scheduler settings determine the priority of scheduling the
packet across the different hardware queues per PPE port.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-6-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE queue management for IPQ9574

QM (queue management) configurations decide the length of PPE queues
and the queue depth for these queues which are used to drop packets
in events of congestion.

There are two types of PPE queues - unicast queues (0-255) and multicast
queues (256-299). These queue types are used to forward different types
of traffic, and are configured with different lengths.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-5-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Initialize PPE buffer management for IPQ9574

The BM (Buffer Management) config controls the pause frame generated
on the PPE port. There are maximum 15 BM ports and 4 groups supported,
all BM ports are assigned to group 0 by default. The number of hardware
buffers configured for the port influence the threshold of the flow
control for that port.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-4-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC

The PPE (Packet Process Engine) hardware block is available on Qualcomm
IPQ SoC that support PPE architecture, such as IPQ9574.

The PPE in IPQ9574 includes six integrated Ethernet MAC for 6 PPE ports,
buffer management, queue management and scheduler functions. The MACs
can connect with the external PHY or switch devices using the UNIPHY PCS
block available in the SoC.

The PPE also includes various packet processing offload capabilities
such as L3 routing and L2 bridging, VLAN and tunnel processing offload.
It also includes Ethernet DMA function for transferring packets between
ARM cores and PPE Ethernet ports.

This patch adds the base source files and Makefiles for the PPE driver
such as platform driver registration, clock initialization, and PPE
reset routines.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-3-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

docs: networking: Add PPE driver documentation for Qualcomm IPQ9574 SoC

Add description and high-level diagram for PPE, driver overview and
module enable/debug information.

Signed-off-by: Lei Wei <quic_leiwei@quicinc.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-2-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: Add PPE for Qualcomm IPQ9574 SoC

The PPE (packet process engine) hardware block is available in Qualcomm
IPQ chipsets that support PPE architecture, such as IPQ9574. The PPE in
the IPQ9574 SoC includes six Ethernet ports (6 GMAC and 6 XGMAC), which
are used to connect with external PHY devices by PCS. It includes an L2
switch function for bridging packets among the 6 Ethernet ports and the
CPU port. The CPU port enables packet transfer between the Ethernet ports
and the ARM cores in the SoC, using the Ethernet DMA.

The PPE also includes packet processing offload capabilities for various
networking functions such as route and bridge flows, VLANs, different
tunnel protocols and VPN.

The PPE switch is modeled according to the Ethernet switch schema, with
additional properties defined for the switch node for interrupts, clocks,
resets, interconnects and Ethernet DMA. The switch port node is extended
with additional properties for clocks and resets.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-1-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision

SW hash computed by airoha_ppe_foe_get_entry_hash routine (used for
foe_flow hlist) can theoretically produce collisions between two
different HW PPE entries.
In airoha_ppe_foe_insert_entry() if the collision occurs we will mark
the second PPE entry in the list as stale (setting the hw hash to 0xffff).
Stale entries are no more updated in airoha_ppe_foe_flow_entry_update
routine and so they are removed by Netfilter.
Fix the problem not marking the second entry as stale in
airoha_ppe_foe_insert_entry routine if we have already inserted the
brand new entry in the PPE table and let Netfilter remove real stale
entries according to their timestamp.
Please note this is just a theoretical issue spotted reviewing the code
and not faced running the system.

Fixes: cd53f622611f9 ("net: airoha: Add L2 hw acceleration support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250818-airoha-en7581-hash-collision-fix-v1-1-d190c4b53d1c@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-phy-micrel-add-support-for-lan8842'

Horatiu Vultur says:

====================
net: phy: micrel: Add support for lan8842

Add support for LAN8842 which supports industry-standard SGMII.
While add this the first 3 patches in the series cleans more the
driver, they should not introduce any functional changes.
====================

Link: https://patch.msgid.link/20250818075121.1298170-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: micrel: Add support for lan8842

The LAN8842 is a low-power, single port triple-speed (10BASE-T/ 100BASE-TX/
1000BASE-T) ethernet physical layer transceiver (PHY) that supports
transmission and reception of data on standard CAT-5, as well as CAT-5e and
CAT-6, Unshielded Twisted Pair (UTP) cables.

The LAN8842 supports industry-standard SGMII (Serial Gigabit Media
Independent Interface) providing chip-to-chip connection to a Gigabit
Ethernet MAC using a single serialized link (differential pair) in each
direction.

There are 2 variants of the lan8842. The one that supports timestamping
(lan8842) and one that doesn't have timestamping (lan8832).

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250818075121.1298170-5-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: micrel: Replace hardcoded pages with defines

The functions lan_*_page_reg gets as a second parameter the page
where the register is. In all the functions the page was hardcoded.
Replace the hardcoded values with defines to make it more clear
what are those parameters.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250818075121.1298170-4-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: micrel: Introduce lanphy_modify_page_reg

As the name suggests this function modifies the register in an
extended page. It has the same parameters as phy_modify_mmd.
This function was introduce because there are many places in the
code where the registers was read then the value was modified and
written back. So replace all this code with this function to make
it clear.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250818075121.1298170-3-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: micrel: Start using PHY_ID_MATCH_MODEL

Start using PHY_ID_MATCH_MODEL for all the drivers.
While at this add also PHY_ID_KSZ8041RNLI to micrel_tbl.

It is safe to change the current of 0x00fffff0 to PHY_ID_MATCH_MODEL
because all the micrel PHYs have in MSB a value of 0.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250818075121.1298170-2-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-next-vhca-id

A preparation patchset for adjacent function vports.

Adjacent functions can delegate their SR-IOV VFs to sibling PFs,
allowing for more flexible and scalable management in multi-host and
ECPF-to-host scenarios. Adjacent vports can be managed by the management
PF via their unique vhca id and can't be managed by function index as the
index can conflict with the local vports/vfs.

This series provides:

- Use the cached vcha id instead of querying it every time from fw
- Query hca cap using vhca id instead of function id when FW supports it
- Add HW capabilities and required definitions for adjacent function vports

* tag 'mlx5-next-vhca-id' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  {rdma,net}/mlx5: export mlx5_vport_get_vhca_id
  net/mlx5: E-Switch, Set/Query hca cap via vhca id
  net/mlx5: E-Switch, Cache vport vhca id on first cap query
  net/mlx5: mlx5_ifc, Add hardware definitions needed for adjacent vports
====================

Link: https://patch.msgid.link/20250815194901.298689-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: pktgen: Use min()/min_t() to improve pktgen_finalize_skb()

Use min() and min_t() to improve pktgen_finalize_skb() and avoid
calculating 'datalen / frags' twice.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250815153334.295431-3-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'bonding-fix-negotiation-flapping-in-802-3ad-passive-mode'

Hangbin Liu says:

====================
bonding: fix negotiation flapping in 802.3ad passive mode

This patch fixes unstable LACP negotiation when bonding is configured in
passive mode (`lacp_active=off`).

Previously, the actor would stop sending LACPDUs after initial negotiation
succeeded, leading to the partner timing out and restarting the negotiation
cycle. This resulted in continuous LACP state flapping.

The fix ensures the passive actor starts sending periodic LACPDUs after
receiving the first LACPDU from the partner, in accordance with IEEE
802.1AX-2020 section 6.4.1.
====================

Link: https://patch.msgid.link/20250815062000.22220-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: bonding: add test for passive LACP mode

Add a selftest to verify bonding behavior when `lacp_active` is set to `off`.

The test checks the following:
- The passive LACP bond should not send LACPDUs before receiving a partner's
LACPDU.
- The transmitted LACPDUs must not include the active flag.
- After transitioning to EXPIRED and DEFAULTED states, the passive side should
still not initiate LACPDUs.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU

When `lacp_active` is set to `off`, the bond operates in passive mode, meaning
it only "speaks when spoken to." However, the current kernel implementation
only sends an LACPDU in response when the partner's state changes.

As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs
until the partner times out and sends an "expired" LACPDU. This causes
continuous LACP state flapping.

According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The
values of Partner_Oper_Port_State.LACP_Activity and
Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions
take place. If either or both parameters are set to Active LACP, then periodic
transmissions occur; if both are set to Passive LACP, then periodic
transmissions do not occur.

To comply with this, we remove the `!bond->params.lacp_active` check in
`ad_periodic_machine()`. Instead, we initialize the actor's port's
`LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting.

Additionally, we avoid setting the partner's state to
`LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume
the partner is active by default.

This ensures that in passive mode, the bond starts sending periodic LACPDUs
after receiving one from the partner, and avoids flapping due to inactivity.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bonding: update LACP activity flag after setting lacp_active

The port's actor_oper_port_state activity flag should be updated immediately
after changing the lacp_active option to reflect the current mode correctly.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: openvswitch: Use for_each_cpu() where appropriate

Due to legacy reasons, openswitch code opencodes for_each_cpu() to make
sure that CPU0 is always considered.

Since commit c4b2bf6b4a35 ("openvswitch: Optimize operations for OvS
flow_stats."), the corresponding flow->cpu_used_mask is initialized
such that CPU0 is explicitly set.

So, switch the code to using plain for_each_cpu().

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20250818172806.189325-1-yury.norov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: Support single protocol test.

Currently, we cannot write IPv4 or IPv6 specific packetdrill tests
as ksft_runner.sh runs each .pkt file for both protocols.

Let's support single protocol test by checking --ip_version in the
.pkt file.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250819231527.1427361-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: set net.core.rmem_max and net.core.wmem_max to 4 MB

SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.

Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.

Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.

Before:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992

After:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-updates-for-net-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next

The first patch is the FW interface update, followed by 3 patches to
support the expanded pcie v2 structure for ethtool -d. The last patch
adds a Hyper-V PCI ID for the 5760X chips (Thor2).

v1: https://lore.kernel.org/20250818004940.5663-1-michael.chan@broadcom.com
====================

Link: https://patch.msgid.link/20250819163919.104075-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add Hyper-V VF ID

VFs of the P7 chip family created by Hyper-V will have the device ID of
0x181b.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add pcie_ctx_v2 support for ethtool -d

Add support to dump the expanded v2 struct that contains PCIE read/write
latency and credit histogram data.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add pcie_stat_len to struct bp

Add this length field to capture the length of the pcie stats structure
supported by the FW. This length will be determined in
bnxt_ethtool_init(). The minimum of this FW length and the length
known to the driver will determine the actual ethtool -d length.

Suggested-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250819163919.104075-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>