www.infradead.org Git - linux.git/log

i40e: Replace one-element array with flex-array member in struct i40e_section_table

One-element and zero-length arrays are deprecated. So, replace
one-element array in struct i40e_section_table with flexible-array
member.

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/335
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Replace one-element array with flex-array member in struct i40e_profile_segment

One-element and zero-length arrays are deprecated. So, replace
one-element array in struct i40e_profile_segment with flexible-array
member.

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/335
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Tested-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Replace one-element array with flex-array member in struct i40e_package_header

One-element and zero-length arrays are deprecated. So, replace
one-element array in struct i40e_package_header with flexible-array
member.

The `+ sizeof(u32)` adjustments ensure that there are no differences
in binary output.

Link: https://github.com/KSPP/linux/issues/335
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'improve-the-taprio-qdisc-s-relationship-with-its-children'

Vladimir Oltean says:

====================
Improve the taprio qdisc's relationship with its children

v1: https://lore.kernel.org/lkml/20230531173928.1942027-1-vladimir.oltean@nxp.com/

Prompted by Vinicius' request to consolidate some child Qdisc
dereferences in taprio:
https://lore.kernel.org/netdev/87edmxv7x2.fsf@intel.com/

I remembered that I had left some unfinished work in this Qdisc, namely
commit af7b29b1deaa ("Revert "net/sched: taprio: make qdisc_leaf() see
the per-netdev-queue pfifo child qdiscs"").

This patch set represents another stab at, essentially, what's in the
title. Not only does taprio not properly detect when it's grafted as a
non-root qdisc, but it also returns incorrect per-class stats.
Eventually, Vinicius' request is addressed too, although in a different
form than the one he requested (which was purely cosmetic).

Review from people more experienced with Qdiscs than me would be
appreciated. I tried my best to explain what I consider to be problems.
I am deliberately targeting net-next because the changes are too
invasive for net - they were reverted from stable once already.
====================

Link: https://lore.kernel.org/r/20230807193324.4128292-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tc-testing: verify that a qdisc can be grafted onto a taprio class

The reason behind commit af7b29b1deaa ("Revert "net/sched: taprio: make
qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"") was that the
patch it reverted caused a crash when attaching a CBS shaper to one of
the taprio classes. Prevent that from happening again by adding a test
case for it, which now passes correctly in both offload and software
modes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-12-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tc-testing: test that taprio can only be attached as root

Check that the "Can only be attached as root qdisc" error message from
taprio is effective by attempting to attach it to a class of another
taprio qdisc. That operation should fail.

In the bug that was squashed by change "net/sched: taprio: try again to
report q->qdiscs[] to qdisc_leaf()", grafting a child taprio to a root
software taprio would be misinterpreted as a change() to the root
taprio. Catch this by looking at whether the base-time of the root
taprio has changed to follow the base-time of the child taprio,
something which should have absolutely never happened assuming correct
semantics.

Vinicius points out that looking at "base_time" in the tc qdisc show
output is unreliable because user space is in a race with the kernel
applying the setting. So we create a helper bash script which waits
while there is any pending schedule.

Link: https://lore.kernel.org/netdev/87il9w0xx7.fsf@intel.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-11-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tc-testing: add ptp_mock Kconfig dependency

For offloaded tc-taprio testing with netdevsim, the mock-up PHC driver
is used.

Suggested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-10-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netdevsim: mimic tc-taprio offload

To be able to use netdevsim for tc-testing with an offloaded tc-taprio
schedule, it needs to report a PTP clock (which it now does), and to
accept ndo_setup_tc(TC_SETUP_QDISC_TAPRIO) calls.

Since netdevsim has no packet I/O, this doesn't do anything intelligent,
it only allows taprio offload code paths to go through some level of
automated testing.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-9-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netdevsim: use mock PHC driver

I'd like to make netdevsim offload tc-taprio, but currently, this Qdisc
emits a ETHTOOL_GET_TS_INFO call to the driver to make sure that it has
a PTP clock, so that it is reasonably capable of offloading the schedule.

By using the mock PHC driver, that becomes possible.

Hardware timestamping is not necessary, and netdevsim does not support
packet I/O anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ptp: create a mock-up PTP Hardware Clock driver

There are several cases where virtual net devices may benefit from
having a PTP clock, and these have to do with testing. I can see at
least netdevsim and veth as potential users of a common mock-up PTP
hardware clock driver.

The proposed idea is to create an object which emulates PTP clock
operations on top of the unadjustable CLOCK_MONOTONIC_RAW plus a
software-controlled time domain via a timecounter/cyclecounter and then
link that PHC to the netdevsim device.

The driver is fully functional for its intended purpose, and it
successfully passes the PTP selftests.

$ cd tools/testing/selftests/ptp/
$ ./phc.sh /dev/ptp2
TEST: settime                          [ OK ]
TEST: adjtime                          [ OK ]
TEST: adjfreq                          [ OK ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: dump class stats for the actual q->qdiscs[]

This makes a difference for the software scheduling mode, where
dev_queue->qdisc_sleeping is the same as the taprio root Qdisc itself,
but when we're talking about what Qdisc and stats get reported for a
traffic class, the root taprio isn't what comes to mind, but q->qdiscs[]
is.

To understand the difference, I've attempted to send 100 packets in
software mode through class 8001:5, and recorded the stats before and
after the change.

Here is before:

$ tc -s class show dev eth0
class taprio 8001:1 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:2 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:3 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:4 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:5 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:6 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:7 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:8 root leaf 8001:
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0

and here is after:

class taprio 8001:1 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:2 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:3 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:4 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:5 root
Sent 9400 bytes 100 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:6 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:7 root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0
class taprio 8001:8 root leaf 800d:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
window_drops 0

The most glaring (and expected) difference is that before, all class
stats reported the global stats, whereas now, they really report just
the counters for that traffic class.

Finally, Pedro Tammela points out that there is a tc selftest which
checks specifically which handle do the child Qdiscs corresponding to
each class have. That's changing here - taprio no longer reports
tcm->tcm_info as the same handle "1:" as itself (the root Qdisc), but 0
(the handle of the default pfifo child Qdiscs). Since iproute2 does not
print a child Qdisc handle of 0, adjust the test's expected output.

Link: https://lore.kernel.org/netdev/3b83fcf6-a5e8-26fb-8c8a-ec34ec4c3342@mojatatu.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: delete misleading comment about preallocating child qdiscs

As mentioned in commit af7b29b1deaa ("Revert "net/sched: taprio: make
qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"") - unlike
mqprio, taprio doesn't use q->qdiscs[] only as a temporary transport
between Qdisc_ops :: init() and Qdisc_ops :: attach().

Delete the comment, which is just stolen from mqprio, but there, the
usage patterns are a lot different, and this is nothing but confusing.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: try again to report q->qdiscs[] to qdisc_leaf()

This is another stab at commit 1461d212ab27 ("net/sched: taprio: make
qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"), later
reverted in commit af7b29b1deaa ("Revert "net/sched: taprio: make
qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"").

I believe that the problems that caused the revert were fixed, and thus,
this change is identical to the original patch.

Its purpose is to properly reject attaching a software taprio child
qdisc to a software taprio parent. Because unoffloaded taprio currently
reports itself (the root Qdisc) as the return value from qdisc_leaf(),
then the process of attaching another taprio as child to a Qdisc class
of the root will just result in a Qdisc_ops :: change() call for the
root. Whereas that's not we want. We want Qdisc_ops :: init() to be
called for the taprio child, in order to give the taprio child a chance
to check whether its sch->parent is TC_H_ROOT or not (and reject this
configuration).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode

Normally, Qdiscs have one reference on them held by their owner and one
held for each TXQ to which they are attached, however this is not the
case with the children of an offloaded taprio. Instead, the taprio qdisc
currently lives in the following fragile equilibrium.

In the software scheduling case, taprio attaches itself (the root Qdisc)
to all TXQs, thus having a refcount of 1 + the number of TX queues. In
this mode, the q->qdiscs[] children are not visible directly to the
Qdisc API. The lifetime of the Qdiscs from this private array lasts
until qdisc_destroy() -> taprio_destroy().

In the fully offloaded case, the root taprio has a refcount of 1, and
all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
are attached to the netdev TXQs directly and thus are visible to the
Qdisc API, however taprio loses a reference to them very early - during
qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
the q->qdiscs[] array to not leak memory, but interestingly, it does not
release a reference on these qdiscs because it doesn't effectively own
them - they are created by taprio but owned by the Qdisc core, and will
be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
the Qdisc is deleted or when the child Qdisc is replaced with something
else.

My interest is to change this equilibrium such that taprio also owns a
reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
Qdisc, including in full offload mode. I want this because I would like
taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
insight into q->qdiscs[] for the software scheduling mode - currently
they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
as the root taprio.

The following set of changes is necessary:
- don't free q->qdiscs[] early in taprio_attach(), free it late in
  taprio_destroy() for consistency with software mode. But:
- currently that's not possible, because taprio doesn't own a reference
  on q->qdiscs[]. So hold that reference - once during the initial
  attach() and once during subsequent graft() calls when the child is
  changed.
- always keep track of the current child in q->qdiscs[], even for full
  offload mode, so that we free in taprio_destroy() what we should, and
  not something stale.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: don't access q->qdiscs[] in unoffloaded mode during attach()

This is a simple code transformation with no intended behavior change,
just to make it absolutely clear that q->qdiscs[] is only attached to
the child taprio classes in full offload mode.

Right now we use the q->qdiscs[] variable in taprio_attach() for
software mode too, but that is quite confusing and avoidable. We use
it only to reach the netdev TX queue, but we could as well just use
netdev_get_tx_queue() for that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-expose-nic-temperature-via-hwmon-api'

Saeed Mahameed says:

====================
mlx5: Expose NIC temperature via hwmon API

Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label.
4) Temperature critical max value:
   refers to the high threshold of Warning Event. Will be exposed as
   `tempY_crit` hwmon attribute (RO attribute). For example for
   ConnectX5 HCA's this temperature value will be 105 Celsius, 10
   degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.
====================

Link: https://lore.kernel.org/r/20230807180507.22984-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Expose NIC temperature via hardware monitoring kernel API

Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label:
   Depends on the firmware capability, if firmware doesn't support
   sensors naming, the fallback naming convention would be: "sensorX",
   where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value:
   refers to the high threshold of Warning Event. Will be exposed as
   `tempY_crit` hwmon attribute (RO attribute). For example for
   ConnectX5 HCA's this temperature value will be 105 Celsius, 10
   degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.

For example, for dualport ConnectX5 NIC with a single IC thermal diode
sensor will have 2 hwmon directories (one for each PCI function)
under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the
corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the
corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Expose port.c/mlx5_query_module_num() function

Make mlx5_query_module_num() defined in port.c, a non-static, so it can
be used by other files.

CC: Jean Delvare <jdelvare@suse.com>
CC: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/llc/llc_conn.c: fix 4 instances of -Wmissing-variable-declarations

I'm looking to enable -Wmissing-variable-declarations behind W=1. 0day
bot spotted the following instances:

  net/llc/llc_conn.c:44:5: warning: no previous extern declaration for
  non-static variable 'sysctl_llc2_ack_timeout'
  [-Wmissing-variable-declarations]
  44 | int sysctl_llc2_ack_timeout = LLC2_ACK_TIME * HZ;
     |     ^
  net/llc/llc_conn.c:44:1: note: declare 'static' if the variable is not
  intended to be used outside of this translation unit
  44 | int sysctl_llc2_ack_timeout = LLC2_ACK_TIME * HZ;
     | ^
  net/llc/llc_conn.c:45:5: warning: no previous extern declaration for
  non-static variable 'sysctl_llc2_p_timeout'
  [-Wmissing-variable-declarations]
  45 | int sysctl_llc2_p_timeout = LLC2_P_TIME * HZ;
     |     ^
  net/llc/llc_conn.c:45:1: note: declare 'static' if the variable is not
  intended to be used outside of this translation unit
  45 | int sysctl_llc2_p_timeout = LLC2_P_TIME * HZ;
     | ^
  net/llc/llc_conn.c:46:5: warning: no previous extern declaration for
  non-static variable 'sysctl_llc2_rej_timeout'
  [-Wmissing-variable-declarations]
  46 | int sysctl_llc2_rej_timeout = LLC2_REJ_TIME * HZ;
     |     ^
  net/llc/llc_conn.c:46:1: note: declare 'static' if the variable is not
  intended to be used outside of this translation unit
  46 | int sysctl_llc2_rej_timeout = LLC2_REJ_TIME * HZ;
     | ^
  net/llc/llc_conn.c:47:5: warning: no previous extern declaration for
  non-static variable 'sysctl_llc2_busy_timeout'
  [-Wmissing-variable-declarations]
  47 | int sysctl_llc2_busy_timeout = LLC2_BUSY_TIME * HZ;
     |     ^
  net/llc/llc_conn.c:47:1: note: declare 'static' if the variable is not
  intended to be used outside of this translation unit
  47 | int sysctl_llc2_busy_timeout = LLC2_BUSY_TIME * HZ;
     | ^

These symbols are referenced by more than one translation unit, so make
include the correct header for their declarations. Finally, sort the
list of includes to help keep them tidy.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/llvm/202308081000.tTL1ElTr-lkp@intel.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230808-llc_static-v1-1-c140c4c297e4@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: annotate data-races around sock->ops

IPV6_ADDRFORM socket option is evil, because it can change sock->ops
while other threads might read it. Same issue for sk->sk_family
being set to AF_INET.

Adding READ_ONCE() over sock->ops reads is needed for sockets
that might be impacted by IPV6_ADDRFORM.

Note that mptcp_is_tcpsk() can also overwrite sock->ops.

Adding annotations for all sk->sk_family reads will require
more patches :/

BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt

write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0:
do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491
ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012
udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663
__sys_setsockopt+0x1c3/0x230 net/socket.c:2273
__do_sys_setsockopt net/socket.c:2284 [inline]
__se_sys_setsockopt net/socket.c:2281 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2281
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1:
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x349/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmmsg+0x263/0x500 net/socket.c:2643
__do_sys_sendmmsg net/socket.c:2672 [inline]
__se_sys_sendmmsg net/socket.c:2669 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0xffffffff850e32b8 -> 0xffffffff850da890

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'remove-redundant-functions-and-use-generic-functions'

Li Zetao says:

====================
Remove redundant functions and use generic functions

This patch set removes some redundant functions. In the network module,
two generic functions are provided to convert u64 value and Ethernet
MAC address. Using generic functions helps reduce redundant code and
improve code readability.
====================

Link: https://lore.kernel.org/r/20230808114504.4036008-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Remove redundant functions rvu_npc_exact_mac2u64()

The rvu_npc_exact_mac2u64() is used to convert an Ethernet MAC address
into a u64 value, as this is exactly what ether_addr_to_u64() does.
Use ether_addr_to_u64() to replace the rvu_npc_exact_mac2u64().

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Acked-by: Geethasowjanya Akula <gakula@marvell.com>
Link: https://lore.kernel.org/r/20230808114504.4036008-4-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Use u64_to_ether_addr() to convert ethernet address

Use u64_to_ether_addr() to convert a u64 value to an Ethernet MAC address,
instead of directly calculating, as this is exactly what this
function does.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Acked-by: Geethasowjanya Akula <gakula@marvell.com>
Link: https://lore.kernel.org/r/20230808114504.4036008-3-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Remove redundant functions mac2u64() and cfg2mac()

The mac2u64() is used to convert an Ethernet MAC address into a u64 value,
as this is exactly what ether_addr_to_u64() does. Similarly, the cfg2mac()
is also the case. Use ether_addr_to_u64() and u64_to_ether_addr() instead
of these two.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Acked-by: Geethasowjanya Akula <gakula@marvell.com>
Link: https://lore.kernel.org/r/20230808114504.4036008-2-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-set-port-stp-state-on-bridge-enslavement'

Petr Machata says:

====================
mlxsw: Set port STP state on bridge enslavement

When the first port joins a LAG that already has a bridge upper, an
instance of struct mlxsw_sp_bridge_port is created for the LAG to keep
track of it as a bridge port. The bridge_port's STP state is initialized to
BR_STATE_DISABLED. This made sense previously, because mlxsw would only
ever allow a port to join a LAG if the LAG had no uppers. Thus if a
bridge_port was instantiated, it must have been because the LAG as such is
joining a bridge, and the STP state is correspondingly disabled.

However as of commit 2c5ffe8d7226 ("mlxsw: spectrum: Permit enslavement to
netdevices with uppers"), mlxsw allows a port to join a LAG that is already
a member of a bridge. The STP state may be different than disabled in that
case. Initialize it properly by querying the actual state.

This bug may cause an issue as traffic on ports attached to a bridged LAG
gets dropped on ingress with discard_ingress_general counter bumped.

The above fix in patch #1. Patch #2 contains a selftest that would
sporadically reproduce the issue.
====================

Link: https://lore.kernel.org/r/cover.1691498735.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: router_bridge_lag: Add a new selftest

Add a selftest to verify enslavement to a LAG with upper after fresh
devlink reload.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/373a7754daa4dac32759a45095f47b08a2a869c8.1691498735.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Set port STP state on bridge enslavement

When the first port joins a LAG that already has a bridge upper, an
instance of struct mlxsw_sp_bridge_port is created for the LAG to keep
track of it as a bridge port. The bridge_port's STP state is initialized to
BR_STATE_DISABLED. This made sense previously, because mlxsw would only
ever allow a port to join a LAG if the LAG had no uppers. Thus if a
bridge_port was instantiated, it must have been because the LAG as such is
joining a bridge, and the STP state is correspondingly disabled.

However as of commit 2c5ffe8d7226 ("mlxsw: spectrum: Permit enslavement to
netdevices with uppers"), mlxsw allows a port to join a LAG that is already
a member of a bridge. The STP state may be different than disabled in that
case. Initialize it properly by querying the actual state.

This bug may cause an issue as traffic on ports attached to a bridged LAG
gets dropped on ingress with discard_ingress_general counter bumped.

Fixes: c6514f3627a0 ("Merge branch 'mlxsw-enslavement'")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/39f4a5781050866b4132f350d7d8cf7ab23ea070.1691498735.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethernet: s2io: Use ether_addr_to_u64() to convert ethernet address

Use ether_addr_to_u64() to convert an Ethernet address into a u64 value,
instead of directly calculating, as this is exactly what
this function does.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230808113849.4033657-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

First 4 Patches, from Yue Haibing, remove unused prototypes in
various netfilter headers.

Last patch makes nfnetlink_log to always include a packet timestamp,
up to now it was only included if the skb had assigned previously.
From Maciej Żenczykowski.

* tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nfnetlink_log: always add a timestamp
  netfilter: h323: Remove unused function declarations
  netfilter: conntrack: Remove unused function declarations
  netfilter: helper: Remove unused function declarations
  netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush()
====================

Link: https://lore.kernel.org/r/20230808124159.19046-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Use is_zero_ether_addr() instead of ether_addr_equal()

Use is_zero_ether_addr() instead of ether_addr_equal()
to check if the ethernet address is all zeros.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230808133528.4083501-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: add missing empty line between policies

We're missing empty line between policies.
DPLL will need this.

Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230808200907.1290647-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()

Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
removed the implementation but leave declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxbf_gige: Remove two unused function declarations

Commit f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
declared but never implemented these.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20230808145249.41596-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Remove two unused function declarations

Commit 1e2dc14509fd ("net: ethtool: Add helpers for reporting test results")
declared but never implemented these function.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230808144610.19096-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mark parsed interface mode for legacy switch drivers

If we successfully parsed an interface mode with a legacy switch
driver, populate that mode into phylink's supported interfaces rather
than defaulting to the internal and gmii interfaces.

This hasn't caused an issue so far, because when the interface doesn't
match a supported one, phylink_validate() doesn't clear the supported
mask, but instead returns -EINVAL. phylink_parse_fixedlink() doesn't
check this return value, and merely relies on the supported ethtool
link modes mask being cleared. Therefore, the fixed link settings end
up being allowed despite validation failing.

Before this causes a problem, arrange for DSA to more accurately
populate phylink's supported interfaces mask so validation can
correctly succeed.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/E1qTKdM-003Cpx-Eh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: clear flag on port register error path

When xarray insertion fails, clear the flag.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230808082020.1363497-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: avoid rendering empty validate field

When dont-validate flags are filtered out for do/dump op, the list may
be empty. In that case, avoid rendering the validate field.

Fixes: fa8ba3502ade ("ynl-gen-c.py: render netlink policies static for split ops")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230808090344.1368874-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Use pci_dev_id() to simplify the code

PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230808024931.147048-1-wangxiongfeng2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bcm63xx_enet: Remove redundant initialization owner

The platform_register_drivers() will set "THIS_MODULE" to driver.owner when
register a platform_driver driver, so it is redundant initialization to set
driver.owner in the statement. Remove it for clean code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230808014702.2712699-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: Remove unused declaration tipc_link_build_bc_sync_msg()

Commit 526669866140 ("tipc: let broadcast packet reception use new link receive function")
declared but never implemented this.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807142926.45752-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: improve and relax PHY driver dependency

Different MT7530 variants require different PHY drivers.
Use 'imply' instead of 'select' to relax the dependency on the PHY
driver, and choose the appropriate driver.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-conntrack-offload'

Edward Cree says:

====================
sfc: basic conntrack offload

Support offloading tracked connections and matching against them in
TC chains on the PF and on representors.
Later patch serieses will add NAT and conntrack-on-tunnel-netdevs;
keep it simple for now.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: offload left-hand side rules for conntrack

Handle the (comparatively) simple case of a -trk rule on an efx netdev
(i.e. not a tunnel decap rule) with ct and goto chain actions.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: conntrack state matches in TC rules

Parse ct_state trk/est, mark and zone out of flower keys, and plumb
them through to the hardware, performing some minor translations.
Nothing can actually hit them yet as we're not offloading any DO_CT
actions.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: handle non-zero chain_index on TC rules

Map it to an 8-bit recirc_id for use by the hardware.
Currently nothing in the driver is offloading 'goto chain' actions,
so these rules cannot yet be hit.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: offload conntrack flow entries (match only) from CT zones

No handling yet for FLOW_ACTION_MANGLE (NAT or NAPT) actions.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: functions to insert/remove conntrack entries to MAE hardware

Translate from software struct efx_tc_ct_entry objects to the key
and response bitstrings, and implement insertion and removal of
these entries from the hardware table.
Callers of these functions will be added in subsequent patches.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: functions to register for conntrack zone offload

Bind a stub callback to the netfilter flow table.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add MAE table machinery for conntrack table

Access to the connection tracking table in EF100 hardware is through
a "generic" table mechanism, whereby a firmware call at probe time
gives the driver a description of the field widths and offsets, so
that the driver can then construct key and response bitstrings at
runtime.
Probe the NIC for this information and populate the needed metadata
into a new meta_ct field of struct efx_tc_state.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-08-07 (ice)

This series contains updates to ice driver only.

Wojciech allows for LAG interfaces to be used for bridge offloads.

Marcin tracks additional metadata for filtering rules to aid in proper
differentiation of similar rules. He also renames some flags that
do not entirely describe their representation.

Karol and Jan add additional waiting for firmware load on devices that
require it.

Przemek refactors RSS implementation to clarify/simplify configurations.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: clean up __ice_aq_get_set_rss_lut()
  ice: add FW load wait
  ice: Add get C827 PHY index function
  ice: Rename enum ice_pkt_flags values
  ice: Add direction metadata
  ice: Accept LAG netdevs in bridge offloads
====================

Link: https://lore.kernel.org/r/20230807204835.3129164-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-08-07

1) Few cleanups

2) Dynamic completion EQs

The driver creates completion EQs for all vectors directly on driver
load, even if those EQs will not be utilized later on.

To allow more flexibility in managing completion EQs and to reduce the
memory overhead of driver load, this series will adjust completion EQs
creation to be dynamic. Meaning, completion EQs will be created only
when needed.

Patch #1 introduces a counter for tracking the current number of
completion EQs.
Patches #2-6 refactor the existing infrastructure of managing completion
EQs and completion IRQs to be compatible with per-vector
allocation/release requests.
Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient
in case the affinity is requested but completion IRQ is not allocated yet.
Patch #9 function rename.
Patch #10 handles the corner case of SF performing an IRQ request when no
SF IRQ pool is found, and no PF IRQ exists for the same vector.
Patch #11 modify driver to use dynamically allocate completion EQs.

* tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Bridge, Only handle registered netdev bridge events
  net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl
  net/mlx5: Fix typo reminder -> remainder
  net/mlx5: remove many unnecessary NULL values
  net/mlx5: Allocate completion EQs dynamically
  net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool
  net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()
  net/mlx5: Add IRQ vector to CPU lookup function
  net/mlx5: Introduce mlx5_cpumask_default_spread
  net/mlx5: Implement single completion EQ create/destroy methods
  net/mlx5: Use xarray to store and manage completion EQs
  net/mlx5: Refactor completion IRQ request/release handlers in EQ layer
  net/mlx5: Use xarray to store and manage completion IRQs
  net/mlx5: Refactor completion IRQ request/release API
  net/mlx5: Track the current number of completion EQs
====================

Link: https://lore.kernel.org/r/20230807175642.20834-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: net: page_pool: de-duplicate the intro comment

In commit 82e896d992fa ("docs: net: page_pool: use kdoc to avoid
duplicating the information") I shied away from using the DOC:
comments when moving to kdoc for documenting page_pool API,
because I wasn't sure how familiar people are with it.

Turns out there is already a DOC: comment for the intro, which
is the same in both places, modulo what looks like minor rewording.
Use the version from Documentation/ but keep the contents with
the code.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Remove unused devlink_dpipe_table_resource_set() declaration

Commit f655dacb59ac ("net: devlink: remove unused locked functions")
removed this but leave the declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807143214.46648-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fq: Remove unused typedef fq_flow_get_default_t

Commitbf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin")
remove its last user, so can remove it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: avoid TCP window full during ->read_sock()

When flushing the backlog after decoding a record we don't really
know how much data the caller want us to evaluate, so use INT_MAX
and 0 as arguments to tls_read_flush_backlog() to ensure we flush
at 128k of data. Otherwise we might be reading too much data and
trigger a TCP window full.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20230807071022.10091-1-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: exthdrs: Replace opencoded swap() implementation

Get a coccinelle warning as follows:
net/ipv6/exthdrs.c:800:29-30: WARNING opportunity for swap()

Use swap() to replace opencoded implementation.

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230807020947.1991716-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/ipv4: return the real errno instead of -EINVAL

For now, No matter what error pointer ip_neigh_for_gw() returns,
ip_finish_output2() always return -EINVAL, which may mislead the upper
users.

For exemple, an application uses sendto to send an UDP packet, but when the
neighbor table overflows, sendto() will get a value of -EINVAL, and it will
cause users to waste a lot of time checking parameters for errors.

Return the real errno instead of -EINVAL.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Si Hao <si.hao@zte.com.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230807015408.248237-1-xu.xin16@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-renesas-rswitch-add-speed-change-support'

Yoshihiro Shimoda says:

====================
net: renesas: rswitch: Add speed change support

Add speed change support at runtime for the latest SoC version.
Also, add ethtool .[gs]et_link_ksettings.

v1: https://lore.kernel.org/all/20230803120621.1471440-1-yoshihiro.shimoda.uh@renesas.com/
====================

Link: https://lore.kernel.org/r/20230807003231.1552062-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Add .[gs]et_link_ksettings support

Add .[gs]et_link_ksettings support by using
phy_ethtool_[gs]et_link_ksettings() functions.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807003231.1552062-3-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Add runtime speed change support

The latest SoC version can support runtime speed change. So,
add detect SoC version by using soc_device_match() and then
reconfigure the hardware of this and SerDes if needed.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807003231.1552062-2-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: remove redundant checks for nlattr IFLA_BRIDGE_MODE

The commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length") added the nla_len check in rtnl_bridge_setlink,
which is the only caller for ndo_bridge_setlink handlers defined in
low-level driver codes. Hence, this patch cleanups the redundant checks in
each ndo_bridge_setlink handler function.

Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807091347.3804523-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-fix-2-compile-warnings-in-bnxt_dcb-c'

Michael Chan says:

====================
bnxt_en: Fix 2 compile warnings in bnxt_dcb.c

Fix 2 similar warnings in bnxt_dcb.c related to the cos2bw interface
definitions in bnxt_hsi.h.
====================

Link: https://lore.kernel.org/r/20230807145720.159645-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix W=stringop-overflow warning in bnxt_dcb.c

Fix the following warning:

drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c: In function ‘bnxt_hwrm_queue_cos2bw_cfg’:
cc1: error: writing 12 bytes into a region of size 1 [-Werror=stringop-overflow ]
In file included from drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:19:
drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h:6045:17: note: destination object ‘unused_0’ of size 1
6045 | u8 unused_0;

Fix it by modifying struct hwrm_queue_cos2bw_cfg_input to use an array
of sub struct similar to the previous patch. This will eliminate the
pointer arithmetc to calculate the destination pointer passed to
memcpy().

Link: https://lore.kernel.org/netdev/CACKFLinikvXmKcxr4kjWO9TPYxTd2cb5agT1j=w9Qyj5-24s5A@mail.gmail.com/
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230807145720.159645-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix W=1 warning in bnxt_dcb.c from fortify memcpy()

Fix the following warning:

inlined from ‘bnxt_hwrm_queue_cos2bw_qcfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:165:3,
./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror]
    __read_overflow2_field(q_size_field, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Modify the FW interface defintion of struct hwrm_queue_cos2bw_qcfg_output
to use an array of sub struct for the queue1 to queue7 fields.  Note that
the layout of the queue0 fields are different and these are not part of
the array.  This makes the code much cleaner by removing the pointer
arithmetic for memcpy().

Link: https://lore.kernel.org/netdev/20230727190726.1859515-2-kuba@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230807145720.159645-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmasp: Prevent array undereflow in bcmasp_netfilt_get_init()

The "loc" value comes from the user and it can be negative leading to an
an array underflow when we check "priv->net_filters[loc].claimed". Fix
this by changing the type to u32.

Fixes: c5d511c49587 ("net: bcmasp: Add support for wake on net filters")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/b3b47b25-01fc-4d9f-a6c3-e037ad4d71d7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'team-do-some-cleanups-in-team-driver'

Zhengchao Shao says:

====================
team: do some cleanups in team driver

Do some cleanups in team driver.
====================

Link: https://lore.kernel.org/r/20230807012556.3146071-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

team: remove unused input parameters in lb_htpm_select_tx_port and lb_hash_select_tx_port

The input parameters "lb_priv" and "skb" in lb_htpm_select_tx_port and
lb_hash_select_tx_port are unused, so remove them.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807012556.3146071-6-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

team: change the getter function in the team_option structure to void

Because the getter function in the team_option structure always returns 0,
so change the getter function to void and remove redundant code.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

team: change the init function in the team_option structure to void

Because the init function in the team_option structure always returns 0,
so change the init function to void and remove redundant code.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807012556.3146071-4-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

team: remove unreferenced header in broadcast and roundrobin files

Because linux/errno.h is unreferenced in broadcast and roundrobin
files, so remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807012556.3146071-3-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

team: add __exit modifier to team_nl_fini()

team_nl_fini is only called when the module exits, so add the __exit
modifier to it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230807012556.3146071-2-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-fs_enet-driver-cleanup'

Christophe Leroy says:

====================
net: fs_enet: Driver cleanup

Over the years, platform and driver initialisation have evolved into
more generic ways, and driver or platform specific stuff has gone
away, leaving stale objects behind.

This series aims at cleaning all that up for fs_enet ethernet driver.
====================

Link: https://lore.kernel.org/r/cover.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Use cpm_muram_xxx() functions instead of cpm_dpxxx() macros

cpm_dpxxx() macros are now always referring to cpm_muram_xxx() functions
directly since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new
binding.")

Use cpm_muram_xxx() functions directly so that the cpm_dpxxx() macros
can be removed in the near future.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/2400b3156891adb653dc387fff6393de10cf2b24.1691155347.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove linux/fs_enet_pd.h

linux/fs_enet_pd.h is not used anymore.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/5be102791c987792ad127b15543ee6715394cf67.1691155347.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Don't include fs_enet_pd.h when not needed

Three platforms in arch/powerpc/platforms/8xx/ include fs_enet_pd.h
but don't use anything from it.

Remove the includes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/de62ad1261a801c4a8ae4238bd4842ff278d2ddf.1691155347.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Move struct fs_platform_info into fs_enet.h

struct fs_platform_info is only used in fs_enet ethernet driver since
commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.").

Stale prototypes using fs_platform_info were left over in
arch/powerpc/sysdev/fsl_soc.c but they are now removed by
previous patch.

Move struct fs_platform_info into fs_enet.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/f882d6b0b7075d0d8435310634ceaa2cc8e9938f.1691155347.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove stale prototypes from fsl_soc.c

Commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.")
removed last use of init_fec_ioports() and init_fcc_ioports().

Remove stale prototypes then don't include anymore fs_enet_pd.h
which was only included to provide fs_platform_info structure
for the prototypes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/f2f2db5dce3242eaa4a2aad6c226ba587fb0f513.1691155347.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove has_phy field in fs_platform_info struct

Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.")
has_phy field is never set.

Remove dead code and remove the field.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/bb5264e09e18f0ce8a0dbee399926a59f33cb248.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove unused fields in fs_platform_info struct

Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.")
many fields of fs_platform_info structure are not used anymore.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/2e584fcd75e21a0f7e7d5f942eebdc067b2f82f9.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove fs_get_id()

fs_get_id() hasn't been used since commit b219108cbace ("fs_enet:
Remove !CONFIG_PPC_CPM_NEW_BINDING code")

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/7a53b88cc40302fcbea59554f5e7067e3594ad53.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Fix address space and base types mismatches

CHECK   drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
drivers/net/ethernet/freescale/fs_enet/mac-fcc.c:550:9: warning: cast removes address space '__iomem' of expression
drivers/net/ethernet/freescale/fs_enet/mac-fcc.c:550:9: error: subtraction of different types can't work (different address spaces)
  CC      drivers/net/ethernet/freescale/fs_enet/mii-bitbang.o
  CHECK   drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:95:31: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:95:31:    expected unsigned int [noderef] [usertype] __iomem *p
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:95:31:    got restricted __be32 [noderef] [usertype] __iomem *dat
...
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:63:31: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:63:31:    expected unsigned int [noderef] [usertype] __iomem *p
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:63:31:    got restricted __be32 [noderef] [usertype] __iomem *dir
...

Fix those address space and base type mismatches reported by sparse.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/25c7965e6aeeb6bbe1b6be5a3c2c7125182fcb02.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fs_enet: Remove set but not used variable

CC drivers/net/ethernet/freescale/fs_enet/fs_enet-main.o
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c: In function 'fs_enet_interrupt':
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c:321:40: warning: variable 'fpi' set but not used [-Wunused-but-set-variable]

Remove that variable.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/90b72c1708bb8ba2b7a1a688e8259e428968364d.1691155346.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns: Remove unused function declaration mac_adjust_link()

Commit 511e6bc071db ("net: add Hisilicon Network Subsystem DSAF support")
declared but never implemented this, remove it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Hao Lan <lanhao@huawei.com>
Link: https://lore.kernel.org/r/20230804130048.39808-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: Remove unused function declarations

Commit f62b5060d670 ("i40e: fix mac address checking") left behind
i40e_validate_mac_addr() declaration.
Also the other declarations are declared but never implemented in
commit 56a62fc86895 ("i40e: init code and hardware support").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804125525.20244-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: Remove unused function declarations

Commit dc166e22ede5 ("ixgbe: DCB remove ixgbe_fcoe_getapp routine")
leave ixgbe_fcoe_getapp() unused.
Commit ffed21bcee7a ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
leave ixgbe_unmap_and_free_tx_resource() declaration unused.
And commit 3b3bf3b92b31 ("ixgbe: remove unused fcoe.tc field and fcoe_setapp()")
removed the ixgbe_fcoe_setapp() implementation.

Commit c44ade9ef8ff ("ixgbe: update to latest common code module")
declared but never implemented ixgbe_init_ops_generic().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804125203.30924-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: nfnetlink_log: always add a timestamp

Compared to all the other work we're already doing to deliver
an skb to userspace this is very cheap - at worse an extra
call to ktime_get_real() - and very useful.

(and indeed it may even be cheaper if we're running from other hooks)

(background: Android occasionally logs packets which
caused wake from sleep/suspend and we'd like to have
timestamps reliably associated with these events)

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: h323: Remove unused function declarations

Commit f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
declared but never implemented these.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: conntrack: Remove unused function declarations

Commit 1015c3de23ee ("netfilter: conntrack: remove extension register api")
leave nf_conntrack_acct_fini() and nf_conntrack_labels_init() unused, remove it.
And commit a0ae2562c6c4 ("netfilter: conntrack: remove l3proto abstraction")
leave behind nf_ct_l3proto_try_module_get() and nf_ct_l3proto_module_put().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: helper: Remove unused function declarations

Commit b118509076b3 ("netfilter: remove nf_conntrack_helper sysctl and modparam toggles")
leave these unused declarations.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush()

Commit a23f89a99906 ("netfilter: conntrack: nf_ct_gre_keymap_flush() removal")
leave this unused, remove it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

Merge branch 'net-remove-redundant-initialization-owner'

Li Zetao says:

====================
net: Remove redundant initialization owner

This patch set removes redundant initialization owner when register a
fsl_mc_driver driver
====================

Link: https://lore.kernel.org/r/20230804095946.99956-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dpaa2-switch: Remove redundant initialization owner in dpaa2_switch_drv

The fsl_mc_driver_register() will set "THIS_MODULE" to driver.owner when
register a fsl_mc_driver driver, so it is redundant initialization to set
driver.owner in dpaa2_switch_drv statement. Remove it for clean code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804095946.99956-3-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dpaa2-eth: Remove redundant initialization owner in dpaa2_eth_driver

The fsl_mc_driver_register() will set "THIS_MODULE" to driver.owner when
register a fsl_mc_driver driver, so it is redundant initialization to set
driver.owner in dpaa2_eth_driver statement. Remove it for clean code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804095946.99956-2-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'octeontx2-af-tc-flower-offload-changes'

Suman Ghosh says:

====================
octeontx2-af: TC flower offload changes

This patchset includes minor code restructuring related to TC
flower offload for outer vlan and adding support for TC inner
vlan offload.

Patch #1 Code restructure to handle TC flower outer vlan offload

Patch #2 Add TC flower offload support for inner vlan
====================

Link: https://lore.kernel.org/r/20230804045935.3010554-1-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: TC flower offload support for inner VLAN

Extend the current TC flower offload support to enable filters matching
inner VLAN, and support offload of those filters to hardware.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804045935.3010554-3-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Code restructure to handle TC outer VLAN offload

Moved the TC outer VLAN offload support to a separate function.
This change is done to handle all VLAN related changes cleanly from
a dedicated function.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230804045935.3010554-2-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'page_pool-a-couple-of-assorted-optimizations'

Alexander Lobakin says:

====================
page_pool: a couple of assorted optimizations

That initially was a spin-off of the IAVF PP series[0], but has grown
(and shrunk) since then a bunch. In fact, it consists of three
semi-independent blocks:

* #1-2: Compile-time optimization. Split page_pool.h into 2 headers to
  not overbloat the consumers not needing complex inline helpers and
  then stop including it in skbuff.h at all. The first patch is also
  prereq for the whole series.
* #3: Improve cacheline locality for users of the Page Pool frag API.
* #4-6: Use direct cache recycling more aggressively, when it is safe
  obviously. In addition, make sure nobody wants to use Page Pool API
  with disabled interrupts.

Patches #1 and #5 are authored by Yunsheng and Jakub respectively, with
small modifications from my side as per ML discussions.
For the perf numbers for #3-6, please see individual commit messages.

Also available on my GH with many more Page Pool goodies[1].

[0] https://lore.kernel.org/netdev/20230530150035.1943669-1-aleksander.lobakin@intel.com
[1] https://github.com/alobakin/linux/commits/iavf-pp-frag
====================

Link: https://lore.kernel.org/r/20230804180529.2483231-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: skbuff: always try to recycle PP pages directly when in softirq

Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some cases,
but unfortunately missed a couple of other majors.
For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through ptr_ring and locks, although most of time we're actually inside
the NAPI polling this PP is linked with, so that it would be perfectly
safe to recycle pages directly.
Let's address such. If @napi_safe is true, we're fine, don't change
anything for this path. But if it's false, check whether we are in the
softirq context. It will most likely be so and then if ->list_owner
is our current CPU, we're good to use direct recycling, even though
@napi_safe is false -- concurrent access is excluded. in_softirq()
protection is needed mostly due to we can hit this place in the
process context (not the hardirq though).
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and
alloc_slow counter doesn't change most of time, which means the
MM layer is not even called to allocate any new pages.

Suggested-by: Jakub Kicinski <kuba@kernel.org> # in_softirq()
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-7-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

page_pool: add a lockdep check for recycling in hardirq

Page pool use in hardirq is prohibited, add debug checks
to catch misuses. IIRC we previously discussed using
DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns
that people will have DEBUG_NET enabled in perf testing.
I don't think anyone enables lockdep in perf testing,
so use lockdep to avoid pushback and arguing :)

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: skbuff: avoid accessing page_pool if !napi_safe when returning page

Currently, pp->p.napi is always read, but the actual variable it gets
assigned to is read-only when @napi_safe is true. For the !napi_safe
cases, which yet is still a pack, it's an unneeded operation.
Moreover, it can lead to premature or even redundant page_pool
cacheline access. For example, when page_pool_is_last_frag() returns
false (with the recent frag improvements).
Thus, read it only when @napi_safe is true. This also allows moving
@napi inside the condition block itself. Constify it while we are
here, because why not.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-5-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>