Vladimir Oltean [Tue, 29 Nov 2022 14:12:12 +0000 (16:12 +0200)]
net: dpaa2-mac: absorb phylink_start() call into dpaa2_mac_start()
The phylink handling is intended to be hidden inside the dpaa2_mac
object. Move the phylink_start() call into dpaa2_mac_start(), and
phylink_stop() into dpaa2_mac_stop().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:11 +0000 (16:12 +0200)]
net: dpaa2: replace dpaa2_mac_is_type_fixed() with dpaa2_mac_is_type_phy()
dpaa2_mac_is_type_fixed() is a header with no implementation and no
callers, which is referenced from the documentation though. It can be
deleted.
On the other hand, it would be useful to reuse the code between
dpaa2_eth_is_type_phy() and dpaa2_switch_port_is_type_phy(). That common
code should be called dpaa2_mac_is_type_phy(), so let's create that.
The removal and the addition are merged into the same patch because,
in fact, is_type_phy() is the logical opposite of is_type_fixed().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 29 Nov 2022 14:12:10 +0000 (16:12 +0200)]
net: dpaa2-eth: don't use -ENOTSUPP error code
dpaa2_eth_setup_dpni() is called from the probe path and
dpaa2_eth_set_link_ksettings() is propagated to user space.
include/linux/errno.h says that ENOTSUPP is "Defined for the NFSv3
protocol". Conventional wisdom has it to not use it in networking
drivers. Replace it with -EOPNOTSUPP.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Tue, 29 Nov 2022 09:43:47 +0000 (12:43 +0300)]
net: microchip: sparx5: Fix error handling in vcap_show_admin()
If vcap_dup_rule() fails that leads to an error pointer dereference
side the call to vcap_free_rule(). Also it only returns an error if the
very last call to vcap_read_rule() fails and it returns success for
other errors.
I've changed it to just stop printing after the first error and return
an error code.
Fixes: 3a7921560d2f ("net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Link: https://lore.kernel.org/r/Y4XUUx9kzurBN+BV@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Mon, 28 Nov 2022 11:06:14 +0000 (14:06 +0300)]
bonding: uninitialized variable in bond_miimon_inspect()
The "ignore_updelay" variable needs to be initialized to false.
Fixes: f8a65ab2f3ff ("bonding: fix link recovery in mode 2 when updelay is nonzero") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 1 Dec 2022 06:05:19 +0000 (22:05 -0800)]
Merge tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-11-29
Misc update for mlx5 driver
1) Various trivial cleanups
2) Maor Dickman, Adds support for trap offload with additional actions
3) From Tariq, UMR (device memory registrations) cleanups,
UMR WQE must be aligned to 64B per device spec, (not a bug fix).
* tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Support devlink reload of IPsec core
net/mlx5e: TC, Add offload support for trap with additional actions
net/mlx5e: Do early return when setup vports dests for slow path flow
net/mlx5: Remove redundant check
net/mlx5e: Delete always true DMA check
net/mlx5e: Don't access directly DMA device pointer
net/mlx5e: Don't use termination table when redundant
net/mlx5: Fix orthography errors in documentation
net/mlx5: Use generic definition for UMR KLM alignment
net/mlx5: Generalize name of UMR alignment definition
net/mlx5: Remove unused UMR MTT definitions
net/mlx5e: Add padding when needed in UMR WQEs
net/mlx5: Remove unused ctx variables
net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
net/mlx5e: Remove unneeded io-mapping.h #include
====================
Xiaolei Wang [Wed, 30 Nov 2022 02:12:16 +0000 (10:12 +0800)]
net: phy: Add link between phy dev and mac dev
If the external phy used by current mac interface is
managed by another mac interface, it means that this
network port cannot work independently, especially
when the system suspends and resumes, the following
trace may appear, so we should create a device link
between phy dev and mac dev.
WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: events_power_efficient phy_state_machine
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x90
dump_stack_lvl from __warn+0xb4/0x24c
__warn from warn_slowpath_fmt+0x5c/0xd8
warn_slowpath_fmt from phy_error+0x20/0x68
phy_error from phy_state_machine+0x22c/0x23c
phy_state_machine from process_one_work+0x288/0x744
process_one_work from worker_thread+0x3c/0x500
worker_thread from kthread+0xf0/0x114
kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0951fb0 to 0xf0951ff8)
====================
net: devlink: return the driver name in devlink_nl_info_fill
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
The goal of this series is to have the devlink core to report this
information instead of the drivers.
The first patch fulfills the actual goal of this series: modify
devlink core to report the driver name and clean-up all drivers. Both
have to be done in an atomic change to avoid attribute
duplication. This same patch also removes the
devlink_info_driver_name_put() function to prevent future drivers from
reporting the driver name themselves.
The second patch allows the core to call devlink_nl_info_fill() even
if the devlink_ops::info_get() callback is NULL. This leads to the
third and final patch which cleans up the drivers which have an empty
info_get().
====================
Vincent Mailhol [Tue, 29 Nov 2022 09:51:39 +0000 (18:51 +0900)]
net: devlink: make the devlink_ops::info_get() callback optional
Some drivers only reported the driver name in their
devlink_ops::info_get() callback. Now that the core provides this
information, the callback became empty. For such drivers, just
removing the callback would prevent the core from executing
devlink_nl_info_fill() meaning that "devlink dev info" would not
return anything.
Make the callback function optional by executing
devlink_nl_info_fill() even if devlink_ops::info_get() is NULL.
N.B.: the drivers with devlink support which previously did not
implement devlink_ops::info_get() will now also be able to report
the driver name.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vincent Mailhol [Tue, 29 Nov 2022 09:51:38 +0000 (18:51 +0900)]
net: devlink: let the core report the driver name instead of the drivers
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.
Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 1 Dec 2022 04:54:39 +0000 (20:54 -0800)]
Merge branch 'support-direct-read-from-region'
Jacob Keller says:
====================
support direct read from region
A long time ago when initially implementing devlink regions in ice I
proposed the ability to allow reading from a region without taking a
snapshot [1]. I eventually dropped this work from the original series due to
size. Then I eventually lost track of submitting this follow up.
This can be useful when interacting with some region that has some
definitive "contents" from which snapshots are made. For example the ice
driver has regions representing the contents of the device flash.
If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.
However if user space only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.
For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.
This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.
This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.
Jacob Keller [Mon, 28 Nov 2022 20:36:47 +0000 (12:36 -0800)]
ice: implement direct read for NVM and Shadow RAM regions
Implement the .read handler for the NVM and Shadow RAM regions. This
enables user space to read a small chunk of the flash without needing the
overhead of creating a full snapshot.
Update the documentation for ice to detail which regions have direct read
support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:46 +0000 (12:36 -0800)]
ice: document 'shadow-ram' devlink region
78ad87da9978 ("ice: devlink: add shadow-ram region to snapshot Shadow RAM")
added support for the 'shadow-ram' devlink region, but did not document it
in the ice devlink documentation. Fix this.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:45 +0000 (12:36 -0800)]
ice: use same function to snapshot both NVM and Shadow RAM
The ice driver supports a region for both the flat NVM contents as well as
the Shadow RAM which is a layer built on top of the flash during device
initialization.
These regions use an almost identical read function, except that the NVM
needs to set the direct flag when reading, while Shadow RAM needs to read
without the direct flag set. They each call ice_read_flat_nvm with the only
difference being whether to set the direct flash flag.
The NVM region read function also was fixed to read the NVM in blocks to
avoid a situation where the firmware reclaims the lock due to taking too
long.
Note that the region snapshot function takes the ops pointer so the
function can easily determine which region to read. Make use of this and
re-use the NVM snapshot function for both the NVM and Shadow RAM regions.
This makes Shadow RAM benefit from the same block approach as the NVM
region. It also reduces code in the ice driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:44 +0000 (12:36 -0800)]
devlink: support directly reading from region memory
To read from a region, user space must currently request a new snapshot of
the region and then read from that snapshot. This can sometimes be overkill
if user space only reads a tiny portion. They first create the snapshot,
then request a read, then destroy the snapshot.
For regions which have a single underlying "contents", it makes sense to
allow supporting direct reading of the region data.
Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if
requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set,
then perform a direct read instead of using a snapshot. Direct read is
mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken
to ensure that we reject commands which provide incorrect attributes.
Regions must enable support for direct read by implementing the .read()
callback function. If a region does not support such direct reads, a
suitable extended error message is reported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:43 +0000 (12:36 -0800)]
devlink: refactor region_read_snapshot_fill to use a callback function
The devlink_nl_region_read_snapshot_fill is used to copy the contents of
a snapshot into a message for reporting to userspace via the
DEVLINK_CMG_REGION_READ netlink message.
A future change is going to add support for directly reading from
a region. Almost all of the logic for this new capability is identical.
To help reduce code duplication and make this logic more generic,
refactor the function to take a cb and cb_priv pointer for doing the
actual copy.
Add a devlink_region_snapshot_fill implementation that will simply copy
the relevant chunk of the region. This does require allocating some
storage for the chunk as opposed to simply passing the correct address
forward to the devlink_nl_cmg_region_read_chunk_fill function.
A future change to implement support for directly reading from a region
without a snapshot will provide a separate implementation that calls the
newly added devlink region operation.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Mon, 28 Nov 2022 20:36:41 +0000 (12:36 -0800)]
devlink: find snapshot in devlink_nl_cmd_region_read_dumpit
The snapshot pointer is obtained inside of the function
devlink_nl_region_read_snapshot_fill. Simplify this function by locating
the snapshot upfront in devlink_nl_cmd_region_read_dumpit instead. This
aligns with how other netlink attributes are handled, and allows us to
exit slightly earlier if an invalid snapshot ID is provided.
It also allows us to pass the snapshot pointer directly to the
devlink_nl_region_read_snapshot_fill, and remove the now unused attrs
parameter.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Mon, 28 Nov 2022 14:29:59 +0000 (15:29 +0100)]
net: microchip: vcap: Change how the rule id is generated
Currently whenever a new rule id is generated, it picks up the next
number bigger than previous id. So it would always be 1, 2, 3, etc.
When the rule with id 1 will be deleted and a new rule will be added,
it will have the id 4 and not id 1.
In theory this can be a problem if at some point a rule will be added
and removed ~0 times. Then no more rules can be added because there
are no more ids.
Change this such that when a new rule is added, search for an empty
rule id starting with value of 1 as value 0 is reserved.
Christophe JAILLET [Fri, 25 Nov 2022 12:24:00 +0000 (13:24 +0100)]
octeontx2-af: Fix the size of memory allocated for the 'id_bmap' bitmap
This allocation is really spurious.
The size of the bitmap is 'tot_ids' and it is used as such in the driver.
So we could expect something like:
table->id_bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(table->tot_ids),
sizeof(long), GFP_KERNEL);
However, when the bitmap is allocated, we allocate:
BITS_TO_LONGS(table->tot_ids) * table->tot_ids ~=
table->tot_ids / 32 * table->tot_ids ~=
table->tot_ids^2 / 32
It is proportional to the square of 'table->tot_ids' which seems to
potentially be big.
Allocate the expected amount of memory, and switch to the bitmap API to
have it more straightforward.
Christophe JAILLET [Fri, 25 Nov 2022 12:23:59 +0000 (13:23 +0100)]
octeontx2-af: Use the bitmap API to allocate bitmaps
Use devm_bitmap_zalloc() instead of hand-writing it.
This also makes the comment "Allocate bitmap for 32 entry mcam" more
explicit because now 32 is really used in the allocation function, instead
of an obscure 'sizeof(long)'.
Christophe JAILLET [Fri, 25 Nov 2022 12:23:57 +0000 (13:23 +0100)]
octeontx2-af: Fix a potentially spurious error message
When this error message is displayed, we know that the all the bits in the
bitmap are set.
So, bitmap_weight() will return the number of bits of the bitmap, which is
'table->tot_ids'.
It is unlikely that a bit will be cleared between mutex_unlock() and
dev_err(), but, in order to simplify the code and avoid this possibility,
just take 'table->tot_ids'.
Leon Romanovsky [Mon, 22 Aug 2022 10:38:55 +0000 (13:38 +0300)]
net/mlx5e: Support devlink reload of IPsec core
Change IPsec initialization flow to allow future creation of hardware
resources that should be released and allocated during devlink reload
operation. As part of that change, update function signature to be
void as no callers are actually interested in it.
Maor Dickman [Thu, 24 Nov 2022 12:21:03 +0000 (14:21 +0200)]
net/mlx5e: TC, Add offload support for trap with additional actions
TC trap action offload is currently supported only when trap is the sole action
in the flow.
This patch remove this limitation by changing trap action offload to not use
MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table
explicitly to be the slow table. This will allow offload of the additional
actions.
TC flow example:
tc filter add dev $REP2 protocol ip prio 2 root \
flower skip_sw dst_mac $mac0 \
action mirred egress redirect dev $REP3 \
action pedit ex munge eth dst set $mac2 pipe \
action trap
Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 21 Nov 2022 10:14:50 +0000 (12:14 +0200)]
net/mlx5e: Do early return when setup vports dests for slow path flow
Adding flow flag cases in setup vport dests before the slow path
case is incorrect as the slow path should take precedence.
Current code doesn't show this importance so make the slow path
case return early and separate from the other cases and remove
the redundant comparison of it in the sample case.
Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Thu, 13 Oct 2022 12:48:43 +0000 (15:48 +0300)]
net/mlx5: Remove redundant check
If ASO failed in creation, it won't be called to destroy either.
The kernel coding pattern is to make sure that callers are calling
to destroy only for valid objects.
Roi Dayan [Wed, 16 Nov 2022 07:47:37 +0000 (09:47 +0200)]
net/mlx5e: Don't use termination table when redundant
Current code used termination table for each vport destination
while it's only required for hairpin, i.e. uplink to uplink, or
when vlan push on rx action being used.
Fix to skip using termination table for vport destinations that
do not require it.
Tariq Toukan [Mon, 31 Oct 2022 12:24:02 +0000 (14:24 +0200)]
net/mlx5: Use generic definition for UMR KLM alignment
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while
MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to
MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and
confusing.
Replace this KLM definition with one based on the generic definition.
Tariq Toukan [Mon, 31 Oct 2022 12:18:22 +0000 (14:18 +0200)]
net/mlx5e: Add padding when needed in UMR WQEs
Per the device spec, MTTs/KLMs list in a UMR WQE must be aligned to 64B.
Per our SW design, the MTT/KLMs list would need alignment only if it's
too small, for example on PPC when PAGE_SIZE is 64KB, and only 4 pages
are needed to cover a MPWQE of size 256KB.
Padding, if needed, is taken into account when calculating the UMR WQE
fields (ds_cnt and xlt_octowords), however no entries are provided,
instead garbage is passed.
No real harm though, as these parts act as gaps between the RX MPWQEs
and not used by any of them. Hence, in practice, device does not try to
write any incoming packet to them. Still, prefer providing clean padding
marking the end of the list, and do not map garbage into the RQ memory
region.
Gustavo A. R. Silva [Mon, 26 Sep 2022 21:50:42 +0000 (16:50 -0500)]
net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
Jakub Kicinski [Wed, 30 Nov 2022 04:50:50 +0000 (20:50 -0800)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2022-11-26
1) Remove redundant variable in esp6.
From Colin Ian King.
2) Update x->lastused for every packet. It was used only for
outgoing mobile IPv6 packets, but showed to be usefull
to check if the a SA is still in use in general.
From Antony Antony.
3) Remove unused variable in xfrm_byidx_resize.
From Leon Romanovsky.
4) Finalize extack support for xfrm.
From Sabrina Dubroca.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: add extack to xfrm_set_spdinfo
xfrm: add extack to xfrm_alloc_userspi
xfrm: add extack to xfrm_do_migrate
xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len
xfrm: add extack to xfrm_del_sa
xfrm: add extack to xfrm_add_sa_expire
xfrm: a few coding style clean ups
xfrm: Remove not-used total variable
xfrm: update x->lastused for every packet
esp6: remove redundant variable err
====================
====================
net: pcs: altera-tse: simplify and clean-up the driver
This small series does a bit of code cleanup in the altera TSE pcs
driver, removing unused register definitions, handling 1000BaseX speed
configuration correctly according to the datasheet, and making use of
proper poll_timeout helpers.
No functional change is introduced.
====================
Maxime Chevallier [Fri, 25 Nov 2022 13:17:59 +0000 (14:17 +0100)]
net: pcs: altera-tse: use read_poll_timeout to wait for reset
Software resets on the TSE PCS don't clear registers, but rather reset
all internal state machines regarding AN, comma detection and
encoding/decoding. Use read_poll_timeout to wait for the reset to clear
instead of manually polling the register.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
mptcp: MSG_FASTOPEN and TFO listener side support
Before this series, only the initiator of a connection was able to combine
both TCP FastOpen and MPTCP when using TCP_FASTOPEN_CONNECT socket option.
These new patches here add (in theory) the full support of TFO with MPTCP,
which means:
- MSG_FASTOPEN sendmsg flag support (patch 1/8)
- TFO support for the listener side (patches 2-5/8)
- TCP_FASTOPEN socket option (patch 6/8)
- TCP_FASTOPEN_KEY socket option (patch 7/8)
To support TFO for the server side, a few preparation patches are needed
(patches 2 to 5/8). Some of them were inspired by a previous work from
Benjamin Hesmans.
Note that TFO support with MPTCP has been validated with selftests
(patch 8/8) but also with Packetdrill tests running with a modified
but still very WIP version supporting MPTCP. Both the modified tool
and the tests are available online:
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:54 +0000 (23:29 +0100)]
selftests: mptcp: mptfo Initiator/Listener
This patch first adds TFO support in mptcp_connect.c.
This can be enabled via a new option: -o MPTFO.
Once enabled, the TCP_FASTOPEN socket option is enabled for the server
side and a sendto() with MSG_FASTOPEN is used instead of a connect() for
the client side.
Note that the first SYN has a limit of bytes it can carry. In other
words, it is allowed to send less data than the provided one. We then
need to track more status info to properly allow the next sendmsg()
starting from the next part of the data to send the rest.
Also in TFO scenarios, we need to completely spool the partially xmitted
buffer -- and account for that -- before starting sendfile/mmap xmit,
otherwise the relevant tests will fail.
Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts [Fri, 25 Nov 2022 22:29:53 +0000 (23:29 +0100)]
mptcp: add support for TCP_FASTOPEN_KEY sockopt
The goal of this socket option is to set different keys per listener,
see commit 1fba70e5b6be ("tcp: socket option to set TCP fast open key")
for more details about this socket option.
The only thing to do here with MPTCP is to relay the request to the
first subflow like it is already done for the other TCP_FASTOPEN* socket
options.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:51 +0000 (23:29 +0100)]
mptcp: add subflow_v(4,6)_send_synack()
The send_synack() needs to be overridden for MPTCP to support TFO for
two reasons:
- There is not be enough space in the TCP options if the TFO cookie has
to be added in the SYN+ACK with other options: MSS (4), SACK OK (2),
Timestamps (10), Window Scale (3+1), TFO (10+2), MP_CAPABLE (12).
MPTCPv1 specs -- RFC 8684, section B.1 [1] -- suggest to drop the TCP
timestamps option in this case.
- The data received in the SYN has to be handled: the SKB can be
dequeued from the subflow sk and transferred to the MPTCP sk. Counters
need to be updated accordingly and the application can be notified at
the end because some bytes have been received.
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:50 +0000 (23:29 +0100)]
mptcp: implement delayed seq generation for passive fastopen
With fastopen in place, the first subflow socket is created before the
MPC handshake completes, and we need to properly initialize the sequence
numbers at MPC ACK reception.
Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Nov 2022 22:29:49 +0000 (23:29 +0100)]
mptcp: consolidate initial ack seq generation
Currently the initial ack sequence is generated on demand whenever
it's requested and the remote key is handy. The relevant code is
scattered in different places and can lead to multiple, unneeded,
crypto operations.
This change consolidates the ack sequence generation code in a single
helper, storing the sequence number at the subflow level.
The above additionally saves a few conditional in fast-path and will
simplify the upcoming fast-open implementation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Nov 2022 22:29:48 +0000 (23:29 +0100)]
mptcp: track accurately the incoming MPC suboption type
Currently in the receive path we don't need to discriminate
between MPC SYN, MPC SYN-ACK and MPC ACK, but soon the fastopen
code will need that info to properly track the fully established
status.
Track the exact MPC suboption type into the receive opt bitmap.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmytro Shytyi [Fri, 25 Nov 2022 22:29:47 +0000 (23:29 +0100)]
mptcp: add MSG_FASTOPEN sendmsg flag support
Since commit 54f1944ed6d2 ("mptcp: factor out mptcp_connect()"), all the
infrastructure is now in place to support the MSG_FASTOPEN flag, we
just need to call into the fastopen path in mptcp_sendmsg().
Co-developed-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 29 Nov 2022 17:52:10 +0000 (09:52 -0800)]
Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
Yuan Can [Tue, 29 Nov 2022 01:39:34 +0000 (01:39 +0000)]
udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write()
As the nla_nest_start() may fail with NULL returned, the return value
should be checked.
Note that this is not a real bug, nothing will break here.
The next nla_put() will fail as well and we'll bail (and
nla_nest_cancel() can handle NULL). But we keep getting
those "fixes" so whatever.
Yoshihiro Shimoda [Mon, 28 Nov 2022 06:56:04 +0000 (15:56 +0900)]
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
After system resumed on some environment board, the promiscuous mode
is disabled because the SoC turned off. So, call ravb_set_rx_mode() in
the ravb_resume() to fix the issue.
Shannon Nelson [Tue, 29 Nov 2022 01:17:34 +0000 (17:17 -0800)]
ionic: update MAINTAINERS entry
Now that Pensando is a part of AMD we need to update
a couple of addresses. We're keeping the mailing list
address for the moment, but that will likely change in
the near future.
Willem de Bruijn [Mon, 28 Nov 2022 16:18:12 +0000 (11:18 -0500)]
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
CHECKSUM_COMPLETE signals that skb->csum stores the sum over the
entire packet. It does not imply that an embedded l4 checksum
field has been validated.
Chris Mi [Tue, 29 Nov 2022 09:30:06 +0000 (01:30 -0800)]
net/mlx5: Lag, Fix for loop when checking lag
The cited commit adds a for loop to check if each port supports lag
or not. But dev is not initialized correctly. Fix it by initializing
dev for each iteration.
Fixes: e87c6a832f88 ("net/mlx5: E-switch, Fix duplicate lag creation") Signed-off-by: Chris Mi <cmi@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221129093006.378840-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cited commit removed the validity checks which initialized the
window_sz and never removed the use of the now uninitialized variable,
so now we are left with wrong value in the window size and the following
clang warning: [-Wuninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c:232:45:
warning: variable 'window_sz' is uninitialized when used here
MLX5_SET(macsec_aso, aso_ctx, window_size, window_sz);
Revet at this time to address the clang issue due to lack of time to
test the proper solution.
Fixes: c0071be0e16c ("net/mlx5e: MACsec, remove replay window size limitation in offload path") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221129093006.378840-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yu Xiao [Fri, 25 Nov 2022 11:30:30 +0000 (12:30 +0100)]
nfp: ethtool: support reporting link modes
Add support for reporting link modes,
including `Supported link modes` and `Advertised link modes`,
via ethtool $DEV.
A new command `SPCODE_READ_MEDIA` is added to read info from
management firmware. Also, the mapping table `nfp_eth_media_table`
associates the link modes between NFP and kernel. Both of them
help to support this ability.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20221125113030.141642-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Fri, 25 Nov 2022 09:50:08 +0000 (10:50 +0100)]
net: lan966x: add tc matchall goto action
Extend matchall with action goto. This is needed to enable the lookup in
the VCAP. It is needed to connect chain 0 to a chain that is recognized
by the HW.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Fri, 25 Nov 2022 09:50:02 +0000 (10:50 +0100)]
net: microchip: vcap: Merge the vcap_ag_api_kunit.h into vcap_ag_api.h
Currently there are 2 files that contain the keyfields, keys,
actionfields and actions. First file is used by the kunit while the
second one is used by VCAP api.
The header file that is used by kunit is just a super set of the of the
header file used by VCAP api.
Therefore not to have duplicate information in different files which is
also harder to maintain, create a single file that is used both by API
and by kunit.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shigeru Yoshida [Thu, 24 Nov 2022 17:51:34 +0000 (02:51 +0900)]
net: tun: Fix use-after-free in tun_detach()
syzbot reported use-after-free in tun_detach() [1]. This causes call
trace like below:
==================================================================
BUG: KASAN: use-after-free in notifier_call_chain+0x1ee/0x200 kernel/notifier.c:75
Read of size 8 at addr ffff88807324e2a8 by task syz-executor.0/3673
The cause of the issue is that sock_put() from __tun_detach() drops
last reference count for struct net, and then notifier_call_chain()
from netdev_state_change() accesses that struct net.
This patch fixes the issue by calling sock_put() from tun_detach()
after all necessary accesses for the struct net has done.
Lorenzo Bianconi [Thu, 24 Nov 2022 15:22:53 +0000 (16:22 +0100)]
net: ethernet: mtk_wed: update mtk_wed_stop
Update mtk_wed_stop routine and rename old mtk_wed_stop() to
mtk_wed_deinit(). This is a preliminary patch to add Wireless Ethernet
Dispatcher reset support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Thu, 24 Nov 2022 15:22:52 +0000 (16:22 +0100)]
net: ethernet: mtk_wed: move MTK_WDMA_RESET_IDX_TX configuration in mtk_wdma_tx_reset
Remove duplicated code. Increase poll timeout to 10ms in order to be
aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Thu, 24 Nov 2022 15:22:51 +0000 (16:22 +0100)]
net: ethernet: mtk_wed: return status value in mtk_wdma_rx_reset
Move MTK_WDMA_RESET_IDX configuration in mtk_wdma_rx_reset routine.
Increase poll timeout to 10ms in order to be aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
Marvell nvmem mac addresses support
Now that we are aligned on how to make information available from static
storage media to drivers like Ethernet controller drivers or switch
drivers by using nvmem cells and going through the whole nvmem
infrastructure, here are two driver updates to reflect these changes.
Prior to the driver updates, I propose:
* Reverting binding changes which should have never been accepted like
that.
* A conversion of the (old) Prestera and DFX server bindings (optional,
can be dropped if not considered necessary).
* A better description of the more recent Prestera PCI switch.
Please mind that this series cannot break anything since retrieving the
MAC address Prestera driver has never worked upstream, because the (ONIE
tlv) driver supposed to export the MAC address has not been accepted in
its original form and has been updated to the nvmem-layout
infrastructure (bindings have been merged, the code remains to be
applied).
====================
Miquel Raynal [Thu, 24 Nov 2022 11:15:56 +0000 (12:15 +0100)]
net: mvpp2: Consider NVMEM cells as possible MAC address source
The ONIE standard describes the organization of tlv (type-length-value)
arrays commonly stored within NVMEM devices on common networking
hardware.
Several drivers already make use of NVMEM cells for purposes like
retrieving a default MAC address provided by the manufacturer.
What made ONIE tables unusable so far was the fact that the information
where "dynamically" located within the table depending on the
manufacturer wishes, while Linux NVMEM support only allowed statically
defined NVMEM cells. Fortunately, this limitation was eventually tackled
with the introduction of discoverable cells through the use of NVMEM
layouts, making it possible to extract and consistently use the content
of tables like ONIE's tlv arrays.
Parsing this table at runtime in order to get various information is now
possible. So, because many Marvell networking switches already follow
this standard, let's consider using NVMEM cells as a new valid source of
information when looking for a base MAC address, which is one of the
primary uses of these new fields. Indeed, manufacturers following the
ONIE standard are encouraged to provide a default MAC address there, so
let's eventually use it if no other MAC address has been found using the
existing methods.
This driver fist makes an expensive DT lookup to retrieve its DT node
(this is a PCI driver) in order to later search for the
base-mac-provider property. This property has no reality upstream and
this code should not have been accepted like this in the first
place. Instead, there is a proper nvmem interface that should be
used. Let's avoid these extra lookups and rely on the nvmem internal
logic.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Miquel Raynal [Thu, 24 Nov 2022 11:15:54 +0000 (12:15 +0100)]
of: net: export of_get_mac_address_nvmem()
Export
of_get_mac_addr_nvmem()
and rename it to
of_get_mac_address_nvmem()
in order to fit the convention followed by the existing exported helpers
of the same kind.
This way, OF compatible drivers using eg. fwnode_get_mac_address() can
do a direct call to it instead of calling of_get_mac_address() just for
the nvmem step, avoiding to repeat an expensive DT lookup which has
already been done once.
Eventually, fwnode_get_mac_address() should probably be updated to
perform the nvmem lookup directly, but as of today, nvmem cells seem not
to be supported by ACPI yet which would defeat this kind of extension.
Suggested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Miquel Raynal [Thu, 24 Nov 2022 11:15:53 +0000 (12:15 +0100)]
dt-bindings: net: marvell,prestera: Describe PCI devices of the prestera family
Even though the devices have very little in common beside the name and
the main "switch" feature, Marvell Prestera switch family is also
composed of PCI-only devices which can receive additional static
properties, like nvmem cells to point at MAC addresses, for
instance. Let's describe them.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Miquel Raynal [Thu, 24 Nov 2022 11:15:51 +0000 (12:15 +0100)]
dt-bindings: net: marvell,dfx-server: Convert to yaml
Even though this description is not used anywhere upstream (no matching
driver), while on this file I decided I would try a conversion to yaml
in order to clarify the prestera family description.
I cannot keep the nodename dfx-server@xxxx so I switched to dfx-bus@xxxx
which matches simple-bus.yaml. Otherwise I took the example context from
the only user of this compatible: armada-xp-98dx3236.dtsi, which is a
rather old and not perfect DT.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
marvell,prestera.txt is an old file describing the old Alleycat3
standalone switches. The commit mentioned above actually hacked these
bindings to add support for a device tree property for a more modern
version of the IP connected over PCI, using only the generic compatible
in order to retrieve the device node from the prestera driver to read
one static property.
The problematic property discussed here is "base-mac-provider". The
original intent was to point to a nvmem device which could produce the
relevant nvmem-cell. This property has never been acked by DT
maintainers and fails all the layering that has been brought with the nvmem
bindings by pointing at a nvmem producer, bypassing the existing nvmem
bindings, rather than a nvmem cell directly. Furthermore, the property
cannot even be used upstream because it expected the ONIE tlv driver to
produce a specific cell, driver which used nacked bindings and thus was
never merged, replaced by a more integrated concept: the nvmem-layout.
So let's forget about this temporary addition, safely avoiding the need
for any backward compatibility handling. A new (yaml) binding file will
be brought with the prestera bindings, and there we will actually
include a description of the modern IP over PCI, including the right way
to point to a nvmem cell.
Cc: Vadym Kochan <vadym.kochan@plvision.eu> Cc: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 29 Nov 2022 01:14:01 +0000 (17:14 -0800)]
Daniel Borkmann says:
====================
bpf-next 2022-11-25
We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).
The main changes are:
1) Support for user defined BPF objects: the use case is to allocate own
objects, build own object hierarchies and use the building blocks to
build own data structures flexibly, for example, linked lists in BPF,
from Kumar Kartikeya Dwivedi.
2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
from Yonghong Song.
3) Add support storing struct task_struct objects as kptrs in maps,
from David Vernet.
4) Batch of BPF map documentation improvements, from Maryam Tahhan
and Donald Hunter.
5) Improve BPF verifier to propagate nullness information for branches
of register to register comparisons, from Eduard Zingerman.
6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
from Hou Tao.
7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted
given it is not the case for them, from Alexei Starovoitov.
8) Improve BPF verifier's realloc handling to better play along with dynamic
runtime analysis tools like KASAN and friends, from Kees Cook.
9) Remove legacy libbpf mode support from bpftool,
from Sahid Orentino Ferdjaoui.
10) Rework zero-len skb redirection checks to avoid potentially breaking
existing BPF test infra users, from Stanislav Fomichev.
11) Two small refactorings which are independent and have been split out
of the XDP queueing RFC series, from Toke Høiland-Jørgensen.
12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.
13) Documentation on how to run BPF CI without patch submission,
from Daniel Müller.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
David Howells [Mon, 28 Nov 2022 22:02:56 +0000 (22:02 +0000)]
afs: Fix fileserver probe RTT handling
The fileserver probing code attempts to work out the best fileserver to
use for a volume by retrieving the RTT calculated by AF_RXRPC for the
probe call sent to each server and comparing them. Sometimes, however,
no RTT estimate is available and rxrpc_kernel_get_srtt() returns false,
leading good fileservers to be given an RTT of UINT_MAX and thus causing
the rotation algorithm to ignore them.
Fix afs_select_fileserver() to ignore rxrpc_kernel_get_srtt()'s return
value and just take the estimated RTT it provides - which will be capped
at 1 second.
Fixes: 1d4adfaf6574 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166965503999.3392585.13954054113218099395.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the 'fwnode' is not an acpi node, the refcount is get in
fwnode_mdiobus_phy_device_register(), but it has never been
put when the device is freed in the normal path. So call
fwnode_handle_put() in phy_device_release() to avoid leak.
If it's an acpi node, it has never been get, but it's put
in the error path, so call fwnode_handle_get() before
phy_device_register() to keep get/put operation balanced.
Fixes: bc1bee3b87ee ("net: mdiobus: Introduce fwnode_mdiobus_register_phy()") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221124150130.609420-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>