Beniamino Galvani [Mon, 16 Oct 2023 07:15:26 +0000 (09:15 +0200)]
vxlan: use generic function for tunnel IPv4 route lookup
The route lookup can be done now via generic function
udp_tunnel_dst_lookup() to replace the custom implementations in
vxlan_get_route().
Note that this patch only touches IPv4, while IPv6 still uses
vxlan6_get_route(). After IPv6 route lookup gets converted as well,
vxlan_xmit_one() can be simplified by removing local variables that
will be passed via "struct ip_tunnel_key", such as remote_ip,
local_ip, flow_flags, label.
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:25 +0000 (09:15 +0200)]
geneve: use generic function for tunnel IPv4 route lookup
The route lookup can be done now via generic function
udp_tunnel_dst_lookup() to replace the custom implementation in
geneve_get_v4_rt().
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:24 +0000 (09:15 +0200)]
geneve: add dsfield helper function
Add a helper function to compute the tos/dsfield. In this way, we can
factor out some duplicate code. Also, the helper will be called from
more places in the next commit.
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:23 +0000 (09:15 +0200)]
ipv4: use tunnel flow flags for tunnel route lookups
Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to
ip_tunnel_key") added a new field to struct ip_tunnel_key to control
route lookups. Currently the flag is used by vxlan and geneve tunnels;
use it also in udp_tunnel_dst_lookup() so that it affects all tunnel
types relying on this function.
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:22 +0000 (09:15 +0200)]
ipv4: add new arguments to udp_tunnel_dst_lookup()
We want to make the function more generic so that it can be used by
other UDP tunnel implementations such as geneve and vxlan. To do that,
add the following arguments:
- source and destination UDP port;
- ifindex of the output interface, needed by vxlan;
- the tos, because in some cases it is not taken from struct
ip_tunnel_info (for example, when it's inherited from the inner
packet);
- the dst cache, because not all tunnel types (e.g. vxlan) want to
use the one from struct ip_tunnel_info.
With these parameters, the function no longer needs the full struct
ip_tunnel_info as argument and we can pass only the relevant part of
it (struct ip_tunnel_key).
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:21 +0000 (09:15 +0200)]
ipv4: remove "proto" argument from udp_tunnel_dst_lookup()
The function is now UDP-specific, so the protocol is always IPPROTO_UDP.
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Beniamino Galvani [Mon, 16 Oct 2023 07:15:20 +0000 (09:15 +0200)]
ipv4: rename and move ip_route_output_tunnel()
At the moment ip_route_output_tunnel() is used only by bareudp.
Ideally, other UDP tunnel implementations should use it, but to do so
the function needs to accept new parameters that are specific for UDP
tunnels, such as the ports.
Prepare for these changes by renaming the function to
udp_tunnel_dst_lookup() and move it to file
net/ipv4/udp_tunnel_core.c.
Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Thu, 12 Oct 2023 09:14:29 +0000 (11:14 +0200)]
net: cxgb3: simplify logic for rspq_check_napi
Simplify logic for rspq_check_napi.
Drop redundant and wrong napi_is_scheduled call as it's not race free
and directly use the output of napi_schedule to understand if a napi is
pending or not.
The main logic of rspq_check_napi is to check whether is_new_response is
true and whether no napi is scheduled. The result of this function is then
used to detect whether we are missing some interrupt and to act on it.
Knowing this, we can rework and simplify the logic and avoid testing an
internal napi bit.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Oct 2023 19:07:53 +0000 (20:07 +0100)]
Merge branch 'ptp-multiple-readers'
Xabier Marquiegui says:
====================
ptp: Support for multiple filtered timestamp event queue readers
On systems with multiple timestamp event channels, there can be scenarios
where multiple userspace readers want to access the timestamping data for
various purposes.
One such example is wanting to use a pps out for time synchronization, and
wanting to timestamp external events with the synchronized time base
simultaneously.
Timestamp event consumers, on the other hand, are often interested in a
subset of the available timestamp channels. linuxptp ts2phc, for example,
is not happy if more than one timestamping channel is active on the device
it is reading from.
Linked lists are introduced to support multiple timestamp event queue
consumers, and timestamp event channel filters through IOCTLs, as well as
a debugfs interface to do some simple verifications.
Xabier Marquiegui (6):
posix-clock: introduce posix_clock_context concept
ptp: Replace timestamp event queue with linked list
ptp: support multiple timestamp event readers
ptp: support event queue reader channel masks
ptp: add debugfs interface to see applied channel masks
ptp: add testptp mask test
---
v6:
- correct commit message
- correct coding style
v5: https://lore.kernel.org/netdev/cover.1696804243.git.reibax@gmail.com/
- fix spelling on commit message
- fix memory leak on ptp_open
v4: https://lore.kernel.org/netdev/cover.1696511486.git.reibax@gmail.com/
- split modifications in different patches for improved organization
- rename posix_clock_user to posix_clock_context
- remove unnecessary flush_users clock operation
- remove unnecessary tests
- simpler queue clean procedure
- fix/clean comment lines
- simplified release procedures
- filter modifications exclusive to currently open instance for
simplicity and security
- expand mask to 2048 channels
- make more secure and simple: mask is only applied to the testptp
instance. Use debugfs to verify effects.
v3: https://lore.kernel.org/netdev/20230928133544.3642650-1-reibax@gmail.com/
- add this patchset overview file
- fix use of safe and non safe linked lists for loops
- introduce new posix_clock private_data and ida object ids for better
discrimination of timestamp consumers
- safer resource release procedures
- filter application by object id, aided by process id
- friendlier testptp implementation of event queue channel filters
v2: https://lore.kernel.org/netdev/20230912220217.2008895-1-reibax@gmail.com/
- fix ptp_poll() return value
- Style changes to conform to checkpatch strict suggestions
- more coherent ptp_read error exit routines
- fix testptp compilation error: unknown type name 'pid_t'
- rename mask variable for easier code traceability
- more detailed commit message with two examples
v1: https://lore.kernel.org/netdev/20230906104754.1324412-2-reibax@gmail.com/
====================
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:57 +0000 (00:39 +0200)]
ptp: add debugfs interface to see applied channel masks
Use debugfs to be able to view channel mask applied to every timestamp
event queue.
Every time the device is opened, a new entry is created in
`$DEBUGFS_MOUNTPOINT/ptpN/$INSTANCE_ADDRESS/mask`.
The mask value can be viewed as a group of 32-bit decimal values using cat,
or converted to hexadecimal with the included `ptpchmaskfmt.sh` script.
The 32-bit values are listed from least significant to most significant.
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
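The word ordering described above can be illustrated with a minimal userspace sketch. It is not the `ptpchmaskfmt.sh` script; the helper name `format_mask_hex` is hypothetical, and the words are assumed to be already parsed into an array, since the exact text layout of the debugfs file is not given here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render mask words (least significant word first,
 * the order in which the debugfs file lists them) as one hex string with
 * the most significant word first. Returns the number of characters
 * written, or -1 if the output buffer is too small.
 */
static int format_mask_hex(const uint32_t *words, size_t nwords,
                           char *out, size_t outlen)
{
    size_t pos = 0;

    assert(words != NULL && out != NULL);

    /* Eight hex digits per word plus the terminating NUL. */
    if (outlen < nwords * 8 + 1)
        return -1;

    /* Walk the array backwards so the most significant word comes first. */
    for (size_t i = nwords; i-- > 0; ) {
        snprintf(out + pos, outlen - pos, "%08" PRIx32, words[i]);
        pos += 8;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

With the words `{ 5, 1 }` the helper produces `0000000100000005`, i.e. the second (more significant) word is printed first.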
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:56 +0000 (00:39 +0200)]
ptp: support event queue reader channel masks
On systems with multiple timestamp event channels, some readers might
want to receive only a subset of those channels.
Add the necessary modifications to support timestamp event channel
filtering, including two IOCTL operations:
- Clear all channels
- Enable one channel
The mask modification operations will be applied exclusively on the
event queue assigned to the file descriptor used on the IOCTL operation,
so the typical procedure to have a reader receiving only a subset of the
enabled channels would be:
- Open device file
- ioctl: clear all channels
- ioctl: enable one channel
- start reading
Calling the enable one channel ioctl more than once will result in
multiple enabled channels.
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
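The open / clear all / enable one / read procedure above can be modelled in userspace as a per-reader bitmap. This is only a sketch of the idea, not the kernel implementation: `struct reader` and the `reader_*` helpers are hypothetical names, the 2048-channel width is taken from the series changelog, and the "all channels enabled on open" default is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define N_CHANNELS    2048 /* mask width mentioned in the series changelog */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((N_CHANNELS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical per-open-file reader state: one mask per file descriptor. */
struct reader {
    unsigned long mask[MASK_WORDS];
};

/* Assumption: a freshly opened reader receives every channel. */
static void reader_open(struct reader *r)
{
    assert(r != NULL);
    memset(r->mask, 0xff, sizeof(r->mask));
}

/* Models the "clear all channels" operation. */
static void reader_clear_all(struct reader *r)
{
    memset(r->mask, 0, sizeof(r->mask));
}

/* Models "enable one channel"; repeated calls accumulate channels. */
static int reader_enable(struct reader *r, unsigned int ch)
{
    if (ch >= N_CHANNELS)
        return -1;
    r->mask[ch / BITS_PER_WORD] |= 1UL << (ch % BITS_PER_WORD);
    return 0;
}

/* Would an event on channel ch be queued for this reader? */
static bool reader_wants(const struct reader *r, unsigned int ch)
{
    if (ch >= N_CHANNELS)
        return false;
    return (r->mask[ch / BITS_PER_WORD] >> (ch % BITS_PER_WORD)) & 1UL;
}
```

Because the mask lives in the per-reader structure, changing it through one file descriptor does not affect any other reader of the same device.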
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:55 +0000 (00:39 +0200)]
ptp: support multiple timestamp event readers
Use linked lists to create one event queue per open file. This enables
simultaneous readers for timestamp event queues.
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:54 +0000 (00:39 +0200)]
ptp: Replace timestamp event queue with linked list
Introduce linked lists to access the timestamp event queue.
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add the necessary structure to support custom private-data per
posix-clock user.
The previous implementation of posix-clock assumed all file open
instances need access to the same clock structure on private_data.
The need for individual data structures per file open instance has been
identified when developing support for multiple timestamp event queue
users for ptp_clock.
Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Oct 2023 15:08:25 +0000 (16:08 +0100)]
Merge branch 'dpll-phase-offset-phase-adjust'
Arkadiusz Kubalewski says:
====================
dpll: add phase-offset and phase-adjust
Improve monitoring and control over dpll devices.
Allow user to receive measurement of phase difference between signals
on pin and dpll (phase-offset).
Allow user to receive and control adjustable value of pin's signal
phase (phase-adjust).
v4->v5:
- rebase series on top of net-next/main, fix conflict
- remove redundant attribute type definition in subset definition
v3->v4:
- do not increase do version of uAPI header as it is not needed (v3 did
not have this change)
- fix spelling around commit messages, argument descriptions and docs
- add missing extack errors on failure set callbacks for pin phase
adjust and frequency
- remove ice check if value is already set, now redundant as checked in
the dpll subsystem
v2->v3:
- do not increase do version of uAPI header as it is not needed
v1->v2:
- improve handling for error case of requesting the phase adjust set
- align handling for error case of frequency set request with the
approach introduced for phase adjust
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadiusz Kubalewski [Wed, 11 Oct 2023 10:12:36 +0000 (12:12 +0200)]
dpll: netlink/core: change pin frequency set behavior
Align the approach of pin frequency set behavior with the approach
introduced with pin phase adjust set.
Fail the request if any of the devices did not register the callback ops.
If callback op on any pin's registered device fails, return error and
rollback the value to previous one.
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
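The fail-early and rollback behavior described above can be sketched in plain C. This is a simplified model under stated assumptions, not the dpll core code: `struct dev`, `pin_set_freq()` and the two example callbacks are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one device a pin is registered with. */
struct dev {
    int (*set_freq)(struct dev *d, unsigned long freq); /* may be NULL */
    unsigned long freq;
};

/* Example callbacks used to exercise the sketch. */
static int set_ok(struct dev *d, unsigned long freq)
{
    d->freq = freq;
    return 0;
}

static int set_fail(struct dev *d, unsigned long freq)
{
    (void)d;
    (void)freq;
    return -EIO;
}

static int pin_set_freq(struct dev *devs, size_t n,
                        unsigned long new_freq, unsigned long old_freq)
{
    size_t i;
    int ret;

    assert(devs != NULL || n == 0);

    /* Reject the request up front if any device lacks the callback. */
    for (i = 0; i < n; i++)
        if (!devs[i].set_freq)
            return -EOPNOTSUPP;

    for (i = 0; i < n; i++) {
        ret = devs[i].set_freq(&devs[i], new_freq);
        if (ret)
            goto rollback;
    }
    return 0;

rollback:
    /* Restore the previous value on the devices already updated. */
    while (i-- > 0)
        devs[i].set_freq(&devs[i], old_freq);
    return ret;
}
```

Checking for missing callbacks before touching any device keeps the common failure case free of partial updates; the rollback loop only runs when a callback itself reports an error.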
Arkadiusz Kubalewski [Wed, 11 Oct 2023 10:12:34 +0000 (12:12 +0200)]
dpll: netlink/core: add support for pin-dpll signal phase offset/adjust
Add callback ops for pin-dpll phase measurement.
Add callback for pin signal phase adjustment.
Add min and max phase adjustment values to pin properties.
Invoke callbacks in dpll_netlink.c when filling the pin details to
provide user with phase related attribute values.
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Oct 2023 13:33:42 +0000 (14:33 +0100)]
Merge branch 'i40e-devlink'
Ivan Vecera says:
====================
i40e: Add basic devlink support
The series adds initial support for devlink to i40e driver.
Patch-set overview:
Patch 1: Adds initial devlink support (devlink and port registration)
Patch 2: Refactors and splits i40e_nvm_version_str()
Patch 3: Adds support for 'devlink dev info'
Patch 4: Refactors existing helper function to read PBA ID
Patch 5: Adds 'board.id' to 'devlink dev info' using PBA ID
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Oct 2023 17:07:54 +0000 (19:07 +0200)]
i40e: Refactor and rename i40e_read_pba_string()
Function i40e_read_pba_string() is currently unused but will be used
by subsequent patch to provide board ID via devlink device info.
The function reads PBA block from NVM so it cannot be called during
adapter reset and as we would like to provide PBA ID via devlink
info it is better to read the PBA ID during i40e_probe() and cache
it in i40e_hw structure to avoid waiting for a potential adapter
reset in devlink info callback.
So...
- Remove pba_num and pba_num_size arguments from the function,
allocate resource managed buffer to store PBA ID string and
save resulting pointer to i40e_hw->pba_id field
- Make the function void as the PBA ID can be missing and in this
case (or in case of NVM reading failure) the i40e_hw->pba_id
will be NULL
- Rename the function to i40e_get_pba_string() to align with other
functions like i40e_get_oem_version(), i40e_get_port_mac_addr()...
- Call this function on init during i40e_probe()
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Oct 2023 17:07:53 +0000 (19:07 +0200)]
i40e: Add handler for devlink .info_get
Provide devlink .info_get callback to allow the driver to report
detailed version information. The following info is reported:
"serial_number" -> The PCI DSN of the adapter
"fw.mgmt" -> The version of the firmware
"fw.mgmt.api" -> The API version of interface exposed over the AdminQ
"fw.psid" -> The version of the NVM image
"fw.bundle_id" -> Unique identifier for the combined flash image
"fw.undi" -> The combo image version
With this, 'devlink dev info' provides at least the same amount
information as is reported by ETHTOOL_GDRVINFO:
Ivan Vecera [Fri, 13 Oct 2023 17:07:52 +0000 (19:07 +0200)]
i40e: Split and refactor i40e_nvm_version_str()
The function formats NVM version string according to adapter's
EETrackID value. If this value is OEM specific (0xffffffff) then
the reported version has the format:
"<gen>.<snap>.<release>"
and in other case
"<nvm_maj>.<nvm_min> <eetrackid> <cvid_maj>.<cvid_bld>.<cvid_min>"
These versions are reported in the subsequent patch in this series
that implements devlink .info_get but separately.
So split the function into separate ones, refactor it to use them
and remove ugly static string buffer.
Additionally convert NVM/OEM version mask macros to use GENMASK and
use FIELD_GET/FIELD_PREP for them in i40e_nvm_version_str() and
i40e_get_oem_version(). This makes code more readable and allows
us to remove related shift macros.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Oct 2023 17:07:51 +0000 (19:07 +0200)]
i40e: Add initial devlink support
Add an initial support for devlink interface to i40e driver.
Similarly to ice driver the implementation does not enable devlink
to manage device-wide configuration and devlink instance is created
for each physical function of PCIe device.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Fri, 13 Oct 2023 13:59:19 +0000 (06:59 -0700)]
tg3: Improve PTP TX timestamping logic
When we are trying to timestamp a TX packet, there may be
occasions when the TX timestamp register is still not
updated with the latest timestamp even if the timestamp
packet descriptor is marked as complete.
This usually happens in cases where the system is under
stress or flow control is affecting the transmit side.
We will solve this problem by saving the snapshot of the
timestamp register when we are posting the TX descriptor.
At this time, the register contains previously timestamped
packet's value and valid timestamp of the current packet must
be different than this.
Upon completion of the current descriptor, we will check if
the timestamp register is updated or not before timestamping
the skb. If not updated, we will schedule the ptp worker to
fetch the updated time later and timestamp the skb.
Also now we restrict number of outstanding PTP TX packet
requests to 1.
Reported-by: Simon White <Simon.White@viavisolutions.com> Link: https://lore.kernel.org/netdev/CACKFLikGdN9XPtWk-fdrzxdcD=+bv-GHBvfVfSpJzHY7hrW39g@mail.gmail.com/ Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
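The snapshot-and-compare approach described above can be modelled as a small state machine. This is a sketch of the idea only, not the tg3 code: `struct tx_ts_state`, `ts_post()` and `ts_complete()` are hypothetical names, and the register value is passed in by the caller instead of being read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the driver-side TX timestamp state. */
struct tx_ts_state {
    uint64_t pre_snapshot; /* register value saved when posting the descriptor */
    bool pending;          /* at most one outstanding PTP TX request */
};

/* Called when a PTP TX descriptor is posted. Returns false if busy. */
static bool ts_post(struct tx_ts_state *s, uint64_t reg_now)
{
    assert(s != NULL);
    if (s->pending)
        return false;
    s->pre_snapshot = reg_now; /* still holds the previous packet's value */
    s->pending = true;
    return true;
}

/*
 * Called on descriptor completion. Returns true and stores the timestamp
 * if the register has moved past the snapshot; returns false if the
 * caller has to retry later (for example from a worker).
 */
static bool ts_complete(struct tx_ts_state *s, uint64_t reg_now, uint64_t *ts)
{
    if (!s->pending || reg_now == s->pre_snapshot)
        return false;
    *ts = reg_now;
    s->pending = false;
    return true;
}
```

Limiting the model to one pending request is what makes a single snapshot sufficient: there is never any doubt about which packet a changed register value belongs to.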
Jakub Kicinski [Wed, 11 Oct 2023 02:42:24 +0000 (19:42 -0700)]
docs: try to encourage (netdev?) reviewers
Add a section to netdev maintainer doc encouraging reviewers
to chime in on the mailing list.
The questions about "when is it okay to share feedback"
keep coming up (most recently at netconf) and the answer
is "pretty much always".
Extend the section of 7.AdvancedTopics.rst which deals
with reviews a little bit to add stuff we had been recommending
locally.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Oct 2023 13:25:03 +0000 (14:25 +0100)]
Merge branch 'sfc-conntrack-offload'
Edward Cree says:
====================
sfc: support conntrack NAT offload
The EF100 MAE supports performing NAT (and NPT) on packets which match in
the conntrack table. This series adds that capability to the driver.
====================
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 10 Oct 2023 21:52:00 +0000 (22:52 +0100)]
sfc: support offloading ct(nat) action in RHS rules
If an IP address and/or L4 port for NAPT is available from a CT match,
the MAE will perform the edits; if no CT lookup has been performed for
this packet, the CT lookup did not return a match, or the matched CT
entry did not include NAPT, the action will have no effect.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 10 Oct 2023 21:51:59 +0000 (22:51 +0100)]
sfc: parse mangle actions (NAT) in conntrack entries
The MAE can edit either address, L4 port, or both, for either source
or destination. These can't be mixed; i.e. it can edit source addr
and source port, but not (say) source addr and dest port.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
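The "source or destination, but not mixed" restriction above can be expressed as a small check. This is an illustrative sketch, not the sfc driver logic: the flag names and `nat_edit_supported()` are hypothetical, and treating an empty edit set as unsupported is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags describing which fields a NAT entry wants to edit. */
enum {
    NAT_SRC_ADDR = 1 << 0,
    NAT_SRC_PORT = 1 << 1,
    NAT_DST_ADDR = 1 << 2,
    NAT_DST_PORT = 1 << 3,
};

/* True if the edits touch exactly one side (source or destination). */
static bool nat_edit_supported(unsigned int edits)
{
    bool src = (edits & (NAT_SRC_ADDR | NAT_SRC_PORT)) != 0;
    bool dst = (edits & (NAT_DST_ADDR | NAT_DST_PORT)) != 0;

    /* Mixed (both sides) and empty (neither side) are both rejected. */
    return src != dst;
}
```

For example, `NAT_SRC_ADDR | NAT_SRC_PORT` is accepted, while `NAT_SRC_ADDR | NAT_DST_PORT` is rejected.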
this patchset is first of three parts of another big patchset for
MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
During review of this series, Stefano Garzarella <sgarzare@redhat.com>
suggested splitting it into three parts to simplify review and merging:
1) virtio and vhost updates (for fragged skbs) <--- this patchset
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
tx completions) and update for Documentation/.
3) Updates for tests and utils.
This series enables handling of fragged skbs in virtio and vhost parts.
The new logic won't be triggered, because the SO_ZEROCOPY option is still
impossible to enable at this moment (next bunch of patches from big
set above will enable it).
I've included changelog to some patches anyway, because there were some
comments during review of last big patchset from the link above.
Link to v1:
https://lore.kernel.org/netdev/20230717210051.856388-1-AVKrasnov@sberdevices.ru/
Link to v2:
https://lore.kernel.org/netdev/20230718180237.3248179-1-AVKrasnov@sberdevices.ru/
Link to v3:
https://lore.kernel.org/netdev/20230720214245.457298-1-AVKrasnov@sberdevices.ru/
Link to v4:
https://lore.kernel.org/netdev/20230727222627.1895355-1-AVKrasnov@sberdevices.ru/
Link to v5:
https://lore.kernel.org/netdev/20230730085905.3420811-1-AVKrasnov@sberdevices.ru/
Link to v6:
https://lore.kernel.org/netdev/20230814212720.3679058-1-AVKrasnov@sberdevices.ru/
Link to v7:
https://lore.kernel.org/netdev/20230827085436.941183-1-avkrasnov@salutedevices.com/
Link to v8:
https://lore.kernel.org/netdev/20230911202234.1932024-1-avkrasnov@salutedevices.com/
Changelog:
v3 -> v4:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
v4 -> v5:
* See per-patch changelog after ---.
v5 -> v6:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v6 -> v7:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v7 -> v8:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v8 -> v9:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:24 +0000 (22:15 +0300)]
test/vsock: io_uring rx/tx tests
This adds a set of tests which use io_uring for rx/tx. This test suite is
implemented as a separate utility like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:23 +0000 (22:15 +0300)]
test/vsock: MSG_ZEROCOPY support for vsock_perf
To use this option pass '--zerocopy' parameter:
./vsock_perf --zerocopy --sender <cid> ...
With this option MSG_ZEROCOPY flag will be passed to the 'send()' call.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:22 +0000 (22:15 +0300)]
test/vsock: MSG_ZEROCOPY flag tests
This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.
Patch also works as preparation for the next patches for tools in this
patchset: vsock_perf and vsock_uring_test:
1) Adds several new functions to util.c - they will be also used by
vsock_uring_test.
2) Adds two new functions for MSG_ZEROCOPY handling to a new source
file - such source will be shared between vsock_test, vsock_perf and
vsock_uring_test, thus avoiding code copy-pasting.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:21 +0000 (22:15 +0300)]
docs: net: description of MSG_ZEROCOPY for AF_VSOCK
This adds description of MSG_ZEROCOPY flag support for AF_VSOCK type of
socket.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:20 +0000 (22:15 +0300)]
vsock: enable setting SO_ZEROCOPY
For AF_VSOCK, zerocopy tx mode depends on transport, so this option must
be set in AF_VSOCK implementation where transport is accessible (if the
transport is not yet set when SO_ZEROCOPY is set, for example because the
socket is not connected, SO_ZEROCOPY will be enabled, but once a transport
is assigned, support for this type of transmission will be checked).
To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
bit, thus handling SOL_SOCKET option operations, but all of them except
SO_ZEROCOPY will be forwarded to the generic handler by calling
'sock_setsockopt()'.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:19 +0000 (22:15 +0300)]
vsock/loopback: support MSG_ZEROCOPY for transport
Add 'msgzerocopy_allow()' callback for loopback transport.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:18 +0000 (22:15 +0300)]
vsock/virtio: support MSG_ZEROCOPY for transport
Add 'msgzerocopy_allow()' callback for virtio transport.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:17 +0000 (22:15 +0300)]
vhost/vsock: support MSG_ZEROCOPY for transport
Add 'msgzerocopy_allow()' callback for vhost transport.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:16 +0000 (22:15 +0300)]
vsock: enable SOCK_SUPPORT_ZC bit
This bit is used by io_uring in case of zerocopy tx mode. io_uring code
checks that the socket has this feature. This patch sets it in two places:
1) For socket in 'connect()' call.
2) For new socket which is returned by 'accept()' call.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:15 +0000 (22:15 +0300)]
vsock: check for MSG_ZEROCOPY support on send
This feature totally depends on transport, so if transport doesn't
support it, return error.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:14 +0000 (22:15 +0300)]
vsock: read from socket's error queue
This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
is used to read socket's error queue instead of data queue. Possible
scenario of error queue usage is receiving completions for transmission
with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
and 'VSOCK_RECVERR'.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arseniy Krasnov [Tue, 10 Oct 2023 19:15:13 +0000 (22:15 +0300)]
vsock: set EPOLLERR on non-empty error queue
If socket's error queue is not empty, EPOLLERR must be set. Otherwise,
reader of error queue won't detect data in it using EPOLLERR bit.
Currently for AF_VSOCK this is relevant only with MSG_ZEROCOPY, as this
feature is the only user of an error queue of the socket.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Bulwahn [Thu, 12 Oct 2023 06:34:43 +0000 (08:34 +0200)]
appletalk: remove special handling code for ipddp
After commit 1dab47139e61 ("appletalk: remove ipddp driver") removes the
config IPDDP, there is some minor code clean-up possible in the appletalk
network layer.
Remove some code in appletalk layer after the ipddp driver is gone.
Justin Stitt [Thu, 12 Oct 2023 18:35:41 +0000 (18:35 +0000)]
qed: replace uses of strncpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
This patch eliminates three uses of strncpy():
Firstly, `dest` is expected to be NUL-terminated which is evident by the
manual setting of a NUL-byte at size - 1. For this use specifically,
strscpy() is a viable replacement due to the fact that it guarantees
NUL-termination on the destination buffer.
The next two cases should simply be memcpy() as the size of the src
string is always 3 and the destination string just wants the first 3
bytes changed.
To be clear, there are no buffer overread bugs in the current code as
the sizes and offsets are carefully managed such that buffers are
NUL-terminated. However, with these changes, the code is now more robust
and less ambiguous (and hopefully easier to read).
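The two replacement patterns described above can be approximated in userspace. The sketch below is not the qed code and not the kernel's strscpy(): `bounded_copy()` is a hypothetical helper that mimics the behavior relied on here (always NUL-terminate, report truncation), and `patch_prefix()` shows the fixed-size memcpy() case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace approximation of strscpy() semantics: copy at
 * most size - 1 bytes, always NUL-terminate, and return the copied length
 * or -E2BIG if the source did not fit.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
    const char *end;
    size_t len;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -E2BIG;

    /* Look for the terminator within the first size bytes of src. */
    end = memchr(src, '\0', size);
    if (!end) {
        memcpy(dst, src, size - 1);
        dst[size - 1] = '\0'; /* truncated, but still terminated */
        return -E2BIG;
    }

    len = (size_t)(end - src);
    memcpy(dst, src, len + 1); /* includes the NUL */
    return (long)len;
}

/* Overwrite exactly three bytes in place, leaving the rest untouched. */
static void patch_prefix(char *dst, const char *tag3)
{
    memcpy(dst, tag3, 3);
}
```

The memcpy() variant makes the intent explicit: exactly three bytes change and no terminator is written, so there is nothing for a reader to second-guess about padding or truncation.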
Heiner Kallweit [Thu, 12 Oct 2023 06:51:13 +0000 (08:51 +0200)]
r8169: fix rare issue with broken rx after link-down on RTL8125
In very rare cases (I've seen two reports so far about different
RTL8125 chip versions) it seems the MAC locks up when link goes down
and requires a software reset to get revived.
Realtek doesn't publish hw errata information, therefore the root cause
is unknown. Realtek vendor drivers do a full hw re-initialization on
each link-up event, the slimmed-down variant here was reported to fix
the issue for the reporting user.
It's not fully clear which parts of the NIC are reset as part of the
software reset, therefore I can't rule out side effects.
====================
net: netconsole: configfs entries for boot target
There is a limitation in netconsole, where it is impossible to
disable or modify the target created from the command line parameter.
(netconsole=...).
"netconsole" cmdline parameter sets the remote IP, and if the remote IP
changes, the machine needs to be rebooted (with the new remote IP set in
the command line parameter).
This allows the user to modify a target without the need to restart the
machine.
This functionality sits on top of the dynamic target reconfiguration that is
already implemented in netconsole.
A boot-time target is modified by creating specially named configfs
directories, which are then associated with the targets coming from
`netconsole=...`.
Example:
Let's suppose you have two netconsole targets defined at boot time::
Breno Leitao [Thu, 12 Oct 2023 11:14:00 +0000 (04:14 -0700)]
netconsole: Attach cmdline target to dynamic target
Enable the attachment of a dynamic target to the target created during
boot time. The boot-time targets are named as "cmdline\d", where "\d" is
a number starting at 0.
If the user creates a dynamic target named "cmdline0", it will attach to
the first target created at boot time (as defined in the
`netconsole=...` command line argument). `cmdline1` will attach to the
second target and so forth.
If no netconsole target was created at boot time, the target
name can be reused.
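The naming rule above can be sketched as a small userspace helper. This is an illustration of the "cmdline\d" convention only, not the driver's lookup code; cmdline_target_index() and CMDLINE_PREFIX are names made up for this example, and overflow of very long digit strings is deliberately not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define CMDLINE_PREFIX "cmdline"	/* illustrative constant */

/* Return the boot-target index encoded in name ("cmdline0" -> 0), or -1 if
 * name is not "cmdline" followed by one or more decimal digits.
 */
static int cmdline_target_index(const char *name)
{
	size_t plen = strlen(CMDLINE_PREFIX);
	const char *p;
	int idx = 0;

	if (strncmp(name, CMDLINE_PREFIX, plen) != 0)
		return -1;

	p = name + plen;
	if (*p == '\0')
		return -1;	/* prefix alone carries no index */

	for (; *p; p++) {
		if (!isdigit((unsigned char)*p))
			return -1;
		idx = idx * 10 + (*p - '0');
	}
	return idx;
}
```

Under this sketch "cmdline0" maps to the first boot-time target and "cmdline1" to the second, while any other directory name is treated as an ordinary dynamic target.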
Breno Leitao [Thu, 12 Oct 2023 11:13:59 +0000 (04:13 -0700)]
netconsole: Initialize configfs_item for default targets
For netconsole targets allocated during the boot time (passing
netconsole=... argument), netconsole_target->item is not initialized.
That is not a problem because it is not used inside configfs.
An upcoming patch will be using it, thus, initialize the targets with
the name 'cmdline' plus a counter starting from 0. This name will match
entries in the configfs later.
Breno Leitao [Thu, 12 Oct 2023 11:13:58 +0000 (04:13 -0700)]
netconsole: move init/cleanup functions lower
Move alloc_param_target() and its counterpart (free_param_target())
to the bottom of the file. These functions are called mostly at
initialization/cleanup of the module, and they should be just above the
callers, at the bottom of the file.
From a practical perspective, having alloc_param_target() at the bottom
of the file will avoid forward declaration later (in the following
patch).
Nothing changed other than the functions' location.
Justin Stitt [Thu, 12 Oct 2023 20:38:19 +0000 (20:38 +0000)]
sfc: replace deprecated strncpy with strscpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
`desc` is expected to be NUL-terminated, as is evident from the manual
NUL-byte assignment. Moreover, NUL-padding does not seem to be
necessary.
The only caller of efx_mcdi_nvram_metadata() is
efx_devlink_info_nvram_partition() which provides a NULL for `desc`:
| rc = efx_mcdi_nvram_metadata(efx, partition_type, NULL, version, NULL, 0);
Due to this, I am not sure this code is even reached but we should still
favor something other than strncpy.
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Justin Stitt [Thu, 12 Oct 2023 22:25:12 +0000 (22:25 +0000)]
net: phy: tja11xx: replace deprecated strncpy with ethtool_sprintf
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
ethtool_sprintf() is designed specifically for get_strings() usage.
Let's replace strncpy in favor of this dedicated helper function.
Justin Stitt [Wed, 11 Oct 2023 21:53:44 +0000 (21:53 +0000)]
ionic: replace deprecated strncpy with strscpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
NUL-padding is not needed due to `ident` being memset'd to 0 just before
the copy.
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Justin Stitt [Wed, 11 Oct 2023 21:37:18 +0000 (21:37 +0000)]
net: sparx5: replace deprecated strncpy with ethtool_sprintf
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
ethtool_sprintf() is designed specifically for get_strings() usage.
Let's replace strncpy() in favor of this more robust and easier to
understand interface.
Justin Stitt [Wed, 11 Oct 2023 21:04:37 +0000 (21:04 +0000)]
net/mlx4_core: replace deprecated strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect `dst` to be NUL-terminated based on its use with format
strings:
| mlx4_dbg(dev, "Reporting Driver Version to FW: %s\n", dst);
Moreover, NUL-padding is not required.
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Justin Stitt [Wed, 11 Oct 2023 21:48:39 +0000 (21:48 +0000)]
nfp: replace deprecated strncpy with strscpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect res->name to be NUL-terminated based on its usage with format
strings:
| dev_err(cpp->dev.parent, "Dangling area: %d:%d:%d:0x%0llx-0x%0llx%s%s\n",
| NFP_CPP_ID_TARGET_of(res->cpp_id),
| NFP_CPP_ID_ACTION_of(res->cpp_id),
| NFP_CPP_ID_TOKEN_of(res->cpp_id),
| res->start, res->end,
| res->name ? " " : "",
| res->name ? res->name : "");
... and with strcmp()
| if (!strcmp(res->name, NFP_RESOURCE_TBL_NAME)) {
Moreover, NUL-padding is not required as `res` is already
zero-allocated:
| res = kzalloc(sizeof(*res), GFP_KERNEL);
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Let's also opt to use the more idiomatic strscpy() usage of (dest, src,
sizeof(dest)) rather than (dest, src, SOME_LEN).
Typically the pattern of 1) allocate memory for string, 2) copy string
into freshly-allocated memory is a candidate for kmemdup_nul() but in
this case we are allocating the entirety of the `res` struct and that
should stay as is. As mentioned above, simple 1:1 replacement of strncpy
-> strscpy :)
Ido Schimmel [Wed, 11 Oct 2023 14:39:12 +0000 (16:39 +0200)]
mlxsw: pci: Allocate skbs using GFP_KERNEL during initialization
The driver allocates skbs during initialization and during Rx
processing. Take advantage of the fact that the former happens in
process context and allocate the skbs using GFP_KERNEL to decrease the
probability of allocation failure.
Subbaraya Sundeep [Wed, 11 Oct 2023 12:15:51 +0000 (17:45 +0530)]
octeontx2-af: Enable hardware timestamping for VFs
Currently, the mailbox returns an ENODEV error when a VF requests that
hardware timestamping be enabled. Fix this, and return an EPERM error
for PFs/VFs which are not attached to CGX/RPM.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231011121551.1205211-1-saikrishnag@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Stitt [Tue, 10 Oct 2023 22:32:35 +0000 (22:32 +0000)]
net: dsa: vsc73xx: replace deprecated strncpy with ethtool_sprintf
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
ethtool_sprintf() is designed specifically for get_strings() usage.
Let's replace strncpy in favor of this more robust and easier to
understand interface.
This change could result in misaligned strings when if(cnt) fails. To
combat this, use a ternary to place an empty string in the buffer and
properly increment the pointer to the next string slot.
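A minimal userspace sketch of the alignment concern. sketch_ethtool_sprintf() is a hypothetical stand-in that mimics how ethtool_sprintf() formats into the current slot and advances the cursor by one fixed-size slot; struct sketch_counter, fill_strings() and SKETCH_GSTRING_LEN are illustrative names and are not taken from the vsc73xx driver.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_GSTRING_LEN 32	/* mirrors ETH_GSTRING_LEN */

/* Hypothetical stand-in for ethtool_sprintf(): format into the current
 * slot, then advance the cursor to the next fixed-size slot.
 */
static void sketch_ethtool_sprintf(unsigned char **data, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf((char *)*data, SKETCH_GSTRING_LEN, fmt, args);
	va_end(args);

	*data += SKETCH_GSTRING_LEN;
}

struct sketch_counter {
	const char *name;
};

/* A missing counter still consumes a slot: the ternary writes an empty
 * string, so later names stay aligned with their statistics.
 */
static void fill_strings(unsigned char *buf,
			 const struct sketch_counter **cnts, int n)
{
	int i;

	for (i = 0; i < n; i++)
		sketch_ethtool_sprintf(&buf, "%s",
				       cnts[i] ? cnts[i]->name : "");
}
```

If the empty-string branch were skipped instead, the cursor would not advance for a missing counter and every following name would shift one slot to the left.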
Heng Guo [Wed, 11 Oct 2023 01:51:37 +0000 (09:51 +0800)]
net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check
Reproduction environment:
Three Linux VMs are connected as follows:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1800
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1800
Steps to reproduce:
VM1 sends 1600 bytes of UDP data to VM3 using scapy with flags='DF'.
scapy command:
send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1,
inter=1.000000)
Issue description and patch:
ip_exceeds_mtu() in ip_forward() drops this IP datagram because the skb
length (1600, as sent by scapy) exceeds the MTU (1500 on VM2) and "DF" is set.
According to RFC 4293 "3.2.3. IP Statistics Tables",
+-------+------>------+----->-----+----->-----+
| InForwDatagrams (6) | OutForwDatagrams (6) |
| V +->-+ OutFragReqds
| InNoRoutes | | (packets)
/ (local packet (3) | |
| IF is that of the address | +--> OutFragFails
| and may not be the receiving IF) | | (packets)
the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before the fragment
check.
The existing implementation instead increases the counter after the
fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6.
So move the IPSTATS_MIB_OUTFORWDATAGRAMS increment to ip_forward()
for ipv4 and ip6_forward() for ipv6.
Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams is updated from 2 to 3.
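The ordering described in this message can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel code: struct sketch_stats and sketch_forward() are hypothetical, the MTU check is reduced to a length comparison, and the second counter models the fragmentation-failure statistic that the drop path is assumed to bump.

```c
#include <stdbool.h>

struct sketch_stats {
	unsigned long out_forw_datagrams;	/* models OutForwDatagrams */
	unsigned long frag_fails;		/* models OutFragFails */
};

/* Returns true if the datagram would be forwarded. The forward counter is
 * bumped before the MTU/DF check, so a datagram that is later dropped for
 * exceeding the MTU is still counted as a forwarding attempt.
 */
static bool sketch_forward(struct sketch_stats *st, unsigned int len,
			   unsigned int mtu, bool df)
{
	st->out_forw_datagrams++;

	if (len > mtu && df) {
		st->frag_fails++;
		return false;
	}
	return true;
}
```

Feeding the reproduction values (length 1600, MTU 1500, DF set) into this model drops the datagram while still incrementing the forward counter once, which matches the ForwDatagrams change shown in the test result.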
Jakub Kicinski [Fri, 13 Oct 2023 16:35:34 +0000 (09:35 -0700)]
Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they are
performed on netdevs.
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
David S. Miller [Fri, 13 Oct 2023 10:26:11 +0000 (11:26 +0100)]
Merge branch 'tls-cleanups'
Sabrina Dubroca says:
====================
net: tls: various code cleanups and improvements
This series contains multiple cleanups and simplifications for the
config code of both TLS_SW and TLS_HW.
It also modifies the chcr_ktls driver to use driver_state like all
other drivers, so that we can then make driver_state fixed size
instead of a flex array always allocated to that same fixed size. As
reported by Gustavo A. R. Silva, the way chcr_ktls misuses
driver_state irritates GCC [1].
Patches 1 and 2 are follow-ups to my previous cipher_desc series.
Sabrina Dubroca [Mon, 9 Oct 2023 20:50:54 +0000 (22:50 +0200)]
tls: use fixed size for tls_offload_context_{tx,rx}.driver_state
driver_state is a flex array, but is always allocated by the tls core
to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
making that size explicit so that sizeof(struct
tls_offload_context_{tx,rx}) works.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
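A small userspace sketch of why a fixed-size member makes sizeof() usable. The struct names, the single core field and SKETCH_DRIVER_STATE_SIZE are illustrative assumptions; the real tls_offload_context_{tx,rx} layouts and TLS_DRIVER_STATE_SIZE_{TX,RX} values are not reproduced here.

```c
#include <stddef.h>

#define SKETCH_DRIVER_STATE_SIZE 16	/* illustrative, not the kernel value */

/* Before: flexible array member. sizeof() does not cover driver_state, so
 * every allocation has to add the driver-state size by hand.
 */
struct ctx_flex {
	unsigned long core_field;
	unsigned char driver_state[];
};

/* After: fixed-size member. sizeof() covers the whole object. */
struct ctx_fixed {
	unsigned long core_field;
	unsigned char driver_state[SKETCH_DRIVER_STATE_SIZE];
};

/* Allocation size needed for each variant. */
static size_t ctx_flex_alloc_size(void)
{
	return sizeof(struct ctx_flex) + SKETCH_DRIVER_STATE_SIZE;
}

static size_t ctx_fixed_alloc_size(void)
{
	return sizeof(struct ctx_fixed);
}
```

Both helpers reserve at least SKETCH_DRIVER_STATE_SIZE bytes of driver state, but only the fixed-size variant lets callers write a plain sizeof().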
Sabrina Dubroca [Mon, 9 Oct 2023 20:50:53 +0000 (22:50 +0200)]
chcr_ktls: use tls_offload_context_tx and driver_state like other drivers
chcr_ktls uses the space reserved in driver_state by
tls_set_device_offload, but makes up its own wrapper around
tls_offload_context_tx instead of accessing driver_state via the
__tls_driver_ctx helper.
In this driver, driver_state is only used to store a pointer to a
larger context struct allocated by the driver.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 9 Oct 2023 20:50:51 +0000 (22:50 +0200)]
tls: remove tls_context argument from tls_set_device_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
While at it, fix up the reverse xmas tree ordering.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 9 Oct 2023 20:50:50 +0000 (22:50 +0200)]
tls: remove tls_context argument from tls_set_sw_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 9 Oct 2023 20:50:42 +0000 (22:50 +0200)]
tls: drop unnecessary cipher_type checks in tls offload
We should never reach tls_device_reencrypt, tls_enc_record, or
tls_enc_skb with a cipher_type that can't be offloaded. Replace those
checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
hard-coding offloadable cipher types.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 10 Oct 2023 14:44:00 +0000 (16:44 +0200)]
selftests: netdevsim: use suitable existing dummy file for flash test
The file name used in the flash test was "dummy" because at the time the
test was written, drivers were responsible for the file request and, as
netdevsim didn't do that, the name was unused. However, the file load
request is now done in devlink code and therefore the file has to exist.
Use the first random file from /lib/firmware for this purpose.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Luca Fancellu [Tue, 10 Oct 2023 14:26:30 +0000 (15:26 +0100)]
xen-netback: add software timestamp capabilities
Add software timestamp capabilities to the xen-netback driver
by advertising it on the struct ethtool_ops and calling
skb_tx_timestamp before passing the buffer to the queue.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Stitt [Mon, 9 Oct 2023 23:19:57 +0000 (23:19 +0000)]
ibmvnic: replace deprecated strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
NUL-padding is not required as the buffer is already memset to 0:
| memset(adapter->fw_version, 0, 32);
Note that another usage of strscpy exists on the same buffer:
| strscpy((char *)adapter->fw_version, "N/A", sizeof(adapter->fw_version));
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Justin Stitt [Mon, 9 Oct 2023 23:05:41 +0000 (23:05 +0000)]
net: fec: replace deprecated strncpy with ethtool_sprintf
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
ethtool_sprintf() is designed specifically for get_strings() usage.
Let's replace strncpy in favor of this more robust and easier to
understand interface.
Also, while we're here, let's change memcpy() over to ethtool_sprintf()
for consistency.
Rob Herring [Mon, 9 Oct 2023 17:29:04 +0000 (12:29 -0500)]
net: mdio: xgene: Use device_get_match_data()
Use preferred device_get_match_data() instead of of_match_device() and
acpi_match_device() to get the driver match data. With this, adjust the
includes to explicitly include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 9 Oct 2023 21:46:18 +0000 (15:46 -0600)]
net: wwan: t7xx: Add __counted_by for struct t7xx_fsm_event and use struct_size()
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.
This code was found with the help of Coccinelle, and audited and
fixed manually.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
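A userspace sketch of the size computation that struct_size() replaces. struct sketch_event is an illustrative structure, not the real struct t7xx_fsm_event, and sketch_event_size() is a hypothetical stand-in specialised to it; it assumes the helper's saturating behaviour, where an overflowing size becomes SIZE_MAX so that the subsequent allocation fails instead of being undersized.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_event {
	unsigned int length;	/* number of valid bytes in data[] */
	unsigned char data[];	/* in the kernel: __counted_by(length) */
};

/* Header plus count trailing elements, saturating to SIZE_MAX on overflow
 * rather than wrapping around as an open-coded
 * sizeof(*event) + count * sizeof(elem) could.
 */
static size_t sketch_event_size(size_t count)
{
	size_t base = sizeof(struct sketch_event);
	size_t elem = sizeof(((struct sketch_event *)0)->data[0]);

	if (count > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + count * elem;
}
```

The annotation itself only tells the compiler which member holds the element count; the bounds checks are then enabled through the configuration options named in the message.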
Rob Herring [Mon, 9 Oct 2023 17:29:00 +0000 (12:29 -0500)]
net: ethernet: wiznet: Use spi_get_device_match_data()
Use preferred spi_get_device_match_data() instead of of_match_device() and
spi_get_device_id() to get the driver match data. With this, adjust the
includes to explicitly include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Mon, 9 Oct 2023 17:28:58 +0000 (12:28 -0500)]
net: ethernet: Use device_get_match_data()
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Wolsieffer [Mon, 9 Oct 2023 14:59:04 +0000 (10:59 -0400)]
net: stmmac: dwmac-stm32: refactor clock config
Currently, clock configuration is spread throughout the driver and
partially duplicated for the STM32MP1 and STM32 MCU variants. This makes
it difficult to keep track of which clocks need to be enabled or disabled
in various scenarios.
This patch adds symmetric stm32_dwmac_clk_enable/disable() functions
that handle all clock configuration, including quirks required while
suspending or resuming. syscfg_clk and clk_eth_ck are not present on
STM32 MCUs, but it is fine to try to configure them anyway since NULL
clocks are ignored.
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 13 Oct 2023 09:00:32 +0000 (10:00 +0100)]
Merge branch 'vxlan-fdb-flushing'
Amit Cohen says:
====================
Extend VXLAN driver to support FDB flushing
The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added
support for FDB flushing in bridge driver. Extend VXLAN driver to support
FDB flushing also. Add support for filtering by fields which are relevant
for VXLAN FDBs:
* Source VNI
* Nexthop ID
* 'router' flag
* Destination VNI
* Destination Port
* Destination IP
Without this set, flush for VXLAN device fails:
$ bridge fdb flush dev vx10
RTNETLINK answers: Operation not supported
With this set, such flush works with the relevant arguments, for example:
$ bridge fdb flush dev vx10 vni 5000 dst 193.2.2.1
< flush all vx10 entries with VNI 5000 and destination IP 193.2.2.1>
Some preparations are required, handle them before adding flushing support
in VXLAN driver. See more details in commit messages.
Patch set overview:
Patch #1 prepares flush policy to be used by VXLAN driver
Patches #2-#3 are preparations in VXLAN driver
Patch #4 adds an initial support for flushing in VXLAN driver
Patches #5-#9 add support for filtering by several attributes
Patch #10 adds a test for FDB flush with VXLAN
Patch #11 extends the test to check FDB flush with bridge
====================
Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 9 Oct 2023 10:06:18 +0000 (13:06 +0300)]
selftests: fdb_flush: Add test cases for FDB flush with bridge device
Extend the test to check flushing with bridge device, test flush by device
and by VID.
Add a test case for flushing with "self" and "master" together with
attributes that are supported by only one driver. This configuration is
not recommended; check it to verify that the user gets an error.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 9 Oct 2023 10:06:17 +0000 (13:06 +0300)]
selftests: Add test cases for FDB flush with VXLAN device
Test all the supported arguments for FDB flush. The test checks
configuration, not traffic. Note that the flag 'offloaded' is not checked
as it is not relevant when there is no hardware.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 9 Oct 2023 10:06:16 +0000 (13:06 +0300)]
vxlan: vxlan_core: Support FDB flushing by destination IP
Add support for flushing VXLAN FDB entries by destination IP. An FDB entry
is stored as {MAC, SRC_VNI} + remote. The destination IP is an attribute of
the remote. For multicast entries, the VXLAN driver stores a linked list
of remotes for a given key.
In user space, each remote is represented as a separate entry, so when a
flush is sent with a 'destination IP' filter, flush only the matching
remotes. If no remotes are left afterwards, destroy the entry.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.3 self permanent
00:00:00:00:00:00 dst 192.1.1.1 self permanent
00:00:00:00:00:00 dst 192.1.1.2 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 1000 self permanent
When the user flushes by destination IP x, only the relevant remotes are
flushed:
$ bridge fdb flush dev vx10 dst 192.1.1.1
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.3 self permanent
00:00:00:00:00:00 dst 192.1.1.2 self permanent
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
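The remote-list handling described above can be sketched in userspace as follows. This is a simplified model, not the vxlan_core implementation: struct sketch_fdb, struct sketch_remote and all helper names are hypothetical, the list is a plain singly linked list, and only the destination IP filter is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_remote {
	uint32_t dst_ip;
	struct sketch_remote *next;
};

/* One FDB entry (keyed by {MAC, SRC_VNI} in the driver) with its remotes. */
struct sketch_fdb {
	struct sketch_remote *remotes;
};

static bool sketch_add_remote(struct sketch_fdb *fdb, uint32_t dst_ip)
{
	struct sketch_remote *rd = malloc(sizeof(*rd));

	if (!rd)
		return false;
	rd->dst_ip = dst_ip;
	rd->next = fdb->remotes;
	fdb->remotes = rd;
	return true;
}

static size_t sketch_count_remotes(const struct sketch_fdb *fdb)
{
	const struct sketch_remote *rd;
	size_t n = 0;

	for (rd = fdb->remotes; rd; rd = rd->next)
		n++;
	return n;
}

/* Remove every remote whose destination IP matches. Returns true when no
 * remotes are left, in which case the caller should destroy the entry.
 */
static bool sketch_flush_by_dst(struct sketch_fdb *fdb, uint32_t dst_ip)
{
	struct sketch_remote **link = &fdb->remotes;

	while (*link) {
		struct sketch_remote *rd = *link;

		if (rd->dst_ip == dst_ip) {
			*link = rd->next;	/* unlink, then free */
			free(rd);
		} else {
			link = &rd->next;
		}
	}
	return fdb->remotes == NULL;
}
```

Flushing one of three distinct destinations leaves the other two remotes in place and keeps the entry alive; only when the last remote is removed does the function report that the entry itself should go.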
Amit Cohen [Mon, 9 Oct 2023 10:06:15 +0000 (13:06 +0300)]
vxlan: vxlan_core: Support FDB flushing by destination port
Add support for flushing VXLAN FDB entries by destination port. An FDB
entry is stored as {MAC, SRC_VNI} + remote. The destination port is an
attribute of the remote. For multicast entries, the VXLAN driver stores a
linked list of remotes for a given key.
In user space, each remote is represented as a separate entry, so when a
flush is sent with a 'destination port' filter, flush only the matching
remotes. If no remotes are left afterwards, destroy the entry.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
When the user flushes by port x, only the relevant remotes are flushed:
$ bridge fdb flush dev vx10 port 1111
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 9 Oct 2023 10:06:14 +0000 (13:06 +0300)]
vxlan: vxlan_core: Support FDB flushing by destination VNI
Add support for flushing VXLAN FDB entries by destination VNI. An FDB entry
is stored as {MAC, SRC_VNI} + remote. The destination VNI is an attribute
of the remote. For multicast entries, the VXLAN driver stores a linked list
of remotes for a given key.
In user space, each remote is represented as a separate entry, so when a
flush is sent with a 'destination VNI' filter, flush only the matching
remotes. If no remotes are left afterwards, destroy the entry.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.2 vni 2000 self permanent
When the user flushes by VNI x, only the relevant remotes are flushed:
$ bridge fdb flush dev vx10 vni 2000
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>