]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
4 years agonet/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups
Dmytro Linkin [Mon, 31 May 2021 15:19:50 +0000 (18:19 +0300)]
net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups

Provide eswitch API to allow controlling group rate limits. Use it to
implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().

The share rate will create relative bandwidth share on the groups level
while within the group the user can set shared rate on the member vports
of that group and this rate will be relative to the group's share rate.
The group with the highest shared rate will get a BW share of 100 and
the rest of the groups will get a value that reflects the ratio between
their share rate and the maximum share rate.

Example:
Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Assuming link speed is 50 Gbit/sec ratio divider will be
50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:

<group_1> 30 * 0.625 = 18.75 Gbit/sec
<group_2> 20 * 0.625 = 12.5 Gbit/sec
<group_3> 20 * 0.625 = 12.5 Gbit/sec
<group_4> 10 * 0.625 = 6.25 Gbit/sec

Rate group with unlimited tx_share rate will receive minimum BW value
(1Mbit/sec) if presented any group with tx_share rate limit. This allow
to not drop all packets in case of heavy traffic.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: E-switch, Introduce rate limiting groups API
Dmytro Linkin [Mon, 31 May 2021 14:08:14 +0000 (17:08 +0300)]
net/mlx5: E-switch, Introduce rate limiting groups API

Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: E-switch, Enable devlink port tx_{share|max} rate control
Dmytro Linkin [Fri, 28 May 2021 14:42:15 +0000 (17:42 +0300)]
net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control

Register devlink rate leaf object for every eswitch vport.
Implement devlink ops that enable setting shared and max tx rates
through devlink API.
Extract common eswitch code from existing tx rate set function that is
accessed through NDO to be reused for the devlink. Values configured
with NDO API are not visible for the devlink API, therefore shouldn't be
used simultaneously.

When normalizing the BW share value, dividing the desired minimum rate
by the common divider results in losing information since the quotient
is rounded down. This has a significant affect on configurations of low
rate where the round down eliminates a large percentage of the total
rate. To improve the formula, round up the division result to make sure
that the BW share is at least the value it was supposed to be and won't
lost a significant amount of the expected value.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: E-switch, Move QoS related code to dedicated file
Dmytro Linkin [Fri, 28 May 2021 14:30:22 +0000 (17:30 +0300)]
net/mlx5: E-switch, Move QoS related code to dedicated file

Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: TC, Support sample offload action for tunneled traffic
Chris Mi [Mon, 21 Jun 2021 07:49:50 +0000 (10:49 +0300)]
net/mlx5e: TC, Support sample offload action for tunneled traffic

Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If decapsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: TC, Restore tunnel info for sample offload
Chris Mi [Fri, 30 Apr 2021 07:17:33 +0000 (10:17 +0300)]
net/mlx5e: TC, Restore tunnel info for sample offload

Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.

Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch  will use that id
to restore the tunnel information of decapsulated packets onto the
skb.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel
Chris Mi [Fri, 30 Apr 2021 09:08:40 +0000 (12:08 +0300)]
net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel

CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
restoring chain ids. SKB extension is not required for tunnel
restoration.

Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
using the tunnel restore methods for sample offload use cases.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Refactor ct to use post action infrastructure
Chris Mi [Tue, 17 Aug 2021 03:23:09 +0000 (11:23 +0800)]
net/mlx5e: Refactor ct to use post action infrastructure

Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Introduce post action infrastructure
Chris Mi [Mon, 16 Aug 2021 14:00:30 +0000 (22:00 +0800)]
net/mlx5e: Introduce post action infrastructure

Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flowtable.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

Currently the post-action design is implemented only by the ct
action. Introduce post action infrastructure as a pre-step for
reusing it with the sFlow offload feature. Init and destroy the
common post action table. Refactor the ct offload to use the
common post table infrastructure in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: CT, Use xarray to manage fte ids
Chris Mi [Wed, 2 Jun 2021 03:23:08 +0000 (06:23 +0300)]
net/mlx5e: CT, Use xarray to manage fte ids

IDR is deprecated. Use xarray instead.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Move sample attribute to flow attribute
Chris Mi [Wed, 18 Aug 2021 12:18:57 +0000 (20:18 +0800)]
net/mlx5e: Move sample attribute to flow attribute

Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Move esw/sample to en/tc/sample
Chris Mi [Wed, 18 Aug 2021 07:52:18 +0000 (15:52 +0800)]
net/mlx5e: Move esw/sample to en/tc/sample

Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Remove mlx5e dependency from E-Switch sample
Saeed Mahameed [Mon, 16 Aug 2021 18:58:19 +0000 (11:58 -0700)]
net/mlx5e: Remove mlx5e dependency from E-Switch sample

mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 20 Aug 2021 01:09:18 +0000 (18:09 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/ptp/Kconfig:
  55c8fca1dae1 ("ptp_pch: Restore dependency on PCI")
  e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 19 Aug 2021 19:33:43 +0000 (12:33 -0700)]
Merge tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from bpf, wireless and mac80211
  trees.

  Current release - regressions:

   - tipc: call tipc_wait_for_connect only when dlen is not 0

   - mac80211: fix locking in ieee80211_restart_work()

  Current release - new code bugs:

   - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()

   - ethernet: ice: fix perout start time rounding

   - wwan: iosm: prevent underflow in ipc_chnl_cfg_get()

  Previous releases - regressions:

   - bpf: clear zext_dst of dead insns

   - sch_cake: fix srchost/dsthost hashing mode

   - vrf: reset skb conntrack connection on VRF rcv

   - net/rds: dma_map_sg is entitled to merge entries

  Previous releases - always broken:

   - ethernet: bnxt: fix Tx path locking and races, add Rx path
     barriers"

* tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
  net: dpaa2-switch: disable the control interface on error path
  Revert "flow_offload: action should not be NULL when it is referenced"
  iavf: Fix ping is lost after untrusted VF had tried to change MAC
  i40e: Fix ATR queue selection
  r8152: fix the maximum number of PLA bp for RTL8153C
  r8152: fix writing USB_BP2_EN
  mptcp: full fully established support after ADD_ADDR
  mptcp: fix memory leak on address flush
  net/rds: dma_map_sg is entitled to merge entries
  net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
  net: asix: fix uninit value bugs
  ovs: clear skb->tstamp in forwarding path
  net: mdio-mux: Handle -EPROBE_DEFER correctly
  net: mdio-mux: Don't ignore memory allocation errors
  net: mdio-mux: Delete unnecessary devm_kfree
  net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
  sch_cake: fix srchost/dsthost hashing mode
  ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
  net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
  mac80211: fix locking in ieee80211_restart_work()
  ...

4 years agoMerge tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 19 Aug 2021 19:19:58 +0000 (12:19 -0700)]
Merge tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Enable SW_TABLET_MODE support for the TP200s

 - Enable WMI on two more Gigabyte motherboards

* tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: gigabyte-wmi: add support for B450M S2H V2
  platform/x86: gigabyte-wmi: add support for X570 GAMING X
  platform/x86: asus-nb-wmi: Add tablet_mode_sw=lid-flip quirk for the TP200s
  platform/x86: asus-nb-wmi: Allow configuring SW_TABLET_MODE method with a module option

4 years agoocteontx2-af: remove redudant second error check on variable err
Colin Ian King [Wed, 18 Aug 2021 13:09:27 +0000 (14:09 +0100)]
octeontx2-af: remove redudant second error check on variable err

A recent change added error checking messages and failed to remove one
of the previous error checks. There are now two checks on variable err
so the second one is redundant dead code and can be removed.

Addresses-Coverity: ("Logically dead code")
Fixes: a83bdada06bf ("octeontx2-af: Add debug messages for failures")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210818130927.33895-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 19 Aug 2021 18:51:13 +0000 (11:51 -0700)]
Merge tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
linux-can-next-for-5.15-20210819

The first patch is by me, for the mailmap file and maps the email
address of two former ESD employees to a newly created role account.

The next 3 patches are by Oleksij Rempel and add support for GPIO
based switchable CAN bus termination.

The next 3 patches are by Vincent Mailhol. The first one changes the
CAN netlink interface to not bail out if the user switched off
unsupported features. The next one adds Vincent as the maintainer of
the etas_es58x driver and the last one cleans up the documentation of
struct es58x_fd_tx_conf_msg.

The next patch is by me, for the mcp251xfd driver and marks some
instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes
2 patches for the rcar_canfd driver, that add support for RZ/G2L
family.

The next 5 patches target the m_can/tcan45x5 driver. 2 are by me an
fix trivial checkpatch warnings. The remaining 3 patches are by Matt
Kline and improve the performance on the SPI based tcan4x5x chip by
batching FIFO reads and writes.

The last 7 patches are for the c_can driver. Dario Binacchi's patch
converts the DT bindings to yaml, 2 patches by me fix a typo and
rename a macro to properly represent the usage. The last 4 patches are
again by Dario Binacchi and provide a performance improvement for the
TX path by operating the TX mailboxes as a true FIFO.

* tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
  can: c_can: cache frames to operate as a true FIFO
  can: c_can: support tx ring algorithm
  can: c_can: exit c_can_do_tx() early if no frames have been sent
  can: c_can: remove struct c_can_priv::priv field
  can: c_can: rename IF_RX -> IF_NAPI
  can: c_can: c_can_do_tx(): fix typo in comment
  dt-bindings: net: can: c_can: convert to json-schema
  can: m_can: Batch FIFO writes during CAN transmit
  can: m_can: Batch FIFO reads during CAN receive
  can: m_can: Disable IRQs on FIFO bus errors
  can: m_can: fix block comment style
  can: tcan4x5x: cdev_to_priv(): remove stray empty line
  can: rcar_canfd: Add support for RZ/G2L family
  dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC
  can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const
  can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg
  MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver
  can: netlink: allow user to turn off unsupported features
  can: dev: provide optional GPIO based termination support
  dt-bindings: can: fsl,flexcan: enable termination-* bindings
  ...
====================

Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: dpaa2-switch: disable the control interface on error path
Vladimir Oltean [Thu, 19 Aug 2021 14:17:55 +0000 (17:17 +0300)]
net: dpaa2-switch: disable the control interface on error path

Currently dpaa2_switch_takedown has a funny name and does not do the
opposite of dpaa2_switch_init, which makes probing fail when we need to
handle an -EPROBE_DEFER.

A sketch of what dpaa2_switch_init does:

dpsw_open

dpaa2_switch_detect_features

dpsw_reset

for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
dpsw_if_disable

dpsw_if_set_stp

dpsw_vlan_remove_if_untagged

dpsw_if_set_tci

dpsw_vlan_remove_if
}

dpsw_vlan_remove

alloc_ordered_workqueue

dpsw_fdb_remove

dpaa2_switch_ctrl_if_setup

When dpaa2_switch_takedown is called from the error path of
dpaa2_switch_probe(), the control interface, enabled by
dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
because dpaa2_switch_takedown does not call
dpaa2_switch_ctrl_if_teardown.

Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
means that a second probe of the driver will happen with the control
interface directly enabled.

This will trigger a second error:

[   93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
[   93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
[   93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13

Which if we investigate the /dev/dpaa2_mc_console log, we find out is
caused by:

[E, ctrl_if_set_pools:2211, DPMNG]  ctrl_if must be disabled

So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
reasonable limits, no reason to change STP state, re-add VLANs etc), and
rename it to something more conventional, like dpaa2_switch_teardown.

Fixes: 613c0a5810b7 ("staging: dpaa2-switch: enable the control interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoRevert "flow_offload: action should not be NULL when it is referenced"
Ido Schimmel [Thu, 19 Aug 2021 10:58:42 +0000 (13:58 +0300)]
Revert "flow_offload: action should not be NULL when it is referenced"

This reverts commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a.

Cited commit added a check to make sure 'action' is not NULL, but
'action' is already dereferenced before the check, when calling
flow_offload_has_one_action().

Therefore, the check does not make any sense and results in a smatch
warning:

include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn:
variable dereferenced before check 'action' (see line 319)

Fix by reverting this commit.

Cc: gushengxian <gushengxian@yulong.com>
Fixes: 9ea3e52c5bc8 ("flow_offload: action should not be NULL when it is referenced")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge branch 'intel-wired-lan-driver-updates-2021-08-18'
Jakub Kicinski [Thu, 19 Aug 2021 16:56:42 +0000 (09:56 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2021-08-18'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-08-18

This series contains updates to i40e and iavf drivers.

Arkadiusz fixes Flow Director not using the correct queue due to calling
the wrong pick Tx function for i40e.

Sylwester resolves traffic loss for iavf when it attempts to change its
MAC address when it does not have permissions to do so.
====================

Link: https://lore.kernel.org/r/20210818174217.4138922-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoiavf: Fix ping is lost after untrusted VF had tried to change MAC
Sylwester Dziedziuch [Wed, 18 Aug 2021 17:42:17 +0000 (10:42 -0700)]
iavf: Fix ping is lost after untrusted VF had tried to change MAC

Make changes to MAC address dependent on the response of PF.
Disallow changes to HW MAC address and MAC filter from untrusted
VF, thanks to that ping is not lost if VF tries to change MAC.
Add a new field in iavf_mac_filter, to indicate whether there
was response from PF for given filter. Based on this field pass
or discard the filter.
If untrusted VF tried to change it's address, it's not changed.
Still filter was changed, because of that ping couldn't go through.

Fixes: c5c922b3e09b ("iavf: fix MAC address setting for VFs when filter is rejected")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoi40e: Fix ATR queue selection
Arkadiusz Kubalewski [Wed, 18 Aug 2021 17:42:16 +0000 (10:42 -0700)]
i40e: Fix ATR queue selection

Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.

If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.

Reproduction steps:
1. Load i40e driver
2. Map each MSI interrupt of i40e port for each CPU
3. Disable ntuple, enable ATR i.e.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run application that is generating traffic and is bound to a
single CPU, i.e.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
Application's traffic should be restricted to the CPU provided in
taskset.

Fixes: 89ec1f0886c1 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 19 Aug 2021 15:58:16 +0000 (08:58 -0700)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-08-19

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 3 files changed, 29 insertions(+), 6 deletions(-).

The main changes are:

1) Fix to clear zext_dst for dead instructions which was causing invalid program
   rejections on JITs with bpf_jit_needs_zext such as s390x, from Ilya Leoshkevich.

2) Fix RCU splat in bpf_get_current_{ancestor_,}cgroup_id() helpers when they are
   invoked from sleepable programs, from Yonghong Song.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests, bpf: Test that dead ldx_w insns are accepted
  bpf: Clear zext_dst of dead insns
  bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
====================

Link: https://lore.kernel.org/r/20210819144904.20069-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agocan: c_can: cache frames to operate as a true FIFO
Dario Binacchi [Sat, 7 Aug 2021 13:08:00 +0000 (15:08 +0200)]
can: c_can: cache frames to operate as a true FIFO

As reported by a comment in the c_can_start_xmit() this was not a FIFO.
C/D_CAN controller sends out the buffers prioritized so that the lowest
buffer number wins.

What did c_can_start_xmit() do if head was less tail in the tx ring ? It
waited until all the frames queued in the FIFO was actually transmitted
by the controller before accepting a new CAN frame to transmit, even if
the FIFO was not full, to ensure that the messages were transmitted in
the order in which they were loaded.

By storing the frames in the FIFO without requiring its transmission, we
will be able to use the full size of the FIFO even in cases such as the
one described above. The transmission interrupt will trigger their
transmission only when all the messages previously loaded but stored in
less priority positions of the buffers have been transmitted.

Link: https://lore.kernel.org/r/20210807130800.5246-5-dariobin@libero.it
Suggested-by: Gianluca Falavigna <gianluca.falavigna@inwind.it>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: c_can: support tx ring algorithm
Dario Binacchi [Sat, 7 Aug 2021 13:07:59 +0000 (15:07 +0200)]
can: c_can: support tx ring algorithm

The algorithm is already used successfully by other CAN drivers
(e.g. mcp251xfd). Its implementation was kindly suggested to me by
Marc Kleine-Budde following a patch I had previously submitted. You can
find every detail at https://lore.kernel.org/patchwork/patch/1422929/.

The idea is that after this patch, it will be easier to patch the driver
to use the message object memory as a true FIFO.

Link: https://lore.kernel.org/r/20210807130800.5246-4-dariobin@libero.it
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: c_can: exit c_can_do_tx() early if no frames have been sent
Dario Binacchi [Sat, 7 Aug 2021 13:07:58 +0000 (15:07 +0200)]
can: c_can: exit c_can_do_tx() early if no frames have been sent

The c_can_poll() handles RX/TX events unconditionally. It may therefore
happen that c_can_do_tx() is called unnecessarily because the interrupt
was triggered by the reception of a frame. In these cases, we avoid to
execute unnecessary statements and exit immediately.

Link: https://lore.kernel.org/r/20210807130800.5246-3-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: c_can: remove struct c_can_priv::priv field
Dario Binacchi [Sat, 7 Aug 2021 13:07:57 +0000 (15:07 +0200)]
can: c_can: remove struct c_can_priv::priv field

It references the clock but it is never used. So let's remove it.

Link: https://lore.kernel.org/r/20210807130800.5246-2-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: c_can: rename IF_RX -> IF_NAPI
Marc Kleine-Budde [Mon, 9 Aug 2021 07:26:35 +0000 (09:26 +0200)]
can: c_can: rename IF_RX -> IF_NAPI

The C_CAN/D_CAN cores implement 2 interfaces to manage the message
objects. To avoid concurrency and the need for locking one interface
is used in the TX path (IF_TX). While the other one, named IF_RX is
used from NAPI context only. As this interface is not only used to
manage RX, but also TX message objects, this patch renames IF_RX to
IF_NAPI.

Link: https://lore.kernel.org/r/20210809080608.171545-1-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: c_can: c_can_do_tx(): fix typo in comment
Marc Kleine-Budde [Fri, 6 Aug 2021 08:49:52 +0000 (10:49 +0200)]
can: c_can: c_can_do_tx(): fix typo in comment

This patch fixes a typo in the comment in c_can_do_tx().

Fixes: eddf67115040 ("can: c_can: add a comment about IF_RX interface's use")
Link: https://lore.kernel.org/r/20210806105127.103302-1-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agodt-bindings: net: can: c_can: convert to json-schema
Dario Binacchi [Thu, 5 Aug 2021 19:27:50 +0000 (21:27 +0200)]
dt-bindings: net: can: c_can: convert to json-schema

Convert the Bosch C_CAN/D_CAN controller device tree binding
documentation to json-schema.

Document missing properties.
Remove "ti,hwmods" as it is no longer used in TI dts.
Make "clocks" required as it is used in all dts.
Update the examples.

Link: https://lore.kernel.org/r/20210805192750.9051-1-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: m_can: Batch FIFO writes during CAN transmit
Matt Kline [Tue, 17 Aug 2021 05:08:53 +0000 (22:08 -0700)]
can: m_can: Batch FIFO writes during CAN transmit

Give FIFO writes the same treatment as reads to avoid fixed costs of
individual transfers on a slow bus (e.g., tcan4x5x).

Link: https://lore.kernel.org/r/20210817050853.14875-4-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: m_can: Batch FIFO reads during CAN receive
Matt Kline [Tue, 17 Aug 2021 05:08:52 +0000 (22:08 -0700)]
can: m_can: Batch FIFO reads during CAN receive

On peripherals communicating over a relatively slow SPI line
(e.g. tcan4x5x), individual transfers have high fixed costs.
This causes the driver to spend most of its time waiting between
transfers and severely limits throughput.

Reduce these overheads by reading more than one word at a time.
Writing could get a similar treatment in follow-on commits.

Link: https://lore.kernel.org/r/20210817050853.14875-3-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
[mkl: remove __packed from struct id_and_dlc]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: m_can: Disable IRQs on FIFO bus errors
Matt Kline [Tue, 17 Aug 2021 05:08:51 +0000 (22:08 -0700)]
can: m_can: Disable IRQs on FIFO bus errors

If FIFO reads or writes fail due to the underlying regmap (e.g., SPI)
I/O, propagate that up to the m_can driver, log an error, and disable
interrupts, similar to the mcp251xfd driver.

While reworking the FIFO functions to add this error handling,
add support for bulk reads and writes of multiple registers.

Link: https://lore.kernel.org/r/20210817050853.14875-2-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
[mkl: re-wrap long lines, remove WARN_ON, convert to netdev block comments]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: m_can: fix block comment style
Marc Kleine-Budde [Thu, 19 Aug 2021 10:20:03 +0000 (12:20 +0200)]
can: m_can: fix block comment style

This patch fixes the commenting style in the m_can driver.

Fixes: 1be37d3b0414 ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context")
Fixes: df06fd678260 ("can: m_can: m_can_chip_config(): enable and configure internal timestamps")
Link: https://lore.kernel.org/r/20210819111703.599686-2-mkl@pengutronix.de
Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: tcan4x5x: cdev_to_priv(): remove stray empty line
Marc Kleine-Budde [Thu, 19 Aug 2021 09:48:45 +0000 (11:48 +0200)]
can: tcan4x5x: cdev_to_priv(): remove stray empty line

This patch removes a stray empty line in the cdev_to_priv() function.

Fixes: ac33ffd3e2b0 ("can: m_can: let m_can_class_allocate_dev() allocate driver specific private data")
Link: https://lore.kernel.org/r/20210819111703.599686-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: rcar_canfd: Add support for RZ/G2L family
Lad Prabhakar [Tue, 27 Jul 2021 13:30:21 +0000 (14:30 +0100)]
can: rcar_canfd: Add support for RZ/G2L family

CANFD block on RZ/G2L SoC is almost identical to one found on
R-Car Gen3 SoC's. On RZ/G2L SoC interrupt sources for each channel
are split into different sources and the IP doesn't divide (1/2)
CANFD clock within the IP.

This patch adds compatible string for RZ/G2L family and splits
the irq handlers to accommodate both RZ/G2L and R-Car Gen3 SoC's.

Link: https://lore.kernel.org/r/20210727133022.634-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
[mkl: fixed typo: recieve -> receive, thanks Geert]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agodt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC
Lad Prabhakar [Tue, 27 Jul 2021 13:30:20 +0000 (14:30 +0100)]
dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC

Add CANFD binding documentation for Renesas RZ/G2L SoC.

Link: https://lore.kernel.org/r/20210727133022.634-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: mcp251xfd: mark some instances of struct mcp251xfd_priv as const
Marc Kleine-Budde [Mon, 9 Aug 2021 18:31:54 +0000 (20:31 +0200)]
can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const

With the patch 07ff4aed015c ("time/timecounter: Mark 1st argument of
timecounter_cyc2time() as const") some instances of the struct
mcp251xfd_priv can be marked as const. This patch marks these as
const.

Link: https://lore.kernel.org/r/20210813091027.159379-1-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg
Vincent Mailhol [Sun, 15 Aug 2021 03:32:48 +0000 (12:32 +0900)]
can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg

The documentation of struct es58x_fd_tx_conf_msg explains in details
the different TDC parameters. However, those description are redundant
with the documentation of struct can_tdc.

Remove most of the description.

Also, fixes a typo in the reference to the datasheet (E701 -> E70).

Link: https://lore.kernel.org/r/20210815033248.98111-8-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agoMAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver
Vincent Mailhol [Sat, 14 Aug 2021 09:33:53 +0000 (18:33 +0900)]
MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver

Adding myself (Vincent Mailhol) as a maintainer for the ETAS ES58X
CAN/USB driver.

Link: https://lore.kernel.org/r/20210814093353.74391-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: netlink: allow user to turn off unsupported features
Vincent Mailhol [Sun, 15 Aug 2021 03:32:42 +0000 (12:32 +0900)]
can: netlink: allow user to turn off unsupported features

The sanity checks on the control modes will reject any request related
to an unsupported features, even turning it off.

Example on an interface which does not support CAN-FD:

$ ip link set can0 type can bitrate 500000 fd off
RTNETLINK answers: Operation not supported

This patch lets such command go through (but requests to turn on an
unsupported feature are, of course, still denied).

Link: https://lore.kernel.org/r/20210815033248.98111-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agocan: dev: provide optional GPIO based termination support
Oleksij Rempel [Wed, 18 Aug 2021 07:12:32 +0000 (09:12 +0200)]
can: dev: provide optional GPIO based termination support

For CAN buses to work, a termination resistor has to be present at both
ends of the bus. This resistor is usually 120 Ohms, other values may be
required for special bus topologies.

This patch adds support for a generic GPIO based CAN termination. The
resistor value has to be specified via device tree, and it can only be
attached to or detached from the bus. By default the termination is not
active.

Link: https://lore.kernel.org/r/20210818071232.20585-4-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agodt-bindings: can: fsl,flexcan: enable termination-* bindings
Oleksij Rempel [Wed, 18 Aug 2021 07:12:31 +0000 (09:12 +0200)]
dt-bindings: can: fsl,flexcan: enable termination-* bindings

Enable termination-* binding and provide validation example for it.

Link: https://lore.kernel.org/r/20210818071232.20585-3-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agodt-bindings: can-controller: add support for termination-gpios
Oleksij Rempel [Wed, 18 Aug 2021 07:12:30 +0000 (09:12 +0200)]
dt-bindings: can-controller: add support for termination-gpios

Some boards provide GPIO controllable termination resistor. Provide
binding to make use of it.

Link: https://lore.kernel.org/r/20210818071232.20585-2-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agonet: ethernet: ti: cpsw: make array stpa static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 12:04:43 +0000 (13:04 +0100)]
net: ethernet: ti: cpsw: make array stpa static const, makes object smaller

Don't populate the array stpa on the stack but instead it
static const. Makes the object code smaller by 81 bytes:

Before:
   text    data   bss    dec    hex filename
  54993   17248     0  72241  11a31 ./drivers/net/ethernet/ti/cpsw_new.o

After:
   text    data   bss    dec    hex filename
  54784   17376     0  72160  119e0 ./drivers/net/ethernet/ti/cpsw_new.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: make array spec_opcode static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 11:58:13 +0000 (12:58 +0100)]
net: hns3: make array spec_opcode static const, makes object smaller

Don't populate the array spec_opcode on the stack but instead it
static const. Makes the object code smaller by 158 bytes:

Before:
   text   data   bss     dec    hex filename
  12271   3976   128   16375   3ff7 .../hisilicon/hns3/hns3pf/hclge_cmd.o

After:
   text   data   bss     dec    hex filename
  12017   4072   128   16217   3f59 .../hisilicon/hns3/hns3pf/hclge_cmd.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agohinic: make array speeds static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 11:52:53 +0000 (12:52 +0100)]
hinic: make array speeds static const, makes object smaller

Don't populate the array speeds on the stack but instead it
static const. Makes the object code smaller by 17 bytes:

Before:
   text    data     bss     dec     hex filename
  39987   14200      64   54251    d3eb .../huawei/hinic/hinic_sriov.o

After:
   text    data     bss     dec     hex filename
  39906   14264      64   54234    d3da .../huawei/hinic/hinic_sriov.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'indirect-qdisc-order'
David S. Miller [Thu, 19 Aug 2021 12:19:30 +0000 (13:19 +0100)]
Merge branch 'indirect-qdisc-order'

Eli Cohen says:

====================
Indirect dev ingress qdisc creation order

The first patch is just a cleanup of the code.
The second patch is fixing the dependency in ingress qdisc creation
relative to offloading driver registration to filter configurations.

v1 -> v2:
Fix warning - variable set but not used
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Fix offloading indirect devices dependency on qdisc order creation
Eli Cohen [Tue, 17 Aug 2021 17:05:18 +0000 (20:05 +0300)]
net: Fix offloading indirect devices dependency on qdisc order creation

Currently, when creating an ingress qdisc on an indirect device before
the driver registered for callbacks, the driver will not have a chance
to register its filter configuration callbacks.

To fix that, modify the code such that it keeps track of all the ingress
qdiscs that call flow_indr_dev_setup_offload(). When a driver calls
flow_indr_dev_register(),  go through the list of tracked ingress qdiscs
and call the driver callback entry point so as to give it a chance to
register its callback.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/core: Remove unused field from struct flow_indr_dev
Eli Cohen [Tue, 17 Aug 2021 17:05:17 +0000 (20:05 +0300)]
net/core: Remove unused field from struct flow_indr_dev

rcu field is not used. Remove it.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mii: make mii_ethtool_gset() return void
Pavel Skripkin [Wed, 18 Aug 2021 15:07:09 +0000 (18:07 +0300)]
net: mii: make mii_ethtool_gset() return void

mii_ethtool_gset() does not return any errors. Since there are no users
of this function that rely on its return value, it can be
made void.

Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: pch_gbe: remove mii_ethtool_gset() error handling
Pavel Skripkin [Wed, 18 Aug 2021 15:06:30 +0000 (18:06 +0300)]
net: pch_gbe: remove mii_ethtool_gset() error handling

mii_ethtool_gset() does not return any errors, so error handling can be
omitted to make code more simple.

Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'r8152-bp-settings'
David S. Miller [Thu, 19 Aug 2021 11:19:30 +0000 (12:19 +0100)]
Merge branch 'r8152-bp-settings'

Hayes Wang says:

====================
r8152: fix bp settings

Fix the wrong bp settings of the firmware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8152: fix the maximum number of PLA bp for RTL8153C
Hayes Wang [Thu, 19 Aug 2021 03:05:37 +0000 (11:05 +0800)]
r8152: fix the maximum number of PLA bp for RTL8153C

The maximum PLA bp number of RTL8153C is 16, not 8. That is, the
bp 0 ~ 15 are at 0xfc28 ~ 0xfc46, and the bp_en is at 0xfc48.

Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8152: fix writing USB_BP2_EN
Hayes Wang [Thu, 19 Aug 2021 03:05:36 +0000 (11:05 +0800)]
r8152: fix writing USB_BP2_EN

The register of USB_BP2_EN is 16 bits, so we should use
ocp_write_word(), not ocp_write_byte().

Fixes: 9370f2d05a2a ("support request_firmware for RTL8153")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mptcp-fixes'
David S. Miller [Thu, 19 Aug 2021 11:17:05 +0000 (12:17 +0100)]
Merge branch 'mptcp-fixes'

Mat Martineau says:

====================
mptcp: Bug fixes

Here are two bug fixes for the net tree:

Patch 1 fixes a memory leak that could be encountered when clearing the
list of advertised MPTCP addresses.

Patch 2 fixes a protocol issue early in an MPTCP connection, to ensure
both peers correctly understand that the full MPTCP connection handshake
has completed even when the server side quickly sends an ADD_ADDR
option.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: full fully established support after ADD_ADDR
Matthieu Baerts [Wed, 18 Aug 2021 23:42:37 +0000 (16:42 -0700)]
mptcp: full fully established support after ADD_ADDR

If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR
with HMAC from the server, it is enough to switch to a "fully
established" mode because it has received more MPTCP options.

It was then OK to enable the "fully_established" flag on the MPTCP
socket. Still, best to check if the ADD_ADDR looks valid by looking if
it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received
while we are not in "fully established" mode, it is strange and then
we should not switch to this mode now.

But that is not enough. On one hand, the path-manager has be notified
the state has changed. On the other hand, the "fully_established" flag
on the subflow socket should be turned on as well not to re-send the
MP_CAPABLE 3rd ACK content with the next ACK.

Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: fix memory leak on address flush
Paolo Abeni [Wed, 18 Aug 2021 23:42:36 +0000 (16:42 -0700)]
mptcp: fix memory leak on address flush

The endpoint cleanup path is prone to a memory leak, as reported
by syzkaller:

 BUG: memory leak
 unreferenced object 0xffff88810680ea00 (size 64):
   comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s)
   hex dump (first 32 bytes):
     58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de  Xu}<....".......
     01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00  ................
   backtrace:
     [<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline]
     [<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170
     [<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
     [<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
     [<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
     [<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
     [<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
     [<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
     [<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
     [<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
     [<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline]
     [<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724
     [<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403
     [<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457
     [<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
     [<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     [<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
     [<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We should not require an allocation to cleanup stuff.

Rework the code a bit so that the additional RCU work is no more needed.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ravb-gbit'
David S. Miller [Thu, 19 Aug 2021 11:05:29 +0000 (12:05 +0100)]
Merge branch 'ravb-gbit'

Biju Das says:

====================
ravb: Add Gigabit Ethernet driver support

The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
similar to the R-Car Ethernet AVB IP.

The Gigabit Ethernet IP consists of Ethernet controller (E-MAC), Internal
TCP/IP Offload Engine (TOE)  and Dedicated Direct memory access controller
(DMAC).

With a few changes in the driver we can support both IPs.

Currently a runtime decision based on the chip type is used to distinguish
the HW differences between the SoC families.

This patch series is in preparation for supporting the RZ/G2L SoC by
replacing driver data chip type with struct ravb_hw_info by moving chip
type to it and also adding gstrings_stats, gstrings_size, net_hw_features,
net_features, aligned_tx, stats_len, max_rx_len variables to
it. This patch also adds the feature bit for {RX, TX} clock internal
delays and TX counters HW features found on R-Car Gen3 to struct
ravb_hw_info.

This patch series is based on net-next.
v2->v3:
 * Removed num_gstat_queue variable from struct ravb_hw_info.
 * started using unsigned int for num_tx_desc variable in struct ravb_private
 * split the patch 'Add struct ravb_hw_info to driver data' into two
 * Renamed skb_sz to max_rx_len and tx_drop_cntrs to tx_counters
   and also updated the comments.
v1->v2:
 * Replaced driver data chip type with struct ravb_hw_info
 * Added gstrings_stats, gstrings_size, net_hw_features, net_features,
   num_gstat_queue, num_tx_desc, stats_len, skb_sz to struct ravb_hw_info
 * Added internal_delay and tx_drop_cntrs hw feature bit to struct ravb_hw_info

RFC->V1
  * Incorporated feedback from Andrew, Sergei, Geert and Prabhakar
  * https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=515525
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add tx_counters to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:08:00 +0000 (20:08 +0100)]
ravb: Add tx_counters to struct ravb_hw_info

The register for retrieving TX counters is present only on R-Car Gen3
and RZ/G2L; it is not present on R-Car Gen2.

Add the tx_counters hw feature bit to struct ravb_hw_info, to enable this
feature specifically for R-Car Gen3 now and later extend it to RZ/G2L.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add internal delay hw feature to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:59 +0000 (20:07 +0100)]
ravb: Add internal delay hw feature to struct ravb_hw_info

R-Car Gen3 supports TX and RX clock internal delay modes, whereas R-Car
Gen2 and RZ/G2L do not support it.
Add an internal_delay hw feature bit to struct ravb_hw_info to enable this
only for R-Car Gen3.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add net_features and net_hw_features to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:58 +0000 (20:07 +0100)]
ravb: Add net_features and net_hw_features to struct ravb_hw_info

On R-Car the checksum calculation on RX frames is done by the E-MAC
module, whereas on RZ/G2L it is done by the TOE.

TOE calculates the checksum of received frames from E-MAC and outputs it to
DMAC. TOE also calculates the checksum of transmission frames from DMAC and
outputs it E-MAC.

Add net_features and net_hw_features to struct ravb_hw_info, to support
subsequent SoCs without any code changes in the ravb_probe function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add gstrings_stats and gstrings_size to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:57 +0000 (20:07 +0100)]
ravb: Add gstrings_stats and gstrings_size to struct ravb_hw_info

The device stats strings for R-Car and RZ/G2L are different.

R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".

Add structure variables gstrings_stats and gstrings_size to struct
ravb_hw_info, so that subsequent SoCs can be added without any code
changes in the ravb_get_strings function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add stats_len to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:56 +0000 (20:07 +0100)]
ravb: Add stats_len to struct ravb_hw_info

R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".

Replace RAVB_STATS_LEN macro with a structure variable stats_len to
struct ravb_hw_info, to support subsequent SoCs without any code changes
to the ravb_get_sset_count function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add max_rx_len to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:55 +0000 (20:07 +0100)]
ravb: Add max_rx_len to struct ravb_hw_info

The maximum descriptor size that can be specified on the reception side for
R-Car is 2048 bytes, whereas for RZ/G2L it is 8096.

Add the max_rx_len variable to struct ravb_hw_info for allocating different
RX skb buffer sizes for R-Car and RZ/G2L using the netdev_alloc_skb
function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add aligned_tx to struct ravb_hw_info
Biju Das [Wed, 18 Aug 2021 19:07:54 +0000 (20:07 +0100)]
ravb: Add aligned_tx to struct ravb_hw_info

R-Car Gen2 needs a 4byte aligned address for the transmission buffer,
whereas R-Car Gen3 doesn't have any such restriction.

Add aligned_tx to struct ravb_hw_info to select the driver to choose
between aligned and unaligned tx buffers.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Add struct ravb_hw_info to driver data
Biju Das [Wed, 18 Aug 2021 19:07:53 +0000 (20:07 +0100)]
ravb: Add struct ravb_hw_info to driver data

The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
similar to the R-Car Ethernet AVB IP. With a few changes in the driver we
can support both IPs.

This patch adds the struct ravb_hw_info to hold hw features, driver data
and function pointers to support both the IPs. It also replaces the driver
data chip type with struct ravb_hw_info by moving chip type to it.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoravb: Use unsigned int for num_tx_desc variable in struct ravb_private
Biju Das [Wed, 18 Aug 2021 19:07:52 +0000 (20:07 +0100)]
ravb: Use unsigned int for num_tx_desc variable in struct ravb_private

The number of TX descriptors per packet is an unsigned value and
the variable for holding this information should be unsigned.

This patch replaces the data type of num_tx_desc variable in struct
ravb_private from 'int' to 'unsigned int'.
This patch also updates the data type of local variables to unsigned int,
where the local variables are evaluated using num_tx_desc.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomailmap: update email address of Matthias Fuchs and Thomas Körper
Marc Kleine-Budde [Fri, 6 Aug 2021 10:39:25 +0000 (12:39 +0200)]
mailmap: update email address of Matthias Fuchs and Thomas Körper

Matthias Fuchs's and Thomas Körper's email addresses aren't valid
anymore. Use the newly created role account instead.

Link: https://lore.kernel.org/r/20210809175843.207864-1-mkl@pengutronix.de
Cc: socketcan@esd.eu
Cc: Stefan Mätje <Stefan.Maetje@esd.eu>
Acked-by: Stefan Mätje <stefan.maetje@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agonet/rds: dma_map_sg is entitled to merge entries
Gerd Rausch [Tue, 17 Aug 2021 17:04:37 +0000 (10:04 -0700)]
net/rds: dma_map_sg is entitled to merge entries

Function "dma_map_sg" is entitled to merge adjacent entries
and return a value smaller than what was passed as "nents".

Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
rather than the original "nents" parameter ("sg_len").

This old RDS bug was exposed and reliably causes kernel panics
(using RDMA operations "rds-stress -D") on x86_64 starting with:
commit c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")

Simply put: Linux 5.11 and later.

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
Vladimir Oltean [Tue, 17 Aug 2021 16:04:25 +0000 (19:04 +0300)]
net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port

Currently we are unable to ping a bridge on top of a felix switch which
uses the ocelot-8021q tagger. The packets are dropped on the ingress of
the user port and the 'drop_local' counter increments (the counter which
denotes drops due to no valid destinations).

Dumping the PGID tables, it becomes clear that the PGID_SRC of the user
port is zero, so it has no valid destinations.

But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q
ports) is clearly missing from the forwarding mask of ports that are
under a bridge. So this has always been broken.

Looking at the version history of the patch, in v7
https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/
the code looked like this:

/* Standalone ports forward only to DSA tag_8021q CPU ports */
unsigned long mask = cpu_fwd_mask;

(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask |= ocelot->bridge_fwd_mask & ~BIT(port);

while in v8 (the merged version)
https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/
it looked like this:

unsigned long mask;

(...)
} else if (ocelot->bridge_fwd_mask & BIT(port)) {
mask = ocelot->bridge_fwd_mask & ~BIT(port);

So the breakage was introduced between v7 and v8 of the patch.

Fixes: e21268efbe26 ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet/mlx4: Use ARRAY_SIZE to get an array's size
Jason Wang [Tue, 17 Aug 2021 12:11:06 +0000 (20:11 +0800)]
net/mlx4: Use ARRAY_SIZE to get an array's size

The ARRAY_SIZE macro is defined to get an array's size which is
more compact and more formal in linux source. Thus, we can replace
the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 18 Aug 2021 19:06:42 +0000 (12:06 -0700)]
Merge tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix for cross-rename, adding a missing check for directory
  and subvolume, this could lead to a crash"

* tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: prevent rename2 from exchanging a subvol with a directory from different parents

4 years agoMerge tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Wed, 18 Aug 2021 19:00:27 +0000 (12:00 -0700)]
Merge tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Only a few regression fixes and trivial device quirks"

* tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
  ALSA: hda: Fix hang during shutdown due to link reset
  ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
  ALSA: oxfw: fix functioal regression for silence in Apogee Duet FireWire
  ALSA: hda - fix the 'Capture Switch' value change notifications

4 years agoMerge tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Linus Torvalds [Wed, 18 Aug 2021 18:55:50 +0000 (11:55 -0700)]
Merge tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull clang cfi fix from Kees Cook:

 - Use rcu_read_{un}lock_sched_notrace to avoid recursion (Elliot Berman)

* tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  cfi: Use rcu_read_{un}lock_sched_notrace

4 years agopipe: avoid unnecessary EPOLLET wakeups under normal loads
Linus Torvalds [Thu, 5 Aug 2021 17:04:43 +0000 (10:04 -0700)]
pipe: avoid unnecessary EPOLLET wakeups under normal loads

I had forgotten just how sensitive hackbench is to extra pipe wakeups,
and commit 3a34b13a88ca ("pipe: make pipe writes always wake up
readers") ended up causing a quite noticeable regression on larger
machines.

Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
not clear that this matters in real life all that much, but as Mel
points out, it's used often enough when comparing kernels and so the
performance regression shows up like a sore thumb.

It's easy enough to fix at least for the common cases where pipes are
used purely for data transfer, and you never have any exciting poll
usage at all.  So set a special 'poll_usage' flag when there is polling
activity, and make the ugly "EPOLLET has crazy legacy expectations"
semantics explicit to only that case.

I would love to limit it to just the broken EPOLLET case, but the pipe
code can't see the difference between epoll and regular select/poll, so
any non-read/write waiting will trigger the extra wakeup behavior.  That
is sufficient for at least the hackbench case.

Apart from making the odd extra wakeup cases more explicitly about
EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
write, not at the first write chunk.  That is actually much saner
semantics (as much as you can call any of the legacy edge-triggered
expectations for EPOLLET "sane") since it means that you know the wakeup
will happen once the write is done, rather than possibly in the middle
of one.

[ For stable people: I'm putting a "Fixes" tag on this, but I leave it
  up to you to decide whether you actually want to backport it or not.
  It likely has no impact outside of synthetic benchmarks  - Linus ]

Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Sandeep Patil <sspatil@android.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoplatform/x86: gigabyte-wmi: add support for B450M S2H V2
Thomas Weißschuh [Wed, 18 Aug 2021 16:44:35 +0000 (18:44 +0200)]
platform/x86: gigabyte-wmi: add support for B450M S2H V2

Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-901207693

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20210818164435.99821-1-linux@weissschuh.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
4 years agonet: asix: fix uninit value bugs
Pavel Skripkin [Tue, 17 Aug 2021 16:37:23 +0000 (19:37 +0300)]
net: asix: fix uninit value bugs

Syzbot reported uninit-value in asix_mdio_read(). The problem was in
missing error handling. asix_read_cmd() should initialize passed stack
variable smsr, but it can fail in some cases. Then while condidition
checks possibly uninit smsr variable.

Since smsr is uninitialized stack variable, driver can misbehave,
because smsr will be random in case of asix_read_cmd() failure.
Fix it by adding error handling and just continue the loop instead of
checking uninit value.

Added helper function for checking Host_En bit, since wrong loop was used
in 4 functions and there is no need in copy-pasting code parts.

Cc: Robert Foss <robert.foss@collabora.com>
Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter")
Reported-by: syzbot+a631ec9e717fb0423053@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopktgen: Remove fill_imix_distribution() CONFIG_XFRM dependency
Nick Richardson [Wed, 18 Aug 2021 01:31:26 +0000 (01:31 +0000)]
pktgen: Remove fill_imix_distribution() CONFIG_XFRM dependency

Currently, the declaration of fill_imix_distribution() is dependent
on CONFIG_XFRM. This is incorrect.

Move fill_imix_distribution() declaration out of #ifndef CONFIG_XFRM
block.

Signed-off-by: Nick Richardson <richardsonnick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
Wei Wang [Tue, 17 Aug 2021 19:40:03 +0000 (12:40 -0700)]
net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()

Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
to give more control to the networking stack and enable it to change
memcg charging behavior. In the future, the networking stack may decide
to avoid oom-kills when fallbacks are more appropriate.

One behavior change in mem_cgroup_charge_skmem() by this patch is to
avoid force charging by default and let the caller decide when and if
force charging is needed through the presence or absence of
__GFP_NOFAIL.

Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoovs: clear skb->tstamp in forwarding path
kaixi.fan [Wed, 18 Aug 2021 02:22:15 +0000 (10:22 +0800)]
ovs: clear skb->tstamp in forwarding path

fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoocteontx2-pf: Allow VLAN priority also in ntuple filters
Subbaraya Sundeep [Wed, 18 Aug 2021 06:42:55 +0000 (12:12 +0530)]
octeontx2-pf: Allow VLAN priority also in ntuple filters

VLAN TCI is a 16 bit field which includes Priority(3 bits),
CFI(1 bit) and VID(12 bits). Currently ntuple filters support
installing rules to steer packets based on VID only.
This patch extends that support such that filters can
be installed for entire VLAN TCI.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: vrf: Add test for SNAT over VRF
Lahav Schlesinger [Wed, 18 Aug 2021 08:52:12 +0000 (08:52 +0000)]
selftests: vrf: Add test for SNAT over VRF

Commit 09e856d54bda ("vrf: Reset skb conntrack connection on VRF rcv")
fixes the "reverse-DNAT" of an SNAT-ed packet over a VRF.

This patch adds a test for this scenario.

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mdio-fixes'
David S. Miller [Wed, 18 Aug 2021 09:48:52 +0000 (10:48 +0100)]
Merge branch 'mdio-fixes'

Saravana Kannan says:

====================
Clean up and fix error handling in mdio_mux_init()

This patch series was started due to -EPROBE_DEFER not being handled
correctly in mdio_mux_init() and causing issues [1]. While at it, I also
did some more error handling fixes and clean ups. The -EPROBE_DEFER fix is
the last patch.

Ideally, in the last patch we'd treat any error similar to -EPROBE_DEFER
but I'm not sure if it'll break any board/platforms where some child
mdiobus never successfully registers. If we treated all errors similar to
-EPROBE_DEFER, then none of the child mdiobus will work and that might be a
regression. If people are sure this is not a real case, then I can fix up
the last patch to always fail the entire mdio-mux init if any of the child
mdiobus registration fails.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio-mux: Handle -EPROBE_DEFER correctly
Saravana Kannan [Wed, 18 Aug 2021 03:38:03 +0000 (20:38 -0700)]
net: mdio-mux: Handle -EPROBE_DEFER correctly

When registering mdiobus children, if we get an -EPROBE_DEFER, we shouldn't
ignore it and continue registering the rest of the mdiobus children. This
would permanently prevent the deferring child mdiobus from working instead
of reattempting it in the future. So, if a child mdiobus needs to be
reattempted in the future, defer the entire mdio-mux initialization.

This fixes the issue where PHYs sitting under the mdio-mux aren't
initialized correctly if the PHY's interrupt controller is not yet ready
when the mdio-mux is being probed. Additional context in the link below.

Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.")
Link: https://lore.kernel.org/lkml/CAGETcx95kHrv8wA-O+-JtfH7H9biJEGJtijuPVN0V5dUKUAB3A@mail.gmail.com/#t
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio-mux: Don't ignore memory allocation errors
Saravana Kannan [Wed, 18 Aug 2021 03:38:02 +0000 (20:38 -0700)]
net: mdio-mux: Don't ignore memory allocation errors

If we are seeing memory allocation errors, don't try to continue
registering child mdiobus devices. It's unlikely they'll succeed.

Fixes: 342fa1964439 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio-mux: Delete unnecessary devm_kfree
Saravana Kannan [Wed, 18 Aug 2021 03:38:01 +0000 (20:38 -0700)]
net: mdio-mux: Delete unnecessary devm_kfree

The whole point of devm_* APIs is that you don't have to undo them if you
are returning an error that's going to get propagated out of a probe()
function. So delete unnecessary devm_kfree() call in the error return path.

Fixes: b60161668199 ("mdio: mux: Correct mdio_mux_init error path issues")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: net_namespace: Optimize the code
Yajun Deng [Tue, 17 Aug 2021 15:23:00 +0000 (23:23 +0800)]
net: net_namespace: Optimize the code

There is only one caller for ops_free(), so inline it.
Separate net_drop_ns() and net_free(), so the net_free()
can be called directly.
Add free_exit_list() helper function for free net_exit_list.

====================
v2:
 - v1 does not apply, rebase it.
====================

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: tag_sja1105: be dsa_loop-safe
Vladimir Oltean [Tue, 17 Aug 2021 14:58:47 +0000 (17:58 +0300)]
net: dsa: tag_sja1105: be dsa_loop-safe

Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
sure that every time we dereference dp->priv, we check the switch's
dsa_switch_ops (otherwise we access a struct sja1105_port structure that
is in fact something else).

This adds an unconditional build-time dependency between sja1105 being
built as module => tag_sja1105 must also be built as module. This was
there only for PTP before.

Some sane defaults must also take place when not running on sja1105
hardware. These are:

- sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
  depending on VLAN awareness and switch revision (when an encapsulated
  VLAN must be sent). Default to 0x8100.

- sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
  metadata timestamp frames. When running on non-sja1105 hardware, don't
  do that and accept all frames unmodified.

- sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
  which writes a management route over SPI. When not running on sja1105
  hardware, bypass the SPI write and send the frame as-is.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
Vladimir Oltean [Tue, 17 Aug 2021 14:52:45 +0000 (17:52 +0300)]
net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse

It seems that of_find_compatible_node has a weird calling convention in
which it calls of_node_put() on the "from" node argument, instead of
leaving that up to the caller. This comes from the fact that
of_find_compatible_node with a non-NULL "from" argument it only supposed
to be used as the iterator function of for_each_compatible_node(). OF
iterator functions call of_node_get on the next OF node and of_node_put()
on the previous one.

When of_find_compatible_node calls of_node_put, it actually never
expects the refcount to drop to zero, because the call is done under the
atomic devtree_lock context, and when the refcount drops to zero it
triggers a kobject and a sysfs file deletion, which assume blocking
context.

So any driver call to of_find_compatible_node is probably buggy because
an unexpected of_node_put() takes place.

What should be done is to use the of_get_compatible_child() function.

Fixes: 5a8f09748ee7 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/
Suggested-by: Frank Rowand <frowand.list@gmail.com>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'nci-ext'
David S. Miller [Wed, 18 Aug 2021 09:17:57 +0000 (10:17 +0100)]
Merge branch 'nci-ext'

Bongsu Jeon says:

====================
Update the virtual NCI device driver and add the NCI testcase

This series updates the virtual NCI device driver and NCI selftest code
and add the NCI test case in selftests.

1/8 to use wait queue in virtual device driver.
2/8 to remove the polling code in selftests.
3/8 to fix a typo.
4/8 to fix the next nlattr offset calculation.
5/8 to fix the wrong condition in if statement.
6/8 to add a flag parameter to the Netlink send function.
7/8 to extract the start/stop discovery function.
8/8 to add the NCI testcase in selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Add the NCI testcase reading T4T Tag
Bongsu Jeon [Tue, 17 Aug 2021 13:28:18 +0000 (06:28 -0700)]
selftests: nci: Add the NCI testcase reading T4T Tag

Add the NCI testcase reading T4T Tag that has NFC TEST in plain text.
the virtual device application acts as T4T Tag in this testcase.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Extract the start/stop discovery function
Bongsu Jeon [Tue, 17 Aug 2021 13:28:17 +0000 (06:28 -0700)]
selftests: nci: Extract the start/stop discovery function

To reuse the start/stop discovery code in other testcase, extract the code.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Add the flags parameter for the send_cmd_mt_nla
Bongsu Jeon [Tue, 17 Aug 2021 13:28:16 +0000 (06:28 -0700)]
selftests: nci: Add the flags parameter for the send_cmd_mt_nla

To reuse the send_cmd_mt_nla for NLM_F_REQUEST and NLM_F_DUMP flag,
add the flags parameter to the function.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Fix the wrong condition
Bongsu Jeon [Tue, 17 Aug 2021 13:28:15 +0000 (06:28 -0700)]
selftests: nci: Fix the wrong condition

memcpy should be executed only in case nla_len's value is greater than 0.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Fix the code for next nlattr offset
Bongsu Jeon [Tue, 17 Aug 2021 13:28:14 +0000 (06:28 -0700)]
selftests: nci: Fix the code for next nlattr offset

nlattr could have a padding for 4 bytes alignment. So next nla's offset
should be calculated with a padding.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Fix the typo
Bongsu Jeon [Tue, 17 Aug 2021 13:28:13 +0000 (06:28 -0700)]
selftests: nci: Fix the typo

Fix typo: rep_len -> resp_len

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: nci: Remove the polling code to read a NCI frame
Bongsu Jeon [Tue, 17 Aug 2021 13:28:12 +0000 (06:28 -0700)]
selftests: nci: Remove the polling code to read a NCI frame

Because the virtual NCI device uses Wait Queue, the virtual device
application doesn't need to poll the NCI frame.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: virtual_ncidev: Use wait queue instead of polling
Bongsu Jeon [Tue, 17 Aug 2021 13:28:11 +0000 (06:28 -0700)]
nfc: virtual_ncidev: Use wait queue instead of polling

In previous version, the user level virtual device application that used
this driver should have the polling scheme to read a NCI frame.
To remove this polling scheme, use Wait Queue.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>