]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
4 years agoocteontx2-af: cn10K: support for sched lmtst and other features
Harman Kalra [Thu, 26 Aug 2021 12:33:40 +0000 (18:03 +0530)]
octeontx2-af: cn10K: support for sched lmtst and other features

Enhancing the mailbox scope to support important configurations
like enabling scheduled LMTST, disable LMTLINE prefetch, disable
early completion for ordered LMTST, as per request from the
application. On FLR these configurations will be reset to default.
This patch also adds the 95XXO silicon version to octeontx2 silicon
list.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: marvell: phy-mvebu-a3700-comphy: Remove unsupported modes
Pali Rohár [Fri, 27 Aug 2021 09:27:53 +0000 (11:27 +0200)]
phy: marvell: phy-mvebu-a3700-comphy: Remove unsupported modes

Armada 3700 does not support RXAUI, XFI and neither SFI. Remove unused
macros for these unsupported modes.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 9695375a3f4a ("phy: add A3700 COMPHY support")
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: marvell: phy-mvebu-a3700-comphy: Rename HS-SGMMI to 2500Base-X
Pali Rohár [Fri, 27 Aug 2021 09:27:52 +0000 (11:27 +0200)]
phy: marvell: phy-mvebu-a3700-comphy: Rename HS-SGMMI to 2500Base-X

Comphy phy mode 0x3 is incorrectly named. It is not SGMII but rather
2500Base-X mode which runs at 3.125 Gbps speed.

Rename macro names and comments to 2500Base-X.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 9695375a3f4a ("phy: add A3700 COMPHY support")
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: marvell: phy-mvebu-cp110-comphy: Rename HS-SGMMI to 2500Base-X
Pali Rohár [Fri, 27 Aug 2021 09:27:51 +0000 (11:27 +0200)]
phy: marvell: phy-mvebu-cp110-comphy: Rename HS-SGMMI to 2500Base-X

Comphy phy mode 0x3 is incorrectly named. It is not SGMII but rather
2500Base-X mode which runs at 3.125 Gbps speed.

Rename macro names and comments to 2500Base-X.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: eb6a1fcb53e2 ("phy: mvebu-cp110-comphy: Add SMC call support")
Fixes: c2afb2fef595 ("phy: mvebu-cp110-comphy: Rename the macro handling only Ethernet modes")
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hns3-cleanups'
David S. Miller [Fri, 27 Aug 2021 10:20:21 +0000 (11:20 +0100)]
Merge branch 'hns3-cleanups'

Guangbin Huang says:

====================
net: hns3: add some cleanups

This series includes some cleanups for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: uniform type of function parameter cmd
Hao Chen [Fri, 27 Aug 2021 09:28:24 +0000 (17:28 +0800)]
net: hns3: uniform type of function parameter cmd

The parameter cmd in function definition of hns3_dbg_bd_file_init and
hns3_dbg_common_file_init is used type u32, this patch uniforms them
in function declaration to type u32 too.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: merge some repetitive macros
Peng Li [Fri, 27 Aug 2021 09:28:23 +0000 (17:28 +0800)]
net: hns3: merge some repetitive macros

There are some repetitive macros have same meaning and value, this patch
merges them to make code clean.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: package new functions to simplify hclgevf_mbx_handler code
Peng Li [Fri, 27 Aug 2021 09:28:22 +0000 (17:28 +0800)]
net: hns3: package new functions to simplify hclgevf_mbx_handler code

This patch packages two new function to simplify the function
hclgevf_mbx_handler, and it can reduce the code cycle complexity
and make code more concise.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: remove redundant param to simplify code
Peng Li [Fri, 27 Aug 2021 09:28:21 +0000 (17:28 +0800)]
net: hns3: remove redundant param to simplify code

The param msg_q is redundant, copy &req->msg to
hdev->arq.msg_q[hdev->arq.tail] directly makes code clean.
So removes the redundant param msg_q.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: use memcpy to simplify code
Peng Li [Fri, 27 Aug 2021 09:28:20 +0000 (17:28 +0800)]
net: hns3: use memcpy to simplify code

Use memcpy to copy req->msg.resp_data to resp->additional_info,
to simplify the code and improve a little efficiency.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: remove redundant param mbx_event_pending
Peng Li [Fri, 27 Aug 2021 09:28:19 +0000 (17:28 +0800)]
net: hns3: remove redundant param mbx_event_pending

This patch removes the redundant param mbx_event_pending.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: add hns3_state_init() to do state initialization
Huazhong Tan [Fri, 27 Aug 2021 09:28:18 +0000 (17:28 +0800)]
net: hns3: add hns3_state_init() to do state initialization

To improve the readability and maintainability, add hns3_state_init() to
initialize the state, and this new function will be used to add more state
initialization in the future.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: add macros for mac speeds of firmware command
Guangbin Huang [Fri, 27 Aug 2021 09:28:17 +0000 (17:28 +0800)]
net: hns3: add macros for mac speeds of firmware command

To improve code readability, replace digital numbers of mac speeds
defined by firmware command with macros.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
David S. Miller [Fri, 27 Aug 2021 10:16:29 +0000 (11:16 +0100)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2021-08-27

1) Remove an unneeded extra variable in esp4 esp_ssg_unref.
   From Corey Minyard.

2) Add a configuration option to change the default behaviour
   to block traffic if there is no matching policy.
   Joint work with Christian Langrock and Antony Antony.

3) Fix a shift-out-of-bounce bug reported from syzbot.
   From Pavel Skripkin.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mlx5-updates-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 27 Aug 2021 08:53:31 +0000 (09:53 +0100)]
Merge tag 'mlx5-updates-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
This patch series contains various fixes, additions and improvements to
mlx5 software steering.

Patch 1:
  adds support for REMOVE_HEADER packet reformat - a new reformat type
  that is supported starting with ConnectX-6 DX, and allows removing an
  arbitrary size packet segment at a selected position.

Patches 2 and 3:
  add support for VLAN pop on TX and VLAN push on RX flows.

Patch 4:
  enables retransmission mechanism for the SW Steering RC QP.

Patch 5:
  does some improvements to error flow in building STE array and adds
  a more informative printout of an invalid actions sequence.

Patch 6:
  improves error flow on SW Steering QP error.

Patch 7:
  reduces the log level of a message that is printed when a table is
  connected to a lower/same level destination table, as this case proves to
  be not as rare as it was in the past.

Patch 8:
  adds missing support for matching on IPv6 flow label for devices
  older than ConnectX-6 DX.

Patch 9:
  replaces uintN_t types with kernel-style types.

Patch 10:
  allows for using the right API for updating flow tables - if it is
  a FW-owned table, then FW API will be used.

Patch 11:
  adds support for 'ignore_flow_level' on multi-destination flow
  tables that are created by SW Steering.

Patch 12:
   optimizes FDB RX steering rule by skipping matching on source port,
   as the source port for all incoming packets equals to wire.

Patch 13:
   is a small code refactoring - it merges several DR_STE_SIZE enums
   into a single enum.

Patch 14:
   does some additional refactoring and removes HW-specific STE type
   from NIC domain.

Patch 15:
   removes rehash ctrl struct from dr_htbl struct and saves some memory.

Patch 16:
   does a more significant improvement in terms of memory consumption
   and was able to save about 1.6 Gb for 8M rules.

Patch 17:
   adds support for update FTE, which is needed for cases where there
   are multiple rules with the same match.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mptcp-Optimize-received-options-handling'
David S. Miller [Fri, 27 Aug 2021 08:45:07 +0000 (09:45 +0100)]
Merge branch 'mptcp-Optimize-received-options-handling'

Mat Martineau says:

====================
mptcp: Optimize received options handling

These patches optimize received MPTCP option handling in terms of both
storage and fewer conditionals to evaluate in common cases, and also add
a couple of cleanup patches.

Patches 1 and 5 do some cleanup in checksum option parsing and
clarification of lock handling.

Patches 2 and 3 rearrange struct mptcp_options_received to shrink it
slightly and consolidate frequently used fields in the same cache line.

Patch 4 optimizes incoming MPTCP option parsing to skip many extra
comparisons in the common case where only a DSS option is present.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: make the locking tx schema more readable
Paolo Abeni [Fri, 27 Aug 2021 00:44:54 +0000 (17:44 -0700)]
mptcp: make the locking tx schema more readable

Florian noted the locking schema used by __mptcp_push_pending()
is hard to follow, let's add some more descriptive comments
and drop an unneeded and confusing check.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: optimize the input options processing
Paolo Abeni [Fri, 27 Aug 2021 00:44:53 +0000 (17:44 -0700)]
mptcp: optimize the input options processing

Most MPTCP packets carries a single MPTCP subption: the
DSS containing the mapping for the current packet.

Check explicitly for the above, so that is such scenario we
replace most conditional statements with a single likely() one.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: consolidate in_opt sub-options fields in a bitmask
Paolo Abeni [Fri, 27 Aug 2021 00:44:52 +0000 (17:44 -0700)]
mptcp: consolidate in_opt sub-options fields in a bitmask

This makes input options processing more consistent with
output ones and will simplify the next patch.

Also avoid clearing the suboption field after processing
it, since it's not needed.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: better binary layout for mptcp_options_received
Paolo Abeni [Fri, 27 Aug 2021 00:44:51 +0000 (17:44 -0700)]
mptcp: better binary layout for mptcp_options_received

This change reorder the mptcp_options_received fields
to shrink the structure a bit and to ensure the most
frequently used fields are all in the first cacheline.

Sub-opt specific flags are moved out of the suboptions area,
and we must now explicitly set them when the relevant
suboption is parsed.

There is a notable exception: 'csum_reqd' is used by both DSS
and MPC suboptions, and keeping such field in the suboptions
flag area will simplfy the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: do not set unconditionally csum_reqd on incoming opt
Paolo Abeni [Fri, 27 Aug 2021 00:44:50 +0000 (17:44 -0700)]
mptcp: do not set unconditionally csum_reqd on incoming opt

Should be set only if the ingress packets present it, otherwise
we can confuse csum validation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: enable mid stream window clamp
Neil Spring [Wed, 25 Aug 2021 21:01:17 +0000 (14:01 -0700)]
tcp: enable mid stream window clamp

The TCP_WINDOW_CLAMP socket option is defined in tcp(7) to "Bound the size
of the advertised window to this value."  Window clamping is distributed
across two variables, window_clamp ("Maximal window to advertise" in
tcp.h) and rcv_ssthresh ("Current window clamp").

This patch updates the function where the window clamp is set to also
reduce the current window clamp, rcv_sshthresh, if needed.  With this,
setting the TCP_WINDOW_CLAMP option has the documented effect of limiting
the window.

Signed-off-by: Neil Spring <ntspring@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210825210117.1668371-1-ntspring@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 26 Aug 2021 20:45:47 +0000 (13:45 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet/mlx5: DR, Add support for update FTE
Yevgeny Kliteynik [Thu, 8 Jul 2021 13:58:39 +0000 (16:58 +0300)]
net/mlx5: DR, Add support for update FTE

Add the support for update FTE, which is needed for cases where there are
multiple rules with the same match. In such case fs_core will merge the
actions and call update FTE to update current FTE. Since we don't want to
disrupt the traffic, we will add the new duplicate rule, and only then
remove the old duplicate rule.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Improve rule tracking memory consumption
Yevgeny Kliteynik [Thu, 8 Jul 2021 13:51:29 +0000 (16:51 +0300)]
net/mlx5: DR, Improve rule tracking memory consumption

To track each STE of the rule a rule member was allocated, each
member would point to one STE. This means that we would allocate
40B (rule member) * number of STEs per rule.

To reduce this per rule allocation we use the STE tree pointers
for next_htbl and pointing STE to navigate the tree, this allows
us to keep only the pointer to the last STE of rule (always unique).
From the last rule STE we are able to traverse and rebuild all of
the STEs that construct the rule.

In our testing with 8M rules, each consisting of 7 STES, we were able
to reduce 1.6GB of memory.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Remove rehash ctrl struct from dr_htbl
Yevgeny Kliteynik [Sun, 4 Jul 2021 17:54:26 +0000 (20:54 +0300)]
net/mlx5: DR, Remove rehash ctrl struct from dr_htbl

The calculations to decide for the maximum allowed collision threshold
are simple and there is no reason to save them on the htbl struct.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Remove HW specific STE type from nic domain
Yevgeny Kliteynik [Sun, 4 Jul 2021 17:43:10 +0000 (20:43 +0300)]
net/mlx5: DR, Remove HW specific STE type from nic domain

Instead of using the HW specific STEv0 type, it is better to use
an enum to indicate if this is an RX or TX nic domain.
This means that now we will need to convert the nic domain type
to the corresponding STE type.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Merge DR_STE_SIZE enums
Yevgeny Kliteynik [Sun, 4 Jul 2021 15:01:54 +0000 (18:01 +0300)]
net/mlx5: DR, Merge DR_STE_SIZE enums

Merge DR_STE_SIZE enums - no need for a separate enum for reduced STE size.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Skip source port matching on FDB RX domain
Yevgeny Kliteynik [Sun, 4 Jul 2021 14:48:24 +0000 (17:48 +0300)]
net/mlx5: DR, Skip source port matching on FDB RX domain

The FDB RX pipe is connected to the wire and the source port for all
incoming packets equals to wire, single uplink port per PF, this means
there is no point of matching on the source port in such case.
Once we recognize such case, we will optimize the RX steering rule.
Note that in such case we clean both source_eswitch_owner_vhca_id and
source_port.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Add ignore_flow_level support for multi-dest flow tables
Yevgeny Kliteynik [Sun, 4 Jul 2021 14:42:04 +0000 (17:42 +0300)]
net/mlx5: DR, Add ignore_flow_level support for multi-dest flow tables

When creating an FTE, we might need to create multi-destination flow table,
which is eventually created by FW. In such case, this FW table should
include all the FTE properties as requested by the upper layer, including
the ability to point to another flow table with level lower or equal to
the current table - indicated by the "ignore_flow_level" property.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Use FW API when updating FW-owned flow table
Yevgeny Kliteynik [Sun, 4 Jul 2021 14:29:01 +0000 (17:29 +0300)]
net/mlx5: DR, Use FW API when updating FW-owned flow table

Need to call the DR API only when it is DR table.
To update FW-owned table the driver should call the FW API.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, replace uintN_t with kernel-style types
Yevgeny Kliteynik [Sun, 4 Jul 2021 14:25:11 +0000 (17:25 +0300)]
net/mlx5: DR, replace uintN_t with kernel-style types

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Support IPv6 matching on flow label for STEv0
Yevgeny Kliteynik [Sun, 4 Jul 2021 14:17:33 +0000 (17:17 +0300)]
net/mlx5: DR, Support IPv6 matching on flow label for STEv0

Add missing support for matching on IPv6 flow label for STEv0.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Reduce print level for FT chaining level check
Bodong Wang [Wed, 26 Aug 2020 15:59:54 +0000 (10:59 -0500)]
net/mlx5: DR, Reduce print level for FT chaining level check

There are usecases with Connection Tracking that have such connection
as default, printing this warning in dmesg confuses the user.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Warn and ignore SW steering rule insertion on QP err
Yevgeny Kliteynik [Sun, 4 Jul 2021 08:57:38 +0000 (11:57 +0300)]
net/mlx5: DR, Warn and ignore SW steering rule insertion on QP err

In the event of SW steering QP entering error state, SW steering
cannot insert more rules, and will silently ignore the insertion
after issuing a warning.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Improve error flow in actions_build_ste_arr
Yevgeny Kliteynik [Thu, 24 Sep 2020 17:58:44 +0000 (20:58 +0300)]
net/mlx5: DR, Improve error flow in actions_build_ste_arr

Improve error flow and print actions sequence when an
invalid/unsupported sequence provided.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Enable QP retransmission
Yevgeny Kliteynik [Thu, 24 Sep 2020 17:58:50 +0000 (20:58 +0300)]
net/mlx5: DR, Enable QP retransmission

Under high stress, SW steering might get stuck on polling for completion
that never comes.
For such cases QP needs to have protocol retransmission mechanism enabled.
Currently the retransmission timeout is defined as 0 (unlimited). Fix this
by defining a real timeout.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Enable VLAN pop on TX and VLAN push on RX
Yevgeny Kliteynik [Sun, 27 Jun 2021 20:05:28 +0000 (23:05 +0300)]
net/mlx5: DR, Enable VLAN pop on TX and VLAN push on RX

Enable pop VLAN action in TX and push VLAN in RX.
These actions are supported only on STEv1.

On TX: when a host sends a packet, VLAN is popped at the beginning.
On RX: just before passing the packet to the host the VLAN is pushed.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Split modify VLAN state to separate pop/push states
Yevgeny Kliteynik [Sun, 27 Jun 2021 12:01:12 +0000 (15:01 +0300)]
net/mlx5: DR, Split modify VLAN state to separate pop/push states

Split modify vlan state in the actions state machine to pop vlan
and push vlan states. This enables using of pop/push vlan without
restrictions (e.g. pop vlan on TX in STEv1).

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: DR, Added support for REMOVE_HEADER packet reformat
Yevgeny Kliteynik [Thu, 22 Apr 2021 08:32:56 +0000 (11:32 +0300)]
net/mlx5: DR, Added support for REMOVE_HEADER packet reformat

ConnectX supports offloading of various encapsulations and decapsulations
(e.g. VXLAN), which are performed by 'Packet Reformat' action. Starting
with ConnectX-6 DX, a new reformat type is supported - REMOVE_HEADER, which
allows deleting an arbitrary size chunk at the selected position in the packet.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agoMerge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Thu, 26 Aug 2021 20:26:40 +0000 (13:26 -0700)]
Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
 "This is a one-liner fix for a serious bug that can cause the server to
  become unresponsive to a client, so I think it's worth the last-minute
  inclusion for 5.14"

* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...

4 years agoMerge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 26 Aug 2021 20:20:22 +0000 (13:20 -0700)]
Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from can and bpf.

  Closing three hw-dependent regressions. Any fixes of note are in the
  'old code' category. Nothing blocking release from our perspective.

  Current release - regressions:

   - stmmac: revert "stmmac: align RX buffers"

   - usb: asix: ax88772: move embedded PHY detection as early as
     possible

   - usb: asix: do not call phy_disconnect() for ax88178

   - Revert "net: really fix the build...", from Kalle to fix QCA6390

  Current release - new code bugs:

   - phy: mediatek: add the missing suspend/resume callbacks

  Previous releases - regressions:

   - qrtr: fix another OOB Read in qrtr_endpoint_post

   - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings

  Previous releases - always broken:

   - inet: use siphash in exception handling

   - ip_gre: add validation for csum_start

   - bpf: fix ringbuf helper function compatibility

   - rtnetlink: return correct error on changing device netns

   - e1000e: do not try to recover the NVM checksum on Tiger Lake"

* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
  Revert "net: really fix the build..."
  net: hns3: fix get wrong pfc_en when query PFC configuration
  net: hns3: fix GRO configuration error after reset
  net: hns3: change the method of getting cmd index in debugfs
  net: hns3: fix duplicate node in VLAN list
  net: hns3: fix speed unknown issue in bond 4
  net: hns3: add waiting time before cmdq memory is released
  net: hns3: clear hardware resource when loading driver
  net: fix NULL pointer reference in cipso_v4_doi_free
  rtnetlink: Return correct error on changing device netns
  net: dsa: hellcreek: Adjust schedule look ahead window
  net: dsa: hellcreek: Fix incorrect setting of GCL
  cxgb4: dont touch blocked freelist bitmap after free
  ipv4: use siphash instead of Jenkins in fnhe_hashfun()
  ipv6: use siphash in rt6_exception_hash()
  can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
  net: usb: asix: ax88772: fix boolconv.cocci warnings
  net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
  qede: Fix memset corruption
  net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
  ...

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Thu, 26 Aug 2021 18:26:00 +0000 (11:26 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:
 "We received a report this week that the generic version of
  pfn_valid(), which we switched to this merge window in 16c9afc77660
  ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
  dma_map_resource() due to the following check:

        /* Don't allow RAM to be mapped */
        if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                return DMA_MAPPING_ERROR;

  Since the ongoing saga to determine the semantics of pfn_valid() is
  unlikely to be resolved this week (does it indicate valid memory, or
  just the presence of a struct page, or whether that struct page has
  been initialised?), just revert back to our old version of pfn_valid()
  for 5.14.

  Summary:

   - Fix dma_map_resource() by reverting back to old pfn_valid() code"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"

4 years agoMerge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client
Linus Torvalds [Thu, 26 Aug 2021 18:18:30 +0000 (11:18 -0700)]
Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two memory management fixes for the filesystem"

* tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
  ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
  ceph: correctly handle releasing an embedded cap flush

4 years agoRevert "net: really fix the build..."
Kalle Valo [Thu, 26 Aug 2021 17:28:16 +0000 (20:28 +0300)]
Revert "net: really fix the build..."

This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88.

Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:

qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22

Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.

Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 26 Aug 2021 18:05:11 +0000 (11:05 -0700)]
Merge tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix that I think qualifies for a late merge. It's a revert of
  a one-liner fix that meanwhile got backported to stable kernels and we
  got reports from users.

  The broken fix prevents creating compressed inline extents, which
  could be noticeable on space consumption.

  Technically it's a regression as the patch was merged in 5.14-rc1 but
  got propagated to several stable kernels and has higher exposure than
  a 'typical' development cycle bug"

* tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: compression: don't try to compress if we don't have enough pages"

4 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 26 Aug 2021 15:43:20 +0000 (08:43 -0700)]
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
bpf 2021-08-26

We've added 1 non-merge commit during the last 1 day(s):

1) Fix ringbuf helper function compatibility, from Daniel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix ringbuf helper function compatibility
====================

Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge branch 'net-hns3-add-some-fixes-for-net'
Jakub Kicinski [Thu, 26 Aug 2021 14:24:20 +0000 (07:24 -0700)]
Merge branch 'net-hns3-add-some-fixes-for-net'

Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Link: https://lore.kernel.org/r/1629976921-43438-1-git-send-email-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: fix get wrong pfc_en when query PFC configuration
Guangbin Huang [Thu, 26 Aug 2021 11:22:01 +0000 (19:22 +0800)]
net: hns3: fix get wrong pfc_en when query PFC configuration

Currently, when query PFC configuration by dcbtool, driver will return
PFC enable status based on TC. As all priorities are mapped to TC0 by
default, if TC0 is enabled, then all priorities mapped to TC0 will be
shown as enabled status when query PFC setting, even though some
priorities have never been set.

for example:
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
$ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on

To fix this problem, just returns user's PFC config parameter saved in
driver.

Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: fix GRO configuration error after reset
Yufeng Mo [Thu, 26 Aug 2021 11:22:00 +0000 (19:22 +0800)]
net: hns3: fix GRO configuration error after reset

The GRO configuration is enabled by default after reset. This
is incorrect and should be restored to the user-configured value.
So this restoration is added during reset initialization.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: change the method of getting cmd index in debugfs
Yufeng Mo [Thu, 26 Aug 2021 11:21:59 +0000 (19:21 +0800)]
net: hns3: change the method of getting cmd index in debugfs

Currently, the cmd index is obtained in debugfs by comparing file names.
However, this method may cause errors when processing more complex file
names. So, change this method by saving cmd in private data and comparing
it when getting cmd index in debugfs for optimization.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: fix duplicate node in VLAN list
Guojia Liao [Thu, 26 Aug 2021 11:21:58 +0000 (19:21 +0800)]
net: hns3: fix duplicate node in VLAN list

VLAN list should not be added duplicate VLAN node, otherwise it would
cause "add failed" when restore VLAN from VLAN list, so this patch adds
VLAN ID check before adding node into VLAN list.

Fixes: c6075b193462 ("net: hns3: Record VF vlan tables")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: fix speed unknown issue in bond 4
Yonglong Liu [Thu, 26 Aug 2021 11:21:57 +0000 (19:21 +0800)]
net: hns3: fix speed unknown issue in bond 4

In bond 4, when the link goes down and up repeatedly, the bond may get an
unknown speed, and then this port can not work.

The driver notify netif_carrier_on() before update the link state, when the
bond receive carrier on, will query the speed of the port, if the query
operation happens before updating the link state, will get an unknown
speed. So need to notify netif_carrier_on() after update the link state.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: add waiting time before cmdq memory is released
Yufeng Mo [Thu, 26 Aug 2021 11:21:56 +0000 (19:21 +0800)]
net: hns3: add waiting time before cmdq memory is released

After the cmdq registers are cleared, the firmware may take time to
clear out possible left over commands in the cmdq. Driver must release
cmdq memory only after firmware has completed processing of left over
commands.

Fixes: 232d0d55fca6 ("net: hns3: uninitialize command queue while unloading PF driver")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: hns3: clear hardware resource when loading driver
Yufeng Mo [Thu, 26 Aug 2021 11:21:55 +0000 (19:21 +0800)]
net: hns3: clear hardware resource when loading driver

If a PF is bonded to a virtual machine and the virtual machine exits
unexpectedly, some hardware resource cannot be cleared. In this case,
loading driver may cause exceptions. Therefore, the hardware resource
needs to be cleared when the driver is loaded.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: fix NULL pointer reference in cipso_v4_doi_free
王贇 [Thu, 26 Aug 2021 03:42:42 +0000 (11:42 +0800)]
net: fix NULL pointer reference in cipso_v4_doi_free

In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:

  BUG: kernel NULL pointer dereference, address:
  ...
  RIP: 0010:cipso_v4_doi_free+0x3a/0x80
  ...
  Call Trace:
   netlbl_cipsov4_add_std+0xf4/0x8c0
   netlbl_cipsov4_add+0x13f/0x1b0
   genl_family_rcv_msg_doit.isra.15+0x132/0x170
   genl_rcv_msg+0x125/0x240

This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when 'doi_def->type' equal 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.

This patch just add the check to prevent panic happen for similar
cases.

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'LiteETH-driver'
David S. Miller [Thu, 26 Aug 2021 11:13:52 +0000 (12:13 +0100)]
Merge branch 'LiteETH-driver'

Joel Stanley says:

====================
net: Add LiteETH network driver

This adds a driver for the LiteX network device, LiteEth.

v4 Fixes the bindings and adds r-b tags from Gabriel and Rob.

v3 Updates the bindings to describe the slots in a way that makes more
sense for the hardware, instead of trying to fit some existing
properties. The driver is updated to use these bindings, and fix some
issues pointed out by Gabriel.

v2 Addresses feedback from Jakub, with detailed changes in each patch.

It also moves to the litex register accessors so the system works on big
endian litex platforms. I tested with mor1k on an Arty A7-100T.

I have removed the mdio aspects of the driver as they are not needed for
basic operation. I will continue to work on adding support in the
future, but I don't think it needs to block the mac driver going in.

The binding describes the mdio registers, and has been fixed to not show
any warnings against dtschema master.

LiteEth is a simple driver for the FPGA based Ethernet device used in various
RISC-V, PowerPC's microwatt, OpenRISC's mor1k and other FPGA based
systems on chip.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Add driver for LiteX's LiteETH network interface
Joel Stanley [Wed, 25 Aug 2021 22:21:06 +0000 (07:51 +0930)]
net: Add driver for LiteX's LiteETH network interface

LiteX is a soft system-on-chip that targets FPGAs. LiteETH is a basic
network device that is commonly used in LiteX designs.

The driver was first written in 2017 and has been maintained by the
LiteX community in various trees. Thank you to all who have contributed.

Co-developed-by: Gabriel Somlo <gsomlo@gmail.com>
Co-developed-by: David Shah <dave@ds0.me>
Co-developed-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Tested-by: Gabriel Somlo <gsomlo@gmail.com>
Reviewed-by: Gabriel Somlo <gsomlo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: net: Add bindings for LiteETH
Joel Stanley [Wed, 25 Aug 2021 22:21:05 +0000 (07:51 +0930)]
dt-bindings: net: Add bindings for LiteETH

LiteETH is a small footprint and configurable Ethernet core for FPGA
based system on chips.

The hardware is parametrised by the size and number of the slots in it's
receive and send buffers. These are described as properties, with the
commonly used values set as the default.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agortnetlink: Return correct error on changing device netns
Andrey Ignatov [Thu, 26 Aug 2021 00:25:40 +0000 (17:25 -0700)]
rtnetlink: Return correct error on changing device netns

Currently when device is moved between network namespaces using
RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and
target namespace already has device with same name, userspace will get
EINVAL what is confusing and makes debugging harder.

Fix it so that userspace gets more appropriate EEXIST instead what makes
debugging much easier.

Before:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: Invalid argument

After:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: File exists

The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.

For example, __rtnl_newlink() does this:

net/core/rtnetlink.c
    3270 char ifname[IFNAMSIZ];
     ...
    3286 if (tb[IFLA_IFNAME])
    3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
    3288 else
    3289 ifname[0] = '\0';
     ...
    3364 if (dev) {
     ...
    3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
    3395 }

, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.

But the pat (pattern) in __dev_change_net_namespace() is used as:

net/core/dev.c
   11198 err = -EEXIST;
   11199 if (__dev_get_by_name(net, dev->name)) {
   11200 /* We get here if we can't use the current device name */
   11201 if (!pat)
   11202 goto out;
   11203 err = dev_get_valid_name(net, dev, pat);
   11204 if (err < 0)
   11205 goto out;
   11206 }

As the result the `goto out` path on line 11202 is neven taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.

Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp: ocp: Simplify Kconfig.
Jonathan Lemon [Wed, 25 Aug 2021 21:17:33 +0000 (14:17 -0700)]
ptp: ocp: Simplify Kconfig.

Remove the 'imply' statements, these apparently are not doing
what I expected.  Platform modules which are used by the driver
still need to be enabled in the overall config for them to be
used, but there isn't a hard dependency on them.

Use 'depend' for selectable modules which provide functions
used directly by the driver.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: add rtl_enable_exit_l1
Heiner Kallweit [Wed, 25 Aug 2021 16:29:48 +0000 (18:29 +0200)]
r8169: add rtl_enable_exit_l1

This adds a function for what has been magic register writes so far.
It's based on recent changes to vendor drivers r8101, r8168, r8125,
and deals with events that trigger an early ASPM L1 exit.
Description of the bits has been kindly provided by Realtek.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests/net: allow GRO coalesce test on veth
Paolo Abeni [Thu, 26 Aug 2021 07:30:42 +0000 (09:30 +0200)]
selftests/net: allow GRO coalesce test on veth

This change extends the existing GRO coalesce test to
allow running on top of a veth pair, so that no H/W dep
is required to run them.

By default gro.sh will use the veth backend, and will try
to use exiting H/W in loopback mode if a specific device
name is provided with the '-i' command line option.

No functional change is intended for the loopback-based
tests, just move all the relevant initialization/cleanup
code into the related script.

Introduces a new initialization helper script for the
veth backend, and plugs the correct helper script according
to the provided command line.

Additionally, enable veth-based tests by default.

v1 -> v2:
  - drop unused code in setup_veth_ns() - Willem

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mac80211-next-for-net-next-2021-08-26' of git://git.kernel.org/pub/scm...
David S. Miller [Thu, 26 Aug 2021 09:47:43 +0000 (10:47 +0100)]
Merge tag 'mac80211-next-for-net-next-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A few more things:
 * Use correct DFS domain for self-managed devices
 * some preparations for transmit power element handling
   and other 6 GHz regulatory handling
 * TWT support in AP mode in mac80211
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosock: remove one redundant SKB_FRAG_PAGE_ORDER macro
Yunsheng Lin [Thu, 26 Aug 2021 02:49:47 +0000 (10:49 +0800)]
sock: remove one redundant SKB_FRAG_PAGE_ORDER macro

Both SKB_FRAG_PAGE_ORDER are defined to the same value in
net/core/sock.c and drivers/vhost/net.c.

Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h,
as both net/core/sock.c and drivers/vhost/net.c include it,
and it seems a reasonable file to put the macro.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoocteontx2-pf: cn10k: Fix error return code in otx2_set_flowkey_cfg()
Yang Yingliang [Wed, 25 Aug 2021 06:34:47 +0000 (14:34 +0800)]
octeontx2-pf: cn10k: Fix error return code in otx2_set_flowkey_cfg()

If otx2_mbox_get_rsp() fails, otx2_set_flowkey_cfg() need return an
error code.

Fixes: e7938365459f ("octeontx2-pf: Fix algorithm index in MCAM rules with RSS action")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'dsa-hellcreek-fixes'
David S. Miller [Thu, 26 Aug 2021 09:26:06 +0000 (10:26 +0100)]
Merge branch 'dsa-hellcreek-fixes'

Kurt Kanzenbach says:

====================
net: dsa: hellcreek: 802.1Qbv Fixes

while using TAPRIO offloading on the Hirschmann hellcreek switch, I've noticed
two issues in the current implementation:

1. The gate control list is incorrectly programmed
2. The admin base time is not set properly

Fix it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: hellcreek: Adjust schedule look ahead window
Kurt Kanzenbach [Wed, 25 Aug 2021 13:58:13 +0000 (15:58 +0200)]
net: dsa: hellcreek: Adjust schedule look ahead window

Traffic schedules can only be started up to eight seconds within the
future. Therefore, the driver periodically checks every two seconds whether the
admin base time provided by the user is inside that window. If so the schedule
is started. Otherwise the check is deferred.

However, according to the programming manual the look ahead window size should
be four - not eight - seconds. By using the proposed value of four seconds
starting a schedule at a specified admin base time actually works as expected.

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: hellcreek: Fix incorrect setting of GCL
Kurt Kanzenbach [Wed, 25 Aug 2021 13:58:12 +0000 (15:58 +0200)]
net: dsa: hellcreek: Fix incorrect setting of GCL

Currently the gate control list which is programmed into the hardware is
incorrect resulting in wrong traffic schedules. The problem is the loop
variables are incremented before they are referenced. Therefore, move the
increment to the end of the loop.

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocxgb4: dont touch blocked freelist bitmap after free
Rahul Lakkireddy [Wed, 25 Aug 2021 21:29:42 +0000 (02:59 +0530)]
cxgb4: dont touch blocked freelist bitmap after free

When adapter init fails, the blocked freelist bitmap is already freed
up and should not be touched. So, move the bitmap zeroing closer to
where it was successfully allocated. Also handle adapter init failure
unwind path immediately and avoid setting up RDMA memory windows.

Fixes: 5b377d114f2b ("cxgb4: Add debugfs facility to inject FL starvation")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'inet-siphash'
David S. Miller [Thu, 26 Aug 2021 09:20:34 +0000 (10:20 +0100)]
Merge branch 'inet-siphash'

Eric Dumazet says:

====================
inet: use siphash in exception handling

A group of security researchers brought to our attention
the weakness of hash functions used in rt6_exception_hash()
and fnhe_hashfun()

I made two distinct patches to help backports, since IPv6
part was added in 4.15
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: use siphash instead of Jenkins in fnhe_hashfun()
Eric Dumazet [Wed, 25 Aug 2021 23:17:29 +0000 (16:17 -0700)]
ipv4: use siphash instead of Jenkins in fnhe_hashfun()

A group of security researchers brought to our attention
the weakness of hash function used in fnhe_hashfun().

Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.

Also remove the inline keyword, this really is distracting.

Fixes: d546c621542d ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: use siphash in rt6_exception_hash()
Eric Dumazet [Wed, 25 Aug 2021 23:17:28 +0000 (16:17 -0700)]
ipv6: use siphash in rt6_exception_hash()

A group of security researchers brought to our attention
the weakness of hash function used in rt6_exception_hash()

Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.

Following patch deals with IPv4.

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocfg80211: use wiphy DFS domain if it is self-managed
Sriram R [Wed, 25 Aug 2021 23:38:50 +0000 (05:08 +0530)]
cfg80211: use wiphy DFS domain if it is self-managed

Currently during CAC start or other radar events, the DFS
domain is fetched from cfg based on global DFS domain,
even if the wiphy regdomain disagrees.

But this could be different in case of self managed wiphy's
in case the self managed driver updates its database or supports
regions which has DFS domain set to UNSET in cfg80211 local
regdomain.

So for explicitly self-managed wiphys, just use their DFS
domain.

Signed-off-by: Sriram R <srirrama@codeaurora.org>
Link: https://lore.kernel.org/r/1629934730-16388-1-git-send-email-srirrama@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 years agoMerge branch 'ionic-next'
David S. Miller [Thu, 26 Aug 2021 08:41:50 +0000 (09:41 +0100)]
Merge branch 'ionic-next'

Shannon Nelson says:

====================
ionic: queue and filter mgmt updates

After a pair of simple code cleanups, we change the mac filter
management to split the updates between the driver's filter
list and the device's filter list so that we can keep the calls
to dev_uc_sync() and dev_mc_sync() under the netif_addr_lock
in ndo_set_rx_mode, and then sync the driver's list to the
device later in the rx_mode work task.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoionic: handle mac filter overflow
Shannon Nelson [Thu, 26 Aug 2021 01:24:50 +0000 (18:24 -0700)]
ionic: handle mac filter overflow

Make sure we go into PROMISC mode when we have too many
filters by specifically counting the filters that successfully
get saved to the firmware.

The device advertises max_ucast_filters and max_mcast_filters,
but really only has max_ucast_filters slots available for
uc and mc filters combined.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoionic: refactor ionic_lif_addr to remove a layer
Shannon Nelson [Thu, 26 Aug 2021 01:24:49 +0000 (18:24 -0700)]
ionic: refactor ionic_lif_addr to remove a layer

The filter counting in ionic_lif_addr() really isn't useful,
and potentially misleading, especially when we're checking in
ionic_lif_rx_mode() to see if we need to go into PROMISC mode.
We can safely refactor this and remove a calling layer.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoionic: sync the filters in the work task
Shannon Nelson [Thu, 26 Aug 2021 01:24:48 +0000 (18:24 -0700)]
ionic: sync the filters in the work task

In order to separate the atomic needs of __dev_uc_sync()
and __dev_mc_sync() from the safe rx_mode handling, we need
to have the ndo handler manipulate the driver's filter list,
and later have the driver sync the filters to the firmware,
outside of the atomic context.

Here we put __dev_mc_sync() and __dev_uc_sync() back into the
ndo callback to give them their netif_addr_lock context and
have them update the driver's filter list, flagging changes
that should be made to the device filter list.  Later, in the
rx_mode handler, we read those hints and sync up the device's
list as needed.

It is possible for multiple add/delete requests to come from
the stack before the rx_mode task processes the list, but the
handling of the sync status flag should keep everything sorted
correctly.  For example, if a delete of an existing filter is
followed by another add before the rx_mode task is run, as can
happen when going in and out of a bond, the add will cancel
the delete and no actual changes will be sent to the device.

We also add a check in the watchdog to see if there are any
stray unsync'd filters, possibly left over from a filter
overflow and waiting to get sync'd after some other filter
gets removed to make room.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoionic: flatten calls to set-rx-mode
Shannon Nelson [Thu, 26 Aug 2021 01:24:47 +0000 (18:24 -0700)]
ionic: flatten calls to set-rx-mode

Since only two functions call through ionic_set_rx_mode(), one
that can sleep and one that can't, we can split the function
and put the bits of code into the callers.  This removes an
unnecessary calling layer.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoionic: remove old work task types
Shannon Nelson [Thu, 26 Aug 2021 01:24:46 +0000 (18:24 -0700)]
ionic: remove old work task types

With the move of mac filter handling to outside of the
ndo_rx_mode context using the IONIC_DW_TYPE_RX_MODE,
we no longer are using IONIC_DW_TYPE_RX_ADDR_ADD and
IONIC_DW_TYPE_RX_ADDR_DEL and they can be removed.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'linux-can-fixes-for-5.14-20210826' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 26 Aug 2021 08:37:40 +0000 (09:37 +0100)]
Merge tag 'linux-can-fixes-for-5.14-20210826' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine says:

====================
pull-request: can 2021-08-26

this is a pull request of a single patch for net/master.

Stefan Mätje's patch fixes the interchange of RX and TX error counters
inthe esd_usb2 CAN driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomac80211: parse transmit power envelope element
Wen Gong [Fri, 20 Aug 2021 12:20:40 +0000 (08:20 -0400)]
mac80211: parse transmit power envelope element

Parse and store the transmit power envelope element.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/20210820122041.12157-8-wgong@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 years agoieee80211: add definition for transmit power envelope element
Wen Gong [Fri, 20 Aug 2021 12:20:39 +0000 (08:20 -0400)]
ieee80211: add definition for transmit power envelope element

IEEE Std 802.11ax™-2021 makes changes to the transmit power envelope
element, adjust the code accordingly.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/20210820122041.12157-7-wgong@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 years agoieee80211: add definition of regulatory info in 6 GHz operation information
Wen Gong [Fri, 20 Aug 2021 12:20:35 +0000 (08:20 -0400)]
ieee80211: add definition of regulatory info in 6 GHz operation information

IEEE Std 802.11ax™-2021 added regulatory info subfield in HE operation
element, add it to the header file.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/20210820122041.12157-3-wgong@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
4 years agonfc: st95hf: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:59 +0000 (16:24 +0200)]
nfc: st95hf: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: st21nfca: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:58 +0000 (16:24 +0200)]
nfc: st21nfca: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: st-nci: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:57 +0000 (16:24 +0200)]
nfc: st-nci: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: pn544: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:56 +0000 (16:24 +0200)]
nfc: pn544: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: mrvl: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:55 +0000 (16:24 +0200)]
nfc: mrvl: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: microread: remove unused header includes
Krzysztof Kozlowski [Wed, 25 Aug 2021 14:24:54 +0000 (16:24 +0200)]
nfc: microread: remove unused header includes

Do not include unnecessary headers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocan: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX...
Stefan Mätje [Wed, 25 Aug 2021 21:52:27 +0000 (23:52 +0200)]
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters

This patch fixes the interchanged fetch of the CAN RX and TX error
counters from the ESD_EV_CAN_ERROR_EXT message. The RX error counter
is really in struct rx_msg::data[2] and the TX error counter is in
struct rx_msg::data[3].

Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device")
Link: https://lore.kernel.org/r/20210825215227.4947-2-stefan.maetje@esd.eu
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agonet: usb: asix: ax88772: fix boolconv.cocci warnings
kernel test robot [Wed, 25 Aug 2021 18:35:38 +0000 (20:35 +0200)]
net: usb: asix: ax88772: fix boolconv.cocci warnings

drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here

 Remove unneeded conversion to bool

Semantic patch information:
 Relational and logical operators evaluate to bool,
 explicit conversion is overly verbose and unneeded.

Generated by: scripts/coccinelle/misc/boolconv.cocci

Fixes: 7a141e64cf14 ("net: usb: asix: ax88772: move embedded PHY detection as early as possible")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoSUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
Trond Myklebust [Wed, 25 Aug 2021 19:33:14 +0000 (15:33 -0400)]
SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...

If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
flag on the socket. Among other things, this make it impossible to close
the socket.

Fixes: 82011c80b3ec ("SUNRPC: Move svc_xprt_received() call sites")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 25 Aug 2021 19:45:31 +0000 (12:45 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "2 patches.

  Subsystems affected by this patch series: mm/memory-hotplug and
  MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MAINTAINERS: exfat: update my email address
  mm/memory_hotplug: fix potential permanent lru cache disable

4 years agoMAINTAINERS: exfat: update my email address
Namjae Jeon [Wed, 25 Aug 2021 19:17:58 +0000 (12:17 -0700)]
MAINTAINERS: exfat: update my email address

My email address in exfat entry will be not available in a few days.
Update it to my own kernel.org address.

Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: fix potential permanent lru cache disable
Miaohe Lin [Wed, 25 Aug 2021 19:17:55 +0000 (12:17 -0700)]
mm/memory_hotplug: fix potential permanent lru cache disable

If offline_pages failed after lru_cache_disable(), it forgot to do
lru_cache_enable() in error path.  So we would have lru cache disabled
permanently in this case.

Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com
Fixes: d479960e44f2 ("mm: disable LRU pagevec during the migration temporarily")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agopipe: do FASYNC notifications for every pipe IO, not just state changes
Linus Torvalds [Tue, 24 Aug 2021 17:39:25 +0000 (10:39 -0700)]
pipe: do FASYNC notifications for every pipe IO, not just state changes

It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.

Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable).  User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.

But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).

The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.

This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong.  What matters is that it used to work.

So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes.  It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.

Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed.  Probably because the stress-ng problem case ends up
being timing-dependent too.

It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
writes always wake up readers") only to be broken again when by commit
3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").

And at that point the kernel test robot noticed the performance
refression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.

Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case.  FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.

Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Wed, 25 Aug 2021 16:56:10 +0000 (09:56 -0700)]
Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ucount fixes from Eric Biederman:
 "This branch fixes a regression that made it impossible to increase
  rlimits that had been converted to the ucount infrastructure, and also
  fixes a reference counting bug where the reference was not incremented
  soon enough.

  The fixes are trivial and the bugs have been encountered in the wild,
  and the fixes have been tested"

* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Increase ucounts reference counter before the security hook
  ucounts: Fix regression preventing increasing of rlimits in init_user_ns

4 years agoceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
Tuo Li [Thu, 5 Aug 2021 15:14:34 +0000 (08:14 -0700)]
ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()

kcalloc() is called to allocate memory for m->m_info, and if it fails,
ceph_mdsmap_destroy() behind the label out_err will be called:
  ceph_mdsmap_destroy(m);

In ceph_mdsmap_destroy(), m->m_info is dereferenced through:
  kfree(m->m_info[i].export_targets);

To fix this possible null-pointer dereference, check m->m_info before the
for loop to free m->m_info[i].export_targets.

[ jlayton: fix up whitespace damage
   only kfree(m->m_info) if it's non-NULL ]

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agoceph: correctly handle releasing an embedded cap flush
Xiubo Li [Wed, 18 Aug 2021 13:38:42 +0000 (21:38 +0800)]
ceph: correctly handle releasing an embedded cap flush

The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.

When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.

Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.

At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads.  It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>