]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
5 years agoRDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
Luke Starrett [Thu, 21 Nov 2019 06:22:21 +0000 (01:22 -0500)]
RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series

In the first version of Gen P5 ASIC, chip-id was always set to 0x1750 for
all adaptor port configurations. This has been fixed in the new chip rev.

Due to this missing fix users are not able to use adaptors based on latest
chip rev of Broadcom's Gen P5 adaptors.

Fixes: ae8637e13185 ("RDMA/bnxt_re: Add chip context to identify 57500 series")
Link: https://lore.kernel.org/r/1574317343-23300-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Luke Starrett <luke.starrett@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Fix Kconfig indentation
Krzysztof Kozlowski [Wed, 20 Nov 2019 13:41:38 +0000 (21:41 +0800)]
RDMA/bnxt_re: Fix Kconfig indentation

Adjust indentation from spaces to tab (+optional two spaces) as in coding
style with command like:
$ sed -e 's/^        /\t/' -i */Kconfig

Link: https://lore.kernel.org/r/20191120134138.15245-1-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'ib-guids' into rdma.git for-next
Jason Gunthorpe [Fri, 22 Nov 2019 20:08:34 +0000 (16:08 -0400)]
Merge branch 'ib-guids' into rdma.git for-next

Danit Goldberg says:

====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.

The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.

The iproute2 was extended too to present those GUIDs.

- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
    ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'ib-guids': (35 commits)
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs

  net/mlx5: Add new chain for netfilter flow table offload
  net/mlx5: Refactor creating fast path prio chains
  net/mlx5: Accumulate levels for chains prio namespaces
  net/mlx5: Define fdb tc levels per prio
  net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
  net/mlx5: Simplify fdb chain and prio eswitch defines
  IB/mlx5: Load profile according to RoCE enablement state
  IB/mlx5: Rename profile and init methods
  net/mlx5: Handle "enable_roce" devlink param
  net/mlx5: Document flow_steering_mode devlink param
  devlink: Add new "enable_roce" generic device param
  net/mlx5: fix spelling mistake "metdata" -> "metadata"
  net/mlx5: fix kvfree of uninitialized pointer spec
  IB/mlx5: Introduce and use mlx5_core_is_vf()
  net/mlx5: E-switch, Enable metadata on own vport
  net/mlx5: Refactor ingress acl configuration
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Implement callbacks for getting VFs GUID attributes
Danit Goldberg [Wed, 6 Nov 2019 13:18:12 +0000 (15:18 +0200)]
IB/mlx5: Implement callbacks for getting VFs GUID attributes

Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW
for VFs attributes and return node and port GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoIB/ipoib: Add ndo operation for getting VFs GUID attributes
Danit Goldberg [Wed, 6 Nov 2019 12:57:00 +0000 (14:57 +0200)]
IB/ipoib: Add ndo operation for getting VFs GUID attributes

Add ndo operation to the network driver that enables configuring
ipoib_get_vf_guid operation. The operation allows to get a VF port
and node GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoIB/core: Add interfaces to get VF node and port GUIDs
Danit Goldberg [Wed, 6 Nov 2019 13:08:32 +0000 (15:08 +0200)]
IB/core: Add interfaces to get VF node and port GUIDs

Provide ability to get node and port GUIDs of VFs to be symmetrical
to already existing set option.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/core: Add support for getting VF GUIDs
Danit Goldberg [Wed, 6 Nov 2019 13:30:07 +0000 (15:30 +0200)]
net/core: Add support for getting VF GUIDs

Introduce a new ndo: ndo_get_vf_guid, to get from the net
device the port and node GUID.

New applications can choose to use this interface to show
GUIDs with iproute2 with commands such as:

- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset
Michal Kalderon [Mon, 18 Nov 2019 15:06:45 +0000 (17:06 +0200)]
RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset

When running against rdma-core that doesn't support doorbell recovery, the
rdma_user_mmap_entry won't be allocated for doorbell recovery related
mappings.

We have a flag indicating whether rdma-core supports doorbell recovery or
not which was used during initialization, however some cases didn't check
that the rdma_user_mmap_entry exists before attempting to acquire it's
offset.

Fixes: 97f612509294 ("RDMA/qedr: Add doorbell overflow recovery support")
Link: https://lore.kernel.org/r/20191118150645.26602-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cm: Use refcount_t type for refcount variable
Danit Goldberg [Sun, 17 Nov 2019 13:33:21 +0000 (15:33 +0200)]
RDMA/cm: Use refcount_t type for refcount variable

This atomic in struct cm_id_private is being used as a refcount, change it
to refcount_t for better clarity and to get the refcount protections.

Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Support extended number of strides for Striding RQ
Mark Zhang [Fri, 15 Nov 2019 15:45:55 +0000 (17:45 +0200)]
IB/mlx5: Support extended number of strides for Striding RQ

Extends the minimum single WQE strides from 64 to 8, which is exposed
by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps.
Choose right number of strides based on FW capability.

Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx4: Update HW GID table while adding vlan GID
Danit Goldberg [Fri, 15 Nov 2019 15:44:57 +0000 (17:44 +0200)]
IB/mlx4: Update HW GID table while adding vlan GID

When adding a new GID compare the vlan along with the GID and type. This
allows vlan's to have GIDs that alias each other, such as the default
GID. Otherwise they the GID cache view can become inconsistent with the HW
view.

Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
Christophe JAILLET [Mon, 23 Sep 2019 19:07:46 +0000 (21:07 +0200)]
RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'

We should jump to fail3 in order to undo the 'xa_insert_irq()' call.

Link: https://lore.kernel.org/r/20190923190746.10964-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Use ACK timeout for RoCE packetLifeTime
Dag Moxnes [Wed, 30 Oct 2019 12:44:00 +0000 (13:44 +0100)]
RDMA/cma: Use ACK timeout for RoCE packetLifeTime

The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME,
for the PacketLifeTime, as it can not be determined from the network.
This value might not be optimal for all networks.

The cma module supports the function rdma_set_ack_timeout to set the ACK
timeout for a QP associated with a connection. As per IBTA 12.7.34 local
ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay).  Assuming a
negligible local ACK delay, we can use PacketLifeTime = local ACK
timeout/2 as a reasonable approximation for RoCE networks.

Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/umem: remove the dmasync argument to ib_umem_get
Christoph Hellwig [Wed, 13 Nov 2019 07:32:14 +0000 (08:32 +0100)]
IB/umem: remove the dmasync argument to ib_umem_get

The argument is always ignored, so remove it.

Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agodma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag
Christoph Hellwig [Wed, 13 Nov 2019 07:32:13 +0000 (08:32 +0100)]
dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag

This flag is not implemented by any backend and only set by the ib_umem
module in a single instance.

Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/efa: Clear the admin command buffer prior to its submission
Gal Pressman [Tue, 12 Nov 2019 09:26:08 +0000 (11:26 +0200)]
RDMA/efa: Clear the admin command buffer prior to its submission

We cannot rely on the entry memcpy as we only copy the actual size of the
command, the rest of the bytes must be memset to zero.

Currently providing non-zero memory will not have any user visible impact.
However, since admin commands are extendable (in a backwards compatible
way) everything beyond the size of the command must be cleared to prevent
issues in the future.

Fixes: 0420e542569b ("RDMA/efa: Implement functions that submit and complete admin commands")
Link: https://lore.kernel.org/r/20191112092608.46964-1-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/siw: Cleanup unused mmap structures.
Bernard Metzler [Wed, 13 Nov 2019 15:34:04 +0000 (16:34 +0100)]
RDMA/siw: Cleanup unused mmap structures.

Removes obsolete driver specific mmap information after
generalization of RDMA driver mmap service. Also removes
useless forward declaration of struct siw_mr.

Fixes: 11f1a75567c4 ("RDMA/siw: Use the common mmap_xa helpers")
Link: https://lore.kernel.org/r/20191113153404.7402-1-bmt@zurich.ibm.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Make qedr_iw_load_qp() static
Kamal Heib [Sun, 10 Nov 2019 11:36:45 +0000 (13:36 +0200)]
RDMA/qedr: Make qedr_iw_load_qp() static

The function qedr_iw_load_qp() is only used in qedr_iw_cm.c

Fixes: 82af6d19d8d9 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr")
Link: https://lore.kernel.org/r/20191110113645.20058-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Fix spelling mistake in variable name
Colin Ian King [Thu, 7 Nov 2019 22:48:55 +0000 (22:48 +0000)]
RDMA/ocrdma: Fix spelling mistake in variable name

There is a spelling mistake in the variable nak_invalid_requst_errors,
rename it to nak_invalid_request_errors.

Link: https://lore.kernel.org/r/20191107224855.417647-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qib: Validate ->show()/store() callbacks before calling them
Viresh Kumar [Thu, 7 Nov 2019 03:20:25 +0000 (08:50 +0530)]
RDMA/qib: Validate ->show()/store() callbacks before calling them

The permissions of the read-only or write-only sysfs files can be
changed (as root) and the user can then try to read a write-only file or
write to a read-only file which will lead to kernel crash here.

Protect against that by always validating the show/store callbacks.

Link: https://lore.kernel.org/r/d45cc26361a174ae12dbb86c994ef334d257924b.1573096807.git.viresh.kumar@linaro.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/i40iw: Fix potential use after free
Pan Bian [Wed, 6 Nov 2019 06:44:11 +0000 (14:44 +0800)]
RDMA/i40iw: Fix potential use after free

Release variable dst after logging dst->error to avoid possible use after
free.

Link: https://lore.kernel.org/r/1573022651-37171-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Fix potential use after free
Pan Bian [Wed, 6 Nov 2019 06:23:54 +0000 (14:23 +0800)]
RDMA/qedr: Fix potential use after free

Move the release operation after error log to avoid possible use after
free.

Link: https://lore.kernel.org/r/1573021434-18768-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/mlx5: Add new chain for netfilter flow table offload
Paul Blakey [Mon, 11 Nov 2019 23:34:29 +0000 (00:34 +0100)]
net/mlx5: Add new chain for netfilter flow table offload

Netfilter tables (nftables) implements a software datapath that
comes after tc ingress datapath. The datapath supports offloading
such rules via the flow table offload API.

This API is currently only used by NFT and it doesn't provide the
global priority in regards to tc offload, so we assume offloading such
rules must come after tc. It does provide a flow table priority
parameter, so we need to provide some supported priority range.

For that, split fastpath prio to two, flow table offload and tc offload,
with one dedicated priority chain for flow table offload.

Next patch will re-use the multi chain API to access this chain by
allowing access to this chain by the fdb_sub_namespace.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Refactor creating fast path prio chains
Paul Blakey [Mon, 11 Nov 2019 23:34:28 +0000 (00:34 +0100)]
net/mlx5: Refactor creating fast path prio chains

Next patch will re-use this to add a new chain but in a
different prio.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Accumulate levels for chains prio namespaces
Paul Blakey [Mon, 11 Nov 2019 23:34:27 +0000 (00:34 +0100)]
net/mlx5: Accumulate levels for chains prio namespaces

Tc chains are implemented by creating a chained prio steering type, and
inside it there is a namespace for each chain (FDB_TC_MAX_CHAINS). Each
of those has a list of priorities.

Currently, all namespaces in a prio start at the parent prio level.
But since we can jump from chain (namespace) to another chain in the
same prio, we need the levels for higher chains to be higher as well.
So we created unused prios to account for levels in previous namespaces.

Fix that by accumulating the namespaces levels if we are inside a chained
type prio, and removing the unused prios.

Fixes: 328edb499f99 ('net/mlx5: Split FDB fast path prio to multiple namespaces')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Define fdb tc levels per prio
Paul Blakey [Mon, 11 Nov 2019 23:34:26 +0000 (00:34 +0100)]
net/mlx5: Define fdb tc levels per prio

Define FDB_TC_LEVELS_PER_PRIO instead of magic number 2.
This is the number of levels used by each tc prio table in the fdb.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
Paul Blakey [Mon, 11 Nov 2019 23:34:25 +0000 (00:34 +0100)]
net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines

Rename it to prepare for next patch that will add a
different type of offload to the FDB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Simplify fdb chain and prio eswitch defines
Paul Blakey [Mon, 11 Nov 2019 23:34:24 +0000 (00:34 +0100)]
net/mlx5: Simplify fdb chain and prio eswitch defines

FDB_MAX_CHAIN and FDB_MAX_PRIO were defined differently depending
on if CONFIG_MLX5_ESWITCH is enabled to save space on allocations.

This is a minor space saving, and there is no real need for it.
Simplify things instead, and define them the same in both cases.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoRDMA/srpt: Report the SCSI residual to the initiator
Bart Van Assche [Tue, 5 Nov 2019 21:46:32 +0000 (13:46 -0800)]
RDMA/srpt: Report the SCSI residual to the initiator

The code added by this patch is similar to the code that already exists in
ibmvscsis_determine_resid(). This patch has been tested by running the
following command:

strace sg_raw -r 1k /dev/sdb 12 00 00 00 60 00 -o inquiry.bin |&
    grep resid=

Link: https://lore.kernel.org/r/20191105214632.183302-1-bvanassche@acm.org
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Support flow counters offset for bulk counters
Yevgeny Kliteynik [Sun, 3 Nov 2019 14:07:23 +0000 (16:07 +0200)]
IB/mlx5: Support flow counters offset for bulk counters

Add support for flow steering counters action with a non-base counter
ID (offset) for bulk counters.

When creating a flow counter object, save the bulk value.  This value is
used when a flow action with a non-base counter ID is requested - to
validate that the required offset is in the range of the allocated bulk.

Link: https://lore.kernel.org/r/20191103140723.77411-1-leon@kernel.org
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Change MAD processing function to remove extra casting and parameter
Leon Romanovsky [Tue, 29 Oct 2019 06:27:45 +0000 (08:27 +0200)]
RDMA: Change MAD processing function to remove extra casting and parameter

All users of process_mad() converts input pointers from ib_mad_hdr to be
ib_mad, update the function declaration to use ib_mad directly.

Also remove not used input MAD size parameter.

Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hfi1: Delete unreachable code
Leon Romanovsky [Tue, 29 Oct 2019 06:27:36 +0000 (08:27 +0200)]
RDMA/hfi1: Delete unreachable code

All callers allocate MAD structures with proper sizes, there is no need to
recheck it.

Link: https://lore.kernel.org/r/20191029062745.7932-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Load profile according to RoCE enablement state
Michael Guralnik [Fri, 8 Nov 2019 23:45:28 +0000 (23:45 +0000)]
IB/mlx5: Load profile according to RoCE enablement state

When RoCE is disabled load mlx5_ib in raw_eth profile.
Clean pf_profile roce capability checks as it will not be used without
roce capability.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Rename profile and init methods
Michael Guralnik [Fri, 8 Nov 2019 23:45:26 +0000 (23:45 +0000)]
IB/mlx5: Rename profile and init methods

Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Handle "enable_roce" devlink param
Michael Guralnik [Fri, 8 Nov 2019 23:45:24 +0000 (23:45 +0000)]
net/mlx5: Handle "enable_roce" devlink param

Register "enable_roce" param, default value is RoCE enabled.
Current configuration is stored on mlx5_core_dev and exposed to user
through the cmode runtime devlink param.
Changing configuration requires changing the cmode driverinit devlink
param and calling devlink reload.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Document flow_steering_mode devlink param
Michael Guralnik [Fri, 8 Nov 2019 23:45:22 +0000 (23:45 +0000)]
net/mlx5: Document flow_steering_mode devlink param

Add documentation for current mlx5 supported devlink param.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agodevlink: Add new "enable_roce" generic device param
Michael Guralnik [Fri, 8 Nov 2019 23:45:20 +0000 (23:45 +0000)]
devlink: Add new "enable_roce" generic device param

New device parameter to enable/disable handling of RoCE traffic in the
device.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoRDMA/hns: Modify appropriate printings
Wenpeng Liang [Tue, 5 Nov 2019 11:08:02 +0000 (19:08 +0800)]
RDMA/hns: Modify appropriate printings

Modify some printings that is not in uniformed style, non-standard or with
spelling errors.

Link: https://lore.kernel.org/r/1572952082-6681-10-git-send-email-liweihang@hisilicon.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Fix non-standard error codes
Yixian Liu [Tue, 5 Nov 2019 11:08:01 +0000 (19:08 +0800)]
RDMA/hns: Fix non-standard error codes

It is better to return a linux error code than define a private constant.

Link: https://lore.kernel.org/r/1572952082-6681-9-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Modify hns_roce_hw_v2_get_cfg to simplify the code
Lang Cheng [Tue, 5 Nov 2019 11:08:00 +0000 (19:08 +0800)]
RDMA/hns: Modify hns_roce_hw_v2_get_cfg to simplify the code

Merge base configuration of hr_dev into hns_roce_hw_v2_get_cfg(). In
addition, there is no need to return 0 at last, so we change return type
of it to void.

Link: https://lore.kernel.org/r/1572952082-6681-8-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Simplify doorbell initialization code
Lang Cheng [Tue, 5 Nov 2019 11:07:59 +0000 (19:07 +0800)]
RDMA/hns: Simplify doorbell initialization code

If a variable needs to be set to 0 before use, it can be directly
initialized to 0.

Link: https://lore.kernel.org/r/1572952082-6681-7-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Replace not intuitive function/macro names
Yixing Liu [Tue, 5 Nov 2019 11:07:58 +0000 (19:07 +0800)]
RDMA/hns: Replace not intuitive function/macro names

Replace "sw2hw" and "hw2sw" which is hard to understand with "create" and
"destroy".

Link: https://lore.kernel.org/r/1572952082-6681-6-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Modify fields of struct hns_roce_srq
Yixian Liu [Tue, 5 Nov 2019 11:07:57 +0000 (19:07 +0800)]
RDMA/hns: Modify fields of struct hns_roce_srq

Use wqe_cnt instead of max which means the queue size of srq, and remove
wqe_ctr which is not used.

Link: https://lore.kernel.org/r/1572952082-6681-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Delete unnecessary uar from hns_roce_cq
Yixian Liu [Tue, 5 Nov 2019 11:07:56 +0000 (19:07 +0800)]
RDMA/hns: Delete unnecessary uar from hns_roce_cq

The uar information is already recorded in priv_uar of hns_roce_dev, there
is no need to record it in hns_roce_cq again.

Link: https://lore.kernel.org/r/1572952082-6681-4-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Remove unnecessary structure hns_roce_sqp
Lang Cheng [Tue, 5 Nov 2019 11:07:55 +0000 (19:07 +0800)]
RDMA/hns: Remove unnecessary structure hns_roce_sqp

Special QP have no differences with normal qp in data structure, so
definition of struct hns_roce_sqp should be removed and replaced by struct
hns_roce_qp.

Link: https://lore.kernel.org/r/1572952082-6681-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Delete unnecessary variable max_post
Yixian Liu [Tue, 5 Nov 2019 11:07:54 +0000 (19:07 +0800)]
RDMA/hns: Delete unnecessary variable max_post

There is no need to define max_post in hns_roce_wq, as it does same thing
as wqe_cnt.

Link: https://lore.kernel.org/r/1572952082-6681-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Rewrite MAD processing logic to be readable
Leon Romanovsky [Tue, 29 Oct 2019 06:27:42 +0000 (08:27 +0200)]
RDMA/mlx5: Rewrite MAD processing logic to be readable

Multiple "if"s and "||" make extension of process_mad() function
as a tedious task, rewrite that function to be more readable.

Link: https://lore.kernel.org/r/20191029062745.7932-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Simplify process_mad function
Leon Romanovsky [Tue, 29 Oct 2019 06:27:40 +0000 (08:27 +0200)]
RDMA/ocrdma: Simplify process_mad function

Change the switch with one case into a simple if statement so the code is
less confusing.

Link: https://lore.kernel.org/r/20191029062745.7932-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mad: Do not check MAD sizes in roce and ib drivers
Leon Romanovsky [Tue, 29 Oct 2019 06:27:37 +0000 (08:27 +0200)]
RDMA/mad: Do not check MAD sizes in roce and ib drivers

All callers for process_mad allocate MAD structures with proper sizes,
there is no need to recheck it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Make ocrdma_pma_counters() return void
Leon Romanovsky [Tue, 29 Oct 2019 06:27:34 +0000 (08:27 +0200)]
RDMA/ocrdma: Make ocrdma_pma_counters() return void

This function always returns 0, so just use void and remove the bogus
checking at the only call site.

Link: https://lore.kernel.org/r/20191029062745.7932-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mad: Allocate zeroed MAD buffer
Leon Romanovsky [Tue, 29 Oct 2019 06:27:31 +0000 (08:27 +0200)]
RDMA/mad: Allocate zeroed MAD buffer

Ensure that MAD output buffer is zero-based allocated in all the callers
of process_mad and remove the various memset()'s from the drivers.

Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qib: Delete empty check_cc_key function
Leon Romanovsky [Tue, 29 Oct 2019 06:27:44 +0000 (08:27 +0200)]
RDMA/qib: Delete empty check_cc_key function

Function always returns zero, just delete it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qib: Delete extra line
Leon Romanovsky [Tue, 29 Oct 2019 06:27:43 +0000 (08:27 +0200)]
RDMA/qib: Delete extra line

Trivial cleanup to fix the following warning:
 drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line:

Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20191029062745.7932-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mad: Delete never implemented functions
Leon Romanovsky [Tue, 29 Oct 2019 06:27:30 +0000 (08:27 +0200)]
RDMA/mad: Delete never implemented functions

Delete never implemented and used MAD functions.

Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRevert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"
Bart Van Assche [Fri, 1 Nov 2019 20:47:56 +0000 (13:47 -0700)]
Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"

Although the mentioned patch fixes a use-after-free bug, it introduces a
hang during shutdown. Since the latter is worse, revert this patch.

Link: https://lore.kernel.org/r/20191101204756.182162-1-bvanassche@acm.org
Reported-by: Honggang Li <honli@redhat.com>
Fixes: 9b64f7d0bb0a ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Remove unsupported modify_port callback
Kamal Heib [Mon, 28 Oct 2019 15:59:31 +0000 (17:59 +0200)]
RDMA/qedr: Remove unsupported modify_port callback

There is no need to return always zero for function which is not
supported.

Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20191028155931.1114-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Remove unsupported modify_port callback
Kamal Heib [Mon, 28 Oct 2019 15:59:30 +0000 (17:59 +0200)]
RDMA/ocrdma: Remove unsupported modify_port callback

There is no need to return always zero for function which is not
supported.

Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://lore.kernel.org/r/20191028155931.1114-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Remove unsupported modify_port callback
Kamal Heib [Mon, 28 Oct 2019 15:59:29 +0000 (17:59 +0200)]
RDMA/hns: Remove unsupported modify_port callback

There is no need to return always zero for function which is not
supported.

Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20191028155931.1114-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Fix return code when modify_port isn't supported
Kamal Heib [Mon, 28 Oct 2019 15:59:28 +0000 (17:59 +0200)]
RDMA/core: Fix return code when modify_port isn't supported

Improve return code from ib_modify_port() by doing the following:
 - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code

 - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that
   didn't implement the modify_port callback, otherwise return
   "-EOPNOTSUPP"

Fixes: 61e0962d5221 ("IB: Avoid ib_modify_port() failure for RoCE devices")
Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Add iWARP doorbell recovery support
Michal Kalderon [Wed, 30 Oct 2019 09:44:17 +0000 (11:44 +0200)]
RDMA/qedr: Add iWARP doorbell recovery support

This patch adds the iWARP specific doorbells to the doorbell recovery
mechanism.

Link: https://lore.kernel.org/r/20191030094417.16866-9-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Add doorbell overflow recovery support
Michal Kalderon [Wed, 30 Oct 2019 09:44:16 +0000 (11:44 +0200)]
RDMA/qedr: Add doorbell overflow recovery support

Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.

Link: https://lore.kernel.org/r/20191030094417.16866-8-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Use the common mmap API
Michal Kalderon [Wed, 30 Oct 2019 09:44:15 +0000 (11:44 +0200)]
RDMA/qedr: Use the common mmap API

Remove all functions related to mmap from qedr and use the common API.

Link: https://lore.kernel.org/r/20191030094417.16866-7-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/siw: Use the common mmap_xa helpers
Michal Kalderon [Wed, 30 Oct 2019 09:44:14 +0000 (11:44 +0200)]
RDMA/siw: Use the common mmap_xa helpers

Remove the functions related to managing the mmap_xa database.  This code
is now common in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-6-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/efa: Use the common mmap_xa helpers
Michal Kalderon [Wed, 30 Oct 2019 09:44:13 +0000 (11:44 +0200)]
RDMA/efa: Use the common mmap_xa helpers

Remove the functions related to managing the mmap_xa database.  This code
was replaced with common code in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Connect between the mmap entry and the umap_priv structure
Michal Kalderon [Wed, 30 Oct 2019 09:44:12 +0000 (11:44 +0200)]
RDMA: Connect between the mmap entry and the umap_priv structure

The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped.  The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before.  Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Create mmap database and cookie helper functions
Michal Kalderon [Wed, 30 Oct 2019 09:44:11 +0000 (11:44 +0200)]
RDMA/core: Create mmap database and cookie helper functions

Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.

The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.

This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/mlx5: fix spelling mistake "metdata" -> "metadata"
Colin Ian King [Tue, 5 Nov 2019 14:54:16 +0000 (14:54 +0000)]
net/mlx5: fix spelling mistake "metdata" -> "metadata"

There is a spelling mistake in a esw_warn warning message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: fix kvfree of uninitialized pointer spec
Colin Ian King [Tue, 5 Nov 2019 18:27:40 +0000 (18:27 +0000)]
net/mlx5: fix kvfree of uninitialized pointer spec

Currently when a call to  esw_vport_create_legacy_ingress_acl_group
fails the error exit path to label 'out' will cause a kvfree on the
uninitialized pointer spec.  Fix this by ensuring pointer spec is
initialized to NULL to avoid this issue.

Addresses-Coverity: ("Uninitialized pointer read")
Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoRDMA/core: Move core content from ib_uverbs to ib_core
Michal Kalderon [Wed, 30 Oct 2019 09:44:10 +0000 (11:44 +0200)]
RDMA/core: Move core content from ib_uverbs to ib_core

Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Introduce and use mlx5_core_is_vf()
Parav Pandit [Mon, 28 Oct 2019 23:35:30 +0000 (23:35 +0000)]
IB/mlx5: Introduce and use mlx5_core_is_vf()

Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().

This enables to clearly identify PF, VF and non virtual functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Enable metadata on own vport
Parav Pandit [Mon, 28 Oct 2019 23:35:28 +0000 (23:35 +0000)]
net/mlx5: E-switch, Enable metadata on own vport

Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe
(manager vport).
Metadata when supported, must be enabled on own vport which is
used to pass metadata to vport of NIC Rx Flow Table.

Due to this error, traffic tagged by ingress ACL is not processed
correctly at NIC rx flow table level which is supposed to work
on metadata tag.

Hence, instead of working on eswitch manager vport, always working on
eswitch own vport regardless of PF or ECPF.

Given that mlx5_eswitch_query/modify_esw_vport_context() is used to
access other vport in legacy mode and own vport settings in switchdev mode,
extend low level API to explicitly specify other_vport.

Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Refactor ingress acl configuration
Parav Pandit [Mon, 28 Oct 2019 23:35:26 +0000 (23:35 +0000)]
net/mlx5: Refactor ingress acl configuration

Drop, untagged, spoof check and untagged spoof check flow groups are
limited to legacy mode only.

Therefore, following refactoring is done to
(a) improve code readability
(b) have better code split between legacy and offloads mode

1. Move legacy flow groups under legacy structure
2. Add validity check for group deletion
3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode
4. Rename esw_vport_enable_ingress_acl() to
esw_vport_create_ingress_acl_table() and limit its scope to
table creation
5. Introduce legacy flow groups creation helper
esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode
6. Reduce offloads ingress groups from 4 to just 1 metadata group
per vport
7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free.
8. Shortern error message to remove redundant 'E-switch'

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Restrict metadata disablement to offloads mode
Parav Pandit [Mon, 28 Oct 2019 23:35:24 +0000 (23:35 +0000)]
net/mlx5: Restrict metadata disablement to offloads mode

Now that there is clear separation for acl setup/cleanup between legacy
and offloads mode, limit metdata disablement to offloads mode.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Offloads shift ACL programming during enable/disable vport
Vu Pham [Mon, 28 Oct 2019 23:35:22 +0000 (23:35 +0000)]
net/mlx5: E-switch, Offloads shift ACL programming during enable/disable vport

Currently legacy mode enables ACL while enabling vport, while offloads
mode enable ACL when moving to offloads mode.

Bring consistency to both modes by enabling/disabling ACL when
enabling/disabling a vport.

It also eliminates creating ingress ACL table on unused ECPF vport in
offloads mode.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Offloads introduce and use per vport acl tables APIs
Parav Pandit [Mon, 28 Oct 2019 23:35:20 +0000 (23:35 +0000)]
net/mlx5: E-switch, Offloads introduce and use per vport acl tables APIs

Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Move ACL drop counters life cycle close to ACL lifecycle
Parav Pandit [Mon, 28 Oct 2019 23:35:19 +0000 (23:35 +0000)]
net/mlx5: Move ACL drop counters life cycle close to ACL lifecycle

It is better to create/destroy ACL related drop counters where the actual
drop rule ACLs are created/destroyed, so that ACL configuration is self
contained for ingress and egress.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Legacy introduce and use per vport acl tables APIs
Parav Pandit [Mon, 28 Oct 2019 23:35:17 +0000 (23:35 +0000)]
net/mlx5: E-switch, Legacy introduce and use per vport acl tables APIs

Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport in
unified way for legacy vs offloads mode.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Prepare code to handle vport enable error
Parav Pandit [Mon, 28 Oct 2019 23:35:15 +0000 (23:35 +0000)]
net/mlx5: E-switch, Prepare code to handle vport enable error

In subsequent patch, esw_enable_vport() could fail and return error.
Prepare code to handle such error.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Tide up state_lock and vport enabled flag usage
Parav Pandit [Mon, 28 Oct 2019 23:35:13 +0000 (23:35 +0000)]
net/mlx5: Tide up state_lock and vport enabled flag usage

When eswitch is disabled, vport event handler is unregistered.
This unregistration already synchronizes with running EQ event handler
in below code flow.

mlx5_eswitch_disable()
  mlx5_eswitch_event_handlers_unregister()
    mlx5_eq_notifier_unregister()
      atomic_notifier_chain_unregister()
        synchronize_rcu()

notifier_callchain
  eswitch_vport_event()
    queue_work()

Additionally vport->enabled flag is set under state_lock during
esw_enable_vport() but is not read under state_lock in
(a) esw_disable_vport() and (b) under atomic context
eswitch_vport_event().

It is also necessary to synchronize with already scheduled vport event.
This is already achieved using below sequence.

mlx5_eswitch_event_handlers_unregister()
  [..]
  flush_workqueue()

Hence,
(a) Remove vport->enabled check in eswitch_vport_event() which
doesn't make any sense.
(b) Remove redundant flush_workqueue() on every vport disable.
(c) Keep esw_disable_vport() symmetric with esw_enable_vport() for
state_lock.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Move legacy drop counter and rule under legacy structure
Parav Pandit [Mon, 28 Oct 2019 23:35:11 +0000 (23:35 +0000)]
net/mlx5: Move legacy drop counter and rule under legacy structure

To improve code readability, move legacy drop counters and droup rule
under legacy structure.

While at it,
(a) prefix drop flow counters helper with legacy_.
(b) nullify the rule pointers only if they were valid.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Move metdata fields under offloads structure
Parav Pandit [Mon, 28 Oct 2019 23:35:10 +0000 (23:35 +0000)]
net/mlx5: Move metdata fields under offloads structure

Metadata fields are offload mode specific.
To improve code readability, move metadata under offloads structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Correct comment for legacy fields
Parav Pandit [Mon, 28 Oct 2019 23:35:07 +0000 (23:35 +0000)]
net/mlx5: Correct comment for legacy fields

fdb_table is used for both legacy and offloads mode.
It was incorrect to comment that fdb_table is legacy specific.
Hence, fix the comment to reflect that fdb_table is used in legacy and
offloads mode.

Fixes: 131ce7014043 ("net/mlx5: E-Switch, Remove redundant mc_promisc NULL check")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Introduce and use mlx5_esw_is_manager_vport()
Parav Pandit [Mon, 28 Oct 2019 23:35:05 +0000 (23:35 +0000)]
net/mlx5: Introduce and use mlx5_esw_is_manager_vport()

Currently esw_enable_vport() does vport check for zero to enable drop
counters regardless of execution on ECPF/PF.
While esw_disable_vport() considers such scenario.

To keep consistency across code for checking for manager_vport,
introduce and use mlx5_esw_is_manager_vport() to check if a specified
vport is eswitch manager vport or not.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-switch, Introduce and use vlan rule config helper
Parav Pandit [Mon, 28 Oct 2019 23:35:03 +0000 (23:35 +0000)]
net/mlx5: E-switch, Introduce and use vlan rule config helper

Between legacy mode and switchdev mode, only two fields are changed,
vlan_tag and flow action.
Hence to avoid duplicte code between two modes, introduce and and use
helper function to configure allowed VLAN rule.

While at it, get rid of duplicate debug message.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Rename ingress acl config in offloads mode
Vu Pham [Mon, 28 Oct 2019 23:35:00 +0000 (23:35 +0000)]
net/mlx5: E-Switch, Rename ingress acl config in offloads mode

Changing the function name esw_ingress_acl_common_config() to
esw_ingress_acl_config() to be consistent with egress config
function naming in offloads mode.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Rename egress config to generic name
Vu Pham [Mon, 28 Oct 2019 23:34:58 +0000 (23:34 +0000)]
net/mlx5: E-Switch, Rename egress config to generic name

Refactor vport egress config in offloads mode

Refactoring vport egress configuration in offloads mode that
includes egress prio tag configuration.
This makes code symmetric to ingress configuration.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Fixed a typo in a comment in esw_del_uc_addr()
Qing Huang [Mon, 28 Oct 2019 23:34:56 +0000 (23:34 +0000)]
net/mlx5: Fixed a typo in a comment in esw_del_uc_addr()

Changed "managerss" to "managers".

Fixes: a1b3839ac4a4 ("net/mlx5: E-Switch, Properly refer to the esw manager vport")
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Test write combining support
Michael Guralnik [Mon, 10 Jun 2019 12:21:24 +0000 (15:21 +0300)]
IB/mlx5: Test write combining support

Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine.  The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Return proper error value
Leon Romanovsky [Tue, 29 Oct 2019 05:57:21 +0000 (07:57 +0200)]
RDMA/mlx5: Return proper error value

Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Fix build error again
Arnd Bergmann [Mon, 7 Oct 2019 21:18:08 +0000 (23:18 +0200)]
RDMA/hns: Fix build error again

This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:

drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.

Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'odp_rework' into rdma.git for-next
Jason Gunthorpe [Mon, 28 Oct 2019 19:44:35 +0000 (16:44 -0300)]
Merge branch 'odp_rework' into rdma.git for-next

Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/odp: Remove broken debugging call to invalidate_range
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:35 +0000 (13:09 -0300)]
RDMA/odp: Remove broken debugging call to invalidate_range

invalidate_range() also obtains the umem_mutex which is being held at this
point, so if this path were was ever called it would deadlock. Thus
conclude the debugging never triggers and rework it into a simple WARN_ON
and leave things as they are.

While here add a note to explain how we could possibly get inconsistent
page pointers.

Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:34 +0000 (13:09 -0300)]
RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy

For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:33 +0000 (13:09 -0300)]
RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray

These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.

Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.

Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Rework implicit ODP destroy
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:32 +0000 (13:09 -0300)]
RDMA/mlx5: Rework implicit ODP destroy

Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Avoid double lookups on the pagefault path
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:31 +0000 (13:09 -0300)]
RDMA/mlx5: Avoid double lookups on the pagefault path

Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.

This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.

Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Reduce locking in implicit_mr_get_data()
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:30 +0000 (13:09 -0300)]
RDMA/mlx5: Reduce locking in implicit_mr_get_data()

Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.

This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.

The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.

Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Use an xarray for the children of an implicit ODP
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:29 +0000 (13:09 -0300)]
RDMA/mlx5: Use an xarray for the children of an implicit ODP

Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.

This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.

Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.

Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.

The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.

At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.

Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Split implicit handling from pagefault_mr
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:28 +0000 (13:09 -0300)]
RDMA/mlx5: Split implicit handling from pagefault_mr

The single routine has a very confusing scheme to advance to the next
child MR when working on an implicit parent. This scheme can only be used
when working with an implicit parent and must not be triggered when
working on a normal MR.

Re-arrange things by directly putting all the single-MR stuff into one
function and calling it in a loop for the implicit case. Simplify some of
the error handling in the new pagefault_real_mr() to remove unneeded gotos.

Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
Jason Gunthorpe [Wed, 9 Oct 2019 16:09:27 +0000 (13:09 -0300)]
RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree

Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>