]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
8 years agoIB/mlx5: Support IB_SRQT_TM
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:11 +0000 (15:52 +0300)]
IB/mlx5: Support IB_SRQT_TM

Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ
when SRQ created with IB_SRQT_TM.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5: Add XRQ support
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:10 +0000 (15:52 +0300)]
net/mlx5: Add XRQ support

Add support to new XRQ(eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with addition
of extended receive buffers topologies and offloads.

Currently supports tag matching topology and rendezvouz offload.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fill XRQ capabilities
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:09 +0000 (15:52 +0300)]
IB/mlx5: Fill XRQ capabilities

Provide driver specific values for XRQ capabilities.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/uverbs: Expose XRQ capabilities
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:08 +0000 (15:52 +0300)]
IB/uverbs: Expose XRQ capabilities

Make XRQ capabilities available via ibv_query_device() verb.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/uverbs: Add new SRQ type IB_SRQT_TM
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:07 +0000 (15:52 +0300)]
IB/uverbs: Add new SRQ type IB_SRQT_TM

Add new SRQ type capable of new tag matching feature.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.

In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/uverbs: Add XRQ creation parameter to UAPI
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:06 +0000 (15:52 +0300)]
IB/uverbs: Add XRQ creation parameter to UAPI

Add tm_list_size parameter to struct ib_uverbs_create_xsrq.
If SRQ type is tag-matching this field defines maximum size
of tag matching list. Otherwise, it is expected to be zero.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add new SRQ type IB_SRQT_TM
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:05 +0000 (15:52 +0300)]
IB/core: Add new SRQ type IB_SRQT_TM

This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag
matching and rendezvous offloads for MPI applications.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.
In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Separate CQ handle in SRQ context
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:04 +0000 (15:52 +0300)]
IB/core: Separate CQ handle in SRQ context

Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add XRQ capabilities
Artemy Kovalyov [Thu, 17 Aug 2017 12:52:03 +0000 (15:52 +0300)]
IB/core: Add XRQ capabilities

This patch adds following TM XRQ capabilities:

* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
    - IB_TM_CAP_RC - Support tag matching on RC transport

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5: Update HW layout definitions
Artemy Kovalyov [Tue, 15 Aug 2017 08:59:02 +0000 (11:59 +0300)]
net/mlx5: Update HW layout definitions

* add offload_type field to mlx5_ifc_qpc_bits
* update mlx5_ifc_xrqc_bits layout

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Handle NETDEV_CHANGE events
Andrew Boyer [Mon, 28 Aug 2017 20:11:59 +0000 (16:11 -0400)]
IB/rxe: Handle NETDEV_CHANGE events

Without this fix, ports configured on top of ixgbe miss link up
notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even
though the port is up and working.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Avoid ICRC errors by copying into the skb first
Andrew Boyer [Mon, 28 Aug 2017 20:11:58 +0000 (16:11 -0400)]
IB/rxe: Avoid ICRC errors by copying into the skb first

The current process is to first calculate the CRC and then copy the client
data into the packet. This leaves a window in which the packet contents and
CRC can get out of sync, if the client changes the data after the CRC is
calculated but before the data is copied.

By copying the data into the packet and then calculating the CRC directly
from the packet contents we eliminate the window.

This can be seen with qperf's ud_bi_bw test. This seems like very
strange/reckless client behavior, but whether the client has mangled its
data or not RXE should be able to transfer it reliably.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Another fix for broken receive queue draining
Andrew Boyer [Mon, 28 Aug 2017 20:11:57 +0000 (16:11 -0400)]
IB/rxe: Another fix for broken receive queue draining

This fixes another path in rxe_requester() that might overlook stale SKBs,
preventing cleanup.

Fixes: 1217197142d1 ("rxe: fix broken receive queue draining")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Remove unneeded initialization in prepare6()
Andrew Boyer [Mon, 28 Aug 2017 20:11:56 +0000 (16:11 -0400)]
IB/rxe: Remove unneeded initialization in prepare6()

Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Fix up rxe_qp_cleanup()
Andrew Boyer [Mon, 28 Aug 2017 20:11:55 +0000 (16:11 -0400)]
IB/rxe: Fix up rxe_qp_cleanup()

Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset().
sk_dst_get() takes a new reference on dst, so the dst_release() doesn't
actually release the original reference, which was the design intent.

Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Add dst_clone() in prepare_ipv6_hdr()
Andrew Boyer [Mon, 28 Aug 2017 20:11:54 +0000 (16:11 -0400)]
IB/rxe: Add dst_clone() in prepare_ipv6_hdr()

Otherwise the reference count goes negative as IPv6 packets complete.

Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Fix destination cache for IPv6
Andrew Boyer [Mon, 28 Aug 2017 20:11:53 +0000 (16:11 -0400)]
IB/rxe: Fix destination cache for IPv6

To successfully match an IPv6 path, the path cookie must match. Store it
in the QP so that the IPv6 path can be reused.

Replace open-coded version of dst_check() with the actual call, fixing the
logic. The open-coded version skips the check call if dst->obsolete is 0
(DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE
means that the route may continue to be used, though.

Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Fix up the responder's find_resources() function
Andrew Boyer [Mon, 28 Aug 2017 20:11:52 +0000 (16:11 -0400)]
IB/rxe: Fix up the responder's find_resources() function

The resource array is sized by max_dest_rd_atomic, not max_rd_atomic.
Iterating over max_rd_atomic entries of qp->resp.resources[] will cause
incorrect behavior when the two attributes are different (or even
crash if max_rd_atomic is larger).

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Remove dangling prototype
Andrew Boyer [Mon, 28 Aug 2017 20:11:51 +0000 (16:11 -0400)]
IB/rxe: Remove dangling prototype

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Disable completion upcalls when a CQ is destroyed
Andrew Boyer [Mon, 28 Aug 2017 20:11:50 +0000 (16:11 -0400)]
IB/rxe: Disable completion upcalls when a CQ is destroyed

This prevents the stack from accessing userspace objects while they
are being torn down.

One possible sequence of events:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc. and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d anyway
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes; the event queue is checked
   for is_closed, but it has no way to check the ib_ucq_object before
   accessing it

The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Move refcounting earlier in rxe_send()
Andrew Boyer [Mon, 28 Aug 2017 20:11:49 +0000 (16:11 -0400)]
IB/rxe: Move refcounting earlier in rxe_send()

The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
the QP ref before output to avoid extra dereferences during network
congestion. This could lead to unwanted destruction of the QP.

Fix up the skb_out accounting, too.

Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmavt: Handle dereg of inuse MRs properly
Mike Marciniszyn [Mon, 28 Aug 2017 18:24:10 +0000 (11:24 -0700)]
IB/rdmavt: Handle dereg of inuse MRs properly

A destroy of an MR prior to destroying the QP can cause the following
diagnostic if the QP is referencing the MR being de-registered:

hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108
              00 pd ffff880859b20b00

The solution is to when the a non-zero refcount is encountered when
the MR is destroyed the QPs needs to be iterated looking for QPs in
the same PD as the MR.  If rvt_qp_mr_clean() detects any such QP
references the rkey/lkey, the QP needs to be put into an error state
via a call to rvt_qp_error() which will trigger the clean up of any
stuck references.

This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.

[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Convert qp_stats debugfs interface to use new iterator API
Mike Marciniszyn [Mon, 28 Aug 2017 18:24:04 +0000 (11:24 -0700)]
IB/qib: Convert qp_stats debugfs interface to use new iterator API

Continue porting copy/paste code into rdmavt from qib.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Convert qp_stats debugfs interface to use new iterator API
Mike Marciniszyn [Mon, 28 Aug 2017 18:23:58 +0000 (11:23 -0700)]
IB/hfi1: Convert qp_stats debugfs interface to use new iterator API

Continue moving copy/paste code into rdmavt.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Convert hfi1_error_port_qps() to use new QP iterator
Mike Marciniszyn [Mon, 28 Aug 2017 18:23:51 +0000 (11:23 -0700)]
IB/hfi1: Convert hfi1_error_port_qps() to use new QP iterator

Change hfi1_error_port_qps() to use the new rvt_qp_iter() in its QP
scanning.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmavt: Add QP iterator API for QPs
Mike Marciniszyn [Mon, 28 Aug 2017 18:23:45 +0000 (11:23 -0700)]
IB/rdmavt: Add QP iterator API for QPs

There are currently 3 spots in the qib and hfi1 driver that have
knowledge of the internal QP hash list that should only be in
scope to rdmavt QP code.

Add an iterator API for processing all QPs to hide the
nature of the RCU hashlist.

The API consists of:
- rvt_qp_iter_init()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
  * For iterating all QPs

The first two are used for things like seq_file prints.

The last is for code that just needs to iterate all QPs
in the system.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Use accessor to determine ring size
Kaike Wan [Mon, 28 Aug 2017 18:23:39 +0000 (11:23 -0700)]
IB/hfi1: Use accessor to determine ring size

The qp_stats print will soon be moving to rdmavt, so use the proper
accessor to get the ring size rather than a driver supplied constant.

Fixes: Commit ff8d836efe06 ("IB/hfi1: Add receiving queue info to qp_stats")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Stricter bounds checking for copy to buffer
Kamenee Arumugam [Mon, 28 Aug 2017 18:23:33 +0000 (11:23 -0700)]
IB/qib: Stricter bounds checking for copy to buffer

Replace 'strcpy' with 'strncpy' to restrict the number
of bytes copied to the buffer.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hif1: Remove static tracing from SDMA hot path
Michael J. Ruhl [Mon, 28 Aug 2017 18:23:27 +0000 (11:23 -0700)]
IB/hif1: Remove static tracing from SDMA hot path

The hfi1_cdbg() macro can be instantiated in the hot path even when it
is not in use.  This shows up on perf profiles.

Rework the macros (for SDMA and MMU), to use the trace interface directly
to eliminate this performance hit.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Acquire QSFP cable information on loopback
Jan Sokolowski [Mon, 28 Aug 2017 18:23:21 +0000 (11:23 -0700)]
IB/hfi1: Acquire QSFP cable information on loopback

Currently, QSFP information is not queried
in cases where loopback was set up and QSFP module is
present.

Acquire QSFP information in case of loopback.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoi40iw: make some structures const
Bhumika Goyal [Mon, 28 Aug 2017 16:21:23 +0000 (21:51 +0530)]
i40iw: make some structures const

Make some structures const as they are only used during a copy
operation.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: constify vm_operations_struct
Arvind Yadav [Mon, 28 Aug 2017 04:29:28 +0000 (09:59 +0530)]
IB/hfi1: constify vm_operations_struct

vm_operations_struct are not supposed to change at runtime.
vm_area_struct structure working with const vm_operations_struct.
So mark the non-const vm_operations_struct structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/bnxt_re: remove unnecessary call to memset
Himanshu Jha [Fri, 25 Aug 2017 16:11:02 +0000 (21:41 +0530)]
RDMA/bnxt_re: remove unnecessary call to memset

call to memset to assign 0 value immediately after allocating
memory with kzalloc is unnecesaary as kzalloc allocates the memory
filled with 0 value.

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/usnic: check for allocation failure
Dan Carpenter [Fri, 25 Aug 2017 08:43:59 +0000 (11:43 +0300)]
IB/usnic: check for allocation failure

usnic_uiom_get_dev_list() can return ERR_PTR(-ENOMEM) so we should check
for that.

Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Add opcode states to qp_stats
Mike Marciniszyn [Tue, 22 Aug 2017 01:27:48 +0000 (18:27 -0700)]
IB/hfi1: Add opcode states to qp_stats

These fields allow for debugging send engine processing.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Add received request info to qp_stats
Kaike Wan [Tue, 22 Aug 2017 01:27:41 +0000 (18:27 -0700)]
IB/hfi1: Add received request info to qp_stats

The rvt_ack_entry pointed to by s_tail_ack_queue provides important
info about the request that has just been processed or is being processed
on the responder side of a RC connection. This patch adds this info to
the qp_stats to assist debugging.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Fix whitespace alignment issue for MAD
Kamenee Arumugam [Tue, 22 Aug 2017 01:27:35 +0000 (18:27 -0700)]
IB/hfi1: Fix whitespace alignment issue for MAD

Fix a tab alignment issue present in pr_err_ratelimited
error message.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Move structure and MACRO definitions in user_sdma.c to user_sdma.h
Harish Chegondi [Tue, 22 Aug 2017 01:27:29 +0000 (18:27 -0700)]
IB/hfi1: Move structure and MACRO definitions in user_sdma.c to user_sdma.h

Clean up user_sdma.c by moving the structure and MACRO definitions into
the header file user_sdma.h

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Move structure definitions from user_exp_rcv.c to user_exp_rcv.h
Harish Chegondi [Tue, 22 Aug 2017 01:27:23 +0000 (18:27 -0700)]
IB/hfi1: Move structure definitions from user_exp_rcv.c to user_exp_rcv.h

Clean up user_exp_rcv.c file by moving structure definitions into header
file user_exp_rcv.h. Since these structure definitions depend on the
structure definitions in mmu_rb.h, move #include "mmu_rb.h" above
the include "user_exp_rcv.h" or include of header files that include
user_exp_rcv.h

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Remove duplicate definitions of num_user_pages() function
Harish Chegondi [Tue, 22 Aug 2017 01:27:16 +0000 (18:27 -0700)]
IB/hfi1: Remove duplicate definitions of num_user_pages() function

num_user_pages() function has been defined in both user_exp_rcv.c file
and user_sdma.c file. Move the function definition to a header file so
there is only one definition in the source repo.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Fix the bail out code in pin_vector_pages() function
Harish Chegondi [Tue, 22 Aug 2017 01:27:09 +0000 (18:27 -0700)]
IB/hfi1: Fix the bail out code in pin_vector_pages() function

In pin_vector_pages() function, if there is any error while pinning the
pages or while adding a pinned buffer to the cache, the bail out code
needs to unpin any pinned pages that are not in the cache and adjust the
n_locked counter that counts the total pages pinned. The current bail
out code doesn't seem to be doing it right in two cases:

1. Before pinning required pages for a buffer, the SDMA pinned buffer
cache is searched to see if the virtual address range that needs to be
pinned is already pinned. If there isn't a hit in the cache, a new node
is created for the buffer and is added to the cache after the buffer is
pinned. If adding the new node to the cache fails, the n_locked count is
decremented properly but the pinned pages are not freed. This commit
fixes this issue.

2. If there is a hit in the SDMA cache, but the cached buffer doesn't
have enough pages to cover the entire address range that needs to be
pinned, the node for the cached buffer is extracted from the cache,
remaining pages needed are pinned and added to the node. The node is
finally added back into the cache. If there is an error pinning the
extra pages, the bail out code frees all the pages in the node but the
n_locked count is not being decremented by the no of pages in the node
that are freed. This commit fixes this issue.

This commit fixes the above two issues by creating a new function that
frees the pages in a node and decrements the n_locked count by the
number of pages freed.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Clean up pin_vector_pages() function
Harish Chegondi [Tue, 22 Aug 2017 01:27:03 +0000 (18:27 -0700)]
IB/hfi1: Clean up pin_vector_pages() function

Clean up pin_vector_pages() function by moving page pinning related code
to a separate function since it really stands on its own.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Clean up user_sdma_send_pkts() function
Harish Chegondi [Tue, 22 Aug 2017 01:26:57 +0000 (18:26 -0700)]
IB/hfi1: Clean up user_sdma_send_pkts() function

user_sdma_send_pkts() function is unnecessarily long. Clean it up by
moving some of its code into separate functions.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Clean up hfi1_user_exp_rcv_setup function
Harish Chegondi [Tue, 22 Aug 2017 01:26:51 +0000 (18:26 -0700)]
IB/hfi1: Clean up hfi1_user_exp_rcv_setup function

Clean up hfi1_user_exp_rcv_setup function by moving page pinning and
unpinning related code to separate functions. In order to reduce the
number of parameters passed between functions, a new data structure
struct tid_user_buf is defined and used.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Improve local kmem_cache_alloc performance
Michael J. Ruhl [Tue, 22 Aug 2017 01:26:45 +0000 (18:26 -0700)]
IB/hfi1: Improve local kmem_cache_alloc performance

Performance analysis shows that the cache callback function
sdma_kmem_cache_ctor contributes to 1/2 of the kmem_cache_allocs
time.

Since all of the fields in the allocated data structure are initialized
in the code path, remove the _ctor function.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Ratelimit prints from sdma_interrupt
Grzegorz Morys [Tue, 22 Aug 2017 01:26:38 +0000 (18:26 -0700)]
IB/hfi1: Ratelimit prints from sdma_interrupt

Ratelimit error prints from sdma_interrupt function
that could swarm dmesg otherwise.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Stricter bounds checking for copy and array access
Kamenee Arumugam [Tue, 22 Aug 2017 01:26:32 +0000 (18:26 -0700)]
IB/qib: Stricter bounds checking for copy and array access

Added checking on index value of array 'guids' in qib_ruc.c.
Pass in corrrect size of array for memset operation in qib_mad.c.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Remove unnecessary memory allocation for boardname
Kamenee Arumugam [Tue, 22 Aug 2017 01:26:26 +0000 (18:26 -0700)]
IB/qib: Remove unnecessary memory allocation for boardname

Remove all the memory allocation implemented for boardname and
directly assign the defined string literal.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/{qib, hfi1}: Avoid flow control testing for RDMA write operation
Mike Marciniszyn [Tue, 22 Aug 2017 01:26:20 +0000 (18:26 -0700)]
IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation

Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive
credits should never apply to RDMA write.

qib and hfi1 were doing that.  The following situation will result
in a QP hang:
- A prior SEND or RDMA_WRITE with immmediate consumed the last
  credit for a QP using RC receive buffer credits
- The prior op is acked so there are no more acks
- The peer ULP fails to post receive for some reason
- An RDMA write sees that the credits are exhausted and waits
- The peer ULP posts receive buffers
- The ULP posts a send or RDMA write that will be hung

The fix is to avoid the credit test for the RDMA write operation.

Cc: <stable@vger.kernel.org>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref()
Mike Marciniszyn [Tue, 22 Aug 2017 01:26:14 +0000 (18:26 -0700)]
IB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref()

hfi1 and qib were converted in previous patches, do the same for rdmavt.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoMerge branch 'mellanox' into k.o/for-next
Doug Ledford [Fri, 25 Aug 2017 00:25:15 +0000 (20:25 -0400)]
Merge branch 'mellanox' into k.o/for-next

Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Report mlx5 enhanced multi packet WQE capability
Bodong Wang [Thu, 17 Aug 2017 12:52:35 +0000 (15:52 +0300)]
IB/mlx5: Report mlx5 enhanced multi packet WQE capability

Expose enhanced multi packet WQE capability to user space through
query_device by uhw.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Allow posting multi packet send WQEs if hardware supports
Bodong Wang [Thu, 17 Aug 2017 12:52:34 +0000 (15:52 +0300)]
IB/mlx5: Allow posting multi packet send WQEs if hardware supports

Set the field to allow posting multi packet send WQEs if hardware
supports this feature. This doesn't mean the send WQEs will be for
multi packet unless the send WQE was prepared according to multi
packet send WQE format.

User space shall use flag MLX5_IB_ALLOW_MPW to check if hardware
supports MPW and allows MPW in SQ context.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add support for multi underlay QP
Yishai Hadas [Thu, 17 Aug 2017 12:52:33 +0000 (15:52 +0300)]
IB/mlx5: Add support for multi underlay QP

Set underlay QPN as part of flow rule when it's applicable.

There is one root flow table in the NIC RX namespace and all the
underlay QPs steer the traffic to this flow table.
In order to prevent QP to get traffic which is not target to its
underlay QP, we need to set the underlay QP number as part of
the steering matching.

Note:
When multicast traffic is sent the QPN filtering is done by the firmware
as some early step. Adding the QPN match on the flow table entry is
wrong as by that time the target QPN holds the multicast address (e.g.
FF(s)) and it won't match.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix integer overflow when page_shift == 31
Ilya Lesokhin [Thu, 17 Aug 2017 12:52:32 +0000 (15:52 +0300)]
IB/mlx5: Fix integer overflow when page_shift == 31

Fix a bug where MR registration fails when mlx5_ib_cont_pages
indicates that the MR can be mapped using 2GB pages (page_shift == 31).

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix memory leak in clean_mr error path
Kamal Heib [Thu, 17 Aug 2017 12:52:31 +0000 (15:52 +0300)]
IB/mlx5: Fix memory leak in clean_mr error path

In clean_mr error path the 'mr' should be freed.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Decouple MR allocation and population flows
Ilya Lesokhin [Thu, 17 Aug 2017 12:52:30 +0000 (15:52 +0300)]
IB/mlx5: Decouple MR allocation and population flows

mlx5 compatible devices have two ways of populating the MTT
table of an MKEY: using a FW command and using a UMR WQE.

A UMR is much faster, so it should be used whenever possible.
Unfortunately the code today uses UMR only if the MKEY was allocated
from the MR cache.

Fix the code to use UMR even for MKEYs that were allocated using
a FW command.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Enable UMR for MRs created with reg_create
Ilya Lesokhin [Thu, 17 Aug 2017 12:52:29 +0000 (15:52 +0300)]
IB/mlx5: Enable UMR for MRs created with reg_create

This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enables UMR for MRs created with
reg_create.

This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Expose software parsing for Raw Ethernet QP
Noa Osherovich [Thu, 17 Aug 2017 12:52:28 +0000 (15:52 +0300)]
IB/mlx5: Expose software parsing for Raw Ethernet QP

Software parsing (SWP) is a feature that can be used to instruct the
device to stop using its internal parser and to parse packets on the
transmit path according to offsets set for each packets.

Through this feature, the device allows the handling of checksum and
LSO by the hardware according to the location of IP and TCP/UDP
headers.

Enable SW parsing on Raw Ethernet send queue by default if firmware
supports it and report these capabilities to user space.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/i40iw: Remove unused argument
Yuval Shaia [Thu, 24 Aug 2017 17:11:42 +0000 (20:11 +0300)]
RDMA/i40iw: Remove unused argument

None of the calls to i40iw_netdev_vlan_ipv6 are using mac so let's
remove it from func's args-list.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/qedr: fix spelling mistake: "invlaid" -> "invalid"
Colin Ian King [Thu, 24 Aug 2017 08:25:53 +0000 (09:25 +0100)]
RDMA/qedr: fix spelling mistake: "invlaid" -> "invalid"

Trivial fix to spelling mistake in DP_ERR error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB: Avoid ib_modify_port() failure for RoCE devices
Selvin Xavier [Wed, 23 Aug 2017 08:08:07 +0000 (01:08 -0700)]
IB: Avoid ib_modify_port() failure for RoCE devices

IB CM calls ib_modify_port() irrespective of link layer. If the
failure is returned, the mad agent gets unregistered for those
devices. Recently, modify_port() hook was removed from some of the
low level drivers as it was always returning success. This breaks
rdma connection establishment over those devices.
For ethernet devices, Qkey violation and port capabilities are not
applicable. So returning success for RoCE when modify_port hook is
is not implemented.

Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/vmw_pvrdma: Update device query parameters and port caps
Adit Ranadive [Wed, 23 Aug 2017 06:19:01 +0000 (23:19 -0700)]
RDMA/vmw_pvrdma: Update device query parameters and port caps

Added support for two device caps - max_sge_rd, max_fast_reg_page_list_len
and the IP_BASED_GIDS port cap flag.

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/vmw_pvrdma: Add RoCEv2 support
Bryan Tan [Wed, 23 Aug 2017 06:19:00 +0000 (23:19 -0700)]
RDMA/vmw_pvrdma: Add RoCEv2 support

The driver version is bumped for compatibility purposes. Also, send correct
GID type during register to device. Added compatibility check macros for
the device.

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/ipoib: Enable ioctl for to IPoIB rdma netdevs
Feras Daoud [Wed, 23 Aug 2017 05:37:21 +0000 (08:37 +0300)]
IB/ipoib: Enable ioctl for to IPoIB rdma netdevs

Adds support for ioctl callback in the RDMA netdevs to allow
supporting functions not handled by the generic interface code.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/nes: Remove zeroed parameter from port query callback
Leon Romanovsky [Thu, 17 Aug 2017 12:50:55 +0000 (15:50 +0300)]
RDMA/nes: Remove zeroed parameter from port query callback

There is no need to explicitly zero parameters, because
the structure requested to be filled already initialized to zeros.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mlx4: Properly annotate link layer variable
Leon Romanovsky [Thu, 17 Aug 2017 12:50:54 +0000 (15:50 +0300)]
RDMA/mlx4: Properly annotate link layer variable

The rdma_port_get_link_layer() returns enum rdma_link_layer as
a return value, hence it is better to store the return value in
specially annotated variable and not in int.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mlx5: Limit scope of get vector affinity local function
Leon Romanovsky [Thu, 17 Aug 2017 12:50:53 +0000 (15:50 +0300)]
RDMA/mlx5: Limit scope of get vector affinity local function

The mlx5_ib_get_vector_affinity() call is local to main.c file and there
is no need to be declared globally visible.

Fixes: 40b24403f33e ("mlx5: support ->get_vector_affinity")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rxe: Make rxe_counter_name static
Kamal Heib [Thu, 17 Aug 2017 12:50:52 +0000 (15:50 +0300)]
IB/rxe: Make rxe_counter_name static

rxe_counter_name is used in rxe_hw_counters.c only. Make it static.

Fixes: 0b1e5b99a48b ('IB/rxe: Add port protocol stats')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock
Erez Shitrit [Thu, 17 Aug 2017 12:50:50 +0000 (15:50 +0300)]
IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock

In order to avoid deadlock between sysfs functions (like create/delete
child) and remove_one (both of them are using the sysfs lock and
rtnl_lock) the driver will use a state mutex for sync.

That will fix traces as the following:
schedule+0x3e/0x90
kernfs_drain+0x75/0xf0
? wait_woken+0x90/0x90
__kernfs_remove+0x12e/0x1c0
kernfs_remove+0x25/0x40
sysfs_remove_dir+0x57/0x90
kobject_del+0x22/0x60
device_del+0x195/0x230
 pm_runtime_set_memalloc_noio+0xac/0xf0
netdev_unregister_kobject+0x71/0x80
rollback_registered_many+0x205/0x2f0
rollback_registered+0x31/0x40
unregister_netdevice_queue+0x58/0xb0
unregister_netdev+0x20/0x30
ipoib_remove_one+0xb7/0x240 [ib_ipoib]
ib_unregister_device+0xbc/0x1b0 [ib_core]
ib_unregister_mad_agent+0x29/0x30 [ib_core]
mlx4_ib_remove+0x67/0x280 [mlx4_ib]
INFO: task echo:24082 blocked for more than 120 seconds.
Tainted: G           OE   4.1.12-37.5.1.el6uek.x86_64 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
Call Trace:
schedule+0x3e/0x90
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x95/0x110
? _rcu_barrier+0x177/0x220
mutex_lock+0x23/0x40
rtnl_lock+0x15/0x20
netdev_run_todo+0x81/0x1f0
rtnl_unlock+0xe/0x10
ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
delete_child+0x69/0x80 [ib_ipoib]
dev_attr_store+0x20/0x30
sysfs_kf_write+0x41/0x50

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Check that reserved fields in mlx4_ib_create_qp_rss are zero
Guy Levi [Thu, 17 Aug 2017 12:50:49 +0000 (15:50 +0300)]
IB/mlx4: Check that reserved fields in mlx4_ib_create_qp_rss are zero

According to mlx4 convention, need to fail the command due to a non-zero
value in the user data which is expected to be zero.

Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Remove redundant attribute in mlx4_ib_create_qp_rss struct
Guy Levi [Thu, 17 Aug 2017 12:50:48 +0000 (15:50 +0300)]
IB/mlx4: Remove redundant attribute in mlx4_ib_create_qp_rss struct

rx_key_len is not in use and needs to be removed.

Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix struct mlx4_ib_create_wq alignment
Guy Levi [Thu, 17 Aug 2017 12:50:47 +0000 (15:50 +0300)]
IB/mlx4: Fix struct mlx4_ib_create_wq alignment

The mlx4 ABI defines to have structures with alignment of 64B.

Fixes: 400b1ebcfe31 ("IB/mlx4: Add support for WQ related verbs")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix RSS QP type in creation verb
Guy Levi [Thu, 17 Aug 2017 12:50:46 +0000 (15:50 +0300)]
IB/mlx4: Fix RSS QP type in creation verb

The mlx4 was designed to support QP type of MLX4_IB_QPT_RAW_PACKET.

Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add necessary delay drop assignment
Maor Gottlieb [Thu, 17 Aug 2017 12:50:45 +0000 (15:50 +0300)]
IB/mlx5: Add necessary delay drop assignment

Assign the statistics and configuration structure pointer on success.

Fixes: fe248c3a5837 ('IB/mlx5: Add delay drop configuration and statistics')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix some spelling mistakes
Talat Batheesh [Thu, 17 Aug 2017 12:50:44 +0000 (15:50 +0300)]
IB/mlx5: Fix some spelling mistakes

Fix spelling mistakes in remarks
    "retrun"->"return"
    "Decalring"->"Declaring"

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix some spelling mistakes
Talat Batheesh [Thu, 17 Aug 2017 12:50:43 +0000 (15:50 +0300)]
IB/mlx4: Fix some spelling mistakes

Fix spelling mistakes in remarks
    "retrun"->"return"
    "cancell"->"cancel"

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mthca: Make explicit conversion to 64bit value
Leon Romanovsky [Thu, 17 Aug 2017 12:50:42 +0000 (15:50 +0300)]
RDMA/mthca: Make explicit conversion to 64bit value

The "lg" variable is declared as int so in all places where
this variable is used as a shift operand, the output will be
int too.

This produces the following smatch warning:
drivers/infiniband/hw/mthca/mthca_cmd.c:701 mthca_map_cmd() warn:
should '1 << lg' be a 64 bit type?

Simple declaration of "1" to be "1ULL" will fix the issue.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/usnic: Fix remove address space warning
Leon Romanovsky [Thu, 17 Aug 2017 12:50:41 +0000 (15:50 +0300)]
RDMA/usnic: Fix remove address space warning

Sparse tool complains with the following error:
drivers/infiniband/hw/usnic/usnic_ib_main.c:445:16: warning: cast removes
address space of expression

Fix it by doing casting on correct field and convert function helper which
sets ifaddr to be void, because there are no users who are interested in
returned value.

Fixes: c7845bcafe4d ("IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mlx4: Remove gfp_mask argument from acquire_group call
Leon Romanovsky [Thu, 17 Aug 2017 12:50:40 +0000 (15:50 +0300)]
RDMA/mlx4: Remove gfp_mask argument from acquire_group call

All callers of acquire_group() passed the same gfp_mask to it
and it is safe to remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/core: Refactor get link layer wrapper
Leon Romanovsky [Thu, 17 Aug 2017 12:50:39 +0000 (15:50 +0300)]
RDMA/core: Refactor get link layer wrapper

The return values from rdma_node_get_transport() are strict
and IB_LINK_LAYER_UNSPECIFIED is unreachable in this flow.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/core: Delete BUG() from unreachable flow
Leon Romanovsky [Thu, 17 Aug 2017 12:50:38 +0000 (15:50 +0300)]
RDMA/core: Delete BUG() from unreachable flow

Remove call to BUG() in case wrong node_type was provided.
This flow is unreachable, because node_types are supplied
from specific enum.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/core: Cleanup device capability enum
Leon Romanovsky [Thu, 17 Aug 2017 12:50:37 +0000 (15:50 +0300)]
RDMA/core: Cleanup device capability enum

Cleanup patch prior exporting the ib_device_cap_flags
to the user space. In this patch, we are aligning the
indentation, removing IB_DEVICE_INIT_TYPE and IB_DEVICE_RESERVED
fields, because it is not used in the kernel.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/(core, ulp): Convert register/unregister event handler to be void
Leon Romanovsky [Thu, 17 Aug 2017 12:50:36 +0000 (15:50 +0300)]
RDMA/(core, ulp): Convert register/unregister event handler to be void

The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.

Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mlx4: Fix create qp command alignment
Maor Gottlieb [Thu, 17 Aug 2017 12:50:35 +0000 (15:50 +0300)]
RDMA/mlx4: Fix create qp command alignment

Avoid extra padding by replacing the order of inl_recv_sz and reserved,
otherwise 'mlx4_ib_create_qp' structure might be larger than legacy user
input leading to copy of some garbage data from the user space buffer.

Fixes: ea30b966f7dd ('IB/mlx4: Add inline-receive support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/mlx4: Don't use uninitialized variable
Leon Romanovsky [Thu, 17 Aug 2017 12:50:34 +0000 (15:50 +0300)]
RDMA/mlx4: Don't use uninitialized variable

Avoid usage of uninitialized variable.

Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/uverbs: Introduce and use helper functions to copy ah attributes
Parav Pandit [Thu, 17 Aug 2017 12:50:33 +0000 (15:50 +0300)]
IB/uverbs: Introduce and use helper functions to copy ah attributes

This patch introduces two helper functions to copy ah attributes
from uverbs to internal ib_ah_attr structure and the other way
during modify qp and query qp respectively.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Fix erroneous validation of supported default GID type
Leon Romanovsky [Thu, 17 Aug 2017 12:50:32 +0000 (15:50 +0300)]
IB/cma: Fix erroneous validation of supported default GID type

When rdma_cm is initializing a cma_device it checks if this device
supports the preferred default GID type. This check was done in a wrong way
and therefore sometimes rdma_cm is coming up with default GID type that is
not supported by the device.

Fix that by checking for supported GID type properly.

Fixes: 3c7f67d1880d ("IB/cma: Fix default RoCE type setting")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoMerge branch 'k.o/for-4.13-rc' into k.o/for-next
Doug Ledford [Thu, 24 Aug 2017 19:58:26 +0000 (15:58 -0400)]
Merge branch 'k.o/for-4.13-rc' into k.o/for-next

Pick up -rc fixes.

Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Always return success for RoCE modify port
Majd Dibbiny [Wed, 23 Aug 2017 05:35:42 +0000 (08:35 +0300)]
IB/mlx5: Always return success for RoCE modify port

CM layer calls ib_modify_port() regardless of the link layer.

For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix Raw Packet QP event handler assignment
Majd Dibbiny [Wed, 23 Aug 2017 05:35:41 +0000 (08:35 +0300)]
IB/mlx5: Fix Raw Packet QP event handler assignment

In case we have SQ and RQ for Raw Packet QP, the SQ's event handler
wasn't assigned.

Fixing this by assigning event handler for each WQ after creation.

[ 1877.145243] Call Trace:
[ 1877.148644] <IRQ>
[ 1877.150580] [<ffffffffa07987c5>] ? mlx5_rsc_event+0x105/0x210 [mlx5_core]
[ 1877.159581] [<ffffffffa0795bd7>] ? mlx5_cq_event+0x57/0xd0 [mlx5_core]
[ 1877.167137] [<ffffffffa079208e>] mlx5_eq_int+0x53e/0x6c0 [mlx5_core]
[ 1877.174526] [<ffffffff8101a679>] ? sched_clock+0x9/0x10
[ 1877.180753] [<ffffffff810f717e>] handle_irq_event_percpu+0x3e/0x1e0
[ 1877.188014] [<ffffffff810f735d>] handle_irq_event+0x3d/0x60
[ 1877.194567] [<ffffffff810f9fe7>] handle_edge_irq+0x77/0x130
[ 1877.201129] [<ffffffff81014c3f>] handle_irq+0xbf/0x150
[ 1877.207244] [<ffffffff815ed78a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 1877.214829] [<ffffffff815f434f>] do_IRQ+0x4f/0xf0
[ 1877.220498] [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
[ 1877.227025] <EOI>
[ 1877.228967] [<ffffffff814834e2>] ? cpuidle_enter_state+0x52/0xc0
[ 1877.236990] [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
[ 1877.243676] [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
[ 1877.249831] [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
[ 1877.256513] [<ffffffff815cfee1>] start_secondary+0x265/0x27b
[ 1877.263111] Code: Bad RIP value.
[ 1877.267296] RIP [< (null)>] (null)
[ 1877.273264] RSP <ffff88046fd63df8>
[ 1877.277531] CR2: 0000000000000000

Fixes: 19098df2da78 ("IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Avoid accessing non-allocated memory when inferring port type
Noa Osherovich [Wed, 23 Aug 2017 05:35:40 +0000 (08:35 +0300)]
IB/core: Avoid accessing non-allocated memory when inferring port type

Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
 * During ib_register_device, each port is checked for its type which
   is stored in ib_device's port_immutable array.
 * During uverbs' modify_qp, the type is inferred using the port number
   in ib_uverbs_qp_dest struct (address vector) by accessing the
   relevant port_immutable array and the type is passed on to
   providers.

IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to a non-allocated
memory when inferring the port type.

Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.

Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.

Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agordma: Autoload netlink client modules
Jason Gunthorpe [Mon, 14 Aug 2017 20:57:39 +0000 (14:57 -0600)]
rdma: Autoload netlink client modules

If a message comes in and we do not have the client in the table, then
try to load the module supplying that client using MODULE_ALIAS to find
it.

This duplicates the scheme seen in other netlink muxes (eg nfnetlink).

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agordma: Allow demand loading of NETLINK_RDMA
Jason Gunthorpe [Mon, 14 Aug 2017 20:57:38 +0000 (14:57 -0600)]
rdma: Allow demand loading of NETLINK_RDMA

Provide a module alias so that if userspace opens a netlink
socket for RDMA the kernel support is loaded automatically.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: use kvmalloc_array to allocate wrid
Li Dongyang [Wed, 16 Aug 2017 13:31:23 +0000 (23:31 +1000)]
IB/mlx4: use kvmalloc_array to allocate wrid

We could use kvmalloc_array instead of the
kmalloc and __vmalloc combination.
After this we don't need to include linux/vmalloc.h

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: use kvmalloc_array for mlx5_ib_wq
Li Dongyang [Wed, 16 Aug 2017 13:31:22 +0000 (23:31 +1000)]
IB/mlx5: use kvmalloc_array for mlx5_ib_wq

We observed multiple times on our Lustre OSS servers that when
the system memory is fragmented, kmalloc() in create_kernel_qp()
could fail order 4/5 allocations while we still have many free pages.

Switch to kvmalloc_array() to allow the operation to contine.

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA: Fix return value check for ib_get_eth_speed()
Selvin Xavier [Thu, 17 Aug 2017 14:58:07 +0000 (07:58 -0700)]
RDMA: Fix return value check for ib_get_eth_speed()

ib_get_eth_speed() return 0 on success. Fixing the condition checking
and prevent reporting failure for query_port verb.

Fixes: d41861942fc5 ("Add generic function to extract IB speed from netdev")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/pvrdma: Remove unused function
Yuval Shaia [Thu, 10 Aug 2017 21:12:11 +0000 (00:12 +0300)]
IB/pvrdma: Remove unused function

The function pvrdma_idx_ring_is_valid_idx is not in used so let's remove
it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoi40iw: Improve CQP timeout logic
Shiraz Saleem [Wed, 9 Aug 2017 01:38:45 +0000 (20:38 -0500)]
i40iw: Improve CQP timeout logic

The current timeout logic for Control Queue-Pair (CQP) OPs
does not take into account whether CQP makes progress but
rather blindly waits for a large timeout value, 100000 jiffies
for the completion event. Improve this by setting the timeout
based on whether the CQP is making progress or not. If the CQP
is hung, the timeout will happen sooner, in 5000 jiffies. Each
time the CQP progress is detetcted, the timeout extends by 5000
jiffies.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Add kernel receive context info to debugfs
Kaike Wan [Sun, 13 Aug 2017 15:09:04 +0000 (08:09 -0700)]
IB/hfi1: Add kernel receive context info to debugfs

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>