www.infradead.org Git - users/jedix/linux-maple.git/log

xsigo: supported SGE's for LSO QP

Orabug: 25029868

PSIF supports 15 SGE's in case LSO QP linearize skb
if number of frags is greater than allowed sge's

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Hardening driver in handling remote QP failures

Orabug: 24929076

Handle scenario's where PSIF generates batched
transmit completions.

In case of Remote QP disconnect follow this sequence
-> If there are any pending transmit completion
explicitly transition QP state to ERROR.
-> Wait for a maximum of 10 seconds for all pending
completions
(10 seconds derived from retrycount * local ack timeout)
Destroy QP
Wait for another 10 seconds(max) if completions are
not returned by hardware.

Synchronized calls to poll_tx

Added more efficiency in handling of TX queue full condtion.

Handle scenario where uVNIC removal can come in batches

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: Ariel cohen <ariel.x.cohen@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

IB/cm: avoid query device in CM REQ/REP

The query device needed in CM REQ/REP is a bit expensive since
it involves a MAD query and also it is not saved in the cache.
When the driver that holds the local device is different from
sif then there is no need to go through the query device. If
sif driver is identified, then we still need to go through the
query device in order to get the specific vendor id. This last
is to make sure the software workaround is applied only to the
PSIF revisions that are affected.

This patch filters those cases and avoids unnecessary MAD queries
when driver is different from sif.

Orabug: 24785622

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>

IB/cm: return original rnr value when RNR WA for PSIF

With this patch, the original min_rnr_value set by the user is saved
in case it is later queried. The ib_qp flag has been re-purposed to
store the value in addition. This patch makes the RNR WA implementation
total transparent for the user.

Orabug: 24785622

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>

IB/cm: MBIT needs to be used in network order

This patch fixes the MBIT big endian portability.

Orabug: 24785622

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>

IB/core: Issue DREQ when receiving REQ/REP for stale QP

from "InfiBand Architecture Specifications Volume 1":

  A QP is said to have a stale connection when only one side has
  connection information. A stale connection may result if the remote CM
  had dropped the connection and sent a DREQ but the DREQ was never
  received by the local CM. Alternatively the remote CM may have lost
  all record of past connections because its node crashed and rebooted,
  while the local CM did not become aware of the remote node's reboot
  and therefore did not clean up stale connections.

and:

   A local CM may receive a REQ/REP for a stale connection. It shall
   abort the connection issuing REJ to the REQ/REP. It shall then issue
   DREQ with "DREQ:remote QPNâ\80\9d set to the remote QPN from the REQ/REP.

This patch solves a problem with reuse of QPN. Current codebase, that
is IPoIB, relies on a REAP-mechanism to do cleanup of the structures
in CM. A problem with this is the timeconstants governing this
mechanism; they are up to 768 seconds and the interface may look
inresponsive in that period.  Issuing a DREQ (and receiving a DREP)
does the necessary cleanup and the interface comes up.

Orabug: 24806985

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

sif: cq: cleanup cqe once a kernel qp is destroyed/reset

To ease the sqflush/rqflush workaround, sifdrv removes
all the associated cqes once a kernel qp is destroy/reset.

Even though IB specification 10.2.4.4 mentioned that
"Destroying a QP does not guarantee that CQEs of that
QP are deallocated from the CQ upon destruction.",
it also stated that "Even if the CQEs are already on
the CQ, it might not be possible to retrieve them"

Thus, IB spec is indicating that it is vendor specific
implementation and the ULP should not assume that the
cqes are in the cq once a qp is destroyed/reset.

Orabug: 25070316

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: cq: sif_poll_cq might not drain cq completely

In a scenario where a duplicate completion is detected, the sif_poll_cq
skips the remaining cqes in the cq. This code bug is introduced in
commit "sif: sqflush: Handle duplicate completions in poll_cq".

Orabug: 25038711

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: rq: do not flush rq if it is an srq

sifdrv needs to flush a regular rq (non-srq) once it
detects that a qp is transitioned into ERR state.
Nevertheless, if a qp is created with srq and
with no event handler, the srq might be accidentally
flushed once the qp is transitioned into ERR sate.

Orabug: 25071205

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: cq: use refcnt to disable/enable cq polling

Due to a hardware bug, sifdrv must clean up the cq
before it is being polled by the user. Thus, sifdrv
uses CQ_POLLING_NOT_ALLOWED bitmask to disable/enable
cq polling.

Nevertheless, the bit mask operation is not sufficient in
a shared cq scenario (many qps to one cq). The cq clean up
is performed by each qp and it might be performed concurrently.
As a result, the cq polling might be enabled before all qps
have clean up the cq.

Thus, this patch uses refcnt to disable/enable cq polling.
The CQ_POLLING_NOT_ALLOWED bitmask is kept in order to have
backward compatibility in the user library.

Orabug: 25038731

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: pt: Add support for single thread modified page tables

Modifications to the page tables in sif_pt is protected by
a lock to allow multiple threads to add and subtract regions
to/from the page table in parallel. This functionality is
currently only needed/used by the special sq_cmpl page table
handling. In the future we might however need this also for
other cases, for instance to optimize further on page table
memory usage.

The kernel documentation for infiniband midlayer locking
requires that map_phys_fmr should be callable from any context.
This prevents us from blocking on a lock, something that happens
if there are contention for the lock (eg. more than one thread
involved in modifying the page table)

Implement another flag: thread_safe in a pt that determines
if a page table is going to need to be modified from multiple
threads simultaneously. For now keep a BUG_ON if the code
is attempted accessed in parallel for memory types
that should not ever see parallel access.

Orabug: 24836269

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Trivino-Garcia <francisco.trivino@oracle.com>

sif: pqp: Implement handling of PQPs in error.

The assumption is that any such situation that can arise in
production is due to an application that causes it's CQ to go
to error and where the PQP subsequently tries to post a CQ
operation that affects the CQ that is in error. In these cases,
the PQP itself goes to error and an event is generated.

This commit refactors the modify_qp logic slightly, as well as
implementing a modification cycle to bring a privileged QP
back up again. It also adds a new pqp debugfs file and some statistics
to help monitoring the new PQP specific state as well.

The resurrect operation is queued on the sif workqueue
by the new handle_pqp_event function, which is now properly
wired up to accept all PQP events. When a PQP is detected as
being in error, its last_set_state is updated, and in addition
the write_only flag is set, which causes new send reqs not to
touch any collect buffer as part of the operation.
This flag was introduced to allow the resurrect to set the PQP in
RTS again while still not triggering any sends.

This way the implementation allows clients to continue to
post requests to the PQP while it is in error or in transition
back to RTS again by just accepting these requests into the PQP
send queue without any writes to the collect buffer.

When in the INIT state, the resurrect worker updates
the SQ pointers to skip the request that triggered
the PQP error.

Once back in RTS, the resurrect worker can take the single SQ lock
which serializes posts, check the size of the send queue
and if >= 0, trigger the send queue scheduler to start processing these.
Once the QP is in SQS mode, or just idle if the queue was empty,
it is safe for ordering purposes to let normal posting with
collect buffer writes commence.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Trivino-Garcia <francisco.trivino@oracle.com>

sif: eps*: initialize each struct in array

explisitly zero all the structs, set and retain also for epsa0
which was (unintentionally) zeroed also when looping through rest
of the eps entries.

Make sure to set members of the struct after initialization.

Orabug: 25033224

Signed-off-by: George Refseth <george.refseth@oracle.com>
Reviewed-by: Åsmund Østvold <asmund.ostvold@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: query_device: Return correct #SGEs for EoIB

ULPs that use LSO, can only create QPs with #SGE entries being one
less than what is supported in HW. This because PSIF uses one entry
for the LSO stencil. The driver attempts to detect the ULP and from
that derive if LSO will be used.

This commit adds support for XVE (Xsigo Virtual Ethernet), and a
query_device() from the XVE driver (not the connected mode part), will
return one less #SGEs.

Further, since the XVE ULP is detected, we amend the sysfs listing of
QPs with EoIB (for XVE datagram mode QPs) and EoIB_CM (for XVE
connected mode QPs).

Orabug: 25027106

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: LSO not supported for EoIB queuepairs

Due to missing decoding of create_flags, LSO is not enabled for EoIB
queuepairs. Regression was introduced in 4.1.12-71 kernel

Orabug: 25026132

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: pqp: Make setup/teardown function ref sif_pqp_info directly

Take advantage of the new, cleaner and more separate PQP data structures:
Simplify/abstract pqp setup/teardown by pointing into sdev
for a direct ref to the sif_pqp_info data structure.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: Move the rest of the pqp setup and teardown to sif_pqp

This finishes the restructure started in the previous commit,
to consolidate PQP handling logic to make it simpler
to extend without spreading the complexity to other parts
of the code.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: Move sif_dfs_register beyond base init

The debugfs setup must be initialized prior to PQP operation,
but must also be deinitialized before base table takedown,
otherwise we are exposed to faults due to a race condition between
a user accessing debugfs tables and driver unload.

This commit moves the dfs init/deinit from sif_probe to
sif_hw_init to achieve this order.

Orabug: 24971465

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: Refactor PQP state out of sif_dev.

Prepare handling of PQPs for the additional complexity
of resurrect upon errors and query for undetected
error conditions by moving some of the state
now directly in sif_dev into a new sif_pqp_info
struct.

Already complex logic for PQP handling is going to
increase in complexity. We need to consolidate
to be able to keep clean and easily maintainable
interfaces.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>

xsigo: send nack codes

Orabug: 24442792

Sometime uVNIC removal on OFOS won't trigger a actual removal
of Vstar interface, in that case uVNIC driver has to send NACK
code so that XCM will start cleaning its database.

Added additional codes as per XCM specification

Reported-by: jie zhu <jie.x.zhu@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Qingjun Wang <qingjun.wang@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: xve driver has excessive messages

Orabug: 24758335

Moved some message types from Warning to debug.

Consolidated multiple messages into single to avoid
flooding of messages on console

Added more counters to identify state of vnic.

Added a debug type xve_info

Reported-by: chien yen <chien.yen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: hard LOCKUP in freeing paths

Orabug: 24669507

When path->users becomes zero uVNIC driver starts
cleaning up the Forwarding table entries.

In some corner cases the call is invoked from transmit
function which is in interrupt context and that results
in a hard LOCKUP.

With new changes path->users is decremented in transmit
function to allow cleanup to happen from other thread.
Proper care is taken to avoid race between these
two contexts.

Reported-by: chien yen <chien.yen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Crash in xscore_port_num

Orabug: 24760465

When Server Profile context is not present
xcpm_get_xsmp_session_info returns error and uVNIC
driver has to handle that conditions

Reported-by: scarlett chen <scarlett.chen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Resize uVNIC/PVI CQ size

Orabug: 24765034

uVNIC/PVI should avoid CQ overflow condition

Resize CQ's to 16k to handle multiple connections
flushed simultaneously per path.

Increase Send Queue and receive Queue to 2k for
better performance.

Added counters to print CQ sizes.
Added stats to count RC completions.

Reported-by: scarlett chen <scarlett.chen@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: Aravind Kini <aravind.kini@oracle.com>
Reviewed-by: viswa krishnamurthy <viswa.krishnamurthy@oracle.com>
Reviewed-by: Manish Kumar Singh <mk.singh@oracle.com>
Reviewed-by: UmaShankar Tumari Mahabalagiri <umashankar.mahabalagiri@oracle.com>

xsigo: Optimizing Transmit completions

Orabug: 24928865

Added a timer for polling Transmit completion and
removed polling completion from a thread context.

Seeing Good Performance improvments with the changes.
In some cases uVNIC is seeing 10% increase in throughput

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

xsigo: Implementing Jumbo MTU support

Orabug: 24928804

With Titan and Saturn supporting Jumbo Infiniband frames
uVNIC can have MTU greater than 4k and upto 10k.

Allocate multiple pages for Receive descriptors code changes
for handling multiple page mapping and unmapping.

Took proper care for enabling Jumbo MTU only for Titan and only
in EoiB mode.

If Jumbo MTU is used for non-Titan cards uVNIC driver will NACK
the Install and OFOS will display a failure message for the
install.

Added stats to display Jumbo & removed legacy EoiB HeartBeat code.

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

RDS: rds debug messages are enabled by default

rds use Kconfig option called "RDS_DEBUG" to enable rds debug messages.
This option cause the rds Makefile to add -DDEBUG to the rds gcc command
line.

When CONFIG_DYNAMIC_DEBUG is enabled, the "DEBUG" macro is used by
include/linux/dynamic_debug.h to decide if dynamic debug prints should
be sent by default to the kernel log.

rds should not enable this macro for production builds.

Orabug: 24956522

Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

net/rds: Fix new sparse warning

c0adf54a109 introduced new sparse warnings:
  CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
net/rds/ib_cm.c:194:51: warning: cast to restricted __be64

The temporary variable for sequence number should have been declared as __be64
rather than u64. Make it so.

Orabug: 24817685

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2783717a71e9babfdd7c36c7e35b790d2c01022)
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

net/rds: fix unaligned memory access

rdma_conn_param private data is copied using memcpy after headers such
as cma_hdr (see cma_resolve_ib_udp as example). so the start of the
private data is aligned to the end of the structure that come before. if
this structure end with u32 the meaning is that the start of the private
data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are
naturally aligned but in case the structure start is not 8 bytes aligned,
all u64 members of this structure will not be aligned. to solve this issue
we must use special macros that allow unaligned access to those
unaligned members.

Addresses the following kernel log seen when attempting to use RDMA:

Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]

Orabug: 24817685

Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
[Minor tweaks for top of tree by:]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c0adf54a10903b59037a4c5fcb933dfeeb7b2624)
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

mlx4_ib: remove WARN_ON() based on incorrect assumptions

A WARN_ON() was inserted when user data was introduced to the
ibv_cmd_alloc_shpd() by another infiniband provider to make
sure that no user data is sent to older providers such as this
one which do not expect it.

It assumed when no user data is sent the udata->inlen is zero.
The user-kernel API however always sends at least 8 octets
(which may not be initialized in case of provider libraries that
do not user user data).

We remove the WARN_ON and rely on providers not to touch that
field if their companion library is not expected to initialize it.

Orabug: 24972331

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Revert "ib/mlx4: Initialize multiple Mellanox HCAs in parallel"

This reverts commit a661980d2a809dbe208914b8eec46c18b78c18fb.

Orabug: 24951493
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

mlx4_core/ib: set the IB port MTU to 2K

'commit 096335b3f983 ("mlx4_core: Allow dynamic MTU configuration for IB
ports")' overwrite the default port MTU and sets it as 4K. Since this
directly impacts the HW VLs supported and Oracle workloads heavily uses
all supported 8 VLs for traffic classification, 2K default needs to
be kept as is.

We initilise it to default 2k so that the feature(dynamic MTU configuration)
is still available for non DB users to set the desired MTU value using sysctl.

Also for CX2 cards, commit 596c5ff4b7b3 ("net/mlx4: adjust initial
value of vl_cap in mlx4_SET_PORT") broke the vl_cap which made the
supported VLs to 4 irrespective of MTU size.

Orabug: 24946479

Tested-by: Pierre Orzechowski <pierre.e.orzechowski@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

sif: cq: transfer headroom attribute to user mode

This commit makes sure old libsif versions works
with the driver while providing a forward compatible
way of making additional changes to the extra
headroom in the CQs.

We anticipate to be able to trim
down the extra entries once we have PQP errors
handled transparently. This commit then ensures that
the headroom is only set in one place, at the
driver side, and that user mode just can
pick up the configured headroom from the kernel.
This is done by providing the used headroom
in a formerly reserved 32 bit field, thus no changes
to the packet size is necessary.

Nevertheless we increment the abi version from
3.6 to 3.7 to allow libsif to detect whether
the headroom field can be trusted.

Orabug: 24926265

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: franklin <osl04sys_no_grp@oracle.com>

sif: Minor cleanup commit

- Remove some unused variables
- Minor changes due to bug fixes in hardware header
file generator

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

sif: Add vendor flag to support testing without oversized CQs

After introduction of extra CQ entries to reduce risk of
having duplicate completions overflow a CQ, we no longer can
trigger various CQ overflow scenarios without running a lot of
requests. We need to be able to test with a minimal set of operations
to allow co-sim based tests for further analysis.

Introduce a new vendor_flag no_x_cqe = 0x80 to turn off
the allocation of extra CQEs.

Orabug: 24919301

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Triviño <francisco.trivino@oracle.com>

sif: cq: Fix the max_cqe capability supported by SIF

Orabug: 24673784

This patch fixes an incomplete patch in commit "cq: Add
additional SIF visible cqes to CQ". The max_cqe
capability reported by query_device is incorrect because
it includes the SIF visible cqes.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp_attr: Fix qp attributes for query_qp verb

Orabug: 21946858

Following QP attributes were incorrectly reported:
1) max_rd_atomic
2) service level
3) alternate pkey index
4) alternate ack timeout
5) alternate address handle

The initial commit with the same title was somehow probably
subject to a merge issue and it's effect got lost entirely
by a subsequent patch.

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Fix modify_qp_hw from SQE to RTS

Orabug: 24810237

Fix an issue in modify_qp_hw from SQE to RTS returns
EPSC_MODIFY_CANNOT_CHANGE_QP_ATTR. In sif, qp transition
from SQE to RTS must explicitly set the req_access_error
to 0.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: pd: Implement Oracle ib_core compliance shared pd

Orabug: 24713410

shared pd is not an IBTA defined feature, but an Oracle
Linux extension. Even though PSIF can share a pd easily,
it must comply with the Oracle ib_core implementation which
requires a new pd "object" when reusing a pd (via share_pd
verbs).

Without a new pd "object", it causes a NULL pointer deference
during pd clean-up phase. Thus, this patch creates a new pd
"object" when reusing a pd, and this pd "object" is pointing
to the original pd index.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: eq: Add timeout to the threaded interrupt handler

This commit implements a timeout that prevents soft lockup issues when
the threaded interrupt function (sif_intr_worker) keeps processing
events for a long period. If the timeout is reached, the threaded
handler returns IRQ_HANDLED even if there are more events to be
processed. In such a case, the coalescing mechanism will generate
an IRQ for the last event.

Orabug: 24839976

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

IB/mlx4: Scatter CQs to different EQs

If the user does not request a specific comp vector, use a weight based
algorithm to set the EQ.

Orabug: 24705943

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: fix panic with handlers running post teardown

Shutdown cqe reaping loop takes care of emptying the
CQ's before they being destroyed. And once tasklets are
killed, the hanlders are not expected to run.

But because of core tasklet BUG, tasklet handler could
still run after tasklet_kill which lead can lead to kernel
panic. Fix for core tasklet code was proposed and accepted
upstream, but it comes with bagage of fixing quite a
few bad users of it. Also for receive, we have additional
kthread to take care.

The BUG fix done as part of Orabug 2446085, had an additional
assumption that reaping code won't reap all the CQEs after
QP moved to error state which was not correct. QP is
moved to error state as part of rdma_disconnect() and
all the CQEs are reaped by the loop properly.

Any handler running after above and trying to access the
qp/cq resources gets exposed to race conditions. Patch
fixes this race by makes sure that handlers returns
without any action post teardown.

Orabug: 24460805

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: ib: build fix rds_conn_drop() takes extra parameter now

rds_conn_drop() now takes reason as a parametr. Fixes
commit 052f62099 build issue

Orabug: 22506032

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: Drop the connection as part of cancel to avoid hangs

To avoid waiting indefinitely in rds_send_drop_to(), drop the connection
proactively if one of the cancelled messages is mapped.

Orabug: 22506032

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: add reconnect retry scheme for stalled connections

RDS IB connections gets stalled at times and letting the connections
take its sweet time to reconnect. On passive side, we wait for 15 seconds
for such stalled connections which is too slow based on application
IO timeouts. IB connections are established in milliseconds so we better
drop these stuck connections early and retry.

The retry timeout is kept tunable via reconnect_retry_ms sysctl. The
upper bound for retries is tunbale via rds_sysctl_reconnect_max_retries.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: restore the exponential back-off scheme

Lower IP and exponential back-off scheme was added to save the
SM queries because of races but it doesn't do what its intended.
The exponential back-off scheme does a good job of backing off
for races. The code just falls back to the original scheme.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: avoid duplicate connection drop for self loopback

For self-IB loopback is special mode and the c_passive conn is just a
place holder to stick the the second QP.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: don't modify conn state directly in rds_connect_complete

Avoid modifying the conn state directly and let
the APIs handle it to be consistent across.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: log associates connection details for setup failures

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: suppress log prints for FLUSH_ERR/RETRY_EXC

Flush errors are normal while draining the QP and retry exceeded
errors are normal for RC connections with finite transport retry.
No need so flood the log file. RDS in-memory trace already logs it.

Orabug: 22347191, 24663803

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

ipoib: supress the retry related completion errors

IPoIB doesn't support transport/rnr retry schemes as per
RFC so those errors are expected. No need to flood the
log files with them.

It was tempting make it as upstream(debug mode only) but
since we have this erros in Oracle IB stack for while,
leaving rest of them to be logged.

Orabug: 24663803, 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reported-by: Rajiv Raja <rajiv.raja@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: use c_wq for all activities on a connection

RDS connection work, send work, recv work etc events have been
serialised by use of a single threaded work queue. For loopback
connections, we created a separate thread but only connection
work is moved on it. This actually under utilises the thread
and creates un-necessary contention for send/recv work and
connection work for loopback connections.

We move remainder loopback work as well on the rds_local_wq
which garantees serialisation as well as delinks the loopback
and non loopback work(s).

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: Avoid double reject on ACL failures

We end up sending double reject on ACL failures. Fix it.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: make the rds_{local_}wq part of rds_connection

Instead of sprinkling if/else for loopback all over the place,
lets just add c_wq as part of rds_connection. This will prevent
missing cases like 'commit edca33be359c ("RDS: move more queing for
loopback connections to separate queue"), 'commit 8502173071b6
("rds: schedule local connection activity in proper workqueue")
or any future changes.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: make rds_conn_drop() take reason argument

This removes the need of modifying the conn all over the place
and moves it inside rds_conn_drop(). Actually there is almost
no need to carry the 'c_drop_source' information as part of
rds_connection but since shutdown thread wants to log this
info for debug, the field is left as is for now.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: Remove unused PATH migration event code

These events were used for APM which is removed already from the code.
Remove this remainder '#if 0' code and migrate handler.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: remove delayed queuing of address change

There is no good reason to delay the address change event. Remove
the delayed work and use the function directly.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDMA CM: init the return value to avoid false negative

Garbage return value can be seen as an error which is false negative

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: use address change event for failover/failback

The RDS active bonding code had few fundamental bugs which
have been addressed so the workaround(s) added can be removed.
These workaround are creating problems and races in connection
management code leading to occasional connections stalls.
Removal of these makes the code behavior predictable and clean.

Few notables fixes to mention here:
- Taking care of all layers of events for ports before
marking them up/down.
- ARP cache related fixes.
- Local loopback connection hang fix

Patch almost make RDS active bonding failover/failback path
as intended from the beginning.
i.e On failover/failback, the address change events needs to
be sent which then triggers all the necessary events to
re-establish the connection(s).

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: drop workaround for loopback connection hangs

There is no need to modify the ARP cache directly under the
assumption that it helps to speed up the failover/failback for
loopback connections. The ARP cache is properly updated by core
IPv4 code and one of the issue with RDS active bonding code
with arp has been addressed as part of
'commit 42a7becc725f ("RDS: IB: Make use of ARPOP_REQUEST instead
of ARPOP_REPLY in bonding code")'. Remove the workaround added as
part of bug 16979994 with patch "RDS: Local address resolution may
be delayed after IP has moved"

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Revert "RDS: IB: skip rx/tx work when destroying connection"

This reverts commit cb9e35124534e0803606ac58b03390d3f3d83dad.

Orabug: 24746103

xsigo: EoiB QP support

Orabug: 24508359

1) Enable EoiB QP property for uVNIC's created on Titan
   card using is_eoib variable.
2) Use qkey provided by OFOS for EoIB QP, this QKEY comes
   as part of VNIC Install message.
3) If Titan Card has TSO enabled then enable TSO using
   NETIF_F_TSO flag and notify upstream network stack about this
4) If Titan has Checksum offload feature then notify network stack.
   about this and on each WR set IB_SEND_IP_CSUM bit.
5) Added some Debug Flags
6) Printing Qkey information and EoiB information in proc stats

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

xsigo: Send Heart Beat Lost Operational state

Orabug: 23032392

In Case of Heart Beat loss due to saturn or Multicast issues
uVNIC driver needs to send XVE_NOTIFY_HBEAT_LOST as a part of
Operational Request to OFOS . This will enable OFOS to perform
appropriate actions

Reported-by: Suyi Shao <suyi.shao@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

sif: Retest last allocated entry with roundrobin allocation

Current codebase tests next element if it is unused(freed) when allocating
round-robin. In certain cases where an element is allocated and then
deallocated immediately it is convenient to reuse the element.

Orabug: 24761759

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: cq: Implement error tracking

This commit introduces a new state variable 'in_error'
in the CQ state. When a CQ error event is detected
cq->in_error is set.

This state is then checked before
* posting a req_notify_cq
* invoking any of the completion generating
workarounds.

Also set qp->last_set_state to ERR if a fatal QP event
is detected to reduce further posting on that QP.

This commit reduces the risk of getting into a privileged
QP error scenario for some common cases of misbehaved
applications, and also enables the driver to terminate more
quickly if a priv.QP error has occurred.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: sqflush: Fix wrong casting in the calculation of CQ full

Orabug: 24735772

The CQ full is calculated via last posted seq number (last_seq)
minus the last completion seq number (head_seq). Both last_seq
and head_seq are defined as u16. However, in the calculation to
verify that the CQ is not full, a wrong casting is performed.
This causes a false negative of CQ full in the wrapped case.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Re-factor initializing of HW QP state

When a QP is created, the HW state is zeroed, then certain fields are
initialized. After modify_qp(), other fields are set. When a QP is
handed over to HW, HW will potentially modify parts of the QP
state. The QP will eventually be transitioned into the RESET state.

From RESET, it is legitimate to resurrect the QP.

It is imperative that the QP state is equal the state when it was
created when transitioned to RESET, in the case it will be resurrected

For any state to RESET transition, the IBTA specification states: "QP
attributes are reset to the same values after the QP was created."

This commit re-factors create_qp() and reset_qp() so the driver
adheres to the specification.

Orabug: 24747392

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: cq: Add additional SIF visible cqes to CQ

Orabug: 24673784

Due to two HW bugs, SIF needs to add additional cqes apart from
the requested N cqes. First issue is that the CQ cannot be full
when it is being invalidated. Hence, we need 1 extra entry.

The second issue is that HW might generate duplicate completions.
Thus, SIF needs 768 extra entries to cater for these duplicate
completions and 1 additional entry for the fence completion.

Then, SIF driver rounds up to the nearest 2^N. The number of cqes
available to the ulp/user, will be the above 2^N - (768 + 1 + 1).

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp: Clear the QP state cq_int_err bit upon reset

During reset of a QP, the cq_int_err bit was not reset.
This would cause a subsequent transition to RTR to fail
and hardware to set the QP back in error again.

Orabug: 24708282

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp_attr: Fix qp attributes for query_qp verb

Orabug: 21946858

Following QP attributes were incorrectly reported:
1) max_rd_atomic
2) service level
3) alternate pkey index
4) alternate ack timeout
5) alternate address handle

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp_attr: Fix qp attributes for modify_qp verb

Orabug: 24669222

Local ACK timeout were incorrectly set and reported.

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: eq: Let compiler handle endianess and memory model

Handle port number, a u8 entity in struct ib_event in a platform agnostic way

This bug causes a crash/kernel panic on a (psif-)ib-connected SPARC at boot

Orabug: 24702857

Signed-off-by: George Refseth <george.refseth@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: cq: do not return errors from poll_cq

Orabug: 23321166

Remove returning errors from sif_poll_cq function. On the opposite, log the
error and skip the CQEs and continue with the next CQE.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Collapsed two log statements + removed incorrect port number print

With debug level bit 0x2000 (SIF_QP) set, the driver logs QP creation
info. Two log statements logging the same information are
collapsed. Also, incorrect logging of port number is removed for all
QPs except QP0/1 , which are the only QPs that have valid port number
at their creation time.

Orabug: 24695066

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Tested-by: Francisco Trivi?o-Garcia <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: Avoid using SIFMT_2M for allocation of any tables in no_huge_page mode

The feature mask no_huge_pages, enabled for Xen due to
DMA address alignment issues with huge pages, did not apply
to allocation of CQs, RQs, and SQs, only to the tableworks.
This causes allocation of queues of these types
larger than 4M in total size to fail on Xen PV domains
such as dom0.

Orabug: 24683830

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp: Adjust EoIB qp inline size to support LSO

Handle EoIB in same way as IPoIB in create_qp. To support LSO we
need a minimum inline size, the code will adjust inline size
accordingly.

Orabug: 24672908

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: sqflush: set the duplicated CQ entry status as DUPL_COMPL_ERR

Orabug: 24652927

Mark the duplicate completion status as PSIF_WC_STATUS_DUPL_COMPL_ERR if
the additional/duplicate completions are detected by walk_and_update_cqes.
Then, the translate_wr_id can identify the duplicate completion during
sif_poll_cq.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: cq: fixup the CQEs when a QP is transitioned to RESET

Orabug: 24652927

The sif_fixup_cqes function, which update the wr_id from the SQ handle, is
moved to reset_qp to cover the scenario where IB user reuses a QP after
performing ib_modify_qp(RESET). This patch also handles a scenario in
sif_fixup_cqes where a QP has been reset multiple times but the IB user has
not polled the associated CQ completely.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: Implement threaded interrupt handler

The handler function (sif_intr) for a single interrupt is mostly
processing completions notification events (CNE) as long as there
are events in the queue. Sometimes the CNEs are received at a higher
rate than the handler is able to process them, then it keeps
infinitely processing events until the queue might be full, which
leads to a fatal error, or the watchdog triggers a kernel panic,
as shown in orabug 24657844.

This commit replaces request_irq by request_threaded_irq, which
allows the driver to specify a threaded handler (sif_intr_worker)
in addition. The original handler function (sif_intr) is called
in hard interrupt context and can return IRQ_HANDLED if the timeout
SIF_IRQ_HANDLER_TIMEOUT is not exceeded or IRQ_WAKE_THREAD otherwise.

The flag IRQF_ONESHOT is used to ensure that the interrupt is
disabled when IRQ_WAKE_THREAD is returned.

Orabug: 24657844

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: remove check_all_eqs_on_intr driver feature

The feature check_all_eqs_on_intr is no longer needed. This commit
removes this driver feature and hence simplifies the interrupt
handler implementation.

Orabug: 24665085

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: Add max irq handling time to the sysfs eq table

Track the max time (in ms) that has been recorded for dispatching events
(dispatch_eq) on interrupt handling (sif_intr).

Orabug: 24657844

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

RDS: IB: set default frag size to 16K

For systems which wants lower fragment setting because of
smaller memory footprints, module parameter 'rds_ib_max_frag'
can be used to set lower value like 4K or 8K.

Orabug: 24656820

Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

rds: avoid call to flush_mrs() in specific condition

This is to reduce process spawn time.
When user provides 0 values for cookie and flags in rds_free_mr() call,
avoid calling flush_mr()

skgxp uses cookie 0 and flag 0 combination for checking whether
transport is RDMA capable or not.

This is short term hack for customer escalation.
Customer is having other processes which are calling flush_mrs() and
that is causing mutex contention.
skgxp change is fairly significant, and we want to provide minimal
change in customer environment.

Risk factor here is, if there is any other use of cookie 0 and flag 0
combination (like freeing up unused MRs), then that will be impacted.
Code inspection by Leo/Avneesh at skgxp and skgnfs suggests that, this
combination not being used anywhere.

Long term solution for this requires changes in RDS as well as skgxp
application, which should be done in next UEK release.
Required RDS changes are present in UEK4; however, skgxp changes are
still remaining. Since this was escalation from major customer, we
require this hack in UEK4.

Orabug: 24656750

Tested-by: Sujatha Tolstoy <sujatha.tolstoy@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Leo Tominna <leo.tominna@oracle.com>
Reviewed-by: Avneesh Pant <avneesh.pant@oracle.com>
Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

mlx4_core: allow unprivileged VFs read physical port counters

For compatibility to Guest OS running older release, we allow
VFs to read physical port counters

Orabug: 24656803

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>

sif: Lift sif_verbs up to be independent of sif internal headers

The sif_verbs.h file needs to be independent of
other header files to be includable from other kernel.
This is necessary to avoid duplicate definition of
the API elements. For Oracle Linux this file now moves from
drivers/infiniband/hw/sif/ to include/rdma/ to make it
available for the RDS and uvNIC drivers.

This is a temporary but necessary measure while we wait
for proper generic interfaces to be defined at the common
verbs layer.

Orabug: 24524698

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: ireg: Use the firmware release version as sysfs fw_ver

Report the official release version as reported by
ibv_query_device etc. instead of the previously used
internal firmware build version.

Orabug: 24533579

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Remove dummy implementation of get_protocol_stats

We don't really implement it and the entry point was silently
removed in upstream commit v4.6-rc5-317-gb40f475

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Åsmund Østvold <asmund.ostvold@oracle.com>

sif: ipd: Fix incorrect calculation of ipd from static rate

Orabug: 24449061

The ipd is calculated wrongly because it compares the active speed enum
with the value return from ib_rate_to_mult. Thus, this patch converts the
PSIF Active speed enum to a multiple of the base rate of SDR (2.5 Gbps).

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: Fix recently introduced checkpatch issues

It appears the commit check in checkpatch does not capture
all errors. Fix the new ones inthe driver code to
allow us to enable a regression test for it.

Orabug: 24570578

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Åsmund Østvold <asmund.ostvold@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: sqflush: Handle duplicate completions in poll_cq

Orabug: 23759723

During the QP transition from RTS-> ERR, the HW might generate
duplicate FLUSHED-IN-ERR completion. The SIF driver inverses the
sq_seq in a dedicated completion entry and sets the
CQ_POLLING_IGNORED_SEQ bit in the cq_sw flags. Nevertheless, this bit
is cleared once a duplicate FLUSHED-IN-ERR completion is detected in
poll_cq.

The above mentioned method cannot handle a scenario where HW generates
multiple duplicate completions. Thus, this patch moves the detection
of the duplicate completions to translate_wr_id. Then, SIF driver
will only return non duplicate completions to the user.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

ib_core: make wait_event uninterruptible in ib_flush_fmr_pool()

Replace wait_event_interruptible() with wait_event() in
ib_flush_fmr_pool() to avoid deallocating pd before fmr_cleanup_thread
tears down pool of fmrs.

Orabug: 24533036

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

sif: vlink connect is now enabled by default

This fix makes default link failover behaviour compatible with existing
mellanox CX3. Internal link status (PortState) will now follow external
link status (PortState) by default.

Driver feature mask SIFF_vlink_disconnect may be used to set default
behaviour to "vlink connect"=disabled.

Orabug: 24445370

Signed-off-by: Harald Høeg <harald.hoeg@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: sif_hwmon: add hwmon interface to export psif chip temperatures

This commit adds support to export psif chip temperatures via hwmon
interface

Orabug: 24432362

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: EPSC_API_VERSION(2,10) - EPSC_DIAG_COUNTERS

Adding a new EPSC command EPSC_DIAG_COUNTERS
to get Diag counter values via mailbox.

Orabug: 24374612

Modified-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: base: Scale default desc.array size values based on #of available CBs

With default values for #of QPs and MRs set high by default,
33 instances of the driver would consume a lot
of memory just to initialize basic tables since each of these
instances have their own 1M QP space and in effect allocates
the same amount of resources that a bare metal, single instance
driver would do.

The number of collect buffers assigned to the PCIe function tells us
what fraction of the hardware resources we got, and a small
fraction of the 16K CB space indicates that the function competes with
other functions on resources, and that it is unlikely that the same
huge number of QPs etc can be deployed with high performance
anyway.

This commit introduces tracking of module parameter settings
compared to default values, and if compiled in defaults are used,
we scale down the number of QPs etc with a factor corresponding
to the fraction of CBs we got.

This yields eg. 32K QPs per function in a 32 VF enabled system
and significantly reduces system wide memory usage in a
virtualized environment (whether Xen based or not)

Users can still override settings using the module parameters,
which will not be subject to scaling if they deviate from the
compiled in defaults.

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: cb: Improve algorithm for allocating and using CBs from driver

Instead of allocating bandwidth collect buffers (CBs)
as a fallback for latency CBs, and spamming the kernel log
with failure messages, instead multiplex use across
the actual allocated number of latency CBs and just report
the failure to allocate once, with values to improve debugging.

Improves behaviour for scenarios where available CB resources
are spread across many VFs but VF drivers still see a lot
of (virtual) CPUs, which will easily be the case with the
default VF settings for Xen dom0.

Also, the low latency property is most critical for req.notify PQP
requests. Use high bandwidth CBs also for PQP operations other than
the REARM request, which is the performance critical req. for
req_notify_cq. This should improve performance for event based
applications under high load.

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: epsc: For Xen dom0 configure resources for all 32 VFs at driver load

As of EPSC API version 2.9 firmware can distribute resources based on
the number of PCI functions the PF driver requests support for.
Older firmware will just ignore the value.

This commit enforces no VFs configured as the default setting
but enable all 32 VFs if a Xen PV domain is detected.

To allow overriding this behaviour we add a new module parameter
vf_max which can be used to override the number of VFs configured
for instance for use with other virtualization engines than Xen
and for debugging/tuning purposes. The vf_max parameter takes the
following values:

-2:  Use NVRAM configured firmware defaults (backward compat mode)
-1  (now default) : Exadata mode as described above
0-32:  Configure explicitly for that many VFs (only selected values
     are supported by firmware)

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Reintroduce function name prefixes in log statements

A lot of the available messages doesn't make enough sense without the
information in the function name so just reintroduce the function
name prefixes.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Orabug: 24437547
Pre-check: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: fmr: invalidate keys before TLB bulk invalidates

This commit reorders and sequentializes the cleanup phase when
bulk invalidates are used. The order was to post the TLB flushing
operation to the EPSC, then invalidate keys (potentially in parallel with the
ongoing flushing) before finally waiting for the TLB flushing to complete.
This way is not considered safe in general, as an incoming access to a key
can cause an invalidated PTE or PTW to be cached again and later cause
sif to read or write to a no longer valid location.

This commit makes sure that all keys are invalidated before
the TLB flushing is triggered.

Orabug: 24438867

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: dfs: Add number of entries and extent to debugfs report headers

This is a simple enhancement to make memory usage of the sif driver
more easy to explore/trace.

Also fix a few printout alignment issues in the eq file
and remove some invalid information.

Orabug: 23141108

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>