www.infradead.org Git - users/jedix/linux-maple.git/log

RDS: IB: Avoid double reject on ACL failures

We end up sending double reject on ACL failures. Fix it.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: make the rds_{local_}wq part of rds_connection

Instead of sprinkling if/else for loopback all over the place,
let's just add c_wq as part of rds_connection. This will prevent
missing cases like commit edca33be359c ("RDS: move more queing for
loopback connections to separate queue"), commit 8502173071b6
("rds: schedule local connection activity in proper workqueue"),
or any future changes.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: make rds_conn_drop() take reason argument

This removes the need of modifying the conn all over the place
and moves it inside rds_conn_drop(). Actually there is almost
no need to carry the 'c_drop_source' information as part of
rds_connection but since shutdown thread wants to log this
info for debug, the field is left as is for now.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: Remove unused PATH migration event code

These events were used for APM which is removed already from the code.
Remove this remainder '#if 0' code and migrate handler.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: remove delayed queuing of address change

There is no good reason to delay the address change event. Remove
the delayed work and use the function directly.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDMA CM: init the return value to avoid false negative

A garbage (uninitialized) return value can be seen as an error, which is a false negative.

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: use address change event for failover/failback

The RDS active bonding code had a few fundamental bugs which
have been addressed, so the workaround(s) added can be removed.
These workarounds are creating problems and races in the connection
management code, leading to occasional connection stalls.
Removing them makes the code behavior predictable and clean.

A few notable fixes to mention here:
- Taking care of all layers of events for ports before
marking them up/down.
- ARP cache related fixes.
- Local loopback connection hang fix.

This patch makes the RDS active bonding failover/failback path
work almost as intended from the beginning,
i.e. on failover/failback, the address change events need to
be sent, which then trigger all the necessary events to
re-establish the connection(s).

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: drop workaround for loopback connection hangs

There is no need to modify the ARP cache directly under the
assumption that it helps to speed up the failover/failback for
loopback connections. The ARP cache is properly updated by core
IPv4 code and one of the issues with RDS active bonding code
with ARP has been addressed as part of
'commit 42a7becc725f ("RDS: IB: Make use of ARPOP_REQUEST instead
of ARPOP_REPLY in bonding code")'. Remove the workaround added as
part of bug 16979994 with patch "RDS: Local address resolution may
be delayed after IP has moved"

Orabug: 22347191

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Revert "RDS: IB: skip rx/tx work when destroying connection"

This reverts commit cb9e35124534e0803606ac58b03390d3f3d83dad.

Orabug: 24746103

xsigo: EoiB QP support

Orabug: 24508359

1) Enable EoIB QP property for uVNICs created on Titan
   card using is_eoib variable.
2) Use the qkey provided by OFOS for the EoIB QP; this QKEY comes
   as part of the VNIC Install message.
3) If the Titan card has TSO enabled, then enable TSO using the
   NETIF_F_TSO flag and notify the upstream network stack about this.
4) If Titan has the checksum offload feature, then notify the network
   stack about this and set the IB_SEND_IP_CSUM bit on each WR.
5) Added some debug flags.
6) Print QKEY information and EoIB information in proc stats.

Reported-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

xsigo: Send Heart Beat Lost Operational state

Orabug: 23032392

In case of heart beat loss due to saturn or multicast issues, the
uVNIC driver needs to send XVE_NOTIFY_HBEAT_LOST as part of the
Operational Request to OFOS. This will enable OFOS to perform
appropriate actions.

Reported-by: Suyi Shao <suyi.shao@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>

sif: Retest last allocated entry with roundrobin allocation

The current codebase tests whether the next element is unused (freed) when
allocating round-robin. In certain cases where an element is allocated and then
deallocated immediately, it is convenient to reuse that element.
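
A minimal userspace sketch of the idea, not the driver's actual allocator: the search starts at the most recently allocated index rather than the one after it, so an entry that was freed right after allocation is found again first. The names `rr_pool`, `rr_alloc`, `rr_free` and the size `POOL_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8   /* hypothetical table size */

struct rr_pool {
	bool used[POOL_SIZE];
	size_t last;      /* index handed out by the most recent allocation */
};

/* Returns an index, or -1 if the pool is exhausted. */
static int rr_alloc(struct rr_pool *p)
{
	/* Begin at the last allocated entry (offset 0) instead of last + 1,
	 * then continue round-robin through the rest of the table. */
	for (size_t i = 0; i < POOL_SIZE; i++) {
		size_t idx = (p->last + i) % POOL_SIZE;

		if (!p->used[idx]) {
			p->used[idx] = true;
			p->last = idx;
			return (int)idx;
		}
	}
	return -1;
}

static void rr_free(struct rr_pool *p, int idx)
{
	assert(idx >= 0 && idx < POOL_SIZE && p->used[idx]);
	p->used[idx] = false;
}
```

With a zero-initialized pool, allocating twice yields indices 0 and 1; freeing 1 and allocating again returns 1 rather than moving on to 2.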

Orabug: 24761759

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: cq: Implement error tracking

This commit introduces a new state variable 'in_error'
in the CQ state. When a CQ error event is detected
cq->in_error is set.

This state is then checked before
* posting a req_notify_cq
* invoking any of the completion generating
workarounds.

Also set qp->last_set_state to ERR if a fatal QP event
is detected to reduce further posting on that QP.

This commit reduces the risk of getting into a privileged
QP error scenario for some common cases of misbehaved
applications, and also enables the driver to terminate more
quickly if a priv.QP error has occurred.

Orabug: 24715634

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: sqflush: Fix wrong casting in the calculation of CQ full

Orabug: 24735772

The CQ full is calculated via last posted seq number (last_seq)
minus the last completion seq number (head_seq). Both last_seq
and head_seq are defined as u16. However, in the calculation to
verify that the CQ is not full, a wrong casting is performed.
This causes a false negative of CQ full in the wrapped case.
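
The exact cast used in the driver is not shown here, so the following is only an illustration of how this class of bug arises in C on platforms where int is wider than 16 bits. Subtracting two 16-bit values promotes both to int, and the difference must be truncated back to 16 bits to get the correct distance after a wrap. The function names are hypothetical; `last_seq` and `head_seq` follow the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entries posted but not yet completed, correct across 16-bit wrap-around. */
static uint16_t cq_outstanding(uint16_t last_seq, uint16_t head_seq)
{
	/* Both operands are promoted to int, so the raw difference can be
	 * negative after a wrap; truncating to 16 bits restores the
	 * modulo-65536 distance. */
	return (uint16_t)(last_seq - head_seq);
}

/* Buggy pattern for comparison: the difference stays in a wider signed type. */
static int cq_outstanding_wrong(uint16_t last_seq, uint16_t head_seq)
{
	return last_seq - head_seq;
}

static bool cq_is_full(uint16_t last_seq, uint16_t head_seq, uint16_t entries)
{
	assert(entries > 0);
	return cq_outstanding(last_seq, head_seq) >= entries;
}
```

For example, with `head_seq` at 65534 and `last_seq` wrapped to 2, four entries are outstanding. The wide signed difference is negative, so a "full" comparison against it would never trigger.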

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Re-factor initializing of HW QP state

When a QP is created, the HW state is zeroed, then certain fields are
initialized. After modify_qp(), other fields are set. When a QP is
handed over to HW, HW will potentially modify parts of the QP
state. The QP will eventually be transitioned into the RESET state.

From RESET, it is legitimate to resurrect the QP.

It is imperative that, when the QP is transitioned to RESET, its state
equals the state it had when it was created, in case it is resurrected.

For any state to RESET transition, the IBTA specification states: "QP
attributes are reset to the same values after the QP was created."

This commit re-factors create_qp() and reset_qp() so the driver
adheres to the specification.

Orabug: 24747392

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: cq: Add additional SIF visible cqes to CQ

Orabug: 24673784

Due to two HW bugs, SIF needs to add additional cqes apart from
the requested N cqes. First issue is that the CQ cannot be full
when it is being invalidated. Hence, we need 1 extra entry.

The second issue is that HW might generate duplicate completions.
Thus, SIF needs 768 extra entries to cater for these duplicate
completions and 1 additional entry for the fence completion.

Then, SIF driver rounds up to the nearest 2^N. The number of cqes
available to the ulp/user, will be the above 2^N - (768 + 1 + 1).
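
The sizing arithmetic above can be sketched as follows. This is a standalone illustration under the numbers given in the text (1 + 768 + 1 extra entries, rounded up to a power of two); the macro and function names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_EXTRA_INVALIDATE 1u    /* CQ must not be full when invalidated */
#define CQ_EXTRA_DUPLICATES 768u  /* room for duplicate completions */
#define CQ_EXTRA_FENCE      1u    /* fence completion */
#define CQ_EXTRA (CQ_EXTRA_INVALIDATE + CQ_EXTRA_DUPLICATES + CQ_EXTRA_FENCE)

/* Smallest power of two that is >= v. */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	assert(v > 0 && v <= (1u << 31));
	while (p < v)
		p <<= 1;
	return p;
}

/* Number of entries actually allocated for a request of N cqes. */
static uint32_t cq_hw_entries(uint32_t requested)
{
	assert(requested <= (1u << 31) - CQ_EXTRA);
	return roundup_pow2(requested + CQ_EXTRA);
}

/* Number of entries left for the ulp/user after reserving the extras. */
static uint32_t cq_user_entries(uint32_t requested)
{
	return cq_hw_entries(requested) - CQ_EXTRA;
}
```

A request for 256 cqes becomes 1026 entries, which rounds up to 2048 and leaves 1278 for the user. A request for 254 fits exactly in 1024 and leaves 254.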

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp: Clear the QP state cq_int_err bit upon reset

During reset of a QP, the cq_int_err bit was not reset.
This would cause a subsequent transition to RTR to fail
and hardware to set the QP back in error again.

Orabug: 24708282

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp_attr: Fix qp attributes for query_qp verb

Orabug: 21946858

Following QP attributes were incorrectly reported:
1) max_rd_atomic
2) service level
3) alternate pkey index
4) alternate ack timeout
5) alternate address handle

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp_attr: Fix qp attributes for modify_qp verb

Orabug: 24669222

Local ACK timeout was incorrectly set and reported.

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: eq: Let compiler handle endianess and memory model

Handle port number, a u8 entity in struct ib_event in a platform agnostic way

This bug causes a crash/kernel panic on a (psif-)ib-connected SPARC at boot

Orabug: 24702857

Signed-off-by: George Refseth <george.refseth@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: cq: do not return errors from poll_cq

Orabug: 23321166

Stop returning errors from the sif_poll_cq function. Instead, log the
error, skip the affected CQE, and continue with the next CQE.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: qp: Collapsed two log statements + removed incorrect port number print

With debug level bit 0x2000 (SIF_QP) set, the driver logs QP creation
info. Two log statements logging the same information are
collapsed. Also, incorrect logging of port number is removed for all
QPs except QP0/1 , which are the only QPs that have valid port number
at their creation time.

Orabug: 24695066

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Tested-by: Francisco Triviño-Garcia <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: Avoid using SIFMT_2M for allocation of any tables in no_huge_page mode

The feature mask no_huge_pages, enabled for Xen due to
DMA address alignment issues with huge pages, did not apply
to allocation of CQs, RQs, and SQs, only to the tableworks.
This causes allocation of queues of these types
larger than 4M in total size to fail on Xen PV domains
such as dom0.

Orabug: 24683830

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: qp: Adjust EoIB qp inline size to support LSO

Handle EoIB in same way as IPoIB in create_qp. To support LSO we
need a minimum inline size, the code will adjust inline size
accordingly.

Orabug: 24672908

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: sqflush: set the duplicated CQ entry status as DUPL_COMPL_ERR

Orabug: 24652927

Mark the duplicate completion status as PSIF_WC_STATUS_DUPL_COMPL_ERR if
the additional/duplicate completions are detected by walk_and_update_cqes.
Then, the translate_wr_id can identify the duplicate completion during
sif_poll_cq.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: cq: fixup the CQEs when a QP is transitioned to RESET

Orabug: 24652927

The sif_fixup_cqes function, which updates the wr_id from the SQ handle, is
moved to reset_qp to cover the scenario where the IB user reuses a QP after
performing ib_modify_qp(RESET). This patch also handles a scenario in
sif_fixup_cqes where a QP has been reset multiple times but the IB user has
not polled the associated CQ completely.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: Implement threaded interrupt handler

The handler function (sif_intr) for a single interrupt mostly
processes completion notification events (CNEs) as long as there
are events in the queue. Sometimes the CNEs are received at a higher
rate than the handler is able to process them. The handler then keeps
processing events indefinitely until either the queue fills up, which
leads to a fatal error, or the watchdog triggers a kernel panic,
as shown in orabug 24657844.

This commit replaces request_irq by request_threaded_irq, which
allows the driver to specify a threaded handler (sif_intr_worker)
in addition. The original handler function (sif_intr) is called
in hard interrupt context and can return IRQ_HANDLED if the timeout
SIF_IRQ_HANDLER_TIMEOUT is not exceeded or IRQ_WAKE_THREAD otherwise.

The flag IRQF_ONESHOT is used to ensure that the interrupt is
disabled when IRQ_WAKE_THREAD is returned.
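
The split between the hard handler and the threaded handler can be modelled in userspace as below. This is a simplified sketch and not kernel code. The `model_*` names are hypothetical, the enum only mirrors the meaning of the kernel's IRQ_HANDLED and IRQ_WAKE_THREAD, and an event-count budget stands in for the time-based SIF_IRQ_HANDLER_TIMEOUT.

```c
#include <assert.h>

/* Model values only; they are not the kernel's irqreturn_t constants. */
enum model_irqreturn { MODEL_IRQ_HANDLED, MODEL_IRQ_WAKE_THREAD };

struct model_eq {
	unsigned int pending;   /* events waiting in the queue */
	unsigned int processed; /* events dispatched so far */
};

/* Hard-IRQ part: dispatch events until the queue is empty or the budget
 * (a stand-in for the handler timeout) is used up. */
static enum model_irqreturn model_intr(struct model_eq *eq, unsigned int budget)
{
	assert(budget > 0);
	while (eq->pending > 0) {
		if (budget == 0)
			return MODEL_IRQ_WAKE_THREAD; /* defer the rest */
		eq->pending--;
		eq->processed++;
		budget--;
	}
	return MODEL_IRQ_HANDLED;
}

/* Threaded part: runs outside hard interrupt context and drains the queue. */
static enum model_irqreturn model_intr_worker(struct model_eq *eq)
{
	eq->processed += eq->pending;
	eq->pending = 0;
	return MODEL_IRQ_HANDLED;
}
```

In this model a short burst is handled entirely in the hard handler, while a burst larger than the budget returns the wake-thread value and leaves the remaining events for the worker.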

Orabug: 24657844

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: remove check_all_eqs_on_intr driver feature

The feature check_all_eqs_on_intr is no longer needed. This commit
removes this driver feature and hence simplifies the interrupt
handler implementation.

Orabug: 24665085

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: Add max irq handling time to the sysfs eq table

Track the max time (in ms) that has been recorded for dispatching events
(dispatch_eq) on interrupt handling (sif_intr).

Orabug: 24657844

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

RDS: IB: set default frag size to 16K

For systems which want a lower fragment setting because of
smaller memory footprints, the module parameter 'rds_ib_max_frag'
can be used to set a lower value like 4K or 8K.

Orabug: 24656820

Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

rds: avoid call to flush_mrs() in specific condition

This is to reduce process spawn time.
When the user provides 0 values for cookie and flags in the rds_free_mr() call,
avoid calling flush_mrs().

skgxp uses the cookie 0 and flag 0 combination to check whether the
transport is RDMA capable or not.

This is a short-term hack for a customer escalation.
The customer has other processes which are calling flush_mrs(), and
that is causing mutex contention.
The skgxp change is fairly significant, and we want to provide a minimal
change in the customer environment.

The risk factor here is that any other use of the cookie 0 and flag 0
combination (like freeing up unused MRs) will be impacted.
Code inspection by Leo/Avneesh at skgxp and skgnfs suggests that this
combination is not used anywhere.

The long-term solution for this requires changes in RDS as well as the skgxp
application, which should be done in the next UEK release.
The required RDS changes are present in UEK4; however, the skgxp changes are
still remaining. Since this was an escalation from a major customer, we
require this hack in UEK4.

Orabug: 24656750

Tested-by: Sujatha Tolstoy <sujatha.tolstoy@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Leo Tominna <leo.tominna@oracle.com>
Reviewed-by: Avneesh Pant <avneesh.pant@oracle.com>
Reviewed-by: John Sobecki <john.sobecki@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>

mlx4_core: allow unprivileged VFs read physical port counters

For compatibility with guest OSes running an older release, we allow
VFs to read physical port counters.

Orabug: 24656803

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>

sif: Lift sif_verbs up to be independent of sif internal headers

The sif_verbs.h file needs to be independent of
other header files to be includable from other kernel code.
This is necessary to avoid duplicate definition of
the API elements. For Oracle Linux this file now moves from
drivers/infiniband/hw/sif/ to include/rdma/ to make it
available for the RDS and uvNIC drivers.

This is a temporary but necessary measure while we wait
for proper generic interfaces to be defined at the common
verbs layer.

Orabug: 24524698

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: ireg: Use the firmware release version as sysfs fw_ver

Report the official release version as reported by
ibv_query_device etc. instead of the previously used
internal firmware build version.

Orabug: 24533579

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Remove dummy implementation of get_protocol_stats

We don't really implement it and the entry point was silently
removed in upstream commit v4.6-rc5-317-gb40f475

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Åsmund Østvold <asmund.ostvold@oracle.com>

sif: ipd: Fix incorrect calculation of ipd from static rate

Orabug: 24449061

The ipd is calculated wrongly because it compares the active speed enum
with the value return from ib_rate_to_mult. Thus, this patch converts the
PSIF Active speed enum to a multiple of the base rate of SDR (2.5 Gbps).

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: Fix recently introduced checkpatch issues

It appears the commit check in checkpatch does not capture
all errors. Fix the new ones in the driver code to
allow us to enable a regression test for it.

Orabug: 24570578

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Åsmund Østvold <asmund.ostvold@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: sqflush: Handle duplicate completions in poll_cq

Orabug: 23759723

During the QP transition from RTS -> ERR, the HW might generate a
duplicate FLUSHED-IN-ERR completion. The SIF driver inverts the
sq_seq in a dedicated completion entry and sets the
CQ_POLLING_IGNORED_SEQ bit in the cq_sw flags. However, this bit
is cleared once a duplicate FLUSHED-IN-ERR completion is detected in
poll_cq.

The above mentioned method cannot handle a scenario where HW generates
multiple duplicate completions. Thus, this patch moves the detection
of the duplicate completions to translate_wr_id. Then, SIF driver
will only return non duplicate completions to the user.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

ib_core: make wait_event uninterruptible in ib_flush_fmr_pool()

Replace wait_event_interruptible() with wait_event() in
ib_flush_fmr_pool() to avoid deallocating pd before fmr_cleanup_thread
tears down pool of fmrs.

Orabug: 24533036

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

sif: vlink connect is now enabled by default

This fix makes the default link failover behaviour compatible with the existing
Mellanox CX3. Internal link status (PortState) will now follow external
link status (PortState) by default.

Driver feature mask SIFF_vlink_disconnect may be used to set default
behaviour to "vlink connect"=disabled.

Orabug: 24445370

Signed-off-by: Harald Høeg <harald.hoeg@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: sif_hwmon: add hwmon interface to export psif chip temperatures

This commit adds support to export psif chip temperatures via hwmon
interface

Orabug: 24432362

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: EPSC_API_VERSION(2,10) - EPSC_DIAG_COUNTERS

Adding a new EPSC command EPSC_DIAG_COUNTERS
to get Diag counter values via mailbox.

Orabug: 24374612

Modified-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: base: Scale default desc.array size values based on #of available CBs

With default values for #of QPs and MRs set high by default,
33 instances of the driver would consume a lot
of memory just to initialize basic tables since each of these
instances have their own 1M QP space and in effect allocates
the same amount of resources that a bare metal, single instance
driver would do.

The number of collect buffers assigned to the PCIe function tells us
what fraction of the hardware resources we got, and a small
fraction of the 16K CB space indicates that the function competes with
other functions on resources, and that it is unlikely that the same
huge number of QPs etc can be deployed with high performance
anyway.

This commit introduces tracking of module parameter settings
compared to default values, and if compiled in defaults are used,
we scale down the number of QPs etc with a factor corresponding
to the fraction of CBs we got.

This yields eg. 32K QPs per function in a 32 VF enabled system
and significantly reduces system wide memory usage in a
virtualized environment (whether Xen based or not)

Users can still override settings using the module parameters,
which will not be subject to scaling if they deviate from the
compiled in defaults.
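
A rough sketch of the scaling rule described above, using the 16K CB space and the 1M QP default from the text. The function name is hypothetical, and comparing the parameter against the default is a simplification of the parameter tracking the commit introduces: here a user who explicitly sets the default value is indistinguishable from one who set nothing.

```c
#include <assert.h>
#include <stdint.h>

#define TOTAL_CBS        (16u * 1024u)    /* 16K CB space, from the text */
#define DEFAULT_MAX_QPS  (1024u * 1024u)  /* 1M QP space, from the text */

/* Scale a default-valued parameter by the fraction of CBs assigned to
 * this PCIe function; leave explicit user overrides untouched. */
static uint32_t scaled_max_qps(uint32_t param, uint32_t cbs_assigned)
{
	assert(cbs_assigned > 0 && cbs_assigned <= TOTAL_CBS);
	if (param != DEFAULT_MAX_QPS)
		return param;   /* user override, not scaled */
	return (uint32_t)((uint64_t)param * cbs_assigned / TOTAL_CBS);
}
```

Assuming a function is given 512 of the 16K CBs (1/32 of the space, an illustrative figure), the default of 1M QPs scales to 32K, which matches the per-function number quoted above.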

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: cb: Improve algorithm for allocating and using CBs from driver

Instead of allocating bandwidth collect buffers (CBs)
as a fallback for latency CBs, and spamming the kernel log
with failure messages, instead multiplex use across
the actual allocated number of latency CBs and just report
the failure to allocate once, with values to improve debugging.

Improves behaviour for scenarios where available CB resources
are spread across many VFs but VF drivers still see a lot
of (virtual) CPUs, which will easily be the case with the
default VF settings for Xen dom0.

Also, the low latency property is most critical for req.notify PQP
requests. Use high bandwidth CBs also for PQP operations other than
the REARM request, which is the performance critical req. for
req_notify_cq. This should improve performance for event based
applications under high load.

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: epsc: For Xen dom0 configure resources for all 32 VFs at driver load

As of EPSC API version 2.9 firmware can distribute resources based on
the number of PCI functions the PF driver requests support for.
Older firmware will just ignore the value.

This commit enforces no VFs configured as the default setting
but enable all 32 VFs if a Xen PV domain is detected.

To allow overriding this behaviour we add a new module parameter
vf_max which can be used to override the number of VFs configured
for instance for use with other virtualization engines than Xen
and for debugging/tuning purposes. The vf_max parameter takes the
following values:

-2:  Use NVRAM configured firmware defaults (backward compat mode)
-1  (now default) : Exadata mode as described above
0-32:  Configure explicitly for that many VFs (only selected values
     are supported by firmware)

Orabug: 24424521

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Reintroduce function name prefixes in log statements

A lot of the available messages don't make enough sense without the
information in the function name, so just reintroduce the function
name prefixes.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Orabug: 24437547
Pre-check: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: fmr: invalidate keys before TLB bulk invalidates

This commit reorders and sequentializes the cleanup phase when
bulk invalidates are used. The order was to post the TLB flushing
operation to the EPSC, then invalidate keys (potentially in parallel with the
ongoing flushing) before finally waiting for the TLB flushing to complete.
This way is not considered safe in general, as an incoming access to a key
can cause an invalidated PTE or PTW to be cached again and later cause
sif to read or write to a no longer valid location.

This commit makes sure that all keys are invalidated before
the TLB flushing is triggered.

Orabug: 24438867

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: dfs: Add number of entries and extent to debugfs report headers

This is a simple enhancement to make memory usage of the sif driver
easier to explore/trace.

Also fix a few printout alignment issues in the eq file
and remove some invalid information.

Orabug: 23141108

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Remove write access to the module parameters cb_max and cq_eq_max

Writing to these parameters after driver load does not have
the desired effect, so it should be prohibited.

Orabug: 24437094

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: sq/rq: Do not generate completion if target CQ is full

Orabug: 24378690

The SIF driver needs to generate FLUSHED-IN-ERR completions using pqp
during the QP tear down phase. Nevertheless, a faulty application or an
application that does not rely on the completion (e.g. ibv_*pingpong)
might cause pqp to generate a completion to a full CQ. Consequently, the
pqp transitions to ERR state, and this will eventually cause the
system to crash. This patch checks for this scenario to prevent a
system crash.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: sqflush: Handle the race condition between sqflush and modify_qp

Orabug: 23759723

Due to a SIF HW bug where SIF might generate duplicate completions,
the QP state must be transitioned into shadowed ERR state (HW state is
in RESET). In this case, modify_qp(ERR) will cause the QP state
transitions(HW/SW): from ERR (ERR) to RESET (ERR).

As a result, this means that SIF driver needs to generate
FLUSHED-IN-ERR completions when IB user performs post_send while the
QP is in shadowed ERR state. HW will not generate them as the HW QP
state is already in RESET state. The SIF driver generates
FLUSHED-IN-ERR if the last_set_state is in ERR state. last_set_state
is a "best effort" tracked state because QP mutex cannot be held in a
non-sleep context (post_send).

The issue happens in a multi-threaded scenario where one thread is
constantly performing post_send whereas another thread is performing
modify_qp (ERR). During the QP state transition from ERR (ERR) to
RESET (ERR), both HW and SIF driver generate the FLUSHED-IN-ERR and
eventually causing duplicate completion. This patch adds a
test in post_wa4074 to mask out this condition.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: mmu/fmr: Fix check for page table reusability

The FMR mapping logic attempts to reuse the page table if memory layout is
sufficiently similar. Currently this optimization is for simplicity
limited to very similar memory layouts and does not handle changes
in the page table level (base page size).

The test for this only considered page sizes going from small to larger
and not the opposite.

This scenario is triggered by NFSoRDMA if the previous use was in
huge page mappable memory and the current use is in more
fragmented memory.

Orabug: 21835309

Signed-off-by: Knut Omang <knut.omang@oracle.com>

sif: PSC_API_VERSION(2,9): add num_ufs to psif_epsc_csr_config

The new member num_ufs can be used by the PF driver to request a
number of UFs FW shall support.
0: use default value stored on card
1: PF (UF 0) only
...
33: fully virtualized
>33: capped by FW to 33 i.e. fully virtualized
-1: alternative PF only config not for official use

Orabug: 24424521

Reviewed-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>

RDS: IB: Fix the build error in previous commit dfbdf6c626

Commit dfbdf6c626ee ("RDS: IB: skip rx/tx work when destroying
connection") uses an atomic API which doesn't exist, which leads
to a build error.

net/rds/ib_cm.c:1319: error: implicit declaration of function
‘smp_mb__after_atomic_inc’

Fix it.

Orabug: 24395789

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: skip rx/tx work when destroying connection

Orabug: 24395789
quickref: 24314773

There is a race between the rds connection destruction path (rds_ib_conn_shutdown)
and the IRQ path (rds_ib_cq_comp_handler_recv). The IRQ path can schedule the
tasklet (i_rtasklet) again (to receive data) between the removal of the
tasklet from the list and the destruction of the connection in the destruction
path. When the tasklet runs, it then accesses stale (destroyed) data.
In one observed case it accessed ic->i_rcq, which had been set to NULL by the
destruction path.

Fix:
We add a flag to the rds_ib_connection structure which, when set, indicates that
the connection is being destroyed. The flag is set after we reap the receive CQ
i_rcq and before we start to destroy the CQ in rds_ib_conn_shutdown(). We also
flush the rds_ib_rx running in the rds_aux_wq worker thread before starting the
destroy, so that no existing run of rds_ib_rx (in the tasklet path or the worker
thread path) accesses the destroyed receive CQ. A newly queued job (tasklet or
worker) will exit on seeing the flag set, before accessing the (possibly
destroyed) receive CQ.
The flag is cleared on new connection completions to allow access to the
re-created receive CQ. This patch also takes care of rds_ib_cq_comp_handler_send
(the IRQ handler for send). And we do a final reap after destroying the QP to
take care of the flush errors and release resources.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP

Orabug: 23542064

Backport of upstream commit 3bb549ae4c51 ("RDS: TCP:
rds_tcp_accept_one() should transition socket from RESETTING to UP")

The state of the rds_connection after rds_tcp_reset_callbacks() would
be RDS_CONN_RESETTING and this is the value that should be passed by
rds_tcp_accept_one() to rds_connect_path_complete() to transition the
socket to RDS_CONN_UP.

Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path
quiescence by rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()

Orabug: 23542064

Backport of upstream commit 9c79440e2c5e ("RDS: TCP: fix race windows
in send-path quiescence by rds_tcp_accept_one()")

The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below:
        take t_conn_lock in rds_tcp_conn_connect
        send outgoing syn to peer
        drop t_conn_lock in rds_tcp_conn_connect
        incoming from peer triggers rds_tcp_accept_one, conn is
          marked CONNECTING
        wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
        call rds_tcp_reset_callbacks
        [.. race-window where incoming syn-ack can cause the conn
          to be marked UP from rds_tcp_state_change ..]
        lock_sock called from rds_tcp_reset_callbacks, and we set
          t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.

Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
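
A minimal user-space sketch of the idea follows, assuming state transitions
are done with compare-and-swap. All model_* names and the enum values are
hypothetical; the real states and helpers live in the RDS code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection states for the model. */
enum model_state {
    MODEL_DOWN,
    MODEL_CONNECTING,
    MODEL_RESETTING,
    MODEL_UP,
};

/* Move from 'old' to 'new_state' only if the state is still 'old'. */
static bool model_transition(atomic_int *state, int old, int new_state)
{
    return atomic_compare_exchange_strong(state, &old, new_state);
}

/* Models the softirq state-change callback: it may only take the
 * connection from CONNECTING to UP. */
static bool model_state_change_established(atomic_int *state)
{
    return model_transition(state, MODEL_CONNECTING, MODEL_UP);
}

/* Models the accept path parking the connection in RESETTING before
 * the callbacks are switched. */
static bool model_accept_begin_reset(atomic_int *state)
{
    return model_transition(state, MODEL_CONNECTING, MODEL_RESETTING);
}

/* Models the accept path finishing: RESETTING is the only state from
 * which it moves the connection to UP. */
static bool model_accept_complete(atomic_int *state)
{
    return model_transition(state, MODEL_RESETTING, MODEL_UP);
}
```

While the model is in MODEL_RESETTING, a late syn-ack handled by
model_state_change_established() fails its compare-and-swap, so send threads
never see UP before the new socket is in place.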

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks

Orabug: 23542064

Backport of upstream commit 0b6f760cff04 ("RDS: TCP: Retransmit half-sent
datagrams when switching sockets in rds_tcp_reset_callbacks")

When we switch a connection's sockets in rds_tcp_reset_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassemble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely

Orabug: 23542064

Backport of upstream commit 335b48d980f6 ("RDS: TCP: Add/use
rds_tcp_reset_callbacks to reset tcp socket safely")

When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed. Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.

This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Avoid rds connection churn from rogue SYNs

Orabug: 23542064

Backport of upstream commit c948bb5c2cc4 ("RDS: TCP: Avoid rds connection
churn from rogue SYNs")

When a rogue SYN is received after the connection arbitration
algorithm has converged, the incoming SYN should not needlessly
quiesce the transmit path, and it should not result in needless
TCP connection resets due to re-execution of the connection
arbitration logic.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp

Orabug: 23542064

Backport of upstream commit 37e14f4fe299 ("RDS: TCP: rds_tcp_accept_worker()
must exit gracefully when terminating rds-tcp")

There are two instances where we want to terminate RDS-TCP: when
exiting the netns or during module unload. In either case, the
termination sequence is to stop the listen socket, mark the
rtn->rds_tcp_listen_sock as null, and flush any accept workqs.
Thus any workqs that get flushed at this point will encounter a
null rds_tcp_listen_sock, and must exit gracefully to allow
the RDS-TCP termination to complete successfully.
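
The termination sequence can be modelled in a few lines of user-space C.
This is a sketch only: the model_* types are hypothetical, and the direct
call in model_terminate() merely stands in for flushing queued accept work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the listen socket and its pending connections. */
struct model_sock { int backlog; };

/* Hypothetical stand-in for the per-netns RDS-TCP state. */
struct model_net {
    struct model_sock *listen_sock;
    int accepted;
};

/* Models the accept worker: exit quietly if the listen socket is gone. */
static void model_accept_worker(struct model_net *rtn)
{
    struct model_sock *ls = rtn->listen_sock;

    if (ls == NULL)
        return;
    while (ls->backlog > 0) {
        ls->backlog--;
        rtn->accepted++;
    }
}

/* Models termination: clear the listen socket, then run the work that a
 * workqueue flush would force to completion. */
static void model_terminate(struct model_net *rtn)
{
    rtn->listen_sock = NULL;
    model_accept_worker(rtn);
}
```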

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Remove kfreed tcp conn from list

Orabug: 23542064

This is a backport of the upstream commit 8200a59f24ae
("rds: Remove kfreed tcp conn from list")

All the rds_tcp_connection objects are stored in a list, but when
one is freed it should be removed from there.
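
As an illustration of the rule, here is a user-space sketch with a small
circular doubly linked list. The model_* names are hypothetical and only
imitate the kernel's list handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list for the model. */
struct model_list { struct model_list *prev, *next; };

static void model_list_init(struct model_list *h)
{
    h->prev = h->next = h;
}

static void model_list_add_tail(struct model_list *n, struct model_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void model_list_del(struct model_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static int model_list_len(const struct model_list *h)
{
    int n = 0;

    for (const struct model_list *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

/* Hypothetical stand-in for a TCP connection object kept on a list. */
struct model_tcp_conn {
    struct model_list link;   /* first member, so &tc->link == (void *)tc */
    int id;
};

static struct model_tcp_conn *model_conn_alloc(struct model_list *head, int id)
{
    struct model_tcp_conn *tc = malloc(sizeof(*tc));

    if (tc == NULL)
        return NULL;
    tc->id = id;
    model_list_add_tail(&tc->link, head);
    return tc;
}

/* Unlink before freeing, so the list never points at freed memory. */
static void model_conn_free(struct model_tcp_conn *tc)
{
    model_list_del(&tc->link);
    free(tc);
}
```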

Original author: Pavel Emelyanov <xemul@parallels.com>

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: IB: Add MOS note details to link local(HAIP) address print

Update the log to include MOS note details and also make the
banner more prominent. This makes it consistent with application
flagging the similar error with MOS note details.

Orabug: 23027670

Acked-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

ib/mlx4: Initialize multiple Mellanox HCAs in parallel

This is a rework of UEK2 commit a8962313e121 ("OFED: Load multiple ...").
The goal of this patch is to reduce the total amount of system boot/kernel
startup time when there are multiple Mellanox HCAs present in the system.
Typically each HCA/PF would require 6~7s to initialize plus extra time for
a certain number of VFs created by each PF. By default, multiple HCAs have
to be probed one by one in a serialized fashion.

The new scheme is to create a work request for current pci probe/mlx4 init
task and then return -EPROBE_DEFER immediately to the probe caller while
the system thread starts to execute the work request in the background.
The main pci probe thread doesn't have to wait for the current probe
task to finish. The background init task's progress and return err code
will be saved by the sys worker thread and processed from the deferred
queue.
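
The probe flow can be modelled roughly as below. This is a user-space sketch
under assumptions: the model_* names are hypothetical, the background work is
run by hand instead of by a worker thread, and 517 is used because that is
the value of EPROBE_DEFER in the kernel's errno list.

```c
#include <assert.h>

#define MODEL_EPROBE_DEFER 517   /* mirrors the kernel's EPROBE_DEFER */

enum model_init_state {
    MODEL_INIT_NONE,      /* nothing queued yet */
    MODEL_INIT_RUNNING,   /* background init in progress */
    MODEL_INIT_DONE,      /* background init finished, result saved */
};

/* Hypothetical per-HCA bookkeeping. */
struct model_hca {
    enum model_init_state state;
    int result;
};

/* Models the probe entry point: queue the init work on first call and
 * defer; report the saved result once the work has finished. */
static int model_probe(struct model_hca *h)
{
    switch (h->state) {
    case MODEL_INIT_NONE:
        h->state = MODEL_INIT_RUNNING;
        return -MODEL_EPROBE_DEFER;
    case MODEL_INIT_RUNNING:
        return -MODEL_EPROBE_DEFER;
    default:
        return h->result;
    }
}

/* Models the background worker saving its return code. */
static void model_init_work(struct model_hca *h, int result)
{
    h->result = result;
    h->state = MODEL_INIT_DONE;
}
```

With two HCAs, both probes return the deferral code immediately, so the two
initializations can overlap instead of running back to back.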

Orabug: 20995222

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Revert "IB/mlx4: Generate alias GUID for slaves"

Now the alias GUID management is moved to userland so
we no longer need this broken API.

Orabug: 24355806

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

IB/mlx4: Do not generate random node_guid for VFs

Exadata fast node detection and fail-over mechanism(s) relies on the fact
that node GUID in guest is the same as in dom0.

Orabug: 22145330

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>

{IB/{core,ipoib},net/{mlx4,rds}}: Mark unload_allowed as __initdata variable

Replacing __read_mostly directive with __initdata since this variable is
used only during module initialization. Module parameter permissions are
changed accordingly.

Orabug: 23501273

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-By: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

EPSC_API_VERSION(2,8) - New EPSC_QUERY_ON_CHIP_TEMP

Also added a new EPSA_GET_EXPORTED_SYMBOL_MAP
which returns a list of exported EPSA runtime symbols.

Orabugs: 24317746, 23168922

Change-Id: I97c600950fefe6649ec0a8b6539f7225d78aa9c4
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: pqp: Be less aggressive in invoking cond_resched()

This commit attempts to avoid unnecessary or potentially dangerous
calls to cond_resched() in the privileged QP completion polling code.

Privileged QP requests typically take a few microseconds to
complete. Since we usually need the result of the operation
to be able to continue, user code usually calls poll_cq_waitfor()
to busy wait for the completion of the request. This commit
adds two measures to make this logic better:

1) Avoid rescheduling while interrupts have been turned off:
The driver was using the in_interrupt() test to avoid calling
cond_resched() from interrupt context, and leaving the rest of
the decision making of whether or not to reschedule to cond_resched().
Testing indicates that this could lead to deadlock prone calls to
schedule(), as cond_resched() would actually allow rescheduling if interrupts
have been disabled. Switch this logic to use irqs_disabled() instead, which will
cover both the interrupt case and the cases where interrupts have been disabled
by the caller.

2) Busywait for the completion for a few cycles before even trying to reschedule
or cpu_relax. Measurements indicate that 10 tries are enough to cover
a large fraction of cases on a lightly loaded system.
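
Both measures can be sketched as a user-space polling loop. This is a model
under assumptions, not the driver code: the model_* names are hypothetical,
and counters stand in for the calls to cpu_relax() and cond_resched().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SPIN_TRIES 10   /* pure busy-wait polls before yielding */

/* Counters recording what the loop would have done. */
struct model_stats {
    int polls;      /* completion checks performed */
    int relaxes;    /* places where cpu_relax() would be called */
    int rescheds;   /* places where cond_resched() would be called */
};

/* Poll 'done' up to max_polls times. The first MODEL_SPIN_TRIES misses are
 * a plain busy-wait; after that, yield only if interrupts are enabled. */
static bool model_poll_waitfor(bool (*done)(void *), void *ctx,
                               bool irqs_off, int max_polls,
                               struct model_stats *st)
{
    for (int i = 0; i < max_polls; i++) {
        st->polls++;
        if (done(ctx))
            return true;
        if (i < MODEL_SPIN_TRIES)
            continue;
        if (irqs_off)
            st->relaxes++;
        else
            st->rescheds++;
    }
    return false;
}

/* Example completion source: reports done on the N-th poll. */
static bool model_countdown_done(void *ctx)
{
    int *left = ctx;

    return --(*left) <= 0;
}
```

A request that completes within the first ten polls never reaches either
yield path, which is the common case the commit is aiming for.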

Orabug: 23733539

Change-Id: Ief35e1828d4dde9b692640f259c3df80ccdb553b
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Francisco Trivino-Garcia <francisco.trivino@oracle.com>

sif: xrc: Add handling for xrc_domain_violation & invalid_xrceth events

XRC spec defines xrc_domain_violation and invalid_xrceth events as
affiliated asynchronous error on XRC TGTQP, the QP is marked in ERROR.
Map these to ofed's IB_EVENT_QP_FATAL type and qp event handler.

Orabug: 24318556

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: dfs: Minor change to print CQ tied to XRCSRQ (rq_hw).

Orabug: 24318845

Signed-off-by: Vinay Shaw <vinay.shaw@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: During driver load, hold back events instead of ignoring them

The current semantics when a queued event is received before the driver
is done loading is to ignore it with a log warning. This was not sufficient
to implement the flush_retry_qp setup, which relies on lid change events.

Unfortunately the solution to make an exception for lid events for the
flush_retry_qp is not valid because it defeats the purpose of the
check in the first place by allowing such an event to be handled before
the data structure needed to handle it is initialized.

This commit introduces a new kernel completion that the driver completes
when the whole driver load is finished. The first EPSC event queued on the
sif work queue will now block on this completion.

This covers all the remaining cases not handled by commit
"eq: Avoid enabling interrupts on TSU EQs until the initialization is complete"
and solves the original problem that introduced the need for a fix.

Orabug: 24296729

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Let sif_remove implement the shutdown entry point

Due to bugs in the FLR handling on PSIF 2.1, we need to make sure
that the driver gets to do a full unload whenever possible.
Simply let the shutdown entry point be sif_remove similar
to the remove entry point.

Orabug: 24322970

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: pqp: Fix potential null pointer exception under high load

If a high number of invalidate requests are posted
without requesting completions, the PQP may run full
enough not to be able to allow a posted req anymore.

To handle this scenario, an additional attempt to send a
synchronous invalidate request was added. Unfortunately
that request ended up being posted with synchronous semantics
but without a handle to handle the completion.

This commit fixes this case by dynamically allocating/freeing
a handle in such situations.

Orabug: 24316139

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: fmr: call sif_post_flush_tlb with ptw flush and in SR/IOV cases

Flush the ptw cache as part of the bulk tlb flush.

With the introduction of more dynamic page tables,
the code to distinguish between the two types of cases,
simple page table entries and entries with interior nodes
was simplified because we no longer always know if PTW entries
have been consumed. Unfortunately the code was unified to the
wrong case which does not flush the ptw cache possibly causing
cache entries to remain in the cache and theoretically have
sif look up pages that no longer exist or that have been reused
for other purposes.

Also now that the EPSC handles the tlb flushing, it is just fine
to call that code even in virtualized setups.
Remove tests for whether VFs exist or we are running in a VF.

Orabug: 24315529

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: eq: Avoid enabling interrupts on TSU EQs until the initialization is complete

During driver load we might have some rare conditions where external events
are occurring before the driver is ready to accept them. The hardware workarounds
to handle issues with QP flushing are particularly sensitive to this.

Delay enabling of the IRQs that can generate interrupts for all
event queues except the EPS event queue(s) until everything is set up and ready.

Note that this commit will also implicitly cause interrupts for EQs 1-3 for each EPSA
not to be enabled. This is no big deal as they are currently not used anyway.

Orabug: 24296729

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: base: change default queue size according to ED scale_profile=1

Make queue sizes of PSIF equal to those of cx3 when using ED's
scale_profile=1.

Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Orabug: 23141108
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: sif_eq: fix missing qp->refcnt decrement for COMM_EST events

qp->refcnt is increased by 1 when event_status_communication_established
is dispatched but later it is not decremented when handling the event
work for IB_EVENT_COMM_EST for UD & RAW QP types.

This commit decrements qp->refcnt for those cases too and fixes another
potential bug by moving the sif_log line up before the qp->refcnt
decrement.

Orabug: 24288467

Signed-off-by: Francisco Triviño <francisco.trivino@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

EPSC_API_VERSION(2,6) - Adding retrieval of SMP and vlink connect modes

Orabug: 23634562

Change-Id: Ic3eea7a7297c9ff97e72cb25dda4ba44fdfa2937
Signed-off-by: Harald Høeg <harald.hoeg@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: eq: increase cq_eq_max to 46

PSIF supports 48 msi-x interrupts. We associate one msi-x per event
queue (EQ). Further, PSIF needs one eq for epsc and one for async
events from the hardware. That leaves 46 for completion notification
events or completion vectors.

This commit also reduces the number of completion notification event
queues to the lesser of the number of cpus present and the default.
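
The arithmetic above, as a small sketch. The model_* names and macros are
hypothetical and only restate the numbers given in this message.

```c
#include <assert.h>

#define MODEL_MSIX_VECTORS 48   /* msi-x interrupts supported by PSIF */
#define MODEL_RESERVED_EQS 2    /* one eq for epsc, one for async events */

/* Event queues left over for completion notification. */
static int model_cq_eq_max(void)
{
    return MODEL_MSIX_VECTORS - MODEL_RESERVED_EQS;
}

/* Completion event queues actually used: the lesser of the number of
 * cpus present and the default. */
static int model_num_cq_eqs(int cpus_present, int default_eqs)
{
    return cpus_present < default_eqs ? cpus_present : default_eqs;
}
```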

Note, this requires fw 1.0.0.1 or newer...

Orabug: 23705843

Change-Id: Iea9101bf09203dff86403453a7e0690cb31b3756
Signed-off-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

sif: sif_r3: implemented WA#4074 stats counters

This commit adds both wa4074 and wa4059 statistics
to help identify potential issues when the workarounds
are applied.

The wa4074 stats implementation is based on:

a) pre_wa4074_cnt == post_wa4074_cnt. This means the
w/a is triggered from the modify_qp_hw.
b) pre_wa4074_cnt != post_wa4074_cnt. post_wa4074 is
triggered from other scenarios too.
c) post_wa4074_err_cnt != 0. It means that post_wa4074
fails.
d) wrs_csum_corr_wa4074_cnt indicates the number of
WRs that were csum corrupted.
e) rcv_snd_gen_wa4074_cnt shows the number of recv
and send cqe's that were generated.

The wa4059 stats indicate the number of keep-alive
events that have been sent.

This commit also improves wa3714 stats implementation
by using atomic64 counters and enumeration values,
and other minor changes such as clean up and fix
typos on comment messages.

Orabug: 23760170

Signed-off-by: Triviño <francisco.trivino@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: Remove software emulation of > 16 SGEs

Orabug: 24310514

Change-Id: I1886d138b0ff103b074c45da475b3052dd1fd9b1
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

sif: rq: Do not clear the rq_sw until the completion of flush_rq

Orabug: 23754857

The rq can be invalidated from reset_qp or flush_rq. Nevertheless,
the rq_sw data structure has been reset after rq is invalidated in
reset_qp regardless of the completion of the flush_rq. Thus, move
the rq synchronization to reset_qp, and place the synchronization
between reset_qp and flush_rq. After invalidating and
resetting the rq, no flush rq is required as both head and tail
have been reset to 0.

This commit creates another atomic_t variable for the synchronization
between reset rq and flush_rq.

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>

IBCM: dereference timewait_info only when needed

timewait_info is available in valid CM states and may
not be even allocated in invalid states.

Let's dereference it only when needed, in those
valid states.

Orabug: 24326732

Reviewed-by: Hakon Bugge <Haakon.Bugge@oracle.com>
Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

IB: Add RNR timer workaround for PSIF

The RNR NAK Retry timer on Titan and Sonoma 1&2 IB subsystems runs 500
times faster than desired. This means that retries are started a lot
sooner than they should.

The software workaround is a bit involved and intrusive because it needs
to work in mixed HCA environments. It uses CM protocol to detect the
involvement of the offending IB requestor and then enables the
workaround in the peer responder. To keep the workaround flag
persistent, ib_qp verbs need to carry the flag which impacts
IB core kABI which is wrapped under __GENKSYMS__.

The workaround matches the desired RNR NAK Retry timer value when the
encodings 1 to 14 (decimal) are supplied. For encodings larger than 14
and for zero, the work-around will set the largest possible RNR NAK
Timer value for the offending requestor, which is 1.31 ms.
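
The numbers above can be checked with a small user-space sketch. The table
holds the nominal timer values for the 5-bit RNR NAK timer encodings, in
microseconds, as named in the kernel's enum ib_rnr_timeout. The selection
rule (smallest nominal value that is at least 500 times the requested one,
else the maximum) is an assumption made for illustration, and the model_*
names are hypothetical; the actual workaround may pick encodings differently.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SPEEDUP 500u   /* the affected timer runs 500 times too fast */

/* Nominal RNR NAK timer per encoding, in microseconds.
 * Encoding 0 is the largest value, 655.36 ms. */
static const unsigned model_rnr_us[32] = {
    655360,     10,     20,     30,     40,     60,     80,    120,
       160,    240,    320,    480,    640,    960,   1280,   1920,
      2560,   3840,   5120,   7680,  10240,  15360,  20480,  30720,
     40960,  61440,  81920, 122880, 163840, 245760, 327680, 491520,
};

/* Assumed rule: pick the encoding whose nominal value, once divided by the
 * speed-up, is at least the requested delay; fall back to encoding 0. */
static unsigned model_compensated_encoding(unsigned requested)
{
    assert(requested < 32);
    unsigned long target =
        (unsigned long)model_rnr_us[requested] * MODEL_SPEEDUP;

    for (unsigned enc = 1; enc < 32; enc++)
        if (model_rnr_us[enc] >= target)
            return enc;
    return 0;
}

/* Delay actually seen on the wire for a programmed encoding
 * (integer microseconds, rounded down). */
static unsigned model_effective_us(unsigned programmed)
{
    assert(programmed < 32);
    return model_rnr_us[programmed] / MODEL_SPEEDUP;
}
```

Under this rule encoding 14 (1.28 ms) already needs the maximum setting, and
655.36 ms / 500 is about 1.31 ms, which is why larger encodings and zero
cannot be matched.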

Thanks to Trivino, Haakon for updates and wide range of testing for
kernel as well as userland with mixed HCA configurations.

Orabug: 23633926

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: David Brean <david.brean@oracle.com>
Tested-by: Francisco Triviño García <francisco.trivino@oracle.com>
Signed-off-by: Francisco Triviño García <francisco.trivino@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

IB/core: Add encode/decode FDR/EDR rates

The cases for FDR/EDR signalling speeds were missing in
ib_rate_to_mult and mult_to_ib_rate, giving wrong return values
when drivers are converting static rate to/from inter-packet-delay.

Orabug: 23084916

Change-Id: Ib1d6e84eeea1addb830c415faf92f9f430c4ba32
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>

xsigo: SKB Frag cleanup

Orabug: 23514725

Fixed pre-allocating transmit scatter gather lists by using
max_sge variable instead of MAX_SKB_FRAGS

some changes to prints

Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

xsigo: Tx_tail goes out of bound

Orabug: 23514725

Fixed a rare condition where tx_tail value goes out of bound, by properly
locking poll_tx

Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

xsigo: Fixed Path locking issues

Orabug: 23514725

Changed xve_put_path to allow condition where caller
holds private lock, priv->lock
Removed path_free function and put all the functionality
in xve_put_path
No need for using scatter-gather when MTU is less than admin mtu
instead of multicast mtu, as admin MTU is the driving factor for
vnic

Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Reviewed-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Asmund Ostvold <asmund.ostvold@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

net/rds: Skip packet filtering if interface does not support ACL

NULL value returned from ib_cm_dpp_acl_lookup for a given DPP means that
this DPP is not under ACL protection.
In this case we skip packet filtering.
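
A sketch of the check, in user-space C. The model_* names are hypothetical
and the ACL is reduced to a plain array of allowed addresses; only the
handling of a NULL lookup result reflects this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ACL: a list of allowed source addresses. */
struct model_acl {
    const unsigned *allowed;
    size_t n;
};

/* A NULL acl means the interface is not under ACL protection, so
 * filtering is skipped and the packet is accepted. */
static bool model_packet_allowed(const struct model_acl *acl, unsigned src)
{
    if (acl == NULL)
        return true;
    for (size_t i = 0; i < acl->n; i++)
        if (acl->allowed[i] == src)
            return true;
    return false;
}
```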

Orabug: 23541567

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: Fix the rds_conn_destroy panic due to pending messages

In corner cases, there could be pending messages on a connection which
needs to be destroyed. Make sure those messages are purged before
the connection is torn down.

Orabug: 23222944

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: add handshaking for ACL violation detection at passive

Offending connections with ACL violations should be cleaned up as
early as possible. When active detects ACL violation and sends reject;
it fills up private_data field. Passive checks for private_data
whenever it receives reject; and in case of ACL violation it destroys
connection.

Orabug: 23222944

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: enforce IP anti-spoofing based on ACLs

Connection is established only after the IP requesting the connection
is legitimate and part of the ACL group. Invalid connection request(s)
are rejected and destroyed.

Ajay moved destroy connection when ACL check fails while initiating
connection to avoid unnecessary packet transfer on wire.

Orabug: 23222944

Signed-off-by: Bang Ngyen <bang.nguyen@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: Add acl fields to the rds_connection

ACL can be enabled on connections and to track them per connection,
let's add a couple of fields.

Orabug: 23222944

Signed-off-by: Bang Ngyen <bang.nguyen@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

RDS: IB: invoke connection destruction in worker

This is to avoid deadlock with c_cm_lock mutex.
In event handling path of Infiniband, whenever connection destruction is
required; we should invoke worker in order to avoid deadlock with mutex.

Orabug: 23222944

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

RDS: Add reset all conns for a source address to CONN_RESET

RDS_CONN_RESET SO gets enhanced to support resetting all
connections associated with a local address.

$rds-stress -r <SRC_IP> -s 0 --reset

Orabug: 23222944

Reported-by: Bang Ngyen <bang.nguyen@oracle.com>
Acked-by: Bang Ngyen <bang.nguyen@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

IB/mlx4: Generate alias GUID for slaves

Generate alias GUID by changing the fourth byte to be the GUID index in the
port GUID table.
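
A sketch of the byte substitution, assuming the GUID is handled as eight
bytes in the order they are stored and that "fourth byte" means index 3 of
that array. Both the byte-order reading and the model_alias_guid name are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the port GUID and replace its fourth byte (index 3) with the
 * GUID index from the port GUID table. */
static void model_alias_guid(const uint8_t port_guid[8], uint8_t guid_index,
                             uint8_t out[8])
{
    memcpy(out, port_guid, 8);
    out[3] = guid_index;
}
```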

This is porting of a work done in uek2 for Oracle purpose only.

Orabug: 23222944

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>

IB/ipoib: ioctl interface to manage ACL tables

Expose ioctl to manage ACL content by application layer.

Orabug: 23222944

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
merge into IOCTL code

IB/ipoib: sysfs interface to manage ACL tables

Expose sysfs interface for ACL to be used for debug.

Orabug: 23222944

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

IB/{cm,ipoib}: Filter traffic using ACL

Implement two packet filtering points, one at ib_ipoib driver when
processing ARP packets and second in ib_cm when processing connection
requests.

Orabug: 23222944

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>