The RDS active bonding gratuitous ARP code needs to adapt to this change
to take advantage of the neighbor updates on UEK4. The current code
makes use of ARPOP_REPLY, which needs to be changed to ARPOP_REQUEST.
RDS: IB: don't use the link-local address for ib transport
Link-local addresses can't be used for IB failover and don't work
with the IB stack. Even though the DB RDS usage guidelines recommend
against these addresses, we keep hitting issues because of accidental
use of them, caused by missing application config or admin scripts
blindly doing rds-ping for each local address.
For RDS TCP, which doesn't support active-active, there might be a
use case, so the current fix is limited to the IB transport for now.
console:
RDS/IB: Link local address 169.254.221.37 NOT SUPPORTED
RDS: rds_bind() could not find a transport for 169.254.221.37, load rds_tcp or rds_rdma?
Santosh Shilimkar [Wed, 18 May 2016 17:44:56 +0000 (10:44 -0700)]
RDS: IB: rebuild receive caches when needed
The RDS IB cache code leaks memory, and the leak has been there since
the inception of the cache code, but we didn't notice it since caches
are not torn down in normal operation paths. Now, to support
features like variable fragments or connection destroy for ACL,
caches need to be destroyed and rebuilt when needed.
While freeing the caches is just fine, leaking memory while
doing so is a bug and needs to be addressed. Thanks to Wengang
for spotting this stone-age leak. Also, the cache rebuild needs
to be done only when desired, so the patch optimises that part as
well.
Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Maria Rodriguez <maria.r.rodriguez@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Ajaykumar Hotchandani [Tue, 10 May 2016 22:43:48 +0000 (15:43 -0700)]
OFED: indicate consistent vendor error
The vendor error print should be consistent across protocols to avoid
any confusion.
Currently, it's decimal in some places and hex in others.
This patch corrects that.
Avinash Repaka [Tue, 17 May 2016 21:42:19 +0000 (14:42 -0700)]
RDS: Change number based conn-drop reasons to enum
This patch converts the number based connection-drop reasons to enums,
making it easy to grep the reasons and to develop new patches based on
these reasons.
Avinash Repaka [Wed, 18 May 2016 22:09:05 +0000 (15:09 -0700)]
RDS: Move rds_rtd definitions from rds_rt_debug files to common files
This patch moves rds_rtd definitions from rds_rtd_debug.h to rds.h and
rds_rt_debug_bitmap modparam definition from rds_rt_debug.c to af_rds.c.
The patch removes rds_rt_debug files since there isn't much content
in these files to be held separately.
Commit 'ib/rds: runtime debuggability enhancement' originally defined
rds_rtd definitions.
RDS: Change the default value of rds_rt_debug_bitmap modparam to 0x488B
This patch changes the default value of rds_rt_debug_bitmap module
parameter to 0x488B to enable RDS_RTD_ERR, RDS_RTD_ERR_EXT, RDS_RTD_CM,
RDS_RTD_ACT_BND, RDS_RTD_RCV, RDS_RTD_SND flags of rds_rtd.
shamir rabinovitch [Wed, 18 May 2016 10:18:10 +0000 (06:18 -0400)]
IB/mlx4: Fix unaligned access in send_reply_to_slave
The problem is that the function 'send_reply_to_slave' gets the
'req_sa_mad' as a pointer whose address is only aligned to 4 bytes
but whose type is 8 bytes in size. This can result in unaligned access
faults on certain architectures.
Sowmini Varadhan pointed to this reply from Dave Miller saying
that memcpy should not be used to solve alignment issues:
https://lkml.org/lkml/2015/10/21/352
Optimization of memcpy to an 'ldx' instruction can only happen if the
compiler knows that the size of the data we are copying is 8 bytes
and it assumes it is aligned to 8 bytes. If the compiler knows the
type is not aligned to 8, it must not optimize the 8-byte copy.
Defining the data type as aligned to 4 forces the compiler to treat
all accesses as though they aren't aligned and avoids the 'ldx'
optimization.
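For illustration, a minimal sketch of the typedef trick described above; the type and field names are hypothetical, not the actual mlx4 structures (the kernel's compat_u64 uses the same technique):

/* In a typedef, aligned() may lower alignment (unlike on a struct). */
typedef u64 __attribute__((aligned(4))) u64_aligned_4;

struct sa_mad_view {
    u64_aligned_4 comp_mask;  /* compiler emits split loads, never 'ldx' */
};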
Full credit for the idea goes to Jason Gunthorpe
<jgunthorpe@obsidianresearch.com>.
Jason Gunthorpe [Mon, 11 Apr 2016 01:13:13 +0000 (19:13 -0600)]
IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
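From memory of the upstream commit, the added guard looks roughly like the sketch below; treat it as an approximation rather than the exact patch:

/* deny write() callers whose credentials or address limit differ from
 * the opener's, which is the signature of the redirection attack above */
static inline bool ib_safe_file_access(struct file *filp)
{
    return filp->f_cred == current_cred() &&
           segment_eq(get_fs(), USER_DS);
}

/* at the top of each affected write() handler: */
if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
    return -EACCES;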
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
CVE-2016-4565
Orabug: 23276449
Santosh Shilimkar [Tue, 17 May 2016 21:46:18 +0000 (14:46 -0700)]
RDS: IB: disable ib_cache purging to avoid memory leak in reconnect path
RDS IB caches don't work in the reconnect path and, if used there, can
lead to memory leaks. These leaks have been present for a long time, but
we didn't hit them since caches are not torn down in the reconnect path.
For rolling upgrade/downgrade support with different frag sizes, the
caches need to work in the reconnect path, but that needs additional
fixes.
Since the leak is blocking the rest of the testing, cache purging is
temporarily disabled. It will be added back once fully fixed.
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Backport of upstream commit bd7c5f983f31 ("RDS: TCP: Synchronize accept()
and connect() paths on t_conn_lock.")
An arbitration scheme for duelling SYNs is implemented as part of
commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.
The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).
Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).
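A minimal sketch of the connect-side check described above (tc is the rds_tcp_connection attached to conn; socket setup and error handling omitted):

mutex_lock(&tc->t_conn_lock);
if (rds_conn_up(conn)) {
    /* rds_tcp_accept_one() already completed the 3WH */
    mutex_unlock(&tc->t_conn_lock);
    return 0;
}
/* ... create the kernel socket and send the outgoing SYN ... */
mutex_unlock(&tc->t_conn_lock);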
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Backport of upstream commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock")
There is a race condition between rds_send_xmit -> rds_tcp_xmit
and the code that deals with resolution of duelling syns added
by commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()").
Specifically, we may end up dereferencing a null pointer in rds_send_xmit
if the two paths interleave: rds_tcp_accept_one resets t_sock while
rds_send_xmit -> rds_tcp_xmit is still using it.
The race condition can be avoided without adding the overhead of
additional locking in the xmit path: have rds_tcp_accept_one wait
for rds_tcp_xmit threads to complete before resetting callbacks.
The synchronization can be done in the same manner as rds_conn_shutdown().
First set the rds_conn_state to something other than RDS_CONN_UP
(so that new threads cannot get into rds_tcp_xmit()), then wait for
RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
threads in rds_tcp_xmit are done.
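A sketch of that quiescing sequence, with the state transition simplified to a bare cmpxchg:

/* step 1: leave RDS_CONN_UP so no new caller enters rds_tcp_xmit() */
atomic_cmpxchg(&conn->c_state, RDS_CONN_UP, RDS_CONN_CONNECTING);

/* step 2: wait until any in-flight sender clears RDS_IN_XMIT */
wait_event(conn->c_waitq, !test_bit(RDS_IN_XMIT, &conn->c_flags));

/* now it is safe to reset t_sock and the tcp callbacks */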
Fixes: 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.
The existing code uses the sequence
clone = skb_clone(..);
pskb_pull(clone, off, ..);
pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pskb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().
To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
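The usage change, sketched (error handling omitted):

/* before: clone + pull copies bytes that are then discarded */
clone = skb_clone(skb, GFP_ATOMIC);
pskb_pull(clone, off);
pskb_trim(clone, to_copy);

/* after: copy only the relevant header/frags in one step */
clone = pskb_extract(skb, off, to_copy, GFP_ATOMIC);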
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPoIB collects statistics of traffic including number of packets
sent/received, number of bytes transferred, and certain errors. This
patch makes these statistics available to be queried by ethtool.
IPoIB puts skb fragments in SGEs, adding 1 extra SGE when SG is enabled.
The current codepath assumes that the max number of SGEs a device
supports is at least MAX_SKB_FRAGS+1; there is no interaction with upper
layers to limit the number of fragments in an skb if a device supports
fewer SGEs. The assumptions also lead to requesting a fixed number of
SGEs when IPoIB creates queue pairs with SG enabled.
A fallback/slowpath is implemented using skb_linearize to handle cases
where the conversion would result in more SGEs than supported.
Change-Id: Ia81e69d7231987208ac298300fc5b9734f193a2d
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
From Avinash Repaka <avinash.repaka@oracle.com>:
This patch reverts the fix for Orabug: 22661521, since the fix assumes
that the memory region is always aligned on a page boundary, causing an
EMSGSIZE error when trying to register a 1MB region that isn't 4KB aligned.
These issues were observed on kernel 4.1.12-39.el6uek tag.
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Chuck Anderson <chuck.anderson@oracle.com>
RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d24f
("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune").
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Upstream commit c6a58ffed536 ("RDS: TCP: Add sysctl tunables for
sndbuf/rcvbuf on rds-tcp socket")
Add per-net sysctl tunables to set the size of sndbuf and
rcvbuf on the kernel tcp socket.
The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf
and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf.
These values must be set before accept() or connect(),
and there may be an arbitrary number of existing rds-tcp
sockets when the tunable is modified. To make sure that all
connections in the netns pick up the same value for the tunable,
we reset existing rds-tcp connections in the netns, so that
they can reconnect with the new parameters.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Wei Lin Guay [Wed, 27 Jan 2016 12:18:08 +0000 (13:18 +0100)]
RDS: add flow control info to rds_info_rdma_connection
Added per-connection flow_ctl_post_credit and
flow_ctl_send_credit to rds-info. These fields help
in debugging RDS IB flow control. The newly added
attributes are placed at the bottom of the data
structure to ensure backward compatibility.
Wei Lin Guay [Thu, 17 Dec 2015 08:34:33 +0000 (09:34 +0100)]
RDS: update IB flow control algorithm
The current algorithm that uses 16 as a hard-coded value
in rds_ib_advertise_credits() doesn't serve the purpose, as
post_recvs() are performed in bulk. Thus, the test
condition will always be true.
This patch moves rds_ib_advertise_credits() into the
post_recvs() loop. Instead of updating the post_recv credits
after all the post_recvs() have completed, the post_recv
credit is updated in a log2-incremental manner.
The proposed exponentially growing schedule is a good
compromise between an early start for the peer and, at the
same time, reducing the number of explicit ACKs: the credit
update explicit ACKs are generated starting from 16, then
256, 4096, etc.
The performance numbers below show that this new flow
control algorithm has minimal performance impact even though
it requires additional explicit ACKs.
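One way to realize the 16, 256, 4096, ... schedule described above; rds_ib_post_one_recv() is a hypothetical stand-in for the refill step, while rds_ib_advertise_credits() is the function named in this patch:

unsigned int posted = 0, last = 0, threshold = 16;

while (rds_ib_post_one_recv(ic) == 0) {
    if (++posted == threshold) {
        /* advertise only the credits added since the last update */
        rds_ib_advertise_credits(conn, posted - last);
        last = posted;
        threshold *= 16;  /* 16 -> 256 -> 4096 -> ... */
    }
}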
IB flow control is always disabled regardless of the
rds_ib_sysctl_flow_control flag.
The issue is that the initial credit advertisement
announces zero credits, because ib_recv_refill() has
not yet been called. An initial credit offering
of zero effectively disables flow control.
IB flow control is only enabled if both the active and
passive connections have set the rds_ib_sysctl
flow_control flag. E.g.,
Conn. A (on), Conn. B (on) = enable
Conn. A (off), Conn. B (on) = disable
Conn. A (on), Conn. B (off) = disable
Conn. A (off), Conn. B (off) = disable
mlx4_core: scale_profile should work without params set to 0
The "scale_profile" parameter is to be used to do scaling of
hca params without requiring specific tuning setting for each
one of them individually and yet allowing manual setting of
variables.
In UEK2, the module params were default zero and initialized
later from a "driver default" or a scale_profile dictated value.
In UEK4 (derived from Mellanox OFED 2.4) module params were
pre-initialized to default values and are not zero and have
to be forced to 0 for dynamic scaling to be activated.
This defeats the purpose of having a single parameter to achieve
scaling and not requiring setting individual parameters (while
retaining ability to revert to driver defaults).
The changes here (re)introduce a separate static instance
containing default parameters separate from module parameters
which are pre-initialized to zero for parameters that can scale
dynamically and to default values for others. The zero module
parameters are later initialized to either a scale_profile
governed value or driver defaults.
When RAC tries to scale RDS-TCP, they are hitting bottlenecks
due to inefficiencies in rds_bind_lookup. Each call to
rds_bind_lookup results in an irqsave/irqrestore sequence, and
when the list of RDS sockets is large, we end up having IRQs
suppressed for long intervals. This triggers flow-control assertions
and causes TX queue watchdog hangs in the sender. The current
implementation makes this even worse, by superfluously calling
rds_bind_lookup(). This patch set takes the first step to solving
this problem by avoiding one of the redundant calls to rds_bind_lookup.
When errors such as connection hangs or failures are encountered
over RDS-TCP, the sending RDS, in an attempt at HA, will try to
reconnect, and trip up on all sorts of data structures intended
for ToS support. The ToS feature is currently only supported for
RDS-IB, and unplanned/untested usage of these data
structures by RDS-TCP causes deadlocks and panics.
Until we properly design, support, and test the ToS feature for
RDS-TCP, such paths should not be wandered into. Thus this patchset
adds defensive checks to ignore rs_tos settings in rds_sendmsg() for
TCP transports, and prevents the sending of ToS heartbeat pings
in rds_send_hb() for TCP transport.
For reference, the deadlock that can be encountered in the
hb ping path is:
shamir rabinovitch [Wed, 16 Mar 2016 13:57:19 +0000 (09:57 -0400)]
rds: rds-stress show all zeros after few minutes
The issue can be seen on platforms that use 8K and larger page sizes
while the RDS fragment size is 4K. On those platforms a single page is
shared between 2 or more RDS fragments. Each fragment has its own
offset, and the RDS cong map code needs to take this offset into
account. Not doing so leads to reading a data fragment
as a congestion map fragment and a hang of the RDS transmit due to far
cong map corruption.
Two different threads with different RDS sockets may be in
rds_recv_rcvbuf_delta() via the receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Let's use atomics to avoid such an issue.
Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
the offending code, with a spin-lock based fix.
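A sketch of the kind of change described, patterned on the congestion map's bit helpers (the _le variants match the map's on-the-wire layout):

i   = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;

/* atomic set_bit_le() instead of the non-atomic __set_bit_le(), so two
 * CPUs updating the same word cannot lose each other's update */
set_bit_le(off, (void *)map->m_page_addrs[i]);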
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
The rds_fmr_flush workqueue calls ib_unmap_fmr
to invalidate a list of FMRs. Today, this workqueue
can be scheduled on any CPU. In a NUMA-aware system,
scheduling it to run on a CPU core closer to the
ib_device can improve performance. For now, we use a
"sequential-low" policy, which selects the two lower
CPU cores closer to the HCA. In a non-NUMA-aware system,
scheduling the rds_fmr_flush workqueue on a fixed CPU core
improves performance.
The mapping of a CPU to the rds_fmr_flush workqueue
can be enabled/disabled via sysctl and is enabled
by default. To disable the feature, use the sysctl below.
rds_ib_sysctl_disable_unmap_fmr_cpu = 1
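A minimal sketch of the NUMA-aware scheduling; the workqueue and pool variable names are illustrative, not the exact RDS symbols:

int node = dev_to_node(ibdev->dma_device);
int cpu  = cpumask_local_spread(0, node);   /* a low CPU near the HCA */

if (!rds_ib_sysctl_disable_unmap_fmr_cpu)
    queue_delayed_work_on(cpu, rds_fmr_wq, &pool->flush_worker, 10);
else
    queue_delayed_work(rds_fmr_wq, &pool->flush_worker, 10);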
Below are some rds-stress performance numbers
comparing the default and sequential-low policies in a NUMA
system with Oracle M4 QDR and Mellanox CX3:
rds-stress 4 conns, 32 threads, 16 depths, RDMA write
and unidirectional (higher is better).
Santosh Shilimkar [Wed, 21 Oct 2015 23:47:28 +0000 (16:47 -0700)]
RDS: IB: support larger frag size up to 16KB
The Infiniband (IB) transport supports a larger message size
than RDS_FRAG_SIZE, which is usually the 4KB PAGE_SIZE.
Nevertheless, RDS always fragments each payload into
RDS_FRAG_SIZE chunks before handing it over to the underlying
IB HCA.
One important message size required by the database
is 8448 bytes (8K + 256B control message) for BCOPY. This RDS
message, even with the IB transport, will generate three
IB work requests (WRs), each with its own RDS header.
This series of patches improves RDS performance by allowing
the IB transport to send/receive an RDS message with a larger
RDS_FRAG_SIZE (ideally, using a single WR).
In order to maintain the backward compatibility and
interoperability between various RDS versions, and at
the same time to support various FRAG_SIZE, the IB
fragment size is negotiated per connection.
Although IB is capable of supporting a 4GB message size,
we currently limit the IB RDS_FRAG_SIZE to 16KB for
two reasons:
1. This is what the current 8448-byte RDS message size use case needs.
2. Minimizing the size of each receive queue entry keeps
memory usage optimal.
In terms of implementation, the 'dp_reserved2' field of
'struct rds_ib_connect_private' now carries information about
the supported IB fragment size. Since we are just
using the IB connection private data and a reserved field,
the protocol version is not bumped up. Furthermore, the feature
is enabled only for RDS_PROTOCOL_v4.1 and above (future).
To keep things simpler for users, a module parameter
'rds_ib_max_frag' is provided. Without the module parameter,
the default PAGE_SIZE frag will be used. During connection
establishment, the smallest fragment size will be
chosen. If the fragment size is 0, it means the RDS module
doesn't support large fragment sizes and the default
RDS_FRAG_SIZE will be used.
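A sketch of the negotiation; the encoding of dp_reserved2 shown here is an assumption for illustration:

/* active side advertises its maximum supported fragment size */
dp->dp_reserved2 = cpu_to_be32(rds_ib_max_frag);

/* each side adopts the smaller of the two advertised values */
peer_frag = be32_to_cpu(dp->dp_reserved2);
ic->i_frag_sz = peer_frag ? min(local_frag, peer_frag)
                          : RDS_FRAG_SIZE;  /* 0: peer can't negotiate */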
Improvements of ~10% were seen with Orion and ~9% with RDBMS
update queries.
Santosh Shilimkar [Mon, 21 Mar 2016 06:24:32 +0000 (23:24 -0700)]
RDS: IB: purge receive frag cache on connection shutdown
RDS IB connections can be formed with different fragment sizes across
reconnects, so the current frag cache needs to be purged to
avoid stale frag usage.
Santosh Shilimkar [Wed, 4 Nov 2015 21:42:39 +0000 (13:42 -0800)]
RDS: IB: scale rds_ib_allocation based on fragment size
The 'rds_ib_sysctl_max_recv_allocation' setting is used to manage
and allocate the size of the IB receive queue entries (RQE) for each IB
connection. However, it relies on the hardcoded RDS_FRAG_SIZE.
Let's make it scale based on the supported fragment sizes of different
IB connections. Each connection can then allocate a different RQE size
depending on its per-connection fragment size.
Santosh Shilimkar [Wed, 21 Oct 2015 23:47:28 +0000 (16:47 -0700)]
RDS: IB: make fragment size (RDS_FRAG_SIZE) dynamic
The IB fabric is capable of fragmenting 4GB of data payload into
send_first, send_middle and send_last. Nevertheless,
RDS fragments each payload into PAGE_SIZE chunks, which is usually
4KB. This patch makes RDS_FRAG_SIZE for the RDS IB transport
dynamic.
In preparation for subsequent patch(es), this patch
adds per-connection peer negotiation to determine the
supported fragment size for the IB transport.
Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Wei Lin Guay [Fri, 20 Nov 2015 23:14:48 +0000 (15:14 -0800)]
RDS: fix the sg allocation based on actual message size
Fix an issue where only PAGE_SIZE bytes are allocated per
scatter-gather entry (SGE) regardless of the actual message
size. Furthermore, use the buddy memory allocation technique to
allocate/free memory (if possible) to reduce the number of SGEs.
Santosh Shilimkar [Mon, 16 Nov 2015 21:28:11 +0000 (13:28 -0800)]
RDS: make congestion code independent of PAGE_SIZE
The RDS congestion map code is designed with the base assumption
of a 4K page size; the map update as well as the transport code
assume it that way. Of course this breaks when a transport like IB
starts supporting fragments larger than 4K.
To overcome this limitation without too many changes to the core
congestion map update logic, define an independent RDS_CONG_PAGE_SIZE
and use it.
While at it, we also move rds_message_map_pages(), whose sole
purpose is to map congestion pages, into the congestion code.
Santosh Shilimkar [Tue, 16 Feb 2016 18:24:31 +0000 (10:24 -0800)]
RDS: Back out OoO send status fix since it causes the regression
With the DB build, a crash was observed that was boiled down to
this change on UEK2. We proactively back it out on UEK4 as well
till the issue gets addressed.
net/mlx4_core: Modify default value of log_rdmarc_per_qp to be consistent with HW capability
This value is used to calculate max_qp_dest_rdma.
The default value of 4 brings us to 16, while the HW supports 128
(max_requester_per_qp).
Although this value can be changed by a module param, it is best that
the default be optimal.
Pradeep Gopanapalli [Tue, 1 Mar 2016 01:45:59 +0000 (01:45 +0000)]
Fixed vnic issue after saturn reset
After saturn reboots, the uVnic will not recover.
The issue happens when a host is connected to an NM3 via an InfiniBand
switch.
The fact here is that the xve driver sets OPER_DOWN when it loses the
HEART_BEAT, and there is no way for it to set this state back.
Fix this by leaving the multicast group once the heartbeat is lost and
rejoining once saturn comes back.
Pradeep Gopanapalli [Thu, 11 Feb 2016 23:29:39 +0000 (23:29 +0000)]
uvnic issues
When there is heartbeat loss because of a link event on an EDR fabric,
uVnic recovery fails (sometimes).
This is due to the fact that the XVE_OPER_UP state would have been
cleared, and the xve driver has to check this once multicast group
membership is regained.
While running the MAXq test we run into a soft crash around
skb_try_coalesce. Since the xve driver allocates PAGE_SIZE, it has to
use the full PAGE_SIZE for the truesize of the skb.
The xve driver doesn't allocate enough tailroom in skbs, so the IP/TCP
stacks need to reallocate the skb head to pull IP/TCP headers. Fix
this.
xve allocates some resources which are not needed for uVnic
functionality.
Fix this by using the cm_supported flag.
Venkat Venkatsubra [Tue, 1 Mar 2016 22:27:28 +0000 (14:27 -0800)]
RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
This problem occurs when the user gets notified of a successful
rdma write + bcopy message completion but the peer application
does not receive the bcopy message. This happens during a port down/up test.
What seems to happen is the rdma write succeeds but the bcopy message fails.
RDS should not be returning successful completion status to the user
in this case.
When RDS does a rdma followed by a bcopy message the user notification is
supposed to be implemented by method #3 below.
/* If the user asked for a completion notification on this
 * message, we can implement three different semantics:
 *  1. Notify when we received the ACK on the RDS message
 *     that was queued with the RDMA. This provides reliable
 *     notification of RDMA status at the expense of a one-way
 *     packet delay.
 *  2. Notify when the IB stack gives us the completion event for
 *     the RDMA operation.
 *  3. Notify when the IB stack gives us the completion event for
 *     the accompanying RDS messages.
 * Here, we implement approach #3. To implement approach #2,
 * we would need to take an event for the rdma WR. To implement #1,
 * don't call rds_rdma_send_complete at all, and fall back to the notify
 * handling in the ACK processing code.
 */
But unfortunately the user gets notified before knowing the bcopy
send status: right after the rdma write completes, the user gets
notified even though the subsequent bcopy eventually fails.
The fix is to delay signaling completion of the rdma op till the
bcopy send completes.
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:23:05 +0000 (19:23 -0800)]
rds: add infrastructure to find more details for reconnect failure
This patch adds run-time support to debug scenarios where reconnect is
not successful for a certain time.
We add two sysctl variables for start time and end time. These are the
number of seconds after reconnect was initiated.
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:18:28 +0000 (19:18 -0800)]
rds: find connection drop reason
This patch attempts to find connection drop details.
The rationale for adding this type of patch is that there are too many
places from which a connection can get dropped,
and in some cases we have no idea of the source of
the connection drop. This is especially painful for issues which
are reproducible only in a customer environment.
The idea here is to have a tracker variable which keeps the latest
value of the connection drop source.
We can fetch that tracker variable as per our need.
Santosh Shilimkar [Fri, 11 Dec 2015 20:01:56 +0000 (12:01 -0800)]
RDS: Add interface for receive MSG latency trace
Socket option to tap receive path latency.
SO_RDS: SO_RDS_MSG_RXPATH_LATENCY
with parameter,
struct rds_rx_trace_so {
    u8 rx_traces;
    u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
};
CMSG:
RDS_CMSG_RXPATH_LATENCY(recvmsg)
Returns rds message latencies at various stages of the receive
path, in ns. It's set per socket using the SO_RDS_MSG_RXPATH_LATENCY
socket option. Legitimate points are defined in
enum rds_message_rxpath_latency. More points can be added in
the future.
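A usage sketch; the trace-point constant is illustrative (pick one from enum rds_message_rxpath_latency):

struct rds_rx_trace_so trace = {
    .rx_traces = 1,
    .rx_trace_pos = { 0 },  /* trace point to sample */
};

setsockopt(fd, SOL_RDS, SO_RDS_MSG_RXPATH_LATENCY,
           &trace, sizeof(trace));
/* latencies then arrive per message in the RDS_CMSG_RXPATH_LATENCY
 * control message returned by recvmsg() */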
Wengang Wang [Thu, 17 Dec 2015 02:54:15 +0000 (10:54 +0800)]
IB/mlx4: Replace kfree with kvfree in mlx4_ib_destroy_srq
Upstream commit 0ef2f05c7e02ff99c0b5b583d7dee2cd12b053f2 uses vmalloc for
WR buffers when needed and uses kvfree to free the buffers. It missed
changing kfree to kvfree in mlx4_ib_destroy_srq().
Reported-by: Matthew Finaly <matt@Mellanox.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Backport from upstream: df4176677fdf7ac0c5748083eb7a5b269fb1e156
Add a udata argument to the shared pd interface alloc_shpd(),
consistent with the evolution of other similar ib_core
interfaces, so providers that wish to support it can use it.
For providers (like the current Mellanox driver code) that
do not expect user data, we assert a warning.
Santosh Shilimkar [Thu, 29 Oct 2015 16:24:46 +0000 (09:24 -0700)]
RDS: establish connection for legitimate remote RDMA message
The first message to a remote node should prompt a new
connection even if it is an RDMA operation via CMSG. That
means the connection needs to be established before CMSG
parsing. Commit 3d6e0fed8edc ("rds_rdma: rds_sendmsg
should return EAGAIN if connection not setup") tried to
address that issue as part of bug 20232581.
But it inadvertently broke the QoS policy evaluation. Basically,
QoS has the opposite requirement: it needs information from the
CMSG to evaluate whether the message is legitimate to be sent over
the wire. It needs to know the total payload,
which should include the actual payload and the additional rdma
bytes. It then evaluates the total payload against the system's QoS
thresholds to determine if the message is legitimate to be
sent.
The patch addresses these two opposite requirements by fetching
only the rdma bytes information for the QoS evaluation and letting
the full CMSG parsing happen after the connection is
initiated. Since connection establishment is asynchronous,
we make sure a mapping failure due to an unavailable
connection reaches the user via an appropriate error code.
RDS: Add support for per socket SO_TIMESTAMP for incoming messages
SO_TIMESTAMP generates a time stamp for each incoming RDS message
using the wall time. The result is returned via recvmsg() in a control
message as a timeval (usec resolution).
A user app can enable it by using the SO_TIMESTAMP setsockopt() at
SOL_SOCKET level. The CMSG data of cmsg type SO_TIMESTAMP contains the
time stamp in struct timeval format.
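A usage sketch of the option as described above:

int on = 1;
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));

/* in the recvmsg() control-message loop: */
if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMP) {
    struct timeval tv;
    memcpy(&tv, CMSG_DATA(cmsg), sizeof(tv));  /* usec resolution */
}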
If the RDS user application requests notification of RDMA send
completions, there is a possibility that RDS_CMSG_RDMA_SEND_STATUS
will be delivered out of order.
This can happen if RDS drops sending an ACK after it received an
explicit ACK. In that case, the rds messages end up in reverse order
in the list.
Pradeep Gopanapalli [Thu, 5 Nov 2015 02:58:15 +0000 (18:58 -0800)]
1) Support vNIC for EDR-based platforms (uVnic)
2) Supported types now:
   Type 0 - XSMP_XCM_OVN - Xsigo VP780/OSDN standalone chassis (add pvi)
   Type 1 - XSMP_XCM_NOUPLINK - EDR without uplink (add public-network)
   Type 2 - XSMP_XCM_UPLINK - EDR with uplink (add public-network <with -if>)
3) Intelligence in driver to support all the modes
4) Added code for printing multicast LID [Revision 8008]
5) Removed style errors
There are several reports of WR buffer allocation (kmalloc) failing.
It failed at order 3 and/or 4 contiguous page allocations. At the same
time there is actually 100MB+ of free memory, but it is well fragmented.
So try vmalloc when kmalloc fails.
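A sketch of the fallback (allocation flags simplified):

buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
if (!buf)
    buf = vmalloc(size);  /* contiguous physical pages not required */
/* ... use buf ... */
kvfree(buf);  /* frees either kmalloc or vmalloc memory */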
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Qing Huang [Tue, 6 Oct 2015 22:32:22 +0000 (15:32 -0700)]
net/rds: start rdma listening after ib/iw initialization is done
This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.
We don't need to call rds_rdma_listen_stop() if rds_rdma_listen_init()
didn't succeed.
We only need to call rds_ib_exit() if rds_ib_init() succeeds but
other parts fail. The same applies to rds_iw_init()/rds_iw_exit().
So we need to change error handling sequence accordingly.
Jump to ib/iw error handling path when we get an err code from
rds_rdma_listen_init().
Santosh Shilimkar [Wed, 21 Oct 2015 18:15:14 +0000 (11:15 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
RDS: rds_conn_lookup() should factor in the struct net for a match
RDS: Use a single TCP socket for both send and receive.
RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
rds: make sure base connection is up on both sides
rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
RDS: fix race condition when sending a message on unbound socket.
RDS: verify the underlying transport exists before creating a connection
mlx4: indicate memory resource exhaustion
IB/mlx4: Use correct order of variables in log message
mlx4_core: Introduce restrictions for PD update
Mukesh Kacker [Wed, 21 Oct 2015 16:11:46 +0000 (09:11 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.rds-p2:
RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
RDS: rds_conn_lookup() should factor in the struct net for a match
RDS: Use a single TCP socket for both send and receive.
RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
rds: make sure base connection is up on both sides
rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
RDS: fix race condition when sending a message on unbound socket.
RDS: verify the underlying transport exists before creating a connection
Backport of upstream commit 241b271952eb ("RDS-TCP: Reset tcp callbacks
if re-using an outgoing socket in rds_tcp_accept_one()")
Consider the following "duelling syn" sequence between two peers A and B:
A                          B
SYN1        -->
            <--            SYN2
SYN2ACK     -->
Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.
If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).
Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP connection.
Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")
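A simplified sketch of that arbitration (variable names approximate the upstream code; error paths omitted):

if (rs_tcp->t_sock &&
    ntohl(inet->inet_saddr) < ntohl(inet->inet_daddr)) {
    /* our outgoing SYN wins: reset the accepted socket */
    goto rst_nsk;
} else if (rs_tcp->t_sock) {
    /* incoming SYN wins: restore callbacks on the losing socket and
     * never initiate reconnects from the passive side */
    rds_tcp_restore_callbacks(rs_tcp->t_sock, rs_tcp);
    conn->c_outgoing = 0;
}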
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Backport of upstream commit 486798001b92 ("RDS: Invoke ->laddr_check()
in rds_bind() for explicitly bound transports.")
The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.
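A sketch of the added check in rds_bind() (simplified):

if (rs->rs_transport) {  /* socket already bound to a transport */
    trans = rs->rs_transport;
    if (trans->laddr_check(sock_net(sock->sk),
                           sin->sin_addr.s_addr) != 0) {
        ret = -ENOPROTOOPT;  /* address unusable on this transport */
        goto out;
    }
}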
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Backport of upstream commit 3b20fc389705 ("RDS: Use a single TCP
socket for both send and receive.");
Commit f711a6ae062c ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port. The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.
However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both send and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().
A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection
attempt.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Backport of upstream commit 1edd6a14d24f ("RDS-TCP: Do not bloat
sndbuf/rcvbuf in rds_tcp_tune")
Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
clobbers efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency; the default values for these parameters actually
result in significantly better performance.
In request-response tests using rds-stress with a packet size of
100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
between a single pair of IP addresses achieves a throughput of
6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under
equivalent conditions on these platforms.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 14 Oct 2015 01:03:04 +0000 (21:03 -0400)]
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Backport of upstream commit 76b29ef120f5 ("RDS-TCP: Set up MSG_MORE and
MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit")
For the same reasons as commit 2f5338442425 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09fe9c7 ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage()
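A sketch of how the hints are passed (simplified from the xmit loop):

unsigned int flags = MSG_DONTWAIT | MSG_NOSIGNAL;

if (more)  /* more pages of this RDS message still to send */
    flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
ret = tcp_sendpage(tc->t_sock->sk, page, offset, size, flags);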
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Ajaykumar Hotchandani [Tue, 5 May 2015 03:09:42 +0000 (20:09 -0700)]
rds: make sure base connection is up on both sides
The current RDS active side requires zero-lane path records for
establishing a non-zero-lane connection. For this reason, the active
side makes sure the zero-lane connection is up before establishing a
non-zero-lane connection. The passive side does not need to fetch path
records, so it does not have this check.
This creates the possibility of a connection having non-ideal path
records in the following scenario:
- Host1 had PORT_UP event.
- Lane0 and Lane6 connection went down.
- Lane0 connection came up.
- Host1 sent connection request for Lane6.
- Host2 had PORT_UP event.
- Lane0 and Lane6 connections went down.
- Host2 sent DREQ for Lane0.
- Since Lane6 connection is not up, it does not require to do anything.
- Host2 received connection request from host1 having old path records
for Lane6.
- Lane6 connection got established on old path records.
The following are the impacts of having connections with non-ideal path records:
- minor performance hit because of the extra hop on an ISL path
- in a port failure scenario, it impacts connections which are not
  related to that port
With this patch we make sure that the base connection is up on the
passive side as well before allowing the connection to be established.
Quentin Casasnovas [Mon, 19 Oct 2015 21:22:27 +0000 (14:22 -0700)]
RDS: fix race condition when sending a message on unbound socket.
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket. The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket. This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().
Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.
I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.
Complete earlier incomplete fix to CVE-2015-6937:
74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Chien Yen <chien.yen@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Mukesh Kacker [Wed, 21 Oct 2015 15:09:05 +0000 (08:09 -0700)]
Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
mlx4: indicate memory resource exhaustion
IB/mlx4: Use correct order of variables in log message
mlx4_core: Introduce restrictions for PD update
Ajaykumar Hotchandani [Wed, 14 Oct 2015 23:38:11 +0000 (16:38 -0700)]
mlx4_core: Introduce restrictions for PD update
From 2.31.5350 firmware onwards,
- RDS with RDMA data transfer stopped working.
- Mellanox has introduced limitations related to PD updates.
These imposed limitations are in line with the PRM.
This patch brings the driver in sync with these limitations.
Mellanox R&D has approved this patch.
It's been tested on both old firmware (2.11.1280) and new firmware.
Santosh Shilimkar [Tue, 13 Oct 2015 17:10:40 +0000 (10:10 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
RDS/IB: print string constants in more places
ib/rds: runtime debuggability enhancement
Santosh Shilimkar [Thu, 8 Oct 2015 22:59:14 +0000 (15:59 -0700)]
RDS: make send_batch_count tunable effective
The send_batch_count tunable is stale, and the code relies on a
hard-coded batch count value. It's a nice feature that lets you
tune the system for different HCAs. The TCP transport has different
characteristics as well, so the tunable can be useful there too.
There is no change in default behavior with this patch.
Santosh Shilimkar [Thu, 8 Oct 2015 23:26:32 +0000 (16:26 -0700)]
RDS: make use of kfree_rcu() and avoid the call_rcu() chain
call_rcu() chains are expensive, and the one in rds_ib_remove_ipaddr()
exists just to kfree() the rds_ib_ipaddr. Chains make use of the
high-latency rcu_barrier() in modules, which can be avoided.
Make use of kfree_rcu(), which is meant exactly for such use cases.
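The shape of the change, sketched (field and callback names illustrative):

/* before: a dedicated RCU callback whose only job is kfree() */
call_rcu(&ipaddr->rcu, rds_ib_free_ipaddr_rcu);

/* after: defer the kfree() without a private callback or chain */
kfree_rcu(ipaddr, rcu);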
This patch provides the ability to dynamically turn on or off various
types of debug/diag prints inside the RDS module.
The run-time debug prints are controlled by an rds module parameter,
rds_rt_debug_bitmap.
Here is the definition of the different bits. We have implemented
feature-related bits, such as Connection Management, Active Bonding,
Error prints, Send, and Recv,
in net/rds/rds_rt_debug.h:
...
enum {
/* bit 0 ~ 19 are feature related bits */
RDS_RTD_ERR = 1 << 0, /* 0x1 */
RDS_RTD_ERR_EXT = 1 << 1, /* 0x2 */
In general, the *_EXT bits mean that you will get extra information,
but possibly flooding prints as well. Every bit can be controlled by
users, so users can decide how much information they want to
see/collect. The printk level currently used by this patch is
KERN_INFO. Most likely all the msgs will only go to /var/log/messages
without showing up on the console if we use the default settings for
/proc/sys/kernel/printk and /etc/rsyslog.conf in an ol6 environment.
E.g., if we want to turn on the RDS_RTD_ERR and RDS_RTD_CM bits, what
we can do is:
Add Oracle virtual Networking Drivers for uek4 kernel
This commit adds 4 kernel modules: xscore, xsvnic, xve
and xsvhba, developed by Xsigo (acquired by Oracle) and used in the
Oracle Virtual Networking (OVN) products, which provide virtual network
and storage adapter devices on servers dynamically at runtime.
The heart of the OVN product is the Fabric Interconnect (FI).
Hosts and IO modules connect to the FI over an InfiniBand fabric.
IO modules can be N/W cards or/and FC cards.
The "xscore" module is responsible for doing FI topology discovery
and establishing the connection with FI. It is involved in retrieving
virtual device management commands such as INSTALL, DELETE, etc.
This module provides wrapper for IB framework API's which will be used
by its client modules "xsvnic", "xsvhba" and "xve".
The "xve" module supprots the Xsigo Virtual Ethernet(XVE) protocol.
The "xsvnic" module supports the Xsigo vNIC functinality. These modules
interface between kernel networking stack and the "xscore" module.
On the egress side, it processes the N/W packet sends it to "xscore"
module which is then wrapped into a IB packet.
On the ingress side, "xscore" receives the N/W packet which is
encapsulated inside IB packet and transfers it to "xsvnic" or "xve".
The modules "xsvnic"/"xve" process this packet and send it to the
kernel networking stack. The "xsvnic" interacts with N/W card gateway
connected to the FI whereas, "xve" interacts with another host in the
same IB fabric.
The "xsvhba" module support for the Xsigo virtual HBA allowing SAN
Connectivity. The "xsvhba" module interfaces with SCSI layer. It
communicates with the FC card gateway connected to the FI. It is
responsible for accepting/transporting the SCSI commands from/to
the specified SCSI target. The "xsvhba" module uses "xscore" to
wrap(unwrap) the commands in a IB packet and transmit(receive) it.