Pradeep Gopanapalli [Thu, 5 Nov 2015 02:58:15 +0000 (18:58 -0800)]
1) Support vnic for EDR based platform (uVnic)
2) Supported types now:
   Type 0 - XSMP_XCM_OVN - Xsigo VP780/OSDN standalone chassis (add pvi)
   Type 1 - XSMP_XCM_NOUPLINK - EDR without uplink (add public-network)
   Type 2 - XSMP_XCM_UPLINK - EDR with uplink (add public-network <with -if>)
3) Intelligence in driver to support all the modes
4) Added code for printing Multicast LID [Revision 8008]
5) Removed style errors
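The three types listed above could be written down as a C enum. This is only a sketch: the names and numbers come from the commit message, while the enum tag and the xcm_type_name() helper are hypothetical and not taken from the driver.

```c
/* Chassis types as numbered in the commit message. */
enum xsmp_xcm_type {
    XSMP_XCM_OVN      = 0, /* Xsigo VP780/OSDN standalone chassis */
    XSMP_XCM_NOUPLINK = 1, /* EDR without uplink */
    XSMP_XCM_UPLINK   = 2  /* EDR with uplink */
};

/* Hypothetical helper: map a type value to a printable name. */
static const char *xcm_type_name(int type)
{
    switch (type) {
    case XSMP_XCM_OVN:      return "XSMP_XCM_OVN";
    case XSMP_XCM_NOUPLINK: return "XSMP_XCM_NOUPLINK";
    case XSMP_XCM_UPLINK:   return "XSMP_XCM_UPLINK";
    default:                return "unknown";
    }
}
```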
Qing Huang [Tue, 6 Oct 2015 22:32:22 +0000 (15:32 -0700)]
net/rds: start rdma listening after ib/iw initialization is done
This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.
We don't need to call rds_rdma_listen_stop() if rds_rdma_listen_init()
didn't succeed.
We only need to call rds_ib_exit() if rds_ib_init() succeeds but
other parts fail. The same applies to rds_iw_init()/rds_iw_exit().
So we need to change error handling sequence accordingly.
Jump to ib/iw error handling path when we get an err code from
rds_rdma_listen_init().
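The ordering and unwinding described above can be sketched as a userspace model. Everything here is an assumption made for illustration: the init_plan and init_trace structs are hypothetical, the ib-before-iw order is a guess, and the real init/exit routines are represented only by comments.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: each field is the return value that the
 * corresponding init routine would produce (0 means success). */
struct init_plan  { int ib_ret, iw_ret, listen_ret; };
struct init_trace { bool ib_up, iw_up, listening; };

static int model_init(const struct init_plan *p, struct init_trace *t)
{
    int ret;

    t->ib_up = t->iw_up = t->listening = false;

    ret = p->ib_ret;            /* stands in for rds_ib_init() */
    if (ret)
        goto out;
    t->ib_up = true;

    ret = p->iw_ret;            /* stands in for rds_iw_init() */
    if (ret)
        goto err_ib;
    t->iw_up = true;

    /* Start listening only after both transports are initialized. */
    ret = p->listen_ret;        /* stands in for rds_rdma_listen_init() */
    if (ret)
        goto err_iw;            /* no listen_stop: listening never started */
    t->listening = true;
    return 0;

err_iw:
    t->iw_up = false;           /* stands in for rds_iw_exit() */
err_ib:
    t->ib_up = false;           /* stands in for rds_ib_exit() */
out:
    return ret;
}
```

Each label undoes only the steps that succeeded before the failure, which is why a failed listen init jumps into the ib/iw unwinding path without a listen stop.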
Santosh Shilimkar [Wed, 21 Oct 2015 18:15:14 +0000 (11:15 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
RDS: rds_conn_lookup() should factor in the struct net for a match
RDS: Use a single TCP socket for both send and receive.
RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
rds: make sure base connection is up on both sides
rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
RDS: fix race condition when sending a message on unbound socket.
RDS: verify the underlying transport exists before creating a connection
mlx4: indicate memory resource exhaustion
IB/mlx4: Use correct order of variables in log message
mlx4_core: Introduce restrictions for PD update
Mukesh Kacker [Wed, 21 Oct 2015 16:11:46 +0000 (09:11 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.rds-p2:
RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
RDS: rds_conn_lookup() should factor in the struct net for a match
RDS: Use a single TCP socket for both send and receive.
RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
rds: make sure base connection is up on both sides
rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
RDS: fix race condition when sending a message on unbound socket.
RDS: verify the underlying transport exists before creating a connection
Backport of upstream commit 241b271952eb ("RDS-TCP: Reset tcp callbacks
if re-using an outgoing socket in rds_tcp_accept_one()")
Consider the following "duelling syn" sequence between two peers A and B:
    A                       B
    SYN1 -->
                        <-- SYN2
    SYN2ACK -->
Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.
If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).
Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP connection.
Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
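A minimal userspace sketch of the arbitration rule described in this commit message, not the kernel code. The enum, the conn_model struct and resolve_duel() are hypothetical names; comparing the addresses in host byte order is an assumption about what "numerically less" means.

```c
#include <stdint.h>
#include <arpa/inet.h>

enum syn_winner { KEEP_OUTGOING, KEEP_INCOMING };

/* Hypothetical model of the one connection field that matters here. */
struct conn_model { int c_outgoing; };

/* local_be and peer_be are IPv4 addresses in network byte order. */
static enum syn_winner resolve_duel(struct conn_model *c,
                                    uint32_t local_be, uint32_t peer_be)
{
    /* The numerically smaller address keeps its own outgoing connection
     * and closes the socket it has just accepted. */
    if (ntohl(local_be) < ntohl(peer_be))
        return KEEP_OUTGOING;

    /* Otherwise the accepted socket wins; mark the connection as passive
     * so that this side does not start reconnect attempts later. */
    c->c_outgoing = 0;
    return KEEP_INCOMING;
}
```

Because both peers apply the same comparison to the same address pair, they agree on which of the two crossing connections survives.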
Backport of upstream commit 486798001b92 ("RDS: Invoke ->laddr_check()
in rds_bind() for explicitly bound transports.")
The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
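The vetting step can be illustrated with a small userspace model. All struct and function names below are hypothetical, the demo check that accepts a single address is invented for the example, and the error code is an illustrative choice rather than the one rds_bind() returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct transport_model {
    const char *name;
    int (*laddr_check)(uint32_t addr);   /* returns 0 if addr is usable */
};

struct sock_model {
    const struct transport_model *transport;  /* NULL until one is chosen */
    uint32_t bound_addr;
};

/* Invented check: pretend only 10.0.0.1 lives on an IB interface. */
static int demo_ib_laddr_check(uint32_t addr)
{
    return addr == 0x0a000001u ? 0 : -EADDRNOTAVAIL;
}

static const struct transport_model demo_ib = { "ib", demo_ib_laddr_check };

static int bind_model(struct sock_model *s, uint32_t addr)
{
    /* A transport that was chosen before bind gets to veto the address. */
    if (s->transport && s->transport->laddr_check(addr) != 0)
        return -EADDRNOTAVAIL;

    s->bound_addr = addr;
    return 0;
}
```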
Backport of upstream commit 3b20fc389705 ("RDS: Use a single TCP
socket for both send and receive.");
Commit f711a6ae062c ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port. The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.
However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both send and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().
A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection
attempt.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Backport of upstream commit 1edd6a14d24f ("RDS-TCP: Do not bloat
sndbuf/rcvbuf in rds_tcp_tune")
Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
clobbers efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency, and the default values for these parameters actually
result in significantly better performance.
Request-response tests using rds-stress with a packet size of
100K and 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
between a single pair of IP addresses achieve a throughput of
6-8 Gbps. Without this patch, throughput maxes out at 2-3 Gbps under
equivalent conditions on these platforms.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 14 Oct 2015 01:03:04 +0000 (21:03 -0400)]
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Backport of upstream commit 76b29ef120f5 ("RDS-TCP: Set up MSG_MORE and
MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit")
For the same reasons as commit 2f5338442425 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09fe9c7 ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
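As a rough illustration of the hinting idea, the helper below marks every page except the last one of a message as "more data follows". The HINT_* names and values are placeholders defined only for this sketch and are not the kernel's MSG_MORE or MSG_SENDPAGE_NOTLAST values; page_hint_flags() is likewise hypothetical.

```c
/* Placeholder flag bits for the sketch; not the kernel's values. */
#define HINT_MORE             0x1
#define HINT_SENDPAGE_NOTLAST 0x2

/* Flags to pass along with page number idx out of npages pages. */
static int page_hint_flags(unsigned int idx, unsigned int npages)
{
    /* Every page but the last tells TCP that more data is coming, so
     * that it can keep filling the current segment instead of pushing. */
    if (idx + 1 < npages)
        return HINT_MORE | HINT_SENDPAGE_NOTLAST;
    return 0;
}
```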
Ajaykumar Hotchandani [Tue, 5 May 2015 03:09:42 +0000 (20:09 -0700)]
rds: make sure base connection is up on both sides
The RDS active side currently requires zero-lane path records to establish a
non-zero-lane connection. For this reason, the active side makes sure the
zero-lane connection is up before establishing a non-zero-lane
connection. The passive side does not need to fetch path records, so it
does not have this check.
This can leave a connection with non-ideal path records in the
following scenario:
- Host1 had PORT_UP event.
- Lane0 and Lane6 connection went down.
- Lane0 connection came up.
- Host1 sent connection request for Lane6.
- Host2 had PORT_UP event.
- Lane0 and Lane6 connections went down.
- Host2 sent DREQ for Lane0.
- Since the Lane6 connection is not up, Host2 does not need to do anything for it.
- Host2 received connection request from host1 having old path records
for Lane6.
- Lane6 connection got established on old path records.
Connections with non-ideal path records have the following impacts:
- minor performance hit because of the extra hop over the ISL path
- in a port failure scenario, connections that are not related to that
port are affected.
With this patch we make sure that the base connection is up on the passive
side as well before allowing a connection to be established.
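The passive-side rule can be modelled in a few lines. This is a sketch under assumptions: the lane count of 8, the peer_model struct and may_accept() are all hypothetical, and the model takes the base connection to be Lane0.

```c
#include <stdbool.h>

#define MODEL_LANES 8   /* assumption: enough lanes to cover Lane0..Lane6 */

/* Hypothetical per-peer state: which lane connections are currently up. */
struct peer_model { bool lane_up[MODEL_LANES]; };

/* Should the passive side accept a connection request for this lane? */
static bool may_accept(const struct peer_model *p, unsigned int lane)
{
    if (lane >= MODEL_LANES)
        return false;
    if (lane == 0)
        return true;            /* the base connection itself */
    return p->lane_up[0];       /* other lanes wait for the base connection */
}
```

In the scenario above, Host2 would refuse the Lane6 request from Host1 while its own Lane0 connection is down, instead of accepting it on stale path records.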
Quentin Casasnovas [Mon, 19 Oct 2015 21:22:27 +0000 (14:22 -0700)]
RDS: fix race condition when sending a message on unbound socket.
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket. The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket. This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().
Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.
I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.
Complete earlier incomplete fix to CVE-2015-6937:
74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Chien Yen <chien.yen@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Mukesh Kacker [Wed, 21 Oct 2015 15:09:05 +0000 (08:09 -0700)]
Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
mlx4: indicate memory resource exhaustion
IB/mlx4: Use correct order of variables in log message
mlx4_core: Introduce restrictions for PD update
Ajaykumar Hotchandani [Wed, 14 Oct 2015 23:38:11 +0000 (16:38 -0700)]
mlx4_core: Introduce restrictions for PD update
From 2.31.5350 firmware onwards,
- RDS with RDMA data transfer stopped working.
- Mellanox has introduced limitations related to PD updates.
These limitations are in line with the PRM.
This patch brings the driver in sync with these limitations.
Mellanox R&D has approved this patch.
It's been tested on both old firmware (2.11.1280) and new firmware.
Santosh Shilimkar [Tue, 13 Oct 2015 17:10:40 +0000 (10:10 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed
* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
RDS/IB: print string constants in more places
ib/rds: runtime debuggability enhancement
Santosh Shilimkar [Thu, 8 Oct 2015 22:59:14 +0000 (15:59 -0700)]
RDS: make send_batch_count tunable effective
The send_batch_count tunable is stale and the code relies on a
hard-coded batch count value. It's a nice feature that lets you
tune the system for different HCAs. The TCP transport also
has different characteristics, so the tunable can be useful there.
There is no change in default behavior with this patch.
Santosh Shilimkar [Thu, 8 Oct 2015 23:26:32 +0000 (16:26 -0700)]
RDS: make use of kfree_rcu() and avoid the call_rcu() chain
call_rcu() chains are expensive, and the one in rds_ib_remove_ipaddr()
exists only to kfree() the rds_ib_ipaddr. Chains also require the high-latency
rcu_barrier() in modules, which can be avoided.
Make use of kfree_rcu(), which is meant for exactly this use.
This patch provides the ability to dynamically turn on or off various
types of debug/diag prints inside the RDS module.
The run-time debug prints are controlled by a rds module parameter,
rds_rt_debug_bitmap.
Here is the definition for different bits. We have implemented feature
related bits, such as Connection Management, Active Bonding, Error prints,
Send, Recv.
in net/rds/rds_rt_debug.h
...
enum {
/* bit 0 ~ 19 are feature related bits */
RDS_RTD_ERR = 1 << 0, /* 0x1 */
RDS_RTD_ERR_EXT = 1 << 1, /* 0x2 */
In general, *EXTRA bits mean that you will get extra information but
possible flood prints as well. But every bit can be controlled by users
so users can decide how much information they want to see/collect. The
current embedded printk level used for this patch is KERN_INFO. Most
likely all the msgs will only go to /var/log/messages without showing up
on console if we use the default settings for /proc/sys/kernel/printk and
/etc/rsyslog.conf in ol6 environment.
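A userspace sketch of how a bitmap like this can gate prints. The two bit values are the ones shown in the header excerpt; the plain variable standing in for the rds_rt_debug_bitmap module parameter and the rtd_enabled()/rtd_print() helpers are hypothetical, not the macros from the patch.

```c
#include <stdio.h>

enum {
    RDS_RTD_ERR     = 1 << 0,   /* 0x1, as in the header excerpt */
    RDS_RTD_ERR_EXT = 1 << 1    /* 0x2, as in the header excerpt */
};

/* Userspace stand-in for the rds_rt_debug_bitmap module parameter. */
static unsigned int rds_rt_debug_bitmap;

/* True if any of the requested bits is currently switched on. */
static int rtd_enabled(unsigned int bits)
{
    return (rds_rt_debug_bitmap & bits) != 0;
}

/* Print msg only when its bit is enabled; returns 1 if it was printed. */
static int rtd_print(unsigned int bits, const char *msg)
{
    if (!rtd_enabled(bits))
        return 0;
    fprintf(stderr, "RDS: %s\n", msg);
    return 1;
}
```

Several bits are combined by OR-ing them, so a value of 0x3 in this model would enable both RDS_RTD_ERR and RDS_RTD_ERR_EXT prints.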
E.g if we want to turn on RDS_RTD_ERR and RDS_RTD_CM bits. What we can
do is
Add Oracle virtual Networking Drivers for uek4 kernel
This commit adds 4 kernel modules: xscore, xsvnic, xve
and xsvhba, developed by Xsigo (acquired by Oracle) and used in the Oracle
Virtual Networking (OVN) products, which provide virtual network and
storage adapter devices on the servers dynamically at runtime.
The heart of OVN product is the Fabric Interconnect (FI).
Hosts and IO modules connect to the FI using Infiniband fabric.
IO modules can be N/W cards and/or FC cards.
The "xscore" module is responsible for doing FI topology discovery
and establishing the connection with FI. It is involved in retrieving
virtual device management commands such as INSTALL, DELETE, etc.
This module provides wrappers for the IB framework APIs, which are used
by its client modules "xsvnic", "xsvhba" and "xve".
The "xve" module supports the Xsigo Virtual Ethernet (XVE) protocol.
The "xsvnic" module supports the Xsigo vNIC functionality. These modules
interface between the kernel networking stack and the "xscore" module.
On the egress side, the module processes the N/W packet and sends it to the
"xscore" module, where it is wrapped into an IB packet.
On the ingress side, "xscore" receives the N/W packet which is
encapsulated inside IB packet and transfers it to "xsvnic" or "xve".
The modules "xsvnic"/"xve" process this packet and send it to the
kernel networking stack. The "xsvnic" interacts with N/W card gateway
connected to the FI whereas, "xve" interacts with another host in the
same IB fabric.
The "xsvhba" module provides support for the Xsigo virtual HBA, allowing SAN
Connectivity. The "xsvhba" module interfaces with SCSI layer. It
communicates with the FC card gateway connected to the FI. It is
responsible for accepting/transporting the SCSI commands from/to
the specified SCSI target. The "xsvhba" module uses "xscore" to
wrap (unwrap) the commands in an IB packet and transmit (receive) it.
ib_sdp/cma: readd SDP support to cma_save_net_info
Upstream has removed SDP support from cma.c. Some applications may
not display addr/port information correctly without this change to
cma_save_net_info() function.
Mukesh Kacker [Tue, 6 Oct 2015 12:49:07 +0000 (05:49 -0700)]
Merge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.sdp:
ib/sdp: Enable usermode FMR
ib/sdp: fix null dereference of sk->sk_wq in sdp_rx_irq()
sdp: fix keepalive functionality
ib_sdp: fix deadlock when sdp_cma_handler is called while socket is being closed
ib_sdp: add unhandled events to rdma_cm_event_str
We know the first call to sdp_sk_sleep(sk) finds a non-null sk->sk_wq
because we don't crash:
0xffffffffa02b6388 <sdp_rx_irq+56>: mov 0xb8(%rsi),%rax
0xffffffffa02b638f <sdp_rx_irq+63>: test %rax,%rax
*** struct sock sk+0xb8 == sk->sk_wq (sk_wq is at offset 0xb8)
*** we didn't crash at sdp_rx_irq+56 so sk->sk_wq was apparently valid
0xffffffffa02b6394 <sdp_rx_irq+68>: mov 0xb8(%rsi),%rdx
0xffffffffa02b639b <sdp_rx_irq+75>: lea 0x8(%rdx),%rax
0xffffffffa02b639f <sdp_rx_irq+79>: cmp %rax,0x8(%rdx)
*** RDX is NULL causing the null dereference of address 0x8 at sdp_rx_irq+79.
Fix is to check if sk->sk_wq is NULL before dereferencing it to get the
address of sk->sk_wq->wait. Also, do the RCU dereference of sk->sk_wq
once, not twice as we may get a different answer (NULL) the second time.
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: John Sobecki <john.sobecki@oracle.com>
Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
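The shape of the fix (load the pointer once, test it, then use only the local copy) can be shown in userspace. This sketch uses C11 atomics as a stand-in for the kernel's RCU dereference; the wq_model and sk_model structs and has_sleepers() are hypothetical and not taken from the SDP code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct wq_model { int waiters; };

/* Hypothetical socket model: sk_wq may be cleared concurrently. */
struct sk_model { _Atomic(struct wq_model *) sk_wq; };

static bool has_sleepers(struct sk_model *sk)
{
    /* Read the pointer exactly once; a second read could return NULL
     * even though the first one did not. */
    struct wq_model *wq = atomic_load(&sk->sk_wq);

    if (wq == NULL)
        return false;

    /* From here on use only the local copy, never sk->sk_wq again. */
    return wq->waiters > 0;
}
```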
shamir rabinovitch [Mon, 12 May 2014 15:34:02 +0000 (08:34 -0700)]
sdp: fix keepalive functionality
sdp keepalive functionality differs a bit from tcp socket functionality.
in sdp only an accepted or connected socket can trigger this functionality,
as the keepalive is implemented as an rdma write with zero length and this
requires an ib connection. due to this sdp behaviour you cannot set keepalive
on a listening server socket or on a non-connected client socket. apps can
use sdp in 2 ways: binary apps that use tcp sockets can use libsdp
to direct all the socket calls to sdp, and new apps can open and use sdp
sockets directly w/o the need for libsdp. when using an sdp socket directly
please follow the rules below:
- define: AF_INET_SDP = SOL_SDP = 27
- create the socket as follows:
socket(AF_INET_SDP, SOCK_STREAM, 0)
- get the sdp socket keepalive as follows:
getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, &optlen)
- set the sdp socket keepalive as follows:
setsockopt(fd, SOL_SDP, SO_KEEPALIVE, &optval, optlen)
when you load the sdp module:
- set the keepalive time. this is the max period in sec of no data before
sdp starts to send the probes. you should take into account that more
than one probe is needed before sdp detects that the remote hca is gone.
echo <time sec> > /sys/module/ib_sdp/parameters/sdp_keepalive_time
- zero the probes counter. this counter is incremented any time sdp sends a probe.
probes are sent only if there is no tx/rx on this queue pair for the
keepalive time period.
echo 0 > /sys/module/ib_sdp/parameters/sdp_keepalive_probes_sent
on server socket:
- set keepalive only on accepted socket
on client socket:
- set keepalive only on socket after connect
Saeed Mahameed [Sun, 17 Feb 2013 16:10:57 +0000 (18:10 +0200)]
ib_sdp: fix deadlock when sdp_cma_handler is called while socket is being closed
issue: 130280
sdp_close will grab sock_lock, and while it is closing, sdp_cma_handler can be called from cma context
under id_priv->qp_mutex, and sdp_cma_handler will wait for sock_lock to become available.
sdp_close will call rdma_disconnect, which needs to grab id_priv->qp_mutex --> deadlock!
Mukesh Kacker [Tue, 6 Oct 2015 12:13:39 +0000 (05:13 -0700)]
Merge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed
* topic/uek-4.1/ofed.sdp: (408 commits)
ib_sdp: porting sdp from uek2 to uek-4.1
ib_sdp: remove APM code
sdp: Kconfig and Makefile changes
sdp: port the code to uek2
sdp: added debug print for the event: RDMA_CM_EVENT_ALT_PATH_LOADED
sdp: prepare support to kernel 2.6.39-200.1.1.el5uek: add macro to get sk_sleep
sdp: add support to kernel 2.6.39-200.1.1.el5uek
sdp: add [rt]x_bytes counters to sdpstats
sdp: Fix Bug 114242 - Multi connection net_perf causes server to hang
FMR: remove FMR failure messages
sdp: make sdp memory leak print a debug
sdp: changed memory accounting warning into debug
sdp: Fix issues in sdpprf
sdp: Remove protection before sleep on RX
sdp: Enable automatic path migration support also in the passive side of the connection.
sdp: Fixed some coverity issues
Flatten the entire tree fixes
sdp: Fixed compilation error on 2.6.18 RH5.5
sdp: fix memory leak. sockets_allocated wasn't freed
sdp: Removed spaces and tabs at end of lines
...
Eli Cohen [Sun, 3 Apr 2011 07:07:44 +0000 (10:07 +0300)]
Flatten the entire tree fixes
From now on we are going to avoid using patches to commit changes to the
driver. Instead, we will push directly to the source files. Backports are still
maintained but only for 2.6.18-EL5.5; backports of 2.6.32 are completely
removed.
Amir Vadai [Tue, 8 Mar 2011 08:25:35 +0000 (10:25 +0200)]
sdp: Limit total memory consumed by rcvbuf
rcvbuf is already limited by the payload in the queue. But we also need to limit
its total memory consumption, since small received packets might carry a very
large overhead relative to the payload.
Amir Vadai [Mon, 28 Feb 2011 09:57:18 +0000 (11:57 +0200)]
sdp: fix memory socket accounting
skb->truesize - total bytes allocated by skb, including fragments
Specific socket accounting:
* sk->sk_wmem_queued - send bytes currently in TX queue
* RX queue accounting is done by using seq
* sk->sk_rmem_alloc - bytes consumed by RX
Protocol accounting:
* sk->sk_forward_alloc - bytes that are available to be consumed
* prot->memory_allocated - bytes consumed by TX/RX
Amir Vadai [Thu, 27 Jan 2011 08:42:56 +0000 (10:42 +0200)]
rdma_cm, sdp: bug fixes and some changes to APM logic
- We no longer rely on the private data buffer of the LAP/APR messages for passive side LID improvement.
Instead, we use the protocol defined LID improvement APR error code.
- Two paths are allocated on id creation to simplify code.
- Various small bug fixes.
- Added a missing ref_count get
- Some code cleanup.
- Important: rdma_enable_apm may be called only upon receiving RDMA_CM_ROUTE_RESOLVED event.
This was done to break symmetry on failover and possibly on other occasions.
Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Amir Vadai [Sun, 16 Jan 2011 16:32:38 +0000 (18:32 +0200)]
sdp: Abort rx SrcAvail when out of credits
SrcAvail sink side could run out of credits when having bcopy sends to the
other direction. Because of that no RdmaRdCompl could be sent nor SendSM. This
could hang the sender side forever (No SendSM on SrcAvailCancel).
Amir Vadai [Tue, 11 Jan 2011 13:16:51 +0000 (15:16 +0200)]
sdp: Fixed BUG2207 - EINVAL when connect after IPv6 bind
Connecting to an IPv4 over IPv6 address requires the rdma id to be created with an IPv4 address.
If the socket was previously bound with an IPv6 address, the id needs to be destroyed and recreated.
Also, when connecting after bind, keep the same source port number.
Amir Vadai [Tue, 14 Dec 2010 12:41:12 +0000 (14:41 +0200)]
sdp: remove 'reading beyond SKB' warning
This is a good sanity check, but could print a warning when a
partially used SrcAvail skb is cancelled.
This should be fixed in a way that will leave the sanity check,
but need to make minimal changes before the GA.
Amir Vadai [Tue, 14 Dec 2010 06:48:42 +0000 (08:48 +0200)]
sdp: RdmaRdCompl not sent sometimes
When SrcAvailCancel is handled after RDMA finished and before sending
RdmaRdCompl, RdmaRdCompl won't be sent, and data corruption will occur.
Made sure that all sdp_abort_rx_srcavail will send RdmaRdCompl if needed.