Amir Vadai [Tue, 28 Apr 2009 06:40:10 +0000 (09:40 +0300)]
sdp: BUG1311 Netpipe fails with an IB_WC_LOC_LEN_ERR.
This problem is seen when the receive buffer or the receive buffer fragments
are smaller than the sender's buffer. If the sender is using 64KB pages
and supports a sk fragment of 64KB, it may send fragments that are a full 64KB
in length, causing the receiver to generate an IB_WC_LOC_LEN_ERR. This patch
makes two changes:
* If the kernel does not support a full 64KB fragment, it rejects resize
requests over 32K. (On older kernels a fragment size is defined as a U16.)
* If the kernel supports a 64KB fragment, it allows a full 64KB receive
fragment to be used.
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
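A rough userspace sketch of the size check described in the BUG1311 entry above. The names `toy_max_recv_frag`, `toy_resize_request_ok` and `frag_size_bits` are hypothetical and not taken from the SDP source; the width of the kernel's fragment size field is modelled as a plain parameter.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FRAG_64K (64u * 1024u)
#define TOY_FRAG_32K (32u * 1024u)

/* Largest receive fragment the local side accepts.
 * frag_size_bits models the width of the fragment size field:
 * 16 for older kernels, 32 for kernels that can describe 64KB. */
uint32_t toy_max_recv_frag(unsigned frag_size_bits)
{
    assert(frag_size_bits == 16 || frag_size_bits == 32);
    return frag_size_bits > 16 ? TOY_FRAG_64K : TOY_FRAG_32K;
}

/* Returns 1 if a resize request can be honoured, 0 if it must be rejected. */
int toy_resize_request_ok(uint32_t requested, unsigned frag_size_bits)
{
    return requested <= toy_max_recv_frag(frag_size_bits);
}
```

A 16-bit field cannot hold the value 65536, which is why the sketch caps the older case at 32K instead of 64K.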
Nicolas Morey-Chaisemartin [Wed, 29 Apr 2009 14:23:04 +0000 (16:23 +0200)]
sdp: change orphan_count and sockets_allocated from atomic_t to percpu_counter
Fixed SDP to work on 2.6.29+.
Because percpu_counter structures are large, they cannot be allocated on the stack without causing the sdp module to crash.
Both variables are now dynamically allocated at module init.
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin@ext.bull.net>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Amir Vadai [Mon, 17 Nov 2008 08:11:27 +0000 (10:11 +0200)]
SDP: BUG1391 - bugs in the zero-copy send code
* fix sdp_bz_setup() code to handle the case of kernel data segment correctly (kernel sockets)
* make sdp_bz_setup() pass ENOMEM, EFAULT or other errors to sendmsg().
* Fix: the deallocation of bz descriptor in sendmsg() is not handled properly -- it is allocated many times, but freed once
* Fix: sdp_bzcopy_get() code does not raise reference count for all pages in the bz descriptor (only the "partial" pages will get the count raised).
However, the send completion code will call put_page() on all entries, leading to a crash for page-aligned transfers.
Signed-off-by: Constantine Gavrilov <constantine.gavrilov@gmail.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
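The reference count imbalance in the last BUG1391 item can be illustrated with a toy model. This is only a sketch: `struct toy_page` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's `struct page`, `get_page()` or `put_page()`, and the two "partial" flags are an assumed way to describe a transfer that is not page-aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page: only the reference count is modelled. */
struct toy_page {
    int refcount;
};

/* Fixed behaviour: take one reference on every page in the descriptor. */
void toy_get_all(struct toy_page *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pages[i].refcount++;
}

/* Buggy behaviour: only partially covered first/last pages are pinned. */
void toy_get_partial_only(struct toy_page *pages, size_t n,
                          int first_partial, int last_partial)
{
    if (n == 0)
        return;
    if (first_partial)
        pages[0].refcount++;
    if (last_partial && (n > 1 || !first_partial))
        pages[n - 1].refcount++;
}

/* Send completion: drop one reference on every page. */
void toy_put_all(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].refcount--;
}

/* Count pages whose reference count ended below the expected baseline. */
size_t toy_underflows(const struct toy_page *pages, size_t n, int baseline)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount < baseline)
            bad++;
    return bad;
}
```

For a page-aligned transfer there are no partial pages, so the buggy variant pins nothing and every page loses a reference on completion.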
Amir Vadai [Thu, 20 Nov 2008 10:56:39 +0000 (12:56 +0200)]
SDP: BUG1348 - sockets are left in CLOSE state with ref count > 0
Removed an unnecessary sock_hold() when a CM_REJECT arrives before the TCP_ESTABLISHED state.
This happened on the server side when, after getting a CM_REQ and answering with a CM_REP, a CM_REJ
arrived.
The sock_hold() that was removed assumed that there would be a timewait state, but according to
the spec, the state changes back to LISTEN without TIMEWAIT.
Amir Vadai [Tue, 18 Nov 2008 13:41:15 +0000 (15:41 +0200)]
SDP: BUG1343 - Polygraph test crashes machine
No socket reference was taken before starting the DREQ timeout.
Cancelling delayed work only removes the work if it is still in the timer stage,
before it has entered the workqueue.
Because of that, sdp_dreq_timeout_work could still be in the workqueue
after the socket was destructed, and when the socket was reused
and the inlined work structure was reset, things went wrong.
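The idea behind the BUG1343 fix, that pending timeout work should own a socket reference, can be sketched in userspace as follows. All names here (`struct toy_dreq_sock`, the `toy_dreq_*` functions, the `still_in_timer` flag) are hypothetical; the real code uses the kernel's socket reference counting and delayed work APIs, which this model does not reproduce.

```c
#include <assert.h>

struct toy_dreq_sock {
    int refcnt;       /* references currently held */
    int destructed;   /* set once the last reference is dropped */
    int work_queued;  /* DREQ timeout work is outstanding */
};

void toy_dreq_hold(struct toy_dreq_sock *sk)
{
    sk->refcnt++;
}

void toy_dreq_put(struct toy_dreq_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->destructed = 1;
}

/* Arm the timeout: the pending work owns one reference. */
void toy_dreq_start_timeout(struct toy_dreq_sock *sk)
{
    toy_dreq_hold(sk);
    sk->work_queued = 1;
}

/* Try to cancel. still_in_timer models whether the work could still be
 * removed. Returns 1 if it was removed, 0 if the handler will run. */
int toy_dreq_cancel_timeout(struct toy_dreq_sock *sk, int still_in_timer)
{
    if (!sk->work_queued || !still_in_timer)
        return 0;             /* the handler drops the reference itself */
    sk->work_queued = 0;
    toy_dreq_put(sk);
    return 1;
}

/* Work handler: must find a live socket, then drops the work's reference. */
void toy_dreq_timeout_work(struct toy_dreq_sock *sk)
{
    assert(!sk->destructed);
    sk->work_queued = 0;
    toy_dreq_put(sk);
}
```

Because the work holds its own reference, a failed cancel followed by the user dropping the socket leaves the socket alive until the handler has run.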
Amir Vadai [Sun, 26 Oct 2008 09:43:43 +0000 (11:43 +0200)]
sdp: Limit skb frag size to 64K-1
When 64K pages are in use, the skb_frag size can become larger
than the skb_frag can address. An skb_frag's max size is 64K-1.
This patch defines SDP_MAX_PAYLOAD as 64K - SDP_HEADER_SIZE.
The patch changes sdp_post_recv() and sdp_sendmsg() to use the smaller of
PAGE_SIZE or SDP_MAX_PAYLOAD as their segment size.
This fixes the bug reported here:
https://bugs.openfabrics.org/show_bug.cgi?id=1300
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
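The segment size rule in the entry above amounts to a simple minimum. In this sketch the header size of 16 bytes is an assumption made for illustration, and `TOY_SDP_HEADER_SIZE`, `TOY_SDP_MAX_PAYLOAD` and `toy_seg_size` are hypothetical names, not the definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SDP_HEADER_SIZE 16u   /* assumed value, illustration only */
#define TOY_SDP_MAX_PAYLOAD ((64u * 1024u) - TOY_SDP_HEADER_SIZE)

/* Segment size: the smaller of the page size and the maximum payload. */
size_t toy_seg_size(size_t page_size)
{
    assert(page_size > 0);
    return page_size < TOY_SDP_MAX_PAYLOAD ? page_size : TOY_SDP_MAX_PAYLOAD;
}
```

With 4K pages the page size wins; with 64K pages the payload cap wins and the result stays below 64K-1.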
Amir Vadai [Wed, 23 Jul 2008 11:30:31 +0000 (14:30 +0300)]
SDP: Don't allow socket destruction while sdp_destroy_work is in the workqueue
In the error flow, sdp_destroy_work is placed in the workqueue, and sometimes the
user application calls close(), which destructs the socket before sdp_destroy_work
is called.
Amir Vadai [Thu, 10 Jul 2008 08:53:53 +0000 (01:53 -0700)]
SDP: do graceful close instead of always doing abortive close.
Main changes:
1. when a close/shutdown syscall is called, instead of sending a DREQ, putting
the last socket ref count and going to TCP_CLOSE state, do:
- take a socket reference count
- set state to TCP_TIME_WAIT
- start infiniband tear down
- wait till got RDMA_CM_EVENT_TIMEWAIT_EXIT
- set socket state to TCP_CLOSE
- put last socket ref count - this will call sdp_destruct()
2. No need for sdp_time_wait
3. Abortive close will immediately start infiniband teardown and will finalize the
socket closing when the CM finishes.
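The graceful close sequence in item 1 can be read as a small state machine. The sketch below is a userspace model only: the `toy_*` names are hypothetical, the states merely mirror TCP_TIME_WAIT and TCP_CLOSE, and dropping the caller's own reference inside `toy_graceful_close` is an assumption about how the syscall path releases it.

```c
#include <assert.h>

enum toy_close_state { TOY_ESTABLISHED, TOY_TIME_WAIT, TOY_CLOSE };

struct toy_close_sock {
    enum toy_close_state state;
    int refcnt;
    int teardown_started;
    int destructed;
};

void toy_close_put(struct toy_close_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->destructed = 1;        /* stands in for sdp_destruct() */
}

/* close()/shutdown() path. */
void toy_graceful_close(struct toy_close_sock *sk)
{
    sk->refcnt++;                  /* take a socket reference */
    sk->state = TOY_TIME_WAIT;     /* set state to TIME_WAIT */
    sk->teardown_started = 1;      /* start infiniband teardown */
    toy_close_put(sk);             /* assumed: caller's reference is released */
}

/* Called when the modelled timewait-exit event arrives. */
void toy_timewait_exit(struct toy_close_sock *sk)
{
    if (sk->state != TOY_TIME_WAIT)
        return;
    sk->state = TOY_CLOSE;
    toy_close_put(sk);             /* last reference: socket is destructed */
}
```

The socket survives the close call and is only destructed once the timewait-exit event has been handled.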
Jim Mott [Wed, 5 Dec 2007 09:02:11 +0000 (11:02 +0200)]
SDP: various bzcopy fixes V2
The Mellanox regression tests posted a number of failures when
multiple threads were accessing the same sockets concurrently. In
addition to test failures, there were log messages of the form:
sdp_sock(54386:19002): Could not reap -5 in-flight sends
This fix handles all these failures and errors.
The V2 is a fix to handle 2.6.22+ kernels where sk_buffs have
changed.
Jim Mott [Sun, 11 Nov 2007 17:20:02 +0000 (19:20 +0200)]
sdp: Fix data correctness regression test failure.
Mellanox regression testing for data correctness started failing
after the recent addition of bzcopy. This was because sdp_sendmsg
returned before all in-flight RC transfers completed.
This allowed user space to modify buffers that had not been sent.
A big oops.
This fixes that bug. Small frame bandwidth is even worse
now, but small frame latency is lower which is good. The
default transfer size that triggers bzcopy has been
increased to the bandwidth crossover point found in
MLX4-MLX4 tests. More work will be required to find the
best value for the release.
Jim Mott [Tue, 6 Nov 2007 22:28:05 +0000 (14:28 -0800)]
SDP - Fix reference count locking bug
Add code to fix a problem found by the Mellanox regression group. When
mlx4_ib driver is unloaded while SDP connections are active, the system
would hang.
The original fix for this problem called an rdma_cm service that can block
with 2 spin locks held.
Jim Mott [Sat, 3 Nov 2007 02:50:31 +0000 (19:50 -0700)]
SDP - Make bzcopy default for 2K and larger transfer size
In order to be sure we test the new bzcopy code it will be enabled by
default. The 2K threshold is what my testing shows to be the lowest
value that always wins. We may have to adjust this upward if other
hardware has worse performance.
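A minimal sketch of the threshold decision described above, assuming the choice is made per send call on the transfer length alone. `toy_pick_send_path`, `TOY_BZCOPY_THRESH` and the "0 disables zero copy" convention are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BZCOPY_THRESH 2048u   /* 2K, the default named in the text */

enum toy_send_path { TOY_SEND_BCOPY, TOY_SEND_BZCOPY };

/* Pick the send path for a transfer of len bytes. */
enum toy_send_path toy_pick_send_path(size_t len, size_t thresh)
{
    assert(len > 0);               /* zero-length sends are not modelled */
    /* Assumption: a threshold of 0 means zero copy is disabled. */
    if (thresh == 0 || len < thresh)
        return TOY_SEND_BCOPY;
    return TOY_SEND_BZCOPY;
}
```

Raising the threshold for slower hardware would only change the `thresh` argument.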
Jim Mott [Tue, 23 Oct 2007 17:59:13 +0000 (10:59 -0700)]
SDP - Zero copy bcopy support
This patch adds zero copy send support to SDP. Below 2K transfer size,
it is better to bcopy. With larger transfers, this is a net win on
bandwidth. Latency testing is yet to be done.
Performance work still remains. Open issues include correct setsockopt
defines (use previous SDP values?), code cleanup, performance tuning,
rigorous regression testing, and multi-OS build+test. Simple testing to
date includes netperf and iperf, ^C recovery, unload/load, and checking
for gross memory leaks on Rhat4u4.
Jim Mott [Tue, 23 Oct 2007 17:58:27 +0000 (10:58 -0700)]
SDP - Method used to allocate socket buffers may cause node to hang
The problem we are seeing is that if a node is under load, and
a memory allocation fails (say in sock_sendmsg()), the kernel will
use the allocation policy to decide how to proceed with the allocation.
If GFP_KERNEL is specified, then the kernel may attempt to free pages
through the iSCSI block device that is making the socket call, which
would result in a deadlock. Use of GFP_NOIO should prevent the kernel
from using the IO backend to free memory resources.
Each kernel level socket has an allocation flag to specify the
memory allocation policy for socket buffers; the default is GFP_ATOMIC
(or GFP_KERNEL for SDP). If the caller creates a socket with the
policy set to GFP_NOFS or GFP_NOIO this should be the allocation
policy used by the SDP layer.
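One way to picture the policy selection described above. The enum values are toy tags named after the kernel flags, not the real GFP bit masks, and `toy_sdp_alloc_policy` is a hypothetical helper. Falling back to the SDP default for any policy other than NOFS or NOIO is an assumption of this sketch.

```c
#include <assert.h>

/* Toy policy tags; these are not the kernel's GFP_* values. */
enum toy_gfp { TOY_GFP_ATOMIC, TOY_GFP_KERNEL, TOY_GFP_NOFS, TOY_GFP_NOIO };

/* Policy the SDP layer would allocate with for a given socket. */
enum toy_gfp toy_sdp_alloc_policy(enum toy_gfp sk_allocation)
{
    switch (sk_allocation) {
    case TOY_GFP_NOFS:
    case TOY_GFP_NOIO:
        return sk_allocation;      /* honour the caller's restriction */
    case TOY_GFP_ATOMIC:
    case TOY_GFP_KERNEL:
        return TOY_GFP_KERNEL;     /* assumed: fall back to the SDP default */
    }
    assert(!"unknown allocation policy");
    return TOY_GFP_KERNEL;
}
```

A socket created by a block device path with NOIO would then never trigger IO-backed reclaim from inside the SDP allocation.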
Jim Mott [Tue, 23 Oct 2007 17:57:33 +0000 (10:57 -0700)]
SDP bug647 - Validate ChRcvBuf range and add comments
Clean up the buffer resize code to comply with CA4-83:
Upon receipt of ChRcvBuf message, the remote peer shall not
change the buffer size in the direction opposite of that
requested.
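The CA4-83 rule quoted above can be expressed as a direction check. This is a sketch under stated assumptions: `toy_resize_direction_ok` and its parameters are hypothetical, and treating a request equal to the current size as "no change allowed" is a choice of this sketch, not something the quoted rule states.

```c
#include <assert.h>

/* cur: current buffer size, requested: size asked for in ChRcvBuf,
 * applied: size the receiver intends to use afterwards.
 * Returns 1 if applied does not move opposite to the request. */
int toy_resize_direction_ok(unsigned cur, unsigned requested, unsigned applied)
{
    assert(requested != 0);        /* zero-sized requests are not modelled */
    if (requested > cur)
        return applied >= cur;     /* asked to grow: must not shrink */
    if (requested < cur)
        return applied <= cur;     /* asked to shrink: must not grow */
    return applied == cur;         /* assumption: no direction, no change */
}
```

Note that the check permits a partial move, for example growing to less than the requested size, since only the opposite direction is forbidden.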
Jim Mott [Tue, 23 Oct 2007 17:51:30 +0000 (10:51 -0700)]
SDP bug646 - Do not send DisConn if there is only 1 credit
Compliance with CA4-82:
If one credit is available, an implementation shall only send SDP
messages that provide additional credits and also do not contain ULP
payload.
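The CA4-82 rule quoted above reduces to a small predicate on the number of available credits. The function name and parameters are hypothetical, and refusing to send anything with zero credits is an assumption of this sketch. Whether a given message type such as DisConn counts as providing credits is left to the caller.

```c
#include <assert.h>

/* tx_credits: credits currently available for sending.
 * grants_credits: nonzero if the message provides additional credits.
 * has_payload: nonzero if the message carries ULP payload.
 * Returns 1 if the message may be sent now. */
int toy_may_send(int tx_credits, int grants_credits, int has_payload)
{
    assert(tx_credits >= 0);
    if (tx_credits == 0)
        return 0;                  /* assumption: nothing goes out without a credit */
    if (tx_credits == 1)
        return grants_credits && !has_payload;
    return 1;
}
```

Under this model a message that provides no credits is held back while only one credit remains, which matches the title of the entry.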