The kernel always crashed in the following test case:
user program: socket+bind+listen+accept; a connection is accepted.
shell: rmmod mlx4_ib
user program: close <<< CRASH
The fix closes any socket whose ib_device is the device being removed.
Amir Vadai [Thu, 15 Apr 2010 08:57:11 +0000 (11:57 +0300)]
sdp: Don't try to allocate FMR larger than RLIMIT_MEMLOCK
During ZCopy, if the process does not have the CAP_IPC_LOCK capability
and the current maximum number of locked pages is smaller than the
buffer size, split the send into smaller fragments.
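The splitting rule above can be sketched in user-space C. This is only an illustration, not the driver's code: the names `zcopy_fragment_len` and `zcopy_fragment_count` and their parameters are hypothetical, and the locked-memory limit is passed in as a plain byte count rather than read from RLIMIT_MEMLOCK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes of a ZCopy send may be mapped
 * in a single fragment.
 *   remaining    - bytes still to send
 *   lock_limit   - bytes the process is allowed to lock
 *   has_ipc_lock - true if the process holds CAP_IPC_LOCK
 */
static size_t zcopy_fragment_len(size_t remaining, size_t lock_limit,
                                 bool has_ipc_lock)
{
    /* A privileged process is not bound by the limit. */
    if (has_ipc_lock || remaining <= lock_limit)
        return remaining;
    /* Otherwise cap the fragment at the locked-memory limit. */
    return lock_limit;
}

/* Count the fragments needed to send len bytes under the limit. */
static size_t zcopy_fragment_count(size_t len, size_t lock_limit,
                                   bool has_ipc_lock)
{
    size_t count = 0;

    /* A zero limit without the capability could never make progress. */
    assert(has_ipc_lock || lock_limit > 0);
    while (len > 0) {
        len -= zcopy_fragment_len(len, lock_limit, has_ipc_lock);
        count++;
    }
    return count;
}
```

With a 64KB limit and no capability, a 1MB send would go out as 16 fragments; with the capability it stays a single fragment.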
Amir Vadai [Thu, 15 Apr 2010 08:52:19 +0000 (11:52 +0300)]
sdp: Don't count sdp header twice when calculating size_goal
sizeof(struct sdp_bsdh) is included inside skb->len. Ignore it
when calculating maximum payload of the skb.
This mistake caused every BCopy send of 32K (and its multiples) to
be split, which hurt performance.
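A simplified model of the double counting, as a sketch only. The 16-byte `BSDH_SIZE`, the buffer size, and both function names are assumptions made for the example and are not taken from the driver; the point is just that `skb_len` already includes the header, so the limit it is compared against must include it too.

```c
#include <assert.h>
#include <stddef.h>

#define BSDH_SIZE 16u /* assumed header size, for illustration only */

/* Payload bytes that still fit when skb_len already counts the header. */
static size_t payload_room(size_t buf_size, size_t skb_len)
{
    assert(skb_len >= BSDH_SIZE && skb_len <= buf_size);
    return buf_size - skb_len;
}

/*
 * The mistake: the header is subtracted from the limit even though
 * skb_len includes it, so the header is counted twice.
 */
static size_t payload_room_double_counted(size_t buf_size, size_t skb_len)
{
    size_t limit = buf_size - BSDH_SIZE;

    return skb_len >= limit ? 0 : limit - skb_len;
}
```

For a buffer sized for a header plus 32K of payload, the double-counted version leaves room for only 32K minus the header, so a 32K send spills into a second skb.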
sdp: differentiate between bind failures in sdp.
When bind()ing in mode 'BOTH', bind(sdp_sock) might fail if:
1. the IP and port are already bound.
2. the IP is not part of an IB network.
The previous implementation returned errno=EADDRINUSE either way.
Only the first case should fail the bind(); the second is legitimate
because the TCP socket will handle the connection.
This fix corresponds to a fix in libsdp.
Amir Vadai [Thu, 25 Feb 2010 09:43:03 +0000 (11:43 +0200)]
sdp: SendSM was sometimes not sent after getting SrcAvailCancel
* The skb was freed if rx_sa was aborted, preventing SendSM
from being sent.
* rx_sa->used was not updated in case of SrcAvailCancel,
so RdmaRdCompl was not sent.
This also caused the next read to fail because the offset
wasn't updated.
Amir Vadai [Wed, 24 Feb 2010 08:59:31 +0000 (10:59 +0200)]
sdp: Fix bug in crossing SrcAvail
* Handle RdmaRdCompl in interrupt, before processing is split into two queues.
This way the handling is sequential, and no race can occur
between RdmaRdCompl and SrcAvailCancel.
* Fixed an error when checking that RdmaRdCompl is not for
an old SrcAvail.
If sdp_add_device() fails, there is no client data stored in the IB device,
leading to a kernel crash when a connection is being established. Fix this
by rejecting connections when the device is not initialized.
Also, fix a bad goto target in an error case early in sdp_init_qp().
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Amir Vadai [Tue, 16 Feb 2010 15:36:00 +0000 (17:36 +0200)]
sdp: Fix bugs in huge paged HW's
* Protect some constants that are based on PAGE_SIZE:
- FMR size
- xmit_goal
* renamed SDP_HEAD_SIZE => SDP_SKB_HEAD_SIZE
* removed unneeded special IA64 code due to changes here
Amir Vadai [Sun, 24 Jan 2010 15:12:34 +0000 (17:12 +0200)]
sdp: must use ib_sg_dma_*, not sg_dma_* for mapping
This fixes OFED bug 1895, although some warnings are still generated
when running qperf sdp_bw with large sizes (using zcopy) on the
truescale adapters.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Amir Vadai [Tue, 24 Nov 2009 07:32:39 +0000 (09:32 +0200)]
sdp: fixed BUG1796 - running out of memory on rx
The receive queue could grow endlessly because the minimum number of
RX buffers in the QP was set to SDP_MIN_TX_CREDITS + 1, so credits
were always available to the sender.
Jack Morgenstein [Sun, 30 Aug 2009 14:16:13 +0000 (17:16 +0300)]
sdp: incorrect SDP_FMR_SIZE on 32-bit machines
On 32-bit machines, sizeof (u64 *) is 4 bytes (size of a ***pointer***).
However, the max SDP FMR pool size should be PAGE_SIZE / sizeof(an mtt entry) --
and mtt entries are u64's (or __be64's).
This resulted in SDP requesting twice as many entries per pool on 32-bit machines
as could fit on a single page -- with the result that the fmr pool allocation failed
at driver startup.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
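The sizing error described above can be reproduced in a few lines of portable C. The function names are hypothetical and the page size is passed as a parameter instead of using the kernel's PAGE_SIZE; the sketch only contrasts dividing by the size of a pointer with dividing by the size of a 64-bit entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry count derived from the pointer width: too large on 32-bit. */
static size_t fmr_entries_by_pointer(size_t page_size)
{
    return page_size / sizeof(uint64_t *);
}

/* Entry count derived from the width of one 64-bit mtt entry. */
static size_t fmr_entries_by_mtt(size_t page_size)
{
    assert(page_size >= sizeof(uint64_t));
    return page_size / sizeof(uint64_t);
}
```

On a machine with 4-byte pointers and 4096-byte pages the first function yields 1024 entries while only 512 eight-byte entries fit on the page; on 64-bit machines the two agree, which is why the problem only showed up on 32-bit.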
Jack Morgenstein [Sun, 30 Aug 2009 14:24:07 +0000 (17:24 +0300)]
sdp: check if sdp device is actually present in sdp_remove_one
If sdp fails to initialize at driver startup for any reason,
the device is still registered with the ib_core, but there will be
no client data (i.e., ib_set_client_data() will not be called, and all
kernel resources are de-allocated).
On removal, ib_get_client_data() will return NULL in this case -- and this
must be tested for -- or we will get a kernel Oops for a NULL pointer
dereference.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
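The check described above follows a common pattern, sketched here with user-space stand-ins. Everything prefixed `mock_` is hypothetical and exists only for this example; it mimics the shape of ib_get_client_data() returning NULL but is not the kernel API or the driver's actual removal code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel structures; only what the sketch needs. */
struct mock_ib_device {
    void *client_data; /* NULL when initialization failed */
};

struct mock_sdp_device {
    int released;
};

/* Mimics a lookup that returns NULL when no client data was set. */
static void *mock_get_client_data(struct mock_ib_device *dev)
{
    return dev->client_data;
}

/* Returns 1 if resources were released, 0 if there was nothing to do. */
static int mock_sdp_remove_one(struct mock_ib_device *dev)
{
    struct mock_sdp_device *sdp_dev;

    assert(dev != NULL);
    sdp_dev = mock_get_client_data(dev);
    if (!sdp_dev)
        return 0; /* device never initialized: skip the teardown */

    sdp_dev->released = 1;
    dev->client_data = NULL;
    return 1;
}
```

Without the early return, the teardown path would dereference a NULL pointer for any device whose initialization had failed.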
Amir Vadai [Tue, 28 Apr 2009 06:40:10 +0000 (09:40 +0300)]
sdp: BUG1311 Netpipe fails with an IB_WC_LOC_LEN_ERR.
This problem is seen when the receive buffer or the receive buffer fragments
are smaller than the sender's buffer. If the sender is using 64KB pages
and supports an sk fragment of 64KB, it may send fragments that are a full 64KB
in length, causing the receiver to generate an IB_WC_LOC_LEN_ERR. This patch
makes two changes:
* If the kernel does not support a full 64KB fragment, it will reject resize
requests over 32K. (On older kernels the fragment size is defined as a U16.)
* If the kernel supports a 64KB fragment, it allows a full 64KB receive
fragment to be used.
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
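The two rules above can be written as a small predicate. This is a sketch under assumptions: the names `max_recv_frag` and `resize_allowed` and the `frag_size_is_u16` flag are invented for the example, standing in for whatever mechanism the driver uses to detect the kernel's fragment-size type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAG_32K (32u * 1024u)
#define FRAG_64K (64u * 1024u)

/*
 * Largest receive fragment that may be accepted.
 * A 16-bit size field cannot hold 65536 (it wraps to 0), so kernels
 * with such a field are limited to 32K in this sketch.
 */
static size_t max_recv_frag(bool frag_size_is_u16)
{
    return frag_size_is_u16 ? FRAG_32K : FRAG_64K;
}

/* Decide whether a resize request for the given fragment size is accepted. */
static bool resize_allowed(size_t requested_frag, bool frag_size_is_u16)
{
    assert(requested_frag > 0);
    return requested_frag <= max_recv_frag(frag_size_is_u16);
}
```

A 64KB request is rejected when the size field is 16 bits wide and accepted otherwise; a 32K request is accepted in both cases.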
Nicolas Morey-Chaisemartin [Wed, 29 Apr 2009 14:23:04 +0000 (16:23 +0200)]
sdp: change orphan_count and sockets_allocated from atomic_t to percpu_counter
Fixed SDP to work on 2.6.29+.
As percpu_counter structures are huge, they cannot be allocated on the stack without causing the sdp module to crash.
Both variables are now dynamically allocated at module init.
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin@ext.bull.net>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>