www.infradead.org Git - users/jedix/linux-maple.git/log

Venkat Venkatsubra [Thu, 8 Aug 2013 05:15:05 +0000 (22:15 -0700)]
RDS: added stats to track and display receive side memory usage

Added these stats:
1. per-connection stat for number of receive buffers in cache
2. global stat for the same across all connections
3. number of bytes in socket receive buffer
Since stats are implemented using per-cpu variables and RDS currently
does unsigned arithmetic to add them up, separate counters (one for
addition and one for subtraction) are used for (2) and (3).
In the future we might change it to signed computation.
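Below is a minimal user-space sketch of the split-counter scheme described above. It assumes a fixed array stands in for the per-cpu variables. The names `NR_CPUS_SKETCH`, `split_counter`, `counter_add`, `counter_sub` and `counter_read` are hypothetical and are not the RDS identifiers. Each CPU only increments its own unsigned "add" and "sub" slots. The reader sums each side separately and subtracts once at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for per-cpu storage: one slot per CPU. */
#define NR_CPUS_SKETCH 4

struct split_counter {
	uint64_t add[NR_CPUS_SKETCH]; /* amounts added on each CPU */
	uint64_t sub[NR_CPUS_SKETCH]; /* amounts removed on each CPU */
};

/* Each CPU touches only its own slots; no slot is ever decremented. */
void counter_add(struct split_counter *c, int cpu, uint64_t n)
{
	c->add[cpu] += n;
}

void counter_sub(struct split_counter *c, int cpu, uint64_t n)
{
	c->sub[cpu] += n;
}

/* Reader: sum both sides with unsigned arithmetic, then subtract once. */
uint64_t counter_read(const struct split_counter *c)
{
	uint64_t adds = 0, subs = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		adds += c->add[cpu];
		subs += c->sub[cpu];
	}
	/* Holds as long as every removal pairs with an earlier addition. */
	assert(adds >= subs);
	return adds - subs;
}
```

With this scheme, memory that is added on one CPU and released on another is recorded in two different unsigned slots. Only the difference computed in `counter_read()` reflects the net usage.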

Orabug: 17045536

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 4631300fcf86d459d5dbb09791ff9198c51feab1)

Bang Nguyen [Thu, 15 Aug 2013 02:10:00 +0000 (19:10 -0700)]
RDS: RDS reconnect stalls

After successfully negotiating the version at the lower protocol, RDS incorrectly
set the proposed version to the higher protocol, causing the subsequent
reconnect to stall.

The fix is to not change the proposed version after the initial connection
setup.
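The following sketch illustrates the bug and the fix as they are described above. Every name in it is hypothetical. The version numbers 31 and 41 are illustrative encodings of "lower" and "higher" protocol versions and are not the real RDS constants. The `buggy` flag contrasts the old reconnect behaviour with the fixed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection state; not the RDS structures. */
struct conn_sketch {
	unsigned int proposed;   /* version offered when (re)connecting */
	bool negotiated;         /* set once the first handshake succeeds */
};

/* Initial setup: offer the highest version this node supports. */
void conn_init(struct conn_sketch *c, unsigned int highest)
{
	c->proposed = highest;
	c->negotiated = false;
}

/* Handshake done: settle on the lower of the two versions. */
void conn_negotiated(struct conn_sketch *c, unsigned int peer_version)
{
	if (peer_version < c->proposed)
		c->proposed = peer_version;
	c->negotiated = true;
}

/*
 * Version to propose on reconnect. buggy=true models the old behaviour
 * (jump back to the highest version after a successful negotiation);
 * buggy=false models the fix (keep the negotiated version).
 */
unsigned int conn_reconnect_version(struct conn_sketch *c,
				    unsigned int highest, bool buggy)
{
	if (buggy && c->negotiated)
		c->proposed = highest;
	return c->proposed;
}
```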

Orabug: 1731355

Signed-off-by: Richard Frank <richard.frank@oracle.com>
(cherry picked from commit 1a14dda0d3d3195306f3a141227eb003e895fb58)

Bang Nguyen [Fri, 26 Jul 2013 23:10:29 +0000 (16:10 -0700)]
RDS: disable IP failover if device removed

IP failover after the device has been removed can lead to a kernel panic.

The fix is to disable IP failover if the underlying device has been removed.

Orabug: 17206167

Signed-off-by: zheng.li <zheng.x.li@oracle.com>
(cherry picked from commit 84fc44b9e9fa00354892ef491d09d5eb727943b7)

Bang Nguyen [Fri, 19 Jul 2013 17:28:26 +0000 (10:28 -0700)]
RDS: Fix a bug in QoS protocol negotiation

Fix a bug that may cause the connection to downgrade to a lower
protocol. Also, don't negotiate the protocol on reconnect.

Orabug: 17079972

Signed-off-by: Giri adari <giri.adari@oracle.com>
(cherry picked from commit 4962a6def99ce1b80212198ebc96700a51dee694)

Bang Nguyen [Fri, 19 Jul 2013 19:05:16 +0000 (12:05 -0700)]
RDS: alias failover is not working properly

Broken alias failover can lead to crashes or duplicate addresses. With this
fix, an alias is failed over in the following form:

e.g.,  ib0:<alias> -> ib1:P**:<alias>

Orabug: 17177994

Signed-off-by: zheng.li <zheng.x.li@oracle.com>
(cherry picked from commit 049a5ec115391ef1ad171825c4b7630550ae3328)

Ahmed Abbas [Thu, 18 Jul 2013 23:59:59 +0000 (16:59 -0700)]
add NETFILTER support

Orabug: 17082619
Adds the ability for the RDS code to support the NETFILTER kernel interfaces.
This allows for packet inspection, modification, and potential redirection as
the packets flow through the lower layers of the RDS code.

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(Ported from UEK2 commit 1913973db561fd6db2e495d3b95e6f8c78b3ba23)

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>

Bang Nguyen [Tue, 2 Jul 2013 23:18:38 +0000 (16:18 -0700)]
RDS: Local address resolution may be delayed after IP has moved. RDS to update local ARP cache directly to speed it up.

Orabug: 16979994

Signed-off-by: zheng.li <zheng.x.li@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit e95af0c38f586f88521fe81432cf705748d366f9)

Bang Nguyen [Tue, 25 Jun 2013 17:41:41 +0000 (10:41 -0700)]
RDS: restore two-sided reconnect with the lower IP node having a constant 100 ms backoff.
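The title implies a simple rule: both ends attempt the reconnect, and the node with the numerically lower IP address waits a constant 100 ms. The sketch below expresses only that rule. It assumes addresses are compared as host-order integers. The commit does not say what the higher-IP node does, so its delay is passed in by the caller. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LOWER_IP_BACKOFF_MS 100u /* constant delay named in the commit title */

/*
 * Both ends attempt to reconnect ("two-sided"); the node with the lower
 * IP address always waits a constant 100 ms. The higher-IP node's delay
 * is not specified by the commit, so the caller supplies it.
 */
unsigned int reconnect_delay_ms(uint32_t local_ip, uint32_t peer_ip,
				unsigned int higher_ip_delay_ms)
{
	if (local_ip < peer_ip)
		return LOWER_IP_BACKOFF_MS;
	return higher_ip_delay_ms;
}
```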

Orabug: 16710287

Signed-off-by: Richard Frank <richard.frank@oracle.com>
(cherry picked from commit 1e165f6511abd1d57e4be79f1a3a430c98a7576e)

Weiping Pan [Mon, 23 Jul 2012 02:37:48 +0000 (10:37 +0800)]
rds: set correct msg_namelen

commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream.

CVE-2012-3430

Jay Fenlason (fenlason@redhat.com) found a bug: recvfrom() on an RDS socket
can return the contents of random kernel memory to userspace if it is called
with an address length larger than sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len parameter properly before
returning, but that's just a bug.
There are also a number of cases where recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

I wrote two test programs to reproduce this bug. In rds_server, fromAddr is
overwritten and the sock_fd that follows it on the stack is clobbered.
Admittedly it is the programmer's fault to set msg_namelen incorrectly, but it
is better for the kernel to copy the real length of the address to user space
in such a case.
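The sketch below models the rule the message asks for: report the real address length rather than echoing back whatever msg_namelen the caller passed in. It runs in user space. The function `fill_source_addr` and its `have_addr` flag are hypothetical. The flag stands in for the two kinds of return path described above: paths that fill in `sin` and paths that do not. The sketch assumes that `msg_name`, when non-NULL, points at storage for at least a `struct sockaddr_in`, as the kernel's own address buffer does.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Reset msg_namelen on entry and set it to sizeof(struct sockaddr_in)
 * only when the source address is actually written, so the length never
 * covers bytes that were not filled in.
 */
void fill_source_addr(struct msghdr *msg, int have_addr,
		      in_addr_t saddr, in_port_t sport)
{
	struct sockaddr_in *sin = msg->msg_name;

	msg->msg_namelen = 0;
	if (!have_addr || !sin)
		return;

	sin->sin_family = AF_INET;
	sin->sin_port = sport;
	sin->sin_addr.s_addr = saddr;
	memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
	msg->msg_namelen = sizeof(*sin);
}
```

With this rule, the server program below would see msg_namelen come back as 16 instead of 32, and no bytes beyond fromAddr would be written.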

How to run the test programs:
I tested them on a 32-bit x86 system running 3.5.0-rc7.

1. Compile:
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2. Run ./rds_server on one console.

3. Run ./rds_client on another console.

4. You will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21	/* Linux address family number for RDS */
#endif

int main(void)
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr *)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21	/* Linux address family number for RDS */
#endif

int main(void)
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr *)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &fromAddr;

	/*
	 * Add 16 to sizeof(fromAddr), i.e. 32 bytes. Given where fromAddr
	 * is declared, recvmsg() will overwrite sock_fd, since the kernel
	 * copies 32 bytes to userspace.
	 *
	 * With msg_namelen = sizeof(fromAddr) it works fine.
	 */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%u\n", (unsigned int)msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
(cherry picked from commit eb3ccc4c696e5c4a10d324886fd061ea88bab6c4)

Bang Nguyen [Mon, 17 Jun 2013 22:08:40 +0000 (15:08 -0700)]
RDS: IP config needs to be updated when the network/rdma service is restarted.

Orabug: 16963884

Signed-off-by: zheng.li <zheng.x.li@oracle.com>
(cherry picked from commit 00e79a242561efec173dab4640a3eaad50b1f4b3)

Bang Nguyen [Mon, 17 Jun 2013 20:32:18 +0000 (13:32 -0700)]
RDS: check for valid rdma id before initiating connection

Connection could have been dropped while the route is being resolved
so check for valid rdma id before initiating the connection.

Orabug: 16857341

Signed-off-by: zheng.li <zheng.x.li@oracle.com>
(cherry picked from commit 5528367d56539f817182faa1f0ea35779ccac14e)

Bang Nguyen [Mon, 17 Jun 2013 19:13:25 +0000 (12:13 -0700)]
RDS: reduce slab memory usage

Both rds_ib_incoming and rds_ib_frag slab objects are incorrectly
aligned, causing a significant increase in slab memory usage by RDS.
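The arithmetic below shows how an alignment requirement inflates slab usage. The commit does not give the real object sizes or alignments. The 72-byte object and the 8- and 64-byte alignments used in the checks are purely illustrative. The model also ignores per-slab metadata and other allocator overhead.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u

/* Round size up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Objects that fit in one page once each is padded to the alignment. */
size_t objs_per_page(size_t obj_size, size_t align)
{
	return PAGE_SIZE_SKETCH / align_up(obj_size, align);
}
```

In this example, padding a 72-byte object to 64-byte alignment makes each object occupy 128 bytes. That drops the count from 56 to 32 objects per page, so the same number of receive fragments needs roughly 1.75 times the pages.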

Orabug: 16935507

Signed-off-by: Richard Frank <richard.frank@oracle.com>
(cherry picked from commit a7cf83092e6ad5c2d842c34b17436d4aafd00b54)

Bang Nguyen [Fri, 7 Jun 2013 00:15:07 +0000 (17:15 -0700)]
RDS: Move connection along with IP when failing over/back.

Orabug: 16916648

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Acked-by: Zheng Li <zheng.x.li@oracle.com>
Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
(cherry picked from commit 78b7d86911046c3a10ffa52d90f4f1a4523d7ac3)

Bang Nguyen [Wed, 5 Jun 2013 22:49:10 +0000 (15:49 -0700)]
RDS: Rename HAIP parameters to Active Bonding

Orabug: 16810395
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 5fc4ef482f653e8510875e5fe0fd6936b5133d15)

Zheng Li [Tue, 4 Jun 2013 06:35:37 +0000 (14:35 +0800)]
rds shouldn't release fmr when ib_device was already released.

Orabug: 16605377

When rds_ib_remove_one() returns, the driver's mlx4_ib_removeone
function destroys the ib_device, so we must set rds_ibdev->dev to NULL.
Otherwise, when an RDS connection is later released, rds_ib_dev_free()
uses the ib_device (i.e. rds_ibdev->dev) to release the MR and FMR, and
reusing the already released ib_device causes a crash.

Signed-off-by: zheng.x.li@oracle.com
Signed-off-by: bang.nguyen@oracle.com

Zheng Li [Tue, 4 Jun 2013 03:34:33 +0000 (11:34 +0800)]
rds remove dev race.

Orabug: 16605377

RDS: make sure rds_ib_remove_one() returns only after the device is freed.

This is to avoid possible race condition in which rds_ib_remove_one() returns
prematurely and IB removes the underlying device. RDS later tries to free the
device and trips over.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: bang.nguyen@oracle.com
(cherry picked from commit 62dab719ea687129dc52df2c2eec3b730d628b7a)

Zheng Li [Tue, 4 Jun 2013 03:20:42 +0000 (11:20 +0800)]
reinit ip_config when the rdma service restarts.

Orabug: 16605377

Reinitialize the RDS ip_config when net_device REGISTER and UNREGISTER
events happen; this reassigns new values to ip_config's dev and
rds_ibdev members.

Signed-off-by: zheng.x.li@oracle.com
Signed-off-by: bang.nguyen@oracle.com
(cherry picked from commit 864b4ee41637414ae7916f740441cfa6509bc8dc)

Cong Wang [Sun, 3 Mar 2013 16:18:11 +0000 (16:18 +0000)]
rds: limit the size allocated by rds_message_alloc()

Orabug: 16837486
[ Upstream commit ece6b0a2b25652d684a7ced4ae680a863af041e0 ]

Dave Jones reported the following bug:

"When fed mangled socket data, rds will trust what userspace gives it,
and tries to allocate enormous amounts of memory larger than what
kmalloc can satisfy."

WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0()
Hardware name: GA-MA78GM-S2H
Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s
Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65
Call Trace:
 [<ffffffff81044155>] warn_slowpath_common+0x75/0xa0
 [<ffffffff8104419a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811444ad>] __alloc_pages_nodemask+0xa0d/0xbe0
 [<ffffffff8100a196>] ? native_sched_clock+0x26/0x90
 [<ffffffff810b2128>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff811861f8>] alloc_pages_current+0xb8/0x180
 [<ffffffff8113eaaa>] __get_free_pages+0x2a/0x80
 [<ffffffff811934fe>] kmalloc_order_trace+0x3e/0x1a0
 [<ffffffff81193955>] __kmalloc+0x2f5/0x3a0
 [<ffffffff8104df0c>] ? local_bh_enable_ip+0x7c/0xf0
 [<ffffffffa0401ab3>] rds_message_alloc+0x23/0xb0 [rds]
 [<ffffffffa04043a1>] rds_sendmsg+0x2b1/0x990 [rds]
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81564620>] sock_sendmsg+0xb0/0xe0
 [<ffffffff810b2052>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b24be>] ? put_lock_stats.isra.23+0xe/0x40
 [<ffffffff81567f30>] sys_sendto+0x130/0x180
 [<ffffffff810b872d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816c547b>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff816cd767>] ? sysret_check+0x1b/0x56
 [<ffffffff810b8695>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff81341d8e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd742>] system_call_fastpath+0x16/0x1b
---[ end trace eed6ae990d018c8b ]---
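The sketch below shows the kind of bound the subject line describes: reject a user-derived size before it reaches the allocator. In the kernel, the natural bound is presumably the largest size kmalloc can satisfy (KMALLOC_MAX_SIZE), since that is where the warning above fires. Here, a hypothetical 4 MiB cap stands in for it, and the struct and function names are illustrative. The comparison is written as `extra_len > MAX - header` so that it cannot overflow, which `header + extra_len > MAX` could.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap standing in for the allocator's largest request. */
#define MAX_ALLOC_SKETCH (4u * 1024 * 1024)

struct message_sketch {
	size_t extra_len;
	unsigned char data[]; /* trailing space, e.g. for scatterlist entries */
};

/* Refuse sizes the allocator cannot satisfy instead of trying anyway. */
struct message_sketch *message_alloc(size_t extra_len)
{
	struct message_sketch *m;

	if (extra_len > MAX_ALLOC_SKETCH - sizeof(*m))
		return NULL;

	m = calloc(1, sizeof(*m) + extra_len);
	if (m)
		m->extra_len = extra_len;
	return m;
}
```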

Reported-by: Dave Jones <davej@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
(cherry picked from commit 1524f0a4e3e23b3c8b4235eb7d9932129cc0006b)

Bang Nguyen [Fri, 19 Apr 2013 15:56:14 +0000 (08:56 -0700)]
RDS: Fixes to improve throughput performance

This fixes race conditions and adds other enhancements to improve
throughput.

Ported from UEK2 patch dbe1629e3387d8c68009e1da51d1a1ca778f2501

(Changes related to LAP in the original patch in
drivers/infiniband/core/cma.c are NOT ported because we
do not have APM support in rdma_cm)

Orabug: 16571410
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

jeff.liu [Mon, 8 Oct 2012 18:57:27 +0000 (18:57 +0000)]
RDS: fix rds-ping spinlock recursion

This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

The RDS ping/pong over TCP feature has been broken for years (2.6.39 to
3.6.0), since we have to set TCP cork and call kernel_sendmsg() between
ping and pong, both of which need to lock "struct sock *sk". However, this
lock is already held before the rds_tcp_data_ready() callback is
triggered. As a result, we always hit spinlock recursion, which
results in a system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.
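The sketch below models the deferral in user space. A boolean stands in for the socket lock, and a small ring buffer stands in for work queued to rds_wq. The receive callback, which runs with the lock held, only records the pong. A later worker running without the lock actually sends it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PONG_QUEUE_LEN 8u

/* Hypothetical model of one connection; not the RDS data structures. */
struct tcp_conn_sketch {
	bool sock_locked;                      /* models the lock on struct sock */
	unsigned int pending[PONG_QUEUE_LEN];  /* pongs waiting to be sent */
	unsigned int head, tail;               /* free-running ring indices */
	unsigned int sent;                     /* pongs actually transmitted */
	unsigned int last_seq;                 /* sequence of the last pong sent */
};

/* Transmit path: takes the socket lock, so it must not run while it is held. */
static bool send_pong(struct tcp_conn_sketch *c, unsigned int seq)
{
	if (c->sock_locked)
		return false;           /* in the kernel this would be lock recursion */
	c->sock_locked = true;      /* cork + kernel_sendmsg() would happen here */
	c->last_seq = seq;
	c->sent++;
	c->sock_locked = false;
	return true;
}

/* Receive callback: the caller already holds the socket lock, so only queue. */
bool data_ready_ping(struct tcp_conn_sketch *c, unsigned int seq)
{
	if (c->tail - c->head == PONG_QUEUE_LEN)
		return false;           /* queue full: drop the ping */
	c->pending[c->tail++ % PONG_QUEUE_LEN] = seq;
	return true;
}

/* Deferred work item: runs later, without the socket lock. */
void pong_worker(struct tcp_conn_sketch *c)
{
	while (c->head != c->tail) {
		if (!send_pong(c, c->pending[c->head % PONG_QUEUE_LEN]))
			break;
		c->head++;
	}
}
```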

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5175a5e76bbdf20a614fb47ce7a38f0f39e70226)

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
Conflicts:
net/rds/send.c

Orabug: 16223050

Signed-off-by: Jerry Snitselaar <dev@snitselaar.org>
(cherry picked from commit 3badb20f7c232c2f72758453d01cb890ab686def)

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
(cherry picked from commit bd05c6c016b911bb7d9a16f2998389b4219bb2cf)

Bang Nguyen [Mon, 18 Mar 2013 20:48:57 +0000 (13:48 -0700)]
rds: Congestion flag does not get cleared causing the connection to hang

Orabug: 16424692

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 456165e342b25b010735d84985f2895ab7f379a9)

Bang Nguyen [Mon, 25 Feb 2013 17:18:17 +0000 (09:18 -0800)]
Add SIOCRDSGETTOS to get the current TOS for the socket

Orabug: 16397197
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 201e2362694aab25b4ef6b11e5bd62b75b2a0e17)

Bang Nguyen [Thu, 21 Feb 2013 18:21:49 +0000 (10:21 -0800)]
Changes to connect/TOS interface

Orabug: 16397197
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit b0aa0bd4342a38bdb994b7af301ea07a4f4b5ad6)

Bang Nguyen [Tue, 19 Feb 2013 09:25:29 +0000 (01:25 -0800)]
rds: this resolved crash while removing rds_rdma module. orabug: 16268201

Signed-off-by: Bang Nguyen <band.nguyen@oracle.com>
(cherry picked from commit 0ee85d26682603e53b3e022ec70a55dfa98710f9)

Bang Nguyen [Tue, 19 Feb 2013 09:23:40 +0000 (01:23 -0800)]
rds: scheduling while atomic on failover orabug: 16275095

Signed-off-by: Bang Nguyen <band.nguyen@oracle.com>
(cherry picked from commit a1b048d2106086119400fccbf37129414edf3f3a)

Bang Nguyen [Tue, 5 Feb 2013 23:57:58 +0000 (15:57 -0800)]
rds: unregister IB event handler on shutdown

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 888623e08e35272913838f83e6a601b65683ab27)

Bang Nguyen [Tue, 5 Feb 2013 23:53:00 +0000 (15:53 -0800)]
rds: HAIP support child interface

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 538f5d0dfa704f4dcb4afa80a1d01b1317b9cd65)

Bang Nguyen [Tue, 22 Jan 2013 19:08:51 +0000 (11:08 -0800)]
RDS HAIP misc fixes

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit e0cac8d762cbfeee7f5b34d722a43e61a326d970)

Bang Nguyen [Tue, 22 Jan 2013 19:08:51 +0000 (11:08 -0800)]
Ignore failover groups if HAIP is disabled

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 47bf625103193bf59f75a7c43c42411a04b55712)

Saeed Mahameed [Thu, 31 Jan 2013 08:37:19 +0000 (10:37 +0200)]
RDS: RDS rolling upgrade

Changes to support rolling upgrade from RDS protocol version 3.1 to 4.1

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 6788b32aeb00a1ac4b3815680c029911c431031a)

Ajaykumar Hotchandani [Wed, 16 Jan 2013 06:00:57 +0000 (22:00 -0800)]
RDS: Fixes warning while rds-info. spin_lock_irqsave() is changed to spin_lock_bh().

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Reviewed-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 237b028186dd2523fbb81d47463ea8ce4e9a202d)

Mukesh Kacker [Sun, 19 Apr 2015 22:26:19 +0000 (15:26 -0700)]
rds: UNDO reverts done for rebase code to compile with Linux 4.1 APIs

Commit 163377dd82f2d81809aabe736a2e0ea515055a69 reverts net/rds to the
common ancestor of upstream and UEK2 in order to rebase the UEK2 patches
for net/rds. This commit undoes the reverts, which is needed to compile
against the Linux 4.0 APIs.

UNDO Revert "net: Replace get_cpu_var through this_cpu_ptr" for net/rds
This undoes the revert of commit 903ceff7ca7b4d80c083a80ee5163b74e9fa359f for net/rds.

UNDO Revert "rds: switch ->inc_copy_to_user() to passing iov_iter"
This undoes the revert of commit c310e72c89926e06138e4881f21e4c8da3e7ef18.

UNDO Revert of "rds: switch rds_message_copy_from_user() to iov_iter"
This undoes the revert of commit 083735f4b01b703184c0e11c2e384b2c60a8aea4.

UNDO Revert "put iov_iter into msghdr" for net/rds
This undoes the revert of commit c0371da6047abd261bc483c744dbc7d81a116172 for net/rds.

UNDO Revert "net: introduce helper macro for_each_cmsghdr" for net/rds
This undoes the revert of commit f95b414edb18de59940dcebbefb49cf25c6d505c for net/rds.

UNDO Revert "rds: Fix min() warning in rds_message_inc_copy_to_user()"
This undoes the revert of commit 6ff4a8ad4b6eae5171754fb60418bc81834aa09b.

UNDO Revert "rds: Make rds_message_copy_from_user() return 0 on success."
This undoes the revert of commit d0a47d32724bf0765b8768086ef1a7a6d074a7a0.

UNDO Revert "net: Remove iocb argument from sendmsg and recvmsg" for net/rds
This undoes the revert of commit 1b784140474e4fc94281a49e96c67d29df0efbde for net/rds.

These commits were reverted earlier to rebase the unmodified UEK2 RDS code.
(The UNDO is needed to compile against the new Linux 4.1 kernel APIs, which changed *after* Linux 3.18.)

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>

Ajaykumar Hotchandani [Fri, 23 Jan 2015 02:27:30 +0000 (18:27 -0800)]
rds: port to UEK4, Linux-3.18*

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Ajaykumar Hotchandani [Thu, 2 Apr 2015 21:35:07 +0000 (14:35 -0700)]
rds: disable APM support

The APM (Automatic Path Migration) feature is not used, so its code is
being disabled. (It can be re-enabled if/when APM support is enabled in
rdma_cm.)

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Ajaykumar Hotchandani [Fri, 23 Jan 2015 00:04:05 +0000 (16:04 -0800)]
rds: disable cq balance

This should be re-enabled once IB_CQ_VECTOR_LEAST_ATTACHED is added.

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Ajaykumar Hotchandani [Thu, 22 Jan 2015 22:11:47 +0000 (14:11 -0800)]
rds: move linux/rds.h to uapi/linux/rds.h

This is needed to be compatible with 3.18*.

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Ajaykumar Hotchandani [Thu, 22 Jan 2015 05:03:28 +0000 (21:03 -0800)]
RDS: Kconfig and Makefile changes

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>

Bang Nguyen [Mon, 14 Jan 2013 05:54:09 +0000 (21:54 -0800)]
RDS merge for UEK2

Orabug: 15997083

This is the merged code of the Mellanox OFED R2, 0080 release, and OFA 4.1.

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
(cherry picked from commit 26add53cf20e08dfa331ec22d307dab40f0c4d74)

10 years agords: Misc Async Send fixes
Bang Nguyen [Thu, 27 Dec 2012 18:23:05 +0000 (10:23 -0800)]
rds: Misc Async Send fixes

Async send fixes to support new rds-stress option "--async"

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Saeed Mahameed [Thu, 27 Dec 2012 09:58:56 +0000 (11:58 +0200)]
rds: call unregister_netdevice_notifier for rds_ib_nb in rds_ib_exit

In commit 58f6b52b114d3400fea7daffb0440ca611e45c1c

     rds: Misc HAIP fixes

the netdevice_notifier rds_ib_nb is never unregistered.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Saeed Mahameed [Wed, 26 Dec 2012 17:31:22 +0000 (19:31 +0200)]
rds: flush and destroy workqueue rds_aux_wq and fix creation order.

In commit f05d77d46d172127d3f96538a62764a2a589a61b

    rds: Add Automatic Path Migration support

    RDS APM supports automatic connection failover in case of path
    failure, and connection failback when the path recovers.

    RDS APM is enabled by module parameter rds_ib_enable_apm (disabled by
    default).

the workqueue rds_aux_wq is never destroyed, and it should be created before
rds_trans_register() is called, since rds_trans_register() callbacks can use rds_aux_wq.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Saeed Mahameed [Mon, 24 Dec 2012 19:14:08 +0000 (21:14 +0200)]
rds: fix compilation warning

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Dotan Barak [Tue, 3 Jul 2012 10:13:22 +0000 (13:13 +0300)]
rds: port the code to uek2

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Bang Nguyen [Fri, 30 Nov 2012 22:07:31 +0000 (14:07 -0800)]
rds: CQ balance

This patch provides load-balancing for RDS CQs across available interrupt vectors.
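Below is a minimal sketch of one way to balance CQs across interrupt vectors: attach each new CQ to the vector that currently has the fewest attached CQs. This strategy is an assumption, suggested by the later "rds: disable cq balance" commit that mentions IB_CQ_VECTOR_LEAST_ATTACHED. The function is hypothetical, and `nvectors` must be at least 1.

```c
#include <assert.h>

/*
 * Pick the completion vector with the fewest CQs attached and record
 * the new attachment. Ties go to the lowest-numbered vector.
 */
int pick_least_attached(unsigned int *attached, int nvectors)
{
	int best = 0;
	int i;

	for (i = 1; i < nvectors; i++)
		if (attached[i] < attached[best])
			best = i;
	attached[best]++;
	return best;
}
```

Starting from per-vector counts {2, 0, 1}, three successive CQs land on vectors 1, 1 and 2, which leaves the counts at {2, 2, 2}.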

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Bang Nguyen [Mon, 26 Nov 2012 16:10:22 +0000 (08:10 -0800)]
rds: HAIP across HCAs

This patch extends HAIP support to failover/failback IPs across HCAs.

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Bang Nguyen [Tue, 13 Nov 2012 20:27:34 +0000 (12:27 -0800)]
rds: Misc HAIP fixes

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Dotan Barak [Sun, 14 Oct 2012 09:26:44 +0000 (11:26 +0200)]
rds: off by one fixes

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Dotan Barak [Mon, 24 Sep 2012 18:25:51 +0000 (20:25 +0200)]
rds: Add Automatic Path Migration support

RDS APM supports automatic connection failover in case of path
failure, and connection failback when the path recovers.

RDS APM is enabled by module parameter rds_ib_enable_apm (disabled by
default).

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>

Dotan Barak [Mon, 27 Aug 2012 15:39:36 +0000 (18:39 +0300)]
rds: fix error flow handling

In an error path, uninitialized memory was used, which caused a
kernel oops.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Bang Nguyen <bang.nguyen@oracle.com>

Dotan Barak [Thu, 2 Aug 2012 14:55:12 +0000 (17:55 +0300)]
net/rds: prevent memory leak in case of error flow

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>

Dotan Barak [Tue, 3 Jul 2012 08:29:51 +0000 (11:29 +0300)]
rds: prepare support for kernel 2.6.39-200.1.1.el5uek: add the macro NIPQUAD_*

Add the macro:
  NIPQUAD
  NIPQUAD_FMT
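The commit does not show the macro bodies. The sketch below follows the definition that older kernels carried before NIPQUAD was dropped in favour of the %pI4 printk format. Treating that definition as what this port added is an assumption. The helper `format_ipv4` is hypothetical. The macro prints an IPv4 address that is stored in network byte order, one byte at a time. Its argument must be an lvalue, because its address is taken.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NIPQUAD(addr) \
	((unsigned char *)&(addr))[0], \
	((unsigned char *)&(addr))[1], \
	((unsigned char *)&(addr))[2], \
	((unsigned char *)&(addr))[3]
#define NIPQUAD_FMT "%u.%u.%u.%u"

/* Format a network-byte-order IPv4 address; returns snprintf()'s result. */
int format_ipv4(char *buf, size_t len, uint32_t be_addr)
{
	return snprintf(buf, len, NIPQUAD_FMT, NIPQUAD(be_addr));
}
```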

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>

Dotan Barak [Sun, 24 Jun 2012 06:06:50 +0000 (09:06 +0300)]
rds: fixed wrong condition in case of error

Use IS_ERR() instead of comparing with NULL.
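Functions such as rdma_create_id() report failure with an error pointer, not with NULL, so a NULL check lets the failure through. The user-space re-creation below follows the kernel's convention from include/linux/err.h: errors -1 to -MAX_ERRNO are encoded in the top 4095 addresses of the address space. It is a simplified sketch and not the kernel header itself. `create_id_sketch` is a hypothetical stand-in for the failing constructor.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True when ptr encodes an errno value rather than a real object. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical constructor that fails by returning an error pointer. */
static void *create_id_sketch(bool fail)
{
	static int dummy;

	return fail ? ERR_PTR(-ENOMEM) : &dummy;
}
```

On 64-bit Linux, ERR_PTR(-ENOMEM) is 0xfffffffffffffff4 (ENOMEM is 12). That matches the fffffffffffffff4 values in the register and stack dump of the following entry's oops. Such a value is non-NULL, so it passes a NULL check and is later dereferenced.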

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Bang Nguyen <bang.nguyen@oracle.com>

Dotan Barak [Thu, 21 Jun 2012 21:14:54 +0000 (00:14 +0300)]
rds: fixed kernel oops in case of error flow

If creating an rdma_cm handler fails, don't try to free it; this prevents
the following kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 00000000000001fc
IP: [<ffffffff814ef21f>] _spin_lock_irqsave+0x1f/0x40
PGD 175b80067 PUD 176a0b067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/rds/initstate
CPU 0
Modules linked in: rds_rdma(+)(U) rds(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_srp(U) scsi_transport_srp scsi_tgt ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) memtrack(U) netconsole configfs nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ipv6 microcode virtio_balloon virtio_net snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: memtrack]

Pid: 22908, comm: modprobe Not tainted 2.6.32-220.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff814ef21f>]  [<ffffffff814ef21f>] _spin_lock_irqsave+0x1f/0x40
RSP: 0018:ffff880125f07e68  EFLAGS: 00010086
RAX: 0000000000010000 RBX: fffffffffffffff4 RCX: 0000000000000000
RDX: 0000000000000286 RSI: 000000000000000a RDI: 00000000000001fc
RBP: ffff880125f07e68 R08: 0000000000000000 R09: ffff880176db4020
R10: ffff880125f07988 R11: 0000000000000002 R12: 00000000000001fc
R13: 0000000000d8e4c0 R14: 000000000000000a R15: 0000000000000000
FS:  00007f66c90cc700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001fc CR3: 000000011e71b000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 22908, threadinfo ffff880125f06000, task ffff880174732b40)
Stack:
 ffff880125f07e98 ffffffffa03dd795 fffffffffffffff4 fffffffffffffff4
<0> 0000000000d8e4c0 0000000000000000 ffff880125f07ed8 ffffffffa03dee11
<0> ffff880125f07ee8 00000000fffffff4 00000000fffffff4 fffffffffffffff4
Call Trace:
 [<ffffffffa03dd795>] cma_exch+0x35/0x70 [rdma_cm]
 [<ffffffffa03dee11>] rdma_destroy_id+0x21/0x310 [rdma_cm]
 [<ffffffffa042a0be>] init_module+0xbe/0x118 [rds_rdma]
 [<ffffffff81096e75>] ? __blocking_notifier_call_chain+0x65/0x80
 [<ffffffffa042a000>] ? init_module+0x0/0x118 [rds_rdma]
 [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
 [<ffffffff810af641>] sys_init_module+0xe1/0x250
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 c8 c1 e8 10 39 c1 74 0e f3 90 0f 1f 44 00 00
RIP  [<ffffffff814ef21f>] _spin_lock_irqsave+0x1f/0x40
 RSP <ffff880125f07e68>
CR2: 00000000000001fc
---[ end trace 8db2f942777f29d0 ]---

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>

Dotan Barak [Thu, 7 Jun 2012 05:56:34 +0000 (08:56 +0300)]
RDS: fixed compilation warnings

Fixed the following compilation warnings:

net/rds/send.c: In function 'rds_send_xmit':
net/rds/send.c:299: warning: suggest parentheses around && within ||
net/rds/rdma.c: In function 'rds_cmsg_rdma_dest':
net/rds/rdma.c:697: warning: format '%Lx' expects type 'long long unsigned int', but argument 2 has type 'u32'
net/rds/ib_recv.c: In function 'rds_ib_srqs_init':
net/rds/ib_recv.c:1570: warning: 'return' with no value, in function returning non-void
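For illustration, here is a minimal userspace sketch of the three kinds of fix these warnings call for. The helper names (should_send, format_key, srqs_init) are hypothetical and are not the actual RDS code. The sketch shows explicit parentheses when && is mixed with ||, a format specifier that matches a 32-bit argument, and a non-void function that returns a value on every path.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical send predicate: explicit parentheses state the intended
 * grouping and silence "suggest parentheses around && within ||". */
static bool should_send(bool has_data, bool cong_ok, bool forced)
{
	return (has_data && cong_ok) || forced;
}

/* Print a u32 with a matching specifier (PRIx32) rather than %Lx, which
 * expects an unsigned long long. Returns the number of chars formatted. */
static int format_key(char *buf, size_t len, uint32_t key)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "key=%" PRIx32, key);
}

/* Hypothetical init routine: a non-void function must return a value on
 * every path; a bare "return;" here would trigger the third warning. */
static int srqs_init(bool srq_enabled, int nr_devices)
{
	if (!srq_enabled)
		return 0;	/* nothing to set up counts as success */
	return nr_devices > 0 ? 0 : -ENODEV;
}
```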

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Bang Nguyen [Sat, 14 Apr 2012 00:16:31 +0000 (17:16 -0700)]
RDS SRQ optional

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Bang Nguyen [Sat, 14 Apr 2012 00:16:31 +0000 (17:16 -0700)]
RDS Async send support revised

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Bang Nguyen [Sun, 19 Feb 2012 20:19:57 +0000 (12:19 -0800)]
RDS Asynchronous Send support

1. Same behavior as RDMA send, i.e., generate a notification on IB completion.
2. On error, the connection is closed to traffic, i.e., new sends are
   rejected and the client retries.
3. To guarantee ordering, all pending async (RDMA/bcopy) sends after the
   failed send are also aborted, in the order in which they were submitted.
4. The connection is re-opened for traffic after all the failure
   notifications have been reaped by the client.
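The error-path rules above can be sketched as a small state machine. This is a single-threaded userspace model under stated assumptions, not the RDS implementation: struct async_conn and its helpers are hypothetical, and completions are assumed to arrive in submission order, so a failure always hits the oldest pending send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16		/* hypothetical queue depth */

enum send_status { SEND_FAILED, SEND_ABORTED };

struct async_conn {
	unsigned long pending[MAX_PENDING];	/* in submission order */
	size_t npending;
	unsigned long notify_id[MAX_PENDING];	/* failure notifications */
	enum send_status notify_st[MAX_PENDING];
	size_t nnotify, nreaped;
	bool closed;				/* closed to new traffic */
};

/* Queue an async send. Fails while the connection is closed to traffic
 * (the client is expected to retry later) or when the queue is full. */
static bool conn_submit(struct async_conn *c, unsigned long id)
{
	if (c->closed || c->npending == MAX_PENDING)
		return false;
	c->pending[c->npending++] = id;
	return true;
}

/* The oldest pending send completed successfully. */
static void conn_complete_head(struct async_conn *c)
{
	assert(c->npending > 0);
	memmove(c->pending, c->pending + 1,
		(c->npending - 1) * sizeof(c->pending[0]));
	c->npending--;
}

/* The oldest pending send failed: report it, abort every later send in
 * submission order, and close the connection to new traffic. */
static void conn_fail_head(struct async_conn *c)
{
	assert(c->npending > 0);
	for (size_t i = 0; i < c->npending; i++) {
		c->notify_id[c->nnotify] = c->pending[i];
		c->notify_st[c->nnotify] = i == 0 ? SEND_FAILED : SEND_ABORTED;
		c->nnotify++;
	}
	c->npending = 0;
	c->closed = true;
}

/* The client reaps one notification; once all are reaped, reopen. */
static bool conn_reap(struct async_conn *c, unsigned long *id,
		      enum send_status *st)
{
	if (c->nreaped == c->nnotify)
		return false;
	*id = c->notify_id[c->nreaped];
	*st = c->notify_st[c->nreaped];
	if (++c->nreaped == c->nnotify) {
		c->nreaped = c->nnotify = 0;
		c->closed = false;	/* re-open for traffic */
	}
	return true;
}
```

The client sees the failed send first, then each aborted send in the order it was queued. Only after the last notification is reaped does conn_submit() accept traffic again.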

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Dotan Barak [Wed, 15 Feb 2012 16:00:50 +0000 (18:00 +0200)]
rds: fix compilation warnings

net/rds/ib_recv.c: In function 'rds_ib_srq_event':
net/rds/ib_recv.c:1490: warning: too many arguments for format
net/rds/ib_recv.c:1484: warning: unused variable 'srq_attr'
net/rds/ib_recv.c: In function 'rds_ib_srq_init':
net/rds/ib_recv.c:1524: warning: passing argument 1 of 'ERR_PTR' makes integer from pointer without a cast
include/linux/err.h:20: note: expected 'long int' but argument is of type 'struct ib_srq *'
net/rds/ib_recv.c:1524: warning: format '%d' expects type 'int', but argument 2 has type 'void *'
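The ERR_PTR warning is the classic mix-up between encoding and decoding an error pointer. The sketch below mirrors the semantics of the helpers in include/linux/err.h in userspace (MAX_ERRNO is 4095 there as well). create_srq and srq_init are hypothetical stand-ins. A failing allocator returns ERR_PTR(-errno), and the caller decodes the result with PTR_ERR() and prints it as a long.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

/* Userspace copies of the err.h idiom: small negative errnos are encoded
 * as pointers in the top page of the address space. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct srq { int id; };		/* hypothetical stand-in for struct ib_srq */
static struct srq the_srq;

/* Hypothetical allocator: a valid pointer or an ERR_PTR-encoded errno. */
static struct srq *create_srq(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &the_srq;
}

static int srq_init(int fail)
{
	struct srq *srq = create_srq(fail);

	if (IS_ERR(srq)) {
		/* Wrong: ERR_PTR(srq) feeds a pointer where a long is
		 * expected. Right: PTR_ERR() turns it back into an errno. */
		long err = PTR_ERR(srq);

		fprintf(stderr, "create_srq failed: %ld\n", err);
		return (int)err;
	}
	return 0;
}
```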

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Bang Nguyen [Wed, 8 Feb 2012 21:31:22 +0000 (13:31 -0800)]
RDS: cleanup checkpatch errors

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Bang Nguyen [Fri, 3 Feb 2012 16:10:06 +0000 (11:10 -0500)]
RDS Quality Of Service

RDS QoS is an extension of IB QoS to provide clients the ability to
segregate traffic flows and define policy to regulate them.
Internally, each traffic flow is represented by a connection with all of its
independent resources like that of a normal connection, and is
differentiated by service type. In other words, there can be multiple
connections between an IP pair and each supports a unique service type.
Service type (TOS) is user-defined and can be configured to satisfy certain
traffic requirements. For example, one service type may be configured for
high-priority low-latency traffic, another for low-priority high-bandwidth
traffic, and so on.

TOS is set per socket. A client sets the TOS on a socket via an ioctl and must
do so before initiating any traffic. Once set, the TOS cannot be changed.

        ioctl(fd, RDS_IOC_SET_TOS, &tos);    /* RDS_IOC_SET_TOS == 1, uint8_t tos */

All outgoing traffic from the socket will be associated with its TOS.
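Here is a userspace sketch of setting the TOS before any traffic. It assumes the request number from the text above (RDS_IOC_SET_TOS = 1) and that the ioctl takes a pointer to a uint8_t. Verify both against the rds.h for your kernel. AF_RDS with SOCK_SEQPACKET is the usual way to open an RDS socket. rds_set_tos and rds_socket_with_tos are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_RDS
#define AF_RDS 21	/* Linux address family number for RDS */
#endif

/* Assumption: request number taken from the commit text
 * (RDS_IOC_SET_TOS=1); check the rds.h shipped with your kernel. */
#ifndef RDS_IOC_SET_TOS
#define RDS_IOC_SET_TOS 1
#endif

/* Set the service type on an RDS socket; must precede any traffic. */
static int rds_set_tos(int fd, uint8_t tos)
{
	return ioctl(fd, RDS_IOC_SET_TOS, &tos);
}

/* Open an RDS socket and fix its TOS before anything is sent.
 * Returns the fd, or -1 with errno set. */
static int rds_socket_with_tos(uint8_t tos)
{
	int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);

	if (fd < 0)
		return -1;
	if (rds_set_tos(fd, tos) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;	/* now bind() and sendmsg() as usual */
}
```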

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Bang Nguyen [Fri, 3 Feb 2012 16:09:49 +0000 (11:09 -0500)]
RDS: Use IB_CQ_NEXT_COMP instead of IB_CQ_SOLICITED for TX CQ

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:49 +0000 (11:09 -0500)]
RDS: make sure rds_send_xmit doesn't loop forever

rds_send_xmit can get stuck doing work on behalf of other senders.  This
breaks out if we've been working too long.  The work queue will get kicked
to finish off any other requests if our current process gives up.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Bang Nguyen [Fri, 3 Feb 2012 16:09:49 +0000 (11:09 -0500)]
RDS: issue warning if re-connect stalling for more than 1 min.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:36 +0000 (11:09 -0500)]
RDS: don't test ring_empty or ring_low without locks held

The math in the ring functions can't be trusted unless you're either the only
person adding to the ring or the only person freeing from it.  If there are no
locks held at all you can end up hitting bogus assertions around the ring counters.

This changes the rds_ib_recv_refill code and the recv tasklet code to make sure
proper locks are held before we use rds_ib_ring_empty or rds_ib_ring_low.
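One simple way to respect that rule is to compute the ring occupancy only under a lock that orders producers and consumers. The ring below is a hypothetical userspace model built from an alloc counter and a free counter. The "low" threshold of half the ring is an assumption for illustration, not the value RDS uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical work ring: producers advance alloc_ctr, consumers advance
 * free_ctr, and "used" is their difference. */
struct work_ring {
	pthread_mutex_t lock;
	unsigned int nr;		/* ring size */
	unsigned int alloc_ctr;		/* entries ever allocated */
	unsigned int free_ctr;		/* entries ever freed */
};

/* Caller must hold r->lock: reading the two counters while both sides
 * move can yield a bogus (wrapped) value and trip this assertion. */
static unsigned int ring_used_locked(const struct work_ring *r)
{
	unsigned int used = r->alloc_ctr - r->free_ctr;	/* modular */

	assert(used <= r->nr);
	return used;
}

static bool ring_empty(struct work_ring *r)
{
	pthread_mutex_lock(&r->lock);
	bool empty = ring_used_locked(r) == 0;
	pthread_mutex_unlock(&r->lock);
	return empty;
}

/* Assumed threshold for illustration: "low" means at most half full. */
static bool ring_low(struct work_ring *r)
{
	pthread_mutex_lock(&r->lock);
	bool low = ring_used_locked(r) <= r->nr / 2;
	pthread_mutex_unlock(&r->lock);
	return low;
}

static bool ring_alloc(struct work_ring *r, unsigned int n)
{
	bool ok;

	pthread_mutex_lock(&r->lock);
	ok = ring_used_locked(r) + n <= r->nr;
	if (ok)
		r->alloc_ctr += n;
	pthread_mutex_unlock(&r->lock);
	return ok;
}

static void ring_free(struct work_ring *r, unsigned int n)
{
	pthread_mutex_lock(&r->lock);
	assert(n <= ring_used_locked(r));
	r->free_ctr += n;
	pthread_mutex_unlock(&r->lock);
}
```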

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:23 +0000 (11:09 -0500)]
RDS: don't use RCU for the bind hash table

RCU delays are making socket shutdown too slow.  Switch to a reader/writer lock so
that we don't risk OOMing while we wait for sockets to be freed.
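Here is a sketch of the reader/writer-lock approach using POSIX rwlocks. The table layout, hash function, and struct bound_sock are hypothetical. Lookups take the lock shared, while bind and unbind take it exclusive. Once bind_remove() returns, nothing can still be walking the removed entry, so teardown does not wait for a grace period.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define BIND_HASH_SIZE 64	/* hypothetical table size */

struct bound_sock {		/* hypothetical stand-in for a bound socket */
	uint32_t addr;
	uint16_t port;
	struct bound_sock *next;
};

static struct bound_sock *bind_hash[BIND_HASH_SIZE];
static pthread_rwlock_t bind_lock = PTHREAD_RWLOCK_INITIALIZER;

static size_t bind_bucket(uint32_t addr, uint16_t port)
{
	return (addr ^ port) % BIND_HASH_SIZE;
}

/* Lookups share the lock. Real code would take a reference on the socket
 * before dropping the read lock; this sketch only returns the pointer. */
static struct bound_sock *bind_lookup(uint32_t addr, uint16_t port)
{
	struct bound_sock *s;

	pthread_rwlock_rdlock(&bind_lock);
	for (s = bind_hash[bind_bucket(addr, port)]; s; s = s->next)
		if (s->addr == addr && s->port == port)
			break;
	pthread_rwlock_unlock(&bind_lock);
	return s;
}

static void bind_insert(struct bound_sock *s)
{
	size_t b = bind_bucket(s->addr, s->port);

	pthread_rwlock_wrlock(&bind_lock);
	s->next = bind_hash[b];
	bind_hash[b] = s;
	pthread_rwlock_unlock(&bind_lock);
}

/* After the write lock is released, no lookup can find s and no reader is
 * still walking its chain, so there is no grace period to wait for. */
static void bind_remove(struct bound_sock *s)
{
	struct bound_sock **pp;

	pthread_rwlock_wrlock(&bind_lock);
	for (pp = &bind_hash[bind_bucket(s->addr, s->port)]; *pp;
	     pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			break;
		}
	}
	pthread_rwlock_unlock(&bind_lock);
}
```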

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Venkat Venkatsubra [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: avoid double destroy of cm_id when rdma_resolve_route fails

It crashes in rds_ib_conn_shutdown because it was using a freed cm_id.  The
cm_id had actually been freed well before that (more than 15 seconds earlier),
during a previous connect attempt.

This was the sequence of that previous connect attempt: rds_ib_conn_connect
calls rdma_resolve_addr.  The synchronous part of rdma_resolve_addr succeeds,
but the asynchronous part fails later on.  The RDMA Connection Manager delivers
the RDMA_CM_EVENT_ADDR_RESOLVED event, so address resolution itself succeeds.
Next, RDS calls rdma_resolve_route from rds_rdma_cm_event_handler, and this
fails.  We return this error to the RDMA CM addr_handler, which destroys the
cm_id as follows (addr_handler in cma.c):

static void addr_handler(int status, struct sockaddr *src_addr,
                         struct rdma_dev_addr *dev_addr, void *context)
{
     .....
        if (id_priv->id.event_handler(&id_priv->id, &event)) {
                cma_exch(id_priv, CMA_DESTROYING);
                mutex_unlock(&id_priv->handler_mutex);
                cma_deref_id(id_priv);
                rdma_destroy_id(&id_priv->id);    <----  here
                return;
        }

RDS continues to point to this freed cm_id.

Later, when a new connect request arrives from the remote side, we shut down
this cm_id and try to reconnect:
  /*
   * after 15 seconds, give up on existing connection
   * attempts and make them try again.  At this point
   * it's no longer a race but something has gone
   * horribly wrong
   */
  if (now > conn->c_connection_start &&
      now - conn->c_connection_start > 5) {
          printk(KERN_CRIT "rds connection racing for 15s, forcing reset "
                 "connection %u.%u.%u.%u->%u.%u.%u.%u\n",
                 NIPQUAD(conn->c_laddr), NIPQUAD(conn->c_faddr));
          rds_conn_drop(conn);
          ....
We crash during the shutdown.
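The contract in addr_handler() is that a nonzero return from the event handler hands the cm_id to the CM, which then destroys it. The model below shows one way for a consumer to honour that contract: it forgets its pointer before returning the error. This is a hypothetical userspace sketch. None of these functions are the rdma_cm API, and the commit text does not show the actual fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; this is not the rdma_cm API. */
struct cm_id {
	void *context;			/* owning connection */
};

static int destroy_count;		/* exposes a double destroy */

static void cm_destroy_id(struct cm_id *id)
{
	destroy_count++;
	free(id);
}

struct conn {
	struct cm_id *cm_id;		/* NULL once we no longer own an id */
};

static void conn_connect(struct conn *conn)
{
	conn->cm_id = calloc(1, sizeof(*conn->cm_id));
	assert(conn->cm_id != NULL);
	conn->cm_id->context = conn;
}

/* Hypothetical route-resolution step that may fail. */
static int resolve_route(struct cm_id *id, int fail)
{
	(void)id;
	return fail ? -1 : 0;
}

/* Event handler: a nonzero return tells the CM to destroy the id itself,
 * so the connection drops its pointer before returning the error. */
static int conn_event_handler(struct cm_id *id, int route_fails)
{
	struct conn *conn = id->context;
	int ret = resolve_route(id, route_fails);

	if (ret)
		conn->cm_id = NULL;
	return ret;
}

/* CM side, as in addr_handler(): destroy the id if the handler failed. */
static void cm_deliver_addr_resolved(struct cm_id *id, int route_fails)
{
	if (conn_event_handler(id, route_fails))
		cm_destroy_id(id);
}

/* Shutdown destroys only an id the connection still owns. */
static void conn_shutdown(struct conn *conn)
{
	if (conn->cm_id) {
		cm_destroy_id(conn->cm_id);
		conn->cm_id = NULL;
	}
}
```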

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: make sure rds_send_drop_to properly takes the m_rs_lock

rds_send_drop_to is used during socket tear down to find all the
messages on the socket and clean them up.  It can race with the
acking code unless it takes the m_rs_lock on each and every message.

This plugs a hole where we didn't take m_rs_lock on any message that
didn't have RDS_MSG_ON_CONN set.  Without m_rs_lock, the ack code could
trust a message's m_rs pointer to a socket that had actually been freed,
leading to double frees and other memory corruption.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: kick krdsd to send congestion map updates

We can get into a deadlock on the recv spinlock because
congestion map updates can be sent in the receive path.  This
pushes the work off to krdsd instead.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: add debugging code around sock_hold and sock_put.

RDS had a recent series of memory corruptions because of
a use-after-free and double-free of rds sockets.  This adds
some debugging code around sock_put and sock_hold to
catch any similar bugs and spit out useful debugging info.

This is a temporary commit while customers try out our fix.
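Here is a sketch of the kind of instrumentation described, in userspace C11. The hypothetical wrappers record where the last reference was dropped. They also report a hold or put on a socket whose count has already reached zero. This is not the kernel's sock_hold()/sock_put().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical socket with a refcount and a record of the final put. */
struct dbg_sock {
	atomic_int refcnt;
	const char *last_put_file;
	int last_put_line;
};

#define dbg_sock_hold(sk) dbg_sock_hold_at((sk), __FILE__, __LINE__)
#define dbg_sock_put(sk)  dbg_sock_put_at((sk), __FILE__, __LINE__)

/* Returns 0, or -1 if the socket was already dead (use after free). */
static int dbg_sock_hold_at(struct dbg_sock *sk, const char *file, int line)
{
	int old = atomic_fetch_add(&sk->refcnt, 1);

	if (old <= 0) {
		fprintf(stderr, "%s:%d: hold on dead socket, last put at %s:%d\n",
			file, line, sk->last_put_file ? sk->last_put_file : "?",
			sk->last_put_line);
		return -1;
	}
	return 0;
}

/* Returns 1 if this put dropped the last reference (the caller would free
 * the socket), 0 otherwise, or -1 on a put past zero (double free). */
static int dbg_sock_put_at(struct dbg_sock *sk, const char *file, int line)
{
	int old = atomic_fetch_sub(&sk->refcnt, 1);

	if (old <= 0) {
		fprintf(stderr, "%s:%d: put on dead socket, last put at %s:%d\n",
			file, line, sk->last_put_file ? sk->last_put_file : "?",
			sk->last_put_line);
		return -1;
	}
	if (old == 1) {
		sk->last_put_file = file;
		sk->last_put_line = line;
		return 1;
	}
	return 0;
}
```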

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: Don't destroy the rdma id until after we're done using it

During connection resets, we are destroying the rdma id too soon.
This moves the destroy to after we clear the rings.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
RDS: adjust BUG()s for irqs disabled.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:09:07 +0000 (11:09 -0500)]
rds: make sure we don't deref a null cm_id->device during address checks

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:08:51 +0000 (11:08 -0500)]
RDS: don't use GFP_ATOMIC for sk_alloc in rds_create

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:08:50 +0000 (11:08 -0500)]
RDS: Make sure we do a signaled send at least once per large send

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Tina Yang [Fri, 3 Feb 2012 16:08:50 +0000 (11:08 -0500)]
RDS: Fix an rcu race with rds_bind_lookup

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:08:50 +0000 (11:08 -0500)]
RDS: Fix RDS_MSG_MAPPED usage.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:08:50 +0000 (11:08 -0500)]
RDS: add a sock_destruct callback with debugging

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Tina Yang [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: add a sock_destruct callback with debugging

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: limit the number of times we loop in rds_send_xmit

This will kick the RDS worker thread if we have been looping
too long.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS Make sure we check for congestion updates during rds_send_xmit

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
Make sure to kick rds_send_xmit for both LL_SEND_FULL and for the congestion map updates.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: make sure we post recv buffers

If we get an ENOMEM during rds_ib_recv_refill, we might never come
back and refill again later.

This makes sure to kick krdsd into helping out.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: don't trust the LL_SEND_FULL bit

We are seeing connections stuck with the LL_SEND_FULL bit getting
set and never cleared.  This changes RDS to stop trusting the
LL_SEND_FULL bit and kick krdsd after any time we
see -ENOMEM from the ring allocation code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:54 +0000 (11:07 -0500)]
RDS: give up on half formed connections after 15s

RDS relies on events to transition connections through a few
different states, but sometimes we get stuck and end up with
a half-formed connection that is never able to finish.

The other end has either wandered off or there are bugs in
other layers, and any future attempts from the other end are
rejected because we're already working on a connection
attempt.

This patch changes things to give up on half-formed connections
after 15 seconds.
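A minimal sketch of the timeout check follows. It uses the 15-second limit from this commit and the same now > start guard as the snippet quoted in the cm_id commit above. The connection states and the accept policy are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define CONN_STALL_TIMEOUT 15	/* seconds, as in the commit message */

enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_UP };	/* hypothetical */

struct conn {
	enum conn_state state;
	time_t connection_start;	/* when the current attempt began */
};

/* Stuck in CONNECTING for longer than the timeout? The first comparison
 * guards against a start time that lies in the future. */
static bool conn_half_formed_too_long(const struct conn *c, time_t now)
{
	return c->state == CONN_CONNECTING &&
	       now > c->connection_start &&
	       now - c->connection_start > CONN_STALL_TIMEOUT;
}

/* Hypothetical policy for an incoming connect request: accept if idle,
 * give up on (drop) a stale half-formed attempt and accept, else reject. */
static bool conn_accept_request(struct conn *c, time_t now)
{
	if (c->state == CONN_DOWN)
		return true;
	if (conn_half_formed_too_long(c, now)) {
		c->state = CONN_DOWN;
		return true;
	}
	return false;
}
```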

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:41 +0000 (11:07 -0500)]
rds_send_xmit is called under a spinlock, let's not do a cond_resched()

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:41 +0000 (11:07 -0500)]
RDS: make sure not to loop forever inside rds_send_xmit

If a determined set of concurrent senders keeps the send queue full,
we can loop forever inside rds_send_xmit.  This fix has two parts.

First, we drop out of the while(1) loop after we've processed a
large batch of messages.

Second, we add a generation number that gets bumped each time the
xmit bit lock is acquired.  If someone else has jumped in and
made progress in the queue, we skip our goto restart.
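The two parts of the fix can be sketched in a single-threaded userspace model. XMIT_BATCH and struct xconn are hypothetical, and real code would update the xmit bit and the generation atomically. The key line is the final check: we only restart when the generation is unchanged, that is, when no other sender has taken the lock since we did.

```c
#include <assert.h>
#include <stdbool.h>

#define XMIT_BATCH 64	/* hypothetical batch limit */

/* Hypothetical connection state. Real code needs atomic bit operations on
 * in_xmit and send_gen; this single-threaded sketch shows only the logic. */
struct xconn {
	bool in_xmit;		/* the xmit "bit lock" */
	unsigned long send_gen;	/* bumped each time the lock is acquired */
	int queued;		/* messages waiting on the send queue */
	int kicks;		/* times the rest was left to the worker */
};

static bool acquire_xmit(struct xconn *c)
{
	if (c->in_xmit)
		return false;
	c->in_xmit = true;
	c->send_gen++;
	return true;
}

static void release_xmit(struct xconn *c)
{
	c->in_xmit = false;
}

/* Send queued messages, at most XMIT_BATCH per call; returns how many. */
static int send_xmit(struct xconn *c)
{
	unsigned long my_gen;
	int sent = 0;

restart:
	if (!acquire_xmit(c))
		return sent;		/* another sender is doing the work */
	my_gen = c->send_gen;

	while (c->queued > 0) {
		if (sent >= XMIT_BATCH) {
			c->kicks++;	/* leave the remainder to the worker */
			break;
		}
		c->queued--;		/* "transmit" one message */
		sent++;
	}
	release_xmit(c);

	/* Work may have been queued after we looked. Retry only if nobody
	 * else has acquired the lock since we did; otherwise they own it. */
	if (c->queued > 0 && sent < XMIT_BATCH && c->send_gen == my_gen)
		goto restart;
	return sent;
}
```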

Signed-off-by: Chris Mason <chris.mason@oracle.c.om>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Andy Grover [Thu, 13 Jan 2011 19:40:31 +0000 (11:40 -0800)]
rds: check for excessive looping in rds_send_xmit

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:41 +0000 (11:07 -0500)]
rds: don't update ipaddress tables if the address hasn't changed

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Andy Grover [Fri, 24 Sep 2010 17:16:37 +0000 (10:16 -0700)]
change ib default retry to 1

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Andy Grover [Fri, 3 Feb 2012 16:07:40 +0000 (11:07 -0500)]
This patch adds the modparam to rds.ko.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Fri, 3 Feb 2012 16:07:40 +0000 (11:07 -0500)]
RDS: only use passive connections when addresses match

Passive connections were added for the case where one loopback IB
connection between identical addresses needs another connection to store
the second QP.  Unfortunately, they were also created in the case where
the addresses differ and we already have both QPs.

This led to a message reordering bug.

- two different IB interfaces and addresses on a machine: A B
- traffic is sent from A to B
- connection from A-B is created, connect request sent
- listening accepts connect request, B-A is created
- traffic flows, next_rx is incremented
- unacked messages exist on the retrans list
- connection A-B is shut down, new connect request sent
- listen sees existing loopback B-A, creates new passive B-A
- retrans messages are sent and delivered because of 0 next_rx

The problem is that the second connection request saw the previously
existing parent connection.  Instead of using it, and using the existing
next_rx_seq state for the traffic between those IPs, it mistakenly
thought that it had to create a passive connection.

We fix this by only using passive connections in the special case where
laddr and faddr match.  In this case we'll only ever have one parent
sending connection requests and one passive connection created as the
listening path sees the existing parent connection which initiated the
request.
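Below is a sketch of the corrected decision, plus a tiny duplicate filter that shows why the old behaviour redelivered retransmits. listen_action and rx_deliver are hypothetical names. Only the laddr == faddr rule and the reuse of the existing next_rx_seq state come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum accept_action { CREATE_NEW, USE_EXISTING, CREATE_PASSIVE };

/* Hypothetical decision made by the listening side for an incoming connect
 * request, given whether a conn for this address pair already exists. */
static enum accept_action listen_action(uint32_t laddr, uint32_t faddr,
					bool conn_exists)
{
	if (!conn_exists)
		return CREATE_NEW;
	/* Only loopback between identical addresses needs a second
	 * (passive) conn to hold the other QP. */
	if (laddr == faddr)
		return CREATE_PASSIVE;
	/* Different addresses: reuse the existing conn and its next_rx_seq. */
	return USE_EXISTING;
}

/* Receive-side duplicate filter: deliver seq only if it is new. */
static bool rx_deliver(uint64_t *next_rx_seq, uint64_t seq)
{
	if (seq < *next_rx_seq)
		return false;		/* already delivered: drop */
	*next_rx_seq = seq + 1;
	return true;
}
```

With the existing conn, a retransmitted message below next_rx_seq is dropped. A freshly created passive conn starts from 0 and delivers it a second time.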

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:40 +0000 (11:07 -0500)]
RDS: destroy the ib state that generates call back earlier during shutdown

Otherwise we can get callbacks after the QP isn't really able to handle them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Chris Mason [Fri, 3 Feb 2012 16:07:40 +0000 (11:07 -0500)]
RDS: check access on pages before doing copy_to_user

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Fri, 3 Feb 2012 16:07:40 +0000 (11:07 -0500)]
RDS/IB: always free recv frag as we free its ring entry

We were still seeing rare occurrences of the WARN_ON() that indicates
that the recv refill path was finding allocated frags in ring entries
that were marked free.  These were usually followed by OOM crashes.
They only seem to occur in the presence of interesting completion
errors and connection resets.

There are error paths in rds_ib_recv_cqe_handler() that could leave a
recv frag sitting in the ring.  This patch ensures that we free the frag
as we mark the ring entry free.  This should stop the refill path from
finding allocated frags in ring entries that were marked free.
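A userspace model of the invariant the patch enforces follows: freeing a ring entry always frees its frag in the same step. The ring layout, sizes, and return codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 8		/* hypothetical */
#define FRAG_SIZE 64		/* hypothetical */

struct recv_ring {
	void *frag[RING_SIZE];		/* recv buffer attached to each entry */
	bool in_use[RING_SIZE];
};

/* Free an entry and any frag still attached to it in one step, so no
 * error path can leave a frag behind in an entry marked free. */
static void ring_entry_free(struct recv_ring *r, int i)
{
	free(r->frag[i]);
	r->frag[i] = NULL;
	r->in_use[i] = false;
}

/* Refill one entry. Returns 1 if refilled, 0 if already in use, -EEXIST
 * if a free entry still holds a frag (the WARN_ON case), -ENOMEM on
 * allocation failure. */
static int ring_refill_one(struct recv_ring *r, int i)
{
	assert(i >= 0 && i < RING_SIZE);
	if (r->in_use[i])
		return 0;
	if (r->frag[i])
		return -EEXIST;
	r->frag[i] = malloc(FRAG_SIZE);
	if (!r->frag[i])
		return -ENOMEM;
	r->in_use[i] = true;
	return 1;
}

/* Completion: on success the frag is handed up (*out takes ownership);
 * on error it is freed together with the entry. */
static void recv_complete(struct recv_ring *r, int i, bool error, void **out)
{
	*out = NULL;
	if (!error) {
		*out = r->frag[i];
		r->frag[i] = NULL;
	}
	ring_entry_free(r, i);
}
```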

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Andy Grover [Tue, 7 Sep 2010 17:59:44 +0000 (10:59 -0700)]
RDS/IB: Quiet warnings when leaking frags

We have a race where sometimes we leak frags, and it hits
the WARN_ON. Unfortunately, the stream of WARN_ONs makes
the machine unusable. This patch switches to WARN_ON_ONCE
so we do not hose the box, and we still get notified that
the bug has occurred.
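For readers unfamiliar with the macro, here is a userspace approximation of WARN_ON_ONCE() semantics. The condition is still evaluated and returned on every call, but the message is printed only the first time it fires at that call site. It relies on GNU C statement expressions (GCC and Clang). slot_has_leaked_frag is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;	/* demonstration counter */

/* Userspace approximation of WARN_ON_ONCE(): returns the condition every
 * time, prints only on the first true evaluation at this call site. */
#define WARN_ON_ONCE(cond) ({						\
	static bool warned_once_;					\
	bool ret_ = !!(cond);						\
	if (ret_ && !warned_once_) {					\
		warned_once_ = true;					\
		warnings_printed++;					\
		fprintf(stderr, "WARNING: %s:%d: %s\n",			\
			__FILE__, __LINE__, #cond);			\
	}								\
	ret_;								\
})

/* Hypothetical refill check: a leaked frag is reported once, not per call. */
static bool slot_has_leaked_frag(const void *frag)
{
	return WARN_ON_ONCE(frag != NULL);
}
```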

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Tue, 3 Aug 2010 13:20:09 +0000 (09:20 -0400)]
Fix loopback connection reference counts

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Fri, 23 Jul 2010 17:37:33 +0000 (10:37 -0700)]
RDS: cancel connection work structs as we shut down

Nothing was canceling the send and receive work that might have been
queued as a conn was being destroyed.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Fri, 23 Jul 2010 17:36:58 +0000 (10:36 -0700)]
RDS: don't call rds_conn_shutdown() from rds_conn_destroy()

rds_conn_shutdown() can return before the connection is shut down when
it encounters an existing state that it doesn't understand.  This lets
rds_conn_destroy() then start tearing down the conn from under paths
that are still using it.

It's more reliable to queue the shutdown work and wait for krdsd to complete
the shutdown callback.  This stopped some hangs I was seeing where krdsd was
trying to shut down a freed conn.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Fri, 23 Jul 2010 17:32:31 +0000 (10:32 -0700)]
RDS: have sockets get transport module references

Right now there's nothing to stop the various paths that use
rs->rs_transport from racing with rmmod and executing freed transport
code.  The simple fix is to have binding to a transport also hold a
reference to the transport's module, removing this class of races.

We already had an unused t_owner field which was set for the modular
transports and which wasn't set for the built-in loop transport.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Zach Brown [Wed, 21 Jul 2010 22:13:25 +0000 (15:13 -0700)]
RDS: remove old rs_transport comment

rs_transport is now also used by the rdma paths once the socket is
bound.  We don't need this stale comment to tell us what cscope can.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>