www.infradead.org Git - users/jedix/linux-maple.git/log
9 years ago RDS: fix the sg allocation based on actual message size
Wei Lin Guay [Fri, 20 Nov 2015 23:14:48 +0000 (15:14 -0800)]
RDS: fix the sg allocation based on actual message size

Fix an issue where only PAGE_SIZE bytes are allocated per
scatter-gather entry (SGE) regardless of the actual message
size. Furthermore, use the buddy memory allocation technique to
allocate/free memory (if possible) to reduce the number of SGEs.
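
To illustrate why higher-order allocations reduce the SGE count, here is a
minimal userspace sketch. It is not the RDS code: sge_count(),
SKETCH_PAGE_SIZE and the max_order parameter are hypothetical names, and the
greedy largest-block-first split is an assumed policy.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the illustration */

/*
 * Count the scatter-gather entries needed for a message of `len` bytes when
 * each entry may cover up to 2^max_order contiguous pages (buddy-style
 * power-of-two blocks). max_order == 0 models one page per entry.
 */
size_t sge_count(size_t len, unsigned int max_order)
{
    size_t pages = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    size_t entries = 0;

    assert(max_order < 16);
    while (pages > 0) {
        unsigned int order = max_order;

        /* Pick the largest power-of-two block that still fits. */
        while (((size_t)1 << order) > pages)
            order--;
        pages -= (size_t)1 << order;
        entries++;
    }
    return entries;
}
```

Under these assumptions, a 64 KiB message needs 16 entries with one page per
entry, but only 4 entries when order-2 (four-page) blocks are available.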

Orabug: 21894138
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago RDS: make congestion code independent of PAGE_SIZE
Santosh Shilimkar [Mon, 16 Nov 2015 21:28:11 +0000 (13:28 -0800)]
RDS: make congestion code independent of PAGE_SIZE

The RDS congestion map code is designed with the base assumption of
a 4K page size. Both the map update and the transport code assume
this. Of course, it breaks when a transport like IB starts
supporting fragments larger than 4K.

To overcome this limitation without too many changes to the core
congestion map update logic, define an independent RDS_CONG_PAGE_SIZE
and use it.
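
As a rough illustration of indexing a congestion bitmap with a fixed page
size, consider the sketch below. Only the name RDS_CONG_PAGE_SIZE comes from
this commit message; its 4096 value, the one-bit-per-16-bit-port layout, and
the SKETCH_* names, struct cong_slot and cong_slot_for_port() are assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: a fixed 4 KiB congestion page and one bit per 16-bit port. */
#define RDS_CONG_PAGE_SIZE    4096u
#define SKETCH_CONG_MAP_BYTES (65536u / 8u)
#define SKETCH_CONG_MAP_PAGES (SKETCH_CONG_MAP_BYTES / RDS_CONG_PAGE_SIZE)
#define SKETCH_CONG_PAGE_BITS (RDS_CONG_PAGE_SIZE * 8u)

struct cong_slot {
    unsigned int page; /* index of the congestion page */
    unsigned int bit;  /* bit offset inside that page */
};

/* Locate the bit for a port without consulting the system PAGE_SIZE. */
struct cong_slot cong_slot_for_port(uint16_t port)
{
    struct cong_slot s;

    s.page = port / SKETCH_CONG_PAGE_BITS;
    s.bit = port % SKETCH_CONG_PAGE_BITS;
    return s;
}
```

Because the divisor is a constant rather than PAGE_SIZE, the same port maps to
the same page and bit regardless of the page size the kernel was built with.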

While at it, we also move rds_message_map_pages(), whose sole
purpose is to map congestion pages, to the congestion code.

Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago RDS: Back out OoO send status fix since it causes the regression
Santosh Shilimkar [Tue, 16 Feb 2016 18:24:31 +0000 (10:24 -0800)]
RDS: Back out OoO send status fix since it causes the regression

With the DB build, a crash was observed that was traced to
this change on UEK2. We proactively back it out on UEK4 as well
until the issue is addressed.

<0>------------[ cut here ]------------
<2>kernel BUG at net/rds/send.c:511!
<0>invalid opcode: 0000 [#1] SMP
<4>CPU 197
<4>Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U)
ipmi_poweroff ipmi_devintf ipmi_si ipmi_msghandler bonding acpi_cpufreq
freq_table mperf rds_rdma rds ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs
ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 mlx4_ib ib_sa ib_mad ib_core
mlx4_core ext3 jbd fuse ghes hed wmi i2c_i801 iTCO_wdt
iTCO_vendor_support
igb i2c_algo_bit i2c_core ixgbe hwmon dca sg ext4 mbcache jbd2 sd_mod
crc_t10dif megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]
<4>
<4>Pid: 85252, comm: oracle_85252_tp Tainted: P

[...]

<0>Call Trace:
<4> [<ffffffffa0340cab>] rds_send_drop_to+0xcb/0x470 [rds]
<4> [<ffffffffa033a19e>] rds_release+0x8e/0x110 [rds]
<4> [<ffffffff8142a369>] sock_release+0x29/0x90
<4> [<ffffffff8142a3e7>] sock_close+0x17/0x30
<4> [<ffffffff8116f2fe>] __fput+0xbe/0x240
<4> [<ffffffff8116f4a5>] fput+0x25/0x30
<4> [<ffffffff8116b3e3>] filp_close+0x63/0x90
<4> [<ffffffff8116b4c7>] sys_close+0xb7/0x120
<4> [<ffffffff81516762>] system_call_fastpath+0x16/0x1b
<0>Code: 00 75 1f 65 8b 14 25 58 c3 00 00 48 63 d2 48 c7 c0 80 26 01 00
48 03
04 d5 40 0e 99 81 48 83 40 68 01 5b 41 5c c9 c3 0f 0b eb fe <0f> 0b eb
fe 0f
1f 40 00 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c
<1>RIP  [<ffffffffa033e5e8>] rds_send_sndbuf_remove+0x68/0x70 [rds]
<4> RSP <ffff8825b0577dd8>

Revert "RDS: Fix out-of-order RDS_CMSG_RDMA_SEND_STATUS"

This reverts commit 5631f1a303104d41f6ded0d603011d6c172b8644.

Orabug: 21894138
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago net/mlx4_core: Modify default value of log_rdmarc_per_qp to be consistent with HW...
Yuval Shaia [Thu, 17 Sep 2015 09:52:44 +0000 (02:52 -0700)]
net/mlx4_core: Modify default value of log_rdmarc_per_qp to be consistent with HW capability

This value is used to calculate max_qp_dest_rdma.
The default value of 4 gives 16, while the HW supports 128
(max_requester_per_qp).
Although this value can be changed by a module parameter, it is best
that the default be optimized.
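
The arithmetic behind these numbers can be sketched as follows. The helper
name rdmarc_per_qp() is hypothetical, and treating the parameter as a plain
base-2 logarithm is an assumption based on its log_ prefix.

```c
#include <assert.h>

/* Number of entries implied by a log2-encoded module parameter. */
unsigned int rdmarc_per_qp(unsigned int log_rdmarc_per_qp)
{
    assert(log_rdmarc_per_qp < 32);
    return 1u << log_rdmarc_per_qp;
}
```

With this reading, a value of 4 yields 16 and a value of 7 would yield the 128
that the hardware is said to support.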

Orabug: 19883194

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
9 years ago Fixed vnic issue after saturn reset
Pradeep Gopanapalli [Tue, 1 Mar 2016 01:45:59 +0000 (01:45 +0000)]
Fixed vnic issue after saturn reset

After saturn reboots, uVnic does not recover.
The issue happens when a host is connected to NM3 via an InfiniBand switch.
The xve driver sets OPER_DOWN when it loses HEART_BEAT, and
there is no way for it to set this state back.
Fix this by leaving the multicast group once the heartbeat is lost and
rejoining once saturn comes back.

Orabug: 22862488

Reported-by: Ye Jin <ye.jin@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago uvnic issues
Pradeep Gopanapalli [Thu, 11 Feb 2016 23:29:39 +0000 (23:29 +0000)]
uvnic issues

When there is heartbeat loss because of a link event on the EDR fabric, uvnic
recovery sometimes fails.
This is because the XVE_OPER_UP state would have been cleared, and the
xve driver has to check this once multicast group membership is retained.

While running the MAXq test, we run into a soft crash around
skb_try_coalesce. Since the xve driver allocates PAGE_SIZE, it has to use
the full PAGE_SIZE for the truesize of the skb.

The xve driver doesn't allocate enough tailroom in skbs, so the IP/TCP
stacks need to reallocate the skb head to pull IP/TCP headers.
Fix this.

xve allocates some resources which are not needed by uVnic
functionality.
Fix this by using the cm_supported flag.

Orabug: 22862488

Reported-by: ye jin <ye.jin@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago Fixed wrongly checked return type Added Debug print
Pradeep Gopanapalli [Tue, 9 Feb 2016 23:15:19 +0000 (23:15 +0000)]
Fixed wrongly checked return type Added Debug print

In xs_post_recv, the xscore driver was checking for ENOMEM instead of
-ENOMEM. Fix this by checking the proper return value.
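
The class of bug described here can be shown with a small userspace sketch.
fake_post_recv(), is_oom_buggy() and is_oom_fixed() are hypothetical
stand-ins, not the xscore driver's functions; the only assumption carried over
is the kernel convention of returning a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a kernel-style call that fails with -ENOMEM. */
int fake_post_recv(int simulate_oom)
{
    return simulate_oom ? -ENOMEM : 0;
}

/* Buggy check: compares against the positive constant, so it never matches. */
int is_oom_buggy(int ret)
{
    return ret == ENOMEM;
}

/* Corrected check: compares against the negative errno value. */
int is_oom_fixed(int ret)
{
    return ret == -ENOMEM;
}
```

The buggy variant silently treats an out-of-memory failure as "some other
result", which is why the sign of the constant matters.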

Orabug: 22862488

Reported-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Reviewed-by: sajid zia <szia@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE
Venkat Venkatsubra [Tue, 1 Mar 2016 22:27:28 +0000 (14:27 -0800)]
RDS/IB: VRPC DELAY / OSS RECONNECT CAUSES 5 MINUTE STALL ON PORT FAILURE

This problem occurs when the user gets notified of a successful
rdma write + bcopy message completion but the peer application
does not receive the bcopy message. This happens during a port down/up test.

What seems to happen is the rdma write succeeds but the bcopy message fails.

RDS should not be returning successful completion status to the user
in this case.

When RDS does a rdma followed by a bcopy message the user notification is
supposed to be implemented by method #3 below.

/* If the user asked for a completion notification on this
 * message, we can implement three different semantics:
 *  1.  Notify when we received the ACK on the RDS message
 *      that was queued with the RDMA. This provides reliable
 *      notification of RDMA status at the expense of a one-way
 *      packet delay.
 *  2.  Notify when the IB stack gives us the completion event for
 *      the RDMA operation.
 *  3.  Notify when the IB stack gives us the completion event for
 *      the accompanying RDS messages.
 * Here, we implement approach #3. To implement approach #2,
 * we would need to take an event for the rdma WR. To implement #1,
 * don't call rds_rdma_send_complete at all, and fall back to the notify
 * handling in the ACK processing code.

Unfortunately, the user gets notified before the bcopy send status
is known. Right after the rdma write completes, the user gets notified
even though the subsequent bcopy eventually fails.

The fix is to delay signaling completions of rdma op till the
bcopy send completes.

Orabug: 22847528

Acked-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
9 years ago rds: add infrastructure to find more details for reconnect failure
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:23:05 +0000 (19:23 -0800)]
rds: add infrastructure to find more details for reconnect failure

This patch adds run-time support to debug scenarios where reconnect is
not successful for a certain time.
We add two sysctl variables for start time and end time. These are
the number of seconds after reconnect was initiated.

Orabug: 22631108

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
9 years ago rds: find connection drop reason
Ajaykumar Hotchandani [Fri, 4 Mar 2016 03:18:28 +0000 (19:18 -0800)]
rds: find connection drop reason

This patch attempts to find connection drop details.

The rationale for adding this type of patch is that there are too many
places from which a connection can get dropped.
In some cases, we don't have any idea of the source of the
connection drop. This is especially painful for issues which
are reproducible only in a customer environment.

The idea is to have a tracker variable which keeps the latest value
of the connection drop source.
We can fetch that tracker variable as needed.

Orabug: 22631108

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
9 years ago RDS: Add interface for receive MSG latency trace
Santosh Shilimkar [Fri, 11 Dec 2015 20:01:56 +0000 (12:01 -0800)]
RDS: Add interface for receive MSG latency trace

Socket option to tap receive path latency.
SO_RDS: SO_RDS_MSG_RXPATH_LATENCY
with parameter,
struct rds_rx_trace_so {
        u8 rx_traces;
        u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
};

CMSG:
RDS_CMSG_RXPATH_LATENCY(recvmsg)
Returns RDS message latencies in various stages of the receive
path in ns. It is set per socket using the SO_RDS_MSG_RXPATH_LATENCY
socket option. Legitimate points are defined in
enum rds_message_rxpath_latency. More points can be added in
the future.

CMSG format:
struct rds_cmsg_rx_trace {
        u8 rx_traces;
        u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
        u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
};

Receive MSG trace points: RDS message Receive Path Latency points
enum rds_message_rxpath_latency {
        RDS_MSG_RX_HDR_TO_DGRAM_START = 0,
        RDS_MSG_RX_DGRAM_REASSEMBLE,
        RDS_MSG_RX_DGRAM_DELIVERED,
        RDS_MSG_RX_DGRAM_TRACE_MAX
};

Tested-by: Namrata Jampani <namrata.jampani@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Orabug: 22630180
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago IB/mlx4: Replace kfree with kvfree in mlx4_ib_destroy_srq
Wengang Wang [Thu, 17 Dec 2015 02:54:15 +0000 (10:54 +0800)]
IB/mlx4: Replace kfree with kvfree in mlx4_ib_destroy_srq

Upstream commit 0ef2f05c7e02ff99c0b5b583d7dee2cd12b053f2 uses vmalloc for
WR buffers when needed and uses kvfree to free the buffers. It missed
changing kfree to kvfree in mlx4_ib_destroy_srq().

Reported-by: Matthew Finaly <matt@Mellanox.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Backport from upstream: df4176677fdf7ac0c5748083eb7a5b269fb1e156

Orabug: 22487409
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Yuval Shaia <yuval.shaia@oracle.com>
9 years ago ib_core: Add udata argument to alloc_shpd()
Mukesh Kacker [Tue, 22 Sep 2015 08:31:31 +0000 (01:31 -0700)]
ib_core: Add udata argument to alloc_shpd()

Add udata argument to shared pd interface alloc_shpd()
consistent with evolution of other similar ib_core
interfaces so providers that wish to support it can use it.

For providers (like current Mellanox driver code) that
do not expect user data, we assert a warning.

Orabug: 21884873

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
9 years ago RDS: establish connection for legitimate remote RDMA message
Santosh Shilimkar [Thu, 29 Oct 2015 16:24:46 +0000 (09:24 -0700)]
RDS: establish connection for legitimate remote RDMA message

The first message to a remote node should prompt a new
connection even if it is an RDMA operation via CMSG. This
means that the connection needs to be established before
CMSG parsing. Commit 3d6e0fed8edc ("rds_rdma: rds_sendmsg
should return EAGAIN if connection not setup") tried to
address that issue as part of bug 20232581.

But it inadvertently broke the QoS policy evaluation. QoS has
the opposite requirement: it needs information from the
CMSG to evaluate whether the message is legitimate to be sent over
the wire. It needs to know the total payload,
which includes the actual payload and the additional rdma
bytes. It then evaluates the total payload against the system's QoS
thresholds to determine whether the message is legitimate to be
sent.

This patch addresses these two opposing requirements by fetching
only the rdma bytes information for QoS evaluation and letting
the full CMSG parsing happen after the connection is
initiated. Since connection establishment is asynchronous,
we make sure that a map failure caused by an unavailable
connection reaches the user with an appropriate error code.

Orabug: 22139696

Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago rds: remove the _reuse_ rds ib pool statistics
Wengang Wang [Tue, 10 Nov 2015 02:43:38 +0000 (10:43 +0800)]
rds: remove the _reuse_ rds ib pool statistics

Orabug: 22124214
Fix a regression introduced by ceb99ba579a769f4e02375a3d52e36f44ae5f27f

The above commit introduced two new statistics to rds_ib_statistics.

       uint64_t        s_ib_rdma_mr_1m_pool_reuse;
       uint64_t        s_ib_rdma_mr_8k_pool_reuse;

But rds-info was not changed accordingly, so rds-info reports shifted stats.

This removes these two stats.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
9 years ago RDS: Add support for per socket SO_TIMESTAMP for incoming messages
Santosh Shilimkar [Sat, 5 Sep 2015 00:01:06 +0000 (17:01 -0700)]
RDS: Add support for per socket SO_TIMESTAMP for incoming messages

SO_TIMESTAMP generates a time stamp for each incoming RDS message
using the wall time. The result is returned via recvmsg() in a control
message as a timeval (usec resolution).

A user app can enable it by using setsockopt() with SO_TIMESTAMP at
the SOL_SOCKET level. The CMSG data of cmsg type SO_TIMESTAMP contains the
time stamp in struct timeval format.
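
A user-space sketch of this flow, using only the standard sockets API, might
look like the following. The helper names enable_rx_timestamp() and
find_rx_timestamp() are hypothetical; the socket itself (an RDS socket in this
context) is assumed to be opened elsewhere.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Enable per-message receive timestamps on an already-open socket. */
int enable_rx_timestamp(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}

/*
 * Walk the control messages filled in by recvmsg() and copy out the
 * timestamp if present. Returns 1 when found, 0 otherwise.
 */
int find_rx_timestamp(struct msghdr *msg, struct timeval *tv)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMP) {
            memcpy(tv, CMSG_DATA(cmsg), sizeof(*tv));
            return 1;
        }
    }
    return 0;
}
```

A caller would pass a msghdr whose msg_control buffer is at least
CMSG_SPACE(sizeof(struct timeval)) bytes to recvmsg() and then hand the same
msghdr to find_rx_timestamp().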

Orabug: 22190837

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Tested-by: Namrata Jampani <namrata.jampani@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago RDS: Fix out-of-order RDS_CMSG_RDMA_SEND_STATUS
Wei Lin Guay [Wed, 11 Nov 2015 17:31:14 +0000 (09:31 -0800)]
RDS: Fix out-of-order RDS_CMSG_RDMA_SEND_STATUS

Orabug: 22126982

If the RDS user application requests notification of RDMA send
completions, there is a possibility that RDS_CMSG_RDMA_SEND_STATUS
will be delivered out-of-order.

This can happen if RDS drops sending an ACK after it has received an explicit
ACK. In this case, the rds messages end up in reverse order in the
list.

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
9 years ago Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into...
Santosh Shilimkar [Tue, 10 Nov 2015 02:17:26 +0000 (18:17 -0800)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed

9 years ago Merge branch 'topic/uek-4.1/ofed.ovn' into topic/uek-4.1/ofed
Qing Huang [Tue, 10 Nov 2015 00:23:23 +0000 (16:23 -0800)]
Merge branch 'topic/uek-4.1/ofed.ovn' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.ovn:
  Integrate Uvnic functionality into uek-4.1 Revision 8008
  1) S_IRWXU causing kernel soft crash changing to 0644 WARNING: CPU: 0 PID: 20907 at fs/sysfs/group.c:61 create_files+0x171/0x180() Oct 12 21:43:14 ovn87-180 kernel: [252606.588541] Attribute vhba_default_scsi_timeout: Invalid permissions 0700 [Rev 8008]
  1) Support vnic for EDR based platform(uVnic) 2) Supported Types now Type 0 - XSMP_XCM_OVN - Xsigo VP780/OSDN standalone Chassis, (add pvi) Type 1 - XSMP_XCM_NOUPLINK - EDR Without uplink (add public-network) Type 2 - XSMP_XCM_UPLINK -EDR with uplink (add public-network <with -if> 3) Intelligence in driver to support all the modes 4) Added Code for printing Multicast LID [Revision 8008] 5) removed style errors

9 years ago Integrate Uvnic functionality into uek-4.1 Revision 8008
Pradeep Gopanapalli [Sat, 7 Nov 2015 02:14:12 +0000 (18:14 -0800)]
Integrate Uvnic functionality into uek-4.1 Revision 8008

Reviewed-by: Sajid Zia <sajid.zia@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago 1) S_IRWXU causing kernel soft crash changing to 0644 WARNING: CPU: 0 PID: 20907...
Pradeep Gopanapalli [Sat, 7 Nov 2015 02:11:33 +0000 (18:11 -0800)]
1) S_IRWXU causing kernel soft crash changing to 0644 WARNING: CPU: 0 PID: 20907 at fs/sysfs/group.c:61 create_files+0x171/0x180() Oct 12 21:43:14 ovn87-180 kernel: [252606.588541] Attribute vhba_default_scsi_timeout: Invalid permissions 0700 [Rev 8008]

Reviewed-by: Sajid Zia <sajid.zia@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago 1) Support vnic for EDR based platform(uVnic) 2) Supported Types now Type 0 - XSMP_XC...
Pradeep Gopanapalli [Thu, 5 Nov 2015 02:58:15 +0000 (18:58 -0800)]
1) Support vnic for EDR based platform(uVnic) 2) Supported Types now Type 0 - XSMP_XCM_OVN - Xsigo VP780/OSDN standalone Chassis, (add pvi) Type 1 - XSMP_XCM_NOUPLINK - EDR Without uplink (add public-network) Type 2 - XSMP_XCM_UPLINK -EDR with uplink (add public-network <with -if> 3) Intelligence in driver to support all the modes 4) Added Code for printing Multicast LID [Revision 8008] 5) removed style errors

Reviewed-by: Sajid Zia <sajid.zia@oracle.com>
Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years ago IB/mlx4: Use vmalloc for WR buffers when needed
Wengang Wang [Thu, 29 Oct 2015 06:32:45 +0000 (14:32 +0800)]
IB/mlx4: Use vmalloc for WR buffers when needed

Orabug: 22025570

There have been several occurrences of WR buffer allocation (kmalloc) failing.
It failed when allocating order 3 and/or 4 contiguous pages. At the same time
there was actually 100MB+ of free memory, but it was heavily fragmented.
So try vmalloc when kmalloc fails.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years ago net/rds: start rdma listening after ib/iw initialization is done
Qing Huang [Tue, 6 Oct 2015 22:32:22 +0000 (15:32 -0700)]
net/rds: start rdma listening after ib/iw initialization is done

This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.

We don't need to call rds_rdma_listen_stop() if rds_rdma_listen_init()
didn't succeed.

We only need to call rds_ib_exit() if rds_ib_init() succeeds but
other parts fail. The same applies to rds_iw_init()/rds_iw_exit().
So we need to change error handling sequence accordingly.

Jump to ib/iw error handling path when we get an err code from
rds_rdma_listen_init().
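
The ordering and unwinding described above can be sketched in userspace with
the usual goto-based cleanup pattern. Everything here is hypothetical
scaffolding: the stub_* functions and struct init_trace only record calls, and
treating an iw init failure as fatal is an assumption, not a statement about
the real rds_rdma_init().

```c
#include <assert.h>

/* Records which cleanup stubs ran; fail_listen simulates a listen failure. */
struct init_trace {
    int fail_listen;
    int ib_exit_called;
    int iw_exit_called;
};

static int stub_ib_init(struct init_trace *t) { (void)t; return 0; }
static void stub_ib_exit(struct init_trace *t) { t->ib_exit_called++; }
static int stub_iw_init(struct init_trace *t) { (void)t; return 0; }
static void stub_iw_exit(struct init_trace *t) { t->iw_exit_called++; }
static int stub_listen_init(struct init_trace *t)
{
    return t->fail_listen ? -1 : 0;
}

int sketch_rdma_init(struct init_trace *t)
{
    int ret;

    ret = stub_ib_init(t);
    if (ret)
        goto out;
    ret = stub_iw_init(t);
    if (ret)
        goto err_ib;
    /* Start listening only after both transports are initialized. */
    ret = stub_listen_init(t);
    if (ret)
        goto err_iw; /* listening never started, so nothing to stop */
    return 0;

err_iw:
    stub_iw_exit(t);
err_ib:
    stub_ib_exit(t);
out:
    return ret;
}
```

Each label undoes only the steps that had already succeeded, which mirrors the
rule that an exit function is called only if its matching init succeeded.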

Orabug: 21684447

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
Mukesh Kacker [Thu, 22 Oct 2015 09:44:31 +0000 (02:44 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  net/rds: start rdma listening after ib/iw initialization is done

9 years ago net/rds: start rdma listening after ib/iw initialization is done
Qing Huang [Tue, 6 Oct 2015 22:32:22 +0000 (15:32 -0700)]
net/rds: start rdma listening after ib/iw initialization is done

This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.

We don't need to call rds_rdma_listen_stop() if rds_rdma_listen_init()
didn't succeed.

We only need to call rds_ib_exit() if rds_ib_init() succeeds but
other parts fail. The same applies to rds_iw_init()/rds_iw_exit().
So we need to change error handling sequence accordingly.

Jump to ib/iw error handling path when we get an err code from
rds_rdma_listen_init().

Orabug: 21684447

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into...
Santosh Shilimkar [Wed, 21 Oct 2015 18:15:14 +0000 (11:15 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
  RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
  RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
  RDS: rds_conn_lookup() should factor in the struct net for a match
  RDS: Use a single TCP socket for both send and receive.
  RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
  RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
  Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
  rds: make sure base connection is up on both sides
  rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
  RDS: fix race condition when sending a message on unbound socket.
  RDS: verify the underlying transport exists before creating a connection
  mlx4: indicate memory resource exhaustion
  IB/mlx4: Use correct order of variables in log message
  mlx4_core: Introduce restrictions for PD update

9 years ago Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
Mukesh Kacker [Wed, 21 Oct 2015 16:11:46 +0000 (09:11 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
  RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
  RDS: rds_conn_lookup() should factor in the struct net for a match
  RDS: Use a single TCP socket for both send and receive.
  RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
  RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
  Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
  rds: make sure base connection is up on both sides
  rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
  RDS: fix race condition when sending a message on unbound socket.
  RDS: verify the underlying transport exists before creating a connection

9 years ago RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()
Sowmini Varadhan [Thu, 15 Oct 2015 02:11:54 +0000 (22:11 -0400)]
RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()

Orabug: 22012202

Backport of upstream commit 241b271952eb ("RDS-TCP: Reset tcp callbacks
if re-using an outgoing socket in rds_tcp_accept_one()")

Consider the following "duelling syn" sequence between two peers A and B:
             A                  B
             SYN1     -->
                  <-- SYN2
             SYN2ACK  -->

Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.

If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).
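
The tie-break rule in this paragraph can be sketched as a pure function.
keep_outgoing_on_duel() is a hypothetical name, and comparing the addresses in
host byte order after ntohl() is an assumption about what "numerically less"
means here.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Return 1 if the local node should keep its own outgoing connection and
 * close the incoming one, 0 if it should accept the incoming one.
 * Both addresses are IPv4 in network byte order, as returned by inet_addr().
 */
int keep_outgoing_on_duel(uint32_t local_be, uint32_t peer_be)
{
    return ntohl(local_be) < ntohl(peer_be);
}
```

Because both peers evaluate the same comparison with the roles swapped,
exactly one of them keeps its outgoing connection when the addresses differ.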

Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a  CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP  connection.

Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years ago RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.
Sowmini Varadhan [Thu, 15 Oct 2015 00:37:47 +0000 (20:37 -0400)]
RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.

Orabug: 22012202

Backport of upstream commit 486798001b92 ("RDS: Invoke ->laddr_check()
in rds_bind() for explicitly bound transports.")

The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago RDS: rds_conn_lookup() should factor in the struct net for a match
Sowmini Varadhan [Wed, 14 Oct 2015 19:52:23 +0000 (15:52 -0400)]
RDS: rds_conn_lookup() should factor in the struct net for a match

Orabug: 22012202

Backport of upstream commit 8f384c0177a0 ("RDS: rds_conn_lookup() should
factor in the struct net for a match")

Only return a conn if the rds_conn_net(conn) matches the struct
net passed to rds_conn_lookup().

Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints,
           one per netns.")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago RDS: Use a single TCP socket for both send and receive.
Sowmini Varadhan [Wed, 14 Oct 2015 14:29:31 +0000 (10:29 -0400)]
RDS: Use a single TCP socket for both send and receive.

Orabug: 22012202

Backport of upstream commit 3b20fc389705 ("RDS: Use a single TCP
socket for both send and receive.");

Commit f711a6ae062c ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port.  The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.

However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both send and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().

A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection
attempt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
Sowmini Varadhan [Wed, 14 Oct 2015 11:23:20 +0000 (07:23 -0400)]
RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune

Orabug: 22012202

Backport of upstream commit 1edd6a14d24f ("RDS-TCP: Do not bloat
sndbuf/rcvbuf in rds_tcp_tune")

Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
clobbers efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency, and the default values for these parameters actually
result in significantly better performance.

Request-response tests using rds-stress with a packet size of
100K and 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
between a single pair of IP addresses achieve a throughput of
6-8 Gbps. Without this patch, throughput maxes out at 2-3 Gbps under
equivalent conditions on these platforms.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years ago RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_
Sowmini Varadhan [Wed, 14 Oct 2015 01:03:04 +0000 (21:03 -0400)]
RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_

Backport of upstream commit 76b29ef120f5 ("RDS-TCP: Set up MSG_MORE and
MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit")

For the same reasons as commit 2f5338442425 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09fe9c7 ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage()

Orabug: 22012202

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years ago Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"
Rama Nichanamatlu [Tue, 20 Oct 2015 13:16:25 +0000 (06:16 -0700)]
Revert "rds_rdma: rds_sendmsg should return EAGAIN if connection not setup"

This reverts commit 3d6e0fed8edc2f5d5439bee22c2fa153096c77ea.

Reverted because the fix has a bug that affects RDS QoS threshold
settings.

Orabug: 21664735

Signed-off-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agords: make sure base connection is up on both sides
Ajaykumar Hotchandani [Tue, 5 May 2015 03:09:42 +0000 (20:09 -0700)]
rds: make sure base connection is up on both sides

The current RDS active side requires zero-lane path records to establish
a non-zero-lane connection. For this reason, the active side makes sure
the zero-lane connection is up before establishing a non-zero-lane
connection. The passive side does not need to fetch path records, so it
does not have this check.

This makes it possible for a connection to end up with non-ideal path
records in the following scenario:
- Host1 had PORT_UP event.
- Lane0 and Lane6 connection went down.
- Lane0 connection came up.
- Host1 sent connection request for Lane6.
- Host2 had PORT_UP event.
- Lane0 and Lane6 connections went down.
- Host2 sent DREQ for Lane0.
- Since the Lane6 connection is not up, nothing needs to be done for it.
- Host2 received connection request from host1 having old path records
  for Lane6.
- Lane6 connection got established on old path records.

The impacts of having connections with non-ideal path records are:
- minor performance hit because of extra hop with ISL path
- in port failure scenario, it impacts connections which are not related
  to that port.

With this patch we make sure that the base connection is up on the
passive side as well before allowing a connection to be established.

(This is a port of UEK2 commit 7ab7ef255a)

Orabug: 21675157

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agords_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq
Qing Huang [Tue, 13 Oct 2015 00:23:50 +0000 (17:23 -0700)]
rds_ib/iw: fixed big endianness conversion issue for dp->dp_ack_seq

dp->dp_ack_seq is used in big endian format. We need to do the
big endianness conversion when we assign a value in host format
to it.
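
A minimal user-space sketch of such a conversion, assuming a 64-bit
sequence number. host_to_be64() is a hypothetical stand-in for the
kernel's cpu_to_be64(); the patch itself is not shown here, so this
illustrates only the technique, not the actual change.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpu_to_be64(): returns a value
 * whose in-memory byte order is big endian, whatever the host order is. */
static uint64_t host_to_be64(uint64_t host)
{
    union { uint64_t v; unsigned char b[8]; } out;

    /* Store the most significant byte first. */
    for (int i = 0; i < 8; i++)
        out.b[i] = (unsigned char)(host >> (56 - 8 * i));
    return out.v;
}
```

On a big-endian host the function returns its argument unchanged; on a
little-endian host it returns the byte-swapped value.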

(This patch is ported from UEK2)

Orabug: 21684819

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoRDS: fix race condition when sending a message on unbound socket.
Quentin Casasnovas [Mon, 19 Oct 2015 21:22:27 +0000 (14:22 -0700)]
RDS: fix race condition when sending a message on unbound socket.

Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Chien Yen <chien.yen@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: verify the underlying transport exists before creating a connection
Sasha Levin [Tue, 8 Sep 2015 14:53:40 +0000 (10:53 -0400)]
RDS: verify the underlying transport exists before creating a connection

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.
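
A minimal user-space sketch of this kind of guard, using simplified
stand-in types. The names conn_create(), struct transport and
struct conn, and the -ENODEV return value, are assumptions for
illustration and are not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RDS structures. */
struct transport { const char *name; };
struct conn { const struct transport *trans; };

/* Refuse to set up a connection when no transport is attached,
 * instead of dereferencing a NULL pointer later on. */
static int conn_create(struct conn *c, const struct transport *trans)
{
    if (trans == NULL)
        return -ENODEV;
    c->trans = trans;
    return 0;
}
```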

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1

Orabug: 22010933

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 74e98eb085889b0d2d4908f59f6e00026063014f)

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed
Mukesh Kacker [Wed, 21 Oct 2015 15:09:05 +0000 (08:09 -0700)]
Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
  mlx4: indicate memory resource exhaustion
  IB/mlx4: Use correct order of variables in log message
  mlx4_core: Introduce restrictions for PD update

9 years agomlx4: indicate memory resource exhaustion
Ajaykumar Hotchandani [Wed, 14 Oct 2015 23:30:26 +0000 (16:30 -0700)]
mlx4: indicate memory resource exhaustion

This change provides details about the pid, which resource got exhausted,
and the current limits for that resource.

These details were requested by database folks to attempt recovery.

This is a port of UEK2 commit f3774780e30

Orabug: 21549767

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoIB/mlx4: Use correct order of variables in log message
Wengang Wang [Thu, 15 Oct 2015 01:04:35 +0000 (09:04 +0800)]
IB/mlx4: Use correct order of variables in log message

There was an incorrect order of variables in mlx4_log().
This fix corrects that mistake.

Orabug: 21906781

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agomlx4_core: Introduce restrictions for PD update
Ajaykumar Hotchandani [Wed, 14 Oct 2015 23:38:11 +0000 (16:38 -0700)]
mlx4_core: Introduce restrictions for PD update

From 2.31.5350 firmware onwards,
- RDS with RDMA data transfer stopped working.
- Mellanox has introduced limitations related to PD updates.
  These imposed limitations are in line with the PRM.

This patch brings the driver in sync with these imposed limitations.
Mellanox R&D has approved this patch.

It's been tested on both old firmware (2.11.1280) and new firmware.

Mellanox case number is 179121.
OraBug: 22022389

v2: Change subject as per suggestion from Yuval

Tested-by: Pierre Orzechowski <pierre.e.orzechowski@oracle.com>
Tested-by: Kushagra Misra <kushagra.misra@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoRDS: fix race condition when sending a message on unbound socket.
Quentin Casasnovas [Mon, 19 Oct 2015 21:22:27 +0000 (14:22 -0700)]
RDS: fix race condition when sending a message on unbound socket.

Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Chien Yen <chien.yen@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into...
Santosh Shilimkar [Tue, 13 Oct 2015 17:10:40 +0000 (10:10 -0700)]
Merge branch 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed into topic/uek-4.1/ofed

* 'topic/uek-4.1/ofed' of git://ca-git.us.oracle.com/linux-uek-ofed:
  RDS/IB: print string constants in more places
  ib/rds: runtime debuggability enhancement

9 years agoRDS: make send_batch_count tunable effective
Santosh Shilimkar [Thu, 8 Oct 2015 22:59:14 +0000 (15:59 -0700)]
RDS: make send_batch_count tunable effective

The send_batch_count tunable is stale and the code relies on a
hard-coded batch count value. It's a nice feature that lets you
tune the system for different HCAs. The TCP transport also has
different characteristics, so the tunable can be useful there too.

There is no change in default behavior with this patch.

Orabug: 22010933

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: make use of kfree_rcu() and avoid the call_rcu() chain
Santosh Shilimkar [Thu, 8 Oct 2015 23:26:32 +0000 (16:26 -0700)]
RDS: make use of kfree_rcu() and avoid the call_rcu() chain

call_rcu() chains are expensive, and the only use in rds_ib_remove_ipaddr()
is to kfree() the rds_ib_ipaddr. Chains also require the high-latency
rcu_barrier() in modules, which can be avoided.

Make use of kfree_rcu(), which is meant for exactly this use.

Orabug: 22010933

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: verify the underlying transport exists before creating a connection
Sasha Levin [Tue, 8 Sep 2015 14:53:40 +0000 (10:53 -0400)]
RDS: verify the underlying transport exists before creating a connection

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1

Orabug: 22010933

Acked-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 74e98eb085889b0d2d4908f59f6e00026063014f)

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed
Mukesh Kacker [Tue, 13 Oct 2015 15:44:42 +0000 (08:44 -0700)]
Merge branch 'topic/uek-4.1/ofed.rds-p2' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.rds-p2:
  RDS/IB: print string constants in more places
  ib/rds: runtime debuggability enhancement

9 years agoRDS/IB: print string constants in more places
Zach Brown [Tue, 3 Aug 2010 20:52:47 +0000 (13:52 -0700)]
RDS/IB: print string constants in more places

This prints the constant identifier for work completion status and rdma
cm event types, like we already do for IB event types.

A core string array helper is added that each string type uses.

Note: The following is the original commit in uek prior to the big uek2
ofed blob patch.

    commit 59f740a6aeb2cde2f79fe0df38262d4c1ef35cd8

Orabug 21314268

Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoib/rds: runtime debuggability enhancement
Qing Huang [Wed, 15 Jul 2015 01:36:43 +0000 (18:36 -0700)]
ib/rds: runtime debuggability enhancement

This patch provides the ability to dynamically turn on or off various
types of debug/diag prints inside the RDS module.

The run-time debug prints are controlled by a rds module parameter,
rds_rt_debug_bitmap.

Here is the definition for different bits. We have implemented feature
related bits, such as Connection Management, Active Bonding, Error prints,
Send, Recv.

in net/rds/rds_rt_debug.h
...
enum {
        /* bit 0 ~ 19 are feature related bits */
        RDS_RTD_ERR                     = 1 << 0,       /* 0x1    */
        RDS_RTD_ERR_EXT                 = 1 << 1,       /* 0x2    */

        RDS_RTD_CM                      = 1 << 3,       /* 0x8    */
        RDS_RTD_CM_EXT                  = 1 << 4,       /* 0x10   */
        RDS_RTD_CM_EXT_P                = 1 << 5,       /* 0x20   */

        RDS_RTD_ACT_BND                 = 1 << 7,       /* 0x80   */
        RDS_RTD_ACT_BND_EXT             = 1 << 8,       /* 0x100  */

        RDS_RTD_RCV                     = 1 << 11,      /* 0x800  */
        RDS_RTD_RCV_EXT                 = 1 << 12,      /* 0x1000 */

        RDS_RTD_SND                     = 1 << 14,      /* 0x4000 */
        RDS_RTD_SND_EXT                 = 1 << 15,      /* 0x8000 */
...

In general, the *_EXT bits mean that you will get extra information, but
possibly a flood of prints as well. Every bit can be controlled by users,
so users can decide how much information they want to see/collect. The
current embedded printk level used for this patch is KERN_INFO. Most
likely all the msgs will only go to /var/log/messages without showing up
on console if we use the default settings for /proc/sys/kernel/printk and
/etc/rsyslog.conf in an ol6 environment.

For example, to turn on the RDS_RTD_ERR and RDS_RTD_CM bits:

echo 0x9 > /sys/module/rds/parameters/rds_rt_debug_bitmap

To turn on the RDS_RTD_ERR (0x1), RDS_RTD_CM (0x8), and RDS_RTD_RCV (0x800) bits:

echo 0x809 > /sys/module/rds/parameters/rds_rt_debug_bitmap
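
The values written above are simply the bitwise OR of the selected
bits. A minimal sketch, reusing three bit definitions from the enum
quoted earlier; the helper rtd_enabled() is a hypothetical name used
only for illustration and is not part of the patch.

```c
/* Bit values copied from the enum quoted in this commit message. */
enum {
    RDS_RTD_ERR = 1 << 0,   /* 0x1   */
    RDS_RTD_CM  = 1 << 3,   /* 0x8   */
    RDS_RTD_RCV = 1 << 11,  /* 0x800 */
};

/* Hypothetical helper: is debug class `bit` enabled in `bitmap`? */
static int rtd_enabled(unsigned int bitmap, unsigned int bit)
{
    return (bitmap & bit) != 0;
}
```

RDS_RTD_ERR | RDS_RTD_CM gives 0x9, and adding RDS_RTD_RCV gives 0x809,
matching the two echo commands.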

Performance penalty: with all the debug flag bits set to 0, there
should be no performance impact in a kernel with this patch.

Orabug 21314268

Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agouek-rpm: Enable config for OVN xsigo drivers
Mukesh Kacker [Mon, 9 Feb 2015 02:54:50 +0000 (18:54 -0800)]
uek-rpm: Enable config for OVN xsigo drivers

Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.xsigo' into topic/uek-4.1/ofed
Mukesh Kacker [Wed, 7 Oct 2015 11:40:32 +0000 (04:40 -0700)]
Merge branch 'topic/uek-4.1/ofed.xsigo' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.xsigo:
  Add Oracle virtual Networking Drivers for uek4 kernel

9 years agoAdd Oracle virtual Networking Drivers for uek4 kernel
Pradeep Gopanapalli [Wed, 23 Sep 2015 01:56:41 +0000 (18:56 -0700)]
Add Oracle virtual Networking Drivers for uek4 kernel

This commit adds 4 kernel modules: xscore, xsvnic, xve
and xsvhba developed by Xsigo (acquired by Oracle) and used in the Oracle
virtual networking (OVN) products, which provide virtual network and
storage adapter devices on the servers dynamically at runtime.

The heart of the OVN product is the Fabric Interconnect (FI).
Hosts and IO modules connect to the FI using the Infiniband fabric.
IO modules can be N/W cards and/or FC cards.

The "xscore" module is responsible for doing FI topology discovery
and establishing the connection with the FI. It is involved in retrieving
virtual device management commands such as INSTALL, DELETE, etc.
This module provides wrappers for the IB framework APIs, which are used
by its client modules "xsvnic", "xsvhba" and "xve".

The "xve" module supports the Xsigo Virtual Ethernet (XVE) protocol.
The "xsvnic" module supports the Xsigo vNIC functionality. These modules
interface between the kernel networking stack and the "xscore" module.
On the egress side, they process the N/W packet and send it to the
"xscore" module, where it is wrapped into an IB packet.

On the ingress side, "xscore" receives the N/W packet, which is
encapsulated inside an IB packet, and transfers it to "xsvnic" or "xve".
The modules "xsvnic"/"xve" process this packet and send it to the
kernel networking stack. The "xsvnic" module interacts with the N/W card
gateway connected to the FI, whereas "xve" interacts with another host in
the same IB fabric.

The "xsvhba" module provides support for the Xsigo virtual HBA, allowing
SAN connectivity. The "xsvhba" module interfaces with the SCSI layer. It
communicates with the FC card gateway connected to the FI. It is
responsible for accepting/transporting the SCSI commands from/to
the specified SCSI target. The "xsvhba" module uses "xscore" to
wrap (unwrap) the commands in an IB packet and transmit (receive) it.

Signed-off-by: Pradeep Gopanapalli <pradeep.gopanapalli@oracle.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed
Mukesh Kacker [Tue, 6 Oct 2015 14:36:37 +0000 (07:36 -0700)]
Merge branch 'topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.mlnx2.4-p3.orclFixes:
  ib_sdp/cma: readd SDP support to cma_save_net_info

9 years agoib_sdp/cma: readd SDP support to cma_save_net_info
Qing Huang [Sat, 26 Sep 2015 00:47:19 +0000 (17:47 -0700)]
ib_sdp/cma: readd SDP support to cma_save_net_info

Upstream has removed SDP support from cma.c. Some applications may
not display addr/port information correctly without this change to
the cma_save_net_info() function.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed
Mukesh Kacker [Tue, 6 Oct 2015 12:49:07 +0000 (05:49 -0700)]
Merge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.sdp:
  ib/sdp: Enable usermode FMR
  ib/sdp: fix null dereference of sk->sk_wq in sdp_rx_irq()
  sdp: fix keepalive functionality
  ib_sdp: fix deadlock when sdp_cma_handler is called while socket is being closed
  ib_sdp: add unhandled events to rdma_cm_event_str

9 years agoib/sdp: Enable usermode FMR
Dotan Barak [Wed, 22 Feb 2012 12:59:56 +0000 (14:59 +0200)]
ib/sdp: Enable usermode FMR

Signed-off-by: Arun Kaimalettu <arun.kaimalettu@oracle.com>
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
9 years agoib/sdp: fix null dereference of sk->sk_wq in sdp_rx_irq()
Chuck Anderson [Thu, 8 Jan 2015 00:06:49 +0000 (17:06 -0700)]
ib/sdp: fix null dereference of sk->sk_wq in sdp_rx_irq()

Orabug: 20070989

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa02b639f>] sdp_rx_irq+0x4f/0x160 [ib_sdp]
PGD 1d7fd14067 PUD 190984b067 PMD 0
Oops: 0000 [#1] SMP
...
Pid: 61889, comm: oracle Not tainted 2.6.39-400.128.20.el5uek #1 Oracle
Corporation SUN FIRE X4170 M3     /ASSY,MOTHERBOARD,1U
RIP: 0010:[<ffffffffa02b639f>]  [<ffffffffa02b639f>] sdp_rx_irq+0x4f/0x160
[ib_sdp]
...

Crash occurs in the call to sdp_sk_sleep(sk) through waitqueue_active():

drivers/infiniband/ulp/sdp/sdp_rx.c
static void sdp_rx_irq(struct ib_cq *cq, void *cq_context)
        if (should_wake_up(sk)) {

drivers/infiniband/ulp/sdp/sdp_rx.c
static inline int should_wake_up(struct sock *sk)
{
        return sdp_sk_sleep(sk) && waitqueue_active(sdp_sk_sleep(sk)) &&
                (posts_handler(sdp_sk(sk)) || somebody_is_waiting(sk));
}

drivers/infiniband/ulp/sdp/sdp.h:
        #define sdp_sk_sleep(sk) sk_sleep(sk)

include/net/sock.h
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
        BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
        return &rcu_dereference_raw(sk->sk_wq)->wait;
}

We know the first call to sdp_sk_sleep(sk) finds a non-null sk->sk_wq
because we don't crash:

0xffffffffa02b6388 <sdp_rx_irq+56>:     mov    0xb8(%rsi),%rax
0xffffffffa02b638f <sdp_rx_irq+63>:     test   %rax,%rax
*** struct sock sk+0xb8 == sk->sk_wq (sk_wq is at offset 0xb8)
*** we didn't crash at sdp_rx_irq+56 so sk->sk_wq was apparently valid
0xffffffffa02b6394 <sdp_rx_irq+68>:     mov    0xb8(%rsi),%rdx
0xffffffffa02b639b <sdp_rx_irq+75>:     lea    0x8(%rdx),%rax
0xffffffffa02b639f <sdp_rx_irq+79>:     cmp    %rax,0x8(%rdx)
*** RDX is NULL causing the null dereference of address 0x8 at sdp_rx_irq+79.

Fix is to check if sk->sk_wq is NULL before dereferencing it to get the
address of sk->sk_wq->wait.  Also, do the RCU dereference of sk->sk_wq
once, not twice as we may get a different answer (NULL) the second time.
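
A minimal user-space sketch of the "load once, then test the local
copy" pattern described above. The types struct wait_q and
struct sock_like are simplified, hypothetical stand-ins, and a C11
atomic load is used here in place of the kernel's RCU dereference;
this is not the code from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct socket_wq / struct sock. */
struct wait_q { int nr_waiters; };
struct sock_like { _Atomic(struct wait_q *) wq; };

/* Load the pointer once, check it for NULL, then use only the local
 * copy so a concurrent reset to NULL cannot be observed halfway. */
static int should_wake_up(struct sock_like *sk)
{
    struct wait_q *wq = atomic_load_explicit(&sk->wq, memory_order_acquire);

    if (wq == NULL)
        return 0;
    return wq->nr_waiters > 0;
}
```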

Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: John Sobecki <john.sobecki@oracle.com>
Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
9 years agosdp: fix keepalive functionality
shamir rabinovitch [Mon, 12 May 2014 15:34:02 +0000 (08:34 -0700)]
sdp: fix keepalive functionality

sdp keepalive functionality differs a bit from tcp socket functionality.
in sdp, only an accepted or connected socket can trigger this functionality,
as the keepalive is implemented as an rdma write with zero length and this
requires an ib connection. due to this sdp behaviour you cannot set keepalive
on a listening server socket or on a non-connected client socket. apps can
use sdp in 2 ways: binary apps that use tcp sockets can use libsdp
to direct all the socket calls to sdp, and new apps can open and use sdp
sockets directly w/o the need for libsdp. when using an sdp socket directly,
please follow the rules below:

- define: AF_INET_SDP = SOL_SDP = 27
- create the socket as follows:
socket(AF_INET_SDP, SOCK_STREAM, 0)
- get the sdp socket keepalive as follows:
getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, &optlen)
- set the sdp socket keepalive as follows:
setsockopt(fd, SOL_SDP, SO_KEEPALIVE, &optval, optlen)

when you load the sdp module:
- set the keepalive time. this is the max period in sec of no data before
sdp starts to send the probes. you should take into account that more
than one probe is needed until sdp detects that the remote hca is gone.
echo <time sec> > /sys/module/ib_sdp/parameters/sdp_keepalive_time
- zero the probes counter. this counter is incremented any time sdp sends a probe.
probes are sent only if there is no tx/rx on this queue pair for the
keepalive time period.
echo 0 > /sys/module/ib_sdp/parameters/sdp_keepalive_probes_sent

on server socket:
- set keepalive only on accepted socket

on client socket:
- set keepalive only on socket after connect
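
a minimal user-space sketch of the rules above. the constants are the
values given in this commit message and are not defined by standard
headers; the helper names sdp_socket() and sdp_set_keepalive() are
hypothetical and chosen only for illustration. on a host without the
sdp module loaded, sdp_socket() is expected to fail.

```c
#include <sys/socket.h>

/* Values taken from the commit message above; they are not defined
 * by standard headers. */
#define AF_INET_SDP 27
#define SOL_SDP     27

/* Hypothetical helper: open an SDP stream socket directly, without
 * libsdp. Returns a file descriptor, or -1 on error with errno set. */
static int sdp_socket(void)
{
    return socket(AF_INET_SDP, SOCK_STREAM, 0);
}

/* Hypothetical helper: enable or disable keepalive on an SDP socket
 * that is already connected or was returned by accept().
 * Returns 0 on success, -1 on error with errno set. */
static int sdp_set_keepalive(int fd, int on)
{
    return setsockopt(fd, SOL_SDP, SO_KEEPALIVE, &on, sizeof(on));
}
```

a client would call sdp_set_keepalive() only after connect() succeeds,
and a server only on the descriptor returned by accept().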

Orabug: 18728784

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Vadim Makhervaks <vadim.makhervaks@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
9 years agoib_sdp: fix deadlock when sdp_cma_handler is called while socket is being closed
Saeed Mahameed [Sun, 17 Feb 2013 16:10:57 +0000 (18:10 +0200)]
ib_sdp: fix deadlock when sdp_cma_handler is called while socket is being closed

issue: 130280

sdp_close will grab sock_lock, and while it is closing, sdp_cma_handler can be called from cma context
under id_priv->qp_mutex; sdp_cma_handler will then wait for sock_lock to become available.
sdp_close will call rdma_disconnect, which will need to grab id_priv->qp_mutex --> deadlock!

this patch fixes the following call trace:
Call Trace:
[<ffffffff813b4476>] lock_sock_nested+0x86/0xbf
[<ffffffff81077024>] ? autoremove_wake_function+0x0/0x3d
[<ffffffffa03ae65a>] sdp_cma_handler+0xe7/0x1529 [ib_sdp]
[<ffffffffa04ca060>] ? mlx4_free_cmd_mailbox+0x31/0x35 [mlx4_core]
[<ffffffffa04ca060>] ? mlx4_free_cmd_mailbox+0x31/0x35 [mlx4_core]
[<ffffffffa04dece6>] ? __mlx4_qp_modify+0x2c6/0x2eb [mlx4_core]
[<ffffffffa01d8408>] ? rdma_port_link_layer+0x1b/0x42 [ib_core]
[<ffffffffa0234de0>] ? mlx4_ib_modify_qp+0xd22/0xd46 [mlx4_ib]
[<ffffffffa0234df2>] ? mlx4_ib_modify_qp+0xd34/0xd46 [mlx4_ib]
[<ffffffffa038e1de>] cma_qp_set_alt_path+0x2b7/0x32c [rdma_cm]
[<ffffffffa0215792>] ? ib_post_send_mad+0x440/0x50f [ib_mad]
[<ffffffffa0390425>] cma_ib_handler+0x70f/0x9fc [rdma_cm]
[<ffffffffa01dbe60>] ? ib_find_cached_pkey+0xf0/0x105 [ib_core]
[<ffffffffa02a5a07>] cm_process_work+0x53/0x9b [ib_cm]
[<ffffffffa02a7352>] cm_work_handler+0x66e/0xdcd [ib_cm]
[<ffffffffa02a6ce4>] ? cm_work_handler+0x0/0xdcd [ib_cm]
[<ffffffff81072d5e>] worker_thread+0x14d/0x1ed
[<ffffffff81077024>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff81072c11>] ? worker_thread+0x0/0x1ed
[<ffffffff81076c7b>] kthread+0x6e/0x76
[<ffffffff81012dea>] child_rip+0xa/0x20
[<ffffffff81076c0d>] ? kthread+0x0/0x76
[<ffffffff81012de0>] ? child_rip+0x0/0x20
INFO: task rdma_cm:24917 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rdma_cm D 0000000000000000 0 24917 2 0x00000000
ffff8800ce4e7d20 0000000000000046 0000000000000000 000000008104cb48
ffff8800da5e83c0 ffffffff81aae4c0 ffff8800da5e8790 000000010319de96
0000000000000400 0000000000000000 0000000000000000 ffff880107864664
Call Trace:
[<ffffffff81456870>] __mutex_lock_common+0x12f/0x1a1
[<ffffffff81456931>] __mutex_lock_slowpath+0x19/0x1b
[<ffffffff8145699a>] mutex_lock+0x23/0x3a
[<ffffffffa038d03c>] cma_sap_work_handler+0x105/0x245 [rdma_cm]
[<ffffffff810432be>] ? need_resched+0x23/0x2d
[<ffffffff814560ab>] ? thread_return+0x99/0xb0
[<ffffffffa038ee11>] ? cma_work_handler+0x0/0x94 [rdma_cm]
[<ffffffffa038cf37>] ? cma_sap_work_handler+0x0/0x245 [rdma_cm]
[<ffffffff81072d5e>] worker_thread+0x14d/0x1ed
[<ffffffff81077024>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff81072c11>] ? worker_thread+0x0/0x1ed
[<ffffffff81076c7b>] kthread+0x6e/0x76
[<ffffffff81012dea>] child_rip+0xa/0x20
[<ffffffff81076c0d>] ? kthread+0x0/0x76
[<ffffffff81012de0>] ? child_rip+0x0/0x20
INFO: task NPtcp:4326 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
NPtcp D 0000000000000000 0 4326 4319 0x00000000
ffff8800b66dfc78 0000000000000086 ffff8800d3ff03c0 0000000000000005
ffff8800b12b40c0 ffffffff81aae4c0 ffff8800b12b4490 0000000028210680
ffff8800b66dfd70 ffff8800b66dfde8 0000000000000000 ffff88010786461c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Amir Vadai <amirv@mellanox.com>
9 years agoib_sdp: add unhandled events to rdma_cm_event_str
Saeed Mahameed [Mon, 11 Feb 2013 15:04:04 +0000 (17:04 +0200)]
ib_sdp: add unhandled events to rdma_cm_event_str

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
9 years agoib_sdp/uek-rpm: configs: enable compilation for sdp
Qing Huang [Wed, 23 Sep 2015 00:30:07 +0000 (17:30 -0700)]
ib_sdp/uek-rpm: configs: enable compilation for sdp

Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years agoMerge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed
Mukesh Kacker [Tue, 6 Oct 2015 12:13:39 +0000 (05:13 -0700)]
Merge branch 'topic/uek-4.1/ofed.sdp' into topic/uek-4.1/ofed

* topic/uek-4.1/ofed.sdp: (408 commits)
  ib_sdp: porting sdp from uek2 to uek-4.1
  ib_sdp: remove APM code
  sdp: Kconfig and Makefile changes
  sdp: port the code to uek2
  sdp: added debug print for the event: RDMA_CM_EVENT_ALT_PATH_LOADED
  sdp: prepare support to kernel 2.6.39-200.1.1.el5uek: add macro to get sk_sleep
  sdp: add support to kernel 2.6.39-200.1.1.el5uek
  sdp: add [rt]x_bytes counters to sdpstats
  sdp: Fix Bug 114242 - Multi connection net_perf causes server to hang
  FMR: remove FMR failure messages
  sdp: make sdp memory leak print a debug
  sdp: changed memory accounting warning into debug
  sdp: Fix issues in sdpprf
  sdp: Remove protection before sleep on RX
  sdp: Enable automatic path migration support also in the passive side of the connection.
  sdp: Fixed some coverity issues
  Flatten the entire tree fixes
  sdp: Fixed compilation error on 2.6.18 RH5.5
  sdp: fix memory leak. sockets_allocated wasn't freed
  sdp: Removed spaces and tabs at end of lines
  ...

9 years agoib_sdp: porting sdp from uek2 to uek-4.1
Qing Huang [Fri, 25 Sep 2015 22:45:36 +0000 (15:45 -0700)]
ib_sdp: porting sdp from uek2 to uek-4.1

Perf result:

[root@ca-ibdev10 src]# ./iperf -c 192.168.220.117 -P 8 -p 6001 -l 4k -t 300 -w 16m
------------------------------------------------------------
Client connecting to 192.168.220.117, TCP port 6001
TCP window size: 2.10 MByte (WARNING: requested 16.0 MByte)
------------------------------------------------------------
[  7] local 192.168.220.110 port 28275 connected with 192.168.220.117 port 6001
[  9] local 192.168.220.110 port 16839 connected with 192.168.220.117 port 6001
[  4] local 192.168.220.110 port 13889 connected with 192.168.220.117 port 6001
[ 12] local 192.168.220.110 port 22535 connected with 192.168.220.117 port 6001
[ 13] local 192.168.220.110 port 33341 connected with 192.168.220.117 port 6001
[  6] local 192.168.220.110 port 9645 connected with 192.168.220.117 port 6001
[ 16] local 192.168.220.110 port 54091 connected with 192.168.220.117 port 6001
[ 19] local 192.168.220.110 port 22246 connected with 192.168.220.117 port 6001
[ ID] Interval       Transfer     Bandwidth
[  7]  0.0-300.0 sec   126 GBytes  3.59 Gbits/sec
[  9]  0.0-300.0 sec   126 GBytes  3.60 Gbits/sec
[  4]  0.0-300.0 sec   126 GBytes  3.60 Gbits/sec
[ 12]  0.0-300.0 sec   126 GBytes  3.59 Gbits/sec
[ 13]  0.0-300.0 sec   126 GBytes  3.60 Gbits/sec
[  6]  0.0-300.0 sec   126 GBytes  3.60 Gbits/sec
[ 16]  0.0-300.0 sec   126 GBytes  3.60 Gbits/sec
[ 19]  0.0-300.0 sec   126 GBytes  3.59 Gbits/sec
[SUM]  0.0-300.0 sec  1004 GBytes  28.8 Gbits/sec

Signed-off-by: Qing Huang <qing.huang@oracle.com>
9 years agoib_sdp: remove APM code
Qing Huang [Wed, 21 Jan 2015 23:17:06 +0000 (15:17 -0800)]
ib_sdp: remove APM code

In UEK4, APM support code in rdma_cm has been removed,
so we had to remove all the references to APM APIs in
SDP as well.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
9 years agosdp: Kconfig and Makefile changes
Ajaykumar Hotchandani [Thu, 22 Jan 2015 03:18:44 +0000 (19:18 -0800)]
sdp: Kconfig and Makefile changes

Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
9 years agosdp: port the code to uek2
Dotan Barak [Tue, 3 Jul 2012 10:09:47 +0000 (13:09 +0300)]
sdp: port the code to uek2

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
9 years agosdp: added debug print for the event: RDMA_CM_EVENT_ALT_PATH_LOADED
Dotan Barak [Sun, 14 Oct 2012 09:24:48 +0000 (11:24 +0200)]
sdp: added debug print for the event: RDMA_CM_EVENT_ALT_PATH_LOADED

Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
9 years agosdp: prepare support to kernel 2.6.39-200.1.1.el5uek: add macro to get sk_sleep
Dotan Barak [Tue, 3 Jul 2012 07:35:01 +0000 (10:35 +0300)]
sdp: prepare support to kernel 2.6.39-200.1.1.el5uek: add macro to get sk_sleep

This will ease the porting to the new kernel.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
9 years agosdp: add support to kernel 2.6.39-200.1.1.el5uek
Dotan Barak [Tue, 3 Jul 2012 06:45:10 +0000 (09:45 +0300)]
sdp: add support to kernel 2.6.39-200.1.1.el5uek

Rename the macros inet_daddr and inet_rcv_saddr, since they already exist in
this kernel.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
9 years agosdp: add [rt]x_bytes counters to sdpstats
Amir Vadai [Tue, 3 Apr 2012 10:52:45 +0000 (13:52 +0300)]
sdp: add [rt]x_bytes counters to sdpstats

These counters show how many bytes were actually received/transmitted using SDP sockets.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
9 years agosdp: Fix Bug 114242 - Multi connection net_perf causes server to hang
Moran Perets [Mon, 26 Sep 2011 08:16:49 +0000 (11:16 +0300)]
sdp: Fix Bug 114242 - Multi connection net_perf causes server to hang

    Fix the soft lockup bug by changing the allocation flag in
    sdp_bcopy.c from 0 to gfp.

    Reviewed by: Amir Vadai

Signed-off-by: Moran Perets <moranp@mellanox.co.il>
9 years agoFMR: remove FMR failure messages
Eli Cohen [Mon, 23 May 2011 09:15:30 +0000 (12:15 +0300)]
FMR: remove FMR failure messages

Remove the messages per Oracle's request. FMR is not yet supported in VMs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
9 years agosdp: make sdp memory leak print a debug
Amir Vadai [Wed, 4 May 2011 06:30:24 +0000 (09:30 +0300)]
sdp: make sdp memory leak print a debug

Since this is probably an accounting error and not a real memory leak, it
should be a debug print only for now.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: changed memory accounting warning into debug
Amir Vadai [Thu, 21 Apr 2011 10:55:55 +0000 (13:55 +0300)]
sdp: changed memory accounting warning into debug

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Fix issues in sdpprf
Amir Vadai [Thu, 14 Apr 2011 11:27:43 +0000 (14:27 +0300)]
sdp: Fix issues in sdpprf

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Remove protection before sleep on RX
Amir Vadai [Tue, 12 Apr 2011 14:40:07 +0000 (17:40 +0300)]
sdp: Remove protection before sleep on RX

This protection is no longer needed; it was only required because of a bug
that has since been fixed.
Credit update could be sent even when credit reaches 2.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Enable automatic path migration support also in the passive side of the connection.
Moni Shoua [Tue, 5 Apr 2011 10:44:59 +0000 (13:44 +0300)]
sdp: Enable automatic path migration support also in the passive side of the connection.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Fixed some coverity issues
Amir Vadai [Tue, 5 Apr 2011 07:58:56 +0000 (10:58 +0300)]
sdp: Fixed some coverity issues

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agoFlatten the entire tree fixes
Eli Cohen [Sun, 3 Apr 2011 07:07:44 +0000 (10:07 +0300)]
Flatten the entire tree fixes

From now on we will avoid using patches to commit changes to the driver.
Instead, we will push directly to the source files. Backports are still
maintained but only for 2.6.18-EL5.5; backports of 2.6.32 are completely
removed.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
9 years agosdp: Fixed compilation error on 2.6.18 RH5.5
Amir Vadai [Thu, 24 Mar 2011 08:46:06 +0000 (10:46 +0200)]
sdp: Fixed compilation error on 2.6.18 RH5.5

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix memory leak. sockets_allocated wasn't freed
Amir Vadai [Tue, 22 Mar 2011 14:20:34 +0000 (16:20 +0200)]
sdp: fix memory leak. sockets_allocated wasn't freed

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Removed spaces and tabs at end of lines
Amir Vadai [Sun, 20 Mar 2011 16:37:16 +0000 (18:37 +0200)]
sdp: Removed spaces and tabs at end of lines

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix sdpprf
Amir Vadai [Sun, 20 Mar 2011 16:32:23 +0000 (18:32 +0200)]
sdp: fix sdpprf

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Fixed bcopy statistics
Amir Vadai [Sun, 20 Mar 2011 15:25:02 +0000 (17:25 +0200)]
sdp: Fixed bcopy statistics

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Bad behaviour when setting low rcvbuf size
Amir Vadai [Sun, 20 Mar 2011 13:34:33 +0000 (15:34 +0200)]
sdp: Bad behaviour when setting low rcvbuf size

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Fixed a typo
Amir Vadai [Wed, 9 Mar 2011 08:35:15 +0000 (10:35 +0200)]
sdp: Fixed a typo

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Limit total memory consumed by rcvbuf
Amir Vadai [Tue, 8 Mar 2011 08:25:35 +0000 (10:25 +0200)]
sdp: Limit total memory consumed by rcvbuf

rcvbuf is already limited by the payload in the queue, but its total memory
consumption also needs to be limited, since small received packets might carry
a very large overhead relative to their payload.
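
A minimal sketch of the idea, not the actual SDP code: the receive path checks
two limits, one on queued payload and one on the total memory backing it. The
struct, field and function names below are hypothetical and only illustrate
why a payload-only limit is not enough for small packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket RX accounting; names are illustrative only. */
struct rx_account {
    size_t payload_queued; /* payload bytes waiting in the RX queue */
    size_t mem_queued;     /* total bytes allocated for those buffers */
    size_t rcvbuf;         /* limit on queued payload */
    size_t mem_limit;      /* limit on total memory consumed */
};

/*
 * Queue one received buffer. "truesize" is the memory the buffer really
 * occupies, which can be far larger than its payload for small packets.
 * Returns false (and changes nothing) if either limit would be exceeded.
 */
static bool rx_queue(struct rx_account *a, size_t payload, size_t truesize)
{
    assert(truesize >= payload);

    if (a->payload_queued + payload > a->rcvbuf)
        return false;
    if (a->mem_queued + truesize > a->mem_limit)
        return false;

    a->payload_queued += payload;
    a->mem_queued += truesize;
    return true;
}
```

With 64-byte payloads held in 4096-byte buffers, the memory limit is reached
long before the payload limit, which is the case the payload-only check
misses.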

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix "sdpprf empty after a long run"
Amir Vadai [Tue, 8 Mar 2011 08:23:24 +0000 (10:23 +0200)]
sdp: fix "sdpprf empty after a long run"

sdpprf_log_count gets to a negative value after a long run.
This is only a quick fix; logs might still be lost sometimes.
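
For illustration only, and not necessarily what this commit does: one common
way to keep a long-running log counter from going negative is to make it
unsigned and mask it into a power-of-two ring. The sketch below assumes the
negative value comes from a signed counter overflowing; LOG_SIZE, struct
prf_log and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_SIZE 8u /* hypothetical ring size; must be a power of two */

struct prf_log {
    uint32_t head;          /* next write position; wraps modulo 2^32 */
    uint32_t used;          /* valid entries, saturates at LOG_SIZE */
    int entries[LOG_SIZE];
};

static void prf_log_init(struct prf_log *l)
{
    /* Masking with LOG_SIZE - 1 only works for powers of two. */
    assert((LOG_SIZE & (LOG_SIZE - 1u)) == 0);
    memset(l, 0, sizeof(*l));
}

static void prf_log_add(struct prf_log *l, int value)
{
    l->entries[l->head & (LOG_SIZE - 1u)] = value;
    l->head++; /* unsigned wraparound is well defined in C */
    if (l->used < LOG_SIZE)
        l->used++;
}
```

Because LOG_SIZE divides 2^32, the write position stays continuous when head
wraps, so the ring never appears empty after a long run; old entries are
still overwritten once the ring is full.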

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: make SDP_RX_SIZE a module parameter
Amir Vadai [Mon, 7 Mar 2011 11:11:59 +0000 (13:11 +0200)]
sdp: make SDP_RX_SIZE a module parameter

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: Rollback credit limit during ZCopy transaction.
Amir Vadai [Sun, 6 Mar 2011 12:38:08 +0000 (14:38 +0200)]
sdp: Rollback credit limit during ZCopy transaction.

This limit was added in commit 2574b53 ("Abort rx SrcAvail when out of
credits")

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: get per socket memory statistics at socket's sysfs file
Amir Vadai [Sun, 6 Mar 2011 13:09:24 +0000 (15:09 +0200)]
sdp: get per socket memory statistics at socket's sysfs file

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix a hole in rx memory limit
Amir Vadai [Sun, 6 Mar 2011 12:33:00 +0000 (14:33 +0200)]
sdp: fix a hole in rx memory limit

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: make sure memory is reclaimed
Amir Vadai [Sun, 6 Mar 2011 12:31:27 +0000 (14:31 +0200)]
sdp: make sure memory is reclaimed

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: send packets without payload when credits=1
Amir Vadai [Sun, 6 Mar 2011 12:27:57 +0000 (14:27 +0200)]
sdp: send packets without payload when credits=1

This is according to the spec and prevents a deadlock in ZCopy: SrcAvailCancel
wasn't acked when credits got low.
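
A minimal sketch of the rule described above, under the assumption that data
packets need at least two credits while the last credit may only be spent on
a packet without payload. The constant and function names are hypothetical
and are not taken from the SDP sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold: below this many credits, payload must not be sent. */
enum { EXAMPLE_MIN_DATA_CREDITS = 2 };

/* May a packet be sent with the given number of TX credits? */
static bool may_send(unsigned credits, bool has_payload)
{
    if (credits == 0)
        return false;
    if (has_payload)
        return credits >= EXAMPLE_MIN_DATA_CREDITS;
    return true; /* the last credit is usable by payload-free packets */
}

/* Consume one credit if the packet is allowed to go out. */
static bool send_one(unsigned *credits, bool has_payload)
{
    assert(credits != NULL);
    if (!may_send(*credits, has_payload))
        return false;
    (*credits)--;
    return true;
}
```

Keeping the last credit for payload-free packets means a control message can
still reach the peer when credits are low, so the peer is not left waiting
for an answer that can never be sent.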

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: removed some prints to sdpprf
Amir Vadai [Sun, 6 Mar 2011 12:21:14 +0000 (14:21 +0200)]
sdp: removed some prints to sdpprf

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: remove unused rcvbuf_scale module parameter
Amir Vadai [Sun, 6 Mar 2011 12:19:26 +0000 (14:19 +0200)]
sdp: remove unused rcvbuf_scale module parameter

It was used by the slow start mechanism and is not needed until slow start
is reintroduced.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix memory socket accounting
Amir Vadai [Mon, 28 Feb 2011 09:57:18 +0000 (11:57 +0200)]
sdp: fix memory socket accounting

skb->truesize - total bytes allocated by skb, including fragments

Specific socket accounting:
* sk->sk_wmem_queued - send bytes currently in TX queue
* RX queue accounting is done by using seq
* sk->sk_rmem_alloc - bytes consumed by RX

Protocol accounting:
* sk->sk_forward_alloc - bytes that are available to be consumed
* prot->memory_allocated - bytes consumed by TX/RX
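
The relationship between these counters can be sketched with a toy model.
This is not kernel code: the toy_* types and functions and the MEM_QUANTUM
granularity are assumptions made for the example, and only the field names
mirror the ones listed above.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_QUANTUM 4096u /* hypothetical reservation granularity */

struct toy_proto {
    size_t memory_allocated; /* bytes reserved by all sockets, TX and RX */
};

struct toy_sock {
    struct toy_proto *prot;
    size_t wmem_queued;   /* bytes currently in the TX queue */
    size_t rmem_alloc;    /* bytes consumed by RX */
    size_t forward_alloc; /* reserved bytes still available to consume */
};

/* Reserve whole quanta from the protocol until truesize bytes fit. */
static void toy_reserve(struct toy_sock *sk, size_t truesize)
{
    while (sk->forward_alloc < truesize) {
        sk->forward_alloc += MEM_QUANTUM;
        sk->prot->memory_allocated += MEM_QUANTUM;
    }
}

static void toy_charge_rx(struct toy_sock *sk, size_t truesize)
{
    toy_reserve(sk, truesize);
    sk->forward_alloc -= truesize;
    sk->rmem_alloc += truesize;
}

static void toy_charge_tx(struct toy_sock *sk, size_t truesize)
{
    toy_reserve(sk, truesize);
    sk->forward_alloc -= truesize;
    sk->wmem_queued += truesize;
}

static void toy_uncharge_rx(struct toy_sock *sk, size_t truesize)
{
    assert(sk->rmem_alloc >= truesize);
    sk->rmem_alloc -= truesize;
    sk->forward_alloc += truesize;
}

static void toy_uncharge_tx(struct toy_sock *sk, size_t truesize)
{
    assert(sk->wmem_queued >= truesize);
    sk->wmem_queued -= truesize;
    sk->forward_alloc += truesize;
}

/* Hand unused whole quanta back to the protocol-wide counter. */
static void toy_reclaim(struct toy_sock *sk)
{
    size_t whole = (sk->forward_alloc / MEM_QUANTUM) * MEM_QUANTUM;

    sk->forward_alloc -= whole;
    sk->prot->memory_allocated -= whole;
}
```

In this model, for a single socket, memory_allocated always equals
wmem_queued + rmem_alloc + forward_alloc, so an accounting bug shows up as a
mismatch between the protocol counter and the per-socket counters.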

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
9 years agosdp: fix sdp_sendmsg counters in sdpstats
Amir Vadai [Mon, 28 Feb 2011 09:33:58 +0000 (11:33 +0200)]
sdp: fix sdp_sendmsg counters in sdpstats

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>