ib_sdp/cma: readd SDP support to cma_save_net_info
Upstream has removed SDP support from cma.c. Some applications may
not display addr/port information correctly without this change to
cma_save_net_info() function.
There are many scenarios (when using SR-IOV) in which the connection setup
tunneled from the VM through Dom0 fails, for example because the peer
silently drops the connection requests when its listener queue is full, or
sends back a reject because there is no listener. In these cases the
id_map_entry is not freed on the Dom0, causing a memory leak.
During testing we encountered a softlockup with multiple CPUs stuck
waiting for sriov->id_map_lock while id_map_ent_timeout was holding it for
a long time. Changing spin_lock(&sriov->id_map_lock) in id_map_ent_timeout
to spin_lock_irqsave resolved this hang.
ib/core: init shared-pd ref count to 1, and add cleanup
When shpd is created it is already referred to by parent 'pd',
so shpd->shared should be '1' initially (and not '0');
otherwise, the 'shpd' memory may get freed/reallocated
while it is still being referred to by one last pd.
Additionally, add shared-pd cleanup to ucontext cleanup flow.
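The reference-count rule described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the struct
and helper names (shpd_sketch, shpd_create, shpd_get, shpd_put) are
hypothetical, and a plain int stands in for whatever counter type the
driver really uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the shared PD object; not the driver's struct. */
struct shpd_sketch {
    int shared;   /* number of pd objects currently referring to this shpd */
};

/* Create the shared object already referenced by its parent pd. */
static struct shpd_sketch *shpd_create(void)
{
    struct shpd_sketch *s = malloc(sizeof(*s));

    if (s)
        s->shared = 1;   /* the parent pd holds the first reference */
    return s;
}

/* Another pd starts sharing the object. */
static void shpd_get(struct shpd_sketch *s)
{
    s->shared++;
}

/* Drop one reference; free on the last one. Returns 1 if the object was freed. */
static int shpd_put(struct shpd_sketch *s)
{
    if (--s->shared == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

Starting the count at 0 instead would let the first shpd_put() from a
sharing pd free the object while the parent pd still points at it.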
Yuval Shaia [Sun, 16 Aug 2015 04:00:45 +0000 (21:00 -0700)]
net/mlx4_vnic: Initialize new fields of mlx4_ib_qp
Initializing the three new mlx4_ib_qp's fields - qps_list, cq_recv_list
and cq_send_list.
Without initializing these new fields, the kernel crashed in destroy_qp_common
when trying to remove them from the list.
The functions get_cqs, mlx4_ib_lock_cqs and mlx4_ib_unlock_cqs were moved as
inline functions to mlx4_ib.h so they can also be called from mlx4_vnic.
Yuval Shaia [Sun, 1 Feb 2015 00:53:26 +0000 (16:53 -0800)]
mlx4_vnic: Skip fip discover restart if pkey index not changed
The driver receives a MAD on any change made to the partition table.
This fix aims to cover the case where the driver shouldn't restart the net
interface when receiving a PKEY_CHANGE event but the pkey index was not changed.
Yuval Shaia [Tue, 16 Jun 2015 07:32:36 +0000 (00:32 -0700)]
IB/ipoib: CSUM support in connected mode
This enhancement suggests the usage of IB CRC instead of CSUM in IPoIB CM.
IPoIB CM uses RC (Reliable Connection) which guarantees the corruption free
delivery of the packet.
InfiniBand uses 32b CRC which provides stronger data integrity protection
compared to 16b IP Checksum. So, there is no added value that IP/TCP Checksum
provides in the IB world.
The proposal is to tell the network stack that IPoIB-CM supports IP Checksum
offload. This enables the kernel to save the time of checksum calculation
for IPoIB CM packets. The network stack sends the IP packet without adding
the IP Checksum to the header. On the receive side, the IPoIB driver again
tells the network stack that the IP Checksum is good for the incoming packets,
and the network stack avoids the IP Checksum calculations.
During connection establishment the driver determines whether the peer
supports IB CRC as checksum. This is done so the driver will be able to
calculate the checksum before transmitting the packet in case the peer does
not support this feature.
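For reference, the software fallback mentioned above amounts to the
standard 16-bit one's-complement Internet checksum (RFC 1071). The sketch
below is a user-space illustration, not the driver code; inet_csum and
tx_needs_sw_csum are hypothetical names, and the "peer supports IB CRC"
flag is assumed to come from connection establishment.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum over a byte buffer (RFC 1071). */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];   /* big-endian 16-bit word */
        data += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)data[0] << 8;               /* pad a trailing odd byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);          /* fold the carries back in */
    return (uint16_t)~sum;
}

/* Hypothetical TX-side decision: checksum in software only for old peers. */
static int tx_needs_sw_csum(int peer_supports_ib_crc)
{
    return !peer_supports_ib_crc;
}
```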
IB/ipoib: Scatter-Gather support in connected mode
By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better performance.
This MTU plus overhead puts the memory allocation for IP based packets at 32
4k pages (order 5), which have to be contiguous.
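The arithmetic behind "order 5" can be checked with a small sketch. It
assumes 4k pages and power-of-two allocation sizes, as in the text above;
order_for_size is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, matching the text */

/* Smallest order such that (page size << order) covers the requested size. */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

A 64k buffer alone fits in order 4 (16 pages); any overhead on top of it
spills into order 5, i.e. 32 contiguous pages or 128k.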
When the system memory is under pressure, it was observed that allocating 128k
of contiguous physical memory is difficult and causes serious errors (such as
the system becoming unusable).
This enhancement resolves the issue by removing the physically contiguous memory
requirement, using the Scatter/Gather feature that exists in the Linux stack.
With this fix Scatter-Gather will also be supported in connected mode.
This change also reverts the change made in commit e112373
("IPoIB/cm: Reduce connected mode TX object size").
ib_uverbs: Support for kernel implementation of XRC calls from user space
Extends the kernel/user space interface for work requests to also provide
the XRC shared receive queue number. Necessary to support
kernel level implementation of user verbs for XRC.
Requires a corresponding libibverbs change to support XRC.
ib_{uverbs/core}: add new ib_create_qp_ex with udata arg
Necessary to get device specific arguments through to XRC QPs.
Added new local header file to serve as support interface
between ib_core and ib_uverbs.
Right now there is a lot of duplicate setup code in uverbs_cmd.c
on the ib_uverbs side and verbs.c on the ib_core side. This commit
is a quick fix to have XRC support working, but similar calls
can be added to consolidate the code for other parts of the API.
ib_uverbs: Avoid vendor specific masking of attributes in query_qp
This commit removes the implementation and use of the modify_qp_mask
helper function from the generic OFED implementation and into individual
device drivers.
Like with use of the ib_modify_qp_is_ok function it should be up to
each device driver how to handle bits set in the attribute masks.
With the modify_qp_mask function applied in the generic code,
drivers would not see the bits that the user process actually sets.
The restrictions imposed by the filter are also beyond what
is imposed by the Infiniband standard, and would also limit
future drivers or hardware from checking for unsupported or
invalid settings.
ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp
The ib_uverbs_reg_mr_resp structure was not 64 bit end aligned
as required by the protocol. This causes alignment issues
if a device specific driver needs to transfer extra response
arguments.
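The alignment issue can be illustrated with a simplified layout. The field
names below are assumptions made for the sketch (three 32-bit fields
followed by padding); the real structure definition should be taken from
the uverbs user/kernel header.

```c
#include <stdint.h>

/* Hypothetical response layout: three 32-bit fields, 12 bytes in total. */
struct resp_unpadded {
    uint32_t mr_handle;
    uint32_t lkey;
    uint32_t rkey;
};

/* Same fields plus explicit padding, so the size is a multiple of 8 bytes. */
struct resp_padded {
    uint32_t mr_handle;
    uint32_t lkey;
    uint32_t rkey;
    uint32_t reserved;   /* padding to end on a 64-bit boundary */
};

/* Catch a regression at build time rather than at run time. */
_Static_assert(sizeof(struct resp_padded) % 8 == 0,
               "response must end on a 64-bit boundary");
```

With the unpadded form, any driver specific data appended after the
response would start at offset 12 and be misaligned for 64-bit fields.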
Most of the ib device driver entry points support optional
device specific parameter transfer between user space and kernel space
via the udata argument - add a similar argument for ib_create_ah.
Update all infiniband drivers to include this argument
in their driver entry point implementation.
ib_umem: Add a new, more generic ib_umem_get_attrs
This call allows a full range of DMA attributes and also
DMA direction to be supplied and is just a refactor of the old ib_umem_get.
Reimplement ib_umem_get using the new generic call,
now a trivial implementation.
Dag Moxnes [Tue, 21 Apr 2015 10:20:02 +0000 (12:20 +0200)]
ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
The process_mad function is an optional IB driver entry point that
allows a driver to intercept or modify MAD traffic.
This fix allows MAD traffic to flow down to the device also
when MAD traffic is completely handled by the device and
no process_mad function is provided.
Mukesh Kacker [Tue, 17 Mar 2015 01:11:27 +0000 (18:11 -0700)]
mlx4_core: More support for automatically scaling profile parameters
Add a new module parameter, "scale_profile",
which allows dynamic scaling of parameters. When it is not set,
the Mellanox default behavior will prevail.
The dynamically configured parameters are typically set to 0 in
configuration - but if they are set to a specific value, a warning
is printed that they are not being dynamically scaled. (This allows
for exceptions and experiments with different values).
The original dynamic scaling of profile parameter num_mtt_segs
(governed by log_num_mtt) is retained. In addition scaling is
also introduced for parameter num_qp (governed by log_num_qp).
This is not a direct port but similar in spirit to fixes done
in UEK2 with following commits:
52ac96 OFED: Automatically size MTT in mlx4_core
47678c mlx4_core: increase default number of qps in mlx4_core driver
218561 mlx_core: Change log_num_mtt scaling range
497dd4 mlx4_core: change default for mlx4_scale_profile
An error message improvement is borrowed from Mellanox OFED 2.4 commit
17465c net/mlx4: add explicit message if user ask too few QPs
(Code for this commit is already upstream but the error message is less
explicit upstream!)
Mukesh Kacker [Thu, 22 Jan 2015 19:14:02 +0000 (11:14 -0800)]
ipoib: rfe- enable pkey and device name decoupling
The sysfs "create_child" interface creates
pkey based child interface but derives the
name from parent device name and pkey value.
This makes administration difficult where pkey
values can change but policies encoded with
device names do not.
We add ability to create a child interface with
a user specified name and a specified pkey
with a new sysfs "create_named_child" interface
(and also add a corresponding "delete_named_child"
interface).
We also add a new module api interface to query
pkey from a netdevice so any kernel users of
pkey based child interfaces can query it - since
with device name decoupled from pkey, it can no
longer be deduced from parsing the device name
by other kernel users.
Qing Huang [Mon, 26 Jan 2015 06:17:09 +0000 (22:17 -0800)]
ib_sdp: adding sdp socket support to rdma_cm
SDP related code was completely removed from upstream after
these two commits:
fbaa1a6, Sean Hefty, RDMA/cma: Merge cma_get/save_net_info
01602f1, Sean Hefty, RDMA/cma: Remove unused SDP related code
When adding the SDP support code back, to better organize
changes, we created the following separate new files for the
code: cma_priv.h, cma_sdp.c and cma_sdp_priv.h
Ashish Samant [Tue, 7 Oct 2014 18:21:35 +0000 (11:21 -0700)]
mlx4_vnic: Add correct typecasting to pointers in vnic_get_frag_header()
The *mac_hdr (Mac Header) pointer should be incremented ETH_HLEN
bytes to get the *ip_hdr (IP Header) pointer. Similarly, the IP
Header pointer should be incremented by (iph->ihl << 2) bytes
to get the *tcpudp_hdr (Transport Header) pointer.
Fix this by adding a u8* cast to the two pointers while doing
the pointer arithmetic.
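Why the cast matters: adding N to a typed pointer advances it by N
elements of that type, not N bytes. The sketch below shows the byte-wise
arithmetic on a plain buffer; the struct and helper names are hypothetical
and uint8_t plays the role of the kernel's u8.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14   /* Ethernet header length in bytes */

/* Minimal stand-in for an IPv4 header: low nibble of the first byte is ihl. */
struct iphdr_sketch {
    uint8_t ver_ihl;
    uint8_t rest[19];
};

/* Cast to a byte pointer so "+ 14" means 14 bytes, not 14 elements. */
static void *ip_header(void *mac_hdr)
{
    return (uint8_t *)mac_hdr + SKETCH_ETH_HLEN;
}

/* ihl counts 32-bit words, so shift by 2 to get the header length in bytes. */
static void *transport_header(void *ip_hdr)
{
    const struct iphdr_sketch *iph = ip_hdr;
    unsigned int ihl = iph->ver_ihl & 0x0f;

    return (uint8_t *)ip_hdr + (ihl << 2);
}
```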
Vu Pham [Fri, 18 May 2012 22:01:29 +0000 (15:01 -0700)]
mlx4_core: supporting 64b counters
Support 64b counters using PMA_COUNTERS_EXT mad:
. Sending the mad to fw for IB transport using MAD_IFC
. Sending mailbox command QUERY_IF_STAT to fw for EN transport
Note: Ported from Mellanox OFED 2.4.
64-bit counters can wrap around. 32-bit counters saturate at UINT_MAX
(as in upstream code but unlike in Mellanox OFED 2.4 code where they
can wrap around!)
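The difference between saturating and wrapping can be shown with two
hypothetical helpers that expose a 64-bit count through a 32-bit field.
This is an illustration of the behavior described above, not the driver's
counter code.

```c
#include <stdint.h>

/* Clamp at UINT32_MAX once the underlying count no longer fits. */
static uint32_t counter32_saturate(uint64_t value)
{
    return value > UINT32_MAX ? UINT32_MAX : (uint32_t)value;
}

/* Plain truncation: the 32-bit view wraps around to small values again. */
static uint32_t counter32_wrap(uint64_t value)
{
    return (uint32_t)value;
}
```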
Vu Pham [Fri, 18 May 2012 21:35:54 +0000 (14:35 -0700)]
ib_core: supporting 64b counters using PMA_COUNTERS_EXT mad
Support 64b counters using PMA_COUNTERS_EXT mad
. create "counters_ext" group in sysfs
. form mad management class IB_MGMT_CLASS_PERF_MGMT, attribute_id IB_PMA_PORT_COUNTERS_EXT
net/mlx4: When issuing commands use rwsem instead of rw spinlocks
The mlx4 drivers use a read_lock while issuing commands, but
when a lot of commands are issued simultaneously, the mlx4
driver could sleep. In order to resolve this "sleep while holding
spin-lock" issue, we replace this spinlock with a read-write
semaphore.
Fixes: 2393fac27a97 ('net/mlx4: Switching between sending commands via polling and events may results in hung tasks')
Signed-off-by: Matan Barak <matanb@mellanox.com>
(Ported from Mellanox OFED 2.4)
Moshe Lazer [Tue, 5 Aug 2014 15:16:46 +0000 (18:16 +0300)]
IB/mlx4: Mark user mr as writable if actual virtual memory is writable
To allow rereg mr (from read only mr to writable mr) without using
get_user_pages again, we need to define the initial mr as writable.
We shouldn't do this in case the user virtual memory is not
writable (e.g. const memory).
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Thu, 10 Jul 2014 09:29:14 +0000 (12:29 +0300)]
mlx4_ib: Fix endianness in blueflame post_send.
qp object field doorbell_qpn was initialized using swab()
at qp creation.
swab() unconditionally swaps dword endianness. Thus, on
little-endian platforms the endianness of doorbell_qpn was
big endian; on big-endian platforms, doorbell_qpn is little-endian.
In post send blueflame, doorbell_qpn was taken as is (i.e., the
driver assumed that it was in big-endian format). This was OK
for little-endian hosts, but incorrect for big-endian hosts.
The fix is to use cpu_to_be32 when initializing doorbell_qpn (thus
guaranteeing that doorbell_qpn is in big-endian format on all
host types). This also requires modifying non-bf sends to
use __raw_writel (which does not do any endianness swapping)
instead of writel (which does endianness swapping on big-endian hosts).
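The distinction between an unconditional swap and a conversion to
big-endian can be sketched in user space. Here htonl() from POSIX is used
as an analogue of the kernel's cpu_to_be32(), and swap32 as an analogue of
swab32(); both helper functions below are written for the illustration.

```c
#include <arpa/inet.h>   /* htonl(): user-space analogue of cpu_to_be32() */
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap: the result depends on the host byte order. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 1 if the in-memory bytes of v are the big-endian encoding of expect. */
static int is_big_endian_encoding(uint32_t v, uint32_t expect)
{
    uint8_t b[4];

    memcpy(b, &v, sizeof(b));
    return b[0] == (uint8_t)(expect >> 24) &&
           b[1] == (uint8_t)(expect >> 16) &&
           b[2] == (uint8_t)(expect >> 8) &&
           b[3] == (uint8_t)expect;
}
```

htonl(x) passes is_big_endian_encoding() on every host, while swap32(x)
passes it only on little-endian hosts, which is the bug described above.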
The fix was developed by Shamir Rabinovitch of Oracle.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
net/mlx4: Switching between sending commands via polling and events may results in hung tasks
When switching between those methods of sending commands, it's
possible that a task will keep waiting for the polling semaphore,
but may never be able to acquire it.
This is due to mlx4_cmd_use_events which "down"s the
semaphore back to 0.
Reproducing it involves sending commands while changing
between mlx4_cmd_use_polling and mlx4_cmd_use_events.
Signed-off-by: Matan Barak <matanb@mellanox.com>
(Ported from Mellanox OFED 2.4)
Eli Cohen [Thu, 3 Jul 2014 13:39:00 +0000 (16:39 +0300)]
IB/mlx4: Put non zero value in max_ah
We put INT_MAX since this is the max value an int can hold.
Though hardware capability is unlimited, this is practically
a large enough number so we can use it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
(Ported from Mellanox OFED 2.4)
Majd Dibbiny [Mon, 16 Jun 2014 07:18:41 +0000 (10:18 +0300)]
IB/core: add debugging prints to explain -EINVAL in ib_uverbs_reg_mr
Understanding why -EINVAL is returned from uverbs is difficult
as there are multiple code paths that can cause the value to be
returned. This patch adds some explanations as pr_debug prints.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
(Ported from Mellanox OFED 2.4)
We found that in some cases the kernel sends an skb where the
gso field was damaged, and the size in that field was bigger
than the physical mtu. When the HW gets such a size it flushes
the qp to error state and all traffic on that interface is
disabled.
In order to avoid such a case, I added a check on that field
prior to the ib_send.
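A minimal sketch of such a pre-send check, assuming the comparison is
simply "gso size must not exceed the physical mtu" as described above.
The function name and the drop counter are hypothetical.

```c
/*
 * Hypothetical pre-send validation: returns 1 if the packet may be posted,
 * 0 if it must be dropped because its gso size exceeds the physical mtu.
 */
static int gso_size_ok(unsigned int gso_size, unsigned int phys_mtu,
                       unsigned long *dropped)
{
    if (gso_size > phys_mtu) {
        (*dropped)++;   /* account for the drop instead of posting the send */
        return 0;
    }
    return 1;
}
```

Dropping one damaged packet is much cheaper than letting the HW move the
qp to error state and stall the whole interface.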
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
(Ported from Mellanox OFED 2.4)
Hadar Hen Zion [Tue, 11 Mar 2014 15:44:51 +0000 (17:44 +0200)]
mlx4_core: Fix resource tracker memory leak after Reset Flow
In case of a non-responsive device, mlx4_ACCESS_MEM fails and the
driver can't read the qp_detach mailbox, which includes all the
rule information.
Since the driver doesn't get the rule attributes from the
qp_detach mailbox, the master fails to detach its rules from
the resource tracker during the driver unload sequence when the
device is in internal_error state.
Calling rem_slave_qp will remove those rules and the qps they are
attached to unconditionally.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
(Ported from Mellanox OFED 2.4)
Moni Shoua [Mon, 3 Feb 2014 13:39:15 +0000 (15:39 +0200)]
IB/core: Fix QP attr mask when resolving smac
When rdma_accept() is called, rdma_cm modifies the QP to RTR.
During this stage the source mac needs to be resolved and put
in the QP attr before modifying the QP. This patch also adds
the flag IB_QP_SMAC to the attr_mask, which was missing.
Erez Shitrit [Sun, 8 Dec 2013 11:39:35 +0000 (13:39 +0200)]
rdma_cm/cma: Cache broadcast domain record.
Currently, rdma_cm waits for the IPoIB driver to complete
its join to the broadcast domain record; after IPoIB gets its
multicast, rdma_cm tries to obtain its own multicast. After an
IB_CLIENT reregister event, IPoIB may not succeed in its first
effort to reregister its multicast groups. In this case, the
backoff mechanism is applied, and IPoIB retries after a
backoff which starts at 2 seconds and can increase up to
16 seconds.
Since rdma_cm waits for the IPoIB multicast join to succeed,
it too will be delayed at least 2 seconds.
The fix is to detach rdma_cm's multicast operation from IPoIB's
broadcast record re-join. When rdma_cm executes a new join
request, it now tries (via the cma) to take parameters from a
cached broadcast record. If the join fails using the cached
values, the cma deletes the cached record and tries to get a new
one.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
(Ported from Mellanox OFED 2.4)
Noa Osherovich [Thu, 5 Dec 2013 08:24:55 +0000 (10:24 +0200)]
ipoib: added an error message when trying to change mtu to 2K-4K
Max mtu defined by IB is 4K, but mcast_mtu is limited to 2K,
so any request to change mtu to a value between 2K-4K didn't
change the mtu, but also didn't show an error message.
An error value (-EINVAL) is now returned and an ipoib_warn
is issued in such cases.
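The new behavior can be sketched as below. The struct and function names
are hypothetical, fprintf stands in for ipoib_warn, and the check is
assumed to be a plain comparison against mcast_mtu.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical per-interface state for the sketch. */
struct ipoib_sketch {
    unsigned int mtu;        /* current interface mtu */
    unsigned int mcast_mtu;  /* limit imposed by the multicast group */
};

/* Reject requests above mcast_mtu instead of silently ignoring them. */
static int change_mtu_sketch(struct ipoib_sketch *p, unsigned int new_mtu)
{
    if (new_mtu > p->mcast_mtu) {
        fprintf(stderr, "mtu %u exceeds mcast_mtu %u\n", new_mtu, p->mcast_mtu);
        return -EINVAL;
    }
    p->mtu = new_mtu;
    return 0;
}
```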
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Sun, 8 Dec 2013 08:41:07 +0000 (10:41 +0200)]
ib_core: Do not transition MC groups to error on SM_CHANGE event
Do not transition multicast groups to error on an SM_CHANGE
event. These events are not connected with mcast groups.
(When the SM wishes to have multicast groups reregistered,
it issues the CLIENT_REREG event).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
Saeed Mahameed [Wed, 27 Nov 2013 13:50:11 +0000 (15:50 +0200)]
mlx4_vnic: always remove child macs in vnic_parent_update remove request
Child macs are not removed in host admin vnics once the connection
with the BX is lost. This caused a loss of connectivity for child vnics
when the connection with the BX was restored, since the BX is not
aware of the old child macs.
The solution is to always remove child macs when vnic_parent_update is
called with a remove request.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(Ported from Mellanox OFED 2.4)
Saeed Mahameed [Wed, 27 Nov 2013 09:55:07 +0000 (11:55 +0200)]
mlx4_vnic: set default moderation values in vnic_alloc_netdev
vnic_set_default_moder was called from _vnic_open, which reset
all current moderation values to the default every time
the user opened/closed the vnic interface.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(Ported from Mellanox OFED 2.4)
Shani Michaeli [Sun, 3 Jun 2012 12:48:32 +0000 (15:48 +0300)]
mlx4: Handle memory region deregistration failure
Memory region deregistration can fail when memory windows
are bound to it. We handle such failures by propagating them
to the user, or by printing a serious warning.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Thu, 14 Nov 2013 15:35:06 +0000 (17:35 +0200)]
ib_core: More fixes to ib_sa_add_one error flow
commit 0e7377eed fixed a resource leak of mad agents in
the ib_sa_add_one error flow. However, the fix allowed
ib_unregister_mad_agent to be called in a case where the
ib_register_mad_agent request failed (resulting in an
illegal pointer in the agent field). This caused a kernel
Oops in the error flow.
Fix this by calling ib_unregister_mad_agent only for cases where
ib_register_mad_agent succeeded.
In addition, separate the ib_register_event_handler() call error
flow from the loop error flow. If the call to
ib_register_event_handler fails, the client data must be reset
to NULL, (in case at some point ib_register_event_handler() is
modified so that it may return a non-zero (error) value).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
Majd Dibbiny [Mon, 19 Aug 2013 15:19:23 +0000 (18:19 +0300)]
ib_core: Safely unregister mad agent when necessary.
When the allocation of the receive buffer fails, the driver
needs to unregister the mad agent. The function
ib_unregister_mad_agent doesn't check whether the pointer to the
mad agent is valid and doesn't contain an error, which causes a
kernel panic. Therefore, we need to check if the pointer of
the mad agent is valid by calling PTR_ERR and only then
unregister the agent.
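The guard can be sketched in user space with a re-implementation of the
kernel's error-pointer convention (an errno value encoded in the top 4095
addresses). All names below are local to the sketch; it uses an
IS_ERR-style test, which is an assumption about how the validity check is
expressed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, like the kernel's ERR_PTR(). */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the reserved error range, like IS_ERR(). */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

struct agent_sketch {
    int registered;
};

static int unregister_calls;   /* counts real unregister operations */

static void unregister_agent(struct agent_sketch *a)
{
    a->registered = 0;
    unregister_calls++;
}

/* Error-flow cleanup: only touch agents that were really registered. */
static void cleanup_agent(struct agent_sketch *a)
{
    if (a && !is_err(a))
        unregister_agent(a);
}
```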
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
(Ported from Mellanox OFED 2.4)
Majd Dibbiny [Tue, 27 Aug 2013 11:07:36 +0000 (14:07 +0300)]
mlx4_core: Extend num_mtt in dev caps to avoid overflow.
Some legitimate combinations of log_num_mtt and log_mtts_per_seg
cause overflow in the calculation of the num_mtt when initializing
the HCA which causes Kernel panic. Changed the variable to be 'u64'
instead of 'int' to avoid the overflow and made the needed changes
to support the new type.
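A sketch of the overflow, assuming num_mtt is derived as a power of two
from the sum of the two log parameters (the exact expression in the driver
may differ). The parameter values used in the note are illustrative only.

```c
#include <limits.h>
#include <stdint.h>

/* Compute the MTT count in 64 bits so large exponents do not overflow. */
static uint64_t num_mtt_u64(unsigned int log_num_mtt,
                            unsigned int log_mtts_per_seg)
{
    return (uint64_t)1 << (log_num_mtt + log_mtts_per_seg);
}

/* Would the same value have fit in the old 'int' variable? */
static int fits_in_int(uint64_t v)
{
    return v <= (uint64_t)INT_MAX;
}
```

For example, 28 + 3 gives 2^31, which a 32-bit signed int cannot
represent, while the u64 variant holds it without trouble.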
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
(Ported from Mellanox OFED 2.4)
mlx4_core: fix FMR unmapping to allow remapping afterward
The FMR common use flow (as implemented in fmr_pool) is:
- Allocate FMR (ib_alloc_fmr)
- Use the FMR to remap DMA memory until remaps limit
exceeded (ib_map_phys_fmr)
- Unmap the FMR (ib_unmap_fmr)
- Use the FMR to remap DMA memory until remaps limit
exceeded (ib_map_phys_fmr)
- ...
The current implementation of mlx4_fmr_unmap does not follow
this use flow since it uses the HW2SW MPT command.
The HW2SW MPT command notifies the FW that the MPT entry is
not used by HW anymore. The FW may act according to this information,
so after the command it is not safe for the driver to manipulate the MPT
directly when remapping. The patch fixes this by manipulating the MPT
directly to unmap the memory instead of using the HW2SW MPT command.
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
(Ported from Mellanox OFED 2.4)
Moshe Lazer [Sun, 16 Jun 2013 08:04:17 +0000 (11:04 +0300)]
mlx4_core: resolves kernel panic when connectx_port_config fails to set ports
When changing ports configuration (e.g. from ib,ib to eth,eth)
the device is disconnected from the interfaces and catas error lists,
then we change the ports config and reconnect the device.
If changing the ports config fails, the device is left disconnected.
If we try again to configure the ports, the driver retries to
disconnect the device from its lists and crashes in the list_del
function. To avoid this, list_del is replaced by list_del_init
(to allow redeleting the device).
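Why list_del_init makes a second delete harmless can be seen from a small
user-space re-implementation of a circular doubly linked list. The
functions below mirror the kernel helpers by name but are local to this
sketch.

```c
struct list_node {
    struct list_node *next, *prev;
};

/* An initialised node points at itself. */
static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Unlink n and leave it self-linked. Deleting it again only rewrites its
 * own pointers, so nothing else is touched and nothing crashes.
 */
static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static int list_empty(const struct list_node *head)
{
    return head->next == head;
}
```

A plain list_del leaves the removed node's pointers in an unusable state
(the kernel poisons them), so a second list_del dereferences garbage.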
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
(Ported from Mellanox OFED 2.4)
mlx4_core: sysfs, fix usage of log_num_mtt module parameter
When log_num_mtt was auto-calculated based on RAM size it wrongly included
log_mtts_per_seg as well.
This is wrong in 2 ways:
First, log_mtts_per_seg should be added by the application itself;
there is no reason to have the total in log_num_mtt itself.
Second, in case an extra NIC exists it may get an invalid
value as it depends on a larger value.
Specifically, it may cause an overflow that later leads to a
kernel panic via mlx4_buddy_init.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
(Ported from Mellanox OFED 2.4)
Yishai Hadas [Wed, 20 Mar 2013 16:00:02 +0000 (18:00 +0200)]
mlx4_core: fix ib_uverbs_get_context flow
Fix flow to prevent kernel panic in case of a failure in copy_to_user.
INIT_IB_EVENT_HANDLER must be called to initialize the event handler
list before releasing filp as part of fput.
Otherwise a kernel panic occurs at ib_unregister_event_handler
when calling list_del.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
(Ported from Mellanox OFED 2.4)
Moshe Lazer [Tue, 5 Mar 2013 12:08:47 +0000 (14:08 +0200)]
mlx4_core: use msi_x module param to limit num of MSI-X irqs
The msi_x module param usage is:
0 - don't use MSI-X
1 - use MSI-X (driver decides the num of MSI-X irqs)
>1 - limit number of MSI-X irqs to msi_x
In case of SRIOV, msi_x>1 is treated as msi_x==1
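The parameter semantics above can be written as a small decision
function. This is a sketch: msix_vectors and driver_default are
hypothetical names, and "limit" is assumed to mean taking the smaller of
msi_x and what the driver would have chosen.

```c
/*
 * msi_x:          module parameter value (0, 1 or >1)
 * driver_default: number of MSI-X irqs the driver would pick on its own
 * sriov:          non-zero when SRIOV is active
 * Returns the number of MSI-X irqs to request (0 means MSI-X disabled).
 */
static int msix_vectors(int msi_x, int driver_default, int sriov)
{
    if (msi_x == 0)
        return 0;                  /* don't use MSI-X */
    if (msi_x == 1 || sriov)
        return driver_default;     /* driver decides; SRIOV ignores the limit */
    return msi_x < driver_default ? msi_x : driver_default;
}
```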
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Mon, 25 Feb 2013 12:04:18 +0000 (14:04 +0200)]
ib/core: change error prints in cm module to debug prints.
commit acd10b49 added prints to the cm module.
These, however, should really be debug prints, to be activated
when it is necessary to track down some cm problem.
To activate the debug mechanism, you need to do the following:
1. mount the debug fs (do this once)
mount -t debugfs none /sys/kernel/debug/
Jack Morgenstein [Thu, 19 Mar 2015 01:20:31 +0000 (18:20 -0700)]
mlx4_core: Add more info to mlx4_cmd_post failure error messages
To assist in debugging and support, add additional information
to the output generated when the driver fails to post a FW command.
In addition, add in_param, in_modifier, and op_modifier values to the
output when commands are successfully posted but time out.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Wed, 20 Feb 2013 14:43:59 +0000 (16:43 +0200)]
mlx4_core: disable mlx4_QP_ATTACH calls from guests if master is doing flow steering.
Old upstream kernel guests do not detect if device-enabled flow
steering is activated by the master. If DMFS is activated,
the master should return error to guests which try to use
the B0-steering flow calls (mlx4_QP_ATTACH).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
Jack Morgenstein [Mon, 18 Feb 2013 10:34:59 +0000 (12:34 +0200)]
mlx4_core: change resource quotas to enable supporting upstream-kernel guests
The resource-quota code passed non-power-of-2 quotas to guests.
Because of bugs in the upstream kernel, resource quotas for MPTs and QPs
are assumed to be powers-of-2. In the MPT case, mlx4_init_mr_table
checks for num_mpts being a power-of-2 before checking if it
is running as a slave.
In the QP case, procedure mlx4_qp_alloc() assumes that
(num_qps - 1) is a power-of-2 when calling radix_tree_insert()
and radix_tree_delete().
In the MPT case, mlx4_init_mr_table() failed on the guest,
causing abort of the guest driver bringup.
In the QP case, although create-qp succeeded on the
hypervisor, the radix_tree_insert() call failed, resulting
in failure to create QPs with certain qp numbers.
The fix, for both cases, is to round-up the quota to the
next power-of-2 for guests for MPTs and QPs. This does no
harm, as these two resources were not really meant to be
limited by an upper quota. The guaranteed resources for QPs
and MPTs per VF/PF are not affected by this change.
The only effect is that no guest will ever be able to
actually reach its max-quota for QPs and MPTs.
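Rounding a quota up to the next power of 2 can be sketched as below. The
kernel provides roundup_pow_of_two() for this; the helper here is a
user-space equivalent written for the illustration, and the quota values
in the note are made up.

```c
/*
 * Round n up to the next power of 2 (n itself if it already is one).
 * Returns 0 if the result would not fit in an unsigned int.
 */
static unsigned int roundup_pow2(unsigned int n)
{
    unsigned int p = 1;

    if (n > 0x80000000u)
        return 0;
    while (p < n)
        p <<= 1;
    return p;
}
```

A guest given a quota of 1000 QPs would be told 1024, which satisfies the
power-of-2 assumption on the guest side while the guaranteed amount
stays unchanged.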
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
(Ported from Mellanox OFED 2.4)
Yishai Hadas [Tue, 12 Feb 2013 13:53:46 +0000 (15:53 +0200)]
mlx4_core: device revision support
The device revision field returned by the NodeInfo MAD
is incorrect on ConnectX3 devices.
This patch is driver side handling to complete a FW fix
added at 2.11.1172. INIT_HCA - bit at offset 0x0C.12 is
set to 1 so that FW will report correct device revision.
Older FW versions won't be affected by turning on that bit,
so no capability bit is needed.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
(Ported from Mellanox OFED 2.4)