www.infradead.org Git - users/jedix/linux-maple.git/log
7 years ago  RDS: IB: Change the proxy qp's path_mtu to IB_MTU_256
Avinash Repaka [Tue, 26 Sep 2017 21:20:17 +0000 (14:20 -0700)]
RDS: IB: Change the proxy qp's path_mtu to IB_MTU_256

The path_mtu of proxy qp of RDS is currently set to IB_MTU_4096, but it
doesn't have much relevance, since the proxy qp is used only for
registration and invalidation of MRs. For the proxy qp to work in most
environments, this patch changes the path_mtu to IB_MTU_256.

Orabug: 26864694

Suggested-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years ago  devpts: clean up interface to pty drivers
Linus Torvalds [Sat, 16 Apr 2016 22:16:07 +0000 (15:16 -0700)]
devpts: clean up interface to pty drivers

This gets rid of the horrible notion of having that

    struct inode *ptmx_inode

be the linchpin of the interface between the pty code and devpts.

By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
and we will have a much saner way forward.  In particular, this will
allow us to associate with any particular devpts instance at open-time,
and not be artificially tied to one particular ptmx inode.

The patch itself is actually fairly straightforward, and apart from some
locking and return path cleanups it's pretty mechanical:

 - the interfaces that devpts exposes all take "struct pts_fs_info *"
   instead of "struct inode *ptmx_inode" now.

   NOTE! The "struct pts_fs_info" thing is a completely opaque structure
   as far as the pty driver is concerned: it's still declared entirely
   internally to devpts. So the pty code can't actually access it in any
   way, just pass it as a "cookie" to the devpts code.

 - the "look up the pts fs info" is now a single clear operation, that
   also does the reference count increment on the pts superblock.

   So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
   ref" operation (devpts_get_ref(inode)), along with a "put ref" op
   (devpts_put_ref()).

 - the pty master "tty->driver_data" field now contains the pts_fs_info,
   not the ptmx inode.

 - because we don't care about the ptmx inode any more as some kind of
   base index, the ref counting can now drop the inode games - it just
   gets the ref on the superblock.

 - the pts_fs_info now has a back-pointer to the super_block. That's so
   that we can easily look up the information we actually need. Although
   quite often, the pts fs info was actually all we wanted, and not having
   to look it up based on some magical inode makes things more
   straightforward.

In particular, now that "devpts_get_ref(inode)" operation should really
be the *only* place we need to look up what devpts instance we're
associated with, and we do it exactly once, at ptmx_open() time.
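The opaque-cookie and "lookup and get ref" pattern described above can be
sketched in user space roughly as follows. This is only an illustrative
model: the *_model types and the model_get_ref()/model_put_ref() helpers are
hypothetical names, not the kernel's definitions, and the real code takes its
reference on the actual superblock.

```c
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's actual definitions. */
struct super_block_model {
    int refcount;                 /* stands in for the superblock reference count */
};

/* Private to the "devpts" side; the pty side only ever holds a pointer. */
struct pts_fs_info_model {
    struct super_block_model *sb; /* back-pointer to the owning superblock */
};

/* Look up the instance and take a reference in one step. */
static struct pts_fs_info_model *model_get_ref(struct pts_fs_info_model *fsi)
{
    if (fsi == NULL || fsi->sb == NULL)
        return NULL;
    fsi->sb->refcount++;
    return fsi;
}

/* Drop the reference taken by model_get_ref(). */
static void model_put_ref(struct pts_fs_info_model *fsi)
{
    if (fsi != NULL && fsi->sb != NULL)
        fsi->sb->refcount--;
}
```

In this model the caller stores the returned pointer as its cookie at open
time and hands it back unchanged on release, never looking inside it.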

The other side of this is that one ptmx node could now be associated
with multiple different devpts instances - you could have a single
/dev/ptmx node, and then have multiple mount namespaces with their own
instances of devpts mounted on /dev/pts/.  And that's all perfectly sane
in a model where we just look up the pts instance at open time.

This will eventually allow us to get rid of our odd single-vs-multiple
pts instance model, but this patch in itself changes no semantics, only
an internal binding model.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 67245ff332064c01b760afa7a384ccda024bfd24)

Orabug: 26743034

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Wim ten Have <wim.ten.have@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/tty/pty.c
fs/devpts/inode.c

There are two patches present in mainline that came before this one which are
still missing from UEK. They are:

1) pty: Remove pty_unix98_shutdown()
   responsible for the conflict in drivers/tty/pty.c

2) devpts: if initialization failed, don't crash when opening /dev/ptmx
   responsible for the conflict in fs/devpts/inode.c

Neither seemed like they were critical enough nor directly tied to the patch
I wanted, to justify pulling them along for the ride. So instead, I manually
resolved the conflicting chunks of code, applying only the deltas that were
related to "devpts: clean up interface to pty drivers" in a way that makes
sense for that particular patch.

7 years ago  tcp: fix tcp_mark_head_lost to check skb len before fragmenting
Neal Cardwell [Mon, 25 Jan 2016 22:01:53 +0000 (14:01 -0800)]
tcp: fix tcp_mark_head_lost to check skb len before fragmenting

This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.

For example, suppose the sender sends 4 1-byte packets and has the
last 3 packets SACKed. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.

This sounds very counterintuitive, but as the commit description for
the related commit c0638c247f55 ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c247f55 commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.

The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is by-passed because FACK is disabled.
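As a rough illustration of the length check, here is a small user-space
model. The function name prefix_bytes_to_split() and the 1460-byte MSS used
in the usage note are assumptions made for the example; the kernel code
operates on real skbs and calls tcp_fragment().

```c
/*
 * Hypothetical model of the decision described above. Returns the number
 * of bytes to split off the front of the skb, or 0 when the prefix would
 * not fit and the whole skb should be marked lost instead.
 */
static unsigned int prefix_bytes_to_split(unsigned int skb_len,
                                          unsigned int mss,
                                          unsigned int packets)
{
    unsigned int lost = packets * mss;

    /* Only fragment when the prefix is strictly shorter than the skb. */
    if (lost < skb_len)
        return lost;
    return 0;
}
```

With the merged skb from the example (len = 3 bytes, pcount = 3) and an
assumed MSS of 1460, asking for a 2-packet prefix returns 0, so nothing is
fragmented and the entire skb is treated as lost.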

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d88270eef4b56bd7973841dd1fed387ccfa83709)

Orabug: 26646104
Conflicts:
       tcp_skb_mss is not used in UEK4. Hence, skb_shinfo()
is used to get the mss size.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago  kvm: nVMX: Don't allow L2 to access the hardware CR8
Jim Mattson [Tue, 12 Sep 2017 20:02:54 +0000 (13:02 -0700)]
kvm: nVMX: Don't allow L2 to access the hardware CR8

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
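The rule can be sketched as a small stand-alone function. The helper name
vmcs02_exec_control() is hypothetical; the three bit values are the primary
processor-based VM-execution control bits 19, 20 and 21 as documented in the
Intel SDM, and the real code also handles many other controls.

```c
#include <stdint.h>

/* Bit positions follow the Intel SDM primary processor-based controls. */
#define CPU_BASED_CR8_LOAD_EXITING   0x00080000u
#define CPU_BASED_CR8_STORE_EXITING  0x00100000u
#define CPU_BASED_TPR_SHADOW         0x00200000u

/*
 * Hypothetical helper: derive the vmcs02 execution controls from the
 * controls L1 requested in vmcs12.
 */
static uint32_t vmcs02_exec_control(uint32_t vmcs12_exec_control)
{
    uint32_t exec_control = vmcs12_exec_control;

    /* Without a TPR shadow, every CR8 access by L2 must exit to L0. */
    if (!(exec_control & CPU_BASED_TPR_SHADOW))
        exec_control |= CPU_BASED_CR8_LOAD_EXITING |
                        CPU_BASED_CR8_STORE_EXITING;
    return exec_control;
}
```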

This fixes CVE-2017-12154.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 51aa68e7d57e3217192d88ce90fd5b8ef29ec94f)
OraBug: 26868769 CVE-2017-12154 kvm: nVMX: L2 guest could access hardware(L0) CR8 register
Tested-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago  dtrace: ensure SDT stub function returns 0
Kris Van Hees [Fri, 29 Sep 2017 16:58:09 +0000 (12:58 -0400)]
dtrace: ensure SDT stub function returns 0

The SDT stub function is used during the kernel boot process (prior to
the patching of SDT probe points).  Since it is used for both regular
SDT probes and is-enabled SDT probes, it should return 0 to be a no-op
before call patching takes place.

Orabug: 26909775
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
7 years ago  tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
Wei Wang [Thu, 18 May 2017 18:22:33 +0000 (11:22 -0700)]
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause a division by zero in the tcp_recvmsg() =>
tcp_cleanup_rbuf() => __tcp_select_window() call path.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.
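A minimal user-space sketch of why the initial value matters. The struct
and function names here are hypothetical stand-ins; only TCP_MIN_MSS (88 in
the kernel headers) is taken from the kernel, and the rounding step is a
simplified version of what window selection does with rcv_mss.

```c
#define TCP_MIN_MSS 88U   /* value used by the Linux kernel headers */

/* Hypothetical stand-in for the delayed-ACK state holding rcv_mss. */
struct ack_state_model {
    unsigned int rcv_mss;
};

/* Initialise to a safe non-zero MSS rather than 0. */
static void delack_init_model(struct ack_state_model *ack)
{
    ack->rcv_mss = TCP_MIN_MSS;
}

/* Round the free space down to a multiple of rcv_mss (divides by rcv_mss). */
static unsigned int round_window_to_mss(unsigned int free_space,
                                        const struct ack_state_model *ack)
{
    return (free_space / ack->rcv_mss) * ack->rcv_mss;
}
```

Had rcv_mss been left at 0, the division in round_window_to_mss() would be
undefined; with TCP_MIN_MSS it is always well defined.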

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 499350a5a6e7512d9ed369ed63a4244b6536f4f8)

Orabug: 26796038
CVE: CVE-2017-14106

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago  xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
Sabrina Dubroca [Wed, 3 May 2017 14:43:19 +0000 (16:43 +0200)]
xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY

When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
that dst. Unfortunately, the code that allocates and fills this copy
doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
passed. In multiple code paths (from raw_sendmsg, from TCP when
replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
passed to xfrm is actually an on-stack flowi4, so we end up reading
stuff from the stack past the end of the flowi4 struct.

Since xfrm_dst->origin isn't used anywhere following commit
ca116922afa8 ("xfrm: Eliminate "fl" and "pol" args to
xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
either, so get rid of that too.

Fixes: 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 9b3eb54106cf6acd03f07cf0ab01c13676a226c2)

Orabug: 25959303

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago  rxrpc: Fix several cases where a padded len isn't checked in ticket decode
David Howells [Wed, 14 Jun 2017 23:12:24 +0000 (00:12 +0100)]
rxrpc: Fix several cases where a padded len isn't checked in ticket decode

This fixes CVE-2017-7482.

When a kerberos 5 ticket is being decoded so that it can be loaded into an
rxrpc-type key, there are several places in which the length of a
variable-length field is checked to make sure that it's not going to
overrun the available data - but the data is padded to the nearest
four-byte boundary and the code doesn't check for this extra.  This could
lead to the size-remaining variable wrapping and the data pointer going
over the end of the buffer.

Fix this by making the various variable-length data checks use the padded
length.
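The idea of checking the padded length can be sketched like this.
field_fits() is a hypothetical helper written for illustration, not a
function from the rxrpc code; it only shows the rounding to a four-byte
boundary and the comparison against the remaining buffer size.

```c
#include <stddef.h>

/*
 * Hypothetical check for one variable-length field: returns 1 if the
 * field, rounded up to a four-byte boundary, fits in the remaining data.
 */
static int field_fits(size_t len, size_t remaining)
{
    size_t padded = (len + 3) & ~(size_t)3;

    /* Guard against the rounding itself wrapping around. */
    if (padded < len)
        return 0;
    return padded <= remaining;
}
```

A 5-byte field with 6 bytes remaining passes an unpadded check but is
rejected here, because its padded size is 8.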

Reported-by: 石磊 <shilei-c@360.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.c.dionne@auristor.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 5f2f97656ada8d811d3c1bef503ced266fcd53a0)

Orabug: 26376434
CVE: CVE-2017-7482

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago  xen: don't print error message in case of missing Xenstore entry
Juergen Gross [Tue, 30 May 2017 18:52:26 +0000 (20:52 +0200)]
xen: don't print error message in case of missing Xenstore entry

When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Orabug: 26841566

(cherry picked from commit 4e93b6481c87ea5afde944a32b4908357ec58992)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
7 years ago  mlx4_core: calculate log_num_mtt based on total system memory
Wei Lin Guay [Fri, 22 Sep 2017 20:49:52 +0000 (22:49 +0200)]
mlx4_core: calculate log_num_mtt based on total system memory

The SR-IOV shared-port mechanism has a limitation that all the resources
and qp contexts are proxied through the PF. In order to reflect the
supported mtt entries, the log_num_mtt must be calculated based on the host
system memory rather than the privileged domain system memory. Thus, this
patch performs a Xen specific call to obtain the total memory during the PF
driver loading and uses that info to determine the size of the mtt table.

Orabug: 26526968

Signed-off-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
7 years ago  xen/x86: Add interface for querying amount of host memory
Boris Ostrovsky [Fri, 15 Sep 2017 20:23:53 +0000 (16:23 -0400)]
xen/x86: Add interface for querying amount of host memory

A driver (or some other entity in the kernel) may need to know the
amount of memory available on the host. Provide the interface (for
a privileged domain) to obtain this information.

Orabug: 26526923

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago  rds: Fix non-atomic operation on shared flag variable
Håkon Bugge [Tue, 5 Sep 2017 15:42:01 +0000 (17:42 +0200)]
rds: Fix non-atomic operation on shared flag variable

The bits in m_flags in struct rds_message are used for a plurality of
reasons, and from different contexts. To avoid any missing updates to
m_flags, use the atomic set_bit() instead of the non-atomic equivalent.
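For readers outside the kernel, the same idea can be modelled with C11
atomics. This is a user-space sketch only: struct msg_model and the two
helpers are hypothetical names, and atomic_fetch_or() stands in for the
kernel's set_bit().

```c
#include <stdatomic.h>

/* Hypothetical flag word standing in for m_flags. */
struct msg_model {
    atomic_ulong flags;
};

/* Atomic read-modify-write: concurrent setters of other bits are not lost. */
static void msg_set_flag(struct msg_model *m, unsigned int bit)
{
    atomic_fetch_or(&m->flags, 1UL << bit);
}

/* Returns 1 if the given bit is currently set, 0 otherwise. */
static int msg_test_flag(struct msg_model *m, unsigned int bit)
{
    return (int)((atomic_load(&m->flags) >> bit) & 1UL);
}
```

A plain "flags |= mask" is a separate load and store, so two contexts
updating different bits at the same time can overwrite each other's update.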

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream f530f39f5ff97209cc6f1bf66e634685954ad741)

Orabug: 26842076

Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
7 years ago  rds: Fix incorrect statistics counting
Håkon Bugge [Wed, 6 Sep 2017 16:35:51 +0000 (18:35 +0200)]
rds: Fix incorrect statistics counting

In rds_send_xmit() there is logic to batch the sends. However, if
another thread has acquired the lock and has incremented the send_gen,
it is considered a race and we yield. The code incrementing the
s_send_lock_queue_raced statistics counter did not count this event
correctly.

This commit counts the race condition correctly.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from upstream 126f760ca94dae77425695f9f9238b731de86e32)

Orabug: 26847583

Conflicts:
net/rds/send.c

Reviewed-by: Avinash Repaka <avinash.repaka@oracle.com>
7 years ago  i40e: use cpumask_copy instead of direct assignment
Jacob Keller [Wed, 12 Jul 2017 09:46:05 +0000 (05:46 -0400)]
i40e: use cpumask_copy instead of direct assignment

According to the header file cpumask.h, we shouldn't be directly copying
a cpumask_t, since it's a bitmap and might not be copied correctly. Let's
use the provided cpumask_copy() function instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Orabug: 26822609

(cherry picked from commit 7e4d01e7d3f7d4f7b0a768a1028cb26ea06c8694)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Dib Chatterjee <dib.chatterjee@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago  mm: thp: set THP defrag by default to madvise and add a stall-free defrag option
Mel Gorman [Thu, 17 Mar 2016 21:19:23 +0000 (14:19 -0700)]
mm: thp: set THP defrag by default to madvise and add a stall-free defrag option

Orabug: 26587019

THP defrag is enabled by default to direct reclaim/compact but not wake
kswapd in the event of a THP allocation failure.  The problem is that
THP allocation requests potentially enter reclaim/compaction.  This
potentially incurs a severe stall that is not guaranteed to be offset by
reduced TLB misses.  While there has been considerable effort to reduce
the impact of reclaim/compaction, it is still a high cost and workloads
that should fit in memory fail to do so.  Specifically, a simple
anon/file streaming workload will enter direct reclaim on NUMA at least
even though the working set size is 80% of RAM.  It's been years and
it's time to throw in the towel.

First, this patch defines THP defrag as follows:

 madvise: A failed allocation will direct reclaim/compact if the application requests it
 never:   Neither reclaim/compact nor wake kswapd
 defer:   A failed allocation will wake kswapd/kcompactd
 always:  A failed allocation will direct reclaim/compact (historical behaviour)
          khugepaged defrag will enter direct reclaim but not wake kswapd.

Next it sets the default defrag option to be "madvise" to only enter
direct reclaim/compaction for applications that specifically requested
it.
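The four settings can be read as a small decision table for what happens
when a THP allocation fails at fault time. The sketch below is a user-space
model with hypothetical names (thp_defrag_mode, thp_fail_action_for()); it
does not cover khugepaged, which is handled separately.

```c
#include <stdbool.h>

/* Hypothetical names for the four defrag settings listed above. */
enum thp_defrag_mode {
    THP_DEFRAG_ALWAYS,
    THP_DEFRAG_DEFER,
    THP_DEFRAG_MADVISE,
    THP_DEFRAG_NEVER
};

struct thp_fail_action {
    bool direct_reclaim;   /* faulting task enters direct reclaim/compaction */
    bool wake_background;  /* kswapd/kcompactd are woken instead */
};

/* What a failed THP allocation does, given the mode and the madvise hint. */
static struct thp_fail_action thp_fail_action_for(enum thp_defrag_mode mode,
                                                  bool madvised)
{
    struct thp_fail_action a = { false, false };

    switch (mode) {
    case THP_DEFRAG_ALWAYS:
        a.direct_reclaim = true;        /* historical behaviour */
        break;
    case THP_DEFRAG_DEFER:
        a.wake_background = true;       /* no stall for the faulting task */
        break;
    case THP_DEFRAG_MADVISE:
        a.direct_reclaim = madvised;    /* only if the application asked */
        break;
    case THP_DEFRAG_NEVER:
        break;                          /* neither reclaim nor wakeup */
    }
    return a;
}
```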

Lastly, it removes a check from the page allocator slowpath that is
related to __GFP_THISNODE to allow "defer" to work.  The callers that
really care are slub/slab and they are updated accordingly.  The slab
one may be surprising because it also corrects a comment as kswapd was
never woken up by that path.

This means that a THP fault will no longer stall for most applications
by default, which is ideal for most users, who get THP if they are
immediately available.  There are still options for users that prefer a
stall at startup of a new application by either restoring historical
behaviour with "always" or pick a half-way point with "defer" where
kswapd does some of the work in the background and wakes kcompactd if
necessary.  THP defrag for khugepaged remains enabled and will enter
direct reclaim but not wake kswapd or kcompactd.

After this patch a THP allocation failure will quickly fall back and rely
on khugepaged to recover the situation at some time in the future.  In
some cases, this will reduce THP usage but the benefit of THP is hard to
measure and not a universal win, whereas a stall to reclaim/compaction
is definitely measurable and can be painful.

The first test for this is using "usemem" to read a large file and write
a large anonymous mapping (to avoid the zero page) multiple times.  The
total size of the mappings is 80% of RAM and the benchmark simply
measures how long it takes to complete.  It uses multiple threads to see
if that is a factor.  On UMA, the performance is almost identical so is
not reported but on NUMA, we see this

usemem
                                   4.4.0                 4.4.0
                          kcompactd-v1r1         nodefrag-v1r3
Amean    System-1       102.86 (  0.00%)       46.81 ( 54.50%)
Amean    System-4        37.85 (  0.00%)       34.02 ( 10.12%)
Amean    System-7        48.12 (  0.00%)       46.89 (  2.56%)
Amean    System-12       51.98 (  0.00%)       56.96 ( -9.57%)
Amean    System-21       80.16 (  0.00%)       79.05 (  1.39%)
Amean    System-30      110.71 (  0.00%)      107.17 (  3.20%)
Amean    System-48      127.98 (  0.00%)      124.83 (  2.46%)
Amean    Elapsd-1       185.84 (  0.00%)      105.51 ( 43.23%)
Amean    Elapsd-4        26.19 (  0.00%)       25.58 (  2.33%)
Amean    Elapsd-7        21.65 (  0.00%)       21.62 (  0.16%)
Amean    Elapsd-12       18.58 (  0.00%)       17.94 (  3.43%)
Amean    Elapsd-21       17.53 (  0.00%)       16.60 (  5.33%)
Amean    Elapsd-30       17.45 (  0.00%)       17.13 (  1.84%)
Amean    Elapsd-48       15.40 (  0.00%)       15.27 (  0.82%)

For a single thread, the benchmark completes 43.23% faster with this
patch applied with smaller benefits as the thread count increases.  Similarly,
notice the large reduction in most cases in system CPU usage.  The
overall CPU time is

               4.4.0       4.4.0
        kcompactd-v1r1 nodefrag-v1r3
User        10357.65    10438.33
System       3988.88     3543.94
Elapsed      2203.01     1634.41

Which is substantial. Now, the reclaim figures

                                 4.4.0       4.4.0
                          kcompactd-v1r1 nodefrag-v1r3
Minor Faults                 128458477   278352931
Major Faults                   2174976         225
Swap Ins                      16904701           0
Swap Outs                     17359627           0
Allocation stalls                43611           0
DMA allocs                           0           0
DMA32 allocs                  19832646    19448017
Normal allocs                614488453   580941839
Movable allocs                       0           0
Direct pages scanned          24163800           0
Kswapd pages scanned                 0           0
Kswapd pages reclaimed               0           0
Direct pages reclaimed        20691346           0
Compaction stalls                42263           0
Compaction success                 938           0
Compaction failures              41325           0

This patch eliminates almost all swapping and direct reclaim activity.
There is still overhead but it's from NUMA balancing which does not
identify that it's pointless trying to do anything with this workload.

I also tried the thpscale benchmark which forces a corner case where
compaction can be used heavily and measures the latency of whether base
or huge pages were used

thpscale Fault Latencies
                                       4.4.0                 4.4.0
                              kcompactd-v1r1         nodefrag-v1r3
Amean    fault-base-1      5288.84 (  0.00%)     2817.12 ( 46.73%)
Amean    fault-base-3      6365.53 (  0.00%)     3499.11 ( 45.03%)
Amean    fault-base-5      6526.19 (  0.00%)     4363.06 ( 33.15%)
Amean    fault-base-7      7142.25 (  0.00%)     4858.08 ( 31.98%)
Amean    fault-base-12    13827.64 (  0.00%)    10292.11 ( 25.57%)
Amean    fault-base-18    18235.07 (  0.00%)    13788.84 ( 24.38%)
Amean    fault-base-24    21597.80 (  0.00%)    24388.03 (-12.92%)
Amean    fault-base-30    26754.15 (  0.00%)    19700.55 ( 26.36%)
Amean    fault-base-32    26784.94 (  0.00%)    19513.57 ( 27.15%)
Amean    fault-huge-1      4223.96 (  0.00%)     2178.57 ( 48.42%)
Amean    fault-huge-3      2194.77 (  0.00%)     2149.74 (  2.05%)
Amean    fault-huge-5      2569.60 (  0.00%)     2346.95 (  8.66%)
Amean    fault-huge-7      3612.69 (  0.00%)     2997.70 ( 17.02%)
Amean    fault-huge-12     3301.75 (  0.00%)     6727.02 (-103.74%)
Amean    fault-huge-18     6696.47 (  0.00%)     6685.72 (  0.16%)
Amean    fault-huge-24     8000.72 (  0.00%)     9311.43 (-16.38%)
Amean    fault-huge-30    13305.55 (  0.00%)     9750.45 ( 26.72%)
Amean    fault-huge-32     9981.71 (  0.00%)    10316.06 ( -3.35%)

The average time to fault pages is substantially reduced in the majority
of cases but with the obvious caveat that fewer THPs are actually used
in this adverse workload

                                   4.4.0                 4.4.0
                          kcompactd-v1r1         nodefrag-v1r3
Percentage huge-1         0.71 (  0.00%)       14.04 (1865.22%)
Percentage huge-3        10.77 (  0.00%)       33.05 (206.85%)
Percentage huge-5        60.39 (  0.00%)       38.51 (-36.23%)
Percentage huge-7        45.97 (  0.00%)       34.57 (-24.79%)
Percentage huge-12       68.12 (  0.00%)       40.07 (-41.17%)
Percentage huge-18       64.93 (  0.00%)       47.82 (-26.35%)
Percentage huge-24       62.69 (  0.00%)       44.23 (-29.44%)
Percentage huge-30       43.49 (  0.00%)       55.38 ( 27.34%)
Percentage huge-32       50.72 (  0.00%)       51.90 (  2.35%)

                                 4.4.0       4.4.0
                          kcompactd-v1r1 nodefrag-v1r3
Minor Faults                  37429143    47564000
Major Faults                      1916        1558
Swap Ins                          1466        1079
Swap Outs                      2936863      149626
Allocation stalls                62510           3
DMA allocs                           0           0
DMA32 allocs                   6566458     6401314
Normal allocs                216361697   216538171
Movable allocs                       0           0
Direct pages scanned          25977580       17998
Kswapd pages scanned                 0     3638931
Kswapd pages reclaimed               0      207236
Direct pages reclaimed         8833714          88
Compaction stalls               103349           5
Compaction success                 270           4
Compaction failures             103079           1

Note again that while this does swap as it's an aggressive workload, the
direct reclaim activity and allocation stalls are substantially reduced.
There is some kswapd activity but ftrace showed that the kswapd activity
was due to normal wakeups from 4K pages being allocated.
Compaction-related stalls and activity are almost eliminated.

I also tried the stutter benchmark.  For this, I do not have figures for
NUMA but it's something that does impact UMA so I'll report what is
available

stutter
                                 4.4.0                 4.4.0
                        kcompactd-v1r1         nodefrag-v1r3
Min         mmap      7.3571 (  0.00%)      7.3438 (  0.18%)
1st-qrtle   mmap      7.5278 (  0.00%)     17.9200 (-138.05%)
2nd-qrtle   mmap      7.6818 (  0.00%)     21.6055 (-181.25%)
3rd-qrtle   mmap     11.0889 (  0.00%)     21.8881 (-97.39%)
Max-90%     mmap     27.8978 (  0.00%)     22.1632 ( 20.56%)
Max-93%     mmap     28.3202 (  0.00%)     22.3044 ( 21.24%)
Max-95%     mmap     28.5600 (  0.00%)     22.4580 ( 21.37%)
Max-99%     mmap     29.6032 (  0.00%)     25.5216 ( 13.79%)
Max         mmap   4109.7289 (  0.00%)   4813.9832 (-17.14%)
Mean        mmap     12.4474 (  0.00%)     19.3027 (-55.07%)

This benchmark is trying to fault an anonymous mapping while there is a
heavy IO load -- a scenario that desktop users used to complain about
frequently.  This shows a mix because the ideal case of mapping with THP
is not hit as often.  However, note that 99% of the mappings complete
13.79% faster.  The CPU usage here is particularly interesting

               4.4.0       4.4.0
        kcompactd-v1r1 nodefrag-v1r3
User           67.50        0.99
System       1327.88       91.30
Elapsed      2079.00     2128.98

And once again we look at the reclaim figures

                                 4.4.0       4.4.0
                          kcompactd-v1r1 nodefrag-v1r3
Minor Faults                 335241922  1314582827
Major Faults                       715         819
Swap Ins                             0           0
Swap Outs                            0           0
Allocation stalls               532723           0
DMA allocs                           0           0
DMA32 allocs                1822364341  1177950222
Normal allocs               1815640808  1517844854
Movable allocs                       0           0
Direct pages scanned          21892772           0
Kswapd pages scanned          20015890    41879484
Kswapd pages reclaimed        19961986    41822072
Direct pages reclaimed        21892741           0
Compaction stalls              1065755           0
Compaction success                 514           0
Compaction failures            1065241           0

Allocation stalls and all direct reclaim activity are eliminated as well
as compaction-related stalls.

THP gives impressive gains in some cases but only if they are quickly
available.  We're not going to reach the point where they are completely
free so let's take the costs out of the fast paths finally and defer the
cost to kswapd, kcompactd and khugepaged where it belongs.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
7 years ago  crypto: testmgr - Set struct aead_testvec iv member size to MAX_IVLEN
Somasundaram Krishnasamy [Mon, 18 Sep 2017 22:40:33 +0000 (15:40 -0700)]
crypto: testmgr - Set struct aead_testvec iv member size to MAX_IVLEN

Orabug: 25925256

When setting up the macsec driver or running IPsec ESP AEAD tests, KASan
reports an out-of-bounds access by memcpy().

BUG: KASan: out of bounds access in memcpy+0x21/0x50 at addr ffffffff81ce8780
Read of size 16 by task cryptomgr_test/7394
Address belongs to variable deflate_comp_params+0xdac0/0x20200
CPU: 23 PID: 7394 Comm: cryptomgr_test Tainted: G    B       E
4.1.12-96.el7uek.kasan.x86_64 #2
Hardware name: Oracle Corporation SUN SERVER X4-2/ASSY,MOTHERBOARD,1U, BIOS 25010603 01/16/2014
ffffffff81ce8780 000000004127a5c6 ffff881b44acf858 ffffffff81b6629e
ffff881b44acf8e8 ffffffff81ce8780 ffff881b44acf8d8 ffffffff81302d54
ffff881b44acf8a8 ffff881c3449e110 0000000000000296 0000000000000400
Call Trace:
[<ffffffff81b6629e>] dump_stack+0x63/0x81
[<ffffffff81302d54>] kasan_report_error+0x3e4/0x420
[<ffffffff813033d8>] kasan_report+0x58/0x60
[<ffffffff81302421>] ? memcpy+0x21/0x50
[<ffffffff81301f21>] __asan_loadN+0x1c1/0x1d0
[<ffffffffa09d2423>] ? crypto_gcm_encrypt+0x1d3/0x1e0 [gcm]
[<ffffffff81510479>] ? memcmp+0x69/0xa0
[<ffffffff81302421>] memcpy+0x21/0x50
[<ffffffff8148ed0d>] __test_aead+0xa5d/0x1d90
[<ffffffff8147bc0f>] ? crypto_alloc_base+0x5f/0x150
[<ffffffff8148e2b0>] ? alg_test_crc32c+0x1f0/0x1f0
[<ffffffffa08661d5>] ? ablk_ctr_init+0x15/0x20 [aesni_intel]
[<ffffffff8147e10e>] ? crypto_spawn_tfm+0x4e/0x90
[<ffffffff81484502>] ? async_chainiv_init+0xa2/0xb0
[<ffffffff8147e10e>] ? crypto_spawn_tfm+0x4e/0x90
[<ffffffff8147bb31>] ? __crypto_alloc_tfm+0x181/0x200
[<ffffffff814900ff>] test_aead+0xbf/0xd0
[<ffffffff81490177>] alg_test_aead+0x67/0xf0
[<ffffffff8148b332>] alg_test+0x242/0x520
[<ffffffff8148b0f0>] ? alg_find_test+0xa0/0xa0
[<ffffffff8110c573>] ? finish_task_switch+0xc3/0x240
[<ffffffff81b6965e>] ? __schedule+0x39e/0xb90
[<ffffffff81488f30>] ? crypto_unregister_pcomp+0x20/0x20
[<ffffffff81488f86>] cryptomgr_test+0x56/0x60
[<ffffffff810ffa58>] kthread+0x178/0x1a0
[<ffffffff810ff8e0>] ? kthread_create_on_node+0x270/0x270
[<ffffffff810ff8e0>] ? kthread_create_on_node+0x270/0x270
[<ffffffff81b71122>] ret_from_fork+0x42/0x70
[<ffffffff810ff8e0>] ? kthread_create_on_node+0x270/0x270
Memory state around the buggy address:
ffffffff81ce8680: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
ffffffff81ce8700: fa fa fa fa 00 00 00 00 01 fa fa fa fa fa fa fa
>ffffffff81ce8780: 00 05 fa fa fa fa fa fa 00 00 00 00 00 00 00 00
                       ^
ffffffff81ce8800: 00 00 01 fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffff81ce8880: 01 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00

This problem occurs because the aes_gcm_enc/dec test templates have an
actual IV size of 13 bytes, but the algorithm copies 16 bytes, which leads
to an out-of-bounds access. The fix is to initialize the iv member to
MAX_IV_SIZE.
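
The over-read described above can be pictured with a small model. This is
an illustrative Python sketch, not the testmgr code: MAX_IV_SIZE = 16 is
assumed from the 16-byte read in the report, and pad_iv() and copy_iv()
are hypothetical helper names.

```python
MAX_IV_SIZE = 16  # assumed from the 16-byte read in the KASAN report


def pad_iv(iv: bytes, size: int = MAX_IV_SIZE) -> bytes:
    """Return the IV zero-padded to a fixed-size buffer (hypothetical helper)."""
    if len(iv) > size:
        raise ValueError("IV longer than the fixed buffer")
    return iv + bytes(size - len(iv))


def copy_iv(template_iv: bytes, n: int = MAX_IV_SIZE) -> bytes:
    """Model a fixed-length copy: fail if the source holds fewer than n bytes."""
    if len(template_iv) < n:
        raise IndexError(f"read of {n} bytes from a {len(template_iv)}-byte object")
    return template_iv[:n]


short_iv = bytes(range(13))    # a 13-byte IV, as in the GCM templates
padded_iv = pad_iv(short_iv)   # what a MAX_IV_SIZE-sized member provides
```

Copying from short_iv fails in this model, while copying from padded_iv
succeeds and still begins with the original 13 bytes.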

Fixes: b824b1aa827f ("crypto: testmgr - fix out of bound read in __test_aead()")
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
7 years agoSPEC: remove ctf.ko from ueknano modules list
Nick Alcock [Tue, 19 Sep 2017 15:47:44 +0000 (16:47 +0100)]
SPEC: remove ctf.ko from ueknano modules list

This module no longer exists, post-CTF-decoupling.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
Orabug: 25815362

7 years agoSPEC: generate CTF when DTrace is enabled.
Nick Alcock [Wed, 6 Sep 2017 10:45:51 +0000 (11:45 +0100)]
SPEC: generate CTF when DTrace is enabled.

CTF is not yet generated for debug kernels, but this is purely because
the ctf target is unavailable because CONFIG_CTF is disabled in
debug kernels, despite with_dtrace being set.  If and when CONFIG_DTRACE
(and thus CONFIG_CTF) are enabled in debug kernels, we can turn on CTF
building there without incident.

(Note: non-RPM builds are now much faster than before, since they don't
generate CTF unless you ask them to, but we cannot really avoid generating
CTF for RPM builds, since DTrace needs it. Future commits will speed up
CTF generation significantly, but for now we have to take the hit, just
as we have been before now.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
Orabug: 25815362

7 years agoSPEC: bump libdtrace-ctf requirement to 0.7+.
Nick Alcock [Tue, 5 Sep 2017 21:39:34 +0000 (22:39 +0100)]
SPEC: bump libdtrace-ctf requirement to 0.7+.

This version includes the CTF archive support needed to build
CTF into an archive rather than linking it into modules.

It is backwardly binary-, source-, and CTF-format-compatible with
current releases (0.5, 0.6).

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
Orabug: 25815362

7 years agoMerge branch 'ctf' into uek/uek-4.1-QU6-next
Nick Alcock [Fri, 22 Sep 2017 11:45:10 +0000 (12:45 +0100)]
Merge branch 'ctf' into uek/uek-4.1-QU6-next

7 years agoDocumentation: add watermark_scale_factor to the list of vm systcl file
Jerome Marchand [Tue, 12 Jul 2016 10:05:59 +0000 (12:05 +0200)]
Documentation: add watermark_scale_factor to the list of vm systcl file

Commit 795ae7a0de6b ("mm: scale kswapd watermarks in proportion to
memory") properly added the description of the new knob to
Documentation/sysctl/vm.txt, but forgot to add it to the list of files
in /proc/sys/vm. Let's fix that.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
(cherry picked from commit e6507a00fd08986ce003012a10af78cc7e47eee8)

Orabug: 26643957

Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
7 years agomm: scale kswapd watermarks in proportion to memory
Johannes Weiner [Thu, 17 Mar 2016 21:19:14 +0000 (14:19 -0700)]
mm: scale kswapd watermarks in proportion to memory

In machines with 140G of memory and enterprise flash storage, we have
seen read and write bursts routinely exceed the kswapd watermarks and
cause thundering herds in direct reclaim.  Unfortunately, the only way
to tune kswapd aggressiveness is through adjusting min_free_kbytes - the
system's emergency reserves - which is entirely unrelated to the
system's latency requirements.  In order to get kswapd to maintain a
250M buffer of free memory, the emergency reserves need to be set to 1G.
That is a lot of memory wasted for no good reason.

On the other hand, it's reasonable to assume that allocation bursts and
overall allocation concurrency scale with memory capacity, so it makes
sense to make kswapd aggressiveness a function of that as well.

Change the kswapd watermark scale factor from the currently fixed 25% of
the tunable emergency reserve to a tunable 0.1% of memory.

Beyond 1G of memory, this will produce bigger watermark steps than the
current formula in default settings.  Ensure that the new formula never
chooses steps smaller than that, i.e.  25% of the emergency reserve.

On a 140G machine, this raises the default watermark steps - the
distance between min and low, and low and high - from 16M to 143M.
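
As a rough worked example of the arithmetic above, the sketch below is
illustrative Python rather than the kernel code. It assumes the scale
factor is expressed in units of 0.01% of memory (so the default of 10
gives 0.1%) and, purely for illustration, a 64M emergency reserve, chosen
so that the old 25% rule yields the 16M step quoted above.
watermark_step() is a hypothetical name.

```python
GIB = 2**30
MIB = 2**20


def watermark_step(managed_bytes: int, min_reserve_bytes: int,
                   scale_factor: int = 10) -> int:
    """Distance between the min/low and low/high watermarks, in bytes.

    scale_factor is in units of 0.01% of memory, so 10 means 0.1%.
    The result never drops below the old rule, 25% of the reserve.
    """
    old_step = min_reserve_bytes // 4
    scaled_step = managed_bytes * scale_factor // 10000
    return max(old_step, scaled_step)


# Assumed example values: 140 GiB of memory, 64 MiB emergency reserve.
old_step_mib = (64 * MIB // 4) // MIB
new_step_mib = watermark_step(140 * GIB, 64 * MIB) // MIB
```

With these inputs old_step_mib is 16 and new_step_mib is 143. On a small
machine the 25% floor wins, so the step never shrinks below the old value.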

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 795ae7a0de6b834a0cc202aa55c190ef81496665)

Orabug: 26643957

Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Todd Vierling <todd.vierling@oracle.com>
7 years agoctf: delete the deduplication blacklist
Nick Alcock [Thu, 7 Sep 2017 09:10:10 +0000 (10:10 +0100)]
ctf: delete the deduplication blacklist

This kludge has been automated away.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Orabug: 26765112

7 years agoctf: automate away the deduplication blacklist
Nick Alcock [Wed, 6 Sep 2017 10:45:51 +0000 (11:45 +0100)]
ctf: automate away the deduplication blacklist

The deduplication blacklist in scripts/dwarf2ctf/dedup.blacklist is a
great big kludge.  It contains a list of modules that cannot be
deduplicated because they contain structures which are defined in the
same location in different ways in different kernel modules (usually
because the structure is modified by preprocessor conditionals).  But
augmenting the blacklist is a pig, involving lots of poring over
debugging output to find the structure to focus on.

So automate the problem away, by augmenting type IDs for structures with
the sizeof() the structure in a new component (separated from the others
by //, a component invalid in POSIX pathnames, as usual).  Helpfully
this is made available to us in the DW_AT_byte_size attribute, so it's
fast to obtain.  (The component is optional because opaque structure
declarations obviously cannot include it.)

We adjust the one place that transforms transparent structure IDs into
opaque ones to take this tag into account.

This will still break for structures that are modified by preprocessor
conditionals in such a way that one member is replaced by another with a
different type but which has the same size as the one it replaces
(perhaps one pointer to a structure being replaced by a pointer to a
different structure), but in the interests of dwarf2ctf performance I'm
avoiding solving this for now, since we are not hitting it, and solving
it would require annotating structure IDs with some sort of hash of
their member names: the overhead of recursing over all members every
time we get an ID for a structure seems likely to be quite high, given
how often we look up type_id()s.
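
The idea can be sketched as follows. This is illustrative Python, not the
dwarf2ctf code: the exact component layout of the IDs is an assumption,
and type_id() and opaque_id() here are hypothetical stand-ins.

```python
from typing import Optional


def type_id(location: str, name: str, byte_size: Optional[int] = None) -> str:
    """Build an ID from '//'-separated components (hypothetical layout).

    byte_size is omitted for opaque declarations, which have no size.
    """
    parts = [location, f"struct {name}"]
    if byte_size is not None:
        parts.append(str(byte_size))
    return "//".join(parts)


def opaque_id(transparent_id: str) -> str:
    """Drop the size component to get the matching opaque ID."""
    return "//".join(transparent_id.split("//")[:2])


loc = "include/linux/foo.h:12"       # made-up declaration site
small = type_id(loc, "foo", 24)      # one module's view of struct foo
large = type_id(loc, "foo", 32)      # same site, extra member under #ifdef
same_size = type_id(loc, "foo", 24)  # member swapped for one of equal size
```

Here small and large get distinct IDs and so are no longer conflated,
both map to the same opaque ID, and same_size still collides with small,
which is the remaining case described above.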

This change has no detectable effect on dwarf2ctf runtime, and shrinks
the CTF output by about 40KiB.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Orabug: 26765112

7 years agoctf: drop CONFIG_DT_DISABLE_CTF, ctf.ko, and all that it implies
Nick Alcock [Tue, 5 Sep 2017 21:25:34 +0000 (22:25 +0100)]
ctf: drop CONFIG_DT_DISABLE_CTF, ctf.ko, and all that it implies

Now that CTF is decoupled from the kernel build and built into a
separate archive, there is no longer any need to drag around a
fake ctf.ko module to contain the shared and built-in CTF info.
Drop it, and kernel/ctf/, and the code to autoload it when
dtrace.ko is loaded, and move its Kconfig contents into
lib/Kconfig (which used to include kernel/ctf/Kconfig).

Furthermore, now that CTF is built on demand and not unconditionally
built every time the kernel is, there is no longer any need for
the speedup hack CONFIG_DT_DISABLE_CTF.  Drop it.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
Orabug: 25815362

7 years agoctf: do not allow dwarf2ctf to run as root
Nick Alcock [Wed, 19 Jul 2017 14:44:05 +0000 (15:44 +0100)]
ctf: do not allow dwarf2ctf to run as root

This is just insanely dangerous: with the addition of the CTF_DEBUGDIR
info it reads almost arbitrary DWARF.  elfutils is not root-rated and
frankly neither is dwarf2ctf, valgrind or no valgrind.  It's just too
complicated to risk that way.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Orabug: 25815362

7 years agoctf: decouple CTF building from the kernel build
Nick Alcock [Wed, 19 Jul 2017 14:34:14 +0000 (15:34 +0100)]
ctf: decouple CTF building from the kernel build

This change causes CTF types for the core kernel and modules to be
generated only when the new 'ctf' make target is invoked.  The CTF content
is emitted into a CTF archive with the default name of vmlinux.ctfa (the
name read by DTrace userspace): this can be changed via the CTF_FILENAME
makefile variable.  If 'make ctf' has been run, 'make modules_install'
will install the generated CTF archive into the appropriate place. (If
CTF_FILENAME was specified on the 'make ctf' line, it needs to be passed
to 'make modules_install' as well for this to work.)

The existing link-into-modules machinery is still used for out-of-tree
modules, since these obviously cannot be visible when the vmlinux.ctfa
is built.

Usually the ctf target is invoked by kernel-uek.spec, but it can also
be invoked by developers if they know they have changed type or global
variable info while developing and would like DTrace to be able to
introspect the new data, or if they are building a kernel for the
first time and would like DTrace to be able to see its types at all.
(The archive format is fairly robust: you can often just copy
vmlinux.ctfa from one kernel to another, and types that have not
changed will continue to work with the new kernel.)

This depends on new machinery in libdtrace-ctf 0.7 or higher.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Tomas Jedlicka <tomas.jedlicka@oracle.com>
Reviewed-by: Victor Erminpour <victor.erminpour@oracle.com>
Orabug: 25815362

7 years agooracleasm: Copy the integrity descriptor v4.1.12-111.0.20170918_2215
Martin K. Petersen [Fri, 11 Aug 2017 04:19:43 +0000 (00:19 -0400)]
oracleasm: Copy the integrity descriptor

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Divya Indi <divya.indi@oracle.com>
The original code made assumptions about the oracleasm integrity
descriptor hanging off of check_asm_ioc being mapped. Make sure we
properly copy and validate the descriptor before use.

Orabug: 26559128

7 years agoRDS: IB: Add proxy qp to support FRWR through RDS_GET_MR
Avinash Repaka [Thu, 13 Apr 2017 01:00:05 +0000 (18:00 -0700)]
RDS: IB: Add proxy qp to support FRWR through RDS_GET_MR

MR registration requested through RDS_GET_MR socket option will not have
any connection details. So, there isn't an appropriate qp to post the
registration/invalidation requests. This patch solves that issue by
using a proxy qp.

Orabug: 25669255

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Tested-by: Gerald Gibson <gerald.gibson@oracle.com>
Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
7 years agoRDS: Add support for fast registration work request
Avinash Repaka [Thu, 17 Aug 2017 21:02:47 +0000 (14:02 -0700)]
RDS: Add support for fast registration work request

This patch adds support for MR registration through work request in RDS,
commonly referred to as FRWR/fastreg/FRMR.

With this patch added, RDS chooses the registration method, between FMR
and FRWR, based on the preference given through 'prefer_frwr' module
parameter and the support offered by the underlying device.

Please note that this patch is adding support for MR registration done
only through CMSG. Support for registrations through RDS_GET_MR socket
option will be added through another patch.

Orabug: 22145384

Suggested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Tested-by: Gerald Gibson <gerald.gibson@oracle.com>
Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
7 years agoscsi: qedi: Limit number for CQ queues.
Manish Rangankar [Thu, 10 Aug 2017 13:32:17 +0000 (06:32 -0700)]
scsi: qedi: Limit number for CQ queues.

Orabug: 26759520

[qed_sp_iscsi_func_start:189(host_7-0)]Cannot satisfy CQ amount. Queues
requested 8, CQs available 4. Aborting function start

The above condition is resolved because the management firmware can tell
us the number of CQs available for a given PF; qed communicates that
number to qedi, so that qedi knows how many CQs are allowed.

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: fix another spelling mistake: "alloction" -> "allocation"
Colin Ian King [Mon, 3 Jul 2017 10:24:02 +0000 (11:24 +0100)]
scsi: qedi: fix another spelling mistake: "alloction" -> "allocation"

Orabug: 26759520

Trivial fix to spelling mistake in QEDF_ERR message. I should have also
included this in a previous fix, but I only just spotted this one.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Add ISCSI_BOOT_SYSFS to Kconfig
Nilesh Javali [Wed, 19 Jul 2017 09:07:55 +0000 (02:07 -0700)]
scsi: qedi: Add ISCSI_BOOT_SYSFS to Kconfig

Orabug: 26759520

qedi uses iscsi_boot_sysfs to export the targets used for boot to
sysfs. Select the config option to make sure the module is built.

This addresses the compile time issue,
    drivers/scsi/qedi/qedi_main.o: In function `qedi_remove':
    qedi_main.c:(.text+0x3bbd): undefined reference to `iscsi_boot_destroy_kset'
    drivers/scsi/qedi/qedi_main.o: In function `__qedi_probe.constprop.0':
    qedi_main.c:(.text+0x577a): undefined reference to `iscsi_boot_create_target'
    qedi_main.c:(.text+0x5807): undefined reference to `iscsi_boot_create_target'
    qedi_main.c:(.text+0x587f): undefined reference to `iscsi_boot_create_initiator'
    qedi_main.c:(.text+0x58f3): undefined reference to `iscsi_boot_create_ethernet'
    qedi_main.c:(.text+0x5927): undefined reference to `iscsi_boot_destroy_kset'
    qedi_main.c:(.text+0x5d7b): undefined reference to `iscsi_boot_create_host_kset'

[mkp: fixed whitespace]

Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Fixes: c57ec8fb7c02 ("scsi: qedi: Add support for Boot from SAN over iSCSI offload")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Add support for Boot from SAN over iSCSI offload
Nilesh Javali [Tue, 27 Jun 2017 09:26:56 +0000 (02:26 -0700)]
scsi: qedi: Add support for Boot from SAN over iSCSI offload

Orabug: 26759520

This patch adds support for Boot from SAN over iSCSI offload. The iSCSI
boot information in the NVRAM is populated under
/sys/firmware/iscsi_bootX/ using qed NVM-image reading API and further
exported to open-iscsi to perform iSCSI login enabling boot over offload
iSCSI interface in a Boot from SAN environment.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Remove WARN_ON from clear task context.
Manish Rangankar [Thu, 15 Jun 2017 07:10:40 +0000 (00:10 -0700)]
scsi: qedi: Remove WARN_ON from clear task context.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Remove WARN_ON for untracked cleanup.
Manish Rangankar [Thu, 15 Jun 2017 07:10:39 +0000 (00:10 -0700)]
scsi: qedi: Remove WARN_ON for untracked cleanup.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Remove comparison of u16 idx with zero.
Christos Gkekas [Sat, 24 Jun 2017 16:24:45 +0000 (17:24 +0100)]
scsi: qedi: Remove comparison of u16 idx with zero.

Orabug: 26759520

Variable idx is defined as u16, so the condition (idx < 0) is always
false and should be removed.
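
A quick way to see this is to model the 16-bit unsigned wraparound. The
sketch below is illustrative Python using ctypes.c_uint16; as_u16() is a
hypothetical helper, not anything from the driver.

```python
import ctypes


def as_u16(value: int) -> int:
    """Wrap an integer the way a store to a u16 variable would."""
    return ctypes.c_uint16(value).value


idx = as_u16(-1)  # a "negative" value stored into a u16 wraps around

# No value representable in a u16 ever compares below zero.
always_false = any(as_u16(v) < 0 for v in (-65536, -1, 0, 1, 65535, 65536))
```

In this model idx ends up as 65535 rather than -1, and always_false is
False for every input tried.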

Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Fix return code in qedi_ep_connect()
Dan Carpenter [Wed, 12 Jul 2017 07:31:21 +0000 (10:31 +0300)]
scsi: qedi: Fix return code in qedi_ep_connect()

Orabug: 26759520

We shouldn't be writing over the "ret" variable.  It means we return
ERR_PTR(0) which is NULL and it results in a NULL dereference in the
caller.
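
The failure mode can be modelled outside the kernel. The sketch below is
illustrative Python that treats pointers as 64-bit unsigned integers; it
mimics the ERR_PTR()/IS_ERR() convention and is not the kernel's
implementation.

```python
MAX_ERRNO = 4095
PTR_MASK = 2**64 - 1  # model 64-bit pointers as unsigned integers
NULL = 0
ENOMEM = 12


def ERR_PTR(error: int) -> int:
    """Encode a negative errno as a pointer value."""
    return error & PTR_MASK


def IS_ERR(ptr: int) -> bool:
    """True only for values in the top MAX_ERRNO addresses."""
    return ptr >= (-MAX_ERRNO & PTR_MASK)


good = ERR_PTR(-ENOMEM)  # a real error code is detectable
bad = ERR_PTR(0)         # an overwritten "ret" of 0 encodes as NULL
```

Here IS_ERR(good) is true, but bad equals NULL and IS_ERR(bad) is false,
so a caller that only checks IS_ERR() goes on to dereference NULL.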

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Fix endpoint NULL panic during recovery.
manish.rangankar@cavium.com [Fri, 19 May 2017 08:33:21 +0000 (01:33 -0700)]
scsi: qedi: Fix endpoint NULL panic during recovery.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: set max_fin_rt default value
Nilesh Javali [Fri, 19 May 2017 08:33:20 +0000 (01:33 -0700)]
scsi: qedi: set max_fin_rt default value

Orabug: 26759520

max_fin_rt is the maximum number of retransmissions of FIN packets
as part of the termination flow. After reaching this value
the FW will send a single RESET.

Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Set firmware tcp msl timer value.
manish.rangankar@cavium.com [Fri, 19 May 2017 08:33:19 +0000 (01:33 -0700)]
scsi: qedi: Set firmware tcp msl timer value.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Fix endpoint NULL panic in qedi_set_path.
manish.rangankar@cavium.com [Fri, 19 May 2017 08:33:18 +0000 (01:33 -0700)]
scsi: qedi: Fix endpoint NULL panic in qedi_set_path.

Orabug: 26759520

RIP: 0010:qedi_set_path+0x114/0x570 [qedi]
 Call Trace:
  [<ffffffffa0472923>] iscsi_if_recv_msg+0x623/0x14a0
  [<ffffffff81307de6>] ? rhashtable_lookup_compare+0x36/0x70
  [<ffffffffa047382e>] iscsi_if_rx+0x8e/0x1f0
  [<ffffffff8155983d>] netlink_unicast+0xed/0x1b0
  [<ffffffff81559c30>] netlink_sendmsg+0x330/0x770
  [<ffffffff81510d60>] sock_sendmsg+0xb0/0xf0
  [<ffffffff8101360b>] ? __switch_to+0x17b/0x4b0
  [<ffffffff8163a2c8>] ? __schedule+0x2d8/0x900
  [<ffffffff81511199>] ___sys_sendmsg+0x3a9/0x3c0
  [<ffffffff810e2298>] ? get_futex_key+0x1c8/0x2b0
  [<ffffffff810e25a0>] ? futex_wake+0x80/0x160

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Set dma_boundary to 0xfff.
manish.rangankar@cavium.com [Fri, 19 May 2017 08:33:17 +0000 (01:33 -0700)]
scsi: qedi: Set dma_boundary to 0xfff.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Correctly set firmware max supported BDs.
manish.rangankar@cavium.com [Fri, 19 May 2017 08:33:16 +0000 (01:33 -0700)]
scsi: qedi: Correctly set firmware max supported BDs.

Orabug: 26759520

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoscsi: qedi: Fix bad pte call trace when iscsiuio is stopped.
Arun Easi [Fri, 19 May 2017 08:33:15 +0000 (01:33 -0700)]
scsi: qedi: Fix bad pte call trace when iscsiuio is stopped.

Orabug: 26759520

munmap done by iscsiuio during a stop of the service triggers a "bad
pte" warning sometimes. munmap kernel path goes through the mmapped
pages and has a validation check for mapcount (in struct page) to be
zero or above. kzalloc, which we had used to allocate udev->ctrl, uses
slab allocations, which reuse mapcount (union) for other purposes that
can make the mapcount look negative. Avoid all this trouble by using one
of the __get_free_pages wrappers instead of kzalloc for udev->ctrl.

 BUG: Bad page map in process iscsiuio  pte:80000000aa624067 pmd:3e6777067
 page:ffffea0002a98900 count:2 mapcount:-2143289280
     mapping: (null) index:0xffff8800aa624e00
 page flags: 0x10075d00000090(dirty|slab)
 page dumped because: bad pte
 addr:00007fcba70a3000 vm_flags:0c0400fb anon_vma: (null)
     mapping:ffff8803edf66e90 index:0

 Call Trace:
     dump_stack+0x19/0x1b
     print_bad_pte+0x1af/0x250
     unmap_page_range+0x7a7/0x8a0
     unmap_single_vma+0x81/0xf0
     unmap_vmas+0x49/0x90
     unmap_region+0xbe/0x140
     ? vma_rb_erase+0x121/0x220
     do_munmap+0x245/0x420
     vm_munmap+0x41/0x60
     SyS_munmap+0x22/0x30
     tracesys+0xdd/0xe2

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Fix build errors.
Somasundaram Krishnasamy [Wed, 13 Sep 2017 22:58:18 +0000 (15:58 -0700)]
qed: Fix build errors.

Orabug: 26783820

Fix build errors caused by the below upstream patches.

7b6859fbdcc4a590c8ef03bcc00d770b42d41c42 qed: Utilize FW 8.20.0.0
712c3cbf193fcadf0ba67da61432beb1a71e400b qed: Replace set_id() api with set_name()

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoconfig: add CONFIG_INFINIBAND_QEDR
Brian Maly [Fri, 15 Sep 2017 18:38:35 +0000 (14:38 -0400)]
config: add CONFIG_INFINIBAND_QEDR

Orabug: 26759520

Add but do not enable CONFIG_INFINIBAND_QEDR for now.

Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: fix spelling mistake: "calescing" -> "coalescing"
Colin Ian King [Wed, 30 Aug 2017 11:40:12 +0000 (12:40 +0100)]
qed: fix spelling mistake: "calescing" -> "coalescing"

Orabug: 26783820

Trivial fix to spelling mistake in DP_NOTICE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
Christophe Jaillet [Sun, 6 Aug 2017 22:00:17 +0000 (00:00 +0200)]
qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'

Orabug: 26783820

We allocate 'p_info->mfw_mb_cur' and 'p_info->mfw_mb_shadow' but we check
'p_info->mfw_mb_addr' instead of 'p_info->mfw_mb_cur'.

'p_info->mfw_mb_addr' is never 0, because it is initialized a few lines
above in 'qed_load_mcp_offsets()'.

Update the test and check the result of the 2 'kzalloc()' instead.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit eb2a6b800c2d1336cce6709d48c42753a611c07b ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: enhanced per queue max coalesce value.
Rahul Verma [Wed, 26 Jul 2017 13:07:15 +0000 (06:07 -0700)]
qed: enhanced per queue max coalesce value.

Orabug: 26783820

Maximum coalesce per Rx/Tx queue is extended from
255 to 511.

Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Read per queue coalesce from hardware
Rahul Verma [Wed, 26 Jul 2017 13:07:14 +0000 (06:07 -0700)]
qed: Read per queue coalesce from hardware

Orabug: 26783820

Retrieve the actual coalesce value from hardware for every Rx/Tx
queue, instead of Rx/Tx coalesce value cached during set coalesce.

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add support for vf coalesce configuration.
Rahul Verma [Wed, 26 Jul 2017 13:07:13 +0000 (06:07 -0700)]
qed: Add support for vf coalesce configuration.

Orabug: 26783820

This patch adds ethtool support for setting the Rx/Tx coalesce
value on the Rx/Tx queues associated with a VF.

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqede: Add ethtool support for Energy efficient ethernet.
Sudarsana Reddy Kalluru [Wed, 26 Jul 2017 13:07:12 +0000 (06:07 -0700)]
qede: Add ethtool support for Energy efficient ethernet.

Orabug: 26783820

The patch adds ethtool callback implementations for querying/configuring
the Energy Efficient Ethernet (EEE) parameters.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add support for Energy efficient ethernet.
Sudarsana Reddy Kalluru [Wed, 26 Jul 2017 13:07:11 +0000 (06:07 -0700)]
qed: Add support for Energy efficient ethernet.

Orabug: 26783820

The patch adds required driver support for reading/configuring the
Energy Efficient Ethernet (EEE) parameters.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed/qede: Add setter APIs support for RX flow classification
Chopra, Manish [Wed, 26 Jul 2017 13:07:10 +0000 (06:07 -0700)]
qed/qede: Add setter APIs support for RX flow classification

Orabug: 26783820

This patch adds support for adding and deleting RX flow
classification rules. With this, a user can classify RX flows
consisting of TCP/UDP 4-tuples [src_ip/dst_ip and src_port/dst_port]
to be steered to a given RX queue.
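
Conceptually, such rules form a table keyed by the 4-tuple. The sketch
below is illustrative Python, not the qed/qede data structures: FlowKey
and FlowTable are hypothetical names, and the addresses are documentation
examples.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    proto: str  # "tcp" or "udp"
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


class FlowTable:
    """Maps a TCP/UDP 4-tuple to the RX queue it should be steered to."""

    def __init__(self) -> None:
        self._rules: Dict[FlowKey, int] = {}

    def add_rule(self, key: FlowKey, rx_queue: int) -> None:
        self._rules[key] = rx_queue

    def del_rule(self, key: FlowKey) -> None:
        self._rules.pop(key, None)

    def classify(self, key: FlowKey) -> Optional[int]:
        """Return the target queue, or None if no rule matches."""
        return self._rules.get(key)


table = FlowTable()
flow = FlowKey("tcp", "192.0.2.1", "192.0.2.2", 40000, 3260)
table.add_rule(flow, rx_queue=3)
```

A lookup with the same 4-tuple returns queue 3; after del_rule() the flow
is no longer classified and classify() returns None.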

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqede: Add getter APIs support for RX flow classification
Chopra, Manish [Wed, 26 Jul 2017 13:07:09 +0000 (06:07 -0700)]
qede: Add getter APIs support for RX flow classification

Orabug: 26783820

This patch adds support for ethtool getter APIs to query
RX flow classification rules.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Fix printk option passed when printing ipv6 addresses
Kalderon, Michal [Sun, 9 Jul 2017 10:00:16 +0000 (13:00 +0300)]
qed: Fix printk option passed when printing ipv6 addresses

Orabug: 26783820

The option "h" (host order) exists for ipv4 only.
Remove the h when printing ipv6 addresses.

This led to the following smatch warnings:

drivers/net/ethernet/qlogic/qed/qed_iwarp.c:585 qed_iwarp_print_tcp_ramrod()
warn: '%pI6' can only be followed by c
drivers/net/ethernet/qlogic/qed/qed_iwarp.c:1521 qed_iwarp_print_cm_info()
warn: '%pI6' can only be followed by c

Fixes commit 456a584947d5 ("qed: iWARP CM add passive side connect")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 91d1ae475b9833097e078c2581c9265d033cdbe4 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: initialize ll2_syn_handle at start of function
Michal Kalderon [Mon, 3 Jul 2017 18:55:25 +0000 (21:55 +0300)]
qed: initialize ll2_syn_handle at start of function

Orabug: 26783820

Fix compilation warning
qed_iwarp.c:1721:5: warning: ll2_syn_handle may be used
uninitialized in this function

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 25f4535a94c2b38d09912d7e8bab371c9e97be38 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add iWARP support for physical queue allocation
Kalderon, Michal [Sun, 2 Jul 2017 07:29:32 +0000 (10:29 +0300)]
qed: Add iWARP support for physical queue allocation

Orabug: 26783820

iWARP has different physical queue requirements than RoCE

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 93c45984d385bddf156735991ee0cd15c0753e4d ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add iWARP protocol support in context allocation
Kalderon, Michal [Sun, 2 Jul 2017 07:29:31 +0000 (10:29 +0300)]
qed: Add iWARP protocol support in context allocation

Orabug: 26783820

When computing how much memory is required for the different hw clients,
the iWARP protocol should be taken into account.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 5d7dc9620d35ce1f503e2062aa324447e557e3f2 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP CM add error handling
Kalderon, Michal [Sun, 2 Jul 2017 07:29:30 +0000 (10:29 +0300)]
qed: iWARP CM add error handling

Orabug: 26783820

This patch introduces error handling for errors that occurred during
connection establishment.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 9816b614346925feac1198e33d2dc5201c4ef74e ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP implement disconnect flows
Kalderon, Michal [Sun, 2 Jul 2017 07:29:29 +0000 (10:29 +0300)]
qed: iWARP implement disconnect flows

Orabug: 26783820

This patch takes care of active/passive disconnect flows.
Disconnect flows can be initiated remotely, in which case an async event
will arrive from the peer and be indicated to the qedr driver. These
are referred to as exceptions. When a QP is destroyed, it needs to check
that its associated ep has been closed.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit fc4c6065e661224df3db50780219ac53fee56e2b ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP CM add active side connect
Kalderon, Michal [Sun, 2 Jul 2017 07:29:28 +0000 (10:29 +0300)]
qed: iWARP CM add active side connect

Orabug: 26783820

This patch implements the active side connect.
Offload a connection, process MPA reply and send RTR.
In some of the common passive/active functions, the active side
will work in blocking mode.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 4b0fdd7c8b757125ac7996617d914bbdb9e0348c ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP CM add passive side connect
Kalderon, Michal [Sun, 2 Jul 2017 07:29:27 +0000 (10:29 +0300)]
qed: iWARP CM add passive side connect

Orabug: 26783820

This patch implements the passive side connect.
It addresses pre-allocating resources, creating a connection
element upon valid SYN packet received. Calling upper layer and
implementation of the accept/reject calls.

Error handling is not part of this patch.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 456a584947d5b92d5e5a62cc68125ab5f150aa8c ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP CM add listener functions and initial SYN processing
Kalderon, Michal [Sun, 2 Jul 2017 07:29:26 +0000 (10:29 +0300)]
qed: iWARP CM add listener functions and initial SYN processing

Orabug: 26783820

This patch adds the ability to add and remove listeners and identify
whether the SYN packet received is intended for iWARP or not. If
a listener is not found the SYN packet is posted back to the chip.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 65a91a6cdb868a28b919ca133c0f9d9dfd9a635a ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: iWARP CM - setup a ll2 connection for handling SYN packets
Kalderon, Michal [Sun, 2 Jul 2017 07:29:25 +0000 (10:29 +0300)]
qed: iWARP CM - setup a ll2 connection for handling SYN packets

Orabug: 26783820

iWARP handles incoming SYN packets using the ll2 interface. This patch
implements ll2 setup and teardown. Additional ll2 connections will
be used in the future which are not part of this patch series.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit b5c29ca7dab75f29a7df6e82285742f830d8ed1a ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add iWARP support in ll2 connections
Kalderon, Michal [Sun, 2 Jul 2017 07:29:24 +0000 (10:29 +0300)]
qed: Add iWARP support in ll2 connections

Orabug: 26783820

Add a new connection type for iWARP ll2 connections for setting
correct ll2 filters and connection type to FW.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit cc4ad324e7e247bb4979791dd4f2ff11419d9742 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Rename some ll2 related defines
Kalderon, Michal [Sun, 2 Jul 2017 07:29:23 +0000 (10:29 +0300)]
qed: Rename some ll2 related defines

Orabug: 26783820

Make some names more generic as they will be used by iWARP too.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 526d1d05e456c9cfc077694d18b5f521e2338f18 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Implement iWARP initialization, teardown and qp operations
Kalderon, Michal [Sun, 2 Jul 2017 07:29:22 +0000 (10:29 +0300)]
qed: Implement iWARP initialization, teardown and qp operations

Orabug: 26783820

This patch adds iWARP support for flows that have common code
between RoCE and iWARP, such as initialization, teardown and
qp setup verbs: create, destroy, modify, query.
It introduces the iWARP specific files qed_iwarp.[ch] and
iwarp_common.h

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 67b40dccc45ff5d488aad17114e80e00029fd854 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Introduce iWARP personality
Kalderon, Michal [Sun, 2 Jul 2017 07:29:21 +0000 (10:29 +0300)]
qed: Introduce iWARP personality

Orabug: 26783820

iWARP personality introduced the need for differentiating in several
places in the code whether we are RoCE, iWARP or either. This
leads to introducing new macros for querying the personality.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit c851a9dc4359c6b19722de568e9f543c1c23481c ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed*: qede_roce.[ch] -> qede_rdma.[ch]
Michal Kalderon [Tue, 20 Jun 2017 13:00:03 +0000 (16:00 +0300)]
qed*: qede_roce.[ch] -> qede_rdma.[ch]

Orabug: 26783820

Once we have iWARP support, the qede portion of the qedr<->qede would
serve all the RDMA protocols - so rename the file to be appropriate
to its function.

While we're at it, we're also moving a couple of inclusions to it into
.h files and adding includes to make sure it contains all type
definitions it requires.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit b262a06e642cfb1eeb6c2c772f76dad674ada57e ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Disable RoCE dpm when DCBx change occurs
Mintz, Yuval [Tue, 20 Jun 2017 13:00:02 +0000 (16:00 +0300)]
qed: Disable RoCE dpm when DCBx change occurs

Orabug: 26783820

If DCBx update occurs while QPs are open, stop sending edpms until all
QPs are closed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 9331dad1bb7f3438c27e4f57136b6ad683d11fe0 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: RoCE EDPM to honor PFC
Mintz, Yuval [Tue, 20 Jun 2017 13:00:01 +0000 (16:00 +0300)]
qed: RoCE EDPM to honor PFC

Orabug: 26783820

Configure device according to DCBx results so that EDPMs
made by RoCE would honor flow-control.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 26462ad9c7ea18643f1a37adeab8b7eff6c5f5f4 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Chain support for external PBL
Mintz, Yuval [Tue, 20 Jun 2017 13:00:00 +0000 (16:00 +0300)]
qed: Chain support for external PBL

Orabug: 26783820

iWARP would require the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 1a4a69751f4d24ffd3530f5a9694636db1566a3b ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Fix an off by one bug
Dan Carpenter [Wed, 14 Jun 2017 09:10:10 +0000 (12:10 +0300)]
qed: Fix an off by one bug

Orabug: 26783820

The p_l2_info->pp_qid_usage[] array has "p_l2_info->queues" elements so
the > here should be a >= or we write beyond the end of the array.
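The bounds check described above can be sketched in isolation. This is a minimal illustration and not the driver code: DEMO_QUEUES, demo_usage and the demo_* helpers are hypothetical names standing in for "p_l2_info->queues" and the pp_qid_usage[] array.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUES 4u	/* hypothetical array length */

/* Hypothetical stand-in for an array with DEMO_QUEUES elements. */
static unsigned long demo_usage[DEMO_QUEUES];

/* Buggy form: '>' accepts qid == DEMO_QUEUES, one past the last element. */
bool demo_qid_valid_buggy(size_t qid)
{
	return !(qid > DEMO_QUEUES);
}

/* Fixed form: '>=' rejects every index outside 0..DEMO_QUEUES-1. */
bool demo_qid_valid(size_t qid)
{
	return !(qid >= DEMO_QUEUES);
}

/* Marks a queue as used only when the index is inside the array. */
bool demo_mark_used(size_t qid)
{
	if (qid >= DEMO_QUEUES)
		return false;
	demo_usage[qid] = 1;
	return true;
}
```

With four elements the valid indices are 0 to 3, so an index of 4 passes the buggy check but is rejected by the fixed one.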

Fixes: bbe3f233ec5e ("qed: Assign a unique per-queue index to queue-cid")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 0331402aeaefe858709b0a4d44ade15f82d3a119 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: add qed_int_sb_init() stub function
Arnd Bergmann [Fri, 9 Jun 2017 10:37:35 +0000 (12:37 +0200)]
qed: add qed_int_sb_init() stub function

Orabug: 26783820

When CONFIG_QED_SRIOV is disabled, we get a build error:

drivers/net/ethernet/qlogic/qed/qed_int.c: In function 'qed_int_sb_init':
drivers/net/ethernet/qlogic/qed/qed_int.c:1499:4: error: implicit declaration of function 'qed_vf_set_sb_info'; did you mean 'qed_mcp_get_resc_info'? [-Werror=implicit-function-declaration]

All the other declarations have a 'static inline' stub as an alternative
here, so this adds one more for qed_int_sb_init.
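The 'static inline' stub pattern mentioned above might look roughly like the sketch below. All names are hypothetical (DEMO_CONFIG_SRIOV stands in for the real config option, demo_vf_set_sb_info for an SR-IOV-only helper), and the stub body is an assumption about what "nothing to do" means here.

```c
/*
 * Hypothetical header section: when the feature is compiled out, callers
 * still see a declaration, so no implicit-declaration error occurs.
 */
#ifdef DEMO_CONFIG_SRIOV
/* Real implementation lives in a file that is only built with SR-IOV. */
int demo_vf_set_sb_info(int sb_id);
#else
/* Stub used when SR-IOV support is disabled: does nothing, reports success. */
static inline int demo_vf_set_sb_info(int sb_id)
{
	(void)sb_id;
	return 0;
}
#endif

/* Common code can call the helper unconditionally in both configurations. */
int demo_sb_init(int sb_id, int is_vf)
{
	if (is_vf)
		return demo_vf_set_sb_info(sb_id);
	return 0;
}
```

The caller needs no #ifdef of its own, which is the usual reason for keeping such stubs next to the real declarations.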

Fixes: 50a207147fce ("qed: Hold a single array for SBs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 2f3ca449a4f9a54d2bf39c873269e68ad5f34acb ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: collect GSI port statistics
Mintz, Yuval [Fri, 9 Jun 2017 14:13:25 +0000 (17:13 +0300)]
qed: collect GSI port statistics

Orabug: 26783820

The LL2 statistics already have place holders for these, but haven't
populated them so far.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit fef1c3f7ac119217f49c72d4cc5413b4c87c1774 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Call rx_release_cb() when flushing LL2
Mintz, Yuval [Fri, 9 Jun 2017 14:13:24 +0000 (17:13 +0300)]
qed: Call rx_release_cb() when flushing LL2

Orabug: 26783820

Driver to inform the connection owner that its buffers are being
released as part of a flush.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 54f19f07acb6f9a0e90a183a2fb347ed3856b154 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: No need for LL2 frags indication
Mintz, Yuval [Fri, 9 Jun 2017 14:13:23 +0000 (17:13 +0300)]
qed: No need for LL2 frags indication

Orabug: 26783820

This is a legacy leftover; there's no current flow where 'frags_mapped'
would be set.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit d2201a21598aa6ad47e23272119bc29e48201670 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed*: LL2 callback operations
Michal Kalderon [Fri, 9 Jun 2017 14:13:22 +0000 (17:13 +0300)]
qed*: LL2 callback operations

Orabug: 26783820

LL2 today is interrupt driven - when tx/rx completion arrives [or any
other indication], qed needs to operate on the connection and pass
the information to the protocol-driver [or internal qed consumer].
Since we have several flavors of ll2 employed by the driver,
each handler needs to do an if-else to determine the right functionality
to use based on the connection type.

In order to make things more scalable [given that we're going to add
additional types of ll2 flavors] move the infrastructure into using
a callback-based approach - the callbacks would be provided as part
of the connection's initialization parameters.
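A callback-based design of this kind can be sketched as follows. This is an illustration under assumptions, not the qed interface: the demo_ll2_* structures and functions are hypothetical, and only an Rx and a Tx completion callback are shown.

```c
#include <stddef.h>

/* Hypothetical callback table supplied by the connection owner. */
struct demo_ll2_cbs {
	void (*rx_comp)(void *cookie, size_t len);	/* Rx completion */
	void (*tx_comp)(void *cookie);			/* Tx completion */
	void *cookie;					/* owner's private data */
};

/* Hypothetical connection: it just remembers the owner's callbacks. */
struct demo_ll2_conn {
	struct demo_ll2_cbs cbs;
};

/* Callbacks arrive as part of the connection's initialization parameters. */
void demo_ll2_acquire(struct demo_ll2_conn *conn,
		      const struct demo_ll2_cbs *cbs)
{
	conn->cbs = *cbs;
}

/* The handler no longer branches on connection type; it calls the owner. */
void demo_ll2_handle_rx(struct demo_ll2_conn *conn, size_t len)
{
	if (conn->cbs.rx_comp)
		conn->cbs.rx_comp(conn->cbs.cookie, len);
}

/* Example owner: counts received bytes in its private structure. */
struct demo_stats {
	size_t rx_bytes;
};

void demo_count_rx(void *cookie, size_t len)
{
	((struct demo_stats *)cookie)->rx_bytes += len;
}
```

Adding a new flavor then means supplying a new set of callbacks rather than extending an if-else chain in every handler.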

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 0518c12f1f79dc2f2020836974c577404e42ae89 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: LL2 code relocations
Mintz, Yuval [Fri, 9 Jun 2017 14:13:21 +0000 (17:13 +0300)]
qed: LL2 code relocations

Orabug: 26783820

Instead of having the OOO logic packed together, divide it with the rest of the code
according to establish/release flows.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 58de289807f02122ef7eca96e50365d2c1440902 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Cleaner seperation of LL2 inputs
Mintz, Yuval [Fri, 9 Jun 2017 14:13:20 +0000 (17:13 +0300)]
qed: Cleaner seperation of LL2 inputs

Orabug: 26783820

An LL2 connection [qed_ll2_info] has a sub-structure of type qed_ll2_conn
that contains various inputs for ll2 acquisition, but the connection also
utilizes a couple of other inputs.

Restructure the input structure to include all the inputs and refactor
the code necessary to populate those.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 13c547717231aad7e1635004ae3f698e5e78d6d1 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Revise ll2 Rx completion
Mintz, Yuval [Fri, 9 Jun 2017 14:13:19 +0000 (17:13 +0300)]
qed: Revise ll2 Rx completion

Orabug: 26783820

This introduces qed_ll2_comp_rx_data as a public struct
and moves handling of Rx packets in LL2 into using it.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 68be910cd2fa3f58587438af7ce3def6e03731fa ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: LL2 to use packed information for tx
Mintz, Yuval [Fri, 9 Jun 2017 14:13:18 +0000 (17:13 +0300)]
qed: LL2 to use packed information for tx

Orabug: 26783820

First step in revising the LL2 interface, this declares
qed_ll2_tx_pkt_info as part of the ll2 interface, and uses it for
transmission instead of receiving lots of parameters.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 7c7973b2ae277c6e89dceda2246fff2472c8ffdb ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: VFs to try utilizing the doorbell bar
Mintz, Yuval [Sun, 4 Jun 2017 10:31:07 +0000 (13:31 +0300)]
qed: VFs to try utilizing the doorbell bar

Orabug: 26783820

VFs are currently not mapping their doorbell bar, instead relying
on the small doorbell window they have in their limited regview bar.

In order to increase the number of possible Tx connections [queues]
employed by VF past 16, we need to start using the doorbell bar if
one such is exposed - VF would communicate this fact to PF which would
return the size-bar internally configured into chip, according to
which the VF would decide whether to actually utilize the doorbell
bar.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 1a850bfc9e71871599ddbc0d4d4cffa2dc409855 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: IOV db support multiple queues per qzone
Mintz, Yuval [Sun, 4 Jun 2017 10:31:05 +0000 (13:31 +0300)]
qed: IOV db support multiple queues per qzone

Orabug: 26783820

Allow the infrastructure a PF maintains for each one of its VFs
to support multiple queue-cids on a single queue-zone.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 007bc37179c14a6d1ff1545695e2492b3a376bc1 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Make VF legacy a bitfield
Mintz, Yuval [Sun, 4 Jun 2017 10:31:04 +0000 (13:31 +0300)]
qed: Make VF legacy a bitfield

Orabug: 26783820

Until now we used to have a single VF legacy compatibility mode,
one that affected the place of the Rx producers of those VFs [mostly].

As PF would soon support allocating CIDs for VFs instead of having
a static CID<->queue configuration for them, we'll need to have
an additional legacy mode since existing VFs would need to continue
on using the older mode of operation.

Change the infrastructure so that the legacy would be able to indicate
which of the legacy behaviors is needed for a given VF.
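Turning a single legacy mode into a bitfield could look like the sketch below. The flag names and helpers are hypothetical and chosen only to mirror the two behaviors the message mentions (Rx producer placement and static CIDs); the real definitions are not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags: one bit per legacy behavior. */
#define DEMO_LEGACY_RX_PROD	(1u << 0)	/* old Rx producer location */
#define DEMO_LEGACY_CID		(1u << 1)	/* static CID<->queue mapping */

/* Builds the per-VF legacy mask from what the VF is known to need. */
uint8_t demo_vf_legacy(bool old_rx_prod, bool static_cids)
{
	uint8_t legacy = 0;

	if (old_rx_prod)
		legacy |= DEMO_LEGACY_RX_PROD;
	if (static_cids)
		legacy |= DEMO_LEGACY_CID;
	return legacy;
}

/* Each code path tests only the behavior it cares about. */
bool demo_vf_has_legacy(uint8_t legacy, uint8_t flag)
{
	return (legacy & flag) != 0;
}
```

Because each behavior has its own bit, a VF can need one legacy behavior without implying the other.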

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 3b19f47820756f9905e7ef184747fbb3c8ed062f ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Assign a unique per-queue index to queue-cid
Mintz, Yuval [Sun, 4 Jun 2017 10:31:03 +0000 (13:31 +0300)]
qed: Assign a unique per-queue index to queue-cid

Orabug: 26783820

When a queue-cid is allocated, assign an index inside that
CID's queue-zone.

For PFs and VFs, this number is going to be unique and derived
from a per-queue-zone bitmap, while for PF's VFs queues the
number is currently going to be constant; later, we'd add the
capability of a VF to communicate such an index to its PF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit bbe3f233ec5ea99049f33471c0c0d0d2a78e2116 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Pass vf_params when creating a queue-cid
Mintz, Yuval [Sun, 4 Jun 2017 10:31:02 +0000 (13:31 +0300)]
qed: Pass vf_params when creating a queue-cid

Orabug: 26783820

We're going to need additional information for queue-cids
that a PF creates for its VFs, so start by refactoring existing
logic used for initializing said struct into receiving a structure
encapsulating the VF-specific information that needs to be provided.

This also introduces QED_QUEUE_CID_SELF - each queue-cid would hold
an indication to whether it belongs to the hw-function holding it
[whether that's a PF or a VF], or else what's the VF id it belongs
to.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 3946497aff655b9bb1807ef7e2ecbe799e6d832a ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed*: L2 interface to use the SB structures directly
Mintz, Yuval [Sun, 4 Jun 2017 10:31:01 +0000 (13:31 +0300)]
qed*: L2 interface to use the SB structures directly

Orabug: 26783820

Part of an effort of a cleaner separation between qed and the protocol
drivers, the L2 interface is to use the SB structure for initialization
purposes opaquely.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit f604b17d7fdef574792a7e0b39f1b926d6b43d9d ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Create L2 queue database
Mintz, Yuval [Sun, 4 Jun 2017 10:31:00 +0000 (13:31 +0300)]
qed: Create L2 queue database

Orabug: 26783820

First step in allowing a single PF/VF to open multiple queues on
the same queue zone is to add per-hwfn database of queue-cids
as a two-dimensional array where entry would be according to
[queue zone][internal index].
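A two-dimensional database indexed by [queue zone][internal index] can be sketched as below. The sizes, the demo_* names and the first-free-slot policy are assumptions made for illustration; the driver's actual dimensions and allocation rules are not given in this message.

```c
#include <stddef.h>

#define DEMO_QUEUE_ZONES	4u	/* hypothetical number of queue zones */
#define DEMO_CIDS_PER_ZONE	2u	/* hypothetical queue-cids per zone */

struct demo_queue_cid {
	unsigned int cid;
};

/* Per-function database: entry is found by [queue zone][internal index]. */
struct demo_l2_info {
	struct demo_queue_cid *cids[DEMO_QUEUE_ZONES][DEMO_CIDS_PER_ZONE];
};

/* Stores qc in the first free slot of the zone; returns its index or -1. */
int demo_l2_add(struct demo_l2_info *info, size_t zone,
		struct demo_queue_cid *qc)
{
	size_t idx;

	if (zone >= DEMO_QUEUE_ZONES)
		return -1;
	for (idx = 0; idx < DEMO_CIDS_PER_ZONE; idx++) {
		if (!info->cids[zone][idx]) {
			info->cids[zone][idx] = qc;
			return (int)idx;
		}
	}
	return -1;	/* zone is full */
}

/* Looks up an entry; returns NULL when out of range or unused. */
struct demo_queue_cid *demo_l2_get(const struct demo_l2_info *info,
				   size_t zone, size_t idx)
{
	if (zone >= DEMO_QUEUE_ZONES || idx >= DEMO_CIDS_PER_ZONE)
		return NULL;
	return info->cids[zone][idx];
}
```

Several queue-cids can then share one queue zone, each distinguished by its internal index.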

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 0db711bb26209992da375730eab6b3cec1edee7a ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add bitmaps for VF CIDs
Mintz, Yuval [Sun, 4 Jun 2017 10:30:59 +0000 (13:30 +0300)]
qed: Add bitmaps for VF CIDs

Orabug: 26783820

Each PF has a bitmap for its own ranges of CIDs, to allow easy grabbing
of an available CID when such is needed. But VFs are not using the same
mechanism, instead relying on hard-coded CIDs [ queue-index == cid ].

As an infrastructure step toward increasing number of CIDs of VFs,
the PF is going to maintain bitmaps for the VF CIDs as well -
the bitmaps would be per-VF and the ranges would be the same [in HW all
VFs of a given PF have the same mapping of CIDs, and the HW is capable
of distinguishing between those according to the VF index]
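The per-VF bitmap idea can be sketched as follows. The sizes and demo_* names are hypothetical, and a single byte per VF is used only to keep the example short; the kernel would normally use its own bitmap helpers.

```c
#include <stdint.h>

#define DEMO_NUM_VFS		2u	/* hypothetical number of VFs */
#define DEMO_CIDS_PER_VF	8u	/* hypothetical CID range, same for every VF */

/* One bitmap per VF; a set bit means that CID is in use by that VF. */
static uint8_t demo_vf_cid_map[DEMO_NUM_VFS];

/* Grabs the lowest free CID of the given VF; returns it or -1 if none. */
int demo_vf_cid_acquire(unsigned int vf)
{
	unsigned int bit;

	if (vf >= DEMO_NUM_VFS)
		return -1;
	for (bit = 0; bit < DEMO_CIDS_PER_VF; bit++) {
		if (!(demo_vf_cid_map[vf] & (1u << bit))) {
			demo_vf_cid_map[vf] |= (uint8_t)(1u << bit);
			return (int)bit;
		}
	}
	return -1;
}

/* Returns a CID to the VF's pool. */
void demo_vf_cid_release(unsigned int vf, unsigned int cid)
{
	if (vf < DEMO_NUM_VFS && cid < DEMO_CIDS_PER_VF)
		demo_vf_cid_map[vf] &= (uint8_t)~(1u << cid);
}
```

Two VFs can hold the same CID number at once, which matches the statement that the ranges are identical and the VF index tells them apart.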

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 6bea61da1716761c95cd32117be6004b0e14b4b2 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Add support for changing iSCSI mac
Mintz, Yuval [Fri, 2 Jun 2017 05:58:33 +0000 (08:58 +0300)]
qed: Add support for changing iSCSI mac

Orabug: 26783820

Enhance API between qedi and qed, allowing qedi to inform device's
firmware when the iSCSI mac is to be changed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit dc4528e9e890f82900d75ac6276aba8ce89a80b6 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Support NVM-image reading API
Mintz, Yuval [Fri, 2 Jun 2017 05:58:32 +0000 (08:58 +0300)]
qed: Support NVM-image reading API

Orabug: 26783820

Storage drivers require images from the nvram in boot-from-SAN
scenarios. This provides the necessary API between qed and the
protocol drivers to perform such reads.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 20675b37ee76d11430fd3d4da0851fc6a4e36abc ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Share additional information with qedf
Mintz, Yuval [Fri, 2 Jun 2017 05:58:31 +0000 (08:58 +0300)]
qed: Share additional information with qedf

Orabug: 26783820

Share several new tidbits with qedf:
 - wwpn & wwnn
 - Absolute pf-id [this one is actually meant for qedi as well]
 - Number of available CQs

While we're at it, now that qedf will be aware of the available CQs
we can add some validation on the inputs it provides.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 3c5da94278026a4583320f97f6547573fb3a93aa ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Correct order of wwnn and wwpn
Mintz, Yuval [Fri, 2 Jun 2017 05:58:30 +0000 (08:58 +0300)]
qed: Correct order of wwnn and wwpn

Orabug: 26783820

Driver reads values via HSI, splitting each 8-byte value into two 32-bit
values and builds a single u64 field - but it does so by shifting
the lower field instead of the higher.
Luckily, we still don't use these fields for anything - but we're about
to start.
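The difference between shifting the lower and the higher half can be shown with a small sketch. The function names are hypothetical, and the buggy variant is only an assumption about what "shifting the lower field" looked like; the original expression is not quoted in this message.

```c
#include <stdint.h>

/* Assumed buggy form: the lower 32 bits end up in the upper half. */
uint64_t demo_wwn_buggy(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)lo << 32) | hi;
}

/* Corrected form: the higher 32 bits are shifted, the lower ones are not. */
uint64_t demo_wwn(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

For hi = 0x20000025 and lo = 0xb5000001 the corrected form yields 0x20000025b5000001, while the assumed buggy form yields 0xb500000120000025.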

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 5779675912fa87d8d0af651537acc0e312f06c70 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: No need to reset SBs on IOV init
Mintz, Yuval [Thu, 1 Jun 2017 12:29:11 +0000 (15:29 +0300)]
qed: No need to reset SBs on IOV init

Orabug: 26783820

Since we're resetting the IGU CAM each time we initialize the PF
device, there's no need to reset the VF SBs again when initializing
IOV.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 1ee240e31d4c0a5fd37ebaf064ca1f6cb6adcb6f ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Reset IGU CAM to default on init
Mintz, Yuval [Thu, 1 Jun 2017 12:29:10 +0000 (15:29 +0300)]
qed: Reset IGU CAM to default on init

Orabug: 26783820

The IGU CAM contains an association between hardware SBs
and interrupt lines, and it can be dynamically configured
to allow more interrupts in one entity over another, specifically
for redistribution of SBs between a PF and its child VFs.

While we don't yet use this functionality, there are other
clients that do and as such it's possible the information
passed from management firmware during initialization in
regard to the possible number of SBs doesn't accurately reflect
the current HW configuration.

The following changes are going to apply to the driver init sequence:

 a. PF is going to re-configure all entries belonging to itself and
    its child VFs in IGU CAM based on the management firmware info
    regarding the number of SBs that are supposed to exist there.

 b. PF is going to stop using the SB resource [management firmware
    provided information] for anything but the initialization.
    Instead, it would use the live-time counters it maintains for
    the numbers.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit ebbdcc669c7f9d8632d358a739d814485f8917dc ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Hold a single array for SBs
Mintz, Yuval [Thu, 1 Jun 2017 12:29:09 +0000 (15:29 +0300)]
qed: Hold a single array for SBs

Orabug: 26783820

A PF today holds 2 different arrays - one holding information
about the HW configuration and one holding information about
the SBs that are used by the protocol drivers.
These arrays aren't really connected - e.g., protocol driver
initializing a given SB would not mark the same SB as occupied
in the HW shadow array.

Move into a single array [at least for PFs] - hold the mapping
of the driver-protocol SBs on the HW entry which they configure.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 50a207147fceb64ad24c1e08e4a2a75535922e81 ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
7 years agoqed: Provide auxiliary for getting free VF SB
Mintz, Yuval [Thu, 1 Jun 2017 12:29:08 +0000 (15:29 +0300)]
qed: Provide auxiliary for getting free VF SB

Orabug: 26783820

IOV code is very intrusive in its manipulation of the status block
database.
Add a new auxiliary function to allow the PF to find an available unused
status block to configure for a specific VF's MSI-x vector.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 09b6b14749523e3660b72be2ed91b3c0b852f58f ]
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>