www.infradead.org Git - users/jedix/linux-maple.git/log
7 years ago scsi: smartpqi: add new PCI device IDs
Kevin Barnett [Wed, 3 May 2017 23:53:54 +0000 (18:53 -0500)]
scsi: smartpqi: add new PCI device IDs

Orabug: 26191021, 26447813

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7eddabff8acb0f4c25f992efe126cf6cccdd6e7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: minor driver cleanup
Kevin Barnett [Wed, 3 May 2017 23:53:48 +0000 (18:53 -0500)]
scsi: smartpqi: minor driver cleanup

Orabug: 26191021, 26447813

 - remove debug code that is no longer necessary.
   - Some WARN_ON checks were removed because the driver continues
     to function when the conditions are met.
 - remove a MACRO that is no longer used.
 - remove unnecessary multi-line statements.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cbe0c7b11dbfda368f27a6935a08ba91522edf1a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: correct BMIC identify physical drive
Kevin Barnett [Wed, 3 May 2017 23:53:42 +0000 (18:53 -0500)]
scsi: smartpqi: correct BMIC identify physical drive

Orabug: 26191021, 26447813

correct the BMIC Identify Physical Device structure
 - missing 2 fields

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1be42f46ade32c668f11c0735af03ab2d479d206)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: eliminate redundant error messages
Kevin Barnett [Wed, 3 May 2017 23:53:36 +0000 (18:53 -0500)]
scsi: smartpqi: eliminate redundant error messages

Orabug: 26191021, 26447813

eliminate redundant error message during initialization
if the controller has crashed.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8845fdfa92ab6eb24209f9929d6340c2f5d4a2de)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: add pqi_wait_for_completion_io
Kevin Barnett [Wed, 3 May 2017 23:53:24 +0000 (18:53 -0500)]
scsi: smartpqi: add pqi_wait_for_completion_io

Orabug: 26191021, 26447813

Add check for controller lockup during waits for synchronous
controller commands.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1f37e992ad8015ce33596466b0f36babb495148e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: correct bdma hw bug
Kevin Barnett [Wed, 3 May 2017 23:53:18 +0000 (18:53 -0500)]
scsi: smartpqi: correct bdma hw bug

Orabug: 26191021, 26447813

Add a workaround for a BDMA hardware bug that can cause the
hardware to read up to 12 SGL elements (192 bytes) beyond the
last element in the list. This fix avoids IOMMU violations.
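
The byte count follows from the descriptor size: 12 elements of 16 bytes each is 192 bytes. A minimal sketch of one way to keep such an over-read inside mapped memory is to pad the allocation for each scatter-gather list. The struct layout and every name below are assumptions for illustration, not the driver's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte scatter-gather descriptor; the layout is assumed. */
struct sg_descriptor {
    uint64_t address;
    uint32_t length;
    uint32_t flags;
};

/* The hardware may read this many descriptors past the end of a list. */
#define BDMA_OVERREAD_ELEMENTS 12
#define BDMA_OVERREAD_BYTES \
    (BDMA_OVERREAD_ELEMENTS * sizeof(struct sg_descriptor))

/*
 * Bytes to allocate for a list of num_elements descriptors so that an
 * over-read of up to 12 descriptors still lands inside the buffer.
 */
static size_t sg_list_alloc_size(size_t num_elements)
{
    return num_elements * sizeof(struct sg_descriptor) + BDMA_OVERREAD_BYTES;
}
```

With this padding, a 4-element list would be allocated as 64 + 192 bytes, so the extra reads never leave the region the IOMMU has mapped.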

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e1d213bdc3e359c6c5da8ebbc5b2e87b376e8777)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: add heartbeat check
Kevin Barnett [Wed, 3 May 2017 23:53:11 +0000 (18:53 -0500)]
scsi: smartpqi: add heartbeat check

Orabug: 26191021, 26447813

check for controller lockups

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 98f876674a6fba3591c342dfbcfdbaa7ecf0a84e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: add suspend and resume support
Kevin Barnett [Wed, 3 May 2017 23:53:05 +0000 (18:53 -0500)]
scsi: smartpqi: add suspend and resume support

Orabug: 26191021, 26447813

add support for ACPI S3 (suspend) and S4 (hibernate)
system power states.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 061ef06a2d436cea85984cf0b51b452547a5496c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
drivers/scsi/smartpqi/smartpqi_init.c

7 years ago scsi: smartpqi: enhance resets
Kevin Barnett [Wed, 3 May 2017 23:52:58 +0000 (18:52 -0500)]
scsi: smartpqi: enhance resets

Orabug: 26191021, 26447813

- Block all I/O targeted at LUN reset device.
- Wait until all I/O targeted at LUN reset device has been
  consumed by the controller.
- Issue LUN reset request.
- Wait until all outstanding I/Os and LUN reset completion
  have been received by the host.
- Return to OS results of LUN reset request.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7561a7e4412e515100ac195303531fc2621ac2db)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: add supporting events
Kevin Barnett [Wed, 3 May 2017 23:52:52 +0000 (18:52 -0500)]
scsi: smartpqi: add supporting events

Orabug: 26191021, 26447813

Only register for controller events that the driver supports.
Clean up event handling.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6a50d6ada03d8d9102a632d0e2db70cd9b6620f5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: ensure controller is in SIS mode at init
Kevin Barnett [Wed, 3 May 2017 23:52:46 +0000 (18:52 -0500)]
scsi: smartpqi: ensure controller is in SIS mode at init

Orabug: 26191021, 26447813

Put the controller in SIS mode during initialization.
This supports kexec/kdump.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 162d7753fce9a00719c09dfebd9fee3855e27fbe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: add in controller checkpoint for controller lockups.
Kevin Barnett [Wed, 3 May 2017 23:52:40 +0000 (18:52 -0500)]
scsi: smartpqi: add in controller checkpoint for controller lockups.

Orabug: 26191021, 26447813

tell smartpqi controller to generate a checkpoint for rare lockup
conditions.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5b0fba0f408777113eff93bd18ab0b9f80760fb7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: set pci completion timeout
Kevin Barnett [Wed, 3 May 2017 23:52:34 +0000 (18:52 -0500)]
scsi: smartpqi: set pci completion timeout

Orabug: 26191021, 26447813

add support for setting PCIe completion timeout.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a81ed5f338a843d8bfd199928142b196d71ae62c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: correct remove scsi devices
Kevin Barnett [Wed, 3 May 2017 23:52:22 +0000 (18:52 -0500)]
scsi: smartpqi: correct remove scsi devices

Orabug: 26191021, 26447813

correct a problem caused by holding a spinlock during device deletion.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a37ef74517acf0d022ab4c8fa671c82c877eed7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago scsi: smartpqi: fix time handling
Arnd Bergmann [Thu, 13 Jul 2017 19:51:46 +0000 (15:51 -0400)]
scsi: smartpqi: fix time handling

Orabug: 26191021, 26447813

When we have turned off RTC support, the smartpqi driver fails to build:

ERROR: "rtc_time64_to_tm" [drivers/scsi/smartpqi/smartpqi.ko] undefined!

This is easily avoided by using the generic 'struct tm' based helper rather
than the RTC specific one. While fixing this, I noticed that even though
the driver uses time64_t for storing seconds, it gets them from the
old 32-bit struct timeval. To address this, we can simplify the code
by calling ktime_get_real_seconds() directly.
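
A user-space sketch of the same idea: take a 64-bit seconds value and break it down with a generic `struct tm` helper rather than an RTC-specific one. Standard C `gmtime()` stands in here for the kernel helper, and `struct host_time` and `seconds_to_host_time()` are hypothetical names, not the driver's.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical broken-down UTC time a driver might hand to a controller. */
struct host_time {
    int year, month, day, hour, minute, second;
};

/*
 * Convert seconds since the Unix epoch to broken-down UTC time.
 * gmtime() is a user-space stand-in for the kernel's generic helper;
 * it is not thread-safe, which is acceptable for a sketch.
 * Returns 0 on success, -1 if the value cannot be represented.
 */
static int seconds_to_host_time(int64_t seconds, struct host_time *out)
{
    time_t t = (time_t)seconds;
    const struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return -1;

    out->year = tm->tm_year + 1900;  /* tm_year counts from 1900 */
    out->month = tm->tm_mon + 1;     /* tm_mon is zero-based */
    out->day = tm->tm_mday;
    out->hour = tm->tm_hour;
    out->minute = tm->tm_min;
    out->second = tm->tm_sec;
    return 0;
}
```

In user space the current wall-clock value would come from `time(NULL)`; the point of the sketch is only that the seconds value stays 64-bit from source to conversion.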

Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ed10858eadd4988260c6bc7d75fc25176342b5a7)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago net/sock: add WARN_ON(parent->sk) in sock_graft()
Sowmini Varadhan [Thu, 6 Jul 2017 15:15:07 +0000 (08:15 -0700)]
net/sock: add WARN_ON(parent->sk) in sock_graft()

sock_graft() unilaterally sets up parent->sk based on the
assumption that the existing parent->sk is null. If this
condition is not true, then the existing parent->sk would
be leaked, so add a WARN_ON() to alert callers who may fall
in this category.
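
A minimal user-space sketch of the check, assuming simplified stand-in types. `graft()` and both structs are hypothetical and only mirror the shape of the problem: the warning fires, and the assignment still proceeds as before.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures. */
struct sock {
    int id;
};

struct socket {
    struct sock *sk;
};

/*
 * Attach sk to parent. If parent already has a sock, the old pointer is
 * about to be overwritten (and so leaked), which is what the warning flags.
 * Returns 1 if the warning fired, 0 otherwise.
 */
static int graft(struct sock *sk, struct socket *parent)
{
    int warned = 0;

    if (parent->sk != NULL) {
        fprintf(stderr, "warning: parent->sk already set, old sock leaks\n");
        warned = 1;
    }
    parent->sk = sk;
    return warned;
}
```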

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
7 years ago rds: tcp: use sock_create_lite() to create the accept socket
Sowmini Varadhan [Thu, 6 Jul 2017 15:15:06 +0000 (08:15 -0700)]
rds: tcp: use sock_create_lite() to create the accept socket

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one()
1. it sets up a new_sock->sk that is wasteful, because this ->sk
   is going to get replaced by inet_accept() in the subsequent ->accept()
2. The new_sock->sk is a leaked reference in sock_graft() which
   expects to find a null parent->sk

Avoid these problems by calling sock_create_lite().

Orabug: 26477756

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
7 years ago rds: tcp: set linger to 1 when unloading a rds-tcp
Sowmini Varadhan [Fri, 16 Jun 2017 12:45:49 +0000 (05:45 -0700)]
rds: tcp: set linger to 1 when unloading a rds-tcp

If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
7 years ago rds: tcp: send handshake ping-probe from passive endpoint
Sowmini Varadhan [Fri, 16 Jun 2017 12:31:11 +0000 (05:31 -0700)]
rds: tcp: send handshake ping-probe from passive endpoint

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Orabug: 26477841

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
7 years ago xfs: skip dirty pages in ->releasepage()
Brian Foster [Thu, 21 Jul 2016 23:50:38 +0000 (09:50 +1000)]
xfs: skip dirty pages in ->releasepage()

Orabug: 26451790

XFS has had scattered reports of delalloc blocks present at
->releasepage() time. This results in a warning with a stack trace
similar to the following:

 ...
 Call Trace:
  [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
  [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
  [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
  [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
  [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
  [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
  [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
  [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
  [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
  [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
  [<ffffffffa2168539>] kswapd+0x4f9/0x970
  [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
  [<ffffffffa20a0d99>] kthread+0xc9/0xe0
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
  [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
  [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100

This occurs because it is possible for shrink_active_list() to send
pages marked dirty to ->releasepage() when certain buffer_head threshold
conditions are met. shrink_active_list() doesn't check the page dirty
state apparently to handle an old ext3 corner case where in some cases
clean pages would not have the dirty bit cleared, thus it is up to the
filesystem to determine how to handle the page.

XFS currently handles the delalloc case properly, but this behavior
makes the warning spurious. Update the XFS ->releasepage() handler to
explicitly skip dirty pages. Retain the existing delalloc/unwritten
checks so we continue to warn if such buffers exist on clean pages when
they shouldn't.
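
The resulting decision order can be sketched as follows. This is a simplified user-space model with hypothetical names, not the XFS code: dirty pages are skipped silently, and the delalloc/unwritten warning is reserved for clean pages.

```c
#include <stdbool.h>

/* Simplified model of the page state the handler inspects. */
struct page_state {
    bool dirty;
    bool has_delalloc;
    bool has_unwritten;
};

/*
 * Returns true if the page may be released. *warned is set when the
 * state is unexpected, i.e. a clean page still carries delalloc or
 * unwritten buffers.
 */
static bool release_page(const struct page_state *page, bool *warned)
{
    *warned = false;

    /* Dirty pages can legitimately arrive here: skip without warning. */
    if (page->dirty)
        return false;

    /* Clean pages should never have these buffers: keep the warning. */
    if (page->has_delalloc || page->has_unwritten) {
        *warned = true;
        return false;
    }

    return true;
}
```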

Diagnosed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
(cherry picked from commit 99579ccec4e271c3d4d4e7c946058766812afdab)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago qede: Add support for ingress headroom
Mintz, Yuval [Fri, 7 Apr 2017 08:05:00 +0000 (11:05 +0300)]
qede: Add support for ingress headroom

Orabug: 25933053, 26439680

The driver currently doesn't support any headroom; the only 'available'
space it has in the head of the buffer is due to the placement
offset.
In order to allow [later] support of XDP adjustment of headroom,
modify the ingress flow to properly handle a scenario where
the packets would have such.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qede: Update receive statistic once per NAPI
Mintz, Yuval [Fri, 7 Apr 2017 08:04:57 +0000 (11:04 +0300)]
qede: Update receive statistic once per NAPI

Orabug: 25933053, 26439680

Currently, each time an ingress packet is passed to networking stack
the driver increments a per-queue SW statistic.
As we want to have additional fields in the first cache-line of the
Rx-queue struct, change flow so this statistic would be updated once per
NAPI run. We will later push the statistic to a different cache line.
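
A minimal sketch of the pattern, with a hypothetical queue struct and poll function rather than the driver's own: packets are counted in a local variable during the poll loop, and the shared counter is written once at the end.

```c
#include <stdint.h>

/* Hypothetical receive queue: a packet counter and a backlog to drain. */
struct rx_queue {
    uint64_t rcv_pkts;  /* per-queue software statistic */
    int pending;        /* packets waiting to be processed */
};

/*
 * Process up to budget packets. The statistic is updated once per poll
 * instead of once per packet, so the hot loop touches only locals.
 */
static int rx_poll(struct rx_queue *rxq, int budget)
{
    int processed = 0;

    while (processed < budget && rxq->pending > 0) {
        rxq->pending--;  /* stand-in for passing one packet up the stack */
        processed++;
    }

    rxq->rcv_pkts += (uint64_t)processed;
    return processed;
}
```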

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Make OOO archipelagos into an array
Michal Kalderon [Thu, 6 Apr 2017 12:58:35 +0000 (15:58 +0300)]
qed: Make OOO archipelagos into an array

Orabug: 25933053, 26439680

No need to maintain the various open archipelagos as a list -
The maximal number of them is known, and we can use the CID
as key for random-access into the array.
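
A sketch of CID-keyed random access, assuming the CIDs of interest form a contiguous range starting at a known base. The bound of 8 and all names are illustrative assumptions, not values from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARCHIPELAGOS 8  /* assumed bound, for illustration only */

struct archipelago {
    int isle_count;
};

struct ooo_info {
    uint32_t cid_base;  /* first CID covered by the array */
    struct archipelago archipelagos[MAX_ARCHIPELAGOS];
};

/*
 * Constant-time lookup: the CID, offset by the base, is the array index.
 * Returns NULL when the CID falls outside the covered range.
 */
static struct archipelago *seek_archipelago(struct ooo_info *info,
                                            uint32_t cid)
{
    uint32_t idx;

    if (cid < info->cid_base)
        return NULL;

    idx = cid - info->cid_base;
    if (idx >= MAX_ARCHIPELAGOS)
        return NULL;

    return &info->archipelagos[idx];
}
```

Compared with walking a list, the lookup needs no traversal and no per-entry allocation.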

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Provide iSCSI statistics to management
Mintz, Yuval [Thu, 6 Apr 2017 12:58:34 +0000 (15:58 +0300)]
qed: Provide iSCSI statistics to management

Orabug: 25933053, 26439680

Management firmware can query for some basic iSCSI-related statistics.
Provide those just as we do for other protocols.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Inform qedi the number of possible CQs
Mintz, Yuval [Thu, 6 Apr 2017 12:58:33 +0000 (15:58 +0300)]
qed: Inform qedi the number of possible CQs

Orabug: 25933053, 26439680

Now that management firmware is capable of telling us the number of CQs
available for a given PF, qed needs to communicate the number to qedi
so it would know how many to use.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Add missing stat for new isles
Mintz, Yuval [Thu, 6 Apr 2017 12:58:32 +0000 (15:58 +0300)]
qed: Add missing stat for new isles

Orabug: 25933053, 26439680

Firmware provides a statistic for the number of out-of-order isles
it used - fill it in the iscsi-related statistics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Don't close the OUT_EN during init
Mintz, Yuval [Thu, 6 Apr 2017 12:58:31 +0000 (15:58 +0300)]
qed: Don't close the OUT_EN during init

Orabug: 25933053, 26439680

Before initializing the chip's engine, driver currently closes a set
of registers on the HW's ingress flow to prevent packets from slipping
in while they're not supposed to.

This configuration is insufficient, as there are some scenarios where
packets would still arrive even when said registers are set,
but the management firmware already closes other per-port registers
that do suffice, making this setting unnecessary.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Configure cacheline size in HW
Tomer Tayar [Thu, 6 Apr 2017 12:58:30 +0000 (15:58 +0300)]
qed: Configure cacheline size in HW

Orabug: 25933053, 26439680

Default HW configuration is optimal for an architecture where cache
line size is 64B.

During chip initialization, properly initialize the cache line size
in HW to avoid possible redundant PCI transactions.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Don't use main-ptt in unrelated flows
Rahul Verma [Thu, 6 Apr 2017 12:58:29 +0000 (15:58 +0300)]
qed: Don't use main-ptt in unrelated flows

Orabug: 25933053, 26439680

In order to access HW registers driver needs to acquire a PTT entry
[mapping between bar memory and internal chip address].
Since acquiring PTT entries could fail [at least in theory] as their
number is finite and other flows can hold them, we reserve special PTT
entries for 'important' enough flows - ones we want to guarantee
are not susceptible to such issues.

One such special entry is the 'main' PTT which is meant to be used in
flows such as chip initialization and de-initialization.
However, there are other flows that are also using that same entry
for their own purpose, and might run concurrently with the original
flows [notice that for most cases using the main-ptt by mistake, such
a race is still impossible, at least today].

This patch re-organizes the various functions that currently use the
main_ptt in one of two ways:

  - If a function shouldn't use the main_ptt, it starts acquiring and
    releasing its own PTT entry and uses it instead. Notice that if those
    functions previously couldn't fail, they now can [as acquisition
    might fail].

  - Change the prototypes so that the main_ptt would be received as
    a parameter [instead of explicitly accessing it].
    This prevents the future risk of adding code that introduces new
    use-cases for flows using the main_ptt, ones that might be in race
    with the actual 'main' flows.
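
Both approaches can be sketched in a few lines. Everything below is a simplified user-space model with hypothetical names and a made-up pool size; it shows only the shape of the two patterns, not the qed API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PTT_POOL_SIZE 2  /* assumed size; the real pool is larger */

struct ptt {
    bool in_use;
};

struct hwfn {
    struct ptt pool[PTT_POOL_SIZE];  /* general-purpose entries */
    struct ptt main_ptt;             /* reserved for init/de-init flows */
};

/* Take a free entry from the pool, or NULL when all are held. */
static struct ptt *ptt_acquire(struct hwfn *hwfn)
{
    for (size_t i = 0; i < PTT_POOL_SIZE; i++) {
        if (!hwfn->pool[i].in_use) {
            hwfn->pool[i].in_use = true;
            return &hwfn->pool[i];
        }
    }
    return NULL;
}

static void ptt_release(struct ptt *ptt)
{
    ptt->in_use = false;
}

/* Stand-in for a register read through a PTT window. */
static int reg_read(const struct ptt *ptt)
{
    (void)ptt;
    return 42;
}

/* Pattern 1: an unrelated flow acquires its own entry and can now fail. */
static int query_status(struct hwfn *hwfn, int *status)
{
    struct ptt *ptt = ptt_acquire(hwfn);

    if (ptt == NULL)
        return -EAGAIN;

    *status = reg_read(ptt);
    ptt_release(ptt);
    return 0;
}

/* Pattern 2: a 'main' flow receives the PTT from its caller. */
static int init_step(struct hwfn *hwfn, struct ptt *ptt)
{
    (void)hwfn;
    return reg_read(ptt);
}
```

Passing the entry as a parameter makes every use of the reserved PTT visible at the call site, which is what makes accidental reuse harder to introduce.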

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Warn PTT usage by wrong hw-function
Mintz, Yuval [Thu, 6 Apr 2017 12:58:28 +0000 (15:58 +0300)]
qed: Warn PTT usage by wrong hw-function

Orabug: 25933053, 26439680

PTT entries are per-hwfn; if some erroneous flow is trying
to use a PTT belonging to a different hwfn, warn the user, as this
can break every register-accessing flow later and is very hard
to root-cause.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Correct MSI-x for storage
Mintz, Yuval [Wed, 5 Apr 2017 18:20:11 +0000 (21:20 +0300)]
qed: Correct MSI-x for storage

Orabug: 25933053, 26439680

When qedr is enabled, qed would try dividing the msi-x vectors between
L2 and RoCE, starting with L2 and providing it with sufficient vectors
for its queues.

The problem is that qed would also do that for storage partitions, and as
those don't need queues it would lead qed to award those partitions 0
msi-x vectors, causing them to believe they're using INTa and
preventing them from operating.

Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: fix missing break in OOO_LB_TC case
Colin Ian King [Wed, 5 Apr 2017 12:35:44 +0000 (13:35 +0100)]
qed: fix missing break in OOO_LB_TC case

Orabug: 25933053, 26439680

There seems to be a missing break on the OOO_LB_TC case, pq_id
is being assigned and then re-assigned on the fall through default
case and that seems suspect.

Detected by CoverityScan, CID#1424402 ("Missing break in switch")
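
A small sketch of the corrected shape, using hypothetical enum and struct names rather than the driver's. Without the `break` in the middle case, control would fall through to `default` and the first assignment would be silently overwritten.

```c
/* Hypothetical traffic classes and queue ids, for illustration only. */
enum traffic_class {
    TC_PURE_LB,
    TC_OOO_LB,
    TC_OTHER
};

struct qm_info {
    int pure_lb_pq;
    int ooo_pq;
    int offload_pq;
};

static int pq_for_tc(const struct qm_info *qm, enum traffic_class tc)
{
    int pq_id;

    switch (tc) {
    case TC_PURE_LB:
        pq_id = qm->pure_lb_pq;
        break;
    case TC_OOO_LB:
        pq_id = qm->ooo_pq;
        break;  /* omitting this would let default overwrite pq_id */
    default:
        pq_id = qm->offload_pq;
        break;
    }

    return pq_id;
}
```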

Fixes: b5a9ee7cf3be1 ("qed: Revise QM cofiguration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Add a missing error code
Dan Carpenter [Mon, 3 Apr 2017 18:25:22 +0000 (21:25 +0300)]
qed: Add a missing error code

Orabug: 25933053, 26439680

We should be returning -ENOMEM if qed_mcp_cmd_add_elem() fails.  The
current code returns success.
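
The bug class looks like this in miniature. The sketch below uses hypothetical user-space stand-ins (`cmd_add_elem()` here is not the qed function, and the `fail_alloc` flag exists only to exercise the failure path): if the error code is not set before jumping to the exit label, the function reports success.

```c
#include <errno.h>
#include <stdlib.h>

struct cmd_elem {
    int seq_num;
};

/* Hypothetical allocator; fail_alloc forces the failure path. */
static struct cmd_elem *cmd_add_elem(int seq_num, int fail_alloc)
{
    struct cmd_elem *elem;

    if (fail_alloc)
        return NULL;

    elem = malloc(sizeof(*elem));
    if (elem != NULL)
        elem->seq_num = seq_num;
    return elem;
}

static int send_cmd(int seq_num, int fail_alloc)
{
    struct cmd_elem *elem;
    int rc = 0;

    elem = cmd_add_elem(seq_num, fail_alloc);
    if (elem == NULL) {
        rc = -ENOMEM;  /* leaving rc at 0 here would report success */
        goto out;
    }

    /* ... the command would be sent here ... */
    free(elem);
out:
    return rc;
}
```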

Fixes: 4ed1eea82a21 ("qed: Revise MFW command locking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: RoCE doesn't need to use SRC
Mintz, Yuval [Mon, 3 Apr 2017 09:21:12 +0000 (12:21 +0300)]
qed: RoCE doesn't need to use SRC

Orabug: 25933053, 26439680

As RoCE doesn't need to use the SRC, allocating ILT memory
on behalf of RoCE is wasting available ILT lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Correct TM ILT lines in presence of VFs
Mintz, Yuval [Mon, 3 Apr 2017 09:21:11 +0000 (12:21 +0300)]
qed: Correct TM ILT lines in presence of VFs

Orabug: 25933053, 26439680

As of today there's no protocol supported that requires
support from the TM hardware block and enables SRIOV,
but we should still correct the calculation to reflect
the lines required for such future VFs instead of changing
the PF's own lines.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Fix TM block ILT allocation
Michal Kalderon [Mon, 3 Apr 2017 09:21:10 +0000 (12:21 +0300)]
qed: Fix TM block ILT allocation

Orabug: 25933053, 26439680

When configuring the HW timers block we should set the number of CIDs
up until the last CID that requires timers, instead of only those CIDs
whose protocol needs timers support.

Today, the protocols that require HW timers' support have their CIDs
before any other protocol, but that would change in future [when we
add iWARP support].
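
The difference between the two counts can be sketched as follows, assuming protocols occupy consecutive CID ranges in table order. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One protocol's CID range; ranges are laid out in table order. */
struct proto_cids {
    uint32_t cid_count;
    bool needs_timers;
};

/*
 * Number of CIDs the timers block must cover: everything up to the end
 * of the last protocol that needs timers, including any protocols in
 * between that do not.
 */
static uint32_t tm_cid_count(const struct proto_cids *protos, size_t n)
{
    uint32_t end_offset = 0;
    uint32_t last_timer_end = 0;

    for (size_t i = 0; i < n; i++) {
        end_offset += protos[i].cid_count;
        if (protos[i].needs_timers)
            last_timer_end = end_offset;
    }

    return last_timer_end;
}
```

For ranges of 100 (timers), 50 (no timers), and 30 (timers) CIDs, the covering count is 180, whereas summing only the timer-using protocols would give 130 and leave the last range partly uncovered.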

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Revise QM cofiguration
Ariel Elior [Mon, 3 Apr 2017 09:21:09 +0000 (12:21 +0300)]
qed: Revise QM cofiguration

Orabug: 25933053, 26439680

Refactor and clean up the queue manager initialization logic.
Also, this adds support for RoCE low latency queues, which later
would be used for improving RoCE latency in high throughput scenarios.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Use BDQ resource for storage protocols
Mintz, Yuval [Tue, 28 Mar 2017 12:12:56 +0000 (15:12 +0300)]
qed: Use BDQ resource for storage protocols

Orabug: 25933053, 26439680

Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.

As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a value from that range instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Utilize resource-lock based scheme
Tomer Tayar [Tue, 28 Mar 2017 12:12:55 +0000 (15:12 +0300)]
qed: Utilize resource-lock based scheme

Orabug: 25933053, 26439680

Management firmware is used as an arbiter between the various PFs
in matters of resources, but some of the resources that need to
be divided are dependent on the non-management firmware used,
so management firmware first needs to be told how many resources
there are before trying to divide them.

As part of the initialization sequence, driver would first inform
the management firmware of the available resources under
a dedicated resource lock, and afterwards request for various
resources which might be based on the previous set values.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Support management-based resource locking
Tomer Tayar [Tue, 28 Mar 2017 12:12:54 +0000 (15:12 +0300)]
qed: Support management-based resource locking

Orabug: 25933053, 26439680

Global locking can't properly be used to synchronize between different
PFs in all scenarios, as those instances might reside in different
logical partitions [e.g., when a PF is assigned via PDA to some VM].

The management firmware provides a generic infrastructure for
device locks. For each 'resource', it's guaranteed it could be acquired
by at most a single PF at any given time [or by management firmware].

This patch adds the necessary logic in qed for utilizing said
infrastructure, implementing lock/unlock internal APIs.
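
The guarantee described above (at most one holder per resource) can be modeled with a small ownership table. This is a user-space sketch with hypothetical names and a made-up resource count; in the real scheme the arbiter is the management firmware, not a local array.

```c
#include <stdbool.h>

#define NUM_RESOURCES 4  /* assumed count, for illustration only */
#define NO_OWNER (-1)

/* Model of the arbiter's state: one owner slot per resource. */
struct lock_table {
    int owner[NUM_RESOURCES];
};

static void lock_table_init(struct lock_table *table)
{
    for (int i = 0; i < NUM_RESOURCES; i++)
        table->owner[i] = NO_OWNER;
}

/* Grant the resource to pf_id only if nobody holds it. */
static bool resc_lock(struct lock_table *table, int resource, int pf_id)
{
    if (resource < 0 || resource >= NUM_RESOURCES)
        return false;
    if (table->owner[resource] != NO_OWNER)
        return false;

    table->owner[resource] = pf_id;
    return true;
}

/* Only the current holder may release the resource. */
static bool resc_unlock(struct lock_table *table, int resource, int pf_id)
{
    if (resource < 0 || resource >= NUM_RESOURCES)
        return false;
    if (table->owner[resource] != pf_id)
        return false;

    table->owner[resource] = NO_OWNER;
    return true;
}
```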

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Send pf-flr as part of initialization
Mintz, Yuval [Tue, 28 Mar 2017 12:12:53 +0000 (15:12 +0300)]
qed: Send pf-flr as part of initialization

Orabug: 25933053, 26439680

During HW initialization, driver would set various registers to their
needed values - but it assumes all registers start at their reset-value,
so there's no need to re-configure a register's default value.

This assumption might be incorrect, e.g., in case of a preboot driver
running and initializing the device prior to our driver.

To overcome this, we now ask management firmware to initiate a PF-flr
early during the initialization sequence. That would return everything
in the PF's scope back to default and prevent previous configurations
from still being applied.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Move to new load request scheme
Tomer Tayar [Fri, 7 Jul 2017 20:03:58 +0000 (16:03 -0400)]
qed: Move to new load request scheme

Orabug: 25933053, 26439680

Management firmware is used as an arbiter between the various PFs
in regard to loading - it causes the various PFs to load/unload
sequentially and informs each of its appropriate role in the init.

But the existing flow is too weak to handle some scenarios where
PFs aren't properly cleaned prior to loading.
The significant scenarios falling under these criteria:
  a. Preboot drivers in some environment can't properly unload.
  b. Unexpected driver replacement [kdump, PDA].

Modern management firmware supports a more intricate loading flow,
where the driver has the ability to overcome previous limitations.
This moves qed into using this newer scheme.

Notice new scheme is backward compatible, so new drivers would
still be able to load properly on top of older management firmwares
and vice versa.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: hw_init() to receive parameter-struct
Mintz, Yuval [Tue, 28 Mar 2017 12:12:51 +0000 (15:12 +0300)]
qed: hw_init() to receive parameter-struct

Orabug: 25933053, 26439680

We'll soon need additional information, so start by changing
the infrastructure to receive the initializing variables
via a parameter struct.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years ago qed: Correct HW stop flow
Tomer Tayar [Tue, 28 Mar 2017 12:12:50 +0000 (15:12 +0300)]
qed: Correct HW stop flow

Orabug: 25933053, 26439680

Management firmware is used as arbiter between different PFs
which are loading/unloading, but in order to use the synchronization
it offers the contending configurations need to be applied either
between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
management firmware commands.

Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
qed_hw_reset() which don't abide by this requirement; most of the closure
is done outside the scope of the unload request.

This patch removes qed_hw_reset() and places the relevant stop
functionality underneath the management firmware protection.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Reserve VF feature before PF
Mintz, Yuval [Thu, 23 Mar 2017 13:50:20 +0000 (15:50 +0200)]
qed: Reserve VF feature before PF

Orabug: 2593305326439680

Align the driver feature distribution with the flow utilized
by the management firmware - first reserve L2 queues for
VFs and use all the remaining for the PF.

The current distribution might lead to PFs with an enormous
amount of queues, but at the same time leave us with insufficient
resources for starting all VFs.
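
As a rough illustration of the ordering described above (not the actual qed
code), the sketch below serves VFs first from a pool of L2 queues and hands
the remainder to the PF. The struct, the function name and the fixed
per-VF queue count are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how a pool of L2 queues ends up divided. */
struct queue_split {
	uint32_t vf_queues;	/* queues reserved for all VFs together */
	uint32_t pf_queues;	/* whatever is left for the PF */
};

/* Reserve queues for the VFs first, then give the PF the remainder.
 * The per-VF count is an assumption; the real driver derives its
 * numbers from the resources it is granted.
 */
static struct queue_split split_l2_queues(uint32_t total, uint32_t num_vfs,
					  uint32_t queues_per_vf)
{
	struct queue_split s;
	/* 64-bit product so num_vfs * queues_per_vf cannot wrap */
	uint64_t wanted = (uint64_t)num_vfs * queues_per_vf;

	s.vf_queues = wanted > total ? total : (uint32_t)wanted;
	s.pf_queues = total - s.vf_queues;

	assert(s.vf_queues + s.pf_queues == total);
	return s;
}
```

With 128 queues, 16 VFs and 4 queues per VF, the VFs get 64 queues and the
PF gets the other 64; the PF no longer takes its share first.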

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Don't waste SBs unused by RoCE
Mintz, Yuval [Thu, 23 Mar 2017 13:50:19 +0000 (15:50 +0200)]
qed: Don't waste SBs unused by RoCE

Orabug: 2593305326439680

When RoCE is enabled on a given L2 interface, the interrupt lines
are divided equally between L2 and RoCE -
But in case number of lines needed for RoCE is limited by number
of available CNQs, we can utilize the additional lines for L2.
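
A minimal sketch of that arithmetic, assuming the split is a plain halving
that is then capped by the CNQ count. The names are hypothetical and the
real driver works on its own feature accounting, not on this struct.

```c
#include <assert.h>

/* Hypothetical result type for the interrupt-line split. */
struct sb_split {
	unsigned int l2;	/* lines used by L2 */
	unsigned int roce;	/* lines used by RoCE */
};

/* Start from an equal split, cap the RoCE share by the number of
 * available CNQs, and return every unused line to L2.
 */
static struct sb_split split_sbs(unsigned int lines, unsigned int cnqs)
{
	struct sb_split s;

	s.roce = lines / 2;
	if (s.roce > cnqs)
		s.roce = cnqs;
	s.l2 = lines - s.roce;

	assert(s.l2 + s.roce == lines);
	return s;
}
```

With 16 lines and only 3 CNQs, RoCE keeps 3 lines and L2 gets 13 instead
of 8.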

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Correct endian order of MAC passed to MFW
Mintz, Yuval [Thu, 23 Mar 2017 13:50:17 +0000 (15:50 +0200)]
qed: Correct endian order of MAC passed to MFW

Orabug: 2593305326439680

The management firmware is running on a Big Endian processor,
and when running on LE platform HW is configured to swap access
to memory shared between management firmware and driver on
32-bit granularity.

As a result, for matters of simplicity most of the APIs between
driver and management firmware are based on 32-bit variables.
MAC settings are one exception, as driver needs to fill a byte
array when indicating to management firmware that primary MAC
has changed.
Due to the swap, driver must make sure that the mac that was
provided in byte-order would be translated into native order,
otherwise after the swap the management firmware would read
it swapped.
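
To make the byte-order point concrete, here is a small sketch that packs a
MAC byte array into two 32-bit values in native order, so the numeric value
of each word is the same on LE and BE hosts. The function name and the
exact word layout [four bytes in the first word, two in the upper half of
the second] are assumptions for illustration, not the layout the
management firmware necessarily expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a 6-byte MAC into two native-order 32-bit words.
 * Because the words are built with shifts rather than memcpy(), their
 * numeric value does not depend on host endianness, which is what a
 * 32-bit swapping window needs.
 * Layout is an assumption made for this example.
 */
static void mac_to_native_words(const uint8_t mac[6], uint32_t words[2])
{
	assert(mac != NULL && words != NULL);

	words[0] = (uint32_t)mac[0] << 24 | (uint32_t)mac[1] << 16 |
		   (uint32_t)mac[2] << 8 | (uint32_t)mac[3];
	words[1] = (uint32_t)mac[4] << 24 | (uint32_t)mac[5] << 16;
}
```

Copying the byte array directly would instead store the bytes in memory
order, and the 32-bit swap would then reverse each group of four bytes as
seen by the firmware.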

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Pass src/dst sizes when interacting with MFW
Tomer Tayar [Thu, 23 Mar 2017 13:50:16 +0000 (15:50 +0200)]
qed: Pass src/dst sizes when interacting with MFW

Orabug: 2593305326439680

The driver interaction with management firmware involves a union
of all the data-members relating to the commands the driver prepares.

Current interface assumes the caller always passes such a union -
but that's cumbersome as well as risky [chancing a stack corruption
in case caller accidentally passes a smaller member instead of union].

Change implementation so that caller could pass a pointer to any
of the members instead of the union.
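
The idea can be sketched as a copy helper that takes an explicit source
size instead of assuming a full union. Everything here [the 64-byte
mailbox size, the helper name, the return convention] is hypothetical and
only meant to show why passing the size removes the over-read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOX_UNION_SIZE 64	/* hypothetical size of the data union */

/* Copy caller data into the mailbox area using the caller's own size.
 * Only src_size bytes are read from src, so a caller may pass a pointer
 * to a small member without the helper reading past it; the rest of the
 * mailbox area is zeroed.
 * Returns 0 on success, -1 if the data cannot fit.
 */
static int mbox_copy_in(uint8_t *mbox, const void *src, size_t src_size)
{
	assert(mbox != NULL);

	if (src_size > MBOX_UNION_SIZE)
		return -1;

	memset(mbox, 0, MBOX_UNION_SIZE);
	if (src != NULL && src_size != 0)
		memcpy(mbox, src, src_size);
	return 0;
}
```

The same reasoning applies in the other direction: a response copy would
be bounded by the destination size supplied by the caller.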

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Revise MFW command locking
Tomer Tayar [Thu, 23 Mar 2017 13:50:15 +0000 (15:50 +0200)]
qed: Revise MFW command locking

Orabug: 2593305326439680

Interaction of driver -> management firmware is based
on a one-pending mailbox [per interface], and various
mailbox commands need to be synchronized.

Current scheme is messy, and there's a difficulty extending
it as it deals differently with various commands as well as
making assumption on the required behavior for load/unload
requests.

Replace the current scheme with a completion-list-based approach;
Each flow would try sending the command when possible,
allowing one flow to complete another flow's completion and
relieve the mailbox before sending its own command.

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Always publish VF link from leading hwfn
Mintz, Yuval [Sun, 19 Mar 2017 11:08:20 +0000 (13:08 +0200)]
qed: Always publish VF link from leading hwfn

Orabug: 2593305326439680

The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Raise verbosity of Malicious VF indications
Mintz, Yuval [Sun, 19 Mar 2017 11:08:19 +0000 (13:08 +0200)]
qed: Raise verbosity of Malicious VF indications

Orabug: 2593305326439680

Malicious VF existence should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Make qed_iov_mark_vf_flr() return bool
Mintz, Yuval [Sun, 19 Mar 2017 11:08:18 +0000 (13:08 +0200)]
qed: Make qed_iov_mark_vf_flr() return bool

Orabug: 2593305326439680

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Deprecate VF multiple queue-stop
Mintz, Yuval [Sun, 19 Mar 2017 11:08:17 +0000 (13:08 +0200)]
qed: Deprecate VF multiple queue-stop

Orabug: 2593305326439680

The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.

We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Uniform IOV queue validation
Mintz, Yuval [Sun, 19 Mar 2017 11:08:16 +0000 (13:08 +0200)]
qed: Uniform IOV queue validation

Orabug: 2593305326439680

PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.

Add auxiliary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Correct default VF coalescing configuration
Mintz, Yuval [Sun, 19 Mar 2017 11:08:15 +0000 (13:08 +0200)]
qed: Correct default VF coalescing configuration

Orabug: 2593305326439680

When starting the VF's vport, the PF would first configure
the status blocks of the VF and then reset them.
That would cause some of the configured information to be lost -
specifically it would mean that all the VFs queues would use
the Rx coalescing state-machine of the status block.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Set HW-channel to ready before ACKing VF
Mintz, Yuval [Sun, 19 Mar 2017 11:08:14 +0000 (13:08 +0200)]
qed: Set HW-channel to ready before ACKing VF

Orabug: 2593305326439680

When PF responds to the VF requests it also cleans the HW-channel
indication in firmware to allow further VF messages to arrive,
but the order currently applied is wrong -
The PF is copying by DMAE the response the VF is polling on for
completion, and only afterwards sets the HW-channel to ready state.

This creates a race condition where the VF would be able to send
an additional message to the PF before the channel would get ready
again, causing the firmware to consider the VF as malicious.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Clean VF malicious indication when disabling IOV
Mintz, Yuval [Sun, 19 Mar 2017 11:08:13 +0000 (13:08 +0200)]
qed: Clean VF malicious indication when disabling IOV

Orabug: 2593305326439680

When a VF is considered malicious, driver handling of the VF
FLR flow would clean said indication - but not if the FLR is
part of an sriov-disable flow.
That leads to further issues, as PF wouldn't re-enable the
previously malicious VF when sriov is re-enabled.

No reason for that - simply clean malicious indications in
the sriov-disable flow as well.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Increase verbosity of VF -> PF errors
Mintz, Yuval [Sun, 19 Mar 2017 11:08:12 +0000 (13:08 +0200)]
qed: Increase verbosity of VF -> PF errors

Orabug: 2593305326439680

VFs are currently logging errors when communicating
with their PFs in a too-low verbosity that wouldn't
be shown by default. As timeouts and failed commands
are crucial for VF operability, make them appear by
default.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed*: Add support for QL41xxx adapters
Mintz, Yuval [Tue, 14 Mar 2017 14:23:54 +0000 (16:23 +0200)]
qed*: Add support for QL41xxx adapters

Orabug: 2593305326439680

This adds the necessary infrastructure changes for initializing
and working with the new series of QL41xxx adapters.

It also adds 2 new PCI device-IDs to qede:
  - 0x8070 for QL41xxx PFs
  - 0x8090 for VFs spawning from QL41xxx PFs

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Enable iSCSI Out-of-Order
Mintz, Yuval [Tue, 14 Mar 2017 13:26:04 +0000 (15:26 +0200)]
qed: Enable iSCSI Out-of-Order

Orabug: 2593305326439680

Missing in the initial submission, qed fails to propagate qedi's
request to enable OOO to firmware.

Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Correct out-of-bound access in OOO history
Mintz, Yuval [Tue, 14 Mar 2017 13:26:03 +0000 (15:26 +0200)]
qed: Correct out-of-bound access in OOO history

Orabug: 2593305326439680

Need to set the number of entries in database, otherwise the logic
would quickly surpass the array.

Fixes: 1d6cff4fca43 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Fix interrupt flags on Rx LL2
Ram Amrani [Tue, 14 Mar 2017 13:26:02 +0000 (15:26 +0200)]
qed: Fix interrupt flags on Rx LL2

Orabug: 2593305326439680

Before iterating over the LL2 Rx ring, the ring's
spinlock is taken via spin_lock_irqsave().
The actual processing of the packet [including handling
by the protocol driver] is done without said lock,
so qed releases the spinlock and re-claims it afterwards.

Problem is that the final spin_unlock_irqrestore() at the end
of the iteration uses the original flags saved from the
initial irqsave() instead of the flags from the most recent
irqsave(). So it's possible that the interrupt status would
be incorrect at the end of the processing.

Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support")
CC: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Free previous connections when releasing iSCSI
Mintz, Yuval [Tue, 14 Mar 2017 13:26:01 +0000 (15:26 +0200)]
qed: Free previous connections when releasing iSCSI

Orabug: 2593305326439680

Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Fix mapping leak on LL2 rx flow
Mintz, Yuval [Tue, 14 Mar 2017 13:26:00 +0000 (15:26 +0200)]
qed: Fix mapping leak on LL2 rx flow

Orabug: 2593305326439680

When receiving an Rx LL2 packet, qed fails to unmap the previous buffer.

Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Prevent creation of too-big u32-chains
Tomer Tayar [Tue, 14 Mar 2017 13:25:59 +0000 (15:25 +0200)]
qed: Prevent creation of too-big u32-chains

Orabug: 2593305326439680

Current Logic would allow the creation of a chain with U32_MAX + 1
elements, when the actual maximum supported by the driver infrastructure
is U32_MAX.
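
The off-by-one is easy to see when the element count is computed in 64
bits. The sketch below is an assumption about how such a bound could be
checked, not the driver's actual chain code; the function and parameter
names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if a chain of page_cnt pages, each holding elems_per_page
 * elements, stays within a 32-bit element counter.
 * The product is computed in 64 bits so that exactly U32_MAX + 1
 * elements is detected instead of wrapping to 0.
 */
static bool chain_elem_count_ok(uint32_t elems_per_page, uint32_t page_cnt)
{
	uint64_t total;

	/* a page holding zero elements would be a caller bug */
	assert(elems_per_page > 0);

	total = (uint64_t)elems_per_page * page_cnt;
	return total <= UINT32_MAX;
}
```

For example, 0x10000 pages of 0x10000 elements is exactly U32_MAX + 1 and
is rejected, while 0xFFFF pages of the same size is accepted.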

Fixes: a91eb52abb50 ("qed: Revisit chain implementation")
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Align CIDs according to DORQ requirement
Ram Amrani [Tue, 14 Mar 2017 13:25:58 +0000 (15:25 +0200)]
qed: Align CIDs according to DORQ requirement

Orabug: 2593305326439680

The Doorbell HW block can be configured at a granularity
of 16 x CIDs, so we need to make sure that the actual number
of CIDs configured would be a multiple of 16.

Today, when RoCE is enabled - given that the number is unaligned,
doorbelling the higher CIDs would fail to reach the firmware and
would eventually timeout.
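
Rounding a CID count up to the next multiple of 16 can be sketched as
follows. The macro and function names are hypothetical; only the
granularity of 16 comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define CID_RANGE_ALIGN 16u	/* granularity taken from the text above */

/* Round a CID count up to the next multiple of CID_RANGE_ALIGN.
 * Works because the alignment is a power of two.
 */
static uint32_t align_cid_count(uint32_t cids)
{
	/* guard against wrap-around when adding the rounding term */
	assert(cids <= UINT32_MAX - (CID_RANGE_ALIGN - 1));

	return (cids + CID_RANGE_ALIGN - 1) & ~(CID_RANGE_ALIGN - 1);
}
```

A request for 17 CIDs would thus be configured as 32, so the doorbell
range covers every CID that is actually in use.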

Fixes: dbb799c39717 ("qed: Initialize hardware for new protocols")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed*: Utilize Firmware 8.15.3.0
Mintz, Yuval [Sat, 11 Mar 2017 16:39:18 +0000 (18:39 +0200)]
qed*: Utilize Firmware 8.15.3.0

Orabug: 2593305326439680

This patch advances the qed* drivers into using the newer firmware -
This solves several firmware bugs, mostly related [but not limited to]
various init/deinit issues in various offloaded protocols.

It also introduces a major 4-Cached SGE change in firmware, which can be
seen in the storage drivers' changes.

In addition, this firmware is required for supporting the new QL41xxx
series of adapters; While this patch doesn't add the actual support,
the firmware contains the necessary initialization & firmware logic to
operate such adapters [actual support would be added later on].

Changes from Previous versions:
-------------------------------
 - V2 - fix kbuild-test robot warnings

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqedi: Add PCI device-ID for QL41xxx adapters.
Manish Rangankar [Mon, 13 Mar 2017 09:53:57 +0000 (15:23 +0530)]
qedi: Add PCI device-ID for QL41xxx adapters.

Orabug: 2593305326439680

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Fix copy of uninitialized memory
robert.foss@collabora.com [Tue, 7 Mar 2017 16:46:25 +0000 (11:46 -0500)]
qed: Fix copy of uninitialized memory

Orabug: 2593305326439680

In qed_ll2_start_ooo() the ll2_info variable is uninitialized and then
passed to qed_ll2_acquire_connection() where it is copied into a new
memory space.

This shouldn't cause any issue as long as none of the copied memory is
ever read.
But the potential for a bug being introduced by reading this memory
is real.

Detected by CoverityScan, CID#1399632 ("Uninitialized scalar variable")

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Don't use attention PTT for configuring BW
Mintz, Yuval [Mon, 27 Feb 2017 09:06:33 +0000 (11:06 +0200)]
qed: Don't use attention PTT for configuring BW

Orabug: 2593305326439680

Commit 653d2ffd6405 ("qed*: Fix link indication race") introduced another
race - one of the inner functions called from the link-change flow is
explicitly using the slowpath context dedicated PTT instead of gaining
that PTT from the caller. Since this flow can now be called from
a different context as well, we're at risk of the PTT breaking.

Fixes: 653d2ffd6405 ("qed*: Fix link indication race")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqed: Fix race with multiple VFs
Mintz, Yuval [Mon, 27 Feb 2017 09:06:32 +0000 (11:06 +0200)]
qed: Fix race with multiple VFs

Orabug: 2593305326439680

A PF synchronizes all IOV activity relating to its VFs
by using a single workqueue handling the work.
The workqueue would read a bitmask of pending VF events
and act upon each in turn.

Problem is that the indication of a VF message [which sets
the 'vf event' bit for that VF] arrives and is set in
the slowpath attention context, which isn't synchronized with
the processing of the events.
When multiple VFs are present, it's possible that PF would
lose the indication of one of the VF's pending events, leading
that VF to later time out.

Instead of adding locks/barriers, simply move from a bitmask
into a per-VF indication inside that VF entry in the PF database.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede: Add driver support for PTP
Sudarsana Reddy Kalluru [Wed, 15 Feb 2017 08:24:11 +0000 (10:24 +0200)]
qede: Add driver support for PTP

Orabug: 2593305326439680

This patch adds the driver support for:
  - Registering the ptp clock functionality with the OS.
  - Timestamping the Rx/Tx PTP packets.
  - Ethtool callbacks related to PTP.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede: Remove unnecessary datapath dereference
Mintz, Yuval [Sun, 1 Jan 2017 11:57:06 +0000 (13:57 +0200)]
qede: Remove unnecessary datapath dereference

Orabug: 2593305326439680

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede - mark SKB as encapsulated
Manish Chopra [Sun, 1 Jan 2017 11:57:05 +0000 (13:57 +0200)]
qede - mark SKB as encapsulated

Orabug: 2593305326439680

When driver receives a recognized encapsulated packet it needs
to set the skb->encapsulation field as well.

Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede: Postpone reallocation until NAPI end
Mintz, Yuval [Sun, 1 Jan 2017 11:57:04 +0000 (13:57 +0200)]
qede: Postpone reallocation until NAPI end

Orabug: 2593305326439680

During Rx flow driver allocates a replacement buffer each time
it consumes an Rx buffer. Failing to do so, it would consume the
currently processed buffer and re-post it on the ring.
As a result, the Rx ring is always completely full [from driver POV].

We now allow the Rx ring to shorten by doing the re-allocations
at the end of the NAPI run. The only limitation is that we still want to
make sure each time we reallocate that we'd still have sufficient
elements in the Rx ring to guarantee that FW would be able to post
additional data and trigger an interrupt.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede: Split filtering logic to its own file
Mintz, Yuval [Sun, 1 Jan 2017 11:57:02 +0000 (13:57 +0200)]
qede: Split filtering logic to its own file

Orabug: 2593305326439680

This takes the various filtering logic of the driver and
moves them into their own dedicated file - qede_filter.c.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoqede: Break datapath logic into its own file
Mintz, Yuval [Sun, 1 Jan 2017 11:57:01 +0000 (13:57 +0200)]
qede: Break datapath logic into its own file

Orabug: 2593305326439680

This adds a new file qede_fp.c and relocates the datapath-related
logic into it [from qede_main.c].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
7 years agoMSI: Don't assign MSI IRQ vector twice v4.1.12-106.0.20170720_1900
Ashok Vairavan [Sun, 16 Jul 2017 21:24:21 +0000 (14:24 -0700)]
MSI: Don't assign MSI IRQ vector twice

Orabug: 26275961

NVMe enables MSIx interrupts during the nvme device probe
and also during nvme reset. While enabling MSIx interrupts
using do_setup_msix_irqs(), it does it twice. It assigns
the IRQ under the function irq_alloc_hwirqs() and again
shortly using setup_msi_irq(). During the first invocation
from irq_alloc_hwirqs(), it sets the cfg->vector and
cfg->domain of the specific IRQ. During the subsequent
invocation from setup_msi_irq(), if the cfg->domain
intersects with the target cpumask (tmp_mask) then
the move_in_progress flag is set. This flag is never cleared
unless it is set via proc file system. As this flag is not
cleared, the subsequent smp affinity set via procfs fails.

Upstream introduced IRQ domain hierarchy where they assign
the IRQ only once. However, pulling in IRQ domain hierarchy
from the upstream brings with it a lot of changes (85 commits).
Hence, this patch assigns the IRQ only once.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoblk-mq: Export blk_mq_freeze_queue_wait
Keith Busch [Wed, 1 Mar 2017 19:22:10 +0000 (14:22 -0500)]
blk-mq: Export blk_mq_freeze_queue_wait

Drivers can start a freeze, so this provides a way to wait for frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6bae363ee3057a14eec93440826813603559273a)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoblk-mq: Provide freeze queue timeout
Keith Busch [Wed, 1 Mar 2017 19:22:11 +0000 (14:22 -0500)]
blk-mq: Provide freeze queue timeout

A driver may wish to take corrective action if queued requests do not
complete within a set time.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f91328c40a559362b6e7b7bfee01ca17fda87592)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agonvme: Complete all stuck requests
Keith Busch [Wed, 1 Mar 2017 19:22:12 +0000 (14:22 -0500)]
nvme: Complete all stuck requests

If the nvme driver is shutting down its controller, the driver will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver will wait
for frozen after IO queues are restarted to ensure the queue reference
can be reinitialized when nvme requests to unfreeze the queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all inflight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot cpu notifier may progress.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 302ad8cc09339ea261eef58a8d5f4a116a8ffda5)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agonvme: Don't suspend admin queue that wasn't created
Gabriel Krisman Bertazi [Tue, 6 Sep 2016 20:39:13 +0000 (17:39 -0300)]
nvme: Don't suspend admin queue that wasn't created

This fixes a regression in my previous commit c21377f8366c ("nvme:
Suspend all queues before deletion"), which provoked an Oops in the
removal path when removing a device that became IO incapable very early
at probe (i.e. after a failed EEH recovery).

Turns out, if the error occurred very early at the probe path, before
even configuring the admin queue, we might try to suspend the
uninitialized admin queue, accessing bad memory.

Fixes: c21377f8366c ("nvme: Suspend all queues before deletion")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 82469c59d222f839ded5cd282172258e026f9112)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agonvme: Delete created IO queues on reset
Keith Busch [Wed, 12 Oct 2016 15:22:16 +0000 (09:22 -0600)]
nvme: Delete created IO queues on reset

The driver was decrementing the online_queues prior to attempting to
delete those IO queues, so the driver ended up not requesting the
controller delete any. This patch saves the online_queues prior to
suspending them, and adds that parameter for deleting io queues.
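
The ordering problem can be shown with a toy model. This is not the nvme
driver's code: the struct and the fake_* helpers are invented for the
example, and the only NVMe fact relied on is that queue 0 is the admin
queue, so the IO queue count is one less than the online count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: just enough state to show the ordering issue. */
struct fake_dev {
	int online_queues;	/* admin queue + IO queues currently online */
	int deleted;		/* IO queues we asked the controller to delete */
};

/* Suspending a queue takes it offline, decrementing the counter. */
static void fake_suspend_queue(struct fake_dev *dev)
{
	if (dev->online_queues > 0)
		dev->online_queues--;
}

/* Stand-in for requesting deletion of 'queues' IO queues. */
static void fake_delete_io_queues(struct fake_dev *dev, int queues)
{
	dev->deleted += queues;
}

static void fake_dev_disable(struct fake_dev *dev)
{
	int queues;

	assert(dev != NULL);

	/* Save the IO queue count BEFORE suspending; afterwards
	 * online_queues is 0 and nothing would be deleted.
	 */
	queues = dev->online_queues > 0 ? dev->online_queues - 1 : 0;

	while (dev->online_queues > 0)
		fake_suspend_queue(dev);

	fake_delete_io_queues(dev, queues);
}
```

Reading online_queues after the suspend loop would always yield 0, which
is the behaviour described above.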

Fixes: c21377f8 ("nvme: Suspend all queues before deletion")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7065906096273b39b90a512a7170a6697ed94b23)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agonvme: Suspend all queues before deletion
Gabriel Krisman Bertazi [Thu, 11 Aug 2016 15:35:57 +0000 (09:35 -0600)]
nvme: Suspend all queues before deletion

When nvme_delete_queue fails in the first pass of the
nvme_disable_io_queues() loop, we return early, failing to suspend all
of the IO queues.  Later, on the nvme_pci_disable path, this causes us
to disable MSI without actually having freed all the IRQs, which
triggers the BUG_ON in free_msi_irqs(), as shown below.

This patch refactors nvme_disable_io_queues to suspend all queues before
starting to submit delete queue commands.  This way, we ensure that we
have at least returned every IRQ before continuing with the removal
path.

[  487.529200] kernel BUG at ../drivers/pci/msi.c:368!
cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650]
    pc: c000000000627a50: free_msi_irqs+0x90/0x200
    lr: c000000000627a40: free_msi_irqs+0x80/0x200
    sp: c0000078c5b838d0
   msr: 9000000100029033
  current = 0xc0000078c5b40000
  paca    = 0xc000000002bd7600   softe: 0        irq_happened: 0x01
    pid   = 1376, comm = kworker/70:1H
kernel BUG at ../drivers/pci/msi.c:368!
Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413
(Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016
enter ? for help
[c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme]
[c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme]
[c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110
[c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170
[c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110
[c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0
[c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590
[c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660
[c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130
[c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c21377f8366c95440d533edbe47d070f662c62ef)

Orabug: 26486098

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agonvme/pci: No special case for queue busy on IO
Keith Busch [Fri, 10 Feb 2017 23:15:52 +0000 (18:15 -0500)]
nvme/pci: No special case for queue busy on IO

This driver previously required we have a special check for IO submitted
to nvme IO queues that are temporarily suspended. That is no longer
necessary since blk-mq provides a quiesce, so any IO that actually gets
submitted to such a queue must be ended since the queue isn't going to
start back up.

This is fixing a condition where we have fewer IO queues after a
controller reset. This may happen if the number of CPUs has changed,
or controller firmware update changed the queue count, for example.

While it may be possible to complete the IO on a different queue, the
block layer does not provide a way to resubmit a request on a different
hardware context once the request has entered the queue. We don't want
these requests to be stuck indefinitely either, so ending them in error
is our only option at the moment.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9ef3932e250f8e2e11ffbc0c1f28b3ba5dc40cd6)

Orabug: 26486098

UEK4 blk-mq module doesn't have the quiescing capability. So the requests
should fail if a namespace is dead.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
7 years agoipv6: Fix leak in ipv6_gso_segment().
David S. Miller [Mon, 5 Jun 2017 01:41:10 +0000 (21:41 -0400)]
ipv6: Fix leak in ipv6_gso_segment().

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e3e86b5119f81e5e2499bea7ea1ebe8ac6aab789)

Orabug: 26175248
CVE-2017-9074

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
Ben Hutchings [Wed, 31 May 2017 12:15:41 +0000 (13:15 +0100)]
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6e80ac5cc992ab6256c3dae87f7e57db15e1a58c)

Orabug: 26175248
CVE-2017-9074

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago ipv6: Check ip6_find_1stfragopt() return value properly.
David S. Miller [Thu, 18 May 2017 02:54:11 +0000 (22:54 -0400)]
ipv6: Check ip6_find_1stfragopt() return value properly.

Do not use unsigned variables to see if it returns a negative
error or not.
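
A minimal sketch of the pitfall, assuming a hypothetical helper
find_offset() that returns a length on success or a negative errno value
on failure (the helper and both wrappers are illustrative names, not the
kernel's functions):

```c
#include <errno.h>

/* Hypothetical helper: a header length on success, -EINVAL on failure. */
static int find_offset(int fail)
{
	return fail ? -EINVAL : 40;
}

/*
 * Buggy pattern: storing the result in an unsigned variable turns the
 * negative error code into a very large "length", so a later "< 0"
 * test on it can never be true.
 */
static unsigned int offset_unsigned(int fail)
{
	unsigned int len = find_offset(fail);

	return len;
}

/*
 * Fixed pattern: keep the result in a signed int, test it, and only
 * then hand it on as a length.
 */
static int offset_checked(int fail, unsigned int *len)
{
	int ret = find_offset(fail);

	if (ret < 0)
		return ret;
	*len = (unsigned int)ret;
	return 0;
}
```

With the unsigned variant the error is silently used as a length; with the
signed variant the caller sees the negative error code.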

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7dd7eb9513bd02184d45f000ab69d78cb1fa1531)

Orabug: 26175248
CVE-2017-9074

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
 Conflicts:
net/ipv6/ip6_offload.c

7 years ago ipv6: Prevent overrun when parsing v6 header options
Craig Gallek [Tue, 16 May 2017 18:36:23 +0000 (14:36 -0400)]
ipv6: Prevent overrun when parsing v6 header options

The KASAN warning reported below was discovered with a syzkaller
program.  The reproducer is basically:
  int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
  send(s, &one_byte_of_data, 1, MSG_MORE);
  send(s, &more_than_mtu_bytes_data, 2000, 0);

The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.

The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can be made much larger than the linear section of the skb and data
is read outside of it.

This fix makes ip6_find_1stfragopt() return an error if it detects
that parsing would run out of bounds.
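
The idea can be sketched in user space as a bounds-checked walk over the
extension headers. This is a simplified illustration, not the kernel
function: find_first_frag_offset() and the HDR_* names are hypothetical,
the buffer is assumed to start right after the fixed IPv6 header, and the
kernel's additional routing/destination-option ordering rules are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values for the headers the walk steps over. */
#define HDR_HOP      0	/* hop-by-hop options */
#define HDR_ROUTING 43	/* routing header */
#define HDR_DEST    60	/* destination options */

/*
 * Return the offset of the first header that is not a hop-by-hop,
 * routing or destination-options header, or -EINVAL if reading the next
 * extension header would go past the end of the buffer.
 */
static int find_first_frag_offset(const uint8_t *buf, size_t len,
				  uint8_t first_nexthdr)
{
	size_t offset = 0;
	uint8_t nexthdr = first_nexthdr;

	while (offset <= len) {
		switch (nexthdr) {
		case HDR_HOP:
		case HDR_ROUTING:
		case HDR_DEST:
			break;
		default:
			return (int)offset;
		}

		/* The 2-byte option header must lie inside the buffer. */
		if (offset + 2 > len)
			return -EINVAL;

		nexthdr = buf[offset];
		/* Length is in 8-byte units, not counting the first 8 bytes. */
		offset += ((size_t)buf[offset + 1] + 1) * 8;
	}

	return -EINVAL;
}
```

For the reproducer above (one byte of payload, nexthdr set to NEXTHDR_HOP)
this walk stops with -EINVAL instead of computing a header size from bytes
that are not there.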

[   42.361487] ==================================================================
[   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[   42.366469]
[   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[   42.368824] Call Trace:
[   42.369183]  dump_stack+0xb3/0x10b
[   42.369664]  print_address_description+0x73/0x290
[   42.370325]  kasan_report+0x252/0x370
[   42.370839]  ? ip6_fragment+0x11c8/0x3730
[   42.371396]  check_memory_region+0x13c/0x1a0
[   42.371978]  memcpy+0x23/0x50
[   42.372395]  ip6_fragment+0x11c8/0x3730
[   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
[   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
[   42.374263]  ? ip6_forward+0x2e30/0x2e30
[   42.374803]  ip6_finish_output+0x584/0x990
[   42.375350]  ip6_output+0x1b7/0x690
[   42.375836]  ? ip6_finish_output+0x990/0x990
[   42.376411]  ? ip6_fragment+0x3730/0x3730
[   42.376968]  ip6_local_out+0x95/0x160
[   42.377471]  ip6_send_skb+0xa1/0x330
[   42.377969]  ip6_push_pending_frames+0xb3/0xe0
[   42.378589]  rawv6_sendmsg+0x2051/0x2db0
[   42.379129]  ? rawv6_bind+0x8b0/0x8b0
[   42.379633]  ? _copy_from_user+0x84/0xe0
[   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
[   42.380878]  ? ___sys_sendmsg+0x162/0x930
[   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
[   42.382074]  ? sock_has_perm+0x1f6/0x290
[   42.382614]  ? ___sys_sendmsg+0x167/0x930
[   42.383173]  ? lock_downgrade+0x660/0x660
[   42.383727]  inet_sendmsg+0x123/0x500
[   42.384226]  ? inet_sendmsg+0x123/0x500
[   42.384748]  ? inet_recvmsg+0x540/0x540
[   42.385263]  sock_sendmsg+0xca/0x110
[   42.385758]  SYSC_sendto+0x217/0x380
[   42.386249]  ? SYSC_connect+0x310/0x310
[   42.386783]  ? __might_fault+0x110/0x1d0
[   42.387324]  ? lock_downgrade+0x660/0x660
[   42.387880]  ? __fget_light+0xa1/0x1f0
[   42.388403]  ? __fdget+0x18/0x20
[   42.388851]  ? sock_common_setsockopt+0x95/0xd0
[   42.389472]  ? SyS_setsockopt+0x17f/0x260
[   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
[   42.390650]  SyS_sendto+0x40/0x50
[   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.391731] RIP: 0033:0x7fbbb711e383
[   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[   42.397257]
[   42.397411] Allocated by task 3789:
[   42.397702]  save_stack_trace+0x16/0x20
[   42.398005]  save_stack+0x46/0xd0
[   42.398267]  kasan_kmalloc+0xad/0xe0
[   42.398548]  kasan_slab_alloc+0x12/0x20
[   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
[   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
[   42.399654]  __alloc_skb+0xf8/0x580
[   42.400003]  sock_wmalloc+0xab/0xf0
[   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
[   42.400813]  ip6_append_data+0x1a8/0x2f0
[   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
[   42.401505]  inet_sendmsg+0x123/0x500
[   42.401860]  sock_sendmsg+0xca/0x110
[   42.402209]  ___sys_sendmsg+0x7cb/0x930
[   42.402582]  __sys_sendmsg+0xd9/0x190
[   42.402941]  SyS_sendmsg+0x2d/0x50
[   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.403718]
[   42.403871] Freed by task 1794:
[   42.404146]  save_stack_trace+0x16/0x20
[   42.404515]  save_stack+0x46/0xd0
[   42.404827]  kasan_slab_free+0x72/0xc0
[   42.405167]  kfree+0xe8/0x2b0
[   42.405462]  skb_free_head+0x74/0xb0
[   42.405806]  skb_release_data+0x30e/0x3a0
[   42.406198]  skb_release_all+0x4a/0x60
[   42.406563]  consume_skb+0x113/0x2e0
[   42.406910]  skb_free_datagram+0x1a/0xe0
[   42.407288]  netlink_recvmsg+0x60d/0xe40
[   42.407667]  sock_recvmsg+0xd7/0x110
[   42.408022]  ___sys_recvmsg+0x25c/0x580
[   42.408395]  __sys_recvmsg+0xd6/0x190
[   42.408753]  SyS_recvmsg+0x2d/0x50
[   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.409513]
[   42.409665] The buggy address belongs to the object at ffff88000969e780
[   42.409665]  which belongs to the cache kmalloc-512 of size 512
[   42.410846] The buggy address is located 24 bytes inside of
[   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
[   42.411941] The buggy address belongs to the page:
[   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   42.413298] flags: 0x100000000008100(slab|head)
[   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[   42.415074] page dumped because: kasan: bad access detected
[   42.415604]
[   42.415757] Memory state around the buggy address:
[   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   42.418273]                    ^
[   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419882] ==================================================================

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2423496af35d94a87156b063ea5cedffc10a70a1)

Orabug: 26175248
CVE-2017-9074

Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago sparc64: Convert non-fatal error print to a debug print (DAX driver)
Sanath Kumar [Mon, 10 Jul 2017 17:16:10 +0000 (12:16 -0500)]
sparc64: Convert non-fatal error print to a debug print (DAX driver)

Error code HV_EWOULDBLOCK returned by the hypervisor is a
transient error. It is not considered fatal.

Orabug: 26305292

Acked-by: Jonathan Helman <jonathan.helman@oracle.com>
Acked-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
7 years ago bnxt_en: Fix SRIOV on big-endian architecture.
Michael Chan [Tue, 11 Jul 2017 17:05:36 +0000 (13:05 -0400)]
bnxt_en: Fix SRIOV on big-endian architecture.

The PF driver sets up a list of firmware commands from the VF driver that
need to be forwarded to the PF for approval.  This list is a 256-bit
bitmap.  The code that sets up the bitmap falls apart on big-endian
architectures.  __set_bit() does not work because it operates on long types
whereas the firmware interface is defined in u32 types, causing bits in
the wrong 32-bit word to be set.

Fix it by setting the proper bits on an array of u32.
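
A user-space sketch of the difference, under stated assumptions:
set_bit_u32() and be64_long_word_index() are hypothetical helpers written
for illustration and are not the driver's code. The second helper only
computes which 32-bit word a long-based bit operation would end up
touching on a 64-bit big-endian machine.

```c
#include <stdint.h>

#define FWD_BITMAP_WORDS 8	/* 256 bits / 32 bits per word */

/*
 * Set bit n in a bitmap laid out as an array of 32-bit words. The word
 * index does not depend on the size of long on the host.
 */
static void set_bit_u32(uint32_t *bitmap, unsigned int n)
{
	bitmap[n / 32] |= (uint32_t)1 << (n % 32);
}

/*
 * Index of the 32-bit word that holds bit n when the bit is set in an
 * array of 64-bit longs on a big-endian machine: the low half of each
 * long is stored in the second 32-bit word of the pair.
 */
static unsigned int be64_long_word_index(unsigned int n)
{
	return (n / 64) * 2 + (1 - (n % 64) / 32);
}
```

For bit 0 the u32 layout expects word 0, while the long-based operation on
64-bit big-endian lands in word 1, which matches the "wrong 32-bit word"
symptom described above.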

Fixes: de68f5de5651 ("bnxt_en: Fix bitmap declaration to work on 32-bit arches.")
Reported-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 26000471

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
7 years ago cpuset: consider dying css as offline
Tejun Heo [Wed, 24 May 2017 16:03:48 +0000 (12:03 -0400)]
cpuset: consider dying css as offline

In most cases, a cgroup controller doesn't care about the lifetimes of
cgroups.  For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not.  Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period.  The effects of cgroup
removals are delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending.  This gets rid of the userland visible
delays.
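
The change in the online test can be modeled with a toy structure. This is
a sketch under stated assumptions: struct toy_css and both helpers are
hypothetical, and the two flags stand in for state the kernel tracks
differently.

```c
#include <stdbool.h>

/* Hypothetical model of the css state relevant to the check. */
struct toy_css {
	bool online;	/* set at ->css_online(), cleared at ->css_offline() */
	bool dying;	/* removal has started, offline is still pending */
};

/* Models css_is_dying(): true while offline is pending. */
static bool toy_css_is_dying(const struct toy_css *css)
{
	return css->dying;
}

/* Models the updated is_cpuset_online(): a dying css counts as offline. */
static bool toy_is_online(const struct toy_css *css)
{
	return css->online && !toy_css_is_dying(css);
}
```

During the window between cgroup removal and the offline callback, online
is still set but dying is already true, so the check reports offline
without waiting for the delayed callback.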

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Link:
http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Orabug: 26415290

(backport upstream commit 41c25707d21716826e3c1f60967f5550610ec1c9)

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Tom Hromatka <tom.hromatka@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
7 years ago proc: sparc64 Export ADI-enabled memory
Eric Snowberg [Mon, 19 Jun 2017 20:44:52 +0000 (14:44 -0600)]
proc: sparc64 Export ADI-enabled memory

Add a maps entry to /proc/pid/adi to show memory map information about
SPARC ADI enabled memory.

Orabug: 26052545

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
7 years ago proc: sparc64 ADI version tag debugging interface
Eric Snowberg [Fri, 12 May 2017 20:31:42 +0000 (13:31 -0700)]
proc: sparc64 ADI version tag debugging interface

To facilitate user space ADI debugging there needs to be a way for a
debugger to get/set ADI version tags in a target process. This is
accomplished with a new /proc/<pid>/adi/tags interface.  This new interface
maps linearly to the address space of the target process at a ratio
of 1:adi_blksz.  A read (or write) of offset K in the file returns
(or modifies) the ADI version tag stored in the cacheline containing
address K * adi_blksz, encoded as 1 version per byte.

Pseudocode example:

        unsigned char vers[2];
        long long addr = 0x20000;

        fd = open("/proc/pid/adi/tags", O_RDONLY);
        addr /= adi_blksz();
        rv = pread64(fd, &vers, 2, addr);
        /*
         * vers[0] gets version from address 0x20000,
         * vers[1] gets version from address 0x20000 + adi_blksz()
         */
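
The offset arithmetic in the pseudocode can be written out as two small
helpers. This is a sketch, not part of the interface: adi_tag_offset() and
adi_tag_addr() are hypothetical names, and the block size is passed in as
a parameter because the way to query it is not shown here.

```c
#include <stdint.h>

/*
 * File offset in the tags file of the version tag that covers addr.
 * blksz is the ADI block size in bytes (assumed non-zero).
 */
static uint64_t adi_tag_offset(uint64_t addr, uint64_t blksz)
{
	return addr / blksz;
}

/*
 * First address covered by the tag stored at file offset off, i.e. the
 * inverse mapping for block-aligned addresses.
 */
static uint64_t adi_tag_addr(uint64_t off, uint64_t blksz)
{
	return off * blksz;
}
```

With an example block size of 64 bytes (a value chosen only for
illustration), address 0x20000 maps to file offset 0x800, and the next
byte in the file describes address 0x20040.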

Orabug: 26051178

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
7 years ago proc: Move directory functions into internal.h
Eric Snowberg [Thu, 12 Jan 2017 23:05:23 +0000 (15:05 -0800)]
proc: Move directory functions into internal.h

Move directory macros and define proc_pident_lookup,
proc_pident_readdir and struct pid_entry within
fs/proc/internal.h. These were originally statically defined within
fs/proc/base.c and couldn't be used elsewhere.

Orabug: 26051178

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
7 years ago sched: Move the loadavg code to a more obvious location
Atish Patra [Fri, 23 Jun 2017 19:32:57 +0000 (13:32 -0600)]
sched: Move the loadavg code to a more obvious location

A previous commit f33dfff75d968 ("sched/fair: Rewrite runnable load
and utilization average tracking") created a regression in the global
load average reported by uptime. The active load average computation
function should be invoked periodically to update the delta for each
runqueue.
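
For context, the global load average is an exponentially decaying average
of the active task count. The sketch below is modeled on the kernel's
fixed-point calc_load() helper and its 1-minute constant; it is a
user-space illustration, not the code touched by this commit.

```c
#define FSHIFT   11			/* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)	/* 1.0 in fixed point */
#define EXP_1    1884UL			/* 1/exp(5sec/1min) in fixed point */

/*
 * One update step: decay the old average by exp and mix in the current
 * active count (also in fixed point), rounding to nearest.
 */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);
	return load >> FSHIFT;
}
```

If the per-runqueue deltas are never folded into the active count, the
value fed into this step stays close to zero, and the average stays low
regardless of how many tasks are running, which is consistent with the
"before" numbers below.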

Use the following upstream commit 3289bdb42 to fix this instead of a
quick fix.

Before the fix:

         procs_
    when running load average
======== ======= =================
13:32:46       1  0.65, 0.22, 0.08
13:33:47     129  0.78, 0.33, 0.12
13:34:47     129  0.74, 0.41, 0.16
13:35:47     129  0.60, 0.42, 0.18
13:36:47     129  0.77, 0.49, 0.22
13:37:47     129  0.78, 0.55, 0.26

After the fix:

         procs_
    when running load average
======== ======= =================
19:46:35       1  0.58, 0.38, 0.16
19:47:35     129  74.02, 21.09, 7.27
19:48:35     129  103.16, 39.08, 14.31
19:49:35     129  114.25, 53.95, 20.98
19:52:36     257  172.40, 97.26, 42.96
19:53:37     257  221.54, 124.95, 55.87
19:54:37     257  237.13, 147.05, 67.80

Original upstream commit message:
I could not find the loadavg code.. turns out it was hidden in a file
called proc.c. It further got mingled up with the cruft per rq load
indexes (which we really want to get rid of).

Move the per rq load indexes into the fair.c load-balance code (that's
the only thing that uses them) and rename proc.c to loadavg.c so we
can find it again.

Orabug: 26266279

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Did minor cleanups to the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3289bdb429884c0279bf9ab72dff7b934f19dfc6)

Conflicts:

kernel/sched/fair.c
kernel/sched/loadavg.c
kernel/sched/sched.h

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
7 years ago sparc64: Treat ERESTARTSYS as an acceptable error (DAX driver)
Sanath Kumar [Wed, 5 Jul 2017 18:09:33 +0000 (13:09 -0500)]
sparc64: Treat ERESTARTSYS as an acceptable error (DAX driver)

get_user_pages fails if the current process calling it has a SIGKILL
pending. Since this is an acceptable failure, catch this error and
print a debug message instead of an error message.

Orabug: 26393400

Signed-off-by: Sanath Kumar <sanath.s.kumar@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Jonathan Helman <jonathan.helman@oracle.com>
7 years ago SPARC64: vcc: delay device removal until close()
Aaron Young [Tue, 11 Jul 2017 16:56:40 +0000 (09:56 -0700)]
SPARC64: vcc: delay device removal until close()

    If a vcc device file is open while it's removed (due to a
    domain being unbound), delay the removal of the associated vcc
    device structure until the final close() call is made on
    the device. This prevents the device file cdev minor number from
    being reused which can result in ugly filesystem warnings to
    the console.

Orabug: 24594547

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-By: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
7 years ago sparc64: fix vio handshake issue
Thomas Tai [Mon, 10 Jul 2017 16:55:55 +0000 (10:55 -0600)]
sparc64: fix vio handshake issue

When rebooting multiple LDoms together while binding/unbinding a separate
LDom, the kernel panics with a vio handshake error. The panic is caused by
vio trying to allocate a buffer which is not freed properly.
ldc_unbind should unconfigure and stop the ldc queue before freeing
the irq. If the irq is freed before stopping the queue, interrupts
can continue to arrive after the irq is freed, which may cause
problems.

Orabug: 26259622

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
7 years ago sparc64: Use cpu_poke to resume idle cpu
Vijay Kumar [Tue, 11 Jul 2017 01:02:58 +0000 (19:02 -0600)]
sparc64: Use cpu_poke to resume idle cpu

Use the cpu_poke hypervisor call to resume an idle cpu, if supported.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Orabug: 25575672
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>