]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
8 years agonvme.h: Add get_log_page command strucure
Armen Baloyan [Mon, 6 Jun 2016 21:20:44 +0000 (23:20 +0200)]
nvme.h: Add get_log_page command strucure

Add get_log_page command structure and a corresponding entry in
nvme_command union

Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed--by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 725b358836ed038d7d8eafef86330b3a0b3f9c2f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: add RTD3R, RTD3E and OAES fields
Christoph Hellwig [Mon, 6 Jun 2016 21:20:43 +0000 (23:20 +0200)]
nvme.h: add RTD3R, RTD3E and OAES fields

These have been added in NVMe 1.2 and we'll need at least oaes for the
NVMe target driver.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 14e974a84e831bf9a44495c7256a6846e7f77630)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Only release requested regions
Johannes Thumshirn [Tue, 10 May 2016 13:14:28 +0000 (15:14 +0200)]
NVMe: Only release requested regions

The NVMe driver only requests the PCIe device's memory regions but releases
all possible regions (including eventual I/O regions). This leads to a stale
warning entry in dmesg about freeing non existent resources.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit edb50a5403d2e2d2b2b63a8365c4378c9c300ed6)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix removal in case of active namespace list scanning method
Sunad Bhandary [Fri, 27 May 2016 10:29:43 +0000 (15:59 +0530)]
NVMe: Fix removal in case of active namespace list scanning method

In case of the active namespace list scanning method, a namespace that
is detached is not removed from the host if it was the last entry in
the list. Fix this by adding a scan to validate namespaces greater than
the value of prev.

This also handles the case of removing namespaces whose value exceed
the device's reported number of namespaces.

Signed-off-by: Sunad Bhandary S <sunad.s@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 47b0e50ac724d97c392f771bb46f11d9d1575242)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Implement namespace list scanning
Keith Busch [Thu, 22 Oct 2015 21:45:06 +0000 (15:45 -0600)]
NVMe: Implement namespace list scanning

The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparesly
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation.  To be relaxed
 later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Don't unmap controller registers on reset
Keith Busch [Mon, 13 Mar 2017 16:27:07 +0000 (09:27 -0700)]
NVMe: Don't unmap controller registers on reset

Unmapping the registers on reset or shutdown is not necessary. Keeping
the mapping simplifies reset handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b00a726a9fd82ddd4c10344e46f0d371e1674303)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata
Ashok Vairavan [Thu, 8 Dec 2016 00:10:38 +0000 (16:10 -0800)]
NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata

Orabug: 25186219

PCIe analyzer tracing by Oracle and Samsung revealed an errata in Samsung's
firmware for EPIC SSDs where the invalid completion entries in admin queue
and IO queue can occur  when the queues straddle an 8MB DMA address boundary.

This patch limits admin queue depth to 64 for EPIC SSDs.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agonvme: Limit command retries
Keith Busch [Mon, 13 Mar 2017 16:06:05 +0000 (09:06 -0700)]
nvme: Limit command retries

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f80ec966c19b78af4360e26e32e1ab775253105f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: reduce queue depth as workaround for Samsung EPIC SQ errata
Ashok Vairavan [Wed, 23 Nov 2016 22:31:21 +0000 (14:31 -0800)]
NVMe: reduce queue depth as workaround for Samsung EPIC SQ errata

Orabug: 25138123

Oracle discovered that the NVMe driver gets SQ completion errors eventually
leading to the device being reset, taken out of the PCI bus tree or kernel
panics when using the default SQ size of 1024 entries (64KB) for Samsung
EPIC NVMe SSDs.

PCIe analyzer tracing by Oracle and Samsung revealed an errata in Samsung's
firmware for EPIC SSDs where these invalid completion entries can occur
when the queues straddle an 8MB DMA address boundary.

This patch works around the errata by detecting these specific devices and
limiting their descriptor queue depth to 64.  This is only for the Samsung
NVMe controllers used in Oracle X-series servers.

There was no noticeable performance impact of reducing queue depths to 64
for these Samsung drives, Oracle X6-2 server, and Oracle VM Server 3.4.2.

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Create discard zero quirk white list
Keith Busch [Fri, 4 Mar 2016 20:15:17 +0000 (13:15 -0700)]
NVMe: Create discard zero quirk white list

The NVMe specification does not require discarded blocks return zeroes on
read, but provides that behavior as a possibility. Some applications more
efficiently use an SSD if reads on discarded blocks were deterministically
zero, based on the "discard_zeroes_data" queue attribute.

There is no specification defined way to determine device behavior on
discarded blocks, so the driver always left the queue setting disabled. We
can only know behavior based on individual device models, so this patch
adds a flag to the NVMe "quirk" list that vendors may set if they know
their controller works that way. The patch also sets the new flag for one
such known device.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 08095e70783f1d8296f858d37a9e1878f5da0623)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: use UINT_MAX for max discard sectors
Minfei Huang [Tue, 17 May 2016 07:58:41 +0000 (15:58 +0800)]
nvme: use UINT_MAX for max discard sectors

It's more elegant to use UINT_MAX to represent the max value of
type unsigned int. So replace the actual value by using this define.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bd0fc2884ca4d1516da1aa5cf44385e24dc23c29)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move nvme_cancel_request() to common code
Ming Lin [Wed, 18 May 2016 21:05:02 +0000 (14:05 -0700)]
nvme: move nvme_cancel_request() to common code

So it can be used by fabrics driver also.

Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Keith Busch <keith.bsuch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c55a2fd4bb16bcdd8c42e3d64fccd326416b7492)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: update and rename nvme_cancel_io to nvme_cancel_request
Ming Lin [Wed, 18 May 2016 21:05:01 +0000 (14:05 -0700)]
nvme: update and rename nvme_cancel_io to nvme_cancel_request

nvme_cancel_io is a bit confusing (given the distinction of io/admin),
so rename it to nvme_cancel_request.

And update it a bit to pass in struct nvme_ctrl, so it can be used
by Fabrics driver also.

Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Keith Busch <keith.bsuch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1958e6534a2d4ebb2dfcd0b3f16ff8e277a5b0c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblk-mq: Export tagset iter function
Sagi Grimberg [Tue, 3 Jan 2017 16:40:50 +0000 (08:40 -0800)]
blk-mq: Export tagset iter function

Its useful to iterate on all the active tags in cases
where we will need to fail all the queues IO.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
[hch: carefully check for valid tagsets]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e0489487ec9cd79ee1fa0dc5d3789c08b0e51a2c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Add device ID's with stripe quirk
Keith Busch [Mon, 2 May 2016 21:14:24 +0000 (15:14 -0600)]
NVMe: Add device ID's with stripe quirk

Adds two Intel controllers that have the "stripe" quirk.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 99466e708ddce8904c8635c213f2deb523ef4fb9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Short-cut removal on surprise hot-unplug
Keith Busch [Thu, 12 May 2016 14:37:14 +0000 (08:37 -0600)]
NVMe: Short-cut removal on surprise hot-unplug

This patch adds a new state that when set has the core automatically
kill request queues prior to removing namespaces.

If PCI device is not present at the time the nvme driver's remove is
called, we can kill all IO queues immediately instead of waiting for
the watchdog thread to do that at its polling interval. This improves
scenarios where multiple hot plug events occur at the same time since
it doesn't block the pci enumeration for as long.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0ff9d4e1a284a9282a049bf064f123e27f838907)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Allow user initiated rescan
Keith Busch [Fri, 29 Apr 2016 21:45:18 +0000 (15:45 -0600)]
NVMe: Allow user initiated rescan

This exposes ioctl and sysfs methods a user can invoke to request the
driver rescan a controller and its namespaces. This is less harsh than
doing a controller reset, which temporarilly halts all IO, just to
surface a newly attached namespace.

This is mainly useful for controllers that implement the namespace
management command, but do not support the namespace notify change
asynchronous event notification.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9ec3bb2f994bda9c8817856fdcbfaebe8f62fbd3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Reduce driver log spamming
Keith Busch [Mon, 4 Apr 2016 21:07:41 +0000 (15:07 -0600)]
NVMe: Reduce driver log spamming

Reduce error logging when no corrective action is required.

Suggessted-by: Chris Petersen <cpetersen@fb.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d011fb3164e8694d7839f10a497f8ab6c660149a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Unbind driver on failure
Keith Busch [Mon, 28 Mar 2016 22:03:21 +0000 (16:03 -0600)]
NVMe: Unbind driver on failure

Instead of removing the PCI device from the kernel's topology on
controller failure, this patch simply requests unbinding the device
from the driver. This avoids concurrently running pci removal with the
hot plug event, which has been reported to be problematic when multiple
surprise events occur near simultaneously.

The other benefit is that we will have PCI config and memory space
available to poke around for debugging a failed controller, assuming
the device was not physically removed.

The down side occurs if the platform and/or kernel do not support any
type of surprise hot removal. The device will remain visible through
sysfs (and therefore lspci), and some manual work is necessary to get
the logical topology corrected. But if your platform and/or kernel don't
support surprise removal, you probably shouldn't be doing that anyway.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 921920ab32f290dafdb0359024d4587897712728)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Delete only created queues
Keith Busch [Fri, 6 May 2016 17:50:52 +0000 (11:50 -0600)]
NVMe: Delete only created queues

Use the online queue count instead of the number of allocated queues. The
controller should just return an invalid queue identifier error to the
commands if a queue wasn't created. While it's not harmful, it's still
not correct.

Reported-by: Saar Gross <saar@annapurnalabs.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 014a0d609eb4721d1e416cf10da2d5602f9b34d5)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix reset/remove race
Keith Busch [Fri, 8 Apr 2016 22:11:02 +0000 (16:11 -0600)]
NVMe: Fix reset/remove race

This fixes a scenario where device is present and being reset, but a
request to unbind the driver occurs.

A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:

  modprobe nvme && modprobe -r nvme

To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.

The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to cleanup requests
that may be stuck during namespace removal.

[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 87c32077819c695cbc5ab00226a28010cd5806c3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix nvme_ns_remove() deadlock
Ming Lin [Mon, 25 Apr 2016 21:20:19 +0000 (14:20 -0700)]
nvme: fix nvme_ns_remove() deadlock

On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In case of namespace detach/delete command,
nvme_ns_remove function deadlocks trying to acquire the already held
lock.

All callers, except nvme_remove_namespaces(), of nvme_ns_remove()
already held namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and acquiring it in
nvme_remove_namespaces().

Reported-by: Sunad Bhandary S <sunad.s@samsung.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b7b9c2278752e37dc7ae918cda823aa2a078e03b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: switch to RCU freeing the namespace
Ming Lin [Mon, 25 Apr 2016 21:20:18 +0000 (14:20 -0700)]
nvme: switch to RCU freeing the namespace

Switch to RCU freeing the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues would
be able to get away with only a RCU read side critical section.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 32f0c4afb4363e31dad49202f1554ba591d649f2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: correct comment for offset enum of controller registers in nvme.h
Wang Sheng-Hui [Wed, 27 Apr 2016 12:10:16 +0000 (20:10 +0800)]
NVMe: correct comment for offset enum of controller registers in nvme.h

Section 3.1 gives the comment for the offset of controller registers
in the specification 1.2a.

Some are mis-copied in the header file nvme.h. Correct them.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5b714ad395803a6aa91793b9e52a81b176b8ba9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add helper nvme_cleanup_cmd()
Ming Lin [Mon, 25 Apr 2016 21:33:20 +0000 (14:33 -0700)]
nvme: add helper nvme_cleanup_cmd()

This hides command cleanup into nvme.h and fabrics drivers will
also use it.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6904242db1ac07403c331b18796f6c2bf5382aec)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move AER handling to common code
Christoph Hellwig [Tue, 26 Apr 2016 11:52:00 +0000 (13:52 +0200)]
nvme: move AER handling to common code

The transport driver still needs to do the actual submission, but all the
higher level code can be shared.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f866fc4282a81673ef973ad54c68235a3263b42e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move namespace scanning to core
Christoph Hellwig [Fri, 30 Dec 2016 21:10:00 +0000 (13:10 -0800)]
nvme: move namespace scanning to core

Move the scan work item and surrounding code to the common code.  For now
we need a new finish_scan method to allow the PCI driver to set the
irq affinity hints, but I have plans in the works to obsolete this as well.

Note that this moves the namespace scanning from nvme_wq to the system
workqueue, but as we don't rely on namespace scanning to finish from reset
or I/O this should be fine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5955be2144b3b56182e2175e7e3d2ddf27fb485d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: tighten up state check for namespace scanning
Christoph Hellwig [Tue, 26 Apr 2016 11:51:58 +0000 (13:51 +0200)]
nvme: tighten up state check for namespace scanning

We only should be scanning namespaces if the controller is live.  Currently
we call the function just before setting it live, so fix the code up to
move the call to nvme_queue_scan to just below the state change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92911a55d42084cd285250c275d9f238783638c2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: introduce a controller state machine
Christoph Hellwig [Tue, 26 Apr 2016 11:51:57 +0000 (13:51 +0200)]
nvme: introduce a controller state machine

Replace the adhoc flags in the PCI driver with a state machine in the
core code.  Based on code from Sagi Grimberg for the Fabrics driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bb8d261e088811ef2b564d745afcd1633428010a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: remove the io_incapable method
Christoph Hellwig [Tue, 26 Apr 2016 11:51:56 +0000 (13:51 +0200)]
nvme: remove the io_incapable method

It's unused since "NVMe: Move error handling to failed reset handler".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 04a934d4c7251e6458a7898c2b4d6c2da29b132c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: nvme_core_exit() should do cleanup in the reverse order as nvme_core_init does
Wang Sheng-Hui [Thu, 28 Apr 2016 08:19:31 +0000 (16:19 +0800)]
NVMe: nvme_core_exit() should do cleanup in the reverse order as nvme_core_init does

nvme_core_init does:
    1) register_blkdev
    2) __register_chrdev
    3) class_create

nvme_core_exit should do cleanup in the reverse order.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 23bd63ceea30878758c303baaf9f8e28f299c578)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix check_flush_dependency warning
Keith Busch [Wed, 27 Apr 2016 21:51:18 +0000 (15:51 -0600)]
NVMe: Fix check_flush_dependency warning

If the controller fails and is degraded after a reset, we need to kill
off all requests queues before removing the inaccessble namespaces. This
will prevent del_gendisk from syncing dirty data, which we can't due
from a WQ_MEM_RECLAIM work queue.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3b24774e1fb90a40836e96e39a851a774679efff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: small typo in section BLK_DEV_NVME_SCSI of host/Kconfig
Wang Sheng-Hui [Wed, 20 Apr 2016 02:04:32 +0000 (10:04 +0800)]
NVMe: small typo in section BLK_DEV_NVME_SCSI of host/Kconfig

"as well as " is miss typed "as well a " in section
"config BLK_DEV_NVME_SCSI"

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b31356dfde571d925768783f1bb63ca8e156d0b3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix cntlid type
Christoph Hellwig [Sat, 16 Apr 2016 18:57:58 +0000 (14:57 -0400)]
nvme: fix cntlid type

Controller IDs in NVMe are unsigned 16-bit types.  In the Fabrics driver we
actually pass ctrl->id by reference, so we need it to have the correct type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 76e3914ae51714b0535c38d9472d89124e0b6b96)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: Avoid reset work on watchdog timer function during error recovery
Guilherme G. Piccoli [Wed, 13 Apr 2016 14:08:20 +0000 (11:08 -0300)]
nvme: Avoid reset work on watchdog timer function during error recovery

This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.

If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).

In this patch we also have split the huge condition expression on
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c875a7093f0479215cf9bf51356d7638f2ec5746)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: remove dead controllers from a work item
Christoph Hellwig [Thu, 26 Nov 2015 11:35:49 +0000 (12:35 +0100)]
nvme: remove dead controllers from a work item

Compared to the kthread this gives us multiple call prevention for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5c8809e650772be87ba04595a8ccf278bab7b543)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: silence warning about unused 'dev'
Jens Axboe [Tue, 12 Apr 2016 22:11:11 +0000 (16:11 -0600)]
NVMe: silence warning about unused 'dev'

Depending on options, we might not be using dev in nvme_cancel_io():

drivers/nvme/host/pci.c: In function â€˜nvme_cancel_io’:
drivers/nvme/host/pci.c:970:19: warning: unused variable â€˜dev’ [-Wunused-variable]
  struct nvme_dev *dev = data;
                   ^

So get rid of it, and just cast for the dev_dbg_ratelimited() call.

Fixes: 82b4552b91c4 ("nvme: Use blk-mq helper for IO termination")
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7e19793096994d43d213f440f4bbea926828a727)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: switch to using blk_queue_write_cache()
Jens Axboe [Tue, 12 Apr 2016 21:43:09 +0000 (15:43 -0600)]
NVMe: switch to using blk_queue_write_cache()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7c88cb00f2a26637bade6c62a17d17f31a954e30)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblock: add ability to flag write back caching on a device
Jens Axboe [Tue, 12 Apr 2016 18:32:46 +0000 (12:32 -0600)]
block: add ability to flag write back caching on a device

Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.

This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 93e9d8e836cb1a9a58b33eb6643bf061c6119ef2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: Use blk-mq helper for IO termination
Sagi Grimberg [Tue, 12 Apr 2016 21:07:15 +0000 (15:07 -0600)]
nvme: Use blk-mq helper for IO termination

blk-mq offers a tagset iterator so let's use that
instead of using nvme_clear_queues.

Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io
as there is no concept of a queue now in this function (we
also lost the print).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 82b4552b91c40626a90a20291aab1137c638b512)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Skip async events for degraded controllers
Keith Busch [Tue, 12 Apr 2016 17:13:11 +0000 (11:13 -0600)]
NVMe: Skip async events for degraded controllers

If the controller is degraded, the driver should stay out of the way so
the user can recover the drive. This patch skips driver initiated async
event requests when the drive is in this state.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 21f033f7c72e9505c46c6555b019b907dc39dfcd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add helper nvme_setup_cmd()
Ming Lin [Tue, 12 Apr 2016 19:10:14 +0000 (13:10 -0600)]
nvme: add helper nvme_setup_cmd()

This moves nvme_setup_{flush,discard,rw} calls into a common
nvme_setup_cmd() helper. So we can eventually hide all the command
setup in the core module and don't even need to update the fabrics
drivers for any specific command type.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8093f7ca73c1633e458c16a74b51bcc3c94564c4)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblock: add offset in blk_add_request_payload()
Ming Lin [Tue, 22 Mar 2016 07:24:44 +0000 (00:24 -0700)]
block: add offset in blk_add_request_payload()

We could kmalloc() the payload, so need the offset in page.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 37e58237a16b94fcd2c2d1b7e9c6e1ca661c231b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: rewrite discard support
Ming Lin [Tue, 22 Mar 2016 07:24:45 +0000 (00:24 -0700)]
nvme: rewrite discard support

This rewrites nvme_setup_discard() with blk_add_request_payload().
It allocates only the necessary amount(16 bytes) for the payload.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 03b5929ebb20457e2fd13a701954efa2b2fb7ded)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add helper nvme_map_len()
Ming Lin [Tue, 22 Mar 2016 07:24:43 +0000 (00:24 -0700)]
nvme: add helper nvme_map_len()

The helper returns the number of bytes that need to be mapped
using PRPs/SGL entries.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 58b45602751ddf16e57170656670aa5a8f78eeca)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add missing lock nesting notation
Ming Lin [Tue, 5 Apr 2016 17:32:04 +0000 (10:32 -0700)]
nvme: add missing lock nesting notation

When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue()
that sends nvme_admin_delete_cq command to admin sq. So when the command
completed, the lock acquired by nvme_irq() actually belongs to admin queue.

While the lock that nvme_del_cq_end() trying to acquire belongs to io queue.
So it will not deadlock.

This patch adds lock nesting notation to fix following report.

[  109.840952] =============================================
[  109.846379] [ INFO: possible recursive locking detected ]
[  109.851806] 4.5.0+ #180 Tainted: G            E
[  109.856533] ---------------------------------------------
[  109.861958] swapper/0/0 is trying to acquire lock:
[  109.866771]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  109.876535]
[  109.876535] but task is already holding lock:
[  109.882398]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.891547]
[  109.891547] other info that might help us debug this:
[  109.898107]  Possible unsafe locking scenario:
[  109.898107]
[  109.904056]        CPU0
[  109.906515]        ----
[  109.908974]   lock(&(&nvmeq->q_lock)->rlock);
[  109.913381]   lock(&(&nvmeq->q_lock)->rlock);
[  109.917787]
[  109.917787]  *** DEADLOCK ***
[  109.917787]
[  109.923738]  May be due to missing lock nesting notation
[  109.923738]
[  109.930558] 1 lock held by swapper/0/0:
[  109.934413]  #0:  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.944010]
[  109.944010] stack backtrace:
[  109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E   4.5.0+ #180
[  109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[  109.962989]  0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
[  109.970478]  ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
[  109.977964]  0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
[  109.985453] Call Trace:
[  109.987911]  <IRQ>  [<ffffffff81383d9c>] dump_stack+0x85/0xc9
[  109.993711]  [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
[  109.999575]  [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
[  110.005524]  [<ffffffff810b386d>] ? complete+0x3d/0x50
[  110.010688]  [<ffffffff810bb760>] lock_acquire+0x90/0xf0
[  110.016029]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.022418]  [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
[  110.028632]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.035019]  [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  110.041232]  [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
[  110.047095]  [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
[  110.053481]  [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
[  110.060043]  [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
[  110.066343]  [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
[  110.072818]  [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
[  110.078419]  [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
[  110.084804]  [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
[  110.090491]  [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
[  110.096180]  [<ffffffff81012306>] handle_irq+0xa6/0x130
[  110.101431]  [<ffffffff81011abe>] do_IRQ+0x5e/0x120
[  110.106333]  [<ffffffff8177384c>] common_interrupt+0x8c/0x8c

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2e39e0f608c130411f52c9fe5648dbcda5e28528)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Always use MSI/MSI-x interrupts
Keith Busch [Fri, 8 Apr 2016 22:09:10 +0000 (16:09 -0600)]
NVMe: Always use MSI/MSI-x interrupts

Multiple users have reported device initialization failure due the driver
not receiving legacy PCI interrupts. This is not unique to any particular
controller, but has been observed on multiple platforms.

There have been no issues reported or observed when with message signaled
interrupts, so this patch attempts to use MSI-x during initialization,
falling back to MSI. If that fails, legacy would become the default.

The setup_io_queues error handling had to change as a result: the admin
queue's msix_entry used to be initialized to the legacy IRQ. The case
where nr_io_queues is 0 would fail request_irq when setting up the admin
queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
the admin queue's msix_entry invalid. Instead, return success immediately.

Reported-by: Tim Muhlemmer <muhlemmer@gmail.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5229050b69cfffb690b546c357ca5a60434c0c8)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix reset/remove race
Keith Busch [Fri, 8 Apr 2016 22:11:02 +0000 (16:11 -0600)]
NVMe: Fix reset/remove race

This fixes a scenario where device is present and being reset, but a
request to unbind the driver occurs.

A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:

  modprobe nvme && modprobe -r nvme

To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.

The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to cleanup requests
that may be stuck during namespace removal.

[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9bf2b972afeaffd173fe2ce211ebc555ea7e8a87)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: avoid cqe corruption when update at the same time as read
Marta Rybczynska [Tue, 22 Mar 2016 15:02:06 +0000 (16:02 +0100)]
nvme: avoid cqe corruption when update at the same time as read

Make sure the CQE phase (validity) is read before the rest of the
structure. The phase bit is the highest address and the CQE
read will happen on most platforms from lower to upper addresses
and will be done by multiple non-atomic loads. If the structure
is updated by PCI during the reads from the processor, the
processor may get a corrupted copy.

The addition of the new nvme_cqe_valid function that verifies
the validity bit also allows refactoring of the other CQE read
sequences.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d783e0bd02e700e7a893ef4fa71c69438ac1c276)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Expose ns wwid through single sysfs entry
Keith Busch [Thu, 18 Feb 2016 16:57:48 +0000 (09:57 -0700)]
NVMe: Expose ns wwid through single sysfs entry

The method to uniquely identify a namespace depends on the controller's
specification revision level and implemented capabilities. This patch
has the driver figure this out and exports the unique string through a
single 'wwid' attribute so the user doesn't have this burden.

The longest namespace unique identifier is used if available. If not
available, the driver will concat the controller's vendor, serial,
and model with the namespace ID. The specification provides this as a
unique indentifier.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 118472ab8532e55f48395ef5764b354fe48b1d73)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Remove unused sq_head read in completion path
Jon Derrick [Tue, 8 Mar 2016 17:34:54 +0000 (10:34 -0700)]
NVMe: Remove unused sq_head read in completion path

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 48c7823f42da2bc881ae2e325ed40123871c2fb9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix max_segments integer truncation
Christoph Hellwig [Fri, 30 Dec 2016 20:51:50 +0000 (12:51 -0800)]
nvme: fix max_segments integer truncation

The block layer uses an unsigned short for max_segments.  The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 45686b6198bd824f083ff5293f191d78db9d708a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: set queue limits for the admin queue
Christoph Hellwig [Wed, 2 Mar 2016 17:07:11 +0000 (18:07 +0100)]
nvme: set queue limits for the admin queue

Factor out a helper to set all the device specific queue limits and apply
them to the admin queue in addition to the I/O queues.  Without this the
command size on the admin queue is arbitrarily low, and the missing
other limitations are just minefields waiting for victims.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit da35825d9a091a7a1d5824c8468168e2658333ff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix 0-length integrity payload
Keith Busch [Wed, 24 Feb 2016 16:15:58 +0000 (09:15 -0700)]
NVMe: Fix 0-length integrity payload

A user could send a passthrough IO command with a metadata pointer to a
namespace without metadata. With metadata length of 0, kmalloc returns
ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
the bio's integrity payload, which causes an access fault on completion.

This patch ignores the users metadata buffer if the namespace format
does not support separate metadata.

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e9fc63d682dbbef17921aeb00d03fd52d6735ffd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Don't allow unsupported flags
Keith Busch [Wed, 24 Feb 2016 16:15:57 +0000 (09:15 -0700)]
NVMe: Don't allow unsupported flags

The command flags can change the meaning of other fields in the command
that the driver is not prepared to handle. Specifically, the user could
passthrough an SGL flag, causing the controller to misinterpret the PRP
list the driver created, potentially corrupting memory or data.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 63088ec7c8eadfe08b96127a41b385ec9742dace)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Move error handling to failed reset handler
Keith Busch [Fri, 30 Dec 2016 03:19:31 +0000 (19:19 -0800)]
NVMe: Move error handling to failed reset handler

This moves failed queue handling out of the namespace removal path and
into the reset failure path, fixing a hanging condition if the controller
fails or link down during del_gendisk. Previously the driver had to see
the controller as degraded prior to calling del_gendisk to setup the
queues to fail. But, if the controller happened to fail after this,
there was no task to end outstanding requests.

On failure, all namespace states are set to dead. This has capacity
revalidate to 0, and ends all new requests with error status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d9a99c258eb1d6478fd9608a2070890797eed7)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Simplify device reset failure
Keith Busch [Wed, 22 Feb 2017 19:13:09 +0000 (11:13 -0800)]
NVMe: Simplify device reset failure

A reset failure schedules the device to unbind from the driver through
the pci driver's remove. This cleans up all intialization, so there is
no need to duplicate the potentially racy cleanup.

To help understand why a reset failed, the status is logged with the
existing warning message.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f58944e265d4ebe47216a5d7488aee3928823d30)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix namespace removal deadlock
Keith Busch [Wed, 24 Feb 2016 16:15:54 +0000 (09:15 -0700)]
NVMe: Fix namespace removal deadlock

This patch makes nvme namespace removal lockless. It is up to the caller
to ensure no active namespace scanning is occuring. To ensure no scan
work occurs, the nvme pci driver adds a removing state to the controller
device to avoid queueing scan work during removal. The work is flushed
after setting the state, so no new scan work can be queued.

The lockless removal allows the driver to cleanup a namespace
request_queue if the controller fails during removal. Previously this
could deadlock trying to acquire the namespace mutex in order to handle
such events.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 646017a612e72f19bd9f991fe25287a149c5f627)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Use IDA for namespace disk naming
Keith Busch [Wed, 24 Feb 2016 16:15:53 +0000 (09:15 -0700)]
NVMe: Use IDA for namespace disk naming

A namespace may be detached from a controller, but a user may be holding
a reference to it. Attaching a new namespace with the same NSID will create
duplicate names when using the NSID to name the disk.

This patch uses an IDA that is released only when the last reference is
released instead of using the namespace ID.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 075790ebba4a1eb297f9875e581b55c0382b1f3d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: expose cntlid in sysfs
Ming Lin [Fri, 26 Feb 2016 21:24:19 +0000 (13:24 -0800)]
nvme: expose cntlid in sysfs

For NVMe over Fabrics, the cntlid will be used by systemd/udev to
create link to the device, for example,

/dev/disk/by-path/<fabrics-info>-<cntlid>-<namespace> -> /dev/nvme0n1

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 931e1c2204c6d00c11c5c1e2e1c20b5ca41f292d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: return the whole CQE through the request passthrough interface
Christoph Hellwig [Mon, 29 Feb 2016 14:59:47 +0000 (15:59 +0100)]
nvme: return the whole CQE through the request passthrough interface

Both LighNVM and NVMe over Fabrics need to look at more than just the
status and result field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bj?rling <m@bjorling.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1cb3cce5eb9de335330c8a147e47e3359a51a8b5)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix Kconfig description for BLK_DEV_NVME_SCSI
Christoph Hellwig [Tue, 9 Feb 2016 17:21:22 +0000 (10:21 -0700)]
nvme: fix Kconfig description for BLK_DEV_NVME_SCSI

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 21d147880e4895e645f890904784b079f1ba76f4)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: replace the kthread with a per-device watchdog timer
Christoph Hellwig [Mon, 29 Feb 2016 14:59:46 +0000 (15:59 +0100)]
nvme: replace the kthread with a per-device watchdog timer

The only work left in the kthread is the periodic health check for each
controller.  There is no need to run this from process context or keep
a thread context around for it, so replace it with a simpler timer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2d55cd5f511d6fc377734473b237ac50820bfb9f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: don't poll the CQ from the kthread
Christoph Hellwig [Mon, 29 Feb 2016 14:59:45 +0000 (15:59 +0100)]
nvme: don't poll the CQ from the kthread

There is no reason to do unconditional polling of CQs per the NVMe
spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 79f2b358c9ba373943a9284be2861fde58291c4e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: use a work item to submit async event requests
Christoph Hellwig [Mon, 29 Feb 2016 14:59:44 +0000 (15:59 +0100)]
nvme: use a work item to submit async event requests

Use a dedicated work item to submit async event requests instead of the
global kthread.  This simplifies the code and reduces the latencies to
resubmit a request once an even notification happened.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9396dec916c052855dbb5b876c13d163df397319)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Rate limit nvme IO warnings
Keith Busch [Thu, 11 Feb 2016 20:05:47 +0000 (13:05 -0700)]
NVMe: Rate limit nvme IO warnings

We don't need to spam the kernel logs with thousands of IO cancelling
messages. We can infer all IO's are being cancelled with fewer, or
even none at all. This patch rate limits the message and uses the debug
log level as it is mainly used for testing purposes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f8e68a7c9af5f8047f7f8295874bedf306063709)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Poll device while still active during remove
Keith Busch [Thu, 11 Feb 2016 20:05:43 +0000 (13:05 -0700)]
NVMe: Poll device while still active during remove

A device failure or link down wouldn't have been detected during namespace
removal. This patch keeps the device in the list for polling so that the
thread may see such failure and initiate a reset. The device is removed
from the list after disable, so we can safely flush the reset work as
it can't be requeued when disable completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ff23a2a15a2117245b4599c1352343c8b8fb4c43)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Requeue requests on suspended queues
Keith Busch [Thu, 11 Feb 2016 20:05:42 +0000 (13:05 -0700)]
NVMe: Requeue requests on suspended queues

It's possible a request may get to the driver after the nvme queue was
disabled. This has the request requeue if that happens.

Note the request is still "started" by the driver, but requeuing will
clear the start state for timeout handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2850713576e81e3b887cd92a9965fba0dd1717c0)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Allow request merges
Keith Busch [Thu, 11 Feb 2016 20:05:40 +0000 (13:05 -0700)]
NVMe: Allow request merges

It is generally more efficient to submit larger IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2850713576e81e3b887cd92a9965fba0dd1717c0)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix io incapable return values
Keith Busch [Thu, 11 Feb 2016 20:05:39 +0000 (13:05 -0700)]
NVMe: Fix io incapable return values

The function returns true when the controller can't handle IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2850713576e81e3b887cd92a9965fba0dd1717c0)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: split pci module out of core module
Ming Lin [Wed, 10 Feb 2016 18:03:32 +0000 (10:03 -0800)]
nvme: split pci module out of core module

NVMe over Fabrics drivers are going to reuse the core,
so splits nvme.ko into 2 modules:

nvme-core.ko: the core part
nvme.ko: the PCI driver

Export symbols from nvme-core.ko.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 576d55d625664a20ee4bae6500952febfb2d7b10)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: split dev_list_lock
Ming Lin [Wed, 10 Feb 2016 18:03:31 +0000 (10:03 -0800)]
nvme: split dev_list_lock

Split dev_list_lock into one in the core and one in the PCI driver.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9f2482b91bcd02ac2999cf04b3fb1b89e1c4d559)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move timeout variables to core.c
Ming Lin [Wed, 10 Feb 2016 18:03:30 +0000 (10:03 -0800)]
nvme: move timeout variables to core.c

These variables are used by PCI driver and will also be used in the
forthcoming NVMe over Fabrics drivers.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ba0ba7d3e5266111ec865b0bf1ad48dd0e2a2314)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme/host: reference the fabric module for each bdev open callout
Sagi Grimberg [Wed, 10 Feb 2016 18:03:29 +0000 (10:03 -0800)]
nvme/host: reference the fabric module for each bdev open callout

We don't want to be able to unload the fabric driver when we have
openened referenced to our namespaces. Thus, for each nvme_open we
take a reference on the fabric driver and put it in nvme_release.
This behavior is consistent with the scsi model.

This resolves the panic when unloading a fabric module with
mpath holders.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ian Bakshan <ianb@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e439bb12e75c2807029853493fa787c6d70c763a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: Log the ctrl device name instead of the underlying pci device name
Sagi Grimberg [Wed, 10 Feb 2016 15:51:15 +0000 (08:51 -0700)]
nvme: Log the ctrl device name instead of the underlying pci device name

Having the ctrl name "nvmeX" seems much more friendly than
the underlying device name. Also, with other nvme transports
such as the soon to come nvme-loop we don't have an underlying
device so it doesn't makes sense to make up one.

In order to help matching an instance name to a pci function,
we add a info print in nvme_probe.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Manually fixed up the hunk in nvme_cancel_queue_ios().

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1b3c47c182aac70c4487105d2e22a17f0193525f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix drvdata setup for the nvme device
Christoph Hellwig [Tue, 9 Feb 2016 19:44:03 +0000 (12:44 -0700)]
nvme: fix drvdata setup for the nvme device

Pass the right private data to device_create_with_groups from the
beginning, and remove the superflous call to dev_set_drvdata.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4f0f63e6f01055dfbdb7bc5e83935e1bdfa1980)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix possible queue use after freed
Keith Busch [Fri, 18 Dec 2015 00:08:15 +0000 (17:08 -0700)]
NVMe: Fix possible queue use after freed

This notifies blk-mq when the tag set contains a different number of
queues prior to freeing unused ones that the request queue points to.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 949928c1c731417cc0f070912c63878b62b544f4)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: switch abort to blk_execute_rq_nowait
Christoph Hellwig [Thu, 29 Dec 2016 21:33:03 +0000 (13:33 -0800)]
nvme: switch abort to blk_execute_rq_nowait

And remove the now unused nvme_submit_cmd helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e7a2a87d5938bbebe1637c82fbde94ea6be3ef78)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblk-mq: fix racy updates of rq->errors
Christoph Hellwig [Tue, 30 May 2017 17:36:32 +0000 (10:36 -0700)]
blk-mq: fix racy updates of rq->errors

blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4829a9b7a61e159367350008a608b062c4f6840)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Export NVMe attributes to sysfs group
Keith Busch [Tue, 12 Jan 2016 22:09:31 +0000 (15:09 -0700)]
NVMe: Export NVMe attributes to sysfs group

Adds all controller information to attribute list exposed to sysfs, and
appends the reset_controller attribute to it. The nvme device is created
with this attribute list, so driver no long manages its attributes.

Reported-by: Sujith Pandel <sujithpshankar@gmail.com>
Cc: Sujith Pandel <sujithpshankar@ gmail.com>
Cc: David Milburn <dmilburn@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 779ff75617099f4defe14e20443b95019a4c5ae8)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Shutdown controller only for power-off
Keith Busch [Tue, 12 Jan 2016 21:41:18 +0000 (14:41 -0700)]
NVMe: Shutdown controller only for power-off

We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5cdb68c2c10f0865122656833cd07636a4143ee)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: IO queue deletion re-write
Keith Busch [Tue, 12 Jan 2016 21:41:17 +0000 (14:41 -0700)]
NVMe: IO queue deletion re-write

The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit db3cbfff5bcc0b9a82d8c71f00b9d60fad215871)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Remove queue freezing on resets
Keith Busch [Mon, 4 Jan 2016 16:10:57 +0000 (09:10 -0700)]
NVMe: Remove queue freezing on resets

NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 25646264e15af96c5c630fc742708b1eb3339222)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Use a retryable error code on reset
Keith Busch [Mon, 4 Jan 2016 16:10:56 +0000 (09:10 -0700)]
NVMe: Use a retryable error code on reset

A negative status has the "do not retry" bit set, which makes it not
retryable.  Use a fake status that can potentially be retried on reset.

An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1d49c38c4865c596b01b31a52540275c1bb383e7)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix admin queue ring wrap
Keith Busch [Mon, 4 Jan 2016 16:10:55 +0000 (09:10 -0700)]
NVMe: Fix admin queue ring wrap

The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e3e9d50cd6ed392bb716e35c134d1e82707c51b4)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: make SG_IO support optional
Christoph Hellwig [Thu, 24 Dec 2015 14:27:02 +0000 (15:27 +0100)]
nvme: make SG_IO support optional

Translation SCSI commands to NVMe commands is rather pointless in general
as applications must not expext to be able to use SCSI commands on a
generic block device.

Make the huge translation layer optional and hope no one will ever enable
it in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4490733250b8b272a6d3e66352dd7b8025409549)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fixes for NVME_IOCTL_IO_CMD on the char device
Christoph Hellwig [Thu, 24 Dec 2015 14:27:01 +0000 (15:27 +0100)]
nvme: fixes for NVME_IOCTL_IO_CMD on the char device

Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O.  Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bfd8947194b2e2a53db82bbc7eb7c15d028c46db)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: synchronize access to ctrl->namespaces
Christoph Hellwig [Thu, 24 Dec 2015 14:27:00 +0000 (15:27 +0100)]
nvme: synchronize access to ctrl->namespaces

Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d3b8ac15a5eb938e6a01909f6cc8ae4b5d3a17)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: Move nvme_freeze/unfreeze_queues to nvme core
Sagi Grimberg [Thu, 24 Dec 2015 14:26:59 +0000 (15:26 +0100)]
nvme: Move nvme_freeze/unfreeze_queues to nvme core

Nothing pci specific about them and We'll need them exported
in other transports too.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 363c9aacb6c59bb63148dd115632880a4aed4d88)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Export namespace attributes to sysfs
Keith Busch [Tue, 22 Dec 2015 17:10:45 +0000 (10:10 -0700)]
NVMe: Export namespace attributes to sysfs

Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's
kobject.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2b9b6e86bca7209de02754fc84acf7ab3e78734e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Add pci error handlers
Keith Busch [Mon, 7 Dec 2015 22:30:31 +0000 (15:30 -0700)]
NVMe: Add pci error handlers

Requests enabling pcie aer support. Shuts down the controller on error
detected with io frozen state prior to requesting slot reset; resumes
controller after reset completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a0a3408ee614848c27b0d36c2fe490da3b387b8d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: merge iod and cmd_info
Christoph Hellwig [Sat, 28 Nov 2015 14:43:10 +0000 (15:43 +0100)]
nvme: merge iod and cmd_info

Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4800d6d1548e0d5ab94f2216d41d94282e2588c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: meta_sg doesn't have to be an array
Christoph Hellwig [Mon, 26 Oct 2015 08:12:51 +0000 (17:12 +0900)]
nvme: meta_sg doesn't have to be an array

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf68405705bd35c09ec1f7528718dce5af88daff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: properly free resources for cancelled command
Christoph Hellwig [Thu, 26 Nov 2015 12:03:13 +0000 (13:03 +0100)]
nvme: properly free resources for cancelled command

We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit eee417b0697827a6e120199b126b447af3c81b47)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: simplify completion handling
Christoph Hellwig [Thu, 26 Nov 2015 11:59:50 +0000 (12:59 +0100)]
nvme: simplify completion handling

Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver.  Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double copletions internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit aae239e1910ebc27ec9f7e8b25904a69626cf28c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: special case AEN requests
Christoph Hellwig [Thu, 22 Dec 2016 06:59:20 +0000 (22:59 -0800)]
nvme: special case AEN requests

AEN requests are different from other requests in that they don't time out
or can easily be cancelled.  Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3e1e21c7bfcfa9bf06c07f48a13faca2f62b3339)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: factor out a few helpers from req_completion
Christoph Hellwig [Sat, 28 Nov 2015 14:41:58 +0000 (15:41 +0100)]
nvme: factor out a few helpers from req_completion

We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7688faa6dd2c99ce5d66571d9ad65535ec39e8cb)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix admin queue depth
Christoph Hellwig [Mon, 16 Nov 2015 11:40:02 +0000 (12:40 +0100)]
nvme: fix admin queue depth

The number in tag_set->queue depth includes the reserved tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4680072003df14230e9eeeeefb617401012234a5)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Simplify metadata setup
Keith Busch [Fri, 20 Nov 2015 08:13:30 +0000 (09:13 +0100)]
NVMe: Simplify metadata setup

We no longer require the two-pass setup for block integrity.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4b9d5b151046ff717819864f93cb8e012b347bce)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Remove device management handles on remove
Keith Busch [Sat, 28 Nov 2015 14:41:02 +0000 (15:41 +0100)]
NVMe: Remove device management handles on remove

We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 53029b0441bbd263dbb2ee6429572b1732dad4de)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>