Keith Busch [Wed, 24 Feb 2016 16:15:58 +0000 (09:15 -0700)]
NVMe: Fix 0-length integrity payload
A user could send a passthrough IO command with a metadata pointer to a
namespace without metadata. With metadata length of 0, kmalloc returns
ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
the bio's integrity payload, which causes an access fault on completion.
This patch ignores the users metadata buffer if the namespace format
does not support separate metadata.
Reported-by: Stephen Bates <stephen.bates@microsemi.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e9fc63d682dbbef17921aeb00d03fd52d6735ffd)
Keith Busch [Wed, 24 Feb 2016 16:15:57 +0000 (09:15 -0700)]
NVMe: Don't allow unsupported flags
The command flags can change the meaning of other fields in the command
that the driver is not prepared to handle. Specifically, the user could
passthrough an SGL flag, causing the controller to misinterpret the PRP
list the driver created, potentially corrupting memory or data.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 63088ec7c8eadfe08b96127a41b385ec9742dace)
Keith Busch [Fri, 30 Dec 2016 03:19:31 +0000 (19:19 -0800)]
NVMe: Move error handling to failed reset handler
This moves failed queue handling out of the namespace removal path and
into the reset failure path, fixing a hanging condition if the controller
fails or link down during del_gendisk. Previously the driver had to see
the controller as degraded prior to calling del_gendisk to setup the
queues to fail. But, if the controller happened to fail after this,
there was no task to end outstanding requests.
On failure, all namespace states are set to dead. This has capacity
revalidate to 0, and ends all new requests with error status.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d9a99c258eb1d6478fd9608a2070890797eed7)
Keith Busch [Wed, 22 Feb 2017 19:13:09 +0000 (11:13 -0800)]
NVMe: Simplify device reset failure
A reset failure schedules the device to unbind from the driver through
the pci driver's remove. This cleans up all intialization, so there is
no need to duplicate the potentially racy cleanup.
To help understand why a reset failed, the status is logged with the
existing warning message.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f58944e265d4ebe47216a5d7488aee3928823d30)
Keith Busch [Wed, 24 Feb 2016 16:15:54 +0000 (09:15 -0700)]
NVMe: Fix namespace removal deadlock
This patch makes nvme namespace removal lockless. It is up to the caller
to ensure no active namespace scanning is occuring. To ensure no scan
work occurs, the nvme pci driver adds a removing state to the controller
device to avoid queueing scan work during removal. The work is flushed
after setting the state, so no new scan work can be queued.
The lockless removal allows the driver to cleanup a namespace
request_queue if the controller fails during removal. Previously this
could deadlock trying to acquire the namespace mutex in order to handle
such events.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 646017a612e72f19bd9f991fe25287a149c5f627)
Keith Busch [Wed, 24 Feb 2016 16:15:53 +0000 (09:15 -0700)]
NVMe: Use IDA for namespace disk naming
A namespace may be detached from a controller, but a user may be holding
a reference to it. Attaching a new namespace with the same NSID will create
duplicate names when using the NSID to name the disk.
This patch uses an IDA that is released only when the last reference is
released instead of using the namespace ID.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 075790ebba4a1eb297f9875e581b55c0382b1f3d)
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 931e1c2204c6d00c11c5c1e2e1c20b5ca41f292d)
Christoph Hellwig [Mon, 29 Feb 2016 14:59:46 +0000 (15:59 +0100)]
nvme: replace the kthread with a per-device watchdog timer
The only work left in the kthread is the periodic health check for each
controller. There is no need to run this from process context or keep
a thread context around for it, so replace it with a simpler timer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2d55cd5f511d6fc377734473b237ac50820bfb9f)
Christoph Hellwig [Mon, 29 Feb 2016 14:59:44 +0000 (15:59 +0100)]
nvme: use a work item to submit async event requests
Use a dedicated work item to submit async event requests instead of the
global kthread. This simplifies the code and reduces the latencies to
resubmit a request once an even notification happened.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9396dec916c052855dbb5b876c13d163df397319)
Keith Busch [Thu, 11 Feb 2016 20:05:47 +0000 (13:05 -0700)]
NVMe: Rate limit nvme IO warnings
We don't need to spam the kernel logs with thousands of IO cancelling
messages. We can infer all IO's are being cancelled with fewer, or
even none at all. This patch rate limits the message and uses the debug
log level as it is mainly used for testing purposes.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f8e68a7c9af5f8047f7f8295874bedf306063709)
Keith Busch [Thu, 11 Feb 2016 20:05:43 +0000 (13:05 -0700)]
NVMe: Poll device while still active during remove
A device failure or link down wouldn't have been detected during namespace
removal. This patch keeps the device in the list for polling so that the
thread may see such failure and initiate a reset. The device is removed
from the list after disable, so we can safely flush the reset work as
it can't be requeued when disable completes.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ff23a2a15a2117245b4599c1352343c8b8fb4c43)
Ming Lin [Wed, 10 Feb 2016 18:03:32 +0000 (10:03 -0800)]
nvme: split pci module out of core module
NVMe over Fabrics drivers are going to reuse the core,
so splits nvme.ko into 2 modules:
nvme-core.ko: the core part
nvme.ko: the PCI driver
Export symbols from nvme-core.ko.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 576d55d625664a20ee4bae6500952febfb2d7b10)
Ming Lin [Wed, 10 Feb 2016 18:03:31 +0000 (10:03 -0800)]
nvme: split dev_list_lock
Split dev_list_lock into one in the core and one in the PCI driver.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9f2482b91bcd02ac2999cf04b3fb1b89e1c4d559)
Ming Lin [Wed, 10 Feb 2016 18:03:30 +0000 (10:03 -0800)]
nvme: move timeout variables to core.c
These variables are used by PCI driver and will also be used in the
forthcoming NVMe over Fabrics drivers.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ba0ba7d3e5266111ec865b0bf1ad48dd0e2a2314)
Sagi Grimberg [Wed, 10 Feb 2016 18:03:29 +0000 (10:03 -0800)]
nvme/host: reference the fabric module for each bdev open callout
We don't want to be able to unload the fabric driver when we have
openened referenced to our namespaces. Thus, for each nvme_open we
take a reference on the fabric driver and put it in nvme_release.
This behavior is consistent with the scsi model.
This resolves the panic when unloading a fabric module with
mpath holders.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ian Bakshan <ianb@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e439bb12e75c2807029853493fa787c6d70c763a)
Sagi Grimberg [Wed, 10 Feb 2016 15:51:15 +0000 (08:51 -0700)]
nvme: Log the ctrl device name instead of the underlying pci device name
Having the ctrl name "nvmeX" seems much more friendly than
the underlying device name. Also, with other nvme transports
such as the soon to come nvme-loop we don't have an underlying
device so it doesn't makes sense to make up one.
In order to help matching an instance name to a pci function,
we add a info print in nvme_probe.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Keith Busch <keith.busch@intel.com>
Manually fixed up the hunk in nvme_cancel_queue_ios().
Christoph Hellwig [Tue, 30 May 2017 17:36:32 +0000 (10:36 -0700)]
blk-mq: fix racy updates of rq->errors
blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.
Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4829a9b7a61e159367350008a608b062c4f6840)
Keith Busch [Tue, 12 Jan 2016 22:09:31 +0000 (15:09 -0700)]
NVMe: Export NVMe attributes to sysfs group
Adds all controller information to attribute list exposed to sysfs, and
appends the reset_controller attribute to it. The nvme device is created
with this attribute list, so driver no long manages its attributes.
Reported-by: Sujith Pandel <sujithpshankar@gmail.com> Cc: Sujith Pandel <sujithpshankar@ gmail.com> Cc: David Milburn <dmilburn@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 779ff75617099f4defe14e20443b95019a4c5ae8)
Keith Busch [Tue, 12 Jan 2016 21:41:18 +0000 (14:41 -0700)]
NVMe: Shutdown controller only for power-off
We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.
Function names have been updated in this patch to reflect their changed
semantics.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5cdb68c2c10f0865122656833cd07636a4143ee)
Keith Busch [Tue, 12 Jan 2016 21:41:17 +0000 (14:41 -0700)]
NVMe: IO queue deletion re-write
The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.
The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit db3cbfff5bcc0b9a82d8c71f00b9d60fad215871)
Keith Busch [Mon, 4 Jan 2016 16:10:57 +0000 (09:10 -0700)]
NVMe: Remove queue freezing on resets
NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.
This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 25646264e15af96c5c630fc742708b1eb3339222)
Keith Busch [Mon, 4 Jan 2016 16:10:56 +0000 (09:10 -0700)]
NVMe: Use a retryable error code on reset
A negative status has the "do not retry" bit set, which makes it not
retryable. Use a fake status that can potentially be retried on reset.
An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1d49c38c4865c596b01b31a52540275c1bb383e7)
Keith Busch [Mon, 4 Jan 2016 16:10:55 +0000 (09:10 -0700)]
NVMe: Fix admin queue ring wrap
The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e3e9d50cd6ed392bb716e35c134d1e82707c51b4)
Christoph Hellwig [Thu, 24 Dec 2015 14:27:02 +0000 (15:27 +0100)]
nvme: make SG_IO support optional
Translation SCSI commands to NVMe commands is rather pointless in general
as applications must not expext to be able to use SCSI commands on a
generic block device.
Make the huge translation layer optional and hope no one will ever enable
it in the future.
Christoph Hellwig [Thu, 24 Dec 2015 14:27:01 +0000 (15:27 +0100)]
nvme: fixes for NVME_IOCTL_IO_CMD on the char device
Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O. Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bfd8947194b2e2a53db82bbc7eb7c15d028c46db)
Keith Busch [Mon, 7 Dec 2015 22:30:31 +0000 (15:30 -0700)]
NVMe: Add pci error handlers
Requests enabling pcie aer support. Shuts down the controller on error
detected with io frozen state prior to requesting slot reset; resumes
controller after reset completes.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a0a3408ee614848c27b0d36c2fe490da3b387b8d)
Christoph Hellwig [Thu, 26 Nov 2015 11:59:50 +0000 (12:59 +0100)]
nvme: simplify completion handling
Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver. Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double copletions internally.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit aae239e1910ebc27ec9f7e8b25904a69626cf28c)
Christoph Hellwig [Thu, 22 Dec 2016 06:59:20 +0000 (22:59 -0800)]
nvme: special case AEN requests
AEN requests are different from other requests in that they don't time out
or can easily be cancelled. Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3e1e21c7bfcfa9bf06c07f48a13faca2f62b3339)
Keith Busch [Sat, 28 Nov 2015 14:41:02 +0000 (15:41 +0100)]
NVMe: Remove device management handles on remove
We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 53029b0441bbd263dbb2ee6429572b1732dad4de)
Keith Busch [Fri, 23 Oct 2015 17:42:02 +0000 (11:42 -0600)]
NVMe: Use unbounded work queue for all work
Removes all usage of the global work queue so work can't be
scheduled on two different work queues, and removes nvme's work queue
singlethreadedness so controllers can be driven in parallel.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: keep the dead controller removal on the system workqueue to avoid
deadlocks] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92f7a1624bbc2361b96db81de89aee1baae40da9)
Christoph Hellwig [Thu, 26 Nov 2015 11:42:26 +0000 (12:42 +0100)]
nvme: merge probe_work and reset_work
If we're using two work queues we're always going to run into races where
one item is tearing down what the other one is initializing. So insted
merge the two work queues, and let the old probe_work also tear the
controller down first if it was alive. Together with the better detection
of the probe path using a flag this gives us a properly serialized
reset/probe path that also doesn't accidentally trigger when two commands
time out and the second one tries to reset the controller while the first
reset is still in progress.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit fd634f4142861e533ac57e88ece8e98ab5851edb)
Keith Busch [Thu, 26 Nov 2015 11:11:07 +0000 (12:11 +0100)]
nvme: do not restart the request timeout if we're resetting the controller
Otherwise we're never going to complete a command when it is restarted just
after we completed all other outstanding commands in nvme_clear_queue.
The controller must be disabled prior to completing a presumed lost
command, do this by directly shutting down the controller before
queueing the reset work, and return EH_HANDLED from the timeout handler
after we shut the controller down.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split and rebase] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1569a16180aef4311ff5fc54f54b23ae9e8a03e)
Christoph Hellwig [Thu, 26 Nov 2015 11:10:29 +0000 (12:10 +0100)]
nvme: simplify resets
Don't delete the controller from dev_list before queuing a reset, instead
just check for it being reset in the polling kthread. This allows to remove
the dev_list_lock in various places, and in addition we can simply rely on
checking the queue_work return value to see if we could reset a controller.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 846cc05f95d599801f296d8599e82686ebd395f0)
Christoph Hellwig [Thu, 22 Oct 2015 12:03:35 +0000 (14:03 +0200)]
nvme: merge nvme_abort_req and nvme_timeout
We want to be able to return bettern error values frmo nvme_timeout, which
is significantly easier if the two functions are merged. Also clean up and
reduce the printk spew so that we only get one message per abort.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 31c7c7d2c9f17dc98a98c59c17e184bf164ee760)
Keith Busch [Thu, 26 Nov 2015 11:21:29 +0000 (12:21 +0100)]
nvme: protect against simultaneous shutdown invocations
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 77bf25ea70200cddf083f74b7f617e5f07fac8bd)
Christoph Hellwig [Thu, 22 Oct 2015 12:03:33 +0000 (14:03 +0200)]
nvme: only add a controller to dev_list after it's been fully initialized
Without this we can easily get bad derferences on nvmeq->d_db when the nvme
kthread tries to poll the CQs for controllers that are in half initialized
state.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7385014c073263b077442439299fad013edd4409)
Dan Carpenter [Wed, 9 Dec 2015 10:24:06 +0000 (13:24 +0300)]
nvme: precedence bug in nvme_pr_clear()
The "|" operator has higher precedence than "?:" so this didn't work as
intended. I had previously fixed this bug, but it we copied the older
unfixed version when we moved the function between files.
Fixes: 1673f1f08c88 ('nvme: move block_device_operations and ns/ctrl freeing to common code') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8c0b39155048d5a24f25c6c60aa83729927b04cd)
Arnd Bergmann [Tue, 8 Dec 2015 15:22:17 +0000 (16:22 +0100)]
nvme: fix another 32-bit build warning
The nvme_user_cmd function was recently moved around from one file
to another, which made a warning reappear that I had fixed before
at some point:
drivers/nvme/host/core.c: In function 'nvme_user_cmd':
drivers/nvme/host/core.c:424:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
This applies the same workaround that we have elsewhere in the
driver with an extra type cast to uintptr_t.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Link: https://lkml.org/lkml/2015/10/9/611 Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Tue, 20 Dec 2016 00:07:25 +0000 (16:07 -0800)]
nvme: refactor set_queue_count
Split out a helper that just issues the Set Features and interprets the
result which can go to common code, and document why we are ignoring
non-timeout error returns in the PCIe driver.
Christoph Hellwig [Tue, 20 Dec 2016 03:33:37 +0000 (19:33 -0800)]
nvme: move chardev and sysfs interface to common code
For this we need to add a proper controller init routine and a list of
all controllers that is in addition to the list of PCIe controllers,
which stays in pci.c. Note that we remove the sysfs device when the
last reference to a controller is dropped now - the old code would have
kept it around longer, which doesn't make much sense.
This requires a new ->reset_ctrl operation to implement controleller
resets, and a new ->write_reg32 operation that is required to implement
subsystem resets. We also now store caches copied of the NVMe compliance
version and the flag if a controller is attached to a subsystem or not in
the generic controller structure now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixes for pr merge] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f3ca80fc11c3af566eacd99cf821c1a48035c63b)
Christoph Hellwig [Tue, 20 Dec 2016 03:30:55 +0000 (19:30 -0800)]
nvme: move namespace scanning to common code
The namespace scanning code has been mostly generic already, we just
need to store a pointer to the tagset in the nvme_ctrl structure, and
add a method to check if a controller is I/O incapable. The latter
will hopefully be replaced by a proper controller state machine soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixed pr conflicts] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5bae7f73d378a986671a3cad717c721b38f80d9e)
Christoph Hellwig [Sat, 28 Nov 2015 14:01:09 +0000 (15:01 +0100)]
nvme: move remaining CC setup into nvme_enable_ctrl
Remove the calculation of all the bits written into the CC register into
nvme_enable_ctrl, so that they can be moved into the core NVMe driver in
the future.
Ashok Vairavan [Mon, 19 Dec 2016 23:41:31 +0000 (15:41 -0800)]
nvme: move block_device_operations and ns/ctrl freeing to common code
This moves the block_device_operations over to common code mostly
as-is. The only change is that the ns and ctrl refcounting got some
small refcounting to have wrappers around the kref_put operations.
A new free_ctrl operation is added to allow the PCI driver to free
it's ressources on the final drop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Moved the integrity and pr changes due to merge conflict] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1673f1f08c8876f3942b4fa5e8f6a40215f15a94)
Keith Busch [Fri, 23 Oct 2015 15:47:28 +0000 (09:47 -0600)]
nvme: use the block layer for userspace passthrough metadata
Use the integrity API to pass through metadata from userspace. For PI
enabled devices this means that we now validate the reftag, which seems
like an unintentional ommission in the old code.
Thanks to Keith Busch for testing and fixes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Skip metadata setup on admin commands] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0b7f1f26f95a51ab11d4dc0adee230212b3cd675)
Christoph Hellwig [Mon, 19 Dec 2016 20:00:59 +0000 (12:00 -0800)]
nvme: split __nvme_submit_sync_cmd
Add a separate nvme_submit_user_cmd for commands that directly DMA
to or from userspace. We'll add metadata support to that soon and
the common version would become too messy.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4160982e7594481d6b7f90aa693638a37d20ea17)
Christoph Hellwig [Mon, 19 Dec 2016 19:34:38 +0000 (11:34 -0800)]
nvme: split a new struct nvme_ctrl out of struct nvme_dev
The new struct nvme_ctrl will be used by the common NVMe code that sits
on top of struct request_queue and the new nvme_ctrl_ops abstraction.
It only contains the bare minimum required, which consists of values
sampled during controller probe, the admin queue pointer and a second
struct device pointer at the moment, but more will follow later. Only
values that are not used in the I/O fast path should be moved to
struct nvme_ctrl so that drivers can optimize their cache line usage
easily. That's also the reason why we have two device pointers as
the struct device is used for DMA mapping purposes.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1c63dc66580d4bbb6d2b75bf184b5aa105ba5bdb)
Orabug: 25130845
Conflicts:
drivers/nvme/host/nvme.h
drivers/nvme/host/pci.c Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Thu, 26 Nov 2015 08:59:44 +0000 (09:59 +0100)]
nvme: use vendor it from identify
Use the vendor ID from the identify data instead of the PCI device to
make the SCSI translation layer independent from the PCI driver. The NVMe
spec defines them as having the same value for current PCIe devices.
Christoph Hellwig [Mon, 19 Dec 2016 05:55:25 +0000 (21:55 -0800)]
nvme: use offset instead of a struct for registers
This makes life easier for future non-PCI drivers where access to the
registers might be more complicated. Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>
[Fixed CMBSZ offset] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7a67cbea653e444d04d7e850ab9631a14a196422)
Conflicts:
Merge conflict due to the Samsung Errata patch
drivers/nvme/host/pci.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Mon, 19 Dec 2016 05:54:06 +0000 (21:54 -0800)]
nvme: split command submission helpers out of pci.c
Create a new core.c and start by adding the command submission helpers
to it, which are already abstracted away from the actual hardware queues
by the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 21d34711e1b5970acfb22bddf1fefbfbd7e0123b)
Jay Freyensee [Mon, 19 Dec 2016 02:43:39 +0000 (18:43 -0800)]
Update target repo for nvme patch contributions
Per http://www.nvmexpress.org/resources/linux-driver-information/, the
old nvme git repo is stale. Updating MAINTAINERS to the Supported
target currently used by the community.
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Updated by me to add Keith as the maintainer, me as the co-maintainer.
Christoph Hellwig [Mon, 19 Dec 2016 01:59:11 +0000 (17:59 -0800)]
block: add an API for Persistent Reservations
This commits adds a driver API and ioctls for controlling Persistent
Reservations s/genericly/generically/ at the block layer. Persistent
Reservations are supported by SCSI and NVMe and allow controlling who gets
access to a device in a shared storage setup.
Note that we add a pr_ops structure to struct block_device_operations
instead of adding the members directly to avoid bloating all instances
of devices that will never support Persistent Reservations.
Dan Williams [Sun, 18 Dec 2016 14:58:48 +0000 (06:58 -0800)]
nvme: suspend i/o during runtime blk_integrity_unregister
Synchronize pending i/o against a change in the integrity profile to
avoid the possibility of spurious integrity errors.
Cc: Matthew Wilcox <willy@linux.intel.com> Acked-by: Keith Busch <keith.busch@intel.com>
[keith: also protect dynamic integrity registration] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4cfc766e07a5ed709a9d5289c8644fe78e9f24de)
Orabug: 25130845 Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Jay Sternberg [Sun, 18 Dec 2016 03:44:56 +0000 (19:44 -0800)]
nvme: move to a new drivers/nvme/host directory
This patch moves the NVMe driver from drivers/block/ to its own new
drivers/nvme/host/ directory. This is in preparation of splitting the
current monolithic driver up and add support for the upcoming NVMe
over Fabrics standard. The drivers/nvme/host/ is chose to leave space
for a NVMe target implementation in addition to this host side driver.
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
[hch: rebased, renamed core.c to pci.c, slight tweaks] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 57dacad5f2288e3de91f99b29f07b4a2793446d2)
Orabug: 25130845 Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Keith Busch [Sun, 18 Dec 2016 03:03:47 +0000 (19:03 -0800)]
NVMe: Set affinity after allocating request queues
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The asynchronous namespace scanning caused affinity hints to be set before
its tagset initialized, so there was no cpu mask to set the hint. This
patch moves the affinity hint setting to after namespaces are scanned.
Reported-by: 김경산 <ks0204.kim@samsung.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bda4e0fb3126aca15586d165b5a15a37edc0a984)
Orabug: 25130845
Conflicts:
Manually patched the commit.
drivers/block/nvme-core.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Keith Busch [Sat, 17 Dec 2016 23:33:20 +0000 (15:33 -0800)]
NVMe: Fix IO for extended metadata formats
This fixes io submit ioctl handling when using extended metadata
formats. When these formats are used, the user provides a single virtually
contiguous buffer containing both the block and metadata interleaved,
so the metadata size needs to be added to the total length and not mapped
as a separate transfer.
The command is also driver generated, so this patch does not enforce
blk-integrity extensions provide the metadata buffer.
Reported-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71feb364e7faadc681e714f7fdc2bede208ba26c)
Orabug: 25130845 Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Keith Busch [Sat, 17 Dec 2016 23:31:36 +0000 (15:31 -0800)]
NVMe: Remove hctx reliance for multi-namespace
The driver needs to track shared tags to support multiple namespaces
that may be dynamically allocated or deleted. Relying on the first
request_queue's hctx's is not appropriate as we cannot clear outstanding
tags for all namespaces using this handle, nor can the driver easily track
all request_queue's hctx as namespaces are attached/detached. Instead,
this patch uses the nvme_dev's tagset to get the shared tag resources
instead of through a request_queue hctx.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 42483228d4c019ffc86b8dbea7dfbc3f9566fe7e)
Orabug: 25130845
Conflicts:
nvme_set_irq_hints() needs to check tags instead of hctx and
retain nvme_admin_exit_hctx as exit_hctx
drivers/block/nvme-core.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Ashok Vairavan [Sat, 17 Dec 2016 03:23:09 +0000 (19:23 -0800)]
Revert "nvme: move to a new drivers/nvme/host directory"
This reverts commit 57dacad5f2288e3de91f99b29f07b4a2793446d2. We need to
cherry-pick many commits before merging this commit. Hence this commit
is reverted to cherry-pick the commits from upstream.