www.infradead.org Git - users/jedix/linux-maple.git/log
8 years ago nvme: make SG_IO support optional
Christoph Hellwig [Thu, 24 Dec 2015 14:27:02 +0000 (15:27 +0100)]
nvme: make SG_IO support optional

Translating SCSI commands to NVMe commands is rather pointless in general,
as applications must not expect to be able to use SCSI commands on a
generic block device.

Make the huge translation layer optional and hope no one will ever enable
it in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4490733250b8b272a6d3e66352dd7b8025409549)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: fixes for NVME_IOCTL_IO_CMD on the char device
Christoph Hellwig [Thu, 24 Dec 2015 14:27:01 +0000 (15:27 +0100)]
nvme: fixes for NVME_IOCTL_IO_CMD on the char device

Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O.  Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bfd8947194b2e2a53db82bbc7eb7c15d028c46db)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: synchronize access to ctrl->namespaces
Christoph Hellwig [Thu, 24 Dec 2015 14:27:00 +0000 (15:27 +0100)]
nvme: synchronize access to ctrl->namespaces

Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.
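
As a rough userspace illustration of the locking pattern described above (not the driver's actual code), the sketch below guards both modification and traversal of a namespace list with one mutex. It assumes a pthread mutex in place of the kernel mutex, and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a controller and its namespace list. */
struct demo_ns {
	unsigned int nsid;
	struct demo_ns *next;
};

struct demo_ctrl {
	pthread_mutex_t namespaces_mutex; /* guards the list below */
	struct demo_ns *namespaces;
};

void demo_ctrl_init(struct demo_ctrl *ctrl)
{
	pthread_mutex_init(&ctrl->namespaces_mutex, NULL);
	ctrl->namespaces = NULL;
}

/* Modification: link a namespace in while holding the mutex. */
void demo_add_ns(struct demo_ctrl *ctrl, struct demo_ns *ns)
{
	assert(ns != NULL);
	pthread_mutex_lock(&ctrl->namespaces_mutex);
	ns->next = ctrl->namespaces;
	ctrl->namespaces = ns;
	pthread_mutex_unlock(&ctrl->namespaces_mutex);
}

/* Traversal: walk the list under the same mutex. */
size_t demo_count_ns(struct demo_ctrl *ctrl)
{
	size_t n = 0;

	pthread_mutex_lock(&ctrl->namespaces_mutex);
	for (struct demo_ns *ns = ctrl->namespaces; ns; ns = ns->next)
		n++;
	pthread_mutex_unlock(&ctrl->namespaces_mutex);
	return n;
}
```

Every reader and writer takes the same lock, so a traversal can never observe a half-linked entry.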

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d3b8ac15a5eb938e6a01909f6cc8ae4b5d3a17)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: Move nvme_freeze/unfreeze_queues to nvme core
Sagi Grimberg [Thu, 24 Dec 2015 14:26:59 +0000 (15:26 +0100)]
nvme: Move nvme_freeze/unfreeze_queues to nvme core

There is nothing PCI-specific about them, and we'll need them exported
in other transports too.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 363c9aacb6c59bb63148dd115632880a4aed4d88)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Export namespace attributes to sysfs
Keith Busch [Tue, 22 Dec 2015 17:10:45 +0000 (10:10 -0700)]
NVMe: Export namespace attributes to sysfs

Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's
kobject.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2b9b6e86bca7209de02754fc84acf7ab3e78734e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Add pci error handlers
Keith Busch [Mon, 7 Dec 2015 22:30:31 +0000 (15:30 -0700)]
NVMe: Add pci error handlers

Requests enabling PCIe AER support. Shuts down the controller when an error
is detected in the I/O frozen state, prior to requesting a slot reset; resumes
the controller after the reset completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a0a3408ee614848c27b0d36c2fe490da3b387b8d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: merge iod and cmd_info
Christoph Hellwig [Sat, 28 Nov 2015 14:43:10 +0000 (15:43 +0100)]
nvme: merge iod and cmd_info

Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4800d6d1548e0d5ab94f2216d41d94282e2588c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: meta_sg doesn't have to be an array
Christoph Hellwig [Mon, 26 Oct 2015 08:12:51 +0000 (17:12 +0900)]
nvme: meta_sg doesn't have to be an array

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf68405705bd35c09ec1f7528718dce5af88daff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: properly free resources for cancelled command
Christoph Hellwig [Thu, 26 Nov 2015 12:03:13 +0000 (13:03 +0100)]
nvme: properly free resources for cancelled command

We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit eee417b0697827a6e120199b126b447af3c81b47)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: simplify completion handling
Christoph Hellwig [Thu, 26 Nov 2015 11:59:50 +0000 (12:59 +0100)]
nvme: simplify completion handling

Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver.  Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double completions internally.
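
The idea of guarding against double completions can be sketched in userspace as follows. This is only an illustration of the concept with a C11 `atomic_flag`, not how blk-mq implements it; the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_request {
	atomic_flag completed; /* set once the request has been completed */
	int completions;       /* how often the completion work really ran */
};

void demo_request_init(struct demo_request *rq)
{
	atomic_flag_clear(&rq->completed);
	rq->completions = 0;
}

/* Returns true only for the one caller that actually completes the request. */
bool demo_complete_request(struct demo_request *rq)
{
	assert(rq != NULL);
	if (atomic_flag_test_and_set(&rq->completed))
		return false; /* already completed, e.g. by the abort path */
	rq->completions++;
	return true;
}
```

With such a guard, both the normal completion path and an abort path can call the same function without coordinating with each other.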

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit aae239e1910ebc27ec9f7e8b25904a69626cf28c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: special case AEN requests
Christoph Hellwig [Thu, 22 Dec 2016 06:59:20 +0000 (22:59 -0800)]
nvme: special case AEN requests

AEN requests are different from other requests in that they don't time out
and can't easily be cancelled.  Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3e1e21c7bfcfa9bf06c07f48a13faca2f62b3339)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: factor out a few helpers from req_completion
Christoph Hellwig [Sat, 28 Nov 2015 14:41:58 +0000 (15:41 +0100)]
nvme: factor out a few helpers from req_completion

We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7688faa6dd2c99ce5d66571d9ad65535ec39e8cb)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: fix admin queue depth
Christoph Hellwig [Mon, 16 Nov 2015 11:40:02 +0000 (12:40 +0100)]
nvme: fix admin queue depth

The number in tag_set->queue_depth includes the reserved tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4680072003df14230e9eeeeefb617401012234a5)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Simplify metadata setup
Keith Busch [Fri, 20 Nov 2015 08:13:30 +0000 (09:13 +0100)]
NVMe: Simplify metadata setup

We no longer require the two-pass setup for block integrity.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4b9d5b151046ff717819864f93cb8e012b347bce)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Remove device management handles on remove
Keith Busch [Sat, 28 Nov 2015 14:41:02 +0000 (15:41 +0100)]
NVMe: Remove device management handles on remove

We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 53029b0441bbd263dbb2ee6429572b1732dad4de)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Use unbounded work queue for all work
Keith Busch [Fri, 23 Oct 2015 17:42:02 +0000 (11:42 -0600)]
NVMe: Use unbounded work queue for all work

Removes all usage of the global work queue so work can't be
scheduled on two different work queues, and removes nvme's work queue
singlethreadedness so controllers can be driven in parallel.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: keep the dead controller removal on the system workqueue to avoid
 deadlocks]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92f7a1624bbc2361b96db81de89aee1baae40da9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: switch abort_limit to an atomic_t
Christoph Hellwig [Fri, 20 Nov 2015 08:36:44 +0000 (09:36 +0100)]
nvme: switch abort_limit to an atomic_t

There is no lock to synchronize access to the abort_limit field of
struct nvme_ctrl, so switch it to an atomic_t.
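
A lock-free counter of this kind might look like the following userspace sketch. It assumes C11 `atomic_int` as a stand-in for the kernel's `atomic_t`; the helper names are hypothetical and the exact operations the driver uses may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ctrl {
	atomic_int abort_limit; /* abort commands that may still be issued */
};

/* Take one abort slot unless none are left (decrement unless zero). */
bool demo_get_abort_slot(struct demo_ctrl *ctrl)
{
	assert(ctrl != NULL);
	int old = atomic_load(&ctrl->abort_limit);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&ctrl->abort_limit, &old, old - 1))
			return true;
		/* the failed exchange reloaded 'old'; try again */
	}
	return false;
}

/* Give the slot back once the abort command has completed. */
void demo_put_abort_slot(struct demo_ctrl *ctrl)
{
	atomic_fetch_add(&ctrl->abort_limit, 1);
}
```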

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25130845
Cherry pick commit: 6bf25d1
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: merge probe_work and reset_work
Christoph Hellwig [Thu, 26 Nov 2015 11:42:26 +0000 (12:42 +0100)]
nvme: merge probe_work and reset_work

If we're using two work queues we're always going to run into races where
one item is tearing down what the other one is initializing.  So instead
merge the two work queues, and let the old probe_work also tear the
controller down first if it was alive.  Together with the better detection
of the probe path using a flag this gives us a properly serialized
reset/probe path that also doesn't accidentally trigger when two commands
time out and the second one tries to reset the controller while the first
reset is still in progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit fd634f4142861e533ac57e88ece8e98ab5851edb)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: do not restart the request timeout if we're resetting the controller
Keith Busch [Thu, 26 Nov 2015 11:11:07 +0000 (12:11 +0100)]
nvme: do not restart the request timeout if we're resetting the controller

Otherwise we're never going to complete a command when it is restarted just
after we completed all other outstanding commands in nvme_clear_queue.

The controller must be disabled prior to completing a presumed lost
command, do this by directly shutting down the controller before
queueing the reset work, and return EH_HANDLED from the timeout handler
after we shut the controller down.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split and rebase]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1569a16180aef4311ff5fc54f54b23ae9e8a03e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: simplify resets
Christoph Hellwig [Thu, 26 Nov 2015 11:10:29 +0000 (12:10 +0100)]
nvme: simplify resets

Don't delete the controller from dev_list before queuing a reset, instead
just check for it being reset in the polling kthread.  This allows us to remove
the dev_list_lock in various places, and in addition we can simply rely on
checking the queue_work return value to see if we could reset a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 846cc05f95d599801f296d8599e82686ebd395f0)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: add NVME_SC_CANCELLED
Christoph Hellwig [Tue, 20 Dec 2016 00:24:17 +0000 (16:24 -0800)]
nvme: add NVME_SC_CANCELLED

To properly document how we are using a negative Linux error value to
communicate request cancellations inside the driver.
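
The reasoning behind a negative value can be illustrated with a small sketch. The constant name and the choice of `-EINTR` below are assumptions made for the example; the point is only that a negative errno cannot collide with the non-negative status codes a controller reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed value for illustration: a driver-internal "cancelled" status. */
enum { DEMO_SC_CANCELLED = -EINTR };

static_assert(DEMO_SC_CANCELLED < 0,
	      "must not overlap with the non-negative hardware status codes");

/* True if the request was cancelled by the driver itself. */
bool demo_was_cancelled(int status)
{
	return status == DEMO_SC_CANCELLED;
}

/* True if the controller reported an error status. */
bool demo_is_hw_error(int status)
{
	return status > 0;
}
```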

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 297465c873ae8c99180617ca904dc1a4a738f25d)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: merge nvme_abort_req and nvme_timeout
Christoph Hellwig [Thu, 22 Oct 2015 12:03:35 +0000 (14:03 +0200)]
nvme: merge nvme_abort_req and nvme_timeout

We want to be able to return better error values from nvme_timeout, which
is significantly easier if the two functions are merged.  Also clean up and
reduce the printk spew so that we only get one message per abort.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 31c7c7d2c9f17dc98a98c59c17e184bf164ee760)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: don't take the I/O queue q_lock in nvme_timeout
Christoph Hellwig [Thu, 22 Oct 2015 12:03:34 +0000 (14:03 +0200)]
nvme: don't take the I/O queue q_lock in nvme_timeout

There is nothing it protects, but it makes lockdep unhappy in many different
ways.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4c9f748f0ee88447b28546991f60f43a7319aafd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: protect against simultaneous shutdown invocations
Keith Busch [Thu, 26 Nov 2015 11:21:29 +0000 (12:21 +0100)]
nvme: protect against simultaneous shutdown invocations

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 77bf25ea70200cddf083f74b7f617e5f07fac8bd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: only add a controller to dev_list after it's been fully initialized
Christoph Hellwig [Thu, 22 Oct 2015 12:03:33 +0000 (14:03 +0200)]
nvme: only add a controller to dev_list after it's been fully initialized

Without this we can easily get bad dereferences on nvmeq->d_db when the nvme
kthread tries to poll the CQs for controllers that are in half initialized
state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7385014c073263b077442439299fad013edd4409)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: only ignore hardware errors in nvme_create_io_queues
Christoph Hellwig [Thu, 26 Nov 2015 10:46:39 +0000 (11:46 +0100)]
nvme: only ignore hardware errors in nvme_create_io_queues

Half initialized queues due to kernel error returns or timeout are still a
good reason to give up on initializing a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 749941f2365db8198b5d75c83a575ee6e55bf03b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: precedence bug in nvme_pr_clear()
Dan Carpenter [Wed, 9 Dec 2015 10:24:06 +0000 (13:24 +0300)]
nvme: precedence bug in nvme_pr_clear()

The "|" operator has higher precedence than "?:" so this didn't work as
intended.  I had previously fixed this bug, but we copied the older
unfixed version when we moved the function between files.
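
The precedence problem described above can be reproduced in isolation. The functions below are hypothetical and the expression shape is an assumption chosen for illustration, not a copy of the driver code. The "buggy" variant is written with the parentheses the compiler implicitly applies to an expression such as `1 | key ? 1 << 3 : 0`.

```c
#include <assert.h>
#include <stdint.h>

static_assert((1 | 0) != 0, "the buggy condition is true for every key");

/* How `1 | key ? 1 << 3 : 0` is parsed: the "|" binds before "?:". */
uint32_t demo_cdw10_buggy(uint64_t key)
{
	return (1 | key) ? 1 << 3 : 0;
}

/* Intended meaning: always set bit 0, and set bit 3 only if a key is given. */
uint32_t demo_cdw10_fixed(uint64_t key)
{
	return 1 | (key ? 1 << 3 : 0);
}
```

In the buggy form the condition is never zero, so the result is always `1 << 3` and bit 0 is lost.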

Fixes: 1673f1f08c88 ('nvme: move block_device_operations and ns/ctrl freeing to common code')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8c0b39155048d5a24f25c6c60aa83729927b04cd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: fix another 32-bit build warning
Arnd Bergmann [Tue, 8 Dec 2015 15:22:17 +0000 (16:22 +0100)]
nvme: fix another 32-bit build warning

The nvme_user_cmd function was recently moved around from one file
to another, which made a warning reappear that I had fixed before
at some point:

drivers/nvme/host/core.c: In function 'nvme_user_cmd':
drivers/nvme/host/core.c:424:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

This applies the same workaround that we have elsewhere in the
driver with an extra type cast to uintptr_t.
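
A minimal sketch of that workaround, using hypothetical helper names: casting a 64-bit integer straight to a pointer triggers the warning on 32-bit builds, while going through `uintptr_t` first makes the narrowing explicit.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
	      "a pointer must fit into the 64-bit field");

/* Convert a 64-bit address field into a pointer. */
void *demo_u64_to_ptr(uint64_t addr)
{
	/* (void *)addr would warn on 32-bit; the uintptr_t step avoids that. */
	return (void *)(uintptr_t)addr;
}

/* The reverse direction, for storing a pointer in a 64-bit field. */
uint64_t demo_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```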

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Link: https://lkml.org/lkml/2015/10/9/611
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: refactor set_queue_count
Christoph Hellwig [Tue, 20 Dec 2016 00:07:25 +0000 (16:07 -0800)]
nvme: refactor set_queue_count

Split out a helper that just issues the Set Features and interprets the
result which can go to common code, and document why we are ignoring
non-timeout error returns in the PCIe driver.
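
Interpreting the result could look roughly like the sketch below. It assumes the layout of the Number of Queues feature result (allocated submission queues in the low 16 bits, allocated completion queues in the high 16 bits, both zero-based); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'result' is completion dword 0 of the Set Features (Number of Queues)
 * command.  Returns how many I/O queue pairs can actually be used.
 */
unsigned int demo_queue_count(uint32_t result, unsigned int wanted)
{
	unsigned int nsqa = result & 0xffff; /* submission queues, zero-based */
	unsigned int ncqa = result >> 16;    /* completion queues, zero-based */
	unsigned int granted = (nsqa < ncqa ? nsqa : ncqa) + 1;

	assert(wanted > 0);
	return wanted < granted ? wanted : granted;
}
```

For example, a result of `0x00070003` grants four submission and eight completion queues, so at most four queue pairs are usable.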

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9a0be7abb62ff2a7dc3360ab45c31f29b3faf642)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move chardev and sysfs interface to common code
Christoph Hellwig [Tue, 20 Dec 2016 03:33:37 +0000 (19:33 -0800)]
nvme: move chardev and sysfs interface to common code

For this we need to add a proper controller init routine and a list of
all controllers that is in addition to the list of PCIe controllers,
which stays in pci.c.  Note that we remove the sysfs device when the
last reference to a controller is dropped now - the old code would have
kept it around longer, which doesn't make much sense.

This requires a new ->reset_ctrl operation to implement controller
resets, and a new ->write_reg32 operation that is required to implement
subsystem resets.  We also now store cached copies of the NVMe compliance
version and the flag if a controller is attached to a subsystem or not in
the generic controller structure.
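
The division of labour between common code and a transport can be sketched as an operations table. Everything below is a hypothetical userspace model: the `demo_*` names, the fake register file and the `-ENOTTY` error code are assumptions; the NSSR offset (0x20) and the magic value 0x4E564D65 ("NVMe") come from the NVMe specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REG_NSSR 0x20 /* NVM Subsystem Reset register offset */

struct demo_ctrl;

/* Operations a transport driver provides to the common code. */
struct demo_ctrl_ops {
	int (*write_reg32)(struct demo_ctrl *ctrl, uint32_t off, uint32_t val);
	int (*reset_ctrl)(struct demo_ctrl *ctrl);
};

struct demo_ctrl {
	const struct demo_ctrl_ops *ops;
	int subsystem; /* cached: non-zero if subsystem reset is supported */
};

/* Common code: needs no knowledge of how registers are reached. */
int demo_reset_subsystem(struct demo_ctrl *ctrl)
{
	assert(ctrl != NULL && ctrl->ops != NULL);
	if (!ctrl->subsystem)
		return -ENOTTY; /* assumed error code */
	return ctrl->ops->write_reg32(ctrl, DEMO_REG_NSSR, 0x4E564D65);
}

/* Fake transport with an in-memory register file, for demonstration. */
struct demo_fake_dev {
	struct demo_ctrl ctrl; /* first member, so the cast below is valid */
	uint32_t regs[16];
	int resets;
};

int demo_fake_write_reg32(struct demo_ctrl *ctrl, uint32_t off, uint32_t val)
{
	struct demo_fake_dev *dev = (struct demo_fake_dev *)ctrl;

	if (off % 4 || off / 4 >= 16)
		return -EINVAL;
	dev->regs[off / 4] = val;
	return 0;
}

int demo_fake_reset_ctrl(struct demo_ctrl *ctrl)
{
	((struct demo_fake_dev *)ctrl)->resets++;
	return 0;
}

const struct demo_ctrl_ops demo_fake_ctrl_ops = {
	.write_reg32 = demo_fake_write_reg32,
	.reset_ctrl = demo_fake_reset_ctrl,
};
```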

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixes for pr merge]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f3ca80fc11c3af566eacd99cf821c1a48035c63b)

Orabug: 25130845
Conflicts:
        drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move namespace scanning to common code
Christoph Hellwig [Tue, 20 Dec 2016 03:30:55 +0000 (19:30 -0800)]
nvme: move namespace scanning to common code

The namespace scanning code has been mostly generic already, we just
need to store a pointer to the tagset in the nvme_ctrl structure, and
add a method to check if a controller is I/O incapable.  The latter
will hopefully be replaced by a proper controller state machine soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixed pr conflicts]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5bae7f73d378a986671a3cad717c721b38f80d9e)

Orabug: 25130845
Conflicts:
    drivers/nvme/host/nvme.h
    drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move the call to nvme_init_identify earlier
Christoph Hellwig [Fri, 16 Oct 2015 05:58:46 +0000 (07:58 +0200)]
nvme: move the call to nvme_init_identify earlier

We want to record the identify and CAP values even if no I/O queue
is available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ce4541f40a949cd9a9c9f308b1a6a86914ce6e1a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: add a common helper to read Identify Controller data
Christoph Hellwig [Mon, 19 Dec 2016 22:33:53 +0000 (14:33 -0800)]
nvme: add a common helper to read Identify Controller data

And add the 64-bit register read operation for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7fd8930f26be4c9078684b2fef14da0503771bf2)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move nvme_{enable,disable,shutdown}_ctrl to common code
Christoph Hellwig [Mon, 19 Dec 2016 22:29:59 +0000 (14:29 -0800)]
nvme: move nvme_{enable,disable,shutdown}_ctrl to common code

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5fd4ce1b005bd6ede913763f65efae9af6f7f386)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move remaining CC setup into nvme_enable_ctrl
Christoph Hellwig [Sat, 28 Nov 2015 14:01:09 +0000 (15:01 +0100)]
nvme: move remaining CC setup into nvme_enable_ctrl

Move the calculation of all the bits written into the CC register into
nvme_enable_ctrl, so that they can be moved into the core NVMe driver in
the future.
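
For orientation, assembling a Controller Configuration value in one place might look like the sketch below. The field positions follow the NVMe specification's CC register layout; the constant and function names are hypothetical, and the chosen settings (NVM command set, round-robin arbitration, 64-byte submission and 16-byte completion entries) are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

enum {
	DEMO_CC_ENABLE    = 1 << 0,  /* EN */
	DEMO_CC_CSS_NVM   = 0 << 4,  /* CSS: NVM command set */
	DEMO_CC_MPS_SHIFT = 7,       /* MPS: page size = 2^(12 + MPS) */
	DEMO_CC_AMS_RR    = 0 << 11, /* AMS: round robin */
	DEMO_CC_SHN_NONE  = 0 << 14, /* SHN: no shutdown notification */
	DEMO_CC_IOSQES    = 6 << 16, /* 2^6 = 64-byte submission entries */
	DEMO_CC_IOCQES    = 4 << 20, /* 2^4 = 16-byte completion entries */
};

/* Build the full CC value for a given host page shift (12 = 4 KiB). */
uint32_t demo_cc_value(unsigned int page_shift)
{
	uint32_t cc = DEMO_CC_CSS_NVM;

	assert(page_shift >= 12 && page_shift <= 27); /* MPS is 4 bits wide */
	cc |= (uint32_t)(page_shift - 12) << DEMO_CC_MPS_SHIFT;
	cc |= DEMO_CC_AMS_RR | DEMO_CC_SHN_NONE;
	cc |= DEMO_CC_IOSQES | DEMO_CC_IOCQES;
	cc |= DEMO_CC_ENABLE;
	return cc;
}
```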

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1b2eb374651f0496b86ed5f095d4c448bff214fa)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: add explicit quirk handling
Christoph Hellwig [Mon, 19 Dec 2016 20:27:49 +0000 (12:27 -0800)]
nvme: add explicit quirk handling

Add an enum for all workarounds not in the spec and identify the affected
controllers at probe time.
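
One way to picture this is a flag enum plus a lookup table consulted during probe. The sketch is hypothetical throughout: the quirk names, the vendor and device IDs and the lookup helper are invented for illustration and do not describe real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workaround flags, one bit each. */
enum demo_quirks {
	DEMO_QUIRK_STRIPE_SIZE = 1 << 0,
	DEMO_QUIRK_SLOW_ENABLE = 1 << 1,
};

static_assert((DEMO_QUIRK_STRIPE_SIZE & DEMO_QUIRK_SLOW_ENABLE) == 0,
	      "quirk flags must not overlap");

struct demo_quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned long quirks;
};

/* Invented IDs: affected controllers are listed once, in one place. */
const struct demo_quirk_entry demo_quirk_table[] = {
	{ 0x1234, 0x0001, DEMO_QUIRK_STRIPE_SIZE },
	{ 0x1234, 0x0002, DEMO_QUIRK_STRIPE_SIZE | DEMO_QUIRK_SLOW_ENABLE },
};

/* Called at probe time; the result is stored with the controller. */
unsigned long demo_probe_quirks(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(demo_quirk_table) / sizeof(demo_quirk_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (demo_quirk_table[i].vendor == vendor &&
		    demo_quirk_table[i].device == device)
			return demo_quirk_table[i].quirks;
	}
	return 0; /* no known workarounds needed */
}
```

Later code can then test a flag instead of repeating vendor and device comparisons.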

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 106198edb74cdf3fe1aefa6ad1e199b58ab7c4cb)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move block_device_operations and ns/ctrl freeing to common code
Ashok Vairavan [Mon, 19 Dec 2016 23:41:31 +0000 (15:41 -0800)]
nvme: move block_device_operations and ns/ctrl freeing to common code

This moves the block_device_operations over to common code mostly
as-is.  The only change is that the ns and ctrl refcounting got some
small refactoring to have wrappers around the kref_put operations.

A new free_ctrl operation is added to allow the PCI driver to free
its resources on the final drop.
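
The combination of a put wrapper and a transport-provided free callback can be modelled in userspace like this. The sketch assumes a C11 `atomic_int` in place of a kref, and all `demo_*` names, including the fake transport, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_ctrl;

struct demo_ctrl_ops {
	void (*free_ctrl)(struct demo_ctrl *ctrl); /* transport-specific cleanup */
};

struct demo_ctrl {
	atomic_int refs;
	const struct demo_ctrl_ops *ops;
	int freed; /* set by the fake transport below, for demonstration only */
};

/* Wrapper: take an additional reference. */
void demo_get_ctrl(struct demo_ctrl *ctrl)
{
	atomic_fetch_add(&ctrl->refs, 1);
}

/* Wrapper: drop a reference; the final drop calls into the transport. */
void demo_put_ctrl(struct demo_ctrl *ctrl)
{
	assert(ctrl != NULL && ctrl->ops != NULL);
	if (atomic_fetch_sub(&ctrl->refs, 1) == 1)
		ctrl->ops->free_ctrl(ctrl);
}

/* Fake transport: only records that its resources would be released. */
void demo_fake_free_ctrl(struct demo_ctrl *ctrl)
{
	ctrl->freed = 1;
}

const struct demo_ctrl_ops demo_fake_free_ops = {
	.free_ctrl = demo_fake_free_ctrl,
};
```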

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Moved the integrity and pr changes due to merge conflict]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1673f1f08c8876f3942b4fa5e8f6a40215f15a94)

Orabug: 25130845
Conflicts:
    drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: use the block layer for userspace passthrough metadata
Keith Busch [Fri, 23 Oct 2015 15:47:28 +0000 (09:47 -0600)]
nvme: use the block layer for userspace passthrough metadata

Use the integrity API to pass through metadata from userspace.  For PI
enabled devices this means that we now validate the reftag, which seems
like an unintentional omission in the old code.

Thanks to Keith Busch for testing and fixes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Skip metadata setup on admin commands]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0b7f1f26f95a51ab11d4dc0adee230212b3cd675)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: split __nvme_submit_sync_cmd
Christoph Hellwig [Mon, 19 Dec 2016 20:00:59 +0000 (12:00 -0800)]
nvme: split __nvme_submit_sync_cmd

Add a separate nvme_submit_user_cmd for commands that directly DMA
to or from userspace.  We'll add metadata support to that soon and
the common version would become too messy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4160982e7594481d6b7f90aa693638a37d20ea17)

Orabug: 25130845
Conflicts:
drivers/nvme/host/core.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move nvme_setup_flush and nvme_setup_rw to common code
Christoph Hellwig [Fri, 16 Oct 2015 05:58:40 +0000 (07:58 +0200)]
nvme: move nvme_setup_flush and nvme_setup_rw to common code

And mark them inline so that we don't slow down the I/O submission path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 22944e9981db1e496d983298fd420a8c6b758c80)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: move nvme_error_status to common code
Christoph Hellwig [Fri, 16 Oct 2015 05:58:39 +0000 (07:58 +0200)]
nvme: move nvme_error_status to common code

And mark it inline so that we don't slow down the completion path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 15a190f7f57a2e46717490c35ac09882042a200b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: factor out a nvme_unmap_data helper
Christoph Hellwig [Mon, 19 Dec 2016 21:37:34 +0000 (13:37 -0800)]
nvme: factor out a nvme_unmap_data helper

This is the counterpart to nvme_map_data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d4f6c3aba5b496a2cb80a8e8e082ae51e46579f3)

Orabug: 25130845
Conflicts:
    drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: simplify nvme_setup_prps calling convention
Christoph Hellwig [Mon, 19 Dec 2016 19:46:47 +0000 (11:46 -0800)]
nvme: simplify nvme_setup_prps calling convention

Pass back a true/false value instead of the length which needs a compare
with the bytes in the request and drop the pointless gfp_t argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d2b571746d1c3fa10b7a0aa00859b296a98d12)

Orabug: 25130845
Conflicts:
drivers/nvme/host/pci.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: split a new struct nvme_ctrl out of struct nvme_dev
Christoph Hellwig [Mon, 19 Dec 2016 19:34:38 +0000 (11:34 -0800)]
nvme: split a new struct nvme_ctrl out of struct nvme_dev

The new struct nvme_ctrl will be used by the common NVMe code that sits
on top of struct request_queue and the new nvme_ctrl_ops abstraction.
It only contains the bare minimum required, which consists of values
sampled during controller probe, the admin queue pointer and a second
struct device pointer at the moment, but more will follow later.  Only
values that are not used in the I/O fast path should be moved to
struct nvme_ctrl so that drivers can optimize their cache line usage
easily.  That's also the reason why we have two device pointers as
the struct device is used for DMA mapping purposes.
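
The relationship between a generic controller structure and the transport structure that embeds it can be sketched as follows. The struct layouts and names are hypothetical; the pointer arithmetic is the usual container-of idiom, written out with `offsetof`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic part: values sampled at probe time, not used in the fast path. */
struct demo_ctrl {
	unsigned int page_size;
	uint32_t vs; /* cached version register */
};

/* Transport part: fast-path fields live here, next to the embedded ctrl. */
struct demo_pci_dev {
	int irq_vectors;
	struct demo_ctrl ctrl;
};

/* Recover the transport structure from a pointer to its embedded ctrl. */
struct demo_pci_dev *demo_to_pci_dev(struct demo_ctrl *ctrl)
{
	assert(ctrl != NULL);
	return (struct demo_pci_dev *)
		((char *)ctrl - offsetof(struct demo_pci_dev, ctrl));
}
```

Common code only ever sees `struct demo_ctrl *`, while the transport can get back to its own data without an extra pointer.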

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1c63dc66580d4bbb6d2b75bf184b5aa105ba5bdb)

Orabug: 25130845
Conflicts:
    drivers/nvme/host/nvme.h
    drivers/nvme/host/pci.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: use vendor it from identify
Christoph Hellwig [Thu, 26 Nov 2015 08:59:44 +0000 (09:59 +0100)]
nvme: use vendor it from identify

Use the vendor ID from the identify data instead of the PCI device to
make the SCSI translation layer independent from the PCI driver.  The NVMe
spec defines them as having the same value for current PCIe devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 01fec28a6f3ba96d4f46a538eae089dd92189fd1)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: split nvme_trans_device_id_page
Christoph Hellwig [Thu, 26 Nov 2015 08:55:48 +0000 (09:55 +0100)]
nvme: split nvme_trans_device_id_page

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf7d3ebbd219d8ad948e812d03e1decfd96c97d0)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: use offset instead of a struct for registers
Christoph Hellwig [Mon, 19 Dec 2016 05:55:25 +0000 (21:55 -0800)]
nvme: use offset instead of a struct for registers

This makes life easier for future non-PCI drivers where access to the
registers might be more complicated.  Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.
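
As a rough userspace sketch of the offset-based style (not the driver
code): the offsets below follow the NVMe register map, while struct
reg_ops, mmio_read32() and read_csts() are hypothetical names made up
for this note.  The BAR is simulated with a plain byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offsets as laid out in the NVMe controller register map. */
enum {
	REG_CAP  = 0x00,	/* Controller Capabilities (64-bit) */
	REG_VS   = 0x08,	/* Version */
	REG_CC   = 0x14,	/* Controller Configuration */
	REG_CSTS = 0x1c,	/* Controller Status */
};

/* Hypothetical transport hook: each transport decides how an offset is read. */
struct reg_ops {
	uint32_t (*read32)(void *priv, uint32_t off);
};

/* Memory-mapped style backend, simulated here with a plain byte buffer. */
static uint32_t mmio_read32(void *priv, uint32_t off)
{
	uint32_t val;

	memcpy(&val, (uint8_t *)priv + off, sizeof(val));
	return val;
}

static const struct reg_ops mmio_ops = { .read32 = mmio_read32 };

/* Common code only needs an offset, not the layout of a register struct. */
static uint32_t read_csts(const struct reg_ops *ops, void *priv)
{
	return ops->read32(priv, REG_CSTS);
}
```

A transport without memory-mapped registers would only have to supply a
different read32 implementation; callers such as read_csts() stay the same.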

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
[Fixed CMBSZ offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7a67cbea653e444d04d7e850ab9631a14a196422)

Conflicts:
        Merge conflict due to the Samsung Errata patch
        drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: split command submission helpers out of pci.c
Christoph Hellwig [Mon, 19 Dec 2016 05:54:06 +0000 (21:54 -0800)]
nvme: split command submission helpers out of pci.c

Create a new core.c and start by adding the command submission helpers
to it, which are already abstracted away from the actual hardware queues
by the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 21d34711e1b5970acfb22bddf1fefbfbd7e0123b)

Orabug: 25130845
Conflicts:
        drivers/nvme/host/Makefile
        drivers/nvme/host/nvme.h
        drivers/nvme/host/pci.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move struct nvme_iod to pci.c
Christoph Hellwig [Mon, 19 Dec 2016 03:03:13 +0000 (19:03 -0800)]
nvme: move struct nvme_iod to pci.c

This structure is specific to the PCIe driver internals and should be moved
to pci.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71bd150c71072014d98bff6dc2db3229306ece35)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Precedence error in nvme_pr_clear()
Dan Carpenter [Mon, 19 Dec 2016 02:45:55 +0000 (18:45 -0800)]
NVMe: Precedence error in nvme_pr_clear()

The original code is equivalent to:

        u32 cdw10 = (1 | key) ? 1 << 3 : 0;

But we want:

        u32 cdw10 = 1 | (key ? 1 << 3 : 0);
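
A standalone illustration of the difference (userspace C, not the driver
code; the two helper names are made up for this note).  The conditional
operator binds more loosely than |, so the first form never sets bit 0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the NVMe driver. */

/*
 * What the original expression evaluates to: (1 | key) is never zero,
 * so the result is always 1 << 3 and bit 0 is lost.
 */
static uint32_t pr_clear_cdw10_buggy(uint64_t key)
{
	return (1 | key) ? 1 << 3 : 0;
}

/* Intended form: bit 0 always set, bit 3 set only for a non-zero key. */
static uint32_t pr_clear_cdw10_fixed(uint64_t key)
{
	return 1 | (key ? 1 << 3 : 0);
}
```

With key == 0 the first helper returns 8 and the second returns 1; with a
non-zero key they return 8 and 9 respectively.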

Fixes: 1d277a637a71: ('NVMe: Add persistent reservation ops')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 73fcf4e20ebd19468b3ad033be93582258435462)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoUpdate target repo for nvme patch contributions
Jay Freyensee [Mon, 19 Dec 2016 02:43:39 +0000 (18:43 -0800)]
Update target repo for nvme patch contributions

Per http://www.nvmexpress.org/resources/linux-driver-information/, the
old nvme git repo is stale.  Updating MAINTAINERS to the Supported
target currently used by the community.

Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Updated by me to add Keith as the maintainer, me as the co-maintainer.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b3975e94f5688691f487ea00126dabe8f5bee3af)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add missing endianness annotations in nvme_pr_command
Christoph Hellwig [Mon, 19 Dec 2016 02:42:26 +0000 (18:42 -0800)]
nvme: add missing endianness annotations in nvme_pr_command

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: ad4fd3610c27 ("NVMe: Add persistent reservation ops")
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a6dd1020d8ac55782f3e04856644cf68765f8c1b)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblock: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIV
Christoph Hellwig [Mon, 19 Dec 2016 05:11:23 +0000 (21:11 -0800)]
block: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIV

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4f8c9510ba71bb54477841bebb90154ef140860f)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblock: add an API for Persistent Reservations
Christoph Hellwig [Mon, 19 Dec 2016 01:59:11 +0000 (17:59 -0800)]
block: add an API for Persistent Reservations

This commit adds a driver API and ioctls for controlling Persistent
Reservations generically at the block layer.  Persistent Reservations
are supported by SCSI and NVMe and allow controlling who gets access to
a device in a shared storage setup.

Note that we add a pr_ops structure to struct block_device_operations
instead of adding the members directly to avoid bloating all instances
of devices that will never support Persistent Reservations.
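
The design choice can be sketched in userspace C as follows.  All struct,
field and function names here are stand-ins chosen for this note, not the
kernel definitions; the point is only that devices without Persistent
Reservation support pay for one NULL pointer rather than one slot per
operation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; the names are assumptions, not the kernel's. */
struct pr_ops_sketch {
	int (*pr_register)(void *dev, uint64_t old_key, uint64_t new_key);
	int (*pr_clear)(void *dev, uint64_t key);
};

struct blk_ops_sketch {
	int (*open)(void *dev);
	/* One pointer per ops table; NULL for devices without PR support. */
	const struct pr_ops_sketch *pr_ops;
};

/* Demo backend: accepts any non-zero key. */
static int demo_pr_clear(void *dev, uint64_t key)
{
	(void)dev;
	return key ? 0 : -EINVAL;
}

static const struct pr_ops_sketch demo_pr_ops = { .pr_clear = demo_pr_clear };

/* Dispatch helper: devices that never set pr_ops report "not supported". */
static int blk_pr_clear(const struct blk_ops_sketch *ops, void *dev, uint64_t key)
{
	if (!ops->pr_ops || !ops->pr_ops->pr_clear)
		return -EOPNOTSUPP;
	return ops->pr_ops->pr_clear(dev, key);
}
```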

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bbd3e064362e5057cc4799ba2e4d68c7593e490b)

Orabug: 25130845
Conflicts:
    block/ioctl.c
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Add persistent reservation ops
Keith Busch [Sun, 18 Dec 2016 15:00:37 +0000 (07:00 -0800)]
NVMe: Add persistent reservation ops

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: rebased, set PTPL=1]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1d277a637a711af44574229c544c44126ad5bf32)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: suspend i/o during runtime blk_integrity_unregister
Dan Williams [Sun, 18 Dec 2016 14:58:48 +0000 (06:58 -0800)]
nvme: suspend i/o during runtime blk_integrity_unregister

Synchronize pending i/o against a change in the integrity profile to
avoid the possibility of spurious integrity errors.

Cc: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
[keith: also protect dynamic integrity registration]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4cfc766e07a5ed709a9d5289c8644fe78e9f24de)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme include linux types.h
Christoph Hellwig [Sun, 18 Dec 2016 13:41:21 +0000 (05:41 -0800)]
nvme include linux types.h

The buildbot complains about this even if it doesn't generate
a build warning.  But it's an easy fix, so here we go:

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2812dfe370516ef958b5c9e2eca1b2f002236d2d)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move to a new drivers/nvme/host directory
Jay Sternberg [Sun, 18 Dec 2016 03:44:56 +0000 (19:44 -0800)]
nvme: move to a new drivers/nvme/host directory

This patch moves the NVMe driver from drivers/block/ to its own new
drivers/nvme/host/ directory.  This is in preparation for splitting the
current monolithic driver up and adding support for the upcoming NVMe
over Fabrics standard.  The drivers/nvme/host/ directory is chosen to
leave space for an NVMe target implementation in addition to this host
side driver.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
[hch: rebased, renamed core.c to pci.c, slight tweaks]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 57dacad5f2288e3de91f99b29f07b4a2793446d2)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Set affinity after allocating request queues
Keith Busch [Sun, 18 Dec 2016 03:03:47 +0000 (19:03 -0800)]
NVMe: Set affinity after allocating request queues

The asynchronous namespace scanning caused affinity hints to be set before
the tagset was initialized, so there was no CPU mask to set the hint with.
This patch moves the affinity hint setting to after namespaces are scanned.

Reported-by: 김경산 <ks0204.kim@samsung.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bda4e0fb3126aca15586d165b5a15a37edc0a984)

Orabug: 25130845
Conflicts:
        Manually patched the commit.
        drivers/block/nvme-core.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix IO for extended metadata formats
Keith Busch [Sat, 17 Dec 2016 23:33:20 +0000 (15:33 -0800)]
NVMe: Fix IO for extended metadata formats

This fixes io submit ioctl handling when using extended metadata
formats. When these formats are used, the user provides a single virtually
contiguous buffer containing both the block and metadata interleaved,
so the metadata size needs to be added to the total length and not mapped
as a separate transfer.
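
The length arithmetic can be sketched as below.  This is a minimal
illustration under stated assumptions, not the ioctl code: the helper
names and parameters are hypothetical, and "ms" stands for the per-block
metadata size in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes to map for the user data buffer.
 * With an extended format the metadata is interleaved with the blocks,
 * so it is counted as part of the single contiguous buffer.
 */
static uint32_t io_buffer_len(uint32_t nblocks, uint32_t lba_size,
			      uint32_t ms, bool extended)
{
	uint32_t len = nblocks * lba_size;

	if (extended)
		len += nblocks * ms;
	return len;
}

/* Bytes to map as a separate metadata transfer (none when interleaved). */
static uint32_t io_meta_len(uint32_t nblocks, uint32_t ms, bool extended)
{
	return extended ? 0 : nblocks * ms;
}
```

For 8 blocks of 512 bytes with 8 bytes of metadata each, the extended
format maps one 4160-byte buffer, while the separate format maps 4096
bytes of data plus a 64-byte metadata buffer.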

The command is also driver generated, so this patch does not enforce
that blk-integrity extensions provide the metadata buffer.

Reported-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71feb364e7faadc681e714f7fdc2bede208ba26c)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Remove hctx reliance for multi-namespace
Keith Busch [Sat, 17 Dec 2016 23:31:36 +0000 (15:31 -0800)]
NVMe: Remove hctx reliance for multi-namespace

The driver needs to track shared tags to support multiple namespaces
that may be dynamically allocated or deleted. Relying on the first
request_queue's hctx's is not appropriate as we cannot clear outstanding
tags for all namespaces using this handle, nor can the driver easily track
all request_queue's hctx as namespaces are attached/detached. This patch
uses the nvme_dev's tagset to get the shared tag resources instead of
going through a request_queue hctx.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 42483228d4c019ffc86b8dbea7dfbc3f9566fe7e)

Orabug: 25130845
Conflicts:
    nvme_set_irq_hints() needs to check tags instead of hctx and
retain nvme_admin_exit_hctx as exit_hctx
    drivers/block/nvme-core.c

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Use requested sync command timeout
Keith Busch [Sat, 17 Dec 2016 17:39:58 +0000 (09:39 -0800)]
NVMe: Use requested sync command timeout

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4ff414aeb472397d3b4fc15c22ca65bab219ec8)

Orabug: 25130845
Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
8 years agoRevert "nvme: move to a new drivers/nvme/host directory"
Ashok Vairavan [Sat, 17 Dec 2016 03:23:09 +0000 (19:23 -0800)]
Revert "nvme: move to a new drivers/nvme/host directory"

This reverts commit 57dacad5f2288e3de91f99b29f07b4a2793446d2. We need to
cherry-pick many commits before merging this commit. Hence this commit
is reverted to cherry-pick the commits from upstream.

Orabug: 25130845
Conflicts:
        drivers/nvme/host/Kconfig

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoRevert "NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata"
Ashok Vairavan [Mon, 13 Mar 2017 01:45:28 +0000 (18:45 -0700)]
Revert "NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata"

This reverts commit ab4538cd6fb47c5a3475d0652830a1d4c8c46167.

8 years agoRevert "nvme: Limit command retries"
Ashok Vairavan [Mon, 13 Mar 2017 01:44:11 +0000 (18:44 -0700)]
Revert "nvme: Limit command retries"

This reverts commit 582575bf4329fa5e29c0f1eae79cb0fb13fded04.

8 years agoRevert "nvme: avoid cqe corruption when update at the same time as read"
Ashok Vairavan [Mon, 13 Mar 2017 01:43:54 +0000 (18:43 -0700)]
Revert "nvme: avoid cqe corruption when update at the same time as read"

This reverts commit 4369f33dfdd50a5011922d45830e2b69ba4067ce.

8 years agoRevert "NVMe: Don't unmap controller registers on reset"
Ashok Vairavan [Mon, 13 Mar 2017 01:43:27 +0000 (18:43 -0700)]
Revert "NVMe: Don't unmap controller registers on reset"

This reverts commit 75502b9da27d7be3132b9eb3b7da52eae48c3556.

8 years agoRevert "NVMe: reverse IO direction for VUC command code F7"
Ashok Vairavan [Mon, 13 Mar 2017 01:42:45 +0000 (18:42 -0700)]
Revert "NVMe: reverse IO direction for VUC command code F7"

This reverts commit a9ddbd6640c88276b8cc8bea5201bf45df7ab71e.

8 years agoRevert "NVMe: reduce queue depth as workaround for Samsung EPIC SQ errata"
Ashok Vairavan [Mon, 13 Mar 2017 01:40:47 +0000 (18:40 -0700)]
Revert "NVMe: reduce queue depth as workaround for Samsung EPIC SQ errata"

This reverts commit 881900f628c3bedf653aa677da465ab3b8eddf31.

8 years agoMerge branch 'uek4/topic/uek-4.1/xen-bug26107942' into uek/uek-next/for-chander-bug26...
Joao Martins [Thu, 1 Jun 2017 12:21:56 +0000 (13:21 +0100)]
Merge branch 'uek4/topic/uek-4.1/xen-bug26107942' into uek/uek-next/for-chander-bug26107942

 Conflicts:
arch/x86/xen/enlighten.c
drivers/net/xen-netfront.c
fs/proc/generic.c
fs/proc/internal.h

arch/x86/xen/enlighten.c had one header that is no longer conditionally
included under CONFIG_KEXEC, a change introduced by commit 28a4be540b
("kexec: allow kdump with crash_kexec_post_notifiers"); fs/proc/* required
exporting a new symbol used by commit ac7bd1728ac4 ("xenfs: Use
proc_create_mount_point() to create /proc/xen"); finally,
xen-netfront.c had already accounted for the changes introduced by
9e13456b6312 and 0d1d6389b930, hence we simply retain the topic branch
version.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
8 years agoforcedeth: enable forcedeth kernel option
Zhu Yanjun [Wed, 31 May 2017 03:44:11 +0000 (23:44 -0400)]
forcedeth: enable forcedeth kernel option

Orabug: 25571921

The NVIDIA forcedeth nic is used in the customer hosts. As such,
forcedeth driver is needed.

Signed-off-by: yanjun.zhu@oracle.com
Reviewed-by: John Haxby <john.haxby@oracle.com>
8 years agoipmi: Edit ambiguous error message for unknown command
Atish Patra [Tue, 30 May 2017 17:57:31 +0000 (11:57 -0600)]
ipmi: Edit ambiguous error message for unknown command

The IPMI SI interface issues the clear flag command irrespective
of the underlying physical interface. If the platform does not
recognize this command, it returns the correct response,
unknown command (0xc1). However, the SI interface prints this
as if it were an error, which leads to ambiguity. This should
only be an info message in the case of an unknown command and a
warning if the platform returns some other error response.

Edit the message to clear the ambiguity.
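
The intended classification can be sketched as follows.  This is an
illustration only: the enum, macro and function names are made up for
this note, and 0xc1 is the completion code quoted in the text above.

```c
#include <assert.h>
#include <stdint.h>

#define CC_OK			0x00	/* command completed normally */
#define CC_UNKNOWN_COMMAND	0xc1	/* value quoted in the message above */

/* Hypothetical log levels for the response to the clear flag command. */
enum msg_level { LVL_NONE, LVL_INFO, LVL_WARN };

static enum msg_level clear_flag_msg_level(uint8_t cc)
{
	if (cc == CC_OK)
		return LVL_NONE;	/* nothing to report */
	if (cc == CC_UNKNOWN_COMMAND)
		return LVL_INFO;	/* platform simply lacks the command */
	return LVL_WARN;		/* any other error response */
}
```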

Orabug: 25461958

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
(cherry picked from commit 0f6bfff7803dacb7ebfd765d6a2beb54e018698a)

Conflicts:

drivers/char/ipmi/ipmi_si_intf.c

8 years agokabi whitelist: Remove all ib_ symbols from the list.
Knut Omang [Wed, 31 May 2017 18:03:13 +0000 (20:03 +0200)]
kabi whitelist: Remove all ib_ symbols from the list.

The following symbols are all used by the sif driver and are the only ib_ symbols
in the current uek4 kabi whitelist:

ib_alloc_device
ib_dealloc_device
ib_dispatch_event
ib_modify_qp_is_ok
ib_rate_to_mult
ib_register_device
ib_umem_get_attrs
ib_umem_release
ib_unregister_device

Remove these symbols from the list to allow a data structure change needed to
fix bug 25723815. This change breaks the kabi in the IB area.

Orabug: 25955825

Signed-off-by: Knut Omang <knut.omang@oracle.com>
8 years agoext4: print ext4 mount option data_err=abort correctly
Ales Novak [Sun, 13 Mar 2016 02:55:50 +0000 (21:55 -0500)]
ext4: print ext4 mount option data_err=abort correctly

If the data_err=abort option is specified for an ext3/ext4 mount,
/proc/mounts shows it as "(null)". This is caused by token2str()
returning NULL for Opt_data_err_abort (due to its pattern containing
'=').

Signed-off-by: Ales Novak <alnovak@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Orabug: 25691020
Acked-by: todd.vierling@oracle.com
8 years agoIB/sa: Allocate SA query with kzalloc
Kaike Wan [Fri, 14 Aug 2015 12:52:08 +0000 (08:52 -0400)]
IB/sa: Allocate SA query with kzalloc

Orabug: 26124118

Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zeroed out to avoid unintended consequences. This prepares the
SA query structure to accept new fields in the future.
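
A userspace analogue of the idea, using calloc() in place of kzalloc().
The structure and helper below are hypothetical; "flags" stands in for a
field that might be added later without every allocation site being
updated to initialize it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical query structure; "flags" stands in for a field added later. */
struct sa_query_sketch {
	int id;
	unsigned int flags;
};

/* calloc() plays the role of kzalloc() here: memory comes back zeroed. */
static struct sa_query_sketch *query_alloc(int id)
{
	struct sa_query_sketch *q = calloc(1, sizeof(*q));

	if (q)
		q->id = id;	/* fields not set here stay zero */
	return q;
}
```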

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 5d2657708ec25b9fb3dd174443b1f647babcbe62)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/sa: Fix netlink local service GFP crash
Kaike Wan [Thu, 21 Jan 2016 13:41:31 +0000 (08:41 -0500)]
IB/sa: Fix netlink local service GFP crash

Orabug: 26124118

The rdma netlink local service registers a handler to handle RESOLVE
response and another handler to handle SET_TIMEOUT request. The first
thing these handlers do is to call netlink_capable() to check the
access right of the received skb to make sure that the sender has root
access. Under normal conditions, such responses and requests will be
directly forwarded to the handlers without going through the netlink_dump
pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c).
However, a user application could send a RESOLVE request (not response)
to the local service, which will fall into the netlink_dump pathway,
where a new skb will be created without initializing the control block.
This new skb will be eventually forwarded to the local service RESOLVE
response handler. Unfortunately, netlink_capable() will cause general
protection fault if the skb's control block is not initialized. This
patch will address the problem by checking the skb first.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 2deeb4772971e56d5bdac0bd3375d5eadaa827fd)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/sa: Fix rdma netlink message flags
Kaike Wan [Thu, 20 Aug 2015 18:20:42 +0000 (14:20 -0400)]
IB/sa: Fix rdma netlink message flags

Orabug: 26124118

The flags to ibnl_put_msg should be NLM_F_REQUEST instead of GFP_KERNEL.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit ba13b5f8f86efa78bc0aaea297b0001b6cbf6c21)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/sa: Put netlink request into the request list before sending
Kaike Wan [Fri, 30 Oct 2015 12:23:45 +0000 (08:23 -0400)]
IB/sa: Put netlink request into the request list before sending

Orabug: 26124118

It was found by Saurabh Sengar that the netlink code tried to allocate
memory with GFP_KERNEL while holding a spinlock. While it is possible
to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
to get rid of the spinlock while sending the packet. However, in order
to protect against a race condition that a quick response may be received
before the request is put on the request list, we need to put the request
on the list first.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 3ebd2fd0d0119a5ac7906bf17be637b527f63d31)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/core: Fix a potential array overrun in CMA and SA agent
Yuval Shaia [Thu, 11 May 2017 01:03:18 +0000 (21:03 -0400)]
IB/core: Fix a potential array overrun in CMA and SA agent

Orabug: 26124118

Fix an array overrun when walking the callback table.
The declaration of the callback table does not specify a maximum size;
the size is only provided at registration time.

There is a potential scenario where a new operation is added that is not
supported by the current client. Acceptance of such an operation by
ib_netlink will cause an array overrun.
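
The kind of check involved can be sketched as below.  This is a userspace
illustration with hypothetical names (client_sketch, dispatch), not the
ib_netlink code: the opcode is validated against the number of entries
given at registration before the table is indexed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*op_handler)(void);

/* Hypothetical client record: the table plus its registered entry count. */
struct client_sketch {
	const op_handler *cb_table;
	size_t nops;
};

static int handle_op0(void)
{
	return 42;
}

/* A client that only knows about operation 0. */
static const op_handler demo_table[] = { handle_op0 };

/* Reject opcodes the client did not register instead of indexing past the end. */
static int dispatch(const struct client_sketch *c, size_t op)
{
	if (op >= c->nops || !c->cb_table[op])
		return -EINVAL;
	return c->cb_table[op]();
}
```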

Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start")
Fixes: b493d91d333e ("iwcm: common code for port mapper")
Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
(Backported from commit 2fa2d4fb1166d1ef35f0aacac6165d53ab1b89c7)

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/SA: Use correct free function
Mark Bloch [Fri, 6 May 2016 19:45:27 +0000 (22:45 +0300)]
IB/SA: Use correct free function

Orabug: 26124118

Fixes a direct call to kfree_skb when nlmsg_free should be used.

Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink')
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 0f377d86252d11bfea941852785e3094b93601a7)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/sa: Route SA pathrecord query through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:09 +0000 (08:52 -0400)]
IB/sa: Route SA pathrecord query through netlink

Orabug: 26124118

This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.
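
The routing decision described above can be summarized in a small sketch.
The enum and function names are hypothetical, and netlink_status is an
assumed convention (0 for an answered query, negative for a failure):

```c
#include <assert.h>
#include <stdbool.h>

enum route_sketch { ROUTE_NETLINK, ROUTE_IB };

/*
 * Hypothetical decision helper. netlink_status is 0 when the local service
 * answered the query and negative when it reported a failure.
 */
static enum route_sketch resolve_route(bool listener_present, int netlink_status)
{
	if (!listener_present)
		return ROUTE_IB;	/* no listener: behave as before */
	if (netlink_status < 0)
		return ROUTE_IB;	/* netlink failed: fall back to IB */
	return ROUTE_NETLINK;
}
```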

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 2ca546b92a024d07adedd15b4c262b1c2c0786ec)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/core: Add rdma netlink helper functions
Kaike Wan [Fri, 14 Aug 2015 12:52:07 +0000 (08:52 -0400)]
IB/core: Add rdma netlink helper functions

Orabug: 26124118

This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit bc10ed7d3d19ff61427007b4d7bf98d3e57bb333)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoIB/netlink: Add defines for local service requests through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:06 +0000 (08:52 -0400)]
IB/netlink: Add defines for local service requests through netlink

Orabug: 26124118

This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 6431eb87065ffd24dfc7c0b6954e80a4eb74e177)

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
8 years agoscsi: mpt3sas: remove redundant wmb
Sinan Kaya [Fri, 7 Apr 2017 19:06:18 +0000 (15:06 -0400)]
scsi: mpt3sas: remove redundant wmb

Orabug: 26096353

Due to relaxed ordering requirements on multiple architectures, drivers
are required to use wmb/rmb/mb combinations when they need to guarantee
observability between the memory and the HW.

The mpt3sas driver is already using wmb() for this purpose.  However, it
issues a writel() following wmb(). The writel() function on arm/arm64
architectures has an embedded wmb() call inside.

This results in unnecessary performance loss and code duplication.

writel() already guarantees ordering for both CPU and bus, so we don't
need an additional wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b1391a5bf83a593bbe92d1f9bddaf563be5c7c9d)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
8 years agoscsi: mpt3sas: Updating driver version to v15.100.00.00
Chaitra P B [Mon, 23 Jan 2017 09:56:10 +0000 (15:26 +0530)]
scsi: mpt3sas: Updating driver version to v15.100.00.00

Orabug: 26096353

Updated driver version to "15.100.00.00"

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7cfa76963f1872461adff2e84edfbaa8e17d189b)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
8 years agoscsi: mpt3sas: Fix for Crusader to achieve product targets with SAS devices.
Chaitra P B [Mon, 23 Jan 2017 09:56:08 +0000 (15:26 +0530)]
scsi: mpt3sas: Fix for Crusader to achieve product targets with SAS devices.

Orabug: 26096353

A small glitch/degraded performance in Crusader with SAS drives is
improved by removing unnecessary spinlocks while clearing the SCSI
command in the driver's internal lookup table.

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 459325c466d278d3c9f51ddc9bb544c014136fd1)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
8 years agoscsi: mpt3sas: Fix Firmware fault state 0x2100 during heavy 4K RR FIO stress test.
Chaitra P B [Mon, 23 Jan 2017 09:56:09 +0000 (15:26 +0530)]
scsi: mpt3sas: Fix Firmware fault state 0x2100 during heavy 4K RR FIO stress test.

Orabug: 26096353

Due to the existence of a loop in the IO path, our HBA will receive heavy
IO, and because the driver is not updating the Reply Post Host Index
frequently, there is a high chance that our firmware is unable to find
any free entry in the Reply Post Descriptor Queue (i.e. a queue overflow
occurs) and a 0x2100 firmware fault can be observed.  To fix this, we
have defined a threshold value. After continuously processing this
threshold number of reply descriptors, the driver will update the Reply
Descriptor Host Index so that this threshold number of reply descriptor
entries will be freed and made available to the firmware, and we won't
observe this firmware fault. We have defined this threshold value as
1/3rd of the HBA queue depth.
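
A minimal sketch of the batching scheme, assuming a simple ring model.
The struct and function names are made up for this note, and the counter
index_writes only stands in for the register write that hands freed
entries back to the firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one reply descriptor queue. */
struct reply_queue_sketch {
	uint32_t depth;		/* HBA queue depth */
	uint32_t host_index;	/* next descriptor the driver will process */
	uint32_t index_writes;	/* how often the index was reported to firmware */
};

/* Stand-in for the register write that frees entries for the firmware. */
static void update_host_index(struct reply_queue_sketch *q)
{
	q->index_writes++;
}

/* Process "pending" descriptors, reporting progress every depth/3 entries. */
static void process_replies(struct reply_queue_sketch *q, uint32_t pending)
{
	uint32_t threshold;
	uint32_t since_update = 0;

	assert(q->depth >= 3);	/* keeps the threshold non-zero */
	threshold = q->depth / 3;

	while (pending--) {
		q->host_index = (q->host_index + 1) % q->depth;
		if (++since_update == threshold) {
			update_host_index(q);
			since_update = 0;
		}
	}
	if (since_update)
		update_host_index(q);	/* report the final partial batch */
}
```

With a depth of 30 and 25 pending descriptors, the index is reported
after 10 and 20 entries and once more at the end, rather than only once
after all 25 have been processed.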

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6b4c335a0f6cc61c69cd24f24e40b118bd9f778a)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
8 years agoscsi: mpt3sas: Added print to notify cable running at a degraded speed.
Chaitra P B [Mon, 23 Jan 2017 09:56:07 +0000 (15:26 +0530)]
scsi: mpt3sas: Added print to notify cable running at a degraded speed.

Orabug: 26096353

The driver processes the event MPI26_EVENT_ACTIVE_CABLE_DEGRADED when a
cable is present and is running at a degraded speed (below the SAS3 12
Gb/s rate). Prints are added to inform the user that the cable is not
running at optimal speed.

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6c44c0fe91af7bac78dcaf4c106421862530f499)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
8 years agoxen-blkback: report hotplug-status busy when detach is initiated but frontend device...
Niranjan Patil [Thu, 23 Mar 2017 15:57:24 +0000 (08:57 -0700)]
xen-blkback: report hotplug-status busy when detach is initiated but frontend device is busy.

In case of a deferred detach, xm/xend doesn't get notified about the busy
status and has to wait for the timeout (default 100s) to report the detach
failure to the user. This behavior is sometimes incorrectly interpreted as
a tool hang.

This patch updates the hotplug-status with busy so that xm gets notified
instead of timing out.

Orabug: 26072430
Signed-off-by: Niranjan Patil <niranjan.d.patil@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoqla2xxx: Allow vref count to timeout on vport delete.
Joe Carnuccio [Wed, 15 Mar 2017 16:48:43 +0000 (09:48 -0700)]
qla2xxx: Allow vref count to timeout on vport delete.

This commit fixes a panic that can be triggered with the following steps:
1.create vhba
  #virsh nodedev-create  vhba.xml
2.destroy vhba
  #virsh nodedev-destroy scsi_host9

Content of file vhba.xml:

<device>
     <parent>scsi_host7</parent>
     <capability type='scsi_host'>
       <capability type='fc_host'>
       </capability>
     </capability>
</device>

Call trace of panic:

[  207.683754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
[  207.683805] IP: [<ffffffffa0221d0f>] qla24xx_vport_delete+0xdf/0x180 [qla2xxx]
[  207.683850] PGD 0
[  207.683863] Oops: 0000 [#1] SMP
[  207.684391] CPU: 0 PID: 2029 Comm: libvirtd Not tainted 4.1.12-94.2.1.el7uek.x86_64 #2
[  207.684418] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30100400 12/26/2016
[  207.684454] task: ffff88026fc31c00 ti: ffff88007278c000 task.ti: ffff88007278c000
[  207.684491] RIP: 0010:[<ffffffffa0221d0f>]  [<ffffffffa0221d0f>] qla24xx_vport_delete+0xdf/0x180 [qla2xxx]
[  207.684535] RSP: 0018:ffff88007278fcf8  EFLAGS: 00010202
[  207.684555] RAX: 0000000000000001 RBX: ffff8802729c17f8 RCX: ffffffffa0258e80
[  207.684578] RDX: 0000000000007086 RSI: 0000000000000000 RDI: ffff88026fef0360
[  207.684601] RBP: ffff88007278fd18 R08: 0000000000000001 R09: ffff88027741ad80
[  207.684625] R10: ffffea0009bbbc00 R11: 0000000000000000 R12: ffff88026fef0000
[  207.684649] R13: 0000000000000001 R14: ffff88026fef0360 R15: 0000000000000021
[  207.684673] FS:  00007f24f538d700(0000) GS:ffff880277400000(0000) knlGS:0000000000000000
[  207.684699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  207.684719] CR2: 0000000000000410 CR3: 00000002731e7000 CR4: 00000000001406f0
[  207.684744] Stack:
[  207.684755]  ffff8802733dd800 ffff8802728fd000 ffff880474d3f000 ffff880474d3f648
[  207.684785]  ffff88007278fd58 ffffffffa000d344 ffff8804714f4000 ffff8802728fd000
[  207.684814]  ffff8802728fd000 ffff8802733dd800 0000000000000021 ffff880474d3f648
[  207.684861] Call Trace:
[  207.684897]  [<ffffffffa000d344>] fc_vport_terminate+0x44/0x150 [scsi_transport_fc]
[  207.684927]  [<ffffffffa000d594>] store_fc_host_vport_delete+0x144/0x180 [scsi_transport_fc]
[  207.684959]  [<ffffffff81489798>] dev_attr_store+0x18/0x30
[  207.684996]  [<ffffffff81294fbd>] sysfs_kf_write+0x3d/0x50
[  207.685017]  [<ffffffff8129446a>] kernfs_fop_write+0x12a/0x180
[  207.685040]  [<ffffffff812129b7>] __vfs_write+0x37/0x120
[  207.685061]  [<ffffffff812158d8>] ? __sb_start_write+0x58/0x110
[  207.685084]  [<ffffffff812c1743>] ? security_file_permission+0x23/0xa0
[  207.685107]  [<ffffffff812130f9>] vfs_write+0xa9/0x1b0
[  207.685128]  [<ffffffff81736c16>] ? mutex_lock+0x16/0x37
[  207.685147]  [<ffffffff81213fe5>] SyS_write+0x55/0xd0
[  207.685179]  [<ffffffff81738c6e>] system_call_fastpath+0x12/0x71
[  207.685200] Code: 07 00 00 01 0f b7 83 b0 01 00 00 f0 49 0f b3 84 24 30 07 00 00 4c 89 f7 e8 5f 4d 51 e1 48 8b b3 b8 01 00 00 0f b7 83 b0 01 00 00 <66> 39 86 10 04 00 00 74 68 45 0f b7 c5 48 89 de 31 c0 48 c7 c1
[  207.686827] RIP  [<ffffffffa0221d0f>] qla24xx_vport_delete+0xdf/0x180 [qla2xxx]
[  207.687634]  RSP <ffff88007278fcf8>
[  207.688391] CR2: 0000000000000410

Cc: <stable@vger.kernel.org>
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit c4a9b538ab2a109c5f9798bea1f8f4bf93aadfb9)

Orabug: 26021151

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
drivers/scsi/qla2xxx/qla_attr.c
drivers/scsi/qla2xxx/qla_def.h
drivers/scsi/qla2xxx/qla_os.c

8 years agoBtrfs: don't BUG_ON() in btrfs_orphan_add
Josef Bacik [Fri, 27 May 2016 17:03:04 +0000 (13:03 -0400)]
Btrfs: don't BUG_ON() in btrfs_orphan_add

Orabug: 25975316

This is just a screwup for developers, so change it to an ASSERT() so developers
notice when things go wrong and deal with the error appropriately if ASSERT()
isn't enabled.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 3b6571c180da85e43550c608e954ab7b2a31d954)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
8 years agoBtrfs: clarify do_chunk_alloc()'s return value
Liu Bo [Fri, 29 Jul 2016 18:09:50 +0000 (11:09 -0700)]
Btrfs: clarify do_chunk_alloc()'s return value

Orabug: 25975316

Function start_transaction() can return ERR_PTR(1) when flush is
BTRFS_RESERVE_FLUSH_LIMIT, so the call graph is

start_transaction (return ERR_PTR(1))
  -> btrfs_block_rsv_add (return 1)
     -> reserve_metadata_bytes (return 1)
        -> flush_space (return 1)
           -> do_chunk_alloc  (return 1)

With BTRFS_RESERVE_FLUSH_LIMIT, if flush_space is already on the
flush_state of ALLOC_CHUNK and it successfully allocates a new
chunk, then instead of trying to reserve space again,
reserve_metadata_bytes returns 1 immediately.

Callers of start_transaction() usually just do the IS_ERR() check,
which ERR_PTR(1) passes, so they panic later when dereferencing a
pointer that is really ERR_PTR(1).
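
Why ERR_PTR(1) passes an IS_ERR() check can be shown with a small userspace
sketch. The macros below are stand-ins modelled on the kernel's
include/linux/err.h, not the kernel code itself, and caught_by_is_err() and
err_ptr_self_check() are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses are treated as encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: 1 if a caller's IS_ERR() check would catch 'ret'. */
int caught_by_is_err(long ret)
{
	return IS_ERR(ERR_PTR(ret));
}

/* Hypothetical self-check comparing a real errno with the stray value 1. */
void err_ptr_self_check(void)
{
	assert(caught_by_is_err(-28));  /* a negative errno is recognised */
	assert(!caught_by_is_err(1));   /* ERR_PTR(1) is not seen as an error */
}
```

Because a positive return value is encoded as a tiny address rather than one
near the top of the address space, the caller proceeds as if it held a valid
pointer.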

The following patch fixes the above problem.
"btrfs: flush_space: treat return value of do_chunk_alloc properly"
https://patchwork.kernel.org/patch/7778651/

This adds comments to clarify do_chunk_alloc()'s return value.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 28b737f6ede3661fe610937706c4a6f50e9ab769)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
8 years agobtrfs: flush_space: treat return value of do_chunk_alloc properly
Alex Lyakas [Sun, 6 Dec 2015 10:32:31 +0000 (12:32 +0200)]
btrfs: flush_space: treat return value of do_chunk_alloc properly

Orabug: 25975316

do_chunk_alloc returns 1 when it succeeds in allocating a new chunk.
But flush_space will not convert this to 0, and will also return 1.
As a result, reserve_metadata_bytes will think that flush_space failed,
and may potentially return this value "1" to the caller (depending on how
reserve_metadata_bytes was called). The caller will also treat this as an error.
For example, btrfs_block_rsv_refill does:

int ret = -ENOSPC;
...
ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
if (!ret) {
        block_rsv_add_bytes(block_rsv, num_bytes, 0);
        return 0;
}

return ret;

So it will return -ENOSPC.
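
The missing conversion described above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the patch
itself: fake_do_chunk_alloc() and flush_space_sketch() are hypothetical
names, and only the "1 means a chunk was allocated" convention is taken from
the description.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for do_chunk_alloc(): it simply hands back the
 * outcome it is given. Per the description, 1 means a new chunk was
 * allocated; 0 and negative values are assumed to mean "nothing to do"
 * and "error".
 */
int fake_do_chunk_alloc(int outcome)
{
	return outcome;
}

/* Sketch of a flush_space()-like wrapper that folds "1" into success. */
int flush_space_sketch(int outcome)
{
	int ret = fake_do_chunk_alloc(outcome);

	/* A positive value is a success indication, not an error code. */
	if (ret > 0)
		ret = 0;
	return ret;
}

/* Hypothetical self-check of the three cases. */
void flush_space_self_check(void)
{
	assert(flush_space_sketch(1) == 0);     /* chunk allocated -> success */
	assert(flush_space_sketch(0) == 0);     /* already success */
	assert(flush_space_sketch(-28) == -28); /* real errors pass through */
}
```

With the positive value folded into 0, a caller that tests "if (!ret)" takes
the success path instead of propagating a bogus error.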

Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit eecba891d38051ebf7f4af6394d188a5fd151a6a)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
8 years agoipv6: Skip XFRM lookup if dst_entry in socket cache is valid
Jakub Sitnicki [Wed, 8 Jun 2016 13:13:34 +0000 (15:13 +0200)]
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid

Orabug: 25955089

At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.

If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.

To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:

  1) Set up a transformation and lower the MTU to trigger fragmentation
    # ip xfrm policy add dir out src ::1 dst ::1 \
      tmpl src ::1 dst ::1 proto esp spi 1
    # ip xfrm state add src ::1 dst ::1 \
      proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
    # ip link set dev lo mtu 1500

  2) Monitor the packet flow and set up a UDP sink
    # tcpdump -ni lo -ttt &
    # socat udp6-listen:12345,fork /dev/null &

  3) Send a datagram that needs fragmentation with a connected socket
    # perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345
    2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
    00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
    00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
    00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
    (^ ICMPv6 Parameter Problem)
    00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136

  4) Compare it to a non-connected socket
    # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
    00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
    00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)

What happens in step (3) is:

  1) when connecting the socket in __ip6_datagram_connect(), we
     perform an XFRM lookup, miss the flow cache, create an XFRM
     bundle, and cache the destination,

  2) afterwards, when sending the datagram, we perform an XFRM lookup,
     again, miss the flow cache (due to mismatch of flowi6_iif and
     flowi6_oif, which is an issue of its own), and recreate an XFRM
     bundle based on the cached (and already transformed) destination.

To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.

The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), which also suffer from messages being
transformed twice.

Joint work with Hannes Frederic Sowa.

Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be)
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
net/ipv6/ip6_output.c

8 years agoxen: Make VPMU init message look less scary
Juergen Gross [Tue, 2 Aug 2016 07:22:12 +0000 (09:22 +0200)]
xen: Make VPMU init message look less scary

The default for the Xen hypervisor is to not enable VPMU in order to
avoid security issues. In this case the Linux kernel will issue the
message "Could not initialize VPMU for cpu 0, error -95" which looks
more like an error than a normal state.

Change the message to something less scary in case the hypervisor
returns EOPNOTSUPP or ENOSYS when trying to activate VPMU.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Orabug: 25873416

(cherry picked from commit 0252937a87e1d46a8261da85cbd99dffe612a2d3)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@gmail.com>
8 years agouek-rpm: configs: enable CONFIG_ACPI_NFIT
Todd Vierling [Thu, 16 Mar 2017 14:08:54 +0000 (10:08 -0400)]
uek-rpm: configs: enable CONFIG_ACPI_NFIT

Orabug: 25719149
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agoipv6: Don't use ufo handling on later transformed packets
Jakub Sitnicki [Wed, 26 Oct 2016 09:21:14 +0000 (11:21 +0200)]
ipv6: Don't use ufo handling on later transformed packets

Similar to commit c146066ab802 ("ipv4: Don't use ufo handling on later
transformed packets"), don't perform UFO on packets that will be IPsec
transformed. To detect it we rely on the fact that headerlen in
dst_entry is non-zero only for transformation bundles (xfrm_dst
objects).

Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
such as a dummy device:

  DEV=dum0 LEN=1493

  ip li add $DEV type dummy
  ip addr add fc00::1/64 dev $DEV nodad
  ip link set $DEV up
  ip xfrm policy add dir out src fc00::1 dst fc00::2 \
     tmpl src fc00::1 dst fc00::2 proto esp spi 1
  ip xfrm state add src fc00::1 dst fc00::2 \
     proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b

  tcpdump -n -nn -i $DEV -t &
  socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN

tcpdump output before:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|48)
  IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88

... and after:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|80)

Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f89c56ce710afa65e1b2ead555b52c4807f34ff7)

Orabug: 25533743
Signed-off-by: Todd Vierling <todd.vierling@oracle.com>
8 years agonet/packet: fix overflow in check for tp_reserve
Andrey Konovalov [Wed, 29 Mar 2017 14:11:22 +0000 (16:11 +0200)]
net/packet: fix overflow in check for tp_reserve

Orabug: 25813773
CVE: CVE-2017-7308

When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.

Fix by checking that tp_reserve <= INT_MAX on assign.
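
A minimal userspace sketch of the wrap-around and of the bound check
described above. Both fields are modelled as unsigned int here, which is an
assumption about the kernel structure, and wrapped_sum() and
check_tp_reserve() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Unsigned addition wraps modulo UINT_MAX + 1 instead of failing. */
unsigned int wrapped_sum(unsigned int tp_hdrlen, unsigned int tp_reserve)
{
	return tp_hdrlen + tp_reserve;
}

/* Reject values above INT_MAX at assignment time, as the fix describes. */
int check_tp_reserve(unsigned int val)
{
	if (val > INT_MAX)
		return -EINVAL;
	return 0;
}

/* Hypothetical self-check, assuming a 32-bit unsigned int. */
void tp_reserve_self_check(void)
{
	/* A huge reserve makes the sum look small after wrapping. */
	assert(wrapped_sum(32u, UINT_MAX - 15u) == 16u);
	/* The bound check refuses that value before it is ever stored. */
	assert(check_tp_reserve(UINT_MAX - 15u) == -EINVAL);
	assert(check_tp_reserve(64u) == 0);
}
```

Bounding the value when it is set means later length checks that add it to a
header length no longer see a wrapped, deceptively small result.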

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bcc5364bdcfe131e6379363f089e7b4108d35b70)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
8 years agonet/packet: fix overflow in check for tp_frame_nr
Andrey Konovalov [Wed, 29 Mar 2017 14:11:21 +0000 (16:11 +0200)]
net/packet: fix overflow in check for tp_frame_nr

Orabug: 25813773
CVE: CVE-2017-7308

When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.

Add a check that tp_block_size * tp_block_nr <= UINT_MAX.

Since frames_per_block <= tp_block_size, the expression would
never overflow.
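
One way to test that a product of two unsigned ints stays within UINT_MAX
without computing the product is to divide first. The sketch below shows
that idea; block_product_fits() is a hypothetical name and the code is not
claimed to match the patch line for line.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns 1 if tp_block_size * tp_block_nr <= UINT_MAX, 0 otherwise.
 * Dividing avoids evaluating a product that could itself wrap.
 */
int block_product_fits(unsigned int tp_block_size, unsigned int tp_block_nr)
{
	if (tp_block_nr == 0)
		return 1; /* a zero count cannot overflow */
	return tp_block_size <= UINT_MAX / tp_block_nr;
}

/* Hypothetical self-check, assuming a 32-bit unsigned int. */
void block_product_self_check(void)
{
	/* 4096 * 1048576 is exactly 2^32, one past UINT_MAX. */
	assert(block_product_fits(4096u, 1048576u) == 0);
	/* 4096 * 1048575 still fits. */
	assert(block_product_fits(4096u, 1048575u) == 1);
}
```

Once the block product is known to fit, any quantity bounded by
tp_block_size, such as frames_per_block, gives a product with tp_block_nr
that fits as well.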

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8f8d28e4d6d815a391285e121c3a53a0b6cb9e7b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
8 years agonet/packet: fix overflow in check for priv area size
Andrey Konovalov [Wed, 29 Mar 2017 14:11:20 +0000 (16:11 +0200)]
net/packet: fix overflow in check for priv area size

Orabug: 25813773
CVE: CVE-2017-7308

Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less than the other doesn't always work
(both of them are unsigned ints).

Compare them as is instead.

Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.
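
The difference between the two comparisons can be sketched in userspace.
HDR_LEN is a hypothetical stand-in for the fixed header that BLK_PLUS_PRIV
adds (alignment is ignored here), and both function names are invented for
the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed header size added on top of the private area. */
#define HDR_LEN 48u

/* Old style: subtract in 32 bits, cast to int, test the sign. */
int old_check_rejects(unsigned int tp_block_size, unsigned int tp_sizeof_priv)
{
	/* The cast of an out-of-range value is implementation-defined in C. */
	return (int)(tp_block_size - (HDR_LEN + tp_sizeof_priv)) <= 0;
}

/* New style: compare directly, widening to 64 bits before adding. */
int new_check_rejects(unsigned int tp_block_size, unsigned int tp_sizeof_priv)
{
	return (uint64_t)tp_block_size <= HDR_LEN + (uint64_t)tp_sizeof_priv;
}

/* Hypothetical self-check, assuming a 32-bit unsigned int. */
void priv_size_self_check(void)
{
	/* A sane request is accepted by both checks. */
	assert(old_check_rejects(4096u, 64u) == 0);
	assert(new_check_rejects(4096u, 64u) == 0);
	/* A huge private area wraps the subtraction to +8144: old check passes. */
	assert(old_check_rejects(4096u, 0xFFFFF000u) == 0);
	assert(new_check_rejects(4096u, 0xFFFFF000u) == 1);
	/* Without the 64-bit widening, HDR_LEN + priv would wrap to 40 here. */
	assert(new_check_rejects(4096u, 0xFFFFFFF8u) == 1);
}
```

The direct comparison never depends on the sign of a wrapped difference, and
the 64-bit addition keeps the right-hand side from wrapping to a small value.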

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b6867c2ce76c596676bec7d2d525af525fdc6e2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>