users/jedix/linux-maple.git
Shivasharan S [Fri, 10 Feb 2017 08:59:24 +0000 (00:59 -0800)]
scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID

Orabug: 26096381

The issue is limited to Syncro firmware, where pd_after_lb is not set
but is accidentally used.  It is not a functional issue, but it results
in low performance due to improper load balancing between two LUNs.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b41c0a4aa7c0fc1f98648c020358598498d48f06)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
drivers/scsi/megaraid/megaraid_sas_fp.c

Shivasharan S [Fri, 10 Feb 2017 08:59:23 +0000 (00:59 -0800)]
scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD

Orabug: 26096381

Fetch the latest controller OCR capability from FW before sending
MR_DCMD_CTRL_SHUTDOWN.  When an application sends a shutdown DCMD
(MR_DCMD_CTRL_SHUTDOWN), the driver fetches the latest controller
information from firmware.  This ensures that the driver always has the
latest OCR capability of the controller before sending the DCMD.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 95c060869e6872ea03a4a9d15236adcffb1d8b07)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:22 +0000 (00:59 -0800)]
scsi: megaraid_sas: avoid unaligned access in ioctl path

Orabug: 26096381

Fix a kernel warning caused by unaligned memory access in the driver.
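
The commit text does not show the change itself. As a minimal userspace
sketch of the general technique (not the driver's actual code), a field
at a possibly unaligned offset can be read by copying its bytes instead
of dereferencing a cast pointer. The helper name read_u32_unaligned is
hypothetical; in kernel code the get_unaligned() family plays this role.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a position that may not
 * be 4-byte aligned. Copying the bytes avoids a direct unaligned load,
 * which can warn or fault on some architectures. */
static uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

Compilers typically turn the memcpy into a plain load where unaligned
access is cheap, so the safe form costs little.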

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 318aaef88353c09a73d26d3b87a74fab67ff9282)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
drivers/scsi/megaraid/megaraid_sas_base.c

Shivasharan S [Fri, 10 Feb 2017 08:59:21 +0000 (00:59 -0800)]
scsi: megaraid_sas: big endian support changes

Orabug: 26096381

Endianness fixes specific to Ventura.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a174118b7a97c52c3c3a4f1b8eee594502a55381)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:20 +0000 (00:59 -0800)]
scsi: megaraid_sas: Big endian RDPQ mode fix

Orabug: 26096381

Fix the case where MR FW with RDPQ mode enabled is deployed on a big
endian host machine and the driver does not set up the reply address
correctly.
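
The underlying idea can be sketched in userspace, assuming the firmware
expects address fields in little-endian byte order. The helper put_le64
is a hypothetical stand-in for the kernel's cpu_to_le64() conversion; it
is not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpu_to_le64(): store a 64-bit value in a
 * byte buffer in little-endian order, whatever the host byte order. */
static void put_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

On a little-endian host this matches the in-memory layout of the value;
on a big endian host it is what makes the address readable by firmware.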

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ff96f9251768f3fe1b4cd6f48f4021b3a1be269b)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:19 +0000 (00:59 -0800)]
scsi: megaraid_sas: MR_TargetIdToLdGet u8 to u16 and avoid invalid raid-map access

Orabug: 26096381

Change MR_TargetIdToLdGet return type from u8 to u16.

An ld id range check is added at two places in this patch -
@megasas_build_ldio_fusion and @megasas_build_ld_nonrw_fusion.  Previous
driver code used a different data type for the ld's TargetId returned
from MR_TargetIdToLdGet.  Prior to this change, the above two functions
were safeguarded because the function always returned u8, so the maximum
ld id returned was 255.

In the check below, fw_supported_vd_count is currently 64 or 256, so the
valid range is either 0-63 or 0-255.  Ideally we want to avoid accessing
the raid map for ld ids which are not valid.  With the u16 change, the
invalid ld id value is 0xFFFF and we would see a kernel panic due to
random memory access in MR_LdRaidGet.  The changes ensure we do not call
MR_LdRaidGet if the ld id is beyond the size of the ldSpanMap array.

               if (ld < instance->fw_supported_vd_count)

From the firmware's perspective, ld id 0xFF is invalid, and even though
current driver code forwards such a command, firmware fails it with
target not available.

The ld target id issue occurs mainly when the driver loops to populate
the raid map (e.g. MR_ValidateMapInfo).  These are the only two places
where we may see out-of-range target ids and want to protect raid map
access based on the range provided by the Firmware API.
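
A minimal userspace sketch of the guard described above, under stated
assumptions: the struct layout, MAX_LD, and the function
ld_raid_get_checked are hypothetical and only mirror the names in the
commit text. The point is that the id is compared against
fw_supported_vd_count before it is used as an array index.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LD 256  /* illustrative array size, not the driver's constant */

struct ld_entry {
    int raid_level;
};

struct raid_map {
    uint16_t fw_supported_vd_count;      /* 64 or 256 per the commit text */
    struct ld_entry ld_span_map[MAX_LD];
};

/* Return the map entry for ld, or NULL when ld is outside the supported
 * range (for example the invalid marker 0xFFFF). */
static const struct ld_entry *
ld_raid_get_checked(const struct raid_map *map, uint16_t ld)
{
    if (ld < map->fw_supported_vd_count)
        return &map->ld_span_map[ld];
    return NULL;
}
```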

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d2d0358bcd09139a8e71afbca35bcd6b219dd1bf)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:18 +0000 (00:59 -0800)]
scsi: megaraid_sas: In validate raid map, raid capability is not converted to cpu format for all lds

Orabug: 26096381

On a host, if an ld is deleted there is a hole in the ld array returned
by the FW.  But in MR_ValidateMapInfo we do not account for holes in
the ld array and traverse only up to index num_lds.  This patch takes
care of converting the capability field of all the valid lds in the ld
raid map.
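
The traversal problem can be sketched as follows. This is a simplified
userspace illustration, not the driver's data layout: NUM_SLOTS,
SLOT_EMPTY, and both function names are hypothetical. It walks every
slot and skips holes, rather than stopping after the first num_lds
entries.

```c
#include <stdint.h>

#define NUM_SLOTS   8        /* illustrative table size */
#define SLOT_EMPTY  0xFFFF   /* illustrative marker for a deleted ld */

/* Interpret the bytes of v as a little-endian 32-bit value and return
 * it in host order. */
static uint32_t le32_to_host(uint32_t v)
{
    const uint8_t *b = (const uint8_t *)&v;

    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Walk all slots so that valid lds located after a hole are still
 * converted. Returns the number of entries converted. */
static int convert_capabilities(const uint16_t slot_to_ld[NUM_SLOTS],
                                uint32_t capability[NUM_SLOTS])
{
    int converted = 0;

    for (int i = 0; i < NUM_SLOTS; i++) {
        uint16_t ld = slot_to_ld[i];

        if (ld >= NUM_SLOTS)   /* hole or out-of-range id: skip it */
            continue;
        capability[ld] = le32_to_host(capability[ld]);
        converted++;
    }
    return converted;
}
```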

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a09454ce5dd11184c5040ed536d323e2a302a579)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:17 +0000 (00:59 -0800)]
scsi: megaraid_sas: reduce size of fusion_context and use vmalloc if kmalloc fails

Orabug: 26096381

Currently the fusion context has a fixed array, load_balance_info.  Use
dynamic allocation instead.  In a few places, the driver does not need
physically contiguous memory.  Attempt to use vmalloc if physically
contiguous memory is not available.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5fc499b612c5401a7ae0674086befcdf8b148516)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
drivers/scsi/megaraid/megaraid_sas.h

Shivasharan S [Fri, 10 Feb 2017 08:59:16 +0000 (00:59 -0800)]
scsi: megaraid_sas: add print in device removal path

Orabug: 26096381

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b4a42213a7eb8ec8556f27e6750bbc5c9193e86e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:15 +0000 (00:59 -0800)]
scsi: megaraid_sas: enhance debug logs in OCR context

Orabug: 26096381

Add additional logging from the driver in OCR context.
Add debug logs for partial completion of IOs in iodone context.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit def0eab3af8651e8951c5cf1b17ece0d26827636)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:14 +0000 (00:59 -0800)]
scsi: megaraid_sas: set residual bytes count during IO completion

Orabug: 26096381

Fix an issue where residual bytes were not set correctly.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 31d9a57b419d8ef8fa391009819f940778ce6245)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:13 +0000 (00:59 -0800)]
scsi: megaraid_sas: raid 1 write performance for large io

Orabug: 26096381

Avoid the host-side PCI bandwidth bottleneck and hint FW to do write
buffering using the RaidFlag MR_RAID_FLAGS_IO_SUB_TYPE_LDIO_BW_LIMIT.
Once an IO lands in FW with MR_RAID_FLAGS_IO_SUB_TYPE_LDIO_BW_LIMIT, FW
does a single DMA from the host and buffers the write operation.  On the
back end, FW DMAs the same buffer to the Mirror and Data Arm.  This
improves large block IO performance, which is otherwise bottlenecked by
the host-side PCI bandwidth limitation.

Consistent ~4000MB throughput (T.P) for 256K block size is the expected
performance.  IOPS for small block sizes should be on par with disk
performance.  (E.g. 42 SAS disks in JBOD mode give 3700MB T.P.  The same
drives used in R1 WT mode should give ~1800MB T.P.)

With this patch, 24 R1 VDs (HDD) give the performance below for
sequential writes.  Without this patch, we cannot exceed 3200MB
(throughput is in MB).

Block Size    50% 256K and 50% 4K          100% 256K
4K                 3100                        2030
8K                 3140                        2740
16K                3140                        3140
32K                3400                        3240
64K                3500                        3700
128K               3870                        3870
256K               3920                        3920

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a48ba0eca0456d45e920169930569caa3fc57124)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
drivers/scsi/megaraid/megaraid_sas.h

Shivasharan S [Fri, 10 Feb 2017 08:59:09 +0000 (00:59 -0800)]
scsi: megaraid_sas: change issue_dcmd to return void from int

Orabug: 26096381

With the changes to remove checks for a valid request descriptor,
issue_dcmd will now always return DCMD_SUCCESS.  This patch changes the
return type of issue_dcmd to void and updates all callers accordingly.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f4fc209326c79b03fecd38a6709cf08da47f15f7)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:08 +0000 (00:59 -0800)]
scsi: megaraid_sas: megasas_get_request_descriptor always return valid desc

Orabug: 26096381

No functional change.  Code cleanup.  Remove an error path that cannot
occur.  In megasas_get_request_descriptor we can remove the error
handling, which is not required.  With fusion controllers, if there is a
valid message frame available, we are guaranteed to get a corresponding
request descriptor.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 52205ac8940b43cca1711abfb43a05e7df08c09e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:07 +0000 (00:59 -0800)]
scsi: megaraid_sas: Use DID_REQUEUE

Orabug: 26096381

Move to the DID_REQUEUE return type for reliable unconditional retries.
The driver wants an unconditional re-queue, so replace DID_RESET with
DID_REQUEUE.

Discussed below -
https://www.spinics.net/lists/linux-scsi/msg102848.html
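
For readers unfamiliar with SCSI result words, a small userspace sketch
of where the host byte lives. The two constants carry the values from
the Linux SCSI headers; the helper names make_result and host_byte_of
are hypothetical, and the exact assignment in the driver is not shown in
this log.

```c
#include <stdint.h>

/* Host byte codes, with the values used by the Linux SCSI headers. */
#define DID_RESET    0x08
#define DID_REQUEUE  0x0d

/* Build a SCSI result word carrying the given host byte (bits 16-23). */
static uint32_t make_result(uint8_t code)
{
    return (uint32_t)code << 16;
}

/* Extract the host byte from a result word. */
static uint8_t host_byte_of(uint32_t result)
{
    return (uint8_t)((result >> 16) & 0xff);
}
```

A command completed with make_result(DID_REQUEUE) asks the mid-layer to
queue it again, which is the unconditional retry the message describes.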

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f55cf47d925e48cddabafd3bc829f1ebc05c334d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:06 +0000 (00:59 -0800)]
scsi: megaraid_sas: RAID map is accessed for SYS PDs when use_seqnum_jbod_fp is not set

Orabug: 26096381

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ed981b81fa6cb7ce191756822e3de24e51112cd3)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:05 +0000 (00:59 -0800)]
scsi: megaraid_sas: Refactor MEGASAS_IS_LOGICAL macro using sdev

Orabug: 26096381

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3cabd16256584581af2cc3d2cedabcfcf15021ad)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:04 +0000 (00:59 -0800)]
scsi: megaraid_sas: 32 bit descriptor fire cmd optimization

Orabug: 26096381

No functional change. Code refactor.

megasas_fire_cmd_fusion can always use a 32 bit descriptor write for
Ventura, so there is no need to pass an extra flag.  Only IOC INIT
requires a 64 bit descriptor write.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 45b8a35eed7b1d7e51a4dc04b5b694301a383afa)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:03 +0000 (00:59 -0800)]
scsi: megaraid_sas: raid 1 fast path code optimize

Orabug: 26096381

No functional change. Code refactor.

Remove function megasas_fpio_to_ldio, as we never need to convert fpio
to ldio because of frame unavailability.  Grab the extra frame for a
raid 1 write fast path before the first frame is created as Fast Path.
Remove the is_raid_1_fp_write flag, as a raid 1 write fast path command
is identified using r1_alt_dev_handle only.  Move resetting of
megasas_cmd_fusion fields to the common function
megasas_return_cmd_fusion.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8bf7c65d379a6d923dfebb50eb04c2407e4762ed)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:02 +0000 (00:59 -0800)]
scsi: megaraid_sas: cpu select rework.

Orabug: 26096381

No functional change. Code refactor.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f6c0d55c5b91c0d626d65aebee1a0d6b0a61851d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Shivasharan S [Fri, 10 Feb 2017 08:59:01 +0000 (00:59 -0800)]
Revert "scsi: megaraid_sas: Enable or Disable Fast path based on the PCI Threshold Bandwidth"

Orabug: 26096381

This reverts commit "3e5eadb1a881" ("scsi: megaraid_sas: Enable or
Disable Fast path based on the PCI Threshold Bandwidth")

That patch was aimed at increasing the performance of R1 write
operations for large IO sizes.  Since it used a timer approach to turn
fast path on/off, it did not work as expected.  Patch 0013 describes the
new algorithm and performance numbers.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 18bbcabdc6cc6be8c7f6d80c85d314535d76188d)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:53 +0000 (18:20 -0500)]
scsi: megaraid_sas: driver version upgrade

Orabug: 26096381

Upgrade driver version.

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 223e4b93e61f7538681632bfb19edd4f27a0c319)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:52 +0000 (18:20 -0500)]
scsi: megaraid_sas: Implement the PD Map support for SAS3.5 Generic Megaraid Controllers

Orabug: 26096381

Update Linux driver to use new pdTargetId field for JBOD target ID

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ede7c3ce82dc4001bbab33dddebab8c089f309e0)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:51 +0000 (18:20 -0500)]
scsi: megaraid_sas: ldio_outstanding variable is not decremented in completion path

Orabug: 26096381

ldio outstanding variable needs to be decremented in io completion path for
iMR dual queue depth

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b71b49c209facf8fec3778142ae5e45bb6ca4afc)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:50 +0000 (18:20 -0500)]
scsi: megaraid_sas: Enable or Disable Fast path based on the PCI Threshold Bandwidth

Orabug: 26096381

Large SEQ IO workloads should be sent as non fast path commands.

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3e5eadb1a881bea2e3fa41f5ae7cdbfa36222d37)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:49 +0000 (18:20 -0500)]
scsi: megaraid_sas: Add the Support for SAS3.5 Generic Megaraid Controllers Capabilities

Orabug: 26096381

The Megaraid driver has to support the SAS3.5 Generic Megaraid Controllers Firmware functionality.

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9581ebebbe351d99579e8701e238c2771ccdae93)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:48 +0000 (18:20 -0500)]
scsi: megaraid_sas: Dynamic Raid Map Changes for SAS3.5 Generic Megaraid Controllers

Orabug: 26096381

SAS3.5 Generic Megaraid Controllers FW will support new dynamic RaidMap to have different
sizes for different number of supported VDs.

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d889344e4e59eb962894ab3b64042dc37a2d8b39)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:47 +0000 (18:20 -0500)]
scsi: megaraid_sas: SAS3.5 Generic Megaraid Controllers Fast Path for RAID 1/10 Writes

Orabug: 26096381

To improve RAID 1/10 Write performance, OS drivers need to issue the
required Write IOs as Fast Path IOs (after the appropriate checks
allowing Fast Path to be used) to the appropriate physical drives
(translated from the OS logical IO) and wait for all Write IOs to complete.

Design: A write IO on RAID volume will be examined if it can be sent in
Fast Path based on IO size and starting LBA and ending LBA falling on to
a Physical Drive boundary. If the underlying RAID volume is a RAID 1/10,
driver issues two fast path write IOs one for each corresponding physical
drive after computing the corresponding start LBA for each physical drive.
Both write IOs will have the same payload and are posted to HW such that
replies land in the same reply queue.

If there are no resources available for sending two IOs, driver will send
the original IO from SCSI layer to RAID volume through the Firmware.

Based on PCI bandwidth and write payload, every second this feature is
enabled/disabled.

When both IOs are completed by HW, the resources will be released
and SCSI IO completion handler will be called.
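
The completion rule in the design above can be sketched in userspace as
a small counter. This is an illustration under assumptions, not the
driver's implementation: struct r1_write and both functions are
hypothetical, and free_frames stands in for whatever resource check the
driver performs.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one RAID 1 write that is split into two
 * fast path IOs, one per mirror arm. */
struct r1_write {
    int  pending;     /* child IOs not yet completed by HW */
    bool completed;   /* set when the SCSI completion would be called */
};

/* Try to start the split write. When two frames are not available,
 * return false so the caller can send the original IO via firmware. */
static bool r1_write_start(struct r1_write *w, int free_frames)
{
    if (free_frames < 2)
        return false;
    w->pending = 2;
    w->completed = false;
    return true;
}

/* Called once per child completion. The parent completes only after
 * both children have finished. */
static void r1_child_done(struct r1_write *w)
{
    if (w->pending > 0 && --w->pending == 0)
        w->completed = true;
}
```

A real driver would also need to handle the two completions racing on
different CPUs; posting both IOs so that replies land in the same reply
queue, as the text describes, is one way to keep that simple.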

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 69c337c0f8d74d71e085efa8869be9fc51e5962b)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:46 +0000 (18:20 -0500)]
scsi: megaraid_sas: SAS3.5 Generic Megaraid Controllers Stream Detection and IO Coalescing

Orabug: 26096381

Detect sequential write IOs and pass a hint that each is part of a
sequential stream, to help HBA firmware do full stripe writes. For read
IOs on certain RAID volumes, such as Read Ahead volumes, this helps the
driver send them to firmware even if the IOs could potentially be sent
to hardware directly (called fast path), bypassing firmware.

Design: 8 streams are maintained per RAID volume as per the combined
firmware/driver design. When no stream is detected, the LRU stream
is used for the next potential stream and the LRU/MRU map is updated to
make it the MRU stream. Every time a stream is detected, the MRU map
is updated to make the current stream the MRU stream.
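
A simplified userspace sketch of the LRU/MRU bookkeeping described in
the design. It is an assumption-laden illustration: the structure, the
exact-match test on the next expected LBA, and all names are
hypothetical, and the driver's real map encoding is not shown in this
log.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_STREAMS 8   /* streams tracked per RAID volume, per the text */

struct stream_tracker {
    uint64_t next_lba[NUM_STREAMS]; /* LBA that would continue each stream */
    uint8_t  order[NUM_STREAMS];    /* stream ids: order[0] MRU, last LRU */
};

static void tracker_init(struct stream_tracker *t)
{
    for (int i = 0; i < NUM_STREAMS; i++) {
        t->next_lba[i] = UINT64_MAX;  /* sentinel: no stream recorded yet */
        t->order[i] = (uint8_t)i;
    }
}

/* Move the stream at position pos of the order array to the MRU slot. */
static void make_mru(struct stream_tracker *t, int pos)
{
    uint8_t id = t->order[pos];

    for (int i = pos; i > 0; i--)
        t->order[i] = t->order[i - 1];
    t->order[0] = id;
}

/* Record one IO. Returns true when it continues a known stream. */
static bool track_io(struct stream_tracker *t, uint64_t lba, uint32_t blocks)
{
    for (int pos = 0; pos < NUM_STREAMS; pos++) {
        uint8_t id = t->order[pos];

        if (t->next_lba[id] == lba) {
            t->next_lba[id] = lba + blocks;
            make_mru(t, pos);
            return true;
        }
    }
    /* No match: reuse the LRU stream for the next potential stream. */
    t->next_lba[t->order[NUM_STREAMS - 1]] = lba + blocks;
    make_mru(t, NUM_STREAMS - 1);
    return false;
}
```

A return value of true is the point at which a driver could attach the
sequential-stream hint to the IO.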

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fdd84e2514b0157219720cf8f3f55757938a39cd)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:45 +0000 (18:20 -0500)]
scsi: megaraid_sas: EEDP Escape Mode Support for SAS3.5 Generic Megaraid Controllers

Orabug: 26096381

An UNMAP command on a PI formatted device will leave the Logical Block Application
Tag and Logical Block Reference Tag as all F's (for those LBAs that are unmapped).
To avoid IO errors if those LBAs are subsequently read before they are written with
valid tag fields, the MPI SCSI IO requests need to set the EEDPFlags element EEDP
Escape Mode field, Bits [7:6] appropriately.  A value of 2 should be set to disable
all PI checks if the Logical Block Application Tag is 0xFFFF for PI types 1 and 2.
A value of 3 should be set to disable all PI checks if the Logical Block Application
Tag is 0xFFFF and the Logical Block Reference Tag is 0xFFFFFFFF for PI type 3.
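
The bit manipulation described above can be sketched as follows. The
macro and function names are hypothetical, and treating any other PI
type as "leave the field clear" is an assumption of this sketch; only
the field position (bits [7:6]) and the values 2 and 3 come from the
text.

```c
#include <stdint.h>

#define EEDP_ESCAPE_SHIFT 6                          /* field is bits [7:6] */
#define EEDP_ESCAPE_MASK  (0x3u << EEDP_ESCAPE_SHIFT)

/* Escape mode value for a protection information (PI) type: 2 for PI
 * types 1 and 2, 3 for PI type 3. Other types get 0 in this sketch. */
static unsigned escape_mode_for_pi_type(int pi_type)
{
    switch (pi_type) {
    case 1:
    case 2:
        return 2;
    case 3:
        return 3;
    default:
        return 0;
    }
}

/* Place the escape mode into bits [7:6] of an EEDPFlags value while
 * preserving all other bits. */
static uint16_t set_escape_mode(uint16_t eedp_flags, int pi_type)
{
    uint16_t mode =
        (uint16_t)(escape_mode_for_pi_type(pi_type) << EEDP_ESCAPE_SHIFT);

    return (uint16_t)((eedp_flags & ~EEDP_ESCAPE_MASK) | mode);
}
```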

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 45d446038c7b93c40b2fe5ba0e95380f19e0493e)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:44 +0000 (18:20 -0500)]
scsi: megaraid_sas: 128 MSIX Support

Orabug: 26096381

SAS3.5 Generic Megaraid based Controllers will have the support for 128 MSI-X vectors,
resulting in the need to support 128 reply queues

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2493c67e518c772a573c3b1ad02e7ced5b53f6ca)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
drivers/scsi/megaraid/megaraid_sas_base.c

8 years agoscsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:43 +0000 (18:20 -0500)]
scsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers

Orabug: 26096381

This patch contains new PCI device IDs for SAS3.5 Generic Megaraid Controllers.

Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 45f4f2eb3da3cbff02c3d77c784c81320c733056)
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agoscsi: sd: Check for unaligned partial completion
Damien Le Moal [Wed, 1 Mar 2017 08:27:00 +0000 (17:27 +0900)]
scsi: sd: Check for unaligned partial completion

Commit <f2e767bb5d6e> ("mpt3sas: Force request partial completion
alignment") was not considering the case of commands not operating on
logical block size units (e.g. REQ_OP_ZONE_REPORT and its 64B aligned
partial replies). In this case, forcing alignment of resid to the device
logical block size can break the command result, e.g. in the case of
REQ_OP_ZONE_REPORT, the exact number of zones reported by the device.

Move the partial completion alignment check of mpt3sas to a generic
implementation in sd_done(). The check is added within the default
section of the initial req_op() switch case so that the report and reset
zone commands are ignored. In addition, as sd_done() is not called for
passthrough requests, resid corrections are not done as intended by the
initial mpt3sas patch.
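
A minimal userspace sketch of such an alignment check, assuming the residual is rounded up to the logical block size and capped at the transfer length. The function and parameter names are hypothetical, and the rounding direction is an assumption rather than something stated in this message.

```c
#include <assert.h>

/*
 * Align a residual byte count to the logical block size.
 * resid:       bytes the device reported as not transferred
 * bufflen:     total transfer length of the command, in bytes
 * sector_size: logical block size, assumed to be a power of two
 */
static unsigned int align_resid(unsigned int resid, unsigned int bufflen,
				unsigned int sector_size)
{
	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

	if (resid & (sector_size - 1)) {
		/* Round up so that only whole blocks count as completed. */
		resid = (resid + sector_size - 1) & ~(sector_size - 1);
		if (resid > bufflen)
			resid = bufflen;
	}
	return resid;
}
```

Commands that do not work in logical block units, such as a zone report, would simply never call this helper.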

Fixes: f2e767bb5d6e ("mpt3sas: Force request partial completion alignment")
Cc: <stable@vger.kernel.org> # v4.10
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c46f09175dabd5dd6a1507f36250bfa734a0156e
 change sd.c modification to avoid req_op conversion in current kernel)

Orabug: 26178369
Signed-off-by: shan.hai@oracle.com
8 years agoPCI/AER: include header file
Sudip Mukherjee [Wed, 23 Dec 2015 15:35:26 +0000 (21:05 +0530)]
PCI/AER: include header file

We are seeing a build failure with sparc allmodconfig, with the error:

drivers/nvme/host/pci.c:15:0:
include/linux/aer.h: In function 'pci_enable_pcie_error_reporting':
include/linux/aer.h:49:10: error: 'EINVAL' undeclared (first use in this function)

The file aer.h is using the error values but they are defined in
errno.h. Include errno.h so that we have the definitions of the error
codes.
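
The shape of the fix can be sketched as follows. The stub and type names are hypothetical stand-ins; the point is only that a header whose inline stubs return error codes has to include errno.h itself instead of relying on its includers.

```c
#include <errno.h>	/* supplies EINVAL for the stub below */

struct pci_dev_sketch;	/* opaque, hypothetical stand-in for struct pci_dev */

/* Stub used when the feature is compiled out: always reports -EINVAL. */
static inline int enable_error_reporting_stub(struct pci_dev_sketch *dev)
{
	(void)dev;
	return -EINVAL;
}
```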

Fixes: a0a3408ee614 ("NVMe: Add pci error handlers")
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: reverse IO direction for VUC command code F7
Ashok Vairavan [Mon, 13 Mar 2017 18:34:55 +0000 (11:34 -0700)]
NVMe: reverse IO direction for VUC command code F7

Orabug: 25258071

Samsung uses a D2H command with Vendor Unique Command (VUC) code F7
(the 0th bit of which is 1) for retrieving a memory dump. In UEK4,
bit 0 of a D2H command code has to be 0. Because of this violation,
nvmecli is unable to do crash and memory dumps in UEK4.

As the Samsung firmware can only understand VUC command code F7,
the IO direction is reversed for this vendor command code to
retrieve memory and crash dump.
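
A sketch of the direction check with this exception, assuming direction is derived from bit 0 of the opcode as described above. The macro and function names are hypothetical, and a real implementation may also need to match the vendor before applying the exception.

```c
#include <stdbool.h>
#include <stdint.h>

#define VUC_MEMORY_DUMP_OPCODE 0xF7u	/* hypothetical macro name */

/*
 * Return true if the command transfers data from host to device.
 * Bit 0 of the opcode normally marks a host-to-device transfer; opcode
 * F7 is treated as device-to-host even though its bit 0 is set.
 */
static bool cmd_is_write(uint8_t opcode)
{
	if (opcode == VUC_MEMORY_DUMP_OPCODE)
		return false;
	return (opcode & 1) != 0;
}
```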

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-By: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: factor out a add nvme_is_write helper
Christoph Hellwig [Mon, 6 Jun 2016 21:20:49 +0000 (23:20 +0200)]
nvme: factor out a add nvme_is_write helper

Centralize the check of whether a given NVMe command reads or writes data.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7a5abb4b48570c3552e33ff4c72ae1e8dac3ba15)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: allow for size limitations from transport drivers
Christoph Hellwig [Mon, 6 Jun 2016 21:20:48 +0000 (23:20 +0200)]
nvme: allow for size limitations from transport drivers

Some transport drivers may have a lower maximum transfer size than
the controller. So allow the transport to set it in the
controller's max_hw_sectors.
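
One way to combine the two limits, sketched in userspace C. The helper name is hypothetical, and treating 0 as "no limit set" is an assumption made for this illustration.

```c
#include <stdint.h>

/*
 * Pick the effective max_hw_sectors from a transport-provided limit and
 * the limit derived from the controller. A value of 0 means "not set".
 */
static uint32_t effective_max_hw_sectors(uint32_t transport_limit,
					 uint32_t controller_limit)
{
	if (transport_limit == 0)
		return controller_limit;
	if (controller_limit == 0)
		return transport_limit;
	return transport_limit < controller_limit ?
		transport_limit : controller_limit;
}
```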

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a229dbf61e03b70d98f5ed46f476d6369870a6ab)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: add constants for PSDT and FUSE values
James Smart [Mon, 6 Jun 2016 21:20:47 +0000 (23:20 +0200)]
nvme.h: add constants for PSDT and FUSE values

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3972be23bd2d2bcfaa44595a260a371cd9218872)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: add AER constants
Christoph Hellwig [Mon, 6 Jun 2016 21:20:46 +0000 (23:20 +0200)]
nvme.h: add AER constants

Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 79f370eac63796a8933b210bca02f006ba32d22e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: add NVM command set SQE/CQE size defines
Christoph Hellwig [Mon, 6 Jun 2016 21:20:45 +0000 (23:20 +0200)]
nvme.h: add NVM command set SQE/CQE size defines

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69cd27e2511616f9b402d1bad4c49f91aa7411a3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: Add get_log_page command strucure
Armen Baloyan [Mon, 6 Jun 2016 21:20:44 +0000 (23:20 +0200)]
nvme.h: Add get_log_page command strucure

Add get_log_page command structure and a corresponding entry in
nvme_command union

Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed--by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 725b358836ed038d7d8eafef86330b3a0b3f9c2f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme.h: add RTD3R, RTD3E and OAES fields
Christoph Hellwig [Mon, 6 Jun 2016 21:20:43 +0000 (23:20 +0200)]
nvme.h: add RTD3R, RTD3E and OAES fields

These have been added in NVMe 1.2 and we'll need at least oaes for the
NVMe target driver.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 14e974a84e831bf9a44495c7256a6846e7f77630)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Only release requested regions
Johannes Thumshirn [Tue, 10 May 2016 13:14:28 +0000 (15:14 +0200)]
NVMe: Only release requested regions

The NVMe driver only requests the PCIe device's memory regions but releases
all possible regions (including any I/O regions). This leads to a stale
warning entry in dmesg about freeing nonexistent resources.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit edb50a5403d2e2d2b2b63a8365c4378c9c300ed6)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix removal in case of active namespace list scanning method
Sunad Bhandary [Fri, 27 May 2016 10:29:43 +0000 (15:59 +0530)]
NVMe: Fix removal in case of active namespace list scanning method

In case of the active namespace list scanning method, a namespace that
is detached is not removed from the host if it was the last entry in
the list. Fix this by adding a scan to validate namespaces greater than
the value of prev.

This also handles the case of removing namespaces whose IDs exceed
the device's reported number of namespaces.
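
The scan can be modelled in a few lines of userspace C. This is a sketch under assumptions: the host's view is a simple table indexed by namespace ID, the active list is ascending and zero-terminated, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NSID 16u	/* arbitrary table size for the sketch */

/*
 * known[nsid] says whether the host currently has namespace nsid.
 * active is the device's ascending, zero-terminated active namespace list.
 */
static void rescan(bool known[MAX_NSID + 1], const uint32_t *active)
{
	uint32_t prev = 0;

	for (size_t i = 0; active[i] != 0; i++) {
		uint32_t nsid = active[i];

		if (nsid > MAX_NSID)
			break;
		known[nsid] = true;
		/* IDs skipped between two list entries are no longer active. */
		while (++prev < nsid)
			known[prev] = false;
	}

	/* The fix: also drop every namespace above the last listed ID. */
	for (uint32_t nsid = prev + 1; nsid <= MAX_NSID; nsid++)
		known[nsid] = false;
}
```

Without the final loop, a detached namespace that used to be the last entry in the list would never be visited and would stay on the host.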

Signed-off-by: Sunad Bhandary S <sunad.s@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 47b0e50ac724d97c392f771bb46f11d9d1575242)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Implement namespace list scanning
Keith Busch [Thu, 22 Oct 2015 21:45:06 +0000 (15:45 -0600)]
NVMe: Implement namespace list scanning

The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is a more efficient way to discover which
namespace identifiers are active on a controller, providing a potentially
significant improvement in scan time for controllers with sparsely
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation.  To be relaxed
 later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Don't unmap controller registers on reset
Keith Busch [Mon, 13 Mar 2017 16:27:07 +0000 (09:27 -0700)]
NVMe: Don't unmap controller registers on reset

Unmapping the registers on reset or shutdown is not necessary. Keeping
the mapping simplifies reset handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b00a726a9fd82ddd4c10344e46f0d371e1674303)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata
Ashok Vairavan [Thu, 8 Dec 2016 00:10:38 +0000 (16:10 -0800)]
NVMe: reduce admin queue depth as workaround for Samsung EPIC SQ errata

Orabug: 25186219

PCIe analyzer tracing by Oracle and Samsung revealed an erratum in Samsung's
firmware for EPIC SSDs where invalid completion entries in the admin queue
and IO queue can occur when the queues straddle an 8MB DMA address boundary.

This patch limits admin queue depth to 64 for EPIC SSDs.

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
8 years agonvme: Limit command retries
Keith Busch [Mon, 13 Mar 2017 16:06:05 +0000 (09:06 -0700)]
nvme: Limit command retries

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
was exceeded, which takes an unnecessarily long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
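
A minimal sketch of the retry accounting described above. The struct, the function names and the way the limit is stored are hypothetical; only the default of 5 and the meaning of the 'retries' counter come from this message.

```c
#include <stdbool.h>

/* Tunable limit; defaults to 5 as in the description above. */
static unsigned int max_retries = 5;

struct cmd_sketch {
	unsigned int retries;	/* retries already attempted, not allowed */
};

/* A failed command is retried only without DNR and below the limit. */
static bool cmd_needs_retry(const struct cmd_sketch *cmd, bool dnr)
{
	return !dnr && cmd->retries < max_retries;
}

/* Record one retry attempt before the command is requeued. */
static void cmd_note_retry(struct cmd_sketch *cmd)
{
	cmd->retries++;
}
```

A real driver would also keep the command timeout as an upper bound; that part is left out of the sketch.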

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f80ec966c19b78af4360e26e32e1ab775253105f)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: reduce queue depth as workaround for Samsung EPIC SQ errata
Ashok Vairavan [Wed, 23 Nov 2016 22:31:21 +0000 (14:31 -0800)]
NVMe: reduce queue depth as workaround for Samsung EPIC SQ errata

Orabug: 25138123

Oracle discovered that, with the default SQ size of 1024 entries (64KB),
the NVMe driver gets SQ completion errors on Samsung EPIC NVMe SSDs. These
eventually lead to the device being reset or taken out of the PCI bus tree,
or to kernel panics.

PCIe analyzer tracing by Oracle and Samsung revealed an erratum in Samsung's
firmware for EPIC SSDs where these invalid completion entries can occur
when the queues straddle an 8MB DMA address boundary.

This patch works around the erratum by detecting these specific devices and
limiting their descriptor queue depth to 64.  This is only for the Samsung
NVMe controllers used in Oracle X-series servers.
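
The arithmetic behind the workaround can be sketched as follows. The entry size of 64 bytes follows from the figures above (1024 entries in 64KB); the helper names are hypothetical, and the remark about alignment below is an assumption, not a statement from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SQ_ENTRY_SIZE	64u			/* bytes per SQ entry */
#define DMA_BOUNDARY	(8u * 1024 * 1024)	/* 8MB */

/* Size in bytes of a submission queue with the given depth. */
static uint64_t sq_bytes(unsigned int depth)
{
	return (uint64_t)depth * SQ_ENTRY_SIZE;
}

/* True if [dma_addr, dma_addr + len) crosses an 8MB boundary. */
static bool straddles_boundary(uint64_t dma_addr, uint64_t len)
{
	if (len == 0)
		return false;
	return (dma_addr / DMA_BOUNDARY) !=
	       ((dma_addr + len - 1) / DMA_BOUNDARY);
}
```

A depth of 64 gives a 4KB queue. Assuming the queue memory is aligned to at least 4KB, such a queue cannot cross an 8MB boundary, whereas a 64KB queue that starts 32KB below a boundary does.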

There was no noticeable performance impact of reducing queue depths to 64
for these Samsung drives, Oracle X6-2 server, and Oracle VM Server 3.4.2.

Signed-off-by: Kyle Fortin <kyle.fortin@oracle.com>
Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Create discard zero quirk white list
Keith Busch [Fri, 4 Mar 2016 20:15:17 +0000 (13:15 -0700)]
NVMe: Create discard zero quirk white list

The NVMe specification does not require that discarded blocks return zeroes
on read, but provides that behavior as a possibility. Some applications use
an SSD more efficiently if reads of discarded blocks are deterministically
zero, based on the "discard_zeroes_data" queue attribute.

There is no specification-defined way to determine device behavior on
discarded blocks, so the driver always left the queue setting disabled. We
can only know behavior based on individual device models, so this patch
adds a flag to the NVMe "quirk" list that vendors may set if they know
their controller works that way. The patch also sets the new flag for one
such known device.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 08095e70783f1d8296f858d37a9e1878f5da0623)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: use UINT_MAX for max discard sectors
Minfei Huang [Tue, 17 May 2016 07:58:41 +0000 (15:58 +0800)]
nvme: use UINT_MAX for max discard sectors

It's more elegant to use UINT_MAX to represent the max value of
type unsigned int. So replace the actual value by using this define.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bd0fc2884ca4d1516da1aa5cf44385e24dc23c29)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move nvme_cancel_request() to common code
Ming Lin [Wed, 18 May 2016 21:05:02 +0000 (14:05 -0700)]
nvme: move nvme_cancel_request() to common code

So it can be used by the fabrics driver as well.

Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Keith Busch <keith.bsuch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c55a2fd4bb16bcdd8c42e3d64fccd326416b7492)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: update and rename nvme_cancel_io to nvme_cancel_request
Ming Lin [Wed, 18 May 2016 21:05:01 +0000 (14:05 -0700)]
nvme: update and rename nvme_cancel_io to nvme_cancel_request

nvme_cancel_io is a bit confusing (given the distinction of io/admin),
so rename it to nvme_cancel_request.

And update it a bit to pass in struct nvme_ctrl, so it can be used
by the Fabrics driver as well.

Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Keith Busch <keith.bsuch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1958e6534a2d4ebb2dfcd0b3f16ff8e277a5b0c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoblk-mq: Export tagset iter function
Sagi Grimberg [Tue, 3 Jan 2017 16:40:50 +0000 (08:40 -0800)]
blk-mq: Export tagset iter function

It's useful to iterate over all the active tags in cases
where we need to fail all of the queues' I/O.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
[hch: carefully check for valid tagsets]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e0489487ec9cd79ee1fa0dc5d3789c08b0e51a2c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Add device ID's with stripe quirk
Keith Busch [Mon, 2 May 2016 21:14:24 +0000 (15:14 -0600)]
NVMe: Add device ID's with stripe quirk

Adds two Intel controllers that have the "stripe" quirk.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 99466e708ddce8904c8635c213f2deb523ef4fb9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Short-cut removal on surprise hot-unplug
Keith Busch [Thu, 12 May 2016 14:37:14 +0000 (08:37 -0600)]
NVMe: Short-cut removal on surprise hot-unplug

This patch adds a new state that when set has the core automatically
kill request queues prior to removing namespaces.

If the PCI device is not present at the time the nvme driver's remove is
called, we can kill all IO queues immediately instead of waiting for
the watchdog thread to do that at its polling interval. This improves
scenarios where multiple hot plug events occur at the same time since
it doesn't block the pci enumeration for as long.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0ff9d4e1a284a9282a049bf064f123e27f838907)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Allow user initiated rescan
Keith Busch [Fri, 29 Apr 2016 21:45:18 +0000 (15:45 -0600)]
NVMe: Allow user initiated rescan

This exposes ioctl and sysfs methods a user can invoke to request the
driver rescan a controller and its namespaces. This is less harsh than
doing a controller reset, which temporarily halts all IO, just to
surface a newly attached namespace.

This is mainly useful for controllers that implement the namespace
management command, but do not support the namespace notify change
asynchronous event notification.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9ec3bb2f994bda9c8817856fdcbfaebe8f62fbd3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Reduce driver log spamming
Keith Busch [Mon, 4 Apr 2016 21:07:41 +0000 (15:07 -0600)]
NVMe: Reduce driver log spamming

Reduce error logging when no corrective action is required.

Suggessted-by: Chris Petersen <cpetersen@fb.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d011fb3164e8694d7839f10a497f8ab6c660149a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Unbind driver on failure
Keith Busch [Mon, 28 Mar 2016 22:03:21 +0000 (16:03 -0600)]
NVMe: Unbind driver on failure

Instead of removing the PCI device from the kernel's topology on
controller failure, this patch simply requests unbinding the device
from the driver. This avoids concurrently running pci removal with the
hot plug event, which has been reported to be problematic when multiple
surprise events occur near simultaneously.

The other benefit is that we will have PCI config and memory space
available to poke around for debugging a failed controller, assuming
the device was not physically removed.

The downside occurs if the platform and/or kernel do not support any
type of surprise hot removal. The device will remain visible through
sysfs (and therefore lspci), and some manual work is necessary to get
the logical topology corrected. But if your platform and/or kernel don't
support surprise removal, you probably shouldn't be doing that anyway.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 921920ab32f290dafdb0359024d4587897712728)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Delete only created queues
Keith Busch [Fri, 6 May 2016 17:50:52 +0000 (11:50 -0600)]
NVMe: Delete only created queues

Use the online queue count instead of the number of allocated queues. The
controller should just return an invalid queue identifier error to the
commands if a queue wasn't created. While it's not harmful, it's still
not correct.

Reported-by: Saar Gross <saar@annapurnalabs.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 014a0d609eb4721d1e416cf10da2d5602f9b34d5)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: Fix reset/remove race
Keith Busch [Fri, 8 Apr 2016 22:11:02 +0000 (16:11 -0600)]
NVMe: Fix reset/remove race

This fixes a scenario where the device is present and being reset, but a
request to unbind the driver occurs.

A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:

  modprobe nvme && modprobe -r nvme

To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.

The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to clean up requests
that may be stuck during namespace removal.

[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 87c32077819c695cbc5ab00226a28010cd5806c3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: fix nvme_ns_remove() deadlock
Ming Lin [Mon, 25 Apr 2016 21:20:19 +0000 (14:20 -0700)]
nvme: fix nvme_ns_remove() deadlock

On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In case of namespace detach/delete command,
nvme_ns_remove function deadlocks trying to acquire the already held
lock.

All callers of nvme_ns_remove(), except nvme_remove_namespaces(),
already hold namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and acquiring it in
nvme_remove_namespaces().
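
The locking convention can be modelled in single-threaded C, with a flag standing in for the mutex. All names here are hypothetical, and an assertion failure stands in for what would be a deadlock on a real non-recursive mutex.

```c
#include <assert.h>
#include <stdbool.h>

struct ctrl_sketch {
	bool namespaces_mutex_held;	/* models a non-recursive mutex */
	int nr_namespaces;
};

static void ns_lock(struct ctrl_sketch *c)
{
	/* Taking the lock twice on the same path models the deadlock. */
	assert(!c->namespaces_mutex_held);
	c->namespaces_mutex_held = true;
}

static void ns_unlock(struct ctrl_sketch *c)
{
	c->namespaces_mutex_held = false;
}

/* After the fix: the caller must already hold the lock. */
static void ns_remove(struct ctrl_sketch *c)
{
	assert(c->namespaces_mutex_held);
	c->nr_namespaces--;
}

/* The one caller that did not hold the lock now takes it itself. */
static void remove_namespaces(struct ctrl_sketch *c)
{
	ns_lock(c);
	while (c->nr_namespaces > 0)
		ns_remove(c);
	ns_unlock(c);
}

/* Scan path: holds the lock while removing detached namespaces. */
static void scan_work(struct ctrl_sketch *c, int detached)
{
	ns_lock(c);
	while (detached-- > 0 && c->nr_namespaces > 0)
		ns_remove(c);
	ns_unlock(c);
}
```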

Reported-by: Sunad Bhandary S <sunad.s@samsung.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b7b9c2278752e37dc7ae918cda823aa2a078e03b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: switch to RCU freeing the namespace
Ming Lin [Mon, 25 Apr 2016 21:20:18 +0000 (14:20 -0700)]
nvme: switch to RCU freeing the namespace

Switch to RCU freeing the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues would
be able to get away with only a RCU read side critical section.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 32f0c4afb4363e31dad49202f1554ba591d649f2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agoNVMe: correct comment for offset enum of controller registers in nvme.h
Wang Sheng-Hui [Wed, 27 Apr 2016 12:10:16 +0000 (20:10 +0800)]
NVMe: correct comment for offset enum of controller registers in nvme.h

Section 3.1 of the NVMe 1.2a specification describes the controller
registers and their offsets.

Some of these descriptions were miscopied into the comments in nvme.h. Correct them.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5b714ad395803a6aa91793b9e52a81b176b8ba9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: add helper nvme_cleanup_cmd()
Ming Lin [Mon, 25 Apr 2016 21:33:20 +0000 (14:33 -0700)]
nvme: add helper nvme_cleanup_cmd()

This moves command cleanup into nvme.h, where fabrics drivers can
also use it.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6904242db1ac07403c331b18796f6c2bf5382aec)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move AER handling to common code
Christoph Hellwig [Tue, 26 Apr 2016 11:52:00 +0000 (13:52 +0200)]
nvme: move AER handling to common code

The transport driver still needs to do the actual submission, but all the
higher level code can be shared.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f866fc4282a81673ef973ad54c68235a3263b42e)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: move namespace scanning to core
Christoph Hellwig [Fri, 30 Dec 2016 21:10:00 +0000 (13:10 -0800)]
nvme: move namespace scanning to core

Move the scan work item and surrounding code to the common code.  For now
we need a new finish_scan method to allow the PCI driver to set the
irq affinity hints, but I have plans in the works to obsolete this as well.

Note that this moves the namespace scanning from nvme_wq to the system
workqueue, but as we don't rely on namespace scanning to finish from reset
or I/O this should be fine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5955be2144b3b56182e2175e7e3d2ddf27fb485d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: tighten up state check for namespace scanning
Christoph Hellwig [Tue, 26 Apr 2016 11:51:58 +0000 (13:51 +0200)]
nvme: tighten up state check for namespace scanning

We should only be scanning namespaces if the controller is live.  Currently
we call the function just before setting it live, so fix the code up to
move the call to nvme_queue_scan to just below the state change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92911a55d42084cd285250c275d9f238783638c2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: introduce a controller state machine
Christoph Hellwig [Tue, 26 Apr 2016 11:51:57 +0000 (13:51 +0200)]
nvme: introduce a controller state machine

Replace the ad hoc flags in the PCI driver with a state machine in the
core code.  Based on code from Sagi Grimberg for the Fabrics driver.
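
A sketch of what a small controller state machine can look like. The state names and the transition table are illustrative assumptions, not taken from the patch, and the locking a real driver needs around state changes is left out.

```c
#include <stdbool.h>

/* Hypothetical states for illustration. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING };

struct ctrl_sketch {
	enum ctrl_state state;
};

/*
 * Move to new_state only if the transition from the current state is
 * allowed. Returns true if the state was changed.
 */
static bool ctrl_change_state(struct ctrl_sketch *ctrl,
			      enum ctrl_state new_state)
{
	enum ctrl_state old = ctrl->state;
	bool ok = false;

	switch (new_state) {
	case CTRL_LIVE:
		ok = (old == CTRL_NEW || old == CTRL_RESETTING);
		break;
	case CTRL_RESETTING:
		ok = (old == CTRL_NEW || old == CTRL_LIVE);
		break;
	case CTRL_DELETING:
		ok = (old == CTRL_LIVE || old == CTRL_RESETTING);
		break;
	default:
		break;
	}

	if (ok)
		ctrl->state = new_state;
	return ok;
}
```

Compared with independent flags, a single state variable with an explicit transition check makes invalid combinations, such as starting a reset on a controller that is being deleted, fail in one place.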

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bb8d261e088811ef2b564d745afcd1633428010a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years agonvme: remove the io_incapable method
Christoph Hellwig [Tue, 26 Apr 2016 11:51:56 +0000 (13:51 +0200)]
nvme: remove the io_incapable method

It's unused since "NVMe: Move error handling to failed reset handler".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 04a934d4c7251e6458a7898c2b4d6c2da29b132c)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: nvme_core_exit() should do cleanup in the reverse order as nvme_core_init does
Wang Sheng-Hui [Thu, 28 Apr 2016 08:19:31 +0000 (16:19 +0800)]
NVMe: nvme_core_exit() should do cleanup in the reverse order as nvme_core_init does

nvme_core_init does:
    1) register_blkdev
    2) __register_chrdev
    3) class_create

nvme_core_exit should do cleanup in the reverse order.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 23bd63ceea30878758c303baaf9f8e28f299c578)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
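
The reverse-order rule from the commit above can be sketched in userspace C. The numbered steps stand in for register_blkdev, __register_chrdev and class_create; the trace buffer, `fail_step` and all function names are hypothetical helpers for this example only.

```c
#include <stddef.h>

static char trace[32];	/* records steps, e.g. "+1+2+3-3-2-1" */
static size_t trace_len;
static int fail_step;	/* 0 = no failure; N = make setup step N fail */

static void record(char tag, int step)
{
	trace[trace_len++] = tag;
	trace[trace_len++] = (char)('0' + step);
	trace[trace_len] = '\0';
}

static int setup(int step)
{
	if (step == fail_step)
		return -1;
	record('+', step);
	return 0;
}

static void teardown(int step)
{
	record('-', step);
}

/* Set up three resources; on failure undo only what already succeeded. */
static int core_init(void)
{
	int ret;

	ret = setup(1);		/* stands in for register_blkdev */
	if (ret)
		return ret;
	ret = setup(2);		/* stands in for __register_chrdev */
	if (ret)
		goto undo_step1;
	ret = setup(3);		/* stands in for class_create */
	if (ret)
		goto undo_step2;
	return 0;

undo_step2:
	teardown(2);
undo_step1:
	teardown(1);
	return ret;
}

/* Tear down in the exact reverse of the setup order. */
static void core_exit(void)
{
	teardown(3);
	teardown(2);
	teardown(1);
}
```

Keeping the exit path a mirror image of the init path means a later resource never outlives an earlier one that it may depend on.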
8 years ago NVMe: Fix check_flush_dependency warning
Keith Busch [Wed, 27 Apr 2016 21:51:18 +0000 (15:51 -0600)]
NVMe: Fix check_flush_dependency warning

If the controller fails and is degraded after a reset, we need to kill
off all request queues before removing the inaccessible namespaces. This
will prevent del_gendisk from syncing dirty data, which we can't do
from a WQ_MEM_RECLAIM work queue.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 3b24774e1fb90a40836e96e39a851a774679efff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: small typo in section BLK_DEV_NVME_SCSI of host/Kconfig
Wang Sheng-Hui [Wed, 20 Apr 2016 02:04:32 +0000 (10:04 +0800)]
NVMe: small typo in section BLK_DEV_NVME_SCSI of host/Kconfig

"as well as " is mistyped as "as well a " in section
"config BLK_DEV_NVME_SCSI"

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b31356dfde571d925768783f1bb63ca8e156d0b3)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: fix cntlid type
Christoph Hellwig [Sat, 16 Apr 2016 18:57:58 +0000 (14:57 -0400)]
nvme: fix cntlid type

Controller IDs in NVMe are unsigned 16-bit types.  In the Fabrics driver we
actually pass ctrl->id by reference, so we need it to have the correct type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 76e3914ae51714b0535c38d9472d89124e0b6b96)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: Avoid reset work on watchdog timer function during error recovery
Guilherme G. Piccoli [Wed, 13 Apr 2016 14:08:20 +0000 (11:08 -0300)]
nvme: Avoid reset work on watchdog timer function during error recovery

This patch adds a check to the nvme_watchdog_timer() function to avoid
calling reset_work() while an error recovery process is ongoing on the
controller. The check is made by looking at the pci_channel_offline()
result.

If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).

This patch also splits the huge condition expression in
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c875a7093f0479215cf9bf51356d7638f2ec5746)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
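
A rough userspace model of the auxiliary check described in the commit above might look like the following. The struct and its fields are hypothetical stand-ins (for example, `channel_offline` models a true result from pci_channel_offline()), and the sketch only considers the Controller Fatal Status bit of the CSTS register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSTS_CFS (1u << 1)	/* Controller Fatal Status bit in CSTS */

/* Hypothetical snapshot of the device state the watchdog looks at. */
struct dev_view {
	bool reset_busy;	/* a reset is already queued or running */
	bool channel_offline;	/* PCI error recovery is in progress */
};

/* Decide whether the watchdog should schedule a controller reset. */
static bool should_reset(const struct dev_view *d, uint32_t csts)
{
	/* A reset is already under way; do not stack another one. */
	if (d->reset_busy)
		return false;

	/* Only a controller reporting a fatal status needs a reset. */
	if (!(csts & CSTS_CFS))
		return false;

	/* During PCI error recovery a reset cannot succeed; stay out of the way. */
	if (d->channel_offline)
		return false;

	return true;
}
```

Splitting the condition into early returns gives each reason for skipping the reset its own comment, which is the readability gain the commit message mentions.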
8 years ago nvme: remove dead controllers from a work item
Christoph Hellwig [Thu, 26 Nov 2015 11:35:49 +0000 (12:35 +0100)]
nvme: remove dead controllers from a work item

Compared to the kthread this gives us multiple call prevention for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5c8809e650772be87ba04595a8ccf278bab7b543)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: silence warning about unused 'dev'
Jens Axboe [Tue, 12 Apr 2016 22:11:11 +0000 (16:11 -0600)]
NVMe: silence warning about unused 'dev'

Depending on options, we might not be using dev in nvme_cancel_io():

drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’:
drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable]
  struct nvme_dev *dev = data;
                   ^

So get rid of it, and just cast for the dev_dbg_ratelimited() call.

Fixes: 82b4552b91c4 ("nvme: Use blk-mq helper for IO termination")
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7e19793096994d43d213f440f4bbea926828a727)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: switch to using blk_queue_write_cache()
Jens Axboe [Tue, 12 Apr 2016 21:43:09 +0000 (15:43 -0600)]
NVMe: switch to using blk_queue_write_cache()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7c88cb00f2a26637bade6c62a17d17f31a954e30)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago block: add ability to flag write back caching on a device
Jens Axboe [Tue, 12 Apr 2016 18:32:46 +0000 (12:32 -0600)]
block: add ability to flag write back caching on a device

Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.

This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
(cherry picked from commit 93e9d8e836cb1a9a58b33eb6643bf061c6119ef2)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
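
To make the helper-plus-flag idea from the commit above concrete, here is a small userspace sketch. The flag bit values, the `struct queue` type and both function names are assumptions for this example; they only model the behaviour the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real values are not given in the commit text. */
#define QFLAG_WC  (1u << 0)	/* device has a volatile write back cache */
#define QFLAG_FUA (1u << 1)	/* device supports forced unit access writes */

struct queue {
	unsigned int flags;
};

/* Record the cache mode on the queue, replacing any previous setting. */
static void queue_write_cache(struct queue *q, bool wc, bool fua)
{
	q->flags &= ~(QFLAG_WC | QFLAG_FUA);
	if (wc)
		q->flags |= QFLAG_WC;
	if (fua)
		q->flags |= QFLAG_FUA;
}

/* Text that a sysfs-style attribute could report for the queue. */
static const char *queue_cache_mode(const struct queue *q)
{
	return (q->flags & QFLAG_WC) ? "write back" : "write through";
}
```

A driver would call the helper once it knows whether the device has a volatile write cache, and the attribute text gives user space a way to inspect that decision.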
8 years ago nvme: Use blk-mq helper for IO termination
Sagi Grimberg [Tue, 12 Apr 2016 21:07:15 +0000 (15:07 -0600)]
nvme: Use blk-mq helper for IO termination

blk-mq offers a tagset iterator so let's use that
instead of using nvme_clear_queues.

Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io
as there is no concept of a queue now in this function (we
also lost the print).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 82b4552b91c40626a90a20291aab1137c638b512)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Skip async events for degraded controllers
Keith Busch [Tue, 12 Apr 2016 17:13:11 +0000 (11:13 -0600)]
NVMe: Skip async events for degraded controllers

If the controller is degraded, the driver should stay out of the way so
the user can recover the drive. This patch skips driver initiated async
event requests when the drive is in this state.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 21f033f7c72e9505c46c6555b019b907dc39dfcd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: add helper nvme_setup_cmd()
Ming Lin [Tue, 12 Apr 2016 19:10:14 +0000 (13:10 -0600)]
nvme: add helper nvme_setup_cmd()

This moves nvme_setup_{flush,discard,rw} calls into a common
nvme_setup_cmd() helper. So we can eventually hide all the command
setup in the core module and don't even need to update the fabrics
drivers for any specific command type.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8093f7ca73c1633e458c16a74b51bcc3c94564c4)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
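
The shape of a common setup helper, as described in the commit above, could be sketched as follows. The request and command types are simplified stand-ins invented for this example; only the opcode values follow the NVM command set.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical request and command types. */
enum req_kind { REQ_FLUSH, REQ_DISCARD, REQ_READ, REQ_WRITE };

struct request { enum req_kind kind; };
struct command { uint8_t opcode; };

/* NVM command set opcodes */
#define OP_FLUSH 0x00
#define OP_WRITE 0x01
#define OP_READ  0x02
#define OP_DSM   0x09	/* Dataset Management, used for discard */

static void setup_flush(struct command *c)
{
	c->opcode = OP_FLUSH;
}

static void setup_discard(struct command *c)
{
	c->opcode = OP_DSM;
}

static void setup_rw(const struct request *r, struct command *c)
{
	c->opcode = (r->kind == REQ_WRITE) ? OP_WRITE : OP_READ;
}

/* Single entry point: transports call this instead of the per-type helpers. */
static void setup_cmd(const struct request *r, struct command *c)
{
	memset(c, 0, sizeof(*c));

	switch (r->kind) {
	case REQ_FLUSH:
		setup_flush(c);
		break;
	case REQ_DISCARD:
		setup_discard(c);
		break;
	case REQ_READ:
	case REQ_WRITE:
		setup_rw(r, c);
		break;
	}
}
```

With one entry point, adding a new command type touches only the dispatcher, not every transport driver that builds commands.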
8 years ago block: add offset in blk_add_request_payload()
Ming Lin [Tue, 22 Mar 2016 07:24:44 +0000 (00:24 -0700)]
block: add offset in blk_add_request_payload()

We could kmalloc() the payload, so we need the offset within the page.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 37e58237a16b94fcd2c2d1b7e9c6e1ca661c231b)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: rewrite discard support
Ming Lin [Tue, 22 Mar 2016 07:24:45 +0000 (00:24 -0700)]
nvme: rewrite discard support

This rewrites nvme_setup_discard() with blk_add_request_payload().
It allocates only the necessary amount (16 bytes) for the payload.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 03b5929ebb20457e2fd13a701954efa2b2fb7ded)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
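
The 16-byte figure in the commit above matches the size of one Dataset Management range descriptor. A minimal sketch, assuming a single range per discard, a logical block size of at least 512 bytes, and ignoring the little-endian conversion a real driver performs; the struct and function names are local to this example.

```c
#include <stdint.h>

/* One Dataset Management range descriptor. */
struct dsm_range {
	uint32_t cattr;	/* context attributes, left at 0 here */
	uint32_t nlb;	/* length in logical blocks */
	uint64_t slba;	/* starting logical block address */
};

_Static_assert(sizeof(struct dsm_range) == 16, "one DSM range is 16 bytes");

/*
 * Fill a single range from a request given as a starting 512-byte sector
 * and a length in bytes. lba_shift is log2 of the logical block size and
 * is assumed to be at least 9.
 */
static void fill_dsm_range(struct dsm_range *r, uint64_t sector,
			   uint32_t bytes, unsigned int lba_shift)
{
	r->cattr = 0;
	r->nlb = bytes >> lba_shift;
	r->slba = sector >> (lba_shift - 9);
}
```

For example, a discard of 8192 bytes starting at sector 8 on a namespace with 4096-byte blocks becomes two blocks starting at LBA 1.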
8 years ago nvme: add helper nvme_map_len()
Ming Lin [Tue, 22 Mar 2016 07:24:43 +0000 (00:24 -0700)]
nvme: add helper nvme_map_len()

The helper returns the number of bytes that need to be mapped
using PRPs/SGL entries.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 58b45602751ddf16e57170656670aa5a8f78eeca)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
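
A plausible reading of the helper in the commit above is that a discard maps only its small payload while other requests map their full data length. The sketch below encodes that assumption; the function name, parameters and the 16-byte constant are illustrative, not taken from the patch.

```c
#include <stdbool.h>

#define DSM_RANGE_BYTES 16u	/* size of one discard range descriptor */

/*
 * Number of bytes that must be mapped for DMA: the discard payload for a
 * discard request, otherwise the request's data length.
 */
static unsigned int map_len(bool is_discard, unsigned int rq_bytes)
{
	return is_discard ? DSM_RANGE_BYTES : rq_bytes;
}
```

Without such a helper, a discard covering gigabytes of LBAs would appear to need a mapping of that size even though only 16 bytes are transferred.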
8 years ago nvme: add missing lock nesting notation
Ming Lin [Tue, 5 Apr 2016 17:32:04 +0000 (10:32 -0700)]
nvme: add missing lock nesting notation

When unloading the driver, nvme_disable_io_queues() calls nvme_delete_queue(),
which sends an nvme_admin_delete_cq command to the admin sq. So when the command
completes, the lock acquired by nvme_irq() actually belongs to the admin queue.

The lock that nvme_del_cq_end() is trying to acquire, however, belongs to an io
queue, so it will not deadlock.

This patch adds lock nesting notation to fix the following report.

[  109.840952] =============================================
[  109.846379] [ INFO: possible recursive locking detected ]
[  109.851806] 4.5.0+ #180 Tainted: G            E
[  109.856533] ---------------------------------------------
[  109.861958] swapper/0/0 is trying to acquire lock:
[  109.866771]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  109.876535]
[  109.876535] but task is already holding lock:
[  109.882398]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.891547]
[  109.891547] other info that might help us debug this:
[  109.898107]  Possible unsafe locking scenario:
[  109.898107]
[  109.904056]        CPU0
[  109.906515]        ----
[  109.908974]   lock(&(&nvmeq->q_lock)->rlock);
[  109.913381]   lock(&(&nvmeq->q_lock)->rlock);
[  109.917787]
[  109.917787]  *** DEADLOCK ***
[  109.917787]
[  109.923738]  May be due to missing lock nesting notation
[  109.923738]
[  109.930558] 1 lock held by swapper/0/0:
[  109.934413]  #0:  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.944010]
[  109.944010] stack backtrace:
[  109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E   4.5.0+ #180
[  109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[  109.962989]  0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
[  109.970478]  ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
[  109.977964]  0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
[  109.985453] Call Trace:
[  109.987911]  <IRQ>  [<ffffffff81383d9c>] dump_stack+0x85/0xc9
[  109.993711]  [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
[  109.999575]  [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
[  110.005524]  [<ffffffff810b386d>] ? complete+0x3d/0x50
[  110.010688]  [<ffffffff810bb760>] lock_acquire+0x90/0xf0
[  110.016029]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.022418]  [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
[  110.028632]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.035019]  [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  110.041232]  [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
[  110.047095]  [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
[  110.053481]  [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
[  110.060043]  [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
[  110.066343]  [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
[  110.072818]  [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
[  110.078419]  [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
[  110.084804]  [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
[  110.090491]  [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
[  110.096180]  [<ffffffff81012306>] handle_irq+0xa6/0x130
[  110.101431]  [<ffffffff81011abe>] do_IRQ+0x5e/0x120
[  110.106333]  [<ffffffff8177384c>] common_interrupt+0x8c/0x8c

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2e39e0f608c130411f52c9fe5648dbcda5e28528)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Always use MSI/MSI-x interrupts
Keith Busch [Fri, 8 Apr 2016 22:09:10 +0000 (16:09 -0600)]
NVMe: Always use MSI/MSI-x interrupts

Multiple users have reported device initialization failure due to the driver
not receiving legacy PCI interrupts. This is not unique to any particular
controller, but has been observed on multiple platforms.

There have been no issues reported or observed with message signaled
interrupts, so this patch attempts to use MSI-x during initialization,
falling back to MSI. If that fails, legacy would become the default.

The setup_io_queues error handling had to change as a result: the admin
queue's msix_entry used to be initialized to the legacy IRQ. The case
where nr_io_queues is 0 would fail request_irq when setting up the admin
queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
the admin queue's msix_entry invalid. Instead, return success immediately.

Reported-by: Tim Muhlemmer <muhlemmer@gmail.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5229050b69cfffb690b546c357ca5a60434c0c8)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
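
The fallback order described in the commit above (MSI-x first, then MSI, then legacy) can be modelled in a few lines. The `struct platform` fields and the try_* functions are hypothetical stand-ins for the real PCI enable calls, which this sketch does not attempt to reproduce.

```c
#include <stdbool.h>

enum irq_mode { IRQ_MSIX, IRQ_MSI, IRQ_LEGACY };

/* Hypothetical description of what the platform supports. */
struct platform {
	bool msix_works;
	bool msi_works;
};

static bool try_msix(const struct platform *p)
{
	return p->msix_works;
}

static bool try_msi(const struct platform *p)
{
	return p->msi_works;
}

/* Prefer message signaled interrupts; legacy is only the last resort. */
static enum irq_mode pick_irq_mode(const struct platform *p)
{
	if (try_msix(p))
		return IRQ_MSIX;
	if (try_msi(p))
		return IRQ_MSI;
	return IRQ_LEGACY;
}
```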
8 years ago NVMe: Fix reset/remove race
Keith Busch [Fri, 8 Apr 2016 22:11:02 +0000 (16:11 -0600)]
NVMe: Fix reset/remove race

This fixes a scenario where the device is present and being reset, but a
request to unbind the driver occurs.

A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:

  modprobe nvme && modprobe -r nvme

To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.

The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to clean up requests
that may be stuck during namespace removal.

[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9bf2b972afeaffd173fe2ce211ebc555ea7e8a87)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: avoid cqe corruption when update at the same time as read
Marta Rybczynska [Tue, 22 Mar 2016 15:02:06 +0000 (16:02 +0100)]
nvme: avoid cqe corruption when update at the same time as read

Make sure the CQE phase (validity) is read before the rest of the
structure. The phase bit is the highest address and the CQE
read will happen on most platforms from lower to upper addresses
and will be done by multiple non-atomic loads. If the structure
is updated by PCI during the reads from the processor, the
processor may get a corrupted copy.

The addition of the new nvme_cqe_valid function that verifies
the validity bit also allows refactoring of the other CQE read
sequences.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d783e0bd02e700e7a893ef4fa71c69438ac1c276)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
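
The phase-first check from the commit above can be illustrated with a userspace model of a completion ring. The struct mirrors the 16-byte completion entry layout with the phase tag in bit 0 of the status field; the function names are chosen for this sketch, and it leaves out the endianness handling and memory ordering a real driver needs when the device writes entries concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified completion queue entry; the phase tag is bit 0 of status. */
struct cqe {
	uint32_t result;
	uint32_t rsvd;
	uint16_t sq_head;
	uint16_t sq_id;
	uint16_t command_id;
	uint16_t status;
};

/* An entry is valid when its phase bit matches the phase we expect. */
static bool cqe_valid(const struct cqe *ring, uint16_t head, uint16_t phase)
{
	return (ring[head].status & 1) == phase;
}

/*
 * Consume every valid entry starting at *head. The validity check happens
 * before any other field of the entry is looked at. Returns the number of
 * entries consumed and updates *head and *phase, flipping the expected
 * phase each time the ring wraps.
 */
static unsigned int drain_cq(const struct cqe *ring, uint16_t depth,
			     uint16_t *head, uint16_t *phase)
{
	unsigned int n = 0;

	while (cqe_valid(ring, *head, *phase)) {
		n++;
		if (++*head == depth) {
			*head = 0;
			*phase ^= 1;
		}
	}
	return n;
}
```

Checking the phase before touching the rest of the entry means the consumer never acts on fields from an entry the device has not finished writing.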
8 years ago NVMe: Expose ns wwid through single sysfs entry
Keith Busch [Thu, 18 Feb 2016 16:57:48 +0000 (09:57 -0700)]
NVMe: Expose ns wwid through single sysfs entry

The method to uniquely identify a namespace depends on the controller's
specification revision level and implemented capabilities. This patch
has the driver figure this out and exports the unique string through a
single 'wwid' attribute so the user doesn't have this burden.

The longest namespace unique identifier is used if available. If not
available, the driver will concatenate the controller's vendor, serial,
and model with the namespace ID. The specification provides this as a
unique identifier.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 118472ab8532e55f48395ef5764b354fe48b1d73)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
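
The fallback case in the commit above, concatenating vendor, serial, model and namespace ID, might be sketched like this. The output format string and the function name are assumptions for illustration; the commit text does not specify the exact layout of the attribute.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build a fallback identifier from controller and namespace fields.
 * Returns the snprintf() result: the length the full string needs,
 * which is >= len if the output was truncated.
 */
static int format_wwid(char *buf, size_t len, uint16_t vid,
		       const char *serial, const char *model, uint32_t nsid)
{
	/* Hypothetical layout: prefix, vendor ID, serial, model, namespace ID. */
	return snprintf(buf, len, "nvme.%04x-%s-%s-%08x",
			(unsigned int)vid, serial, model, (unsigned int)nsid);
}
```

Exposing one string means user space does not have to know which identifier fields a given controller revision implements.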
8 years ago NVMe: Remove unused sq_head read in completion path
Jon Derrick [Tue, 8 Mar 2016 17:34:54 +0000 (10:34 -0700)]
NVMe: Remove unused sq_head read in completion path

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 48c7823f42da2bc881ae2e325ed40123871c2fb9)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago nvme: fix max_segments integer truncation
Christoph Hellwig [Fri, 30 Dec 2016 20:51:50 +0000 (12:51 -0800)]
nvme: fix max_segments integer truncation

The block layer uses an unsigned short for max_segments.  The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 45686b6198bd824f083ff5293f191d78db9d708a)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
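
The truncation described in the commit above is easy to reproduce in isolation. The sketch below contrasts a plain narrowing conversion with a clamp to the largest value an unsigned short can hold; the function names are made up for this example and the clamp is one way to avoid the problem, not necessarily the exact change in the patch.

```c
#include <limits.h>
#include <stdint.h>

/* Narrowing conversion: values that do not fit lose their high bits. */
static unsigned short segments_truncated(uint32_t v)
{
	return (unsigned short)v;
}

/* Clamp first, so large values saturate instead of wrapping. */
static unsigned short segments_clamped(uint32_t v)
{
	return v > USHRT_MAX ? USHRT_MAX : (unsigned short)v;
}
```

With a 16-bit unsigned short, a computed value of 65536 truncates to 0, while the clamped version yields 65535.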
8 years ago nvme: set queue limits for the admin queue
Christoph Hellwig [Wed, 2 Mar 2016 17:07:11 +0000 (18:07 +0100)]
nvme: set queue limits for the admin queue

Factor out a helper to set all the device specific queue limits and apply
them to the admin queue in addition to the I/O queues.  Without this the
command size on the admin queue is arbitrarily low, and the missing
other limitations are just minefields waiting for victims.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit da35825d9a091a7a1d5824c8468168e2658333ff)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Fix 0-length integrity payload
Keith Busch [Wed, 24 Feb 2016 16:15:58 +0000 (09:15 -0700)]
NVMe: Fix 0-length integrity payload

A user could send a passthrough IO command with a metadata pointer to a
namespace without metadata. With metadata length of 0, kmalloc returns
ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
the bio's integrity payload, which causes an access fault on completion.

This patch ignores the user's metadata buffer if the namespace format
does not support separate metadata.

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e9fc63d682dbbef17921aeb00d03fd52d6735ffd)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Don't allow unsupported flags
Keith Busch [Wed, 24 Feb 2016 16:15:57 +0000 (09:15 -0700)]
NVMe: Don't allow unsupported flags

The command flags can change the meaning of other fields in the command
that the driver is not prepared to handle. Specifically, the user could
pass through an SGL flag, causing the controller to misinterpret the PRP
list the driver created, potentially corrupting memory or data.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 63088ec7c8eadfe08b96127a41b385ec9742dace)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
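
The validation idea in the commit above amounts to rejecting any user command that sets flags the driver does not handle. A minimal sketch, assuming the driver supports no flags at all on the passthrough path; the struct and function are hypothetical and much smaller than a real passthrough command.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, cut-down view of a user supplied passthrough command. */
struct user_cmd {
	uint8_t opcode;
	uint8_t flags;
};

/* Reject commands whose flags would change how other fields are interpreted. */
static int check_user_cmd(const struct user_cmd *c)
{
	if (c->flags)
		return -EINVAL;
	return 0;
}
```

Failing early with an error is safer than forwarding the command, because the data pointers the driver builds assume the default (PRP) interpretation.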
8 years ago NVMe: Move error handling to failed reset handler
Keith Busch [Fri, 30 Dec 2016 03:19:31 +0000 (19:19 -0800)]
NVMe: Move error handling to failed reset handler

This moves failed queue handling out of the namespace removal path and
into the reset failure path, fixing a hanging condition if the controller
fails or the link goes down during del_gendisk. Previously the driver had to see
the controller as degraded prior to calling del_gendisk to set up the
queues to fail. But, if the controller happened to fail after this,
there was no task to end outstanding requests.

On failure, all namespace states are set to dead. This causes the capacity
to revalidate to 0 and ends all new requests with an error status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d9a99c258eb1d6478fd9608a2070890797eed7)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Simplify device reset failure
Keith Busch [Wed, 22 Feb 2017 19:13:09 +0000 (11:13 -0800)]
NVMe: Simplify device reset failure

A reset failure schedules the device to unbind from the driver through
the pci driver's remove. This cleans up all initialization, so there is
no need to duplicate the potentially racy cleanup.

To help understand why a reset failed, the status is logged with the
existing warning message.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f58944e265d4ebe47216a5d7488aee3928823d30)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Fix namespace removal deadlock
Keith Busch [Wed, 24 Feb 2016 16:15:54 +0000 (09:15 -0700)]
NVMe: Fix namespace removal deadlock

This patch makes nvme namespace removal lockless. It is up to the caller
to ensure no active namespace scanning is occurring. To ensure no scan
work occurs, the nvme pci driver adds a removing state to the controller
device to avoid queueing scan work during removal. The work is flushed
after setting the state, so no new scan work can be queued.

The lockless removal allows the driver to clean up a namespace
request_queue if the controller fails during removal. Previously this
could deadlock trying to acquire the namespace mutex in order to handle
such events.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 646017a612e72f19bd9f991fe25287a149c5f627)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
8 years ago NVMe: Use IDA for namespace disk naming
Keith Busch [Wed, 24 Feb 2016 16:15:53 +0000 (09:15 -0700)]
NVMe: Use IDA for namespace disk naming

A namespace may be detached from a controller, but a user may be holding
a reference to it. Attaching a new namespace with the same NSID will create
duplicate names when using the NSID to name the disk.

This patch uses an IDA that is released only when the last reference is
released instead of using the namespace ID.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 075790ebba4a1eb297f9875e581b55c0382b1f3d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
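
To show why an allocator-backed instance number avoids the duplicate names described in the commit above, here is a tiny bitmap allocator. It stands in for the kernel's IDA facility and is not its API; `id_get`, `id_put` and the 32-entry limit are assumptions for this sketch.

```c
#include <stdint.h>

#define MAX_IDS 32

static uint32_t used;	/* bit i set = instance number i is taken */

/* Hand out the lowest free instance number, or -1 if none is left. */
static int id_get(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!(used & (UINT32_C(1) << i))) {
			used |= UINT32_C(1) << i;
			return i;
		}
	}
	return -1;
}

/* Return an instance number; call this only when the last reference is dropped. */
static void id_put(int id)
{
	if (id >= 0 && id < MAX_IDS)
		used &= ~(UINT32_C(1) << id);
}
```

If a detached namespace still has a user holding a reference, its number stays allocated, so a newly attached namespace with the same NSID receives a different number and therefore a different disk name.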
8 years ago nvme: expose cntlid in sysfs
Ming Lin [Fri, 26 Feb 2016 21:24:19 +0000 (13:24 -0800)]
nvme: expose cntlid in sysfs

For NVMe over Fabrics, the cntlid will be used by systemd/udev to
create a link to the device, for example,

/dev/disk/by-path/<fabrics-info>-<cntlid>-<namespace> -> /dev/nvme0n1

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 931e1c2204c6d00c11c5c1e2e1c20b5ca41f292d)

Orabug: 25130845

Signed-off-by: Ashok Vairavan <ashok.vairavan@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>