</t>
</section>
- <section anchor='ssc:fence' title='Client Fencing'>
+ <section anchor='ssc:fencing' title='Client Fencing'>
<t>
The SCSI layout uses Persistent Reservations to provide client
fencing. For this both the MDS and the Clients have to register
a key with the storage device, and the MDS has to create a
reservation on the storage device.
- Section 6.7 of <xref target="NVME-STLR" /> contains a full
- mapping of the required PERSISTENT RESERVE IN and
- PERSISTENT RESERVE OUT SCSI command to NVMe commands which
- SHOULD be used when using NVMe namespaces as storage devices
- for the pNFS SCSI layout.
-
- One important difference between SCSI and NVMe Persistent Reservations
- is that NVMe reservation keys always apply to all controllers used by
- a host (as indicated by the NVMe HOSTID). This behavior is somewhat
- similar to setting the ALL_TG_PT bit when registering a SCSI
- Reservation key, but actually guaranteed to work reliably.
</t>
+ <t>
+ The following is a full mapping of the required PR IN and PR OUT
+ SCSI commands to the NVMe commands that MUST be used when using NVMe
+ namespaces as storage devices for the pNFS SCSI layout.
+ </t>
+
+ <section anchor='ssc:fencing:keys' title='PRs - Key Registration'>
+ <t>
+ On NVMe namespaces, reservation keys are registered using the
+ Reservation Register command (refer to Section 7.3 of
+ <xref target="NVME-BASE" />) with the Reservation Register Action
+ (RREGA) field set to 000b (i.e., Register Reservation Key) and
+ supplying the reservation key in the New Reservation Key (NRKEY)
+ field.
+ </t>
+ <t>
+ Reservation keys are unregistered using the Reservation Register
+ command with the Reservation Register Action (RREGA) field set to
+ 001b (i.e., Unregister Reservation Key) and supplying the reservation
+ key in the Current Reservation Key (CRKEY) field.
+ </t>
+ <t>
+ One important difference between SCSI Persistent Reservations
+ and NVMe Reservations is that NVMe reservation keys always apply
+ to all controllers used by a host (as indicated by the NVMe Host
+ Identifier). This behavior is similar to setting the ALL_TG_PT bit
+ when registering a SCSI Reservation key, but it is guaranteed
+ to work reliably.
+ Registering a reservation key with a namespace creates an
+ association between a host and a namespace. A host that is a
+ registrant of a namespace may use any controller with which that
+ host is associated (i.e., that has the same Host Identifier,
+ refer to Section 5.27.1.25 of <xref target="NVME-BASE" />)
+ to access that namespace as a registrant.
+ </t>
+ </section>
+
+ <section anchor='ssc:fencing:reg'
+ title='PRs - MDS Registration and Reservation'>
+ <t>
+ Before returning a PNFS_SCSI_VOLUME_BASE volume to the client, the MDS
+ needs to prepare the volume for fencing using PRs. This is done by
+ registering the reservation key generated for the MDS with the device (see
+ <xref target="ssc:fencing:keys" />), followed by a Reservation Acquire
+ command (refer to Section 7.2 of <xref target="NVME-BASE" />) with
+ the Reservation Acquire Action (RACQA) field set to 000b (i.e., Acquire)
+ and the Reservation Type (RTYPE) field set to 4h (i.e., Exclusive Access
+ - Registrants Only Reservation).
+ </t>
+ </section>
+
+ <section anchor='ssc:fenceaction' title='Fencing Action'>
+ <t>
+ In case of a non-responding client, the MDS fences the client by
+ executing a Reservation Acquire command (refer to Section 7.2 of
+ <xref target="NVME-BASE" />), with the Reservation Acquire Action
+ (RACQA) field set to 001b (i.e., Preempt) or 010b (i.e., Preempt and
+ Abort), the Current Reservation Key (CRKEY) field set to the
+ server's reservation key, the Preempt Reservation Key (PRKEY) field
+ set to the reservation key associated with the non-responding client,
+ and the Reservation Type (RTYPE) field set to 4h (i.e., Exclusive
+ Access - Registrants Only Reservation).
+
+ The client can distinguish I/O errors due to fencing from other
+ errors based on the Reservation Conflict NVMe status code.
+ </t>
+ </section>
+
+ <section anchor='ssc:recovery' title='Client Recovery after a Fence Action'>
+ <t>
+ If an NVMe command issued by the client to the storage device returns a
+ non-retryable error (refer to the DNR bit defined in Figure 92 in
+ <xref target="NVME-BASE" />), the client MUST commit all layouts that
+ use the storage device through the MDS, return all outstanding layouts
+ for the device, forget the device ID, and unregister the reservation
+ key.
+ </t>
+ </section>
</section>
<section anchor='ssc:caches' title='Volatile write caches'>