From: Jeff Lien
Date: Wed, 5 Aug 2020 19:52:59 +0000 (-0500)
Subject: [nvme-cli] Add log page mask parameter to vs-smart-add-log wdc plugin command.
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=0ac16e6785e4f965c09b3b00485d53baff58c606;p=users%2Fhch%2Fnvme-cli.git

[nvme-cli] Add log page mask parameter to vs-smart-add-log wdc plugin command.
---

diff --git a/Documentation/nvme-wdc-vs-smart-add-log.txt b/Documentation/nvme-wdc-vs-smart-add-log.txt
index 1a7d180..96b55c5 100644
--- a/Documentation/nvme-wdc-vs-smart-add-log.txt
+++ b/Documentation/nvme-wdc-vs-smart-add-log.txt
@@ -9,15 +9,15 @@ SYNOPSIS
 --------
 [verse]
 'nvme wdc vs-smart-add-log' [--interval=, -i ] [--output-format= -o ]
+			[--log-page-version=, -l ] [--log-page-mask=, -p ]
 
 DESCRIPTION
 -----------
 For the NVMe device given, send a Vendor Unique WDC vs-smart-add-log command and
-provide the additional smart log. The --interval option will return performance
-statistics from the specified reporting interval.
+provide the additional smart log.
 
 The parameter is mandatory and may be either the NVMe character
-device (ex: /dev/nvme0).
+device (ex: /dev/nvme0) or block device (ex: /dev/nvme0n1).
 
 This will only work on WDC devices supporting this feature.
 Results for any other device are undefined.
@@ -28,7 +28,8 @@ OPTIONS
 -------
 -i ::
 --interval=::
-	Return the statistics from specific interval, defaults to 14
+	Return the statistics from specific interval, defaults to 14. This parameter is only valid for the 0xC1 log page
+	and ignored for all other log pages.
 
 -o ::
 --output-format=::
@@ -38,7 +39,15 @@ OPTIONS
 
 -l ::
 --log-page-version=::
-	Log Page Version: 0 = vendor, 1 = WDC
+	Log Page Version: 0 = vendor, 1 = WDC. This parameter is only valid for the 0xC0 log page and ignored for all
+	other log pages.
+
+-p ::
+--log-page-mask=::
+	Supply a comma separated list of desired log pages to display.
+	The possible values are 0xc0, 0xc1, 0xca, 0xd0.
+	Note: Not all pages are supported on all drives.
+	The default is to display all supported log pages.
 
 Valid Interval values and description :-
 
@@ -63,201 +72,6 @@ accumulated statistics.
 |The statistical set accumulated during the entire lifetime of the device.
 |===
 
-CA Log Page Data Output Explanation
------------------------------------
-[cols="2*", frame="topbot", align="center", options="header"]
-|===
-|Field |Description
-
-|*Physical NAND bytes written.*
-|The number of bytes written to NAND. 16 bytes - hi/lo
-
-|*Physical NAND bytes read*
-|The number of bytes read from NAND. 16 bytes - hi/lo
-
-|*Bad NAND Block Count*
-|Raw and normalized count of the number of NAND blocks that have been
-retired after the drives manufacturing tests (i.e. grown back blocks).
-2 bytes normalized, 6 bytes raw count
-
-|*Uncorrectable Read Error Count*
-|Total count of NAND reads that were not correctable by read retries, all
-levels of ECC, or XOR (as applicable). 8 bytes
-
-|*Soft ECC Error Count*
-|Total count of NAND reads that were not correctable by read retries, or
-first-level ECC. 8 bytes
-
-|*SSD End to End Detection Count*
-|A count of the detected errors by the SSD end to end error correction which
-includes DRAM, SRAM, or other storage element ECC/CRC protection mechanism (not
-NAND ECC). 4 bytes
-
-|*SSD End to End Correction Count*
-|A count of the corrected errors by the SSD end to end error correction which
-includes DRAM, SRAM, or other storage element ECC/CRC protection mechanism (not
-NAND ECC). 4 bytes
-
-|*System Data % Used*
-|A normalized cumulative count of the number of erase cycles per block since
-leaving the factory for the system (FW and metadata) area. Starts at 0 and
-increments. 100 indicates that the estimated endurance has been consumed.
-
-|*User Data Max Erase Count*
-|The maximum erase count across all NAND blocks in the drive. 4 bytes
-
-|*User Data Min Erase Count*
-|The minimum erase count across all NAND blocks in the drive. 4 bytes
-
-|*Refresh Count*
-|A count of the number of blocks that have been re-allocated due to
-background operations only. 8 bytes
-
-|*Program Fail Count*
-|Raw and normalized count of total program failures. Normalized count
-starts at 100 and shows the percent of remaining allowable failures.
-2 bytes normalized, 6 bytes raw count
-
-|*User Data Erase Fail Count*
-|Raw and normalized count of total erase failures in the user area.
-Normalized count starts at 100 and shows the percent of remaining
-allowable failures. 2 bytes normalized, 6 bytes raw count
-
-|*System Area Erase Fail Count*
-|Raw and normalized count of total erase failures in the system area.
-Normalized count starts at 100 and shows the percent of remaining
-allowable failures. 2 bytes normalized, 6 bytes raw count
-
-|*Thermal Throttling Status*
-|The current status of thermal throttling (enabled or disabled).
-2 bytes
-
-|*Thermal Throttling Count*
-|A count of the number of thermal throttling events. 2 bytes
-
-|*PCIe Correctable Error Count*
-|Summation counter of all PCIe correctable errors (Bad TLP, Bad
-DLLP, Receiver error, Replay timeouts, Replay rollovers). 8 bytes
-|===
-
-
-C1 Log Page Data Output Explanation
------------------------------------
-[cols="2*", frame="topbot", align="center", options="header"]
-|===
-|Field |Description
-
-|*Host Read Commands*
-|Number of host read commands received during the reporting period.
-
-|*Host Read Blocks*
-|Number of 512-byte blocks requested during the reporting period.
-
-|*Average Read Size*
-|Average Read size is calculated using (Host Read Blocks/Host Read Commands).
-
-|*Host Read Cache Hit Commands*
-|Number of host read commands that serviced entirely from the on-board read
-cache during the reporting period. No access to the NAND flash memory was required.
-This count is only updated if the entire command was serviced from the cache memory.
-
-|*Host Read Cache Hit Percentage*
-|Percentage of host read commands satisfied from the cache.
-
-|*Host Read Cache Hit Blocks*
-|Number of 512-byte blocks of data that have been returned for Host Read Cache Hit
-Commands during the reporting period. This count is only updated with the blocks
-returned for host read commands that were serviced entirely from cache memory.
-
-|*Average Read Cache Hit Size*
-|Average size of read commands satisfied from the cache.
-
-|*Host Read Commands Stalled*
-|Number of host read commands that were stalled due to a lack of resources within
-the SSD during the reporting period (NAND flash command queue full, low cache page count,
-cache page contention, etc.). Commands are not considered stalled if the only reason for
-the delay was waiting for the data to be physically read from the NAND flash. It is normal
-to expect this count to equal zero on heavily utilized systems.
-
-|*Host Read Commands Stalled Percentage*
-|Percentage of read commands that were stalled. If the figure is consistently high,
-then consideration should be given to spreading the data across multiple SSDs.
-
-|*Host Write Commands*
-|Number of host write commands received during the reporting period.
-
-|*Host Write Blocks*
-|Number of 512-byte blocks written during the reporting period.
-
-|*Average Write Size*
-|Average Write size calculated using (Host Write Blocks/Host Write Commands).
-
-|*Host Write Odd Start Commands*
-|Number of host write commands that started on a non-aligned boundary during
-the reporting period. The size of the boundary alignment is normally 4K; therefore
-this returns the number of commands that started on a non-4K aligned boundary.
-The SSD requires slightly more time to process non-aligned write commands than it
-does to process aligned write commands.
-
-|*Host Write Odd Start Commands Percentage*
-|Percentage of host write commands that started on a non-aligned boundary. If this
-figure is equal to or near 100%, and the NAND Read Before Write value is also high,
-then the user should investigate the possibility of offsetting the file system. For
-Microsoft Windows systems, the user can use Diskpart. For Unix-based operating systems,
-there is normally a method whereby file system partitions can be placed where required.
-
-|*Host Write Odd End Commands*
-|Number of host write commands that ended on a non-aligned boundary during the
-reporting period. The size of the boundary alignment is normally 4K; therefore this
-returns the number of commands that ended on a non-4K aligned boundary.
-
-|*Host Write Odd End Commands Percentage*
-|Percentage of host write commands that ended on a non-aligned boundary.
-
-|*Host Write Commands Stalled*
-|Number of host write commands that were stalled due to a lack of resources within the
-SSD during the reporting period. The most likely cause is that the write data was being
-received faster than it could be saved to the NAND flash memory. If there was a large
-volume of read commands being processed simultaneously, then other causes might include
-the NAND flash command queue being full, low cache page count, or cache page contention, etc.
-It is normal to expect this count to be non-zero on heavily utilized systems.
-
-|*Host Write Commands Stalled Percentage*
-|Percentage of write commands that were stalled. If the figure is consistently high, then
-consideration should be given to spreading the data across multiple SSDs.
-
-|*NAND Read Commands*
-|Number of read commands issued to the NAND devices during the reporting period.
-This figure will normally be much higher than the host read commands figure, as the data
-needed to satisfy a single host read command may be spread across several NAND flash devices.
-
-|*NAND Read Blocks*
-|Number of 512-byte blocks requested from NAND flash devices during the reporting period.
-This figure would normally be about the same as the host read blocks figure
-
-|*Average NAND Read Size*
-|Average size of NAND read commands.
-
-|*NAND Write Commands*
-|Number of write commands issued to the NAND devices during the reporting period.
-There is no real correlation between the number of host write commands issued and the
-number of NAND Write Commands.
-
-|*NAND Write Blocks*
-|Number of 512-byte blocks written to the NAND flash devices during the reporting period.
-This figure would normally be about the same as the host write blocks figure.
-
-|*Average NAND Write Size*
-|Average size of NAND write commands. This figure should never be greater than 128K, as
-this is the maximum size write that is ever issued to a NAND device.
-
-|*NAND Read Before Write*
-|This is the number of read before write operations that were required to process
-non-aligned host write commands during the reporting period. See Host Write Odd Start
-Commands and Host Write Odd End Commands. NAND Read Before Write operations have
-a detrimental effect on the overall performance of the device.
-|===
-
 
 EXAMPLES
 --------
@@ -266,6 +80,16 @@ EXAMPLES
 ------------
 # nvme wdc vs-smart-add-log /dev/nvme0
 ------------
+* Has the program issue WDC vs-smart-add-log Vendor Unique Command for just the 0xCA log page :
++
+------------
+# nvme wdc vs-smart-add-log /dev/nvme0 -p 0xCA
+------------
+* Has the program issue WDC vs-smart-add-log Vendor Unique Command for 0xC0 and 0xCA log pages :
++
+------------
+# nvme wdc vs-smart-add-log /dev/nvme0 -p 0xCA,0xC0
+------------
 
 NVME
 ----
diff --git a/plugins/wdc/wdc-nvme.c b/plugins/wdc/wdc-nvme.c
index 8dc2752..57ad5f6 100644
--- a/plugins/wdc/wdc-nvme.c
+++ b/plugins/wdc/wdc-nvme.c
@@ -140,6 +140,12 @@
 #define WDC_CUSTOMER_ID_0x1004			0x1004
 #define WDC_CUSTOMER_ID_0x1005			0x1005
 
+#define WDC_ALL_PAGE_MASK			0xFFFF
+#define WDC_C0_PAGE_MASK			0x0001
+#define WDC_C1_PAGE_MASK			0x0002
+#define WDC_CA_PAGE_MASK			0x0004
+#define WDC_D0_PAGE_MASK			0x0008
+
 /* Drive Resize */
 #define WDC_NVME_DRIVE_RESIZE_OPCODE		0xCC
 #define WDC_NVME_DRIVE_RESIZE_CMD		0x03
@@ -1056,16 +1062,16 @@ static __u64 wdc_get_drive_capabilities(int fd) {
 	case WDC_NVME_SN640_DEV_ID_1:
 	/* FALLTHRU */
 	case WDC_NVME_SN640_DEV_ID_2:
-		/* verify the 0xC0 log page is supported */
-		if (wdc_nvme_check_supported_log_page(fd, WDC_NVME_GET_EOL_STATUS_LOG_OPCODE) == true) {
-			capabilities = WDC_DRIVE_CAP_C0_LOG_PAGE;
-		}
 	/* FALLTHRU */
 	case WDC_NVME_SN640_DEV_ID_3:
 	/* FALLTHRU */
 	case WDC_NVME_SN840_DEV_ID:
 	/* FALLTHRU */
 	case WDC_NVME_SN840_DEV_ID_1:
+		/* verify the 0xC0 log page is supported */
+		if (wdc_nvme_check_supported_log_page(fd, WDC_NVME_GET_EOL_STATUS_LOG_OPCODE) == true) {
+			capabilities = WDC_DRIVE_CAP_C0_LOG_PAGE;
+		}
 	/* FALLTHRU */
 	case WDC_NVME_ZN440_DEV_ID:
 	/* FALLTHRU */
@@ -4134,6 +4140,8 @@ static int wdc_get_c0_log_page(int fd, char *format, int uuid_index)
 	case WDC_NVME_SN640_DEV_ID:
 	case WDC_NVME_SN640_DEV_ID_1:
 	case WDC_NVME_SN640_DEV_ID_2:
+	case WDC_NVME_SN840_DEV_ID:
+	case WDC_NVME_SN840_DEV_ID_1:
 		if (!get_dev_mgment_cbs_data(fd, WDC_C2_CUSTOMER_ID_ID, (void*)&data)) {
 			fprintf(stderr, "%s: ERROR : WDC : 0xC2 Log Page entry ID 0x%x not found\n", __func__, WDC_C2_CUSTOMER_ID_ID);
 			return -1;
@@ -4415,7 +4423,7 @@ static int wdc_get_ca_log_page(int fd, char *format)
 	case WDC_NVME_SN640_DEV_ID:
 	case WDC_NVME_SN640_DEV_ID_1:
 	case WDC_NVME_SN640_DEV_ID_2:
-	case WDC_NVME_SN640_DEV_ID_3:
+	case WDC_NVME_SN640_DEV_ID_3:
 	case WDC_NVME_SN840_DEV_ID:
 	case WDC_NVME_SN840_DEV_ID_1:
 
@@ -4591,27 +4599,32 @@ static int wdc_vs_smart_add_log(int argc, char **argv, struct command *command,
 	const char *interval = "Interval to read the statistics from [1, 15].";
 	int fd;
 	const char *log_page_version = "Log Page Version: 0 = vendor, 1 = WDC";
+	const char *log_page_mask = "Log Page Mask, comma separated list: 0xC0, 0xC1, 0xCA, 0xD0";
 	int ret = 0;
 	int uuid_index = 0;
+	int page_mask = 0, num, i;
+	int log_page_list[16];
 	__u64 capabilities = 0;
 
 	struct config {
 		uint8_t interval;
-		int vendor_specific;
 		char *output_format;
 		__u8 log_page_version;
+		char *log_page_mask;
 	};
 
 	struct config cfg = {
 		.interval = 14,
 		.output_format = "normal",
 		.log_page_version = 0,
+		.log_page_mask = "",
 	};
 
 	OPT_ARGS(opts) = {
-		OPT_UINT("interval", 'i', &cfg.interval, interval),
-		OPT_FMT("output-format", 'o', &cfg.output_format, "Output Format: normal|json"),
-		OPT_BYTE("log-page-version", 'l', &cfg.log_page_version, log_page_version),
+		OPT_UINT("interval", 'i', &cfg.interval, interval),
+		OPT_FMT("output-format", 'o', &cfg.output_format, "Output Format: normal|json"),
+		OPT_BYTE("log-page-version", 'l', &cfg.log_page_version, log_page_version),
+		OPT_LIST("log-page-mask", 'p', &cfg.log_page_mask, log_page_mask),
 		OPT_END()
 	};
 
@@ -4629,6 +4642,40 @@ static int wdc_vs_smart_add_log(int argc, char **argv, struct command *command,
 		goto out;
 	}
 
+	num = argconfig_parse_comma_sep_array(cfg.log_page_mask, log_page_list, 16);
+
+	if (num == -1) {
+		fprintf(stderr, "ERROR: WDC: log page list is malformed\n");
+		ret = -1;
+		goto out;
+	}
+
+	if (num == 0)
+	{
+		page_mask |= WDC_ALL_PAGE_MASK;
+	}
+	else
+	{
+		for (i = 0; i < num; i++)
+		{
+			if (log_page_list[i] == 0xc0) {
+				page_mask |= WDC_C0_PAGE_MASK;
+			}
+			if (log_page_list[i] == 0xc1) {
+				page_mask |= WDC_C1_PAGE_MASK;
+			}
+			if (log_page_list[i] == 0xca) {
+				page_mask |= WDC_CA_PAGE_MASK;
+			}
+			if (log_page_list[i] == 0xd0) {
+				page_mask |= WDC_D0_PAGE_MASK;
+			}
+		}
+	}
+	if (page_mask == 0)
+		fprintf(stderr, "ERROR : WDC: Unknown log page mask - %s\n", cfg.log_page_mask);
+
+
 	capabilities = wdc_get_drive_capabilities(fd);
 
 	if ((capabilities & WDC_DRIVE_CAP_SMART_LOG_MASK) == 0) {
@@ -4637,25 +4684,29 @@ static int wdc_vs_smart_add_log(int argc, char **argv, struct command *command,
 		goto out;
 	}
 
-	if ((capabilities & WDC_DRIVE_CAP_C0_LOG_PAGE) == WDC_DRIVE_CAP_C0_LOG_PAGE) {
+	if (((capabilities & WDC_DRIVE_CAP_C0_LOG_PAGE) == WDC_DRIVE_CAP_C0_LOG_PAGE) &&
+		(page_mask & WDC_C0_PAGE_MASK)) {
 		/* Get 0xC0 log page if possible. */
 		ret = wdc_get_c0_log_page(fd, cfg.output_format, uuid_index);
 		if (ret)
 			fprintf(stderr, "ERROR : WDC : Failure reading the C0 Log Page, ret = %d\n", ret);
 	}
-	if ((capabilities & (WDC_DRIVE_CAP_CA_LOG_PAGE)) == (WDC_DRIVE_CAP_CA_LOG_PAGE)) {
+	if (((capabilities & (WDC_DRIVE_CAP_CA_LOG_PAGE)) == (WDC_DRIVE_CAP_CA_LOG_PAGE)) &&
+		(page_mask & WDC_CA_PAGE_MASK)) {
 		/* Get the CA Log Page */
 		ret = wdc_get_ca_log_page(fd, cfg.output_format);
 		if (ret)
 			fprintf(stderr, "ERROR : WDC : Failure reading the CA Log Page, ret = %d\n", ret);
 	}
-	if ((capabilities & WDC_DRIVE_CAP_C1_LOG_PAGE) == WDC_DRIVE_CAP_C1_LOG_PAGE) {
+	if (((capabilities & WDC_DRIVE_CAP_C1_LOG_PAGE) == WDC_DRIVE_CAP_C1_LOG_PAGE) &&
+		(page_mask & WDC_C1_PAGE_MASK)) {
 		/* Get the C1 Log Page */
 		ret = wdc_get_c1_log_page(fd, cfg.output_format, cfg.interval);
 		if (ret)
 			fprintf(stderr, "ERROR : WDC : Failure reading the C1 Log Page, ret = %d\n", ret);
 	}
-	if ((capabilities & WDC_DRIVE_CAP_D0_LOG_PAGE) == WDC_DRIVE_CAP_D0_LOG_PAGE) {
+	if (((capabilities & WDC_DRIVE_CAP_D0_LOG_PAGE) == WDC_DRIVE_CAP_D0_LOG_PAGE) &&
+		(page_mask & WDC_D0_PAGE_MASK)) {
 		/* Get the D0 Log Page */
 		ret = wdc_get_d0_log_page(fd, cfg.output_format);
 		if (ret)
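
The mask-building step added to wdc_vs_smart_add_log() can be tried in isolation. What follows is a minimal sketch, not the plugin's code: `page_mask_from_list` is a hypothetical helper, and it parses the list with the standard `strtol` instead of nvme-cli's `argconfig_parse_comma_sep_array`. Only the mask values are taken from the `#define`s in the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Bit assignments copied from the patch. */
#define WDC_ALL_PAGE_MASK	0xFFFF
#define WDC_C0_PAGE_MASK	0x0001
#define WDC_C1_PAGE_MASK	0x0002
#define WDC_CA_PAGE_MASK	0x0004
#define WDC_D0_PAGE_MASK	0x0008

/*
 * Hypothetical helper (not part of nvme-cli): turn a comma separated
 * list such as "0xCA,0xC0" into a page mask.
 *
 * An empty or NULL list selects every page, as in the patch.
 * Returns -1 if the list is malformed or names no known log page.
 */
int page_mask_from_list(const char *list)
{
	const char *p = list;
	int mask = 0;

	if (list == NULL || *list == '\0')
		return WDC_ALL_PAGE_MASK;

	while (*p != '\0') {
		char *end;
		long id;

		errno = 0;
		id = strtol(p, &end, 0);	/* base 0 accepts 0x.. hex */
		if (errno != 0 || end == p)
			return -1;		/* not a number */

		switch (id) {
		case 0xc0: mask |= WDC_C0_PAGE_MASK; break;
		case 0xc1: mask |= WDC_C1_PAGE_MASK; break;
		case 0xca: mask |= WDC_CA_PAGE_MASK; break;
		case 0xd0: mask |= WDC_D0_PAGE_MASK; break;
		default: break;			/* unknown IDs are skipped */
		}

		if (*end == ',') {
			p = end + 1;
			if (*p == '\0')
				return -1;	/* trailing comma */
		} else if (*end == '\0') {
			p = end;
		} else {
			return -1;		/* junk after the number */
		}
	}

	return mask ? mask : -1;
}
```

With this helper, the list from the second new example (`0xCA,0xC0`) yields `0x0005`, and an empty option string yields `WDC_ALL_PAGE_MASK`. One deliberate difference from the patch as shown: the sketch returns -1 when no listed page is recognised, whereas the patch prints "Unknown log page mask" and continues with an empty mask.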