Sam James [Sat, 30 Sep 2023 05:38:53 +0000 (06:38 +0100)]
test: handle POSIX ioctl prototype
glibc has the following prototype for ioctl: int ioctl(int fd, unsigned long request, ...)
POSIX (inc. musl) has the following for ioctl: int ioctl(int fd, int request, ...)
Check which prototype is used in <sys/ioctl.h> to avoid a conflict and conditionally
define the right one for the system.
Bug: https://bugs.gentoo.org/914921 Signed-off-by: Sam James <sam@gentoo.org>
Jeremy Kerr [Sat, 23 Sep 2023 16:44:50 +0000 (09:44 -0700)]
mi-mctp: Fix free() in error path of mi_open_mctp
If the malloc of our struct nvme_mi_transport_mctp fails, we'll attempt
to free ->rsp_buf of this (now zero) pointer.
Instead, structure the error path to progressively undo the
initialisation operations. This means we'll need to save the errno at
the site of each possible failure.
In doing this, add a comment to the call to nvme_mi_close(), just to
clarify behaviour with regards to the cleanups through that path.
Reported-by: Barnabás Pőcze <pobrn@protonmail.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
This implements support for TP4119a, adding various fields and functions
to enable better handling of the new Physical Receiver Eye Opening
Measurement log page.
ioctl: pass NSID in Get/Set Features commands that use it
Several features are configured on a per-namespace basis
by setting the NSID in the Set Features and Get Features commands.
But the corresponding nvme_{g,s}et_features_*() functions
aren't passing the NSID in the commands.
For the functions missing a NSID parameter, define new variants
with the NSID argument added and mark the old functions as deprecated.
ioctl: fix swapped parameters in nvme_set_features_host_id()
nvme_set_features_host_id()'s prototype says the parameter EXHID
comes before SAVE, but its definition does the opposite,
causing the parameters to be misinterpreted.
Match the definition to the declaration.
nvme_get_features_iocs_profile() and nvme_set_features_iocs_profile()
are defined but not exorted in libnvme.so.
nvme_set_features_iocs_profile()'s prototype was also removed,
so add it back.
IOCSI is a 9-bit value, so fix its bitmask and change its type to u16.
According to the NVMe spec, the Host Behavior
and Write Protection features are not saveable.
Setting the SAVE bit may cause the Set Features command to be rejected,
so don't set it for these features.
The last 2 bytes of the struct nvme_timestamp data
are not initialized in nvme_set_features_timestamp().
Add an initializer to avoid sending uninitialized stack memory
to the controller.
Several nvme_{g,s}et_features_*() functions take a data buffer
but don't provide it to the Get/Set Features commands.
Fix them to pass the data buffer to nvme_{g,s}et_features().
Getting the Host Memory Buffer feature also returns
a struct nvme_host_mem_buf_attrs data structure,
so make a nvme_get_features_host_mem_buf2() function
that takes in a data buffer to receive it.
Martin Belanger [Tue, 12 Sep 2023 19:55:55 +0000 (15:55 -0400)]
tree: Add 2 new public functions to lookup existing controllers
There is duplicate code between libnvme and nvme-cli. When trying
to instantiate a controller, both libnvme and nvme-cli scan the
sysfs looking for an existing controller that can be reused.
This patch adds nvme_ctrl_find() and nvme_ctrl_config_match().
These functions can be used by nvme-cli to lookup existing
controllers. This will allow removing the duplicate code in
nvme-cli (separate commit to follow on the nvme-cli GitHub repo).
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
da Cunha, Leonardo [Wed, 23 Aug 2023 18:26:39 +0000 (11:26 -0700)]
linux: Added functions to enable faster telemetry data retrieval.
Moved telemetry data area support detection into separate function.
Added possibility to modify data transfer chunk size.
Enable telemetry extraction up to specified data area.
Removed some printf() and perror().
Caleb Sander [Wed, 30 Aug 2023 16:45:57 +0000 (10:45 -0600)]
test: account for discovery log page entry stripping
The discovery log page tests incorrectly assumed the log page
nvmf_get_discovery_log() returns exactly matches what was fetched.
This failed to account for the trailing space trimming
of the TRSVCID and TRADDR fields.
If the arbitrary entries happened to contain trailing spaces
for these fields, the test cases would fail.
Generate arbitrary entries with space-padded strings for the log page
and corresponding expected entries with NUL-padded strings.
The NVMe base specification defines fields TRSVCID and TRADDR
as "ASCII strings", meaning they are space-padded.
Therefore, whether spaces need to be stripped from them
doesn't depend on the transport or address family.
Some combinations already appear to be missing,
such as RDMA using IB addresses.
And if new transports are added in the future,
sanitize_discovery_log_entry() would need to be updated.
So strip spaces from these fields regardless of the entry's
transport type or address family.
Caleb Sander [Wed, 30 Aug 2023 15:44:48 +0000 (09:44 -0600)]
fabrics: only look for spaces in strchomp()
strchomp() is only needed to strip trailing spaces from a string.
There is no need to replace NUL characters with NUL characters.
So simplify the check for "NUL or space" to just "space".
Update the function description as well,
as it doesn't strip all whitespace, just spaces.
Daniel Wagner [Thu, 31 Aug 2023 14:19:50 +0000 (16:19 +0200)]
build: use latest container instead fixed version
We control the build containers so there is little risk
that these randomly break. So let's go with the latest
version and avoid updating the build files all the time.
Martin Belanger [Fri, 18 Aug 2023 01:35:34 +0000 (21:35 -0400)]
tree: Improve TCP controller matching algorithm
The configuration parameters used to connect to a TCP controller are:
1. transport - "tcp"
2. traddr - Destination address (controller's IP address)
3. trsvcid - Service ID (controller's port - typically 8009 or 4420)
4. host-traddr - Source address (host's IP address)
5. host-iface - Physical/Logical interface where the connection will be made
For TCP, transport, traddr, and trsvcid are mandatory, while the
host-traddr and host-iface are optional. The host-traddr and host-iface
can be used as "overrides" to select a different source address and
interface than those that the kernel would choose by default.
When an application using libnvme to connect to a controller does
not specify the host-traddr or host-iface, the kernel will have to
determine the best interface and source address by itself. It does that
by looking up the destination address (traddr) in the routing table to
determine the best interface for the connection. The kernel then
retrieves the primary IP address assigned to that interface and uses that
as the connection's source address. By default, the kernel always uses
the interface's primary address as the connection's source address
unless host-traddr is used to override it.
Prior to version 6.1, the kernel did not reveal the source address or
interface it selected. Therefore, it was impossible for user-space apps
to tell exactly where connections were made. With kernel 6.1 (and later),
the sysfs now exposes the source address as "src_addr=" in the nvme
"address" attribute. The src_addr not only provides us with the
connection's source address, but by scanning the interface map one can
find out which interface owns that source address and precisely determine
on which interface each connection is made.
With TP8010 and the introduction of the Centralized Discovery Controller
(CDC), it is very important for hosts to connect to CDCs with a consistent
source address. That's because of the way the CDC keeps track of all the
hosts that connect to it. In addition to the host NQN, the CDC also checks
the host IP address (the connection's source address) to uniquely identify
a host. This unique identifier is then used for fabric zoning.
With fabric zoning, administrators configure the list of I/O controllers
that a host is allowed to connect to. The CDC sends the list of I/O
controllers to the host in response to a Get Discovery Log Page (DLP)
command from the host. If a host does not connect to the CDC with the
right source address, it will receive invalid DLP entries (wrong zone).
This will cause the host to connect to the wrong I/O controllers.
libnvme tries to avoid making duplicate connections to the same
controller. This avoids consuming precious kernel resources. When an
application requests libnvme to connect to a controller (the candidate
controller), libnvme scans the sysfs to see if an existing connection
matches the candidate. If a matching connection is found, libnvme just
reuses it instead of creating a new one.
Matching the 3 mandatory parameters (transport, traddr, trsvcid) between
existing connections and a candidate connection is easy because they can
never be NULL and can therefore be compared. It is not the case for the
host-traddr and host-iface. These optional parameters can be NULL. A NULL
host-traddr or host-iface means that we have left it to the kernel to
determine the interface and source address to use for the connection.
Therefore, if we want to compare a non-NULL candidate host-traddr to
an existing connection with a NULL host-traddr, we cannot just compare the
two. They will obviously be different. Instead, we need to check the
src_addr of the existing connection to see if it matches the candidate's
host-traddr.
Prior to this patch, libnvme performed a simple string comparison
between the candidate's host-traddr (or host-iface) and the existing
connection's host-traddr (or host-iface). A match would be declared if
both were the same (including both NULL). Also, a match would even be
declared if the existing connection's host-traddr (or host-iface) was
NULL while the candidate's host-traddr (or host-iface) was non-NULL.
This is wrong and can lead to the wrong connections being reused and
the wrong DLP entries returned by the CDC.
With this patch, when a candidate wants a specific source address
(host-traddr != NULL) or interface (host-iface != NULL), libnvme will
now check the src_addr of each existing connection to ensure a 100%
match. If the src_addr is not available (kernel older than 6.1), then we
can still infer the real interface and source address of an existing
connection, if the existing connection has either its host-traddr or
host-iface defined (check code to see how it's done).
It's only when an exsiting connection's src_addr, host-iface, and
host-traddr are all NULL that we cannot clearly match to a candidate.
When that happens, libnvme will take an optimistic approach and
declare a match even though it doesn't have enough info to do so.
This "optimistic match" follows what libnvme was doing prior to this
patch.
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Daniel Wagner [Sat, 19 Aug 2023 10:44:07 +0000 (12:44 +0200)]
mi: remove nsid from nvme_mi_admin_identify_secondary_ctrl_list()
According to the NVMe specification, Identify CNS value 15h
("Secondary Controller list of controllers associated with the primary
controller processing the command") does not use the NSID field. So
remove the "nsid" argument from
nvme_mi_admin_identify_secondary_ctrl_list().
Fixes: 07b63103878 ("ioctl: remove nsid from nvme_identify_secondary_ctrl_list()") Signed-off-by: Daniel Wagner <dwagner@suse.de>
Caleb Sander [Sat, 12 Aug 2023 20:50:47 +0000 (14:50 -0600)]
test: add tests for Identify functions
Use the mock ioctl() infrastructure to test the functions in ioctl.h
that issue Identify commands.
nvme_identify_ns_csi_user_data_format() and
nvme_identify_iocs_ns_csi_user_data_format() are omitted
since they're not defined in the NVMe specification yet
Functions tested indirectly from other functions are also omitted.
Caleb Sander [Sat, 12 Aug 2023 20:50:03 +0000 (14:50 -0600)]
ioctl: use available Identify helper functions
nvme_identify_independent_identify_ns() only specifies a CNS and a NSID,
so have it just call nvme_identify_cns_nsid()
instead of duplicating most of its implementation.
Similarly, nvme_zns_identify_ns() is just nvme_identify_ns_csi()
with the CSI set to ZNS and no UUID index.
Caleb Sander [Sat, 12 Aug 2023 19:55:12 +0000 (13:55 -0600)]
test: pass a large enough buffer to nvme_identify_ns_descs()
nvme_identify_ns_descs() takes a struct nvme_ns_id_desc * parameter,
but passes it as the data to nvme_identify(), which sets data_len = 4K.
But struct nvme_ns_id_desc only represents the start of a single
Namespace Identification Descriptor, so it is less than 4 KB.
So it needs to be explicitly allocated with at least 4 KB.
Allocate a 4 KB buffer in test.c to avoid a stack buffer overflow.
Caleb Sander [Sat, 12 Aug 2023 19:51:18 +0000 (13:51 -0600)]
ioctl: remove nsid from nvme_identify_secondary_ctrl_list()
According to the NVMe specification, Identify CNS value 15h
("Secondary Controller list of controllers associated with
the primary controller processing the command")
does not use the NSID field.
So remove the "nsid" argument from nvme_identify_secondary_ctrl_list().
Caleb Sander [Wed, 16 Aug 2023 21:20:27 +0000 (15:20 -0600)]
build: clean up version script
Remove symbols from libnvme.map that don't exist in libnvme.so.
These are a mix of static inline functions,
static (unexported) functions, nonexistent functions, and types.
Also remove a couple of duplicate symbols.
Bash script to find unexported symbols:
for symbol in `grep -o nvm[ef]_[a-z0-9_]* src/libnvme.map`
do
if nm -D .build/src/libnvme.so | grep $symbol$ > /dev/null
then
true
else
echo $symbol
fi
done
Martin Belanger [Tue, 15 Aug 2023 19:16:24 +0000 (15:16 -0400)]
meson: Don't hard-code path to "internal/config.h"
When building libnvme as a subproject of another project
(e.g. nvme-stas), the hard-coded absolute path to config.h, i.e.
"-include internal/config.h" does not resolve properly and fails
to build. Instead, use the real path calculated by meson and
saved to variable "config_h".
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add unit tests for nvmf_get_discovery_log().
They provide coverage of the logic in nvme_discovery_log(),
nvme_get_log_page(), and nvme_get_log() too.
The tests use the mock ioctl() infra to validate the Get Log Page
commands issued and inject responses triggering different code paths.
Functions issuing admin/IO passthru ioctls sorely lack unit tests.
It would be great for unit tests not to need a real NVMe controller.
It's also useful to be able to test responses to commands
that might be impossible to trigger with real controllers.
To that end, implement infrastructure for mocking ioctl(),
allowing tests to set expectations for the NVMe passthru ioctls
that will be issued and control the corresponding responses.
The mock library can be used with LD_PRELOAD so that libnvme's ioctl()
calls are redirected from libc. No changes in libnvme itself are needed.
Hannes Reinecke [Tue, 8 Aug 2023 11:32:53 +0000 (13:32 +0200)]
src/nvme/tree.c: make __nvme_scan_subsystem() returning bool
__nvme_scan_subsystem() will free the 's' argument when the filter
triggers, so it needs a return value to inform the caller that the
argument has been freed.
Martin George [Wed, 26 Jul 2023 13:31:29 +0000 (19:01 +0530)]
nvme-tree: avoid warning in 'list-subsys'
With the recent change to scan all subsystems, 'nvme list-subsys
/dev/nvmeXnY' now displays an annoying warning for the NQN mismatch
for all other subsystems that don't match during the subsystem
scan. For e.g.
NQN mismatch for subsystem 'nvme-subsys1'
NQN mismatch for subsystem 'nvme-subsys4'
nvme-subsys3 - NQN=nqn.1992-08.com.netapp:sn.48391d66c0a611ecaaa5d039ea165514:subsystem.subsys_CLIENT116_1
hostnqn=nqn.2014-08.org.nvmexpress:uuid:e6550026-173e-4959-ba74-be367844bd8a
\
+- nvme3 tcp traddr=192.168.1.116,trsvcid=4420,host_traddr=192.168.1.16,host_iface=eth5 live optimized
+- nvme7 tcp traddr=192.168.2.116,trsvcid=4420,host_traddr=192.168.2.16 live optimized
+- nvme8 tcp traddr=192.168.1.116,trsvcid=4420,host_traddr=192.168.2.16 live optimized
Avoid this warning by displaying it only under debug level.
Fixes: fbd45f1 ("tree: Scan all subsystems") Signed-off-by: Martin George <marting@netapp.com>
Daniel Wagner [Mon, 17 Jul 2023 11:54:30 +0000 (13:54 +0200)]
tree: Scan all subsystems
We need to scan all subsystems because a subsystem might show up on
different hosts, e.g 'nvme connect' with different hostnqn and the
target reports the same namespace.
Jeremy Kerr [Mon, 3 Jul 2023 08:14:47 +0000 (16:14 +0800)]
mi: allow non-4-byte-aligned responses
We currently assume that a MI response will be a multiple of four bytes
in length. However, this may not be the case: for example, a Read MI
Data (Controller List) with an even number of controllers, and with an
unpadded response, may only be aligned on a two-byte boundary.
The NVMe-MI spec states, for the MIC field:
This field is byte aligned.
So, relax the requirement for alignment on the response sizes, and the
expected response size values. We only do this for the mi commands; the
Admin commands still require an aligned value for DLEN.
In doing so, drop the explicit alignment tests, and add a couple that
check that the Controller List example above will work.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reported-by: Klaus Jensen <its@irrelevant.dk>
Jeremy Kerr [Mon, 3 Jul 2023 08:06:34 +0000 (16:06 +0800)]
mi-mctp: use a linear response buffer
Currently, we're passing a 3-entry iovec to the MCTP resvmsg()
interface:
- header
- payload
- MIC
This is fine if the response comes back eaxctly the size we expect, but
causes complexity if we get a smaller response (for example, as an error
or a More Processing Required response), as we need to extract the MIC
from somewhere in those buffers.
At the moment, since we're enforcing 4-byte alignment, that isn't too
complex - we know the MIC will be entirely in one of the buffers. The
MPR code is a bit awkward, but still manageable.
However: we now want to allow unaligned responses from MI messages,
which is about to make that a lot more complex; in the worst case, the
MIC could be split over all three buffers!
This change uses a fixed linear buffer for the MCTP response instead. We
allocate 4k for this by default, but expand if necessary. We use this as
the sendmsg() buffer, so get a linear message back from the MCTP
endpoint. Once we have verified the format (and extracted the MIC), we
copy this into the actual response header/payload buffers as required.
This makes the response handing code simpler, at the cost of one extra
response buffer per endpoint.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Jeremy Kerr [Mon, 3 Jul 2023 07:55:04 +0000 (15:55 +0800)]
mi: implement length and offset alignment checks in admin_xfer()
We're about to relax some alignment requirements in the generic
(internal) nvme_mi_submit function. To ensure that the raw admin
interface continues to enfore the required alignment on DOFST and DLEN
fields, implement checks in the Admin command interface.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Martin Belanger [Wed, 5 Jul 2023 14:59:25 +0000 (10:59 -0400)]
tree: missing closedir() causes fd leak for "/sys/bus/pci/slots"
In nvme_ctrl_lookup_phy_slot(), we are missing a closedir(), which
causes file descriptors to leak. Also, there was a missing free()
when the function returns with ENOMEM.
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Jeremy Kerr [Wed, 24 May 2023 09:29:33 +0000 (17:29 +0800)]
mi: don't return from mi_mctp_submit with a tag held
If the poll() times-out or fails, we'll exit early from
nvme_mi_mctp_submit still holding the tag reservation. When using an i2c
transport, this may mean we hold a lock on the i2c bus with no way to
release.
Instead, always drop the tag on function exit.
Fixes: 6a08780 ("mi-mctp: Add timeout support to MCTP transport") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Daniel Wagner [Fri, 23 Jun 2023 13:29:15 +0000 (15:29 +0200)]
build: Move CI build steps into a scripts
Move the build instruction into a script. This allows to run these steps
also locally.
Also disable the fallback static library build as it is clearly not
working because in the dependencies rely to link against a dynamic
glibc. Instead just add a minimal static build without fallbacks.
Daniel Wagner [Thu, 22 Jun 2023 12:13:13 +0000 (14:13 +0200)]
fabrics: Relax match on well known disc ctrl lookup
In case nvmf_add_ctrl() is called to add a well known discovery
controller we also need to verify if we should ignore it (see --context
command line argument of nvme-cli). Though we have to be careful not to
overmatch on the lookup.
That means the host_traddr and host_iface might be different for the
discovery controller than the normal controllers. For example this can
happen when the discovery controller is reached via different interface
than the data controllers.
Thus only consider the transport type, target address and trsvcid only
when looking up well known discovery controllers.
Daniel Wagner [Wed, 14 Jun 2023 12:19:10 +0000 (14:19 +0200)]
fabrics: Filter discovery ctrls out during application context check
We also need to filter out the well known discovery controllers when
using the execution context filtering. Obviously, we can't use the
subsystem name, thus match on the host and target address instead.
Daniel Wagner [Wed, 14 Jun 2023 12:17:41 +0000 (14:17 +0200)]
util: Add ignored error code
When libnvme ignores a connection attempt via the 'application' context
tracking return an unique error code to allow proper filtering on the
caller side.
Umer Saleem [Mon, 12 Jun 2023 14:45:49 +0000 (19:45 +0500)]
tree: Add PCI physical slot number for controller
This commit introduces a physical slot field for controller, that
contains the PCI physcial slot number for controller device.
In case, there are multiple NVME drives present on the platform,
it's hard to identify which NVME drive is present in which slot.
The slot number is usually helpful in determining the location.
It is cross reference-able from lspci, but it would be nice to
have a direct option.
Hannes Reinecke [Thu, 20 Apr 2023 10:38:10 +0000 (12:38 +0200)]
fabrics: filter out subsystems with non-matching application string
If the nvme root has an application string set any subsystem lookup
should ignore subsystems which either have no application string set
or which have a non-matching application string.