Test for_each_numa_cpu() output to ensure that:
- all CPUs are picked from NUMA nodes with non-decreasing distances to the
original node;
- only online CPUs are enumerated;
- the macro enumerates each online CPU only once;
- enumeration order is consistent with cpumask_local_spread().
The last check relies on implementation-defined behavior. If
cpumask_local_spread() or for_each_numa_cpu() changes in the future, the
subtest may need to be adjusted or even removed, as appropriate.
The subtest is useful for now because some architectures don't implement
numa_distance(), and the generic implementation only distinguishes local and
remote nodes, which doesn't allow for_each_numa_cpu() to be tested properly.
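
For illustration only, the first three checks can be sketched in userspace C
as below. This is not the kernel test itself: the topology tables, the
check_numa_order() helper and its return codes are all hypothetical, and the
enumeration order is passed in as a plain array instead of being produced by
for_each_numa_cpu(). The comparison against cpumask_local_spread() is not
modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  8
#define NR_NODES 3

/* Hypothetical topology, made up for this sketch. */
const int cpu_to_node[NR_CPUS] = { 0, 0, 1, 1, 2, 2, 0, 1 };
const bool cpu_online[NR_CPUS] = {
	true, true, true, false, true, true, true, true
};
const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

/*
 * Validate an enumeration order of n CPUs, starting from 'node'.
 * Hypothetical return codes:
 *   0  order is valid
 *  -1  an offline or out-of-range CPU was enumerated
 *  -2  a CPU was enumerated more than once
 *  -3  distance to 'node' decreased along the order
 *  -4  the number of enumerated CPUs differs from the online count
 */
int check_numa_order(int node, const int *order, int n)
{
	bool seen[NR_CPUS] = { false };
	int prev = 0, online = 0, i;

	for (i = 0; i < NR_CPUS; i++)
		online += cpu_online[i];

	for (i = 0; i < n; i++) {
		int cpu = order[i];
		int dist;

		if (cpu < 0 || cpu >= NR_CPUS || !cpu_online[cpu])
			return -1;
		if (seen[cpu])
			return -2;
		seen[cpu] = true;

		dist = node_distance[node][cpu_to_node[cpu]];
		if (dist < prev)
			return -3;
		prev = dist;
	}

	return n == online ? 0 : -4;
}

/* Sample orders relative to node 0; CPU 3 is offline in this topology. */
const int good_order[]    = { 0, 1, 6, 2, 7, 4, 5 };
const int bad_distance[]  = { 0, 1, 2, 6, 7, 4, 5 };
const int bad_offline[]   = { 0, 1, 6, 2, 3, 7, 4 };
const int bad_duplicate[] = { 0, 1, 1, 2, 7, 4, 5 };
const int bad_missing[]   = { 0, 1, 6, 2, 7, 4 };
```

With the sample data, good_order passes all checks, while each of the other
arrays violates exactly one of them.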