for_each_node_with_cpus() calls nr_cpus_node() at every iteration, which
makes it O(N^2). Kernel tracks such nodes with N_CPU record in node_states
array. Switching to it makes for_each_node_with_cpus() O(N).
Andrea:
Now we can include also offline nodes with CPUs assigned (assuming it's
possible). If checking the online state is required, the user must use
node_online() within the loop.