Optimize topology_span_sane() by removing duplicate comparisons.
Since topology_span_sane() is called inside of for_each_cpu(), each
previous CPU has already been compared against every other CPU. The
current CPU only needs to be compared against higher-numbered CPUs.
The total number of comparisons is reduced from N * (N - 1) to
N * (N - 1) / 2 on each non-NUMA scheduling domain level.