Currently we loop through all threads of a core to evaluate if the core is
idle or not. This is unnecessary. If a thread of a core is not idle, skip
evaluating other threads of a core. Also while clearing the cpumask, bits
of all CPUs of a core can be cleared in one-shot.
Collecting ticks on a Power 9 SMT 8 system around select_idle_core
while running schbench shows us
(units are in ticks, hence lesser is better)
Without patch
    N        Min     Max     Median         Avg      Stddev
x 130        151    1083        284   322.72308   144.41494
With patch
    N        Min     Max     Median         Avg      Stddev   Improvement
x 164         88     610        201   225.79268   106.78943        30.03%
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20191206172422.6578-1-srikar@linux.vnet.ibm.com
 
                bool idle = true;
 
                for_each_cpu(cpu, cpu_smt_mask(core)) {
-                       __cpumask_clear_cpu(cpu, cpus);
-                       if (!available_idle_cpu(cpu))
+                       if (!available_idle_cpu(cpu)) {
                                idle = false;
+                               break;
+                       }
                }
+               cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
 
                if (idle)
                        return core;