fine-tune the HT sched-domains parameters as well.
On an HT-capable box, this reduces lat_ctx context-switch latency from
23.87 usecs to 1.49 usecs (roughly a 16x improvement):
 # before
 $ ./lat_ctx -s 0 2
   "size=0k ovr=1.89
    2 23.87
 # after
 $ ./lat_ctx -s 0 2
   "size=0k ovr=1.84
    2 1.49
Signed-off-by: Ingo Molnar <mingo@elte.hu>
                                | SD_BALANCE_FORK       \
                                | SD_BALANCE_EXEC       \
                                | SD_WAKE_AFFINE        \
-                               | SD_WAKE_IDLE          \
+                               | SD_WAKE_BALANCE       \
                                | SD_SHARE_CPUPOWER,    \
        .last_balance           = jiffies,              \
        .balance_interval       = 1,                    \