From: Dongli Zhang
Date: Thu, 7 Mar 2019 03:19:32 +0000 (+0800)
Subject: jiffies: use jiffies64_to_nsecs() to fix 100% steal usage for xen vcpu hotplug
X-Git-Tag: v4.1.12-124.31.3~248
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=8e87dd822e2c248c4df1c2b07eec3ba952068ec3;p=users%2Fjedix%2Flinux-maple.git

jiffies: use jiffies64_to_nsecs() to fix 100% steal usage for xen vcpu hotplug

[ Not relevant upstream, therefore no upstream commit. ]

To fix, use jiffies64_to_nsecs() directly instead of deriving the result
from jiffies_to_usecs().

The return type of jiffies_to_usecs() is 'unsigned int', so when the
result exceeds the range of 'unsigned int', the upper 32 bits are
discarded.

Suppose USEC_PER_SEC=1000000L and HZ=1000. Below are the expected and the
actual (incorrect) results of jiffies_to_usecs(0x7770ef70):

- expected  : jiffies_to_usecs(0x7770ef70) = 0x000001d291274d80
- incorrect : jiffies_to_usecs(0x7770ef70) = 0x0000000091274d80

The leading 0x000001d200000000 is discarded.

After xen vcpu hotplug, when the steal clock of the new vcpu is calculated
for the first time, this_rq()->prev_steal_time in
steal_account_process_tick() ends up far smaller than the expected value,
because jiffies_to_usecs() discards the upper 32 bits.

As a result, the diff between the current steal and
this_rq()->prev_steal_time is always very large. Steal usage becomes 100%
when the initial steal clock obtained from the xen hypervisor is very
large during xen vcpu hotplug, that is, when the guest has already been up
for a long time.

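The truncation described above can be reproduced outside the kernel. The
following is a minimal user-space sketch, not the kernel code: every
model_* function and MODEL_* macro is a hypothetical stand-in, it assumes
a 32-bit 'unsigned int' and HZ=1000 as in the example, and it assumes that
jiffies64_to_nsecs() reduces to a plain multiplication when NSEC_PER_SEC
is a multiple of HZ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration, matching the example in the commit message. */
#define MODEL_HZ            1000ULL
#define MODEL_USEC_PER_SEC  1000000ULL
#define MODEL_NSEC_PER_USEC 1000ULL
#define MODEL_NSEC_PER_SEC  1000000000ULL

/*
 * Hypothetical stand-in for jiffies_to_usecs(): the 64-bit product is
 * narrowed to 'unsigned int' on return, which drops the upper 32 bits
 * (assuming a 32-bit 'unsigned int').
 */
unsigned int model_jiffies_to_usecs(uint64_t j)
{
	return (unsigned int)(j * (MODEL_USEC_PER_SEC / MODEL_HZ));
}

/* Old jiffies_to_nsecs(): scales the already truncated usecs value. */
uint64_t model_old_jiffies_to_nsecs(uint64_t j)
{
	return (uint64_t)model_jiffies_to_usecs(j) * MODEL_NSEC_PER_USEC;
}

/* New jiffies_to_nsecs(): stays in 64 bits for the whole conversion. */
uint64_t model_new_jiffies_to_nsecs(uint64_t j)
{
	return j * (MODEL_NSEC_PER_SEC / MODEL_HZ);
}

/* Checks the values quoted in the commit message for j = 0x7770ef70. */
void model_check_commit_example(void)
{
	const uint64_t j = 0x7770ef70ULL;

	/* Full 64-bit microsecond value (the "expected" line). */
	assert(j * (MODEL_USEC_PER_SEC / MODEL_HZ) == 0x000001d291274d80ULL);
	/* Truncated value actually returned (the "incorrect" line). */
	assert(model_jiffies_to_usecs(j) == 0x91274d80U);
	/* The lost upper bits, scaled to nanoseconds, are the error. */
	assert(model_new_jiffies_to_nsecs(j) - model_old_jiffies_to_nsecs(j)
	       == 0x000001d200000000ULL * MODEL_NSEC_PER_USEC);
}
```

In this model the old conversion under-reports the time by the discarded
0x000001d200000000 microseconds, which is the kind of gap that makes
prev_steal_time start out too small on a freshly hotplugged vcpu.
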
The bug can be detected by doing the following:

* Boot xen guest with vcpus=2 and maxvcpus=4
* Leave the guest running for a month so that the initial steal clock for
  the new vcpu would be very large
* Hotplug 2 extra vcpus
* The steal time of new vcpus in /proc/stat would increase abnormally and
  sometimes steal usage in top can become 100%

This was incidentally fixed in the patch set starting with commit
93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY") and
ending with commit b672592f0221 ("sched/cputime: Remove generic asm
headers").

Orabug: 28806208

Link: https://lkml.org/lkml/2019/2/28/1373
Suggested-by: Juergen Gross
Signed-off-by: Dongli Zhang
Reviewed-by: Joe Jin
Signed-off-by: Brian Maly
---

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 1161e5cba0df..9044fd68893e 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -283,13 +283,13 @@ extern unsigned long preset_lpj;
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);
 
+extern u64 jiffies64_to_nsecs(u64 j);
+
 static inline u64 jiffies_to_nsecs(const unsigned long j)
 {
-	return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
+	return jiffies64_to_nsecs(j);
 }
 
-extern u64 jiffies64_to_nsecs(u64 j);
-
 extern unsigned long msecs_to_jiffies(const unsigned int m);
 extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);