Ashok Raj [Sat, 19 Nov 2016 08:32:45 +0000 (00:32 -0800)]
PCI: pciehp: Prioritize data-link event over presence detect
If Slot Status indicates changes in both Data Link Layer Status and
Presence Detect, prioritize the Link status change.
When both events are observed, pciehp currently relies on the Slot Status
Presence Detect State (PDS) to agree with the Link Status Data Link Layer
Active status. The Presence Detect State, however, may be set to 1 through
out-of-band presence detect even if the link is down, which creates
conflicting events.
Since the Link Status accurately reflects the reachability of the
downstream bus, the Link Status event should take precedence over a
Presence Detect event. Skip checking the PDC status if we handled a link
event in the same handler.
Ashok Raj [Sat, 19 Nov 2016 08:32:46 +0000 (00:32 -0800)]
PCI: pciehp: Leave power indicator on when enabling already-enabled slot
If an error occurs when enabling a slot, pciehp_power_thread() turns off
the power indicator. But if the only error is that the slot was already
enabled, we should leave the power indicator on.
Return success if called to enable an already-enabled slot.
This is in the same spirit of the special handling for EEXISTS when
pciehp_configure_device() determines the slot devices already exist.
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34b88a68f26a75e4fded796f1a49c40f82234b7d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Waiman Long [Tue, 13 Dec 2016 01:52:45 +0000 (12:52 +1100)]
signals: avoid unnecessary taking of sighand->siglock
When running certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.
Looking further into the swapcontext function in glibc, it was found that
the function always call sigprocmask() without checking if there are
changes in the signal mask.
A check was added to the __set_current_blocked() function to avoid taking
the sighand->siglock spinlock if there is no change in the signal mask.
This will prevent unneeded spinlock contention when many threads are
trying to call sigprocmask().
With this patch applied, the spinlock contention in sigprocmask() was
gone.
Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stas Sergeev <stsp@list.ru> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c7be96af89d4b53211862d8599b2430e8900ed92)
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Dan Duval <dan.duval@oracle.com>
(cherry picked from commit 43761473c254b45883a64441dd0bc85a42f3645c)
Conflict:
kernel/auditsc.c
Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.
The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:
(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
$2 = 30500568904943
That's 14 chars plus NUL, not 11 chars plus NUL.
Expand the buffer to 16 chars.
I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.
Reported-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ondrej Kozina <okozina@redhat.com>
cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 03dab869b7b239c4e013ec82aea22e181e441cfc) Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ebf5926a001fd0d0d3fdfc2c39c0c8ff0c846c1a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When run
make -C tools DESTDIR=/my/nice/dir turbostat_install
get a message
install: cannot create regular file '/usr/bin/turbostat': Permission denied
Allow user to alter DESTDIR and PREFIX variables.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ac485cb4b3b79fb7ef979cb243ee16e1a1ffa767) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval
When we see (old > new), print ***.**% instead
of a bogus huge number.
Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9185e988e9d5bb70b690362e84bb2e4a9d71f2c5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cdc57272ea0a0e952c4609b56e157e4d0ec8e956) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec53e594c65ab099ca784d62b6f4c191e3a4d7cc) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e8efbc80db5e824ce2382d5e65429b6b493e71e2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e4085d543e256aff6606ba99ed257f7c06685f3b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5a63426e2a18775ed05b20e3bc90c68bacb1f68a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The CPUID.SGX bit was printed, even if --debug was used
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8ae7225591fd15aac89769cbebb3b5ecc8b12fe5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF
MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF
And the same modification to MSR_CONFIG_TDP_LEVEL_2.
MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 685b535b2cdb9cdf354321f8af9ed17dcf19d19f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6c34f160df82ae07bfff5b4c51f50622e9c2759e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().
syntax only, no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 5aea2f7f645b27635b856311dee5b775d277c686) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
SGX presence is related to a SKL power workaround,
so lets show when that is enabled.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit aa8d8cc79af16e16da04efff1c1a72b1ea4a9e7e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.
When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.
A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.
For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 0102b06747c7d24e334d2b27c4b43eed693676f1) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.
This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit fdf676e51f301d207586d9bac509b8ce055bae8a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 27d47356b6dfa92042a17a0b474f08910d4c8e8f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.
The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 562a2d377bb9882c49debc9e1be7127a1717e242) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.
The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 36229897ba966bb0dc9e060222ff17b198252367) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 58cc30a4e6086d542302e04eea39b2a438e4c5dd) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c
Turbostat --debug gconfiguration info goes to stderr.
In FORK mode, turbostat statistics go to stderr.
In PERIODIC mode, turbostat statistics go to stdout.
These defaults do not change, but an option "--out file"
will send all output above only to the specified file.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b7d8c1483bbf6ec9d2dd76d6a1c91a38c3f6ac35) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
some tools processing turbostat output
have difficulty with items that begin with %...
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75d2e44e60490ba1fee076a5f4dcfbdc8598e8c4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nahlem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit cbf97abaf3689652bcddc0741dc49629d1838142) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.
prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 121b48bb77187cf2ed3053e147d2e6c1e864083c) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 2a0609c02e6558df6075f258af98a54a74b050ff) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:
turbostat.c: In function ‘dump_cstate_pstate_config_info’:
turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
dump_cstate_pstate_config_info(family, model)
turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
turbostat.c: In function ‘get_tdp’:
turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
double get_tdp(model)
turbostat.c: In function ‘perf_limit_reasons_probe’:
turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
void perf_limit_reasons_probe(family, model)
turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’
Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit 1b69317d2dc80bc8e1d005e1a771c4f5bff3dabe) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
This MSR is helpful to show if P-state HW coordination
is enabled or disabled.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit f0057310b40efe9f797ff337e9464e6a6fb9d782) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 7f5c258e1ce1e0909d3195694ac79f143051e513) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 61a87ba7893a256d86c7eea6a7ab10d38ccac9b2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 69807a638f91524ed75027f808cd277417ecee7a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO
no functional change
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec0adc539b8bf59b7c00db0748671f6594b77843) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 759d2a932b82009a7039ef5567e7dcba153ce123) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.
This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each caculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.
After this change, tsc_tweak is used only for %Busy.
The end-result should be a Bzy_MHz result slightly less prone to jitter.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 21ed5574d1622118b49b0c6342acc8d27d0799be) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.
This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)
This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.
Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.
AVG_MHz = APERF_delta/measurement_interval
need no adjustment. APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.
TSC_MHz = TSC_delta/measurement_interval
needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.
%Busy = MPERF_delta/TSC_delta
needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz
Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a2b7b74945dbfe5d734eafe8aa52f9f1f8bc6931) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.
However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:
AVG_MHz = APERF_delta/measurement_interval
%Busy = MPERF_delta/TSC_delta
This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b2b34dfe4d9aa4c468fc363b3b666974783ed1f9) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Staring in Linux-4.3-rc1,
commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:
Remove IVB-Xeon from the list of machines supporting with that MSR.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 756357b8e4b072fd5ee86421f794e071a348802b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Len Brown [Fri, 24 Jul 2015 14:35:23 +0000 (10:35 -0400)]
tools/power turbostat: fix typo on DRAM column in Joules-mode
< RAM_W
> RAM_J
Reported-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bd6906ed3d7a00d55c9bd368a640ef83bb487d1d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.
Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a01e72fbc41e322ed229465de8b595a7e3ec6549) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.
This can have an effect on the operation of the part,
particularly in determiniing...
Max Non-turbo Ratio
Turbo Activation Ratio
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6fb3143b561c4a7865e5513eeb02d42ef38e8173) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".
Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bfae2052265cde825afaba35eb3a4d3889432734) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Remove reference to the original Nehalem Turbo white paper,
since it has moved, and these mechanisms have now long since
been documented in the Software Developer's Manual.
Reported-by: Jeremie Lagraviere <jeremie@simula.no> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75fd7ffa7fab91c2c3234bd3e465ba4f366733f4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Test the cipher suite initialization in case ICV length has a value
different than its default. If this test fails, creation of a new macsec
link will also fail. This avoids situations where further security
associations can't be added due to failures of crypto_aead_setauthsize(),
caused by unsupported user-provided values of the ICV length.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f04c392d2dd97a985878f4380a1b054791301acf)
preserve the return value of AEAD functions that are called when a SA is
created, to avoid inappropriate display of "RTNETLINK answers: Cannot
allocate memory" message.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34aedfee22967236adc3d9147c8b47b7f5bad26c)
IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
definitions accordingly, and avoid accepting configurations where the ICV
length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
unchanged for backwards compatibility with userspace programs.
Fixes: dece8d2b78d1 ("uapi: add MACsec bits") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2ccbe2cb79f2f74ab739252299b6f9ff27586f2c)
Paolo Abeni [Thu, 14 Jul 2016 16:00:10 +0000 (18:00 +0200)]
vlan: use a valid default mtu value for vlan over macsec
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18d3df3eab23796d7f852f9c6bb60962b8372ced)
Sabrina Dubroca [Wed, 18 May 2016 11:34:40 +0000 (13:34 +0200)]
macsec: fix netlink attribute for key id
In my last commit I replaced MACSEC_SA_ATTR_KEYID by
MACSEC_SA_ATTR_KEY.
Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1968a0b8b6ca088bc029bd99ee696f1aca4090d0)
Sabrina Dubroca [Sat, 7 May 2016 18:19:29 +0000 (20:19 +0200)]
macsec: key identifier is 128 bits, not 64
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.
IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8acca6acebd07b238af2e61e4f7d55e6232c7e3a)
Daniel Borkmann [Thu, 30 Jun 2016 22:00:54 +0000 (00:00 +0200)]
macsec: set actual real device for xmit when !protect_frames
Avoid recursions of dev_queue_xmit() to the wrong net device when
frames are unprotected, since at that time skb->dev still points to
our own macsec dev and unlike macsec_encrypt_finish() dev pointer
doesn't get updated to real underlying device.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 79c62220d74a4a3f961a2cb7320da09eebf5daf7)
Sabrina Dubroca [Tue, 14 Jun 2016 13:25:15 +0000 (15:25 +0200)]
macsec: allocate sg and iv on the heap
For the crypto callbacks to work properly, we cannot have sg and iv on
the stack. Use kmalloc instead, with a single allocation for
aead_request + scatterlist + iv.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5d9649b3a524df9ccd15ac25ad6591089260fdbd)
Phil Sutter [Fri, 22 Apr 2016 12:02:42 +0000 (14:02 +0200)]
macsec: Convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e425974feaa545575135f04e646f0495439b4c54)
macsec_validate_attr should check IFLA_MACSEC_REPLAY_PROTECT (not
IFLA_MACSEC_PROTECT) to verify that the replay protection and replay
window arguments are correct.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4b1fb9352f351faa067a914907d58a6fe38ac048)
I accidentally forgot some MACSEC_ prefixes in if_macsec.h.
Fixes: dece8d2b78d1 ("uapi: add MACsec bits") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 748164802c1bd2c52937d20782b07d8c68dd9a4f)
macsec: fix memory leaks around rx_handler (un)registration
We leak a struct macsec_rxh_data when we unregister the rx_handler in
macsec_dellink.
We also leak a struct macsec_rxh_data in register_macsec_dev if we fail
to register the rx_handler.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 960d5848dbf1245cc3a310109897937207411c0c)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 96cfc5052c5d434563873caa7707b32b9e389b16)
macsec: fix rx_sa refcounting with decrypt callback
The decrypt callback macsec_decrypt_done needs a reference on the rx_sa
and releases it before returning, but macsec_handle_frame already
put that reference after macsec_decrypt returned NULL.
Set rx_sa to NULL when the decrypt callback runs so that
macsec_handle_frame knows it must not release the reference.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3b7d0bd7ac2c501d4806db71ddd383c184968e8)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c10c63ea739bce3b8a6ab85c7bb472d900c9b070)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72f2a05b8f367ee0d75584a6fbec7dbe7c144f27)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:33 +0000 (18:07 +0100)]
macsec: introduce IEEE 802.1AE driver
This is an implementation of MACsec/IEEE 802.1AE. This driver
provides authentication and encryption of traffic in a LAN, typically
with GCM-AES-128, and optional replay protection.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c09440f7dcb304002dfced8c0fea289eb25f2da0)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:32 +0000 (18:07 +0100)]
net: add MACsec netdevice priv_flags and helper
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c17578473b9be5a6e7680a45ea97e1d56e13249)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:31 +0000 (18:07 +0100)]
uapi: add MACsec bits
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dece8d2b78d19df7fe5e4e965f1f0d1a3e188d1b)
Phil Sutter [Thu, 13 Aug 2015 17:01:06 +0000 (19:01 +0200)]
net: declare new net_device priv_flag IFF_NO_QUEUE
This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.
Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fa8187c96471c49419c25d4ec3299d17d3f274b2)
Herbert Xu [Thu, 21 May 2015 07:11:01 +0000 (15:11 +0800)]
crypto: aead - Add new interface with single SG list
The primary user of AEAD, IPsec includes the IV in the AD in
most cases, except where it is implicitly authenticated by the
underlying algorithm.
The way it is currently implemented is a hack because we pass
the data in piecemeal and the underlying algorithms try to stitch
them back up into one piece.
This is why this patch is adding a new interface that allows a
single SG list to be passed in that contains everything so the
algorithm implementors do not have to stitch.
The new interface accepts a single source SG list and a single
destination SG list. Both must be laid out as follows:
AD, skipped data, plain/cipher text, ICV
The ICV is not present from the source during encryption and from
the destination during decryption.
For the top-level IPsec AEAD algorithm the plain/cipher text will
contain the generated (or received) IV.
Herbert Xu [Thu, 21 May 2015 07:10:59 +0000 (15:10 +0800)]
crypto: scatterwalk - Add scatterwalk_ffwd helper
This patch adds the scatterwalk_ffwd helper which can create an
SG list that starts in the middle of an existing SG list. The
new list may either be part of the existing list or be a chain
that latches onto part of the existing list.
Herbert Xu [Mon, 11 May 2015 09:47:39 +0000 (17:47 +0800)]
crypto: api - Add crypto_grab_spawn primitive
This patch adds a new primitive crypto_grab_spawn which is meant
to replace crypto_init_spawn and crypto_init_spawn2. Under the
new scheme the user no longer has to worry about reference counting
the alg object before it is subsumed by the spawn.
It is pretty much an exact copy of crypto_grab_aead.
Prior to calling this function spawn->frontend and spawn->inst
must have been set.
Herbert Xu [Thu, 23 Apr 2015 08:37:46 +0000 (16:37 +0800)]
crypto: aead - Fix corner case in crypto_lookup_aead
When the user explicitly states that they don't care whether the
algorithm has been tested (type = CRYPTO_ALG_TESTED and mask = 0),
there is a corner case where we may erroneously return ENOENT.
This patch fixes it by correcting the logic in the test.
Herbert Xu [Thu, 23 Apr 2015 06:48:05 +0000 (14:48 +0800)]
crypto: api - Fix build error when modules are disabled
The commit 59afdc7b32143528524455039e7557a46b60e4c8 ("crypto:
api - Move module sig ifdef into accessor function") broke the
build when modules are completely disabled because we directly
dereference module->name.
This patch fixes this by using the accessor function module_name.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bd4a7c69aaed79ae1a299db8063fe4daf5e4a2f1)
Herbert Xu [Wed, 22 Apr 2015 07:06:33 +0000 (15:06 +0800)]
mac802154: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
This patch also removes a bogus inclusion of algapi.h which should
only be used by algorithm/driver implementors and not crypto users.
Instead linux/crypto.h is added which is necessary because mac802154
also uses blkcipher in addition to aead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8a1a2b717e0d4d5f3e3bb59b7dee5079a15ab24b)
Herbert Xu [Wed, 22 Apr 2015 07:06:32 +0000 (15:06 +0800)]
mac80211: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d8fe0ddd0256711e9cf2c539e77c341daa0eb2cf)
Herbert Xu [Wed, 22 Apr 2015 07:06:31 +0000 (15:06 +0800)]
crypto: testmgr - Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce3311580515268504db929c3219e57927b9e6a)
Herbert Xu [Wed, 22 Apr 2015 07:06:30 +0000 (15:06 +0800)]
crypto: tcrypt - Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce5a04d9f3e8f1f25aa1d6da8612bfeeca7b306)
Herbert Xu [Wed, 22 Apr 2015 03:28:46 +0000 (11:28 +0800)]
crypto: api - Move module sig ifdef into accessor function
Currently we're hiding mod->sig_ok under an ifdef in open code.
This patch adds a module_sig_ok accessor function and removes that
ifdef.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from commit 59afdc7b32143528524455039e7557a46b60e4c8)
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>