Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval
When we see (old > new), print ***.**% instead
of a bogus huge number.
Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9185e988e9d5bb70b690362e84bb2e4a9d71f2c5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cdc57272ea0a0e952c4609b56e157e4d0ec8e956) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec53e594c65ab099ca784d62b6f4c191e3a4d7cc) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e8efbc80db5e824ce2382d5e65429b6b493e71e2) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e4085d543e256aff6606ba99ed257f7c06685f3b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5a63426e2a18775ed05b20e3bc90c68bacb1f68a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The CPUID.SGX bit was printed, even if --debug was used
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8ae7225591fd15aac89769cbebb3b5ecc8b12fe5) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF
MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF
And the same modification to MSR_CONFIG_TDP_LEVEL_2.
MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 685b535b2cdb9cdf354321f8af9ed17dcf19d19f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6c34f160df82ae07bfff5b4c51f50622e9c2759e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().
syntax only, no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 5aea2f7f645b27635b856311dee5b775d277c686) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
SGX presence is related to a SKL power workaround,
so lets show when that is enabled.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit aa8d8cc79af16e16da04efff1c1a72b1ea4a9e7e) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.
When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.
A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.
For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 0102b06747c7d24e334d2b27c4b43eed693676f1) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.
This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit fdf676e51f301d207586d9bac509b8ce055bae8a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 27d47356b6dfa92042a17a0b474f08910d4c8e8f) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.
The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 562a2d377bb9882c49debc9e1be7127a1717e242) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.
The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 36229897ba966bb0dc9e060222ff17b198252367) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 58cc30a4e6086d542302e04eea39b2a438e4c5dd) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c
Turbostat --debug gconfiguration info goes to stderr.
In FORK mode, turbostat statistics go to stderr.
In PERIODIC mode, turbostat statistics go to stdout.
These defaults do not change, but an option "--out file"
will send all output above only to the specified file.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b7d8c1483bbf6ec9d2dd76d6a1c91a38c3f6ac35) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
some tools processing turbostat output
have difficulty with items that begin with %...
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75d2e44e60490ba1fee076a5f4dcfbdc8598e8c4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nahlem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit cbf97abaf3689652bcddc0741dc49629d1838142) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.
prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 121b48bb77187cf2ed3053e147d2e6c1e864083c) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 2a0609c02e6558df6075f258af98a54a74b050ff) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:
turbostat.c: In function ‘dump_cstate_pstate_config_info’:
turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
dump_cstate_pstate_config_info(family, model)
turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
turbostat.c: In function ‘get_tdp’:
turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
double get_tdp(model)
turbostat.c: In function ‘perf_limit_reasons_probe’:
turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
void perf_limit_reasons_probe(family, model)
turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’
Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit 1b69317d2dc80bc8e1d005e1a771c4f5bff3dabe) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
This MSR is helpful to show if P-state HW coordination
is enabled or disabled.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit f0057310b40efe9f797ff337e9464e6a6fb9d782) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 7f5c258e1ce1e0909d3195694ac79f143051e513) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 61a87ba7893a256d86c7eea6a7ab10d38ccac9b2) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 69807a638f91524ed75027f808cd277417ecee7a) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO
no functional change
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec0adc539b8bf59b7c00db0748671f6594b77843) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 759d2a932b82009a7039ef5567e7dcba153ce123) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.
This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each caculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.
After this change, tsc_tweak is used only for %Busy.
The end-result should be a Bzy_MHz result slightly less prone to jitter.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 21ed5574d1622118b49b0c6342acc8d27d0799be) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.
This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)
This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.
Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.
AVG_MHz = APERF_delta/measurement_interval
need no adjustment. APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.
TSC_MHz = TSC_delta/measurement_interval
needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.
%Busy = MPERF_delta/TSC_delta
needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz
Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a2b7b74945dbfe5d734eafe8aa52f9f1f8bc6931) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.
However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:
AVG_MHz = APERF_delta/measurement_interval
%Busy = MPERF_delta/TSC_delta
This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b2b34dfe4d9aa4c468fc363b3b666974783ed1f9) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Staring in Linux-4.3-rc1,
commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:
Remove IVB-Xeon from the list of machines supporting with that MSR.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 756357b8e4b072fd5ee86421f794e071a348802b) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Len Brown [Fri, 24 Jul 2015 14:35:23 +0000 (10:35 -0400)]
tools/power turbostat: fix typo on DRAM column in Joules-mode
< RAM_W
> RAM_J
Reported-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bd6906ed3d7a00d55c9bd368a640ef83bb487d1d) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.
Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a01e72fbc41e322ed229465de8b595a7e3ec6549) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.
This can have an effect on the operation of the part,
particularly in determiniing...
Max Non-turbo Ratio
Turbo Activation Ratio
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6fb3143b561c4a7865e5513eeb02d42ef38e8173) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".
Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bfae2052265cde825afaba35eb3a4d3889432734) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Remove reference to the original Nehalem Turbo white paper,
since it has moved, and these mechanisms have now long since
been documented in the Software Developer's Manual.
Reported-by: Jeremie Lagraviere <jeremie@simula.no> Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75fd7ffa7fab91c2c3234bd3e465ba4f366733f4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Test the cipher suite initialization in case ICV length has a value
different than its default. If this test fails, creation of a new macsec
link will also fail. This avoids situations where further security
associations can't be added due to failures of crypto_aead_setauthsize(),
caused by unsupported user-provided values of the ICV length.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f04c392d2dd97a985878f4380a1b054791301acf)
preserve the return value of AEAD functions that are called when a SA is
created, to avoid inappropriate display of "RTNETLINK answers: Cannot
allocate memory" message.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34aedfee22967236adc3d9147c8b47b7f5bad26c)
IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
definitions accordingly, and avoid accepting configurations where the ICV
length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
unchanged for backwards compatibility with userspace programs.
Fixes: dece8d2b78d1 ("uapi: add MACsec bits") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2ccbe2cb79f2f74ab739252299b6f9ff27586f2c)
Paolo Abeni [Thu, 14 Jul 2016 16:00:10 +0000 (18:00 +0200)]
vlan: use a valid default mtu value for vlan over macsec
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18d3df3eab23796d7f852f9c6bb60962b8372ced)
Sabrina Dubroca [Wed, 18 May 2016 11:34:40 +0000 (13:34 +0200)]
macsec: fix netlink attribute for key id
In my last commit I replaced MACSEC_SA_ATTR_KEYID by
MACSEC_SA_ATTR_KEY.
Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1968a0b8b6ca088bc029bd99ee696f1aca4090d0)
Sabrina Dubroca [Sat, 7 May 2016 18:19:29 +0000 (20:19 +0200)]
macsec: key identifier is 128 bits, not 64
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.
IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8acca6acebd07b238af2e61e4f7d55e6232c7e3a)
Daniel Borkmann [Thu, 30 Jun 2016 22:00:54 +0000 (00:00 +0200)]
macsec: set actual real device for xmit when !protect_frames
Avoid recursions of dev_queue_xmit() to the wrong net device when
frames are unprotected, since at that time skb->dev still points to
our own macsec dev and unlike macsec_encrypt_finish() dev pointer
doesn't get updated to real underlying device.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 79c62220d74a4a3f961a2cb7320da09eebf5daf7)
Sabrina Dubroca [Tue, 14 Jun 2016 13:25:15 +0000 (15:25 +0200)]
macsec: allocate sg and iv on the heap
For the crypto callbacks to work properly, we cannot have sg and iv on
the stack. Use kmalloc instead, with a single allocation for
aead_request + scatterlist + iv.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5d9649b3a524df9ccd15ac25ad6591089260fdbd)
Phil Sutter [Fri, 22 Apr 2016 12:02:42 +0000 (14:02 +0200)]
macsec: Convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e425974feaa545575135f04e646f0495439b4c54)
macsec_validate_attr should check IFLA_MACSEC_REPLAY_PROTECT (not
IFLA_MACSEC_PROTECT) to verify that the replay protection and replay
window arguments are correct.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4b1fb9352f351faa067a914907d58a6fe38ac048)
I accidentally forgot some MACSEC_ prefixes in if_macsec.h.
Fixes: dece8d2b78d1 ("uapi: add MACsec bits") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 748164802c1bd2c52937d20782b07d8c68dd9a4f)
macsec: fix memory leaks around rx_handler (un)registration
We leak a struct macsec_rxh_data when we unregister the rx_handler in
macsec_dellink.
We also leak a struct macsec_rxh_data in register_macsec_dev if we fail
to register the rx_handler.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 960d5848dbf1245cc3a310109897937207411c0c)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 96cfc5052c5d434563873caa7707b32b9e389b16)
macsec: fix rx_sa refcounting with decrypt callback
The decrypt callback macsec_decrypt_done needs a reference on the rx_sa
and releases it before returning, but macsec_handle_frame already
put that reference after macsec_decrypt returned NULL.
Set rx_sa to NULL when the decrypt callback runs so that
macsec_handle_frame knows it must not release the reference.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3b7d0bd7ac2c501d4806db71ddd383c184968e8)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c10c63ea739bce3b8a6ab85c7bb472d900c9b070)
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72f2a05b8f367ee0d75584a6fbec7dbe7c144f27)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:33 +0000 (18:07 +0100)]
macsec: introduce IEEE 802.1AE driver
This is an implementation of MACsec/IEEE 802.1AE. This driver
provides authentication and encryption of traffic in a LAN, typically
with GCM-AES-128, and optional replay protection.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c09440f7dcb304002dfced8c0fea289eb25f2da0)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:32 +0000 (18:07 +0100)]
net: add MACsec netdevice priv_flags and helper
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c17578473b9be5a6e7680a45ea97e1d56e13249)
Sabrina Dubroca [Fri, 11 Mar 2016 17:07:31 +0000 (18:07 +0100)]
uapi: add MACsec bits
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dece8d2b78d19df7fe5e4e965f1f0d1a3e188d1b)
Phil Sutter [Thu, 13 Aug 2015 17:01:06 +0000 (19:01 +0200)]
net: declare new net_device priv_flag IFF_NO_QUEUE
This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.
Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fa8187c96471c49419c25d4ec3299d17d3f274b2)
Herbert Xu [Thu, 21 May 2015 07:11:01 +0000 (15:11 +0800)]
crypto: aead - Add new interface with single SG list
The primary user of AEAD, IPsec includes the IV in the AD in
most cases, except where it is implicitly authenticated by the
underlying algorithm.
The way it is currently implemented is a hack because we pass
the data in piecemeal and the underlying algorithms try to stitch
them back up into one piece.
This is why this patch is adding a new interface that allows a
single SG list to be passed in that contains everything so the
algorithm implementors do not have to stitch.
The new interface accepts a single source SG list and a single
destination SG list. Both must be laid out as follows:
AD, skipped data, plain/cipher text, ICV
The ICV is not present from the source during encryption and from
the destination during decryption.
For the top-level IPsec AEAD algorithm the plain/cipher text will
contain the generated (or received) IV.
Herbert Xu [Thu, 21 May 2015 07:10:59 +0000 (15:10 +0800)]
crypto: scatterwalk - Add scatterwalk_ffwd helper
This patch adds the scatterwalk_ffwd helper which can create an
SG list that starts in the middle of an existing SG list. The
new list may either be part of the existing list or be a chain
that latches onto part of the existing list.
Herbert Xu [Mon, 11 May 2015 09:47:39 +0000 (17:47 +0800)]
crypto: api - Add crypto_grab_spawn primitive
This patch adds a new primitive crypto_grab_spawn which is meant
to replace crypto_init_spawn and crypto_init_spawn2. Under the
new scheme the user no longer has to worry about reference counting
the alg object before it is subsumed by the spawn.
It is pretty much an exact copy of crypto_grab_aead.
Prior to calling this function spawn->frontend and spawn->inst
must have been set.
Herbert Xu [Thu, 23 Apr 2015 08:37:46 +0000 (16:37 +0800)]
crypto: aead - Fix corner case in crypto_lookup_aead
When the user explicitly states that they don't care whether the
algorithm has been tested (type = CRYPTO_ALG_TESTED and mask = 0),
there is a corner case where we may erroneously return ENOENT.
This patch fixes it by correcting the logic in the test.
Herbert Xu [Thu, 23 Apr 2015 06:48:05 +0000 (14:48 +0800)]
crypto: api - Fix build error when modules are disabled
The commit 59afdc7b32143528524455039e7557a46b60e4c8 ("crypto:
api - Move module sig ifdef into accessor function") broke the
build when modules are completely disabled because we directly
dereference module->name.
This patch fixes this by using the accessor function module_name.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bd4a7c69aaed79ae1a299db8063fe4daf5e4a2f1)
Herbert Xu [Wed, 22 Apr 2015 07:06:33 +0000 (15:06 +0800)]
mac802154: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
This patch also removes a bogus inclusion of algapi.h which should
only be used by algorithm/driver implementors and not crypto users.
Instead linux/crypto.h is added which is necessary because mac802154
also uses blkcipher in addition to aead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8a1a2b717e0d4d5f3e3bb59b7dee5079a15ab24b)
Herbert Xu [Wed, 22 Apr 2015 07:06:32 +0000 (15:06 +0800)]
mac80211: Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d8fe0ddd0256711e9cf2c539e77c341daa0eb2cf)
Herbert Xu [Wed, 22 Apr 2015 07:06:31 +0000 (15:06 +0800)]
crypto: testmgr - Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce3311580515268504db929c3219e57927b9e6a)
Herbert Xu [Wed, 22 Apr 2015 07:06:30 +0000 (15:06 +0800)]
crypto: tcrypt - Include crypto/aead.h
All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce5a04d9f3e8f1f25aa1d6da8612bfeeca7b306)
Herbert Xu [Wed, 22 Apr 2015 03:28:46 +0000 (11:28 +0800)]
crypto: api - Move module sig ifdef into accessor function
Currently we're hiding mod->sig_ok under an ifdef in open code.
This patch adds a module_sig_ok accessor function and removes that
ifdef.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from commit 59afdc7b32143528524455039e7557a46b60e4c8)
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4) Signed-off-by: Brian Maly <brian.maly@oracle.com> Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Hugh Dickins [Fri, 11 Dec 2015 21:40:55 +0000 (13:40 -0800)]
tmpfs: fix shmem_evict_inode() warnings on i_blocks
Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).
(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently. The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.
shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache(). delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one
too low: leading to the WARNING when the object is finally evicted.
Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:"). Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.
While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after). (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether. Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.
If the ASN.1 decoder is asked to parse a sequence of objects, non-optional
matches get skipped if there's no more data to be had rather than a
data-overrun error being reported.
This is due to the code segment that decides whether to skip optional
matches (ie. matches that could get ignored because an element is marked
OPTIONAL in the grammar) due to a lack of data also skips non-optional
elements if the data pointer has reached the end of the buffer.
This can be tested with the data decoder for the new RSA akcipher algorithm
that takes three non-optional integers. Currently, it skips the last
integer if there is insufficient data.
Without the fix, #defining DEBUG in asn1_decoder.c will show something
like:
The next_op line for pc=8/13 should be followed by a match line.
This is not exploitable for X.509 certificates by means of shortening the
message and fixing up the ASN.1 CONS tags because:
(1) The relevant records being built up are cleared before use.
(2) If the message is shortened sufficiently to remove the public key, the
ASN.1 parse of the RSA key will fail quickly due to a lack of data.
(3) Extracted signature data is either turned into MPIs (which cope with a
0 length) or is simpler integers specifying algoritms and suchlike
(which can validly be 0); and
(4) The AKID and SKID extensions are optional and their removal is handled
without risking passing a NULL to asymmetric_key_generate_id().
(5) If the certificate is truncated sufficiently to remove the subject,
issuer or serialNumber then the ASN.1 decoder will fail with a 'Cons
stack underflow' return.
This is not exploitable for PKCS#7 messages by means of removal of elements
from such a message from the tail end of a sequence:
(1) Any shortened X.509 certs embedded in the PKCS#7 message are survivable
as detailed above.
(2) The message digest content isn't used if it shows a NULL pointer,
similarly, the authattrs aren't used if that shows a NULL pointer.
(3) A missing signature results in a NULL MPI - which the MPI routines deal
with.
(4) If data is NULL, it is expected that the message has detached content and
that is handled appropriately.
(5) If the serialNumber is excised, the unconditional action associated
with it will pick up the containing SEQUENCE instead, so no NULL
pointer will be seen here.
If both the issuer and the serialNumber are excised, the ASN.1 decode
will fail with an 'Unexpected tag' return.
In either case, there's no way to get to asymmetric_key_generate_id()
with a NULL pointer.
(6) Other fields are decoded to simple integers. Shortening the message
to omit an algorithm ID field will cause checks on this to fail early
in the verification process.
This can also be tested by snipping objects off of the end of the ASN.1 stream
such that mandatory tags are removed - or even from the end of internal
SEQUENCEs. If any mandatory tag is missing, the error EBADMSG *should* be
produced. Without this patch ERANGE or ENOPKG might be produced or the parse
may apparently succeed, perhaps with ENOKEY or EKEYREJECTED being produced
later, depending on what gets snipped.
Just snipping off the final BIT_STRING or OCTET_STRING from either sample
should be a start since both are mandatory and neither will cause an EBADMSG
without the patches
Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
(cherry picked from commit 0d62e9dd6da45bbf0f33a8617afc5fe774c8f45f) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
lib/asn1_decoder.c
Vlad Tsyrklevich [Wed, 12 Oct 2016 16:51:24 +0000 (18:51 +0200)]
vfio/pci: Fix integer overflows, bitmask check
The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().
Furthermore, a kzalloc is changed to a kcalloc because the use of a
kzalloc with an integer multiplication allowed an integer overflow
condition to be reached without this patch. kcalloc checks for overflow
and should prevent a similar occurrence.
Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 05692d7005a364add85c6e25a6c4447ce08f913a)
Neerav Parikh [Tue, 7 Jun 2016 16:14:55 +0000 (09:14 -0700)]
i40e: Don't notify client(s) for DCB changes on all VSIs
When LLDP/DCBX change happens the i40e driver code flow tried to
notify the client(s) for each of the PF VSIs. This resulted into
kernel panic on the first VSI that didn't have any netdev
associated to it.
The DCB change notification to the client(s) should be done only
once for the PF/LAN VSI where the client(s) instances have been
added to. Also, move the notification call after the PF driver has
made changes related to the updated DCB configuration.
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Tested-by: Ronald J Bynoe <ronald.j.bynoe@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 85a1aab79c54c7e44cb0f98e5aa797fbb0457866)
When drivers express support for TSO of encapsulated packets, they
only mean that they can do it for one layer of encapsulation.
Supporting additional levels would mean updating, at a minimum,
more IP length fields and they are unaware of this.
No encapsulation device expresses support for handling offloaded
encapsulated packets, so we won't generate these types of frames
in the transmit path. However, GRO doesn't have a check for
multiple levels of encapsulation and will attempt to build them.
UDP tunnel GRO actually does prevent this situation but it only
handles multiple UDP tunnels stacked on top of each other. This
generalizes that solution to prevent any kind of tunnel stacking
that would cause problems.
Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fac8e0f579695a3ecbc4d3cac369139d7f819971) Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/af_inet.c
net/ipv6/ip6_offload.c
Backport of caaee6234d05a58c5b4d05e7bf766131b810a657 ("ptrace: use fsuid,
fsgid, effective creds for fs access checks") to v4.1 failed to update the
mode parameter in the mm_access() call in pagemap_read() to have one of the
new PTRACE_MODE_*CREDS flags.
Attempting to read any other process' pagemap results in a WARN()
Vitaly Kuznetsov [Fri, 22 Jan 2016 17:31:53 +0000 (18:31 +0100)]
clocksource: Allow unregistering the watchdog
Hyper-V vmbus module registers TSC page clocksource when loaded. This is
the clocksource with the highest rating and thus it becomes the watchdog
making unloading of the vmbus module impossible.
Separate clocksource_select_watchdog() from clocksource_enqueue_watchdog()
and use it on clocksource register/rating change/unregister.
After all, lobotomized monkeys may need some love too.
create_pending_snapshot() will go readonly on _any_ error return from
btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
just making a snapshot and asking it to inherit from an invalid qgroup. For
example:
$ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo
Will cause a transaction abort.
Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
going readonly is acceptable.
The following xfstests test case reproduces this bug:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
here=`pwd`
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
cd /
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# remove previous $seqres.full before test
rm -f $seqres.full
# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
rm -f $seqres.full
_scratch_mkfs
_scratch_mount
_run_btrfs_util_prog quota enable $SCRATCH_MNT
# The qgroup '1/10' does not exist and should be silently ignored
_run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1
_scratch_unmount
echo "Silence is golden"
status=0
exit
Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 918c2ee103cf9956f1c61d3f848dbb49fd2d104a) Signed-off-by: Ashish Samant <ashish.samant@oracle.com>