www.infradead.org Git - users/jedix/linux-maple.git/log

tools/power turbostat: work around RC6 counter wrap

Orabug: 24811361

Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval

When we see (old > new), print ***.**% instead
of a bogus huge number.

Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9185e988e9d5bb70b690362e84bb2e4a9d71f2c5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial KBL support

Orabug: 24811361

KBL is similar to SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit cdc57272ea0a0e952c4609b56e157e4d0ec8e956)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial SKX support

Orabug: 24811361

SKX has a lot in common with HSX

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec53e594c65ab099ca784d62b6f4c191e3a4d7cc)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode BXT TSC frequency via CPUID

Orabug: 24811361

Hard-code BXT ART to 19200MHz, so turbostat --debug
can fully enumerate TSC:

CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0
TSC: 1190 MHz (19200000 Hz * 186 / 3 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e8efbc80db5e824ce2382d5e65429b6b493e71e2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: initial BXT support

Orabug: 24811361

Broxton has a lot in common with SKL

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e4085d543e256aff6606ba99ed257f7c06685f3b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: print IRTL MSRs

Orabug: 24811361

Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.

IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 5a63426e2a18775ed05b20e3bc90c68bacb1f68a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: SGX state should print only if --debug

Orabug: 24811361

The CPUID.SGX bit was printed, even if --debug was used

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8ae7225591fd15aac89769cbebb3b5ecc8b12fe5)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: bugfix: TDP MSRs print bits fixing

Orabug: 24811361

MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF

MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF

And the same modification to MSR_CONFIG_TDP_LEVEL_2.

MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 685b535b2cdb9cdf354321f8af9ed17dcf19d19f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump

Orabug: 24811361

MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6c34f160df82ae07bfff5b4c51f50622e9c2759e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: call __cpuid() instead of __get_cpuid()

Orabug: 24811361

turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().

syntax only, no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 5aea2f7f645b27635b856311dee5b775d277c686)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: indicate SMX and SGX support

Orabug: 24811361

SGX presence is related to a SKL power workaround,
so lets show when that is enabled.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit aa8d8cc79af16e16da04efff1c1a72b1ea4a9e7e)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: detect and work around syscall jitter

Orabug: 24811361

The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.

When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.

A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.

For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 0102b06747c7d24e334d2b27c4b43eed693676f1)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show GFX%rc6

Orabug: 24811361

The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.

This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit fdf676e51f301d207586d9bac509b8ce055bae8a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show GFXMHz

Orabug: 24811361

Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 27d47356b6dfa92042a17a0b474f08910d4c8e8f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: show IRQs per CPU

Orabug: 24811361

The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.

The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 562a2d377bb9882c49debc9e1be7127a1717e242)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: make fewer systems calls

Orabug: 24811361

skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.

The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 36229897ba966bb0dc9e060222ff17b198252367)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix compiler warnings

Orabug: 24811361

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 58cc30a4e6086d542302e04eea39b2a438e4c5dd)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: add --out option for saving output in a file

Orabug: 24811361

By default...

Turbostat --debug gconfiguration info goes to stderr.

In FORK mode, turbostat statistics go to stderr.

In PERIODIC mode, turbostat statistics go to stdout.

These defaults do not change, but an option "--out file"
will send all output above only to the specified file.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b7d8c1483bbf6ec9d2dd76d6a1c91a38c3f6ac35)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: re-name "%Busy" field to "Busy%"

Orabug: 24811361

some tools processing turbostat output
have difficulty with items that begin with %...

Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75d2e44e60490ba1fee076a5f4dcfbdc8598e8c4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding

Orabug: 24811361

Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
  for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
  parsing code
- added x200 to list of architectures that do not support Nahlem compatible
  definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
  bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
  (logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
  same order as implementations for other platforms

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit cbf97abaf3689652bcddc0741dc49629d1838142)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Intel Xeon x200: fix erroneous bclk value

Orabug: 24811361

x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.

prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 121b48bb77187cf2ed3053e147d2e6c1e864083c)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: allow sub-sec intervals

Orabug: 24811361

turbostat -i interval_sec

will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 2a0609c02e6558df6075f258af98a54a74b050ff)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix various build warnings

Orabug: 24811361

When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:

  turbostat.c: In function ‘dump_cstate_pstate_config_info’:
  turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
   dump_cstate_pstate_config_info(family, model)
  turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
  turbostat.c: In function ‘get_tdp’:
  turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
   double get_tdp(model)
  turbostat.c: In function ‘perf_limit_reasons_probe’:
  turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
   void perf_limit_reasons_probe(family, model)
  turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit 1b69317d2dc80bc8e1d005e1a771c4f5bff3dabe)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: Decode MSR_MISC_PWR_MGMT

Orabug: 24811361

This MSR is helpful to show if P-state HW coordination
is enabled or disabled.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit f0057310b40efe9f797ff337e9464e6a6fb9d782)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode HWP registers

Orabug: 24811361

# turbostat --debug
...
CPUID(6): ... HWP, HWPnotify, HWPwindow, HWPepp, HWPpkg ...
...
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x01050916 (high 0x16 guar 0x9 eff 0x5 low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80001604 (min 0x4 max 0x16 des 0x0 epp 0x80 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 7f5c258e1ce1e0909d3195694ac79f143051e513)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency

Orabug: 24811361

This CPUID leaf is available on Skylake:

CPUID(0x16): base_mhz: 1500 max_mhz: 2200 bus_mhz: 100

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 61a87ba7893a256d86c7eea6a7ab10d38ccac9b2)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
tools/power/x86/turbostat/turbostat.c

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: decode more CPUID fields

Orabug: 24811361

for debugging, dump a few more fields:

CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM

cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 69807a638f91524ed75027f808cd277417ecee7a)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: use new name for MSR_PLATFORM_INFO

Orabug: 24811361

MSR_PLATFORM_INFO is the new name for MSR_NHM_PLATFORM_INFO

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit ec0adc539b8bf59b7c00db0748671f6594b77843)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIO

Orabug: 24811361

MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0)
should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal):
MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0)

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 759d2a932b82009a7039ef5567e7dcba153ce123)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: simplify Bzy_MHz calculation

Orabug: 24811361

Bzy_MHz = TSC_delta*tsc_tweak/APERF_delta/MPERF_delta/measurement_interval

becomes

Bzy_MHz = base_mhz/APERF_delta/MPERF_delta

on systems which support MSR_NHM_PLATFORM_INFO.

base_mhz is calculated directly from the base_ratio
reported in MSR_NHM_PLATFORM_INFO * bclk,
and bclk is discovered via MSR or cpuid.

This reduces the dependency of Bzy_MHz calculation on the TSC.
Previously, there were 4 TSC readings required in each caculation,
the raw TSC delta combined with the measurement_interval.
This also removes the "tsc_tweak" correction factor used when
TSC runs on a different base clock from the CPU's bclk.

After this change, tsc_tweak is used only for %Busy.

The end-result should be a Bzy_MHz result slightly less prone to jitter.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 21ed5574d1622118b49b0c6342acc8d27d0799be)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: SKL: Adjust for TSC difference from base frequency

Orabug: 24811361

On a Skylake with 1500MHz base frequency,
the TSC runs at 1512MHz.

This is because the TSC is no longer in the n*100 MHz BCLK domain,
but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz)

This adds error to several calculations in turbostat,
unless the TSC sample sizes are adjusted for this difference.

Note that calculations in the time domain are immune
from this issue, as the timing sub-system has already
calibrated the TSC against a known wall clock.

AVG_MHz = APERF_delta/measurement_interval

need no adjustment.  APERF_delta is in the BCLK domain,
and measurement_interval is in the time domain.

TSC_MHz  =  TSC_delta/measurement_interval

needs no adjustment -- as we really do want to report
the actual measured TSC delta here, and measurement_interval
is in the accurate time domain.

%Busy = MPERF_delta/TSC_delta

needs adjustment to use TSC_BCLK_DOMAIN_delta.
TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz

Bzy_MHz = TSC_delta/APERF_delta/MPERF_delta/measurement_interval

need adjustment as above.

No other metrics in turbostat need to be adjusted.

Before:

     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     550   24.84    2216    1512
       0    2191   98.73    2219    1514
       2       0    0.01    2130    1512
       1       9    0.43    2016    1512
       3       2    0.08    2016    1512

After:

     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     550   25.05    2198    1512
       0    2190   99.62    2199    1512
       2       0    0.01    2152    1512
       1       9    0.46    2000    1512
       3       2    0.10    2000    1512

Note that in this example, the "Before" Bzy_MHz
was reported as exceeding the 2200 max turbo rate.
Also, even a pinned spin loop would not be reported
as over 99% busy.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a2b7b74945dbfe5d734eafe8aa52f9f1f8bc6931)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: KNL workaround for %Busy and Avg_MHz

Orabug: 24811361

KNL increments APERF and MPERF every 1024 clocks.
This is compliant with the architecture specification,
which requires that only the ratio of APERF/MPERF need be valid.

However, turbostat takes advantage of the fact that these
two MSRs increment every un-halted clock
at the actual and base frequency:

AVG_MHz = APERF_delta/measurement_interval

%Busy = MPERF_delta/TSC_delta

This quirk is needed for these calculations to also work on KNL,
which would otherwise show a value 1024x smaller than expected.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit b2b34dfe4d9aa4c468fc363b3b666974783ed1f9)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: IVB Xeon: fix --debug regression

Orabug: 24811361

Staring in Linux-4.3-rc1,
commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP")
touches MSR 0x648, which is not supported on IVB-Xeon.
This results in "turbostat --debug" exiting on those systems:

turbostat: /dev/cpu/2/msr offset 0x648 read failed: Input/output error

Remove IVB-Xeon from the list of machines supporting with that MSR.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 756357b8e4b072fd5ee86421f794e071a348802b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix typo on DRAM column in Joules-mode

< RAM_W
> RAM_J

Reported-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bd6906ed3d7a00d55c9bd368a640ef83bb487d1d)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: fix parameter passing for forked command

Orabug: 24811361

turbostat supports forked command when sampling cpu state. However,
the forked command is not allowed to be executed with options, otherwise
turbostat might regard these options as invalid turbostat options.

For example:

./turbostat stress -c 4 -t 10
./turbostat: unrecognized option '-t'

Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit a01e72fbc41e322ed229465de8b595a7e3ec6549)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: dump CONFIG_TDP

Orabug: 24811361

Config TDP is a feature that allows parts to be configured
for different thermal limits after they have left the factory.

This can have an effect on the operation of the part,
particularly in determiniing...

Max Non-turbo Ratio
Turbo Activation Ratio

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 6fb3143b561c4a7865e5513eeb02d42ef38e8173)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: cpu0 is no longer hard-coded, so update output

Orabug: 24811361

The --debug option reads a number of per-package MSRs.
Previously we explicitly read them on cpu0, but recently
turbostat changed to read them on the current "base_cpu".

Update the print-out to reflect base_cpu, rather than
the hard-coded cpu0.

Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit bfae2052265cde825afaba35eb3a4d3889432734)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tools/power turbostat: update turbostat(8)

Orabug: 24811361

Remove reference to the original Nehalem Turbo white paper,
since it has moved, and these mechanisms have now long since
been documented in the Software Developer's Manual.

Reported-by: Jeremie Lagraviere <jeremie@simula.no>
Signed-off-by: Len Brown <len.brown@intel.com>
(cherry picked from commit 75fd7ffa7fab91c2c3234bd3e465ba4f366733f4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: validate ICV length on link creation

Test the cipher suite initialization in case ICV length has a value
different than its default. If this test fails, creation of a new macsec
link will also fail. This avoids situations where further security
associations can't be added due to failures of crypto_aead_setauthsize(),
caused by unsupported user-provided values of the ICV length.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f04c392d2dd97a985878f4380a1b054791301acf)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix error codes when a SA is created

preserve the return value of AEAD functions that are called when a SA is
created, to avoid inappropriate display of "RTNETLINK answers: Cannot
allocate memory" message.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 34aedfee22967236adc3d9147c8b47b7f5bad26c)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: limit ICV length to 16 octets

IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
definitions accordingly, and avoid accepting configurations where the ICV
length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
unchanged for backwards compatibility with userspace programs.

Fixes: dece8d2b78d1 ("uapi: add MACsec bits")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2ccbe2cb79f2f74ab739252299b6f9ff27586f2c)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

vlan: use a valid default mtu value for vlan over macsec

macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18d3df3eab23796d7f852f9c6bb60962b8372ced)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix netlink attribute for key id

In my last commit I replaced MACSEC_SA_ATTR_KEYID by
MACSEC_SA_ATTR_KEY.

Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1968a0b8b6ca088bc029bd99ee696f1aca4090d0)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: key identifier is 128 bits, not 64

The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.

IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8acca6acebd07b238af2e61e4f7d55e6232c7e3a)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: set actual real device for xmit when !protect_frames

Avoid recursions of dev_queue_xmit() to the wrong net device when
frames are unprotected, since at that time skb->dev still points to
our own macsec dev and unlike macsec_encrypt_finish() dev pointer
doesn't get updated to real underlying device.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 79c62220d74a4a3f961a2cb7320da09eebf5daf7)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix SA initialization

The ASYNC flag prevents initialization on some physical machines.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6052f7fbce857e327218a9d8a040e210ea7cc718)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: allocate sg and iv on the heap

For the crypto callbacks to work properly, we cannot have sg and iv on
the stack. Use kmalloc instead, with a single allocation for
aead_request + scatterlist + iv.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5d9649b3a524df9ccd15ac25ad6591089260fdbd)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: add rcu_barrier() on module exit

Without this, the various uses of call_rcu could cause a kernel panic.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b196c22af5c3ff784c472c80f6fb4e5fad67b2ac)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: Convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e425974feaa545575135f04e646f0495439b4c54)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix netlink attribute validation

macsec_validate_attr should check IFLA_MACSEC_REPLAY_PROTECT (not
IFLA_MACSEC_PROTECT) to verify that the replay protection and replay
window arguments are correct.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4b1fb9352f351faa067a914907d58a6fe38ac048)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: add missing macsec prefix in uapi

I accidentally forgot some MACSEC_ prefixes in if_macsec.h.

Fixes: dece8d2b78d1 ("uapi: add MACsec bits")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 748164802c1bd2c52937d20782b07d8c68dd9a4f)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix SA leak if initialization fails

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38787fc209580f9b5918e93e71da7c960dbb5d8d)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix memory leaks around rx_handler (un)registration

We leak a struct macsec_rxh_data when we unregister the rx_handler in
macsec_dellink.
We also leak a struct macsec_rxh_data in register_macsec_dev if we fail
to register the rx_handler.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 960d5848dbf1245cc3a310109897937207411c0c)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: add consistency check to netlink dumps

Use genl_dump_check_consistent in dump_secy.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 96cfc5052c5d434563873caa7707b32b9e389b16)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix rx_sa refcounting with decrypt callback

The decrypt callback macsec_decrypt_done needs a reference on the rx_sa
and releases it before returning, but macsec_handle_frame already
put that reference after macsec_decrypt returned NULL.

Set rx_sa to NULL when the decrypt callback runs so that
macsec_handle_frame knows it must not release the reference.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3b7d0bd7ac2c501d4806db71ddd383c184968e8)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: don't put a NULL rxsa

The "deliver:" path of macsec_handle_frame can be called with
rx_sa == NULL. Check rx_sa != NULL before calling macsec_rxsa_put().

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 497f358aa4c0d99b75ec204407389920d5e33ec5)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: take rtnl lock before for_each_netdev

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c10c63ea739bce3b8a6ab85c7bb472d900c9b070)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: add missing NULL check after kmalloc

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 72f2a05b8f367ee0d75584a6fbec7dbe7c144f27)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: fix crypto Kconfig dependency

The new MACsec driver uses the AES crypto algorithm, but can be configured
even if CONFIG_CRYPTO is disabled, leading to a build error:

warning: (MAC80211 && MACSEC) selects CRYPTO_GCM which has unmet direct dependencies (CRYPTO)
warning: (BT && CEPH_LIB && INET && MAC802154 && MAC80211 && BLK_DEV_RBD && MACSEC && AIRO_CS && LIBIPW && HOSTAP && USB_WUSB && RTLLIB_CRYPTO_CCMP && FS_ENCRYPTION && EXT4_ENCRYPTION && CEPH_FS && BIG_KEYS && ENCRYPTED_KEYS) selects CRYPTO_AES which has unmet direct dependencies (CRYPTO)
crypto/built-in.o: In function `gcm_enc_copy_hash':
aes_generic.c:(.text+0x2b8): undefined reference to `crypto_xor'
aes_generic.c:(.text+0x2dc): undefined reference to `scatterwalk_map_and_copy'

This adds an explicit 'select CRYPTO' statement the way that other
drivers handle it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ab2ed0171a50ddee8390f472bb83f60d393b4b04)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

macsec: introduce IEEE 802.1AE driver

This is an implementation of MACsec/IEEE 802.1AE. This driver
provides authentication and encryption of traffic in a LAN, typically
with GCM-AES-128, and optional replay protection.

http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c09440f7dcb304002dfced8c0fea289eb25f2da0)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
drivers/net/Kconfig

[dhaval.giani@oracle.com: Removed enabling uek kernel options]
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>
Conflicts:
uek-rpm/ol6/config-sparc
uek-rpm/ol6/config-sparc-debug
uek-rpm/ol6/config-x86_64
uek-rpm/ol6/config-x86_64-debug
uek-rpm/ol7/config-sparc
uek-rpm/ol7/config-x86_64
uek-rpm/ol7/config-x86_64-debug

net: add MACsec netdevice priv_flags and helper

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3c17578473b9be5a6e7680a45ea97e1d56e13249)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
include/linux/netdevice.h

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

uapi: add MACsec bits

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dece8d2b78d19df7fe5e4e965f1f0d1a3e188d1b)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
include/uapi/linux/if_link.h

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

net: declare new net_device priv_flag IFF_NO_QUEUE

This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fa8187c96471c49419c25d4ec3299d17d3f274b2)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Conflicts:
include/linux/netdevice.h

Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: aead - Add new interface with single SG list

The primary user of AEAD, IPsec includes the IV in the AD in
most cases, except where it is implicitly authenticated by the
underlying algorithm.

The way it is currently implemented is a hack because we pass
the data in piecemeal and the underlying algorithms try to stitch
them back up into one piece.

This is why this patch is adding a new interface that allows a
single SG list to be passed in that contains everything so the
algorithm implementors do not have to stitch.

The new interface accepts a single source SG list and a single
destination SG list. Both must be laid out as follows:

AD, skipped data, plain/cipher text, ICV

The ICV is not present from the source during encryption and from
the destination during decryption.

For the top-level IPsec AEAD algorithm the plain/cipher text will
contain the generated (or received) IV.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 996d98d85ccc27d9c592ad7dc1371c60cd6585cc)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: scatterwalk - Add scatterwalk_ffwd helper

This patch adds the scatterwalk_ffwd helper which can create an
SG list that starts in the middle of an existing SG list. The
new list may either be part of the existing list or be a chain
that latches onto part of the existing list.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fc42bcba97bae738f905b83741134a63af7e6c02)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: aead - Convert top level interface to new style

This patch converts the top-level aead interface to the new style.
All user-level AEAD interface code have been moved into crypto/aead.h.

The allocation/free functions have switched over to the new way of
allocating tfms.

This patch also removes the double indrection on setkey so the
indirection now exists only at the alg level.

Apart from these there are no user-visible changes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5d1d65f8bea6de3d9c2c60fdfdd2da02da5ea672)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: cryptd - Add missing aead.h inclusion

cryptd.h needs to include crypto/aead.h because it uses crypto_aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 53033d4d36b0299ef02e28155913414ec1089aac)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: qat - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 97cacb9f7a1e7502e7e8e7964b00dc191be565eb)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: aesni - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a5a2b4da012d3ed92476b169a66e17507abe1a48)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: seqiv - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ba6d8e395800e37bb0e6234f2bd7b3fe343ab8bb)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: pcrypt - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit fd0de97890d88b979c2731bd5b70d504175fc2ed)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: gcm - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5d72336f1b2c8516828c4c7748bdd3ec8a4938da)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: cryptd - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 529a0b625b68d7db1ca1e8540e3fb034f674a166)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: ccm - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2c221ad39424cf8c72abe42a7f989a8d38d181d2)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: authencesn - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 6650d09b3a0fea2c1f03f28f7a49309050529718)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: authenc - Use crypto_aead_set_reqsize helper

This patch uses the crypto_aead_set_reqsize helper to avoid directly
touching the internals of aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 25df91943018ca173fcd77998096b3621e987499)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: aead - Add crypto_aead_set_reqsize helper

This patch adds the helper crypto_aead_set_reqsize so that people
don't have to directly access the aead internals to set the reqsize.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 21b7013414db320be418d5f28f72af5c0665dc18)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: blkcipher - Include crypto/aead.h

All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit d1a2fd500cc7c90504df52017edbb9f1f7763449)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: qat - Include internal/aead.h

All AEAD implementations must include internal/aead.h in order
to access required helpers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0ed6264b60fc5b59dc0dc26a29139d9dcd759a67)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: authencesn - Include internal/aead.h

All AEAD implementations must include internal/aead.h in order
to access required helpers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7fb2a4bdce6b609cf8df44f409b0b853ff39b585)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: authenc - Include internal/aead.h

All AEAD implementations must include internal/aead.h in order
to access required helpers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 68acbf843c1c6f11e4e534b5df33517e8870b55a)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: api - Add crypto_grab_spawn primitive

This patch adds a new primitive crypto_grab_spawn which is meant
to replace crypto_init_spawn and crypto_init_spawn2. Under the
new scheme the user no longer has to worry about reference counting
the alg object before it is subsumed by the spawn.

It is pretty much an exact copy of crypto_grab_aead.

Prior to calling this function spawn->frontend and spawn->inst
must have been set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit d6ef2f198d4c9d95b77ee4918b97fc8a53c8a7b7)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: aead - Fix corner case in crypto_lookup_aead

When the user explicitly states that they don't care whether the
algorithm has been tested (type = CRYPTO_ALG_TESTED and mask = 0),
there is a corner case where we may erroneously return ENOENT.

This patch fixes it by correcting the logic in the test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 80f7b3552c1c925478a955fd4c700345beaf2982)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: api - Fix build error when modules are disabled

The commit 59afdc7b32143528524455039e7557a46b60e4c8 ("crypto:
api - Move module sig ifdef into accessor function") broke the
build when modules are completely disabled because we directly
dereference module->name.

This patch fixes this by using the accessor function module_name.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bd4a7c69aaed79ae1a299db8063fe4daf5e4a2f1)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

mac802154: Include crypto/aead.h

All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

This patch also removes a bogus inclusion of algapi.h which should
only be used by algorithm/driver implementors and not crypto users.

Instead linux/crypto.h is added which is necessary because mac802154
also uses blkcipher in addition to aead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8a1a2b717e0d4d5f3e3bb59b7dee5079a15ab24b)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

mac80211: Include crypto/aead.h

All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d8fe0ddd0256711e9cf2c539e77c341daa0eb2cf)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: testmgr - Include crypto/aead.h

All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce3311580515268504db929c3219e57927b9e6a)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: tcrypt - Include crypto/aead.h

All users of AEAD should include crypto/aead.h instead of
include/linux/crypto.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ce5a04d9f3e8f1f25aa1d6da8612bfeeca7b306)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: api - Include linux/fips.h

All users of fips_enabled should include linux/fips.h directly
instead of getting it through internal.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3133d76fc60bce6f3e00efb6c3540f2f449ff569)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: api - Move module sig ifdef into accessor function

Currently we're hiding mod->sig_ok under an ifdef in open code.
This patch adds a module_sig_ok accessor function and removes that
ifdef.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from commit 59afdc7b32143528524455039e7557a46b60e4c8)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

crypto: api - Add crypto_alg_extsize helper

This patch adds a crypto_alg_extsize helper that can be used
by algorithm types such as pcompress and shash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 38d21433112c25acdb8e93f60be629e7a1c27a26)

Orabug: 24614549

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tcp: fix use after free in tcp_xmit_retransmit_queue()

Orabug: 25374364
CVE: CVE-2016-6828

When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()

Then it attempts to copy user data into this fresh skb.

If the copy fails, we undo the work and remove the fresh skb.

Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)

Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.

This bug was found by Marco Grassi thanks to syzkaller.

Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Signed-off-by: Dhaval Giani <dhaval.giani@oracle.com>

tmpfs: fix shmem_evict_inode() warnings on i_blocks

Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).

(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently.  The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)

v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.

shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache().  delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one
too low: leading to the WARNING when the object is finally evicted.

Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:").  Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.

While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after).  (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.

I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether.  Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 267a4c76bbdb950688d3aeb020976c2918064584)

[honglei.wang@oracle.com: fix warning at shmem_evict_inode]
Orabug: 25139498
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>

ASN.1: Fix non-match detection failure on data overrun

Orabug: 25059025
CVE: CVE-2016-2053

If the ASN.1 decoder is asked to parse a sequence of objects, non-optional
matches get skipped if there's no more data to be had rather than a
data-overrun error being reported.

This is due to the code segment that decides whether to skip optional
matches (ie. matches that could get ignored because an element is marked
OPTIONAL in the grammar) due to a lack of data also skips non-optional
elements if the data pointer has reached the end of the buffer.

This can be tested with the data decoder for the new RSA akcipher algorithm
that takes three non-optional integers.  Currently, it skips the last
integer if there is insufficient data.

Without the fix, #defining DEBUG in asn1_decoder.c will show something
like:

next_op: pc=0/13 dp=0/270 C=0 J=0
- match? 30 30 00
- TAG: 30 266 CONS
next_op: pc=2/13 dp=4/270 C=1 J=0
- match? 02 02 00
- TAG: 02 257
- LEAF: 257
next_op: pc=5/13 dp=265/270 C=1 J=0
- match? 02 02 00
- TAG: 02 3
- LEAF: 3
next_op: pc=8/13 dp=270/270 C=1 J=0
next_op: pc=11/13 dp=270/270 C=1 J=0
- end cons t=4 dp=270 l=270/270

The next_op line for pc=8/13 should be followed by a match line.

This is not exploitable for X.509 certificates by means of shortening the
message and fixing up the ASN.1 CONS tags because:

(1) The relevant records being built up are cleared before use.

(2) If the message is shortened sufficiently to remove the public key, the
     ASN.1 parse of the RSA key will fail quickly due to a lack of data.

(3) Extracted signature data is either turned into MPIs (which cope with a
     0 length) or is simpler integers specifying algoritms and suchlike
     (which can validly be 0); and

(4) The AKID and SKID extensions are optional and their removal is handled
     without risking passing a NULL to asymmetric_key_generate_id().

(5) If the certificate is truncated sufficiently to remove the subject,
     issuer or serialNumber then the ASN.1 decoder will fail with a 'Cons
     stack underflow' return.

This is not exploitable for PKCS#7 messages by means of removal of elements
from such a message from the tail end of a sequence:

(1) Any shortened X.509 certs embedded in the PKCS#7 message are survivable
     as detailed above.

(2) The message digest content isn't used if it shows a NULL pointer,
     similarly, the authattrs aren't used if that shows a NULL pointer.

(3) A missing signature results in a NULL MPI - which the MPI routines deal
     with.

(4) If data is NULL, it is expected that the message has detached content and
     that is handled appropriately.

(5) If the serialNumber is excised, the unconditional action associated
     with it will pick up the containing SEQUENCE instead, so no NULL
     pointer will be seen here.

     If both the issuer and the serialNumber are excised, the ASN.1 decode
     will fail with an 'Unexpected tag' return.

     In either case, there's no way to get to asymmetric_key_generate_id()
     with a NULL pointer.

(6) Other fields are decoded to simple integers.  Shortening the message
     to omit an algorithm ID field will cause checks on this to fail early
     in the verification process.

This can also be tested by snipping objects off of the end of the ASN.1 stream
such that mandatory tags are removed - or even from the end of internal
SEQUENCEs.  If any mandatory tag is missing, the error EBADMSG *should* be
produced.  Without this patch ERANGE or ENOPKG might be produced or the parse
may apparently succeed, perhaps with ENOKEY or EKEYREJECTED being produced
later, depending on what gets snipped.

Just snipping off the final BIT_STRING or OCTET_STRING from either sample
should be a start since both are mandatory and neither will cause an EBADMSG
without the patches

Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
(cherry picked from commit 0d62e9dd6da45bbf0f33a8617afc5fe774c8f45f)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
lib/asn1_decoder.c

Signed-off-by: Brian Maly <brian.maly@oracle.com>

vfio/pci: Fix integer overflows, bitmask check

The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().

Furthermore, a kzalloc is changed to a kcalloc because the use of a
kzalloc with an integer multiplication allowed an integer overflow
condition to be reached without this patch. kcalloc checks for overflow
and should prevent a similar occurrence.

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 05692d7005a364add85c6e25a6c4447ce08f913a)

Orabug: 24963753
CVE: CVE-2016-9083,CVE-2016-9084
Signed-off-by: Kirtikar Kashyap <kirtikar.kashyap@oracle.com>
Reviewed-by: Dhaval Giani <dhaval.giani@oracle.com>

i40e: Don't notify client(s) for DCB changes on all VSIs

When LLDP/DCBX change happens the i40e driver code flow tried to
notify the client(s) for each of the PF VSIs. This resulted into
kernel panic on the first VSI that didn't have any netdev
associated to it.

The DCB change notification to the client(s) should be done only
once for the PF/LAN VSI where the client(s) instances have been
added to. Also, move the notification call after the PF driver has
made changes related to the updated DCB configuration.

Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Tested-by: Ronald J Bynoe <ronald.j.bynoe@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 85a1aab79c54c7e44cb0f98e5aa797fbb0457866)

Orabug: 24923619

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>

tunnels: Don't apply GRO to multiple layers of encapsulation.

Orabug: 24842686
CVE: CVE-2016-8666

When drivers express support for TSO of encapsulated packets, they
only mean that they can do it for one layer of encapsulation.
Supporting additional levels would mean updating, at a minimum,
more IP length fields and they are unaware of this.

No encapsulation device expresses support for handling offloaded
encapsulated packets, so we won't generate these types of frames
in the transmit path. However, GRO doesn't have a check for
multiple levels of encapsulation and will attempt to build them.

UDP tunnel GRO actually does prevent this situation but it only
handles multiple UDP tunnels stacked on top of each other. This
generalizes that solution to prevent any kind of tunnel stacking
that would cause problems.

Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fac8e0f579695a3ecbc4d3cac369139d7f819971)
Signed-off-by: Brian Maly <brian.maly@oracle.com>
Conflicts:
net/ipv4/af_inet.c
net/ipv6/ip6_offload.c

fs/proc/task_mmu.c: fix mm_access() mode parameter in pagemap_read()

Orabug: 24809889

Backport of caaee6234d05a58c5b4d05e7bf766131b810a657 ("ptrace: use fsuid,
fsgid, effective creds for fs access checks") to v4.1 failed to update the
mode parameter in the mm_access() call in pagemap_read() to have one of the
new PTRACE_MODE_*CREDS flags.

Attempting to read any other process' pagemap results in a WARN()

WARNING: CPU: 0 PID: 883 at kernel/ptrace.c:229 __ptrace_may_access+0x14a/0x160()
denying ptrace access check without PTRACE_MODE_*CREDS
Modules linked in: loop sg e1000 i2c_piix4 ppdev virtio_balloon virtio_pci parport_pc i2c_core virtio_ring ata_generic serio_raw pata_acpi virtio parport pcspkr floppy acpi_cpufreq ip_tables ext3 mbcache jbd sd_mod ata_piix crc32c_intel libata
CPU: 0 PID: 883 Comm: cat Tainted: G        W       4.1.12-51.el7uek.x86_64 #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000286 00000000619f225a ffff88003b6fbc18 ffffffff81717021
  ffff88003b6fbc70 ffffffff819be870 ffff88003b6fbc58 ffffffff8108477a
  000000003b6fbc58 0000000000000001 ffff88003d287000 0000000000000001
Call Trace:
  [<ffffffff81717021>] dump_stack+0x63/0x81
  [<ffffffff8108477a>] warn_slowpath_common+0x8a/0xc0
  [<ffffffff81084805>] warn_slowpath_fmt+0x55/0x70
  [<ffffffff8108e57a>] __ptrace_may_access+0x14a/0x160
  [<ffffffff8108f372>] ptrace_may_access+0x32/0x50
  [<ffffffff81081bad>] mm_access+0x6d/0xb0
  [<ffffffff81278c81>] pagemap_read+0xe1/0x360
  [<ffffffff811a046b>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
  [<ffffffff8120d2e7>] __vfs_read+0x37/0x100
  [<ffffffff812b9ab4>] ? security_file_permission+0x84/0xa0
  [<ffffffff8120d8b6>] ? rw_verify_area+0x56/0xe0
  [<ffffffff8120d9c6>] vfs_read+0x86/0x140
  [<ffffffff8120e945>] SyS_read+0x55/0xd0
  [<ffffffff8171eb6e>] system_call_fastpath+0x12/0x71

Fixes: ab88ce5feca4 (ptrace: use fsuid, fsgid, effective creds for fs access checks)
Signed-off-by: Kenny Keslar <kenny.keslar@oracle.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
(cherry picked from commit 5c576457aca8fc07bdb800a4589357801133f81b)
Signed-off-by: Brian Maly <brian.maly@oracle.com>

clocksource: Allow unregistering the watchdog

Hyper-V vmbus module registers TSC page clocksource when loaded. This is
the clocksource with the highest rating and thus it becomes the watchdog
making unloading of the vmbus module impossible.
Separate clocksource_select_watchdog() from clocksource_enqueue_watchdog()
and use it on clocksource register/rating change/unregister.

After all, lobotomized monkeys may need some love too.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1453483913-25672-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit bbf66d897adf2bb0c310db96c97e8db6369f39e1)

This commit changes the clocksource for the Hyper-V vmbus module to
provide superior performance for a VM

Orabug: 24746725
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

btrfs: handle non-fatal errors in btrfs_qgroup_inherit()

Orabug: 24716895

create_pending_snapshot() will go readonly on _any_ error return from
btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
just making a snapshot and asking it to inherit from an invalid qgroup. For
example:

$ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo

Will cause a transaction abort.

Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
going readonly is acceptable.

The following xfstests test case reproduces this bug:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1 # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
   cd /
   rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # remove previous $seqres.full before test
  rm -f $seqres.full

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch

  rm -f $seqres.full

  _scratch_mkfs
  _scratch_mount
  _run_btrfs_util_prog quota enable $SCRATCH_MNT
  # The qgroup '1/10' does not exist and should be silently ignored
  _run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1

  _scratch_unmount

  echo "Silence is golden"

  status=0
  exit

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
(cherry picked from commit 918c2ee103cf9956f1c61d3f848dbb49fd2d104a)
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>